> On Oct. 16, 2019, 7:36 p.m., Na Li wrote:
> > addons/models/1000-Hadoop/1111-ml_model.json
> > Lines 10 (patched)
> > <https://reviews.apache.org/r/71619/diff/1/?file=2169131#file2169131line10>
> >
> >     "DataSet" in Atlas has different meaning from the meaning in ML.  
> >     
> >     DataSet: This type extends Referenceable and Asset. Conceptually, it 
> > can be used to represent a type that stores data. In Atlas, hive tables, 
> > Sqoop RDBMS tables etc are all types that extend from DataSet. Types that 
> > extend DataSet can be expected to have a Schema in the sense that they 
> > would have an attribute that defines attributes of that dataset. For e.g. 
> > the columns attribute in a hive_table. Also entities of types that extend 
> > DataSet participate in data transformation and this transformation can be 
> > captured by Atlas via lineage (or provenance) graphs. 
> > https://atlas.apache.org/0.8.1/TypeSystem.html
> 
> Anand Patil wrote:
>     Hmm, it still feels like an odd fit to me, eg model builds and users 
> don't really have a schema. What would you think about instead making our 
> types inherit from Referenceable and Asset?

The type hierarchy is as follows:

1) Asset derives from Referenceable
2) DataSet derives from Asset
3) Infrastructure derives from Asset
4) Process derives from Asset
5) ProcessExecution derives from Asset

It seems to me that
a) DataSet is used for storing data,
b) Infrastructure is used as a container, and
c) Process and ProcessExecution represent any data transformation operation or
action.

Deriving "model builds" or "users" directly from Asset would violate this
convention. I searched the integration code of the other modules, and no type
is derived directly from Referenceable or Asset.

The relevant type definitions:

{
      "name": "Referenceable",
      "superTypes": [],
      "serviceType": "atlas_core",
      "typeVersion": "1.0",
      "attributeDefs": [
        {
          "name": "qualifiedName",
          "typeName": "string",
          "cardinality": "SINGLE",
          "isIndexable": true,
          "isOptional": false,
          "isUnique": true
        }
      ]
    },
    {
      "name": "Asset",
      "superTypes": [
        "Referenceable"
      ],
      "serviceType": "atlas_core",
      "typeVersion": "1.1",
      "attributeDefs": [
        {
          "name": "name",
          "typeName": "string",
          "cardinality": "SINGLE",
          "isIndexable": true,
          "isOptional": false,
          "isUnique": false,
          "indexType": "STRING"
        },
        ...
      ]
    },
    {
      "name": "DataSet",
      "superTypes": [
        "Asset"
      ],
      "serviceType": "atlas_core",
      "typeVersion": "1.1",
      "attributeDefs": []
    },
    {
      "name": "Infrastructure",
      "description": "Infrastructure can be IT infrastructure, which contains hosts and servers. Infrastructure might not be IT orientated, such as 'Car' for IoT applications.",
      "superTypes": [
        "Asset"
      ],
      "serviceType": "atlas_core",
      "typeVersion": "1.1",
      "attributeDefs": []
    },
    {
      "name": "Process",
      "superTypes": [
        "Asset"
      ],
      "serviceType": "atlas_core",
      "typeVersion": "1.1",
      "attributeDefs": [
        {
          "name": "inputs",
          "typeName": "array<DataSet>",
          "cardinality": "SET",
          "isIndexable": false,
          "isOptional": true,
          "isUnique": false
        },
        {
          "name": "outputs",
          "typeName": "array<DataSet>",
          "cardinality": "SET",
          "isIndexable": false,
          "isOptional": true,
          "isUnique": false
        }
      ]
    },
    {
      "name": "ProcessExecution",
      "superTypes": [
        "Asset"
      ],
      "serviceType": "atlas_core",
      "typeVersion": "1.0",
      "attributeDefs": []
    }
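
The check I describe above (does any integration type extend Referenceable or
Asset directly?) could be scripted roughly as follows. This is only a minimal
sketch: the function name direct_root_children and the sample list are my own
assumptions, "ml_model_build" is a hypothetical type name, and in practice the
input would presumably come from the "entityDefs" array of each model file.

```python
# Sketch: find non-base types whose direct superTypes include a root type.
# BASE_TYPES lists the core types quoted in this reply; everything else is
# treated as an integration type.
BASE_TYPES = {
    "Referenceable", "Asset", "DataSet",
    "Infrastructure", "Process", "ProcessExecution",
}
ROOTS = {"Referenceable", "Asset"}


def direct_root_children(entity_defs):
    """Return sorted names of non-base types that extend a root type directly."""
    return sorted(
        d["name"]
        for d in entity_defs
        if d["name"] not in BASE_TYPES
        and ROOTS & set(d.get("superTypes", []))
    )


# Trimmed sample for illustration only, not the real model files.
sample_defs = [
    {"name": "Asset", "superTypes": ["Referenceable"]},
    {"name": "DataSet", "superTypes": ["Asset"]},
    {"name": "hive_table", "superTypes": ["DataSet"]},
    {"name": "ml_model_build", "superTypes": ["Asset"]},  # hypothetical type
]

print(direct_root_children(sample_defs))
```

An empty result for the existing modules would match what I found by searching;
any name that is reported is a type that breaks the convention.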


> On Oct. 16, 2019, 7:36 p.m., Na Li wrote:
> > addons/models/1000-Hadoop/1111-ml_model.json
> > Lines 18 (patched)
> > <https://reviews.apache.org/r/71619/diff/1/?file=2169131#file2169131line18>
> >
> >     I don't want this to be optional. However, right now, some data source 
> > does not provide this info.
> 
> Anand Patil wrote:
>     Could you give an example?

Every Referenceable has an attribute "qualifiedName", which has to be unique. I
think we should get rid of the attribute "uniqueId", since it duplicates the
purpose of "qualifiedName". The attribute "name" can hold a user-friendly
name and need not be unique.
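
To illustrate the intended split between the two attributes, here is a minimal
sketch under stated assumptions: the type name "ml_model", the qualifiedName
values, and the helper duplicate_qualified_names are all hypothetical and not
part of the patch. Two entities share a "name" but carry distinct
"qualifiedName" values, so no separate "uniqueId" is needed to tell them apart.

```python
from collections import Counter


def duplicate_qualified_names(entities):
    """Return sorted qualifiedName values that occur more than once."""
    counts = Counter(e["attributes"]["qualifiedName"] for e in entities)
    return sorted(qn for qn, n in counts.items() if n > 1)


# Hypothetical entities: same user-friendly name, different qualifiedName.
entities = [
    {"typeName": "ml_model",
     "attributes": {"qualifiedName": "churn@cluster1", "name": "churn"}},
    {"typeName": "ml_model",
     "attributes": {"qualifiedName": "churn@cluster2", "name": "churn"}},
]

# An empty list means every qualifiedName in the batch is unique.
print(duplicate_qualified_names(entities))
```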


- Na


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71619/#review218237
-----------------------------------------------------------


On Oct. 16, 2019, 12:30 a.m., Na Li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71619/
> -----------------------------------------------------------
> 
> (Updated Oct. 16, 2019, 12:30 a.m.)
> 
> 
> Review request for atlas, Austin Nobis, Ashutosh Mestry, Karthik Manamcheri, 
> Sridhar K, Madhan Neethiraj, and Sarath Subramanian.
> 
> 
> Bugs: atlas-3464
>     https://issues.apache.org/jira/browse/atlas-3464
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> Define entities used for Machine Learning Governance
> 
> 
> Diffs
> -----
> 
>   addons/models/1000-Hadoop/1111-ml_model.json PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/71619/diff/1/
> 
> 
> Testing
> -------
> 
> Verified that it is a valid JSON file.
> 
> 
> Thanks,
> 
> Na Li
> 
>
