> On Oct. 16, 2019, 7:36 p.m., Na Li wrote:
> > addons/models/1000-Hadoop/1111-ml_model.json
> > Lines 10 (patched)
> > <https://reviews.apache.org/r/71619/diff/1/?file=2169131#file2169131line10>
> >
> >     "DataSet" in Atlas has different meaning from the meaning in ML.
> >
> >     DataSet: This type extends Referenceable and Asset. Conceptually, it
> >     can be used to represent an type that stores data. In Atlas, hive tables,
> >     Sqoop RDBMS tables etc are all types that extend from DataSet. Types that
> >     extend DataSet can be expected to have a Schema in the sense that they
> >     would have an attribute that defines attributes of that dataset. For e.g.
> >     the columns attribute in a hive_table. Also entities of types that extend
> >     DataSet participate in data transformation and this transformation can be
> >     captured by Atlas via lineage (or provenance) graphs.
> >
> >     https://atlas.apache.org/0.8.1/TypeSystem.html
>
> Anand Patil wrote:
>     Hmm, it still feels like an odd fit to me, eg model builds and users
>     don't really have a schema. What would you think about instead making our
>     types inherit from Referenceable and Asset?

The type hierarchy is as follows:
1) Asset derives from Referenceable
2) DataSet derives from Asset
3) Infrastructure derives from Asset
4) Process derives from Asset
5) ProcessExecution derives from Asset

It seems to me that
a) DataSet is used for storing data,
b) Infrastructure is used as a container,
c) Process and ProcessExecution represent any data transformation operation or action.

Deriving "model builds" or "users" directly from Asset violates this convention. I searched the integration code of the other modules, and no type there is derived directly from Referenceable or Asset.

    {
      "name": "Referenceable",
      "superTypes": [],
      "serviceType": "atlas_core",
      "typeVersion": "1.0",
      "attributeDefs": [
        {
          "name": "qualifiedName",
          "typeName": "string",
          "cardinality": "SINGLE",
          "isIndexable": true,
          "isOptional": false,
          "isUnique": true
        }
      ]
    },
    {
      "name": "Asset",
      "superTypes": [
        "Referenceable"
      ],
      "serviceType": "atlas_core",
      "typeVersion": "1.1",
      "attributeDefs": [
        {
          "name": "name",
          "typeName": "string",
          "cardinality": "SINGLE",
          "isIndexable": true,
          "isOptional": false,
          "isUnique": false,
          "indexType": "STRING"
        }
      ]
    },
    {
      "name": "DataSet",
      "superTypes": [
        "Asset"
      ],
      "serviceType": "atlas_core",
      "typeVersion": "1.1",
      "attributeDefs": []
    },
    {
      "name": "Infrastructure",
      "description": "Infrastructure can be IT infrastructure, which contains hosts and servers. Infrastructure might not be IT orientated, such as 'Car' for IoT applications.",
      "superTypes": [
        "Asset"
      ],
      "serviceType": "atlas_core",
      "typeVersion": "1.1",
      "attributeDefs": []
    },
    {
      "name": "Process",
      "superTypes": [
        "Asset"
      ],
      "serviceType": "atlas_core",
      "typeVersion": "1.1",
      "attributeDefs": [
        {
          "name": "inputs",
          "typeName": "array<DataSet>",
          "cardinality": "SET",
          "isIndexable": false,
          "isOptional": true,
          "isUnique": false
        },
        {
          "name": "outputs",
          "typeName": "array<DataSet>",
          "cardinality": "SET",
          "isIndexable": false,
          "isOptional": true,
          "isUnique": false
        }
      ]
    },
    {
      "name": "ProcessExecution",
      "superTypes": [
        "Asset"
      ],
      "serviceType": "atlas_core",
      "typeVersion": "1.0",
      "attributeDefs": []
    }


> On Oct. 16, 2019, 7:36 p.m., Na Li wrote:
> > addons/models/1000-Hadoop/1111-ml_model.json
> > Lines 18 (patched)
> > <https://reviews.apache.org/r/71619/diff/1/?file=2169131#file2169131line18>
> >
> >     I don't want this to be optional. However, right now, some data source
> >     does not provide this info.
>
> Anand Patil wrote:
>     Could you give an example?

Every Referenceable has an attribute "qualifiedName", which has to be unique. I think we should get rid of the attribute "uniqueId", which duplicates the purpose of "qualifiedName". The attribute "name" can hold a user-friendly name and does not have to be unique.


- Na


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71619/#review218237
-----------------------------------------------------------


On Oct. 16, 2019, 12:30 a.m., Na Li wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71619/
> -----------------------------------------------------------
>
> (Updated Oct. 16, 2019, 12:30 a.m.)
>
>
> Review request for atlas, Austin Nobis, Ashutosh Mestry, Karthik Manamcheri,
> Sridhar K, Madhan Neethiraj, and Sarath Subramanian.
>
>
> Bugs: atlas-3464
>     https://issues.apache.org/jira/browse/atlas-3464
>
>
> Repository: atlas
>
>
> Description
> -------
>
> Define entities used for Machine Learning Governance
>
>
> Diffs
> -----
>
>   addons/models/1000-Hadoop/1111-ml_model.json PRE-CREATION
>
>
> Diff: https://reviews.apache.org/r/71619/diff/1/
>
>
> Testing
> -------
>
> verified it is valid json file
>
>
> Thanks,
>
> Na Li
>
>
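
P.S. The convention argued for in the reply (no integration type deriving directly from Referenceable or Asset) could be checked mechanically before a model file is submitted. Below is a minimal sketch, assuming typedefs are plain dicts carrying the "name" and "superTypes" fields shown in the excerpt. The helper name direct_base_violations and the two sample ML types are hypothetical, made up for illustration; they are not taken from 1111-ml_model.json or from any Atlas API.

```python
# Core types from the excerpt; only "name" and "superTypes" matter for this check.
CORE_TYPES = [
    {"name": "Referenceable", "superTypes": []},
    {"name": "Asset", "superTypes": ["Referenceable"]},
    {"name": "DataSet", "superTypes": ["Asset"]},
    {"name": "Infrastructure", "superTypes": ["Asset"]},
    {"name": "Process", "superTypes": ["Asset"]},
    {"name": "ProcessExecution", "superTypes": ["Asset"]},
]

# Base types that, by the convention under discussion, integration types
# should not extend directly.
RESERVED_BASES = {"Referenceable", "Asset"}


def direct_base_violations(typedefs, core_types=CORE_TYPES):
    """Return the names of non-core types that list Referenceable or Asset
    as a direct supertype (hypothetical helper, not part of Atlas)."""
    core_names = {t["name"] for t in core_types}
    return [
        t["name"]
        for t in typedefs
        if t["name"] not in core_names
        and RESERVED_BASES & set(t.get("superTypes", []))
    ]


# Hypothetical ML types, for illustration only.
candidate_types = [
    {"name": "ml_model_build", "superTypes": ["Asset"]},
    {"name": "ml_training_run", "superTypes": ["Process"]},
]

print(direct_base_violations(candidate_types))
```

With these sample inputs only ml_model_build is reported, since it extends Asset directly, while ml_training_run goes through Process. Which of DataSet, Infrastructure, Process or ProcessExecution a given ML type should extend is still an open design question in this review; the sketch only flags the direct derivation.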