> On Oct. 16, 2019, 7:36 p.m., Na Li wrote:
> > addons/models/1000-Hadoop/1111-ml_model.json
> > Lines 51 (patched)
> > <https://reviews.apache.org/r/71619/diff/1/?file=2169131#file2169131line51>
> >
> >     Those fields are optional. So if some projects cannot provide those 
> > fields, that is OK. The attributes I put here are used in several projects.
> >     
> >     If we put several "key:value" pairs in a single string, we risk losing 
> > some info. 
> >     
> >     I think the big drawback of putting several key:value pairs in a single 
> > json string is that some can be lost by accident. For example, suppose 
> > customAttributes = "{githubRepoURL:https://host1/project1 }". Then later 
> > on, an update comes in with customAttributes = "{mlFramework:tensorFlow}" 
> > because that application has no idea someone is tracking githubRepoURL, or 
> > did not set githubRepoURL because only mlFramework has changed. Then the 
> > previous info on where the source code is gets lost by accident. It is hard 
> > to debug. Besides, each time someone wants to use the info, they have to 
> > parse the string.
> >     
> >     The counter argument can be how about putting these attributes here. If 
> > they are not popular, we just don't use them.
> >     
> >     The following info is from Ashutosh. 
> >     Having an attribute that contains json blob is not searchable at Atlas 
> > in general. 
> >     
> >     Other benefit of adding an attribute to the model is that entities 
> > created will get validated for the type
> >     because of validation, we can be certain about the data present in that 
> > field
> >     
> >     A hacky approach to add json blob and make it searchable is to set them 
> > in customAttributes, which is system property. We cannot see what keys are 
> > used in customAttributes at model files. it is set by whoever creates the 
> > Atlas entity.
> 
> Anand Patil wrote:
>     Instead of a string type containing json, what about a 
> map<string,string>? I see that type used in some other attributes.

That is a really good idea, and it avoids the risk of losing info!

I tested the Atlas behavior, and Atlas can update part of an attribute's value 
if its type is map<>. I modified an HMS hook test in HiveMetastoreHookIT. 
The hive_table.parameters attribute has the type map<string, string>. 
When I added a third parameter to a table that already had two, all three 
parameters were stored and retrieved successfully.
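
To make the difference concrete, here is a minimal sketch in plain Python. It 
is not Atlas code and does not call any Atlas API. The two helper functions 
are hypothetical, and merge_map simply assumes the merge-on-update behavior 
described above for map<> attributes.

```python
import json


def overwrite_json_string(stored: str, update: dict) -> str:
    """Model a string attribute holding a JSON blob.

    Each update replaces the whole string, so keys the writer
    did not know about are dropped.
    """
    return json.dumps(update)


def merge_map(stored: dict, update: dict) -> dict:
    """Model a map<string,string> attribute (assumed merge semantics).

    Existing keys are kept unless the update sets them again.
    """
    merged = dict(stored)
    merged.update(update)
    return merged


# Case 1: single JSON string, second writer only knows about mlFramework.
blob = json.dumps({"githubRepoURL": "https://host1/project1"})
blob = overwrite_json_string(blob, {"mlFramework": "tensorFlow"})

# Case 2: map attribute, same two writers.
params = {"githubRepoURL": "https://host1/project1"}
params = merge_map(params, {"mlFramework": "tensorFlow"})

print(json.loads(blob))  # githubRepoURL is gone
print(params)            # both keys are present
```

With the single string, the second writer silently drops githubRepoURL. With 
the map, both entries survive, and readers do not have to parse anything.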

I will reduce the number of dedicated attributes and add a general-purpose 
attribute of type map<string, string>. That way users can experiment, and we 
can learn which attributes are universally useful.
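
As a rough sketch of what such an attribute definition could look like, the 
snippet below builds one entry and prints it as JSON. The attribute name 
"mlAttributes" is only a placeholder, not what the patch will use. The field 
names follow the attributeDefs entries in existing model files such as 
hive_table.parameters.

```python
import json

# Hypothetical attribute name; the real name in 1111-ml_model.json may differ.
# Field names mirror attributeDefs entries in existing Atlas model files.
general_purpose_attribute = {
    "name": "mlAttributes",
    "typeName": "map<string,string>",
    "cardinality": "SINGLE",
    "isIndexable": False,
    "isOptional": True,
    "isUnique": False,
}

# Render the definition the way it would appear inside a model JSON file.
rendered = json.dumps(general_purpose_attribute, indent=4)
print(rendered)
```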


- Na


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71619/#review218237
-----------------------------------------------------------


On Oct. 16, 2019, 12:30 a.m., Na Li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71619/
> -----------------------------------------------------------
> 
> (Updated Oct. 16, 2019, 12:30 a.m.)
> 
> 
> Review request for atlas, Austin Nobis, Ashutosh Mestry, Karthik Manamcheri, 
> Sridhar K, Madhan Neethiraj, and Sarath Subramanian.
> 
> 
> Bugs: atlas-3464
>     https://issues.apache.org/jira/browse/atlas-3464
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> Define entities used for Machine Learning Governance
> 
> 
> Diffs
> -----
> 
>   addons/models/1000-Hadoop/1111-ml_model.json PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/71619/diff/1/
> 
> 
> Testing
> -------
> 
> verified it is valid json file
> 
> 
> Thanks,
> 
> Na Li
> 
>
