> On Oct. 16, 2019, 7:36 p.m., Na Li wrote:
> > addons/models/1000-Hadoop/1111-ml_model.json
> > Lines 51 (patched)
> > <https://reviews.apache.org/r/71619/diff/1/?file=2169131#file2169131line51>
> >
> > Those fields are optional. So if some projects cannot provide those
> > fields, that is OK. The attributes I put here are used in several projects.
> >
> > If we put several "key:value" pairs in a single string, we risk losing
> > some info.
> >
> > I think the big drawback of putting several key:value pairs in a single
> > json string is that: some can be lost by accident. For example the
> > customAttributes = "{githubRepoURL:https://host1/project1 }". Then later
> > on, an update comes in with customAttributes = "{mlFramework:tensorFlow}"
> > because that application has no idea someone is tracking githubRepoURL, or
> > did not set githubRepoURL because only mlFramework has changed. Then the
> > previous info on where the source code is got lost by accident. It is hard
> > to debug. Besides, each time someone wants to use the info, has to parse
> > the string.
> >
> > The counter argument can be how about putting these attributes here. If
> > they are not popular, we just don't use them.
> >
> > The following info is from Ashutosh.
> > Having an attribute that contains json blob is not searchable at Atlas
> > in general.
> >
> > Other benefit of adding an attribute to the model is that entities
> > created will get validated for the type
> > because of validation, we can be certain about the data present in that
> > field
> >
> > A hacky approach to add json blob and make it searchable is to set them
> > in customAttributes, which is system property. We cannot see what keys are
> > used in customAttributes at model files. it is set by whoever creates the
> > Atlas entity.
>
> Anand Patil wrote:
>     Instead of a string type containing json, what about a
>     map<string,string>? I see that type used in some other attributes.

That is a really good idea, and it avoids the risk of losing info! I tested Atlas behavior, and Atlas can update part of an attribute's value if its type is map<>. I modified an HMS hook test in HiveMetastoreHookIT. Basically, the hive_table.parameters attribute's type is map<string, string>. When I added another parameter to a table that already had two parameters, all three parameters were stored and retrieved successfully.

I will reduce the attributes and add a general-purpose attribute of type map<string, string>, so users can experiment and we can learn which attributes are universally useful.


- Na


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71619/#review218237
-----------------------------------------------------------


On Oct. 16, 2019, 12:30 a.m., Na Li wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71619/
> -----------------------------------------------------------
>
> (Updated Oct. 16, 2019, 12:30 a.m.)
>
>
> Review request for atlas, Austin Nobis, Ashutosh Mestry, Karthik Manamcheri,
> Sridhar K, Madhan Neethiraj, and Sarath Subramanian.
>
>
> Bugs: atlas-3464
>     https://issues.apache.org/jira/browse/atlas-3464
>
>
> Repository: atlas
>
>
> Description
> -------
>
> Define entities used for Machine Learning Governance
>
>
> Diffs
> -----
>
>   addons/models/1000-Hadoop/1111-ml_model.json PRE-CREATION
>
>
> Diff: https://reviews.apache.org/r/71619/diff/1/
>
>
> Testing
> -------
>
> verified it is valid json file
>
>
> Thanks,
>
> Na Li
>
>
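
For reference, the difference between the two representations discussed in this thread can be sketched in a few lines of Python. This is only an illustration with plain dictionaries, not Atlas client code: the helper names `update_blob` and `update_map` and the attribute name `mlAttributes` are hypothetical, and the merge step simply models the partial-update behavior reported above for map-typed attributes such as hive_table.parameters.

```python
import json


def update_blob(entity, new_blob):
    """Model a string attribute holding JSON: an update replaces the whole string."""
    updated = dict(entity)
    updated["customAttributes"] = new_blob
    return updated


def update_map(entity, attr, changes):
    """Model a map<string,string> attribute: an update merges keys into the map."""
    updated = dict(entity)
    merged = dict(entity.get(attr, {}))
    merged.update(changes)
    updated[attr] = merged
    return updated


# Case 1: JSON packed into one string. The second writer only knows about
# mlFramework, so githubRepoURL disappears without any error.
blob_entity = {"customAttributes": json.dumps({"githubRepoURL": "https://host1/project1"})}
blob_entity = update_blob(blob_entity, json.dumps({"mlFramework": "tensorFlow"}))
blob_keys = sorted(json.loads(blob_entity["customAttributes"]))

# Case 2: a map-typed attribute (hypothetical name "mlAttributes").
# The same second update adds a key and leaves the first one in place.
map_entity = {"mlAttributes": {"githubRepoURL": "https://host1/project1"}}
map_entity = update_map(map_entity, "mlAttributes", {"mlFramework": "tensorFlow"})
map_keys = sorted(map_entity["mlAttributes"])

print("blob keys after update:", blob_keys)
print("map keys after update: ", map_keys)
```

In the first case the reader also has to parse the string before using any value, which is the second drawback raised in the quoted comment; in the second case the values are directly addressable by key.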