Barbara Eckman created ATLAS-3570:
-------------------------------------

             Summary: Atlas typedefs for Machine Learning Models, Feature Sets, 
and Feature Engineering Engines
                 Key: ATLAS-3570
                 URL: https://issues.apache.org/jira/browse/ATLAS-3570
             Project: Atlas
          Issue Type: New Feature
            Reporter: Barbara Eckman


Currently the base types in Atlas do not include Machine Learning (ML) models. 
Adding typedefs for them would let them be part of enterprise discovery and 
versioning.

ENTITIES COULD INCLUDE:

MLModel (overview info), with attributes:
 * uniqueId
 * version
 * businessUseCase
 * modelFramework (eg scikit-learn)
 * modelTypes (eg random forest regressor)
 * modelClass (eg random forest (bagging + decision trees))
 * isEnsemble (boolean)
 * outcomeTypeDescription (eg single float)
 * dataScienceOwnerEmail
 * githubRepoURL (where the model code is found)
 * modelDeploymentDate
 * populationScored (eg in Comcast, residential or business customers)
 * accuracyMeasures
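
As a rough illustration of how the MLModel attributes above could be expressed as an Atlas typedef, here is a minimal Python sketch that builds an entityDefs payload. The issue lists attribute names only, so the DataSet supertype, the attribute type names (string, boolean, date, array<string>), the cardinalities and the typeVersion are all assumptions made for illustration.

```python
import json

# Attribute names come from the list above; the types are assumptions.
ML_MODEL_ATTRIBUTES = {
    "uniqueId": "string",
    "version": "string",
    "businessUseCase": "string",
    "modelFramework": "string",
    "modelTypes": "array<string>",
    "modelClass": "string",
    "isEnsemble": "boolean",
    "outcomeTypeDescription": "string",
    "dataScienceOwnerEmail": "string",
    "githubRepoURL": "string",
    "modelDeploymentDate": "date",
    "populationScored": "string",
    "accuracyMeasures": "array<string>",
}


def attribute_def(name, type_name):
    """Build one optional attributeDef entry for a typedef."""
    is_array = type_name.startswith("array<")
    return {
        "name": name,
        "typeName": type_name,
        "isOptional": True,
        "cardinality": "SET" if is_array else "SINGLE",
        "isUnique": False,
        "isIndexable": False,
    }


def ml_model_typedefs():
    """Return a typedefs payload holding a single MLModel entity type."""
    return {
        "entityDefs": [
            {
                "category": "ENTITY",
                "name": "MLModel",
                "description": "Overview info for a machine learning model",
                "superTypes": ["DataSet"],  # assumed supertype
                "typeVersion": "1.0",       # assumed version string
                "attributeDefs": [
                    attribute_def(name, type_name)
                    for name, type_name in ML_MODEL_ATTRIBUTES.items()
                ],
            }
        ]
    }


payload = ml_model_typedefs()
payload_json = json.dumps(payload, indent=2)
```

A payload of this shape could then be submitted to the Atlas typedefs REST endpoint (POST /api/atlas/v2/types/typedefs). The other entities in this proposal could presumably be described the same way.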

MLModelExecution, with attributes:
 * exampleInputDatasetURL (URL where a sample input dataset can be found)
 * outputTargetDatasetURLs
 * opsOwnerEmail
 * executionEndpointURL
 * dockerContainerURL
 * MLFlowPointerURL
 * executionNotebookURL (eg Databricks, Jupyter)

MLModelTraining, with attributes:
 * hyperParameters
 * trainingDatasetURLs
 * trainingNotebookURL (eg Databricks, Jupyter)

FeatureSet (a set of features prepared as input to an ML model), with 
attributes:
 * version
 * locationURL 

FeatureEngineeringEngine (the engine that generates the feature set for an ML 
model), with attributes:
 * version
 * ownerEmail
 * inputSourceURL
 * processingEngineInfoURL (docs on the processing engine)
 * githubRepoURL 
 * outputTargetURL
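
To make the FeatureSet and FeatureEngineeringEngine attributes more concrete, the following Python sketch builds two hypothetical entity instances in the shape used by the Atlas bulk entity API. It assumes both types inherit qualifiedName and name from a supertype such as DataSet. Every value (names, emails, example.com URLs) is a made-up placeholder, and make_entity is a helper invented for this example.

```python
import json


def make_entity(type_name, qualified_name, name, **attributes):
    """Build one entity dict; qualifiedName and name are assumed to be inherited."""
    attrs = {"qualifiedName": qualified_name, "name": name}
    attrs.update(attributes)
    return {"typeName": type_name, "attributes": attrs}


# Hypothetical feature set; all values are placeholders.
feature_set = make_entity(
    "FeatureSet",
    "churn_features_v1@example",
    "churn_features",
    version="1",
    locationURL="https://example.com/features/churn/v1",
)

# Hypothetical engine that writes the feature set above.
engine = make_entity(
    "FeatureEngineeringEngine",
    "churn_feature_engine@example",
    "churn_feature_engine",
    version="1",
    ownerEmail="owner@example.com",
    inputSourceURL="https://example.com/raw/customers",
    processingEngineInfoURL="https://example.com/docs/engine",
    githubRepoURL="https://example.com/repo/churn-features",
    # The engine's output target is the feature set's location.
    outputTargetURL=feature_set["attributes"]["locationURL"],
)

payload = {"entities": [feature_set, engine]}
payload_json = json.dumps(payload, indent=2)
```

In this sketch the two entities are tied together only by matching URL strings, as the attribute lists above suggest. Whether to model that link as an explicit Atlas relationship instead is left open by the proposal.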

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
