Barbara Eckman created ATLAS-3570:
-------------------------------------

             Summary: Atlas typedefs for Machine Learning Models, Feature Sets, and Feature Engineering Engines
                 Key: ATLAS-3570
                 URL: https://issues.apache.org/jira/browse/ATLAS-3570
             Project: Atlas
          Issue Type: New Feature
            Reporter: Barbara Eckman

Currently the base types in Atlas do not include Machine Learning (ML) Model tables. It would be nice to add typedefs for them, so that they could be part of enterprise discovery and versioning.

ENTITIES COULD INCLUDE:

MLModel (overview info), with attributes:
* uniqueId
* version
* businessUseCase
* modelFramework (e.g. scikit-learn)
* modelTypes (e.g. random forest regressor)
* modelClass (e.g. random forest (bagging + decision trees))
* isEnsemble (boolean)
* outcomeTypeDescription (e.g. single float)
* dataScienceOwnerEmail
* githubRepoURL (where the model code is found)
* modelDeploymentDate
* populationScored (e.g. at Comcast, residential or business customers)
* accuracyMeasures

MLModelExecution, with attributes:
* exampleInputDatasetURL (URL where a sample input dataset can be found)
* outputTargetDatasetURLs
* opsOwnerEmail
* executionEndpointURL
* dockerContainerURL
* MLFlowPointerURL
* executionNotebookURL (e.g. Databricks, Jupyter)

MLModelTraining, with attributes:
* hyperParameters
* trainingDatasetURLs
* trainingNotebookURL (e.g. Databricks, Jupyter)

FeatureSet (a set of features prepared as input to an ML model), with attributes:
* version
* locationURL

FeatureEngineeringEngine (the engine that generates the feature set for an ML model), with attributes:
* version
* ownerEmail
* inputSourceURL
* processingEngineInfoURL (docs on the processing engine)
* githubRepoURL
* outputTargetURL
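
To make the proposal above more concrete, here is a rough sketch of how two of the suggested entities (MLModel and FeatureSet) might be written in the Atlas typedef JSON layout, generated from Python. This is only an illustration, not an agreed design: the choice of `DataSet` as the supertype, the attribute types (`string`, `boolean`, `date`, `array<string>`), which attributes are mandatory, and the helper names `attr` and `entity_def` are all assumptions that are not part of the issue text.

```python
import json


def attr(name, type_name="string", optional=True, cardinality="SINGLE"):
    """Build one attributeDef entry (hypothetical helper)."""
    return {
        "name": name,
        "typeName": type_name,
        "isOptional": optional,
        "cardinality": cardinality,
        "isUnique": False,
        "isIndexable": False,
    }


def entity_def(name, description, attributes, super_types=("DataSet",)):
    """Build one entityDef entry (hypothetical helper).

    Using DataSet as the supertype is an assumption made for this sketch.
    """
    return {
        "name": name,
        "description": description,
        "superTypes": list(super_types),
        "typeVersion": "1.0",
        "attributeDefs": attributes,
    }


# Attribute names come from the issue; the types are guesses.
ml_model = entity_def(
    "MLModel",
    "Overview information for a machine learning model",
    [
        attr("uniqueId", optional=False),
        attr("version"),
        attr("businessUseCase"),
        attr("modelFramework"),
        attr("modelTypes", "array<string>", cardinality="SET"),
        attr("modelClass"),
        attr("isEnsemble", "boolean"),
        attr("outcomeTypeDescription"),
        attr("dataScienceOwnerEmail"),
        attr("githubRepoURL"),
        attr("modelDeploymentDate", "date"),
        attr("populationScored"),
        attr("accuracyMeasures"),
    ],
)

feature_set = entity_def(
    "FeatureSet",
    "A set of features prepared as input to an ML model",
    [attr("version"), attr("locationURL")],
)

typedefs = {
    "enumDefs": [],
    "structDefs": [],
    "classificationDefs": [],
    "entityDefs": [ml_model, feature_set],
}

if __name__ == "__main__":
    print(json.dumps(typedefs, indent=2))
```

A payload of this shape could then be registered through the Atlas v2 types REST endpoint (`/api/atlas/v2/types/typedefs`). MLModelExecution, MLModelTraining, and FeatureEngineeringEngine could be added the same way once their supertypes and relationships to MLModel are settled.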