[ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16517207#comment-16517207
 ] 

Eckman, Barbara commented on ATLAS-2708:
----------------------------------------

Awesome!  Thanks for your suggestions, too!

On 6/19/18, 2:00 AM, "Don Bosco Durai (JIRA)" <j...@apache.org> wrote:

    
        [ 
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516825#comment-16516825
 ] 
    
    Don Bosco Durai commented on ATLAS-2708:
    ----------------------------------------
    
    {quote}[~bosco] [~madhan.neethiraj] Of course you can use my JSON for the
    demo!
    {quote}
    [~barbara] I was able to swap out my model with yours. Now it looks pretty
    good :) Thanks
    
    > AWS S3 data lake typedefs for Atlas
    > -----------------------------------
    >
    >                 Key: ATLAS-2708
    >                 URL: https://issues.apache.org/jira/browse/ATLAS-2708
    >             Project: Atlas
    >          Issue Type: New Feature
    >          Components:  atlas-core
    >            Reporter: Barbara Eckman
    >            Assignee: Barbara Eckman
    >            Priority: Critical
    >             Fix For: 1.1.0, 2.0.0
    >
    >         Attachments: 3010-aws_model.json, ATLAS-2708-2.patch,
    > ATLAS-2708.patch, all_AWS_common_typedefs.json,
    > all_AWS_common_typedefs_v2.json, all_datalake_typedefs.json,
    > all_datalake_typedefs_v2.json
    >
    >
    > Currently the base types in Atlas do not include AWS data lake objects. It
    > would be nice to add typedefs for AWS data lake objects (buckets and
    > pseudo-directories) and lineage processes that move the data from another
    > source (e.g., a Kafka topic) to the data lake.  For example:
    >  * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in
    > an S3 bucket.  For example, in the case of an object with key
    > “myWork/Development/Projects1.xls”, “myWork/Development” is the
    > pseudo-directory.  It supports:
    >  ** Array of Avro schemas that are associated with the data in the
    > pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
    >  ** what type of data it contains, e.g., avro, json, unstructured
    >  ** time of creation
    >  * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of
    > the data in a bucket to a storageClass after a specific time interval, or
    > expiration.  For example, transition to GLACIER after 60 days, or expire
    > (i.e., be deleted) after 90 days:
    >  ** ruleType (e.g., transition or expiration)
    >  ** time interval in days before rule is executed
    >  ** storageClass to which the data is transitioned (null if ruleType is
    > expiration)
    >  * AWSTag type represents a tag-value pair created by the user and associated
    > with an AWS object.
    >  ** tag
    >  ** value
    >  * AWSCloudWatchMetric type represents a storage or request metric that is
    > monitored by AWS CloudWatch and can be configured for a bucket
    >  ** metricName, for example, “AllRequests”, “GetRequests”,
    > “TotalRequestLatency”, “BucketSizeBytes”
    >  ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or
    > limit the monitoring of the metric.
    >  * AWSS3Bucket type represents a bucket in an S3 instance.  It supports:
    >  ** Array of AWSS3PseudoDirectories that are associated with objects stored
    > in the bucket
    >  ** AWS region
    >  ** IsEncrypted (boolean)
    >  ** encryptionType, e.g., AES-256
    >  ** S3AccessPolicy, a JSON object expressing access policies, e.g., GetObject,
    > PutObject
    >  ** time of creation
    >  ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket
    >  ** Array of AWSS3CloudWatchMetrics that are associated with the bucket or
    > its tags or prefixes
    >  ** Array of AWSTags that are associated with the bucket
    >  * Generic dataset2Dataset process to represent movement of data from one
    > dataset to another.  It supports:
    >  ** array of transforms performed by the process
    >  ** map of tag/value pairs representing configurationParameters of the process
    >  ** inputs and outputs are arrays of dataset objects, e.g., a Kafka topic and
    > an S3 pseudo-directory.
    >  
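
For readers who want a concrete picture of the types described above, here is a rough Python sketch of their shape. It is not the contents of the attached typedef JSON files and does not use any Atlas API. All class names, field names, and the helper pseudo_dir_of are hypothetical, chosen only to mirror the bullet list, and several listed attributes are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def pseudo_dir_of(key: str) -> str:
    """Return the pseudo-directory "prefix" of an S3 object key ("" if none)."""
    return key.rpartition("/")[0]


@dataclass
class LifeCycleRule:
    rule_type: str                        # "transition" or "expiration"
    days: int                             # days before the rule is executed
    storage_class: Optional[str] = None   # None when rule_type is "expiration"

    def __post_init__(self) -> None:
        if self.rule_type not in ("transition", "expiration"):
            raise ValueError(f"unknown rule_type: {self.rule_type!r}")
        # A transition needs a target storage class; an expiration has none.
        if (self.rule_type == "transition") != (self.storage_class is not None):
            raise ValueError("storage_class must be set only for transitions")


@dataclass
class PseudoDir:
    prefix: str                           # e.g. "myWork/Development"
    data_type: str                        # e.g. "avro", "json", "unstructured"
    avro_schemas: List[str] = field(default_factory=list)


@dataclass
class Bucket:
    name: str
    region: str
    is_encrypted: bool = False
    encryption_type: Optional[str] = None  # e.g. "AES-256"
    pseudo_dirs: List[PseudoDir] = field(default_factory=list)
    lifecycle_rules: List[LifeCycleRule] = field(default_factory=list)
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Dataset2Dataset:
    inputs: List[str]                     # names of source datasets
    outputs: List[str]                    # names of target datasets
    transforms: List[str] = field(default_factory=list)
    configuration_parameters: Dict[str, str] = field(default_factory=dict)


# Example values below are made up for illustration.
prefix = pseudo_dir_of("myWork/Development/Projects1.xls")
bucket = Bucket(
    name="example-bucket",
    region="us-east-1",
    is_encrypted=True,
    encryption_type="AES-256",
    pseudo_dirs=[PseudoDir(prefix=prefix, data_type="unstructured")],
    lifecycle_rules=[
        LifeCycleRule("transition", 60, "GLACIER"),
        LifeCycleRule("expiration", 90),
    ],
)
move = Dataset2Dataset(inputs=["example-kafka-topic"], outputs=[prefix])
```

The check in LifeCycleRule encodes the "null if ruleType is expiration" note from the description. Whether a real typedef would enforce that constraint is an assumption of this sketch.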
    
    
    
    --
    This message was sent by Atlassian JIRA
    (v7.6.3#76005)


