[ https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16514229#comment-16514229 ]
Barbara Eckman edited comment on ATLAS-2708 at 6/15/18 7:28 PM:
----------------------------------------------------------------

[~bosco]
{quote}You have S3AccessPolicy in AWSS3Bucket as string. In S3, Bucket Policy is a list of Statement Structure. If we are not using it now, we should probably remove it and add it when we need to. Or we can create a placeholder S3BucketPolicy entity and associate that with AWSS3Bucket
{quote}
You're right, it is a list of statement structures. We made it a string because we only need to display it, and because we didn't want to bother parsing the JSON we get from the AWS API and putting it into a structured Atlas entity. (blush) We are using it, so I created a placeholder S3AccessPolicy structure that consists of a single string for now but can be expanded into the full structure when someone needs or wants it.

My new JSON files are all_AWS_common_typedefs_v2.json and all_datalake_typedefs_v2.json.
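For illustration only, the placeholder idea above could look roughly like the following. This is a sketch, not the contents of the attached files: it assumes the usual Atlas v2 typedef JSON layout (`structDefs` with `attributeDefs`), and the attribute name `policyString` is hypothetical.

```python
import json


def placeholder_policy_typedefs() -> dict:
    """Build a typedef payload holding a string-only S3AccessPolicy struct."""
    policy_struct = {
        "category": "STRUCT",
        "name": "S3AccessPolicy",
        "description": "Placeholder: raw bucket policy JSON kept as one string",
        "typeVersion": "1.0",
        "attributeDefs": [
            {
                "name": "policyString",  # hypothetical attribute name
                "typeName": "string",
                "cardinality": "SINGLE",
                "isIndexable": False,
                "isOptional": True,
                "isUnique": False,
            }
        ],
    }
    return {
        "enumDefs": [],
        "structDefs": [policy_struct],
        "classificationDefs": [],
        "entityDefs": [],
    }


def wrap_policy(raw_policy_json: str) -> dict:
    """Wrap the policy document as returned by AWS, without parsing its statements."""
    return {
        "typeName": "S3AccessPolicy",
        "attributes": {"policyString": raw_policy_json},
    }


if __name__ == "__main__":
    print(json.dumps(placeholder_policy_typedefs(), indent=2))
```

Expanding the placeholder later would mean adding attributes for the statement list to the struct; entities that only display the policy could keep reading the string.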
> AWS S3 data lake typedefs for Atlas
> -----------------------------------
>
>                 Key: ATLAS-2708
>                 URL: https://issues.apache.org/jira/browse/ATLAS-2708
>             Project: Atlas
>          Issue Type: New Feature
>          Components: atlas-core
>            Reporter: Barbara Eckman
>            Assignee: Barbara Eckman
>            Priority: Critical
>         Attachments: 3010-aws_model.json, all_AWS_common_typedefs.json,
> all_datalake_typedefs.json
>
>
> Currently the base types in Atlas do not include AWS data lake objects. It
> would be nice to add typedefs for AWS data lake objects (buckets and
> pseudo-directories) and lineage processes that move the data from another
> source (e.g., Kafka topic) to the data lake. For example:
> * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in
> an S3 bucket. For example, in the case of an object with key
> “myWork/Development/Projects1.xls”, “myWork/Development” is the
> pseudo-directory. It supports:
> ** Array of Avro schemas that are associated with the data in the
> pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
> ** what type of data it contains, e.g., avro, json, unstructured
> ** time of creation
> * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of
> the data in a bucket to a storageClass after a specific time interval, or
> expiration. For example, transition to GLACIER after 60 days, or expire
> (i.e., be deleted) after 90 days:
> ** ruleType (e.g., transition or expiration)
> ** time interval in days before rule is executed
> ** storageClass to which the data is transitioned (null if ruleType is
> expiration)
> * AWSTag type represents a tag-value pair created by the user and associated
> with an AWS object.
> ** tag
> ** value
> * AWSCloudWatchMetric type represents a storage or request metric that is
> monitored by AWS CloudWatch and can be configured for a bucket
> ** metricName, for example, “AllRequests”, “GetRequests”,
> “TotalRequestLatency”, “BucketSizeBytes”
> ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or
> limit the monitoring of the metric.
> * AWSS3Bucket type represents a bucket in an S3 instance. It supports:
> ** Array of AWSS3PseudoDirectories that are associated with objects stored
> in the bucket
> ** AWS region
> ** IsEncrypted (boolean)
> ** encryptionType, e.g., AES-256
> ** S3AccessPolicy, a JSON object expressing access policies, e.g., GetObject,
> PutObject
> ** time of creation
> ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket
> ** Array of AWSS3CloudWatchMetrics that are associated with the bucket or
> its tags or prefixes
> ** Array of AWSTags that are associated with the bucket
> * Generic dataset2Dataset process to represent movement of data from one
> dataset to another. It supports:
> ** array of transforms performed by the process
> ** map of tag/value pairs representing configurationParameters of the process
> ** inputs and outputs are arrays of dataset objects, e.g., Kafka topic and
> S3 pseudo-directory.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
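
As a rough illustration of two notions from the issue description (the pseudo-directory prefix of an object key, and a lifecycle rule whose storageClass is null for expirations), here is a minimal sketch. The function names and the `intervalDays` attribute name are hypothetical; only `ruleType` and `storageClass` come from the description.

```python
from typing import Optional


def pseudo_dir(key: str) -> str:
    """Return the part of an S3 object key before its last '/' ('' if none)."""
    return key.rpartition("/")[0]


def lifecycle_rule(rule_type: str, interval_days: int,
                   storage_class: Optional[str] = None) -> dict:
    """Build attributes for a lifecycle rule as described in the issue."""
    if rule_type == "expiration":
        # Expired data is deleted, so there is no target storage class.
        storage_class = None
    elif rule_type == "transition":
        if storage_class is None:
            raise ValueError("a transition rule needs a target storageClass")
    else:
        raise ValueError(f"unknown ruleType: {rule_type!r}")
    return {
        "ruleType": rule_type,
        "intervalDays": interval_days,  # hypothetical attribute name
        "storageClass": storage_class,
    }


if __name__ == "__main__":
    print(pseudo_dir("myWork/Development/Projects1.xls"))
    print(lifecycle_rule("transition", 60, "GLACIER"))
    print(lifecycle_rule("expiration", 90))
```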