Hi, I extended the Redshift operator to pull the IAM role needed for the COPY from the connection object. I store the role ARN in the connection's extras. It has worked well so far. I'm not sure if that helps or if you're looking for an out-of-the-box solution, but I'd be happy to share my code with you.
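To give a sense of the shape of it, here is a rough sketch (not my exact code; it assumes the connection extras hold the ARN under a key named "aws_iam_role", e.g. {"aws_iam_role": "arn:aws:iam::123456789012:role/redshift-copy-unload"}):

from airflow.contrib.hooks.aws_hook import AwsHook
from airflow.hooks.postgres_hook import PostgresHook
from airflow.operators.s3_to_redshift_operator import S3ToRedshiftTransfer


class S3ToRedshiftIamRoleTransfer(S3ToRedshiftTransfer):
    """COPY from S3 to Redshift with iam_role credentials instead of keys."""

    def execute(self, context):
        postgres_hook = PostgresHook(postgres_conn_id=self.redshift_conn_id)
        aws_hook = AwsHook(aws_conn_id=self.aws_conn_id)

        # Pull the role ARN out of the AWS connection's extras
        # ("aws_iam_role" is just the key name chosen for this sketch).
        extras = aws_hook.get_connection(self.aws_conn_id).extra_dejson
        iam_role = extras['aws_iam_role']

        copy_options = '\n\t\t\t'.join(self.copy_options)
        copy_query = """
            COPY {schema}.{table}
            FROM 's3://{s3_bucket}/{s3_key}/{table}'
            iam_role '{iam_role}'
            {copy_options};
        """.format(schema=self.schema,
                   table=self.table,
                   s3_bucket=self.s3_bucket,
                   s3_key=self.s3_key,
                   iam_role=iam_role,
                   copy_options=copy_options)

        self.log.info('Executing COPY command...')
        postgres_hook.run(copy_query, self.autocommit)
        self.log.info('COPY command complete.')

The only real difference from the built-in operator is the credentials clause of the COPY statement; everything else (schema, table, copy_options, autocommit) is inherited as-is.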
Andrew

On Wed, Jan 30, 2019 at 5:57 PM Austin Bennett <whatwouldausti...@gmail.com> wrote:

> @ash I'll look into that as an option. Given I am still a novice user, I'm
> consistently impressed with the simplicity (once understood) given the
> layers of abstraction. I am not familiar enough with instance profiles to
> say whether that is suitable.
>
> I was reading the copy_query default (
> https://github.com/apache/airflow/blob/master/airflow/operators/s3_to_redshift_operator.py#L93
> ):
>
> copy_query = """
>     COPY {schema}.{table}
>     FROM 's3://{s3_bucket}/{s3_key}/{table}'
>     with credentials
>     'aws_access_key_id={access_key};aws_secret_access_key={secret_key}'
>     {copy_options};
> """.format(schema=self.schema,
>            table=self.table,
>            s3_bucket=self.s3_bucket,
>            s3_key=self.s3_key,
>            access_key=credentials.access_key,
>            secret_key=credentials.secret_key,
>            copy_options=copy_options)
>
> It seems like there could be issues if keys are not provided, and
> ultimately I'd like to issue an alternate (albeit similar) copy_query.
>
> Currently, for perhaps 100 tables, I am issuing the command:
>
> COPY <TABLENAME>
> FROM 's3://<LOCATION>'
> iam_role 'arn:aws:iam::<####>:role/<redshift-copy-unload>'
> <OPTIONS: DELIMITER, GZIP, etc.>;
>
> Or, from the AWS docs (linked in this thread):
>
> copy catdemo
> from 's3://awssampledbuswest2/tickit/category_pipe.txt'
> iam_role 'arn:aws:iam::<aws-account-id>:role/<role-name>'
> region 'us-west-2';
>
> I was anchoring on getting exactly the same statement built, which seems
> straightforward (I could sketch that out, if you think that'd be of
> interest). Perhaps that goal makes sense (whether it is the best solution
> is certainly a different story).
>
> I must acknowledge inexperience with Airflow, so I believe there is more
> to it than my simplistic approach. Redshift with Airflow is queued for
> another couple of weeks (I haven't even touched Redshift in months); I was
> trying to get ahead of things. I can start with an empty connection type
> and read up on instance profiles.
>
> Thank you for the input/feedback!
>
> On Wed, Jan 30, 2019 at 1:56 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>
> > If you create an "empty" connection of type "AWS" (i.e. don't specify a
> > username or password) then the AWSHook/S3Hook will use instance profiles.
> >
> > Is that what you want?
> >
> > -ash
> >
> > > On 30 Jan 2019, at 18:45, Austin Bennett <whatwouldausti...@gmail.com> wrote:
> > >
> > > I have started to push our group to standardize on Airflow. We still
> > > have a few large Redshift clusters.
> > >
> > > The s3_to_redshift_operator.py only appears to be written to
> > > authenticate via secret/access keys. We no longer use key-based
> > > authentication and rely upon role-based authentication, therefore IAM
> > > groups.
> > >
> > > What would be the recommended way - in Airflow - to rely upon IAM for
> > > authentication for COPY commands to Redshift? Should I rely on a
> > > plugin? I'd be interested to contribute to the project by
> > > updating/extending s3_to_redshift_operator.py, should that be welcome.
> > >
> > > References:
> > > * https://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html
> > > * https://airflow.readthedocs.io/en/latest/_modules/airflow/operators/s3_to_redshift_operator.html
> > > * https://github.com/apache/airflow/blob/master/airflow/operators/s3_to_redshift_operator.py

--
Sent from Gmail Mobile