@Andrew, indeed, authenticating to Redshift separately from the
credentials that allow S3 access is how I work (outside of Airflow), so it
makes sense that it would be done the same way in Airflow.  I should
probably use ARN - rather than IAM - as the acronym (referring to the
redshift-copy role/credentials).

Certainly happy to go through your code!  The follow-up (does the operator
need to be extended for everyone?) becomes an open question of whether this
is a need that isn't already addressed by what is currently available
(though I am also going to check out what @ash mentions around instance
profiles).

@david, our group standardized on not using a SECRET_ACCESS_KEY (at least
where it is tied to AWS "Users" -- I'm not an expert with AWS security, so
I might be mixing up the terminology), which is why the aim is to avoid
going against that.
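
For what it's worth, here is a rough, untested sketch of the sort of
subclass I have in mind (along the lines of what Andrew describes): it
swaps the key-based credentials clause for iam_role and pulls the ARN from
the AWS connection's extras.  The 'role_arn' extras key and the class name
are placeholders I made up, not anything that exists in Airflow today:

from airflow.hooks.base_hook import BaseHook
from airflow.hooks.postgres_hook import PostgresHook
from airflow.operators.s3_to_redshift_operator import S3ToRedshiftTransfer


class S3ToRedshiftIamTransfer(S3ToRedshiftTransfer):
    """Sketch: COPY from S3 into Redshift authorized by an IAM role ARN
    instead of access/secret keys.  Assumes the ARN is stored in the AWS
    connection's extras under a made-up key, e.g.
    {"role_arn": "arn:aws:iam::<####>:role/<redshift-copy-unload>"}."""

    def execute(self, context):
        postgres_hook = PostgresHook(postgres_conn_id=self.redshift_conn_id)
        aws_conn = BaseHook.get_connection(self.aws_conn_id)
        role_arn = aws_conn.extra_dejson['role_arn']  # hypothetical extras key

        copy_options = '\n\t\t\t'.join(self.copy_options)

        copy_query = """
            COPY {schema}.{table}
            FROM 's3://{s3_bucket}/{s3_key}/{table}'
            iam_role '{role_arn}'
            {copy_options};
        """.format(schema=self.schema,
                   table=self.table,
                   s3_bucket=self.s3_bucket,
                   s3_key=self.s3_key,
                   role_arn=role_arn,
                   copy_options=copy_options)

        self.log.info('Executing COPY command...')
        postgres_hook.run(copy_query, self.autocommit)
        self.log.info('COPY command complete.')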



On Wed, Jan 30, 2019 at 4:08 PM Andrew Harmon <andrewhhar...@gmail.com>
wrote:

> Maybe just to clarify, to connect to Redshift and issue a COPY, you’ll need
> a Redshift username and password. You would store that in a Postgres
> connection. This is a un/pw in Redshift, not AWS creds. The SQL text needed
> to issue the COPY requires either AWS creds or the arn of a role to use.
>
> Andrew
>
> On Wed, Jan 30, 2019 at 6:52 PM Andrew Harmon <andrewhhar...@gmail.com>
> wrote:
>
> > Hi, I extended the Redshift operator to pull the Role needed for the copy
> > from the connection object. I stored the role arm in the extras of the
> > connection. Works well so far. I’m not sure if that helps or if you’re
> > looking for an out of the box solution, but I’d be happy to share my code
> > with you.
> >
> > Andrew
> >
> > On Wed, Jan 30, 2019 at 5:57 PM Austin Bennett <
> > whatwouldausti...@gmail.com> wrote:
> >
> >> @ash I'll look into that as an option.  Given I am still a novice user,
> >> I'm
> >> consistently impressed with the simplicity (once understood) given the
> >> layers of abstractions.  I am not familiar enough with Instance profiles
> >> to
> >> say whether that is suitable.
> >>
> >> Was reading the copy_query default (
> >>
> >>
> https://github.com/apache/airflow/blob/master/airflow/operators/s3_to_redshift_operator.py#L93
> >> ):
> >> copy_query = """
> >>     COPY {schema}.{table}
> >>     FROM 's3://{s3_bucket}/{s3_key}/{table}'
> >>     with credentials
> >>     'aws_access_key_id={access_key};aws_secret_access_key={secret_key}'
> >>     {copy_options};
> >> """.format(schema=self.schema,
> >>            table=self.table,
> >>            s3_bucket=self.s3_bucket,
> >>            s3_key=self.s3_key,
> >>            access_key=credentials.access_key,
> >>            secret_key=credentials.secret_key,
> >>            copy_options=copy_options)
> >>
> >> Which then seems like there could be issues if not providing keys.  And
> >> ultimately I'd like to issue an alternate (albeit similar) copy_query:
> >>
> >> Currently, for perhaps 100 tables, am issuing the command "COPY
> >> <TABLENAME> FROM 's3://<LOCATION>' iam_role
> >> 'arn:aws:iam::<####>:role/<redshift-copy-unload>' <OPTIONS DELIMITER
> >> GZIP, etc etc>;".
> >>
> >> Or, from their docs (linked in this thread):
> >>
> >> copy catdemo
> >> from 's3://awssampledbuswest2/tickit/category_pipe.txt'
> >> iam_role 'arn:aws:iam::<aws-account-id>:role/<role-name>'
> >> region 'us-west-2';
> >>
> >>
> >> Was anchoring on getting the exact same statement built, which seems
> >> straightforward (I could sketch that out, if you think that'd be of
> >> interest).  Perhaps that goal makes sense (whether that is the best
> >> solution is certainly a different story).
> >>
> >> I must acknowledge inexperience with Airflow, so believe there is more
> to
> >> it than my simplistic approach.  Redshift with Airflow is queued for
> >> another couple weeks (haven't even touched Redshift in months), was
> trying
> >> to get ahead of things.  I can start with an empty connection type and
> >> reading up on instance profiles.
> >>
> >> Thank you for the input/feedback!
> >>
> >>
> >>
> >>
> >> On Wed, Jan 30, 2019 at 1:56 PM Ash Berlin-Taylor <a...@apache.org>
> wrote:
> >>
> >> > If you create an "empty" connection of type "AWS" (i.e. don't specify
> a
> >> > username or password) then the AWSHook/S3Hook will use instance
> >> profiles.
> >> >
> >> > Is that what you want?
> >> >
> >> > -ash
> >> >
> >> > > On 30 Jan 2019, at 18:45, Austin Bennett <
> whatwouldausti...@gmail.com
> >> >
> >> > wrote:
> >> > >
> >> > > Have started to push our group to standardizing on airflow.  We
> still
> >> > have
> >> > > a few large Redshift clusters.
> >> > >
> >> > > The s3_to_redshift_operator.py only appears to be written to
> >> authenticate
> >> > > via secret/access-keys.  We no longer use Key Based authentication
> and
> >> > rely
> >> > > upon Role Based, therefore IAM groups.
> >> > >
> >> > > What would the recommended way - in airflow - to rely upon IAM for
> >> > > authentication for COPY Commands to Redshift?  Should I rely on a
> >> plugin?
> >> > > I'd be interested to contribute to the project by updating/extending
> >> > > s3_to_redshift_operator.py should that be welcome.
> >> > >
> >> > >
> >> > > References:
> >> > > * https://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html
> >> > > *
> >> > >
> >> >
> >>
> https://airflow.readthedocs.io/en/latest/_modules/airflow/operators/s3_to_redshift_operator.html
> >> > >
> >> > > *
> >> > >
> >> >
> >>
> https://github.com/apache/airflow/blob/master/airflow/operators/s3_to_redshift_operator.py
> >> >
> >> >
> >>
> > --
> > Sent from Gmail Mobile
> >
>