[ https://issues.apache.org/jira/browse/AIRFLOW-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742936#comment-16742936 ]

Daniel Lamblin commented on AIRFLOW-2862:
-----------------------------------------

This would be a breaking change, so it would need to be held back for the 2.0 
release (if breaking changes are acceptable then), made non-default behavior 
via an option… or shipped as a differently named operator.

TBH if it were my Airflow deployments I was doing this for, I would do 
something like:
 * Copy this operator and name it like S3ToRedshiftTransfer2
 * put the more sensible change (it's a good suggestion) into this new operator, 
structured so that the S3ToRedshiftTransfer operator can subclass the 
S3ToRedshiftTransfer2 operator, and
 * override the command template to preserve the existing behavior in a subclass 
named S3ToRedshiftTransfer (I know that sounds backward, but...); then
 * when 2.0 is released
 ** rename S3ToRedshiftTransfer2 operator back to S3ToRedshiftTransfer,
 ** rename the subclassed operator to S3ToRedshiftTransferDeprecated, and
 ** leave its implementation in documentation only, for users who are upgrading 
and can't update some X number of DAGs.
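The rename-and-subclass approach above could look roughly like the following. This is a hypothetical sketch, not the actual Airflow operator: the real class lives in airflow.operators and takes hook/connection arguments, but plain classes are used here so the example is self-contained, and build_copy is an invented helper just to show the template override.

```python
class S3ToRedshiftTransfer2:
    """New operator: copies from the exact S3 key, with no table-name suffix.

    template_fields lets the s3_key pick up Jinja macros (e.g. {{ ds }})
    so keys written by earlier tasks in the same DAG run can be loaded.
    """
    template_fields = ('s3_key',)
    copy_template = (
        "COPY {schema}.{table} "
        "FROM 's3://{s3_bucket}/{s3_key}' "
        "with credentials "
        "'aws_access_key_id={access_key};aws_secret_access_key={secret_key}' "
        "{copy_options};"
    )

    def __init__(self, schema, table, s3_bucket, s3_key, copy_options=''):
        self.schema = schema
        self.table = table
        self.s3_bucket = s3_bucket
        self.s3_key = s3_key
        self.copy_options = copy_options

    def build_copy(self, access_key, secret_key):
        # Render the COPY statement from the class-level template, so a
        # subclass only has to override copy_template to change behavior.
        return self.copy_template.format(
            schema=self.schema, table=self.table,
            s3_bucket=self.s3_bucket, s3_key=self.s3_key,
            access_key=access_key, secret_key=secret_key,
            copy_options=self.copy_options,
        )


class S3ToRedshiftTransfer(S3ToRedshiftTransfer2):
    """Backward-compatible subclass: keeps appending {table} to the key."""
    copy_template = (
        "COPY {schema}.{table} "
        "FROM 's3://{s3_bucket}/{s3_key}/{table}' "
        "with credentials "
        "'aws_access_key_id={access_key};aws_secret_access_key={secret_key}' "
        "{copy_options};"
    )
```

When 2.0 lands, the subclass is the only thing that has to be renamed (to S3ToRedshiftTransferDeprecated) or dropped; the new behavior already carries the canonical name's logic.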

> S3ToRedshiftTransfer Copy Command Flexibility
> ---------------------------------------------
>
>                 Key: AIRFLOW-2862
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2862
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: operators
>            Reporter: Micheal Ascah
>            Assignee: Micheal Ascah
>            Priority: Minor
>
> Currently, the S3ToRedshiftTransfer class requires that the name of the 
> target table to be loaded is suffixed to the end of the S3 key provided.
> It doesn't seem justifiable that the operator should require the file be 
> named by any convention. The S3 bucket + S3 key should be all that is needed. 
> This makes it possible to load any S3 Key into a Redshift table, rather than 
> only files that have the table name at the end of the S3 key.
> The S3 key parameter should also be template-able so that files created in S3 
> using timestamps from macros in other tasks in the current DAG run can be 
> used to identify files when loading from S3 to Redshift.
> The command template should change from 
> {code:java}
> COPY {schema}.{table}
>  FROM 's3://{s3_bucket}/{s3_key}/{table}'
>  with credentials
>  'aws_access_key_id={access_key};aws_secret_access_key={secret_key}'
> {copy_options};
> {code}
> to
>  
> {code:java}
> COPY {schema}.{table}
>  FROM 's3://{s3_bucket}/{s3_key}'
>  with credentials
>  'aws_access_key_id={access_key};aws_secret_access_key={secret_key}'
>  {copy_options};
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)