I have been thinking about a Redshift reader/writer, basically wrapping UNLOAD
and COPY in a PTransform. For example, the steps to UNLOAD into a PCollection
would be:

1) JDBC to Redshift - UNLOAD ('query') TO 's3://bucket/tmp-prefix'
   <http://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html>
2) S3 to PCollection - work in progress
   <https://github.com/Kochava/beam-s3>
3) Delete tmp files from S3
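
As a rough sketch of step 1, the UNLOAD statement could be assembled like this in Python. The helper name build_unload_sql, the example query, and the IAM role ARN are placeholders made up for illustration. Only the general UNLOAD ('query') TO 's3://...' IAM_ROLE '...' shape comes from the Redshift documentation linked above.

```python
def build_unload_sql(query, s3_prefix, iam_role):
    """Return a Redshift UNLOAD statement for `query` (hypothetical helper)."""
    # UNLOAD takes the query as a quoted string literal, so any single
    # quotes inside the query are doubled.
    escaped_query = query.replace("'", "''")
    # s3_prefix and iam_role are assumed to be trusted values here;
    # this sketch does not escape them.
    return (
        f"UNLOAD ('{escaped_query}') "
        f"TO '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}'"
    )


sql = build_unload_sql(
    "SELECT id, name FROM users WHERE country = 'US'",  # made-up query
    "s3://bucket/tmp-prefix",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",  # placeholder ARN
)
print(sql)
```

The resulting string would then be executed once over the JDBC (or any other SQL) connection before anything reads from the S3 prefix.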

To implement steps 1 and 3, I can't see a way to perform a task exactly
once, globally, in a PTransform. Sure, I could do those steps in main() or
even in a separate script, but the result isn't code that can be shared and
reused very well.
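
To make the main() workaround concrete, here is a minimal sketch of what every pipeline author would have to write by hand. The three callables are hypothetical stand-ins (no real Redshift, S3, or Beam calls), so this only shows the ordering, not an actual implementation.

```python
def run_with_unload(unload, build_and_run_pipeline, delete_tmp_files):
    """Run the three steps in order, once each, from the driver program."""
    unload()  # step 1: runs once, before the pipeline starts
    try:
        # step 2: the pipeline reads the unloaded files from S3
        return build_and_run_pipeline()
    finally:
        # step 3: runs once, even if the pipeline raised
        delete_tmp_files()


# Stand-in callables so the sketch runs without any external services.
calls = []
result = run_with_unload(
    unload=lambda: calls.append("unload"),
    build_and_run_pipeline=lambda: calls.append("pipeline") or "done",
    delete_tmp_files=lambda: calls.append("delete"),
)
print(calls, result)
```

The sequencing lives outside the PTransform, which is exactly why it can't be packaged and reused as a single transform.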

Am I missing something? Seems like the kind of problem that I shouldn't be
the first to encounter.

Thanks,

Jacob
