Reuven, I think I found an example of the pattern you describe in JdbcIO.Read.expand(). Thanks for this.
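
For step 3, the caveat quoted below (the operation might be executed more than once) means the delete has to tolerate files that are already gone. Here is a minimal sketch of that idea, under some loud assumptions: local files stand in for the S3 objects written by UNLOAD, and IdempotentCleanup, deleteAll and runDemo are hypothetical names of mine, not Beam or AWS APIs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Sketch only: local files stand in for the S3 objects written by UNLOAD. */
public class IdempotentCleanup {

  /**
   * Deletes every path in the list and returns how many files were actually removed.
   * A path that is already gone is skipped instead of raising an error, so a retry
   * of the same step is harmless.
   */
  static int deleteAll(List<Path> tmpFiles) {
    int removed = 0;
    for (Path file : tmpFiles) {
      try {
        if (Files.deleteIfExists(file)) {
          removed++;
        }
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
    return removed;
  }

  /** Creates two scratch files, then runs the cleanup twice to mimic a runner retry. */
  static int[] runDemo() {
    try {
      Path dir = Files.createTempDirectory("tmp-prefix");
      List<Path> tmpFiles = new ArrayList<>();
      tmpFiles.add(Files.createFile(dir.resolve("part-0000")));
      tmpFiles.add(Files.createFile(dir.resolve("part-0001")));
      int first = deleteAll(tmpFiles);   // first attempt removes both files
      int second = deleteAll(tmpFiles);  // retry finds nothing left to remove
      Files.deleteIfExists(dir);         // clean up the scratch directory
      return new int[] {first, second};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    int[] counts = runDemo();
    System.out.println("first attempt removed " + counts[0] + ", retry removed " + counts[1]);
  }
}
```

In a real transform the same skip-if-missing behavior would have to come from whatever S3 client is used, with the filenames arriving as the grouped PCollection<String> described below.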

On Wed, Sep 27, 2017 at 9:13 AM, Reuven Lax <[email protected]> wrote:

> Create is essentially a BoundedSource under the covers.
>
> There are multiple ways to handle step 3. One is to produce a
> PCollection<String> containing the filenames. You could then attach a Void
> key (using WithKeys), GBK the filenames together and delete in the next
> step.
>
> Reuven
>
> On Wed, Sep 27, 2017 at 9:04 AM, Jacob Marble <[email protected]> wrote:
>
> > Thanks, Reuven, that makes sense for step 1. After sending my original
> > message, I started down the path of BoundedSource, but I think this could
> > be better.
> >
> > Do you know any trick for step 3?
> >
> > On Wed, Sep 27, 2017 at 8:58 AM, Reuven Lax <[email protected]>
> > wrote:
> >
> > > A common pattern is the following
> > >
> > > p.apply(Create.of((Void) null))
> > >  .apply(MapElements.via((Void v) -> /* once operation */));
> > >
> > > Of course as is always the case with any Beam DoFn, your operation might
> > > be executed multiple times (e.g. if something fails before the runner
> > > commits the fact that the operation has succeeded). You need to ensure
> > > that the operation is idempotent.
> > >
> > > Reuven
> > >
> > > On Wed, Sep 27, 2017 at 8:51 AM, Jacob Marble <[email protected]>
> > > wrote:
> > >
> > > > I have been thinking on a Redshift reader/writer, basically to wrap
> > > > UNLOAD and COPY in a PTransform. For example, steps to UNLOAD into a
> > > > PCollection:
> > > >
> > > > 1) JDBC to Redshift - UNLOAD
> > > > <http://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html> TO
> > > > 's3://bucket/tmp-prefix'
> > > > 2) S3 to PCollection - work in progress
> > > > <https://github.com/Kochava/beam-s3>
> > > > 3) delete tmp files from S3
> > > >
> > > > To implement steps 1 and 3, I can't see a way to perform a task
> > > > exactly once, globally, in a PTransform. Sure, I could do those steps
> > > > in main() or even in a separate script, but the result isn't code
> > > > that can be shared and reused very well.
> > > >
> > > > Am I missing something? Seems like the kind of problem that I
> > > > shouldn't be the first to encounter.
> > > >
> > > > Thanks,
> > > >
> > > > Jacob
> > > >
> > >
> >
> > --
> > Jacob
>

--
Jacob
