[ https://issues.apache.org/jira/browse/BEAM-53?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15287905#comment-15287905 ]
ASF GitHub Bot commented on BEAM-53:
------------------------------------

GitHub user mshields822 opened a pull request:

    https://github.com/apache/incubator-beam/pull/346

[BEAM-53] Wire PubsubUnbounded{Source,Sink} into PubsubIO

This also refines how the sink handles record ids: each id is random but reused on failure, using the same trick as the BigQuery sink. I still need to re-run the load tests from a few weeks back against the actual change. Note that, last time I tested, the DataflowPipelineTranslator did not kick in and replace the new transforms with the correct native transforms; I need to dig deeper.

R: @dhalperi

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mshields822/incubator-beam pubsub-runner

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/346.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #346

----

commit ce655a6a7480147f7527fa23818e1d546abaa599
Author: Mark Shields <markshie...@google.com>
Date:   2016-04-12T00:36:27Z

    Make the Java unbounded Pub/Sub source the default.

commit aafcf5f9d0286ec5a6ed0d634df8ff0902897cdc
Author: Mark Shields <markshie...@google.com>
Date:   2016-05-17T23:44:30Z

    Refine record id calculation; prepare for unit tests that reuse record ids.

----

> PubSubIO: reimplement in Java
> -----------------------------
>
>             Key: BEAM-53
>             URL: https://issues.apache.org/jira/browse/BEAM-53
>         Project: Beam
>      Issue Type: New Feature
>      Components: runner-core
>        Reporter: Daniel Halperin
>        Assignee: Mark Shields
>
> PubSubIO is currently only partially implemented in Java: the
> DirectPipelineRunner uses a non-scalable API in a single-threaded manner.
> In contrast, the DataflowPipelineRunner uses an entirely different code path,
> implemented in the Google Cloud Dataflow service.
> We need to reimplement PubSubIO in Java in order to support other runners in
> a scalable way.
> Additionally, we can take this opportunity to add new features:
> * getting the timestamp from an arbitrary lambda, in arbitrary formats,
>   rather than from a message attribute in only two formats
> * exposing metadata and attributes in the elements produced by PubSubIO.Read
> * setting metadata and attributes in the messages written by PubSubIO.Write

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
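The "random but reused on failure" record-id trick mentioned in the PR description can be illustrated outside Beam. This is a hedged sketch in plain Java, not the actual PubsubUnboundedSink code; the class name `StableRecordIds` and its map-backed storage are hypothetical. The idea: the first request for an element's id draws a fresh random UUID, and a retry of the same element gets the identical id back, so a sink that deduplicates on record id silently drops the re-published copy.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration, not the PubsubUnboundedSink implementation:
// record ids are random on first assignment but stable across retries,
// so a sink that deduplicates on id treats a retried publish as a duplicate.
public class StableRecordIds {
    private final Map<String, String> assigned = new ConcurrentHashMap<>();

    // Returns a fresh random id for a new element key, and the same id
    // again if the element is retried after a failure.
    public String idFor(String elementKey) {
        return assigned.computeIfAbsent(elementKey, k -> UUID.randomUUID().toString());
    }
}
```

In the real sink the ids must survive worker restarts, which is why the actual trick (as with the BigQuery sink) is to generate them in a step whose output is checkpointed before the publish step runs.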
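The first proposed feature, taking the timestamp from an arbitrary lambda rather than from an attribute in two fixed formats, could look roughly like the following. This is a sketch under assumptions: the class `TimestampExtractors` and the use of a plain attribute map are illustrative, not the eventual PubsubIO API.

```java
import java.time.Instant;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of the proposed feature: the user supplies a
// lambda that turns a message's attribute map into an event timestamp,
// instead of choosing between two fixed built-in attribute formats.
public class TimestampExtractors {
    // Parse an epoch-milliseconds attribute, e.g. "ts" -> "1463528670000".
    public static Function<Map<String, String>, Instant> epochMillis(String attr) {
        return attrs -> Instant.ofEpochMilli(Long.parseLong(attrs.get(attr)));
    }

    // Parse an ISO-8601 attribute, e.g. "ts" -> "2016-05-17T23:44:30Z".
    public static Function<Map<String, String>, Instant> iso8601(String attr) {
        return attrs -> Instant.parse(attrs.get(attr));
    }
}
```

Because the extractor is just a function, any other format (custom date strings, timestamps embedded in the payload) fits the same shape without further API surface.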