[ 
https://issues.apache.org/jira/browse/BEAM-4796?focusedWorklogId=144873&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-144873
 ]

ASF GitHub Bot logged work on BEAM-4796:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 17/Sep/18 15:22
            Start Date: 17/Sep/18 15:22
    Worklog Time Spent: 10m 
      Work Description: nielm opened a new pull request #6409: [BEAM-4796] 
SpannerIO: Add option to wait for Schema to be ready.
URL: https://github.com/apache/beam/pull/6409
 
 
   The current behavior waits for the entire input PCollection to be read
   and closed before reading the schema. For large inputs this can stall
   the pipeline, and for small inputs it does not guarantee that the schema
   is ready if the schema is created in the same pipeline.
   
   It also breaks streaming mode completely, because an unbounded input
   PCollection is never closed.
   
   This PR adds an optional parameter that takes a PCollection to wait on
   before reading the schema. If it is not specified, the schema is read
   immediately.
   
   This provides a partial, but not complete, fix for streaming mode: there
   are still issues with partitioning/grouping in streaming mode, which means
   that NPEs (NullPointerExceptions) will be thrown under more than trivial load.
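To make the ordering change concrete, here is a minimal, self-contained Java model of it. It is not Beam or SpannerIO code: a `CompletableFuture` stands in for "this PCollection has been fully read", and `SchemaReadOrdering`, `readAfterAllInput`, `readAfterSignal` and the table-name "schema" are hypothetical names used only for illustration. It shows the previous ordering (schema read gated on the write input itself) next to the proposed one (gated on an optional signal, or immediate when no signal is given).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/**
 * Conceptual model, NOT Beam/SpannerIO code. A CompletableFuture stands in for
 * "this PCollection has been fully read"; the "schema" is a hypothetical list of
 * table names returned by a hypothetical reader.
 */
final class SchemaReadOrdering {

  /** Previous ordering: the schema read waits for the write input itself to finish. */
  static CompletableFuture<List<String>> readAfterAllInput(
      CompletableFuture<?> inputDone, Supplier<List<String>> schemaReader) {
    return inputDone.thenApply(ignored -> schemaReader.get());
  }

  /**
   * Proposed ordering: wait on an optional signal (e.g. the output of the step that
   * creates the tables); with no signal, read the schema immediately.
   */
  static CompletableFuture<List<String>> readAfterSignal(
      Optional<CompletableFuture<?>> signal, Supplier<List<String>> schemaReader) {
    CompletableFuture<?> gate = signal.orElse(CompletableFuture.completedFuture(null));
    return gate.thenApply(ignored -> schemaReader.get());
  }

  public static void main(String[] args) {
    Supplier<List<String>> reader = () -> Arrays.asList("Singers", "Albums");
    // Unbounded (streaming) input: this future is never completed.
    CompletableFuture<Void> inputDone = new CompletableFuture<>();
    System.out.println(readAfterAllInput(inputDone, reader).isDone()); // false
    System.out.println(readAfterSignal(Optional.empty(), reader).isDone()); // true
  }
}
```

With an unbounded input, `inputDone` never completes, so the previous ordering never reads the schema at all, which is the streaming failure described above. Passing the output of a schema-creating step as the signal keeps the "schema is ready" guarantee without waiting for the write input.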
   
   @chamikaramj 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 144873)
            Time Spent: 10m
    Remaining Estimate: 0h

> SpannerIO waits for all input before writing
> --------------------------------------------
>
>                 Key: BEAM-4796
>                 URL: https://issues.apache.org/jira/browse/BEAM-4796
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>    Affects Versions: 2.5.0
>            Reporter: Niel Markwick
>            Assignee: Chamikara Jayalath
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> SpannerIO.Write waits for all input in the window to arrive before getting 
> the schema:
> [https://github.com/apache/beam/blame/release-2.5.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java#L841]
>  
> In streaming mode, this is not an issue, but in batch mode, this causes the 
> pipeline to stall until all input is read, which can take a significant 
> amount of time (and temporary data). 
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
