[jira] [Assigned] (BEAM-91) Retractions
[ https://issues.apache.org/jira/browse/BEAM-91?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frances Perry reassigned BEAM-91:
---------------------------------

    Assignee:     (was: Frances Perry)

> Retractions
> -----------
>
> Key: BEAM-91
> URL: https://issues.apache.org/jira/browse/BEAM-91
> Project: Beam
> Issue Type: New Feature
> Components: beam-model
> Reporter: Tyler Akidau
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> We still haven't added retractions to Beam, even though they're a core part
> of the model. We should document all the necessary aspects (uncombine,
> reverting DoFn output with DoOvers, sink integration, source-level
> retractions, etc), and then implement them.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Assigned] (BEAM-2450) Transform names and named applications should not be null or empty
[ https://issues.apache.org/jira/browse/BEAM-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frances Perry reassigned BEAM-2450:
-----------------------------------

    Assignee:     (was: Frances Perry)

> Transform names and named applications should not be null or empty
> -------------------------------------------------------------------
>
> Key: BEAM-2450
> URL: https://issues.apache.org/jira/browse/BEAM-2450
> Project: Beam
> Issue Type: Bug
> Components: beam-model, sdk-java-core, sdk-py
> Reporter: Scott Wegner
> Priority: Minor
>
> Beam SDK allows setting the name of a transform [1] and also naming the
> transform application [2]. If no name is specified on application, the name
> of the transform is used. If no name is specified for the transform, the
> class name is used.
> The application name serves as metadata for the applied PTransforms in the
> constructed graph. They are effectively extra display data (historically,
> PTransform names predate display data). The names are used by runners for UI
> and monitoring applications, such as the displayed pipeline graph in the
> Dataflow Monitoring UI [3].
> Currently there is no explicit validation on the specified application name.
> The current behavior seems to be:
> * null application names cause a NullPointerException at construction time.
> * Specifying the empty string compiles and succeeds in the DirectRunner, but
> causes strange behavior in Dataflow when rendering the graph in the UI. I
> have not tested the behavior of other runners.
> We should add explicit validation in the model on the specified transform
> name and application name. I propose that we disallow null and empty names.
> This is technically a breaking change as the SDK currently allows the empty
> string, but only because it is under-specified. The upgrade path for any
> pipelines broken by this change is simple: specify a non-empty name or
> fall back to the default class name.
> [1]
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L236
> [2]
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollection.java#L295
> [3]
> https://cloud.google.com/dataflow/pipelines/dataflow-monitoring-intf#viewing-a-pipeline
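The validation proposed in BEAM-2450 could look roughly like the sketch below. It is only an illustration of the rule "disallow null and empty names" together with the fallback order described in the issue (application name, then transform name, then class name). The class `TransformNames` and its methods `validateName` and `resolveName` are hypothetical names invented for this sketch; they are not part of the Beam SDK, and the use of `Optional` to represent "no name specified" is an assumption.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical helper, not part of the Beam SDK.
final class TransformNames {
    private TransformNames() {}

    /** Returns the name unchanged if it is non-null and non-empty; otherwise throws. */
    static String validateName(String name) {
        Objects.requireNonNull(name, "name must not be null");
        if (name.isEmpty()) {
            throw new IllegalArgumentException("name must not be empty");
        }
        return name;
    }

    /**
     * Applies the fallback order from the issue: the application name if given,
     * else the transform name if given, else the class name.
     */
    static String resolveName(
            Optional<String> applicationName,
            Optional<String> transformName,
            Class<?> transformClass) {
        if (applicationName.isPresent()) {
            return validateName(applicationName.get());
        }
        if (transformName.isPresent()) {
            return validateName(transformName.get());
        }
        // Assumption: the fully qualified class name is used, since it is never empty.
        return transformClass.getName();
    }
}
```

With a check like this, an empty name would be rejected at construction time on every runner instead of surfacing later as odd rendering in a monitoring UI.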
[jira] [Assigned] (BEAM-1976) Allow only one runner profile active at once in examples and archetypes
[ https://issues.apache.org/jira/browse/BEAM-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frances Perry reassigned BEAM-1976:
-----------------------------------

    Assignee:     (was: Frances Perry)

> Allow only one runner profile active at once in examples and archetypes
> ------------------------------------------------------------------------
>
> Key: BEAM-1976
> URL: https://issues.apache.org/jira/browse/BEAM-1976
> Project: Beam
> Issue Type: Sub-task
> Components: examples-java
> Reporter: Aviem Zur
>
> Since only one SLF4J logger binding is allowed in the classpath, we shouldn't
> allow more than one runner profile to be active at once in our
> examples/archetype modules since different runners use different bindings.
> Also, remove slf4j-jdk14 dependency from root and place it instead in
> direct-runner and dataflow-runner profiles, for the same reason.
[jira] [Assigned] (BEAM-454) Validate Pubsub Topic exists when reading
[ https://issues.apache.org/jira/browse/BEAM-454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frances Perry reassigned BEAM-454:
----------------------------------

    Assignee: Borisa Zivkovic

> Validate Pubsub Topic exists when reading
> ------------------------------------------
>
> Key: BEAM-454
> URL: https://issues.apache.org/jira/browse/BEAM-454
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-gcp
> Reporter: Frances Perry
> Assignee: Borisa Zivkovic
> Priority: Minor
> Labels: newbie, starter
>
> When reading from Pubsub, we should validate the pubsub topic exists at graph
> construction time (similar to the way we validate a BQ dataset and table
> exist).

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
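As a rough illustration of what "validate at graph construction time" in BEAM-454 might mean, the sketch below checks a topic before anything is submitted to a runner. The `TopicLookup` interface and the `PubsubReadSketch` class are hypothetical and invented for this example; they are not Beam or Pubsub APIs. A real implementation would presumably back the lookup with a call to the Pubsub service.

```java
// Hypothetical lookup; a real implementation would query the Pubsub service.
interface TopicLookup {
    boolean topicExists(String topicPath);
}

// Hypothetical stand-in for a Pubsub read transform.
final class PubsubReadSketch {
    private final String topicPath;

    PubsubReadSketch(String topicPath) {
        this.topicPath = topicPath;
    }

    /** Intended to run while the pipeline graph is being built, before job submission. */
    void validate(TopicLookup lookup) {
        if (!lookup.topicExists(topicPath)) {
            throw new IllegalArgumentException(
                "Pubsub topic does not exist: " + topicPath);
        }
    }
}
```

Failing here means a misspelled topic is reported immediately to the person building the pipeline, rather than after workers have started.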
[jira] [Closed] (BEAM-193) Port existing Dataflow SDK documentation to Beam Programming Guide
[ https://issues.apache.org/jira/browse/BEAM-193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frances Perry closed BEAM-193.
------------------------------
    Resolution: Fixed
    Fix Version/s: Not applicable

> Port existing Dataflow SDK documentation to Beam Programming Guide
> -------------------------------------------------------------------
>
> Key: BEAM-193
> URL: https://issues.apache.org/jira/browse/BEAM-193
> Project: Beam
> Issue Type: Task
> Components: website
> Reporter: Devin Donnelly
> Assignee: Melissa Pashniak
> Fix For: Not applicable
>
> There is an extensive amount of documentation on the Dataflow SDK programming
> model and classes. Port this documentation over as a new Beam Programming
> Guide covering the following major topics:
> - Programming model overview
> - Pipeline structure
> - PCollections
> - Transforms
> - I/O
[jira] [Commented] (BEAM-622) Add checkpointing tests for DoFnOperator and WindowDoFnOperator
[ https://issues.apache.org/jira/browse/BEAM-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977100#comment-15977100 ]

Frances Perry commented on BEAM-622:
------------------------------------

Is this a blocker for the first stable release? Is there a good owner if so?

> Add checkpointing tests for DoFnOperator and WindowDoFnOperator
> ----------------------------------------------------------------
>
> Key: BEAM-622
> URL: https://issues.apache.org/jira/browse/BEAM-622
> Project: Beam
> Issue Type: Test
> Components: runner-flink
> Affects Versions: 0.3.0-incubating
> Reporter: Maximilian Michels
> Fix For: First stable release
>
> Tests which test the correct snapshotting of these two operators are missing.
[jira] [Commented] (BEAM-170) Session windows should not be identified by their bounds
[ https://issues.apache.org/jira/browse/BEAM-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977095#comment-15977095 ]

Frances Perry commented on BEAM-170:
------------------------------------

Ilya, should we find a new owner for this?

> Session windows should not be identified by their bounds
> ---------------------------------------------------------
>
> Key: BEAM-170
> URL: https://issues.apache.org/jira/browse/BEAM-170
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Reporter: Kenneth Knowles
> Assignee: Ilya Ganelin
> Labels: backward-incompatible
> Fix For: First stable release
>
> Today, if two session windows for the same key have the same bounds, they are
> considered the same window. This is an accident. It is not intended that any
> session windows are considered equal except via the operation of merging them
> into the same session.
> A risk associated with this behavior is that two windows that happen to
> coincide will share per-window-and-key state rather than evolving separately
> and having their separate state reconciled by state merging logic. These code
> paths are not required to be coherent, and in practice they are not.
> In particular, if the trigger for a session window ever finishes, then
> subsequent data in a window with the same bounds will be dropped, whereas if
> it had differed by a millisecond it would have created a new session,
> ignoring the previously closed session.
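The state-sharing risk described in BEAM-170 can be illustrated with a small, self-contained sketch. It does not use Beam's window classes: `BoundsWindow`, `IdentifiedWindow`, and `SessionStateDemo` are hypothetical types written for this example, and the extra `id` field is only one assumed way of giving windows an identity beyond their bounds, not the fix proposed or adopted in Beam.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical window identified purely by its bounds (the accidental behavior).
final class BoundsWindow {
    final long start;
    final long end;

    BoundsWindow(long start, long end) {
        this.start = start;
        this.end = end;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof BoundsWindow
            && ((BoundsWindow) o).start == start
            && ((BoundsWindow) o).end == end;
    }

    @Override
    public int hashCode() {
        return Objects.hash(start, end);
    }
}

// Hypothetical window that also carries an identity, so equal bounds do not imply equality.
final class IdentifiedWindow {
    final long start;
    final long end;
    final long id;

    IdentifiedWindow(long start, long end, long id) {
        this.start = start;
        this.end = end;
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof IdentifiedWindow
            && ((IdentifiedWindow) o).start == start
            && ((IdentifiedWindow) o).end == end
            && ((IdentifiedWindow) o).id == id;
    }

    @Override
    public int hashCode() {
        return Objects.hash(start, end, id);
    }
}

final class SessionStateDemo {
    private SessionStateDemo() {}

    /** Stores state for two windows and reports how many separate state cells exist. */
    static int distinctStateCells(Object firstWindow, Object secondWindow) {
        Map<Object, String> stateByWindow = new HashMap<>();
        stateByWindow.put(firstWindow, "state of first session");
        stateByWindow.put(secondWindow, "state of second session");
        return stateByWindow.size();
    }
}
```

Two `BoundsWindow` instances covering the same interval collapse into a single state cell, which mirrors the shared per-window-and-key state the issue warns about, while two `IdentifiedWindow` instances with the same bounds keep separate cells.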
[jira] [Commented] (BEAM-1247) Session state should not be lost when discardingFiredPanes
[ https://issues.apache.org/jira/browse/BEAM-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977091#comment-15977091 ]

Frances Perry commented on BEAM-1247:
-------------------------------------

Any updates on this one?

> Session state should not be lost when discardingFiredPanes
> -----------------------------------------------------------
>
> Key: BEAM-1247
> URL: https://issues.apache.org/jira/browse/BEAM-1247
> Project: Beam
> Issue Type: Bug
> Components: beam-model, runner-core
> Reporter: Kenneth Knowles
> Assignee: Kenneth Knowles
> Priority: Critical
> Labels: backward-incompatible
> Fix For: First stable release
>
> Today, when using {{discardingFiredPanes}}, the entirety of state is cleared,
> including the state of evolving sessions. This means that with multiple
> triggerings a single session shows up as multiple sessions. This also stymies
> downstream stateful computations.
[jira] [Commented] (BEAM-501) Update website skin
[ https://issues.apache.org/jira/browse/BEAM-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958315#comment-15958315 ]

Frances Perry commented on BEAM-501:
------------------------------------

Checked with JB that he's not actively working on this right now. Reassigning to Jeremy, who has some great thoughts ;-)

> Update website skin
> -------------------
>
> Key: BEAM-501
> URL: https://issues.apache.org/jira/browse/BEAM-501
> Project: Beam
> Issue Type: Improvement
> Components: website
> Reporter: Frances Perry
> Assignee: Jeremy Weinstein
>
> Update the main landing page and website skin as discussed here
> https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit
[jira] [Assigned] (BEAM-501) Update website skin
[ https://issues.apache.org/jira/browse/BEAM-501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frances Perry reassigned BEAM-501:
----------------------------------

    Assignee: Jeremy Weinstein  (was: Jean-Baptiste Onofré)

> Update website skin
> -------------------
>
> Key: BEAM-501
> URL: https://issues.apache.org/jira/browse/BEAM-501
> Project: Beam
> Issue Type: Improvement
> Components: website
> Reporter: Frances Perry
> Assignee: Jeremy Weinstein
>
> Update the main landing page and website skin as discussed here
> https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit
[jira] [Assigned] (BEAM-1069) Add CountingInput Transform to python sdk
[ https://issues.apache.org/jira/browse/BEAM-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frances Perry reassigned BEAM-1069:
-----------------------------------

    Assignee: Tibor Kiss  (was: Frances Perry)

> Add CountingInput Transform to python sdk
> ------------------------------------------
>
> Key: BEAM-1069
> URL: https://issues.apache.org/jira/browse/BEAM-1069
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py
> Reporter: Vikas Kedigehalli
> Assignee: Tibor Kiss
> Priority: Minor
> Labels: starter
>
> Similar to the Java SDK:
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/CountingInput.java
[jira] [Updated] (BEAM-1627) Composite/DisplayData structure changed
[ https://issues.apache.org/jira/browse/BEAM-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frances Perry updated BEAM-1627:
--------------------------------
    Attachment: ParseGame-0.5.png
                ParseGame-snapshot-extraComposite.png
                FixedWindows-0.5.png
                FixedWindows-snapshot-extraComposite-noDisplayData.png

> Composite/DisplayData structure changed
> ----------------------------------------
>
> Key: BEAM-1627
> URL: https://issues.apache.org/jira/browse/BEAM-1627
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Reporter: Frances Perry
> Assignee: Thomas Groh
> Priority: Blocker
> Fix For: 0.6.0
>
> Attachments: FixedWindows-0.5.png,
> FixedWindows-snapshot-extraComposite-noDisplayData.png, ParseGame-0.5.png,
> ParseGame-snapshot-extraComposite.png
>
> When running at head, pipeline composite structure has changed. My guess is
> this is related to pull/2145.
> (1) Steps that used to be leaf nodes are now expandable composites with a
> ParMultiDo inside them.
> (2) For some (but not all) steps, display data appears to be lost.
> This can be seen pretty clearly in the Dataflow monitoring UI. Attached
> screenshots showing
> -- ParseGameEvent transform leaks an extra level of composite.
> -- FixedWindows transform leaks an extra composite and loses display data.
> [~tgroh] can you triage?
> [~altay] FYI potential 0.6 release blocker
[jira] [Created] (BEAM-1627) Composite/DisplayData structure changed
Frances Perry created BEAM-1627:
-----------------------------------

    Summary: Composite/DisplayData structure changed
    Key: BEAM-1627
    URL: https://issues.apache.org/jira/browse/BEAM-1627
    Project: Beam
    Issue Type: Bug
    Components: runner-dataflow
    Reporter: Frances Perry
    Assignee: Thomas Groh
    Priority: Blocker
    Fix For: 0.6.0

When running at head, pipeline composite structure has changed. My guess is this is related to pull/2145.
(1) Steps that used to be leaf nodes are now expandable composites with a ParMultiDo inside them.
(2) For some (but not all) steps, display data appears to be lost.

This can be seen pretty clearly in the Dataflow monitoring UI. Attached screenshots showing
-- ParseGameEvent transform leaks an extra level of composite.
-- FixedWindows transform leaks an extra composite and loses display data.

[~tgroh] can you triage?
[~altay] FYI potential 0.6 release blocker
[jira] [Commented] (BEAM-1624) Unable to deserialize Coder in DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15896112#comment-15896112 ]

Frances Perry commented on BEAM-1624:
-------------------------------------

[~altay] FYI -- considering this 0.6 release blocking until triaged.

> Unable to deserialize Coder in DataflowRunner
> ----------------------------------------------
>
> Key: BEAM-1624
> URL: https://issues.apache.org/jira/browse/BEAM-1624
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Reporter: Frances Perry
> Assignee: Davor Bonaci
> Priority: Blocker
> Fix For: 0.6.0
>
> To repro, sync to head and run the LeaderBoard example with the Dataflow
> runner
> Does not repro in 0.5.
> Caused by: java.lang.RuntimeException: Unable to deserialize Coder:
> WindowedValue$FullWindowedValueCoder(KvCoder(BigQueryIO$ShardedKeyCoder(StringUtf8Coder),BigQueryIO$TableRowInfoCoder),IntervalWindow$IntervalWindowCoder).
> Check that a suitable constructor is defined. See Coder for details.
>     at org.apache.beam.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:115)
>     at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$StepTranslator.addOutput(DataflowPipelineTranslator.java:655)
>     at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$StepTranslator.addOutput(DataflowPipelineTranslator.java:602)
>     at org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translateOutputs(DataflowPipelineTranslator.java:945)
>     at org.apache.beam.runners.dataflow.DataflowPipelineTranslator.access$1200(DataflowPipelineTranslator.java:111)
>     at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translateMultiHelper(DataflowPipelineTranslator.java:836)
>     at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translate(DataflowPipelineTranslator.java:826)
>     at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translate(DataflowPipelineTranslator.java:823)
>     at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.visitPrimitiveTransform(DataflowPipelineTranslator.java:413)
>     at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:486)
>     at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>     at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>     at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>     at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>     at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>     at org.apache.beam.sdk.runners.TransformHierarchy$Node.access$400(TransformHierarchy.java:231)
>     at org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:206)
>     at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:321)
>     at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.translate(DataflowPipelineTranslator.java:363)
>     at org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translate(DataflowPipelineTranslator.java:153)
>     at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:505)
>     at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:150)
>     at org.apache.beam.sdk.Pipeline.run(Pipeline.java:210)
>     at org.apache.beam.examples.complete.game.GameStats.main(GameStats.java:340)
>     ... 6 more
> Caused by: java.lang.RuntimeException: Unable to deserialize class interface
> org.apache.beam.sdk.coders.Coder
>     at org.apache.beam.sdk.util.Serializer.deserialize(Serializer.java:102)
>     at org.apache.beam.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:112)
>     ... 29 more
[jira] [Created] (BEAM-1624) Unable to deserialize Coder in DataflowRunner
Frances Perry created BEAM-1624:
-----------------------------------

    Summary: Unable to deserialize Coder in DataflowRunner
    Key: BEAM-1624
    URL: https://issues.apache.org/jira/browse/BEAM-1624
    Project: Beam
    Issue Type: Bug
    Components: runner-dataflow
    Reporter: Frances Perry
    Assignee: Davor Bonaci
    Priority: Blocker
    Fix For: 0.6.0

To repro, sync to head and run the LeaderBoard example with the Dataflow runner
Does not repro in 0.5.

Caused by: java.lang.RuntimeException: Unable to deserialize Coder: WindowedValue$FullWindowedValueCoder(KvCoder(BigQueryIO$ShardedKeyCoder(StringUtf8Coder),BigQueryIO$TableRowInfoCoder),IntervalWindow$IntervalWindowCoder). Check that a suitable constructor is defined. See Coder for details.
    at org.apache.beam.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:115)
    at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$StepTranslator.addOutput(DataflowPipelineTranslator.java:655)
    at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$StepTranslator.addOutput(DataflowPipelineTranslator.java:602)
    at org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translateOutputs(DataflowPipelineTranslator.java:945)
    at org.apache.beam.runners.dataflow.DataflowPipelineTranslator.access$1200(DataflowPipelineTranslator.java:111)
    at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translateMultiHelper(DataflowPipelineTranslator.java:836)
    at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translate(DataflowPipelineTranslator.java:826)
    at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translate(DataflowPipelineTranslator.java:823)
    at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.visitPrimitiveTransform(DataflowPipelineTranslator.java:413)
    at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:486)
    at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
    at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
    at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
    at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
    at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
    at org.apache.beam.sdk.runners.TransformHierarchy$Node.access$400(TransformHierarchy.java:231)
    at org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:206)
    at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:321)
    at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.translate(DataflowPipelineTranslator.java:363)
    at org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translate(DataflowPipelineTranslator.java:153)
    at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:505)
    at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:150)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:210)
    at org.apache.beam.examples.complete.game.GameStats.main(GameStats.java:340)
    ... 6 more
Caused by: java.lang.RuntimeException: Unable to deserialize class interface org.apache.beam.sdk.coders.Coder
    at org.apache.beam.sdk.util.Serializer.deserialize(Serializer.java:102)
    at org.apache.beam.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:112)
    ... 29 more
[jira] [Created] (BEAM-1556) Spark executors need to register IO factories
Frances Perry created BEAM-1556:
-----------------------------------

    Summary: Spark executors need to register IO factories
    Key: BEAM-1556
    URL: https://issues.apache.org/jira/browse/BEAM-1556
    Project: Beam
    Issue Type: Bug
    Components: runner-spark
    Reporter: Frances Perry
    Assignee: Amit Sela

The Spark executors need to call IOChannelUtils.registerIOFactories(options) in order to support GCS files and make the default WordCount example work.

Context in this thread: https://lists.apache.org/thread.html/469a139c9eb07e64e514cdea42ab8000678ab743794a090c365205d7@%3Cuser.beam.apache.org%3E
[jira] [Commented] (BEAM-370) Remove the .named() methods from PTransforms and sub-classes
[ https://issues.apache.org/jira/browse/BEAM-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771289#comment-15771289 ]

Frances Perry commented on BEAM-370:
------------------------------------

Can this issue be closed?

> Remove the .named() methods from PTransforms and sub-classes
> -------------------------------------------------------------
>
> Key: BEAM-370
> URL: https://issues.apache.org/jira/browse/BEAM-370
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Ben Chambers
> Assignee: Ben Chambers
> Priority: Minor
> Labels: backward-incompatible
>
> 1. Update examples/tests/etc. to use named application instead of `.named()`
> 2. Remove the `.named()` methods from composite PTransforms
> 3. Where appropriate, use the PTransform constructor which takes a string
> to use as the default name.
> See further discussion in the related thread
> (http://mail-archives.apache.org/mod_mbox/incubator-beam-dev/201606.mbox/%3ccan-7fgzuz1f_szzd2orfyd2pk2_prymhgwjepjpefp01h7s...@mail.gmail.com%3E).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)