[beam] branch spark-runner_structured-streaming updated: Add setEnableSparkMetricSinks()

2019-07-08 Thread aromanenko
This is an automated email from the ASF dual-hosted git repository.

aromanenko pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to 
refs/heads/spark-runner_structured-streaming by this push:
 new 239bc93  Add setEnableSparkMetricSinks()
239bc93 is described below

commit 239bc93f09a095bd40a861150c0030bebf1fe6f1
Author: Alexey Romanenko 
AuthorDate: Mon Jul 8 11:55:47 2019 +0200

Add setEnableSparkMetricSinks()
---
 .../structuredstreaming/SparkStructuredStreamingPipelineOptions.java| 2 ++
 1 file changed, 2 insertions(+)

diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkStructuredStreamingPipelineOptions.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkStructuredStreamingPipelineOptions.java
index 27743bd..bbf89f6 100644
--- a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkStructuredStreamingPipelineOptions.java
+++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkStructuredStreamingPipelineOptions.java
@@ -30,4 +30,6 @@ public interface SparkStructuredStreamingPipelineOptions extends SparkCommonPipe
  @Description("Enable/disable sending aggregator values to Spark's metric sinks")
   @Default.Boolean(true)
   Boolean getEnableSparkMetricSinks();
+
+  void setEnableSparkMetricSinks(Boolean enableSparkMetricSinks);
 }
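
A minimal usage sketch (not part of the commit; the class name below is illustrative). With the setter in place, the option can be toggled programmatically before the pipeline is built, overriding the @Default.Boolean(true) shown above:

  import org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingPipelineOptions;
  import org.apache.beam.sdk.options.PipelineOptionsFactory;

  public class DisableMetricSinks {
    public static void main(String[] args) {
      SparkStructuredStreamingPipelineOptions options =
          PipelineOptionsFactory.fromArgs(args)
              .withValidation()
              .as(SparkStructuredStreamingPipelineOptions.class);
      // Stop forwarding aggregator values to Spark's metric sinks for this run.
      options.setEnableSparkMetricSinks(false);
    }
  }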



[beam] branch spark-runner_structured-streaming updated (239bc93 -> 7356ec2)

2019-07-08 Thread echauchot
This is an automated email from the ASF dual-hosted git repository.

echauchot pushed a change to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git.


 discard 239bc93  Add setEnableSparkMetricSinks()
 discard db96d6a  Add missing dependencies to run Spark Structured Streaming 
Runner on Nexmark
 discard 59cc5dd  Add metrics support in DoFn
 discard fc83520  Ignore for now not working test testCombineGlobally
 discard 59b9e3f  Add a test that combine per key preserves windowing
 discard 618cc5c  Clean groupByKeyTest
 discard 0347a07  add comment in combine globally test
 discard c5aad8f  Fixed immutable list bug
 discard e741781  Fix javadoc of AggregatorCombiner
 discard 85e6df4  Clean not more needed WindowingHelpers
 discard 6add5fe  Clean not more needed RowHelpers
 discard df16763  Clean no more needed KVHelpers
 discard bf13e07  Now that there is only Combine.PerKey translation, make only 
one Aggregator
 discard e80c908  Remove CombineGlobally translation because it is less 
performant than the beam sdk one (key + combinePerKey.withHotkeyFanout)
 discard 37cdd7e  Remove the mapPartition that adds a key per partition because 
otherwise spark will reduce values per key instead of globally
 discard 38e9eae  Fix bug in the window merging logic
 discard dcb3949  Fix wrong encoder in combineGlobally GBK
 discard 11c3792  Fix case when a window does not merge into any other window
 discard 884e5f90 Apply a groupByKey avoids for some reason that the spark 
structured streaming fmwk casts data to Row which makes it impossible to 
deserialize without the coder shipped into the data. For performance reasons 
(avoid memory consumption and having to deserialize), we do not ship coder + 
data. Also add a mapparitions before GBK to avoid shuffling
 discard f0522dc  [to remove] temporary: revert extractKey while combinePerKey 
is not done (so that it compiles)
 discard 9a269ef  Fix encoder in combine call
 discard 8f8bae4  Implement merge accumulators part of CombineGlobally 
translation with windowing
 discard 4602f83  Output data after combine
 discard bba08b4  Implement reduce part of CombineGlobally translation with 
windowing
 discard 8d05d46  Fix comment about schemas
 discard 8a4372d  Update KVHelpers.extractKey() to deal with WindowedValue and 
update GBK and CPK
 discard 8cdd143  Add TODO in Combine translations
 discard f86660f  Add a test that GBK preserves windowing
 discard d0f5806  fixup Enable UsesBoundedSplittableParDo category of tests 
Since the default overwrite is already in place and it is breaking only in one 
subcase
 discard 09e6207  Improve visibility of debug messages
 discard c947455  re-enable reduceFnRunner timers for output
 discard 89df2bf  Re-code GroupByKeyTranslatorBatch to conserve windowing 
instead of unwindowing/windowing(GlobalWindow): simplify code, use 
ReduceFnRunner to merge the windows
 discard 6ba3d1c  Add comment about checkpoint mark
 discard 982197c  Update windowAssignTest
 discard 6ae2056  Put back batch/simpleSourceTest.testBoundedSource
 discard 728aa1f  Consider null object case on RowHelpers, fixes empty side 
inputs tests.
 discard 1d9155d  fixup hadoop-format is not mandataory to run ValidatesRunner 
tests
 discard be79a86  Enable UsesSchema tests on ValidatesRunner
 discard ad22b68  Pass transform based doFnSchemaInformation in ParDo 
translation
 discard 5cf3e1b  fixup Enable UsesFailureMessage category of tests
 discard 017497c  Fixes ParDo not calling setup and not tearing down if 
exception on startBundle
 discard d08f110  Limit the number of partitions to make tests go 300% faster
 discard 85a9589  Add Batch Validates Runner tests for Structured Streaming 
Runner
 discard f2f953a  Apply Spotless
 discard bd06201  Update javadoc
 discard f2cac46  implement source.stop
 discard 4786cf1  Ignore spark offsets (cf javadoc)
 discard 496d660  Use PAssert in Spark Structured Streaming transform tests
 discard e542bd3  Rename SparkPipelineResult to 
SparkStructuredStreamingPipelineResult This is done to avoid an eventual 
collision with the one in SparkRunner. However this cannot happen at this 
moment because it is package private, so it is also done for consistency.
 discard 1081207  Add SparkStructuredStreamingPipelineOptions and 
SparkCommonPipelineOptions - SparkStructuredStreamingPipelineOptions was added 
to have the new   runner rely only on its specific options.
 discard 182f2d9  Fix logging levels in Spark Structured Streaming translation
 discard 40f2135  Fix spotless issues after rebase
 discard 0262dab  Add doFnSchemaInformation to ParDo batch translation
 discard e1c8e96  Fix non-vendored imports from Spark Streaming Runner classes
 discard 2605bae  Remove specific PipelineOotions and Registrar for Structured 
Streaming Runner
 discard f12ec6b  Rename Runner to SparkStructuredStreamingRunner
 discard dc756a6  Remove spark-structured-streaming module
 discard e76b88b  Merge Spark Structured Streaming 

[beam] branch spark-runner_structured-streaming updated: Add setEnableSparkMetricSinks() method

2019-07-08 Thread aromanenko
This is an automated email from the ASF dual-hosted git repository.

aromanenko pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to 
refs/heads/spark-runner_structured-streaming by this push:
 new d14a29a  Add setEnableSparkMetricSinks() method
d14a29a is described below

commit d14a29ade55bebf476867eefd0283b9a66af00aa
Author: Alexey Romanenko 
AuthorDate: Mon Jul 8 18:02:34 2019 +0200

Add setEnableSparkMetricSinks() method
---
 .../structuredstreaming/SparkStructuredStreamingPipelineOptions.java| 2 ++
 1 file changed, 2 insertions(+)

diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkStructuredStreamingPipelineOptions.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkStructuredStreamingPipelineOptions.java
index 27743bd..bbf89f6 100644
--- a/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkStructuredStreamingPipelineOptions.java
+++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkStructuredStreamingPipelineOptions.java
@@ -30,4 +30,6 @@ public interface SparkStructuredStreamingPipelineOptions extends SparkCommonPipe
  @Description("Enable/disable sending aggregator values to Spark's metric sinks")
   @Default.Boolean(true)
   Boolean getEnableSparkMetricSinks();
+
+  void setEnableSparkMetricSinks(Boolean enableSparkMetricSinks);
 }
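
A related sketch (again not part of the commit; class name illustrative): because Beam derives command-line flag names from PipelineOptions property names, the new setter should also make the option settable as --enableSparkMetricSinks=false, assuming the usual PipelineOptionsFactory parsing:

  import org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingPipelineOptions;
  import org.apache.beam.sdk.options.PipelineOptionsFactory;

  public class FlagParsingSketch {
    public static void main(String[] unused) {
      // Without a matching setter, a flag-backed property cannot be populated,
      // which is what this commit addresses.
      String[] args = {"--enableSparkMetricSinks=false"};
      SparkStructuredStreamingPipelineOptions options =
          PipelineOptionsFactory.fromArgs(args)
              .as(SparkStructuredStreamingPipelineOptions.class);
      System.out.println(options.getEnableSparkMetricSinks()); // prints: false
    }
  }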



[beam] branch master updated (2d5e493 -> 82cfdb8)

2019-07-08 Thread lcwik
This is an automated email from the ASF dual-hosted git repository.

lcwik pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 2d5e493  Merge pull request #8847: [BEAM-7550] Missing pipeline 
parameters in ParDo Load Test
 add 279f2d4  [BEAM-5605] Update Beam Java SDK backlog to track latest 
changes in Beam Python SDK.
 new 82cfdb8  [BEAM-5605] Update Beam Java SDK backlog to track latest 
changes in Beam Python SDK.

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../fn-execution/src/main/proto/beam_fn_api.proto  |   34 +-
 .../beam/model/fnexecution_v1/beam_fn_api.pb.go|  678 ---
 .../beam/model/pipeline_v1/beam_runner_api.pb.go   | 1241 +++-
 .../java/org/apache/beam/sdk/transforms/DoFn.java  |   71 ++
 .../sdk/transforms/splittabledofn/Backlog.java |   90 --
 .../sdk/transforms/splittabledofn/Backlogs.java|   58 -
 .../splittabledofn/ByteKeyRangeTracker.java|   15 +-
 .../splittabledofn/OffsetRangeTracker.java |9 +-
 .../beam/sdk/transforms/splittabledofn/Sizes.java  |   57 +
 .../splittabledofn/ByteKeyRangeTrackerTest.java|   17 +-
 .../splittabledofn/OffsetRangeTrackerTest.java |   13 +-
 .../sdk/fn/splittabledofn/RestrictionTrackers.java |   41 +-
 .../fn/splittabledofn/RestrictionTrackersTest.java |   52 +-
 13 files changed, 1075 insertions(+), 1301 deletions(-)
 delete mode 100644 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/Backlog.java
 delete mode 100644 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/Backlogs.java
 create mode 100644 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/Sizes.java
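
The diffstat above shows Backlog.java and Backlogs.java being removed in favor of a new Sizes.java, with the restriction trackers updated to match. As a rough, hypothetical illustration of that direction (the names below are illustrative only, not the classes touched by this commit), remaining work can be reported as a plain double "size" rather than as an element backlog:

  // Hypothetical sketch, not Beam's actual API.
  interface HasSize {
    // Remaining work for the current restriction, in arbitrary units
    // (for example bytes, or remaining offsets).
    double getSize();
  }

  class RemainingOffsetsAsSize implements HasSize {
    private final long stop;
    private long current;

    RemainingOffsetsAsSize(long start, long stop) {
      this.current = start;
      this.stop = stop;
    }

    void advanceTo(long position) {
      current = position; // caller has processed up to 'position'
    }

    @Override
    public double getSize() {
      return Math.max(0, stop - current); // remaining offsets reported as "size"
    }
  }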



[beam] 01/01: [BEAM-5605] Update Beam Java SDK backlog to track latest changes in Beam Python SDK.

2019-07-08 Thread lcwik
This is an automated email from the ASF dual-hosted git repository.

lcwik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 82cfdb805e4a530443f450f7262ff344d5cc7ede
Merge: 2d5e493 279f2d4
Author: Lukasz Cwik 
AuthorDate: Mon Jul 8 13:30:38 2019 -0700

[BEAM-5605] Update Beam Java SDK backlog to track latest changes in Beam 
Python SDK.

 .../fn-execution/src/main/proto/beam_fn_api.proto  |   34 +-
 .../beam/model/fnexecution_v1/beam_fn_api.pb.go|  678 ---
 .../beam/model/pipeline_v1/beam_runner_api.pb.go   | 1241 +++-
 .../java/org/apache/beam/sdk/transforms/DoFn.java  |   71 ++
 .../sdk/transforms/splittabledofn/Backlog.java |   90 --
 .../sdk/transforms/splittabledofn/Backlogs.java|   58 -
 .../splittabledofn/ByteKeyRangeTracker.java|   15 +-
 .../splittabledofn/OffsetRangeTracker.java |9 +-
 .../beam/sdk/transforms/splittabledofn/Sizes.java  |   57 +
 .../splittabledofn/ByteKeyRangeTrackerTest.java|   17 +-
 .../splittabledofn/OffsetRangeTrackerTest.java |   13 +-
 .../sdk/fn/splittabledofn/RestrictionTrackers.java |   41 +-
 .../fn/splittabledofn/RestrictionTrackersTest.java |   52 +-
 13 files changed, 1075 insertions(+), 1301 deletions(-)



[beam] branch master updated: Fix RestrictionTracker docstring

2019-07-08 Thread boyuanz
This is an automated email from the ASF dual-hosted git repository.

boyuanz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
 new 3772902  Fix RestrictionTracker docstring
 new 69bb363  Merge pull request #8999 from boyuanzz/doc_fix
3772902 is described below

commit 3772902178da3a1178ce53af68594b6e622f1af3
Author: Boyuan Zhang 
AuthorDate: Wed Jul 3 10:54:02 2019 -0700

Fix RestrictionTracker docstring
---
 sdks/python/apache_beam/io/iobase.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sdks/python/apache_beam/io/iobase.py b/sdks/python/apache_beam/io/iobase.py
index c7cb987..79c3706 100644
--- a/sdks/python/apache_beam/io/iobase.py
+++ b/sdks/python/apache_beam/io/iobase.py
@@ -436,7 +436,7 @@ class RangeTracker(object):
 
 Returns:
   the approximate fraction of positions that have been consumed by
-  successful 'try_split()' and  'report_current_position()'  calls, or
+  successful 'try_split()' and  'try_claim()'  calls, or
   0.0 if no such calls have happened.
 """
 raise NotImplementedError
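
The corrected docstring says the fraction reflects only positions claimed through successful 'try_claim()' or 'try_split()' calls, and is 0.0 before any such call succeeds. The patch itself is to the Python SDK's RangeTracker; the hypothetical Java-style sketch below (illustrative names, trySplit omitted for brevity) only spells out that contract:

  // Hypothetical sketch of the fraction-consumed contract, not Beam code.
  class FractionSketchTracker {
    private final long start;
    private final long stop;
    private long lastClaimed;
    private boolean anyClaim = false;

    FractionSketchTracker(long start, long stop) {
      this.start = start;
      this.stop = stop;
    }

    boolean tryClaim(long position) {
      if (position >= stop) {
        return false; // rejected claim: the fraction is unaffected
      }
      lastClaimed = position;
      anyClaim = true;
      return true;
    }

    double fractionConsumed() {
      if (!anyClaim) {
        return 0.0; // no successful tryClaim yet
      }
      return (double) (lastClaimed + 1 - start) / (stop - start);
    }
  }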