This is an automated email from the ASF dual-hosted git repository.

echauchot pushed a change to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git.


 discard 239bc93  Add setEnableSparkMetricSinks()
 discard db96d6a  Add missing dependencies to run Spark Structured Streaming 
Runner on Nexmark
 discard 59cc5dd  Add metrics support in DoFn
 discard fc83520  Ignore for now not working test testCombineGlobally
 discard 59b9e3f  Add a test that combine per key preserves windowing
 discard 618cc5c  Clean groupByKeyTest
 discard 0347a07  add comment in combine globally test
 discard c5aad8f  Fixed immutable list bug
 discard e741781  Fix javadoc of AggregatorCombiner
 discard 85e6df4  Clean not more needed WindowingHelpers
 discard 6add5fe  Clean not more needed RowHelpers
 discard df16763  Clean no more needed KVHelpers
 discard bf13e07  Now that there is only Combine.PerKey translation, make only 
one Aggregator
 discard e80c908  Remove CombineGlobally translation because it is less 
performant than the beam sdk one (key + combinePerKey.withHotkeyFanout)
 discard 37cdd7e  Remove the mapPartition that adds a key per partition because 
otherwise spark will reduce values per key instead of globally
 discard 38e9eae  Fix bug in the window merging logic
 discard dcb3949  Fix wrong encoder in combineGlobally GBK
 discard 11c3792  Fix case when a window does not merge into any other window
 discard 884e5f90 Apply a groupByKey avoids for some reason that the spark 
structured streaming fmwk casts data to Row which makes it impossible to 
deserialize without the coder shipped into the data. For performance reasons 
(avoid memory consumption and having to deserialize), we do not ship coder + 
data. Also add a mapparitions before GBK to avoid shuffling
 discard f0522dc  [to remove] temporary: revert extractKey while combinePerKey 
is not done (so that it compiles)
 discard 9a269ef  Fix encoder in combine call
 discard 8f8bae4  Implement merge accumulators part of CombineGlobally 
translation with windowing
 discard 4602f83  Output data after combine
 discard bba08b4  Implement reduce part of CombineGlobally translation with 
windowing
 discard 8d05d46  Fix comment about schemas
 discard 8a4372d  Update KVHelpers.extractKey() to deal with WindowedValue and 
update GBK and CPK
 discard 8cdd143  Add TODO in Combine translations
 discard f86660f  Add a test that GBK preserves windowing
 discard d0f5806  fixup Enable UsesBoundedSplittableParDo category of tests 
Since the default overwrite is already in place and it is breaking only in one 
subcase
 discard 09e6207  Improve visibility of debug messages
 discard c947455  re-enable reduceFnRunner timers for output
 discard 89df2bf  Re-code GroupByKeyTranslatorBatch to conserve windowing 
instead of unwindowing/windowing(GlobalWindow): simplify code, use 
ReduceFnRunner to merge the windows
 discard 6ba3d1c  Add comment about checkpoint mark
 discard 982197c  Update windowAssignTest
 discard 6ae2056  Put back batch/simpleSourceTest.testBoundedSource
 discard 728aa1f  Consider null object case on RowHelpers, fixes empty side 
inputs tests.
 discard 1d9155d  fixup hadoop-format is not mandataory to run ValidatesRunner 
tests
 discard be79a86  Enable UsesSchema tests on ValidatesRunner
 discard ad22b68  Pass transform based doFnSchemaInformation in ParDo 
translation
 discard 5cf3e1b  fixup Enable UsesFailureMessage category of tests
 discard 017497c  Fixes ParDo not calling setup and not tearing down if 
exception on startBundle
 discard d08f110  Limit the number of partitions to make tests go 300% faster
 discard 85a9589  Add Batch Validates Runner tests for Structured Streaming 
Runner
 discard f2f953a  Apply Spotless
 discard bd06201  Update javadoc
 discard f2cac46  implement source.stop
 discard 4786cf1  Ignore spark offsets (cf javadoc)
 discard 496d660  Use PAssert in Spark Structured Streaming transform tests
 discard e542bd3  Rename SparkPipelineResult to 
SparkStructuredStreamingPipelineResult This is done to avoid an eventual 
collision with the one in SparkRunner. However this cannot happen at this 
moment because it is package private, so it is also done for consistency.
 discard 1081207  Add SparkStructuredStreamingPipelineOptions and 
SparkCommonPipelineOptions - SparkStructuredStreamingPipelineOptions was added 
to have the new   runner rely only on its specific options.
 discard 182f2d9  Fix logging levels in Spark Structured Streaming translation
 discard 40f2135  Fix spotless issues after rebase
 discard 0262dab  Add doFnSchemaInformation to ParDo batch translation
 discard e1c8e96  Fix non-vendored imports from Spark Streaming Runner classes
 discard 2605bae  Remove specific PipelineOotions and Registrar for Structured 
Streaming Runner
 discard f12ec6b  Rename Runner to SparkStructuredStreamingRunner
 discard dc756a6  Remove spark-structured-streaming module
 discard e76b88b  Merge Spark Structured Streaming runner into main Spark module
 discard 17aa0f7  Fix access level issues, typos and modernize code to Java 8 
style
 discard 385f1a6  Disable never ending test SimpleSourceTest.testUnboundedSource
 discard ec633d1  Fix spotbogs warnings
 discard be0e9c3  Fix invalid code style with spotless
 discard 4ba0af3  Deal with checkpoint and offset based read
 discard 33b310cc Fllow up on offsets for streaming source
 discard 8a0def6  Clean streaming source
 discard f1340fd  Remove unneeded 0 arg constructor in batch source
 discard 482bc7f  Clean unneeded 0 arg constructor in batch source
 discard b27b982  Specify checkpointLocation at the pipeline start
 discard 0204d52  Add source streaming test
 discard a0f25eb  Add transformators registry in PipelineTranslatorStreaming
 discard 24f45d9  Add a TODO on spark output modes
 discard 5e857be  Start streaming source
 discard 2edf26e  Add streaming source initialisation
 discard cdfffaf  And unchecked warning suppression
 discard 3a83862  Added TODO comment for ReshuffleTranslatorBatch
 discard b0a4272  Added using CachedSideInputReader
 discard 303b6a6  Don't use Reshuffle translation
 discard 7d87b0a  Fix CheckStyle violations
 discard 0e1ac77  Added SideInput support
 discard 40ab0cb  Temporary fix remove fail on warnings
 discard 9493bca  Fix build wordcount: to squash
 discard f7f0da9  Fix javadoc
 discard dccf33a  Implement WindowAssignTest
 discard bbb881a  Implement WindowAssignTranslatorBatch
 discard fcb2746  Cleaning
 discard 056359b  [TO UPGRADE WITH THE 2 SPARK RUNNERS BEFORE MERGE] Change de 
wordcount build to test on new spark runner
 discard a4591c4  Fix encoder bug in combinePerkey
 discard c844fda  Add explanation about receiving a Row as input in the combiner
 discard 32ae8dc  Use more generic Row instead of GenericRowWithSchema
 discard ae07b1a  Fix combine. For unknown reason GenericRowWithSchema is used 
as input of combine so extract its content to be able to proceed
 discard 06b4632  Update test with Long
 discard 7b3dfae  Fix various type checking issues in Combine.Globally
 discard a936327  Get back to classes in translators resolution because URNs 
cannot translate Combine.Globally
 discard 42414191 Cleaning
 discard d1e7eea  Add CombineGlobally translation to avoid translating 
Combine.perKey as a composite transform based on Combine.PerKey (which uses low 
perf GBK)
 discard 86c421b  Introduce RowHelpers
 discard 8889f71  Add combinePerKey and CombineGlobally tests
 discard c829100  Fix combiner using KV as input, use binary encoders in place 
of accumulatorEncoder and outputEncoder, use helpers, spotless
 discard 9ed9d5b  Introduce WindowingHelpers (and helpers package) and use it 
in Pardo, GBK and CombinePerKey
 discard 11a87c8  Improve type checking of Tuple2 encoder
 discard c4e008d  First version of combinePerKey
 discard b37ac03  Extract binary schema creation in a helper class
 discard f361a7e  Fix getSideInputs
 discard a6e4b37  Generalize the use of SerializablePipelineOptions in place of 
(not serializable) PipelineOptions
 discard 205f96d  Rename SparkDoFnFilterFunction to DoFnFilterFunction for 
consistency
 discard 8e7fc66  Add a test for the most simple possible Combine
 discard 01cfa11  Added "testTwoPardoInRow"
 discard 7f22722  Fix for test elements container in GroupByKeyTest
 discard 07852b1  Rename SparkOutputManager for consistency
 discard c5b4593  Fix kryo issue in GBK translator with a workaround
 discard 6abc251  Simplify logic of ParDo translator
 discard c9ad8d9  Don't use deprecated sideInput.getWindowingStrategyInternal()
 discard e8d6b77  Rename pruneOutput() to pruneOutputFilteredByTag()
 discard 112704f  Rename SparkSideInputReader class
 discard e730fa8  Fixed Javadoc error
 discard 7951f7b  Apply spotless
 discard 67c7d19  Fix Encoders: create an Encoder for every manipulated type to 
avoid Spark fallback to genericRowWithSchema and cast to allow having Encoders 
with generic types such as WindowedValue<T> and get the type checking back
 discard 76fd11c  Fail in case of having SideInouts or State/Timers
 discard 09a5ba2  Add ComplexSourceTest
 discard 8b1b17e  Remove no more needed putDatasetRaw
 discard 1faf89b  Port latest changes of ReadSourceTranslatorBatch to 
ReadSourceTranslatorStreaming
 discard 720681f  Fix type checking with Encoder of WindowedValue<T>
 discard 62d4f5d  Add comments and TODO to GroupByKeyTranslatorBatch
 discard 562ebfa  Add GroupByKeyTest
 discard 45ef1af  Clean
 discard 92b991a  Address minor review notes
 discard e2fca21  Add ParDoTest
 discard f896c3b  Cleaning
 discard c89ac48  Fix split bug
 discard eb76dce  Remove bundleSize parameter and always use spark default 
parallelism
 discard f70bb56  Cleaning
 discard 5e185b3  Fix testMode output to comply with new binary schema
 discard 63053ec  Fix errorprone
 discard 418c61c  Comment schema choices
 discard 253e350  Serialize windowedValue to byte[] in source to be able to 
specify a binary dataset schema and deserialize windowedValue from Row to get a 
dataset<WindowedValue>
 discard df5409e  First attempt for ParDo primitive implementation
 discard 2255618  Add flatten test
 discard b3f018d  Enable gradle build scan
 discard d208890  Enable test mode
 discard 59777ba  Put all transform translators Serializable
 discard 8a4b4af  Simplify beam reader creation as it created once the source 
as already been partitioned
 discard ac00f29  Fix SourceTest
 discard 53b9370  Move SourceTest to same package as tested class
 discard cc0ab21  Add serialization test
 discard fd63639  Fix SerializationDebugger
 discard e97b5be  Add SerializationDebugger
 discard 5835df2  Fix serialization issues
 discard ed569f5  Cleaning unneeded fields in DatasetReader
 discard fcf47dd  improve readability of options passing to the source
 discard a83aa58  Fix pipeline triggering: use a spark action instead of 
writing the dataset
 discard bed6a02  Deactivate deps resolution forcing that prevent using correct 
spark transitive dep
 discard 79640b0  Refactor SourceTest to a UTest instaed of a main
 discard 8096514  Checkstyle and Findbugs
 discard 0518568  Clean
 discard 92c4cf2  Add empty 0-arg constructor for mock source
 discard 6f33718  Add a dummy schema for reader
 discard 5da4168  Fix checkstyle
 discard a3f5ca1  Use new PipelineOptionsSerializationUtils
 discard b048aa7  Apply spotless
 discard 90d9426  Add missing 0-arg public constructor
 discard fc8d7f4  Wire real SourceTransform and not mock and update the test
 discard 2707178  Refactor DatasetSource fields
 discard 3c7688f  Pass Beam Source and PipelineOptions to the spark DataSource 
as serialized strings
 discard c9c2b3b  Move Source and translator mocks to a mock package.
 discard 78db97c  Add ReadSourceTranslatorStreaming
 discard d56ef09  Cleaning
 discard 46c60cd  Use raw Encoder<WindowedValue> also in regular 
ReadSourceTranslatorBatch
 discard c706109  Split batch and streaming sources and translators
 discard 37b0a2a  Run pipeline in batch mode or in streaming mode
 discard f0211b5  Move DatasetSourceMock to proper batch mode
 discard 4b40103  clean deps
 discard 6724b88  Use raw WindowedValue so that spark Encoders could work 
(temporary)
 discard 89d910e  fix mock, wire mock in translators and create a main test.
 discard 590fe0a  Add source mocks
 discard f3d50d4  Experiment over using spark Catalog to pass in Beam Source 
through spark Table
 discard 90c92af  Improve type enforcement in ReadSourceTranslator
 discard 50088ea  Improve exception flow
 discard 8bdcd4c  start source instanciation
 discard 2d8db45  Apply spotless
 discard 261bb7c  update TODO
 discard 9e33e38  Implement read transform
 discard df0a396  Use Iterators.transform() to return Iterable
 discard e5b3b64  Add primitive GroupByKeyTranslatorBatch implementation
 discard 9c3b75a  Add Flatten transformation translator
 discard 356a830  Create Datasets manipulation methods
 discard 073cf1e  Create PCollections manipulation methods
 discard aa1234e  Add basic pipeline execution. Refactor translatePipeline() to 
return the translationContext on which we can run startPipeline()
 discard 27c7fd8  Added SparkRunnerRegistrar
 discard e4a334b  Add precise TODO for multiple TransformTranslator per 
transform URN
 discard c084b60  Post-pone batch qualifier in all classes names for readability
 discard dce77db  Add TODOs
 discard 9351397  Make codestyle and firebug happy
 discard 22c8338  apply spotless for e-formatting
     new af2e92b  apply spotless
     new 3504543  Make codestyle and firebug happy
     new 419211a  Add TODOs
     new 60bc61b  Post-pone batch qualifier in all classes names for readability
     new d23f4fd  Add precise TODO for multiple TransformTranslator per 
transform URN
     new 52da1ea  Added SparkRunnerRegistrar
     new e3ac83a  Add basic pipeline execution. Refactor translatePipeline() to 
return the translationContext on which we can run startPipeline()
     new 7d7dcc6  Create PCollections manipulation methods
     new 6ff5192  Create Datasets manipulation methods
     new b045f0d  Add Flatten transformation translator
     new 3491edf  Add primitive GroupByKeyTranslatorBatch implementation
     new 6a9021d  Use Iterators.transform() to return Iterable
     new dfcfa6a  Implement read transform
     new 039976a  update TODO
     new 4b55ed6  Apply spotless
     new 9b3a72e  start source instanciation
     new aec130e  Improve exception flow
     new 6cfdf31  Improve type enforcement in ReadSourceTranslator
     new bc6b4e6  Experiment over using spark Catalog to pass in Beam Source 
through spark Table
     new 8a0343a  Add source mocks
     new f99d1d1  fix mock, wire mock in translators and create a main test.
     new b5cc52a  Use raw WindowedValue so that spark Encoders could work 
(temporary)
     new 80b91bd  clean deps
     new a0243bf  Move DatasetSourceMock to proper batch mode
     new 0b9875f  Run pipeline in batch mode or in streaming mode
     new 772efc4  Split batch and streaming sources and translators
     new 98d6e1e  Use raw Encoder<WindowedValue> also in regular 
ReadSourceTranslatorBatch
     new 2dca3c2  Clean
     new 5559890  Add ReadSourceTranslatorStreaming
     new 8bfa1a5  Move Source and translator mocks to a mock package.
     new a96039e  Pass Beam Source and PipelineOptions to the spark DataSource 
as serialized strings
     new 0c64b46  Refactor DatasetSource fields
     new 84f1963  Wire real SourceTransform and not mock and update the test
     new af3dfad  Add missing 0-arg public constructor
     new c38d1d2  Use new PipelineOptionsSerializationUtils
     new 73d8c71  Apply spotless and fix  checkstyle
     new 0bdbf93  Add a dummy schema for reader
     new 2bc0d23  Add empty 0-arg constructor for mock source
     new b9acd02  Clean
     new ab70e9e  Checkstyle and Findbugs
     new 4f6d5b9  Refactor SourceTest to a UTest instaed of a main
     new 6173a33  Deactivate deps resolution forcing that prevent using correct 
spark transitive dep
     new 38b1514  Fix pipeline triggering: use a spark action instead of 
writing the dataset
     new 3b868db  improve readability of options passing to the source
     new 0cfde2b  Clean unneeded fields in DatasetReader
     new ec13b5b  Fix serialization issues
     new 8ba03d6f Add SerializationDebugger
     new 4d18207  Add serialization test
     new 155a65b  Move SourceTest to same package as tested class
     new b6a6912  Fix SourceTest
     new b39024a  Simplify beam reader creation as it created once the source 
as already been partitioned
     new 33754a8  Put all transform translators Serializable
     new 7c4ff5c  Enable test mode
     new 21c426e  Enable gradle build scan
     new 5067ca1  Add flatten test
     new 000381c  First attempt for ParDo primitive implementation
     new d9d424f  Serialize windowedValue to byte[] in source to be able to 
specify a binary dataset schema and deserialize windowedValue from Row to get a 
dataset<WindowedValue>
     new 80aa7ea  Comment schema choices
     new 355fa9e  Fix errorprone
     new 152d858  Fix testMode output to comply with new binary schema
     new 53ddaa0  Cleaning
     new 6b87e80  Remove bundleSize parameter and always use spark default 
parallelism
     new bab98f1  Fix split bug
     new afc53ee  Clean
     new 32bf4da  Add ParDoTest
     new cd58952  Address minor review notes
     new 8e999c9  Clean
     new df89095  Add GroupByKeyTest
     new 6eb2ff1  Add comments and TODO to GroupByKeyTranslatorBatch
     new 55a6419  Fix type checking with Encoder of WindowedValue<T>
     new 4f55612  Port latest changes of ReadSourceTranslatorBatch to 
ReadSourceTranslatorStreaming
     new e4948df  Remove no more needed putDatasetRaw
     new d3c6e42  Add ComplexSourceTest
     new 24a0f7d  Fail in case of having SideInouts or State/Timers
     new fe00df0  Fix Encoders: create an Encoder for every manipulated type to 
avoid Spark fallback to genericRowWithSchema and cast to allow having Encoders 
with generic types such as WindowedValue<T> and get the type checking back
     new 6efe901  Apply spotless
     new 8407488  Fixed Javadoc error
     new 1f6c3b7  Rename SparkSideInputReader class and rename pruneOutput() to 
pruneOutputFilteredByTag()
     new c8cefc8  Don't use deprecated sideInput.getWindowingStrategyInternal()
     new 72bf4d3  Simplify logic of ParDo translator
     new 106e54e  Fix kryo issue in GBK translator with a workaround
     new 8ede26e  Rename SparkOutputManager for consistency
     new 2b32d52  Fix for test elements container in GroupByKeyTest
     new d7104ee  Added "testTwoPardoInRow"
     new 3bbb0d1  Add a test for the most simple possible Combine
     new a33477f  Rename SparkDoFnFilterFunction to DoFnFilterFunction for 
consistency
     new 28d0be3  Generalize the use of SerializablePipelineOptions in place of 
(not serializable) PipelineOptions
     new 05a1a11  Fix getSideInputs
     new c7256da  Extract binary schema creation in a helper class
     new d1bafd1  First version of combinePerKey
     new 9407138  Improve type checking of Tuple2 encoder
     new 6da1c02  Introduce WindowingHelpers (and helpers package) and use it 
in Pardo, GBK and CombinePerKey
     new 9b3c604  Fix combiner using KV as input, use binary encoders in place 
of accumulatorEncoder and outputEncoder, use helpers, spotless
     new 11f6322  Add combinePerKey and CombineGlobally tests
     new 7629b22  Introduce RowHelpers
     new 0c48a94  Add CombineGlobally translation to avoid translating 
Combine.perKey as a composite transform based on Combine.PerKey (which uses low 
perf GBK)
     new b390509  Cleaning
     new 571b3e9  Get back to classes in translators resolution because URNs 
cannot translate Combine.Globally
     new 3ede119  Fix various type checking issues in Combine.Globally
     new ac7f271  Update test with Long
     new c453b90  Fix combine. For unknown reason GenericRowWithSchema is used 
as input of combine so extract its content to be able to proceed
     new 5d86507  Use more generic Row instead of GenericRowWithSchema
     new 0bd6370  Add explanation about receiving a Row as input in the combiner
     new 2a8d92e  Fix encoder bug in combinePerkey
     new 976783c  [TO UPGRADE WITH THE 2 SPARK RUNNERS BEFORE MERGE] Change de 
wordcount build to test on new spark runner
     new e712187  Cleaning
     new dafe5c4  Implement WindowAssignTranslatorBatch
     new 07bceca  Implement WindowAssignTest
     new f7f3b54  Fix javadoc
     new 8e029bd  Fix build wordcount:disable deps versions forcing, remove 
fail on warnings
     new 2128544  Added SideInput support
     new d705c22  Fix CheckStyle violations
     new d629b1c  Don't use Reshuffle translation
     new 519c9b2  Added using CachedSideInputReader
     new 9ff127d  Added TODO comment for ReshuffleTranslatorBatch
     new 8bd0f66  And unchecked warning suppression
     new cb261f2  Add streaming source initialisation
     new 9803c07  Implement first streaming source
     new 10fac36  Add a TODO on spark output modes
     new c46ce22  Add transformators registry in PipelineTranslatorStreaming
     new a14698f  Add source streaming test
     new e0ecd02  Specify checkpointLocation at the pipeline start
     new 4ef8635  Clean unneeded 0 arg constructor in batch source
     new b9af1b5  Clean streaming source
     new 559e80f  Continue impl of offsets for streaming source
     new 1bc4d75  Deal with checkpoint and offset based read
     new 85635ff  Apply spotless and fix spotbugs warnings
     new df02413  Disable never ending test SimpleSourceTest.testUnboundedSource
     new 51630e2  Fix access level issues, typos and modernize code to Java 8 
style
     new 07b01b1  Merge Spark Structured Streaming runner into main Spark 
module. Remove spark-structured-streaming module.Rename Runner to 
SparkStructuredStreamingRunner. Remove specific PipelineOotions and Registrar 
for Structured Streaming Runner
     new f36c3b8  Fix non-vendored imports from Spark Streaming Runner classes
     new 8602eaa  Pass doFnSchemaInformation to ParDo batch translation
     new 1d48ae5  Fix spotless issues after rebase
     new 45781a8  Fix logging levels in Spark Structured Streaming translation
     new 4f0e0e6  Add SparkStructuredStreamingPipelineOptions and 
SparkCommonPipelineOptions - SparkStructuredStreamingPipelineOptions was added 
to have the new   runner rely only on its specific options.
     new 9d6caa1  Rename SparkPipelineResult to 
SparkStructuredStreamingPipelineResult This is done to avoid an eventual 
collision with the one in SparkRunner. However this cannot happen at this 
moment because it is package private, so it is also done for consistency.
     new 4b568da  Use PAssert in Spark Structured Streaming transform tests
     new e04d455  Ignore spark offsets (cf javadoc)
     new 82ed0fc  implement source.stop
     new a75e88a  Update javadoc
     new 72f18c7  Apply Spotless
     new a9bdab6  Add Batch Validates Runner tests for Structured Streaming 
Runner
     new 20d5a68  Limit the number of partitions to make tests go 300% faster
     new d551903  Fixes ParDo not calling setup and not tearing down if 
exception on startBundle
     new 785c0d9  fixup Enable UsesFailureMessage and UsesSchema categories on 
validatesRunner tests
     new e2eb99a  Pass transform based doFnSchemaInformation in ParDo 
translation
     new b0f9afd  fixup hadoop-format is not mandataory to run ValidatesRunner 
tests
     new a5ca4df  Consider null object case on RowHelpers, fixes empty side 
inputs tests.
     new 8ee2e60  Put back batch/simpleSourceTest.testBoundedSource
     new f0c415c  Update windowAssignTest
     new ff937d9  Add comment about checkpoint mark
     new d2cebcc  Re-code GroupByKeyTranslatorBatch to conserve windowing 
instead of unwindowing/windowing(GlobalWindow): simplify code, use 
ReduceFnRunner to merge the windows
     new 32756ef  re-enable reduceFnRunner timers for output
     new 4d75434  Improve visibility of debug messages
     new 8dad2a7  fixup Enable UsesBoundedSplittableParDo category of tests 
Since the default overwrite is already in place and it is breaking only in one 
subcase
     new 0c30d20  Add a test that GBK preserves windowing
     new 2c16deb  Add TODO in Combine translations
     new 6a6b959  Update KVHelpers.extractKey() to deal with WindowedValue and 
update GBK and CPK
     new 50dac73  Fix comment about schemas
     new dea9155  Implement reduce part of CombineGlobally translation with 
windowing
     new 49293d7  Output data after combine
     new a269fa26 Implement merge accumulators part of CombineGlobally 
translation with windowing
     new c4e3f44  Fix encoder in combine call
     new f4f186b  Revert extractKey while combinePerKey is not done (so that it 
compiles)
     new a210466  Apply a groupByKey avoids for some reason that the spark 
structured streaming fmwk casts data to Row which makes it impossible to 
deserialize without the coder shipped into the data. For performance reasons 
(avoid memory consumption and having to deserialize), we do not ship coder + 
data. Also add a mapparitions before GBK to avoid shuffling
     new ac06721  Fix case when a window does not merge into any other window
     new 5a5ca48  Fix wrong encoder in combineGlobally GBK
     new aa5543f  Fix bug in the window merging logic
     new d869c50  Remove the mapPartition that adds a key per partition because 
otherwise spark will reduce values per key instead of globally
     new 8569c9d  Remove CombineGlobally translation because it is less 
performant than the beam sdk one (key + combinePerKey.withHotkeyFanout)
     new 1c28c63  Now that there is only Combine.PerKey translation, make only 
one Aggregator
     new 930974f  Clean no more needed KVHelpers
     new a0cd463  Clean not more needed RowHelpers
     new 319f4c9  Clean not more needed WindowingHelpers
     new 4110d8f  Fix javadoc of AggregatorCombiner
     new 19779af  Fixed immutable list bug
     new 41a2027  add comment in combine globally test
     new 418df85  Clean groupByKeyTest
     new 996a296  Add a test that combine per key preserves windowing
     new 7803025  Ignore for now not working test testCombineGlobally
     new d5f322b  Add metrics support in DoFn
     new 7356ec2  Add missing dependencies to run Spark Structured Streaming 
Runner on Nexmark

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (239bc93)
            \
             N -- N -- N   refs/heads/spark-runner_structured-streaming 
(7356ec2)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.
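The rewind-and-rebuild situation described above can be reproduced in a throwaway repository. This is a hypothetical, self-contained sketch (all commit names, emails, and paths are illustrative, not taken from the Beam repository): the branch tip is reset back to the common base B and new commits are created, after which the old tip is no longer reachable from the branch, just as the "discard" entries above indicate.

```shell
# Sketch of the force-push scenario from the notification: rewind a branch
# to a common base B, add new commits N, and observe that the old O commits
# are no longer reachable from the branch.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo
cd repo
git config user.email ci@example.com
git config user.name ci
git commit -q --allow-empty -m "B: common base"
git commit -q --allow-empty -m "O: old revision"
old_tip=$(git rev-parse HEAD)
git reset -q --hard HEAD~1        # rewind to B (what a --force push replaces)
git commit -q --allow-empty -m "N: new revision"
new_tip=$(git rev-parse HEAD)
# The old tip is "gone forever" unless another reference still points at it.
if git merge-base --is-ancestor "$old_tip" HEAD; then
  echo "old revisions still reachable"
else
  echo "old revisions discarded"
fi
```

Running this prints "old revisions discarded"; in the real repository the 239bc93 tip is likewise unreachable from the new 7356ec2 tip.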

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 182 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../structuredstreaming/SparkStructuredStreamingPipelineOptions.java    | 2 --
 1 file changed, 2 deletions(-)
