This is an automated email from the ASF dual-hosted git repository. echauchot pushed a change to branch spark-runner_structured-streaming in repository https://gitbox.apache.org/repos/asf/beam.git.
discard 620a27a Remove Encoders based on kryo now that we call Beam coders in the runner discard 824b344 Fix: Remove generic hack of using object. Use actual Coder encodedType in Encoders discard 27ef6de Remove unneeded cast discard 6a27839 Use beam encoders also in the output of the source translation discard 62a87b6 Apply spotless, fix typo and javadoc discard c5e78a0 Apply new Encoders to Pardo. Replace Tuple2Coder with MultiOutputCoder to deal with multiple output to use in Spark Encoder for DoFnRunner discard 039f58a Apply new Encoders to GroupByKey discard 21accab Create a Tuple2Coder to encode scala tuple2 discard 29f7e93 Apply new Encoders to AggregatorCombiner discard 7f1060a Apply new Encoders to Window assign translation discard c48d032 Ignore long time failing test: SparkMetricsSinkTest discard 68d3d67 Improve performance of source: the mapper already calls windowedValueCoder.decode, no need to call it also in the Spark encoder discard 3cc256e Apply new Encoders to Read source discard 7d456b4 Apply new Encoders to CombinePerKey discard c33fdda Catch Exception instead of IOException because some coders to not throw Exceptions at all (e.g.VoidCoder) discard c8bfcf3 Put Encoders expressions serializable discard 72c267c Wrap exceptions in UserCoderExceptions discard c6f2ac9 Apply spotless and checkstyle and add javadocs discard 78b2d22 Add an assert of equality in the encoders test discard 34e8aa8 Fix generated code: uniform exceptions catching, fix parenthesis and variable declarations discard f48067b Fix equal and hashcode discard ca01777 Remove example code discard 50060a8 Remove lazy init of beam coder because there is no generic way on instanciating a beam coder discard 0cf2c87 Fix beam coder lazy init using reflexion: use .clas + try catch + cast discard d7c9a4a Fix getting the output value in code generation discard 8b07ec8 Fix ExpressionEncoder generated code: typos, try catch, fqcn discard fdba22d Fix warning in coder construction by reflexion discard e6b68a8 Lazy init coder because coder instance cannot be interpolated by catalyst discard d5645ff Fix code generation in Beam coder wrapper discard e4478ff Add a simple spark native test to test Beam coders wrapping into Spark Encoders discard 2aaf07a Conform to spark ExpressionEncoders: pass classTags, implement scala Product, pass children from within the ExpressionEncoder, fix visibilities discard fff5092 Fix scala Product in Encoders to avoid StackEverflow discard c9e3534 type erasure: spark encoders require a Class<T>, pass Object and cast to Class<T> discard 5fa6331 Wrap Beam Coders into Spark Encoders using ExpressionEncoder: deserialization part discard a5c7da3 Wrap Beam Coders into Spark Encoders using ExpressionEncoder: serialization part discard 20d5bbd Use "sparkMaster" in local mode to obtain number of shuffle partitions + spotless apply discard 22d6466 Improve Pardo translation performance: avoid calling a filter transform when there is only one output tag discard 93d425a After testing performance and correctness, launch pipeline with dataset.foreach(). Make both test mode and production mode use foreach for uniformity. Move dataset print as a utility method discard 61f487f Remove no more needed AggregatorCombinerPerKey (there is only AggregatorCombiner) discard f8a5046 fixup! Add PipelineResults to Spark structured streaming. discard 8aafd50 Print number of leaf datasets discard ec43374 Add spark execution plans extended debug messages. discard 3b15128 Update log4j configuration discard cde225a Add PipelineResults to Spark structured streaming. discard 0e36b19 Make spotless happy discard 4aaf456 Added metrics sinks and tests discard dc939c8 Persist all output Dataset if there are multiple outputs in pipeline Enabled Use*Metrics tests discard dab3c2e Add a test to check that CombineGlobally preserves windowing discard 6e9ccdd Fix accumulators initialization in Combine that prevented CombineGlobally to work. discard a797884 Fix javadoc discard 476bc20 Add setEnableSparkMetricSinks() method discard 51ca79a Add missing dependencies to run Spark Structured Streaming Runner on Nexmark discard 5ed3e03 Add metrics support in DoFn discard d29c64e Ignore for now not working test testCombineGlobally discard ff50ccb Add a test that combine per key preserves windowing discard 6638522 Clean groupByKeyTest discard 8c499a5 add comment in combine globally test discard f68ed7a Fixed immutable list bug discard 9601649 Fix javadoc of AggregatorCombiner discard 7784f30 Clean not more needed WindowingHelpers discard 3bd95df Clean not more needed RowHelpers discard 4b5af9c Clean no more needed KVHelpers discard 569a9cb Now that there is only Combine.PerKey translation, make only one Aggregator discard ad0c179 Remove CombineGlobally translation because it is less performant than the beam sdk one (key + combinePerKey.withHotkeyFanout) discard 7b6c914 Remove the mapPartition that adds a key per partition because otherwise spark will reduce values per key instead of globally discard 1589877 Fix bug in the window merging logic discard 4e34632 Fix wrong encoder in combineGlobally GBK discard f00e7d5 Fix case when a window does not merge into any other window discard f36e5c3 Apply a groupByKey avoids for some reason that the spark structured streaming fmwk casts data to Row which makes it impossible to deserialize without the coder shipped into the data. For performance reasons (avoid memory consumption and having to deserialize), we do not ship coder + data. Also add a mapparitions before GBK to avoid shuffling discard 4f4744a Revert extractKey while combinePerKey is not done (so that it compiles) discard 6de9acf Fix encoder in combine call discard 70e3d66 Implement merge accumulators part of CombineGlobally translation with windowing discard 28ba572 Output data after combine discard 960d245 Implement reduce part of CombineGlobally translation with windowing discard 595d9eb Fix comment about schemas discard edaa37f Update KVHelpers.extractKey() to deal with WindowedValue and update GBK and CPK discard 5dc8c24 Add TODO in Combine translations discard 14c703f Add a test that GBK preserves windowing discard 28ee71c Improve visibility of debug messages discard 47b7132 re-enable reduceFnRunner timers for output discard fed93da Re-code GroupByKeyTranslatorBatch to conserve windowing instead of unwindowing/windowing(GlobalWindow): simplify code, use ReduceFnRunner to merge the windows discard c23c07e Add comment about checkpoint mark discard a3e29b4 Update windowAssignTest discard 68e3ae2 Put back batch/simpleSourceTest.testBoundedSource discard cc7a52d Consider null object case on RowHelpers, fixes empty side inputs tests. discard a94045c Pass transform based doFnSchemaInformation in ParDo translation discard 1be0c8a Fixes ParDo not calling setup and not tearing down if exception on startBundle discard 2bfd3d5 Limit the number of partitions to make tests go 300% faster discard c2896b6 Enable batch Validates Runner tests for Structured Streaming Runner discard e4ef07b Apply Spotless discard ab6a879 Update javadoc discard fdcd346 implement source.stop discard d2e65df Ignore spark offsets (cf javadoc) discard d285dd1 Use PAssert in Spark Structured Streaming transform tests discard 9354339 Rename SparkPipelineResult to SparkStructuredStreamingPipelineResult This is done to avoid an eventual collision with the one in SparkRunner. However this cannot happen at this moment because it is package private, so it is also done for consistency. discard fc877cd Add SparkStructuredStreamingPipelineOptions and SparkCommonPipelineOptions - SparkStructuredStreamingPipelineOptions was added to have the new runner rely only on its specific options. discard 47376e3 Fix logging levels in Spark Structured Streaming translation discard 03eb450 Fix spotless issues after rebase discard 58f97b8 Pass doFnSchemaInformation to ParDo batch translation discard 2628393 Fix non-vendored imports from Spark Streaming Runner classes discard cb72394 Merge Spark Structured Streaming runner into main Spark module. Remove spark-structured-streaming module.Rename Runner to SparkStructuredStreamingRunner. Remove specific PipelineOotions and Registrar for Structured Streaming Runner discard c9a8c8c Fix access level issues, typos and modernize code to Java 8 style discard 19e5fdf Disable never ending test SimpleSourceTest.testUnboundedSource discard c68e875 Apply spotless and fix spotbugs warnings discard 79e85ec Deal with checkpoint and offset based read discard b7c68bd Continue impl of offsets for streaming source discard afa6a48 Clean streaming source discard 8caa982 Clean unneeded 0 arg constructor in batch source discard 6e94948 Specify checkpointLocation at the pipeline start discard 4527615 Add source streaming test discard ce46b9b Add transformators registry in PipelineTranslatorStreaming discard cb5dffa Add a TODO on spark output modes discard 4030fb0 Implement first streaming source discard 81c0bbe Add streaming source initialisation discard 2ad1f15 And unchecked warning suppression discard cb1a99c Added TODO comment for ReshuffleTranslatorBatch discard 530dfb0 Added using CachedSideInputReader discard d8ee03e Don't use Reshuffle translation discard c879337 Fix CheckStyle violations discard d759a19 Added SideInput support discard 1355ece Fix javadoc discard 41e6a19 Implement WindowAssignTest discard bf2af77 Implement WindowAssignTranslatorBatch discard 383d58d Cleaning discard 1244549 Fix encoder bug in combinePerkey discard ac67ada Add explanation about receiving a Row as input in the combiner discard a2d1975 Use more generic Row instead of GenericRowWithSchema discard a72afd8 Fix combine. For unknown reason GenericRowWithSchema is used as input of combine so extract its content to be able to proceed discard 00acd7d Update test with Long discard b13839d Fix various type checking issues in Combine.Globally discard 3c25348 Get back to classes in translators resolution because URNs cannot translate Combine.Globally discard 8d0a8b5 Cleaning discard 0a88819 Add CombineGlobally translation to avoid translating Combine.perKey as a composite transform based on Combine.PerKey (which uses low perf GBK) discard f48c109 Introduce RowHelpers discard 2a1d74e Add combinePerKey and CombineGlobally tests discard 684fc4a Fix combiner using KV as input, use binary encoders in place of accumulatorEncoder and outputEncoder, use helpers, spotless discard 72d338b Introduce WindowingHelpers (and helpers package) and use it in Pardo, GBK and CombinePerKey discard 4d3be61 Improve type checking of Tuple2 encoder discard 2c465f8 First version of combinePerKey discard d33508e Extract binary schema creation in a helper class discard bfc10a2 Fix getSideInputs discard 4cbdbb8 Generalize the use of SerializablePipelineOptions in place of (not serializable) PipelineOptions discard 08b580b Rename SparkDoFnFilterFunction to DoFnFilterFunction for consistency discard 674a048 Add a test for the most simple possible Combine discard 3716673 Added "testTwoPardoInRow" discard 2482172 Fix for test elements container in GroupByKeyTest discard bc7fab4 Rename SparkOutputManager for consistency discard 314d935 Fix kryo issue in GBK translator with a workaround discard 3ba4bc6 Simplify logic of ParDo translator discard b7a76d9 Don't use deprecated sideInput.getWindowingStrategyInternal() discard 6d723b1 Rename SparkSideInputReader class and rename pruneOutput() to pruneOutputFilteredByTag() discard 9bee53f Fixed Javadoc error discard 2f37994 Apply spotless discard 7c5b778 Fix Encoders: create an Encoder for every manipulated type to avoid Spark fallback to genericRowWithSchema and cast to allow having Encoders with generic types such as WindowedValue<T> and get the type checking back discard 1b244e9 Fail in case of having SideInouts or State/Timers discard a74d149 Add ComplexSourceTest discard 21902b6 Remove no more needed putDatasetRaw discard a8e50ad Port latest changes of ReadSourceTranslatorBatch to ReadSourceTranslatorStreaming discard 14368ff Fix type checking with Encoder of WindowedValue<T> discard fe7fb4e Add comments and TODO to GroupByKeyTranslatorBatch discard 779e621 Add GroupByKeyTest discard fe19d6c Clean discard ae44706 Address minor review notes discard d4acf25 Add ParDoTest discard f47bb0a Clean discard 1d31831 Fix split bug discard e12226a Remove bundleSize parameter and always use spark default parallelism discard defd5e6 Cleaning discard b4230dc Fix testMode output to comply with new binary schema discard 84bdf70 Fix errorprone discard e4d4b4f Comment schema choices discard 675bf94 Serialize windowedValue to byte[] in source to be able to specify a binary dataset schema and deserialize windowedValue from Row to get a dataset<WindowedValue> discard e0c8fbd First attempt for ParDo primitive implementation discard fc54404 Add flatten test discard 3aea53b Enable gradle build scan discard 3dc4dc3 Enable test mode discard 047af3e Put all transform translators Serializable discard 23ca155 Simplify beam reader creation as it created once the source as already been partitioned discard e54cbc6 Fix SourceTest discard 74693b2 Move SourceTest to same package as tested class discard 00f2e11 Add serialization test discard 1720e6b Add SerializationDebugger discard 625056e Fix serialization issues discard 0e85242 Clean unneeded fields in DatasetReader discard 4936687 improve readability of options passing to the source discard 0ffd98d Fix pipeline triggering: use a spark action instead of writing the dataset discard 02933bd Refactor SourceTest to a UTest instaed of a main discard bb830be Checkstyle and Findbugs discard 037db6e Clean discard d12cc14 Add empty 0-arg constructor for mock source discard 9251dcb Add a dummy schema for reader discard 02458a7 Apply spotless and fix checkstyle discard 101f6f2 Use new PipelineOptionsSerializationUtils discard ca5a120 Add missing 0-arg public constructor discard 2b64bd2 Wire real SourceTransform and not mock and update the test discard ca5f70c Refactor DatasetSource fields discard 9d8dd90 Pass Beam Source and PipelineOptions to the spark DataSource as serialized strings discard 5b0f9a2 Move Source and translator mocks to a mock package. discard 08c05d6 Add ReadSourceTranslatorStreaming discard b617ba4 Clean discard d6e905b Use raw Encoder<WindowedValue> also in regular ReadSourceTranslatorBatch discard 64c8202 Split batch and streaming sources and translators discard f9ed0dd Run pipeline in batch mode or in streaming mode discard 4aa321c Move DatasetSourceMock to proper batch mode discard 44644d3 clean deps discard cb03179 Use raw WindowedValue so that spark Encoders could work (temporary) discard bea28b1 fix mock, wire mock in translators and create a main test. discard eb2fa49 Add source mocks discard b57367f Experiment over using spark Catalog to pass in Beam Source through spark Table discard 6795e4c Improve type enforcement in ReadSourceTranslator discard b8cb742 Improve exception flow discard 3cb2a76 start source instanciation discard 5701c75 Apply spotless discard 238e57a update TODO discard 3ae60a4 Implement read transform discard b0920e3 Use Iterators.transform() to return Iterable discard 1298b77 Add primitive GroupByKeyTranslatorBatch implementation discard 9d721a7 Add Flatten transformation translator discard 8c1a012 Create Datasets manipulation methods discard b458c64 Create PCollections manipulation methods discard d75ac4a Add basic pipeline execution. Refactor translatePipeline() to return the translationContext on which we can run startPipeline() discard cc07798 Added SparkRunnerRegistrar discard cfd3abf Add precise TODO for multiple TransformTranslator per transform URN discard 7612d17 Post-pone batch qualifier in all classes names for readability discard 1c7d381 Add TODOs discard bf5619d Make codestyle and firebug happy discard ff06081 apply spotless discard 6ede65c Move common translation context components to superclass discard 8875d48d Move SparkTransformOverrides to correct package discard 83f44e0 Improve javadocs discard ae2392e Make transform translation clearer: renaming, comments discard 31af0f1 Refactoring: -move batch/streaming common translation visitor and utility methods to PipelineTranslator -rename batch dedicated classes to Batch* to differentiate with their streaming counterparts -Introduce TranslationContext for common batch/streaming components discard 8abe116 Initialise BatchTranslationContext discard 6f1fda8 Organise methods in PipelineTranslator discard 5990104 Renames: better differenciate pipeline translator for transform translator discard a5b9894 Wire node translators with pipeline translator discard 9fea139 Add nodes translators structure discard d7df588 Add global pipeline translation structure discard ee8875a Start pipeline translation discard 253123c Add SparkPipelineOptions discard 2558c22 Fix missing dep discard 7794f02 Add an empty spark-structured-streaming runner project targeting spark 2.4.0 add 14a7a24 [BEAM-8470] Add an empty spark-structured-streaming runner project targeting spark 2.4.0 add 0d314d2 [BEAM-8470] Fix missing dep add 44aa39d [BEAM-8470] Add SparkPipelineOptions add 8b1f07e [BEAM-8470] Start pipeline translation add 5b7efb1 [BEAM-8470] Add global pipeline translation structure add c3828ea [BEAM-8470] Add nodes translators structure add c4c38b3 [BEAM-8470] Wire node translators with pipeline translator add 2222e24 [BEAM-8470] Renames: better differenciate pipeline translator for transform translator add 1ffbf4e [BEAM-8470] Organise methods in PipelineTranslator add 1c1bbab [BEAM-8470] Initialise BatchTranslationContext add b8d4a96 [BEAM-8470] Refactoring: -move batch/streaming common translation visitor and utility methods to PipelineTranslator -rename batch dedicated classes to Batch* to differentiate with their streaming counterparts -Introduce TranslationContext for common batch/streaming components add fef01b3 [BEAM-8470] Make transform translation clearer: renaming, comments add 666c011 [BEAM-8470] Improve javadocs add 5eeca80 [BEAM-8470] Move SparkTransformOverrides to correct package add fd888deb [BEAM-8470] Move common translation context components to superclass add 4e94975 [BEAM-8470] apply spotless add 12ed3f5 [BEAM-8470] Make codestyle and firebug happy add fbc7fbc [BEAM-8470] Add TODOs add 3f29bff [BEAM-8470] Post-pone batch qualifier in all classes names for readability add 79b4541 [BEAM-8470] Add precise TODO for multiple TransformTranslator per transform URN add 7c8cd47 [BEAM-8470] Added SparkRunnerRegistrar add d757185 [BEAM-8470] Add basic pipeline execution. Refactor translatePipeline() to return the translationContext on which we can run startPipeline() add a36bae0 [BEAM-8470] Create PCollections manipulation methods add e1b7644 [BEAM-8470] Create Datasets manipulation methods add ad26cec [BEAM-8470] Add Flatten transformation translator add 129e95f [BEAM-8470] Add primitive GroupByKeyTranslatorBatch implementation add 6965842 [BEAM-8470] Use Iterators.transform() to return Iterable add cce30c9 [BEAM-8470] Implement read transform add 7901d73 [BEAM-8470] update TODO add eb7c77e [BEAM-8470] Apply spotless add cb9bb99 [BEAM-8470] start source instanciation add 6bb32a5 [BEAM-8470] Improve exception flow add 1970760 [BEAM-8470] Improve type enforcement in ReadSourceTranslator add c863dca [BEAM-8470] Experiment over using spark Catalog to pass in Beam Source through spark Table add 2a960dd [BEAM-8470] Add source mocks add be8344e [BEAM-8470] fix mock, wire mock in translators and create a main test. add 756eec3 [BEAM-8470] Use raw WindowedValue so that spark Encoders could work (temporary) add 4480578 [BEAM-8470] clean deps add 26b2a91 [BEAM-8470] Move DatasetSourceMock to proper batch mode add 5385a8f [BEAM-8470] Run pipeline in batch mode or in streaming mode add 8051fac [BEAM-8470] Split batch and streaming sources and translators add ee0ea0e [BEAM-8470] Use raw Encoder<WindowedValue> also in regular ReadSourceTranslatorBatch add ae75196 [BEAM-8470] Clean add 4a59b09 [BEAM-8470] Add ReadSourceTranslatorStreaming add 68f7230 [BEAM-8470] Move Source and translator mocks to a mock package. add 3368a9e [BEAM-8470] Pass Beam Source and PipelineOptions to the spark DataSource as serialized strings add c569f1b [BEAM-8470] Refactor DatasetSource fields add db036ca [BEAM-8470] Wire real SourceTransform and not mock and update the test add 6cac8b2 [BEAM-8470] Add missing 0-arg public constructor add dc615f0 [BEAM-8470] Use new PipelineOptionsSerializationUtils add adfd237 [BEAM-8470] Apply spotless and fix checkstyle add aeee309 [BEAM-8470] Add a dummy schema for reader add e9b0488 [BEAM-8470] Add empty 0-arg constructor for mock source add 9537add [BEAM-8470] Clean add 8b232df [BEAM-8470] Checkstyle and Findbugs add 344f69e [BEAM-8470] Refactor SourceTest to a UTest instaed of a main add 285aab4 [BEAM-8470] Fix pipeline triggering: use a spark action instead of writing the dataset add 244459f [BEAM-8470] improve readability of options passing to the source add 1102280 [BEAM-8470] Clean unneeded fields in DatasetReader add b3796a2 [BEAM-8470] Fix serialization issues add 1dc0352 [BEAM-8470] Add SerializationDebugger add f5a0ba7 [BEAM-8470] Add serialization test add 63aa493 [BEAM-8470] Move SourceTest to same package as tested class add baea877 [BEAM-8470] Fix SourceTest add bb7db53 [BEAM-8470] Simplify beam reader creation as it created once the source as already been partitioned add 833df0e [BEAM-8470] Put all transform translators Serializable add 3004db7 [BEAM-8470] Enable test mode add 3dc7d95 [BEAM-8470] Enable gradle build scan add 068f63d [BEAM-8470] Add flatten test add 5dc598a [BEAM-8470] First attempt for ParDo primitive implementation add 30cbbf4 [BEAM-8470] Serialize windowedValue to byte[] in source to be able to specify a binary dataset schema and deserialize windowedValue from Row to get a dataset<WindowedValue> add 176342f [BEAM-8470] Comment schema choices add 730eed3 [BEAM-8470] Fix errorprone add 067e756 [BEAM-8470] Fix testMode output to comply with new binary schema add 1960559 [BEAM-8470] Cleaning add 7e5399e [BEAM-8470] Remove bundleSize parameter and always use spark default parallelism add e63d794 [BEAM-8470] Fix split bug add fc2239d [BEAM-8470] Clean add f152deb [BEAM-8470] Add ParDoTest add 7fd2de1 [BEAM-8470] Address minor review notes add bd385ff [BEAM-8470] Clean add bdeb934 [BEAM-8470] Add GroupByKeyTest add 990e220 [BEAM-8470] Add comments and TODO to GroupByKeyTranslatorBatch add 997043d [BEAM-8470] Fix type checking with Encoder of WindowedValue<T> add 61bf40c [BEAM-8470] Port latest changes of ReadSourceTranslatorBatch to ReadSourceTranslatorStreaming add c8f2078 [BEAM-8470] Remove no more needed putDatasetRaw add f7a008c [BEAM-8470] Add ComplexSourceTest add 728dc73 [BEAM-8470] Fail in case of having SideInouts or State/Timers add 0cd30d1 [BEAM-8470] Fix Encoders: create an Encoder for every manipulated type to avoid Spark fallback to genericRowWithSchema and cast to allow having Encoders with generic types such as WindowedValue<T> and get the type checking back add 2bea926 [BEAM-8470] Apply spotless add a048341 [BEAM-8470] Fixed Javadoc error add 8286afc [BEAM-8470] Rename SparkSideInputReader class and rename pruneOutput() to pruneOutputFilteredByTag() add 313bc66 [BEAM-8470] Don't use deprecated sideInput.getWindowingStrategyInternal() add 6657d7d [BEAM-8470] Simplify logic of ParDo translator add 359a6b0 [BEAM-8470] Fix kryo issue in GBK translator with a workaround add b7ec102 [BEAM-8470] Rename SparkOutputManager for consistency add 9275c82 [BEAM-8470] Fix for test elements container in GroupByKeyTest add b136699 [BEAM-8470] Added "testTwoPardoInRow" add a9eed5b [BEAM-8470] Add a test for the most simple possible Combine add 222ce45 [BEAM-8470] Rename SparkDoFnFilterFunction to DoFnFilterFunction for consistency add 65dcd0b [BEAM-8470] Generalize the use of SerializablePipelineOptions in place of (not serializable) PipelineOptions add d24bc86 [BEAM-8470] Fix getSideInputs add 126ee79 [BEAM-8470] Extract binary schema creation in a helper class add cb666d3 [BEAM-8470] First version of combinePerKey add e282801 [BEAM-8470] Improve type checking of Tuple2 encoder add f277be2 [BEAM-8470] Introduce WindowingHelpers (and helpers package) and use it in Pardo, GBK and CombinePerKey add 35f7128 [BEAM-8470] Fix combiner using KV as input, use binary encoders in place of accumulatorEncoder and outputEncoder, use helpers, spotless add ca25210 [BEAM-8470] Add combinePerKey and CombineGlobally tests add f416e20 [BEAM-8470] Introduce RowHelpers add 4f670a5 [BEAM-8470] Add CombineGlobally translation to avoid translating Combine.perKey as a composite transform based on Combine.PerKey (which uses low perf GBK) add 071aab5 [BEAM-8470] Cleaning add 2aff552 [BEAM-8470] Get back to classes in translators resolution because URNs cannot translate Combine.Globally add cd19577 [BEAM-8470] Fix various type checking issues in Combine.Globally add 3499784 [BEAM-8470] Update test with Long add f94f5ca [BEAM-8470] Fix combine. For unknown reason GenericRowWithSchema is used as input of combine so extract its content to be able to proceed add 4bb19ce [BEAM-8470] Use more generic Row instead of GenericRowWithSchema add c3b4686 [BEAM-8470] Add explanation about receiving a Row as input in the combiner add 82131f5 [BEAM-8470] Fix encoder bug in combinePerkey add a781ec1 [BEAM-8470] Cleaning add 0c53291 [BEAM-8470] Implement WindowAssignTranslatorBatch add 75dc161 [BEAM-8470] Implement WindowAssignTest add cb0e79a [BEAM-8470] Fix javadoc add 95564cd [BEAM-8470] Added SideInput support add d51e934 [BEAM-8470] Fix CheckStyle violations add 0f1c9ff [BEAM-8470] Don't use Reshuffle translation add c4b8e86 [BEAM-8470] Added using CachedSideInputReader add 4c10a48 [BEAM-8470] Added TODO comment for ReshuffleTranslatorBatch add ff2ed77 [BEAM-8470] And unchecked warning suppression add f5cbbda [BEAM-8470] Add streaming source initialisation add f20a878 [BEAM-8470] Implement first streaming source add 0863bf5 [BEAM-8470] Add a TODO on spark output modes add 2470f3e [BEAM-8470] Add transformators registry in PipelineTranslatorStreaming add 4ab7b03 [BEAM-8470] Add source streaming test add 9f2caa8 [BEAM-8470] Specify checkpointLocation at the pipeline start add 79f22a8 [BEAM-8470] Clean unneeded 0 arg constructor in batch source add 850f56a [BEAM-8470] Clean streaming source add e260811 [BEAM-8470] Continue impl of offsets for streaming source add 45cd090 [BEAM-8470] Deal with checkpoint and offset based read add 2070278 [BEAM-8470] Apply spotless and fix spotbugs warnings add 52e8689 [BEAM-8470] Disable never ending test SimpleSourceTest.testUnboundedSource add 673731f [BEAM-8470] Fix access level issues, typos and modernize code to Java 8 style add f106a04 [BEAM-8470] Merge Spark Structured Streaming runner into main Spark module. Remove spark-structured-streaming module.Rename Runner to SparkStructuredStreamingRunner. Remove specific PipelineOotions and Registrar for Structured Streaming Runner add 1c82b5e [BEAM-8470] Fix non-vendored imports from Spark Streaming Runner classes add 53acbda [BEAM-8470] Pass doFnSchemaInformation to ParDo batch translation add 102f6fc [BEAM-8470] Fix spotless issues after rebase add fecc40b [BEAM-8470] Fix logging levels in Spark Structured Streaming translation add f2dd748 [BEAM-8470] Add SparkStructuredStreamingPipelineOptions and SparkCommonPipelineOptions - SparkStructuredStreamingPipelineOptions was added to have the new runner rely only on its specific options. add 5e4d878 [BEAM-8470] Rename SparkPipelineResult to SparkStructuredStreamingPipelineResult This is done to avoid an eventual collision with the one in SparkRunner. However this cannot happen at this moment because it is package private, so it is also done for consistency. add c9b3e51 [BEAM-8470] Use PAssert in Spark Structured Streaming transform tests add cbe80d4 [BEAM-8470] Ignore spark offsets (cf javadoc) add 2427002 [BEAM-8470] implement source.stop add 0031ef9 [BEAM-8470] Update javadoc add 0c082b89 [BEAM-8470] Apply Spotless add 9b7986a [BEAM-8470] Enable batch Validates Runner tests for Structured Streaming Runner add 7891116 [BEAM-8470] Limit the number of partitions to make tests go 300% faster add 48eaf9d [BEAM-8470] Fixes ParDo not calling setup and not tearing down if exception on startBundle add 386080c [BEAM-8470] Pass transform based doFnSchemaInformation in ParDo translation add f780659 [BEAM-8470] Consider null object case on RowHelpers, fixes empty side inputs tests. add 355e95d [BEAM-8470] Put back batch/simpleSourceTest.testBoundedSource add 2c49efc [BEAM-8470] Update windowAssignTest add 0705e50 [BEAM-8470] Add comment about checkpoint mark add 211cb98 [BEAM-8470] Re-code GroupByKeyTranslatorBatch to conserve windowing instead of unwindowing/windowing(GlobalWindow): simplify code, use ReduceFnRunner to merge the windows add d52ef97 [BEAM-8470] re-enable reduceFnRunner timers for output add 26fcb28 [BEAM-8470] Improve visibility of debug messages add c5970f2 [BEAM-8470] Add a test that GBK preserves windowing add 22abb05 [BEAM-8470] Add TODO in Combine translations add dc4dda7 [BEAM-8470] Update KVHelpers.extractKey() to deal with WindowedValue and update GBK and CPK add 09bc7a1 [BEAM-8470] Fix comment about schemas add 4c2585b [BEAM-8470] Implement reduce part of CombineGlobally translation with windowing add 7222d49 [BEAM-8470] Output data after combine add a8ca39f [BEAM-8470] Implement merge accumulators part of CombineGlobally translation with windowing add 0bdc83c [BEAM-8470] Fix encoder in combine call add 7237dca [BEAM-8470] Revert extractKey while combinePerKey is not done (so that it compiles) add a460052 [BEAM-8470] Apply a groupByKey avoids for some reason that the spark structured streaming fmwk casts data to Row which makes it impossible to deserialize without the coder shipped into the data. For performance reasons (avoid memory consumption and having to deserialize), we do not ship coder + data. Also add a mapparitions before GBK to avoid shuffling add 26f778e [BEAM-8470] Fix case when a window does not merge into any other window add 6f47514 [BEAM-8470] Fix wrong encoder in combineGlobally GBK add d093cad [BEAM-8470] Fix bug in the window merging logic add 5427856 [BEAM-8470] Remove the mapPartition that adds a key per partition because otherwise spark will reduce values per key instead of globally add c0e52c2 [BEAM-8470] Remove CombineGlobally translation because it is less performant than the beam sdk one (key + combinePerKey.withHotkeyFanout) add e7d3283 [BEAM-8470] Now that there is only Combine.PerKey translation, make only one Aggregator add 5a7bbf5 [BEAM-8470] Clean no more needed KVHelpers add 96b147e [BEAM-8470] Clean not more needed RowHelpers add d6683ad [BEAM-8470] Clean not more needed WindowingHelpers add b19b792 [BEAM-8470] Fix javadoc of AggregatorCombiner add c178172 [BEAM-8470] Fixed immutable list bug add 98bc7a4 [BEAM-8470] add comment in combine globally test add 85fe2d0 [BEAM-8470] Clean groupByKeyTest add e205425 [BEAM-8470] Add a test that combine per key preserves windowing add b12e878 [BEAM-8470] Ignore for now not working test testCombineGlobally add 8215482 [BEAM-8470] Add metrics support in DoFn add 35fef0f [BEAM-8470] Add missing dependencies to run Spark Structured Streaming Runner on Nexmark add dc1abf5 [BEAM-8470] Add setEnableSparkMetricSinks() method add a7f04ed [BEAM-8470] Fix javadoc add a6d0e58 [BEAM-8470] Fix accumulators initialization in Combine that prevented CombineGlobally to work. add 37dcb9a [BEAM-8470] Add a test to check that CombineGlobally preserves windowing add b59d8c5 [BEAM-8470] Persist all output Dataset if there are multiple outputs in pipeline Enabled Use*Metrics tests add a004a56 [BEAM-8470] Added metrics sinks and tests add 745ab6e [BEAM-8470] Make spotless happy add 3391f8d [BEAM-8470] Add PipelineResults to Spark structured streaming. add 7d7503a [BEAM-8470] Update log4j configuration add 8c91c11 [BEAM-8470] Add spark execution plans extended debug messages. add 7f71572 [BEAM-8470] Print number of leaf datasets add 65c8457 [BEAM-8470] fixup! Add PipelineResults to Spark structured streaming. add f2388dc [BEAM-8470] Remove no more needed AggregatorCombinerPerKey (there is only AggregatorCombiner) add 134ee35 [BEAM-8470] After testing performance and correctness, launch pipeline with dataset.foreach(). Make both test mode and production mode use foreach for uniformity. Move dataset print as a utility method add a8e0ad9 [BEAM-8470] Improve Pardo translation performance: avoid calling a filter transform when there is only one output tag add c5b8799 [BEAM-8470] Use "sparkMaster" in local mode to obtain number of shuffle partitions + spotless apply add 5ce206b [BEAM-8470] Wrap Beam Coders into Spark Encoders using ExpressionEncoder: serialization part add 750037f [BEAM-8470] Wrap Beam Coders into Spark Encoders using ExpressionEncoder: deserialization part add 69aebb1 [BEAM-8470] type erasure: spark encoders require a Class<T>, pass Object and cast to Class<T> add 02a0f01 [BEAM-8470] Fix scala Product in Encoders to avoid StackEverflow add a0706d7 [BEAM-8470] Conform to spark ExpressionEncoders: pass classTags, implement scala Product, pass children from within the ExpressionEncoder, fix visibilities add 3a87d37 [BEAM-8470] Add a simple spark native test to test Beam coders wrapping into Spark Encoders add cca4034 [BEAM-8470] Fix code generation in Beam coder wrapper add 8e341a1 [BEAM-8470] Lazy init coder because coder instance cannot be interpolated by catalyst add f1a555e [BEAM-8470] Fix warning in coder construction by reflexion add 482bcf6 [BEAM-8470] Fix ExpressionEncoder generated code: typos, try catch, fqcn add 44274ca [BEAM-8470] Fix getting the output value in code generation add e2a134c [BEAM-8470] Fix beam coder lazy init using reflexion: use .clas + try catch + cast add cfff6f7 [BEAM-8470] Remove lazy init of beam coder because there is no generic way on instanciating a beam coder add 72ba1ea [BEAM-8470] Remove example code add 83421c2 [BEAM-8470] Fix equal and hashcode add dc5b243 [BEAM-8470] Fix generated code: uniform exceptions catching, fix parenthesis and variable declarations add 6b31168 [BEAM-8470] Add an assert of equality in the encoders test add ba33722 [BEAM-8470] Apply spotless and checkstyle and add javadocs add 4055d11 [BEAM-8470] Wrap exceptions in UserCoderExceptions add 2daed76 [BEAM-8470] Put Encoders expressions serializable add e8463ce [BEAM-8470] Catch Exception instead of IOException because some coders to not throw Exceptions at all (e.g.VoidCoder) add 5ebc1e9 [BEAM-8470] Apply new Encoders to CombinePerKey add 7e5b6df [BEAM-8470] Apply new Encoders to Read source add cb82036 [BEAM-8470] Improve performance of source: the mapper already calls windowedValueCoder.decode, no need to call it also in the Spark encoder add 163b157 [BEAM-8470] Ignore long time failing test: SparkMetricsSinkTest add a7629eb [BEAM-8470] Apply new Encoders to Window assign translation add 54fd760 [BEAM-8470] Apply new Encoders to AggregatorCombiner add f1f2163 [BEAM-8470] Create a Tuple2Coder to encode scala tuple2 add fbda17a [BEAM-8470] Apply new Encoders to GroupByKey add de92516 [BEAM-8470] Apply new Encoders to Pardo. Replace Tuple2Coder with MultiOutputCoder to deal with multiple output to use in Spark Encoder for DoFnRunner add f180b4d [BEAM-8470] Apply spotless, fix typo and javadoc add fbb8355 [BEAM-8470] Use beam encoders also in the output of the source translation add 2e1ed4b [BEAM-8470] Remove unneeded cast add 9e7f118 [BEAM-8470] Fix: Remove generic hack of using object. Use actual Coder encodedType in Encoders add a0e6ca4 [BEAM-8470] Remove Encoders based on kryo now that we call Beam coders in the runner This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this: * -- * -- B -- O -- O -- O (620a27a) \ N -- N -- N refs/heads/spark-runner_structured-streaming (a0e6ca4) You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. No new revisions were added by this update. Summary of changes: