This is an automated email from the ASF dual-hosted git repository.
iemejia pushed a change to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git.
discard 6bf5a93 Use "sparkMaster" in local mode to obtain number of shuffle
partitions + spotless apply
discard 9de62ab Improve Pardo translation performance: avoid calling a filter
transform when there is only one output tag
discard 9472a41 Add a TODO on perf improvement of Pardo translation
discard 53374e1 After testing performance and correctness, launch pipeline
with dataset.foreach(). Make both test mode and production mode use foreach for
uniformity. Move dataset print as a utility method
discard 9b8fd33 Remove no more needed AggregatorCombinerPerKey (there is only
AggregatorCombiner)
discard d6830fb fixup! Add PipelineResults to Spark structured streaming.
discard 8600bad Print number of leaf datasets
discard 2be3d94 Add spark execution plans extended debug messages.
discard eb3c7d7 Update log4j configuration
discard ba68f6c Add PipelineResults to Spark structured streaming.
discard 8a9d97c Make spotless happy
discard 3fd985f Added metrics sinks and tests
discard 9879bc9 Persist all output Dataset if there are multiple outputs in
pipeline. Enabled Use*Metrics tests
discard f1f58a9 Add a test to check that CombineGlobally preserves windowing
discard 6d4a75d Fix accumulators initialization in Combine that prevented
CombineGlobally to work.
discard b11bd8b Fix javadoc
discard 186e8ac Add setEnableSparkMetricSinks() method
discard 9472863 Add missing dependencies to run Spark Structured Streaming
Runner on Nexmark
discard 6b66509 Add metrics support in DoFn
discard 8943636 Ignore for now not working test testCombineGlobally
discard 7dde4fb Add a test that combine per key preserves windowing
discard 9a7e5a3 Clean groupByKeyTest
discard 94a7be6 add comment in combine globally test
discard d263ff3 Fixed immutable list bug
discard 33c1554 Fix javadoc of AggregatorCombiner
discard 1729bb7 Clean no more needed WindowingHelpers
discard 33f1a39 Clean no more needed RowHelpers
discard 11d504b Clean no more needed KVHelpers
discard 3ff116a Now that there is only Combine.PerKey translation, make only
one Aggregator
discard 5adeca5 Remove CombineGlobally translation because it is less
performant than the beam sdk one (key + combinePerKey.withHotkeyFanout)
discard 4e9c9e1 Remove the mapPartition that adds a key per partition because
otherwise spark will reduce values per key instead of globally
discard 47f3aa5 Fix bug in the window merging logic
discard be71f05 Fix wrong encoder in combineGlobally GBK
discard 29a8e83 Fix case when a window does not merge into any other window
discard 6353f9c Applying a groupByKey avoids, for some reason, the spark
structured streaming framework casting data to Row, which makes it impossible
to deserialize without the coder shipped into the data. For performance reasons
(avoid memory consumption and having to deserialize), we do not ship coder +
data. Also add a mapPartitions before GBK to avoid shuffling
discard 8ad7d4f Revert extractKey while combinePerKey is not done (so that it
compiles)
discard ec7ac95 Fix encoder in combine call
discard 200e6ce Implement merge accumulators part of CombineGlobally
translation with windowing
discard 7b4ba5b Output data after combine
discard 3c5eb1f Implement reduce part of CombineGlobally translation with
windowing
discard 55c6f20 Fix comment about schemas
discard 5e4e034 Update KVHelpers.extractKey() to deal with WindowedValue and
update GBK and CPK
discard 653351f Add TODO in Combine translations
discard 8b5424a Add a test that GBK preserves windowing
discard 9721a02 Improve visibility of debug messages
discard b4c70d9 re-enable reduceFnRunner timers for output
discard 60afc9c Re-code GroupByKeyTranslatorBatch to conserve windowing
instead of unwindowing/windowing(GlobalWindow): simplify code, use
ReduceFnRunner to merge the windows
discard 258d565 Add comment about checkpoint mark
discard 018d781 Update windowAssignTest
discard 6468c43 Put back batch/simpleSourceTest.testBoundedSource
discard 988a6cf Consider null object case on RowHelpers, fixes empty side
inputs tests.
discard a990264 Pass transform based doFnSchemaInformation in ParDo
translation
discard 86d03a4 Fixes ParDo not calling setup and not tearing down if
exception on startBundle
discard 55b98f8 Limit the number of partitions to make tests go 300% faster
discard 0193941 Enable batch Validates Runner tests for Structured Streaming
Runner
discard 91a7fc6 Apply Spotless
discard 1f54355 Update javadoc
discard 07a123b implement source.stop
discard 9a460d0 Ignore spark offsets (cf javadoc)
discard e889206 Use PAssert in Spark Structured Streaming transform tests
discard 66f3928 Rename SparkPipelineResult to
SparkStructuredStreamingPipelineResult This is done to avoid a potential
collision with the one in SparkRunner. However this cannot happen at this
moment because it is package private, so it is also done for consistency.
discard 57ca802 Add SparkStructuredStreamingPipelineOptions and
SparkCommonPipelineOptions - SparkStructuredStreamingPipelineOptions was added
to have the new runner rely only on its specific options.
discard e9da167 Fix logging levels in Spark Structured Streaming translation
discard b3cda3c Fix spotless issues after rebase
discard 26974f0 Pass doFnSchemaInformation to ParDo batch translation
discard 63364db Fix non-vendored imports from Spark Streaming Runner classes
discard ae3526c Merge Spark Structured Streaming runner into main Spark
module. Remove spark-structured-streaming module. Rename Runner to
SparkStructuredStreamingRunner. Remove specific PipelineOptions and Registrar
for Structured Streaming Runner
discard 96b9e8f Fix access level issues, typos and modernize code to Java 8
style
discard 9143b88 Disable never ending test SimpleSourceTest.testUnboundedSource
discard abaf2e2 Apply spotless and fix spotbugs warnings
discard 6d717d9 Deal with checkpoint and offset based read
discard b366b7f Continue impl of offsets for streaming source
discard 5f13f11 Clean streaming source
discard 646b416 Clean unneeded 0 arg constructor in batch source
discard 96898cc Specify checkpointLocation at the pipeline start
discard 3c5181f Add source streaming test
discard 56260a1 Add transform translators registry in
PipelineTranslatorStreaming
discard 6552a53 Add a TODO on spark output modes
discard c9f5de8 Implement first streaming source
discard 9394fb6 Add streaming source initialisation
discard b912e8d Add unchecked warning suppression
discard 53c6409 Added TODO comment for ReshuffleTranslatorBatch
discard ef58a99 Added using CachedSideInputReader
discard 64ef766 Don't use Reshuffle translation
discard 753913e Fix CheckStyle violations
discard c00e1c4 Added SideInput support
discard 9dd08b7 Fix javadoc
discard 21bb0a6 Implement WindowAssignTest
discard cb1c0dd Implement WindowAssignTranslatorBatch
discard 0d2c5ca Cleaning
discard 2ece1d7 Fix encoder bug in combinePerKey
discard ba1b628 Add explanation about receiving a Row as input in the combiner
discard 5a6ae19 Use more generic Row instead of GenericRowWithSchema
discard c72aa4a Fix combine. For unknown reason GenericRowWithSchema is used
as input of combine so extract its content to be able to proceed
discard 2b4bfd3 Update test with Long
discard 2608e96 Fix various type checking issues in Combine.Globally
discard c3565ee Get back to classes in translators resolution because URNs
cannot translate Combine.Globally
discard 91f910f Cleaning
discard 0b96f4d Add CombineGlobally translation to avoid translating
Combine.perKey as a composite transform based on Combine.PerKey (which uses low
perf GBK)
discard 6d19f9f Introduce RowHelpers
discard 7447090 Add combinePerKey and CombineGlobally tests
discard e19f0c4 Fix combiner using KV as input, use binary encoders in place
of accumulatorEncoder and outputEncoder, use helpers, spotless
discard 5b39a09 Introduce WindowingHelpers (and helpers package) and use it
in Pardo, GBK and CombinePerKey
discard 31d80c4 Improve type checking of Tuple2 encoder
discard ecaf6c8 First version of combinePerKey
discard 3476c9e Extract binary schema creation in a helper class
discard 0deca18 Fix getSideInputs
discard 2231e1a Generalize the use of SerializablePipelineOptions in place of
(not serializable) PipelineOptions
discard 2de632f Rename SparkDoFnFilterFunction to DoFnFilterFunction for
consistency
discard 7a608ce Add a test for the most simple possible Combine
discard 059a45d Added "testTwoPardoInRow"
discard bc27442 Fix for test elements container in GroupByKeyTest
discard 91db78e Rename SparkOutputManager for consistency
discard cea7019 Fix kryo issue in GBK translator with a workaround
discard eb57488 Simplify logic of ParDo translator
discard 1c3575f Don't use deprecated sideInput.getWindowingStrategyInternal()
discard f4b21a8 Rename SparkSideInputReader class and rename pruneOutput() to
pruneOutputFilteredByTag()
discard 0a52721 Fixed Javadoc error
discard 3cabda9 Apply spotless
discard 21d66a7 Fix Encoders: create an Encoder for every manipulated type to
avoid Spark fallback to genericRowWithSchema and cast to allow having Encoders
with generic types such as WindowedValue<T> and get the type checking back
discard 97a7a3e Fail in case of having SideInputs or State/Timers
discard 093cd1e Add ComplexSourceTest
discard 94ad34e Remove no more needed putDatasetRaw
discard ffa8d02 Port latest changes of ReadSourceTranslatorBatch to
ReadSourceTranslatorStreaming
discard 8bad879 Fix type checking with Encoder of WindowedValue<T>
discard 8f7610b Add comments and TODO to GroupByKeyTranslatorBatch
discard f5415fb Add GroupByKeyTest
discard 584563c Clean
discard 70c0b37 Address minor review notes
discard 3563aba Add ParDoTest
discard e7a00db Clean
discard dba34e1 Fix split bug
discard b1e0801 Remove bundleSize parameter and always use spark default
parallelism
discard 0586a34 Cleaning
discard b11be3e Fix testMode output to comply with new binary schema
discard 54190bf Fix errorprone
discard 0434229 Comment schema choices
discard fe966cf Serialize windowedValue to byte[] in source to be able to
specify a binary dataset schema and deserialize windowedValue from Row to get a
dataset<WindowedValue>
discard 1933059 First attempt for ParDo primitive implementation
discard 893231a Add flatten test
discard b86e2c9 Enable gradle build scan
discard ab3b891 Enable test mode
discard b9ecb10 Put all transform translators Serializable
discard 1f199db Simplify beam reader creation as it is created once the source
has already been partitioned
discard b005fd0 Fix SourceTest
discard 465040b Move SourceTest to same package as tested class
discard 253c9bd Add serialization test
discard 6e48777 Add SerializationDebugger
discard fb04533 Fix serialization issues
discard 5043370 Clean unneeded fields in DatasetReader
discard af8202e improve readability of options passing to the source
discard 874565f Fix pipeline triggering: use a spark action instead of
writing the dataset
discard 7a9c2c6 Refactor SourceTest to a UTest instead of a main
discard d45bfbb Checkstyle and Findbugs
discard d550ecf Clean
discard 182babd Add empty 0-arg constructor for mock source
discard e9e6692 Add a dummy schema for reader
discard 9cab04c Apply spotless and fix checkstyle
discard 2b4ffdf Use new PipelineOptionsSerializationUtils
discard c029ebe Add missing 0-arg public constructor
discard 3bf20b5 Wire real SourceTransform and not mock and update the test
discard 5cab09d Refactor DatasetSource fields
discard 5b9e4b9 Pass Beam Source and PipelineOptions to the spark DataSource
as serialized strings
discard 2cb1b75 Move Source and translator mocks to a mock package.
discard 35568ab Add ReadSourceTranslatorStreaming
discard 5e00117 Clean
discard 312333f Use raw Encoder<WindowedValue> also in regular
ReadSourceTranslatorBatch
discard 3088c3c Split batch and streaming sources and translators
discard 85b33a5 Run pipeline in batch mode or in streaming mode
discard 2c0fd81 Move DatasetSourceMock to proper batch mode
discard 5bd0039 clean deps
discard a2a0665 Use raw WindowedValue so that spark Encoders could work
(temporary)
discard 4da27b0 fix mock, wire mock in translators and create a main test.
discard 6682665 Add source mocks
discard e084b11 Experiment over using spark Catalog to pass in Beam Source
through spark Table
discard a07ce40 Improve type enforcement in ReadSourceTranslator
discard 39a8b7d Improve exception flow
discard af4cd26 start source instantiation
discard e1c37ac Apply spotless
discard ba25b20 update TODO
discard 6ca4265 Implement read transform
discard 06a9891 Use Iterators.transform() to return Iterable
discard fd56501 Add primitive GroupByKeyTranslatorBatch implementation
discard 9402def Add Flatten transformation translator
discard a041977 Create Datasets manipulation methods
discard 4179dbd Create PCollections manipulation methods
discard 8a2bad8 Add basic pipeline execution. Refactor translatePipeline() to
return the translationContext on which we can run startPipeline()
discard 11d8b23 Added SparkRunnerRegistrar
discard 97eb0de Add precise TODO for multiple TransformTranslator per
transform URN
discard 07090b2 Post-pone batch qualifier in all classes names for readability
discard 63d3ce0 Add TODOs
discard 47b05fd Make codestyle and findbugs happy
discard d0137b9 apply spotless
discard 4ccc2cb Move common translation context components to superclass
discard 1cb7f3b Move SparkTransformOverrides to correct package
discard 03fe418 Improve javadocs
discard 9930dc4 Make transform translation clearer: renaming, comments
discard 992883b Refactoring: -move batch/streaming common translation visitor
and utility methods to PipelineTranslator -rename batch dedicated classes to
Batch* to differentiate with their streaming counterparts -Introduce
TranslationContext for common batch/streaming components
discard eacf593 Initialise BatchTranslationContext
discard 92a63de Organise methods in PipelineTranslator
discard 71738cb Renames: better differentiate pipeline translator from
transform translator
discard ccf25c2 Wire node translators with pipeline translator
discard c555920 Add nodes translators structure
discard 806d1b8 Add global pipeline translation structure
discard ed1013d Start pipeline translation
discard b766961 Add SparkPipelineOptions
discard 3519f0f Fix missing dep
discard 10ae821 Add an empty spark-structured-streaming runner project
targeting spark 2.4.0
add 851339e [BEAM-7869] Add load testing jobs to README.md
add 95bd48c Merge pull request #9230 from
mwalenia/BEAM-7869-load-tests-readme
add 3ca9149 [BEAM-7844] Implementing NodeStat Estimations for all the
nodes
add 7d56a23 Merge pull request #9198 from riazela/RowRateWindowEstimation
add 90d6ba1 [BEAM-7814] Fix BigqueryMatcher._matches when called more
than once.
add 31c8e87 Merge pull request #9224 from udim/bq-matcher-fix
add 1031fdf Fix minor typos (#9192)
add 76424e7 Update Python dependencies page for 2.14.0
add c6c3bce Merge pull request #9055 from rosetn/patch-4
add 30661c1 [BEAM-7079] Add Chicago Taxi Example runner script with
Python scripts
add 846db7d [BEAM-7079] Add Chicago Taxi Example gradle task and Jenkins
job
add dd4283f Merge pull request #8939 from
mwalenia/BEAM-7079-chicago-taxi-example
add 2c8bf9f [BEAM-7868] optionally skip hidden pipeline options
add 47cf7ee Merge pull request #9211 from angoenka/hidden_options
add 7a0bc8d [BEAM-7577] Allow ValueProviders in Datastore Query filters
(#8950)
add d349553 Retry Datastore writes on [Errno 32] Broken pipe
add fef16e9 Merge pull request #8346: [BEAM-7476] Retry Datastore writes
on [Errno 32] Broken pipe
add 149153b [BEAM-7060] Introduce Python3-only test modules (#9223)
add b5e4175 [BEAM-7877] Change the log level when deleting unknown
temporary files in FileBasedSink
add dbcd265 Merge pull request #9227 from ihji/BEAM-7877
add 2c1932a Update design-documents.md
add 6976651 [BEAM-7889] Update RC validation guide due to
run_rc_validation.sh change
add cca9e6d Merge pull request #9241: [BEAM-7889] Update RC validation
guide due to run_rc_validation.sh change
add 522190f [BEAM-7060] Fix
:sdks:python:apache_beam:testing:load_tests:run breakage
add ea08174 Merge pull request #9239: [BEAM-7060] Fix
:sdks:python:apache_beam:testing:load_tests:run breakage
add c5c7076 Fix some copy paste errors
add cb27549 Merge pull request #9194 from TheNeuralBit/coder-cleanup
add 01f3606 [Java] remove unneeded junit dependency.
add 917a613 Merge pull request #9208 from
amaliujia/rw-remove-unneeded-junit
add fc9e0aa [BEAM-7894] Upgrade AWS KPL to version 0.13.1
add e70eb15 Merge pull request #9249: [BEAM-7894] Upgrade AWS KPL to
version 0.13.1
add 9cf6f0d [BEAM-7898] Remove default implementation of getRowCount and
change the name to getTableStatistics
add 0d911b8 Merge pull request #9254 from riazela/TablesStatEstimation
add ec8b65a [BEAM-7389] Add helper conversion samples and simplified tests
add 913f065 Merge pull request #9252 from
davidcavazos/element-wise-with-timestamps
add 08d0146 [BEAM-7607] Per user request, making maxFilesPerBundle public
(#9160)
add 2387cbc [BEAM-7880] Upgrade Jackson databind to version 2.9.9.3
add 8a33913 Merge pull request #9229: [BEAM-7880] Upgrade Jackson
databind to version 2.9.9.3
add 00eef79 [BEAM-7721] Add a new module with BigQuery IO Read
performance test.
add 752d163 [BEAM-7721] Add cleanup to test and change the way of
reporting metrics
add 997bc70 [BEAM-7721] Refactor metric gathering
add 49644d3 Merge pull request #9041: [BEAM-7721] Add a new module with
BigQuery IO Read performance test
add 6252b66 Update Dataflow container images used by unreleased (dev) SDK.
add bd54ac5 Merge pull request #9243 from tvalentyn/patch-59
add a6a57df Increase default chunk size for gRPC commit and get data
streams. The initial choice of chunk size was arbitrary, and there is evidence
from testing that larger chunks improve performance.
add 7618f93 [BEAM-7901] Increase gRPC stream chunk sizes - fix test
add 6159162 [BEAM-7901] Increase default chunk size for gRPC commit and
get data streams.
add e1b2022 Adapt Jet Runner page to runner being released now
add a5c8ae0 Merge pull request #9245: [BEAM-7305] Adapt Jet Runner page
to runner being released now
add b47bc95 [BEAM-7777] Wiring up BeamCostModel
add 5fb50c7 [BEAM-7777] Implementing beamComputeSelfCost for all the rel
nodes
add a694fda Merge pull request #9217 from riazela/BeamCostModel
add 40936ba BEAM-7018: Added regex transform on Python SDK.
add 0a2ddc0 Merge pull request #8859 from mszb/BEAM-7018
add 3d75784 [BEAM-7845] Install Python SDK using pip instead of
setuptools.
add f1996da Merge pull request #9274 from tvalentyn/patch-60
add ee0d83d Update Python 2.7.x restriction
add dda2061 Merge pull request #9134 from rosetn/patch-5
add 24e9ced [BEAM-7833] warn user when --region flag is not explicitly
set (#9173)
add 164f249 Update env variable name in sync script
add b45bc6f [BEAM-6683] add createCrossLanguageValidatesRunner task
add 9678149 Merge pull request #8174: [BEAM-6683] add
createCrossLanguageValidatesRunner task
add 6fa94c9 [BEAM-7862] Moving FakeBigQueryServices to published
artifacts (#9206)
add 7ea9825 [BEAM-7899] Fix Python Dataflow VR tests by specify
sdk_location
add ff0f308 Merge pull request #9269: [BEAM-7899] Fix Python Dataflow VR
tests by specifying sdk_location
add bc2c6ff [BEAM-7678] Fixes bug in output element_type generation in Kv
PipelineVisitor (#9238)
add 1a3eed5 [BEAM-7389] Add code examples for MapTuple and FlatMapTuple
add 5e6ec5d5 Merge pull request #9276 from davidcavazos/map-flatmap-tuple
add 41d6dd9 [BEAM-7915] show cross-language validate runner Flink badge
on github PR template
add d9b43fc Merge pull request #9282 from ihji/BEAM-7915
add 124e6b6 Update python-pipeline-dependencies.md
add 28a4057 Merge pull request #9291 from rosetn/patch-8
add 1d20042 [BEAM-7860] Python Datastore: fix key sort order
add 0325c36 Merge pull request #9240: [BEAM-7860] Python Datastore: fix
key sort order
add eb8a29c Update python-sdk.md
add 2230cc9 Merge pull request #9290 from rosetn/patch-7
add 5504aa7 [BEAM-7918] adding nested row implementation for unnest and
uncollect (#9288)
add f792e2e Add helper functions for reading and writing to PubSub
directly from Python (#9212)
add 1bc2dd8 [BEAM-7060] Use typing in type decorators of core.py
add 67c8e9c Fully qualify use of internal typehints.
add 212596e Merge pull request #9179 [BEAM-7060] Use typing in type
decorators of core.py
add c9e5ea8 [BEAM -7741] Implement SetState for Python SDK (#9090)
add 57d8fd3 [BEAM-7912] Optimize GroupIntoBatches for batch Dataflow
pipelines.
add 7848ead [BEAM-7912] Optimize GroupIntoBatches for batch Dataflow
pipelines.
add f1dc92f [BEAM-7613] HadoopFileSystem can work with more than one
cluster.
add 25ac4ef [BEAM-7613] HadoopFileSystem can work with more than one
cluster.
add 0604e04 [BEAM-7924] Failure in Python 2 postcommit:
crossLanguagePythonJavaFlink
add 497bc77 Merge pull request #9292: [BEAM-7924] Failure in Python 2
postcommit: crossLanguagePythonJavaFlink
add 30b1ff0 [BEAM-7776] Create kubernetes.sh script to use kubectl with
desired namespace and kubeconfig file
add 22b97be [BEAM-7776] Stop using PerfkitBenchmarker in MongoDBIOIT job
add e1d4403 [BEAM-7776] Move repeatable kubernetes jenkins steps to
Kubernetes.groovy
add 0ecd166 Merge pull request #9116: [BEAM-7776] Stop Using Perfkit in
IOIT
add 13450c4 [BEAM-6907] Reuse Python tarball in integration tests
add 1545120 Add missing dependency and source copy in tox test
add b3c2915 Build sdk tarball before running installGcpTest task
add b69c81a [BEAM-6907] Reuse Python tarball in tox & dataflow
integration tests
add 936110f [BEAM-7820] Add basic hot key detection logging in Worker.
(#9270)
add ca050b9 use vendor-bytebuddy in sdks-java-core
add 3e97543 [BEAM-5822] Use vendored bytebuddy in sdks-java-core
add 459e270 Add assertArrayCountEqual, which checks if two containers
have the same elements (#9235)
add c164fbf Move DirectRunner specific classes into direct runner package.
add fffe83a Merge pull request #9297 Move DirectRunner specific classes
into direct runner package.
add 9d131d4 [BEAM-7896] Implementing RateEstimation for KafkaTable with
Unit and Integration Tests
add cd2ab9e Merge pull request #9298 from riazela/KafkaRateEstimation2
add 6454f72 [BEAM-7874], [BEAM-7873] Distributed FnApiRunner bugfixs
(#9218)
add ce3d121 [BEAM-7940] Fix beam_Release_Python_NightlySnapshot
add 59464cf Merge pull request #9307: [BEAM-7940] Fix
beam_Release_Python_NightlySnapshot
add dfd46d8 [BEAM-7846] add test for BEAM-7689
add a2b57e3 Merge pull request #9228 from ihji/BEAM-7846
new 7794f02 Add an empty spark-structured-streaming runner project
targeting spark 2.4.0
new 2558c22 Fix missing dep
new 253123c Add SparkPipelineOptions
new ee8875a Start pipeline translation
new d7df588 Add global pipeline translation structure
new 9fea139 Add nodes translators structure
new a5b9894 Wire node translators with pipeline translator
new 5990104 Renames: better differentiate pipeline translator from
transform translator
new 6f1fda8 Organise methods in PipelineTranslator
new 8abe116 Initialise BatchTranslationContext
new 31af0f1 Refactoring: -move batch/streaming common translation visitor
and utility methods to PipelineTranslator -rename batch dedicated classes to
Batch* to differentiate with their streaming counterparts -Introduce
TranslationContext for common batch/streaming components
new ae2392e Make transform translation clearer: renaming, comments
new 83f44e0 Improve javadocs
new 8875d48d Move SparkTransformOverrides to correct package
new 6ede65c Move common translation context components to superclass
new ff06081 apply spotless
new bf5619d Make codestyle and findbugs happy
new 1c7d381 Add TODOs
new 7612d17 Post-pone batch qualifier in all classes names for readability
new cfd3abf Add precise TODO for multiple TransformTranslator per
transform URN
new cc07798 Added SparkRunnerRegistrar
new d75ac4a Add basic pipeline execution. Refactor translatePipeline() to
return the translationContext on which we can run startPipeline()
new b458c64 Create PCollections manipulation methods
new 8c1a012 Create Datasets manipulation methods
new 9d721a7 Add Flatten transformation translator
new 1298b77 Add primitive GroupByKeyTranslatorBatch implementation
new b0920e3 Use Iterators.transform() to return Iterable
new 3ae60a4 Implement read transform
new 238e57a update TODO
new 5701c75 Apply spotless
new 3cb2a76 start source instantiation
new b8cb742 Improve exception flow
new 6795e4c Improve type enforcement in ReadSourceTranslator
new b57367f Experiment over using spark Catalog to pass in Beam Source
through spark Table
new eb2fa49 Add source mocks
new bea28b1 fix mock, wire mock in translators and create a main test.
new cb03179 Use raw WindowedValue so that spark Encoders could work
(temporary)
new 44644d3 clean deps
new 4aa321c Move DatasetSourceMock to proper batch mode
new f9ed0dd Run pipeline in batch mode or in streaming mode
new 64c8202 Split batch and streaming sources and translators
new d6e905b Use raw Encoder<WindowedValue> also in regular
ReadSourceTranslatorBatch
new b617ba4 Clean
new 08c05d6 Add ReadSourceTranslatorStreaming
new 5b0f9a2 Move Source and translator mocks to a mock package.
new 9d8dd90 Pass Beam Source and PipelineOptions to the spark DataSource
as serialized strings
new ca5f70c Refactor DatasetSource fields
new 2b64bd2 Wire real SourceTransform and not mock and update the test
new ca5a120 Add missing 0-arg public constructor
new 101f6f2 Use new PipelineOptionsSerializationUtils
new 02458a7 Apply spotless and fix checkstyle
new 9251dcb Add a dummy schema for reader
new d12cc14 Add empty 0-arg constructor for mock source
new 037db6e Clean
new bb830be Checkstyle and Findbugs
new 02933bd Refactor SourceTest to a UTest instead of a main
new 0ffd98d Fix pipeline triggering: use a spark action instead of
writing the dataset
new 4936687 improve readability of options passing to the source
new 0e85242 Clean unneeded fields in DatasetReader
new 625056e Fix serialization issues
new 1720e6b Add SerializationDebugger
new 00f2e11 Add serialization test
new 74693b2 Move SourceTest to same package as tested class
new e54cbc6 Fix SourceTest
new 23ca155 Simplify beam reader creation as it is created once the source
has already been partitioned
new 047af3e Put all transform translators Serializable
new 3dc4dc3 Enable test mode
new 3aea53b Enable gradle build scan
new fc54404 Add flatten test
new e0c8fbd First attempt for ParDo primitive implementation
new 675bf94 Serialize windowedValue to byte[] in source to be able to
specify a binary dataset schema and deserialize windowedValue from Row to get a
dataset<WindowedValue>
new e4d4b4f Comment schema choices
new 84bdf70 Fix errorprone
new b4230dc Fix testMode output to comply with new binary schema
new defd5e6 Cleaning
new e12226a Remove bundleSize parameter and always use spark default
parallelism
new 1d31831 Fix split bug
new f47bb0a Clean
new d4acf25 Add ParDoTest
new ae44706 Address minor review notes
new fe19d6c Clean
new 779e621 Add GroupByKeyTest
new fe7fb4e Add comments and TODO to GroupByKeyTranslatorBatch
new 14368ff Fix type checking with Encoder of WindowedValue<T>
new a8e50ad Port latest changes of ReadSourceTranslatorBatch to
ReadSourceTranslatorStreaming
new 21902b6 Remove no more needed putDatasetRaw
new a74d149 Add ComplexSourceTest
new 1b244e9 Fail in case of having SideInputs or State/Timers
new 7c5b778 Fix Encoders: create an Encoder for every manipulated type to
avoid Spark fallback to genericRowWithSchema and cast to allow having Encoders
with generic types such as WindowedValue<T> and get the type checking back
new 2f37994 Apply spotless
new 9bee53f Fixed Javadoc error
new 6d723b1 Rename SparkSideInputReader class and rename pruneOutput() to
pruneOutputFilteredByTag()
new b7a76d9 Don't use deprecated sideInput.getWindowingStrategyInternal()
new 3ba4bc6 Simplify logic of ParDo translator
new 314d935 Fix kryo issue in GBK translator with a workaround
new bc7fab4 Rename SparkOutputManager for consistency
new 2482172 Fix for test elements container in GroupByKeyTest
new 3716673 Added "testTwoPardoInRow"
new 674a048 Add a test for the most simple possible Combine
new 08b580b Rename SparkDoFnFilterFunction to DoFnFilterFunction for
consistency
new 4cbdbb8 Generalize the use of SerializablePipelineOptions in place of
(not serializable) PipelineOptions
new bfc10a2 Fix getSideInputs
new d33508e Extract binary schema creation in a helper class
new 2c465f8 First version of combinePerKey
new 4d3be61 Improve type checking of Tuple2 encoder
new 72d338b Introduce WindowingHelpers (and helpers package) and use it
in Pardo, GBK and CombinePerKey
new 684fc4a Fix combiner using KV as input, use binary encoders in place
of accumulatorEncoder and outputEncoder, use helpers, spotless
new 2a1d74e Add combinePerKey and CombineGlobally tests
new f48c109 Introduce RowHelpers
new 0a88819 Add CombineGlobally translation to avoid translating
Combine.perKey as a composite transform based on Combine.PerKey (which uses low
perf GBK)
new 8d0a8b5 Cleaning
new 3c25348 Get back to classes in translators resolution because URNs
cannot translate Combine.Globally
new b13839d Fix various type checking issues in Combine.Globally
new 00acd7d Update test with Long
new a72afd8 Fix combine. For unknown reason GenericRowWithSchema is used
as input of combine so extract its content to be able to proceed
new a2d1975 Use more generic Row instead of GenericRowWithSchema
new ac67ada Add explanation about receiving a Row as input in the combiner
new 1244549 Fix encoder bug in combinePerKey
new 383d58d Cleaning
new bf2af77 Implement WindowAssignTranslatorBatch
new 41e6a19 Implement WindowAssignTest
new 1355ece Fix javadoc
new d759a19 Added SideInput support
new c879337 Fix CheckStyle violations
new d8ee03e Don't use Reshuffle translation
new 530dfb0 Added using CachedSideInputReader
new cb1a99c Added TODO comment for ReshuffleTranslatorBatch
new 2ad1f15 Add unchecked warning suppression
new 81c0bbe Add streaming source initialisation
new 4030fb0 Implement first streaming source
new cb5dffa Add a TODO on spark output modes
new ce46b9b Add transform translators registry in
PipelineTranslatorStreaming
new 4527615 Add source streaming test
new 6e94948 Specify checkpointLocation at the pipeline start
new 8caa982 Clean unneeded 0 arg constructor in batch source
new afa6a48 Clean streaming source
new b7c68bd Continue impl of offsets for streaming source
new 79e85ec Deal with checkpoint and offset based read
new c68e875 Apply spotless and fix spotbugs warnings
new 19e5fdf Disable never ending test SimpleSourceTest.testUnboundedSource
new c9a8c8c Fix access level issues, typos and modernize code to Java 8
style
new cb72394 Merge Spark Structured Streaming runner into main Spark
module. Remove spark-structured-streaming module. Rename Runner to
SparkStructuredStreamingRunner. Remove specific PipelineOptions and Registrar
for Structured Streaming Runner
new 2628393 Fix non-vendored imports from Spark Streaming Runner classes
new 58f97b8 Pass doFnSchemaInformation to ParDo batch translation
new 03eb450 Fix spotless issues after rebase
new 47376e3 Fix logging levels in Spark Structured Streaming translation
new fc877cd Add SparkStructuredStreamingPipelineOptions and
SparkCommonPipelineOptions - SparkStructuredStreamingPipelineOptions was added
to have the new runner rely only on its specific options.
new 9354339 Rename SparkPipelineResult to
SparkStructuredStreamingPipelineResult This is done to avoid a potential
collision with the one in SparkRunner. However this cannot happen at this
moment because it is package private, so it is also done for consistency.
new d285dd1 Use PAssert in Spark Structured Streaming transform tests
new d2e65df Ignore spark offsets (cf javadoc)
new fdcd346 implement source.stop
new ab6a879 Update javadoc
new e4ef07b Apply Spotless
new c2896b6 Enable batch Validates Runner tests for Structured Streaming
Runner
new 2bfd3d5 Limit the number of partitions to make tests go 300% faster
new 1be0c8a Fix ParDo not calling setup and not tearing down on
exception in startBundle
new a94045c Pass transform based doFnSchemaInformation in ParDo
translation
new cc7a52d Consider the null object case in RowHelpers; fixes empty side
input tests.
new 68e3ae2 Put back batch/simpleSourceTest.testBoundedSource
new a3e29b4 Update windowAssignTest
new c23c07e Add comment about checkpoint mark
new fed93da Re-code GroupByKeyTranslatorBatch to preserve windowing
instead of unwindowing/windowing(GlobalWindow): simplify the code, use
ReduceFnRunner to merge the windows
new 47b7132 re-enable reduceFnRunner timers for output
new 28ee71c Improve visibility of debug messages
new 14c703f Add a test that GBK preserves windowing
new 5dc8c24 Add TODO in Combine translations
new edaa37f Update KVHelpers.extractKey() to deal with WindowedValue and
update GBK and CPK
new 595d9eb Fix comment about schemas
new 960d245 Implement reduce part of CombineGlobally translation with
windowing
new 28ba572 Output data after combine
new 70e3d66 Implement merge accumulators part of CombineGlobally
translation with windowing
new 6de9acf Fix encoder in combine call
new 4f4744a Revert extractKey while combinePerKey is not done (so that it
compiles)
new f36e5c3 Applying a groupByKey avoids, for some reason, the Spark
Structured Streaming framework casting data to Row, which would make it
impossible to deserialize without shipping the coder with the data. For
performance reasons (memory consumption and deserialization cost), we do not
ship coder + data. Also add a mapPartitions before GBK to avoid shuffling
new f00e7d5 Fix case when a window does not merge into any other window
new 4e34632 Fix wrong encoder in combineGlobally GBK
new 1589877 Fix bug in the window merging logic
new 7b6c914 Remove the mapPartition that adds a key per partition;
otherwise Spark reduces values per key instead of globally
new ad0c179 Remove CombineGlobally translation because it is less
performant than the Beam SDK one (key + combinePerKey.withHotKeyFanout)
new 569a9cb Now that there is only Combine.PerKey translation, make only
one Aggregator
new 4b5af9c Clean up no longer needed KVHelpers
new 3bd95df Clean up no longer needed RowHelpers
new 7784f30 Clean up no longer needed WindowingHelpers
new 9601649 Fix javadoc of AggregatorCombiner
new f68ed7a Fixed immutable list bug
new 8c499a5 Add comment in combine globally test
new 6638522 Clean groupByKeyTest
new ff50ccb Add a test that combine per key preserves windowing
new d29c64e Ignore not-yet-working test testCombineGlobally for now
new 5ed3e03 Add metrics support in DoFn
new 51ca79a Add missing dependencies to run Spark Structured Streaming
Runner on Nexmark
new 476bc20 Add setEnableSparkMetricSinks() method
new a797884 Fix javadoc
new 6e9ccdd Fix accumulators initialization in Combine that prevented
CombineGlobally to work.
new dab3c2e Add a test to check that CombineGlobally preserves windowing
new dc939c8 Persist all output Datasets if there are multiple outputs in
the pipeline. Enable Use*Metrics tests
new 4aaf456 Added metrics sinks and tests
new 0e36b19 Make spotless happy
new cde225a Add PipelineResults to Spark structured streaming.
new 3b15128 Update log4j configuration
new ec43374 Add spark execution plans extended debug messages.
new 8aafd50 Print number of leaf datasets
new f8a5046 fixup! Add PipelineResults to Spark structured streaming.
new 61f487f Remove no longer needed AggregatorCombinerPerKey (there is only
AggregatorCombiner)
new 93d425a After testing performance and correctness, launch the pipeline
with dataset.foreach(). Make both test mode and production mode use foreach for
uniformity. Move dataset printing to a utility method
new 0cedc7a Add a TODO on perf improvement of ParDo translation
new a524036 Improve ParDo translation performance: avoid calling a filter
transform when there is only one output tag
new c350188 Use "sparkMaster" in local mode to obtain number of shuffle
partitions + spotless apply
This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version. This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:
 * -- * -- B -- O -- O -- O (6bf5a93)
            \
             N -- N -- N refs/heads/spark-runner_structured-streaming (c350188)
You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.
Any revisions marked "omit" are not gone; other references still
refer to them. Any revisions marked "discard" are gone forever.
The 208 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
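The history rewrite described above can be reproduced in a throwaway
repository. This is only an illustrative sketch (the directory, file, and
commit messages are made up, not from the real push): an O revision is
committed, undone with a hard reset, and replaced by an N revision, which is
exactly the state a subsequent `git push --force` would publish.

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q repo && cd repo
git config user.email you@example.com
git config user.name you

echo base > f && git add f && git commit -qm "B: common base"
echo old  > f && git commit -qam "O: old revision"   # would be marked "discard"
git reset -q --hard HEAD~1                           # undo O, back to B
echo new  > f && git commit -qam "N: new revision"   # would be marked "new"

# After `git push --force`, the remote branch points at N; O is unreachable.
git log --oneline
```

Running it prints only the N and B commits; O survives locally only via the
reflog until it is garbage-collected, which mirrors the note above that
"discard" revisions are gone from the branch.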
Summary of changes:
.github/PULL_REQUEST_TEMPLATE.md | 1 +
.test-infra/jenkins/Kubernetes.groovy | 108 ++++++++
.test-infra/jenkins/README.md | 23 ++
.../job_PerformanceTests_MongoDBIO_IT.groovy | 83 +++---
.../jenkins/job_PerformanceTests_Python.groovy | 15 +-
...ommit_CrossLanguageValidatesRunner_Flink.groovy | 43 +++
...mit_Python_Chicago_Taxi_Example_Dataflow.groovy | 58 ++++
.test-infra/kubernetes/kubernetes.sh | 93 +++++++
.test-infra/metrics/beamgrafana-deploy.yaml | 18 +-
.test-infra/metrics/docker-compose.yml | 10 +-
.test-infra/metrics/sync/github/sync.py | 4 +-
.test-infra/metrics/sync/jenkins/syncjenkins.py | 11 +-
.../org/apache/beam/gradle/BeamModulePlugin.groovy | 187 ++++++++++---
examples/java/README.md | 2 +-
examples/kotlin/README.md | 2 +-
.../Core Transforms/Map/FlatMapElements/task.html | 2 +-
.../java/Core Transforms/Map/MapElements/task.html | 2 +-
ownership/JAVA_DEPENDENCY_OWNERS.yaml | 15 --
runners/core-construction-java/build.gradle | 21 --
.../graph/GreedyPCollectionFusers.java | 4 +-
.../runners/core/construction/ExternalTest.java | 47 ++--
.../expansion/TestExpansionService.java | 58 ----
runners/core-java/build.gradle | 1 -
runners/flink/job-server/flink_job_server.gradle | 30 +--
runners/google-cloud-dataflow-java/build.gradle | 20 +-
.../beam/runners/dataflow/DataflowRunner.java | 77 ++++++
.../dataflow/options/DataflowPipelineOptions.java | 4 +-
.../beam/runners/dataflow/DataflowRunnerTest.java | 63 ++++-
.../worker/DataflowWorkProgressUpdater.java | 55 +---
.../beam/runners/dataflow/worker/HotKeyLogger.java | 77 ++++++
.../dataflow/worker/StreamingDataflowWorker.java | 23 +-
.../worker/windmill/GrpcWindmillServer.java | 8 +-
.../worker/DataflowWorkProgressUpdaterTest.java | 71 ++---
.../worker/DataflowWorkUnitClientTest.java | 1 +
.../runners/dataflow/worker/HotKeyLoggerTest.java | 104 ++++++++
.../worker/StreamingDataflowWorkerTest.java | 15 +-
.../worker/windmill/GrpcWindmillServerTest.java | 2 +-
sdks/go/pkg/beam/pardo.go | 2 +-
sdks/go/pkg/beam/runners/dataflow/dataflow.go | 8 +-
sdks/java/core/build.gradle | 3 +-
.../apache/beam/sdk/coders/BigEndianLongCoder.java | 2 +-
.../beam/sdk/coders/BigEndianShortCoder.java | 2 +-
.../org/apache/beam/sdk/coders/FloatCoder.java | 2 +-
.../apache/beam/sdk/coders/RowCoderGenerator.java | 44 ++--
.../main/java/org/apache/beam/sdk/io/AvroIO.java | 2 +-
.../java/org/apache/beam/sdk/io/FileBasedSink.java | 5 +-
.../main/java/org/apache/beam/sdk/io/FileIO.java | 2 +-
.../beam/sdk/options/PipelineOptionsFactory.java | 4 +-
.../beam/sdk/options/PipelineOptionsReflector.java | 11 +-
.../beam/sdk/options/ProxyInvocationHandler.java | 2 +-
.../beam/sdk/schemas/utils/AutoValueUtils.java | 40 +--
.../beam/sdk/schemas/utils/AvroByteBuddyUtils.java | 22 +-
.../beam/sdk/schemas/utils/ByteBuddyUtils.java | 54 ++--
.../beam/sdk/schemas/utils/ConvertHelpers.java | 24 +-
.../beam/sdk/schemas/utils/JavaBeanUtils.java | 30 +--
.../apache/beam/sdk/schemas/utils/POJOUtils.java | 42 +--
.../beam/sdk/transforms/GroupIntoBatches.java | 5 +
.../java/org/apache/beam/sdk/transforms/ParDo.java | 4 +-
.../reflect/ByteBuddyDoFnInvokerFactory.java | 68 ++---
.../reflect/ByteBuddyOnTimerInvokerFactory.java | 32 +--
.../reflect/StableInvokerNamingStrategy.java | 4 +-
.../beam/sdk/transforms/windowing/PaneInfo.java | 2 +-
.../org/apache/beam/sdk/io/FileBasedSinkTest.java | 16 ++
.../java/org/apache/beam/sdk/io/SimpleSink.java | 4 +
.../sdk/options/PipelineOptionsReflectorTest.java | 20 +-
.../beam/sdk/transforms/ParDoLifecycleTest.java | 2 +-
sdks/java/extensions/sql/build.gradle | 1 +
.../sql/meta/provider/hcatalog/HCatalogTable.java | 7 +
.../beam/sdk/extensions/sql/BeamSqlTable.java | 9 +-
.../sdk/extensions/sql/impl/BeamCalciteTable.java | 2 +-
.../extensions/sql/impl/BeamTableStatistics.java | 2 +-
.../extensions/sql/impl/CalciteQueryPlanner.java | 26 +-
.../extensions/sql/impl/planner/BeamCostModel.java | 253 ++++++++++++++++++
.../sql/impl/planner/NodeStatsMetadata.java | 4 +-
.../sql/impl/rel/BeamAggregationRel.java | 61 ++++-
.../sdk/extensions/sql/impl/rel/BeamCalcRel.java | 30 ++-
.../sdk/extensions/sql/impl/rel/BeamIOSinkRel.java | 9 +-
.../extensions/sql/impl/rel/BeamIOSourceRel.java | 16 +-
.../extensions/sql/impl/rel/BeamIntersectRel.java | 28 +-
.../sdk/extensions/sql/impl/rel/BeamJoinRel.java | 29 +-
.../sdk/extensions/sql/impl/rel/BeamMinusRel.java | 21 +-
.../sdk/extensions/sql/impl/rel/BeamRelNode.java | 22 ++
.../sdk/extensions/sql/impl/rel/BeamSortRel.java | 15 +-
.../extensions/sql/impl/rel/BeamSqlRelUtils.java | 19 ++
.../extensions/sql/impl/rel/BeamUncollectRel.java | 20 +-
.../sdk/extensions/sql/impl/rel/BeamUnionRel.java | 23 +-
.../sdk/extensions/sql/impl/rel/BeamUnnestRel.java | 34 ++-
.../sdk/extensions/sql/impl/rel/BeamValuesRel.java | 10 +-
.../sql/impl/schema/BeamPCollectionTable.java | 7 +
.../sql/meta/provider/bigquery/BigQueryTable.java | 6 +-
.../sql/meta/provider/kafka/BeamKafkaTable.java | 152 ++++++++++-
.../sql/meta/provider/parquet/ParquetTable.java | 7 +
.../meta/provider/pubsub/PubsubIOJsonTable.java | 7 +
.../provider/seqgen/GenerateSequenceTable.java | 7 +
.../sql/meta/provider/test/TestBoundedTable.java | 2 +-
.../sql/meta/provider/test/TestTableProvider.java | 2 +-
.../sql/meta/provider/test/TestUnboundedTable.java | 2 +-
.../sql/meta/provider/text/TextTable.java | 4 +-
.../sql/impl/planner/BeamCostModelTest.java | 103 ++++++++
.../sql/impl/planner/CalciteQueryPlannerTest.java | 73 ++++++
.../extensions/sql/impl/planner/NodeStatsTest.java | 15 ++
...rceRelTest.java => BeamAggregationRelTest.java} | 70 +++--
...amIOSourceRelTest.java => BeamCalcRelTest.java} | 77 ++++--
.../sql/impl/rel/BeamEnumerableConverterTest.java | 6 +
.../sql/impl/rel/BeamIOSourceRelTest.java | 43 ++-
.../sql/impl/rel/BeamIntersectRelTest.java | 27 ++
.../impl/rel/BeamJoinRelBoundedVsBoundedTest.java | 75 ++++++
.../rel/BeamJoinRelUnboundedVsBoundedTest.java | 48 ++++
.../rel/BeamJoinRelUnboundedVsUnboundedTest.java | 50 +++-
.../extensions/sql/impl/rel/BeamMinusRelTest.java | 99 ++++++-
.../extensions/sql/impl/rel/BeamSortRelTest.java | 39 ++-
.../sql/impl/rel/BeamUncollectRelTest.java | 103 ++++++++
.../extensions/sql/impl/rel/BeamUnionRelTest.java | 53 ++++
.../extensions/sql/impl/rel/BeamUnnestRelTest.java | 71 +++++
.../extensions/sql/impl/rel/BeamValuesRelTest.java | 24 ++
.../sql/impl/rule/JoinReorderingTest.java | 6 +-
.../meta/provider/bigquery/BigQueryRowCountIT.java | 6 +-
.../meta/provider/bigquery/BigQueryTestTable.java | 4 +-
.../meta/provider/kafka/BeamKafkaCSVTableTest.java | 118 ++++++++-
.../sql/meta/provider/kafka/KafkaCSVTableIT.java | 292 +++++++++++++++++++++
.../sql/meta/provider/kafka/KafkaCSVTestTable.java | 197 ++++++++++++++
.../sql/meta/provider/kafka/KafkaTestRecord.java | 39 +++
sdks/java/io/bigquery-io-perf-tests/build.gradle | 40 +++
.../BigQueryIOReadPerformanceIT.java | 203 ++++++++++++++
.../sdk/io/gcp/bigquery/BigQueryAvroUtils.java | 2 +-
.../beam/sdk/io/gcp/bigquery/BigQueryHelpers.java | 6 +-
.../beam/sdk/io/gcp/bigquery/BigQueryIO.java | 13 +-
.../beam/sdk/io/gcp/bigquery/BigQueryServices.java | 4 +-
.../beam/sdk/io/gcp/bigquery/BigQueryUtils.java | 6 +
.../sdk/io/gcp/testing}/FakeBigQueryServices.java | 14 +-
.../sdk/io/gcp/testing}/FakeDatasetService.java | 5 +-
.../beam/sdk/io/gcp/testing}/FakeJobService.java | 7 +-
.../beam/sdk/io/gcp/testing}/TableContainer.java | 2 +-
.../apache/beam/sdk/io/gcp/GcpApiSurfaceTest.java | 2 +-
.../sdk/io/gcp/bigquery/BigQueryIOReadTest.java | 3 +
.../gcp/bigquery/BigQueryIOStorageQueryTest.java | 5 +-
.../io/gcp/bigquery/BigQueryIOStorageReadTest.java | 4 +-
.../sdk/io/gcp/bigquery/BigQueryIOWriteTest.java | 3 +
.../beam/sdk/io/gcp/datastore/DatastoreV1Test.java | 12 +-
.../apache/beam/sdk/io/hdfs/HadoopFileSystem.java | 84 +++---
.../sdk/io/hdfs/HadoopFileSystemRegistrar.java | 49 +++-
.../beam/sdk/io/hdfs/HadoopFileSystemTest.java | 12 +-
sdks/java/io/kinesis/build.gradle | 2 +-
sdks/java/testing/expansion-service/build.gradle | 52 ++++
.../beam/sdk/expansion/TestExpansionService.java | 101 +++++++
sdks/python/apache_beam/coders/avro_coder.py | 99 -------
sdks/python/apache_beam/coders/avro_coder_test.py | 71 -----
sdks/python/apache_beam/coders/avro_record.py | 38 +++
sdks/python/apache_beam/coders/coder_impl.py | 23 ++
sdks/python/apache_beam/coders/coders.py | 41 ++-
sdks/python/apache_beam/coders/coders_test.py | 43 +++
.../apache_beam/coders/coders_test_common.py | 1 +
.../snippets/transforms/element_wise/flat_map.py | 27 ++
.../transforms/element_wise/flat_map_test.py | 71 ++---
.../snippets/transforms/element_wise/map.py | 23 ++
.../snippets/transforms/element_wise/map_test.py | 71 ++---
.../transforms/element_wise/with_timestamps.py | 22 ++
.../element_wise/with_timestamps_test.py | 110 ++++----
.../python/apache_beam/examples/wordcount_xlang.py | 2 +-
.../apache_beam/io/external/generate_sequence.py | 5 +-
.../io/external/generate_sequence_test.py | 64 +++++
.../io/external/xlang_parquetio_test.py | 88 +++++++
sdks/python/apache_beam/io/filebasedsink_test.py | 8 +
.../apache_beam/io/gcp/datastore/v1/helper.py | 3 +-
.../io/gcp/datastore/v1/query_splitter_test.py | 11 +-
.../io/gcp/datastore/v1new/query_splitter.py | 63 ++++-
.../io/gcp/datastore/v1new/query_splitter_test.py | 51 +++-
.../apache_beam/io/gcp/datastore/v1new/types.py | 36 ++-
.../io/gcp/datastore/v1new/types_test.py | 26 ++
.../apache_beam/io/gcp/tests/bigquery_matcher.py | 4 +-
sdks/python/apache_beam/io/gcp/tests/utils.py | 57 +++-
sdks/python/apache_beam/io/gcp/tests/utils_test.py | 200 +++++++++++++-
.../python/apache_beam/options/pipeline_options.py | 18 +-
sdks/python/apache_beam/pipeline.py | 6 +-
sdks/python/apache_beam/pipeline_test.py | 34 +++
.../apache_beam/runners/dataflow/internal/names.py | 4 +-
.../apache_beam/runners/direct/direct_userstate.py | 134 +++++++++-
.../runners/portability/expansion_service_test.py | 50 ++--
.../runners/portability/fn_api_runner.py | 115 ++++----
.../runners/portability/local_job_service.py | 9 +-
.../apache_beam/runners/worker/bundle_processor.py | 74 +++++-
.../chicago_taxi}/__init__.py | 0
.../testing/benchmarks/chicago_taxi/preprocess.py | 258 ++++++++++++++++++
.../benchmarks/chicago_taxi/process_tfma.py | 191 ++++++++++++++
.../benchmarks/chicago_taxi/requirements.txt | 25 ++
.../testing/benchmarks/chicago_taxi/run_chicago.sh | 192 ++++++++++++++
.../testing/benchmarks/chicago_taxi/setup.py | 41 +++
.../chicago_taxi/tfdv_analyze_and_validate.py | 225 ++++++++++++++++
.../chicago_taxi/trainer}/__init__.py | 0
.../benchmarks/chicago_taxi/trainer/model.py | 164 ++++++++++++
.../benchmarks/chicago_taxi/trainer/task.py | 189 +++++++++++++
.../benchmarks/chicago_taxi/trainer/taxi.py | 186 +++++++++++++
.../python/apache_beam/testing/extra_assertions.py | 64 +++++
.../apache_beam/testing/extra_assertions_test.py | 71 +++++
.../apache_beam/testing/load_tests/build.gradle | 2 +-
sdks/python/apache_beam/testing/test_pipeline.py | 3 +
sdks/python/apache_beam/transforms/combiners.py | 4 +-
sdks/python/apache_beam/transforms/core.py | 60 +++--
.../python/apache_beam/transforms/external_test.py | 72 +++--
sdks/python/apache_beam/transforms/trigger.py | 16 ++
sdks/python/apache_beam/transforms/userstate.py | 110 +++-----
.../apache_beam/transforms/userstate_test.py | 156 ++++++++++-
sdks/python/apache_beam/transforms/util.py | 222 ++++++++++++++++
sdks/python/apache_beam/transforms/util_test.py | 280 ++++++++++++++++++++
.../typehints/trivial_inference_test.py | 21 --
.../typehints/trivial_inference_test_py3.py | 50 ++++
sdks/python/build.gradle | 34 ++-
sdks/python/container/build.gradle | 2 +-
sdks/python/container/py3/build.gradle | 2 +-
sdks/python/container/run_validatescontainer.sh | 1 +
sdks/python/scripts/generate_pydoc.sh | 3 +
sdks/python/scripts/run_expansion_services.sh | 136 ++++++++++
sdks/python/scripts/run_integration_test.sh | 1 +
sdks/python/scripts/run_pylint.sh | 32 ++-
sdks/python/test-suites/dataflow/py2/build.gradle | 37 ++-
sdks/python/test-suites/dataflow/py35/build.gradle | 17 +-
sdks/python/test-suites/dataflow/py36/build.gradle | 17 +-
sdks/python/test-suites/dataflow/py37/build.gradle | 21 +-
sdks/python/test-suites/portable/py2/build.gradle | 10 +-
sdks/python/tox.ini | 6 +-
settings.gradle | 2 +
website/src/contribute/design-documents.md | 1 +
website/src/contribute/release-guide.md | 7 +-
website/src/contribute/runner-guide.md | 26 +-
website/src/documentation/io/built-in.md | 4 +-
.../patterns/file-processing-patterns.md | 2 +-
website/src/documentation/runners/jet.md | 41 +--
.../src/documentation/sdks/python-dependencies.md | 39 +++
.../sdks/python-pipeline-dependencies.md | 2 +-
.../src/documentation/transforms/python/index.md | 2 +-
.../transforms/python/other/reshuffle.md | 2 +-
website/src/get-started/quickstart-py.md | 2 +-
website/src/roadmap/python-sdk.md | 2 +-
233 files changed, 8204 insertions(+), 1473 deletions(-)
create mode 100644 .test-infra/jenkins/Kubernetes.groovy
create mode 100644
.test-infra/jenkins/job_PostCommit_CrossLanguageValidatesRunner_Flink.groovy
create mode 100644
.test-infra/jenkins/job_PostCommit_Python_Chicago_Taxi_Example_Dataflow.groovy
create mode 100755 .test-infra/kubernetes/kubernetes.sh
delete mode 100644
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/expansion/TestExpansionService.java
create mode 100644
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/HotKeyLogger.java
create mode 100644
runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/HotKeyLoggerTest.java
create mode 100644
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamCostModel.java
create mode 100644
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamCostModelTest.java
create mode 100644
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/planner/CalciteQueryPlannerTest.java
copy
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/{BeamIOSourceRelTest.java
=> BeamAggregationRelTest.java} (60%)
copy
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/{BeamIOSourceRelTest.java
=> BeamCalcRelTest.java} (55%)
create mode 100644
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUncollectRelTest.java
create mode 100644
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnnestRelTest.java
create mode 100644
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaCSVTableIT.java
create mode 100644
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaCSVTestTable.java
create mode 100644
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTestRecord.java
create mode 100644 sdks/java/io/bigquery-io-perf-tests/build.gradle
create mode 100644
sdks/java/io/bigquery-io-perf-tests/src/test/java/org/apache/beam/sdk/bigqueryioperftests/BigQueryIOReadPerformanceIT.java
rename
sdks/java/io/google-cloud-platform/src/{test/java/org/apache/beam/sdk/io/gcp/bigquery
=> main/java/org/apache/beam/sdk/io/gcp/testing}/FakeBigQueryServices.java
(87%)
rename
sdks/java/io/google-cloud-platform/src/{test/java/org/apache/beam/sdk/io/gcp/bigquery
=> main/java/org/apache/beam/sdk/io/gcp/testing}/FakeDatasetService.java (98%)
rename
sdks/java/io/google-cloud-platform/src/{test/java/org/apache/beam/sdk/io/gcp/bigquery
=> main/java/org/apache/beam/sdk/io/gcp/testing}/FakeJobService.java (98%)
rename
sdks/java/io/google-cloud-platform/src/{test/java/org/apache/beam/sdk/io/gcp/bigquery
=> main/java/org/apache/beam/sdk/io/gcp/testing}/TableContainer.java (97%)
create mode 100644 sdks/java/testing/expansion-service/build.gradle
create mode 100644
sdks/java/testing/expansion-service/src/test/java/org/apache/beam/sdk/expansion/TestExpansionService.java
delete mode 100644 sdks/python/apache_beam/coders/avro_coder.py
delete mode 100644 sdks/python/apache_beam/coders/avro_coder_test.py
create mode 100644 sdks/python/apache_beam/coders/avro_record.py
create mode 100644
sdks/python/apache_beam/io/external/generate_sequence_test.py
create mode 100644 sdks/python/apache_beam/io/external/xlang_parquetio_test.py
copy sdks/python/apache_beam/testing/{load_tests =>
benchmarks/chicago_taxi}/__init__.py (100%)
create mode 100644
sdks/python/apache_beam/testing/benchmarks/chicago_taxi/preprocess.py
create mode 100644
sdks/python/apache_beam/testing/benchmarks/chicago_taxi/process_tfma.py
create mode 100644
sdks/python/apache_beam/testing/benchmarks/chicago_taxi/requirements.txt
create mode 100755
sdks/python/apache_beam/testing/benchmarks/chicago_taxi/run_chicago.sh
create mode 100644
sdks/python/apache_beam/testing/benchmarks/chicago_taxi/setup.py
create mode 100644
sdks/python/apache_beam/testing/benchmarks/chicago_taxi/tfdv_analyze_and_validate.py
copy sdks/python/apache_beam/testing/{load_tests =>
benchmarks/chicago_taxi/trainer}/__init__.py (100%)
create mode 100644
sdks/python/apache_beam/testing/benchmarks/chicago_taxi/trainer/model.py
create mode 100644
sdks/python/apache_beam/testing/benchmarks/chicago_taxi/trainer/task.py
create mode 100644
sdks/python/apache_beam/testing/benchmarks/chicago_taxi/trainer/taxi.py
create mode 100644 sdks/python/apache_beam/testing/extra_assertions.py
create mode 100644 sdks/python/apache_beam/testing/extra_assertions_test.py
create mode 100644
sdks/python/apache_beam/typehints/trivial_inference_test_py3.py
create mode 100755 sdks/python/scripts/run_expansion_services.sh