See <https://builds.apache.org/job/beam_PerformanceTests_TFRecordIOIT_HDFS/2/display/redirect?page=changes>
Changes: [joey.baruch] Add javadoc to ConsoleIO [herohde] Initial sketches of a Go SDK [herohde] Initial version of the direct style w/ direct runner. Incomplete. [herohde] Add Data as UserFn context w/ immediate value. [herohde] Added no-I/O wordcount for profiling. [herohde] Fleshed out possible approach to generic transformations. [herohde] Add “dag” example that use multiplexing and side input. [herohde] Added a more complex DAG example. [herohde] Add yatzy example with more complex construction-time setup [herohde] Add proto for Fn API [herohde] Add beam.Composite helper for the most common pattern to align with java [herohde] Move pipeline-construction time errors into an accumulator [herohde] Add Dataflow job and Fn API clients. Incomplete. [herohde] Add binary cross-compile and upload to Dataflow runner. Incomplete. [herohde] Add tentative runner indirection (default: local). [herohde] Made data flow runner detect user main for cross-compilation. [herohde] Remove error accumulation in favor of panic. [herohde] Improve Dataflow translation of coders, side input and composite names. [herohde] Fix name for AsView. [herohde] Use 2 grpc endpoints in harness [herohde] Add gRPC harness logging [herohde] Flesh out harness and serialization further. [herohde] Made the dataflow runner wait for job termination by default [herohde] beam: [herohde] beam: [herohde] combinefn.go: fix compilation issues [herohde] Improve dataflow serialization and execution. Incomplete. [herohde] Sleep 30 sec in wordcap to allow logs to propagate to Cloud Logging. [herohde] Move the 30s sleep for logging to the harness instead of in WordCap. [herohde] Post-review updates. [herohde] Doc updates. [herohde] Flesh out coders. Incomplete. [herohde] Added prototype implementation of more coders and the runner source. [herohde] dofn: illustrates how dofns are written. 
[herohde] beam: add viewfn and windowfn to side inputs match support Beam 1.0 [herohde] dofn: timers [herohde] Complete revamp: coders, graph and execution use element-wise [herohde] Fix coder encoding for Dataflow side input. Otherwise, the job is [herohde] Added more godoc comments to graph types. [herohde] Added more comments plus made local GBK use coder equality. [herohde] Added Flatten support and “forest” example that uses it. [herohde] Move bigqueryio to defunct [herohde] Make forest example print less [herohde] Add external struct registry and serialization. [herohde] Updated comments in node.go. [herohde] Replace real type with 'full type' since that's the current term. [herohde] Refactor Fn API dependency. [herohde] Added more comments to the runner/dataflow and runner/beamexec packages [herohde] Fix most go vet issues [herohde] Make core operations panic to cut down on the error propagation [herohde] Add more comments to the graph package. [herohde] Add DoFn wrapper to handle either function or (ptr to) struct [herohde] Fix remaining go vet warnings. [herohde] Code review for beam/graph/coder package. [herohde] Code review of the runtime/graphx package. [herohde] Remove Data options in favor of using a Fn struct [herohde] Code review of the beam/graph/userfn package. [herohde] Code review for beam/graph package. [herohde] godoc for runtime/graphx [herohde] Add support for []T and Combine functions [herohde] Add adapted documentation from the Java SDK to the beam package [herohde] Update snapshot of Fn API. [herohde] Add experiments flag to the Dataflow runner [herohde] Remove context arg from beamexec.Init [herohde] Migration to Runner API. [herohde] Add support for creating DOT graphs. [herohde] Make pretty printing of types and coders more concise [herohde] Add flexible Signature to aid type checking [herohde] Adding unit testability to harness translation. 
[herohde] Fix crash due to initialization order [herohde] Add CreateValues and Impulse [herohde] Add Runner API support for WindowingStrategy. [herohde] Run goimports on baseline. [herohde] Fix encoding of global window strategy. [herohde] Ensure the windowed value is atomically encoded. [herohde] Limit gRPC messages to max size. [herohde] Developer conveniences for running jobs. [herohde] Fix sends to not close the network channel. [herohde] Add re-iterable side input [herohde] Add per-key Combine [herohde] Add Min [herohde] Reorganize non-user-facing code into core [herohde] Make type register reject unnamed or predeclared types [herohde] Add type specialization tool [herohde] Don't run grpc plugin in generate phase. [herohde] Fix import reference path for runner API proto. [herohde] Revamp runner registration as _ imports [herohde] Add stats.Max and Mean [herohde] Add global pipeline options [herohde] Unify global and per-key combiners [herohde] Add beam convenience wrapper for imports and runner selection [herohde] Add session recording and CPU profiling to harness. [herohde] Add ptest and passert for testing pipelines [herohde] Add GCS and glob support to textio [herohde] Add BigQuery IO and examples [herohde] Adds a session runner for testing. [herohde] Add Partition and dynamic functions [herohde] Adding example that returns 10 words that contain provided search [herohde] Remove duplicate LOG line [herohde] Enable Combine Fns in Dataflow runner by modifying translation. [herohde] Fixing type bug by dropping T and using underlying type of value in [herohde] Adding graph validation at build time. [herohde] Import the Fn API changes. [herohde] Simple changes to support new Fn API coder changes. [herohde] Update translator to work with new Fn API changes. [herohde] Use appropriate equality tests. [herohde] Fix test to not use path of package. [herohde] Renaming directory to match package name. [herohde] Fixing random nits in comments. 
[herohde] Modify build command to avoid bash. [herohde] Fixing selected golint issues. [herohde] Addressing import review comments. [herohde] Add coder specialization for bytes/strings. [herohde] Adding unit tests to stats. [herohde] Fixing typo. [herohde] Add beam.External [herohde] Fix grpc.Dial calls to block properly. [herohde] Creates a symtab verifier by running Sym2Addr and Addr2Sym in a binary. [herohde] Add spec field to help interpretation of payload. [herohde] Use beam.T alias for typex.T etc with Go 1.9 [herohde] Move shared GCP options to a separate package [herohde] Update portability protos [herohde] Remove old source/sink from beam package [herohde] Add context-aware logging for both pipeline-submission time and runtime [herohde] Fix coder inference for strings. [herohde] Improve tornadoes example [herohde] Fix beam.External to map arguments correctly. [herohde] Added comments to yatzy and forest [herohde] Add comments to tornadoes from the java counterpart [herohde] Rename Pipeline Composite to Scope [herohde] Add 3 progressive wordcount examples [herohde] Clarify comments in wordcount pipelines [herohde] Add apache 2.0 license to files [herohde] Updates to examples. [herohde] Adding more godoc for the main beam package. [herohde] Update to new proto structure [herohde] Split Combine and fields in to global and per-key variants [herohde] Refactor Flatten of a single []T into Explode [herohde] Rename local runner to direct runner [herohde] Fix argument index error in ParDo execution [herohde] Add Apache copyright header to files that need it. [herohde] Made debug.Head not just work per bundle [herohde] Impose a total ordering on Fn parameters. 
[herohde] Rename Dedup to Distinct for consistency with other SDKs [herohde] Add coder to model coder translation [herohde] Simplify harness coder translation [herohde] Split Pipeline into Pipeline and Scope [herohde] Relocate Go SDK code [herohde] Fix Go SDK maven build [herohde] Move Go SKD to latest version of bigquery [herohde] Add Go SDK container image [herohde] Add Go SDK README [herohde] Update version for Go Dataflow pipelines [herohde] Make Scope a value type [herohde] Add Go graph/pipeline translation [herohde] Stage Go model pipeline for Dataflow [herohde] Use pipeline unmarhaller in runtime harness [herohde] CR: [BEAM-3287] Use model pipelines in Go SDK [herohde] CR: [BEAM-3287] Use model pipelines in Go SDK [herohde] Fix name of syscallx ErrUnsupported [herohde] Allow any named type to be registered and serialized as external [herohde] Add more package comments for core packages [herohde] Make Go SDK External a graph primitive [herohde] Cache Go runtime symbol lookups [github] Fix code comment to match code [wcn] Fix storagePath variable. [wcn] BEAM-3368 fix translation for external [robertwb] [BEAM-3356] Add Go SDK int and varint custom coders (#4276) [lcwik] BEAM-3361 Increase Go gRPC message size [herohde] Go SDK runtime revamp [lcwik] Add a few function call overhead benchmarks [lcwik] Add type-specialized emitters [lcwik] BEAM-3324 improve symtab memory usage [lcwik] BEAM-3324 improve symtab memory usage [lcwik] BEAM-3324 improve symtab memory usage [lcwik] Store objects in pool so they can be reused. [lcwik] Add builtin varint coder [herohde] Type-specialize custom decoders and encoders in Go SDK runtime [herohde] Type-specialize iterators and side input in the Go SDK [herohde] Add warnings if Go runtime registrations are overwritten [herohde] Add reusable element coders in Go SDK runtime [wcn] Updated translater to preserve payload and its URN. 
[github] NotImplementedErrror --> NotImplementedError [herohde] Initial version of type-specialized general reflective calls [herohde] Add general-purpose untyped callers in Go SDK runtime [herohde] Use fast caller for filter transform predicate [abhyankar] Add support for ValueProvider in JdbcIO.DataSourceConfiguration methods [herohde] CR: Clarified comment on caller template [robertwb] Remove legacy windowfn translation. [herohde] Fix value encoding for Create [github] Update BUILD.md [lcwik] BEAM-3473: Fix GroupByKey iterators to be initialized. [lcwik] BEAM-3474 Include stacks in panic messages. [shashank] fix serialization error in BigQueryIO's DynamicDestinations [shashank] correct side input check in BigQueryIO DynamicDestination for pull [lcwik] BEAM-3299: Add source reporting support. [lcwik] Remove GetId() call from under lock. [lcwik] Add additional comments about concurrency invariants. [lcwik] Add initialization of active plans map. [zoy] Increasing BatchElements's max_batch_size to 10K [robertwb] Curry CombineFn arguments into runner API protos. [batbat] Added an example pipeline that uses stateful processing to output team [markliu] [BEAM-2762] Python code coverage report in Postcommit [markliu] fixup! Exclude auto-generated files in portability/api [markliu] fixup! Clean up configuration and use better way to exclude cover in mvn [markliu] fixup! Add more comments [wcn] Update generated version of Fn API code. [Pablo] Tracking of time spent reading side inputs, and bytes read in Dataflow. [Pablo] Fix lint issues [Pablo] Fixing counter names [Pablo] Using comments to improve changes [Pablo] Placing feature behind an experiment flag. [Pablo] Fixing lint issue [Pablo] Rebasing [Pablo] Added cythonization and test [Pablo] Fixing lint issue [Pablo] Modifying the experiment flag check to reduce performance impact. [Pablo] Addressing comments. Profile pending. [Pablo] Reducing impact - Tracking only Bytes if experiment is active. 
[tgroh] Add a BundleProcessor to SdkHarnessClient [lcwik] Renamed Go runtime Caller to Func and added name [lcwik] Use reflectx.Func as the fundamental function representation [lcwik] CR: fix DynFn comments [lcwik] CR: fix comments [herohde] Avoid reflect.Value conversions in Go runtime [robertwb] [BEAM-3490] Picklable side inputs for FnApi Workers. [robertwb] Manually specify direct runner for global-state modifying tests. [robertwb] FakeSource requires direct runner. [robertwb] Explicitly use DirectRunner in DirectRunner tests. [markliu] fixup! Improve comments [altay] Disable combiner lifting optimization in DataflowRunner for fnapi [wcn] Allow grpcx.Dial to support overrides. [jb] [BEAM-3507] Add a way to specify the batch size in JdbcIO Write [jbonofre] [BEAM-3507] Add DEFAULT_BATCH_SIZE and use batchSize in default [iemejia] [BEAM-3492] Fix spark 2.10 dependency leak on hadoop tests [iemejia] [BEAM-3492] Force netty version when running with the spark runner [lcwik] [BEAM-3427] Update build to Java 8 (before clean out) (#4424) [sduskis] [BEAM-3412] Upgrade Cloud Bigtable to 1.0.0 The 1.0.0 Cloud Bigtable [sduskis] BigtableServiceImplTest now uses a List of ranges. The PR did not [sduskis] Updating Cloud Bigtable dependnecy to 1.0.0 in build.gradle. [ekirpichov] Introduces the Wait transform [github] [BEAM-2963] Remove layer of indirection in output name mapping in [ehudm] Pass PipelineOptions to FileSystem constructor. [altay] Disable combiner lifting when only the streaming flag is set. [Pablo] Addressing comments [Pablo] Improved IO Target documentation [tgroh] Register Environments in SdkComponents [Pablo] Improving documentation [tgroh] Move BeamFnDataInboundObserver to java-fn-execution [tgroh] Add an implementation of the Fn API Data Service [robertwb] Document DirectRunnerOnly tests. 
[rober] Use a typeswitch instead of reflect.Convert when encoding strings or [robert] Add CONTRIBUTING.md [lcwik] [BEAM-3008] Adds parameters templatization for Bigtable (#4357) [lcwik] Migrate Flink ValidatesRunner to Gradle [lcwik] Increment the Dataflow runner major version to 7. [zhouhai02] Update copyright date in NOTICE [sduskis] Using 1.0.0-pre3 for bigtable-proto. This should allow Cloud Bigtable's [sduskis] Updating build.gradle with a bigtable_proto_version of 1.0.0-pre3 [github] Update coder.go [robertwb] [BEAM-3126] Adding a new Flatten test to Python SDK. (#4463) [rober] Update printed gcloud cancel commands to include the job's region. [sduskis] Fixing a bad merge in BigtableServiceImpl. [kirpichov] Code compiles after auto-transition to lambda [kirpichov] google-java-format [kirpichov] Removes unnecessary explicit type arguments [kirpichov] google-java-format [kirpichov] checkstyle fixups [kirpichov] More removal of explicit type arguments [kirpichov] google-java-format [kirpichov] Manually fixed a couple cases of bad comment formatting [kirpichov] Manual fixup of some call sites where lambdas mess up coder inference [kirpichov] A couple of final example fixups [lcwik] Upgrade Jenkins jobs to use Maven 3.5.2 [iemejia] [BEAM-3432] Remove hadoop/jdk1.8-tests module [lcwik] [BEAM-2728] Add Count-Min Sketch in sketching extension [lcwik] Small fix in SketchCoder [lcwik] Make Sketch AutoValue + Javadoc update [lcwik] Optimize coder memory use [lcwik] [BEAM-3160] Fix issue where we would choose which coder to use [lcwik] Initial post-release snapshot test [lcwik] Make the snapshot and url parameters to the job to allow installing a [lcwik] Make the snapshot and url parameters to the job to allow installing a [lcwik] Rename TestHelper to TestScripts, it() to intent() [chamikara] [BEAM-3060] Support for Perfkit execution of file-based-io-tests on HDFS [Pablo] Addressing comments. 
[jbonofre] [BEAM-3428] Merge Java8 examples in "main" Java examples [jbonofre] [BEAM-3428] Replace MinimalWordCount with Java8 one [echauchot] [BEAM-3534] Add a spark validates runner test for metrics sink in [coheigea] BEAM-3533 - Replace hard-coded UTF-8 Strings [tgroh] Retrieve Environments from PTransforms [jbonofre] [BEAM-3466] Remove JDK 7 references in Jenkins [kirpichov] [BEAM-3083] Do not call getSchema() and getTable() on every element [altay] Use non-deprecated version of Futures.transform [aromanenko.dev] [BEAM-3539] BigtableIO.Write javadoc of some methods is incorrect [iemejia] Remove unneeded profile for javadoc on Java 8 [iemejia] Remove unneeded explicit Java 8 references on maven-compiler-plugin [iemejia] Fix doc error on hadoop-input-format ITs after move to Java 8 only tests [iemejia] Remove references to non-existent examples:java8 module in gradle [iemejia] Remove references to java 7/8 only examples from the README [iemejia] Remove some comments on Java 7/8 only stuff that don't make sense [kenn] Add a test for an event time timer loop in ParDo [lcwik] [BEAM-2273] Cleanup examples Maven Archetype to copy in a clean state [XuMingmin] [BEAM-3526] KakfaIO support for finalizeCheckpoint() (#4481) [kedin] Add Avro dependency to KafkaIO [rangadi] Remove an unused test dependency in KafkaIO. [lcwik] [Beam-2500] Add S3FileSystem to SDKs/Java/IO [lcwik] implement serializing AWS credentials provider [lcwik] fixup! Clarify error message is received from SDK harness [iemejia] Fix modules that were activated only on Java 8 profile [robert] Replace reflective convert to direct convert. 
[iemejia] [BEAM-3275] Fix ValidatesRunner Spark runner after the Kafka update [iemejia] Refactor code into idiomatic Java 8 style [iemejia] Fix missing gearpump module activated only on Java 8 profile [iemejia] Add missing amazon-web-services module from javadoc [jbonofre] [BEAM-2271] Exclude files not intended to be in the releases source [jbonofre] Typo fix in MinimalWordCount example [Mottyg1] [BEAM-675] Introduce message mapper in JmsIO [iemejia] Update maven-shade-plugin version to 3.1.0 [iemejia] Update maven-compiler-plugin version to 3.7.0 [iemejia] Update maven-dependency-plugin version to 3.0.2 [iemejia] Update maven-surefire-plugin version to 2.20.1 [iemejia] Update maven-failsafe-plugin version to 2.20.1 [iemejia] Update maven-assembly-plugin version to 3.1.0 [iemejia] Update versions-maven-plugin version to 2.5 [iemejia] Update findbugs-maven-plugin version to 3.0.5 [iemejia] Update license-maven-plugin version to 1.14 [iemejia] Update jacoco-maven-plugin version to 0.7.9 [iemejia] Update dockerfile-maven-plugin version to 1.3.6 [iemejia] Add maven-enforcer-plugin version to pluginManagement [iemejia] Fix warning on using directly parent.version on Nexmark [iemejia] Remove warnings on repeated maven-jar-plugin invocation / deps [iemejia] Remove warning on defining <prerequisites> for non-maven plugin project [iemejia] Update parent POM to version 19 [iemejia] Remove repeated dependency in hadoop-input-format module [mariagh] Support argparse-style choices for ValueProvider [jbonofre] [BEAM-3551] Add -parameters flag to the compiler [jbonofre] [BEAM-3551] Align gradle java compile task options with the [coheigea] BEAM-3560 - Switch to use BigInteger/BigDecimal.ZERO/ONE/TEN [lcwik] [BEAM-2500, BEAM-3249] Add amazon-web-services gradle build rules. 
[jbonofre] [maven-release-plugin] prepare branch release-2.3.0 [jbonofre] [maven-release-plugin] prepare for next development iteration [jbonofre] Update Python SDK version post release [iemejia] Fix dependency order error, harness must be built before container [robertwb] [BEAM-3537] Allow more general eager in-process pipeline execution [lcwik] [BEAM-3550] Add --awsServiceEndpoint option and use with S3 filesystem. [lcwik] Fixes for sdks/java/core for the eclipse compiler [lcwik] Add some m2e lifecycle things to get the various maven plugins running [lcwik] Set the proper package for the Snippets example so eclipse won't raise [lcwik] Add some template args and direct casts to help the eclipse compiler [lcwik] Incorporate reviews and rebase on latest master [lcwik] Return no environment more often [Pablo] Logging deviation from sampling expectation. This will allow to track [kedin] [SQL] Refactor Variance [robertwb] [BEAM-3490] Make runtime type checking code runner agnostic. [kedin] [Nexmark][SQL] Implement sql query 3 [github] [BEAM-3557] Sets parent pointer of AppliedPTransform objects correctly [pawel.pk.kaczmarczyk] [BEAM-2469] Handling Kinesis shards splits and merges [Pablo] Adding a static getter for RuntimeValueProvider. [tgroh] Add CoderTranslatorRegistrar [tgroh] Add slf4j_simple to the top level Gradle build [tgroh] Implement FnService in FnApiControlClientPoolService [tgroh] Add a Timeout to GrpcDataService#send [tgroh] Use a Data Service in SdkHarnessClient [github] get the query from configuration not options [XuMingmin] [BEAM-3525] Fix KafkaIO metric (#4524) [chamikara] Updates PTransform overriding to create a new AppliedPTransform object [dariusz.aniszewski] use build $WORKSPACE as pkb temp_dir and update pip and setuptools in [iemejia] [BEAM-3578] SQL module build breaks because of missing dependency [Pablo] Renaming the ZERO element of DistributionResult to be IDENTITY_ELEMENT. 
[kenn] google-java-format [kenn] Fix Distinct null pointer error with speculative triggers [kenn] Move TestCountingSource to appropriate location [robertwb] Direct runner fixes. [lcwik] [BEAM-2926] Add support for side inputs to the runner harness. [kenn] Sickbay ApexRunner gradle WordCountIT [kenn] Sickbay flakey KinesisReaderTest [Pablo] Addressing comments. [rober] Fix beam.Combine to combine globally [ehudm] Split out buffered read and write code from gcsio. [github] Removing unnecessary code. [lcwik] [BEAM-3249] Make sure that all java projects package tests. Also package [lcwik] [BEAM-3249] Do not assume build directory is within build/, use the [github] Fix undefined names: exc_info --> self.exc_info [github] import logging for line 1163 [iemejia] [BEAM-3592] Fix spark-runner profile for Nexmark after move to Spark 2.x [dkulp] [BEAM-3562] Update to Checkstyle 8.7 [klk] Encourage a good description in a good spot on a PR description. [lcwik] Change info to debug statement [cclauss] global INT64_MAX, INT64_MIN to placate linters [tgroh] Add QueryablePipeline [gene] Changing FileNaming to public to allow for usage in lambdas/inheritance [robertwb] [BEAM-3207] Create a standard location to enumerate and document URNs. [cclauss] xrange() was removed in Python 3 (en masse) [robertwb] Reduce the flakiness of the state sampler progress metrics. [robertwb] Revert URNs that are currently hard-coded in the Dataflow worker. [herohde] Add optional function registration to Go SDK runtime [kedin] [SQL] Inherit windowing strategy from the input in Aggregate operation [jbonofre] [BEAM-3551] Define compiler -parameters flag in the default options [tgroh] Add SdkHarnessClientControlService [tgroh] Update Synchronization in FnApiControlClient [coheigea] BEAM-3593 - Remove methods that just call super() [lcwik] Move off of deprecated method in Guava. 
[tgroh] Add a LocalArtifactStagingLocation [tgroh] Add LocalArtifactStagingLocation#forExisting [tgroh] Add an ArtifactRetrievalService interface [tgroh] Implement a Local ArtifactRetrievalService [chamikara] Adds a ReadAll transform to tfrecordio. [rangadi] KafkaIO : move source and sink implemenations into own files. [rangadi] minor [kedin] [SQL] Add SqlTypeCoder, replace java.sql.Types [Pablo] Moving User metrics to be in the PTransform proto for Fn API. [mairbek] Update cloud spanner library to 0.29.0 [mairbek] Fix test [mairbek] More google-cloud-platform whitelisting [mairbek] pom updates to make maven happy [mairbek] Update netty deps [rober] fixup! Remove reflection from varint codecs [ccy] [BEAM-3566] Replace apply_* hooks in DirectRunner with [ccy] Address reviewer comments [davidyan] Correct typo in SpannerIO.Write.withHost [klk] google-java-format [klk] Fix empty window assignments in Nexmark [klk] Fix empty window assignment in FlattenEvaluatorFactoryTest [klk] Switch DataflowRunner to its own private ValueInEmptyWindows [klk] Remove deprecated valueInEmptyWindows [jiangkai] Covariance Functions [aljoscha.krettek] Remove erroneous cast in FlinkStreamingTransformTranslators [aljoscha.krettek] [BEAM-3186] Correctly use deserialized timerService in Flink Runner [lcwik] Adjust gradle build dirs and hints to help IntelliJ (#4583) [coheigea] BEAM-3618 - Remove extraneous "return" statement [robertwb] [BEAM-3183] Allow a callable as input to runner.run(). 
[sidhom] Fix gradle java sdk image build [kenn] Add MoreFutures utility [kenn] Switch runners/java-fn-execution from Guava futures to Java 8 futures [kenn] Switch DataflowRunner from Guava futures to Java 8 futures [kenn] Switch gcp-core from Guava futures to Java 8 futures [kenn] Switch runners/core-construction-java from Guava futures to Java 8 [kenn] Switch AWS IO from Guava futures to Java 8 futures [kenn] Switch BigTableIO from Guava futures to Java 8 futures [kenn] Deprecate DoFnTester [herohde] Changed core GBK to CoGBK [herohde] Add CoGBK support to direct runner and Join example [herohde] [BEAM-3316] Translate bundle descriptors directly to execution plans in [herohde] Translate CoGBK into GBK for Dataflow and model pipeline runners [mairbek] Fixed broken test [herohde] CR: [BEAM-3302] Support CoGBK in the Go SDK [rangadi] Remove older Kafka versions from build time support. [ekirpichov] Adds PositionT and claim callback to RestrictionTracker [ekirpichov] Changes OutputAndTimeBounded invoker to start checkpoint timer after [ekirpichov] Compresses encoded GrowthState with Snappy - about 2x-3x more compact [ekirpichov] InMemoryStateInternals.copy clones the values using the coder [ekirpichov] Final fixups [ekirpichov] Bump worker to 20180205 [klk] Sickbay flaky KinesisIO tests [klk] Remove DoFnTester from core SDK tests [cclauss] from six import integer_types (en masse) [aljoscha.krettek] [BEAM-2806] Fix pipeline translation mode recognition in Flink Runner [github] Consistantly show Python and pip versions in tests [jbonofre] Revert "Reinstate proto round trip in Java DirectRunner" [tgroh] Update Assign Window URN Constant Name [coheigea] BEAM-3624 - Remove collapsible if statements [kenn] Sickbay ApexRunner ParDoTranslatorTest.testAssertionFailure [kenn] Switch FullWindowedValueCoder to bypass validation [kedin] Refactor BeamRecordType and BeamRecord [sidhom] Allow docker tag root to be specified as in Maven image build
[herohde] [BEAM-3457] Upgrade gogradle and fix thrift resolution issue [herohde] [BEAM-3457] Add Go Gradle precommit [ccy] [BEAM-3635] Infer type hints on PTransformOverrides [robertwb] [BEAM-3625] Enable DoFn params in Map, Filter, etc. [herohde] [BEAM-3579] Fix textio.Write [kedin] Rename BeamRecord -> Row, BeamRecordType -> RowType [mariagh] Add test for processing-time timer [iemejia] Add missing gradle build config for sdks/java/extensions/sketching [iemejia] Fix type on shadowTest when it should be testShadow [ccy] Update snippets to fix pickling and clarify encoding issues [wcn] Modify BufferedElementCountingOutputStream to use a buffer pool for its [klk] Fix stable name errors in HBaseIOTest [arnaudfournier021] Minor Javadoc corrections for SketchFrequencies [arnaudfournier021] [BEAM-2728] Add Quantiles finding transforms to sketching extension [arnaudfournier021] Change coder serialization + improve Javadoc comments + minor fixes [aromanenko.dev] [BEAM-3291] Add Kinesis write transform [coheigea] Remove unused private variables. [ehudm] Add and migrate to HdfsCLI library for Python SDK. [klk] Fix typo in gradle idea hints [dkulp] [BEAM-3639] Update to gradle 4.5.1 [herohde] Also ignore alternative path for gogradle thrift location [herohde] Remove gogradle manual dependency ordering [herohde] Lock Go dependency versions [herohde] Ignore gogradle.lock in rat check [herohde] Ignore gogradle.lock in rat check for maven [github] Revert "Update cloud spanner library to 0.29.0" [herohde] CR: fix Go SDK textio.Write [tgroh] Add Javadoc on how Teardown is best-effort [kedin] [Schema Generation] Generate BeamRecordTypes based on pojos. [alan] [BEAM-3524] Automate testing using python sdk container built at head [coheigea] Replace boolean ternary operator + simplify some Boolean expressions [apilloud] [BEAM-410] Sort PriorityQueue<QuantileBuffer> with explicit comparator [robertwb] Disable verbose typecoder warnings.
[fjetumale] [BEAM-2817] BigQuery queries are allowed to run in either BATCH or [jb] [BEAM-793] Add backoff support in JdbcIO Write [kenn] Increase gradle logging to --info [tgroh] Add a Primitive Impulse PTransform [ccy] Add switchable DirectRunner which uses the fast FnApiRunner when [rangadi] Move kafka-clients dependency to provided scope. [daniel.o.programmer] [BEAM-419] Modifying FindBug comment. [tgroh] Add a single-stage fusion implementation [robertwb] [BEAM-3074] Serialize DoFns by portable id in Dataflow runner. [lcwik] [BEAM-3629] Send the windowing strategy and whether its a merging window [jb] [BEAM-3668] Quick workaround fix for netty conflict waiting better fix [wcn] Fixing filename. [iemejia] Fix warning on jenkins on non-existent profile 'validates-runner-tests' [iemejia] Remove unneeded overwrites of maven-compiler-plugin [iemejia] Change tests execution order from filesystem (default) to random [iemejia] Remove repeated dependencies on runners/java-fn-execution module [iemejia] Add missing modules to javadoc generation: TikaIO, RedisIO, Jackson, Xml [iemejia] [BEAM-2530] Make final fixes to ensure code and tests compile with Java [aromanenko.dev] [BEAM-3637] HBaseIOTest - random table names for every test [jbonofre] [BEAM-3692] Remove maven deploy plugin configuration with skip in the [ehudm] Integration test for Python HDFS implementation. [herohde] Remove bad gogradle.lock files [pawel.pk.kaczmarczyk] [BEAM-3605] Use verification with timeout instead of Thread.sleep [XuMingmin] [BEAM-3176] support drop table (#4184) [kirpichov] Two fixes to common URN handling [arnaudfournier921] Improve Javadoc ° minor fixes [dkulp] [BEAM-3581] Make sure calcite gets an appropriate charset PRIOR to any [ehudm] Print correct line numbers for warnings. [Pablo] Adding Gauge metric to Python SDK. [Pablo] Fix lint issue [wcn] Improve rendering of DOT diagrams. 
[lcwik] [BEAM-3626] Add a handler capable of executing a window mapping fn on a [Pablo] Addressing comments [herohde] Update Go SDK coder constants [lcwik] [BEAM-3339] Fix failing post-release test by running groovy from gradle, [batbat] Fixed a bug that timer ID was not used for comparing timer data. Added [ccy] Use TestClock when TestStream is present in pipeline [klk] Function interface for Fn API instructions [cclauss] Exception.message was removed in Python 3 [iemejia] [BEAM-3697] Add Maven profile to run error-prone static analysis [iemejia] [BEAM-3697] Fix MoreFutures errorprone [alan] [BEAM-3695] Fix failing validates container test [aromanenko.dev] [BEAM-3228] Fix flaky Kinesis tests [tgroh] Add a multi-stage fuser [XuMingmin] [BEAM-3345][SQL] Reject unsupported inputs into JOIN (#4642) [kedin] Update 'PCollectionTuple.apply()' generic definition [kedin] [SQL] Refactor BeamSql [tgroh] fixup! Add a multi-stage fuser [Pablo] Addressing comments. [luke_zhu] Support Python 3 in the metrics, internal, typehints, and utils modules. [apilloud] [Nexmark][SQL] Use Timestamp type for timestamps [dcavazos] Added snippets for BigQueryIO, serializable functions, Dynamic [apilloud] [Nexmark][SQL] Implement Query5 [dcavazos] Use beam.io.WriteToBigQuery() [rangadi] Fix unbounded reader leak in direct-runner. Also close the reader at end [rangadi] `extractOutput()` ended up resetting underlying aggregation. This is due [rangadi] review comments. [kirpichov] Updates BigQuery dependency version [XuMingmin] Bump calcite version to 1.15.0 (#4692) [ccy] Use the switching DirectRunner implementation [cclauss] long was renamed to int in Python 3 (en masse) [dkulp] [BEAM-3640] Part1: Update Checkstyle to enforce blank lines for imports
[tgroh] fixup! Add a multi-stage fuser [kenn] Ignore IntelliJ Gradle build outputs [Pablo] Renaming MetricAggregator.zero to MetricAggregator.identity_element [apilloud] [Nexmark][SQL] Implement Query7 [kenn] Add hints for IntelliJ owned output dirs [iemejia] Update byte-buddy to version 1.7.10 (adds support for Java 9) [iemejia] Update google-auto-value to version 1.5.3 [iemejia] Pin maven-gpg-plugin to version 1.6 [iemejia] Pin missing version for license-maven-plugin to version 1.14 [iemejia] [BEAM-2530] Fix dependencies for XmlIO on Java 9 [iemejia] [BEAM-2530] Add a java 9 profile to parent pom [iemejia] Add extra-enforcer-rules maven plugin to version 1.0-beta-7 [iemejia] Add ignore rule for multi-release jars compatible with Java 8 [sidhom] Use maven-publish plugin to publish java artifacts [1028332163] replace mockito-all [1028332163] replace mockito [1028332163] replace mockito-all and harcrest-all [1028332163] replace mockito-all and harcrest-all [1028332163] replace mockito-all and harcrest-all [1028332163] replace mockito-all and harcrest-all [1028332163] replace mockito-all and harcrest-all [1028332163] replace mockito-all and harcrest-all [1028332163] replace mockito-all and harcrest-all [1028332163] replace mockito-all and harcrest-all [1028332163] ignoredUnusedDeclaredDependencies [lcwik] [BEAM-3690] swapping to use mockito-core, hamcrest-core and [github] Updates javadocs of Setup and Teardown [1028332163] ban hamcrest-all and mockito-all [1028332163] ban mockito-all and hamcrest-all [willy] [BEAM-3662] Port MongoDbIOTest off DoFnTester [lukasz.gajowy] [BEAM-3456] Re-enable JDBC performance test [lukasz.gajowy] fixup! [BEAM-3456] Re-enable JDBC performance test [lukasz.gajowy] fixup! fixup! [BEAM-3456] Re-enable JDBC performance test [kirpichov] Updates BigQuery documentation [luke_zhu] Revert invalid use of io.StringIO in utils/profiler.py [Pablo] Plumbing Gauge metrics through the Fn API.
[dkulp] [BEAM-3640] part 2 - add [kenn] Spotless gradle: remove extraneous globbing of all java everywhere [rmannibucau] ensure pipeline options setup is using contextual classloader and not [kenn] Explicitly exclude some troublesome optional deps from [Pablo] Fixing nanosecond translation issue in Gauge Fn API translation. [lcwik] Break fusion for a ParDo which has State or Timers [boyuanz] Add distribution counter implementation [kirpichov] A relative directory should be applied (if specified) even when using a [kenn] Explicitly exclude further optional deps from elasticsearch-hadoop [lcwik] [BEAM-2573] Don't force importing filesystems, if they fail then give up [tgroh] Use Conccurrent Constructs in InMemoryArtifactStagerService [kirpichov] Adds more logging of BigQuery jobs and makes load job naming more [tgroh] Add Environment Manager Interfaces [aromanenko.dev] [BEAM-3538] Remove (or merge) Java 8 specific tests module into the main [iemejia] [BEAM-3632] Add missing partitioning parameter in WriteTables [iemejia] [BEAM-3632] Add TableDestination.withTableReference and fix WriteTables [rmannibucau] [BEAM-3728][BEAM-3729] fixing the classloader lookup for pipeline [pawel.pk.kaczmarczyk] [BEAM-3317] Use fixed system time for testing [lukasz.gajowy] [BEAM-3732] Fix broken maven profiles [rangadi] Use TreeSet in place of PriorityQueue. [coheigea] Make sure there is a space between closing round bracket and opening [ehudm] Don't cache pubsub subscription prematurely. [robertwb] Add MultiMap side inputs to Python SDK. 
[altay] Fixing minor bugs: [ankurgoenka] Making default thread count 12 [rangadi] update checksWithMultipleMerges() to check for multiple merges by [holden] First pass at fixing all of E999 (invalid parsing) errors in Py3 found [aljoscha.krettek] Make parameter of DoFnRunners.lateDataDroppingRunner() more specific [aljoscha.krettek] Allow overriding DoFnRunners in subclasses of Flink DoFnOperator [aljoscha.krettek] Invoke finishBundle() before teardown() in DoFnOperator [aljoscha.krettek] [BEAM-2140] Ignore event-time timers in SplittableDoFnOperator [aljoscha.krettek] [BEAM-2140] Block DoFnOperator.close() if we have pending timers [aljoscha.krettek] [BEAM-2140] Don't use StatefulDoFnRunner when running SDF in FlinkRunner [aljoscha.krettek] Make ProcessFnRunner constructor public [aljoscha.krettek] [BEAM-2140] Use ProcessFnRunner in DoFnRunner for executing SDF [aljoscha.krettek] [BEAM-2140] Enable SDF tests for Flink Streaming Runner [aljoscha.krettek] [BEAM-2140] Enable SDF tests in gradle for Flink Streaming Runner [aljoscha.krettek] [BEAM-3727] Never shutdown sources in Flink Streaming execution mode [mairbek] Changes multipe structuredValue implementations. Most notably, updates [tgroh] Make Impulse#create() visible [XuMingmin] [BEAM-3634] Refactor BeamRelNode to return a PTransform (#4705) [XuMingmin] [BEAM-591] KafkaIO : Improve watermarks and support server side [holden] Fix some raise_from to reraise. [holden] vcfio somehow has some sort issues. It's not overly important and [wtanaka] [BEAM-3034] Utilize some java 5 features [xumingmingv] add setup/teardown for BeamSqlSeekableTable. [xumingmingv] rename method as suggested and declare as default methods. [yifanzou] [BEAM-3339] add python RC validation automation [github] Fixes typo in BigQueryIO javadoc [markliu] [BEAM-3750] Remove TestPipeline.covnertToArgs from integration tests [XuMingmin] [BEAM-591]: Update KafkaIO JavaDoc to reflect new timestamp API. 
(#4749) [tgroh] Add To/From Proto Round Trip for ExecutableStage [tgroh] Make GreedyStageFuser a Factory Class [kirpichov] Makes it possible to use Wait with default windowing, eg. in batch [xumingmingv] @Parameter annotation does not work for UDFs in Beam SQL [xumingmingv] fix type error [grzegorz.kolakowski] [BEAM-3043] Set user-specified PTransform names on Flink operators [coheigea] Avoid unnecessary autoboxing by replacing Integer/Long.valueOf with [tgroh] Update QueryablePipeline Factory Method Name [tgroh] Scan Core Construction NeedsRunner Tests [klk] Add MetricsTranslation [rmannibucau] extracting the scheduled executor service in a factory variable in SDF [echauchot] [BEAM-3681] Make S3FileSystem copy atomic for smaller than 5GB objects [echauchot] [BEAM-3681] Add a comment for the extra check of objectSize in [grzegorz.kolakowski] [BEAM-3753] Fix failing integration tests [grzegorz.kolakowski] [BEAM-3753] Rename *ITCase.java tests files to *Test.java [lcwik] [BEAM-3762] Update Dataflow worker container image to support unlimited [github] Bump container image tag to fix incompatibility between the SDK and the [github] Update dependency.py [sidhom] Run NeedsRunner tests from direct runner gradle build [ccy] Fix issue from incomplete removal of has_cache [yifanzou] [BEAM-3735] copy mobile-gaming sources in to archetypes [mairbek] Update SpannerIO to use Batch API. [sidhom] Address review comments [sidhom] Remove old sourceSets.test.output references [rangadi] Upate Readme and JavaDoc for KafkaIO. [robertwb] Avoid warning in our default runner. [github] [BEAM-3719] Adds support for reading side-inputs from SDFs [github] print() is a function in Python 3 [ehudm] Add Python lint check for calls to unittest.main. [tgroh] Add JavaReadViaImpulse to core-construction [robertwb] [maven-release-plugin] prepare branch release-2.4.0 [robertwb] [maven-release-plugin] prepare for next development iteration [robertwb] Bump Python dev version. 
[tgroh] Revert "extracting the scheduled executor service in a factory variable [rangadi] Mention support for older versions. [amyrvold] Beam runner inventory, run as a cron job, not on each CL [github] Fixing formatting bug in filebasedsink.py. [github] Fix lint issue. [ehudm] Add support for Pub/Sub messages with attributes. [amyrvold] [BEAM-3775] Increase timeout in [ehudm] Address review comment from PR #4744. [mariagh] Add TestClock to test [wcn] Update generated code to match import from 5e6db92 [XuMingmin] [BEAM-3754]: Fix readBytes() initialization (#4792) [daniel.o.programmer] [BEAM-3126] Fixing incorrect function call in bundle processor. [cclauss] Change unicode --> six.text_type for Python 3 [alan.myrvold] [BEAM-3621] Add Spring repo to build_rules to allow downloading pentaho [samuel.waggoner] [BEAM-3777] allow UDAFs to be indirect subclasses of CombineFn [ehudm] Improve FileBasedSink rename safety. [ehudm] Add missing import statements. [lcwik] [BEAM-3690] ban mockito-all and hamcrest-all [kenn] Bump sdks/go/container.pom.xml to 2.5.0-SNAPSHOT [amyrvold] [BEAM-3791] Update version number in build_rules.gradle [rmannibucau] Make StateInternals short state method defaulting to the implementation [apilloud] [Nexmark][SQL] Convert SQL Rows to Java models [apilloud] [Nexmark][SQL] Output java pojos [apilloud] [Nexmark][SQL] Add model tests for sql [tgroh] Fallback to byte equality in MutaitonDetection [herohde] [BEAM-3457] Drop Go Maven PreCommit [markliu] Fix integration tests that use NumberedSharedFiles [kedin] [SQL] Add support for ARRAY expression [kedin] [SQL] Support ARRAY in projections [kedin] [SQL] Implement array elements access expression [kedin] [SQL] Add support for ELEMENT(collection) function [kedin] [SQL] Add support for CARDINALITY(collection) [tgroh] Add an InProcess SdkHarness Rule [tgroh] Close Outstanding Control Clients on Service close [tgroh] Don't try to close a cancelled Multiplexer [herohde] [BEAM-3804] Build Go SDK container 
with gradle [herohde] [BEAM-3793] Validate provision response and add beamctl support [cclauss] [BEAM-1251] Fix basestring, file(), reduce(), and xrange() for Python 3 [apilloud] [Nexmark] Use queue for Query 6 model [apilloud] [Nexmark] Ensure enough data to produce output [altay] Fix topic URIs. [apilloud] [BEAM-3802] Remove broken cachedMetricResults [tgroh] fixup! Close Outstanding Control Clients on Service close [tgroh] fixup! Don't try to close a cancelled Multiplexer [tgroh] fixup! Add an InProcess SdkHarness Rule [cclauss] [BEAM-1251] Fix basestring for Python 3 - again [cclauss] [BEAM-1251] Change unicode --> six.text_type for Python 3 - again [herohde] Fix package name in Go container gradle build [apilloud] [Nexmark] Sickbay query 6 [mariand] Improve implementation of nexmark.StringsGenerator.nextExactString() [ehudm] Update Dataflow Beam container version. [chamikara] [BEAM-3734] Performance tests for XmlIO (#4747) [apilloud] Dataflow runner must supply attempted metrics [apilloud] Throw UnsupportedOperationException instead of returning null [robertwb] Remove obsolete MapTaskRunner. 
[ccy] Revert #4666 "Use beam.io.WriteToBigQuery()" [github] [BEAM-3806] Fix direct-runner hang (#4829) [kedin] [SQL] Support nested Rows [klk] Add GAUGE_DATA case to metricUpdatesFromProto [klk] Add Gauge metric tests to ensure value persists [klk] Add test for a trigger with windowed SQL query [klk] Increase whitelist of false detections in SdkCoreApiSurfaceTest [klk] Eliminate beam-model-fn-execution test-jar deps [klk] Eliminate beam-sdks-java-fn-execution test-jar deps [klk] Eliminate incorrect sdks-java-core test-jar deps [klk] Notate uses of beam-runners-core-java test-jar [chamikara] [BEAM-3217] Jenkins job for HadoopInputFormatIOIT (#4758) [ekirpichov] [BEAM-3741] Proto changes for splitting over Fn API [ekirpichov] Address comments [aljoscha.krettek] Annotate ParDoTest.duplicateTimerSetting with UsesTestStream [lcwik] [BEAM-3842] Allow static methods to be defined inside PipelineOptions. [lcwik] [BEAM-3843] Add a convenience method in ExperimentalOptions to check to [lcwik] [BEAM-3843] Migrate to using ExperimentalOptions.hasExperiment [tgroh] fixup! Don't try to close a cancelled Multiplexer [ehudm] Use Python 3 compatible string. [XuMingmin] [SQL] Add support for arrays of rows (#4857) [XuMingmin] [BEAM-2281][Sql] Use SqlFunctions.toBigDecimal not toString (#4865) ------------------------------------------ [...truncated 193.11 KB...] [INFO] Running org.apache.beam.sdk.io.tfrecord.TFRecordIOIT [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 468.073 s <<< FAILURE! - in org.apache.beam.sdk.io.tfrecord.TFRecordIOIT [ERROR] writeThenReadAll(org.apache.beam.sdk.io.tfrecord.TFRecordIOIT) Time elapsed: 468.073 s <<< ERROR! 
java.lang.RuntimeException: java.io.IOException: Failed to advance reader of source: hdfs://146.148.62.9:9000/TEXTIO_IT__1521115464780-00000-of-00003.tfrecord range [0, 19222204) at com.google.cloud.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.advance(WorkerCustomSources.java:605) at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.advance(ReadOperation.java:379) at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:185) at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:150) at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:74) at com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:381) at com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:353) at com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:284) at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:134) at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114) at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:101) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.IllegalStateException: Not a valid TFRecord. Fewer than 12 bytes. 
at org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkState(Preconditions.java:444) at org.apache.beam.sdk.io.TFRecordIO$TFRecordCodec.read(TFRecordIO.java:631) at org.apache.beam.sdk.io.TFRecordIO$TFRecordSource$TFRecordReader.readNextRecord(TFRecordIO.java:526) at org.apache.beam.sdk.io.CompressedSource$CompressedReader.readNextRecord(CompressedSource.java:426) at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.advanceImpl(FileBasedSource.java:473) at org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.advance(OffsetBasedSource.java:267) at com.google.cloud.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.advance(WorkerCustomSources.java:602) ... 14 more (402142cc52d644e0): java.io.IOException: Failed to advance reader of source: hdfs://146.148.62.9:9000/TEXTIO_IT__1521115464780-00000-of-00003.tfrecord range [0, 19222204) at com.google.cloud.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.advance(WorkerCustomSources.java:605) at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.advance(ReadOperation.java:379) at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:185) at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:150) at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:74) at com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:381) at com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:353) at com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:284) at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:134) at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114) at 
com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:101) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.IllegalStateException: Not a valid TFRecord. Fewer than 12 bytes. at org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkState(Preconditions.java:444) at org.apache.beam.sdk.io.TFRecordIO$TFRecordCodec.read(TFRecordIO.java:631) at org.apache.beam.sdk.io.TFRecordIO$TFRecordSource$TFRecordReader.readNextRecord(TFRecordIO.java:526) at org.apache.beam.sdk.io.CompressedSource$CompressedReader.readNextRecord(CompressedSource.java:426) at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.advanceImpl(FileBasedSource.java:473) at org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.advance(OffsetBasedSource.java:267) at com.google.cloud.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.advance(WorkerCustomSources.java:602) ... 
14 more (402142cc52d64c95): java.io.IOException: Failed to advance reader of source: hdfs://146.148.62.9:9000/TEXTIO_IT__1521115464780-00000-of-00003.tfrecord range [0, 19222204) at com.google.cloud.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.advance(WorkerCustomSources.java:605) at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.advance(ReadOperation.java:379) at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:185) at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:150) at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:74) at com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:381) at com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:353) at com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:284) at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:134) at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114) at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:101) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.IllegalStateException: Not a valid TFRecord. Fewer than 12 bytes. 
at org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkState(Preconditions.java:444) at org.apache.beam.sdk.io.TFRecordIO$TFRecordCodec.read(TFRecordIO.java:631) at org.apache.beam.sdk.io.TFRecordIO$TFRecordSource$TFRecordReader.readNextRecord(TFRecordIO.java:526) at org.apache.beam.sdk.io.CompressedSource$CompressedReader.readNextRecord(CompressedSource.java:426) at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.advanceImpl(FileBasedSource.java:473) at org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.advance(OffsetBasedSource.java:267) at com.google.cloud.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.advance(WorkerCustomSources.java:602) ... 14 more (402142cc52d6444a): java.io.IOException: Failed to advance reader of source: hdfs://146.148.62.9:9000/TEXTIO_IT__1521115464780-00000-of-00003.tfrecord range [0, 19222204) at com.google.cloud.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.advance(WorkerCustomSources.java:605) at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.advance(ReadOperation.java:379) at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:185) at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:150) at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:74) at com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:381) at com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:353) at com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:284) at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:134) at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114) at 
com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:101) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.IllegalStateException: Not a valid TFRecord. Fewer than 12 bytes. at org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkState(Preconditions.java:444) at org.apache.beam.sdk.io.TFRecordIO$TFRecordCodec.read(TFRecordIO.java:631) at org.apache.beam.sdk.io.TFRecordIO$TFRecordSource$TFRecordReader.readNextRecord(TFRecordIO.java:526) at org.apache.beam.sdk.io.CompressedSource$CompressedReader.readNextRecord(CompressedSource.java:426) at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.advanceImpl(FileBasedSource.java:473) at org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.advance(OffsetBasedSource.java:267) at com.google.cloud.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.advance(WorkerCustomSources.java:602) ... 14 more (ca56e8b0d6266738): Workflow failed. Causes: (d03bc8b58b44abcf): S02:TFRecordIO.Read/Read+Transform bytes to strings/Map+Calculate hashcode/WithKeys/AddKeys/Map+Calculate hashcode/Combine.perKey(Hashing)/GroupByKey+Calculate hashcode/Combine.perKey(Hashing)/Combine.GroupedValues/Partial+Calculate hashcode/Combine.perKey(Hashing)/GroupByKey/Reify+Calculate hashcode/Combine.perKey(Hashing)/GroupByKey/Write failed., (40549a95d3ed11a8): A work item was attempted 4 times without success. Each time the worker eventually lost contact with the service. 
The work item was attempted on: tfrecordioit0writethenrea-03150508-fd86-harness-v056, tfrecordioit0writethenrea-03150508-fd86-harness-v056, tfrecordioit0writethenrea-03150508-fd86-harness-v056, tfrecordioit0writethenrea-03150508-fd86-harness-v056 at org.apache.beam.runners.dataflow.TestDataflowRunner.run(TestDataflowRunner.java:133) at org.apache.beam.runners.dataflow.TestDataflowRunner.run(TestDataflowRunner.java:89) at org.apache.beam.runners.dataflow.TestDataflowRunner.run(TestDataflowRunner.java:54) at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311) at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:346) at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:328) at org.apache.beam.sdk.io.tfrecord.TFRecordIOIT.writeThenReadAll(TFRecordIOIT.java:127) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:317) at org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:317) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at 
org.apache.maven.surefire.junitcore.pc.Scheduler$1.run(Scheduler.java:410)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[INFO]
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR]   TFRecordIOIT.writeThenReadAll:127 Runtime java.io.IOException: Failed to adv...
[INFO]
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
[INFO]
[INFO]
[INFO] --- maven-dependency-plugin:3.0.2:analyze-only (default) @ beam-sdks-java-io-file-based-io-tests ---
[WARNING] Used undeclared dependencies found:
[WARNING]    javax.xml.bind:jaxb-api:jar:2.2.2:runtime
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 08:27 min
[INFO] Finished at: 2018-03-15T12:12:14Z
[INFO] Final Memory: 104M/1393M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:3.0.2:analyze-only (default) on project beam-sdks-java-io-file-based-io-tests: Dependency problems found -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:3.0.2:analyze-only (default) on project beam-sdks-java-io-file-based-io-tests: Dependency problems found
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:213)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:154)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:146)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
    at
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81) at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:51) at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128) at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:309) at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:194) at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:107) at org.apache.maven.cli.MavenCli.execute (MavenCli.java:955) at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:290) at org.apache.maven.cli.MavenCli.main (MavenCli.java:194) at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke (Method.java:498) at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:289) at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:229) at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:415) at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:356) Caused by: org.apache.maven.plugin.MojoExecutionException: Dependency problems found at org.apache.maven.plugins.dependency.analyze.AbstractAnalyzeMojo.execute (AbstractAnalyzeMojo.java:254) at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:134) at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:208) at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:154) at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:146) at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117) at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81) at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:51) at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128) at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:309) at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:194) at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:107) at org.apache.maven.cli.MavenCli.execute (MavenCli.java:955) at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:290) at org.apache.maven.cli.MavenCli.main (MavenCli.java:194) at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke (Method.java:498) at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:289) at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:229) at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:415) at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:356) [ERROR] [ERROR] Re-run Maven using the -X switch to enable full debug logging. 
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException

STDERR: 2018-03-15 12:12:14,593 240db462 MainThread beam_integration_benchmark(1/1) ERROR Error during benchmark beam_integration_benchmark
Traceback (most recent call last):
  File "<https://builds.apache.org/job/beam_PerformanceTests_TFRecordIOIT_HDFS/ws/PerfKitBenchmarker/perfkitbenchmarker/pkb.py>", line 623, in RunBenchmark
    DoRunPhase(spec, collector, detailed_timer)
  File "<https://builds.apache.org/job/beam_PerformanceTests_TFRecordIOIT_HDFS/ws/PerfKitBenchmarker/perfkitbenchmarker/pkb.py>", line 526, in DoRunPhase
    samples = spec.BenchmarkRun(spec)
  File "<https://builds.apache.org/job/beam_PerformanceTests_TFRecordIOIT_HDFS/ws/PerfKitBenchmarker/perfkitbenchmarker/linux_benchmarks/beam_integration_benchmark.py>", line 159, in Run
    job_type=job_type)
  File "<https://builds.apache.org/job/beam_PerformanceTests_TFRecordIOIT_HDFS/ws/PerfKitBenchmarker/perfkitbenchmarker/providers/gcp/gcp_dpb_dataflow.py>", line 90, in SubmitJob
    assert retcode == 0, "Integration Test Failed."
AssertionError: Integration Test Failed.
2018-03-15 12:12:14,595 240db462 MainThread beam_integration_benchmark(1/1) INFO Cleaning up benchmark beam_integration_benchmark
2018-03-15 12:12:14,596 240db462 MainThread beam_integration_benchmark(1/1) INFO Running: kubectl --kubeconfig=<https://builds.apache.org/job/beam_PerformanceTests_TFRecordIOIT_HDFS/ws/config-filebasedioithdfs-1521114081611> delete -f <https://builds.apache.org/job/beam_PerformanceTests_TFRecordIOIT_HDFS/ws/src/.test-infra/kubernetes/hadoop/SmallITCluster/hdfs-single-datanode-cluster.yml>
2018-03-15 12:12:15,007 240db462 MainThread beam_integration_benchmark(1/1) INFO Running: kubectl --kubeconfig=<https://builds.apache.org/job/beam_PerformanceTests_TFRecordIOIT_HDFS/ws/config-filebasedioithdfs-1521114081611> delete -f <https://builds.apache.org/job/beam_PerformanceTests_TFRecordIOIT_HDFS/ws/src/.test-infra/kubernetes/hadoop/SmallITCluster/hdfs-single-datanode-cluster-for-local-dev.yml>
2018-03-15 12:12:15,153 240db462 MainThread beam_integration_benchmark(1/1) ERROR Exception running benchmark
Traceback (most recent call last):
  File "<https://builds.apache.org/job/beam_PerformanceTests_TFRecordIOIT_HDFS/ws/PerfKitBenchmarker/perfkitbenchmarker/pkb.py>", line 733, in RunBenchmarkTask
    RunBenchmark(spec, collector)
  File "<https://builds.apache.org/job/beam_PerformanceTests_TFRecordIOIT_HDFS/ws/PerfKitBenchmarker/perfkitbenchmarker/pkb.py>", line 623, in RunBenchmark
    DoRunPhase(spec, collector, detailed_timer)
  File "<https://builds.apache.org/job/beam_PerformanceTests_TFRecordIOIT_HDFS/ws/PerfKitBenchmarker/perfkitbenchmarker/pkb.py>", line 526, in DoRunPhase
    samples = spec.BenchmarkRun(spec)
  File "<https://builds.apache.org/job/beam_PerformanceTests_TFRecordIOIT_HDFS/ws/PerfKitBenchmarker/perfkitbenchmarker/linux_benchmarks/beam_integration_benchmark.py>", line 159, in Run
    job_type=job_type)
  File "<https://builds.apache.org/job/beam_PerformanceTests_TFRecordIOIT_HDFS/ws/PerfKitBenchmarker/perfkitbenchmarker/providers/gcp/gcp_dpb_dataflow.py>", line 90, in SubmitJob
    assert retcode == 0, "Integration Test Failed."
AssertionError: Integration Test Failed.
2018-03-15 12:12:15,154 240db462 MainThread beam_integration_benchmark(1/1) ERROR Benchmark 1/1 beam_integration_benchmark (UID: beam_integration_benchmark0) failed. Execution will continue.
2018-03-15 12:12:15,155 240db462 MainThread beam_integration_benchmark(1/1) INFO Benchmark run statuses:
---------------------------------------------------------------------------------
Name                        UID                          Status  Failed Substatus
---------------------------------------------------------------------------------
beam_integration_benchmark  beam_integration_benchmark0  FAILED
---------------------------------------------------------------------------------
Success rate: 0.00% (0/1)
2018-03-15 12:12:15,155 240db462 MainThread beam_integration_benchmark(1/1) INFO Complete logs can be found at: <https://builds.apache.org/job/beam_PerformanceTests_TFRecordIOIT_HDFS/ws/runs/240db462/pkb.log>
2018-03-15 12:12:15,155 240db462 MainThread beam_integration_benchmark(1/1) INFO Completion statuses can be found at: <https://builds.apache.org/job/beam_PerformanceTests_TFRecordIOIT_HDFS/ws/runs/240db462/completion_statuses.json>
Build step 'Execute shell' marked build as failure