[jira] [Commented] (BEAM-7526) beam_PostCommit_SQL suite is failing since June 7.
[ https://issues.apache.org/jira/browse/BEAM-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860600#comment-16860600 ]

Charith Ellawala commented on BEAM-7526:
----------------------------------------

I'll take a look.

> beam_PostCommit_SQL suite is failing since June 7.
> --------------------------------------------------
>
>                 Key: BEAM-7526
>                 URL: https://issues.apache.org/jira/browse/BEAM-7526
>             Project: Beam
>          Issue Type: Bug
>          Components: dsl-sql
>            Reporter: Valentyn Tymofieiev
>            Assignee: Reuven Lax
>            Priority: Major
>
> Error Message:
> java.lang.UnsupportedOperationException: Converting BigQuery type 'class
> com.google.api.client.util.ArrayMap' to 'FieldType{typeName=STRING,
> nullable=false, logicalType=null, collectionElementType=null,
> mapKeyType=null, mapValueType=null, rowSchema=null, metadata={}}' is not
> supported
>
> Stacktrace:
> java.lang.UnsupportedOperationException: Converting BigQuery type 'class
> com.google.api.client.util.ArrayMap' to 'FieldType{typeName=STRING,
> nullable=false, logicalType=null, collectionElementType=null,
> mapKeyType=null, mapValueType=null, rowSchema=null, metadata={}}' is not
> supported
> 	at org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils.toBeamValue(BigQueryUtils.java:457)
> 	at org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils.lambda$toBeamValue$11(BigQueryUtils.java:447)
> 	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
> 	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> 	...

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (BEAM-7527) Beam Python integration test suites are flaky: ModuleNotFoundError
[ https://issues.apache.org/jira/browse/BEAM-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Valentyn Tymofieiev updated BEAM-7527:
--------------------------------------
    Description:

I am seeing several errors in Python SDK integration test suites, such as Dataflow ValidatesRunner and Python PostCommit, that fail because one of the autogenerated files is not found. For example:

{noformat}
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/__init__.py:84: UserWarning: Running the Apache Beam SDK on Python 3 is not yet fully supported. You may encounter buggy behavior or missing features.
  'Running the Apache Beam SDK on Python 3 is not yet fully supported. '
Failure: ModuleNotFoundError (No module named 'beam_runner_api_pb2') ... ERROR
======================================================================
ERROR: Failure: ModuleNotFoundError (No module named 'beam_runner_api_pb2')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/build/gradleenv/-1734967053/lib/python3.6/site-packages/nose/failure.py", line 39, in runTest
    raise self.exc_val.with_traceback(self.tb)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/build/gradleenv/-1734967053/lib/python3.6/site-packages/nose/loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/build/gradleenv/-1734967053/lib/python3.6/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/build/gradleenv/-1734967053/lib/python3.6/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/build/gradleenv/-1734967053/lib/python3.6/imp.py", line 245, in load_module
    return load_package(name, filename)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/build/gradleenv/-1734967053/lib/python3.6/imp.py", line 217, in load_package
    return _load(spec)
  File "<frozen importlib._bootstrap>", line 684, in _load
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/__init__.py", line 97, in <module>
    from apache_beam import coders
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/coders/__init__.py", line 19, in <module>
    from apache_beam.coders.coders import *
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/coders/coders.py", line 32, in <module>
    from apache_beam.coders import coder_impl
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/coders/coder_impl.py", line 44, in <module>
    from apache_beam.utils import windowed_value
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/utils/windowed_value.py", line 34, in <module>
    from apache_beam.utils.timestamp import MAX_TIMESTAMP
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/utils/timestamp.py", line 34, in <module>
    from apache_beam.portability import common_urns
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/portability/common_urns.py", line 25, in <module>
    from apache_beam.portability.api import metrics_pb2
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/portability/api/metrics_pb2.py", line 16, in <module>
    import beam_runner_api_pb2 as beam__runner__api__pb2
ModuleNotFoundError: No module named 'beam_runner_api_pb2'
{noformat}

{noformat}
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/__init__.py:84: UserWarning: Running the Apache Beam SDK on Python 3 is not yet fully supported. You may encounter buggy behavior or missing features.
  'Running the Apache Beam SDK on Python 3 is not yet fully supported. '
Failure: ModuleNotFoundError (No module named 'endpoints_pb2') ... ERROR
======================================================================
ERROR: Failure: ModuleNotFoundError (No module named 'endpoints_pb2')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/build/gradleenv/-1734967053/lib/python3.6/site-packages/nose/failure.py", line 39, in runTest
    ra
{noformat}
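The innermost failing frame is `metrics_pb2.py` doing a bare `import beam_runner_api_pb2`. Python 3 removed implicit relative imports, so a generated module importing a sibling this way only resolves if the package directory itself happens to be on `sys.path`. A minimal reproduction of that difference, using hypothetical module names as stand-ins for the generated `_pb2` files (this sketches the failure mode, not Beam's actual fix):

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Python 3 removed implicit relative imports: a module that does a bare
# `import sibling_pb2` (py2-style generated code) fails unless the import
# is package-relative or the package dir itself is on sys.path.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "api"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "a_pb2.py").write_text("VALUE = 42\n")
    (pkg / "b_pb2.py").write_text("import a_pb2\n")         # py2-style import
    (pkg / "c_pb2.py").write_text("from . import a_pb2\n")  # py3-safe import

    def try_import(mod):
        # Import the module in a fresh interpreter whose cwd is the tmp dir,
        # so `api` is importable but the package dir itself is not on sys.path.
        proc = subprocess.run([sys.executable, "-c", f"import {mod}"],
                              cwd=tmp, capture_output=True, text=True)
        return proc.returncode, proc.stderr

    bare_rc, bare_err = try_import("api.b_pb2")      # fails on Python 3
    relative_rc, _ = try_import("api.c_pb2")         # works

assert bare_rc != 0 and "ModuleNotFoundError" in bare_err
assert relative_rc == 0
```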
[jira] [Commented] (BEAM-7527) Beam Python integration test suites are flaky: ModuleNotFoundError
[ https://issues.apache.org/jira/browse/BEAM-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860467#comment-16860467 ]

Valentyn Tymofieiev commented on BEAM-7527:
-------------------------------------------

Same error in the Python SDK WordCount performance test suite: https://builds.apache.org/job/beam_PerformanceTests_WordCountIT_Py27/103/console

> Beam Python integration test suites are flaky: ModuleNotFoundError
> ------------------------------------------------------------------
>
>                 Key: BEAM-7527
>                 URL: https://issues.apache.org/jira/browse/BEAM-7527
>             Project: Beam
>          Issue Type: Bug
>          Components: test-failures
>            Reporter: Valentyn Tymofieiev
>            Priority: Major
[jira] [Commented] (BEAM-7425) Reading BigQuery Table Data into Java Classes(Pojo) Directly
[ https://issues.apache.org/jira/browse/BEAM-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860466#comment-16860466 ]

Kishan Kumar commented on BEAM-7425:
------------------------------------

[~kedin] Thanks for the clarification. For every other data type I can simply read the value as a String from the SchemaAndRecord class while parsing after reading from BQ, but for the NUMERIC data type it either fails with an exception or yields a value such as *last_update_amount=java.nio.HeapByteBuffer[pos=0 lim=16 cap=16]*. So instead of just asking for this fix, I asked for a new feature where I only need to pass a POJO class and my values are parsed and set automatically, which gets rid of all this parsing work. I don't want to use TableRow because of the following issue: https://stackoverflow.com/questions/48184043/bigqueryio-read-performance-very-slow-apache-beam?rq=1

> Reading BigQuery Table Data into Java Classes(Pojo) Directly
> ------------------------------------------------------------
>
>                 Key: BEAM-7425
>                 URL: https://issues.apache.org/jira/browse/BEAM-7425
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-avro, io-java-gcp
>    Affects Versions: 2.12.0
>         Environment: Dataflow
>            Reporter: Kishan Kumar
>            Priority: Major
>
> While developing my code I used the below code snippet to read the table data
> from BigQuery.
>
> {code:java}
> PCollection<ReasonCode> gpseEftReasonCodes = input
>     .apply("Reading xxyyzz",
>         BigQueryIO
>             .read(new ReadTable<>(ReasonCode.class))
>             .withoutValidation()
>             .withTemplateCompatibility()
>             .fromQuery("Select * from dataset.xxyyzz")
>             .usingStandardSql()
>             .withCoder(SerializableCoder.of(xxyyzz.class))
> {code}
> ReadTable class:
> {code:java}
> @DefaultSchema(JavaBeanSchema.class)
> public class ReadTable<T> implements SerializableFunction<SchemaAndRecord, T> {
>   private static final long serialVersionUID = 1L;
>   private static Gson gson = new Gson();
>   public static final Logger LOG = LoggerFactory.getLogger(ReadTable.class);
>   private final Counter countingRecords =
>       Metrics.counter(ReadTable.class, "Reading Records EFT Report");
>   private Class<T> class1;
>
>   public ReadTable(Class<T> class1) { this.class1 = class1; }
>
>   public T apply(SchemaAndRecord schemaAndRecord) {
>     Map<String, String> mapping = new HashMap<>();
>     int counter = 0;
>     try {
>       GenericRecord s = schemaAndRecord.getRecord();
>       org.apache.avro.Schema s1 = s.getSchema();
>       for (Field f : s1.getFields()) {
>         counter++;
>         mapping.put(f.name(), null == s.get(f.name()) ? null : String.valueOf(s.get(counter)));
>       }
>       countingRecords.inc();
>       JsonElement jsonElement = gson.toJsonTree(mapping);
>       return gson.fromJson(jsonElement, class1);
>     } catch (Exception mp) {
>       LOG.error("Found Wrong Mapping for the Record: " + mapping);
>       mp.printStackTrace();
>       return null;
>     }
>   }
> }
> {code}
> After reading the data from BigQuery and mapping it from SchemaAndRecord to
> the POJO, columns whose data type is NUMERIC came back as:
> {code}
> last_update_amount=java.nio.HeapByteBuffer[pos=0 lim=16 cap=16]
> {code}
> My expectation was to get the exact value, but I am getting the ByteBuffer.
> The version I am using is Apache Beam 2.12.0. If any more information is
> needed, please let me know.
>
> Way 2 tried:
> {code:java}
> GenericRecord s = schemaAndRecord.getRecord();
> org.apache.avro.Schema s1 = s.getSchema();
> for (Field f : s1.getFields()) {
>   counter++;
>   mapping.put(f.name(), null == s.get(f.name()) ? null : String.valueOf(s.get(counter)));
>   if (f.name().equalsIgnoreCase("reason_code_id")) {
>     BigDecimal numericValue = new Conversions.DecimalConversion()
>         .fromBytes((ByteBuffer) s.get(f.name()), Schema.create(s1.getType()), s1.getLogicalType());
>     System.out.println("Numeric Con" + numericValue);
>   } else {
>     System.out.println("Else Condition " + f.name());
>   }
> {code}
> Facing issue:
> {code}
> 2019-05-24 (14:10:37) org.apache.avro.AvroRuntimeException: Can't create a: RECORD
> {code}
>
> It would be great to have a method that maps BigQuery data onto the POJO
> schema: if I have 10 columns in BQ and my POJO needs only 5 of them,
> BigQueryIO should map only those 5 values into the Java class and reject the
> rest, which I am currently doing with considerable effort. The NUMERIC data
> type should be deserialized automatically while fetching data, as it is with
> TableRow.
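The HeapByteBuffer the reporter sees is the raw Avro encoding of BigQuery's NUMERIC type: a "decimal" logical type stored as a big-endian two's-complement unscaled integer, with NUMERIC's fixed scale of 9. A sketch of the decoding (this is not Beam's actual code path, and the helper name is made up; the Java equivalent is the `Conversions.DecimalConversion` call shown in the issue):

```python
from decimal import Decimal

def decode_bq_numeric(raw: bytes, scale: int = 9) -> Decimal:
    """Decode an Avro 'decimal' value: a big-endian two's-complement
    unscaled integer divided by 10**scale (scale 9 for BigQuery NUMERIC)."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)

# 12.5 at scale 9 is stored as the unscaled integer 12_500_000_000.
raw = (12_500_000_000).to_bytes(8, byteorder="big", signed=True)
assert decode_bq_numeric(raw) == Decimal("12.5")
```

The key point for the bug report is that the bytes are meaningless without the schema's scale, which is why a generic `String.valueOf` on the ByteBuffer prints the buffer object instead of the number.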
[jira] [Created] (BEAM-7527) Beam Python integration test suites are flaky: ModuleNotFoundError
Valentyn Tymofieiev created BEAM-7527:
--------------------------------------

             Summary: Beam Python integration test suites are flaky: ModuleNotFoundError
                 Key: BEAM-7527
                 URL: https://issues.apache.org/jira/browse/BEAM-7527
             Project: Beam
          Issue Type: Bug
          Components: test-failures
            Reporter: Valentyn Tymofieiev

I am seeing several errors in Python SDK integration test suites, such as Dataflow ValidatesRunner and Python PostCommit, that fail because one of the autogenerated files is not found.
[jira] [Work logged] (BEAM-6673) BigQueryIO.Read should automatically produce schemas
[ https://issues.apache.org/jira/browse/BEAM-6673?focusedWorklogId=257313&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257313 ]

ASF GitHub Bot logged work on BEAM-6673:
----------------------------------------
                Author: ASF GitHub Bot
            Created on: 11/Jun/19 01:38
            Start Date: 11/Jun/19 01:38
    Worklog Time Spent: 10m

Work Description: tvalentyn commented on issue #8620: [BEAM-6673] Add schema support to BigQuery reads
URL: https://github.com/apache/beam/pull/8620#issuecomment-500652691

This PR breaks the Beam SQL postcommit suite, see: https://issues.apache.org/jira/browse/BEAM-7526.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 257313)
    Time Spent: 2h 20m  (was: 2h 10m)

> BigQueryIO.Read should automatically produce schemas
> ----------------------------------------------------
>
>                 Key: BEAM-6673
>                 URL: https://issues.apache.org/jira/browse/BEAM-6673
>             Project: Beam
>          Issue Type: Sub-task
>          Components: io-java-gcp
>            Reporter: Reuven Lax
>            Assignee: Shehzaad Nakhoda
>            Priority: Major
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The output PCollections should contain
[jira] [Assigned] (BEAM-7526) beam_PostCommit_SQL suite is failing since June 7.
[ https://issues.apache.org/jira/browse/BEAM-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Valentyn Tymofieiev reassigned BEAM-7526:
-----------------------------------------
    Assignee: Reuven Lax
[jira] [Commented] (BEAM-7526) beam_PostCommit_SQL suite is failing since June 7.
[ https://issues.apache.org/jira/browse/BEAM-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860461#comment-16860461 ]

Valentyn Tymofieiev commented on BEAM-7526:
-------------------------------------------

https://github.com/apache/beam/pull/8620 seems to be the root cause, as it is the only relevant commit shown in the first failing run: https://builds.apache.org/job/beam_PostCommit_SQL/1746/

I cannot find a Jira username for @charithe, so assigning to [~reuvenlax]. cc: [~shehzaadn] [~kedin]
[jira] [Work logged] (BEAM-7428) ReadAllViaFileBasedSource does not output the timestamps of the read elements
[ https://issues.apache.org/jira/browse/BEAM-7428?focusedWorklogId=257311&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257311 ]

ASF GitHub Bot logged work on BEAM-7428:
----------------------------------------
                Author: ASF GitHub Bot
            Created on: 11/Jun/19 01:37
            Start Date: 11/Jun/19 01:37
    Worklog Time Spent: 10m

Work Description: jkff commented on issue #8741: [BEAM-7428] Output the timestamp on elements in ReadAllViaFileBasedSource
URL: https://github.com/apache/beam/pull/8741#issuecomment-500652579

I think we discussed this before, and concluded that reader.getCurrentTimestamp() ought to return a timestamp greater than the timestamp of the element being read, which is consistent with watermark logic everywhere in Beam (including SDF): you can output forward in time but not backward. Most BoundedSources indeed output at -INF, which is inconsistent with the above. On the other hand, you could argue that they do so because they assume their source itself has a timestamp of -INF, and what they really mean is "output at the timestamp of my source".

Issue Time Tracking
-------------------
    Worklog Id: (was: 257311)
    Time Spent: 4h  (was: 3h 50m)

> ReadAllViaFileBasedSource does not output the timestamps of the read elements
> -----------------------------------------------------------------------------
>
>                 Key: BEAM-7428
>                 URL: https://issues.apache.org/jira/browse/BEAM-7428
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Ismaël Mejía
>            Assignee: Ismaël Mejía
>            Priority: Minor
>          Time Spent: 4h
>  Remaining Estimate: 0h
>
> This differs from the implementation of JavaReadViaImpulse, which tackles a
> similar problem but does output the timestamps correctly.
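The ordering rule jkff describes can be stated as a one-line check. This is an illustrative sketch with made-up names, not Beam's API: outputs may move forward in time relative to the reader's current timestamp, never backward, and a reader reporting -INF (as most BoundedSources do) places no constraint at all.

```python
# Sentinel for the timestamp most BoundedSources report for their elements.
MIN_TIMESTAMP = float("-inf")

def checked_output_timestamp(reader_ts: float, output_ts: float) -> float:
    """Allow outputting at or after the reader's timestamp, never before."""
    if output_ts < reader_ts:
        raise ValueError("cannot output backward in time")
    return output_ts

# A reader at -INF permits any output timestamp:
assert checked_output_timestamp(MIN_TIMESTAMP, 0.0) == 0.0
```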
[jira] [Work logged] (BEAM-6947) TestGCSIO.test_last_updated (gcsio_test.py) fails when the current timezone offset and the timezone offset on 2nd of Jan 1970 differ
[ https://issues.apache.org/jira/browse/BEAM-6947?focusedWorklogId=257312&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257312 ]

ASF GitHub Bot logged work on BEAM-6947:
----------------------------------------
                Author: ASF GitHub Bot
            Created on: 11/Jun/19 01:37
            Start Date: 11/Jun/19 01:37
    Worklog Time Spent: 10m

Work Description: stale[bot] commented on issue #8180: [BEAM-6947] Fix for failing TestGCSIO.test_last_updated (gcsio_test.py) test
URL: https://github.com/apache/beam/pull/8180#issuecomment-500652616

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the d...@beam.apache.org list. Thank you for your contributions.

Issue Time Tracking
-------------------
    Worklog Id: (was: 257312)
    Time Spent: 40m  (was: 0.5h)

> TestGCSIO.test_last_updated (gcsio_test.py) fails when the current timezone
> offset and the timezone offset on 2nd of Jan 1970 differ
> ---------------------------------------------------------------------------
>
>                 Key: BEAM-6947
>                 URL: https://issues.apache.org/jira/browse/BEAM-6947
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Csaba Kassai
>            Assignee: Csaba Kassai
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> The test TestGCSIO.test_last_updated uses the timestamp 123456.78 as the last
> updated time. This timestamp is converted into a naive datetime in the
> FakeFile class get_metadata method (gcsio_test.py, line 72), then converted
> back to a timestamp in the GcsIO class last_updated method (gcsio.py). But the
> conversion is incorrect when the timezone offset differs between 1970 and now.
> In my case, Singapore is GMT+8 now but was only GMT+7:30 in 1970, so the test
> fails.
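The round-trip bug described above can be reproduced in a few lines. This is a sketch of the general pitfall, not the actual gcsio code: converting a naive UTC datetime back through the local clock picks up the machine's *current* offset, while interpreting it explicitly as UTC is exact.

```python
import calendar
import time
from datetime import datetime

LAST_UPDATED = 123456.78  # the test's timestamp: 2 Jan 1970 UTC

# utcfromtimestamp() yields a *naive* datetime; converting it back with
# time.mktime() reinterprets it in the machine's current local offset.
# In a zone whose offset changed since 1970 (Singapore: GMT+7:30 then,
# GMT+8 now) this round trip is off by the difference.
naive = datetime.utcfromtimestamp(LAST_UPDATED)
lossy = time.mktime(naive.timetuple()) + naive.microsecond / 1e6

# Treating the naive datetime explicitly as UTC makes the round trip exact
# regardless of the local zone's history:
exact = calendar.timegm(naive.timetuple()) + naive.microsecond / 1e6
assert abs(exact - LAST_UPDATED) < 1e-6
```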
[jira] [Created] (BEAM-7526) beam_PostCommit_SQL suite is failing since June 7.
Valentyn Tymofieiev created BEAM-7526:
--------------------------------------

             Summary: beam_PostCommit_SQL suite is failing since June 7.
                 Key: BEAM-7526
                 URL: https://issues.apache.org/jira/browse/BEAM-7526
             Project: Beam
          Issue Type: Bug
          Components: dsl-sql
            Reporter: Valentyn Tymofieiev
[jira] [Updated] (BEAM-6952) concatenated compressed files bug with python sdk
[ https://issues.apache.org/jira/browse/BEAM-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chamikara Jayalath updated BEAM-6952:
-------------------------------------
    Affects Version/s:     (was: 2.13.0)
                       Not applicable

> concatenated compressed files bug with python sdk
> -------------------------------------------------
>
>                 Key: BEAM-6952
>                 URL: https://issues.apache.org/jira/browse/BEAM-6952
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>    Affects Versions: Not applicable
>            Reporter: Daniel Lescohier
>            Priority: Major
>             Fix For: 2.14.0
>
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The Python apache_beam.io.filesystem module has a bug handling concatenated
> compressed files. The PR I will create has two commits:
> # a new unit test that shows the problem
> # a fix to the problem.
> The unit test is added to the apache_beam.io.filesystem_test module. It was
> added to this module because the test
> apache_beam.io.textio_test.test_read_gzip_concat does not encounter the
> problem in the Beam 2.11 and earlier code base: the test data is smaller than
> read_size, so it goes through logic in the code that avoids the problem. So
> this test sets read_size smaller and the test data bigger in order to trigger
> the problem. It would be difficult to test in the textio_test module, because
> you'd need very large test data: the default read_size is 16 MiB, and the
> ReadFromText interface does not allow you to modify the read_size.
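The PR itself is not quoted here, but the concatenated-gzip behaviour the fix must handle can be illustrated in a few lines: a single decompressor stops at the first member boundary, leaving the rest of the input in `unused_data`, which a buffered read path must feed to a fresh decompressor rather than discard.

```python
import gzip
import io
import zlib

# Two independently compressed gzip members concatenated into one stream,
# as produced by `cat a.gz b.gz > ab.gz`.
data = gzip.compress(b"hello ") + gzip.compress(b"world")

# A single decompressor stops at the end of the first member; the bytes of
# the next member land in unused_data and need a fresh decompressor. A read
# path that never checks unused_data silently drops everything after the
# first member once the input exceeds its read buffer.
d = zlib.decompressobj(16 + zlib.MAX_WBITS)  # +16: expect gzip framing
first = d.decompress(data)

# gzip.GzipFile handles member boundaries itself:
full = gzip.GzipFile(fileobj=io.BytesIO(data)).read()
```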
[jira] [Resolved] (BEAM-6952) concatenated compressed files bug with python sdk
[ https://issues.apache.org/jira/browse/BEAM-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chamikara Jayalath resolved BEAM-6952. -- Resolution: Fixed > concatenated compressed files bug with python sdk > - > > Key: BEAM-6952 > URL: https://issues.apache.org/jira/browse/BEAM-6952 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.11.0 >Reporter: Daniel Lescohier >Priority: Major > Fix For: 2.14.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > The Python apache_beam.io.filesystem module has a bug handling concatenated > compressed files. > The PR I will create has two commits: > # a new unit test that shows the problem > # a fix to the problem. > The unit test is added to the apache_beam.io.filesystem_test module. It was > added to this module because the test: > apache_beam.io.textio_test.test_read_gzip_concat does not encounter the > problem in the Beam 2.11 and earlier code base because the test data is too > small: the data is smaller than read_size, so it goes through logic in the > code that avoids the problem in the code. So, this test sets read_size > smaller and test data bigger, in order to encounter the problem. It would be > difficult to test in the textio_test module, because you'd need very large > test data because default read_size is 16MiB, and the ReadFromText interface > does not allow you to modify the read_size. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6952) concatenated compressed files bug with python sdk
[ https://issues.apache.org/jira/browse/BEAM-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chamikara Jayalath updated BEAM-6952: - Affects Version/s: (was: 2.11.0) 2.13.0 > concatenated compressed files bug with python sdk > - > > Key: BEAM-6952 > URL: https://issues.apache.org/jira/browse/BEAM-6952 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.13.0 >Reporter: Daniel Lescohier >Priority: Major > Fix For: 2.14.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > The Python apache_beam.io.filesystem module has a bug handling concatenated > compressed files. > The PR I will create has two commits: > # a new unit test that shows the problem > # a fix to the problem. > The unit test is added to the apache_beam.io.filesystem_test module. It was > added to this module because the test: > apache_beam.io.textio_test.test_read_gzip_concat does not encounter the > problem in the Beam 2.11 and earlier code base because the test data is too > small: the data is smaller than read_size, so it goes through logic in the > code that avoids the problem in the code. So, this test sets read_size > smaller and test data bigger, in order to encounter the problem. It would be > difficult to test in the textio_test module, because you'd need very large > test data because default read_size is 16MiB, and the ReadFromText interface > does not allow you to modify the read_size. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6952) concatenated compressed files bug with python sdk
[ https://issues.apache.org/jira/browse/BEAM-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860455#comment-16860455 ] Chamikara Jayalath commented on BEAM-6952: -- Yeah. closing. > concatenated compressed files bug with python sdk > - > > Key: BEAM-6952 > URL: https://issues.apache.org/jira/browse/BEAM-6952 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.11.0 >Reporter: Daniel Lescohier >Priority: Major > Fix For: 2.14.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > The Python apache_beam.io.filesystem module has a bug handling concatenated > compressed files. > The PR I will create has two commits: > # a new unit test that shows the problem > # a fix to the problem. > The unit test is added to the apache_beam.io.filesystem_test module. It was > added to this module because the test: > apache_beam.io.textio_test.test_read_gzip_concat does not encounter the > problem in the Beam 2.11 and earlier code base because the test data is too > small: the data is smaller than read_size, so it goes through logic in the > code that avoids the problem in the code. So, this test sets read_size > smaller and test data bigger, in order to encounter the problem. It would be > difficult to test in the textio_test module, because you'd need very large > test data because default read_size is 16MiB, and the ReadFromText interface > does not allow you to modify the read_size. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7399) Publishing a blog post about looping timers.
[ https://issues.apache.org/jira/browse/BEAM-7399?focusedWorklogId=257308&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257308 ] ASF GitHub Bot logged work on BEAM-7399: Author: ASF GitHub Bot Created on: 11/Jun/19 00:52 Start Date: 11/Jun/19 00:52 Worklog Time Spent: 10m Work Description: rezarokni commented on issue #8686: [BEAM-7399] Blog post on looping timers URL: https://github.com/apache/beam/pull/8686#issuecomment-500644848 @kennknowles fixed conflict and changed dates. Also when I make use of gradle to see the staged web, although this blog does not show up, I noticed the last blog in master also does not show up. Be interesting to see if on merge this magically appears This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257308) Time Spent: 50m (was: 40m) > Publishing a blog post about looping timers. > - > > Key: BEAM-7399 > URL: https://issues.apache.org/jira/browse/BEAM-7399 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Reza ardeshir rokni >Assignee: Reza ardeshir rokni >Priority: Trivial > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7043) Add DynamoDBIO
[ https://issues.apache.org/jira/browse/BEAM-7043?focusedWorklogId=257299&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257299 ] ASF GitHub Bot logged work on BEAM-7043: Author: ASF GitHub Bot Created on: 10/Jun/19 23:58 Start Date: 10/Jun/19 23:58 Worklog Time Spent: 10m Work Description: cmachgodaddy commented on issue #8390: [BEAM-7043] Add DynamoDBIO URL: https://github.com/apache/beam/pull/8390#issuecomment-500635479 @iemejia , I put together my implementation for the Reader with ScanResult (as return type) in the last commit, do you mind to take a look? However the test case of the Reader is failing, so I would love to have your instructions to get it work. And regarding the Writer with PCollection as return type as you requested, I still couldn't get it work (the reasons I explained above). Can you give me some instructions as well? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257299) Time Spent: 15.5h (was: 15h 20m) > Add DynamoDBIO > -- > > Key: BEAM-7043 > URL: https://issues.apache.org/jira/browse/BEAM-7043 > Project: Beam > Issue Type: New Feature > Components: io-java-aws >Reporter: Cam Mach >Assignee: Cam Mach >Priority: Minor > Fix For: 2.14.0 > > Time Spent: 15.5h > Remaining Estimate: 0h > > Currently we don't have any feature to write data to AWS DynamoDB. This > feature will enable us to send data to DynamoDB -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7044) Spark portable runner: support user state
[ https://issues.apache.org/jira/browse/BEAM-7044?focusedWorklogId=257298&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257298 ] ASF GitHub Bot logged work on BEAM-7044: Author: ASF GitHub Bot Created on: 10/Jun/19 23:57 Start Date: 10/Jun/19 23:57 Worklog Time Spent: 10m Work Description: ibzib commented on issue #8812: [BEAM-7044] portable Spark: support stateful dofns URL: https://github.com/apache/beam/pull/8812#issuecomment-500635374 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257298) Time Spent: 50m (was: 40m) > Spark portable runner: support user state > - > > Key: BEAM-7044 > URL: https://issues.apache.org/jira/browse/BEAM-7044 > Project: Beam > Issue Type: New Feature > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-7424) Retry HTTP 429 errors from GCS w/ exponential backoff when reading data
[ https://issues.apache.org/jira/browse/BEAM-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860421#comment-16860421 ] Chamikara Jayalath commented on BEAM-7424: -- There are a few things planned here: (1) Add retry logic for the Java and Python SDKs. (2) For the Dataflow runner, plumb throttled time to the Dataflow backend so it is considered when making autoscaling decisions. This bug is for (1). It would be great to get (2) into 2.14 as well, but I'm not sure the timelines will match. > Retry HTTP 429 errors from GCS w/ exponential backoff when reading data > --- > > Key: BEAM-7424 > URL: https://issues.apache.org/jira/browse/BEAM-7424 > Project: Beam > Issue Type: Bug > Components: io-java-gcp, io-python-gcp, sdk-py-core >Reporter: Chamikara Jayalath >Assignee: Heejong Lee >Priority: Blocker > Fix For: 2.14.0 > > > This has to be done for both Java and Python SDKs. > Seems like Java SDK already retries 429 errors w/o backoff (please verify): > [https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/RetryHttpRequestInitializer.java#L185] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
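A retry loop with jittered exponential backoff for 429 responses can be sketched as follows. This is a generic illustration, not the SDK's actual `RetryHttpRequestInitializer`; the `status` attribute on the exception is an assumed shape for the example:

```python
import random
import time

def call_with_backoff(request, max_attempts=5, base=1.0, cap=32.0):
    """Call `request` (a zero-arg callable), retrying HTTP 429 failures.

    Sleeps at most min(cap, base * 2**attempt) seconds between attempts,
    with full jitter so concurrent workers do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except Exception as exc:
            # Only 429 (rate limited) is retryable here; re-raise anything
            # else, and give up once the attempt budget is spent.
            if getattr(exc, "status", None) != 429 or attempt == max_attempts - 1:
                raise
            time.sleep(random.uniform(0.0, min(cap, base * 2 ** attempt)))
```

The cumulative sleep time accrued in such a loop is the "throttled time" that item (2) in the comment above would report to the Dataflow backend.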
[jira] [Assigned] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chamikara Jayalath reassigned BEAM-3788: Assignee: (was: Chamikara Jayalath) > Implement a Kafka IO for Python SDK > --- > > Key: BEAM-3788 > URL: https://issues.apache.org/jira/browse/BEAM-3788 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Jayalath >Priority: Major > > This will be implemented using the Splittable DoFn framework. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860419#comment-16860419 ] Chamikara Jayalath commented on BEAM-3788: -- This is blocked till we have a streaming source framework for Python SDK for portable runners. To this end, Splittable DoFn is currently under development. Note that, for folks using Flink runner, native Kafka source of Flink is currently available to Python SDK users through the cross-language transform API. > Implement a Kafka IO for Python SDK > --- > > Key: BEAM-3788 > URL: https://issues.apache.org/jira/browse/BEAM-3788 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Jayalath >Assignee: Chamikara Jayalath >Priority: Major > > This will be implemented using the Splittable DoFn framework. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6892) Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS if not specified by user.
[ https://issues.apache.org/jira/browse/BEAM-6892?focusedWorklogId=257293&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257293 ] ASF GitHub Bot logged work on BEAM-6892: Author: ASF GitHub Bot Created on: 10/Jun/19 23:37 Start Date: 10/Jun/19 23:37 Worklog Time Spent: 10m Work Description: stale[bot] commented on issue #8135: [BEAM-6892] Adding support for auto-creating buckets for BigQuery file loads URL: https://github.com/apache/beam/pull/8135#issuecomment-500631749 This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the d...@beam.apache.org list. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257293) Time Spent: 7h 10m (was: 7h) > Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS > if not specified by user. > --- > > Key: BEAM-6892 > URL: https://issues.apache.org/jira/browse/BEAM-6892 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Pablo Estrada >Priority: Major > Fix For: 2.15.0 > > Time Spent: 7h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=257289&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257289 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 10/Jun/19 23:28 Start Date: 10/Jun/19 23:28 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #8811: [BEAM-7389] Add element-wise transform common code URL: https://github.com/apache/beam/pull/8811#discussion_r292228951 ## File path: sdks/python/apache_beam/examples/snippets/util.py ## @@ -0,0 +1,35 @@ +import argparse Review comment: maybe add a util_test.py for the methods here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257289) Time Spent: 1.5h (was: 1h 20m) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7044) Spark portable runner: support user state
[ https://issues.apache.org/jira/browse/BEAM-7044?focusedWorklogId=257280&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257280 ] ASF GitHub Bot logged work on BEAM-7044: Author: ASF GitHub Bot Created on: 10/Jun/19 23:16 Start Date: 10/Jun/19 23:16 Worklog Time Spent: 10m Work Description: ibzib commented on issue #8812: [BEAM-7044] portable Spark: support stateful dofns URL: https://github.com/apache/beam/pull/8812#issuecomment-500627536 Run Java Spark PortableValidatesRunner Batch This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257280) Time Spent: 0.5h (was: 20m) > Spark portable runner: support user state > - > > Key: BEAM-7044 > URL: https://issues.apache.org/jira/browse/BEAM-7044 > Project: Beam > Issue Type: New Feature > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7044) Spark portable runner: support user state
[ https://issues.apache.org/jira/browse/BEAM-7044?focusedWorklogId=257281&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257281 ] ASF GitHub Bot logged work on BEAM-7044: Author: ASF GitHub Bot Created on: 10/Jun/19 23:16 Start Date: 10/Jun/19 23:16 Worklog Time Spent: 10m Work Description: ibzib commented on issue #8812: [BEAM-7044] portable Spark: support stateful dofns URL: https://github.com/apache/beam/pull/8812#issuecomment-500627601 Run Python Spark ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257281) Time Spent: 40m (was: 0.5h) > Spark portable runner: support user state > - > > Key: BEAM-7044 > URL: https://issues.apache.org/jira/browse/BEAM-7044 > Project: Beam > Issue Type: New Feature > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=257276&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257276 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 10/Jun/19 23:03 Start Date: 10/Jun/19 23:03 Worklog Time Spent: 10m Work Description: davidcavazos commented on issue #8811: [BEAM-7389] Add element-wise transform code samples URL: https://github.com/apache/beam/pull/8811#issuecomment-500625003 This will add the common files infrastructure, I'll create (and link) other PRs for the rest of the transforms This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257276) Time Spent: 1h 20m (was: 1h 10m) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-7424) Retry HTTP 429 errors from GCS w/ exponential backoff when reading data
[ https://issues.apache.org/jira/browse/BEAM-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860399#comment-16860399 ] Anton Kedin commented on BEAM-7424: --- [~chamikara] [~heejong] what needs to be done here? Is the request to add retry logic to python SDK? It's unclear from the description. *(trying to understand the scope of changes for 2.14) > Retry HTTP 429 errors from GCS w/ exponential backoff when reading data > --- > > Key: BEAM-7424 > URL: https://issues.apache.org/jira/browse/BEAM-7424 > Project: Beam > Issue Type: Bug > Components: io-java-gcp, io-python-gcp, sdk-py-core >Reporter: Chamikara Jayalath >Assignee: Heejong Lee >Priority: Blocker > Fix For: 2.14.0 > > > This has to be done for both Java and Python SDKs. > Seems like Java SDK already retries 429 errors w/o backoff (please verify): > [https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/RetryHttpRequestInitializer.java#L185] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-7511) KafkaTable Initialization
[ https://issues.apache.org/jira/browse/BEAM-7511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alireza Samadianzakaria resolved BEAM-7511. --- Resolution: Fixed Fix Version/s: 2.14.0 > KafkaTable Initialization > - > > Key: BEAM-7511 > URL: https://issues.apache.org/jira/browse/BEAM-7511 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Alireza Samadianzakaria >Assignee: Alireza Samadianzakaria >Priority: Major > Fix For: 2.14.0 > > Time Spent: 2h > Remaining Estimate: 0h > > This exception is thrown when a Kafka table is created: > Exception in thread "main" java.lang.NullPointerException > at > org.apache.beam.sdk.io.kafka.KafkaIO.updateKafkaProperties(KafkaIO.java:1028) > at org.apache.beam.sdk.io.kafka.KafkaIO.access$900(KafkaIO.java:251) > at > org.apache.beam.sdk.io.kafka.KafkaIO$Read.withConsumerConfigUpdates(KafkaIO.java:814) > at > org.apache.beam.sdk.extensions.sql.meta.provider.kafka.BeamKafkaTable.buildIOReader(BeamKafkaTable.java:89) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:67) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:58) > at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537) > at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:488) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47) > at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47) > at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:36) > at > org.apache.beam.sdk.extensions.sql.example.MyKafkaExample.main(MyKafkaExample.java:76) > > This happens because in > org.apache.beam.sdk.extensions.sql.meta.provider.kafka, configupdates is not > initialized anywhere and the method updateConsumerProperties is never called. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
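The root cause above — an optional config map left null because the update method is never called — is avoided by initializing the map at construction so every later merge path is safe. A language-neutral sketch of that pattern in Python (illustrative only; the real fix lives in the Java `BeamKafkaTable`/`KafkaIO` classes):

```python
class KafkaTableConfig:
    """Sketch of the fix pattern: keep config_updates an empty dict from
    the start, so a merge -- or a merge that never happens -- cannot hit
    an uninitialized (None/null) field."""

    def __init__(self, bootstrap_servers):
        self.bootstrap_servers = bootstrap_servers
        self.config_updates = {}  # never None, unlike the buggy field

    def update_consumer_properties(self, updates):
        # Safe even if this is the first (or only) call.
        self.config_updates.update(updates)
        return self.config_updates
```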
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=257259&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257259 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 10/Jun/19 22:35 Start Date: 10/Jun/19 22:35 Worklog Time Spent: 10m Work Description: davidcavazos commented on issue #8811: [BEAM-7389] Add element-wise transform code samples URL: https://github.com/apache/beam/pull/8811#issuecomment-500619425 Alright, I'll try to split it into somewhat related transforms This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257259) Time Spent: 1h 10m (was: 1h) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=257258&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257258 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 10/Jun/19 22:34 Start Date: 10/Jun/19 22:34 Worklog Time Spent: 10m Work Description: aaltay commented on issue #8811: [BEAM-7389] Add element-wise transform code samples URL: https://github.com/apache/beam/pull/8811#issuecomment-500619228 @davidcavazos can you split this into multiple changes with ideally ~500 line PRs (or not more than 1000). It is tricky to find reviewers for very large changes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257258) Time Spent: 1h (was: 50m) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7044) Spark portable runner: support user state
[ https://issues.apache.org/jira/browse/BEAM-7044?focusedWorklogId=257257&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257257 ] ASF GitHub Bot logged work on BEAM-7044: Author: ASF GitHub Bot Created on: 10/Jun/19 22:33 Start Date: 10/Jun/19 22:33 Worklog Time Spent: 10m Work Description: ibzib commented on issue #8812: [BEAM-7044] portable Spark: support stateful dofns URL: https://github.com/apache/beam/pull/8812#issuecomment-500618942 Run Java Spark PortableValidatesRunner Batch This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257257) Time Spent: 20m (was: 10m) > Spark portable runner: support user state > - > > Key: BEAM-7044 > URL: https://issues.apache.org/jira/browse/BEAM-7044 > Project: Beam > Issue Type: New Feature > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6892) Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS if not specified by user.
[ https://issues.apache.org/jira/browse/BEAM-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada updated BEAM-6892: Fix Version/s: (was: 2.14.0) 2.15.0 > Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS > if not specified by user. > --- > > Key: BEAM-6892 > URL: https://issues.apache.org/jira/browse/BEAM-6892 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Pablo Estrada >Priority: Major > Fix For: 2.15.0 > > Time Spent: 7h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7425) Reading BigQuery Table Data into Java Classes(Pojo) Directly
[ https://issues.apache.org/jira/browse/BEAM-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-7425: -- Component/s: (was: beam-model) io-java-gcp io-java-avro > Reading BigQuery Table Data into Java Classes(Pojo) Directly > > > Key: BEAM-7425 > URL: https://issues.apache.org/jira/browse/BEAM-7425 > Project: Beam > Issue Type: New Feature > Components: io-java-avro, io-java-gcp >Affects Versions: 2.12.0 > Environment: Dataflow >Reporter: Kishan Kumar >Priority: Major > > While Developing my code I used the below code snippet to read the table data > from BigQuery. > > {code:java} > PCollection gpseEftReasonCodes = input > .apply("Reading xxyyzz", > BigQueryIO > .read(new ReadTable(ReasonCode.class)) > .withoutValidation() > .withTemplateCompatibility() > .fromQuery("Select * from dataset.xxyyzz") > .usingStandardSql() > .withCoder(SerializableCoder.of(xxyyzz.class)) > {code} > Read Table Class: > {code:java} > @DefaultSchema(JavaBeanSchema.class) > public class ReadTable implements SerializableFunction > { > private static final long serialVersionUID = 1L; > private static Gson gson = new Gson(); > public static final Logger LOG = LoggerFactory.getLogger(ReadTable.class); > private final Counter countingRecords = > Metrics.counter(ReadTable.class, "Reading Records EFT Report"); > private Class class1; > > public ReadTable(Class class1) { this.class1 = class1; } > > public T apply(SchemaAndRecord schemaAndRecord) { > Map mapping = new HashMap<>(); > int counter = 0; > try { > GenericRecord s = schemaAndRecord.getRecord(); > org.apache.avro.Schema s1 = s.getSchema(); > for (Field f : s1.getFields()) { > counter++; > mapping.put(f.name(), null==s.get(f.name()) ? 
null : > String.valueOf(s.get(counter))); > } > countingRecords.inc(); > JsonElement jsonElement = gson.toJsonTree(mapping); > return gson.fromJson(jsonElement, class1); > } catch (Exception mp) { > LOG.error("Found Wrong Mapping for the Record: "+mapping); > mp.printStackTrace(); return null; } > } > } > {code} > So After Reading the data from Bigquery I was mapping data from > SchemaAndRecord to pojo I was getting value for columns whose Data type is > Numeric mention below. > {code} > last_update_amount=java.nio.HeapByteBuffer[pos=0 lim=16 cap=16] > {code} > My Expectation was I will get exact value but getting the HyperByte Buffer > the version I am using is Apache beam 2.12.0. If any more information is > needed then please let me know. > Way 2 Tried: > {code:java} > GenericRecord s = schemaAndRecord.getRecord(); > org.apache.avro.Schema s1 = s.getSchema(); > for (Field f : s1.getFields()) { > counter++; > mapping.put(f.name(), null==s.get(f.name()) ? null : > String.valueOf(s.get(counter))); > if(f.name().equalsIgnoreCase("reason_code_id")) { > BigDecimal numericValue = new Conversions.DecimalConversion() >.fromBytes((ByteBuffer) s.get(f.name()), Schema.create(s1.getType()), > s1.getLogicalType()); >System.out.println("Numeric Con"+numericValue); > } else { > System.out.println("Else Condition "+f.name()); > } > {code} > Facing Issue: > {code} > 2019-05-24 (14:10:37) org.apache.avro.AvroRuntimeException: Can't create a: > RECORD > {code} > > It would be Great if we have a method which maps all the BigQuery Data with > Pojo Schema which Means if I have 10 Columns in BQ and in my Pojo I need only > 5 Column then, in that case, BigQueryIO should map only that 5 Data values > into Java Class and Rest will be Rejected As I am Doing After So much Effort. > Numeric Data Type must be Deserialize by itself while fetching data like > TableRow. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
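On the NUMERIC point in the report (`last_update_amount=java.nio.HeapByteBuffer[...]`): BigQuery's Avro export encodes NUMERIC with the Avro `decimal` logical type — a big-endian two's-complement unscaled integer in bytes, with scale 9. The decoding the reporter attempted via `Conversions.DecimalConversion` amounts to the following, shown here as a standalone Python sketch rather than the Beam/Avro Java API:

```python
import decimal

def decode_avro_decimal(raw: bytes, scale: int = 9) -> decimal.Decimal:
    # Avro `decimal` logical type: the bytes hold a big-endian
    # two's-complement unscaled integer.
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    # Shift the decimal point left by `scale` digits
    # (scale is fixed at 9 for BigQuery NUMERIC).
    return decimal.Decimal(unscaled).scaleb(-scale)
```

In the Java snippet above, the `AvroRuntimeException: Can't create a: RECORD` comes from passing the record's schema/logical type to the conversion instead of the individual field's; the bytes themselves decode as sketched here.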
[jira] [Work logged] (BEAM-7044) Spark portable runner: support user state
[ https://issues.apache.org/jira/browse/BEAM-7044?focusedWorklogId=257254&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257254 ] ASF GitHub Bot logged work on BEAM-7044: Author: ASF GitHub Bot Created on: 10/Jun/19 22:30 Start Date: 10/Jun/19 22:30 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #8812: [BEAM-7044] portable Spark: support stateful dofns URL: https://github.com/apache/beam/pull/8812 Depends on #8802. We need a new custom group by key function so that we can run the executable stage function separately for each key, while keeping the original key/value pairs. Something I'm not too proud of here is re-using helper functions within typing constraints set by guava `Iterables.transform`. I could've just as easily copied these functions into lambdas or something, so let me know what you think. R: @iemejia @angoenka @mxm Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/
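The custom group-by-key the PR above describes (running the executable stage separately per key while keeping the original key/value pairs) can be illustrated with a plain-Java sketch; the names here are hypothetical and this is not the Spark runner's actual implementation:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupKeepPairs {
    // Group key/value pairs by key, but keep each original pair intact in
    // the grouped output (rather than collapsing to bare values), so a
    // downstream stage can still see the full KV for every element.
    static <K, V> Map<K, List<Map.Entry<K, V>>> groupKeepingPairs(List<Map.Entry<K, V>> pairs) {
        return pairs.stream().collect(Collectors.groupingBy(Map.Entry::getKey));
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = List.of(
            Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3));
        // Each group still holds complete pairs, e.g. key "a" -> [a=1, a=3]
        System.out.println(groupKeepingPairs(pairs));
    }
}
```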
[jira] [Updated] (BEAM-7425) Reading BigQuery Table Data into Java Classes(Pojo) Directly
[ https://issues.apache.org/jira/browse/BEAM-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-7425: -- Flags: (was: Patch,Important) > Reading BigQuery Table Data into Java Classes(Pojo) Directly > (description unchanged; see the first update above)
[jira] [Updated] (BEAM-7425) Reading BigQuery Table Data into Java Classes(Pojo) Directly
[ https://issues.apache.org/jira/browse/BEAM-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-7425: -- Status: Open (was: Triage Needed) > Reading BigQuery Table Data into Java Classes(Pojo) Directly > Labels: newbie, patch > (description unchanged; see the first update above)
[jira] [Updated] (BEAM-7425) Reading BigQuery Table Data into Java Classes(Pojo) Directly
[ https://issues.apache.org/jira/browse/BEAM-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-7425: -- Labels: (was: newbie patch) > Reading BigQuery Table Data into Java Classes(Pojo) Directly > (description unchanged; see the first update above)
[jira] [Updated] (BEAM-7425) Reading BigQuery Table Data into Java Classes(Pojo) Directly
[ https://issues.apache.org/jira/browse/BEAM-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-7425: -- Fix Version/s: (was: 2.14.0) > Reading BigQuery Table Data into Java Classes(Pojo) Directly > (description unchanged; see the first update above)
[jira] [Commented] (BEAM-7425) Reading BigQuery Table Data into Java Classes(Pojo) Directly
[ https://issues.apache.org/jira/browse/BEAM-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860390#comment-16860390 ] Anton Kedin commented on BEAM-7425: --- It's not a bug or a feature request; it's a user question about how to convert the output of `BigQueryIO.read()` to POJOs. No concrete solution can be implemented for this in any specific release, so I am removing the fix version tag: it shows up during the release process and is not actionable by the release owner. I suggest asking on [u...@beam.apache.org|mailto:u...@beam.apache.org] or [d...@beam.apache.org|mailto:d...@beam.apache.org] if the issue is not resolved. Please take a look at the examples of how to use BigQueryIO.read(): [https://beam.apache.org/releases/javadoc/2.13.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html] You can use `readTableRows()` instead and get the parsed values out. Take a look at the snippets here: [https://github.com/apache/beam/blob/77cf84c634381495d45a112a9d147ad69394c0d4/examples/java/src/main/java/org/apache/beam/examples/snippets/Snippets.java#L168] Or follow the TableRowParser implementation for an example of how such a parser works: [https://github.com/apache/beam/blob/79d478a83be221461add1501e218b9a4308f9ec8/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L449] > Reading BigQuery Table Data into Java Classes(Pojo) Directly > (description unchanged; see the first update above)
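The feature requested in BEAM-7425 (map only the columns a POJO declares and ignore the rest) can be sketched with reflection over a generic row map. This is an illustration only, under the assumption that the row has already been parsed into a `Map`; `fromRow` and the nested `ReasonCode` are hypothetical names, not Beam APIs:

```java
import java.lang.reflect.Field;
import java.util.Map;

public class RowToPojo {
    // Copy only the map entries whose keys match a declared field of the
    // target class; columns the POJO does not declare are silently ignored.
    static <T> T fromRow(Map<String, Object> row, Class<T> type) {
        try {
            T obj = type.getDeclaredConstructor().newInstance();
            for (Field f : type.getDeclaredFields()) {
                Object v = row.get(f.getName());
                if (v != null) {
                    f.setAccessible(true);
                    f.set(obj, v);
                }
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static class ReasonCode {
        public String code;
        public String name;
    }

    public static void main(String[] args) {
        // The row has three columns; the POJO declares only two of them.
        Map<String, Object> row = Map.<String, Object>of(
            "code", "X1", "name", "test", "unused_col", 42);
        ReasonCode rc = fromRow(row, ReasonCode.class);
        System.out.println(rc.code + " " + rc.name);
    }
}
```

A production version would also need type coercion (e.g. the NUMERIC-to-BigDecimal decoding discussed above) rather than assuming values already match the field types.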
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=257249&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257249 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 10/Jun/19 22:22 Start Date: 10/Jun/19 22:22 Worklog Time Spent: 10m Work Description: davidcavazos commented on issue #8811: [BEAM-7389] Add element-wise transform code samples URL: https://github.com/apache/beam/pull/8811#issuecomment-500616314 R: @aaltay R: @chamikaramj R: @pabloem This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257249) Time Spent: 40m (was: 0.5h) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=257251&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257251 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 10/Jun/19 22:22 Start Date: 10/Jun/19 22:22 Worklog Time Spent: 10m Work Description: davidcavazos commented on issue #8811: [BEAM-7389] Add element-wise transform code samples URL: https://github.com/apache/beam/pull/8811#issuecomment-500616314 R: @aaltay R: @chamikaramj R: @pabloem Please review whenever you have a chance. It's a long list of changes, but it's mostly very simple code, so it shouldn't take too long. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257251) Time Spent: 50m (was: 40m) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=257248&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257248 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 10/Jun/19 22:21 Start Date: 10/Jun/19 22:21 Worklog Time Spent: 10m Work Description: davidcavazos commented on issue #8811: [BEAM-7389] Add element-wise transform code samples URL: https://github.com/apache/beam/pull/8811#issuecomment-500616084 R: @aaltay @chamikaramj @pabloem This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257248) Time Spent: 0.5h (was: 20m) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=257247&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257247 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 10/Jun/19 22:21 Start Date: 10/Jun/19 22:21 Worklog Time Spent: 10m Work Description: davidcavazos commented on issue #8811: [BEAM-7389] Add element-wise transform code samples URL: https://github.com/apache/beam/pull/8811#issuecomment-500616084 R: @aaltay @chamikaramj @pabloem This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257247) Time Spent: 20m (was: 10m) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=257246&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257246 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 10/Jun/19 22:20 Start Date: 10/Jun/19 22:20 Worklog Time Spent: 10m Work Description: davidcavazos commented on pull request #8811: [BEAM-7389] Add element-wise transform code samples URL: https://github.com/apache/beam/pull/8811 This adds Python code snippets for the element-wise transforms. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). 
Post-Commit Tests Status (on master branch): [standard Beam PR-template build badge table; see the BEAM-7044 worklog above]
[jira] [Assigned] (BEAM-7525) support hash_fn for Python ApproximateUniqueCombineFn transform
[ https://issues.apache.org/jira/browse/BEAM-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hannah Jiang reassigned BEAM-7525: -- Assignee: Hannah Jiang > support hash_fn for Python ApproximateUniqueCombineFn transform > --- > > Key: BEAM-7525 > URL: https://issues.apache.org/jira/browse/BEAM-7525 > Project: Beam > Issue Type: Task > Components: sdk-py-core >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Minor > > ApproximateUniqueCombineFn is using default hash function for estimation. > We can pass a hash_fn and overwrite default hash function to support better > estimation performance. > Parent issue: BEAM-6693 > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
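As context for the proposal, here is a minimal, standalone sketch (plain Python, not the actual apache_beam.transforms.stats API; all names are hypothetical) of how a pluggable hash_fn, defaulting to the built-in hash, fits into a k-minimum-values distinct-count estimator of the kind ApproximateUniqueCombineFn implements:

```python
import hashlib
import heapq

def approximate_unique(items, sample_size=16, hash_fn=hash):
    """Estimate the number of distinct items via a k-minimum-values sketch.

    `hash_fn` is pluggable and defaults to Python's built-in `hash`,
    mirroring what this ticket proposes for ApproximateUniqueCombineFn.
    """
    space = 2 ** 64
    heap = []        # max-heap (negated values) holding the smallest distinct hashes
    members = set()  # hash values currently held in the heap
    for item in items:
        h = hash_fn(item) % space
        if h in members:
            continue
        if len(heap) < sample_size:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # Replace the largest retained hash with the new, smaller one.
            members.discard(-heapq.heappushpop(heap, -h))
            members.add(h)
    if len(heap) < sample_size:
        return len(heap)  # exact: fewer distinct hashes than the sample size
    kth = -heap[0]        # k-th smallest hash value seen overall
    return int((sample_size - 1) * space / kth)

def blake2b_hash(value):
    # A better-spreading hash_fn a user might pass instead of the default.
    return int.from_bytes(
        hashlib.blake2b(str(value).encode(), digest_size=8).digest(), "big")
```

With the default `hash`, small Python ints hash to themselves, which spreads poorly for this estimator; a caller could pass a function such as `blake2b_hash` above instead, which is exactly the kind of flexibility the ticket asks for.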
[jira] [Updated] (BEAM-5519) Spark Streaming Duplicated Encoding/Decoding Effort
[ https://issues.apache.org/jira/browse/BEAM-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-5519: --- Fix Version/s: (was: 2.14.0) 2.15.0 > Spark Streaming Duplicated Encoding/Decoding Effort > --- > > Key: BEAM-5519 > URL: https://issues.apache.org/jira/browse/BEAM-5519 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Winkelman >Assignee: Kyle Winkelman >Priority: Major > Labels: spark, spark-streaming > Fix For: 2.15.0 > > Time Spent: 5h 10m > Remaining Estimate: 0h > > When using the SparkRunner in streaming mode. There is a call to groupByKey > followed by a call to updateStateByKey. BEAM-1815 fixed an issue where this > used to cause 2 shuffles but it still causes 2 encode/decode cycles. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5775) Make the spark runner not serialize data unless spark is spilling to disk
[ https://issues.apache.org/jira/browse/BEAM-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860383#comment-16860383 ] Ismaël Mejía commented on BEAM-5775: No, we still need to evaluate the performance consequences of the change, moving to next release. > Make the spark runner not serialize data unless spark is spilling to disk > - > > Key: BEAM-5775 > URL: https://issues.apache.org/jira/browse/BEAM-5775 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Mike Kaplinskiy >Assignee: Mike Kaplinskiy >Priority: Minor > Fix For: 2.14.0 > > Time Spent: 11h 10m > Remaining Estimate: 0h > > Currently for storage level MEMORY_ONLY, Beam does not coder-ify the data. > This lets Spark keep the data in memory avoiding the serialization round > trip. Unfortunately the logic is fairly coarse - as soon as you switch to > MEMORY_AND_DISK, Beam coder-ifys the data even though Spark might have chosen > to keep the data in memory, incurring the serialization overhead. > > Ideally Beam would serialize the data lazily - as Spark chooses to spill to > disk. This would be a change in behavior when using beam, but luckily Spark > has a solution for folks that want data serialized in memory - > MEMORY_AND_DISK_SER will keep the data serialized. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-5775) Make the spark runner not serialize data unless spark is spilling to disk
[ https://issues.apache.org/jira/browse/BEAM-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-5775: --- Fix Version/s: (was: 2.14.0) 2.15.0 > Make the spark runner not serialize data unless spark is spilling to disk > - > > Key: BEAM-5775 > URL: https://issues.apache.org/jira/browse/BEAM-5775 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Mike Kaplinskiy >Assignee: Mike Kaplinskiy >Priority: Minor > Fix For: 2.15.0 > > Time Spent: 11h 10m > Remaining Estimate: 0h > > Currently for storage level MEMORY_ONLY, Beam does not coder-ify the data. > This lets Spark keep the data in memory avoiding the serialization round > trip. Unfortunately the logic is fairly coarse - as soon as you switch to > MEMORY_AND_DISK, Beam coder-ifys the data even though Spark might have chosen > to keep the data in memory, incurring the serialization overhead. > > Ideally Beam would serialize the data lazily - as Spark chooses to spill to > disk. This would be a change in behavior when using beam, but luckily Spark > has a solution for folks that want data serialized in memory - > MEMORY_AND_DISK_SER will keep the data serialized. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5519) Spark Streaming Duplicated Encoding/Decoding Effort
[ https://issues.apache.org/jira/browse/BEAM-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860355#comment-16860355 ] Anton Kedin commented on BEAM-5519: --- It doesn't look like this should be treated as a blocker for 2.14 as well? If so, then I suggest removing the concrete fix version. > Spark Streaming Duplicated Encoding/Decoding Effort > --- > > Key: BEAM-5519 > URL: https://issues.apache.org/jira/browse/BEAM-5519 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Winkelman >Assignee: Kyle Winkelman >Priority: Major > Labels: spark, spark-streaming > Fix For: 2.14.0 > > Time Spent: 5h 10m > Remaining Estimate: 0h > > When using the SparkRunner in streaming mode, there is a call to groupByKey > followed by a call to updateStateByKey. BEAM-1815 fixed an issue where this > used to cause 2 shuffles but it still causes 2 encode/decode cycles. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6693) ApproximateUnique transform for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-6693?focusedWorklogId=257222&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257222 ] ASF GitHub Bot logged work on BEAM-6693: Author: ASF GitHub Bot Created on: 10/Jun/19 21:38 Start Date: 10/Jun/19 21:38 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #8799: [BEAM-6693] replace mmh3 with default hash function URL: https://github.com/apache/beam/pull/8799 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257222) Time Spent: 12.5h (was: 12h 20m) > ApproximateUnique transform for Python SDK > -- > > Key: BEAM-6693 > URL: https://issues.apache.org/jira/browse/BEAM-6693 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Hannah Jiang >Priority: Minor > Fix For: 2.14.0 > > Time Spent: 12.5h > Remaining Estimate: 0h > > Add a PTransform for estimating the number of distinct elements in a > PCollection and the number of distinct values associated with each key in a > PCollection KVs. > it should offer the same API as its Java counterpart: > https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateUnique.java -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6928) Make Python SDK custom Sink the default Sink for BigQuery
[ https://issues.apache.org/jira/browse/BEAM-6928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada updated BEAM-6928: Fix Version/s: (was: 2.14.0) 2.15.0 > Make Python SDK custom Sink the default Sink for BigQuery > - > > Key: BEAM-6928 > URL: https://issues.apache.org/jira/browse/BEAM-6928 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Blocker > Fix For: 2.15.0 > > > This is for 2.14.0 - please bump version to 2.14.0 when doing 2.13.0 release. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5775) Make the spark runner not serialize data unless spark is spilling to disk
[ https://issues.apache.org/jira/browse/BEAM-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860346#comment-16860346 ] Anton Kedin commented on BEAM-5775: --- Is this on track for 2.14 release (branch cut next week)? > Make the spark runner not serialize data unless spark is spilling to disk > - > > Key: BEAM-5775 > URL: https://issues.apache.org/jira/browse/BEAM-5775 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Mike Kaplinskiy >Assignee: Mike Kaplinskiy >Priority: Minor > Fix For: 2.14.0 > > Time Spent: 11h 10m > Remaining Estimate: 0h > > Currently for storage level MEMORY_ONLY, Beam does not coder-ify the data. > This lets Spark keep the data in memory avoiding the serialization round > trip. Unfortunately the logic is fairly coarse - as soon as you switch to > MEMORY_AND_DISK, Beam coder-ifys the data even though Spark might have chosen > to keep the data in memory, incurring the serialization overhead. > > Ideally Beam would serialize the data lazily - as Spark chooses to spill to > disk. This would be a change in behavior when using beam, but luckily Spark > has a solution for folks that want data serialized in memory - > MEMORY_AND_DISK_SER will keep the data serialized. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6892) Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS if not specified by user.
[ https://issues.apache.org/jira/browse/BEAM-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860345#comment-16860345 ] Anton Kedin commented on BEAM-6892: --- One of the linked PRs hasn't been touched in a while. Does it make sense to keep targeting 2.14 (branch will be cut next week)? > Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS > if not specified by user. > --- > > Key: BEAM-6892 > URL: https://issues.apache.org/jira/browse/BEAM-6892 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Pablo Estrada >Priority: Major > Fix For: 2.14.0 > > Time Spent: 7h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7511) KafkaTable Initialization
[ https://issues.apache.org/jira/browse/BEAM-7511?focusedWorklogId=257215&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257215 ] ASF GitHub Bot logged work on BEAM-7511: Author: ASF GitHub Bot Created on: 10/Jun/19 21:25 Start Date: 10/Jun/19 21:25 Worklog Time Spent: 10m Work Description: akedin commented on pull request #8797: [BEAM-7511] Fixes the bug in KafkaTable Initialization. URL: https://github.com/apache/beam/pull/8797 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257215) Time Spent: 2h (was: 1h 50m) > KafkaTable Initialization > - > > Key: BEAM-7511 > URL: https://issues.apache.org/jira/browse/BEAM-7511 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Alireza Samadianzakaria >Assignee: Alireza Samadianzakaria >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > This exception is thrown when a Kafka table is created: 
> Exception in thread "main" java.lang.NullPointerException > at > org.apache.beam.sdk.io.kafka.KafkaIO.updateKafkaProperties(KafkaIO.java:1028) > at org.apache.beam.sdk.io.kafka.KafkaIO.access$900(KafkaIO.java:251) > at > org.apache.beam.sdk.io.kafka.KafkaIO$Read.withConsumerConfigUpdates(KafkaIO.java:814) > at > org.apache.beam.sdk.extensions.sql.meta.provider.kafka.BeamKafkaTable.buildIOReader(BeamKafkaTable.java:89) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:67) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:58) > at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537) > at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:488) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47) > at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47) > at 
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:36) > at > org.apache.beam.sdk.extensions.sql.example.MyKafkaExample.main(MyKafkaExample.java:76) > > This happens because in > org.apache.beam.sdk.extensions.sql.meta.provider.kafka, configupdates is not > initialized anywhere and the method updateConsumerProperties is never called. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6952) concatenated compressed files bug with python sdk
[ https://issues.apache.org/jira/browse/BEAM-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860341#comment-16860341 ] Anton Kedin commented on BEAM-6952: --- This looks fixed, both PRs are merged now. Safe to close? > concatenated compressed files bug with python sdk > - > > Key: BEAM-6952 > URL: https://issues.apache.org/jira/browse/BEAM-6952 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.11.0 >Reporter: Daniel Lescohier >Priority: Major > Fix For: 2.14.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > The Python apache_beam.io.filesystem module has a bug handling concatenated > compressed files. > The PR I will create has two commits: > # a new unit test that shows the problem > # a fix to the problem. > The unit test is added to the apache_beam.io.filesystem_test module. It was > added to this module because the test > apache_beam.io.textio_test.test_read_gzip_concat does not encounter the > problem in the Beam 2.11 and earlier code base: its test data is smaller > than read_size, so it goes through logic that avoids the problem. So, the > new test sets read_size smaller and the test data bigger, in order to > encounter the problem. It would be difficult to test in the textio_test > module, because you'd need very large test data, since the default > read_size is 16 MiB and the ReadFromText interface does not allow you to > modify the read_size. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
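The fix itself lives in apache_beam.io.filesystem; the underlying pitfall can be shown with nothing but the standard library. A gzip stream may contain several concatenated members, and a single zlib decompressor stops at the end of the first one:

```python
import gzip
import zlib

# Two gzip members concatenated, as produced by e.g. `cat a.gz b.gz`.
data = gzip.compress(b"hello ") + gzip.compress(b"world")

# The gzip module itself handles multi-member streams:
assert gzip.decompress(data) == b"hello world"

# A single zlib.decompressobj stops at the end of the first member; the
# rest of the stream is left in `unused_data` and must be fed to a fresh
# decompressor -- forgetting this silently truncates the output.
def decompress_concatenated(raw):
    out = []
    while raw:
        d = zlib.decompressobj(zlib.MAX_WBITS | 16)  # 16 -> expect a gzip wrapper
        out.append(d.decompress(raw))
        raw = d.unused_data
    return b"".join(out)
```

This is only an illustration of the failure mode the new filesystem_test exercises, not the Beam code itself.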
[jira] [Commented] (BEAM-7144) Job re-scale fails on Flink 1.7
[ https://issues.apache.org/jira/browse/BEAM-7144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860339#comment-16860339 ] Anton Kedin commented on BEAM-7144: --- If this has been present in all Beam releases since a few versions back, do we still want to target/block a specific Beam release and keep pushing it out? > Job re-scale fails on Flink 1.7 > --- > > Key: BEAM-7144 > URL: https://issues.apache.org/jira/browse/BEAM-7144 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.11.0 >Reporter: Jozef Vilcek >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.14.0 > > > I am unable to rescale the job after moving it to Flink runner 1.7. What I am > doing is: > # Recompile the job code with only the Flink runner version swapped, 1.5 -> 1.7 > # Run the streaming job with parallelism 112 and maxParallelism 448 > # Wait until a checkpoint is taken > # Stop the job > # Run the job again with parallelism 224 and the checkpoint path to restore from > # The job fails > The same happens if I try to increase parallelism. This procedure works for > the same job compiled with Flink runner 1.5 and run on 1.5.0. It fails with > runner 1.7 on Flink 1.7.2. > Exception is: > {noformat} > java.lang.Exception: Exception while creating StreamOperatorStateContext. 
> at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.flink.util.FlinkException: Could not restore keyed > state backend for > WindowDoFnOperator_2b6af61dc418f10e82551367a7e7f78e_(83/224) from any of the > 1 provided restore options. > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:137) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:284) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135) > ... 
5 more > Caused by: java.lang.IndexOutOfBoundsException: Index: 101, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:653) > at java.util.ArrayList.get(ArrayList.java:429) > at > com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:42) > at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:805) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:759) > at > org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:315) > at > org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders.lambda$createV2PlusReader$0(StateTableByKeyGroupReaders.java:73) > at > org.apache.flink.runtime.state.KeyGroupPartitioner$PartitioningResultKeyGroupReader.readMappingsInKeyGroup(KeyGroupPartitioner.java:297) > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.readKeyGroupStateData(HeapKeyedStateBackend.java:492) > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.readStateHandleStateData(HeapKeyedStateBackend.java:453) > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:410) > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:358) > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:104) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:151) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:123) > ... 7 more{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-7437) Integration Test for BQ streaming inserts for streaming pipelines
[ https://issues.apache.org/jira/browse/BEAM-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanay Tummalapalli resolved BEAM-7437. -- Resolution: Fixed Thank you [~kedin]. Yes, the merged PR resolved this. > Integration Test for BQ streaming inserts for streaming pipelines > - > > Key: BEAM-7437 > URL: https://issues.apache.org/jira/browse/BEAM-7437 > Project: Beam > Issue Type: Test > Components: io-python-gcp >Affects Versions: 2.12.0 >Reporter: Tanay Tummalapalli >Assignee: Tanay Tummalapalli >Priority: Minor > Labels: test > Fix For: 2.14.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Integration Test for BigQuery Sink using Streaming Inserts for streaming > pipelines. > Integration tests currently exist for batch pipelines; they can also be added > for streaming pipelines using TestStream. This will be a precursor to the > failing integration test to be added for [BEAM-6611| > https://issues.apache.org/jira/browse/BEAM-6611]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7425) Reading BigQuery Table Data into Java Classes(Pojo) Directly
[ https://issues.apache.org/jira/browse/BEAM-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-7425: -- Description: While developing my code I used the below code snippet to read the table data from BigQuery.

{code:java}
PCollection gpseEftReasonCodes = input
    .apply("Reading xxyyzz", BigQueryIO
        .read(new ReadTable(ReasonCode.class))
        .withoutValidation()
        .withTemplateCompatibility()
        .fromQuery("Select * from dataset.xxyyzz")
        .usingStandardSql()
        .withCoder(SerializableCoder.of(xxyyzz.class))
{code}

Read Table class:

{code:java}
@DefaultSchema(JavaBeanSchema.class)
public class ReadTable implements SerializableFunction {
  private static final long serialVersionUID = 1L;
  private static Gson gson = new Gson();
  public static final Logger LOG = LoggerFactory.getLogger(ReadTable.class);
  private final Counter countingRecords =
      Metrics.counter(ReadTable.class, "Reading Records EFT Report");
  private Class class1;

  public ReadTable(Class class1) {
    this.class1 = class1;
  }

  public T apply(SchemaAndRecord schemaAndRecord) {
    Map mapping = new HashMap<>();
    int counter = 0;
    try {
      GenericRecord s = schemaAndRecord.getRecord();
      org.apache.avro.Schema s1 = s.getSchema();
      for (Field f : s1.getFields()) {
        counter++;
        mapping.put(f.name(), null == s.get(f.name()) ? null : String.valueOf(s.get(counter)));
      }
      countingRecords.inc();
      JsonElement jsonElement = gson.toJsonTree(mapping);
      return gson.fromJson(jsonElement, class1);
    } catch (Exception mp) {
      LOG.error("Found Wrong Mapping for the Record: " + mapping);
      mp.printStackTrace();
      return null;
    }
  }
}
{code}

After reading the data from BigQuery and mapping it from SchemaAndRecord to the POJO, the value I got for columns whose data type is NUMERIC is shown below.

{code}
last_update_amount=java.nio.HeapByteBuffer[pos=0 lim=16 cap=16]
{code}

I expected the exact value but am getting the HeapByteBuffer instead; the version I am using is Apache Beam 2.12.0. 
If any more information is needed, please let me know.

Second approach tried:

{code:java}
GenericRecord s = schemaAndRecord.getRecord();
org.apache.avro.Schema s1 = s.getSchema();
for (Field f : s1.getFields()) {
  counter++;
  mapping.put(f.name(), null == s.get(f.name()) ? null : String.valueOf(s.get(counter)));
  if (f.name().equalsIgnoreCase("reason_code_id")) {
    BigDecimal numericValue = new Conversions.DecimalConversion()
        .fromBytes((ByteBuffer) s.get(f.name()), Schema.create(s1.getType()), s1.getLogicalType());
    System.out.println("Numeric Con" + numericValue);
  } else {
    System.out.println("Else Condition " + f.name());
  }
{code}

Issue faced:

{code}
2019-05-24 (14:10:37) org.apache.avro.AvroRuntimeException: Can't create a: RECORD
{code}

It would be great to have a method that maps the BigQuery data onto the POJO schema, meaning that if I have 10 columns in BQ and my POJO needs only 5 of them, then BigQueryIO should map only those 5 values into the Java class and reject the rest, which I am currently doing by hand. The NUMERIC data type should be deserialized automatically while fetching data, as it is for TableRow. was: While Developing my code I used the below code snippet to read the table data from BigQuery. PCollection gpseEftReasonCodes = input. 
apply("Reading xxyyzz", BigQueryIO.read(new ReadTable(ReasonCode.class)) .withoutValidation().withTemplateCompatibility() .fromQuery("Select * from dataset.xxyyzz").usingStandardSql() .withCoder(SerializableCoder.of(xxyyzz.class)) Read Table Class: @DefaultSchema(JavaBeanSchema.class) public class ReadTable implements SerializableFunction { private static final long serialVersionUID = 1L; private static Gson gson = new Gson(); public static final Logger LOG = LoggerFactory.getLogger(ReadTable.class); private final Counter countingRecords = Metrics.counter(ReadTable.class,"Reading Records EFT Report"); private Class class1; public ReadTable(Class class1) { this.class1 = class1; } public T apply(SchemaAndRecord schemaAndRecord) { Map mapping = new HashMap<>(); int counter = 0; try { GenericRecord s = schemaAndRecord.getRecord(); org.apache.avro.Schema s1 = s.getSchema(); for (Field f : s1.getFields()) { counter++; mapping.put(f.name(), null==s.get(f.name())?null:String.valueOf(s.get(counter))); } countingRecords.inc(); JsonElement jsonElement = gson.toJsonTree(mapping); return gson.fromJson(jsonElement, class1); }catch(Exception mp) { LOG.error("Found Wrong Mapping for the Record: "+mapping); mp.printStackTrace(); return null; } } } So After Reading the data from Bigquery I was mapping data from SchemaAndRecord to pojo I was getting value for columns whose Data ty
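For readers hitting the same HeapByteBuffer surprise: BigQuery's NUMERIC arrives via Avro as a decimal logical type, i.e. a scaled big-endian two's-complement integer, which on the Java side Conversions.DecimalConversion turns into a BigDecimal. Here is a pure-Python sketch of that decoding (an illustration only, not Beam API; BigQuery NUMERIC uses scale 9):

```python
from decimal import Decimal

def decode_bq_numeric(raw, scale=9):
    # Avro's decimal logical type stores the unscaled value as a
    # big-endian two's-complement integer; BigQuery NUMERIC has scale 9.
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)

# 1.5 encoded with scale 9 is the unscaled integer 1_500_000_000.
raw = (1_500_000_000).to_bytes(8, "big", signed=True)
```

The point is that the bytes in the buffer are the unscaled integer, so reading them as a string (as the ReadTable mapping above does) cannot work; the scale from the logical type must be applied.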
[jira] [Created] (BEAM-7525) support hash_fn for Python ApproximateUniqueCombineFn transform
Hannah Jiang created BEAM-7525: -- Summary: support hash_fn for Python ApproximateUniqueCombineFn transform Key: BEAM-7525 URL: https://issues.apache.org/jira/browse/BEAM-7525 Project: Beam Issue Type: Task Components: sdk-py-core Reporter: Hannah Jiang ApproximateUniqueCombineFn uses the default hash function for estimation. We can pass a hash_fn and override the default hash function to support better estimation performance. Parent issue: BEAM-6693 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6693) ApproximateUnique transform for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-6693?focusedWorklogId=257193&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257193 ] ASF GitHub Bot logged work on BEAM-6693: Author: ASF GitHub Bot Created on: 10/Jun/19 20:57 Start Date: 10/Jun/19 20:57 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on pull request #8799: [BEAM-6693] replace mmh3 with default hash function URL: https://github.com/apache/beam/pull/8799#discussion_r292188523 ## File path: sdks/python/apache_beam/transforms/stats.py ## @@ -214,7 +212,7 @@ def create_accumulator(self, *args, **kwargs): Review comment: Ticket is created: https://issues.apache.org/jira/browse/BEAM-7525 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257193) Time Spent: 12h 20m (was: 12h 10m) > ApproximateUnique transform for Python SDK > -- > > Key: BEAM-6693 > URL: https://issues.apache.org/jira/browse/BEAM-6693 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Hannah Jiang >Priority: Minor > Fix For: 2.14.0 > > Time Spent: 12h 20m > Remaining Estimate: 0h > > Add a PTransform for estimating the number of distinct elements in a > PCollection and the number of distinct values associated with each key in a > PCollection KVs. > it should offer the same API as its Java counterpart: > https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateUnique.java -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-7524) Update Python dependencies page for 2.13.0
[ https://issues.apache.org/jira/browse/BEAM-7524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rose Nguyen closed BEAM-7524. - Resolution: Fixed Fix Version/s: Not applicable > Update Python dependencies page for 2.13.0 > -- > > Key: BEAM-7524 > URL: https://issues.apache.org/jira/browse/BEAM-7524 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: Rose Nguyen >Priority: Minor > Fix For: Not applicable > > Time Spent: 0.5h > Remaining Estimate: 0h > > Update Python dependencies page for 2.13.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7524) Update Python dependencies page for 2.13.0
[ https://issues.apache.org/jira/browse/BEAM-7524?focusedWorklogId=257190&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257190 ] ASF GitHub Bot logged work on BEAM-7524: Author: ASF GitHub Bot Created on: 10/Jun/19 20:49 Start Date: 10/Jun/19 20:49 Worklog Time Spent: 10m Work Description: melap commented on pull request #8810: [BEAM-7524] Update Python dependencies page for 2.13.0 URL: https://github.com/apache/beam/pull/8810 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257190) Time Spent: 0.5h (was: 20m) > Update Python dependencies page for 2.13.0 > -- > > Key: BEAM-7524 > URL: https://issues.apache.org/jira/browse/BEAM-7524 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: Rose Nguyen >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > Update Python dependencies page for 2.13.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7524) Update Python dependencies page for 2.13.0
[ https://issues.apache.org/jira/browse/BEAM-7524?focusedWorklogId=257189&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257189 ] ASF GitHub Bot logged work on BEAM-7524: Author: ASF GitHub Bot Created on: 10/Jun/19 20:47 Start Date: 10/Jun/19 20:47 Worklog Time Spent: 10m Work Description: rosetn commented on issue #8810: [BEAM-7524] Update Python dependencies page for 2.13.0 URL: https://github.com/apache/beam/pull/8810#issuecomment-500588383 @melap PTAL STAGED: http://apache-beam-website-pull-requests.storage.googleapis.com/8810/documentation/sdks/python-dependencies/index.html This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257189) Time Spent: 20m (was: 10m) > Update Python dependencies page for 2.13.0 > -- > > Key: BEAM-7524 > URL: https://issues.apache.org/jira/browse/BEAM-7524 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: Rose Nguyen >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > Update Python dependencies page for 2.13.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6693) ApproximateUnique transform for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-6693?focusedWorklogId=257187&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257187 ] ASF GitHub Bot logged work on BEAM-6693: Author: ASF GitHub Bot Created on: 10/Jun/19 20:43 Start Date: 10/Jun/19 20:43 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #8799: [BEAM-6693] replace mmh3 with default hash function URL: https://github.com/apache/beam/pull/8799#discussion_r292183173 ## File path: sdks/python/apache_beam/transforms/stats.py ## @@ -214,7 +212,7 @@ def create_accumulator(self, *args, **kwargs): Review comment: Suggestion (Maybe add a JIRA (does not have to be fixed in this PR) for future): - Pass a hash_fn argument to ApproximateUniqueCombineFn, that can be passed by users to customize the hash_fn they would like to use. It can default to `hash` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257187) Time Spent: 12h 10m (was: 12h) > ApproximateUnique transform for Python SDK > -- > > Key: BEAM-6693 > URL: https://issues.apache.org/jira/browse/BEAM-6693 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Hannah Jiang >Priority: Minor > Fix For: 2.14.0 > > Time Spent: 12h 10m > Remaining Estimate: 0h > > Add a PTransform for estimating the number of distinct elements in a > PCollection and the number of distinct values associated with each key in a > PCollection KVs. > it should offer the same API as its Java counterpart: > https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateUnique.java -- This message was sent by Atlassian JIRA (v7.6.3#76005)
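The review suggestion above — accept a `hash_fn` argument that defaults to the builtin `hash` — can be sketched as follows. The class and method names are illustrative only (modeled loosely on a CombineFn shape), not Beam's actual API, and the accumulator is simplified to exact counting with a set rather than a bounded sample.

```python
import hashlib

def stable_hash(element):
    """Example of a user-supplied hash_fn: deterministic across processes,
    unlike the builtin hash(), whose string hashing is randomized per run."""
    digest = hashlib.md5(repr(element).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

class ApproximateUniqueAccumulator:
    """Illustrative accumulator with a pluggable hash function that
    defaults to the builtin `hash`, per the review suggestion."""
    def __init__(self, hash_fn=hash):
        self._hash_fn = hash_fn
        self._hashes = set()          # simplified: exact, not a bounded sample

    def add_input(self, element):
        self._hashes.add(self._hash_fn(element))

    def extract_output(self):
        return len(self._hashes)
```

A caller wanting run-to-run reproducibility would pass `hash_fn=stable_hash`; the default keeps the extra mmh3 dependency out, which matches the PR's goal.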
[jira] [Work logged] (BEAM-7524) Update Python dependencies page for 2.13.0
[ https://issues.apache.org/jira/browse/BEAM-7524?focusedWorklogId=257184&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257184 ] ASF GitHub Bot logged work on BEAM-7524: Author: ASF GitHub Bot Created on: 10/Jun/19 20:42 Start Date: 10/Jun/19 20:42 Worklog Time Spent: 10m Work Description: rosetn commented on pull request #8810: [BEAM-7524] Update Python dependencies page for 2.13.0 URL: https://github.com/apache/beam/pull/8810 Update Python dependencies page for 2.13.0 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). 
Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](ht
[jira] [Updated] (BEAM-7511) KafkaTable Initialization
[ https://issues.apache.org/jira/browse/BEAM-7511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-7511: -- Priority: Major (was: Minor) > KafkaTable Initialization > - > > Key: BEAM-7511 > URL: https://issues.apache.org/jira/browse/BEAM-7511 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Alireza Samadianzakaria >Assignee: Alireza Samadianzakaria >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > This exception is thrown when a Kafka table is created: > Exception in thread "main" java.lang.NullPointerException > at > org.apache.beam.sdk.io.kafka.KafkaIO.updateKafkaProperties(KafkaIO.java:1028) > at org.apache.beam.sdk.io.kafka.KafkaIO.access$900(KafkaIO.java:251) > at > org.apache.beam.sdk.io.kafka.KafkaIO$Read.withConsumerConfigUpdates(KafkaIO.java:814) > at > org.apache.beam.sdk.extensions.sql.meta.provider.kafka.BeamKafkaTable.buildIOReader(BeamKafkaTable.java:89) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:67) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:58) > at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537) > at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:488) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47) > at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47) > at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:36) > at > org.apache.beam.sdk.extensions.sql.example.MyKafkaExample.main(MyKafkaExample.java:76) > > This happens because in > org.apache.beam.sdk.extensions.sql.meta.provider.kafka, configupdates is not > initialized anywhere and the method updateConsumerProperties is never called. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
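The root cause described above — a config-updates map that is declared but never initialized before being handed to KafkaIO's property-merging code — is a classic null-dereference pattern. The Python analog below illustrates the failure mode and the obvious guard; the class and field names are invented for illustration and are not Beam's API.

```python
class KafkaTableReader:
    """Python analog of the reported bug: BeamKafkaTable kept a
    config-updates map that was never initialized, so passing it into
    KafkaIO's property merging raised a NullPointerException.
    Names here are illustrative, not Beam's actual API."""
    def __init__(self, bootstrap_servers):
        self.bootstrap_servers = bootstrap_servers
        # The bug: this stays None because the setter is never called.
        self.config_updates = None

    def build_reader_buggy(self):
        base = {"bootstrap.servers": self.bootstrap_servers}
        base.update(self.config_updates)      # TypeError here, like the Java NPE
        return base

    def build_reader_fixed(self):
        base = {"bootstrap.servers": self.bootstrap_servers}
        base.update(self.config_updates or {})  # guard: treat missing as empty
        return base
```

Initializing the field to an empty map at construction time (or guarding at the use site, as above) removes the failure without changing behavior for callers that do supply updates.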
[jira] [Commented] (BEAM-7437) Integration Test for BQ streaming inserts for streaming pipelines
[ https://issues.apache.org/jira/browse/BEAM-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860315#comment-16860315 ] Anton Kedin commented on BEAM-7437: --- Is this fixed by [https://github.com/apache/beam/pull/8748] ? > Integration Test for BQ streaming inserts for streaming pipelines > - > > Key: BEAM-7437 > URL: https://issues.apache.org/jira/browse/BEAM-7437 > Project: Beam > Issue Type: Test > Components: io-python-gcp >Affects Versions: 2.12.0 >Reporter: Tanay Tummalapalli >Assignee: Tanay Tummalapalli >Priority: Minor > Labels: test > Fix For: 2.14.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Integration Test for BigQuery Sink using Streaming Inserts for streaming > pipelines. > Integration tests currently exist for batch pipelines, it can also be added > for streaming pipelines using TestStream. This will be a precursor to the > failing integration test to be added for [BEAM-6611| > https://issues.apache.org/jira/browse/BEAM-6611]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7513) Row Estimation for BigQueryTable
[ https://issues.apache.org/jira/browse/BEAM-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-7513: -- Priority: Major (was: Minor) > Row Estimation for BigQueryTable > > > Key: BEAM-7513 > URL: https://issues.apache.org/jira/browse/BEAM-7513 > Project: Beam > Issue Type: New Feature > Components: dsl-sql, io-java-gcp >Reporter: Alireza Samadianzakaria >Assignee: Alireza Samadianzakaria >Priority: Major > > Calcite tables (org.apache.calcite.schema.Table) should implement the method > org.apache.calcite.schema.Statistic getStatistic(). The Statistic instance > returned by this method is used for the Volcano optimizer in Calcite. > Currently, org.apache.beam.sdk.extensions.sql.impl.BeamCalciteTable has not > implemented getStatistic() which means it uses the implementation in > org.apache.calcite.schema.impl.AbstractTable and that implementation just > returns Statistics.UNKNOWN for all sources. > > Things needed to be implemented: > 1- Implementing getStatistic in BeamCalciteTable such that it calls a row > count estimation method from BeamSqlTable and adding this method to > BeamSqlTable. > 2- Implementing the row count estimation method for BigQueryTable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
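The two implementation steps listed above matter because Calcite's Volcano optimizer consults per-table statistics when costing plans; with `Statistics.UNKNOWN` every source looks equally expensive. The toy Python sketch below, with invented names, shows why a real row-count estimate changes planner decisions such as which join input to use as the build side.

```python
UNKNOWN = None  # stand-in for Calcite's Statistics.UNKNOWN

class Table:
    """Sketch of the proposal: each table exposes a row-count estimate
    the planner can consult. Names are illustrative, not Calcite's API."""
    def __init__(self, name, estimated_rows=UNKNOWN):
        self.name = name
        self.estimated_rows = estimated_rows

def choose_join_build_side(left, right, default_rows=1_000_000):
    """A cost-based planner hashes the smaller input (the build side).
    With UNKNOWN statistics both sides get the same default cost, so the
    choice is arbitrary -- which is why returning real estimates matters."""
    def cost(table):
        if table.estimated_rows is UNKNOWN:
            return default_rows       # no statistic: fall back to a guess
        return table.estimated_rows
    return left if cost(left) <= cost(right) else right
```

With estimates available, a 500-row dimension table is correctly picked over a 10M-row fact table; with both sides UNKNOWN the planner just takes whichever side comes first.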
[jira] [Updated] (BEAM-7513) Row Estimation for BigQueryTable
[ https://issues.apache.org/jira/browse/BEAM-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-7513: -- Status: Open (was: Triage Needed) > Row Estimation for BigQueryTable > > > Key: BEAM-7513 > URL: https://issues.apache.org/jira/browse/BEAM-7513 > Project: Beam > Issue Type: New Feature > Components: dsl-sql, io-java-gcp >Reporter: Alireza Samadianzakaria >Assignee: Alireza Samadianzakaria >Priority: Major > > Calcite tables (org.apache.calcite.schema.Table) should implement the method > org.apache.calcite.schema.Statistic getStatistic(). The Statistic instance > returned by this method is used for the Volcano optimizer in Calcite. > Currently, org.apache.beam.sdk.extensions.sql.impl.BeamCalciteTable has not > implemented getStatistic() which means it uses the implementation in > org.apache.calcite.schema.impl.AbstractTable and that implementation just > returns Statistics.UNKNOWN for all sources. > > Things needed to be implemented: > 1- Implementing getStatistic in BeamCalciteTable such that it calls a row > count estimation method from BeamSqlTable and adding this method to > BeamSqlTable. > 2- Implementing the row count estimation method for BigQueryTable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7516) Add a watermark manager for the fn_api_runner
[ https://issues.apache.org/jira/browse/BEAM-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-7516: -- Status: Open (was: Triage Needed) > Add a watermark manager for the fn_api_runner > - > > Key: BEAM-7516 > URL: https://issues.apache.org/jira/browse/BEAM-7516 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > > To track watermarks for each stage -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7511) KafkaTable Initialization
[ https://issues.apache.org/jira/browse/BEAM-7511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-7511: -- Status: Open (was: Triage Needed) > KafkaTable Initialization > - > > Key: BEAM-7511 > URL: https://issues.apache.org/jira/browse/BEAM-7511 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Alireza Samadianzakaria >Assignee: Alireza Samadianzakaria >Priority: Minor > Time Spent: 1h 50m > Remaining Estimate: 0h > > This exception is thrown when a Kafka table is created: > Exception in thread "main" java.lang.NullPointerException > at > org.apache.beam.sdk.io.kafka.KafkaIO.updateKafkaProperties(KafkaIO.java:1028) > at org.apache.beam.sdk.io.kafka.KafkaIO.access$900(KafkaIO.java:251) > at > org.apache.beam.sdk.io.kafka.KafkaIO$Read.withConsumerConfigUpdates(KafkaIO.java:814) > at > org.apache.beam.sdk.extensions.sql.meta.provider.kafka.BeamKafkaTable.buildIOReader(BeamKafkaTable.java:89) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:67) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:58) > at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537) > at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:488) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47) > at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47) > at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:36) > at > org.apache.beam.sdk.extensions.sql.example.MyKafkaExample.main(MyKafkaExample.java:76) > > This happens because in > org.apache.beam.sdk.extensions.sql.meta.provider.kafka, configupdates is not > initialized anywhere and the method updateConsumerProperties is never called. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7502) Create ParDo Python Load Test Jenkins Job
[ https://issues.apache.org/jira/browse/BEAM-7502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-7502: -- Status: Open (was: Triage Needed) > Create ParDo Python Load Test Jenkins Job > - > > Key: BEAM-7502 > URL: https://issues.apache.org/jira/browse/BEAM-7502 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Kasia Kucharczyk >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7524) Update Python dependencies page for 2.13.0
Rose Nguyen created BEAM-7524: - Summary: Update Python dependencies page for 2.13.0 Key: BEAM-7524 URL: https://issues.apache.org/jira/browse/BEAM-7524 Project: Beam Issue Type: Improvement Components: website Reporter: Rose Nguyen Assignee: Rose Nguyen Update Python dependencies page for 2.13.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6693) ApproximateUnique transform for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-6693?focusedWorklogId=257176&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257176 ] ASF GitHub Bot logged work on BEAM-6693: Author: ASF GitHub Bot Created on: 10/Jun/19 20:34 Start Date: 10/Jun/19 20:34 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on pull request #8799: [BEAM-6693] replace mmh3 with default hash function URL: https://github.com/apache/beam/pull/8799#discussion_r292179692 ## File path: sdks/python/apache_beam/transforms/stats_test.py ## @@ -266,27 +251,7 @@ def test_approximate_unique_global_by_error_with_samll_population(self): >> beam.ApproximateUnique.Globally(error=est_err)) assert_that(result, equal_to([actual_count]), -label='assert:global_by_error_with_samll_population') -pipeline.run() - - def test_approximate_unique_global_by_error_with_big_population(self): Review comment: This test is more for estimation algorithm performance testing, rather than for testing functionality. With py3 default hash algorithm, it would fail sometimes (2%) because it is out of estimation error range, so I decided to remove it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257176) Time Spent: 12h (was: 11h 50m) > ApproximateUnique transform for Python SDK > -- > > Key: BEAM-6693 > URL: https://issues.apache.org/jira/browse/BEAM-6693 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Hannah Jiang >Priority: Minor > Fix For: 2.14.0 > > Time Spent: 12h > Remaining Estimate: 0h > > Add a PTransform for estimating the number of distinct elements in a > PCollection and the number of distinct values associated with each key in a > PCollection KVs. 
> it should offer the same API as its Java counterpart: > https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateUnique.java -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6693) ApproximateUnique transform for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-6693?focusedWorklogId=257169&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257169 ] ASF GitHub Bot logged work on BEAM-6693: Author: ASF GitHub Bot Created on: 10/Jun/19 20:25 Start Date: 10/Jun/19 20:25 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #8799: [BEAM-6693] replace mmh3 with default hash function URL: https://github.com/apache/beam/pull/8799#discussion_r292175998 ## File path: sdks/python/apache_beam/transforms/stats_test.py ## @@ -266,27 +251,7 @@ def test_approximate_unique_global_by_error_with_samll_population(self): >> beam.ApproximateUnique.Globally(error=est_err)) assert_that(result, equal_to([actual_count]), -label='assert:global_by_error_with_samll_population') -pipeline.run() - - def test_approximate_unique_global_by_error_with_big_population(self): Review comment: Why are you removing this test? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257169) Time Spent: 11h 40m (was: 11.5h) > ApproximateUnique transform for Python SDK > -- > > Key: BEAM-6693 > URL: https://issues.apache.org/jira/browse/BEAM-6693 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Hannah Jiang >Priority: Minor > Fix For: 2.14.0 > > Time Spent: 11h 40m > Remaining Estimate: 0h > > Add a PTransform for estimating the number of distinct elements in a > PCollection and the number of distinct values associated with each key in a > PCollection KVs. 
> it should offer the same API as its Java counterpart: > https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateUnique.java -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6693) ApproximateUnique transform for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-6693?focusedWorklogId=257170&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257170 ] ASF GitHub Bot logged work on BEAM-6693: Author: ASF GitHub Bot Created on: 10/Jun/19 20:25 Start Date: 10/Jun/19 20:25 Worklog Time Spent: 10m Work Description: pabloem commented on issue #8799: [BEAM-6693] replace mmh3 with default hash function URL: https://github.com/apache/beam/pull/8799#issuecomment-500580749 Thanks Hannah. This LGTM. I just left one question. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257170) Time Spent: 11h 50m (was: 11h 40m) > ApproximateUnique transform for Python SDK > -- > > Key: BEAM-6693 > URL: https://issues.apache.org/jira/browse/BEAM-6693 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Hannah Jiang >Priority: Minor > Fix For: 2.14.0 > > Time Spent: 11h 50m > Remaining Estimate: 0h > > Add a PTransform for estimating the number of distinct elements in a > PCollection and the number of distinct values associated with each key in a > PCollection KVs. > it should offer the same API as its Java counterpart: > https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateUnique.java -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7450) Unbounded HCatalogIO Reader using splittable pardos
[ https://issues.apache.org/jira/browse/BEAM-7450?focusedWorklogId=257068&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257068 ] ASF GitHub Bot logged work on BEAM-7450: Author: ASF GitHub Bot Created on: 10/Jun/19 19:01 Start Date: 10/Jun/19 19:01 Worklog Time Spent: 10m Work Description: jhalaria commented on issue #8718: [BEAM-7450] Add an unbounded HcatalogIO reader using splittable pardo URL: https://github.com/apache/beam/pull/8718#issuecomment-500535619 @iemejia - I think once this PR gets merged, the next step will be to clean up the multiple read methods that we currently have. Moving forward there will only be one Read class that can read both bounded and unbounded data depending on the parameters passed. Do you think we can tackle that as the next step, or if not, what would make it cleaner in this PR? When I am extending from Read, I have to add the boilerplate code for the properties inside Read. I think what we can do in this PR is to have only one Read class and then, within expand, call the DoFns needed for unbounded reads only if the right arguments are set. What do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257068) Time Spent: 4h (was: 3h 50m) > Unbounded HCatalogIO Reader using splittable pardos > --- > > Key: BEAM-7450 > URL: https://issues.apache.org/jira/browse/BEAM-7450 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Ankit Jhalaria >Assignee: Ankit Jhalaria >Priority: Minor > Time Spent: 4h > Remaining Estimate: 0h > > # Current version of HcatalogIO is a bounded source. 
> # While migrating our jobs to AWS, we realized that it would be helpful to > have an unbounded hcat reader that can behave as an unbounded source and > poll for new partitions as and when they become available. > # I have used splittable pardo(s) to do this. There is a flag that, when > set, treats this as a bounded source that terminates once the available > data has been read. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
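The behavior described in the issue — emit each partition once, keep polling for new partitions in unbounded mode, and terminate after the current snapshot when a bounded flag is set — can be sketched as a simple generator. This is an illustration of the control flow only: `list_partitions` is a hypothetical callable standing in for the Hive metastore lookup, not HCatalogIO's real API.

```python
import time

def read_partitions(list_partitions, bounded=False, poll_interval=0.0):
    """Sketch of the described reader: emit each partition once; in
    unbounded mode keep polling for partitions that appear later, and
    in bounded mode stop after draining the first snapshot."""
    emitted = set()
    while True:
        for partition in list_partitions():
            if partition not in emitted:
                emitted.add(partition)
                yield partition
        if bounded:
            return                    # the termination flag from the issue
        time.sleep(poll_interval)
        # A real splittable-DoFn reader would return a
        # ProcessContinuation.resume() here instead of blocking.
```

In the actual splittable-ParDo implementation the "keep polling" branch is expressed by deferring and resuming the restriction rather than by a blocking loop, but the bounded/unbounded split is the same.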
[jira] [Work logged] (BEAM-7450) Unbounded HCatalogIO Reader using splittable pardos
[ https://issues.apache.org/jira/browse/BEAM-7450?focusedWorklogId=257067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257067 ] ASF GitHub Bot logged work on BEAM-7450: Author: ASF GitHub Bot Created on: 10/Jun/19 19:00 Start Date: 10/Jun/19 19:00 Worklog Time Spent: 10m Work Description: jhalaria commented on issue #8718: [BEAM-7450] Add an unbounded HcatalogIO reader using splittable pardo URL: https://github.com/apache/beam/pull/8718#issuecomment-500535619 @iemejia - I think once this PR gets merged, the next step will be to clean up the multiple read methods that we currently have. Moving forward there will only be one Read class that can read both bounded and unbounded data depending on the parameters passed. Do you think we can tackle that as the next step, or if not, what would make it cleaner in this PR? When I am extending from Read, I have to add the boilerplate code for the properties inside Read. I think what we can do in this PR itself is to have only one Read class and then, within expand, call the DoFns needed for unbounded reads only if the right arguments are set. What do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257067) Time Spent: 3h 50m (was: 3h 40m) > Unbounded HCatalogIO Reader using splittable pardos > --- > > Key: BEAM-7450 > URL: https://issues.apache.org/jira/browse/BEAM-7450 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Ankit Jhalaria >Assignee: Ankit Jhalaria >Priority: Minor > Time Spent: 3h 50m > Remaining Estimate: 0h > > # Current version of HcatalogIO is a bounded source. 
> # While migrating our jobs to AWS, we realized that it would be helpful to > have an unbounded hcat reader that can behave as an unbounded source and > poll for new partitions as and when they become available. > # I have used splittable pardo(s) to do this. There is a flag that, when > set, treats this as a bounded source that terminates once the available > data has been read. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7462) Add Sampled Byte Count Metric to the Java SDK
[ https://issues.apache.org/jira/browse/BEAM-7462?focusedWorklogId=257055&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257055 ] ASF GitHub Bot logged work on BEAM-7462: Author: ASF GitHub Bot Created on: 10/Jun/19 18:31 Start Date: 10/Jun/19 18:31 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #8809: [WIP] [BEAM-7462] Update java SDK to report the SampledByteCount counter. URL: https://github.com/apache/beam/pull/8809 Refactor code so that RehydratedComponents is constructed once and provided to PTransformRunnerFactories Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). 
Post-Commit Tests Status (on master branch)
[jira] [Work logged] (BEAM-7450) Unbounded HCatalogIO Reader using splittable pardos
[ https://issues.apache.org/jira/browse/BEAM-7450?focusedWorklogId=257053&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257053 ] ASF GitHub Bot logged work on BEAM-7450: Author: ASF GitHub Bot Created on: 10/Jun/19 18:26 Start Date: 10/Jun/19 18:26 Worklog Time Spent: 10m Work Description: jhalaria commented on issue #8718: [BEAM-7450] Add an unbounded HcatalogIO reader using splittable pardo URL: https://github.com/apache/beam/pull/8718#issuecomment-500535619 @iemejia - I think once this PR gets merged, the next step will be to clean up the multiple read methods that we currently have. Moving forward there will only be one Read class that can read both bounded and unbounded data depending on the parameters passed. Do you think we can tackle that as the next step, or if not, what would make it cleaner in this PR? When I am extending from Read, I have to add the boilerplate code for the properties inside Read. I think what we can do in this PR itself is to have only one Reads class and then, within the expand, call the DoFns needed for unbounded reads only if the right arguments are set. What do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257053) Time Spent: 3h 40m (was: 3.5h) > Unbounded HCatalogIO Reader using splittable pardos > --- > > Key: BEAM-7450 > URL: https://issues.apache.org/jira/browse/BEAM-7450 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Ankit Jhalaria >Assignee: Ankit Jhalaria >Priority: Minor > Time Spent: 3h 40m > Remaining Estimate: 0h > > # Current version of HcatalogIO is a bounded source. 
> # While migrating our jobs to aws, we realized that it would be helpful to > have an unbounded hcat reader that can behave as an unbounded source and > polls for new partitions as and when they become available. > # I have used splittable pardo(s) to do this. There is a flag that can be > set to treat this as a bounded source which will terminate if that flag is > set. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7274) Protobuf Beam Schema support
[ https://issues.apache.org/jira/browse/BEAM-7274?focusedWorklogId=257051&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257051 ] ASF GitHub Bot logged work on BEAM-7274: Author: ASF GitHub Bot Created on: 10/Jun/19 18:22 Start Date: 10/Jun/19 18:22 Worklog Time Spent: 10m Work Description: alexvanboxel commented on issue #8690: [BEAM-7274] Implement the Protobuf schema provider URL: https://github.com/apache/beam/pull/8690#issuecomment-500531285 > Yes. I'd prefer to keep Row as a somewhat sealed class and not introduce extra subtypes. However there were enough other things to review in this PR that I figured I could review those first. Answered (and fixed) the comments; I also rebased the experiment to remove the ProtoRow as a subtype: https://github.com/alexvanboxel/beam/commit/1629557d6a6114b18ef4e3526b01900e5466a6e5 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257051) Time Spent: 4h (was: 3h 50m) > Protobuf Beam Schema support > > > Key: BEAM-7274 > URL: https://issues.apache.org/jira/browse/BEAM-7274 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Alex Van Boxel >Assignee: Alex Van Boxel >Priority: Minor > Time Spent: 4h > Remaining Estimate: 0h > > Add support for the new Beam Schema to the Protobuf extension. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7478) Remote cluster submission from Flink Runner broken due to staging issues
[ https://issues.apache.org/jira/browse/BEAM-7478?focusedWorklogId=257038&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257038 ] ASF GitHub Bot logged work on BEAM-7478: Author: ASF GitHub Bot Created on: 10/Jun/19 17:57 Start Date: 10/Jun/19 17:57 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #8775: [BEAM-7478] Detect class path from the "java.class.path" property URL: https://github.com/apache/beam/pull/8775#issuecomment-500516613 The stack trace didn't give me any insights into how to make this code better than what I had suggested before. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257038) Time Spent: 1.5h (was: 1h 20m) > Remote cluster submission from Flink Runner broken due to staging issues > > > Key: BEAM-7478 > URL: https://issues.apache.org/jira/browse/BEAM-7478 > Project: Beam > Issue Type: Bug > Components: runner-flink, sdk-java-core >Affects Versions: 2.13.0 >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.14.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > The usual way to submit pipelines with the Flink Runner is to build a fat jar > and use the {{bin/flink}} utility to submit the jar to a Flink cluster. This > works fine. > Alternatively, the Flink Runner can use the {{flinkMaster}} pipeline option > to specify a remote cluster. Upon submitting an example we get the following > at Flink's JobManager. 
> {noformat} > Caused by: java.lang.IllegalAccessError: class > sun.reflect.GeneratedSerializationConstructorAccessor70 cannot access its > superclass sun.reflect.SerializationConstructorAccessorImpl > at sun.misc.Unsafe.defineClass(Native Method) > at sun.reflect.ClassDefiner.defineClass(ClassDefiner.java:63) > at > sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:399) > at > sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:394) > at java.security.AccessController.doPrivileged(Native Method) > at > sun.reflect.MethodAccessorGenerator.generate(MethodAccessorGenerator.java:393) > at > sun.reflect.MethodAccessorGenerator.generateSerializationConstructor(MethodAccessorGenerator.java:112) > at > sun.reflect.ReflectionFactory.newConstructorForSerialization(ReflectionFactory.java:340) > at > java.io.ObjectStreamClass.getSerializableConstructor(ObjectStreamClass.java:1420) > at java.io.ObjectStreamClass.access$1500(ObjectStreamClass.java:72) > at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:497) > at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:472) > at java.security.AccessController.doPrivileged(Native Method) > at java.io.ObjectStreamClass.(ObjectStreamClass.java:472) > at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:369) > at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:598) > at > java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1630) > at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) 
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373) > at > org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:502) > at > org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:489) > at > org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:477) > at > org.apache.flink.util.InstantiationUtil.readObjectFromConfig(InstantiationUtil.java:438) > at > org.apache.flink.runtime.operators.util.TaskConfig.getStubWrapper(TaskConfig.java:288) > at > org.apache.flink.runtime.jobgraph.InputFormatVertex.initializeOnMaster(InputFormatVertex.java:63) > ... 32 more > {noformat} > It appears there
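The fix referenced in the worklog above (PR #8775) detects the files to stage from the JVM's `java.class.path` system property rather than relying on the context classloader. A minimal Python sketch of just the parsing step that approach requires — the function name and defaults here are illustrative, not Beam's actual code:

```python
import os

def classpath_entries(java_class_path, sep=os.pathsep):
    """Split a 'java.class.path'-style string into individual entries.

    Empty segments (e.g. from a trailing separator) are dropped, since
    they do not correspond to a stageable jar or directory.
    """
    return [entry for entry in java_class_path.split(sep) if entry]

# Example: a Linux-style classpath with a trailing separator.
print(classpath_entries("beam-sdk.jar:my-pipeline.jar:", sep=":"))
```

Each surviving entry would then be checked for existence and uploaded as a staging artifact; that filtering step is omitted here.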
[jira] [Created] (BEAM-7523) Add Integration Test for KafkaTable and KafkaTableProvider
Alireza Samadianzakaria created BEAM-7523: - Summary: Add Integration Test for KafkaTable and KafkaTableProvider Key: BEAM-7523 URL: https://issues.apache.org/jira/browse/BEAM-7523 Project: Beam Issue Type: Test Components: dsl-sql Reporter: Alireza Samadianzakaria We need an integration test for KafkaTable and KafkaTableProvider in the SQL module that creates and runs SQL with KafkaTable as its source. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7511) KafkaTable Initialization
[ https://issues.apache.org/jira/browse/BEAM-7511?focusedWorklogId=257020&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257020 ] ASF GitHub Bot logged work on BEAM-7511: Author: ASF GitHub Bot Created on: 10/Jun/19 17:35 Start Date: 10/Jun/19 17:35 Worklog Time Spent: 10m Work Description: akedin commented on pull request #8797: [BEAM-7511] Fixes the bug in KafkaTable Initialization. URL: https://github.com/apache/beam/pull/8797#discussion_r29277 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTable.java ## @@ -45,26 +46,11 @@ private List topicPartitions; private Map configUpdates; - protected BeamKafkaTable(Schema beamSchema) { -super(beamSchema); - } - public BeamKafkaTable(Schema beamSchema, String bootstrapServers, List topics) { super(beamSchema); this.bootstrapServers = bootstrapServers; this.topics = topics; - } - - public BeamKafkaTable( - Schema beamSchema, List topicPartitions, String bootstrapServers) { -super(beamSchema); -this.bootstrapServers = bootstrapServers; -this.topicPartitions = topicPartitions; - } - - public BeamKafkaTable updateConsumerProperties(Map configUpdates) { Review comment: Thanks. Will merge this PR then if there are no further concerns This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257020) Time Spent: 1h 50m (was: 1h 40m) > KafkaTable Initialization > - > > Key: BEAM-7511 > URL: https://issues.apache.org/jira/browse/BEAM-7511 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Alireza Samadianzakaria >Assignee: Alireza Samadianzakaria >Priority: Minor > Time Spent: 1h 50m > Remaining Estimate: 0h > > The following exception is thrown when a Kafka table is created: 
> Exception in thread "main" java.lang.NullPointerException > at > org.apache.beam.sdk.io.kafka.KafkaIO.updateKafkaProperties(KafkaIO.java:1028) > at org.apache.beam.sdk.io.kafka.KafkaIO.access$900(KafkaIO.java:251) > at > org.apache.beam.sdk.io.kafka.KafkaIO$Read.withConsumerConfigUpdates(KafkaIO.java:814) > at > org.apache.beam.sdk.extensions.sql.meta.provider.kafka.BeamKafkaTable.buildIOReader(BeamKafkaTable.java:89) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:67) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:58) > at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537) > at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:488) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47) > at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47) > at 
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.Refere
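The NullPointerException above comes from `KafkaIO.updateKafkaProperties` receiving a config-update map that was never initialized. A minimal Python sketch of the defensive merge pattern that avoids this class of bug — the function name is hypothetical, not Beam's actual API:

```python
def update_consumer_properties(current, updates):
    """Merge config updates into the current consumer properties.

    Treating a missing update map as empty is the guard that the
    stack trace above shows was absent: a None passed where a map
    was expected caused the NullPointerException.
    """
    merged = dict(current)
    merged.update(updates or {})  # None behaves like an empty map
    return merged

base = {"bootstrap.servers": "localhost:9092"}
print(update_consumer_properties(base, None))
```

The equivalent Java fix is to initialize the updates map to an empty map (or skip the merge when it is null) before handing it to KafkaIO.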
[jira] [Work logged] (BEAM-7428) ReadAllViaFileBasedSource does not output the timestamps of the read elements
[ https://issues.apache.org/jira/browse/BEAM-7428?focusedWorklogId=257015&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257015 ] ASF GitHub Bot logged work on BEAM-7428: Author: ASF GitHub Bot Created on: 10/Jun/19 17:28 Start Date: 10/Jun/19 17:28 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #8741: [BEAM-7428] Output the timestamp on elements in ReadAllViaFileBasedSource URL: https://github.com/apache/beam/pull/8741#issuecomment-500505080 @iemejia How do you resolve the case where you want to track the timestamp of the file as the output timestamp instead of the bounded source's timestamp? For example, the [Watch](https://github.com/apache/beam/blob/9428cc8c6fbffe7190c923850acbd98474210ad7/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java#L754) transform sets the timestamp on the output. It would make sense that a user may want to use the timestamp there instead of the timestamp stored within the bounded source. (I think in most cases your change is the change that people want.) @chamikaramj It is unlikely that they are different for most sources but the [Watch](https://github.com/apache/beam/blob/9428cc8c6fbffe7190c923850acbd98474210ad7/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java#L754) transform does meaningfully set it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257015) Time Spent: 3h 50m (was: 3h 40m) > ReadAllViaFileBasedSource does not output the timestamps of the read elements > - > > Key: BEAM-7428 > URL: https://issues.apache.org/jira/browse/BEAM-7428 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 3h 50m > Remaining Estimate: 0h > > This differs from the implementation of JavaReadViaImpulse that tackles a > similar problem but does output the timestamps correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
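The discussion above is about which timestamp each record read from a file should carry: the timestamp of the input file element (e.g. as set by Watch) or a timestamp derived from the bounded source itself. A small Python model of the two behaviors — this is illustrative pseudocode for the semantics, not Beam's implementation:

```python
from datetime import datetime, timezone

# Stand-in for Beam's minimum timestamp, the default for elements
# whose timestamp was never explicitly set.
MIN_TIMESTAMP = datetime(1970, 1, 1, tzinfo=timezone.utc)

def read_all(files, propagate=True):
    """Model of ReadAllViaFileBasedSource's output timestamps.

    Each input is a (filename, timestamp) pair. With propagate=True
    (the behavior BEAM-7428 asks for) the read records inherit the
    file element's timestamp; with propagate=False they fall back to
    the minimum timestamp, losing the information Watch attached.
    """
    out = []
    for name, ts in files:
        for record in [f"{name}:rec0", f"{name}:rec1"]:  # fake file contents
            out.append((record, ts if propagate else MIN_TIMESTAMP))
    return out
```

Under this model, downstream windowing only sees meaningful event times in the propagating case, which is why the issue calls the non-propagating behavior a bug for most pipelines.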
[jira] [Work logged] (BEAM-7428) ReadAllViaFileBasedSource does not output the timestamps of the read elements
[ https://issues.apache.org/jira/browse/BEAM-7428?focusedWorklogId=257014&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257014 ] ASF GitHub Bot logged work on BEAM-7428: Author: ASF GitHub Bot Created on: 10/Jun/19 17:27 Start Date: 10/Jun/19 17:27 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #8741: [BEAM-7428] Output the timestamp on elements in ReadAllViaFileBasedSource URL: https://github.com/apache/beam/pull/8741#issuecomment-500505080 @iemejia How do you resolve the case where you want to track the timestamp of the file as the output timestamp instead of the bounded source's timestamp? For example, the [Watch](https://github.com/apache/beam/blob/9428cc8c6fbffe7190c923850acbd98474210ad7/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java#L754) transform sets the timestamp on the output. It would make sense that a user may want to use the timestamp there instead of the timestamp stored within the bounded source. I think in most cases your change is the change that people want. @chamikaramj It is unlikely that they are different for most sources but the [Watch](https://github.com/apache/beam/blob/9428cc8c6fbffe7190c923850acbd98474210ad7/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java#L754) transform does meaningfully set it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257014) Time Spent: 3h 40m (was: 3.5h) > ReadAllViaFileBasedSource does not output the timestamps of the read elements > - > > Key: BEAM-7428 > URL: https://issues.apache.org/jira/browse/BEAM-7428 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 3h 40m > Remaining Estimate: 0h > > This differs from the implementation of JavaReadViaImpulse that tackles a > similar problem but does output the timestamps correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7450) Unbounded HCatalogIO Reader using splittable pardos
[ https://issues.apache.org/jira/browse/BEAM-7450?focusedWorklogId=257012&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-257012 ] ASF GitHub Bot logged work on BEAM-7450: Author: ASF GitHub Bot Created on: 10/Jun/19 17:26 Start Date: 10/Jun/19 17:26 Worklog Time Spent: 10m Work Description: jhalaria commented on pull request #8718: [BEAM-7450] Add an unbounded HcatalogIO reader using splittable pardo URL: https://github.com/apache/beam/pull/8718#discussion_r292107723 ## File path: sdks/java/io/hcatalog/src/main/java/org/apache/beam/sdk/io/hcatalog/PartitionPollerFn.java ## @@ -0,0 +1,146 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.hcatalog; + +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import org.apache.beam.sdk.io.hcatalog.HCatalogIO.Read; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.DoFn.UnboundedPerElement; +import org.apache.beam.sdk.transforms.SerializableComparator; +import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker; +import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableList; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.metastore.IMetaStoreClient; +import org.apache.hadoop.hive.metastore.api.Partition; +import org.apache.hive.hcatalog.common.HCatUtil; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** Unbounded poller to listen for new partitions. */ +@UnboundedPerElement +class PartitionPollerFn extends DoFn { + + private static final Logger LOG = LoggerFactory.getLogger(PartitionPollerFn.class); + private transient IMetaStoreClient metaStoreClient; + private Map configProperties; + private String database; + private String table; + final Boolean shouldTreatSourceAsBounded; + + public PartitionPollerFn( + final Map properties, + final String database, + final String table, + final Boolean isBounded) { +this.configProperties = properties; +this.database = database; +this.table = table; +this.shouldTreatSourceAsBounded = isBounded; + } + + /** Outputs newer partitions to be processed by the reader. 
*/ + @ProcessElement + @SuppressWarnings("unused") + public ProcessContinuation processElement( + final ProcessContext c, RestrictionTracker tracker) { +final Read readRequest = c.element(); +final PartitionRange range = tracker.currentRestriction(); +final ImmutableList allAvailablePartitions = range.getPartitions(); + +List forSort = new ArrayList<>(allAvailablePartitions); +Collections.sort(forSort, readRequest.getPartitionComparator()); + +int trueStart; +if (tracker.currentRestriction().getLastCompletedPartition() != null) { + final int indexOfLastCompletedPartition = + forSort.indexOf(tracker.currentRestriction().getLastCompletedPartition()); + trueStart = indexOfLastCompletedPartition + 1; +} else { + trueStart = 0; +} + +for (int i = trueStart; i < forSort.size(); i++) { + if (tracker.tryClaim(forSort.get(i))) { +c.output( +new HCatRecordReaderFn.PartitionWrapper( Review comment: Good point. After the refactor, since everything is now in Read(s), this can go away. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 257012) Time Spent: 3.5h (was: 3h 20m) > Unbounded HCatalogIO Reader using splittable pardos > --- > > Key: BEAM-7450 > URL: https://issues.apache.org/jira/browse/BEAM-7450 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >
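The `processElement` in the diff above sorts all discovered partitions and resumes after the last completed one, claiming each remaining partition through the restriction tracker. A simplified Python model of that claim loop — the tracker and `tryClaim` are stand-ins, and a failed claim is approximated as a checkpoint that stops the loop; this is not the Beam splittable DoFn API:

```python
def poll_partitions(all_partitions, last_completed, try_claim, key=None):
    """Simplified model of PartitionPollerFn.processElement.

    Sorts the discovered partitions, skips everything up to and
    including last_completed (assumed to be in the list when not None),
    and emits each partition the tracker lets us claim. A refused
    claim means the restriction was checkpointed or split, so we
    stop and let the runner resume later.
    """
    ordered = sorted(all_partitions, key=key)
    start = ordered.index(last_completed) + 1 if last_completed is not None else 0
    emitted = []
    for part in ordered[start:]:
        if not try_claim(part):  # stand-in for RestrictionTracker.tryClaim
            break
        emitted.append(part)
    return emitted
```

In the real DoFn, the unbounded case would additionally return a `ProcessContinuation.resume()` with a poll delay so the fn is re-invoked to discover partitions created after this pass.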
[jira] [Comment Edited] (BEAM-6860) WriteToText crash with "GlobalWindow -> ._IntervalWindowBase"
[ https://issues.apache.org/jira/browse/BEAM-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860164#comment-16860164 ] Paul edited comment on BEAM-6860 at 6/10/19 5:08 PM: - FWIW I'm having this same issue on the apache_beam 2.10, 2.11, 2.12, 2.13 Python SDK; I confirmed that this works on 2.9.0 as well, which I saw in a Stack Overflow thread here: [https://stackoverflow.com/questions/55109403/apache-beam-python-sdk-upgrade-issue]. It also fails not only on WriteToText, but also WriteToBigQuery. Specifically, this does not work when implementing windowing on a bounded data set and adding element timestamps. While doing this, I was working off of this example: [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/game/hourly_team_score.py] However, this example actually works in later versions of the beam python SDK, so I'm wondering if this is a user-issue with a very poor error message or if we're explicitly/implicitly using a feature of beam that the example does not. Environment was: centos7 linux, kernel 3.10 python 2.7.16 apache beam versions: 2.9.0-2.13.0 was (Author: pshahid): FWIW I'm having this same issue on the apache_beam 2.10, 2.11, 2.12, 2.13 Python SDK; I confirmed that this works on 2.9.0 as well, which I saw in a Stack Overflow thread here: [https://stackoverflow.com/questions/55109403/apache-beam-python-sdk-upgrade-issue]. It also fails not only on WriteToText, but also WriteToBigQuery. Specifically, this does not work when implementing windowing on a bounded data set and adding element timestamps. 
While doing this, I was working off of this example: [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/game/hourly_team_score.py] However, this example actually works in later versions of the beam python SDK, so I'm wondering if this is a user-issue with a very poor error message or if we're explicitly/implicitly using a feature of beam that the example does not. > WriteToText crash with "GlobalWindow -> ._IntervalWindowBase" > - > > Key: BEAM-6860 > URL: https://issues.apache.org/jira/browse/BEAM-6860 > Project: Beam > Issue Type: Bug > Components: beam-model >Affects Versions: 2.11.0 > Environment: macOS, DirectRunner, python 2.7.15 via > pyenv/pyenv-virtualenv >Reporter: Henrik >Priority: Major > Labels: newbie > > Main error: > > Cannot convert GlobalWindow to > > apache_beam.utils.windowed_value._IntervalWindowBase > This is very hard for me to debug. Doing a DoPar call before, printing the > input, gives me just what I want; so the lines of data to serialise are > "alright"; just JSON strings, in fact. 
> Stacktrace: > {code:java} > Traceback (most recent call last): > File "./okr_end_ride.py", line 254, in > run() > File "./okr_end_ride.py", line 250, in run > run_pipeline(pipeline_options, known_args) > File "./okr_end_ride.py", line 198, in run_pipeline > | 'write_all' >> WriteToText(known_args.output, > file_name_suffix=".txt") > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py", > line 426, in __exit__ > self.run().wait_until_finish() > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py", > line 406, in run > self._options).run(False) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py", > line 419, in run > return self.runner.run_pipeline(self, self._options) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/direct/direct_runner.py", > line 132, in run_pipeline > return runner.run_pipeline(pipeline, options) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 275, in run_pipeline > default_environment=self._default_environment)) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 278, in run_via_runner_api > return self.run_stages(*self.create_stages(pipeline_proto)) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 354, in run_stages > stage_context.safe_coders) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 509, in run_stage > data_input, data_output) > File > "/Users/h/.pyenv/versions/2.7.15/envs/l
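The failing pattern described above is windowing a bounded data set after adding element timestamps. What `beam.WindowInto(FixedWindows(size))` assigns to each element can be modeled in plain Python; once elements carry such interval windows, one plausible reading of the "Cannot convert GlobalWindow to ... _IntervalWindowBase" error is that a globally-windowed value still reaches a coder expecting interval windows. This sketch only models the window-assignment arithmetic, not Beam's coders:

```python
def assign_fixed_window(timestamp, size):
    """Return the [start, end) fixed window containing the timestamp,
    mirroring the arithmetic of FixedWindows(size): windows tile the
    time axis at multiples of `size` (offset 0 assumed).
    """
    start = timestamp - (timestamp % size)
    return (start, start + size)

# An event 5 seconds into an hour lands in that hour's window.
assert assign_fixed_window(3605, 3600) == (3600, 7200)
```

Comparing the element timestamps your pipeline assigns against the windows this arithmetic produces is a cheap sanity check before debugging the coder error itself.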
[jira] [Commented] (BEAM-6860) WriteToText crash with "GlobalWindow -> ._IntervalWindowBase"
[ https://issues.apache.org/jira/browse/BEAM-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860164#comment-16860164 ] Paul commented on BEAM-6860: FWIW I'm having this same issue on apache_beam 2.10, 2.11, 2.12, 2.13 python sdk; I confirmed that this works on 2.9.0 as well, which I saw in a stack overflow thread here ([https://stackoverflow.com/questions/55109403/apache-beam-python-sdk-upgrade-issue).] It also fails not only on WriteToText, but also WriteToBigQuery. Specifically, this does not work when implementing windowing on a bounded data set and adding element timestamps. While doing this, I was working off of this example: [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/game/hourly_team_score.py] However, this example actually works in later versions of the beam python SDK, so I'm wondering if this is a user-issue with a very poor error message or if we're explicitly/implicitly using a feature of beam that the example does not. > WriteToText crash with "GlobalWindow -> ._IntervalWindowBase" > - > > Key: BEAM-6860 > URL: https://issues.apache.org/jira/browse/BEAM-6860 > Project: Beam > Issue Type: Bug > Components: beam-model >Affects Versions: 2.11.0 > Environment: macOS, DirectRunner, python 2.7.15 via > pyenv/pyenv-virtualenv >Reporter: Henrik >Priority: Major > Labels: newbie > > Main error: > > Cannot convert GlobalWindow to > > apache_beam.utils.windowed_value._IntervalWindowBase > This is very hard for me to debug. Doing a DoPar call before, printing the > input, gives me just what I want; so the lines of data to serialise are > "alright"; just JSON strings, in fact. 
> Stacktrace: > {code:java} > Traceback (most recent call last): > File "./okr_end_ride.py", line 254, in > run() > File "./okr_end_ride.py", line 250, in run > run_pipeline(pipeline_options, known_args) > File "./okr_end_ride.py", line 198, in run_pipeline > | 'write_all' >> WriteToText(known_args.output, > file_name_suffix=".txt") > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py", > line 426, in __exit__ > self.run().wait_until_finish() > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py", > line 406, in run > self._options).run(False) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py", > line 419, in run > return self.runner.run_pipeline(self, self._options) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/direct/direct_runner.py", > line 132, in run_pipeline > return runner.run_pipeline(pipeline, options) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 275, in run_pipeline > default_environment=self._default_environment)) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 278, in run_via_runner_api > return self.run_stages(*self.create_stages(pipeline_proto)) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 354, in run_stages > stage_context.safe_coders) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 509, in run_stage > data_input, data_output) > File > 
"/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 1206, in process_bundle > result_future = self._controller.control_handler.push(process_bundle) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 821, in push > response = self.worker.do_instruction(request) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 265, in do_instruction > request.instruction_id) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 281, in process_bundle > delayed_applications = bundle_processor.process_bundle(instruction_id) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 552, in process_bun
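The failing pattern in this report is fixed windowing over a bounded collection with explicitly assigned element timestamps, as in the hourly_team_score example. As a language-agnostic illustration only (not Beam's implementation, and not the fix), the fixed-window assignment that such a pipeline performs is just integer arithmetic on the element timestamp; `window_size_s` is an assumed parameter:

```python
def assign_fixed_window(timestamp_s, window_size_s=3600):
    """Return the [start, end) interval of the fixed window an element
    with the given timestamp falls into. Illustrative sketch of the
    windowing applied before WriteToText; not Beam code."""
    start = timestamp_s - (timestamp_s % window_size_s)
    return (start, start + window_size_s)
```

The crash itself is in the runner's coder plumbing (a GlobalWindow reaching a coder that expects interval windows like the ones above), which is why the pipeline logic looks correct to the reporter; per the comments, pinning to SDK 2.9.0 reportedly avoids it.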
[jira] [Work logged] (BEAM-7511) KafkaTable Initialization
[ https://issues.apache.org/jira/browse/BEAM-7511?focusedWorklogId=256996&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256996 ] ASF GitHub Bot logged work on BEAM-7511: Author: ASF GitHub Bot Created on: 10/Jun/19 16:59 Start Date: 10/Jun/19 16:59 Worklog Time Spent: 10m Work Description: XuMingmin commented on pull request #8797: [BEAM-7511] Fixes the bug in KafkaTable Initialization. URL: https://github.com/apache/beam/pull/8797#discussion_r292097650 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTable.java ## @@ -45,26 +46,11 @@ private List topicPartitions; private Map configUpdates; - protected BeamKafkaTable(Schema beamSchema) { -super(beamSchema); - } - public BeamKafkaTable(Schema beamSchema, String bootstrapServers, List topics) { super(beamSchema); this.bootstrapServers = bootstrapServers; this.topics = topics; - } - - public BeamKafkaTable( - Schema beamSchema, List topicPartitions, String bootstrapServers) { -super(beamSchema); -this.bootstrapServers = bootstrapServers; -this.topicPartitions = topicPartitions; - } - - public BeamKafkaTable updateConsumerProperties(Map configUpdates) { Review comment: surely, create https://issues.apache.org/jira/browse/BEAM-7522 to track. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 256996) Time Spent: 1h 40m (was: 1.5h) > KafkaTable Initialization > - > > Key: BEAM-7511 > URL: https://issues.apache.org/jira/browse/BEAM-7511 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Alireza Samadianzakaria >Assignee: Alireza Samadianzakaria >Priority: Minor > Time Spent: 1h 40m > Remaining Estimate: 0h > > This exception is thrown when a kafka table is created because. > Exception in thread "main" java.lang.NullPointerException > at > org.apache.beam.sdk.io.kafka.KafkaIO.updateKafkaProperties(KafkaIO.java:1028) > at org.apache.beam.sdk.io.kafka.KafkaIO.access$900(KafkaIO.java:251) > at > org.apache.beam.sdk.io.kafka.KafkaIO$Read.withConsumerConfigUpdates(KafkaIO.java:814) > at > org.apache.beam.sdk.extensions.sql.meta.provider.kafka.BeamKafkaTable.buildIOReader(BeamKafkaTable.java:89) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:67) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:58) > at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537) > at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:488) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47) > at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47) > at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stre
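The NullPointerException arises because `configUpdates` is never initialized when the table is built through the provider, yet `KafkaIO.updateKafkaProperties` dereferences it. The defensive shape of the fix — treat a missing update map as empty before merging — can be sketched as follows (plain Python, with illustrative function names, not Beam's API):

```python
def merged_consumer_config(base_config, config_updates=None):
    """Merge optional user overrides into a Kafka consumer config,
    treating a missing update map as empty rather than dereferencing
    None (the analogue of the NPE in KafkaIO.updateKafkaProperties)."""
    merged = dict(base_config)          # never mutate the caller's map
    merged.update(config_updates or {}) # None behaves like an empty map
    return merged
```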
[jira] [Created] (BEAM-7522) support customized configuration in KafkaTableProvider
Xu Mingmin created BEAM-7522: Summary: support customized configuration in KafkaTableProvider Key: BEAM-7522 URL: https://issues.apache.org/jira/browse/BEAM-7522 Project: Beam Issue Type: Improvement Components: dsl-sql Reporter: Xu Mingmin expand KafkaTableProvider to support {{BeamKafkaTable.updateConsumerProperties(...)}}, so users can add customized configurations in DDL. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
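The requested improvement would let DDL-supplied table properties flow through to `BeamKafkaTable.updateConsumerProperties(...)`. A hedged sketch of that plumbing — extracting consumer overrides from a table definition's JSON properties blob; the `"properties"` key is an assumption for illustration, not the actual DDL schema:

```python
import json

def consumer_updates_from_table_properties(tbl_properties_json):
    """Parse a table definition's properties blob into a map of Kafka
    consumer overrides that could be applied via
    updateConsumerProperties(...). The "properties" key is illustrative."""
    props = json.loads(tbl_properties_json)
    return props.get("properties", {})
```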
[jira] [Work logged] (BEAM-7043) Add DynamoDBIO
[ https://issues.apache.org/jira/browse/BEAM-7043?focusedWorklogId=256979&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256979 ] ASF GitHub Bot logged work on BEAM-7043: Author: ASF GitHub Bot Created on: 10/Jun/19 16:49 Start Date: 10/Jun/19 16:49 Worklog Time Spent: 10m Work Description: cmachgodaddy commented on pull request #8390: [BEAM-7043] Add DynamoDBIO URL: https://github.com/apache/beam/pull/8390#discussion_r292093933 ## File path: sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/dynamodb/DynamoDBIO.java ## @@ -0,0 +1,501 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.aws.dynamodb; + +import static org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkArgument; + +import com.amazonaws.auth.AWSStaticCredentialsProvider; +import com.amazonaws.auth.BasicAWSCredentials; +import com.amazonaws.client.builder.AwsClientBuilder; +import com.amazonaws.services.dynamodbv2.AmazonDynamoDB; +import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder; +import com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException; +import com.amazonaws.services.dynamodbv2.model.AttributeValue; +import com.amazonaws.services.dynamodbv2.model.BatchWriteItemRequest; +import com.amazonaws.services.dynamodbv2.model.BatchWriteItemResult; +import com.amazonaws.services.dynamodbv2.model.ScanRequest; +import com.google.auto.value.AutoValue; +import java.io.IOException; +import java.io.Serializable; +import java.util.Map; +import java.util.function.Predicate; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Metrics; +import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.util.BackOff; +import org.apache.beam.sdk.util.BackOffUtils; +import org.apache.beam.sdk.util.FluentBackoff; +import org.apache.beam.sdk.util.Sleeper; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.sdk.values.TupleTagList; +import org.apache.beam.vendor.guava.v20_0.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableSet; +import 
org.apache.http.HttpStatus; +import org.joda.time.Duration; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for writing to https://aws.amazon.com/dynamodb/";>DynamoDB. + * + * Writing to DynamoDB + * + * Example usage: + * + * {@code + * DynamoDBIO.DynamoDBConfiguration config = DynamoDBIO.DynamoDBConfiguration.create( + * "region", "accessKey", "secretKey"); + * PCollection data = ...; + * + * data.apply(DynamoDBIO.write() + * .withRetryConfiguration( + *DynamoDBIO.RetryConfiguration.create( + * 4, org.joda.time.Duration.standardSeconds(10))) + * .withDynamoDBConfiguration(config) + * .withResultOutputTag(results)); + * } + * + * As a client, you need to provide at least the following things: + * + * + * Retry configuration + * DynamoDb configuration + * An output tag where you can get results. Example in DynamoDBIOTest + * + * + * Reading from DynamoDB + * + * Example usage: + * + * {@code + * DynamoDBIO.DynamoDBConfiguration config = DynamoDBIO.DynamoDBConfiguration.create( + * "endpointUrl", "region", "accessKey", "secretKey"); + * PCollection> actual = + * pipeline.apply(DynamoDBIO.read() + * .withScanRequestFn((v) -> new ScanRequest(tableName).withTotalSegment(10)) + * .withDynamoDBConfiguration(config)); + * } + * + * As a client, you need to provide at least the following things: + * + * + * DynamoDb configuration + * ScanReq
[jira] [Work logged] (BEAM-7043) Add DynamoDBIO
[ https://issues.apache.org/jira/browse/BEAM-7043?focusedWorklogId=256961&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256961 ] ASF GitHub Bot logged work on BEAM-7043: Author: ASF GitHub Bot Created on: 10/Jun/19 16:42 Start Date: 10/Jun/19 16:42 Worklog Time Spent: 10m Work Description: cmachgodaddy commented on pull request #8390: [BEAM-7043] Add DynamoDBIO URL: https://github.com/apache/beam/pull/8390#discussion_r292090983 ## File path: sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/dynamodb/DynamoDBIO.java ## @@ -0,0 +1,501 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.aws.dynamodb; + +import static org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkArgument; + +import com.amazonaws.auth.AWSStaticCredentialsProvider; +import com.amazonaws.auth.BasicAWSCredentials; +import com.amazonaws.client.builder.AwsClientBuilder; +import com.amazonaws.services.dynamodbv2.AmazonDynamoDB; +import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder; +import com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException; +import com.amazonaws.services.dynamodbv2.model.AttributeValue; +import com.amazonaws.services.dynamodbv2.model.BatchWriteItemRequest; +import com.amazonaws.services.dynamodbv2.model.BatchWriteItemResult; +import com.amazonaws.services.dynamodbv2.model.ScanRequest; +import com.google.auto.value.AutoValue; +import java.io.IOException; +import java.io.Serializable; +import java.util.Map; +import java.util.function.Predicate; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Metrics; +import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.util.BackOff; +import org.apache.beam.sdk.util.BackOffUtils; +import org.apache.beam.sdk.util.FluentBackoff; +import org.apache.beam.sdk.util.Sleeper; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.sdk.values.TupleTagList; +import org.apache.beam.vendor.guava.v20_0.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableSet; +import 
org.apache.http.HttpStatus; +import org.joda.time.Duration; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for writing to https://aws.amazon.com/dynamodb/";>DynamoDB. + * + * Writing to DynamoDB + * + * Example usage: + * + * {@code + * DynamoDBIO.DynamoDBConfiguration config = DynamoDBIO.DynamoDBConfiguration.create( + * "region", "accessKey", "secretKey"); + * PCollection data = ...; + * + * data.apply(DynamoDBIO.write() + * .withRetryConfiguration( + *DynamoDBIO.RetryConfiguration.create( + * 4, org.joda.time.Duration.standardSeconds(10))) + * .withDynamoDBConfiguration(config) + * .withResultOutputTag(results)); + * } + * + * As a client, you need to provide at least the following things: + * + * + * Retry configuration + * DynamoDb configuration + * An output tag where you can get results. Example in DynamoDBIOTest + * + * + * Reading from DynamoDB + * + * Example usage: + * + * {@code + * DynamoDBIO.DynamoDBConfiguration config = DynamoDBIO.DynamoDBConfiguration.create( + * "endpointUrl", "region", "accessKey", "secretKey"); + * PCollection> actual = + * pipeline.apply(DynamoDBIO.read() + * .withScanRequestFn((v) -> new ScanRequest(tableName).withTotalSegment(10)) + * .withDynamoDBConfiguration(config)); + * } + * + * As a client, you need to provide at least the following things: + * + * + * DynamoDb configuration + * ScanReq
[jira] [Work logged] (BEAM-7043) Add DynamoDBIO
[ https://issues.apache.org/jira/browse/BEAM-7043?focusedWorklogId=256964&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256964 ] ASF GitHub Bot logged work on BEAM-7043: Author: ASF GitHub Bot Created on: 10/Jun/19 16:42 Start Date: 10/Jun/19 16:42 Worklog Time Spent: 10m Work Description: iemejia commented on issue #8390: [BEAM-7043] Add DynamoDBIO URL: https://github.com/apache/beam/pull/8390#issuecomment-500488384 @cmachgodaddy I think I answered all the pending questions. Please write me if you still have questions or comments. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 256964) Time Spent: 15h (was: 14h 50m) > Add DynamoDBIO > -- > > Key: BEAM-7043 > URL: https://issues.apache.org/jira/browse/BEAM-7043 > Project: Beam > Issue Type: New Feature > Components: io-java-aws >Reporter: Cam Mach >Assignee: Cam Mach >Priority: Minor > Fix For: 2.14.0 > > Time Spent: 15h > Remaining Estimate: 0h > > Currently we don't have any feature to write data to AWS DynamoDB. This > feature will enable us to send data to DynamoDB -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7043) Add DynamoDBIO
[ https://issues.apache.org/jira/browse/BEAM-7043?focusedWorklogId=256959&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256959 ] ASF GitHub Bot logged work on BEAM-7043: Author: ASF GitHub Bot Created on: 10/Jun/19 16:41 Start Date: 10/Jun/19 16:41 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #8390: [BEAM-7043] Add DynamoDBIO URL: https://github.com/apache/beam/pull/8390#discussion_r292090757 ## File path: sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/dynamodb/DynamoDBIO.java ## @@ -0,0 +1,501 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.aws.dynamodb; + +import static org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkArgument; + +import com.amazonaws.auth.AWSStaticCredentialsProvider; +import com.amazonaws.auth.BasicAWSCredentials; +import com.amazonaws.client.builder.AwsClientBuilder; +import com.amazonaws.services.dynamodbv2.AmazonDynamoDB; +import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder; +import com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException; +import com.amazonaws.services.dynamodbv2.model.AttributeValue; +import com.amazonaws.services.dynamodbv2.model.BatchWriteItemRequest; +import com.amazonaws.services.dynamodbv2.model.BatchWriteItemResult; +import com.amazonaws.services.dynamodbv2.model.ScanRequest; +import com.google.auto.value.AutoValue; +import java.io.IOException; +import java.io.Serializable; +import java.util.Map; +import java.util.function.Predicate; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Metrics; +import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.util.BackOff; +import org.apache.beam.sdk.util.BackOffUtils; +import org.apache.beam.sdk.util.FluentBackoff; +import org.apache.beam.sdk.util.Sleeper; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.sdk.values.TupleTagList; +import org.apache.beam.vendor.guava.v20_0.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableSet; +import 
org.apache.http.HttpStatus; +import org.joda.time.Duration; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for writing to https://aws.amazon.com/dynamodb/";>DynamoDB. + * + * Writing to DynamoDB + * + * Example usage: + * + * {@code + * DynamoDBIO.DynamoDBConfiguration config = DynamoDBIO.DynamoDBConfiguration.create( + * "region", "accessKey", "secretKey"); + * PCollection data = ...; + * + * data.apply(DynamoDBIO.write() + * .withRetryConfiguration( + *DynamoDBIO.RetryConfiguration.create( + * 4, org.joda.time.Duration.standardSeconds(10))) + * .withDynamoDBConfiguration(config) + * .withResultOutputTag(results)); + * } + * + * As a client, you need to provide at least the following things: + * + * + * Retry configuration + * DynamoDb configuration + * An output tag where you can get results. Example in DynamoDBIOTest + * + * + * Reading from DynamoDB + * + * Example usage: + * + * {@code + * DynamoDBIO.DynamoDBConfiguration config = DynamoDBIO.DynamoDBConfiguration.create( + * "endpointUrl", "region", "accessKey", "secretKey"); + * PCollection> actual = + * pipeline.apply(DynamoDBIO.read() + * .withScanRequestFn((v) -> new ScanRequest(tableName).withTotalSegment(10)) + * .withDynamoDBConfiguration(config)); + * } + * + * As a client, you need to provide at least the following things: + * + * + * DynamoDb configuration + * ScanRequestF
[jira] [Work logged] (BEAM-7043) Add DynamoDBIO
[ https://issues.apache.org/jira/browse/BEAM-7043?focusedWorklogId=256967&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256967 ] ASF GitHub Bot logged work on BEAM-7043: Author: ASF GitHub Bot Created on: 10/Jun/19 16:42 Start Date: 10/Jun/19 16:42 Worklog Time Spent: 10m Work Description: cmachgodaddy commented on pull request #8390: [BEAM-7043] Add DynamoDBIO URL: https://github.com/apache/beam/pull/8390#discussion_r292090983 ## File path: sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/dynamodb/DynamoDBIO.java ## @@ -0,0 +1,501 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.aws.dynamodb; + +import static org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkArgument; + +import com.amazonaws.auth.AWSStaticCredentialsProvider; +import com.amazonaws.auth.BasicAWSCredentials; +import com.amazonaws.client.builder.AwsClientBuilder; +import com.amazonaws.services.dynamodbv2.AmazonDynamoDB; +import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder; +import com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException; +import com.amazonaws.services.dynamodbv2.model.AttributeValue; +import com.amazonaws.services.dynamodbv2.model.BatchWriteItemRequest; +import com.amazonaws.services.dynamodbv2.model.BatchWriteItemResult; +import com.amazonaws.services.dynamodbv2.model.ScanRequest; +import com.google.auto.value.AutoValue; +import java.io.IOException; +import java.io.Serializable; +import java.util.Map; +import java.util.function.Predicate; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Metrics; +import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.util.BackOff; +import org.apache.beam.sdk.util.BackOffUtils; +import org.apache.beam.sdk.util.FluentBackoff; +import org.apache.beam.sdk.util.Sleeper; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.sdk.values.TupleTagList; +import org.apache.beam.vendor.guava.v20_0.com.google.common.annotations.VisibleForTesting; +import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableSet; +import 
org.apache.http.HttpStatus; +import org.joda.time.Duration; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for writing to https://aws.amazon.com/dynamodb/";>DynamoDB. + * + * Writing to DynamoDB + * + * Example usage: + * + * {@code + * DynamoDBIO.DynamoDBConfiguration config = DynamoDBIO.DynamoDBConfiguration.create( + * "region", "accessKey", "secretKey"); + * PCollection data = ...; + * + * data.apply(DynamoDBIO.write() + * .withRetryConfiguration( + *DynamoDBIO.RetryConfiguration.create( + * 4, org.joda.time.Duration.standardSeconds(10))) + * .withDynamoDBConfiguration(config) + * .withResultOutputTag(results)); + * } + * + * As a client, you need to provide at least the following things: + * + * + * Retry configuration + * DynamoDb configuration + * An output tag where you can get results. Example in DynamoDBIOTest + * + * + * Reading from DynamoDB + * + * Example usage: + * + * {@code + * DynamoDBIO.DynamoDBConfiguration config = DynamoDBIO.DynamoDBConfiguration.create( + * "endpointUrl", "region", "accessKey", "secretKey"); + * PCollection> actual = + * pipeline.apply(DynamoDBIO.read() + * .withScanRequestFn((v) -> new ScanRequest(tableName).withTotalSegment(10)) + * .withDynamoDBConfiguration(config)); + * } + * + * As a client, you need to provide at least the following things: + * + * + * DynamoDb configuration + * ScanReq
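The quoted javadoc configures `RetryConfiguration.create(4, Duration.standardSeconds(10))` — a cap on retry attempts and on total retry duration. A minimal sketch of that bounded-retry-with-backoff shape (plain Python, not DynamoDBIO's FluentBackoff implementation; the delay values are assumptions):

```python
import time

def call_with_retries(call, max_attempts=4, max_duration_s=10.0, base_delay_s=0.05):
    """Retry `call` with exponential backoff, giving up after
    max_attempts tries or once max_duration_s has elapsed -- the two
    knobs DynamoDBIO's RetryConfiguration.create(maxAttempts, maxDuration)
    exposes."""
    start = time.monotonic()
    delay = base_delay_s
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            out_of_time = time.monotonic() - start >= max_duration_s
            if attempt == max_attempts or out_of_time:
                raise  # retry budget exhausted; surface the error
            time.sleep(delay)
            delay *= 2  # exponential backoff between attempts
```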
[jira] [Work logged] (BEAM-7511) KafkaTable Initialization
[ https://issues.apache.org/jira/browse/BEAM-7511?focusedWorklogId=256969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256969 ] ASF GitHub Bot logged work on BEAM-7511: Author: ASF GitHub Bot Created on: 10/Jun/19 16:43 Start Date: 10/Jun/19 16:43 Worklog Time Spent: 10m Work Description: akedin commented on pull request #8797: [BEAM-7511] Fixes the bug in KafkaTable Initialization. URL: https://github.com/apache/beam/pull/8797#discussion_r292091591 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTable.java ## @@ -45,26 +46,11 @@ private List topicPartitions; private Map configUpdates; - protected BeamKafkaTable(Schema beamSchema) { -super(beamSchema); - } - public BeamKafkaTable(Schema beamSchema, String bootstrapServers, List topics) { super(beamSchema); this.bootstrapServers = bootstrapServers; this.topics = topics; - } - - public BeamKafkaTable( - Schema beamSchema, List topicPartitions, String bootstrapServers) { -super(beamSchema); -this.bootstrapServers = bootstrapServers; -this.topicPartitions = topicPartitions; - } - - public BeamKafkaTable updateConsumerProperties(Map configUpdates) { Review comment: @XuMingmin thank you for pointing it out. I think exposing these methods in `KafkaTableProvider` is probably out of scope of this PR. These methods also would not be used in the existing code. If you rely on these in some customized version, can you contribute some of it back with tests and maybe examples? So that it doesn't get deleted by accident? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 256969)
Time Spent: 1.5h (was: 1h 20m)

> KafkaTable Initialization
> -
>
> Key: BEAM-7511
> URL: https://issues.apache.org/jira/browse/BEAM-7511
> Project: Beam
> Issue Type: Bug
> Components: dsl-sql
> Reporter: Alireza Samadianzakaria
> Assignee: Alireza Samadianzakaria
> Priority: Minor
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> This exception is thrown when a Kafka table is created:
> Exception in thread "main" java.lang.NullPointerException
> at org.apache.beam.sdk.io.kafka.KafkaIO.updateKafkaProperties(KafkaIO.java:1028)
> at org.apache.beam.sdk.io.kafka.KafkaIO.access$900(KafkaIO.java:251)
> at org.apache.beam.sdk.io.kafka.KafkaIO$Read.withConsumerConfigUpdates(KafkaIO.java:814)
> at org.apache.beam.sdk.extensions.sql.meta.provider.kafka.BeamKafkaTable.buildIOReader(BeamKafkaTable.java:89)
> at org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:67)
> at org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:58)
> at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
> at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:488)
> at org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:66)
> at org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47)
> at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
> at java.util.Iterator.forEachRemaining(Iterator.java:116)
> at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
> at org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48)
> at org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64)
> at org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47)
> at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
> at java.util.Iterator.forEachRemaining(Iterator.java:116)
> at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipe
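The stack trace above fails inside `KafkaIO.updateKafkaProperties` because the table's consumer-config map is never initialized on the code path that creates the table. A minimal plain-Java sketch of the failure shape and the usual fix (these class and method names are hypothetical stand-ins, not the actual Beam classes):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a table that carries consumer config overrides.
class TableSketch {
    // The fix pattern: initialize eagerly (or null-check before use) so the
    // downstream merge never sees null. Leaving this field null reproduces
    // the NPE reported at KafkaIO.updateKafkaProperties(KafkaIO.java:1028).
    private Map<String, Object> configUpdates = new HashMap<>();

    Map<String, Object> buildConsumerConfig(Map<String, Object> defaults) {
        Map<String, Object> config = new HashMap<>(defaults);
        // With a null configUpdates this loop would throw NullPointerException.
        for (Map.Entry<String, Object> e : configUpdates.entrySet()) {
            config.put(e.getKey(), e.getValue());
        }
        return config;
    }

    public static void main(String[] args) {
        TableSketch table = new TableSketch();
        Map<String, Object> defaults = new HashMap<>();
        defaults.put("bootstrap.servers", "localhost:9092");
        System.out.println(table.buildConsumerConfig(defaults)); // no NPE
    }
}
```

Either eager initialization (as above) or a null check before the merge removes the failure; the PR under review takes the initialization route in `BeamKafkaTable`.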
[jira] [Work logged] (BEAM-7511) KafkaTable Initialization
[ https://issues.apache.org/jira/browse/BEAM-7511?focusedWorklogId=256954&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256954 ]

ASF GitHub Bot logged work on BEAM-7511:
Author: ASF GitHub Bot
Created on: 10/Jun/19 16:35
Start Date: 10/Jun/19 16:35
Worklog Time Spent: 10m

Work Description: riazela commented on pull request #8797: [BEAM-7511] Fixes the bug in KafkaTable Initialization.
URL: https://github.com/apache/beam/pull/8797#discussion_r292088562

## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTable.java
## @@ -45,26 +46,11 @@
   private List topicPartitions;
   private Map configUpdates;

-  protected BeamKafkaTable(Schema beamSchema) {
-    super(beamSchema);
-  }
-
   public BeamKafkaTable(Schema beamSchema, String bootstrapServers, List topics) {
     super(beamSchema);
     this.bootstrapServers = bootstrapServers;
     this.topics = topics;
   }
-
-  public BeamKafkaTable(
-      Schema beamSchema, List topicPartitions, String bootstrapServers) {
-    super(beamSchema);
-    this.bootstrapServers = bootstrapServers;
-    this.topicPartitions = topicPartitions;
-  }
-
-  public BeamKafkaTable updateConsumerProperties(Map configUpdates) {

Review comment: I restored the deleted methods so that we can support them later in `KafkaTableProvider`.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
Issue Time Tracking
---
Worklog Id: (was: 256954)
Time Spent: 1h 20m (was: 1h 10m)

> KafkaTable Initialization
> Key: BEAM-7511
> URL: https://issues.apache.org/jira/browse/BEAM-7511
> Time Spent: 1h 20m
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-7368) Run Python GBK load tests on portable Flink runner
[ https://issues.apache.org/jira/browse/BEAM-7368?focusedWorklogId=256950&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256950 ]

ASF GitHub Bot logged work on BEAM-7368:
Author: ASF GitHub Bot
Created on: 10/Jun/19 16:32
Start Date: 10/Jun/19 16:32
Worklog Time Spent: 10m

Work Description: lgajowy commented on issue #8636: [BEAM-7368] Flink + Python + gbk load test
URL: https://github.com/apache/beam/pull/8636#issuecomment-500484640
@mxm @angoenka I provided fixes. Could you take a look?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 256950)
Time Spent: 10h 50m (was: 10h 40m)

> Run Python GBK load tests on portable Flink runner
> --
>
> Key: BEAM-7368
> URL: https://issues.apache.org/jira/browse/BEAM-7368
> Project: Beam
> Issue Type: Sub-task
> Components: testing
> Reporter: Lukasz Gajowy
> Assignee: Lukasz Gajowy
> Priority: Major
> Time Spent: 10h 50m
> Remaining Estimate: 0h

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7428) ReadAllViaFileBasedSource does not output the timestamps of the read elements
[ https://issues.apache.org/jira/browse/BEAM-7428?focusedWorklogId=256951&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256951 ]

ASF GitHub Bot logged work on BEAM-7428:
Author: ASF GitHub Bot
Created on: 10/Jun/19 16:32
Start Date: 10/Jun/19 16:32
Worklog Time Spent: 10m

Work Description: chamikaramj commented on issue #8741: [BEAM-7428] Output the timestamp on elements in ReadAllViaFileBasedSource
URL: https://github.com/apache/beam/pull/8741#issuecomment-500484673
I'm not sure where the difference in behavior comes from, but is it possible that the input timestamps in the two cases are somehow different (-inf vs. current time)? @jkff or @lukecwik might have more context.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 256951)
Time Spent: 3.5h (was: 3h 20m)

> ReadAllViaFileBasedSource does not output the timestamps of the read elements
> -
>
> Key: BEAM-7428
> URL: https://issues.apache.org/jira/browse/BEAM-7428
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Reporter: Ismaël Mejía
> Assignee: Ismaël Mejía
> Priority: Minor
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> This differs from the implementation of JavaReadViaImpulse, which tackles a similar problem but does output the timestamps correctly.
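The bug discussed in this thread is about a read transform discarding the per-element timestamps assigned by the underlying source (elements come out stamped with a default such as -inf instead). A minimal plain-Java sketch of the two shapes, with hypothetical stand-in types rather than Beam's `TimestampedValue`/`DoFn` API:

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch contrasting "re-stamp with a default" vs "pass the
// source's timestamp through" (the latter is what the fix amounts to).
class TimestampSketch {
    static class Timestamped {
        final String value;
        final Instant ts;
        Timestamped(String value, Instant ts) { this.value = value; this.ts = ts; }
    }

    // Buggy shape: every element is re-stamped with a default timestamp
    // (Instant.MIN stands in for a "-inf" minimum timestamp), so the
    // source-assigned timestamps are lost.
    static List<Timestamped> emitWithDefault(List<Timestamped> fromSource) {
        List<Timestamped> out = new ArrayList<>();
        for (Timestamped tv : fromSource) out.add(new Timestamped(tv.value, Instant.MIN));
        return out;
    }

    // Fixed shape: each element keeps the timestamp its source assigned,
    // analogous to emitting with an explicit per-element timestamp.
    static List<Timestamped> emitWithTimestamp(List<Timestamped> fromSource) {
        return new ArrayList<>(fromSource);
    }

    public static void main(String[] args) {
        List<Timestamped> src = new ArrayList<>();
        src.add(new Timestamped("line-1", Instant.ofEpochSecond(100)));
        System.out.println(emitWithDefault(src).get(0).ts);   // default: timestamp lost
        System.out.println(emitWithTimestamp(src).get(0).ts); // source timestamp preserved
    }
}
```

Under this framing, the "-inf vs. current time" question in the comment is about which default the buggy path was stamping elements with, not about the fix itself.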