[jira] [Created] (BEAM-8893) Issues with state and multiple workers in FnApiRunner

2019-12-05 Thread Robert Bradshaw (Jira)
Robert Bradshaw created BEAM-8893:
-

 Summary: Issues with state and multiple workers in FnApiRunner
 Key: BEAM-8893
 URL: https://issues.apache.org/jira/browse/BEAM-8893
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Robert Bradshaw






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-1440) Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1440?focusedWorklogId=354147&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354147
 ]

ASF GitHub Bot logged work on BEAM-1440:


Author: ASF GitHub Bot
Created on: 05/Dec/19 08:57
Start Date: 05/Dec/19 08:57
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on pull request #9772: [BEAM-1440] 
Create a BigQuery source that implements iobase.BoundedSource for Python
URL: https://github.com/apache/beam/pull/9772#discussion_r354177031
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery.py
 ##
 @@ -499,6 +509,189 @@ def reader(self, test_bigquery_client=None):
 kms_key=self.kms_key)
 
 
+FieldSchema = collections.namedtuple('FieldSchema', 'fields mode name type')
+
+
+def _to_bool(value):
+  return value == 'true'
+
+
+def _to_decimal(value):
+  return decimal.Decimal(value)
+
+
+def _to_bytes(value):
+  """Converts value from str to bytes on Python 3.x. Does nothing on
+  Python 2.7."""
+  return value.encode('utf-8')
+
+
+class _JsonToDictCoder(coders.Coder):
+  """A coder for a JSON string to a Python dict."""
+
+  def __init__(self, table_schema):
+    self.fields = self._convert_to_tuple(table_schema.fields)
+    self._converters = {
+        'INTEGER': int,
+        'INT64': int,
+        'FLOAT': float,
+        'BOOLEAN': _to_bool,
+        'NUMERIC': _to_decimal,
+        'BYTES': _to_bytes,
+    }
+
+  @classmethod
+  def _convert_to_tuple(cls, table_field_schemas):
+    """Recursively converts the list of TableFieldSchema instances to the
+    list of tuples to prevent errors when pickling and unpickling
+    TableFieldSchema instances.
+    """
+    if not table_field_schemas:
+      return []
+
+    return [FieldSchema(cls._convert_to_tuple(x.fields), x.mode, x.name,
+                        x.type)
+            for x in table_field_schemas]
+
+  def decode(self, value):
+    value = json.loads(value)
+    return self._decode_with_schema(value, self.fields)
+
+  def _decode_with_schema(self, value, schema_fields):
+    for field in schema_fields:
+      if field.name not in value:
+        # The field exists in the schema, but it doesn't exist in this row.
+        # It probably means its value was null, as the extract to JSON job
+        # doesn't preserve null fields
+        value[field.name] = None
+        continue
+
+      if field.type == 'RECORD':
+        value[field.name] = self._decode_with_schema(value[field.name],
+                                                     field.fields)
+      else:
+        try:
+          converter = self._converters[field.type]
+          value[field.name] = converter(value[field.name])
+        except KeyError:
+          # No need to do any conversion
+          pass
+    return value
+
+  def is_deterministic(self):
+    return True
+
+  def to_type_hint(self):
+    return dict
+
+
+class _BigQuerySource(BoundedSource):
+  def __init__(self, gcs_location=None, table=None, dataset=None,
+               project=None, query=None, validate=False, coder=None,
+               use_standard_sql=False, flatten_results=True, kms_key=None):
+    if table is not None and query is not None:
+      raise ValueError('Both a BigQuery table and a query were specified.'
+                       ' Please specify only one of these.')
+    elif table is None and query is None:
+      raise ValueError('A BigQuery table or a query must be specified')
+    elif table is not None:
+      self.table_reference = bigquery_tools.parse_table_reference(
+          table, dataset, project)
+      self.query = None
+      self.use_legacy_sql = True
+    else:
+      self.query = query
+      # TODO(BEAM-1082): Change the internal flag to be standard_sql
+      self.use_legacy_sql = not use_standard_sql
+      self.table_reference = None
+
+    self.gcs_location = gcs_location
+    self.project = project
+    self.validate = validate
+    self.flatten_results = flatten_results
+    self.coder = coder or _JsonToDictCoder
+    self.kms_key = kms_key
+    self.split_result = None
+
+  def estimate_size(self):
+    bq = bigquery_tools.BigQueryWrapper()
+    if self.table_reference is not None:
+      table = bq.get_table(self.table_reference.projectId,
+                           self.table_reference.datasetId,
+                           self.table_reference.tableId)
+      return int(table.numBytes)
+    else:
+      self._setup_temporary_dataset(bq)
+      job = bq._start_query_job(self.project, self.query,
+                                self.use_legacy_sql, self.flatten_results,
+                                job_id=uuid.uuid4().hex, dry_run=True,
+                                kms_key=self.kms_key)
+      size = int(job.statistics.totalBytesProcessed)
+
+      bq.clean_up_temporary_dataset(self.project)
+
+      return size
+
+  def split(self, desired_bundle_size
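
For orientation, a small, self-contained, editorial sketch of the decoding idea
that the _JsonToDictCoder above implements: a BigQuery JSON export gives every
leaf value as a string, and the coder applies a per-type converter while leaving
unknown types untouched. The simplified (name, type) pair schema below is a
stand-in for TableFieldSchema, and none of this is code from the PR itself:

```python
import decimal
import json

# A subset of the converters registered in _JsonToDictCoder above.
_CONVERTERS = {
    'INTEGER': int,
    'INT64': int,
    'FLOAT': float,
    'BOOLEAN': lambda v: v == 'true',
    'NUMERIC': decimal.Decimal,
}


def decode_row(json_line, schema):
  """Decodes one exported row; schema is a list of (name, type) pairs."""
  row = json.loads(json_line)
  for name, field_type in schema:
    if name not in row:
      # The export-to-JSON job drops null fields, so re-add them as None.
      row[name] = None
    elif field_type in _CONVERTERS:
      row[name] = _CONVERTERS[field_type](row[name])
  return row


print(decode_row('{"user": "foo", "score": "42", "active": "true"}',
                 [('user', 'STRING'), ('score', 'INTEGER'),
                  ('active', 'BOOLEAN')]))
# {'user': 'foo', 'score': 42, 'active': True}
```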

[jira] [Resolved] (BEAM-8733) The "KeyError: u'-47'" error from line 305 of sdk_worker.py

2019-12-05 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng resolved BEAM-8733.
---
Resolution: Fixed

> The "KeyError: u'-47'" error from line 305 of sdk_worker.py
> ---
>
> Key: BEAM-8733
> URL: https://issues.apache.org/jira/browse/BEAM-8733
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> The issue was reported by [~chamikara]; the error message is as follows:
> apache_beam/runners/worker/sdk_worker.py", line 305, in get
> self.fns[bundle_descriptor_id],
> KeyError: u'-47'
> {code}
> at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:330)
> at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.start(DataflowRunnerHarness.java:195)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:123)
> Suppressed: java.lang.IllegalStateException: Already closed.
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:93)
>   at 
> org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation.abort(RemoteGrpcPortWriteOperation.java:220)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:91)
> {code}
> More discussion info can be found here: 
> https://github.com/apache/beam/pull/10004



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8620) Tear down unused DoFns periodically in Java SDK harness

2019-12-05 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated BEAM-8620:
--
Fix Version/s: (was: 2.18.0)
   2.19.0

> Tear down unused DoFns periodically in Java SDK harness
> ---
>
> Key: BEAM-8620
> URL: https://issues.apache.org/jira/browse/BEAM-8620
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.19.0
>
>
> Per the discussion in the ML (details can be found here [1]), the teardown of 
> DoFns should be supported in the portability framework. It happens in two 
> places:
> 1) Upon control service termination
> 2) Periodically, for unused DoFns
> The aim of this JIRA is to add support for tearing down unused DoFns 
> periodically in the Java SDK harness.
> [1] 
> https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8618) Tear down unused DoFns periodically in Python SDK harness

2019-12-05 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated BEAM-8618:
--
Fix Version/s: (was: 2.18.0)
   2.19.0

> Tear down unused DoFns periodically in Python SDK harness
> -
>
> Key: BEAM-8618
> URL: https://issues.apache.org/jira/browse/BEAM-8618
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.19.0
>
>
> Per the discussion in the ML (details can be found in [1]), the teardown of 
> DoFns should be supported in the portability framework. It happens in two places:
> 1) Upon control service termination
> 2) Periodically, for unused DoFns
> The aim of this JIRA is to add support for tearing down unused DoFns 
> periodically in the Python SDK harness.
> [1] 
> https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E
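
For reference, a minimal, editorial Python sketch (not taken from the ticket) of
a DoFn whose teardown() hook is what the harness would invoke, whether upon
control service termination or periodically for instances that have gone unused:

{code:python}
import apache_beam as beam


class CachedLookupDoFn(beam.DoFn):
  """Illustrative DoFn holding a per-instance resource between bundles."""

  def setup(self):
    # Build the (pretend) expensive per-instance resource once.
    self._cache = {'a': 1, 'b': 2}

  def process(self, element):
    yield self._cache.get(element, 0)

  def teardown(self):
    # Invoked when the SDK harness tears the DoFn instance down: today upon
    # control service termination, and with this ticket also periodically
    # for unused instances.
    self._cache = None
{code}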



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=354160&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354160
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 05/Dec/19 09:50
Start Date: 05/Dec/19 09:50
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on issue #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#issuecomment-562053864
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354160)
Time Spent: 4.5h  (was: 4h 20m)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> It looks like this resource detection algorithm can't work and should be 
> replaced by an SPI rather than a built-in, non-extensible algorithm. Another 
> valid alternative is to just drop that "guess" logic and force the user to 
> set the staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=354167&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354167
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 05/Dec/19 10:05
Start Date: 05/Dec/19 10:05
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#discussion_r354211593
 
 

 ##
 File path: 
.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow_Java11.groovy
 ##
 @@ -19,9 +19,11 @@
 import CommonJobProperties as commonJobProperties
 import PostcommitJobBuilder
 
+final String JAVA_11_HOME = '/usr/lib/jvm/java-11-openjdk-amd64'
+final String JAVA_8_HOME = '/usr/lib/jvm/java-8-openjdk-amd64'
 
 
PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java11_ValidatesRunner_Dataflow',
-  'Run Dataflow ValidatesRunner Java 11', 'Google Cloud Dataflow Runner 
ValidatesRunner Tests On Java 11', this) {
+'Run Dataflow ValidatesRunner Java 11', 'Google Cloud Dataflow Runner 
ValidatesRunner Tests On Java 11', this) {
 
 Review comment:
   I'm not sure if I understand you - we're using the 
PostcommitJobBuilder.postCommitJob() method here. It creates 2 jobs: 
   - "beam_PostCommit_Java11_ValidatesRunner_Dataflow"
   - "beam_PostCommit_Java11_ValidatesRunner_Dataflow_PR" (the suffix is added 
automatically by the builder - there's no way to modify it if you're using the 
builder)
   
   What do you mean by "job title"? Do you mean the "name"? or the 
"triggerPhrase" or "githubUIHint" in the 
[postCommitJob](https://github.com/apache/beam/blob/master/.test-infra/jenkins/PostcommitJobBuilder.groovy#L47)
 method? In your opinion, should I separate the creation of the two jobs and 
have custom configurations (including naming) for the two?
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354167)
Time Spent: 4h 40m  (was: 4.5h)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> It looks like this resource detection algorithm can't work and should be 
> replaced by an SPI rather than a built-in, non-extensible algorithm. Another 
> valid alternative is to just drop that "guess" logic and force the user to 
> set the staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=354168&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354168
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 05/Dec/19 10:10
Start Date: 05/Dec/19 10:10
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#discussion_r354214465
 
 

 ##
 File path: 
.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow_Java11.groovy
 ##
 @@ -19,9 +19,11 @@
 import CommonJobProperties as commonJobProperties
 import PostcommitJobBuilder
 
+final String JAVA_11_HOME = '/usr/lib/jvm/java-11-openjdk-amd64'
+final String JAVA_8_HOME = '/usr/lib/jvm/java-8-openjdk-amd64'
 
 
PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java11_ValidatesRunner_Dataflow',
-  'Run Dataflow ValidatesRunner Java 11', 'Google Cloud Dataflow Runner 
ValidatesRunner Tests On Java 11', this) {
+'Run Dataflow ValidatesRunner Java 11', 'Google Cloud Dataflow Runner 
ValidatesRunner Tests On Java 11', this) {
 
 Review comment:
   In general, I think we should stick with using the builders to enforce a 
common convention for all the jobs. Changing the convention and 
improving/refactoring the builders (in case we want to do this) should be part 
of a different PR - this one is already huge IMO and focuses on different 
aspects (of course, a Jira ticket should be created prior to this).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354168)
Time Spent: 4h 50m  (was: 4h 40m)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> It looks like this resource detection algorithm can't work and should be 
> replaced by an SPI rather than a built-in, non-extensible algorithm. Another 
> valid alternative is to just drop that "guess" logic and force the user to 
> set the staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=354172&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354172
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 05/Dec/19 10:24
Start Date: 05/Dec/19 10:24
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#discussion_r354221674
 
 

 ##
 File path: 
.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow_Java11.groovy
 ##
 @@ -19,9 +19,11 @@
 import CommonJobProperties as commonJobProperties
 import PostcommitJobBuilder
 
+final String JAVA_11_HOME = '/usr/lib/jvm/java-11-openjdk-amd64'
+final String JAVA_8_HOME = '/usr/lib/jvm/java-8-openjdk-amd64'
 
 Review comment:
   Good point - I agree.  This was copied from some other job 
([VR_Direct_Java11](https://github.com/apache/beam/blob/fa37fc5e176e72fd346b9a0bb907d9726b33d018/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Direct_Java11.groovy)).
 
   
   If I'm not mistaken, adding new env variables to Jenkins has to go through 
asf infra so that they add the variables and then we can easily use them. Am I 
right @Ardagan ? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354172)
Time Spent: 5h  (was: 4h 50m)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> It looks like this resource detection algorithm can't work and should be 
> replaced by an SPI rather than a built-in, non-extensible algorithm. Another 
> valid alternative is to just drop that "guess" logic and force the user to 
> set the staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner

2019-12-05 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated BEAM-8830:
---
Description: 
'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
 
'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'

  was:
'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
'org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders'
'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'


> fix Flatten tests in Spark Structured Streaming runner
> --
>
> Key: BEAM-8830
> URL: https://issues.apache.org/jira/browse/BEAM-8830
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
>  
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8894) Fix multiple coder bug in new spark runner

2019-12-05 Thread Etienne Chauchot (Jira)
Etienne Chauchot created BEAM-8894:
--

 Summary: Fix multiple coder bug in new spark runner
 Key: BEAM-8894
 URL: https://issues.apache.org/jira/browse/BEAM-8894
 Project: Beam
  Issue Type: Bug
  Components: runner-spark
Reporter: Etienne Chauchot
Assignee: Etienne Chauchot


The test below does not pass. It fails with an EOF while calling 
NullableCoder(BigEndianCoder.decode).

'org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders'

In the "old" Spark runner, this test passes because it never calls the 
NullableCoder, as there is no serialization done.

There may be a problem in the NullableCoder itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-5171) org.apache.beam.sdk.io.CountingSourceTest.test[Un]boundedSourceSplits tests are flaky in Spark runner

2019-12-05 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot reassigned BEAM-5171:
--

Assignee: (was: Etienne Chauchot)

> org.apache.beam.sdk.io.CountingSourceTest.test[Un]boundedSourceSplits tests 
> are flaky in Spark runner
> -
>
> Key: BEAM-5171
> URL: https://issues.apache.org/jira/browse/BEAM-5171
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Valentyn Tymofieiev
>Priority: Major
>
> Two tests: 
>  org.apache.beam.sdk.io.CountingSourceTest.testUnboundedSourceSplits 
>  org.apache.beam.sdk.io.CountingSourceTest.testBoundedSourceSplits
> failed in a PostCommit [Spark Validates Runner test 
> suite|https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/1277/testReport/]
>  with an error that seems to be common for Spark. Could this be due to a 
> misconfiguration of the Spark cluster? 
> Task serialization failed: java.io.IOException: Failed to create local dir in 
> /tmp/blockmgr-de91f449-e5d1-4be4-acaa-3ee06fdfa95b/1d.
>  java.io.IOException: Failed to create local dir in 
> /tmp/blockmgr-de91f449-e5d1-4be4-acaa-3ee06fdfa95b/1d.
>  at 
> org.apache.spark.storage.DiskBlockManager.getFile(DiskBlockManager.scala:70)
>  at org.apache.spark.storage.DiskStore.remove(DiskStore.scala:116)
>  at 
> org.apache.spark.storage.BlockManager.removeBlockInternal(BlockManager.scala:1511)
>  at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1045)
>  at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1083)
>  at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:841)
>  at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:1404)
>  at 
> org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:123)
>  at 
> org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:88)
>  at 
> org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
>  at 
> org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
>  at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1482)
>  at 
> org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1039)
>  at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:947)
>  at 
> org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:891)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1780)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1772)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1761)
>  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-4226) Migrate hadoop dependency to 2.7.4 or upper to fix a CVE

2019-12-05 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot reassigned BEAM-4226:
--

Assignee: (was: Etienne Chauchot)

> Migrate hadoop dependency to 2.7.4 or upper to fix a CVE
> 
>
> Key: BEAM-4226
> URL: https://issues.apache.org/jira/browse/BEAM-4226
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Etienne Chauchot
>Priority: Major
>
> Apache Hadoop is subject to a vulnerability:
> CVE-2016-6811: Apache Hadoop Privilege escalation vulnerability
> We should upgrade the dependency, perhaps to 2.7.4, which is the closest to 
> what we actually use (2.7.3).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-2176) Support state API in Spark streaming mode.

2019-12-05 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot reassigned BEAM-2176:
--

Assignee: (was: Etienne Chauchot)

> Support state API in Spark streaming mode.
> --
>
> Key: BEAM-2176
> URL: https://issues.apache.org/jira/browse/BEAM-2176
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-spark
>Reporter: Aviem Zur
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6627) Use Metrics API in IO performance tests

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6627?focusedWorklogId=354177&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354177
 ]

ASF GitHub Bot logged work on BEAM-6627:


Author: ASF GitHub Bot
Created on: 05/Dec/19 10:42
Start Date: 05/Dec/19 10:42
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #10267: [BEAM-6627] 
Add size reporting to JdbcIOIT
URL: https://github.com/apache/beam/pull/10267
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354177)
Time Spent: 17h 10m  (was: 17h)

> Use Metrics API in IO performance tests
> ---
>
> Key: BEAM-6627
> URL: https://issues.apache.org/jira/browse/BEAM-6627
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-6209) Remove Http Metrics Sink specific methods from PipelineOptions

2019-12-05 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved BEAM-6209.

Fix Version/s: 2.10.0
   Resolution: Fixed

> Remove Http Metrics Sink specific methods from PipelineOptions
> --
>
> Key: BEAM-6209
> URL: https://issues.apache.org/jira/browse/BEAM-6209
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-extensions-metrics
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
> Fix For: 2.10.0
>
>
> Methods specific to Metrics Http Sink should be moved to a 
> PipelineOptionsMetricsHttpSink interface to avoid having technology-specific 
> methods in base classes/interfaces.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-2499) Support Custom Windows in Spark runner

2019-12-05 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot reassigned BEAM-2499:
--

Assignee: (was: Etienne Chauchot)

> Support Custom Windows in Spark runner
> --
>
> Key: BEAM-2499
> URL: https://issues.apache.org/jira/browse/BEAM-2499
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Priority: Major
>
> If we extend {{IntervalWindow}} and we try to merge these custom windows like 
> in this PR:
> https://github.com/apache/beam/pull/3286
> Then the Spark runner fails with 
> {{org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.ClassCastException: 
> org.apache.beam.sdk.transforms.windowing.IntervalWindow cannot be cast to 
> org.apache.beam.sdk.transforms.windowing.MergingCustomWindowsTest$CustomWindow}}
> It seems to be because of the cast to {{IntervalWindow}} there: 
> https://github.com/apache/beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkGlobalCombineFn.java#L111



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8830?focusedWorklogId=354181&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354181
 ]

ASF GitHub Bot logged work on BEAM-8830:


Author: ASF GitHub Bot
Created on: 05/Dec/19 10:53
Start Date: 05/Dec/19 10:53
Worklog Time Spent: 10m 
  Work Description: echauchot commented on pull request #10293: [BEAM-8830] 
Fix flatten in the new spark runner
URL: https://github.com/apache/beam/pull/10293
 
 
   R: @aromanenko-dev 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   

[jira] [Commented] (BEAM-8894) Fix multiple coder bug in new spark runner

2019-12-05 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988666#comment-16988666
 ] 

Etienne Chauchot commented on BEAM-8894:


I have split this test out because it might be unrelated to flatten.

> Fix multiple coder bug in new spark runner
> --
>
> Key: BEAM-8894
> URL: https://issues.apache.org/jira/browse/BEAM-8894
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> The test below does not pass. It fails with an EOF while calling 
> NullableCoder(BigEndianCoder.decode).
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders'
> In the "old" Spark runner, this test passes because it never calls the 
> NullableCoder, as there is no serialization done.
> There may be a problem in the NullableCoder itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8337) Add Flink job server container images to release process

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8337?focusedWorklogId=354209&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354209
 ]

ASF GitHub Bot logged work on BEAM-8337:


Author: ASF GitHub Bot
Created on: 05/Dec/19 11:48
Start Date: 05/Dec/19 11:48
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #10272: [BEAM-8337] 
publish Flink job server container images
URL: https://github.com/apache/beam/pull/10272#discussion_r354264123
 
 

 ##
 File path: website/src/contribute/release-guide.md
 ##
 @@ -691,6 +691,15 @@ done
 ./gradlew :sdks:go:container:dockerPush -Pdocker-tag=${RELEASE}_rc{RC_NUM}
 ```
 
+* Build Flink job server images and push to DockerHub.
+
+```
+FLINK_VER=("1.7", "1.8", "1.9")
 
 Review comment:
   Something like the following could do:
   
   ```bash
   FLINK_VER=($(ls -1 runners/flink | awk '/^[0-9]+\.[0-9]+$/{print}'))
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354209)
Time Spent: 1h 40m  (was: 1.5h)

> Add Flink job server container images to release process
> 
>
> Key: BEAM-8337
> URL: https://issues.apache.org/jira/browse/BEAM-8337
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Could be added to the release process similar to how we now publish SDK 
> worker images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=354208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354208
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 05/Dec/19 11:48
Start Date: 05/Dec/19 11:48
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on issue #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#issuecomment-562096039
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354208)
Time Spent: 5h 10m  (was: 5h)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> It looks like this resource detection algorithm can't work and should be 
> replaced by an SPI rather than a built-in, non-extensible algorithm. Another 
> valid alternative is to just drop that "guess" logic and force the user to 
> set the staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8895) BigQueryIO streaming test on Java is flaky

2019-12-05 Thread Kamil Wasilewski (Jira)
Kamil Wasilewski created BEAM-8895:
--

 Summary: BigQueryIO streaming test on Java is flaky
 Key: BEAM-8895
 URL: https://issues.apache.org/jira/browse/BEAM-8895
 Project: Beam
  Issue Type: Bug
  Components: testing
Reporter: Kamil Wasilewski
 Fix For: Not applicable


 
SEVERE: 2019-12-05T06:57:31.089Z: java.lang.RuntimeException: 
com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad 
Request*07:57:32* {*07:57:32*   "code" : 400,*07:57:32*   "errors" 
: [ {*07:57:32* "domain" : "global",*07:57:32* "message" : 
"Invalid table ID 
\"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
be alphanumeric (plus underscores) and must be at most 1024 characters long. 
Also, Table decorators cannot be used.",*07:57:32* "reason" : 
"invalid"*07:57:32*   } ],*07:57:32*   "message" : "Invalid table ID 
\"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
be alphanumeric (plus underscores) and must be at most 1024 characters long. 
Also, Table decorators cannot be used.",*07:57:32*   "status" : 
"INVALID_ARGUMENT"*07:57:32* }



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8337) Add Flink job server container images to release process

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8337?focusedWorklogId=354210&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354210
 ]

ASF GitHub Bot logged work on BEAM-8337:


Author: ASF GitHub Bot
Created on: 05/Dec/19 11:49
Start Date: 05/Dec/19 11:49
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #10272: [BEAM-8337] 
publish Flink job server container images
URL: https://github.com/apache/beam/pull/10272#discussion_r354264123
 
 

 ##
 File path: website/src/contribute/release-guide.md
 ##
 @@ -691,6 +691,15 @@ done
 ./gradlew :sdks:go:container:dockerPush -Pdocker-tag=${RELEASE}_rc{RC_NUM}
 ```
 
+* Build Flink job server images and push to DockerHub.
+
+```
+FLINK_VER=("1.7", "1.8", "1.9")
 
 Review comment:
   Something like the following could do:
   
   ```suggestion
   FLINK_VER=($(ls -1 runners/flink | awk '/^[0-9]+\.[0-9]+$/{print}'))
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354210)
Time Spent: 1h 50m  (was: 1h 40m)

> Add Flink job server container images to release process
> 
>
> Key: BEAM-8337
> URL: https://issues.apache.org/jira/browse/BEAM-8337
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Could be added to the release process similar to how we now publish SDK 
> worker images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8895) BigQueryIO streaming test on Java is flaky

2019-12-05 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-8895:
---
Description: 
```
SEVERE: 2019-12-05T06:57:31.089Z: java.lang.RuntimeException: 
com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad 
Request*07:57:32* {*07:57:32*   "code" : 400,*07:57:32*   "errors" 
: [ {*07:57:32* "domain" : "global",*07:57:32* "message" : 
"Invalid table ID 
\"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
be alphanumeric (plus underscores) and must be at most 1024 characters long. 
Also, Table decorators cannot be used.",*07:57:32* "reason" : 
"invalid"*07:57:32*   } ],*07:57:32*   "message" : "Invalid table ID 
\"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
be alphanumeric (plus underscores) and must be at most 1024 characters long. 
Also, Table decorators cannot be used.",*07:57:32*   "status" : 
"INVALID_ARGUMENT"*07:57:32* }
```

  was:
 
SEVERE: 2019-12-05T06:57:31.089Z: java.lang.RuntimeException: 
com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad 
Request*07:57:32* {*07:57:32*   "code" : 400,*07:57:32*   "errors" 
: [ {*07:57:32* "domain" : "global",*07:57:32* "message" : 
"Invalid table ID 
\"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
be alphanumeric (plus underscores) and must be at most 1024 characters long. 
Also, Table decorators cannot be used.",*07:57:32* "reason" : 
"invalid"*07:57:32*   } ],*07:57:32*   "message" : "Invalid table ID 
\"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
be alphanumeric (plus underscores) and must be at most 1024 characters long. 
Also, Table decorators cannot be used.",*07:57:32*   "status" : 
"INVALID_ARGUMENT"*07:57:32* }


> BigQueryIO streaming test on Java is flaky
> --
>
> Key: BEAM-8895
> URL: https://issues.apache.org/jira/browse/BEAM-8895
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kamil Wasilewski
>Priority: Major
> Fix For: Not applicable
>
>
> ```
> SEVERE: 2019-12-05T06:57:31.089Z: java.lang.RuntimeException: 
> com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad 
> Request*07:57:32* {*07:57:32*   "code" : 400,*07:57:32*   
> "errors" : [ {*07:57:32* "domain" : "global",*07:57:32* 
> "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",*07:57:32* "reason" : 
> "invalid"*07:57:32*   } ],*07:57:32*   "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",*07:57:32*   "status" : 
> "INVALID_ARGUMENT"*07:57:32* }
> ```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8895) BigQueryIO streaming test on Java is flaky

2019-12-05 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy updated BEAM-8895:

Status: Open  (was: Triage Needed)

> BigQueryIO streaming test on Java is flaky
> --
>
> Key: BEAM-8895
> URL: https://issues.apache.org/jira/browse/BEAM-8895
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kamil Wasilewski
>Priority: Major
> Fix For: Not applicable
>
>
> {code:java}
> SEVERE: 2019-12-05T06:57:31.089Z: java.lang.RuntimeException: 
> com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad 
> Request
> 07:57:32 {
> 07:57:32   "code" : 400,
> 07:57:32   "errors" : [ {
> 07:57:32 "domain" : "global",
> 07:57:32 "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> 07:57:32 "reason" : "invalid"
> 07:57:32   } ],
> 07:57:32   "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> 07:57:32   "status" : "INVALID_ARGUMENT"
> 07:57:32 }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8895) BigQueryIO streaming test on Java is flaky

2019-12-05 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-8895:
---
Description: 
{code:java}
SEVERE: 2019-12-05T06:57:31.089Z: java.lang.RuntimeException: 
com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad 
Request
07:57:32 {
07:57:32   "code" : 400,
07:57:32   "errors" : [ {
07:57:32 "domain" : "global",
07:57:32 "message" : "Invalid table ID 
\"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
be alphanumeric (plus underscores) and must be at most 1024 characters long. 
Also, Table decorators cannot be used.",
07:57:32 "reason" : "invalid"
07:57:32   } ],
07:57:32   "message" : "Invalid table ID 
\"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
be alphanumeric (plus underscores) and must be at most 1024 characters long. 
Also, Table decorators cannot be used.",
07:57:32   "status" : "INVALID_ARGUMENT"
07:57:32 }
{code}

  was:
```
SEVERE: 2019-12-05T06:57:31.089Z: java.lang.RuntimeException: 
com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad 
Request*07:57:32* {*07:57:32*   "code" : 400,*07:57:32*   "errors" 
: [ {*07:57:32* "domain" : "global",*07:57:32* "message" : 
"Invalid table ID 
\"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
be alphanumeric (plus underscores) and must be at most 1024 characters long. 
Also, Table decorators cannot be used.",*07:57:32* "reason" : 
"invalid"*07:57:32*   } ],*07:57:32*   "message" : "Invalid table ID 
\"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
be alphanumeric (plus underscores) and must be at most 1024 characters long. 
Also, Table decorators cannot be used.",*07:57:32*   "status" : 
"INVALID_ARGUMENT"*07:57:32* }
```


> BigQueryIO streaming test on Java is flaky
> --
>
> Key: BEAM-8895
> URL: https://issues.apache.org/jira/browse/BEAM-8895
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kamil Wasilewski
>Priority: Major
> Fix For: Not applicable
>
>
> {code:java}
> SEVERE: 2019-12-05T06:57:31.089Z: java.lang.RuntimeException: 
> com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad 
> Request
> 07:57:32 {
> 07:57:32   "code" : 400,
> 07:57:32   "errors" : [ {
> 07:57:32 "domain" : "global",
> 07:57:32 "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> 07:57:32 "reason" : "invalid"
> 07:57:32   } ],
> 07:57:32   "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> 07:57:32   "status" : "INVALID_ARGUMENT"
> 07:57:32 }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8895) BigQueryIO streaming test on Java is flaky

2019-12-05 Thread Michal Walenia (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michal Walenia reassigned BEAM-8895:


Assignee: Michal Walenia

> BigQueryIO streaming test on Java is flaky
> --
>
> Key: BEAM-8895
> URL: https://issues.apache.org/jira/browse/BEAM-8895
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Michal Walenia
>Priority: Major
> Fix For: Not applicable
>
>
> {code:java}
> SEVERE: 2019-12-05T06:57:31.089Z: java.lang.RuntimeException: 
> com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad 
> Request
> 07:57:32 {
> 07:57:32   "code" : 400,
> 07:57:32   "errors" : [ {
> 07:57:32 "domain" : "global",
> 07:57:32 "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> 07:57:32 "reason" : "invalid"
> 07:57:32   } ],
> 07:57:32   "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> 07:57:32   "status" : "INVALID_ARGUMENT"
> 07:57:32 }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8337) Add Flink job server container images to release process

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8337?focusedWorklogId=354212&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354212
 ]

ASF GitHub Bot logged work on BEAM-8337:


Author: ASF GitHub Bot
Created on: 05/Dec/19 11:50
Start Date: 05/Dec/19 11:50
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #10272: [BEAM-8337] 
publish Flink job server container images
URL: https://github.com/apache/beam/pull/10272#discussion_r354264858
 
 

 ##
 File path: website/src/contribute/release-guide.md
 ##
 @@ -691,6 +691,15 @@ done
 ./gradlew :sdks:go:container:dockerPush -Pdocker-tag=${RELEASE}_rc{RC_NUM}
 ```
 
+* Build Flink job server images and push to DockerHub.
+
+```
+FLINK_VER=("1.7", "1.8", "1.9")
 
 Review comment:
   Perhaps a sanity check that the array length is greater than 0 would also be 
useful.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354212)
Time Spent: 2h  (was: 1h 50m)

> Add Flink job server container images to release process
> 
>
> Key: BEAM-8337
> URL: https://issues.apache.org/jira/browse/BEAM-8337
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Could be added to the release process similar to how we now publish SDK 
> worker images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8337) Add Flink job server container images to release process

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8337?focusedWorklogId=354214&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354214
 ]

ASF GitHub Bot logged work on BEAM-8337:


Author: ASF GitHub Bot
Created on: 05/Dec/19 11:53
Start Date: 05/Dec/19 11:53
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #10272: [BEAM-8337] 
publish Flink job server container images
URL: https://github.com/apache/beam/pull/10272#discussion_r354266460
 
 

 ##
 File path: website/src/contribute/release-guide.md
 ##
 @@ -691,6 +691,15 @@ done
 ./gradlew :sdks:go:container:dockerPush -Pdocker-tag=${RELEASE}_rc{RC_NUM}
 ```
 
+* Build Flink job server images and push to DockerHub.
+
+```
+FLINK_VER=("1.7", "1.8", "1.9")
 
 Review comment:
   Since all the Flink Runner versions are independent projects at the moment 
(sharing a base template), I'm not sure there is a better solution.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354214)
Time Spent: 2h 10m  (was: 2h)

> Add Flink job server container images to release process
> 
>
> Key: BEAM-8337
> URL: https://issues.apache.org/jira/browse/BEAM-8337
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Could be added to the release process similar to how we now publish SDK 
> worker images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8746) Allow the local job service to work from inside docker

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8746?focusedWorklogId=354215&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354215
 ]

ASF GitHub Bot logged work on BEAM-8746:


Author: ASF GitHub Bot
Created on: 05/Dec/19 11:54
Start Date: 05/Dec/19 11:54
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #10161: [BEAM-8746] Make local 
job service accessible from external machines
URL: https://github.com/apache/beam/pull/10161#issuecomment-562097731
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354215)
Time Spent: 2h 50m  (was: 2h 40m)

> Allow the local job service to work from inside docker
> --
>
> Key: BEAM-8746
> URL: https://issues.apache.org/jira/browse/BEAM-8746
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Currently the connection is refused.  It's a simple fix. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8895) BigQueryIO streaming test on Java is flaky

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8895?focusedWorklogId=354216&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354216
 ]

ASF GitHub Bot logged work on BEAM-8895:


Author: ASF GitHub Bot
Created on: 05/Dec/19 11:56
Start Date: 05/Dec/19 11:56
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on pull request #10294: [BEAM-8895] 
Add BigQuery table name sanitization to BigQueryIOIT
URL: https://github.com/apache/beam/pull/10294
 
 
   R: @lgajowy 
   WDYT?
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   [Post-commit test status badge table for Go, Java and Python omitted]

[jira] [Work logged] (BEAM-8895) BigQueryIO streaming test on Java is flaky

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8895?focusedWorklogId=354217&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354217
 ]

ASF GitHub Bot logged work on BEAM-8895:


Author: ASF GitHub Bot
Created on: 05/Dec/19 11:56
Start Date: 05/Dec/19 11:56
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #10294: [BEAM-8895] Add 
BigQuery table name sanitization to BigQueryIOIT
URL: https://github.com/apache/beam/pull/10294#issuecomment-562098581
 
 
   Run BigQueryIO Streaming Performance Test Java
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354217)
Time Spent: 20m  (was: 10m)

> BigQueryIO streaming test on Java is flaky
> --
>
> Key: BEAM-8895
> URL: https://issues.apache.org/jira/browse/BEAM-8895
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Michal Walenia
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code:java}
> SEVERE: 2019-12-05T06:57:31.089Z: java.lang.RuntimeException: 
> com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad 
> Request
> 07:57:32 {
> 07:57:32   "code" : 400,
> 07:57:32   "errors" : [ {
> 07:57:32 "domain" : "global",
> 07:57:32 "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> 07:57:32 "reason" : "invalid"
> 07:57:32   } ],
> 07:57:32   "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> 07:57:32   "status" : "INVALID_ARGUMENT"
> 07:57:32 }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8835) Artifact retrieval fails with FlinkUberJarJobServer

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8835?focusedWorklogId=354218&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354218
 ]

ASF GitHub Bot logged work on BEAM-8835:


Author: ASF GitHub Bot
Created on: 05/Dec/19 11:57
Start Date: 05/Dec/19 11:57
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #10274: [BEAM-8835] Disable 
Flink Uber Jar by default.
URL: https://github.com/apache/beam/pull/10274#issuecomment-562098816
 
 
   Finally passing :)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354218)
Time Spent: 3h  (was: 2h 50m)

> Artifact retrieval fails with FlinkUberJarJobServer
> ---
>
> Key: BEAM-8835
> URL: https://issues.apache.org/jira/browse/BEAM-8835
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kyle Weaver
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> We seem to be able to stage artifacts and retrieve the manifest fine, but 
> retrieving the artifacts doesn't work. This happens on both my k8s Flink 
> cluster and on my local Flink cluster. At a quick glance the artifact is in 
> the jar where it should be. cc [~robertwb]
> 2019-11-21 18:43:39,336 INFO  
> org.apache.beam.runners.fnexecution.artifact.AbstractArtifactRetrievalService 
>  - GetArtifact name: "pickled_main_session"
> retrieval_token: "BEAM-PIPELINE/pipeline/artifact-manifest.json"
>  failed
> java.io.IOException: Unable to load 
> e1d24d848414cecf805a7b5c2b950c6430c20eb32875dac00b40f80f3c73a141/ea0d10d07f4601782ed647e8f6ba4a055be13674ab79fa0c6e2fa44917c5264c
>  with 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$ChildFirstClassLoader@785297ac



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8861) Disallow self-signed certs by default

2019-12-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8861:
---
Fix Version/s: 2.18.0

> Disallow self-signed certs by default
> -
>
> Key: BEAM-8861
> URL: https://issues.apache.org/jira/browse/BEAM-8861
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Colm O hEigeartaigh
>Assignee: Colm O hEigeartaigh
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The elasticsearch component allows self-signed certs by default, which is not 
> secure. It should reject them by default - I'll add a PR for this with a 
> configuration option to enable the old behaviour.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8861) Disallow self-signed certs by default in ElasticsearchIO

2019-12-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8861:
---
Summary: Disallow self-signed certs by default in ElasticsearchIO  (was: 
Disallow self-signed certs by default)

> Disallow self-signed certs by default in ElasticsearchIO
> 
>
> Key: BEAM-8861
> URL: https://issues.apache.org/jira/browse/BEAM-8861
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Colm O hEigeartaigh
>Assignee: Colm O hEigeartaigh
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The elasticsearch component allows self-signed certs by default, which is not 
> secure. It should reject them by default - I'll add a PR for this with a 
> configuration option to enable the old behaviour.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8895) BigQueryIO streaming test on Java is flaky

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8895?focusedWorklogId=354232&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354232
 ]

ASF GitHub Bot logged work on BEAM-8895:


Author: ASF GitHub Bot
Created on: 05/Dec/19 12:16
Start Date: 05/Dec/19 12:16
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #10294: [BEAM-8895] Add 
BigQuery table name sanitization to BigQueryIOIT
URL: https://github.com/apache/beam/pull/10294#issuecomment-562104817
 
 
   Run BigQueryIO Streaming Performance Test Java
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354232)
Time Spent: 0.5h  (was: 20m)

> BigQueryIO streaming test on Java is flaky
> --
>
> Key: BEAM-8895
> URL: https://issues.apache.org/jira/browse/BEAM-8895
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Michal Walenia
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code:java}
> SEVERE: 2019-12-05T06:57:31.089Z: java.lang.RuntimeException: 
> com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad 
> Request
> 07:57:32 {
> 07:57:32   "code" : 400,
> 07:57:32   "errors" : [ {
> 07:57:32 "domain" : "global",
> 07:57:32 "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> 07:57:32 "reason" : "invalid"
> 07:57:32   } ],
> 07:57:32   "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> 07:57:32   "status" : "INVALID_ARGUMENT"
> 07:57:32 }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8861) Disallow self-signed certificates by default in ElasticsearchIO

2019-12-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8861:
---
Summary: Disallow self-signed certificates by default in ElasticsearchIO  
(was: Disallow self-signed certs by default in ElasticsearchIO)

> Disallow self-signed certificates by default in ElasticsearchIO
> ---
>
> Key: BEAM-8861
> URL: https://issues.apache.org/jira/browse/BEAM-8861
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Colm O hEigeartaigh
>Assignee: Colm O hEigeartaigh
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The elasticsearch component allows self-signed certs by default, which is not 
> secure. It should reject them by default - I'll add a PR for this with a 
> configuration option to enable the old behaviour.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8895) BigQueryIO streaming test on Java is flaky

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8895?focusedWorklogId=354236&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354236
 ]

ASF GitHub Bot logged work on BEAM-8895:


Author: ASF GitHub Bot
Created on: 05/Dec/19 12:17
Start Date: 05/Dec/19 12:17
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #10294: [BEAM-8895] 
Add BigQuery table name sanitization to BigQueryIOIT
URL: https://github.com/apache/beam/pull/10294#discussion_r354278074
 
 

 ##
 File path: 
sdks/java/io/bigquery-io-perf-tests/src/test/java/org/apache/beam/sdk/bigqueryioperftests/BigQueryIOIT.java
 ##
 @@ -101,6 +101,8 @@ public static void setup() throws IOException {
 metricsBigQueryTable = options.getMetricsBigQueryTable();
 testBigQueryDataset = options.getTestBigQueryDataset();
 testBigQueryTable = String.format("%s_%s", options.getTestBigQueryTable(), 
TEST_ID);
+// BigQuery table names must contain only alphanumerics and underscores
+testBigQueryTable = testBigQueryTable.replace("[^A-Za-z0-9_]+", "");
 
 Review comment:
  I don't think this is the right solution. We hide some "magic" logic
inside the test to avoid the error, whereas we should rather provide the proper
name.
  I see that we generate the name in the Jenkins JobDSL file, and that is the
place where we should solve the problem IMO (of course, if I'm not missing
anything).
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354236)
Time Spent: 40m  (was: 0.5h)

> BigQueryIO streaming test on Java is flaky
> --
>
> Key: BEAM-8895
> URL: https://issues.apache.org/jira/browse/BEAM-8895
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Michal Walenia
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code:java}
> SEVERE: 2019-12-05T06:57:31.089Z: java.lang.RuntimeException: 
> com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad 
> Request
> 07:57:32 {
> 07:57:32   "code" : 400,
> 07:57:32   "errors" : [ {
> 07:57:32 "domain" : "global",
> 07:57:32 "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> 07:57:32 "reason" : "invalid"
> 07:57:32   } ],
> 07:57:32   "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> 07:57:32   "status" : "INVALID_ARGUMENT"
> 07:57:32 }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8895) BigQueryIO streaming test on Java is flaky

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8895?focusedWorklogId=354237&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354237
 ]

ASF GitHub Bot logged work on BEAM-8895:


Author: ASF GitHub Bot
Created on: 05/Dec/19 12:17
Start Date: 05/Dec/19 12:17
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #10294: [BEAM-8895] 
Add BigQuery table name sanitization to BigQueryIOIT
URL: https://github.com/apache/beam/pull/10294#discussion_r354279165
 
 

 ##
 File path: 
sdks/java/io/bigquery-io-perf-tests/src/test/java/org/apache/beam/sdk/bigqueryioperftests/BigQueryIOIT.java
 ##
 @@ -101,6 +101,8 @@ public static void setup() throws IOException {
 metricsBigQueryTable = options.getMetricsBigQueryTable();
 testBigQueryDataset = options.getTestBigQueryDataset();
 testBigQueryTable = String.format("%s_%s", options.getTestBigQueryTable(), 
TEST_ID);
+// BigQuery table names must contain only alphanumerics and underscores
+testBigQueryTable = testBigQueryTable.replace("[^A-Za-z0-9_]+", "");
 
 Review comment:
  Other than that, I would avoid leaving a comment and assigning the variable
twice. Instead, it would be better to have a well-named method
encapsulating the behavior.
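  As a minimal sketch of such a helper (the name and placement are assumptions,
not the actual change in the PR), and noting that Java's `String.replace` treats
its arguments as literal character sequences, so a regex-based sanitizer needs
`replaceAll`:

  ```
  // Hypothetical extracted helper, assuming the BigQuery rule quoted in
  // BEAM-8895: table IDs may contain only alphanumerics and underscores.
  private static String sanitizeBigQueryTableName(String name) {
    // String.replace would look for the literal pattern text; replaceAll
    // interprets it as a regex and strips the disallowed characters.
    return name.replaceAll("[^A-Za-z0-9_]+", "");
  }
  ```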
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354237)
Time Spent: 50m  (was: 40m)

> BigQueryIO streaming test on Java is flaky
> --
>
> Key: BEAM-8895
> URL: https://issues.apache.org/jira/browse/BEAM-8895
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Michal Walenia
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {code:java}
> SEVERE: 2019-12-05T06:57:31.089Z: java.lang.RuntimeException: 
> com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad 
> Request
> 07:57:32 {
> 07:57:32   "code" : 400,
> 07:57:32   "errors" : [ {
> 07:57:32 "domain" : "global",
> 07:57:32 "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> 07:57:32 "reason" : "invalid"
> 07:57:32   } ],
> 07:57:32   "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> 07:57:32   "status" : "INVALID_ARGUMENT"
> 07:57:32 }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8895) BigQueryIO streaming test on Java is flaky

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8895?focusedWorklogId=354238&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354238
 ]

ASF GitHub Bot logged work on BEAM-8895:


Author: ASF GitHub Bot
Created on: 05/Dec/19 12:18
Start Date: 05/Dec/19 12:18
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #10294: [BEAM-8895] 
Add BigQuery table name sanitization to BigQueryIOIT
URL: https://github.com/apache/beam/pull/10294#discussion_r354279759
 
 

 ##
 File path: 
sdks/java/io/bigquery-io-perf-tests/src/test/java/org/apache/beam/sdk/bigqueryioperftests/BigQueryIOIT.java
 ##
 @@ -101,6 +101,8 @@ public static void setup() throws IOException {
 metricsBigQueryTable = options.getMetricsBigQueryTable();
 testBigQueryDataset = options.getTestBigQueryDataset();
 testBigQueryTable = String.format("%s_%s", options.getTestBigQueryTable(), 
TEST_ID);
+// BigQuery table names must contain only alphanumerics and underscores
+testBigQueryTable = testBigQueryTable.replace("[^A-Za-z0-9_]+", "");
 
 Review comment:
   I see that you posted new changes - the comments still apply :) 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354238)
Time Spent: 1h  (was: 50m)

> BigQueryIO streaming test on Java is flaky
> --
>
> Key: BEAM-8895
> URL: https://issues.apache.org/jira/browse/BEAM-8895
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Michal Walenia
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> {code:java}
> SEVERE: 2019-12-05T06:57:31.089Z: java.lang.RuntimeException: 
> com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad 
> Request
> 07:57:32 {
> 07:57:32   "code" : 400,
> 07:57:32   "errors" : [ {
> 07:57:32 "domain" : "global",
> 07:57:32 "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> 07:57:32 "reason" : "invalid"
> 07:57:32   } ],
> 07:57:32   "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> 07:57:32   "status" : "INVALID_ARGUMENT"
> 07:57:32 }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8895) BigQueryIO streaming test on Java is flaky

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8895?focusedWorklogId=354240&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354240
 ]

ASF GitHub Bot logged work on BEAM-8895:


Author: ASF GitHub Bot
Created on: 05/Dec/19 12:21
Start Date: 05/Dec/19 12:21
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on pull request #10294: [BEAM-8895] 
Add BigQuery table name sanitization to BigQueryIOIT
URL: https://github.com/apache/beam/pull/10294#discussion_r354281031
 
 

 ##
 File path: 
sdks/java/io/bigquery-io-perf-tests/src/test/java/org/apache/beam/sdk/bigqueryioperftests/BigQueryIOIT.java
 ##
 @@ -101,6 +101,8 @@ public static void setup() throws IOException {
 metricsBigQueryTable = options.getMetricsBigQueryTable();
 testBigQueryDataset = options.getTestBigQueryDataset();
 testBigQueryTable = String.format("%s_%s", options.getTestBigQueryTable(), 
TEST_ID);
+// BigQuery table names must contain only alphanumerics and underscores
+testBigQueryTable = testBigQueryTable.replace("[^A-Za-z0-9_]+", "");
 
 Review comment:
   The issue stems from the fact that, in order to achieve unique table names
(to avoid accidental collisions when multiple tests run at the same time), we
append the test UUID to the table name. The UUID contains the offending
characters, not the name supplied in the test definition.
   
   I agree with the point about method extraction. I'll do just that. Thanks 
for the suggestion!
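   As a quick, purely illustrative example of that point (the values mirror the
table ID from the error message in BEAM-8895, and `replaceAll` is assumed as the
regex-based sanitizer):

   ```
   import java.util.UUID;

   public class TableIdExample {
     public static void main(String[] args) {
       // The random test ID appends hyphens, which BigQuery table IDs reject,
       // e.g. bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617.
       String tableId = "bqio_write_10GB_java_" + UUID.randomUUID();
       // Keeping only alphanumerics and underscores strips the hyphens.
       System.out.println(tableId.replaceAll("[^A-Za-z0-9_]+", ""));
     }
   }
   ```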
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354240)
Time Spent: 1h 10m  (was: 1h)

> BigQueryIO streaming test on Java is flaky
> --
>
> Key: BEAM-8895
> URL: https://issues.apache.org/jira/browse/BEAM-8895
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Michal Walenia
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {code:java}
> SEVERE: 2019-12-05T06:57:31.089Z: java.lang.RuntimeException: 
> com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad 
> Request
> 07:57:32 {
> 07:57:32   "code" : 400,
> 07:57:32   "errors" : [ {
> 07:57:32 "domain" : "global",
> 07:57:32 "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> 07:57:32 "reason" : "invalid"
> 07:57:32   } ],
> 07:57:32   "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> 07:57:32   "status" : "INVALID_ARGUMENT"
> 07:57:32 }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8895) BigQueryIO streaming test on Java is flaky

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8895?focusedWorklogId=354273&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354273
 ]

ASF GitHub Bot logged work on BEAM-8895:


Author: ASF GitHub Bot
Created on: 05/Dec/19 12:45
Start Date: 05/Dec/19 12:45
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #10294: [BEAM-8895] Add 
BigQuery table name sanitization to BigQueryIOIT
URL: https://github.com/apache/beam/pull/10294#issuecomment-562113974
 
 
   Run seed job
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354273)
Time Spent: 1h 20m  (was: 1h 10m)

> BigQueryIO streaming test on Java is flaky
> --
>
> Key: BEAM-8895
> URL: https://issues.apache.org/jira/browse/BEAM-8895
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Michal Walenia
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> {code:java}
> SEVERE: 2019-12-05T06:57:31.089Z: java.lang.RuntimeException: 
> com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad 
> Request
> 07:57:32 {
> 07:57:32   "code" : 400,
> 07:57:32   "errors" : [ {
> 07:57:32 "domain" : "global",
> 07:57:32 "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> 07:57:32 "reason" : "invalid"
> 07:57:32   } ],
> 07:57:32   "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> 07:57:32   "status" : "INVALID_ARGUMENT"
> 07:57:32 }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8376) Add FirestoreIO connector to Java SDK

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8376?focusedWorklogId=354283&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354283
 ]

ASF GitHub Bot logged work on BEAM-8376:


Author: ASF GitHub Bot
Created on: 05/Dec/19 13:09
Start Date: 05/Dec/19 13:09
Worklog Time Spent: 10m 
  Work Description: djelekar commented on pull request #10187: [BEAM-8376] 
Initial version of firestore connector JavaSDK
URL: https://github.com/apache/beam/pull/10187#discussion_r354302144
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreWriterFn.java
 ##
 @@ -0,0 +1,203 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.sdk.io.gcp.firestore;
+
+import com.google.api.core.ApiFuture;
+import com.google.cloud.firestore.Firestore;
+import com.google.cloud.firestore.FirestoreException;
+import com.google.cloud.firestore.FirestoreOptions;
+import com.google.cloud.firestore.WriteResult;
+import com.google.rpc.Code;
+import org.apache.beam.sdk.metrics.Counter;
+import org.apache.beam.sdk.metrics.Metrics;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.util.BackOff;
+import org.apache.beam.sdk.util.BackOffUtils;
+import org.apache.beam.sdk.util.FluentBackoff;
+import org.apache.beam.sdk.util.Sleeper;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet;
+import org.joda.time.Duration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+
+public class FirestoreWriterFn extends DoFn {
+
+private static final Set NON_RETRYABLE_ERRORS =
+ImmutableSet.of(
+Code.FAILED_PRECONDITION.getNumber(),
+Code.INVALID_ARGUMENT.getNumber(),
+Code.PERMISSION_DENIED.getNumber(),
+Code.UNAUTHENTICATED.getNumber());
+private static final Logger LOG = 
LoggerFactory.getLogger(FirestoreWriterFn.class);
+
+// Current batch of mutations to be written.
+private final List mutations = new ArrayList();
+private int mutationsSize = 0; // Accumulated size of protos in mutations.
+private WriteBatcher writeBatcher;
+private transient AdaptiveThrottler throttler;
+private final Counter throttledSeconds =
+Metrics.counter(FirestoreWriterFn.class, 
"cumulativeThrottlingSeconds");
+private final Counter rpcErrors =
+Metrics.counter(FirestoreWriterFn.class, "firestoreRpcErrors");
+private final Counter rpcSuccesses =
+Metrics.counter(FirestoreWriterFn.class, "firestoreRpcSuccesses");
+
+private static final int MAX_RETRIES = 5;
+private static final FluentBackoff BUNDLE_WRITE_BACKOFF =
+FluentBackoff.DEFAULT
+.withMaxRetries(MAX_RETRIES)
+.withInitialBackoff(Duration.standardSeconds(5));
+private String outputTable;
+private Firestore firestore;
+private String firestoreKey;
+
+public FirestoreWriterFn(String outputTable, String firestoreKey) {
+this.outputTable = outputTable;
+this.writeBatcher = new WriteBatcherImpl();
+this.firestoreKey = firestoreKey;
+}
+
+@StartBundle
+public void startBundle(StartBundleContext c) {
+firestore = FirestoreOptions.getDefaultInstance().getService();
+writeBatcher.start();
+if (throttler == null) {
+// Initialize throttler at first use, because it is not 
serializable.
+throttler = new AdaptiveThrottler(12, 1, 1.25);
+}
+}
+
+@ProcessElement
+public void processElement(ProcessContext c) throws Exception {
+mutations.add(c.element());
+if (mutations.size() >= 
writeBatcher.nextBatchSize(System.currentTimeMillis())) {
+flushBatch();
+}
+}
+
+@FinishBundle
+public void fini

[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=354284&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354284
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 05/Dec/19 13:10
Start Date: 05/Dec/19 13:10
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on issue #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#issuecomment-562122391
 
 
   Run JavaPortabilityApi PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354284)
Time Spent: 5h 20m  (was: 5h 10m)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the
> staged files
> Looks like this resource-detection algorithm can't work and should be replaced
> by an SPI rather than a built-in, non-extensible algorithm. Another valid
> alternative is to just drop that "guess" logic and force the user to set
> staged files.
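
For context, a minimal sketch of the kind of classpath detection the issue
describes (an assumption-based illustration, not the actual PipelineResources
code): it only works when the supplied class loader happens to be a
URLClassLoader, which the application class loader no longer is on Java 9+.

```
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DetectResourcesSketch {
  // Fails for non-URLClassLoader loaders (e.g. the app class loader on
  // Java 9+) and, when it does work, may pull in JRE jars as well.
  public static List<String> detectClassPathResources(ClassLoader loader) {
    if (!(loader instanceof URLClassLoader)) {
      throw new IllegalArgumentException(
          "Unable to use ClassLoader to detect classpath elements.");
    }
    return Arrays.stream(((URLClassLoader) loader).getURLs())
        .map(URL::getFile)
        .collect(Collectors.toList());
  }
}
```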



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=354299&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354299
 ]

ASF GitHub Bot logged work on BEAM-8382:


Author: ASF GitHub Bot
Created on: 05/Dec/19 13:49
Start Date: 05/Dec/19 13:49
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #9765: 
[WIP][BEAM-8382] Add rate limit policy to KinesisIO.Read
URL: https://github.com/apache/beam/pull/9765#issuecomment-562136257
 
 
   I'd like to come back to the idea of using the `RetryPolicy` and `BackoffStrategy`
classes from the AWS SDK, proposed by @cmachgodaddy above. We recently had a similar
issue with a pipeline failing because of `TransientKinesisException`, caused by
two pipelines reading from the same shard. It was solved by creating a new
`AWSClientsProvider`, based on the current `BasicKinesisProvider` (used by default),
and just overriding the `getKinesisClient()` method, where we configured our
custom `RetryPolicy` and `BackoffStrategy`.
   
   A simplified example could look like this:
   ```
   @Override
   public AmazonKinesis getKinesisClient() {
     AmazonKinesisClientBuilder clientBuilder =
         AmazonKinesisClientBuilder.standard()
             .withCredentials(getCredentialsProvider());
     clientBuilder.withRegion(region);

     RetryPolicy.BackoffStrategy backoffStrategy =
         new PredefinedBackoffStrategies.ExponentialBackoffStrategy(baseDelay, maxBackoffTime);
     RetryPolicy retryPolicy =
         new RetryPolicy(DEFAULT_RETRY_CONDITION, backoffStrategy, DEFAULT_MAX_ERROR_RETRY, true);

     ClientConfiguration clientConfiguration = new ClientConfiguration();
     clientBuilder.withClientConfiguration(clientConfiguration.withRetryPolicy(retryPolicy));

     return clientBuilder.build();
   }
   ```
   So, instead of adding a new API that would do things similar to what is already
implemented in the AWS SDK, I think it would be enough to just update the KinesisIO
javadoc and add an example showing how to configure and leverage a custom
`AWSClientsProvider` in such a case.
   WDYT?
   
   PS: @jfarr I'm sorry for initially asking you to add this functionality to
KinesisIO using the Beam SDK; I was not aware of such a configuration option in
`AmazonKinesisClient`.
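
   For completeness, a hedged sketch of how such a provider could be plugged into
the read transform (assuming a hypothetical `RetryingKinesisProvider` class that
implements `AWSClientsProvider` along the lines of the snippet above; the stream
name is illustrative):

   ```
   // Sketch only: RetryingKinesisProvider is an assumed implementation of
   // org.apache.beam.sdk.io.kinesis.AWSClientsProvider whose getKinesisClient()
   // installs the custom RetryPolicy/BackoffStrategy shown earlier.
   p.apply(
       KinesisIO.read()
           .withStreamName("my-stream")
           .withInitialPositionInStream(InitialPositionInStream.LATEST)
           .withAWSClientsProvider(new RetryingKinesisProvider()));
   ```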
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354299)
Time Spent: 10h 50m  (was: 10h 40m)

> Add polling interval to KinesisIO.Read
> --
>
> Key: BEAM-8382
> URL: https://issues.apache.org/jira/browse/BEAM-8382
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kinesis
>Affects Versions: 2.13.0, 2.14.0, 2.15.0
>Reporter: Jonothan Farr
>Assignee: Jonothan Farr
>Priority: Major
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> With the current implementation we are observing Kinesis throttling due to 
> ReadProvisionedThroughputExceeded on the order of hundreds of times per 
> second, regardless of the actual Kinesis throughput. This is because the 
> ShardReadersPool readLoop() method is polling getRecords() as fast as 
> possible.
> From the KDS documentation:
> {quote}Each shard can support up to five read transactions per second.
> {quote}
> and
> {quote}For best results, sleep for at least 1 second (1,000 milliseconds) 
> between calls to getRecords to avoid exceeding the limit on getRecords 
> frequency.
> {quote}
> [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html]
> [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html]
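
For illustration only, a minimal sketch of a read loop that respects the quoted
guidance with a fixed one-second delay (an assumption-based example, not Beam's
ShardReadersPool or the policy proposed in the PR):

```
// Assumes an AmazonKinesis client from the AWS SDK; process(...) is a
// hypothetical downstream handler.
void pollShard(AmazonKinesis kinesis, String initialShardIterator)
    throws InterruptedException {
  String shardIterator = initialShardIterator;
  while (shardIterator != null) {
    GetRecordsResult result =
        kinesis.getRecords(new GetRecordsRequest().withShardIterator(shardIterator));
    process(result.getRecords());
    shardIterator = result.getNextShardIterator();
    Thread.sleep(1000L); // at least one second between getRecords calls
  }
}
```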



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8895) BigQueryIO streaming test on Java is flaky

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8895?focusedWorklogId=354301&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354301
 ]

ASF GitHub Bot logged work on BEAM-8895:


Author: ASF GitHub Bot
Created on: 05/Dec/19 13:53
Start Date: 05/Dec/19 13:53
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #10294: [BEAM-8895] Add 
BigQuery table name sanitization to BigQueryIOIT
URL: https://github.com/apache/beam/pull/10294#issuecomment-562137953
 
 
   Run BigQueryIO Streaming Performance Test Java
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354301)
Time Spent: 1.5h  (was: 1h 20m)

> BigQueryIO streaming test on Java is flaky
> --
>
> Key: BEAM-8895
> URL: https://issues.apache.org/jira/browse/BEAM-8895
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Michal Walenia
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> {code:java}
> SEVERE: 2019-12-05T06:57:31.089Z: java.lang.RuntimeException: 
> com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad 
> Request
> 07:57:32 {
> 07:57:32   "code" : 400,
> 07:57:32   "errors" : [ {
> 07:57:32 "domain" : "global",
> 07:57:32 "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> 07:57:32 "reason" : "invalid"
> 07:57:32   } ],
> 07:57:32   "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> 07:57:32   "status" : "INVALID_ARGUMENT"
> 07:57:32 }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8830?focusedWorklogId=354305&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354305
 ]

ASF GitHub Bot logged work on BEAM-8830:


Author: ASF GitHub Bot
Created on: 05/Dec/19 13:58
Start Date: 05/Dec/19 13:58
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10293: [BEAM-8830] 
Fix flatten in the new spark runner
URL: https://github.com/apache/beam/pull/10293#issuecomment-562139785
 
 
   Run Spark StructuredStreaming ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354305)
Time Spent: 20m  (was: 10m)

> fix Flatten tests in Spark Structured Streaming runner
> --
>
> Key: BEAM-8830
> URL: https://issues.apache.org/jira/browse/BEAM-8830
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
>  
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8830?focusedWorklogId=354310&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354310
 ]

ASF GitHub Bot logged work on BEAM-8830:


Author: ASF GitHub Bot
Created on: 05/Dec/19 14:01
Start Date: 05/Dec/19 14:01
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #10293: 
[BEAM-8830] Fix flatten in the new spark runner
URL: https://github.com/apache/beam/pull/10293#discussion_r354325856
 
 

 ##
 File path: runners/spark/src/test/resources/log4j-test.properties
 ##
 @@ -28,6 +28,7 @@ log4j.appender.testlogger.layout.ConversionPattern=%d [%t] 
%-5p %c %x - %m%n
 
 # TestSparkRunner prints general information about test pipelines execution.
 log4j.logger.org.apache.beam.runners.spark.TestSparkRunner=INFO
+log4j.logger.org.apache.beam.runners.spark=DEBUG
 
 Review comment:
   The same question as above
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354310)
Time Spent: 0.5h  (was: 20m)

> fix Flatten tests in Spark Structured Streaming runner
> --
>
> Key: BEAM-8830
> URL: https://issues.apache.org/jira/browse/BEAM-8830
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
>  
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8830?focusedWorklogId=354311&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354311
 ]

ASF GitHub Bot logged work on BEAM-8830:


Author: ASF GitHub Bot
Created on: 05/Dec/19 14:01
Start Date: 05/Dec/19 14:01
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #10293: 
[BEAM-8830] Fix flatten in the new spark runner
URL: https://github.com/apache/beam/pull/10293#discussion_r354325607
 
 

 ##
 File path: runners/spark/src/test/resources/log4j-test.properties
 ##
 @@ -18,7 +18,7 @@
 
 # Set root logger level to ERROR.
 # set manually to INFO for debugging purposes.
-log4j.rootLogger=ERROR, testlogger
+log4j.rootLogger=DEBUG, testlogger
 
 Review comment:
   Do we really need to have DEBUG log level in tests by default?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354311)
Time Spent: 40m  (was: 0.5h)

> fix Flatten tests in Spark Structured Streaming runner
> --
>
> Key: BEAM-8830
> URL: https://issues.apache.org/jira/browse/BEAM-8830
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
>  
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8830?focusedWorklogId=354320&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354320
 ]

ASF GitHub Bot logged work on BEAM-8830:


Author: ASF GitHub Bot
Created on: 05/Dec/19 14:07
Start Date: 05/Dec/19 14:07
Worklog Time Spent: 10m 
  Work Description: echauchot commented on pull request #10293: [BEAM-8830] 
Fix flatten in the new spark runner
URL: https://github.com/apache/beam/pull/10293#discussion_r354331970
 
 

 ##
 File path: runners/spark/src/test/resources/log4j-test.properties
 ##
 @@ -28,6 +28,7 @@ log4j.appender.testlogger.layout.ConversionPattern=%d [%t] 
%-5p %c %x - %m%n
 
 # TestSparkRunner prints general information about test pipelines execution.
 log4j.logger.org.apache.beam.runners.spark.TestSparkRunner=INFO
+log4j.logger.org.apache.beam.runners.spark=DEBUG
 
 Review comment:
  Same answer as above :)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354320)
Time Spent: 1h  (was: 50m)

> fix Flatten tests in Spark Structured Streaming runner
> --
>
> Key: BEAM-8830
> URL: https://issues.apache.org/jira/browse/BEAM-8830
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
>  
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8830?focusedWorklogId=354319&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354319
 ]

ASF GitHub Bot logged work on BEAM-8830:


Author: ASF GitHub Bot
Created on: 05/Dec/19 14:07
Start Date: 05/Dec/19 14:07
Worklog Time Spent: 10m 
  Work Description: echauchot commented on pull request #10293: [BEAM-8830] 
Fix flatten in the new spark runner
URL: https://github.com/apache/beam/pull/10293#discussion_r354331754
 
 

 ##
 File path: runners/spark/src/test/resources/log4j-test.properties
 ##
 @@ -18,7 +18,7 @@
 
 # Set root logger level to ERROR.
 # set manually to INFO for debugging purposes.
-log4j.rootLogger=ERROR, testlogger
+log4j.rootLogger=DEBUG, testlogger
 
 Review comment:
  In tests I think so, because it prints the Beam DAG and the Spark DAG.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354319)
Time Spent: 50m  (was: 40m)

> fix Flatten tests in Spark Structured Streaming runner
> --
>
> Key: BEAM-8830
> URL: https://issues.apache.org/jira/browse/BEAM-8830
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
>  
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8830?focusedWorklogId=354321&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354321
 ]

ASF GitHub Bot logged work on BEAM-8830:


Author: ASF GitHub Bot
Created on: 05/Dec/19 14:10
Start Date: 05/Dec/19 14:10
Worklog Time Spent: 10m 
  Work Description: echauchot commented on issue #10293: [BEAM-8830] Fix 
flatten in the new spark runner
URL: https://github.com/apache/beam/pull/10293#issuecomment-562144516
 
 
   @aromanenko-dev thanks for the prompt review!
   I'll squash the commits, but commits 1 and 2 are not related to the same part,
so I prefer to keep the two separate.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354321)
Time Spent: 1h 10m  (was: 1h)

> fix Flatten tests in Spark Structured Streaming runner
> --
>
> Key: BEAM-8830
> URL: https://issues.apache.org/jira/browse/BEAM-8830
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
>  
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8830?focusedWorklogId=354323&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354323
 ]

ASF GitHub Bot logged work on BEAM-8830:


Author: ASF GitHub Bot
Created on: 05/Dec/19 14:12
Start Date: 05/Dec/19 14:12
Worklog Time Spent: 10m 
  Work Description: echauchot commented on issue #10293: [BEAM-8830] Fix 
flatten in the new spark runner
URL: https://github.com/apache/beam/pull/10293#issuecomment-562144516
 
 
   @aromanenko-dev thanks for the prompt review!
   I'll squash the commits into the last one, but commit 1 has nothing to do
with the rest, so I'll keep it separate.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354323)
Time Spent: 1h 20m  (was: 1h 10m)

> fix Flatten tests in Spark Structured Streaming runner
> --
>
> Key: BEAM-8830
> URL: https://issues.apache.org/jira/browse/BEAM-8830
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
>  
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8830?focusedWorklogId=354324&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354324
 ]

ASF GitHub Bot logged work on BEAM-8830:


Author: ASF GitHub Bot
Created on: 05/Dec/19 14:14
Start Date: 05/Dec/19 14:14
Worklog Time Spent: 10m 
  Work Description: echauchot commented on issue #10293: [BEAM-8830] Fix 
flatten in the new spark runner
URL: https://github.com/apache/beam/pull/10293#issuecomment-562146445
 
 
   done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354324)
Time Spent: 1.5h  (was: 1h 20m)

> fix Flatten tests in Spark Structured Streaming runner
> --
>
> Key: BEAM-8830
> URL: https://issues.apache.org/jira/browse/BEAM-8830
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
>  
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8424) java 11 dataflow validates runner tests are timeouting

2019-12-05 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy updated BEAM-8424:

Status: Open  (was: Triage Needed)

> java 11 dataflow validates runner tests are timeouting
> --
>
> Key: BEAM-8424
> URL: https://issues.apache.org/jira/browse/BEAM-8424
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Lukasz Gajowy
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]
> these jobs take more than currently set timeout (3h). 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8424) java 11 dataflow validates runner tests are timeouting

2019-12-05 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy updated BEAM-8424:

Status: Open  (was: Triage Needed)

> java 11 dataflow validates runner tests are timeouting
> --
>
> Key: BEAM-8424
> URL: https://issues.apache.org/jira/browse/BEAM-8424
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]
> these jobs take more than currently set timeout (3h). 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (BEAM-8424) java 11 dataflow validates runner tests are timeouting

2019-12-05 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reopened BEAM-8424:
-
  Assignee: Lukasz Gajowy

> java 11 dataflow validates runner tests are timeouting
> --
>
> Key: BEAM-8424
> URL: https://issues.apache.org/jira/browse/BEAM-8424
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]
> these jobs take more than currently set timeout (3h). 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-8424) java 11 dataflow validates runner tests are timeouting

2019-12-05 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy closed BEAM-8424.
---
Fix Version/s: Not applicable
   Resolution: Fixed

> java 11 dataflow validates runner tests are timeouting
> --
>
> Key: BEAM-8424
> URL: https://issues.apache.org/jira/browse/BEAM-8424
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]
> these jobs take more than currently set timeout (3h). 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8424) java 11 dataflow validates runner tests are timeouting

2019-12-05 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988846#comment-16988846
 ] 

Lukasz Gajowy commented on BEAM-8424:
-

The previous fix (PR #9819) mitigated the issue and the tests didn't time out after 
bumping the time limit to 4.5h. Currently, the tests exceed even this limit. 

> java 11 dataflow validates runner tests are timeouting
> --
>
> Key: BEAM-8424
> URL: https://issues.apache.org/jira/browse/BEAM-8424
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]
> these jobs take more than currently set timeout (3h). 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8424) java 11 dataflow validates runner tests are timeouting

2019-12-05 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988847#comment-16988847
 ] 

Lukasz Gajowy commented on BEAM-8424:
-

Moreover, when the Jenkins job is aborted due to the timeout, we are unable to see 
the Gradle scan (the Gradle process is killed and the scan is not generated). 

> java 11 dataflow validates runner tests are timeouting
> --
>
> Key: BEAM-8424
> URL: https://issues.apache.org/jira/browse/BEAM-8424
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]
> these jobs take more than currently set timeout (3h). 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8424) java 11 dataflow validates runner tests are timeouting

2019-12-05 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-8424:
---

Assignee: (was: Lukasz Gajowy)

> java 11 dataflow validates runner tests are timeouting
> --
>
> Key: BEAM-8424
> URL: https://issues.apache.org/jira/browse/BEAM-8424
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]
> these jobs take more than currently set timeout (3h). 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8830?focusedWorklogId=354326&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354326
 ]

ASF GitHub Bot logged work on BEAM-8830:


Author: ASF GitHub Bot
Created on: 05/Dec/19 14:20
Start Date: 05/Dec/19 14:20
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #10293: 
[BEAM-8830] Fix flatten in the new spark runner
URL: https://github.com/apache/beam/pull/10293#discussion_r354338968
 
 

 ##
 File path: runners/spark/src/test/resources/log4j-test.properties
 ##
 @@ -18,7 +18,7 @@
 
 # Set root logger level to ERROR.
 # set manually to INFO for debugging purposes.
-log4j.rootLogger=ERROR, testlogger
+log4j.rootLogger=DEBUG, testlogger
 
 Review comment:
   I'm afraid this would produce too many logs for all Spark tests, which could consume too 
much disk space on the Jenkins servers (since Spark is very verbose). 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354326)
Time Spent: 1h 40m  (was: 1.5h)

> fix Flatten tests in Spark Structured Streaming runner
> --
>
> Key: BEAM-8830
> URL: https://issues.apache.org/jira/browse/BEAM-8830
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
>  
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8424) Java Dataflow ValidatesRunner tests are timeouting

2019-12-05 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy updated BEAM-8424:

Summary: Java Dataflow ValidatesRunner tests are timeouting  (was: java 11 
dataflow validates runner tests are timeouting)

> Java Dataflow ValidatesRunner tests are timeouting
> --
>
> Key: BEAM-8424
> URL: https://issues.apache.org/jira/browse/BEAM-8424
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]
> these jobs take more than currently set timeout (3h). 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=354329&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354329
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 05/Dec/19 14:23
Start Date: 05/Dec/19 14:23
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on issue #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#issuecomment-562150319
 
 
   The dataflow VR tests are timeouting: 
https://issues.apache.org/jira/browse/BEAM-8424 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354329)
Time Spent: 5.5h  (was: 5h 20m)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Issue are:
> 1. it assumes the classloader is an URLClassLoader (not always true and java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs() which leads to including the JRE itself in the 
> staged file
> Looks like this detect resource algorithm can't work and should be replaced 
> by a SPI rather than a built-in and not extensible algorithm. Another valid 
> alternative is to just drop that "guess" logic and force the user to set 
> staged files.
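
For context, here is a minimal sketch (not Beam's actual staging code) of why the URLClassLoader assumption breaks on newer JVMs; the classpath-property fallback is just one illustrative alternative:

{code:java}
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;
import java.util.List;

public class ClasspathDetectionSketch {
  public static void main(String[] args) {
    ClassLoader loader = Thread.currentThread().getContextClassLoader();

    if (loader instanceof URLClassLoader) {
      // Holds on Java 8, but on Java 9+ the application class loader is no
      // longer a URLClassLoader, so this branch is never taken there.
      URL[] urls = ((URLClassLoader) loader).getURLs();
      System.out.println("URLClassLoader entries: " + urls.length);
    } else {
      // One possible fallback: derive the files to stage from the classpath
      // system property instead of the class loader (illustrative only).
      List<String> classpath =
          Arrays.asList(System.getProperty("java.class.path").split(File.pathSeparator));
      System.out.println("Classpath entries: " + classpath.size());
    }
  }
}
{code}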



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8424) Java Dataflow ValidatesRunner tests are timeouting

2019-12-05 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy updated BEAM-8424:

Description: 
[https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]

[https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]

these jobs take more than currently set timeout (3h). 

 

EDIT: currently, after reopening the issue the timeout is set to 4.5h. 

 

 

 

  was:
[https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]

[https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]


these jobs take more than currently set timeout (3h). 

 

 

 


> Java Dataflow ValidatesRunner tests are timeouting
> --
>
> Key: BEAM-8424
> URL: https://issues.apache.org/jira/browse/BEAM-8424
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]
> these jobs take more than currently set timeout (3h). 
>  
> EDIT: currently, after reopening the issue the timeout is set to 4.5h. 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8830?focusedWorklogId=354333&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354333
 ]

ASF GitHub Bot logged work on BEAM-8830:


Author: ASF GitHub Bot
Created on: 05/Dec/19 14:31
Start Date: 05/Dec/19 14:31
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10293: [BEAM-8830] 
Fix flatten in the new spark runner
URL: https://github.com/apache/beam/pull/10293#issuecomment-562153757
 
 
   Run Spark StructuredStreaming ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354333)
Time Spent: 1h 50m  (was: 1h 40m)

> fix Flatten tests in Spark Structured Streaming runner
> --
>
> Key: BEAM-8830
> URL: https://issues.apache.org/jira/browse/BEAM-8830
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
>  
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner

2019-12-05 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-8830:
---
Labels: structured-streaming  (was: )

> fix Flatten tests in Spark Structured Streaming runner
> --
>
> Key: BEAM-8830
> URL: https://issues.apache.org/jira/browse/BEAM-8830
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
>  
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8860) Implement combine with context in new spark runner

2019-12-05 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-8860:
---
Labels: structured-streaming  (was: )

> Implement combine with context in new spark runner
> --
>
> Key: BEAM-8860
> URL: https://issues.apache.org/jira/browse/BEAM-8860
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
>
> The ValidatesRunner tests below fail in the Spark Structured Streaming runner 
> because combine with context (for side inputs) is not implemented:
>  
> 'org.apache.beam.sdk.transforms.CombineFnsTest.testComposedCombineWithContext'
> 'org.apache.beam.sdk.transforms.CombineTest$CombineWithContextTests.testSimpleCombineWithContext'
> 'org.apache.beam.sdk.transforms.CombineTest$CombineWithContextTests.testSimpleCombineWithContextEmpty'
> 'org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testFixedWindowsCombineWithContext'
> 'org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testSessionsCombineWithContext'
> 'org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testSlidingWindowsCombineWithContext'
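
For reference, a minimal sketch of the user-facing API these tests exercise, assuming the usual CombineWithContext.CombineFnWithContext signatures; `input`, `offsets` and the integer types are placeholders, and this is not the runner translation itself:

{code:java}
// A CombineFnWithContext can read side inputs through the Context at every step;
// this is what the runner has to wire up for the tests above to pass.
PCollectionView<Integer> offsetView = offsets.apply(View.asSingleton());

PCollection<Integer> summed =
    input.apply(
        Combine.globally(
                new CombineWithContext.CombineFnWithContext<Integer, Integer, Integer>() {
                  @Override
                  public Integer createAccumulator(CombineWithContext.Context c) {
                    return 0;
                  }

                  @Override
                  public Integer addInput(
                      Integer accum, Integer value, CombineWithContext.Context c) {
                    return accum + value;
                  }

                  @Override
                  public Integer mergeAccumulators(
                      Iterable<Integer> accums, CombineWithContext.Context c) {
                    int sum = 0;
                    for (int a : accums) {
                      sum += a;
                    }
                    return sum;
                  }

                  @Override
                  public Integer extractOutput(Integer accum, CombineWithContext.Context c) {
                    // The side input is only consulted here, via the context.
                    return accum + c.sideInput(offsetView);
                  }
                })
            .withSideInputs(offsetView));
{code}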



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8894) Fix multiple coder bug in new spark runner

2019-12-05 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-8894:
---
Labels: structured-streaming  (was: )

> Fix multiple coder bug in new spark runner
> --
>
> Key: BEAM-8894
> URL: https://issues.apache.org/jira/browse/BEAM-8894
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
>
> The test below does not pass. It fails with an EOF while calling 
> NullableCoder(BigEndianCoder.decode): 
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders'
> In the "old" Spark runner, this test passes because it never calls the 
> NullableCoder, since no serialization is done. 
> There may be a problem in the NullableCoder itself.
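
One way to narrow this down (a sketch, not part of the issue report): round-trip NullableCoder in isolation. BigEndianLongCoder and the JUnit wrapper are assumptions, since the report does not say which BigEndian coder is involved:

{code:java}
@Test
public void nullableCoderRoundTrips() throws Exception {
  Coder<Long> coder = NullableCoder.of(BigEndianLongCoder.of());

  byte[] encodedValue = CoderUtils.encodeToByteArray(coder, 42L);
  byte[] encodedNull = CoderUtils.encodeToByteArray(coder, null);

  // If either decode hits an unexpected EOF here, the problem is in the coder
  // itself; otherwise the EOF likely comes from how the runner drives it.
  assertEquals(Long.valueOf(42L), CoderUtils.decodeFromByteArray(coder, encodedValue));
  assertNull(CoderUtils.decodeFromByteArray(coder, encodedNull));
}
{code}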



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8827) Support Spark 3 on Spark Structured Streaming Runner

2019-12-05 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-8827:
---
Labels: structured-streaming  (was: )

> Support Spark 3 on Spark Structured Streaming Runner 
> -
>
> Key: BEAM-8827
> URL: https://issues.apache.org/jira/browse/BEAM-8827
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Priority: Minor
>  Labels: structured-streaming
>
> The Spark Structured Streaming Runner currently uses non-stable Spark APIs for 
> its Source translation, and the DataSourceV2 API was changed in Spark 3. We probably 
> need a new class or a compatibility layer to support this for Spark 3.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8896) WITH query AS + SELECT query JOIN other throws invalid type

2019-12-05 Thread fdiazgon (Jira)
fdiazgon created BEAM-8896:
--

 Summary: WITH query AS + SELECT query JOIN other throws invalid 
type
 Key: BEAM-8896
 URL: https://issues.apache.org/jira/browse/BEAM-8896
 Project: Beam
  Issue Type: Bug
  Components: dsl-sql
Affects Versions: 2.16.0
Reporter: fdiazgon


The first of the two following queries fails, despite the queries being 
equivalent:
{code:java}
Pipeline p = Pipeline.create();

Schema schemaA =
Schema.of(
Schema.Field.of("id", Schema.FieldType.BYTES),
Schema.Field.of("fA1", Schema.FieldType.STRING));

Schema schemaB =
Schema.of(
Schema.Field.of("id", Schema.FieldType.STRING),
Schema.Field.of("fB1", Schema.FieldType.STRING));

PCollection inputA =

p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaA)));

PCollection inputB =

p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaB)));

// Fails
String query1 =
"WITH query AS "
+ "( "
+ " SELECT id, fA1, fA1 AS fA1_2 "
+ " FROM tblA"
+ ") "
+ "SELECT fA1, fB1, fA1_2 "
+ "FROM query "
+ "JOIN tblB ON (TO_HEX(query.id) = tblB.id)";

// Ok
String query2 =
"WITH query AS "
+ "( "
+ " SELECT fA1, fB1, fA1 AS fA1_2 "
+ " FROM tblA "
+ " JOIN tblB "
+ " ON (TO_HEX(tblA.id) = tblB.id) "
+ ")"
+ "SELECT fA1, fB1, fA1_2 "
+ "FROM query ";

Schema transform2 =
PCollectionTuple.of("tblA", inputA)
.and("tblB", inputB)
.apply(SqlTransform.query(query2))
.getSchema();
System.out.println(transform2);

Schema transform1 =
PCollectionTuple.of("tblA", inputA)
.and("tblB", inputB)
.apply(SqlTransform.query(query1))
.getSchema();
System.out.println(transform1);
{code}
 

The error is:
{noformat}
Exception in thread "main" java.lang.AssertionError: Field ordinal 2 is invalid 
for  type 'RecordType(VARBINARY id, VARCHAR fA1)'Exception in thread "main" 
java.lang.AssertionError: Field ordinal 2 is invalid for  type 
'RecordType(VARBINARY id, VARCHAR fA1)' at 
org.apache.beam.repackaged.sql.org.apache.calcite.rex.RexBuilder.makeFieldAccess(RexBuilder.java:197){noformat}
 

If I change `schemaB.id` to `BYTES` (while also avoiding `TO_HEX`), both 
queries work fine. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8470) Create a new Spark runner based on Spark Structured streaming framework

2019-12-05 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved BEAM-8470.

Fix Version/s: 2.18.0
   Resolution: Fixed

> Create a new Spark runner based on Spark Structured streaming framework
> ---
>
> Key: BEAM-8470
> URL: https://issues.apache.org/jira/browse/BEAM-8470
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
> Fix For: 2.18.0
>
>  Time Spent: 15h 50m
>  Remaining Estimate: 0h
>
> h1. Why is it worth creating a new runner based on structured streaming:
> Because this new framework brings:
>  * Unified batch and streaming semantics:
>  ** no more RDD/DStream distinction, as in Beam (only PCollection)
>  * Better state management:
>  ** incremental state instead of saving all each time
>  ** no more synchronous saving delaying computation: per batch and partition 
> delta file saved asynchronously + in-memory hashmap synchronous put/get
>  * Schemas in datasets:
>  ** the dataset knows the structure of the data (fields) and can optimize 
> later on
>  ** schemas in PCollection in Beam
>  * New Source API:
>  ** very close to Beam bounded source and unbounded sources
> h1. Why make a new runner from scratch?
>  * Structured streaming framework is very different from the RDD/Dstream 
> framework
> h1. We hope to gain
>  * More up to date runner in terms of libraries: leverage new features
>  * Leverage learnt practices from the previous runners
>  * Better performance thanks to the DAG optimizer (catalyst) and by 
> simplifying the code.
>  * Simplify the code and ease the maintenance
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8852) KafkaIO withEOS option does not use the topic provided in ProducerRecord

2019-12-05 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988905#comment-16988905
 ] 

Alexey Romanenko commented on BEAM-8852:


Yes, I think the current implementation of the EOS sink in KafkaIO supports only 
one output topic. So we can't use the topic name from ProducerRecord, since the 
topics could differ and may not even be known in advance. Perhaps we need to add 
this to the Javadoc and warn users more clearly about it.
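
To illustrate the current contract, a minimal sketch assuming `records` is a PCollection of ProducerRecord and using placeholder broker/topic/group names:

{code:java}
// With EOS enabled, the sink writes to the single topic given to withTopic();
// any topic set on an individual ProducerRecord is not used for routing.
records.apply(
    KafkaIO.<String, String>writeRecords()
        .withBootstrapServers("kafka:9092")        // placeholder address
        .withTopic("output-topic")                 // the one topic the EOS sink uses
        .withKeySerializer(StringSerializer.class)
        .withValueSerializer(StringSerializer.class)
        .withEOS(1, "eos-sink-group"));            // numShards, sinkGroupId
{code}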

> KafkaIO withEOS option does not use the topic provided in ProducerRecord
> 
>
> Key: BEAM-8852
> URL: https://issues.apache.org/jira/browse/BEAM-8852
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kafka
>Affects Versions: 2.15.0
>Reporter: DAVID MARTINEZ MATA
>Priority: Major
>  Labels: eos, kafka
>
> Work In Progress defining the case accurately



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8830?focusedWorklogId=354363&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354363
 ]

ASF GitHub Bot logged work on BEAM-8830:


Author: ASF GitHub Bot
Created on: 05/Dec/19 15:23
Start Date: 05/Dec/19 15:23
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10293: [BEAM-8830] 
Fix flatten in the new spark runner
URL: https://github.com/apache/beam/pull/10293#issuecomment-562139785
 
 
   Run Spark StructuredStreaming ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354363)
Time Spent: 2h  (was: 1h 50m)

> fix Flatten tests in Spark Structured Streaming runner
> --
>
> Key: BEAM-8830
> URL: https://issues.apache.org/jira/browse/BEAM-8830
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
>  
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8830?focusedWorklogId=354365&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354365
 ]

ASF GitHub Bot logged work on BEAM-8830:


Author: ASF GitHub Bot
Created on: 05/Dec/19 15:23
Start Date: 05/Dec/19 15:23
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10293: [BEAM-8830] 
Fix flatten in the new spark runner
URL: https://github.com/apache/beam/pull/10293#issuecomment-562153757
 
 
   Run Spark StructuredStreaming ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354365)
Time Spent: 2h 20m  (was: 2h 10m)

> fix Flatten tests in Spark Structured Streaming runner
> --
>
> Key: BEAM-8830
> URL: https://issues.apache.org/jira/browse/BEAM-8830
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
>  
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8830?focusedWorklogId=354364&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354364
 ]

ASF GitHub Bot logged work on BEAM-8830:


Author: ASF GitHub Bot
Created on: 05/Dec/19 15:23
Start Date: 05/Dec/19 15:23
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10293: [BEAM-8830] 
Fix flatten in the new spark runner
URL: https://github.com/apache/beam/pull/10293#issuecomment-562176789
 
 
   Run Spark StructuredStreaming ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354364)
Time Spent: 2h 10m  (was: 2h)

> fix Flatten tests in Spark Structured Streaming runner
> --
>
> Key: BEAM-8830
> URL: https://issues.apache.org/jira/browse/BEAM-8830
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
>  
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-4287) SplittableDoFn: splitAtFraction() API for Java

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4287?focusedWorklogId=354366&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354366
 ]

ASF GitHub Bot logged work on BEAM-4287:


Author: ASF GitHub Bot
Created on: 05/Dec/19 15:23
Start Date: 05/Dec/19 15:23
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #10295: Revert 
"[BEAM-4287] Add trySplit API to Java restriction tracker matc…
URL: https://github.com/apache/beam/pull/10295
 
 
   …hing Python SDK definition. Remove GetPartition()"
   
   This reverts commit 1ed36b430527e1c93c5763d23229379b1eb9f582, reversing
   changes made to 5143d408ebbbd044a024cfd84b87f70dd42b3ead.
   
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Statu

[jira] [Work logged] (BEAM-4287) SplittableDoFn: splitAtFraction() API for Java

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4287?focusedWorklogId=354367&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354367
 ]

ASF GitHub Bot logged work on BEAM-4287:


Author: ASF GitHub Bot
Created on: 05/Dec/19 15:24
Start Date: 05/Dec/19 15:24
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on issue #10295: Revert "[BEAM-4287] 
Add trySplit API to Java restriction tracker matc…
URL: https://github.com/apache/beam/pull/10295#issuecomment-562177107
 
 
   Run Dataflow ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354367)
Time Spent: 1.5h  (was: 1h 20m)

> SplittableDoFn: splitAtFraction() API for Java
> --
>
> Key: BEAM-4287
> URL: https://issues.apache.org/jira/browse/BEAM-4287
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> SDF currently only has a checkpoint() API. This Jira is about adding the 
> splitAtFraction() API and its support in runners that support the respective 
> feature for sources.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8424) Java Dataflow ValidatesRunner tests are timeouting

2019-12-05 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988910#comment-16988910
 ] 

Lukasz Gajowy commented on BEAM-8424:
-

I created a PR where I test whether the only Java-related change introduced 
right before the tests started timing out affects the VR tests' execution time: 
[https://github.com/apache/beam/pull/10295]

> Java Dataflow ValidatesRunner tests are timeouting
> --
>
> Key: BEAM-8424
> URL: https://issues.apache.org/jira/browse/BEAM-8424
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]
> these jobs take more than currently set timeout (3h). 
>  
> EDIT: currently, after reopening the issue the timeout is set to 4.5h. 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8830?focusedWorklogId=354394&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354394
 ]

ASF GitHub Bot logged work on BEAM-8830:


Author: ASF GitHub Bot
Created on: 05/Dec/19 15:46
Start Date: 05/Dec/19 15:46
Worklog Time Spent: 10m 
  Work Description: echauchot commented on pull request #10293: [BEAM-8830] 
Fix flatten in the new spark runner
URL: https://github.com/apache/beam/pull/10293#discussion_r354390280
 
 

 ##
 File path: runners/spark/src/test/resources/log4j-test.properties
 ##
 @@ -18,7 +18,7 @@
 
 # Set root logger level to ERROR.
 # set manually to INFO for debugging purposes.
-log4j.rootLogger=ERROR, testlogger
+log4j.rootLogger=DEBUG, testlogger
 
 Review comment:
    Well, it is just for the runner, not for the Spark engine, and the runner emits 
very few logs (I think only the DAGs).
But anyway, it appears that, for some reason, getting debug logs even in 
unit tests requires changing the production log configuration file! Spark has always 
been quite difficult to set up with log4j. Maybe it is due to the test file being 
named log4j-**test**.properties. Anyway, I'll put it back to ERROR until we 
figure out how to configure it properly.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354394)
Time Spent: 2.5h  (was: 2h 20m)

> fix Flatten tests in Spark Structured Streaming runner
> --
>
> Key: BEAM-8830
> URL: https://issues.apache.org/jira/browse/BEAM-8830
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
>  
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8830?focusedWorklogId=354403&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354403
 ]

ASF GitHub Bot logged work on BEAM-8830:


Author: ASF GitHub Bot
Created on: 05/Dec/19 15:52
Start Date: 05/Dec/19 15:52
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10293: [BEAM-8830] 
Fix flatten in the new spark runner
URL: https://github.com/apache/beam/pull/10293#issuecomment-562189718
 
 
   Some ValidatesRunner tests started to fail even if they passed before
   ```
   16:24:54 > Task :runners:spark:validatesStructuredStreamingRunnerBatch
   16:39:08 
   16:39:08 org.apache.beam.sdk.transforms.SplittableDoFnTest > 
testPairWithIndexWindowedTimestampedBounded FAILED
   16:39:08 java.lang.RuntimeException at SplittableDoFnTest.java:220
   16:39:08 Caused by: org.apache.spark.SparkException
   16:39:08 
   16:39:08 org.apache.beam.sdk.transforms.SplittableDoFnTest > 
testPairWithIndexBasicBounded FAILED
   16:39:08 java.lang.IllegalStateException at SplittableDoFnTest.java:153
   16:39:08 
   16:39:08 org.apache.beam.sdk.transforms.SplittableDoFnTest > 
testOutputAfterCheckpointBounded FAILED
   16:39:08 java.lang.IllegalStateException at SplittableDoFnTest.java:311
   16:39:08 
   16:39:08 org.apache.beam.sdk.transforms.SplittableDoFnTest > 
testAdditionalOutputBounded FAILED
   16:39:08 java.lang.IllegalStateException at SplittableDoFnTest.java:637
   16:39:36 
   16:39:36 179 tests completed, 4 failed, 1 skipped
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354403)
Time Spent: 2h 40m  (was: 2.5h)

> fix Flatten tests in Spark Structured Streaming runner
> --
>
> Key: BEAM-8830
> URL: https://issues.apache.org/jira/browse/BEAM-8830
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
>  
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8830?focusedWorklogId=354414&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354414
 ]

ASF GitHub Bot logged work on BEAM-8830:


Author: ASF GitHub Bot
Created on: 05/Dec/19 16:00
Start Date: 05/Dec/19 16:00
Worklog Time Spent: 10m 
  Work Description: echauchot commented on issue #10293: [BEAM-8830] Fix 
flatten in the new spark runner
URL: https://github.com/apache/beam/pull/10293#issuecomment-562193544
 
 
   > Some ValidatesRunner tests started to fail even if they passed before
   > 
   > ```
   > 16:24:54 > Task :runners:spark:validatesStructuredStreamingRunnerBatch
   > 16:39:08 
   > 16:39:08 org.apache.beam.sdk.transforms.SplittableDoFnTest > 
testPairWithIndexWindowedTimestampedBounded FAILED
   > 16:39:08 java.lang.RuntimeException at SplittableDoFnTest.java:220
   > 16:39:08 Caused by: org.apache.spark.SparkException
   > 16:39:08 
   > 16:39:08 org.apache.beam.sdk.transforms.SplittableDoFnTest > 
testPairWithIndexBasicBounded FAILED
   > 16:39:08 java.lang.IllegalStateException at SplittableDoFnTest.java:153
   > 16:39:08 
   > 16:39:08 org.apache.beam.sdk.transforms.SplittableDoFnTest > 
testOutputAfterCheckpointBounded FAILED
   > 16:39:08 java.lang.IllegalStateException at SplittableDoFnTest.java:311
   > 16:39:08 
   > 16:39:08 org.apache.beam.sdk.transforms.SplittableDoFnTest > 
testAdditionalOutputBounded FAILED
   > 16:39:08 java.lang.IllegalStateException at SplittableDoFnTest.java:637
   > 16:39:36 
   > 16:39:36 179 tests completed, 4 failed, 1 skipped
   > ```
   
   Strange, because it does not use emptyDataset (the only thing that has 
changed). It might be a side effect; I'll take a look.
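   
   For reference, the kind of translation being discussed, as a rough sketch only (the encoder, element type and variable names are assumptions, not the runner's actual code):
   
   ```java
   // Flatten over zero inputs has to yield an empty Dataset rather than null;
   // unioning the inputs onto it handles the non-empty case as well.
   Dataset<WindowedValue<String>> result = sparkSession.emptyDataset(windowedValueEncoder);
   for (Dataset<WindowedValue<String>> input : inputDatasets) {
     result = result.union(input);
   }
   ```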
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354414)
Time Spent: 2h 50m  (was: 2h 40m)

> fix Flatten tests in Spark Structured Streaming runner
> --
>
> Key: BEAM-8830
> URL: https://issues.apache.org/jira/browse/BEAM-8830
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
>  
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8896) WITH query AS + SELECT query JOIN other throws invalid type

2019-12-05 Thread Gleb Kanterov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gleb Kanterov reassigned BEAM-8896:
---

Assignee: Andrew Pilloud

> WITH query AS + SELECT query JOIN other throws invalid type
> ---
>
> Key: BEAM-8896
> URL: https://issues.apache.org/jira/browse/BEAM-8896
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.16.0
>Reporter: fdiazgon
>Assignee: Andrew Pilloud
>Priority: Major
>
> The first one of the two following queries fails, despite queries being 
> equivalent:
> {code:java}
> Pipeline p = Pipeline.create();
> Schema schemaA =
> Schema.of(
> Schema.Field.of("id", Schema.FieldType.BYTES),
> Schema.Field.of("fA1", Schema.FieldType.STRING));
> Schema schemaB =
> Schema.of(
> Schema.Field.of("id", Schema.FieldType.STRING),
> Schema.Field.of("fB1", Schema.FieldType.STRING));
> PCollection inputA =
> 
> p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaA)));
> PCollection inputB =
> 
> p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaB)));
> // Fails
> String query1 =
> "WITH query AS "
> + "( "
> + " SELECT id, fA1, fA1 AS fA1_2 "
> + " FROM tblA"
> + ") "
> + "SELECT fA1, fB1, fA1_2 "
> + "FROM query "
> + "JOIN tblB ON (TO_HEX(query.id) = tblB.id)";
> // Ok
> String query2 =
> "WITH query AS "
> + "( "
> + " SELECT fA1, fB1, fA1 AS fA1_2 "
> + " FROM tblA "
> + " JOIN tblB "
> + " ON (TO_HEX(tblA.id) = tblB.id) "
> + ")"
> + "SELECT fA1, fB1, fA1_2 "
> + "FROM query ";
> Schema transform2 =
> PCollectionTuple.of("tblA", inputA)
> .and("tblB", inputB)
> .apply(SqlTransform.query(query2))
> .getSchema();
> System.out.println(transform2);
> Schema transform1 =
> PCollectionTuple.of("tblA", inputA)
> .and("tblB", inputB)
> .apply(SqlTransform.query(query1))
> .getSchema();
> System.out.println(transform1);
> {code}
>  
> The error is:
> {noformat}
> Exception in thread "main" java.lang.AssertionError: Field ordinal 2 is 
> invalid for  type 'RecordType(VARBINARY id, VARCHAR fA1)'Exception in thread 
> "main" java.lang.AssertionError: Field ordinal 2 is invalid for  type 
> 'RecordType(VARBINARY id, VARCHAR fA1)' at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rex.RexBuilder.makeFieldAccess(RexBuilder.java:197){noformat}
>  
> If I change `schemaB.id` to `BYTES` (while also avoid using `TO_HEX`), both 
> queries work fine. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8896) WITH query AS + SELECT query JOIN other throws invalid type

2019-12-05 Thread Gleb Kanterov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988944#comment-16988944
 ] 

Gleb Kanterov commented on BEAM-8896:
-

cc [~kirillkozlov]

> WITH query AS + SELECT query JOIN other throws invalid type
> ---
>
> Key: BEAM-8896
> URL: https://issues.apache.org/jira/browse/BEAM-8896
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.16.0
>Reporter: fdiazgon
>Assignee: Andrew Pilloud
>Priority: Major
>
> The first one of the two following queries fails, despite queries being 
> equivalent:
> {code:java}
> Pipeline p = Pipeline.create();
> Schema schemaA =
> Schema.of(
> Schema.Field.of("id", Schema.FieldType.BYTES),
> Schema.Field.of("fA1", Schema.FieldType.STRING));
> Schema schemaB =
> Schema.of(
> Schema.Field.of("id", Schema.FieldType.STRING),
> Schema.Field.of("fB1", Schema.FieldType.STRING));
> PCollection inputA =
> 
> p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaA)));
> PCollection inputB =
> 
> p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaB)));
> // Fails
> String query1 =
> "WITH query AS "
> + "( "
> + " SELECT id, fA1, fA1 AS fA1_2 "
> + " FROM tblA"
> + ") "
> + "SELECT fA1, fB1, fA1_2 "
> + "FROM query "
> + "JOIN tblB ON (TO_HEX(query.id) = tblB.id)";
> // Ok
> String query2 =
> "WITH query AS "
> + "( "
> + " SELECT fA1, fB1, fA1 AS fA1_2 "
> + " FROM tblA "
> + " JOIN tblB "
> + " ON (TO_HEX(tblA.id) = tblB.id) "
> + ")"
> + "SELECT fA1, fB1, fA1_2 "
> + "FROM query ";
> Schema transform2 =
> PCollectionTuple.of("tblA", inputA)
> .and("tblB", inputB)
> .apply(SqlTransform.query(query2))
> .getSchema();
> System.out.println(transform2);
> Schema transform1 =
> PCollectionTuple.of("tblA", inputA)
> .and("tblB", inputB)
> .apply(SqlTransform.query(query1))
> .getSchema();
> System.out.println(transform1);
> {code}
>  
> The error is:
> {noformat}
> Exception in thread "main" java.lang.AssertionError: Field ordinal 2 is 
> invalid for  type 'RecordType(VARBINARY id, VARCHAR fA1)'Exception in thread 
> "main" java.lang.AssertionError: Field ordinal 2 is invalid for  type 
> 'RecordType(VARBINARY id, VARCHAR fA1)' at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rex.RexBuilder.makeFieldAccess(RexBuilder.java:197){noformat}
>  
> If I change `schemaB.id` to `BYTES` (while also avoid using `TO_HEX`), both 
> queries work fine. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8896) WITH query AS + SELECT query JOIN other throws invalid type

2019-12-05 Thread fdiazgon (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fdiazgon updated BEAM-8896:
---
Description: 
The first one of the two following queries fails, despite queries being 
equivalent:
{code:java}
Pipeline p = Pipeline.create();

Schema schemaA =
Schema.of(
Schema.Field.of("id", Schema.FieldType.BYTES),
Schema.Field.of("fA1", Schema.FieldType.STRING));

Schema schemaB =
Schema.of(
Schema.Field.of("id", Schema.FieldType.STRING),
Schema.Field.of("fB1", Schema.FieldType.STRING));

PCollection inputA =

p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaA)));

PCollection inputB =

p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaB)));

// Fails
String query1 =
"WITH query AS "
+ "( "
+ " SELECT id, fA1, fA1 AS fA1_2 "
+ " FROM tblA"
+ ") "
+ "SELECT fA1, fB1, fA1_2 "
+ "FROM query "
+ "JOIN tblB ON (TO_HEX(query.id) = tblB.id)";

// Ok
String query2 =
"WITH query AS "
+ "( "
+ " SELECT fA1, fB1, fA1 AS fA1_2 "
+ " FROM tblA "
+ " JOIN tblB "
+ " ON (TO_HEX(tblA.id) = tblB.id) "
+ ")"
+ "SELECT fA1, fB1, fA1_2 "
+ "FROM query ";

// Ok
String query3 =
"WITH query AS "
+ "( "
+ " SELECT TO_HEX(id) AS id, fA1, fA1 AS fA1_2 "
+ " FROM tblA"
+ ") "
+ "SELECT fA1, fB1, fA1_2 "
+ "FROM query "
+ "JOIN tblB ON (query.id = tblB.id)";

Schema transform3 =
PCollectionTuple.of("tblA", inputA)
.and("tblB", inputB)
.apply(SqlTransform.query(query3))
.getSchema();
System.out.println(transform3);

Schema transform2 =
PCollectionTuple.of("tblA", inputA)
.and("tblB", inputB)
.apply(SqlTransform.query(query2))
.getSchema();
System.out.println(transform2);

Schema transform1 =
PCollectionTuple.of("tblA", inputA)
.and("tblB", inputB)
.apply(SqlTransform.query(query1))
.getSchema();
System.out.println(transform1);
{code}
 

The error is:
{noformat}
Exception in thread "main" java.lang.AssertionError: Field ordinal 2 is invalid 
for  type 'RecordType(VARBINARY id, VARCHAR fA1)'Exception in thread "main" 
java.lang.AssertionError: Field ordinal 2 is invalid for  type 
'RecordType(VARBINARY id, VARCHAR fA1)' at 
org.apache.beam.repackaged.sql.org.apache.calcite.rex.RexBuilder.makeFieldAccess(RexBuilder.java:197){noformat}
 

If I change `schemaB.id` to `BYTES` (while also avoid using `TO_HEX`), all 
queries work fine. 

  was:
The first one of the two following queries fails, despite queries being 
equivalent:
{code:java}
Pipeline p = Pipeline.create();

Schema schemaA =
Schema.of(
Schema.Field.of("id", Schema.FieldType.BYTES),
Schema.Field.of("fA1", Schema.FieldType.STRING));

Schema schemaB =
Schema.of(
Schema.Field.of("id", Schema.FieldType.STRING),
Schema.Field.of("fB1", Schema.FieldType.STRING));

PCollection inputA =

p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaA)));

PCollection inputB =

p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaB)));

// Fails
String query1 =
"WITH query AS "
+ "( "
+ " SELECT id, fA1, fA1 AS fA1_2 "
+ " FROM tblA"
+ ") "
+ "SELECT fA1, fB1, fA1_2 "
+ "FROM query "
+ "JOIN tblB ON (TO_HEX(query.id) = tblB.id)";

// Ok
String query2 =
"WITH query AS "
+ "( "
+ " SELECT fA1, fB1, fA1 AS fA1_2 "
+ " FROM tblA "
+ " JOIN tblB "
+ " ON (TO_HEX(tblA.id) = tblB.id) "
+ ")"
+ "SELECT fA1, fB1, fA1_2 "
+ "FROM query ";

Schema transform2 =
PCollectionTuple.of("tblA", inputA)
.and("tblB", inputB)
.apply(SqlTransform.query(query2))
.getSchema();
System.out.println(transform2);

Schema transform1 =
PCollectionTuple.of("tblA", inputA)
.and("tblB", inputB)
.apply(SqlTransform.query(query1))
.getSchema();
System.out.println(transform1);
{code}
 

The error is:
{noformat}
Exception in thread "main" java.lang.AssertionError: Field ordinal 2 is invalid 
for  type 'RecordType(VARBINARY id, VARCHAR fA1)'Exception in thread "main" 
java.lang.AssertionError: Field ordinal 2 is invalid for  type 
'RecordType(VARBINARY id, VARCHAR fA1)' at 
org.apache.beam.repackaged.sql.org.apache.calcite.rex.RexBuilder.makeFieldAccess(RexBuilder.java:197){noformat}
 

If I change `schemaB.id` to `BYTES` (while also avoid using `TO_HEX`), both 
queries work fine. 


> WITH query AS + SELECT query JOIN other throws invalid type
> ---
>
> Key: BEAM-8896
> URL: https://issues.apache.org/jira/browse/BEAM-8896
> Project: Beam
>  Issue Type: Bug

[jira] [Updated] (BEAM-8896) WITH query AS + SELECT query JOIN other throws invalid type

2019-12-05 Thread fdiazgon (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fdiazgon updated BEAM-8896:
---
Description: 
The first one of the following three queries fails, even though the queries are 
equivalent:
{code:java}
Pipeline p = Pipeline.create();

Schema schemaA =
Schema.of(
Schema.Field.of("id", Schema.FieldType.BYTES),
Schema.Field.of("fA1", Schema.FieldType.STRING));

Schema schemaB =
Schema.of(
Schema.Field.of("id", Schema.FieldType.STRING),
Schema.Field.of("fB1", Schema.FieldType.STRING));

PCollection inputA =

p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaA)));

PCollection inputB =

p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaB)));

// Fails
String query1 =
"WITH query AS "
+ "( "
+ " SELECT id, fA1, fA1 AS fA1_2 "
+ " FROM tblA"
+ ") "
+ "SELECT fA1, fB1, fA1_2 "
+ "FROM query "
+ "JOIN tblB ON (TO_HEX(query.id) = tblB.id)";

// Ok
String query2 =
"WITH query AS "
+ "( "
+ " SELECT fA1, fB1, fA1 AS fA1_2 "
+ " FROM tblA "
+ " JOIN tblB "
+ " ON (TO_HEX(tblA.id) = tblB.id) "
+ ")"
+ "SELECT fA1, fB1, fA1_2 "
+ "FROM query ";

// Ok
String query3 =
"WITH query AS "
+ "( "
+ " SELECT TO_HEX(id) AS id, fA1, fA1 AS fA1_2 "
+ " FROM tblA"
+ ") "
+ "SELECT fA1, fB1, fA1_2 "
+ "FROM query "
+ "JOIN tblB ON (query.id = tblB.id)";

Schema transform3 =
PCollectionTuple.of("tblA", inputA)
.and("tblB", inputB)
.apply(SqlTransform.query(query3))
.getSchema();
System.out.println(transform3);

Schema transform2 =
PCollectionTuple.of("tblA", inputA)
.and("tblB", inputB)
.apply(SqlTransform.query(query2))
.getSchema();
System.out.println(transform2);

Schema transform1 =
PCollectionTuple.of("tblA", inputA)
.and("tblB", inputB)
.apply(SqlTransform.query(query1))
.getSchema();
System.out.println(transform1);
{code}
 

The error is:
{noformat}
Exception in thread "main" java.lang.AssertionError: Field ordinal 2 is invalid for  type 'RecordType(VARBINARY id, VARCHAR fA1)'
	at org.apache.beam.repackaged.sql.org.apache.calcite.rex.RexBuilder.makeFieldAccess(RexBuilder.java:197){noformat}
 

If I change `schemaB.id` to `BYTES` (while also avoiding the use of `TO_HEX`), 
all three queries work fine.

  was:
The first one of the two following queries fails, despite queries being 
equivalent:
{code:java}
Pipeline p = Pipeline.create();

Schema schemaA =
Schema.of(
Schema.Field.of("id", Schema.FieldType.BYTES),
Schema.Field.of("fA1", Schema.FieldType.STRING));

Schema schemaB =
Schema.of(
Schema.Field.of("id", Schema.FieldType.STRING),
Schema.Field.of("fB1", Schema.FieldType.STRING));

PCollection inputA =

p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaA)));

PCollection inputB =

p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaB)));

// Fails
String query1 =
"WITH query AS "
+ "( "
+ " SELECT id, fA1, fA1 AS fA1_2 "
+ " FROM tblA"
+ ") "
+ "SELECT fA1, fB1, fA1_2 "
+ "FROM query "
+ "JOIN tblB ON (TO_HEX(query.id) = tblB.id)";

// Ok
String query2 =
"WITH query AS "
+ "( "
+ " SELECT fA1, fB1, fA1 AS fA1_2 "
+ " FROM tblA "
+ " JOIN tblB "
+ " ON (TO_HEX(tblA.id) = tblB.id) "
+ ")"
+ "SELECT fA1, fB1, fA1_2 "
+ "FROM query ";

// Ok
String query3 =
"WITH query AS "
+ "( "
+ " SELECT TO_HEX(id) AS id, fA1, fA1 AS fA1_2 "
+ " FROM tblA"
+ ") "
+ "SELECT fA1, fB1, fA1_2 "
+ "FROM query "
+ "JOIN tblB ON (query.id = tblB.id)";

Schema transform3 =
PCollectionTuple.of("tblA", inputA)
.and("tblB", inputB)
.apply(SqlTransform.query(query3))
.getSchema();
System.out.println(transform3);

Schema transform2 =
PCollectionTuple.of("tblA", inputA)
.and("tblB", inputB)
.apply(SqlTransform.query(query2))
.getSchema();
System.out.println(transform2);

Schema transform1 =
PCollectionTuple.of("tblA", inputA)
.and("tblB", inputB)
.apply(SqlTransform.query(query1))
.getSchema();
System.out.println(transform1);
{code}
 

The error is:
{noformat}
Exception in thread "main" java.lang.AssertionError: Field ordinal 2 is invalid for  type 'RecordType(VARBINARY id, VARCHAR fA1)'
	at org.apache.beam.repackaged.sql.org.apache.calcite.rex.RexBuilder.makeFieldAccess(RexBuilder.java:197){noformat}

[jira] [Work logged] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8830?focusedWorklogId=354422&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354422
 ]

ASF GitHub Bot logged work on BEAM-8830:


Author: ASF GitHub Bot
Created on: 05/Dec/19 16:18
Start Date: 05/Dec/19 16:18
Worklog Time Spent: 10m 
  Work Description: echauchot commented on issue #10293: [BEAM-8830] Fix 
flatten in the new spark runner
URL: https://github.com/apache/beam/pull/10293#issuecomment-562201468
 
 
   > > Some ValidatesRunner tests started to fail even if they passed before
   > > ```
   > > 16:24:54 > Task :runners:spark:validatesStructuredStreamingRunnerBatch
   > > 16:39:08 
   > > 16:39:08 org.apache.beam.sdk.transforms.SplittableDoFnTest > 
testPairWithIndexWindowedTimestampedBounded FAILED
   > > 16:39:08 java.lang.RuntimeException at SplittableDoFnTest.java:220
   > > 16:39:08 Caused by: org.apache.spark.SparkException
   > > 16:39:08 
   > > 16:39:08 org.apache.beam.sdk.transforms.SplittableDoFnTest > 
testPairWithIndexBasicBounded FAILED
   > > 16:39:08 java.lang.IllegalStateException at 
SplittableDoFnTest.java:153
   > > 16:39:08 
   > > 16:39:08 org.apache.beam.sdk.transforms.SplittableDoFnTest > 
testOutputAfterCheckpointBounded FAILED
   > > 16:39:08 java.lang.IllegalStateException at 
SplittableDoFnTest.java:311
   > > 16:39:08 
   > > 16:39:08 org.apache.beam.sdk.transforms.SplittableDoFnTest > 
testAdditionalOutputBounded FAILED
   > > 16:39:08 java.lang.IllegalStateException at 
SplittableDoFnTest.java:637
   > > 16:39:36 
   > > 16:39:36 179 tests completed, 4 failed, 1 skipped
   > > ```
   > 
   > Strange, because it does not use emptyDataset (the only thing that has 
changed). It might be a side effect; I'll take a look.
   
   The test passes on my machine. Maybe an unrelated flaky test
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354422)
Time Spent: 3h 20m  (was: 3h 10m)

> fix Flatten tests in Spark Structured Streaming runner
> --
>
> Key: BEAM-8830
> URL: https://issues.apache.org/jira/browse/BEAM-8830
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
>  
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8830?focusedWorklogId=354420&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354420
 ]

ASF GitHub Bot logged work on BEAM-8830:


Author: ASF GitHub Bot
Created on: 05/Dec/19 16:18
Start Date: 05/Dec/19 16:18
Worklog Time Spent: 10m 
  Work Description: echauchot commented on issue #10293: [BEAM-8830] Fix 
flatten in the new spark runner
URL: https://github.com/apache/beam/pull/10293#issuecomment-562201209
 
 
   The test passes on my machine. Maybe an unrelated flaky test
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354420)
Time Spent: 3h  (was: 2h 50m)

> fix Flatten tests in Spark Structured Streaming runner
> --
>
> Key: BEAM-8830
> URL: https://issues.apache.org/jira/browse/BEAM-8830
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
>  
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8830?focusedWorklogId=354421&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354421
 ]

ASF GitHub Bot logged work on BEAM-8830:


Author: ASF GitHub Bot
Created on: 05/Dec/19 16:18
Start Date: 05/Dec/19 16:18
Worklog Time Spent: 10m 
  Work Description: echauchot commented on issue #10293: [BEAM-8830] Fix 
flatten in the new spark runner
URL: https://github.com/apache/beam/pull/10293#issuecomment-562201209
 
 
   The test passes on my machine. Maybe an unrelated flaky test
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354421)
Time Spent: 3h 10m  (was: 3h)

> fix Flatten tests in Spark Structured Streaming runner
> --
>
> Key: BEAM-8830
> URL: https://issues.apache.org/jira/browse/BEAM-8830
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
>  
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8830?focusedWorklogId=354423&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354423
 ]

ASF GitHub Bot logged work on BEAM-8830:


Author: ASF GitHub Bot
Created on: 05/Dec/19 16:19
Start Date: 05/Dec/19 16:19
Worklog Time Spent: 10m 
  Work Description: echauchot commented on issue #10293: [BEAM-8830] Fix 
flatten in the new spark runner
URL: https://github.com/apache/beam/pull/10293#issuecomment-562201570
 
 
   Run Spark StructuredStreaming ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354423)
Time Spent: 3.5h  (was: 3h 20m)

> fix Flatten tests in Spark Structured Streaming runner
> --
>
> Key: BEAM-8830
> URL: https://issues.apache.org/jira/browse/BEAM-8830
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
>  
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8861) Disallow self-signed certificates by default in ElasticsearchIO

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8861?focusedWorklogId=354433&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354433
 ]

ASF GitHub Bot logged work on BEAM-8861:


Author: ASF GitHub Bot
Created on: 05/Dec/19 16:31
Start Date: 05/Dec/19 16:31
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10251: [BEAM-8861] 
Disallow self-signed certificates by default in ElasticsearchIO
URL: https://github.com/apache/beam/pull/10251
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354433)
Time Spent: 20m  (was: 10m)

> Disallow self-signed certificates by default in ElasticsearchIO
> ---
>
> Key: BEAM-8861
> URL: https://issues.apache.org/jira/browse/BEAM-8861
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Colm O hEigeartaigh
>Assignee: Colm O hEigeartaigh
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The elasticsearch component allows self-signed certs by default, which is not 
> secure. It should reject them by default - I'll add a PR for this with a 
> configuration option to enable the old behaviour.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8897) JdbcIO issue when type is Numeric

2019-12-05 Thread Mark Gates (Jira)
Mark Gates created BEAM-8897:


 Summary: JdbcIO issue when type is Numeric
 Key: BEAM-8897
 URL: https://issues.apache.org/jira/browse/BEAM-8897
 Project: Beam
  Issue Type: Bug
  Components: io-java-jdbc
Affects Versions: 2.16.0
 Environment: Java 8
Reporter: Mark Gates



The actual use case is that I am connecting to Oracle via JDBC and running the 
query "select 1 from dual".

The Oracle NUMBER type results in an exception. Example:


{code:java}
PCollection<Row> row = p.apply(
    JdbcIO.readRows()
        .withDataSourceProviderFn(JdbcIO.PoolableDataSourceProvider.of(configuration))
        .withQuery("select 1 as t1 from dual")
        .withFetchSize(1)
        .withOutputParallelization(true));

row.apply(ParDo.of(new DoFn<Row, Row>() {
  @ProcessElement
  public void processElement(@Element Row row, OutputReceiver<Row> out,
      ProcessContext pc) {
    System.out.println(row.getSchema().toString() + " : " + row.toString());
    out.output(row);
  }
}));

p.run().waitUntilFinish();
{code}


Exception:
Exception in thread "main" 
org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
java.lang.IllegalArgumentException
at 
org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:348)
at 
org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:318)
at 
org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:213)
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:67)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:315)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:301)
at SimpleJdbc.main(SimpleJdbc.java:45)
Caused by: java.lang.IllegalArgumentException
at 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:127)
at 
org.apache.beam.sdk.io.jdbc.LogicalTypes$FixedPrecisionNumeric.toInputType(LogicalTypes.java:237)
at 
org.apache.beam.sdk.io.jdbc.LogicalTypes$FixedPrecisionNumeric.toInputType(LogicalTypes.java:221)
at 
org.apache.beam.sdk.io.jdbc.SchemaUtil.lambda$createLogicalTypeExtractor$ca0ab2ec$1(SchemaUtil.java:272)
at 
org.apache.beam.sdk.io.jdbc.SchemaUtil$BeamRowMapper.mapRow(SchemaUtil.java:337)
at 
org.apache.beam.sdk.io.jdbc.SchemaUtil$BeamRowMapper.mapRow(SchemaUtil.java:315)
at 
org.apache.beam.sdk.io.jdbc.JdbcIO$ReadFn.processElement(JdbcIO.java:854)
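
To illustrate the shape of the failure, here is a standalone sketch. It is 
hypothetical: the class, the check, and the precision/scale values below are 
stand-ins of mine, not the actual Beam or Oracle driver code.

{code:java}
import java.math.BigDecimal;

public class NumericPrecisionSketch {

  // Hypothetical stand-in for the kind of precondition the stack trace points
  // at (FixedPrecisionNumeric.toInputType): the logical type holds a declared
  // precision/scale and rejects values that do not fit it.
  static BigDecimal toInputType(BigDecimal value, int precision, int scale) {
    if (value.precision() > precision || value.scale() > scale) {
      throw new IllegalArgumentException(
          "value " + value + " does not fit NUMERIC(" + precision + ", " + scale + ")");
    }
    return value;
  }

  public static void main(String[] args) {
    // Assumption: for "select 1 from dual" the driver reports the column as
    // NUMBER with no usable precision/scale (shown here as 0/0), so even the
    // value 1 fails the check above.
    toInputType(new BigDecimal("1"), 0, 0);
  }
}
{code}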



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8852) KafkaIO withEOS option does not use the topic provided in ProducerRecord

2019-12-05 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-8852:
---
Status: Open  (was: Triage Needed)

> KafkaIO withEOS option does not use the topic provided in ProducerRecord
> 
>
> Key: BEAM-8852
> URL: https://issues.apache.org/jira/browse/BEAM-8852
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kafka
>Affects Versions: 2.15.0
>Reporter: DAVID MARTINEZ MATA
>Priority: Major
>  Labels: eos, kafka
>
> Work In Progress defining the case accurately



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-1440) Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1440?focusedWorklogId=354438&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354438
 ]

ASF GitHub Bot logged work on BEAM-1440:


Author: ASF GitHub Bot
Created on: 05/Dec/19 16:34
Start Date: 05/Dec/19 16:34
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #9772: [BEAM-1440] Create a 
BigQuery source that implements iobase.BoundedSource for Python
URL: https://github.com/apache/beam/pull/9772#issuecomment-562208201
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354438)
Time Spent: 13h 50m  (was: 13h 40m)

> Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK
> --
>
> Key: BEAM-1440
> URL: https://issues.apache.org/jira/browse/BEAM-1440
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 13h 50m
>  Remaining Estimate: 0h
>
> Currently we have a BigQuery native source for the Python SDK [1].
> It can only be used by the Dataflow runner.
> We should implement a Beam BigQuery source that implements the 
> iobase.BoundedSource [2] interface so that other runners using the Python 
> SDK can read from BigQuery as well. The Java SDK already has a Beam 
> BigQuery source [3].
> [1] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py
> [2] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py#L70
> [3] 
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1189



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8861) Disallow self-signed certificates by default in ElasticsearchIO

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8861?focusedWorklogId=354440&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354440
 ]

ASF GitHub Bot logged work on BEAM-8861:


Author: ASF GitHub Bot
Created on: 05/Dec/19 16:35
Start Date: 05/Dec/19 16:35
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10296: 
[release-2.18.0][BEAM-8861] Disallow self-signed certificates by default in 
ElasticsearchIO
URL: https://github.com/apache/beam/pull/10296
 
 
   This is the cherry-pick mentioned in the ML.
   R: @udim 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354440)
Time Spent: 0.5h  (was: 20m)

> Disallow self-signed certificates by default in ElasticsearchIO
> ---
>
> Key: BEAM-8861
> URL: https://issues.apache.org/jira/browse/BEAM-8861
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Colm O hEigeartaigh
>Assignee: Colm O hEigeartaigh
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The elasticsearch component allows self-signed certs by default, which is not 
> secure. It should reject them by default - I'll add a PR for this with a 
> configuration option to enable the old behaviour.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-1440) Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1440?focusedWorklogId=354439&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354439
 ]

ASF GitHub Bot logged work on BEAM-1440:


Author: ASF GitHub Bot
Created on: 05/Dec/19 16:35
Start Date: 05/Dec/19 16:35
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #9772: [BEAM-1440] Create a 
BigQuery source that implements iobase.BoundedSource for Python
URL: https://github.com/apache/beam/pull/9772#issuecomment-562208319
 
 
   Run Python 2 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354439)
Time Spent: 14h  (was: 13h 50m)

> Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK
> --
>
> Key: BEAM-1440
> URL: https://issues.apache.org/jira/browse/BEAM-1440
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 14h
>  Remaining Estimate: 0h
>
> Currently we have a BigQuery native source for the Python SDK [1].
> It can only be used by the Dataflow runner.
> We should implement a Beam BigQuery source that implements the 
> iobase.BoundedSource [2] interface so that other runners using the Python 
> SDK can read from BigQuery as well. The Java SDK already has a Beam 
> BigQuery source [3].
> [1] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py
> [2] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py#L70
> [3] 
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1189



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8897) JdbcIO issue when type is Numeric

2019-12-05 Thread Mark Gates (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Gates updated BEAM-8897:
-
Description: 
The actual use case is that I am connecting to Oracle via JDBC and running the 
query "select 1 from dual". If I try this on real data with a NUMBER type 
column, the same issue occurs.

The Oracle NUMBER type results in an exception. Example:


{code:java}
PCollection<Row> row = p.apply(
    JdbcIO.readRows()
        .withDataSourceProviderFn(JdbcIO.PoolableDataSourceProvider.of(configuration))
        .withQuery("select 1 as t1 from dual")
        .withFetchSize(1)
        .withOutputParallelization(true));

row.apply(ParDo.of(new DoFn<Row, Row>() {
  @ProcessElement
  public void processElement(@Element Row row, OutputReceiver<Row> out,
      ProcessContext pc) {
    System.out.println(row.getSchema().toString() + " : " + row.toString());
    out.output(row);
  }
}));

p.run().waitUntilFinish();
{code}


Exception:
Exception in thread "main" 
org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
java.lang.IllegalArgumentException
at 
org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:348)
at 
org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:318)
at 
org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:213)
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:67)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:315)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:301)
at SimpleJdbc.main(SimpleJdbc.java:45)
Caused by: java.lang.IllegalArgumentException
at 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:127)
at 
org.apache.beam.sdk.io.jdbc.LogicalTypes$FixedPrecisionNumeric.toInputType(LogicalTypes.java:237)
at 
org.apache.beam.sdk.io.jdbc.LogicalTypes$FixedPrecisionNumeric.toInputType(LogicalTypes.java:221)
at 
org.apache.beam.sdk.io.jdbc.SchemaUtil.lambda$createLogicalTypeExtractor$ca0ab2ec$1(SchemaUtil.java:272)
at 
org.apache.beam.sdk.io.jdbc.SchemaUtil$BeamRowMapper.mapRow(SchemaUtil.java:337)
at 
org.apache.beam.sdk.io.jdbc.SchemaUtil$BeamRowMapper.mapRow(SchemaUtil.java:315)
at 
org.apache.beam.sdk.io.jdbc.JdbcIO$ReadFn.processElement(JdbcIO.java:854)

  was:

The actual use case is such that I am connecting to Oracle via JDBC making the 
query: "select 1 from dual".

The number type results in an exception. Example:


{code:java}
PCollection<Row> row = p.apply(
    JdbcIO.readRows()
        .withDataSourceProviderFn(JdbcIO.PoolableDataSourceProvider.of(configuration))
        .withQuery("select 1 as t1 from dual")
        .withFetchSize(1)
        .withOutputParallelization(true));

row.apply(ParDo.of(new DoFn<Row, Row>() {
  @ProcessElement
  public void processElement(@Element Row row, OutputReceiver<Row> out,
      ProcessContext pc) {
    System.out.println(row.getSchema().toString() + " : " + row.toString());
    out.output(row);
  }
}));

p.run().waitUntilFinish();
{code}


Exception:
Exception in thread "main" 
org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
java.lang.IllegalArgumentException
at 
org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:348)
at 
org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:318)
at 
org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:213)
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:67)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:315)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:301)
at SimpleJdbc.main(SimpleJdbc.java:45)
Caused by: java.lang.IllegalArgumentException
at 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:127)
at 
org.apache.beam.sdk.io.jdbc.LogicalTypes$FixedPrecisionNumeric.toInputType(LogicalTypes.java:237)
at 
org.apache.beam.sdk.io.jdbc.LogicalTypes$FixedPrecisionNumeric.toInputType(LogicalTypes.java:221)
at 
org.apache.beam.sdk.io.jdbc.SchemaUtil.lambda$createLogicalTypeExtractor$ca0ab2ec$1(SchemaUtil.java:272)
at 
org.apache.beam.sdk.io.jdbc.SchemaUtil$BeamRowMapper.mapRow(SchemaUtil.java:337)
at 
org.apache.beam.sdk.io.jdbc.SchemaUtil$BeamRowMapper.mapRow(SchemaUtil.java:315)
at 
org.apache.beam.sdk.io.jdbc.JdbcIO$ReadFn.processElement(JdbcIO.java:854)


> JdbcIO issue when type is Numeric
> -
>
> Key: BEAM-8897
> URL: https://issues.apache.org/jira/browse/BEAM-8897
> Project: Beam

[jira] [Work logged] (BEAM-1440) Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1440?focusedWorklogId=354443&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354443
 ]

ASF GitHub Bot logged work on BEAM-1440:


Author: ASF GitHub Bot
Created on: 05/Dec/19 16:38
Start Date: 05/Dec/19 16:38
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #9772: [BEAM-1440] Create a 
BigQuery source that implements iobase.BoundedSource for Python
URL: https://github.com/apache/beam/pull/9772#issuecomment-562209713
 
 
   I've tidied up the commit history a bit. Also, I've renamed `ReadFromBigQuery` 
to `_ReadFromBigQuery`; I forgot to do this in an earlier commit.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354443)
Time Spent: 14h 10m  (was: 14h)

> Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK
> --
>
> Key: BEAM-1440
> URL: https://issues.apache.org/jira/browse/BEAM-1440
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 14h 10m
>  Remaining Estimate: 0h
>
> Currently we have a BigQuery native source for the Python SDK [1].
> It can only be used by the Dataflow runner.
> We should implement a Beam BigQuery source that implements the 
> iobase.BoundedSource [2] interface so that other runners using the Python 
> SDK can read from BigQuery as well. The Java SDK already has a Beam 
> BigQuery source [3].
> [1] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py
> [2] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py#L70
> [3] 
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1189



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8815) Portable pipeline execution without artifact staging

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8815?focusedWorklogId=354445&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354445
 ]

ASF GitHub Bot logged work on BEAM-8815:


Author: ASF GitHub Bot
Created on: 05/Dec/19 16:39
Start Date: 05/Dec/19 16:39
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #10285: [BEAM-8815] Define 
the no artifacts retrieval token in proto
URL: https://github.com/apache/beam/pull/10285#issuecomment-562210142
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354445)
Time Spent: 5h 10m  (was: 5h)

> Portable pipeline execution without artifact staging
> 
>
> Key: BEAM-8815
> URL: https://issues.apache.org/jira/browse/BEAM-8815
> Project: Beam
>  Issue Type: Task
>  Components: runner-core, runner-flink
>Affects Versions: 2.17.0
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Major
>  Labels: portability
> Fix For: 2.17.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> The default artifact staging implementation relies on a distributed 
> filesystem. A directory and manifest will be created even when artifact 
> staging isn't used, and the container boot code will fail retrieving 
> artifacts, even though there are none. In a containerized environment it is 
> common to package artifacts into containers. It should be possible to run the 
> pipeline w/o a distributed filesystem. 
> [https://lists.apache.org/thread.html/1b0d545955a80688ea19f227ad943683747b17beb45368ad0908fd21@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=354447&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354447
 ]

ASF GitHub Bot logged work on BEAM-8564:


Author: ASF GitHub Bot
Created on: 05/Dec/19 16:40
Start Date: 05/Dec/19 16:40
Worklog Time Spent: 10m 
  Work Description: amoght commented on pull request #10254: [BEAM-8564] 
Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r354422677
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/util/LzoCompressorInputStream.java
 ##
 @@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.util;
+
+import io.airlift.compress.lzo.LzoCodec;
+import java.io.IOException;
+import java.io.InputStream;
+import org.apache.commons.compress.compressors.CompressorInputStream;
+import org.apache.commons.compress.utils.CountingInputStream;
+import org.apache.commons.compress.utils.IOUtils;
+import org.apache.commons.compress.utils.InputStreamStatistics;
+
+/**
+ * {@link CompressorInputStream} implementation that decodes an LZO encoded stream. 
Relies on the <a href="https://github.com/airlift/aircompressor/">aircompressor</a> LZO implementation.
+ *
+ * @since 1.18
+ */
+public class LzoCompressorInputStream extends CompressorInputStream
+implements InputStreamStatistics {
+
+  private final CountingInputStream countingStream;
+  private final InputStream lzoIS;
+
+  /**
+   * Wraps the given stream into aircompressor's HadoopLzoInputStream using 
the LzoCodec.
+   *
+   * @param inStream the stream to read from
+   * @throws IOException if aircompressor throws an IOException
+   */
+  public LzoCompressorInputStream(final InputStream inStream) throws 
IOException {
+this.lzoIS =
+new LzoCodec().createInputStream(countingStream = new 
CountingInputStream(inStream));
+  }
+
+  @Override
+  public int available() throws IOException {
+return lzoIS.available();
+  }
+
+  @Override
+  public void close() throws IOException {
+lzoIS.close();
+  }
+
+  @Override
+  public int read(final byte[] b) throws IOException {
+return read(b, 0, b.length);
+  }
+
+  @Override
+  public long skip(final long n) throws IOException {
+return IOUtils.skip(lzoIS, n);
+  }
+
+  @Override
+  public void mark(final int readlimit) {
+lzoIS.mark(readlimit);
+  }
+
+  @Override
+  public boolean markSupported() {
+return lzoIS.markSupported();
+  }
+
+  @Override
+  public int read() throws IOException {
+final int ret = lzoIS.read();
+count(ret == -1 ? 0 : 1);
+return ret;
+  }
+
+  @Override
+  public int read(final byte[] buf, final int off, final int len) throws 
IOException {
+if (len == 0) {
+  return 0;
+}
+final int ret = lzoIS.read(buf, off, len);
 
 Review comment:
   No, this case is handled. The check is there so that when the buffer length 
is 0 the method returns immediately instead of delegating to the underlying 
read, which avoids an unnecessary method call.
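
   For completeness, a minimal usage sketch of the class in this diff (assuming 
it lands in org.apache.beam.sdk.util as shown above; the input path is made up):

{code:java}
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.beam.sdk.util.LzoCompressorInputStream;

public class LzoReadSketch {
  public static void main(String[] args) throws IOException {
    // Wrap a raw LZO file in the new input stream and copy the decompressed
    // bytes to stdout.
    try (InputStream in =
        new LzoCompressorInputStream(new FileInputStream("/tmp/data.lzo"))) {
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf, 0, buf.length)) != -1) {
        System.out.write(buf, 0, n);
      }
      System.out.flush();
    }
  }
}
{code}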
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 354447)
Time Spent: 2.5h  (was: 2h 20m)

> Add LZO compression and decompression support
> -
>
> Key: BEAM-8564
> URL: https://issues.apache.org/jira/browse/BEAM-8564
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Amogh Tiwari
>Assignee: Amogh Tiwari
>Priority: Minor
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> LZO is a lossless data compression algorithm that focuses on compression 
> and decompression speed.
> This will enable the Apache Beam SDK to compress/decompress files using LZO.
