[jira] [Updated] (BEAM-14556) Honor custom formatters being installed on the root logging handler

2022-06-03 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-14556:
-
Fix Version/s: 2.40.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Honor custom formatters being installed on the root logging handler
> ---
>
> Key: BEAM-14556
> URL: https://issues.apache.org/jira/browse/BEAM-14556
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow, sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
> Fix For: 2.40.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This will allow users to create custom formatters that integrate with an MDC, 
> should they so choose, by using a JvmInitializer to install their custom 
> formatter on the root logging handler.
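As an illustration of the kind of initializer this enables, here is a minimal sketch assuming SLF4J's {{MDC}} as the context source and {{@AutoService}} registration; it is not code from the linked change:

{code:java}
import com.google.auto.service.AutoService;
import java.util.logging.Formatter;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import org.apache.beam.sdk.harness.JvmInitializer;
import org.slf4j.MDC;

/** Installs a JUL formatter on the root logging handler that appends the SLF4J MDC. */
@AutoService(JvmInitializer.class)
public class MdcFormatterInitializer implements JvmInitializer {
  @Override
  public void onStartup() {
    Formatter mdcFormatter =
        new Formatter() {
          @Override
          public String format(LogRecord record) {
            // Append the current MDC contents after the formatted message.
            return formatMessage(record) + " mdc=" + MDC.getCopyOfContextMap() + "\n";
          }
        };
    // The JUL root logger is named ""; swap the formatter on each of its handlers.
    for (Handler handler : Logger.getLogger("").getHandlers()) {
      handler.setFormatter(mdcFormatter);
    }
  }
}
{code}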



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (BEAM-6258) Data channel failing after some time for 1G data input

2022-06-03 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-6258:

Fix Version/s: 2.40.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Data channel failing after some time for 1G data input
> --
>
> Key: BEAM-6258
> URL: https://issues.apache.org/jira/browse/BEAM-6258
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Ankur Goenka
>Assignee: Luke Cwik
>Priority: P3
> Fix For: 2.40.0
>
> Attachments: d44b7eda9e4c_java_server_logs.logs.gz, 
> d44b7eda9e4c_python_client_logs.log.bz2
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> The data channel and logging channel are failing after some time with 1 GB of 
> input data for the Chicago Taxi example.
>  
> E1218 02:44:02.837680206 72 chttp2_transport.cc:1148] Received a GOAWAY with 
> error code ENHANCE_YOUR_CALM and debug data equal to "too_many_pings"
> Exception in thread read_grpc_client_inputs:
> Traceback (most recent call last):
>  File "/usr/local/lib/python2.7/threading.py", line 801, in __bootstrap_inner
>  self.run()
>  File "/usr/local/lib/python2.7/threading.py", line 754, in run
>  self.__target(*self.__args, **self.__kwargs)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 273, in 
>  target=lambda: self._read_inputs(elements_iterator),
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 260, in _read_inputs
>  for elements in elements_iterator:
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 347, in 
> next
>  return self._next()
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 338, in 
> _next
>  raise self
> _Rendezvous: <_Rendezvous of RPC that terminated with 
> (StatusCode.RESOURCE_EXHAUSTED, GOAWAY received)>
> Traceback (most recent call last):
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 145, in _execute
>  response = task()
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 180, in 
>  self._execute(lambda: worker.do_instruction(work), work)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 253, in do_instruction
>  request.instruction_id)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 269, in process_bundle
>  bundle_processor.process_bundle(instruction_id)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 481, in process_bundle
>  instruction_id, expected_targets):
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 209, in input_elements
>  raise_(t, v, tb)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 260, in _read_inputs
>  for elements in elements_iterator:
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 347, in 
> next
>  return self._next()
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 338, in 
> _next
>  raise self
> _Rendezvous: <_Rendezvous of RPC that terminated with 
> (StatusCode.RESOURCE_EXHAUSTED, GOAWAY received)>
> Traceback (most recent call last):
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 145, in _execute
>  response = task()
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 180, in 
>  self._execute(lambda: worker.do_instruction(work), work)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 253, in do_instruction
>  request.instruction_id)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 269, in process_bundle
>  bundle_processor.process_bundle(instruction_id)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 481, in process_bundle
>  instruction_id, expected_targets):
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 209, in input_elements
>  raise_(t, v, tb)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 260, in _read_inputs
>  for elements in elements_iterator:
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 347, in 
> next
>  return self._next()
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 338, in 
> _next
>  raise self
> _Rendezvous: <_Rendezvous of RPC that terminated with 
> (StatusCode.RE

[jira] [Assigned] (BEAM-6258) Data channel failing after some time for 1G data input

2022-06-03 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reassigned BEAM-6258:
---

Assignee: Luke Cwik

> Data channel failing after some time for 1G data input
> --
>
> Key: BEAM-6258
> URL: https://issues.apache.org/jira/browse/BEAM-6258
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Ankur Goenka
>Assignee: Luke Cwik
>Priority: P3
> Attachments: d44b7eda9e4c_java_server_logs.logs.gz, 
> d44b7eda9e4c_python_client_logs.log.bz2
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> The data channel and logging channel are failing after some time with 1 GB of 
> input data for the Chicago Taxi example.
>  
> E1218 02:44:02.837680206 72 chttp2_transport.cc:1148] Received a GOAWAY with 
> error code ENHANCE_YOUR_CALM and debug data equal to "too_many_pings"
> Exception in thread read_grpc_client_inputs:
> Traceback (most recent call last):
>  File "/usr/local/lib/python2.7/threading.py", line 801, in __bootstrap_inner
>  self.run()
>  File "/usr/local/lib/python2.7/threading.py", line 754, in run
>  self.__target(*self.__args, **self.__kwargs)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 273, in 
>  target=lambda: self._read_inputs(elements_iterator),
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 260, in _read_inputs
>  for elements in elements_iterator:
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 347, in 
> next
>  return self._next()
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 338, in 
> _next
>  raise self
> _Rendezvous: <_Rendezvous of RPC that terminated with 
> (StatusCode.RESOURCE_EXHAUSTED, GOAWAY received)>
> Traceback (most recent call last):
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 145, in _execute
>  response = task()
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 180, in 
>  self._execute(lambda: worker.do_instruction(work), work)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 253, in do_instruction
>  request.instruction_id)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 269, in process_bundle
>  bundle_processor.process_bundle(instruction_id)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 481, in process_bundle
>  instruction_id, expected_targets):
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 209, in input_elements
>  raise_(t, v, tb)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 260, in _read_inputs
>  for elements in elements_iterator:
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 347, in 
> next
>  return self._next()
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 338, in 
> _next
>  raise self
> _Rendezvous: <_Rendezvous of RPC that terminated with 
> (StatusCode.RESOURCE_EXHAUSTED, GOAWAY received)>
> Traceback (most recent call last):
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 145, in _execute
>  response = task()
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 180, in 
>  self._execute(lambda: worker.do_instruction(work), work)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 253, in do_instruction
>  request.instruction_id)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 269, in process_bundle
>  bundle_processor.process_bundle(instruction_id)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 481, in process_bundle
>  instruction_id, expected_targets):
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 209, in input_elements
>  raise_(t, v, tb)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 260, in _read_inputs
>  for elements in elements_iterator:
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 347, in 
> next
>  return self._next()
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 338, in 
> _next
>  raise self
> _Rendezvous: <_Rendezvous of RPC that terminated with 
> (StatusCode.RESOURCE_EXHAUSTED, GOAWAY received)>
> Traceback (most recent call last):
>  File 
> "/usr/loc

[jira] [Created] (BEAM-14556) Honor custom formatters being installed on the root logging handler

2022-06-02 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-14556:


 Summary: Honor custom formatters being installed on the root 
logging handler
 Key: BEAM-14556
 URL: https://issues.apache.org/jira/browse/BEAM-14556
 Project: Beam
  Issue Type: New Feature
  Components: runner-dataflow, sdk-java-harness
Reporter: Luke Cwik
Assignee: Luke Cwik


This will allow users to create custom formatters that integrate with an MDC, 
should they so choose, by using a JvmInitializer to install their custom 
formatter on the root logging handler.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (BEAM-14539) JulHandlerPrintStream failing to buffer carry over bytes

2022-06-02 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-14539:
-
Resolution: Fixed
Status: Resolved  (was: Open)

> JulHandlerPrintStream failing to buffer carry over bytes
> 
>
> Key: BEAM-14539
> URL: https://issues.apache.org/jira/browse/BEAM-14539
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.33.0, 2.34.0, 2.35.0, 2.36.0, 2.37.0, 2.38.0, 2.39.0
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
> Fix For: 2.40.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> JulHandlerPrintStream was not flushing in a loop for large byte arrays, which 
> meant that the carry-over could be much larger than the assumed 6 bytes.
> Saw these logs for a job:
> {noformat}
> java.lang.ArrayIndexOutOfBoundsException: Index 9 out of bounds for length 6
> at 
> org.apache.beam.runners.dataflow.worker.logging.JulHandlerPrintStreamAdapterFactory$JulHandlerPrintStream.write(JulHandlerPrintStreamAdapterFactory.java:142)
> at java.base/java.io.PrintStream.write(PrintStream.java:559)
> at java.base/sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:233)
> at java.base/sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:312)
> at java.base/sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104)
> at 
> java.base/java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:184)
> at java.base/java.io.PrintStream.write(PrintStream.java:606)
> at java.base/java.io.PrintStream.print(PrintStream.java:745)
> at java.base/java.io.PrintStream.append(PrintStream.java:1147)
> at java.base/java.io.PrintStream.append(PrintStream.java:1188)
> at java.base/java.io.PrintStream.append(PrintStream.java:63)
> at java.base/java.util.Formatter$FixedString.print(Formatter.java:2754)
> at java.base/java.util.Formatter.format(Formatter.java:2661)
> at java.base/java.io.PrintStream.format(PrintStream.java:1053)
> at java.base/java.io.PrintStream.printf(PrintStream.java:949)
> ...
> {noformat}
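For readers unfamiliar with the bug shape, a sketch of the flush-in-a-loop pattern the fix implies; this is illustrative only and not the actual {{JulHandlerPrintStreamAdapterFactory}} code. Complete lines are emitted as they are found and only the trailing partial line is carried over, so the carry-over cannot outgrow its buffer:

{code:java}
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Minimal sketch: buffer bytes, flush every complete line, keep only the remainder. */
class LineBufferingSink {
  private final ByteArrayOutputStream carryOver = new ByteArrayOutputStream();

  synchronized void write(byte[] bytes, int offset, int length) {
    int lineStart = offset;
    // Loop over the whole array so a large write cannot leave a huge carry-over.
    for (int i = offset; i < offset + length; i++) {
      if (bytes[i] == '\n') {
        carryOver.write(bytes, lineStart, i - lineStart);
        emit(new String(carryOver.toByteArray(), StandardCharsets.UTF_8));
        carryOver.reset();
        lineStart = i + 1;
      }
    }
    // Only the trailing partial line is carried over to the next write.
    carryOver.write(bytes, lineStart, offset + length - lineStart);
  }

  private void emit(String line) {
    java.util.logging.Logger.getLogger("sink").info(line);
  }
}
{code}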



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (BEAM-14539) JulHandlerPrintStream failing to buffer carry over bytes

2022-05-31 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-14539:
-
Description: 
JulHandlerPrintStream was not flushing in a loop for large byte arrays, which 
meant that the carry-over could be much larger than the assumed 6 bytes.

Saw these logs for a job:
{noformat}
java.lang.ArrayIndexOutOfBoundsException: Index 9 out of bounds for length 6
at 
org.apache.beam.runners.dataflow.worker.logging.JulHandlerPrintStreamAdapterFactory$JulHandlerPrintStream.write(JulHandlerPrintStreamAdapterFactory.java:142)
at java.base/java.io.PrintStream.write(PrintStream.java:559)
at java.base/sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:233)
at java.base/sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:312)
at java.base/sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104)
at java.base/java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:184)
at java.base/java.io.PrintStream.write(PrintStream.java:606)
at java.base/java.io.PrintStream.print(PrintStream.java:745)
at java.base/java.io.PrintStream.append(PrintStream.java:1147)
at java.base/java.io.PrintStream.append(PrintStream.java:1188)
at java.base/java.io.PrintStream.append(PrintStream.java:63)
at java.base/java.util.Formatter$FixedString.print(Formatter.java:2754)
at java.base/java.util.Formatter.format(Formatter.java:2661)
at java.base/java.io.PrintStream.format(PrintStream.java:1053)
at java.base/java.io.PrintStream.printf(PrintStream.java:949)
...
{noformat}


  was:
JulHandlerPrintStream was not flushing in a loop for large byte arrays, which 
meant that the carry-over could be much larger than the assumed 6 bytes.

Saw these logs for a job:
{noformat}
2022-05-26 07:40:30.284 PDT
at 
org.apache.beam.runners.dataflow.worker.logging.JulHandlerPrintStreamAdapterFactory$JulHandlerPrintStream.write(JulHandlerPrintStreamAdapterFactory.java:142)
2022-05-26 07:40:30.284 PDT
at java.base/java.io.PrintStream.write(PrintStream.java:559)

2022-05-26 07:40:30.284 PDT
at java.base/sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:233)
2022-05-26 07:40:30.285 PDT
at java.base/sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:312)
2022-05-26 07:40:30.286 PDT
at java.base/sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104)
2022-05-26 07:40:30.286 PDT
at java.base/java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:184)
2022-05-26 07:40:30.286 PDT
at java.base/java.io.PrintStream.write(PrintStream.java:606)
2022-05-26 07:40:30.286 PDT
at java.base/java.io.PrintStream.print(PrintStream.java:745)
2022-05-26 07:40:30.286 PDT
at java.base/java.io.PrintStream.append(PrintStream.java:1147)
2022-05-26 07:40:30.286 PDT
at java.base/java.io.PrintStream.append(PrintStream.java:1188)
2022-05-26 07:40:30.286 PDT
at java.base/java.io.PrintStream.append(PrintStream.java:63)
2022-05-26 07:40:30.286 PDT
at java.base/java.util.Formatter$FixedString.print(Formatter.java:2754)
2022-05-26 07:40:30.286 PDT
at java.base/java.util.Formatter.format(Formatter.java:2661)
2022-05-26 07:40:30.286 PDT
at java.base/java.io.PrintStream.format(PrintStream.java:1053)
2022-05-26 07:40:30.286 PDT
at java.base/java.io.PrintStream.printf(PrintStream.java:949)
{noformat}



> JulHandlerPrintStream failing to buffer carry over bytes
> 
>
> Key: BEAM-14539
> URL: https://issues.apache.org/jira/browse/BEAM-14539
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.33.0, 2.34.0, 2.35.0, 2.36.0, 2.37.0, 2.38.0, 2.39.0
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
> Fix For: 2.40.0
>
>
> JulHandlerPrintStream was not flushing in a loop for large byte arrays, which 
> meant that the carry-over could be much larger than the assumed 6 bytes.
> Saw these logs for a job:
> {noformat}
> java.lang.ArrayIndexOutOfBoundsException: Index 9 out of bounds for length 6
> at 
> org.apache.beam.runners.dataflow.worker.logging.JulHandlerPrintStreamAdapterFactory$JulHandlerPrintStream.write(JulHandlerPrintStreamAdapterFactory.java:142)
> at java.base/java.io.PrintStream.write(PrintStream.java:559)
> at java.base/sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:233)
> at java.base/sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:312)
> at java.base/sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104)
> at 
> java.base/java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:184)
> at java.base/java.io.PrintStream.write(PrintStream.java:606)
> at java.base/java.io.PrintStream.print(PrintStream.java:745)
> at java.base/java.io.PrintStream.append(PrintStream.java:1147)
> at java.base/java.io.PrintStream.append(PrintStream.java:1188)
> at java.base/java.io.PrintStream.append(PrintStream.java:63)
> at java.base/java.ut

[jira] [Created] (BEAM-14539) JulHandlerPrintStream failing to buffer carry over bytes

2022-05-31 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-14539:


 Summary: JulHandlerPrintStream failing to buffer carry over bytes
 Key: BEAM-14539
 URL: https://issues.apache.org/jira/browse/BEAM-14539
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow
Affects Versions: 2.39.0, 2.38.0, 2.37.0, 2.36.0, 2.35.0, 2.34.0, 2.33.0
Reporter: Luke Cwik
Assignee: Luke Cwik
 Fix For: 2.40.0


JulHandlerPrintStream was not flushing in a loop for large byte arrays, which 
meant that the carry-over could be much larger than the assumed 6 bytes.

Saw these logs for a job:
{noformat}
2022-05-26 07:40:30.284 PDT
at 
org.apache.beam.runners.dataflow.worker.logging.JulHandlerPrintStreamAdapterFactory$JulHandlerPrintStream.write(JulHandlerPrintStreamAdapterFactory.java:142)
2022-05-26 07:40:30.284 PDT
at java.base/java.io.PrintStream.write(PrintStream.java:559)

2022-05-26 07:40:30.284 PDT
at java.base/sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:233)
2022-05-26 07:40:30.285 PDT
at java.base/sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:312)
2022-05-26 07:40:30.286 PDT
at java.base/sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104)
2022-05-26 07:40:30.286 PDT
at java.base/java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:184)
2022-05-26 07:40:30.286 PDT
at java.base/java.io.PrintStream.write(PrintStream.java:606)
2022-05-26 07:40:30.286 PDT
at java.base/java.io.PrintStream.print(PrintStream.java:745)
2022-05-26 07:40:30.286 PDT
at java.base/java.io.PrintStream.append(PrintStream.java:1147)
2022-05-26 07:40:30.286 PDT
at java.base/java.io.PrintStream.append(PrintStream.java:1188)
2022-05-26 07:40:30.286 PDT
at java.base/java.io.PrintStream.append(PrintStream.java:63)
2022-05-26 07:40:30.286 PDT
at java.base/java.util.Formatter$FixedString.print(Formatter.java:2754)
2022-05-26 07:40:30.286 PDT
at java.base/java.util.Formatter.format(Formatter.java:2661)
2022-05-26 07:40:30.286 PDT
at java.base/java.io.PrintStream.format(PrintStream.java:1053)
2022-05-26 07:40:30.286 PDT
at java.base/java.io.PrintStream.printf(PrintStream.java:949)
{noformat}




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (BEAM-6258) Data channel failing after some time for 1G data input

2022-05-27 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543078#comment-17543078
 ] 

Luke Cwik commented on BEAM-6258:
-

The underlying issue was fixed in gRPC c-core. A minimum version update to 
1.33.1 will ensure that this no longer happens, since that version contains 
https://github.com/grpc/grpc/commit/6e1655447ab2146a643114687d7916249bfdf018 
which is the fix for https://github.com/grpc/grpc-java/issues/5188
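For background, the GOAWAY with {{too_many_pings}} is the server's HTTP/2 keepalive enforcement policy rejecting client keepalive pings. The actual fix here was the c-core minimum-version bump above; the following is only a hedged sketch of the server-side knobs involved, using plain grpc-java's {{NettyServerBuilder}} with a placeholder port and service:

{code:java}
import io.grpc.BindableService;
import io.grpc.Server;
import io.grpc.netty.NettyServerBuilder;
import java.util.concurrent.TimeUnit;

public class KeepAliveFriendlyServer {
  public static Server build(BindableService service) {
    return NettyServerBuilder.forPort(50051) // placeholder port
        // Accept client keepalive pings as often as every 30 seconds ...
        .permitKeepAliveTime(30, TimeUnit.SECONDS)
        // ... even when there are no outstanding RPCs on the connection.
        .permitKeepAliveWithoutCalls(true)
        .addService(service)
        .build();
  }
}
{code}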

> Data channel failing after some time for 1G data input
> --
>
> Key: BEAM-6258
> URL: https://issues.apache.org/jira/browse/BEAM-6258
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Ankur Goenka
>Priority: P3
> Attachments: d44b7eda9e4c_java_server_logs.logs.gz, 
> d44b7eda9e4c_python_client_logs.log.bz2
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The data channel and logging channel are failing after some time with 1 GB of 
> input data for the Chicago Taxi example.
>  
> E1218 02:44:02.837680206 72 chttp2_transport.cc:1148] Received a GOAWAY with 
> error code ENHANCE_YOUR_CALM and debug data equal to "too_many_pings"
> Exception in thread read_grpc_client_inputs:
> Traceback (most recent call last):
>  File "/usr/local/lib/python2.7/threading.py", line 801, in __bootstrap_inner
>  self.run()
>  File "/usr/local/lib/python2.7/threading.py", line 754, in run
>  self.__target(*self.__args, **self.__kwargs)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 273, in 
>  target=lambda: self._read_inputs(elements_iterator),
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 260, in _read_inputs
>  for elements in elements_iterator:
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 347, in 
> next
>  return self._next()
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 338, in 
> _next
>  raise self
> _Rendezvous: <_Rendezvous of RPC that terminated with 
> (StatusCode.RESOURCE_EXHAUSTED, GOAWAY received)>
> Traceback (most recent call last):
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 145, in _execute
>  response = task()
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 180, in 
>  self._execute(lambda: worker.do_instruction(work), work)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 253, in do_instruction
>  request.instruction_id)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 269, in process_bundle
>  bundle_processor.process_bundle(instruction_id)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 481, in process_bundle
>  instruction_id, expected_targets):
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 209, in input_elements
>  raise_(t, v, tb)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 260, in _read_inputs
>  for elements in elements_iterator:
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 347, in 
> next
>  return self._next()
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 338, in 
> _next
>  raise self
> _Rendezvous: <_Rendezvous of RPC that terminated with 
> (StatusCode.RESOURCE_EXHAUSTED, GOAWAY received)>
> Traceback (most recent call last):
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 145, in _execute
>  response = task()
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 180, in 
>  self._execute(lambda: worker.do_instruction(work), work)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 253, in do_instruction
>  request.instruction_id)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 269, in process_bundle
>  bundle_processor.process_bundle(instruction_id)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 481, in process_bundle
>  instruction_id, expected_targets):
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 209, in input_elements
>  raise_(t, v, tb)
>  File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/data_plane.py",
>  line 260, in _read_inputs
>  for elements in elements_iterator:
>  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 347, in 
> next
>  retur

[jira] [Updated] (BEAM-14496) PrecombineGroupingTable not inheriting a values output

2022-05-24 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-14496:
-
Resolution: Fixed
Status: Resolved  (was: Open)

> PrecombineGroupingTable not inheriting a values output
> --
>
> Key: BEAM-14496
> URL: https://issues.apache.org/jira/browse/BEAM-14496
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
> Fix For: 2.40.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The precombine grouping table needs to produce an output with a timestamp 
> from one of the input records.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (BEAM-14298) Can't resolve org.pentaho:pentaho-aggdesigner-algorithm:5.1.5-jhyde

2022-05-24 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-14298:
-
Fix Version/s: Not applicable
 Assignee: Yi Hu  (was: Kenneth Knowles)
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Can't resolve org.pentaho:pentaho-aggdesigner-algorithm:5.1.5-jhyde
> ---
>
> Key: BEAM-14298
> URL: https://issues.apache.org/jira/browse/BEAM-14298
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Liam Miller-Cushon
>Assignee: Yi Hu
>Priority: P0
>  Labels: currently-failing
> Fix For: Not applicable
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I am trying to run the 'Linkage checker' documented at 
> [https://cwiki.apache.org/confluence/display/BEAM/Dependency+Upgrades#DependencyUpgrades-Linkagecheckeranalysis]
>  
> I am seeing errors trying to resolve  
> org.pentaho:pentaho-aggdesigner-algorithm:5.1.5-jhyde. There was a previous 
> report of the same dep not being available in standard maven repos 
> (https://issues.apache.org/jira/browse/BEAM-11689), which was marked fixed, 
> but I'm still seeing something similar.
>  
> {{./gradlew -Ppublishing -PjavaLinkageArtifactIds=beam-sdks-java-core 
> :checkJavaLinkage}}
> {{...}}
> {{* What went wrong:}}
> {{Execution failed for task ':sdks:java:io:hcatalog:compileJava'.}}
> {{> Could not resolve all files for configuration 
> ':sdks:java:io:hcatalog:compileClasspath'.}}
> {{   > Could not find org.pentaho:pentaho-aggdesigner-algorithm:5.1.5-jhyde.}}
> {{     Searched in the following locations:}}
> {{       - 
> file:~/src/beam/sdks/java/io/hcatalog/offline-repository/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.pom}}
> {{       - 
> https://repo.maven.apache.org/maven2/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.pom}}
> {{       - 
> file:~/.m2/repository/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.pom}}
> {{       - 
> https://public.nexus.pentaho.org/repository/proxy-public-3rd-party-release/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.pom}}
> {{       - 
> https://oss.sonatype.org/content/repositories/staging/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.pom}}
> {{       - 
> https://repository.apache.org/snapshots/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.pom}}
> {{       - 
> https://repository.apache.org/content/repositories/releases/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.pom}}
> {{     Required by:}}
> {{         project :sdks:java:io:hcatalog > org.apache.hive:hive-exec:2.1.0 > 
> org.apache.calcite:calcite-core:1.6.0}}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (BEAM-14496) PrecombineGroupingTable not inheriting a values output

2022-05-20 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-14496:


 Summary: PrecombineGroupingTable not inheriting a values output
 Key: BEAM-14496
 URL: https://issues.apache.org/jira/browse/BEAM-14496
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-harness
Reporter: Luke Cwik
Assignee: Luke Cwik
 Fix For: 2.40.0


The precombine grouping table needs to produce an output with a timestamp from 
one of the input records.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (BEAM-3576) protos should import one another via globally valid paths

2022-05-20 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-3576:

Fix Version/s: 2.39.0
 Assignee: Milan Patel
   Resolution: Duplicate
   Status: Resolved  (was: Open)

> protos should import one another via globally valid paths
> -
>
> Key: BEAM-3576
> URL: https://issues.apache.org/jira/browse/BEAM-3576
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Kenneth Knowles
>Assignee: Milan Patel
>Priority: P3
> Fix For: 2.39.0
>
>
> Right now the proto sub-bits depend on each other via jars, since they build 
> via Maven. In those jars, the proto files themselves are dumped in the root. 
> They should be placed in at least "beam/whatever.proto" or possibly 
> "apache/beam/whatever.proto". This will make the proto files valid in a 
> global namespace (like Java and Go imports and, I dare say, all 
> well-thought-out import mechanisms).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (BEAM-13667) Users cannot provide their own sharding function when using FileIO

2022-05-19 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-13667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539913#comment-17539913
 ] 

Luke Cwik commented on BEAM-13667:
--

Really it would be best if support for dynamic sharding/GroupIntoBatches was 
added to the FlinkRunner. This would remove the need for specifying sharding 
functions.

> Users cannot provide their own sharding function when using FileIO
> --
>
> Key: BEAM-13667
> URL: https://issues.apache.org/jira/browse/BEAM-13667
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: 2.35.0
>Reporter: Sandeep Kathula
>Priority: P3
>
> Beam uses RandomShardingFunction 
> ([https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java#L834])
>  by default for sharding when using FileIO.write().
>  
> RandomShardingFunction doesn't work well with the Flink runner. It assigns the 
> same key (hashDestination(destination, destinationCoder)) along with the 
> shard number. 
> ([https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java#L863])
> This causes different ShardedKeys to go to the same task slot. As a result, 
> files are not distributed evenly across the task slots.
> E.g. 2 files are written by task slot 1, 3 files written by task slot 2, 0 
> files written by other task slots.
>  
> As a result, we are seeing out-of-memory errors from the pods writing more files.
>  
> At 
> [https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java#L695-L698]
> there is an option to give a different sharding function, but there is no way 
> for the user to specify a different sharding function when using the FileIO class.
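For reference, a sketch of what a user-supplied sharding function could look like, assuming the {{ShardingFunction}} interface accepted by {{WriteFiles.withShardingFunction}} in recent SDKs; the hash-based strategy is only an example:

{code:java}
import org.apache.beam.sdk.io.ShardingFunction;
import org.apache.beam.sdk.values.ShardedKey;

/** Assigns shards from the element hash instead of a per-bundle random key. */
class HashShardingFunction<UserT, DestinationT>
    implements ShardingFunction<UserT, DestinationT> {
  @Override
  public ShardedKey<Integer> assignShardKey(
      DestinationT destination, UserT element, int shardCount) {
    // Derive the shard from the element itself so shards are populated evenly.
    int shard = Math.floorMod(element.hashCode(), shardCount);
    return ShardedKey.of(shard, shard);
  }
}
{code}

The gap this issue describes is that {{FileIO.write()}} offers no way to pass such a function through to {{WriteFiles}}.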



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (BEAM-14429) SyntheticUnboundedSource(with SDF) produce duplicate records when split with DEFAULT_DESIRED_NUM_SPLITS

2022-05-10 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534506#comment-17534506
 ] 

Luke Cwik commented on BEAM-14429:
--

It is used by portable runners (e.g. Dataflow Runner v2) for all unbounded 
sources (e.g. non-SDF Kafka).

> SyntheticUnboundedSource(with SDF) produce duplicate records when split with 
> DEFAULT_DESIRED_NUM_SPLITS
> ---
>
> Key: BEAM-14429
> URL: https://issues.apache.org/jira/browse/BEAM-14429
> Project: Beam
>  Issue Type: Bug
>  Components: io-common
>Reporter: Yichi Zhang
>Assignee: John Casey
>Priority: P1
> Fix For: 2.39.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> With the default of 20 splits, the number of records produced by 
> Read.from(SyntheticUnboundedSource) is always larger than the numRecords 
> specified; the more splits there are, the further off the actual number of 
> records produced is. The Read step also tends to take longer with more splits.
> [https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java#L512]
> The issue is manifested with the Java LoadTests on Dataflow Runner v2.
> Initial suspicion is that duplicate source readers for the same restriction 
> and checkpoint were created by multiple UnboundedSourceAsSDFWrapperFns.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (BEAM-14429) SyntheticUnboundedSource(with SDF) produce duplicate records when split with DEFAULT_DESIRED_NUM_SPLITS

2022-05-10 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534410#comment-17534410
 ] 

Luke Cwik commented on BEAM-14429:
--

Can you [fix this log 
statement|https://github.com/apache/beam/blob/5c21fbccec5e1e831dd0040bd7f631c050865430/sdks/java/io/synthetic/src/main/java/org/apache/beam/sdk/io/synthetic/SyntheticUnboundedSource.java#L112]
 so we can see the result of the initial split? All I see right now is:

{noformat}
Split into 20 bundles of sizes: 
[org.apache.beam.sdk.io.synthetic.SyntheticUnboundedSource@1a0e6e3e, 
org.apache.beam.sdk.io.synthetic.SyntheticUnboundedSource@3d060183, 
org.apache.beam.sdk.io.synthetic.SyntheticUnboundedSource@413f3849, 
org.apache.beam.sdk.io.synthetic.SyntheticUnboundedSource@17e62ad8, 
org.apache.beam.sdk.io.synthetic.SyntheticUnboundedSource@7ef0d984, 
org.apache.beam.sdk.io.synthetic.SyntheticUnboundedSource@73f7d5c0, 
org.apache.beam.sdk.io.synthetic.SyntheticUnboundedSource@4b324687, 
org.apache.beam.sdk.io.synthetic.SyntheticUnboundedSource@6d2405d3, 
org.apache.beam.sdk.io.synthetic.SyntheticUnboundedSource@1560cd0a, 
org.apache.beam.sdk.io.synthetic.SyntheticUnboundedSource@587443b3, 
org.apache.beam.sdk.io.synthetic.SyntheticUnboundedSource@12b3044, 
org.apache.beam.sdk.io.synthetic.SyntheticUnboundedSource@4c6ca72f, 
org.apache.beam.sdk.io.synthetic.SyntheticUnboundedSource@5f773b35, 
org.apache.beam.sdk.io.synthetic.SyntheticUnboundedSource@25e482b3, 
org.apache.beam.sdk.io.synthetic.SyntheticUnboundedSource@7d152f5b, 
org.apache.beam.sdk.io.synthetic.SyntheticUnboundedSource@5469ba09, 
org.apache.beam.sdk.io.synthetic.SyntheticUnboundedSource@32219d22, 
org.apache.beam.sdk.io.synthetic.SyntheticUnboundedSource@335170cf, 
org.apache.beam.sdk.io.synthetic.SyntheticUnboundedSource@676afe09, 
org.apache.beam.sdk.io.synthetic.SyntheticUnboundedSource@7e4aa91f]
{noformat}
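The log above prints default object identities because the sources do not override {{toString()}}. A hedged sketch of the kind of change that would make the split sizes visible; the field names are assumptions, not the actual SyntheticUnboundedSource members:

{code:java}
// Hypothetical sketch: assumes the source tracks its assigned record range as
// [startOffset, endOffset); the real class may expose this differently.
@Override
public String toString() {
  return String.format(
      "SyntheticUnboundedSource{offsets=[%d, %d), records=%d}",
      startOffset, endOffset, endOffset - startOffset);
}
{code}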


> SyntheticUnboundedSource(with SDF) produce duplicate records when split with 
> DEFAULT_DESIRED_NUM_SPLITS
> ---
>
> Key: BEAM-14429
> URL: https://issues.apache.org/jira/browse/BEAM-14429
> Project: Beam
>  Issue Type: Bug
>  Components: io-common
>Reporter: Yichi Zhang
>Priority: P2
> Fix For: 2.39.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> With the default of 20 splits, the number of records produced by 
> Read.from(SyntheticUnboundedSource) is always larger than the numRecords 
> specified; the more splits there are, the further off the actual number of 
> records produced is. The Read step also tends to take longer with more splits.
> [https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java#L512]
> The issue is manifested with the Java LoadTests on Dataflow Runner v2.
> Initial suspicion is that duplicate source readers for the same restriction 
> and checkpoint were created by multiple UnboundedSourceAsSDFWrapperFns.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (BEAM-14064) ElasticSearchIO#Write buffering and outputting across windows

2022-05-03 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-14064:
-
Resolution: Fixed
Status: Resolved  (was: In Progress)

> ElasticSearchIO#Write buffering and outputting across windows
> -
>
> Key: BEAM-14064
> URL: https://issues.apache.org/jira/browse/BEAM-14064
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-elasticsearch
>Affects Versions: 2.35.0, 2.36.0, 2.37.0
>Reporter: Luke Cwik
>Assignee: Evan Galpin
>Priority: P1
> Fix For: 2.39.0
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Source: https://lists.apache.org/thread/mtwtno2o88lx3zl12jlz7o5w1lcgm2db
> Bug PR: https://github.com/apache/beam/pull/15381
> ElasticsearchIO is collecting results from elements in window X and then 
> trying to output them in window Y when flushing the batch. This exposed a bug 
> where buffered elements were being output as part of a different window than 
> the one that produced them.
> This became visible because validation was recently added to ensure that, when 
> the pipeline is processing elements in window X, any output with a timestamp 
> is valid for window X. Note that this validation only occurs in 
> *@ProcessElement*, since output there is associated with the window of the 
> input element that is being processed.
> It is OK to do this in *@FinishBundle*, since there is no existing windowing 
> context and the output element is assigned to an appropriate window when it is 
> emitted.
> *Further Context*
> We’ve bisected it to being introduced in 2.35.0, and I’m reasonably certain 
> it’s this PR https://github.com/apache/beam/pull/15381
> Our scenario is pretty trivial: we read off Pubsub and write to Elastic in a 
> streaming job. The config for the source and sink is, respectively,
> {noformat}
> pipeline.apply(
> PubsubIO.readStrings().fromSubscription(subscription)
> ).apply(ParseJsons.of(OurObject::class.java))
> .setCoder(KryoCoder.of())
> {noformat}
> and
> {noformat}
> ElasticsearchIO.write()
> .withUseStatefulBatches(true)
> .withMaxParallelRequestsPerWindow(1)
> .withMaxBufferingDuration(Duration.standardSeconds(30))
> // 5 bytes * 1024 -> KiB * 1024 -> MiB, so 5 MiB
> .withMaxBatchSizeBytes(5L * 1024 * 1024)
> // # of docs
> .withMaxBatchSize(1000)
> .withConnectionConfiguration(
> ElasticsearchIO.ConnectionConfiguration.create(
> arrayOf(host),
> "fubar",
> "_doc"
> ).withConnectTimeout(5000)
> .withSocketTimeout(3)
> )
> .withRetryConfiguration(
> ElasticsearchIO.RetryConfiguration.create(
> 10,
> // the duration is wall clock, against the connection and 
> socket timeouts specified
> // above. I.e., 10 x 30s is gonna be more than 3 minutes, 
> so if we're getting
> // 10 socket timeouts in a row, this would ignore the 
> "10" part and terminate
> // after 6. The idea is that in a mixed failure mode, 
> you'd get different timeouts
> // of different durations, and on average 10 x fails < 4m.
> // That said, 4m is arbitrary, so adjust as and when 
> needed.
> Duration.standardMinutes(4)
> )
> )
> .withIdFn { f: JsonNode -> f["id"].asText() }
> .withIndexFn { f: JsonNode -> f["schema_name"].asText() }
> .withIsDeleteFn { f: JsonNode -> f["_action"].asText("noop") == 
> "delete" }
> {noformat}
> We recently tried upgrading 2.33 to 2.36 and immediately hit a bug in the 
> consumer, due to alleged time skew, specifically
> {noformat}
> 2022-03-07 10:48:37.886 GMTError message from worker: 
> java.lang.IllegalArgumentException: Cannot output with timestamp 
> 2022-03-07T10:43:38.640Z. Output timestamps must be no earlier than the 
> timestamp of the 
> current input (2022-03-07T10:43:43.562Z) minus the allowed skew (0 
> milliseconds) and no later than 294247-01-10T04:00:54.775Z. See the 
> DoFn#getAllowedTimestampSkew() Javadoc 
> for details on changing the allowed skew. 
> org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.checkTimestamp(SimpleDoFnRunner.java:446)
>  
> org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.outputWithTimestamp(SimpleDoFnRunner.java:422)
>  
> org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$BulkIO$BulkIOBaseFn
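For reference, the windowing rule described in this issue in code form: output in *@ProcessElement* must belong to the current element's window, while *@FinishBundle* must name a timestamp and window explicitly. A minimal sketch of buffering with the window captured per element; this is not the actual ElasticsearchIO BulkIO code:

{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.joda.time.Instant;

/** Buffers elements and flushes them in finishBundle with their original windows. */
class BufferingFn extends DoFn<String, String> {
  private static class Buffered {
    final String value;
    final Instant timestamp;
    final BoundedWindow window;

    Buffered(String value, Instant timestamp, BoundedWindow window) {
      this.value = value;
      this.timestamp = timestamp;
      this.window = window;
    }
  }

  private transient List<Buffered> buffer;

  @StartBundle
  public void startBundle() {
    buffer = new ArrayList<>();
  }

  @ProcessElement
  public void processElement(ProcessContext c, BoundedWindow window) {
    // Remember the window and timestamp the element arrived in.
    buffer.add(new Buffered(c.element(), c.timestamp(), window));
  }

  @FinishBundle
  public void finishBundle(FinishBundleContext c) {
    // FinishBundleContext.output requires an explicit timestamp and window,
    // so each buffered element goes back to the window that produced it.
    for (Buffered b : buffer) {
      c.output(b.value, b.timestamp, b.window);
    }
    buffer.clear();
  }
}
{code}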

[jira] [Assigned] (BEAM-14387) DirectRunner does not update reference to currentRestriction when running in SDF

2022-04-29 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reassigned BEAM-14387:


Assignee: Pablo Estrada  (was: Luke Cwik)

> DirectRunner does not update reference to currentRestriction when running in 
> SDF
> 
>
> Key: BEAM-14387
> URL: https://issues.apache.org/jira/browse/BEAM-14387
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: P2
>
> I have an SDF implementation that looks like so:
>  
> {code:java}
> class MyRestrictionTracker {
>   MyRestriction restriction;
>   currentRestriction() { return restriction; }
>   tryClaim(MyPosition position) {
> this.restriction = new MyRestriction(position)
>   }
> }{code}
> I ran this on the DirectRunner, and the restriction would never advance: It 
> would get stuck on the very first value.
> I also ran this on DataflowRunner, and the problem did not exist there: This 
> ran fine.
>  
> I was able to fix this on the DirectRunner (it works well on Dataflow as 
> well) by changing the restriction to be mutable. Something like this:
>  
> {code:java}
> class MyRestrictionTracker {
>   MyRestriction restriction;
>   currentRestriction() { return restriction; }
>   tryClaim(MyPosition position) {
> this.restriction.position = position;
>   }
> }{code}
> This looks like an execution issue with SDF on DirectRunner: The DirectRunner 
> is likely storing a reference to `currentRestriction()` and never updating it 
> as it runs.
>  
> I'm happy to fix this on the DirectRunner - I would just like to find 
> pointers : )
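For readers following along, a fleshed-out sketch of the workaround described above: the tracker mutates a position field rather than replacing the restriction object, so a runner that holds on to the {{currentRestriction()}} reference still observes progress. {{MyRestriction}} and {{MyPosition}} are the reporter's placeholder names, kept hypothetical here:

{code:java}
import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker;
import org.apache.beam.sdk.transforms.splittabledofn.SplitResult;

/** Placeholder restriction from the report: a single mutable position. */
class MyRestriction implements java.io.Serializable {
  private MyPosition position;

  MyRestriction(MyPosition position) { this.position = position; }
  MyPosition getPosition() { return position; }
  void setPosition(MyPosition position) { this.position = position; }
}

/** Placeholder position type. */
class MyPosition implements java.io.Serializable {}

class MyRestrictionTracker extends RestrictionTracker<MyRestriction, MyPosition> {
  private final MyRestriction restriction;

  MyRestrictionTracker(MyRestriction restriction) {
    this.restriction = restriction;
  }

  @Override
  public boolean tryClaim(MyPosition position) {
    // Mutate the position in place instead of replacing the MyRestriction object,
    // so a runner holding a cached currentRestriction() reference sees progress.
    restriction.setPosition(position);
    return true; // A real tracker would refuse positions beyond the restriction's end.
  }

  @Override
  public MyRestriction currentRestriction() {
    return restriction;
  }

  @Override
  public SplitResult<MyRestriction> trySplit(double fractionOfRemainder) {
    return null; // No dynamic splitting in this sketch.
  }

  @Override
  public void checkDone() {
    // A real tracker would verify the whole restriction was claimed.
  }

  @Override
  public IsBounded isBounded() {
    return IsBounded.UNBOUNDED;
  }
}
{code}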



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (BEAM-14387) DirectRunner does not update reference to currentRestriction when running in SDF

2022-04-29 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530321#comment-17530321
 ] 

Luke Cwik commented on BEAM-14387:
--


Run a debugger and start tracing from 
https://github.com/apache/beam/blob/master/runners/direct-java/src/main/java/org/apache/beam/runners/direct/SplittableProcessElementsEvaluatorFactory.java

Note that this will enter SplittableProcessElementsInvoker, which is a shared 
implementation across Java-based runners (e.g. Dataflow v1, Flink). I would 
check that the current bundle is being checkpointed, since that is the only time 
the restriction advances from the perspective of the runner.

> DirectRunner does not update reference to currentRestriction when running in 
> SDF
> 
>
> Key: BEAM-14387
> URL: https://issues.apache.org/jira/browse/BEAM-14387
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Reporter: Pablo Estrada
>Assignee: Luke Cwik
>Priority: P2
>
> I have an SDF implementation that looks like so:
>  
> {code:java}
> class MyRestrictionTracker {
>   MyRestriction restriction;
>   currentRestriction() { return restriction; }
>   tryClaim(MyPosition position) {
> this.restriction = new MyRestriction(position)
>   }
> }{code}
> I ran this on the DirectRunner, and the restriction would never advance: It 
> would get stuck on the very first value.
> I also ran this on DataflowRunner, and the problem did not exist there: This 
> ran fine.
>  
> I was able to fix this on the DirectRunner (it works well on Dataflow as 
> well) by changing the restriction to be mutable. Something like this:
>  
> {code:java}
> class MyRestrictionTracker {
>   MyRestriction restriction;
>   currentRestriction() { return restriction; }
>   tryClaim(MyPosition position) {
> this.restriction.position = position;
>   }
> }{code}
> This looks like an execution issue with SDF on DirectRunner: The DirectRunner 
> is likely storing a reference to `currentRestriction()` and never updating it 
> as it runs.
>  
> I'm happy to fix this on the DirectRunner - I would just like to find 
> pointers : )



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (BEAM-14184) DirectStreamObserver does not respect channel isReady

2022-04-27 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-14184:
-
Fix Version/s: 2.39.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> DirectStreamObserver does not respect channel isReady
> -
>
> Key: BEAM-14184
> URL: https://issues.apache.org/jira/browse/BEAM-14184
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
> Fix For: 2.39.0
>
>
> Leads to OOMs like:
> {noformat}
> Output channel stalled for 1023s, outbound thread CHAIN MapPartition 
> (MapPartition at [1]PerformInference) -> FlatMap (FlatMap at 
> ExtractOutput[0]) -> Map (Key Extractor) -> GroupCombine (GroupCombine at 
> GroupCombine: 
> PerformInferenceAndCombineResults_dep_049/GroupPredictionsByImage) -> Map 
> (Key Extractor) (1/1). See: https://issues.apache.org/jira/browse/BEAM-4280 
> for the history for this issue.
> Feb 18, 2022 11:51:05 AM 
> org.apache.beam.vendor.grpc.v1p36p0.io.grpc.netty.NettyServerTransport 
> notifyTerminated
> INFO: Transport failed
> org.apache.beam.vendor.grpc.v1p36p0.io.netty.util.internal.OutOfDirectMemoryError:
>  failed to allocate 2097152 byte(s) of direct memory (used: 1205862679, max: 
> 1207959552)
> at 
> org.apache.beam.vendor.grpc.v1p36p0.io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:754)
> at 
> org.apache.beam.vendor.grpc.v1p36p0.io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:709)
> at 
> org.apache.beam.vendor.grpc.v1p36p0.io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:645)
> at 
> org.apache.beam.vendor.grpc.v1p36p0.io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:621)
> at 
> org.apache.beam.vendor.grpc.v1p36p0.io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:204)
> at 
> org.apache.beam.vendor.grpc.v1p36p0.io.netty.buffer.PoolArena.tcacheAllocateNormal(PoolArena.java:188)
> {noformat}
> See more context in 
> https://lists.apache.org/thread/llmxodbmczhn10c98prs8wmd5hy4nvff
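For context on the title: gRPC's {{CallStreamObserver}} exposes {{isReady()}} and {{setOnReadyHandler}} for outbound flow control. Below is a hedged sketch of a sender that waits for readiness instead of queueing unboundedly; it is not the Beam {{DirectStreamObserver}} implementation itself:

{code:java}
import io.grpc.stub.CallStreamObserver;
import java.util.concurrent.Phaser;

/** Blocks onNext() until the underlying channel reports it is ready for more data. */
class ReadyAwareSender<T> {
  private final CallStreamObserver<T> outbound;
  private final Phaser readySignal = new Phaser(1);

  ReadyAwareSender(CallStreamObserver<T> outbound) {
    this.outbound = outbound;
    // Wake up any waiting sender whenever gRPC reports the transport is ready again.
    outbound.setOnReadyHandler(readySignal::arrive);
  }

  void send(T value) {
    while (true) {
      // Capture the phase before checking readiness so an onReady callback that
      // fires in between still unblocks the wait below.
      int phase = readySignal.getPhase();
      if (outbound.isReady()) {
        break;
      }
      // Respect isReady() so buffered data cannot grow without bound and OOM the worker.
      readySignal.awaitAdvance(phase);
    }
    outbound.onNext(value);
  }
}
{code}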



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (BEAM-14187) NullPointerException or IllegalStateException at IsmReaderImpl in Dataflow

2022-04-26 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-14187:
-
Resolution: Fixed
Status: Resolved  (was: Triage Needed)

Follow-up NPE was fixed by https://github.com/apache/beam/pull/17454

> NullPointerException or IllegalStateException at IsmReaderImpl in Dataflow
> --
>
> Key: BEAM-14187
> URL: https://issues.apache.org/jira/browse/BEAM-14187
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Minbo Bae
>Assignee: Minbo Bae
>Priority: P2
> Fix For: 2.39.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> h6. Problem
> Dataflow Java batch jobs with large side inputs intermittently throw 
> {{NullPointerException}} or {{IllegalStateException}}.
>  * 
> [NullPointerException|https://gist.githubusercontent.com/baeminbo/459e283eadbc7752c9f23616b52d958a/raw/f0480b8eaff590fb3f3ae2ab98ddce7dd3b4a237/npe.png]
>  happens at 
> [IsmReaderImpl.overKeyComponents|https://github.com/apache/beam/blob/v2.37.0/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/IsmReaderImpl.java#L217]:
>  * 
> [IllegalStateException|https://gist.githubusercontent.com/baeminbo/459e283eadbc7752c9f23616b52d958a/raw/f0480b8eaff590fb3f3ae2ab98ddce7dd3b4a237/IllegalStateException.png]
>  happens at [IsmReaderImpl. initializeForKeyedRead 
> |https://github.com/apache/beam/blob/v2.37.0/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/IsmReaderImpl.java#L500].
> (all error logs in the Dataflow job are 
> [here|https://gist.githubusercontent.com/baeminbo/459e283eadbc7752c9f23616b52d958a/raw/f0480b8eaff590fb3f3ae2ab98ddce7dd3b4a237/downloaded-logs-20220327-171955.json].)
> h6. Hypothesis
> The {{initializeForKeyedRead}} method is not synchronized. Multiple threads 
> can enter the method, initialize the index for the same shard, and update 
> {{indexPerShard}} without synchronization. The {{overKeyComponents}} method 
> also accesses {{indexPerShard}} without synchronization. As {{indexPerShard}} 
> is just a {{HashMap}}, which is not thread-safe, this can cause the 
> {{NullPointerException}} and {{IllegalStateException}} above.
> h6. Suggestion
> I think this issue can be fixed by changing the type of {{indexPerShard}} to a 
> thread-safe map (e.g. {{ConcurrentHashMap}}).
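A minimal sketch of the suggested direction, using {{ConcurrentHashMap.computeIfAbsent}} so only one thread initializes a given shard's index; the {{ShardIndex}} type parameter and loader are placeholders, not the IsmReaderImpl internals:

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Thread-safe lazy initialization of a per-shard index. */
class ShardIndexCache<ShardIndex> {
  private final ConcurrentMap<Integer, ShardIndex> indexPerShard = new ConcurrentHashMap<>();
  private final Function<Integer, ShardIndex> loader;

  ShardIndexCache(Function<Integer, ShardIndex> loader) {
    this.loader = loader;
  }

  ShardIndex indexFor(int shardId) {
    // computeIfAbsent guarantees the loader runs at most once per shard and that
    // readers never observe a partially initialized entry, unlike a plain HashMap.
    return indexPerShard.computeIfAbsent(shardId, loader);
  }
}
{code}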



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (BEAM-14187) NullPointerException or IllegalStateException at IsmReaderImpl in Dataflow

2022-04-26 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528414#comment-17528414
 ] 

Luke Cwik commented on BEAM-14187:
--

There is a consistent NullPointerException now occurring at 
https://github.com/apache/beam/blob/8ecd51397550bff26f1a245dc4afd4ae6ea4c046/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/IsmReaderImpl.java#L554

Optional.of has to be changed to Optional.fromNullable.

> NullPointerException or IllegalStateException at IsmReaderImpl in Dataflow
> --
>
> Key: BEAM-14187
> URL: https://issues.apache.org/jira/browse/BEAM-14187
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Minbo Bae
>Assignee: Minbo Bae
>Priority: P2
> Fix For: 2.39.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> h6. Problem
> Dataflow Java batch jobs with large side inputs intermittently throw 
> {{NullPointerException}} or {{IllegalStateException}}.
>  * 
> [NullPointerException|https://gist.githubusercontent.com/baeminbo/459e283eadbc7752c9f23616b52d958a/raw/f0480b8eaff590fb3f3ae2ab98ddce7dd3b4a237/npe.png]
>  happens at 
> [IsmReaderImpl.overKeyComponents|https://github.com/apache/beam/blob/v2.37.0/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/IsmReaderImpl.java#L217]:
>  * 
> [IllegalStateException|https://gist.githubusercontent.com/baeminbo/459e283eadbc7752c9f23616b52d958a/raw/f0480b8eaff590fb3f3ae2ab98ddce7dd3b4a237/IllegalStateException.png]
>  happens at [IsmReaderImpl. initializeForKeyedRead 
> |https://github.com/apache/beam/blob/v2.37.0/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/IsmReaderImpl.java#L500].
> (all error logs in the Dataflow job are 
> [here|https://gist.githubusercontent.com/baeminbo/459e283eadbc7752c9f23616b52d958a/raw/f0480b8eaff590fb3f3ae2ab98ddce7dd3b4a237/downloaded-logs-20220327-171955.json].)
> h6. Hypothesis
> The {{initializeForKeyedRead}} method is not synchronized. Multiple threads 
> can enter the method, initialize the index for the same shard, and update 
> {{indexPerShard}} without synchronization. The {{overKeyComponents}} method 
> also accesses {{indexPerShard}} without synchronization. As {{indexPerShard}} 
> is just a {{HashMap}}, which is not thread-safe, this can cause the 
> {{NullPointerException}} and {{IllegalStateException}} above.
> h6. Suggestion
> I think this issue can be fixed by changing the type of {{indexPerShard}} to a 
> thread-safe map (e.g. {{ConcurrentHashMap}}).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (BEAM-14187) NullPointerException or IllegalStateException at IsmReaderImpl in Dataflow

2022-04-26 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528415#comment-17528415
 ] 

Luke Cwik commented on BEAM-14187:
--

This needs to be fixed for the 2.39 release, otherwise Dataflow bounded 
pipelines with side inputs will likely fail.

> NullPointerException or IllegalStateException at IsmReaderImpl in Dataflow
> --
>
> Key: BEAM-14187
> URL: https://issues.apache.org/jira/browse/BEAM-14187
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Minbo Bae
>Assignee: Minbo Bae
>Priority: P2
> Fix For: 2.39.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> h6. Problem
> Dataflow Java batch jobs with large side inputs intermittently throw 
> {{NullPointerException}} or {{IllegalStateException}}.
>  * 
> [NullPointerException|https://gist.githubusercontent.com/baeminbo/459e283eadbc7752c9f23616b52d958a/raw/f0480b8eaff590fb3f3ae2ab98ddce7dd3b4a237/npe.png]
>  happens at 
> [IsmReaderImpl.overKeyComponents|https://github.com/apache/beam/blob/v2.37.0/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/IsmReaderImpl.java#L217]:
>  * 
> [IllegalStateException|https://gist.githubusercontent.com/baeminbo/459e283eadbc7752c9f23616b52d958a/raw/f0480b8eaff590fb3f3ae2ab98ddce7dd3b4a237/IllegalStateException.png]
>  happens at [IsmReaderImpl. initializeForKeyedRead 
> |https://github.com/apache/beam/blob/v2.37.0/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/IsmReaderImpl.java#L500].
> (all error logs in the Dataflow job are 
> [here|https://gist.githubusercontent.com/baeminbo/459e283eadbc7752c9f23616b52d958a/raw/f0480b8eaff590fb3f3ae2ab98ddce7dd3b4a237/downloaded-logs-20220327-171955.json].)
> h6. Hypothesis
> The {{initializeForKeyedRead}} method is not synchronized. Multiple threads 
> can enter the method, initialize the index for the same shard, and update 
> {{indexPerShard}} without synchronization. The {{overKeyComponents}} method 
> also accesses {{indexPerShard}} without synchronization. As {{indexPerShard}} 
> is just a {{HashMap}}, which is not thread-safe, this can cause the 
> {{NullPointerException}} and {{IllegalStateException}} above.
> h6. Suggestion
> I think it can fix this issue if we change the type of {{indexPerShard}} to a 
> thread-safe map (e.g. {{ConcurrentHashMap}}).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Reopened] (BEAM-14187) NullPointerException or IllegalStateException at IsmReaderImpl in Dataflow

2022-04-26 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reopened BEAM-14187:
--

> NullPointerException or IllegalStateException at IsmReaderImpl in Dataflow
> --
>
> Key: BEAM-14187
> URL: https://issues.apache.org/jira/browse/BEAM-14187
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Minbo Bae
>Assignee: Minbo Bae
>Priority: P2
> Fix For: 2.39.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> h6. Problem
> Dataflow Java batch jobs with large side inputs intermittently throw 
> {{NullPointerException}} or {{IllegalStateException}}.
>  * 
> [NullPointerException|https://gist.githubusercontent.com/baeminbo/459e283eadbc7752c9f23616b52d958a/raw/f0480b8eaff590fb3f3ae2ab98ddce7dd3b4a237/npe.png]
>  happens at 
> [IsmReaderImpl.overKeyComponents|https://github.com/apache/beam/blob/v2.37.0/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/IsmReaderImpl.java#L217]:
>  * 
> [IllegalStateException|https://gist.githubusercontent.com/baeminbo/459e283eadbc7752c9f23616b52d958a/raw/f0480b8eaff590fb3f3ae2ab98ddce7dd3b4a237/IllegalStateException.png]
>  happens at [IsmReaderImpl. initializeForKeyedRead 
> |https://github.com/apache/beam/blob/v2.37.0/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/IsmReaderImpl.java#L500].
> (All error logs from the Dataflow job are 
> [here|https://gist.githubusercontent.com/baeminbo/459e283eadbc7752c9f23616b52d958a/raw/f0480b8eaff590fb3f3ae2ab98ddce7dd3b4a237/downloaded-logs-20220327-171955.json].)
> h6. Hypothesis
> The {{initializeForKeyedRead}} method is not synchronized. Multiple threads can 
> enter the method, initialize the index for the same shard, and update 
> {{indexPerShard}} without synchronization. {{overKeyComponents}} also accesses 
> {{indexPerShard}} without synchronization. Because {{indexPerShard}} is a plain 
> {{HashMap}}, which is not thread-safe, this can cause the 
> {{NullPointerException}} and {{IllegalStateException}} above.
> h6. Suggestion
> I think changing the type of {{indexPerShard}} to a thread-safe map (e.g. 
> {{ConcurrentHashMap}}) would fix this issue.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (BEAM-13608) Dynamic Topics management

2022-04-25 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-13608:
-
Fix Version/s: 2.39.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Dynamic Topics management
> -
>
> Key: BEAM-13608
> URL: https://issues.apache.org/jira/browse/BEAM-13608
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jms
>Reporter: Vincent BALLADA
>Assignee: Vincent BALLADA
>Priority: P2
>  Labels: assigned:
> Fix For: 2.39.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The JmsIO write function is able to publish messages to topics with static 
> names:
> company/employee/id/1234567.
> Some AMQP/JMS brokers provide the ability to publish to dynamic topics like:
> company/employee/id/\{employeeId}
> If we want to handle that with Apache Beam JmsIO, we must create a branch per 
> employeeId, which is not suitable for a company with thousands of employees, or 
> other similar use cases.
> The JmsIO write function should provide the ability to handle dynamic topics.
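For context, "dynamic topics" here just means computing the destination name per 
record. Below is a minimal plain-JMS sketch of that idea (standard javax.jms API, 
not Beam code; it only illustrates the per-record topic naming that JmsIO would 
need to support):

{noformat}
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.Topic;

class DynamicTopicPublisher {
  // Publish a payload to a topic whose name is derived from the record itself,
  // e.g. company/employee/id/1234567 for employeeId 1234567.
  static void publish(ConnectionFactory factory, long employeeId, String payload)
      throws JMSException {
    Connection connection = factory.createConnection();
    try {
      connection.start();
      Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
      Topic topic = session.createTopic("company/employee/id/" + employeeId);
      MessageProducer producer = session.createProducer(topic);
      producer.send(session.createTextMessage(payload));
    } finally {
      connection.close();
    }
  }
}
{noformat}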



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (BEAM-14184) DirectStreamObserver does not respect channel isReady

2022-04-13 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521697#comment-17521697
 ] 

Luke Cwik commented on BEAM-14184:
--

Also, if you're willing to review the fix, that would help get it submitted 
sooner.

> DirectStreamObserver does not respect channel isReady
> -
>
> Key: BEAM-14184
> URL: https://issues.apache.org/jira/browse/BEAM-14184
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>
> Leads to OOMs like:
> {noformat}
> Output channel stalled for 1023s, outbound thread CHAIN MapPartition 
> (MapPartition at [1]PerformInference) -> FlatMap (FlatMap at 
> ExtractOutput[0]) -> Map (Key Extractor) -> GroupCombine (GroupCombine at 
> GroupCombine: 
> PerformInferenceAndCombineResults_dep_049/GroupPredictionsByImage) -> Map 
> (Key Extractor) (1/1). See: https://issues.apache.org/jira/browse/BEAM-4280 
> for the history for this issue.
> Feb 18, 2022 11:51:05 AM 
> org.apache.beam.vendor.grpc.v1p36p0.io.grpc.netty.NettyServerTransport 
> notifyTerminated
> INFO: Transport failed
> org.apache.beam.vendor.grpc.v1p36p0.io.netty.util.internal.OutOfDirectMemoryError:
>  failed to allocate 2097152 byte(s) of direct memory (used: 1205862679, max: 
> 1207959552)
> at 
> org.apache.beam.vendor.grpc.v1p36p0.io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:754)
> at 
> org.apache.beam.vendor.grpc.v1p36p0.io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:709)
> at 
> org.apache.beam.vendor.grpc.v1p36p0.io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:645)
> at 
> org.apache.beam.vendor.grpc.v1p36p0.io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:621)
> at 
> org.apache.beam.vendor.grpc.v1p36p0.io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:204)
> at 
> org.apache.beam.vendor.grpc.v1p36p0.io.netty.buffer.PoolArena.tcacheAllocateNormal(PoolArena.java:188)
> {noformat}
> See more context in 
> https://lists.apache.org/thread/llmxodbmczhn10c98prs8wmd5hy4nvff



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-14184) DirectStreamObserver does not respect channel isReady

2022-04-13 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521695#comment-17521695
 ] 

Luke Cwik commented on BEAM-14184:
--

https://github.com/apache/beam/pull/17358 is out for review with a fix that 
should make it into the next release.

You need to set the experiments pipeline option. You can do this during 
pipeline construction or on the command line when launching your pipeline via 
*--experiments=...*. See section 2.1 of the [programming 
guide|https://beam.apache.org/documentation/programming-guide/] for more 
details on setting pipeline options.
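For reference, a minimal sketch of both ways to set the option in the Java SDK 
(assuming the ExperimentalOptions interface; "<experiment_name>" is a 
placeholder, not the specific experiment referred to above):

{noformat}
import java.util.Arrays;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.ExperimentalOptions;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class SetExperimentsExample {
  public static void main(String[] args) {
    // Picks up --experiments=... if it was passed on the command line.
    PipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().create();

    // Or set it programmatically during pipeline construction.
    ExperimentalOptions experimental = options.as(ExperimentalOptions.class);
    experimental.setExperiments(Arrays.asList("<experiment_name>"));

    Pipeline pipeline = Pipeline.create(options);
    // ... build the pipeline, then pipeline.run() ...
  }
}
{noformat}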

> DirectStreamObserver does not respect channel isReady
> -
>
> Key: BEAM-14184
> URL: https://issues.apache.org/jira/browse/BEAM-14184
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>
> Leads to OOMs like:
> {noformat}
> Output channel stalled for 1023s, outbound thread CHAIN MapPartition 
> (MapPartition at [1]PerformInference) -> FlatMap (FlatMap at 
> ExtractOutput[0]) -> Map (Key Extractor) -> GroupCombine (GroupCombine at 
> GroupCombine: 
> PerformInferenceAndCombineResults_dep_049/GroupPredictionsByImage) -> Map 
> (Key Extractor) (1/1). See: https://issues.apache.org/jira/browse/BEAM-4280 
> for the history for this issue.
> Feb 18, 2022 11:51:05 AM 
> org.apache.beam.vendor.grpc.v1p36p0.io.grpc.netty.NettyServerTransport 
> notifyTerminated
> INFO: Transport failed
> org.apache.beam.vendor.grpc.v1p36p0.io.netty.util.internal.OutOfDirectMemoryError:
>  failed to allocate 2097152 byte(s) of direct memory (used: 1205862679, max: 
> 1207959552)
> at 
> org.apache.beam.vendor.grpc.v1p36p0.io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:754)
> at 
> org.apache.beam.vendor.grpc.v1p36p0.io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:709)
> at 
> org.apache.beam.vendor.grpc.v1p36p0.io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:645)
> at 
> org.apache.beam.vendor.grpc.v1p36p0.io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:621)
> at 
> org.apache.beam.vendor.grpc.v1p36p0.io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:204)
> at 
> org.apache.beam.vendor.grpc.v1p36p0.io.netty.buffer.PoolArena.tcacheAllocateNormal(PoolArena.java:188)
> {noformat}
> See more context in 
> https://lists.apache.org/thread/llmxodbmczhn10c98prs8wmd5hy4nvff



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-14187) NullPointerException or IllegalStateException at IsmReaderImpl in Dataflow

2022-04-11 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-14187:
-
Fix Version/s: 2.39.0
 Assignee: Minbo Bae
   Resolution: Fixed
   Status: Resolved  (was: Triage Needed)

> NullPointerException or IllegalStateException at IsmReaderImpl in Dataflow
> --
>
> Key: BEAM-14187
> URL: https://issues.apache.org/jira/browse/BEAM-14187
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Minbo Bae
>Assignee: Minbo Bae
>Priority: P2
> Fix For: 2.39.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> h6. Problem
> Dataflow Java batch jobs with large side inputs intermittently throw 
> {{NullPointerException}} or {{IllegalStateException}}.
>  * 
> [NullPointerException|https://gist.githubusercontent.com/baeminbo/459e283eadbc7752c9f23616b52d958a/raw/f0480b8eaff590fb3f3ae2ab98ddce7dd3b4a237/npe.png]
>  happens at 
> [IsmReaderImpl.overKeyComponents|https://github.com/apache/beam/blob/v2.37.0/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/IsmReaderImpl.java#L217]:
>  * 
> [IllegalStateException|https://gist.githubusercontent.com/baeminbo/459e283eadbc7752c9f23616b52d958a/raw/f0480b8eaff590fb3f3ae2ab98ddce7dd3b4a237/IllegalStateException.png]
>  happens at [IsmReaderImpl. initializeForKeyedRead 
> |https://github.com/apache/beam/blob/v2.37.0/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/IsmReaderImpl.java#L500].
> (All error logs from the Dataflow job are 
> [here|https://gist.githubusercontent.com/baeminbo/459e283eadbc7752c9f23616b52d958a/raw/f0480b8eaff590fb3f3ae2ab98ddce7dd3b4a237/downloaded-logs-20220327-171955.json].)
> h6. Hypothesis
> The {{initializeForKeyedRead}} method is not synchronized. Multiple threads can 
> enter the method, initialize the index for the same shard, and update 
> {{indexPerShard}} without synchronization. {{overKeyComponents}} also accesses 
> {{indexPerShard}} without synchronization. Because {{indexPerShard}} is a plain 
> {{HashMap}}, which is not thread-safe, this can cause the 
> {{NullPointerException}} and {{IllegalStateException}} above.
> h6. Suggestion
> I think changing the type of {{indexPerShard}} to a thread-safe map (e.g. 
> {{ConcurrentHashMap}}) would fix this issue.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-14192) ITs run on Dataflow v1 fails with org/apache/commons/logging/LogFactory has been compiled by a more recent version of the Java Runtime

2022-04-07 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17519138#comment-17519138
 ] 

Luke Cwik commented on BEAM-14192:
--

Did a dataflow service release pick up your updated containers?




> ITs run on Dataflow v1 fails with org/apache/commons/logging/LogFactory has 
> been compiled by a more recent version of the Java Runtime
> --
>
> Key: BEAM-14192
> URL: https://issues.apache.org/jira/browse/BEAM-14192
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow, test-failures
>Reporter: Luke Cwik
>Assignee: Kiley Sok
>Priority: P2
> Fix For: 2.38.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The Dataflow ITs fail with a class version mismatch. I believe the Dataflow 
> v1 container that is being tested was built with the wrong JDK version.
> Jenkins: 
> https://ci-beam.apache.org/job/beam_PostCommit_Java_DataflowV1/1427/#showFailuresLink
> Example Failure:
> {noformat}
> java.lang.RuntimeException: java.lang.RuntimeException: 
> org.apache.beam.sdk.util.UserCodeException: 
> java.lang.UnsupportedClassVersionError: org/apache/commons/logging/LogFactory 
> has been compiled by a more recent version of the Java Runtime (class file 
> version 55.0), this version of the Java Runtime only recognizes class file 
> versions up to 52.0
>   at 
> org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:187)
>   at 
> org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner$1.outputWindowedValue(GroupAlsoByWindowFnRunner.java:108)
>   at 
> org.apache.beam.runners.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:56)
>   at 
> org.apache.beam.runners.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:39)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13939) Go SDK: Protobuf namespace conflict

2022-04-07 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-13939:
-
Status: Resolved  (was: Resolved)

> Go SDK: Protobuf namespace conflict
> ---
>
> Key: BEAM-13939
> URL: https://issues.apache.org/jira/browse/BEAM-13939
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-go
>Affects Versions: 2.36.0
>Reporter: Milan Patel
>Assignee: Milan Patel
>Priority: P2
> Fix For: 2.39.0
>
> Attachments: demobug.zip
>
>  Time Spent: 28h 40m
>  Remaining Estimate: 0h
>
> The Go SDK's generated gRPC protobufs are not namespaced with enough 
> granularity. If a user has another external dependency with the same protobuf 
> file registered with the proto runtime, their compiled binary will panic at 
> runtime, pointing the user to this [doc 
> page|https://developers.google.com/protocol-buffers/docs/reference/go/faq#fix-namespace-conflict].
>  
> In the interim, following the instructions to add either ldflags to the 
> compiler or an environment var to the binary works, but this is not an ideal 
> solution since only one of the duplicate proto specifications will be 
> accessible from the [global 
> registry|https://pkg.go.dev/google.golang.org/protobuf@v1.27.1/reflect/protoregistry].
>  
> Ask: Regenerate the Go protos such that descriptors like 
> [these|https://github.com/apache/beam/blob/84353a7c973d3acaaa56d81c265dce7193a56be5/sdks/go/pkg/beam/model/pipeline_v1/metrics.pb.go#L797-L811]
>  are output with filenames that are more granular, such as a filename that 
> includes the directory structure of the repository.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (BEAM-13939) Go SDK: Protobuf namespace conflict

2022-04-07 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reassigned BEAM-13939:


Assignee: Milan Patel

> Go SDK: Protobuf namespace conflict
> ---
>
> Key: BEAM-13939
> URL: https://issues.apache.org/jira/browse/BEAM-13939
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-go
>Affects Versions: 2.36.0
>Reporter: Milan Patel
>Assignee: Milan Patel
>Priority: P2
> Attachments: demobug.zip
>
>  Time Spent: 28h 40m
>  Remaining Estimate: 0h
>
> The Go SDK's generated gRPC protobufs are not namespaced with enough 
> granularity. If a user has another external dependency with the same protobuf 
> file registered with the proto runtime, their compiled binary will panic at 
> runtime, pointing the user to this [doc 
> page|https://developers.google.com/protocol-buffers/docs/reference/go/faq#fix-namespace-conflict].
>  
> In the interim, following the instructions to add either ldflags to the 
> compiler or an environment var to the binary works, but this is not an ideal 
> solution since only one of the duplicate proto specifications will be 
> accessible from the [global 
> registry|https://pkg.go.dev/google.golang.org/protobuf@v1.27.1/reflect/protoregistry].
>  
> Ask: Regenerate the Go protos such that descriptors like 
> [these|https://github.com/apache/beam/blob/84353a7c973d3acaaa56d81c265dce7193a56be5/sdks/go/pkg/beam/model/pipeline_v1/metrics.pb.go#L797-L811]
>  are output with filenames that are more granular, such as a filename that 
> includes the directory structure of the repository.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-14187) NullPointerException or IllegalStateException at IsmReaderImpl in Dataflow

2022-04-07 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17519078#comment-17519078
 ] 

Luke Cwik commented on BEAM-14187:
--

Note that initializeBloomFilterAndIndexPerShard is synchronized and is the only 
method that mutates the indexPerShard variable. Since 
initializeBloomFilterAndIndexPerShard is invoked within overKeyComponents, there 
is a memory barrier between the thread that created indexPerShard and the other 
threads, ensuring that the object is safely published across threads.

Also, the [Javadoc for 
HashMap|https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html] says 
that concurrent reads are OK; it's concurrent modification that requires 
synchronization:
"If multiple threads access a hash map concurrently, and at least one of the 
threads modifies the map structurally, it must be synchronized externally."
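For reference, a minimal, generic sketch (not the Beam code; the class and field 
names are illustrative) of the computeIfAbsent pattern the report suggests, which 
makes lazy per-shard initialization safe for concurrent readers and writers 
without external locking:

{noformat}
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListMap;

class ShardIndexCache {
  // Thread-safe replacement for a plain HashMap keyed by shard id.
  private final ConcurrentMap<Integer, NavigableMap<String, Long>> indexPerShard =
      new ConcurrentHashMap<>();

  NavigableMap<String, Long> indexFor(int shardId) {
    // computeIfAbsent builds the index for a shard at most once, even when
    // multiple threads request the same shard concurrently, and safely
    // publishes the result to all readers.
    return indexPerShard.computeIfAbsent(shardId, this::loadIndex);
  }

  private NavigableMap<String, Long> loadIndex(int shardId) {
    // Placeholder for reading the shard's index block from storage.
    return new ConcurrentSkipListMap<>();
  }
}
{noformat}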

> NullPointerException or IllegalStateException at IsmReaderImpl in Dataflow
> --
>
> Key: BEAM-14187
> URL: https://issues.apache.org/jira/browse/BEAM-14187
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Minbo Bae
>Priority: P2
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> h6. Problem
> Dataflow Java batch jobs with large side inputs intermittently throw 
> {{NullPointerException}} or {{IllegalStateException}}.
>  * 
> [NullPointerException|https://gist.githubusercontent.com/baeminbo/459e283eadbc7752c9f23616b52d958a/raw/f0480b8eaff590fb3f3ae2ab98ddce7dd3b4a237/npe.png]
>  happens at 
> [IsmReaderImpl.overKeyComponents|https://github.com/apache/beam/blob/v2.37.0/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/IsmReaderImpl.java#L217]:
>  * 
> [IllegalStateException|https://gist.githubusercontent.com/baeminbo/459e283eadbc7752c9f23616b52d958a/raw/f0480b8eaff590fb3f3ae2ab98ddce7dd3b4a237/IllegalStateException.png]
>  happens at [IsmReaderImpl. initializeForKeyedRead 
> |https://github.com/apache/beam/blob/v2.37.0/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/IsmReaderImpl.java#L500].
> (All error logs from the Dataflow job are 
> [here|https://gist.githubusercontent.com/baeminbo/459e283eadbc7752c9f23616b52d958a/raw/f0480b8eaff590fb3f3ae2ab98ddce7dd3b4a237/downloaded-logs-20220327-171955.json].)
> h6. Hypothesis
> The {{initializeForKeyedRead}} method is not synchronized. Multiple threads can 
> enter the method, initialize the index for the same shard, and update 
> {{indexPerShard}} without synchronization. {{overKeyComponents}} also accesses 
> {{indexPerShard}} without synchronization. Because {{indexPerShard}} is a plain 
> {{HashMap}}, which is not thread-safe, this can cause the 
> {{NullPointerException}} and {{IllegalStateException}} above.
> h6. Suggestion
> I think changing the type of {{indexPerShard}} to a thread-safe map (e.g. 
> {{ConcurrentHashMap}}) would fix this issue.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-14144) Record JFR profiles when GC thrashing is detected

2022-04-07 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17519001#comment-17519001
 ] 

Luke Cwik commented on BEAM-14144:
--

Also of interest in that article was the jdk-jfr-compat library, which would let 
us compile against the JFR API. I couldn't find a released version of it, but 
maybe someone else has done the same thing already, or we could implement stub 
APIs to compile against, since the JDK will provide its implementation earlier 
on the classpath.
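For context, a minimal sketch of capturing a recording with the standard jdk.jfr 
API (available in Java 9+ and in OpenJDK 8u builds with JFR); the point at which 
GC thrashing is detected is a hypothetical hook, not existing Beam code:

{noformat}
import java.nio.file.Path;
import java.nio.file.Paths;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;

public class GcThrashingProfiler {
  public static void main(String[] args) throws Exception {
    // The built-in "profile" configuration includes allocation profiling events.
    Configuration profile = Configuration.getConfiguration("profile");
    try (Recording recording = new Recording(profile)) {
      recording.setMaxSize(64 * 1024 * 1024); // cap the recording size at 64 MiB
      recording.start();

      // ... hypothetical hook: keep recording until the GC thrashing detector fires ...

      recording.stop();
      Path out = Paths.get("/tmp/gc-thrashing.jfr");
      recording.dump(out); // write the profile so it can be analyzed later
    }
  }
}
{noformat}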

> Record JFR profiles when GC thrashing is detected
> -
>
> Key: BEAM-14144
> URL: https://issues.apache.org/jira/browse/BEAM-14144
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: P2
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> It'd be useful for debugging GC issues in jobs to have allocation profiles 
> from when the thrashing was occurring. In Java 9+, JFR is included in the JDK; 
> we could use that to record profiles when GC thrashing is detected.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (BEAM-14144) Record JFR profiles when GC thrashing is detected

2022-04-07 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518987#comment-17518987
 ] 

Luke Cwik edited comment on BEAM-14144 at 4/7/22 4:07 PM:
--

Note that OpenJDK 8u supports JFR. We already use the OpenJDK 8u as the base 
docker container for Apache Beam.

https://developers.redhat.com/blog/2020/08/25/get-started-with-jdk-flight-recorder-in-openjdk-8u#


was (Author: lcwik):
Note that OpenJDK 8u supports JFR. We already use the OpenJDK 8u as the base 
docker container for Apache Beam.

> Record JFR profiles when GC thrashing is detected
> -
>
> Key: BEAM-14144
> URL: https://issues.apache.org/jira/browse/BEAM-14144
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: P2
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> It'd be useful for debugging GC issues in jobs to have allocation profiles 
> from when the thrashing was occurring. In Java 9+, JFR is included in the JDK; 
> we could use that to record profiles when GC thrashing is detected.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-14144) Record JFR profiles when GC thrashing is detected

2022-04-07 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518987#comment-17518987
 ] 

Luke Cwik commented on BEAM-14144:
--

Note that OpenJDK 8u supports JFR. We already use the OpenJDK 8u as the base 
docker container for Apache Beam.

> Record JFR profiles when GC thrashing is detected
> -
>
> Key: BEAM-14144
> URL: https://issues.apache.org/jira/browse/BEAM-14144
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: P2
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> It'd be useful for debugging GC issues in jobs to have allocation profiles 
> from when the thrashing was occurring. In Java 9+, JFR is included in the JDK; 
> we could use that to record profiles when GC thrashing is detected.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-14175) "Read loop was aborted" errors are very noisy when large jobs fail

2022-04-07 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-14175:
-
Fix Version/s: 2.39.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> "Read loop was aborted" errors are very noisy when large jobs fail
> --
>
> Key: BEAM-14175
> URL: https://issues.apache.org/jira/browse/BEAM-14175
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: P2
> Fix For: 2.39.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When a large Dataflow job fails or is cancelled, "Read loop was aborted." is 
> potentially logged thousands of times as read operations are interrupted. These 
> could instead be logged at debug level rather than error to reduce noise in 
> the logs; they often hide actually useful logs.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-14253) pubsublite.ReadWriteIT failing in beam_PostCommit_Java_DataflowV1 and V2

2022-04-06 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518156#comment-17518156
 ] 

Luke Cwik commented on BEAM-14253:
--

There is an internal bug in the Dataflow service where it's 0 (aka the Unix 
epoch). A fix is about to be rolled out.

> pubsublite.ReadWriteIT failing in beam_PostCommit_Java_DataflowV1 and V2
> 
>
> Key: BEAM-14253
> URL: https://issues.apache.org/jira/browse/BEAM-14253
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-gcp, test-failures
>Reporter: Daniel Oliveira
>Assignee: Daniel Collins
>Priority: P1
>
> Example: 
> https://ci-beam.apache.org/job/beam_PostCommit_Java_DataflowV1/1455/testReport/junit/org.apache.beam.sdk.io.gcp.pubsublite/ReadWriteIT/testReadWrite/
> {noformat}
> java.lang.AssertionError: Did not receive signal on 
> projects/apache-beam-testing/subscriptions/result-subscription--586739339276181574
>  in 300s
> {noformat}
> Dataflow logs show this, might be related:
> {noformat}
> Error message from worker: java.lang.IllegalArgumentException
> 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:127)
> 
> org.apache.beam.sdk.io.gcp.pubsublite.internal.SubscriptionPartitionLoader$GeneratorFn.getInitialWatermarkEstimatorState(SubscriptionPartitionLoader.java:76)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-13519) Java precommit flaky (timing out)

2022-03-31 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-13519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515652#comment-17515652
 ] 

Luke Cwik commented on BEAM-13519:
--

testCachingOfClient was definitely getting stuck; the fix is in 
https://github.com/apache/beam/pull/17240, along with a few more fixes related 
to threading.

> Java precommit flaky (timing out)
> -
>
> Key: BEAM-13519
> URL: https://issues.apache.org/jira/browse/BEAM-13519
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kiley Sok
>Priority: P1
>  Labels: flake
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Java precommits are sometimes timing out with no clear cause. Gradle will log 
> a bunch of routine build tasks, and then Jenkins will abort the job much 
> later. There are no logs to indicate what happened. It is not even clear 
> which task or tasks, if any, was the culprit, since many tasks are run in 
> parallel.
> 01:53:28 > Task :sdks:java:testing:nexmark:build
> 01:53:28 > Task :sdks:java:testing:nexmark:buildDependents
> 01:53:28 > Task :sdks:java:extensions:sql:zetasql:buildDependents
> 01:53:28 > Task :sdks:java:io:google-cloud-platform:buildDependents
> 01:53:28 > Task :sdks:java:extensions:sql:buildDependents
> 01:53:28 > Task :sdks:java:io:kafka:buildDependents
> 01:53:28 > Task :sdks:java:extensions:join-library:buildDependents
> 01:53:28 > Task :sdks:java:io:synthetic:buildDependents
> 01:53:28 > Task :sdks:java:io:mongodb:buildDependents
> 01:53:28 > Task :sdks:java:io:thrift:buildDependents
> 01:53:28 > Task :sdks:java:testing:test-utils:buildDependents
> 01:53:28 > Task :sdks:java:expansion-service:buildDependents
> 01:53:28 > Task :sdks:java:extensions:arrow:buildDependents
> 01:53:28 > Task :sdks:java:extensions:protobuf:buildDependents
> 01:53:28 > Task :sdks:java:io:common:buildDependents
> 01:53:28 > Task :runners:direct-java:buildDependents
> 01:53:28 > Task :runners:local-java:buildDependents
> 01:53:28 Build timed out (after 120 minutes). Marking the build as aborted.
> https://ci-beam.apache.org/job/beam_PreCommit_Java_cron/4874/



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-14177) GroupByKey iteration caching broken for portable runners like Dataflow runner v2

2022-03-29 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-14177:
-
Resolution: Fixed
Status: Resolved  (was: Open)

Yes, this can be closed now that the fix has been cherry-picked.

> GroupByKey iteration caching broken for portable runners like Dataflow runner 
> v2
> 
>
> Key: BEAM-14177
> URL: https://issues.apache.org/jira/browse/BEAM-14177
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
> Fix For: 2.38.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The wrong cache key is being used because it has not been namespaced to the 
> state key.
> This namespacing was previously done within StateFetchingIterators, but 
> https://github.com/apache/beam/pull/17121 changed that to use a single shared 
> key.
> The fix is to subcache the cache before passing it into 
> StateFetchingIterators, restoring the prior behavior.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-14192) ITs run on Dataflow v1 fails with org/apache/commons/logging/LogFactory has been compiled by a more recent version of the Java Runtime

2022-03-28 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513661#comment-17513661
 ] 

Luke Cwik commented on BEAM-14192:
--

b/224581228 was fixed internally; does the Dataflow container need to be 
rebuilt?

> ITs run on Dataflow v1 fails with org/apache/commons/logging/LogFactory has 
> been compiled by a more recent version of the Java Runtime
> --
>
> Key: BEAM-14192
> URL: https://issues.apache.org/jira/browse/BEAM-14192
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, test-failures
>Reporter: Luke Cwik
>Assignee: Kiley Sok
>Priority: P2
>
> The Dataflow ITs fail with a class version mismatch. I believe the Dataflow 
> v1 container that is being tested was built with the wrong JDK version.
> Jenkins: 
> https://ci-beam.apache.org/job/beam_PostCommit_Java_DataflowV1/1427/#showFailuresLink
> Example Failure:
> {noformat}
> java.lang.RuntimeException: java.lang.RuntimeException: 
> org.apache.beam.sdk.util.UserCodeException: 
> java.lang.UnsupportedClassVersionError: org/apache/commons/logging/LogFactory 
> has been compiled by a more recent version of the Java Runtime (class file 
> version 55.0), this version of the Java Runtime only recognizes class file 
> versions up to 52.0
>   at 
> org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:187)
>   at 
> org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner$1.outputWindowedValue(GroupAlsoByWindowFnRunner.java:108)
>   at 
> org.apache.beam.runners.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:56)
>   at 
> org.apache.beam.runners.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:39)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Reopened] (BEAM-12815) Flink Go XVR tests fail on TestXLang_Multi: Insufficient number of network buffers

2022-03-28 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-12815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reopened BEAM-12815:
--

It looks like it is still failing on TestXLang_Multi.

Example failure: https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/5207/


> Flink Go XVR tests fail on TestXLang_Multi: Insufficient number of network 
> buffers
> --
>
> Key: BEAM-12815
> URL: https://issues.apache.org/jira/browse/BEAM-12815
> Project: Beam
>  Issue Type: Bug
>  Components: cross-language, sdk-go
>Reporter: Daniel Oliveira
>Assignee: Danny McCormick
>Priority: P3
> Fix For: Not applicable
>
>
> When running the cross-language test suites () Flink fails on TestXLang_Multi 
> with the following error:
> {noformat}
> 19:29:14 2021/08/27 02:29:14  (): java.io.IOException: Insufficient number of 
> network buffers: required 17, but only 16 available. The total number of 
> network buffers is currently set to 2048 of 32768 bytes each. You can 
> increase this number by setting the configuration keys 
> 'taskmanager.memory.network.fraction', 'taskmanager.memory.network.min', and 
> 'taskmanager.memory.network.max'.
> 19:29:14 2021/08/27 02:29:14 Job state: FAILED
> 19:29:14 --- FAIL: TestXLang_Multi (6.26s){noformat}
> This doesn't seem to be a parallelism problem (go test is run with "-p 1" as 
> expected) and is only happening on this specific test.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-14192) ITs run on Dataflow v1 fails with org/apache/commons/logging/LogFactory has been compiled by a more recent version of the Java Runtime

2022-03-28 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-14192:
-
Summary: ITs run on Dataflow v1 fails with 
org/apache/commons/logging/LogFactory has been compiled by a more recent 
version of the Java Runtime  (was: IT run on Dataflow v1 fails with 
org/apache/commons/logging/LogFactory has been compiled by a more recent 
version of the Java Runtime)

> ITs run on Dataflow v1 fails with org/apache/commons/logging/LogFactory has 
> been compiled by a more recent version of the Java Runtime
> --
>
> Key: BEAM-14192
> URL: https://issues.apache.org/jira/browse/BEAM-14192
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Luke Cwik
>Assignee: Kiley Sok
>Priority: P2
>
> The Dataflow ITs fail with a class version mismatch. I believe the Dataflow 
> v1 container that is being tested was built with the wrong JDK version.
> Jenkins: 
> https://ci-beam.apache.org/job/beam_PostCommit_Java_DataflowV1/1427/#showFailuresLink
> Example Failure:
> {noformat}
> java.lang.RuntimeException: java.lang.RuntimeException: 
> org.apache.beam.sdk.util.UserCodeException: 
> java.lang.UnsupportedClassVersionError: org/apache/commons/logging/LogFactory 
> has been compiled by a more recent version of the Java Runtime (class file 
> version 55.0), this version of the Java Runtime only recognizes class file 
> versions up to 52.0
>   at 
> org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:187)
>   at 
> org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner$1.outputWindowedValue(GroupAlsoByWindowFnRunner.java:108)
>   at 
> org.apache.beam.runners.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:56)
>   at 
> org.apache.beam.runners.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:39)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-14192) ITs run on Dataflow v1 fails with org/apache/commons/logging/LogFactory has been compiled by a more recent version of the Java Runtime

2022-03-28 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-14192:
-
Component/s: runner-dataflow

> ITs run on Dataflow v1 fails with org/apache/commons/logging/LogFactory has 
> been compiled by a more recent version of the Java Runtime
> --
>
> Key: BEAM-14192
> URL: https://issues.apache.org/jira/browse/BEAM-14192
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, test-failures
>Reporter: Luke Cwik
>Assignee: Kiley Sok
>Priority: P2
>
> The Dataflow ITs fail with a class version mismatch. I believe the Dataflow 
> v1 container that is being tested was built with the wrong JDK version.
> Jenkins: 
> https://ci-beam.apache.org/job/beam_PostCommit_Java_DataflowV1/1427/#showFailuresLink
> Example Failure:
> {noformat}
> java.lang.RuntimeException: java.lang.RuntimeException: 
> org.apache.beam.sdk.util.UserCodeException: 
> java.lang.UnsupportedClassVersionError: org/apache/commons/logging/LogFactory 
> has been compiled by a more recent version of the Java Runtime (class file 
> version 55.0), this version of the Java Runtime only recognizes class file 
> versions up to 52.0
>   at 
> org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:187)
>   at 
> org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner$1.outputWindowedValue(GroupAlsoByWindowFnRunner.java:108)
>   at 
> org.apache.beam.runners.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:56)
>   at 
> org.apache.beam.runners.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:39)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-14192) IT run on Dataflow v1 fails with org/apache/commons/logging/LogFactory has been compiled by a more recent version of the Java Runtime

2022-03-28 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-14192:
-
Description: 
The Dataflow ITs fail with a class version mismatch. I believe the Dataflow v1 
container that is being tested was built with the wrong JDK version.

Jenkins: 
https://ci-beam.apache.org/job/beam_PostCommit_Java_DataflowV1/1427/#showFailuresLink

Example Failure:
{noformat}
java.lang.RuntimeException: java.lang.RuntimeException: 
org.apache.beam.sdk.util.UserCodeException: 
java.lang.UnsupportedClassVersionError: org/apache/commons/logging/LogFactory 
has been compiled by a more recent version of the Java Runtime (class file 
version 55.0), this version of the Java Runtime only recognizes class file 
versions up to 52.0
at 
org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:187)
at 
org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner$1.outputWindowedValue(GroupAlsoByWindowFnRunner.java:108)
at 
org.apache.beam.runners.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:56)
at 
org.apache.beam.runners.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:39)
{noformat}

  was:
The IT fails with a class version mismatch. This is odd since it implies that 
we are somehow getting an Apache Commons Logging jar that was compiled with Java 
11.

Note that I have only seen this in this IT and not in others, which points at 
the zetasketch classpath being broken in some way.

My first thought was that we were pulling in a dependency with an incorrectly 
shaded apache-commons logging class, but when I dumped the classpath and all the 
jar contents, only the commons-logging-1.2.jar contained this class.

First found failure: https://ci-beam.apache.org/job/beam_PostCommit_Java/8786/

Example Failure:
{noformat}
java.lang.RuntimeException: java.lang.RuntimeException: 
org.apache.beam.sdk.util.UserCodeException: 
java.lang.UnsupportedClassVersionError: org/apache/commons/logging/LogFactory 
has been compiled by a more recent version of the Java Runtime (class file 
version 55.0), this version of the Java Runtime only recognizes class file 
versions up to 52.0
at 
org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:187)
at 
org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner$1.outputWindowedValue(GroupAlsoByWindowFnRunner.java:108)
at 
org.apache.beam.runners.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:56)
at 
org.apache.beam.runners.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:39)
{noformat}


> IT run on Dataflow v1 fails with org/apache/commons/logging/LogFactory has 
> been compiled by a more recent version of the Java Runtime
> -
>
> Key: BEAM-14192
> URL: https://issues.apache.org/jira/browse/BEAM-14192
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Luke Cwik
>Assignee: Kiley Sok
>Priority: P2
>
> The Dataflow ITs fails with a class version mismatch. I believe the Dataflow 
> v1 container that is being tested was built with the wrong JDK version.
> Jenkins: 
> https://ci-beam.apache.org/job/beam_PostCommit_Java_DataflowV1/1427/#showFailuresLink
> Example Failure:
> {noformat}
> java.lang.RuntimeException: java.lang.RuntimeException: 
> org.apache.beam.sdk.util.UserCodeException: 
> java.lang.UnsupportedClassVersionError: org/apache/commons/logging/LogFactory 
> has been compiled by a more recent version of the Java Runtime (class file 
> version 55.0), this version of the Java Runtime only recognizes class file 
> versions up to 52.0
>   at 
> org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:187)
>   at 
> org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner$1.outputWindowedValue(GroupAlsoByWindowFnRunner.java:108)
>   at 
> org.apache.beam.runners.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:56)
>   at 
> org.apache.beam.runners.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:39)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (BEAM-14192) BigQueryHllSketchCompatibilityIT fails with org/apache/commons/logging/LogFactory has been compiled by a more recent version of the Java Runtime

2022-03-28 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reassigned BEAM-14192:


Assignee: Kiley Sok  (was: Andrew Pilloud)

> BigQueryHllSketchCompatibilityIT fails with 
> org/apache/commons/logging/LogFactory has been compiled by a more recent 
> version of the Java Runtime
> 
>
> Key: BEAM-14192
> URL: https://issues.apache.org/jira/browse/BEAM-14192
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Luke Cwik
>Assignee: Kiley Sok
>Priority: P2
>
> The IT fails with a class version mismatch. This is odd since it implies 
> that we are somehow getting an Apache Commons Logging jar that was compiled 
> with Java 11.
> Note that I have only seen this in this IT and not in others, which points 
> at the zetasketch classpath being broken in some way.
> My first thought was that we were pulling in a dependency with an incorrectly 
> shaded apache-commons logging class, but when I dumped the classpath and all 
> the jar contents, only the commons-logging-1.2.jar contained this class.
> First found failure: https://ci-beam.apache.org/job/beam_PostCommit_Java/8786/
> Example Failure:
> {noformat}
> java.lang.RuntimeException: java.lang.RuntimeException: 
> org.apache.beam.sdk.util.UserCodeException: 
> java.lang.UnsupportedClassVersionError: org/apache/commons/logging/LogFactory 
> has been compiled by a more recent version of the Java Runtime (class file 
> version 55.0), this version of the Java Runtime only recognizes class file 
> versions up to 52.0
>   at 
> org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:187)
>   at 
> org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner$1.outputWindowedValue(GroupAlsoByWindowFnRunner.java:108)
>   at 
> org.apache.beam.runners.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:56)
>   at 
> org.apache.beam.runners.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:39)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-14192) IT run on Dataflow v1 fails with org/apache/commons/logging/LogFactory has been compiled by a more recent version of the Java Runtime

2022-03-28 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-14192:
-
Summary: IT run on Dataflow v1 fails with 
org/apache/commons/logging/LogFactory has been compiled by a more recent 
version of the Java Runtime  (was: BigQueryHllSketchCompatibilityIT fails with 
org/apache/commons/logging/LogFactory has been compiled by a more recent 
version of the Java Runtime)

> IT run on Dataflow v1 fails with org/apache/commons/logging/LogFactory has 
> been compiled by a more recent version of the Java Runtime
> -
>
> Key: BEAM-14192
> URL: https://issues.apache.org/jira/browse/BEAM-14192
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Luke Cwik
>Assignee: Kiley Sok
>Priority: P2
>
> The IT fails with a class version mismatch. This is odd since it implies 
> that we are somehow getting an Apache Commons Logging jar that was compiled 
> with Java 11.
> Note that I have only seen this in this IT and not in others, which points 
> at the zetasketch classpath being broken in some way.
> My first thought was that we were pulling in a dependency with an incorrectly 
> shaded apache-commons logging class, but when I dumped the classpath and all 
> the jar contents, only the commons-logging-1.2.jar contained this class.
> First found failure: https://ci-beam.apache.org/job/beam_PostCommit_Java/8786/
> Example Failure:
> {noformat}
> java.lang.RuntimeException: java.lang.RuntimeException: 
> org.apache.beam.sdk.util.UserCodeException: 
> java.lang.UnsupportedClassVersionError: org/apache/commons/logging/LogFactory 
> has been compiled by a more recent version of the Java Runtime (class file 
> version 55.0), this version of the Java Runtime only recognizes class file 
> versions up to 52.0
>   at 
> org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:187)
>   at 
> org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner$1.outputWindowedValue(GroupAlsoByWindowFnRunner.java:108)
>   at 
> org.apache.beam.runners.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:56)
>   at 
> org.apache.beam.runners.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:39)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (BEAM-14193) SpannerChangeStreamOrderedWithinKeyGloballyIT and SpannerChangeStreamOrderedWithinKeyIT fail on Direct Runner

2022-03-28 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-14193:


 Summary: SpannerChangeStreamOrderedWithinKeyGloballyIT and 
SpannerChangeStreamOrderedWithinKeyIT fail on Direct Runner
 Key: BEAM-14193
 URL: https://issues.apache.org/jira/browse/BEAM-14193
 Project: Beam
  Issue Type: Bug
  Components: runner-direct, test-failures
Reporter: Luke Cwik
Assignee: Pablo Estrada


Like BEAM-14151, it seems as though DirectRunner fails on these tests pretty 
consistently:
{noformat}
org.apache.beam.sdk.io.gcp.spanner.changestreams.it.SpannerChangeStreamOrderedWithinKeyGloballyIT.testOrderedWithinKey
org.apache.beam.sdk.io.gcp.spanner.changestreams.it.SpannerChangeStreamOrderedWithinKeyIT.testOrderedWithinKey
{noformat}

This has been failing for a few days; occasionally it passes, but that could be 
a fluke where the direct runner happened to execute it in the expected order.

Example failure: https://ci-beam.apache.org/job/beam_PostCommit_Java/8776/



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (BEAM-14192) BigQueryHllSketchCompatibilityIT fails with org/apache/commons/logging/LogFactory has been compiled by a more recent version of the Java Runtime

2022-03-28 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-14192:


 Summary: BigQueryHllSketchCompatibilityIT fails with 
org/apache/commons/logging/LogFactory has been compiled by a more recent 
version of the Java Runtime
 Key: BEAM-14192
 URL: https://issues.apache.org/jira/browse/BEAM-14192
 Project: Beam
  Issue Type: Bug
  Components: test-failures
Reporter: Luke Cwik
Assignee: Andrew Pilloud


The IT fails with a class version mismatch. This is odd since it implies that 
we are somehow getting an Apache Commons Logging jar that was compiled with Java 
11.

Note that I have only seen this in this IT and not in others, which points at 
the zetasketch classpath being broken in some way.

My first thought was that we were pulling in a dependency with an incorrectly 
shaded apache-commons logging class, but when I dumped the classpath and all the 
jar contents, only the commons-logging-1.2.jar contained this class.

First found failure: https://ci-beam.apache.org/job/beam_PostCommit_Java/8786/

Example Failure:
{noformat}
java.lang.RuntimeException: java.lang.RuntimeException: 
org.apache.beam.sdk.util.UserCodeException: 
java.lang.UnsupportedClassVersionError: org/apache/commons/logging/LogFactory 
has been compiled by a more recent version of the Java Runtime (class file 
version 55.0), this version of the Java Runtime only recognizes class file 
versions up to 52.0
at 
org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:187)
at 
org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner$1.outputWindowedValue(GroupAlsoByWindowFnRunner.java:108)
at 
org.apache.beam.runners.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:56)
at 
org.apache.beam.runners.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:39)
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (BEAM-14191) CrossLanguageJdbcIOTest broken with "Cannot load JDBC driver class 'com.mysql.cj.jdbc.Driver'"

2022-03-28 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-14191:


 Summary: CrossLanguageJdbcIOTest broken with "Cannot load JDBC 
driver class 'com.mysql.cj.jdbc.Driver'"
 Key: BEAM-14191
 URL: https://issues.apache.org/jira/browse/BEAM-14191
 Project: Beam
  Issue Type: Bug
  Components: cross-language, test-failures
Reporter: Luke Cwik
Assignee: Chamikara Madhusanka Jayalath
 Fix For: 2.38.0


Example failure: 
https://ci-beam.apache.org/job/beam_PostCommit_Python36/5113/testReport/junit/apache_beam.io.external.xlang_jdbcio_it_test/CrossLanguageJdbcIOTest/test_xlang_jdbc_read_1_mysql/

Culprit: https://github.com/apache/beam/pull/17167

{noformat}
RuntimeError: Pipeline 
BeamApp-jenkins-0325181709-516ba346_d7972792-2c60-4332-8107-ac73c585491a failed 
in state FAILED: java.lang.RuntimeException: Error received from SDK harness 
for instruction 19: org.apache.beam.sdk.util.UserCodeException: 
java.sql.SQLException: Cannot load JDBC driver class 'com.mysql.cj.jdbc.Driver'
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (BEAM-14184) DirectStreamObserver does not respect channel isReady

2022-03-25 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-14184:


 Summary: DirectStreamObserver does not respect channel isReady
 Key: BEAM-14184
 URL: https://issues.apache.org/jira/browse/BEAM-14184
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-harness
Reporter: Luke Cwik
Assignee: Luke Cwik


Leads to OOMs like:
{noformat}
Output channel stalled for 1023s, outbound thread CHAIN MapPartition 
(MapPartition at [1]PerformInference) -> FlatMap (FlatMap at ExtractOutput[0]) 
-> Map (Key Extractor) -> GroupCombine (GroupCombine at GroupCombine: 
PerformInferenceAndCombineResults_dep_049/GroupPredictionsByImage) -> Map (Key 
Extractor) (1/1). See: https://issues.apache.org/jira/browse/BEAM-4280 for the 
history for this issue.
Feb 18, 2022 11:51:05 AM 
org.apache.beam.vendor.grpc.v1p36p0.io.grpc.netty.NettyServerTransport 
notifyTerminated
INFO: Transport failed
org.apache.beam.vendor.grpc.v1p36p0.io.netty.util.internal.OutOfDirectMemoryError:
 failed to allocate 2097152 byte(s) of direct memory (used: 1205862679, max: 
1207959552)
at 
org.apache.beam.vendor.grpc.v1p36p0.io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:754)
at 
org.apache.beam.vendor.grpc.v1p36p0.io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:709)
at 
org.apache.beam.vendor.grpc.v1p36p0.io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:645)
at 
org.apache.beam.vendor.grpc.v1p36p0.io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:621)
at 
org.apache.beam.vendor.grpc.v1p36p0.io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:204)
at 
org.apache.beam.vendor.grpc.v1p36p0.io.netty.buffer.PoolArena.tcacheAllocateNormal(PoolArena.java:188)
{noformat}

See more context in 
https://lists.apache.org/thread/llmxodbmczhn10c98prs8wmd5hy4nvff
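For context, a generic sketch (not the actual Beam fix) of how a gRPC stream 
writer can respect flow control by checking isReady() and draining a bounded 
buffer from an onReady handler, which is the behavior this issue says 
DirectStreamObserver lacks:

{noformat}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import io.grpc.stub.ServerCallStreamObserver;
import io.grpc.stub.StreamObserver;

class FlowControlledWriter<T> {
  private final ServerCallStreamObserver<T> observer;
  // Bounded buffer so a slow receiver applies backpressure instead of causing OOMs.
  private final BlockingQueue<T> pending = new ArrayBlockingQueue<>(1000);

  FlowControlledWriter(StreamObserver<T> rawObserver) {
    this.observer = (ServerCallStreamObserver<T>) rawObserver;
    // Drain the buffer whenever the transport signals it can accept more data.
    this.observer.setOnReadyHandler(this::drain);
  }

  void send(T element) throws InterruptedException {
    pending.put(element); // blocks when the buffer is full
    drain();
  }

  private synchronized void drain() {
    T next;
    // Only write while the channel is ready; otherwise wait for the onReady callback.
    while (observer.isReady() && (next = pending.poll()) != null) {
      observer.onNext(next);
    }
  }
}
{noformat}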



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-14179) MonitoringInfoMetricName null value guard uncovering additional issues

2022-03-25 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512570#comment-17512570
 ] 

Luke Cwik commented on BEAM-14179:
--

The required_labels field 
(https://github.com/apache/beam/blob/d62b5d6c43847a3b5ecb1321144cf10c6931bd98/model/pipeline/src/main/proto/metrics.proto#L311)
tells us which labels are necessary.

This reduces the problem to what should be done when the value is unknown.

> MonitoringInfoMetricName null value guard uncovering additional issues
> --
>
> Key: BEAM-14179
> URL: https://issues.apache.org/jira/browse/BEAM-14179
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Daniel Oliveira
>Priority: P2
> Fix For: 2.38.0
>
>
> Additional integration testing 
> (//cloud/dataflow/testing/integration/sdk:V1ReadIT_testE2EV1Read) caught that 
> https://github.com/apache/beam/pull/17094 causes a regression:
> The test failed with:
> {noformat}
> Caused by: java.lang.NullPointerException: null value in entry: 
> DATASTORE_NAMESPACE=null
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.CollectPreconditions.checkEntryNotNull(CollectPreconditions.java:32)
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:100)
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.RegularImmutableMap.fromEntries(RegularImmutableMap.java:74)
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:464)
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:437)
>   at 
> org.apache.beam.runners.core.metrics.MonitoringInfoMetricName.(MonitoringInfoMetricName.java:46)
>   at 
> org.apache.beam.runners.core.metrics.MonitoringInfoMetricName.named(MonitoringInfoMetricName.java:93)
>   at 
> org.apache.beam.runners.core.metrics.ServiceCallMetric.call(ServiceCallMetric.java:82)
>   at 
> org.apache.beam.sdk.io.gcp.datastore.DatastoreV1$Read$ReadFn.runQueryWithRetries(DatastoreV1.java:927)
>   at 
> org.apache.beam.sdk.io.gcp.datastore.DatastoreV1$Read$ReadFn.processElement(DatastoreV1.java:965)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-14179) MonitoringInfoMetricName null value guard uncovering additional issues

2022-03-25 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512561#comment-17512561
 ] 

Luke Cwik commented on BEAM-14179:
--

The question is what the intended outcome should be in this case for these 
service call metrics:
1) They are required, in which case the code should rightly fail.
2) They are optional, and we should pass in some placeholder value like "null", 
"", or "N/A".
3) The key can be filtered from the map if the value is null (a sketch of this 
option follows below).
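
For illustration only, a minimal sketch of option 3 under the assumption that 
null-valued labels can simply be omitted; the class and method names below are 
hypothetical and this is not the actual Beam change:
{noformat}
import java.util.HashMap;
import java.util.Map;

public class LabelFiltering {
  // Hypothetical helper: drop label entries whose value is null so that
  // ImmutableMap.copyOf, which rejects null values, does not throw when the
  // MonitoringInfoMetricName is constructed.
  public static Map<String, String> dropNullValuedLabels(Map<String, String> labels) {
    Map<String, String> filtered = new HashMap<>();
    for (Map.Entry<String, String> entry : labels.entrySet()) {
      if (entry.getValue() != null) {
        filtered.put(entry.getKey(), entry.getValue());
      }
    }
    return filtered;
  }
}
{noformat}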

> MonitoringInfoMetricName null value guard uncovering additional issues
> --
>
> Key: BEAM-14179
> URL: https://issues.apache.org/jira/browse/BEAM-14179
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Daniel Oliveira
>Priority: P2
> Fix For: 2.38.0
>
>
> Additional integration testing 
> (//cloud/dataflow/testing/integration/sdk:V1ReadIT_testE2EV1Read) caught that 
> https://github.com/apache/beam/pull/17094 causes a regression:
> The test failed with:
> {noformat}
> Caused by: java.lang.NullPointerException: null value in entry: 
> DATASTORE_NAMESPACE=null
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.CollectPreconditions.checkEntryNotNull(CollectPreconditions.java:32)
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:100)
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.RegularImmutableMap.fromEntries(RegularImmutableMap.java:74)
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:464)
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:437)
>   at 
> org.apache.beam.runners.core.metrics.MonitoringInfoMetricName.(MonitoringInfoMetricName.java:46)
>   at 
> org.apache.beam.runners.core.metrics.MonitoringInfoMetricName.named(MonitoringInfoMetricName.java:93)
>   at 
> org.apache.beam.runners.core.metrics.ServiceCallMetric.call(ServiceCallMetric.java:82)
>   at 
> org.apache.beam.sdk.io.gcp.datastore.DatastoreV1$Read$ReadFn.runQueryWithRetries(DatastoreV1.java:927)
>   at 
> org.apache.beam.sdk.io.gcp.datastore.DatastoreV1$Read$ReadFn.processElement(DatastoreV1.java:965)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-14179) MonitoringInfoMetricName null value guard uncovering additional issues

2022-03-25 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-14179:
-
Status: Open  (was: Triage Needed)

> MonitoringInfoMetricName null value guard uncovering additional issues
> --
>
> Key: BEAM-14179
> URL: https://issues.apache.org/jira/browse/BEAM-14179
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Daniel Oliveira
>Priority: P2
> Fix For: 2.38.0
>
>
> Additional integration testing 
> (//cloud/dataflow/testing/integration/sdk:V1ReadIT_testE2EV1Read) caught that 
> https://github.com/apache/beam/pull/17094 causes a regression:
> The test failed with:
> {noformat}
> Caused by: java.lang.NullPointerException: null value in entry: 
> DATASTORE_NAMESPACE=null
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.CollectPreconditions.checkEntryNotNull(CollectPreconditions.java:32)
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:100)
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.RegularImmutableMap.fromEntries(RegularImmutableMap.java:74)
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:464)
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:437)
>   at 
> org.apache.beam.runners.core.metrics.MonitoringInfoMetricName.(MonitoringInfoMetricName.java:46)
>   at 
> org.apache.beam.runners.core.metrics.MonitoringInfoMetricName.named(MonitoringInfoMetricName.java:93)
>   at 
> org.apache.beam.runners.core.metrics.ServiceCallMetric.call(ServiceCallMetric.java:82)
>   at 
> org.apache.beam.sdk.io.gcp.datastore.DatastoreV1$Read$ReadFn.runQueryWithRetries(DatastoreV1.java:927)
>   at 
> org.apache.beam.sdk.io.gcp.datastore.DatastoreV1$Read$ReadFn.processElement(DatastoreV1.java:965)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (BEAM-14179) MonitoringInfoMetricName null value guard uncovering additional issues

2022-03-25 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-14179:


 Summary: MonitoringInfoMetricName null value guard uncovering 
additional issues
 Key: BEAM-14179
 URL: https://issues.apache.org/jira/browse/BEAM-14179
 Project: Beam
  Issue Type: Bug
  Components: io-java-gcp, sdk-java-harness
Reporter: Luke Cwik
Assignee: Daniel Oliveira
 Fix For: 2.38.0


Additional integration testing 
(//cloud/dataflow/testing/integration/sdk:V1ReadIT_testE2EV1Read) caught that 
https://github.com/apache/beam/pull/17094 causes a regression:

The test failed with:
{noformat}
Caused by: java.lang.NullPointerException: null value in entry: 
DATASTORE_NAMESPACE=null
at 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.CollectPreconditions.checkEntryNotNull(CollectPreconditions.java:32)
at 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:100)
at 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.RegularImmutableMap.fromEntries(RegularImmutableMap.java:74)
at 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:464)
at 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:437)
at 
org.apache.beam.runners.core.metrics.MonitoringInfoMetricName.(MonitoringInfoMetricName.java:46)
at 
org.apache.beam.runners.core.metrics.MonitoringInfoMetricName.named(MonitoringInfoMetricName.java:93)
at 
org.apache.beam.runners.core.metrics.ServiceCallMetric.call(ServiceCallMetric.java:82)
at 
org.apache.beam.sdk.io.gcp.datastore.DatastoreV1$Read$ReadFn.runQueryWithRetries(DatastoreV1.java:927)
at 
org.apache.beam.sdk.io.gcp.datastore.DatastoreV1$Read$ReadFn.processElement(DatastoreV1.java:965)
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-14177) GroupByKey iteration caching broken for portable runners like Dataflow runner v2

2022-03-25 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-14177:
-
Description: 
The wrong cache key is being used as it has not been namespaced to the state 
key.

This was previously being done within StateFetchingIterators but 
https://github.com/apache/beam/pull/17121 changed that to use a single shared 
key.

The fix is to subcache the cache before passing it into StateFetchingIterators 
restoring the prior behavior.
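
For illustration only, a rough sketch of what "subcaching" means here; the types 
and names below are hypothetical and are not the fn harness Cache API:
{noformat}
import java.util.Objects;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: a "sub-cache" view that prefixes every key with a
// namespace (e.g. the state key) so two iterators backed by the same shared
// map can never observe each other's entries.
final class NamespacedCache<V> {
  private final ConcurrentMap<String, V> shared;
  private final String namespace;

  NamespacedCache(ConcurrentMap<String, V> shared, String namespace) {
    this.shared = Objects.requireNonNull(shared);
    this.namespace = Objects.requireNonNull(namespace);
  }

  private String qualify(String key) {
    return namespace + "/" + key;
  }

  V get(String key) {
    return shared.get(qualify(key));
  }

  void put(String key, V value) {
    shared.put(qualify(key), value);
  }
}
{noformat}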

  was:
The wrong cache key is being used as it has not been namespaced to the state 
key.

This was previously being done within StateFetchingIterators but 
https://github.com/apache/beam/pull/17121 changed that to use a single shared 
key.


> GroupByKey iteration caching broken for portable runners like Dataflow runner 
> v2
> 
>
> Key: BEAM-14177
> URL: https://issues.apache.org/jira/browse/BEAM-14177
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
> Fix For: 2.38.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The wrong cache key is being used as it has not been namespaced to the state 
> key.
> This was previously being done within StateFetchingIterators but 
> https://github.com/apache/beam/pull/17121 changed that to use a single shared 
> key.
> The fix is to subcache the cache before passing it into 
> StateFetchingIterators restoring the prior behavior.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-14177) GroupByKey iteration caching broken for portable runners like Dataflow runner v2

2022-03-25 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-14177:
-
Summary: GroupByKey iteration caching broken for portable runners like 
Dataflow runner v2  (was: GroupByKey re-iteration caching broken for portable 
runners like Dataflow runner v2)

> GroupByKey iteration caching broken for portable runners like Dataflow runner 
> v2
> 
>
> Key: BEAM-14177
> URL: https://issues.apache.org/jira/browse/BEAM-14177
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
> Fix For: 2.38.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The wrong cache key is being used as it has not been namespaced to the state 
> key.
> This was previously being done within StateFetchingIterators but 
> https://github.com/apache/beam/pull/17121 changed that to use a single shared 
> key.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-14177) GroupByKey re-iteration caching broken for portable runners like Dataflow runner v2

2022-03-25 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-14177:
-
Summary: GroupByKey re-iteration caching broken for portable runners like 
Dataflow runner v2  (was: GroupByKey re-iteration broken for portable runners 
like Dataflow runner v2)

> GroupByKey re-iteration caching broken for portable runners like Dataflow 
> runner v2
> ---
>
> Key: BEAM-14177
> URL: https://issues.apache.org/jira/browse/BEAM-14177
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
> Fix For: 2.38.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The wrong cache key is being used as it has not been namespaced to the state 
> key.
> This was previously being done within StateFetchingIterators but 
> https://github.com/apache/beam/pull/17121 changed that to use a single shared 
> key.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-14177) GroupByKey re-iteration broken for portable runners like Dataflow runner v2

2022-03-25 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-14177:
-
Summary: GroupByKey re-iteration broken for portable runners like Dataflow 
runner v2  (was: GroupByKey re-iteration broken for Dataflow runner v2)

> GroupByKey re-iteration broken for portable runners like Dataflow runner v2
> ---
>
> Key: BEAM-14177
> URL: https://issues.apache.org/jira/browse/BEAM-14177
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
> Fix For: 2.38.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The wrong cache key is being used as it has not been namespaced to the state 
> key.
> This was previously being done within StateFetchingIterators but 
> https://github.com/apache/beam/pull/17121 changed that to use a single shared 
> key.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (BEAM-14177) GroupByKey re-iteration broken for Dataflow runner v2

2022-03-25 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-14177:


 Summary: GroupByKey re-iteration broken for Dataflow runner v2
 Key: BEAM-14177
 URL: https://issues.apache.org/jira/browse/BEAM-14177
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-harness
Reporter: Luke Cwik
Assignee: Luke Cwik
 Fix For: 2.38.0


The wrong cache key is being used as it has not been namespaced to the state 
key.

This was previously being done within StateFetchingIterators but 
https://github.com/apache/beam/pull/17121 changed that to use a single shared 
key.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-14116) Fix Pub/Sub Lite IO and SDF performance issues with shuffles

2022-03-24 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512074#comment-17512074
 ] 

Luke Cwik commented on BEAM-14116:
--

https://github.com/apache/beam/pull/17004 seems to have broken the internal 
cloud/dataflow/testing/integration/sdk:streaming_GroupByKeyTest_BasicTests_testLargeKeys1MB_wm_service_local
 test reliably for Dataflow.

Please work with the release manager [~danoliveira] to rollback or forward fix.

> Fix Pub/Sub Lite IO and SDF performance issues with shuffles
> 
>
> Key: BEAM-14116
> URL: https://issues.apache.org/jira/browse/BEAM-14116
> Project: Beam
>  Issue Type: Task
>  Components: io-java-gcp, runner-dataflow
>Reporter: Daniel Collins
>Assignee: Daniel Collins
>Priority: P2
> Fix For: 2.38.0
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-3305) Consider: Go ViewFn support

2022-03-24 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-3305:

Description: 
We can add it as an optional field to beam.SideInput (see: pkg/beam/option.go). 
The execution side needs to support it as well. However, with the various side 
input forms, it's not clear how valuable such a feature would be.

It is strongly recommended to add Map Side Inputs 
https://issues.apache.org/jira/browse/BEAM-3293 before implementing this 
suggestion, and required to have caching implemented 
https://issues.apache.org/jira/browse/BEAM-11097. Otherwise very little benefit 
is achieved.

See https://issues.apache.org/jira/browse/BEAM-3293 for where code might need 
to be changed.

Make providing a ViewFn part of the beam.SideInput struct, either as a method 
to validate & update unexported fields, or adding an optional field. 

The ViewFn would be a function that takes in the iterator/multimap version of 
the Side Input and returns whatever the pre-processed type is. That value would 
be what is cached to reduce work per common window.

The DoFn should then be able to use that type in the corresponding position of 
the ProcessElement method for that side input. This avoids workarounds that only 
work in the Global Window, such as trying to process the side input in 
StartBundle. The DoFn method signature analysis would need to be changed to 
support additional alternative side input types.

  was:
We can add it as an optional field to beam.SideInput (see: pkg/beam/option.go). 
The execution side needs to support it as well. However, with the various side 
input forms, it's not clear how valuable such a feature would be.

It is strongly recommended to add Map Side Inputs 
https://issues.apache.org/jira/browse/BEAM-3293 before implementing this 
suggestion, and required to have caching implemented 
https://issues.apache.org/jira/browse/BEAM-11097. Otherwise very little benefit 
is achieved.

See https://issues.apache.org/jira/browse/BEAM-3293 for where code might need 
to be changed.

Make providing a ViewFn part of the beam.SideInput struct, either as a method 
to validate & update unexported fields, or adding an optional field. 

The ViewFn would be a function that takes in the iterator version of the Side 
Input and returns whatever the pre-processed type is. That value would be what 
is cached to reduce work per common window.

The DoFn should then be able to use that type in the corresponding position of 
the ProcessElement method for that side input. This avoids workarounds that only 
work in the Global Window, such as trying to process the side input in 
StartBundle. The DoFn method signature analysis would need to be changed to 
support additional alternative side input types.


> Consider: Go ViewFn support
> ---
>
> Key: BEAM-3305
> URL: https://issues.apache.org/jira/browse/BEAM-3305
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Priority: P3
>
> We can add it as an optional field to beam.SideInput (see: 
> pkg/beam/option.go). The execution side needs to support it as well. However, 
> with the various side input forms, it's not clear how valuable such a feature 
> would be.
> It is strongly recommended to add Map Side Inputs 
> https://issues.apache.org/jira/browse/BEAM-3293 before implementing this 
> suggestion, and required to have caching implemented 
> https://issues.apache.org/jira/browse/BEAM-11097. Otherwise very little 
> benefit is achieved.
> See https://issues.apache.org/jira/browse/BEAM-3293 for where code might need 
> to be changed.
> Make providing a ViewFn part of the beam.SideInput struct, either as a method 
> to validate & update unexported fields, or adding an optional field. 
> The ViewFn would be a function that takes in the iterator/multimap version of 
> the Side Input and returns whatever the pre-processed type is. That value 
> would be what is cached to reduce work per common window.
> The DoFn should then be able to use that type in the corresponding position of 
> the ProcessElement method for that side input. This avoids workarounds that 
> only work in the Global Window, such as trying to process the side input in 
> StartBundle. The DoFn method signature analysis would need to be changed to 
> support additional alternative side input types.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-11325) KafkaIO should be able to read from new added topic/partition automatically during pipeline execution time

2022-03-23 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-11325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17511311#comment-17511311
 ] 

Luke Cwik commented on BEAM-11325:
--

We should use BEAM-13852 to track the fix.

> KafkaIO should be able to read from new added topic/partition automatically 
> during pipeline execution time
> --
>
> Key: BEAM-11325
> URL: https://issues.apache.org/jira/browse/BEAM-11325
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kafka
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: P2
> Fix For: 2.29.0
>
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-14134) Many coders cause significant unnecessary allocations

2022-03-23 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-14134:
-
Fix Version/s: 2.38.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Many coders cause significant unnecessary allocations
> -
>
> Key: BEAM-14134
> URL: https://issues.apache.org/jira/browse/BEAM-14134
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: P2
> Fix For: 2.38.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Many coders (BigEndian*, Map, Iterable, Instant) use DataInputStream to read 
> longs/ints/shorts.  Internally each DataInputStream allocates ~200 bytes of 
> buffers when instantiated.  This means every long, int, short, etc decoded 
> allocates over 200 bytes.
> We should eliminate all uses of DataInputStream in hot-paths and replace it 
> with something more efficient.
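
For illustration only, the kind of replacement implied here is to decode 
fixed-width big-endian values directly from the stream instead of wrapping it in 
a fresh DataInputStream per call; this sketch is an assumption about the 
approach, not the actual Beam patch:
{noformat}
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class BigEndianReads {
  // Decode a big-endian long straight from the stream, avoiding the buffer
  // allocation that a new DataInputStream performs on construction.
  public static long readBigEndianLong(InputStream in) throws IOException {
    long result = 0;
    for (int i = 0; i < 8; i++) {
      int b = in.read();
      if (b < 0) {
        throw new EOFException("stream ended while reading a big-endian long");
      }
      result = (result << 8) | (b & 0xFF);
    }
    return result;
  }
}
{noformat}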



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (BEAM-13695) Provide more accurate size estimates for cache objects in Java 17

2022-03-22 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reassigned BEAM-13695:


Assignee: Kiley Sok

> Provide more accurate size estimates for cache objects in Java 17
> -
>
> Key: BEAM-13695
> URL: https://issues.apache.org/jira/browse/BEAM-13695
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-harness
>Reporter: Kiley Sok
>Assignee: Kiley Sok
>Priority: P2
> Fix For: 2.38.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-14154) Collect And Deploy Playground Examples workflow consistently fails

2022-03-22 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510924#comment-17510924
 ] 

Luke Cwik commented on BEAM-14154:
--

I manually disabled the workflow from the GitHub UI. !disabled.png!

> Collect And Deploy Playground Examples workflow consistently fails
> --
>
> Key: BEAM-14154
> URL: https://issues.apache.org/jira/browse/BEAM-14154
> Project: Beam
>  Issue Type: Bug
>  Components: beam-playground, test-failures
>Reporter: Luke Cwik
>Assignee: Pablo Estrada
>Priority: P2
> Attachments: disabled.png
>
>
> https://github.com/apache/beam/actions/workflows/playground_deploy_examples.yml?query=is%3Afailure
>  is failing consistently. It has failed 344 times and has never succeeded in 
> the past 3 months.
> Fails with:
> {noformat}
> Run helm del --namespace $K8S_NAMESPACE $HELM_APP_NAME
> Error: Kubernetes cluster unreachable: Get "http://localhost:8080/version": 
> dial tcp [::1]:8080: connect: connection refused
> Error: Process completed with exit code 1.
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-14154) Collect And Deploy Playground Examples workflow consistently fails

2022-03-22 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-14154:
-
Attachment: disabled.png

> Collect And Deploy Playground Examples workflow consistently fails
> --
>
> Key: BEAM-14154
> URL: https://issues.apache.org/jira/browse/BEAM-14154
> Project: Beam
>  Issue Type: Bug
>  Components: beam-playground, test-failures
>Reporter: Luke Cwik
>Assignee: Pablo Estrada
>Priority: P2
> Attachments: disabled.png
>
>
> https://github.com/apache/beam/actions/workflows/playground_deploy_examples.yml?query=is%3Afailure
>  is failing consistently. It has failed 344 times and has never succeeded in 
> the past 3 months.
> Fails with:
> {noformat}
> Run helm del --namespace $K8S_NAMESPACE $HELM_APP_NAME
> Error: Kubernetes cluster unreachable: Get "http://localhost:8080/version": 
> dial tcp [::1]:8080: connect: connection refused
> Error: Process completed with exit code 1.
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (BEAM-14154) Collect And Deploy Playground Examples workflow consistently fails

2022-03-22 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-14154:


 Summary: Collect And Deploy Playground Examples workflow 
consistently fails
 Key: BEAM-14154
 URL: https://issues.apache.org/jira/browse/BEAM-14154
 Project: Beam
  Issue Type: Bug
  Components: beam-playground, test-failures
Reporter: Luke Cwik
Assignee: Pablo Estrada


https://github.com/apache/beam/actions/workflows/playground_deploy_examples.yml?query=is%3Afailure
 is failing consistently. It has failed 344 times and has never succeeded in the 
past 3 months.

Fails with:
{noformat}
Run helm del --namespace $K8S_NAMESPACE $HELM_APP_NAME
Error: Kubernetes cluster unreachable: Get "http://localhost:8080/version": 
dial tcp [::1]:8080: connect: connection refused
Error: Process completed with exit code 1.
{noformat}




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-14154) Collect And Deploy Playground Examples workflow consistently fails

2022-03-22 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-14154:
-
Status: Open  (was: Triage Needed)

> Collect And Deploy Playground Examples workflow consistently fails
> --
>
> Key: BEAM-14154
> URL: https://issues.apache.org/jira/browse/BEAM-14154
> Project: Beam
>  Issue Type: Bug
>  Components: beam-playground, test-failures
>Reporter: Luke Cwik
>Assignee: Pablo Estrada
>Priority: P2
>
> https://github.com/apache/beam/actions/workflows/playground_deploy_examples.yml?query=is%3Afailure
>  is failing consistently. It has failed 344 times and has never succeeded in 
> the past 3 months.
> Fails with:
> {noformat}
> Run helm del --namespace $K8S_NAMESPACE $HELM_APP_NAME
> Error: Kubernetes cluster unreachable: Get "http://localhost:8080/version": 
> dial tcp [::1]:8080: connect: connection refused
> Error: Process completed with exit code 1.
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-14152) SpannerChangeStreamErrorTest.testUnavailableExceptionRetries flaky

2022-03-22 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-14152:
-
Status: Open  (was: Triage Needed)

> SpannerChangeStreamErrorTest.testUnavailableExceptionRetries flaky
> --
>
> Key: BEAM-14152
> URL: https://issues.apache.org/jira/browse/BEAM-14152
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, test-failures
>Reporter: Luke Cwik
>Assignee: Pablo Estrada
>Priority: P2
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Example failures: 
> * 
> https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/21726/testReport/junit/org.apache.beam.sdk.io.gcp.spanner.changestreams/SpannerChangeStreamErrorTest/testUnavailableExceptionRetries/
> * 
> https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/21723/testReport/junit/org.apache.beam.sdk.io.gcp.spanner.changestreams/SpannerChangeStreamErrorTest/testUnavailableExceptionRetries/
> {noformat}
> java.lang.AssertionError: 
> Expected: a value greater than <1>
>  but: <0> was less than <1>
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
>   at 
> org.apache.beam.sdk.io.gcp.spanner.changestreams.SpannerChangeStreamErrorTest.testUnavailableExceptionRetries(SpannerChangeStreamErrorTest.java:166)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:323)
>   at 
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:258)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-14132) Java 17 slow for nexmark query 3

2022-03-22 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510909#comment-17510909
 ] 

Luke Cwik commented on BEAM-14132:
--

Query 3 uses a stateful DoFn. A large difference between Java 17 and prior 
versions is that Jamm can't measure the sizes of objects placed in the cache, 
so the object sizing isn't accurate; it also incurs a cost where Jamm throws an 
exception and a backup sizing method is used: 
https://github.com/apache/beam/blob/c2d4e5162afdc5ad9d32da7eca681581322e515f/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/Caches.java#L70

This should be addressed for benchmarks with 
https://github.com/apache/beam/pull/17110, but that solution still requires 
users to add modules for Jamm where we could have enumerated them. This might 
eventually become part of the JVM, as per https://openjdk.java.net/jeps/8249196 
(an illustrative set of flags follows below).
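
For illustration only: "adding modules for Jamm" means passing JVM options that 
open the packages Jamm must reflect over. The exact set of --add-opens flags 
depends on the object graph being measured and is an assumption here, not a 
documented Beam list.
{noformat}
# Hypothetical worker JVM invocation; open whichever packages Jamm reports it cannot access.
java --add-opens=java.base/java.lang=ALL-UNNAMED \
     --add-opens=java.base/java.util=ALL-UNNAMED \
     -jar beam-sdk-harness.jar
{noformat}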


> Java 17 slow for nexmark query 3
> 
>
> Key: BEAM-14132
> URL: https://issues.apache.org/jira/browse/BEAM-14132
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kiley Sok
>Priority: P2
> Attachments: Screen Shot 2022-03-18 at 5.09.00 PM.png
>
>
> http://metrics.beam.apache.org/d/8INnSY9Mz/nexmark-dataflow-runner-v2?orgId=1&var-processingType=streaming&var-ID=All&from=now-30d&to=now
> For streaming runner v2, Java 17 is twice as slow as Java8/11 on query 3
> https://github.com/apache/beam/blob/master/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/Query3.java



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-14152) SpannerChangeStreamErrorTest.testUnavailableExceptionRetries flaky

2022-03-22 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510907#comment-17510907
 ] 

Luke Cwik commented on BEAM-14152:
--

Test disabled in https://github.com/apache/beam/pull/17158

> SpannerChangeStreamErrorTest.testUnavailableExceptionRetries flaky
> --
>
> Key: BEAM-14152
> URL: https://issues.apache.org/jira/browse/BEAM-14152
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, test-failures
>Reporter: Luke Cwik
>Assignee: Pablo Estrada
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Example failures: 
> * 
> https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/21726/testReport/junit/org.apache.beam.sdk.io.gcp.spanner.changestreams/SpannerChangeStreamErrorTest/testUnavailableExceptionRetries/
> * 
> https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/21723/testReport/junit/org.apache.beam.sdk.io.gcp.spanner.changestreams/SpannerChangeStreamErrorTest/testUnavailableExceptionRetries/
> {noformat}
> java.lang.AssertionError: 
> Expected: a value greater than <1>
>  but: <0> was less than <1>
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
>   at 
> org.apache.beam.sdk.io.gcp.spanner.changestreams.SpannerChangeStreamErrorTest.testUnavailableExceptionRetries(SpannerChangeStreamErrorTest.java:166)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:323)
>   at 
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:258)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (BEAM-14152) SpannerChangeStreamErrorTest.testUnavailableExceptionRetries flaky

2022-03-22 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-14152:


 Summary: 
SpannerChangeStreamErrorTest.testUnavailableExceptionRetries flaky
 Key: BEAM-14152
 URL: https://issues.apache.org/jira/browse/BEAM-14152
 Project: Beam
  Issue Type: Bug
  Components: io-java-gcp, test-failures
Reporter: Luke Cwik
Assignee: Pablo Estrada


Example failures: 
* 
https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/21726/testReport/junit/org.apache.beam.sdk.io.gcp.spanner.changestreams/SpannerChangeStreamErrorTest/testUnavailableExceptionRetries/
* 
https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/21723/testReport/junit/org.apache.beam.sdk.io.gcp.spanner.changestreams/SpannerChangeStreamErrorTest/testUnavailableExceptionRetries/

{noformat}
java.lang.AssertionError: 
Expected: a value greater than <1>
 but: <0> was less than <1>
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
at 
org.apache.beam.sdk.io.gcp.spanner.changestreams.SpannerChangeStreamErrorTest.testUnavailableExceptionRetries(SpannerChangeStreamErrorTest.java:166)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:323)
at 
org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:258)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (BEAM-14038) Add the ability to easily start up Python expansion services.

2022-03-22 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510875#comment-17510875
 ] 

Luke Cwik edited comment on BEAM-14038 at 3/22/22, 8:19 PM:


Filed https://issues.apache.org/jira/browse/BEAM-14148 for the extremely flaky 
ExternalPythonTransformTest.trivialPythonTransform test. Started rollback of 
https://github.com/apache/beam/pull/17035 in 
https://github.com/apache/beam/pull/17154


was (Author: lcwik):
Filed https://issues.apache.org/jira/browse/BEAM-14148 for the extremely flaky 
ExternalPythonTransformTest.trivialPythonTransform test. Started rollback in 
https://github.com/apache/beam/pull/17154

> Add the ability to easily start up Python expansion services.
> -
>
> Key: BEAM-14038
> URL: https://issues.apache.org/jira/browse/BEAM-14038
> Project: Beam
>  Issue Type: Improvement
>  Components: cross-language
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
>  Time Spent: 13h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-14038) Add the ability to easily start up Python expansion services.

2022-03-22 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510875#comment-17510875
 ] 

Luke Cwik commented on BEAM-14038:
--

Filed https://issues.apache.org/jira/browse/BEAM-14148 for the extremely flaky 
ExternalPythonTransformTest.trivialPythonTransform test. Started rollback in 
https://github.com/apache/beam/pull/17154

> Add the ability to easily start up Python expansion services.
> -
>
> Key: BEAM-14038
> URL: https://issues.apache.org/jira/browse/BEAM-14038
> Project: Beam
>  Issue Type: Improvement
>  Components: cross-language
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
>  Time Spent: 13h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-14148) ExternalPythonTransformTest.trivialPythonTransform flaky

2022-03-22 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510876#comment-17510876
 ] 

Luke Cwik commented on BEAM-14148:
--

Started rollback of https://github.com/apache/beam/pull/17035 in 
https://github.com/apache/beam/pull/17154

> ExternalPythonTransformTest.trivialPythonTransform flaky
> 
>
> Key: BEAM-14148
> URL: https://issues.apache.org/jira/browse/BEAM-14148
> Project: Beam
>  Issue Type: Bug
>  Components: cross-language, test-failures
>Reporter: Luke Cwik
>Assignee: Robert Bradshaw
>Priority: P2
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Example run: 
> https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/4806/testReport/junit/org.apache.beam.sdk.extensions.python/ExternalPythonTransformTest/trivialPythonTransform/
> {noformat}
> java.lang.RuntimeException: java.util.concurrent.TimeoutException: Timeout 
> waiting for Python service startup after 16616 seconds.
>   at 
> org.apache.beam.sdk.extensions.python.ExternalPythonTransform.expand(ExternalPythonTransform.java:107)
>   at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:548)
>   at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:482)
>   at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:363)
>   at 
> org.apache.beam.sdk.extensions.python.ExternalPythonTransformTest.trivialPythonTransform(ExternalPythonTransformTest.java:41)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (BEAM-14148) ExternalPythonTransformTest.trivialPythonTransform flaky

2022-03-22 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-14148:


 Summary: ExternalPythonTransformTest.trivialPythonTransform flaky
 Key: BEAM-14148
 URL: https://issues.apache.org/jira/browse/BEAM-14148
 Project: Beam
  Issue Type: Bug
  Components: cross-language, test-failures
Reporter: Luke Cwik
Assignee: Robert Bradshaw


Example run: 
https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/4806/testReport/junit/org.apache.beam.sdk.extensions.python/ExternalPythonTransformTest/trivialPythonTransform/

{noformat}
java.lang.RuntimeException: java.util.concurrent.TimeoutException: Timeout 
waiting for Python service startup after 16616 seconds.
at 
org.apache.beam.sdk.extensions.python.ExternalPythonTransform.expand(ExternalPythonTransform.java:107)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:548)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:482)
at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:363)
at 
org.apache.beam.sdk.extensions.python.ExternalPythonTransformTest.trivialPythonTransform(ExternalPythonTransformTest.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-11325) KafkaIO should be able to read from new added topic/partition automatically during pipeline execution time

2022-03-21 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-11325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510121#comment-17510121
 ] 

Luke Cwik commented on BEAM-11325:
--

This is not working; see https://issues.apache.org/jira/browse/BEAM-13852 for a 
discussion around fixing this.

> KafkaIO should be able to read from new added topic/partition automatically 
> during pipeline execution time
> --
>
> Key: BEAM-11325
> URL: https://issues.apache.org/jira/browse/BEAM-11325
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kafka
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: P2
> Fix For: 2.29.0
>
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-13852) KafkaIO.read.withDynamicRead() doesn't pick up new TopicPartitions

2022-03-21 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-13852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510120#comment-17510120
 ] 

Luke Cwik commented on BEAM-13852:
--

The DoFn can't use a processing-time timer, since the timer won't fire anymore 
once the watermark advances to infinity; processing-time timers only fire while 
the watermark isn't at infinity/MAX. It would likely be best to replace the 
implementation with the Watch[1] transform, which holds up the watermark 
appropriately and was built for this kind of use case already (a rough sketch 
follows below).

Discussion in https://lists.apache.org/thread/fo6jrg15pwd6bjpl6k8g4l0z953mp7ry

1: 
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java
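
For illustration only, a minimal sketch of the Watch-based shape of such a fix, 
assuming a hypothetical listPartitions helper and topic pattern; this is not the 
KafkaIO implementation:
{noformat}
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.Contextful;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.Watch;
import org.apache.beam.sdk.transforms.Watch.Growth.PollFn;
import org.apache.beam.sdk.transforms.Watch.Growth.PollResult;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;
import org.joda.time.Instant;

public class PartitionDiscovery {
  // listPartitions is a hypothetical helper standing in for an admin client
  // lookup; it is not a Beam or Kafka API.
  static List<String> listPartitions(String pattern) {
    throw new UnsupportedOperationException("stand-in for an admin client lookup");
  }

  // Periodically poll for partitions and emit each newly seen one exactly once,
  // while Watch holds up the watermark so the pipeline keeps running.
  static PCollection<KV<String, String>> discoverPartitions(Pipeline pipeline) {
    return pipeline
        .apply(Create.of("my-topic-pattern"))
        .apply(
            Watch.growthOf(
                    new PollFn<String, String>() {
                      @Override
                      public PollResult<String> apply(String pattern, Contextful.Fn.Context c) {
                        // "incomplete" tells Watch that more output may appear later,
                        // so it keeps polling at the configured interval.
                        return PollResult.incomplete(Instant.now(), listPartitions(pattern));
                      }
                    })
                .withPollInterval(Duration.standardMinutes(1))
                .withTerminationPerInput(Watch.Growth.never()));
  }
}
{noformat}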

> KafkaIO.read.withDynamicRead() doesn't pick up new TopicPartitions
> --
>
> Key: BEAM-13852
> URL: https://issues.apache.org/jira/browse/BEAM-13852
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kafka
>Reporter: John Casey
>Assignee: John Casey
>Priority: P2
>
> KafkaIO.Read().withDynamicRead() is correctly pulling messages from existing 
> topics, but isn't picking up new topics or partitions when they are created.
> Currently, this appears to be an interaction between the timer configuration, 
> and the duration of the operating window, that is causing the problem



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-14064) ElasticSearchIO#Write buffering and outputting across windows

2022-03-17 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17508393#comment-17508393
 ] 

Luke Cwik commented on BEAM-14064:
--

Set the fix version

> ElasticSearchIO#Write buffering and outputting across windows
> -
>
> Key: BEAM-14064
> URL: https://issues.apache.org/jira/browse/BEAM-14064
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-elasticsearch
>Affects Versions: 2.35.0, 2.36.0, 2.37.0
>Reporter: Luke Cwik
>Assignee: Evan Galpin
>Priority: P1
> Fix For: 2.38.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Source: https://lists.apache.org/thread/mtwtno2o88lx3zl12jlz7o5w1lcgm2db
> Bug PR: https://github.com/apache/beam/pull/15381
> ElasticsearchIO is collecting results from elements in window X and then 
> trying to output them in window Y when flushing the batch. This exposed a bug 
> where buffered elements were being output as part of a different window than 
> the window that produced them.
> This became visible because validation was added recently to ensure that when 
> the pipeline is processing elements in window X, any output with a timestamp 
> is valid for window X. Note that this validation only occurs in 
> *@ProcessElement*, since output is associated with the window of the input 
> element that is being processed.
> It is ok to do this in *@FinishBundle* since there is no existing windowing 
> context there, and the element you output is assigned to an appropriate window.
> *Further Context*
> We’ve bisected it to being introduced in 2.35.0, and I’m reasonably certain 
> it’s this PR https://github.com/apache/beam/pull/15381
> Our scenario is pretty trivial, we read off Pubsub and write to Elastic in a 
> streaming job, the config for the source and sink is respectively
> {noformat}
> pipeline.apply(
> PubsubIO.readStrings().fromSubscription(subscription)
> ).apply(ParseJsons.of(OurObject::class.java))
> .setCoder(KryoCoder.of())
> {noformat}
> and
> {noformat}
> ElasticsearchIO.write()
> .withUseStatefulBatches(true)
> .withMaxParallelRequestsPerWindow(1)
> .withMaxBufferingDuration(Duration.standardSeconds(30))
> // 5 bytes -> KiB -> MiB, so 5 MiB
> .withMaxBatchSizeBytes(5L * 1024 * 1024)
> // # of docs
> .withMaxBatchSize(1000)
> .withConnectionConfiguration(
> ElasticsearchIO.ConnectionConfiguration.create(
> arrayOf(host),
> "fubar",
> "_doc"
> ).withConnectTimeout(5000)
> .withSocketTimeout(3)
> )
> .withRetryConfiguration(
> ElasticsearchIO.RetryConfiguration.create(
> 10,
> // the duration is wall clock, against the connection and 
> socket timeouts specified
> // above. I.e., 10 x 30s is gonna be more than 3 minutes, 
> so if we're getting
> // 10 socket timeouts in a row, this would ignore the 
> "10" part and terminate
> // after 6. The idea is that in a mixed failure mode, 
> you'd get different timeouts
> // of different durations, and on average 10 x fails < 4m.
> // That said, 4m is arbitrary, so adjust as and when 
> needed.
> Duration.standardMinutes(4)
> )
> )
> .withIdFn { f: JsonNode -> f["id"].asText() }
> .withIndexFn { f: JsonNode -> f["schema_name"].asText() }
> .withIsDeleteFn { f: JsonNode -> f["_action"].asText("noop") == 
> "delete" }
> {noformat}
> We recently tried upgrading 2.33 to 2.36 and immediately hit a bug in the 
> consumer, due to alleged time skew, specifically
> {noformat}
> 2022-03-07 10:48:37.886 GMTError message from worker: 
> java.lang.IllegalArgumentException: Cannot output with timestamp 
> 2022-03-07T10:43:38.640Z. Output timestamps must be no earlier than the 
> timestamp of the 
> current input (2022-03-07T10:43:43.562Z) minus the allowed skew (0 
> milliseconds) and no later than 294247-01-10T04:00:54.775Z. See the 
> DoFn#getAllowedTimestampSkew() Javadoc 
> for details on changing the allowed skew. 
> org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.checkTimestamp(SimpleDoFnRunner.java:446)
>  
> org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.outputWithTimestamp(SimpleDoFnRunner.java:422)
>  
> org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$BulkIO$B

[jira] [Updated] (BEAM-14027) Java PreCommit flaky org.apache.beam.sdk.transforms/ParDoLifecycleTest#testTeardownCalledAfterExceptionInProcessElementStateful

2022-03-09 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-14027:
-
Fix Version/s: Not applicable
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Java PreCommit flaky 
> org.apache.beam.sdk.transforms/ParDoLifecycleTest#testTeardownCalledAfterExceptionInProcessElementStateful
> ---
>
> Key: BEAM-14027
> URL: https://issues.apache.org/jira/browse/BEAM-14027
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Luke Cwik
>Assignee: Reuven Lax
>Priority: P2
> Fix For: Not applicable
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Looks like https://github.com/apache/beam/pull/16727 is using the static map 
> concurrently, which is not thread-safe (a sketch of the usual remedy follows 
> below).
> Causing failures like:
> {noformat}
> java.util.ConcurrentModificationException
>   at java.util.HashMap$HashIterator.nextNode(HashMap.java:1469)
>   at java.util.HashMap$ValueIterator.next(HashMap.java:1498)
>   at java.util.AbstractCollection.toString(AbstractCollection.java:461)
>   at java.lang.String.valueOf(String.java:2994)
>   at java.lang.StringBuilder.append(StringBuilder.java:131)
>   at 
> org.apache.beam.sdk.transforms.ParDoLifecycleTest$ExceptionThrowingFn.validate(ParDoLifecycleTest.java:392)
>   at 
> org.apache.beam.sdk.transforms.ParDoLifecycleTest$ExceptionThrowingFn.access$300(ParDoLifecycleTest.java:356)
>   at 
> org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful(ParDoLifecycleTest.java:269)
> {noformat}
> See failure in:
> https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/21291/testReport/junit/org.apache.beam.sdk.transforms/ParDoLifecycleTest/testTeardownCalledAfterExceptionInProcessElementStateful/
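
For illustration only, and assuming the shared state is a simple static map 
(the class and field names below are hypothetical, not the actual test code), 
the usual remedy is a concurrent map:
{noformat}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LifecycleState {
  // ConcurrentHashMap tolerates concurrent put() calls from parallel bundles,
  // and its iterators (used when the map is printed during validation) are
  // weakly consistent, so they do not throw ConcurrentModificationException.
  public static final Map<String, String> CALL_STATE = new ConcurrentHashMap<>();
}
{noformat}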



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-14064) ElasticSearchIO#Write buffering and outputting across windows

2022-03-07 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-14064:
-
Description: 
Source: https://lists.apache.org/thread/mtwtno2o88lx3zl12jlz7o5w1lcgm2db
Bug PR: https://github.com/apache/beam/pull/15381

ElasticsearchIO is collecting results from elements in window X and then trying 
to output them in window Y when flushing the batch. This exposed a bug where 
buffered elements were being output as part of a different window than the 
window that produced them.

This became visible because validation was added recently to ensure that when 
the pipeline is processing elements in window X, any output with a timestamp is 
valid for window X. Note that this validation only occurs in *@ProcessElement*, 
since output is associated with the window of the input element that is being 
processed.

It is ok to do this in *@FinishBundle* since there is no existing windowing 
context there, and the element you output is assigned to an appropriate window.
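
For illustration only, a minimal sketch (hypothetical class name, simplified 
element type) of a buffering DoFn that stays window-correct by flushing in 
@FinishBundle, where each output is explicitly assigned its original timestamp 
and window:
{noformat}
import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.joda.time.Instant;

class BufferingFn extends DoFn<String, String> {
  private static class Buffered {
    final String value;
    final Instant timestamp;
    final BoundedWindow window;

    Buffered(String value, Instant timestamp, BoundedWindow window) {
      this.value = value;
      this.timestamp = timestamp;
      this.window = window;
    }
  }

  private transient List<Buffered> buffer;

  @StartBundle
  public void startBundle() {
    buffer = new ArrayList<>();
  }

  @ProcessElement
  public void processElement(ProcessContext c, BoundedWindow window) {
    // Buffering here is fine; emitting a *previously* buffered element here
    // would tie it to the current element's window and trip the validation.
    buffer.add(new Buffered(c.element(), c.timestamp(), window));
  }

  @FinishBundle
  public void finishBundle(FinishBundleContext c) {
    // Each buffered element is emitted with the timestamp and window that
    // produced it, so nothing leaks across windows.
    for (Buffered b : buffer) {
      c.output(b.value, b.timestamp, b.window);
    }
    buffer.clear();
  }
}
{noformat}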

*Further Context*

We’ve bisected it to being introduced in 2.35.0, and I’m reasonably certain 
it’s this PR https://github.com/apache/beam/pull/15381

Our scenario is pretty trivial, we read off Pubsub and write to Elastic in a 
streaming job, the config for the source and sink is respectively


{noformat}
pipeline.apply(
PubsubIO.readStrings().fromSubscription(subscription)
).apply(ParseJsons.of(OurObject::class.java))
.setCoder(KryoCoder.of())
{noformat}
and


{noformat}
ElasticsearchIO.write()
.withUseStatefulBatches(true)
.withMaxParallelRequestsPerWindow(1)
.withMaxBufferingDuration(Duration.standardSeconds(30))
// 5 bytes -> KiB -> MiB, so 5 MiB
.withMaxBatchSizeBytes(5L * 1024 * 1024)
// # of docs
.withMaxBatchSize(1000)
.withConnectionConfiguration(
ElasticsearchIO.ConnectionConfiguration.create(
arrayOf(host),
"fubar",
"_doc"
).withConnectTimeout(5000)
.withSocketTimeout(3)
)
.withRetryConfiguration(
ElasticsearchIO.RetryConfiguration.create(
10,
// the duration is wall clock, against the connection and 
socket timeouts specified
// above. I.e., 10 x 30s is gonna be more than 3 minutes, 
so if we're getting
// 10 socket timeouts in a row, this would ignore the "10" 
part and terminate
// after 6. The idea is that in a mixed failure mode, you'd 
get different timeouts
// of different durations, and on average 10 x fails < 4m.
// That said, 4m is arbitrary, so adjust as and when needed.
Duration.standardMinutes(4)
)
)
.withIdFn { f: JsonNode -> f["id"].asText() }
.withIndexFn { f: JsonNode -> f["schema_name"].asText() }
.withIsDeleteFn { f: JsonNode -> f["_action"].asText("noop") == 
"delete" }
{noformat}

We recently tried upgrading 2.33 to 2.36 and immediately hit a bug in the 
consumer, due to alleged time skew, specifically

{noformat}
2022-03-07 10:48:37.886 GMTError message from worker: 
java.lang.IllegalArgumentException: Cannot output with timestamp 
2022-03-07T10:43:38.640Z. Output timestamps must be no earlier than the 
timestamp of the 
current input (2022-03-07T10:43:43.562Z) minus the allowed skew (0 
milliseconds) and no later than 294247-01-10T04:00:54.775Z. See the 
DoFn#getAllowedTimestampSkew() Javadoc 
for details on changing the allowed skew. 
org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.checkTimestamp(SimpleDoFnRunner.java:446)
 
org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.outputWithTimestamp(SimpleDoFnRunner.java:422)
 
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$BulkIO$BulkIOBaseFn$ProcessContextAdapter.output(ElasticsearchIO.java:2364)
 
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$BulkIO$BulkIOBaseFn.flushAndOutputResults(ElasticsearchIO.java:2404)
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$BulkIO$BulkIOBaseFn.addAndMaybeFlush(ElasticsearchIO.java:2419)
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$BulkIO$BulkIOStatefulFn.processElement(ElasticsearchIO.java:2300)
{noformat}

I’ve bisected it and 2.34 works fine, 2.35 is the first version this breaks, 
and it seems like the code in the trace is largely added by the PR linked 
above. The error usually claims a skew of a few seconds, but obviously I can’t 
override getAllowedTimestampSkew() on the internal Elastic DoFn, and it’s 
marked deprecated anyway.

I

[jira] [Created] (BEAM-14064) ElasticSearchIO#Write buffering and outputting across windows

2022-03-07 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-14064:


 Summary: ElasticSearchIO#Write buffering and outputting across 
windows
 Key: BEAM-14064
 URL: https://issues.apache.org/jira/browse/BEAM-14064
 Project: Beam
  Issue Type: Bug
  Components: io-java-elasticsearch
Affects Versions: 2.36.0, 2.35.0
Reporter: Luke Cwik
Assignee: Evan Galpin


Source: https://lists.apache.org/thread/mtwtno2o88lx3zl12jlz7o5w1lcgm2db
Bug PR: https://github.com/apache/beam/pull/15381

ElasticsearchIO is collecting results from elements in window X and then trying 
to output them in window Y when flushing the batch. This exposed a bug where 
buffered elements were being output as part of a different window than the one 
that produced them.

This became visible because validation was recently added to ensure that, when 
the pipeline is processing elements in window X, any output timestamp is valid 
for window X. Note that this validation only occurs in @ProcessElement, since 
output is associated with the window of the input element currently being 
processed.
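
For illustration, a minimal sketch (hypothetical DoFn, not Beam's internal Elasticsearch code) of what the check rejects and the deprecated knob that widens it:

{noformat}
// Minimal sketch only: outputting 5 seconds before the current element's
// timestamp triggers the "Cannot output with timestamp ..." check above when
// the allowed skew is the default of zero.
import org.apache.beam.sdk.transforms.DoFn;
import org.joda.time.Duration;

class ShiftTimestampFn extends DoFn<String, String> {
  @ProcessElement
  public void process(ProcessContext c) {
    c.outputWithTimestamp(c.element(), c.timestamp().minus(Duration.standardSeconds(5)));
  }

  @Override
  @SuppressWarnings("deprecation") // Deprecated, but still the only way to widen the check.
  public Duration getAllowedTimestampSkew() {
    return Duration.standardSeconds(10);
  }
}
{noformat}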

Further Context

We’ve bisected it to being introduced in 2.35.0, and I’m reasonably certain 
it’s this PR: https://github.com/apache/beam/pull/15381

Our scenario is pretty trivial: we read off Pubsub and write to Elastic in a 
streaming job. The config for the source and sink, respectively, is:


{noformat}
pipeline.apply(
    PubsubIO.readStrings().fromSubscription(subscription)
).apply(ParseJsons.of(OurObject::class.java))
    .setCoder(KryoCoder.of())
{noformat}
and


{noformat}
ElasticsearchIO.write()
    .withUseStatefulBatches(true)
    .withMaxParallelRequestsPerWindow(1)
    .withMaxBufferingDuration(Duration.standardSeconds(30))
    // 5 bytes -> KiB -> MiB, so 5 MiB
    .withMaxBatchSizeBytes(5L * 1024 * 1024)
    // # of docs
    .withMaxBatchSize(1000)
    .withConnectionConfiguration(
        ElasticsearchIO.ConnectionConfiguration.create(
            arrayOf(host),
            "fubar",
            "_doc"
        ).withConnectTimeout(5000)
            .withSocketTimeout(3)
    )
    .withRetryConfiguration(
        ElasticsearchIO.RetryConfiguration.create(
            10,
            // the duration is wall clock, against the connection and socket timeouts specified
            // above. I.e., 10 x 30s is gonna be more than 3 minutes, so if we're getting
            // 10 socket timeouts in a row, this would ignore the "10" part and terminate
            // after 6. The idea is that in a mixed failure mode, you'd get different timeouts
            // of different durations, and on average 10 x fails < 4m.
            // That said, 4m is arbitrary, so adjust as and when needed.
            Duration.standardMinutes(4)
        )
    )
    .withIdFn { f: JsonNode -> f["id"].asText() }
    .withIndexFn { f: JsonNode -> f["schema_name"].asText() }
    .withIsDeleteFn { f: JsonNode -> f["_action"].asText("noop") == "delete" }
{noformat}

We recently tried upgrading from 2.33 to 2.36 and immediately hit a bug in the 
consumer due to alleged time skew, specifically:

{noformat}
2022-03-07 10:48:37.886 GMT Error message from worker: java.lang.IllegalArgumentException:
Cannot output with timestamp 2022-03-07T10:43:38.640Z. Output timestamps must be no earlier
than the timestamp of the current input (2022-03-07T10:43:43.562Z) minus the allowed skew
(0 milliseconds) and no later than 294247-01-10T04:00:54.775Z. See the
DoFn#getAllowedTimestampSkew() Javadoc for details on changing the allowed skew.
org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.checkTimestamp(SimpleDoFnRunner.java:446)
org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.outputWithTimestamp(SimpleDoFnRunner.java:422)
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$BulkIO$BulkIOBaseFn$ProcessContextAdapter.output(ElasticsearchIO.java:2364)
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$BulkIO$BulkIOBaseFn.flushAndOutputResults(ElasticsearchIO.java:2404)
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$BulkIO$BulkIOBaseFn.addAndMaybeFlush(ElasticsearchIO.java:2419)
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$BulkIO$BulkIOStatefulFn.processElement(ElasticsearchIO.java:2300)
{noformat}

I’ve bisected it: 2.34 works fine and 2.35 is the first version where this 
breaks, and the code in the trace appears to be largely added by the PR linked 
above. The error usually claims a skew of only a few seconds, but obviously I 
can’t override getAllowedTimestampSkew() on the internal Elastic DoFn, and it’s 
marked deprecated anyway.

[jira] [Created] (BEAM-14028) Java PreCommit flaky org.apache.beam.sdk.transforms/ParDoLifecycleTest#testTeardownCalledAfterExceptionInProcessElementStateful

2022-03-02 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-14028:


 Summary: Java PreCommit flaky 
org.apache.beam.sdk.transforms/ParDoLifecycleTest#testTeardownCalledAfterExceptionInProcessElementStateful
 Key: BEAM-14028
 URL: https://issues.apache.org/jira/browse/BEAM-14028
 Project: Beam
  Issue Type: Bug
  Components: test-failures
Reporter: Luke Cwik
Assignee: Reuven Lax


It looks like the static callStatesMap is modified concurrently by multiple 
instances of the DoFn.

Causes failures like 
(https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/21291/testReport/junit/org.apache.beam.sdk.transforms/ParDoLifecycleTest/testTeardownCalledAfterExceptionInProcessElementStateful/):
{noformat}
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1469)
at java.util.HashMap$ValueIterator.next(HashMap.java:1498)
at java.util.AbstractCollection.toString(AbstractCollection.java:461)
at java.lang.String.valueOf(String.java:2994)
at java.lang.StringBuilder.append(StringBuilder.java:131)
at 
org.apache.beam.sdk.transforms.ParDoLifecycleTest$ExceptionThrowingFn.validate(ParDoLifecycleTest.java:392)
at 
org.apache.beam.sdk.transforms.ParDoLifecycleTest$ExceptionThrowingFn.access$300(ParDoLifecycleTest.java:356)
at 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful(ParDoLifecycleTest.java:269)
{noformat}
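
For illustration only, a hypothetical sketch (not the actual test code) of the race and the usual fix: state shared statically across DoFn instances has to be a concurrent collection, or both the writes and the iteration must be synchronized.

{noformat}
// Hypothetical sketch of the pattern: a static map shared by every DoFn instance
// in the JVM. A plain HashMap would let the iteration in teardown() race with
// puts from other instances (the ConcurrentModificationException above);
// ConcurrentHashMap makes both operations safe.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.beam.sdk.transforms.DoFn;

class LifecycleTrackingFn extends DoFn<String, String> {
  private static final Map<String, String> callStates = new ConcurrentHashMap<>();

  @ProcessElement
  public void process(ProcessContext c) {
    callStates.put(Thread.currentThread().getName(), "PROCESS_ELEMENT");
    c.output(c.element());
  }

  @Teardown
  public void teardown() {
    // Safe to iterate while other instances keep writing.
    callStates.forEach((thread, state) -> System.out.println(thread + " -> " + state));
  }
}
{noformat}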




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Deleted] (BEAM-14028) Java PreCommit flaky org.apache.beam.sdk.transforms/ParDoLifecycleTest#testTeardownCalledAfterExceptionInProcessElementStateful

2022-03-02 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik deleted BEAM-14028:
-


> Java PreCommit flaky 
> org.apache.beam.sdk.transforms/ParDoLifecycleTest#testTeardownCalledAfterExceptionInProcessElementStateful
> ---
>
> Key: BEAM-14028
> URL: https://issues.apache.org/jira/browse/BEAM-14028
> Project: Beam
>  Issue Type: Bug
>Reporter: Luke Cwik
>Assignee: Reuven Lax
>Priority: P2
>
> It looks like the static callStatesMap is modified concurrently by multiple 
> instances of the DoFn.
> Causes failures like 
> (https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/21291/testReport/junit/org.apache.beam.sdk.transforms/ParDoLifecycleTest/testTeardownCalledAfterExceptionInProcessElementStateful/):
> {noformat}
> java.util.ConcurrentModificationException
>   at java.util.HashMap$HashIterator.nextNode(HashMap.java:1469)
>   at java.util.HashMap$ValueIterator.next(HashMap.java:1498)
>   at java.util.AbstractCollection.toString(AbstractCollection.java:461)
>   at java.lang.String.valueOf(String.java:2994)
>   at java.lang.StringBuilder.append(StringBuilder.java:131)
>   at 
> org.apache.beam.sdk.transforms.ParDoLifecycleTest$ExceptionThrowingFn.validate(ParDoLifecycleTest.java:392)
>   at 
> org.apache.beam.sdk.transforms.ParDoLifecycleTest$ExceptionThrowingFn.access$300(ParDoLifecycleTest.java:356)
>   at 
> org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful(ParDoLifecycleTest.java:269)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (BEAM-14027) Java PreCommit flaky org.apache.beam.sdk.transforms/ParDoLifecycleTest#testTeardownCalledAfterExceptionInProcessElementStateful

2022-03-02 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-14027:


 Summary: Java PreCommit flaky 
org.apache.beam.sdk.transforms/ParDoLifecycleTest#testTeardownCalledAfterExceptionInProcessElementStateful
 Key: BEAM-14027
 URL: https://issues.apache.org/jira/browse/BEAM-14027
 Project: Beam
  Issue Type: Bug
  Components: test-failures
Reporter: Luke Cwik
Assignee: Reuven Lax


Looks like https://github.com/apache/beam/pull/16727 is using the static map 
concurrently, which is not thread safe.

Causing failures like:
{noformat}
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1469)
at java.util.HashMap$ValueIterator.next(HashMap.java:1498)
at java.util.AbstractCollection.toString(AbstractCollection.java:461)
at java.lang.String.valueOf(String.java:2994)
at java.lang.StringBuilder.append(StringBuilder.java:131)
at 
org.apache.beam.sdk.transforms.ParDoLifecycleTest$ExceptionThrowingFn.validate(ParDoLifecycleTest.java:392)
at 
org.apache.beam.sdk.transforms.ParDoLifecycleTest$ExceptionThrowingFn.access$300(ParDoLifecycleTest.java:356)
at 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful(ParDoLifecycleTest.java:269)
{noformat}

See failure in:
https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/21291/testReport/junit/org.apache.beam.sdk.transforms/ParDoLifecycleTest/testTeardownCalledAfterExceptionInProcessElementStateful/




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-12756) Negative value being returned for progress when using OffsetRangeTracker

2022-03-01 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-12756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-12756:
-
Fix Version/s: Not applicable
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Negative value being returned for progress when using OffsetRangeTracker
> 
>
> Key: BEAM-12756
> URL: https://issues.apache.org/jira/browse/BEAM-12756
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-java-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P3
> Fix For: Not applicable
>
> Attachments: job_logs.png
>
>
> range.getFrom() returns 1 and range.getTo() returns 9223372036854775807 
> (Long.MAX_VALUE).  In the returned Progress object, workCompleted is 0.0 and 
> workRemaining is 9.223372036854776E18
> Somehow, in the Dataflow backend, it becomes -9223372036854775808 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-12756) Negative value being returned for progress when using OffsetRangeTracker

2022-03-01 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-12756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499626#comment-17499626
 ] 

Luke Cwik commented on BEAM-12756:
--

This was an internal Dataflow bug: a conversion of a large double value needed 
to be a saturated cast rather than a static cast, since in C++ a cast from a 
double value beyond the bounds of an int64 returns the smallest negative 
integer.
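
As a rough illustration (the actual fix is in C++ backend code, not shown here), a saturating conversion clamps the value instead of letting the cast produce the smallest negative integer:

{noformat}
// Illustrative sketch only. In C++, casting a double beyond the int64 range is
// what produced the negative value above; a saturating conversion clamps to the
// representable range instead.
class SaturatedCast {
  static long doubleToLongSaturated(double value) {
    if (Double.isNaN(value)) {
      return 0L;
    }
    if (value >= (double) Long.MAX_VALUE) {
      return Long.MAX_VALUE;
    }
    if (value <= (double) Long.MIN_VALUE) {
      return Long.MIN_VALUE;
    }
    return (long) value;
  }

  public static void main(String[] args) {
    // workRemaining from the issue above.
    System.out.println(doubleToLongSaturated(9.223372036854776E18)); // 9223372036854775807
  }
}
{noformat}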

> Negative value being returned for progress when using OffsetRangeTracker
> 
>
> Key: BEAM-12756
> URL: https://issues.apache.org/jira/browse/BEAM-12756
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-java-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P3
> Attachments: job_logs.png
>
>
> range.getFrom() returns 1 and range.getTo() returns 9223372036854775807 
> (Long.MAX_VALUE).  In the returned Progress object, workCompleted is 0.0 and 
> workRemaining is 9.223372036854776E18
> Somehow, in the Dataflow backend, it becomes -9223372036854775808 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (BEAM-12756) Negative value being returned for progress when using OffsetRangeTracker

2022-02-28 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-12756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reassigned BEAM-12756:


Assignee: Luke Cwik

> Negative value being returned for progress when using OffsetRangeTracker
> 
>
> Key: BEAM-12756
> URL: https://issues.apache.org/jira/browse/BEAM-12756
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-java-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P3
> Attachments: job_logs.png
>
>
> range.getFrom() returns 1 and range.getTo() returns 9223372036854775807 
> (Long.MAX_VALUE).  In the returned Progress object, workCompleted is 0.0 and 
> workRemaining is 9.223372036854776E18
> Somehow, in the Dataflow backend, it becomes -9223372036854775808 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-14008) Java PreCommit failing at HEAD in sdks:java:fn-execution:analyzeClassesDepdencies

2022-02-28 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499144#comment-17499144
 ] 

Luke Cwik commented on BEAM-14008:
--

https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/4672 got past the 
failures.

> Java PreCommit failing at HEAD in 
> sdks:java:fn-execution:analyzeClassesDepdencies
> -
>
> Key: BEAM-14008
> URL: https://issues.apache.org/jira/browse/BEAM-14008
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core, test-failures
>Reporter: Brian Hulette
>Assignee: Luke Cwik
>Priority: P1
> Fix For: Not applicable
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> First failure: https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/5144/
> {code}
> 13:45:10 1: Task failed with an exception.
> 13:45:10 ---
> 13:45:10 * What went wrong:
> 13:45:10 Execution failed for task 
> ':sdks:java:fn-execution:analyzeClassesDependencies'.
> 13:45:10 > Dependency analysis found issues.
> 13:45:10   usedUndeclaredArtifacts
> 13:45:10- org.apache.avro:avro:1.8.2@jar
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13965) Polymorphic types not supported in PipelineOptions

2022-02-25 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-13965:
-
Fix Version/s: 2.38.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Polymorphic types not supported in PipelineOptions
> --
>
> Key: BEAM-13965
> URL: https://issues.apache.org/jira/browse/BEAM-13965
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Moritz Mack
>Assignee: Moritz Mack
>Priority: P2
> Fix For: 2.38.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I noticed that polymorphic types using @JsonTypeInfo and @JsonSubTypes are 
> currently not supported in pipeline options, as deserialization fails for lack 
> of the necessary type information. One has to provide a custom de/serializer 
> and handle things manually.
> It looks like the deserialization code path should just mirror the 
> serialization path.
>  
> {noformat}
> diff --git 
> a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java
>  
> b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java
> --- 
> a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java
> (revision 53f5a3c1756509b6fab75d3946a9f95247d02184)
> +++ 
> b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java
> (date 1645114275316)
> @@ -1725,7 +1725,8 @@
>  }
>}
>  
> -  private static JsonDeserializer 
> computeDeserializerForMethod(Method method) {
> +  private static Optional> 
> computeCustomDeserializerForMethod(
> +  Method method) {
>  try {
>BeanProperty prop = createBeanProperty(method);
>AnnotatedMember annotatedMethod = prop.getMember();
> @@ -1736,16 +1737,10 @@
>.getAnnotationIntrospector()
>.findDeserializer(annotatedMethod);
>  
> -  JsonDeserializer jsonDeserializer =
> +  return Optional.fromNullable(
>DESERIALIZATION_CONTEXT
>.get()
> -  .deserializerInstance(annotatedMethod, maybeDeserializerClass);
> -
> -  if (jsonDeserializer == null) {
> -jsonDeserializer =
> -
> DESERIALIZATION_CONTEXT.get().findContextualValueDeserializer(prop.getType(), 
> prop);
> -  }
> -  return jsonDeserializer;
> +  .deserializerInstance(annotatedMethod, 
> maybeDeserializerClass));
>  } catch (JsonMappingException e) {
>throw new RuntimeException(e);
>  }
> @@ -1771,11 +1766,12 @@
> * JsonDeserialize} the specified deserializer from the annotation is 
> returned, otherwise the
> * default is returned.
> */
> -  private static JsonDeserializer getDeserializerForMethod(Method 
> method) {
> +  private static @Nullable JsonDeserializer 
> getCustomDeserializerForMethod(Method method) {
>  return CACHE
>  .get()
>  .deserializerCache
> -.computeIfAbsent(method, 
> PipelineOptionsFactory::computeDeserializerForMethod);
> +.computeIfAbsent(method, 
> PipelineOptionsFactory::computeCustomDeserializerForMethod)
> +.orNull();
>}
>  
>/**
> @@ -1796,10 +1792,13 @@
>return null;
>  }
>  
> +JsonDeserializer jsonDeserializer = 
> getCustomDeserializerForMethod(method);
> +if (jsonDeserializer == null) {
> +  return DESERIALIZATION_CONTEXT.get().readTreeAsValue(node, 
> method.getReturnType());
> +}
> +
>  JsonParser parser = new TreeTraversingParser(node, MAPPER);
>  parser.nextToken();
> -
> -JsonDeserializer jsonDeserializer = 
> getDeserializerForMethod(method);
>  return jsonDeserializer.deserialize(parser, 
> DESERIALIZATION_CONTEXT.get());
>}
>  
> @@ -2055,7 +2054,8 @@
>  private final Map>, 
> Registration> combinedCache =
>  Maps.newConcurrentMap();
>  
> -private final Map> deserializerCache = 
> Maps.newConcurrentMap();
> +private final Map>> 
> deserializerCache =
> +Maps.newConcurrentMap();
>  
>  private final Map>> 
> serializerCache =
>  Maps.newConcurrentMap();
> {noformat}
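
For illustration, a sketch (hypothetical option and type names) of the kind of polymorphic property this is about; honoring the annotations on the read path removes the need for a hand-written deserializer:

{noformat}
// Hypothetical sketch of a polymorphic PipelineOptions property. The Jackson
// annotations on the base type record the concrete subtype on serialization;
// the change above is about honoring them on the deserialization path as well.
import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;
import org.apache.beam.sdk.options.PipelineOptions;

public interface ShapeOptions extends PipelineOptions {

  @JsonTypeInfo(use = JsonTypeInfo.Id.NAME, include = JsonTypeInfo.As.PROPERTY, property = "type")
  @JsonSubTypes({
    @JsonSubTypes.Type(value = Circle.class, name = "circle"),
    @JsonSubTypes.Type(value = Square.class, name = "square")
  })
  abstract class Shape {}

  class Circle extends Shape {
    public double radius;
  }

  class Square extends Shape {
    public double side;
  }

  Shape getShape();

  void setShape(Shape value);
}
{noformat}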



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13930) Address StateSpec inconsistency between Runner and Fn API

2022-02-14 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-13930:
-
Resolution: Fixed
Status: Resolved  (was: Open)

> Address StateSpec inconsistency between Runner and Fn API
> -
>
> Key: BEAM-13930
> URL: https://issues.apache.org/jira/browse/BEAM-13930
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
> Fix For: 2.37.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The ability to mix and match runners and SDKs is accomplished through two 
> portability layers:
> 1. The Runner API provides an SDK-and-runner-independent definition of a Beam 
> pipeline
> 2. The Fn API allows a runner to invoke SDK-specific user-defined functions
> Apache Beam pipelines support executing stateful DoFns[1]. To support this 
> execution the Runner API defines multiple user state specifications:
> * ReadModifyWriteStateSpec
> * BagStateSpec
> * OrderedListStateSpec
> * CombiningStateSpec
> * MapStateSpec
> * SetStateSpec
> The Fn API[2] defines APIs[3] to get, append and clear user state currently 
> supporting a BagUserState and MultimapUserState protocol.
> Since there is no clear mapping between the Runner API and Fn API state 
> specifications, there is no way for a runner to know that it supports a given 
> API necessary to support the execution of the pipeline. The Runner will also 
> have to manage additional runtime metadata associated with which protocol was 
> used for a type of state so that it can successfully manage the state’s 
> lifetime once it can be garbage collected.
> Please see the doc[4] for further details and a proposal on how to address 
> this shortcoming.
> 1: https://beam.apache.org/blog/stateful-processing/
> 2: 
> https://github.com/apache/beam/blob/3ad05523f4cdf5122fc319276fcb461f768af39d/model/fn-execution/src/main/proto/beam_fn_api.proto#L742
> 3: https://s.apache.org/beam-fn-state-api-and-bundle-processing
> 4: http://doc/1ELKTuRTV3C5jt_YoBBwPdsPa5eoXCCOSKQ3GPzZrK7Q



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13930) Address StateSpec inconsistency between Runner and Fn API

2022-02-14 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-13930:
-
Fix Version/s: 2.37.0

> Address StateSpec inconsistency between Runner and Fn API
> -
>
> Key: BEAM-13930
> URL: https://issues.apache.org/jira/browse/BEAM-13930
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
> Fix For: 2.37.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The ability to mix and match runners and SDKs is accomplished through two 
> portability layers:
> 1. The Runner API provides an SDK-and-runner-independent definition of a Beam 
> pipeline
> 2. The Fn API allows a runner to invoke SDK-specific user-defined functions
> Apache Beam pipelines support executing stateful DoFns[1]. To support this 
> execution the Runner API defines multiple user state specifications:
> * ReadModifyWriteStateSpec
> * BagStateSpec
> * OrderedListStateSpec
> * CombiningStateSpec
> * MapStateSpec
> * SetStateSpec
> The Fn API[2] defines APIs[3] to get, append and clear user state currently 
> supporting a BagUserState and MultimapUserState protocol.
> Since there is no clear mapping between the Runner API and Fn API state 
> specifications, there is no way for a runner to know that it supports a given 
> API necessary to support the execution of the pipeline. The Runner will also 
> have to manage additional runtime metadata associated with which protocol was 
> used for a type of state so that it can successfully manage the state’s 
> lifetime once it can be garbage collected.
> Please see the doc[4] for further details and a proposal on how to address 
> this shortcoming.
> 1: https://beam.apache.org/blog/stateful-processing/
> 2: 
> https://github.com/apache/beam/blob/3ad05523f4cdf5122fc319276fcb461f768af39d/model/fn-execution/src/main/proto/beam_fn_api.proto#L742
> 3: https://s.apache.org/beam-fn-state-api-and-bundle-processing
> 4: http://doc/1ELKTuRTV3C5jt_YoBBwPdsPa5eoXCCOSKQ3GPzZrK7Q



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (BEAM-13930) Address StateSpec inconsistency between Runner and Fn API

2022-02-11 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-13930:


 Summary: Address StateSpec inconsistency between Runner and Fn API
 Key: BEAM-13930
 URL: https://issues.apache.org/jira/browse/BEAM-13930
 Project: Beam
  Issue Type: Improvement
  Components: beam-model, sdk-java-core, sdk-py-core
Reporter: Luke Cwik
Assignee: Luke Cwik


The ability to mix and match runners and SDKs is accomplished through two 
portability layers:
1. The Runner API provides an SDK-and-runner-independent definition of a Beam 
pipeline
2. The Fn API allows a runner to invoke SDK-specific user-defined functions

Apache Beam pipelines support executing stateful DoFns[1]. To support this 
execution the Runner API defines multiple user state specifications:
* ReadModifyWriteStateSpec
* BagStateSpec
* OrderedListStateSpec
* CombiningStateSpec
* MapStateSpec
* SetStateSpec

The Fn API[2] defines APIs[3] to get, append and clear user state currently 
supporting a BagUserState and MultimapUserState protocol.

Since there is no clear mapping between the Runner API and Fn API state 
specifications, there is no way for a runner to know that it supports a given 
API necessary to support the execution of the pipeline. The Runner will also 
have to manage additional runtime metadata associated with which protocol was 
used for a type of state so that it can successfully manage the state’s 
lifetime once it can be garbage collected.

Please see the doc[4] for further details and a proposal on how to address this 
shortcoming.

1: https://beam.apache.org/blog/stateful-processing/
2: 
https://github.com/apache/beam/blob/3ad05523f4cdf5122fc319276fcb461f768af39d/model/fn-execution/src/main/proto/beam_fn_api.proto#L742
3: https://s.apache.org/beam-fn-state-api-and-bundle-processing
4: http://doc/1ELKTuRTV3C5jt_YoBBwPdsPa5eoXCCOSKQ3GPzZrK7Q
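
For illustration, a minimal sketch (hypothetical DoFn) of the user state declarations involved; ValueState corresponds to ReadModifyWriteStateSpec in the model, and the question above is which Fn API protocol backs each spec:

{noformat}
// Minimal sketch of user state declarations in a stateful DoFn. Each StateSpec
// below corresponds to one of the Runner API state specifications listed above;
// the Fn API side currently only distinguishes the BagUserState and
// MultimapUserState protocols.
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.coders.VarLongCoder;
import org.apache.beam.sdk.state.BagState;
import org.apache.beam.sdk.state.MapState;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.ValueState;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

class StatefulCountingFn extends DoFn<KV<String, Long>, Void> {
  @StateId("lastSeen")
  private final StateSpec<ValueState<Long>> lastSeenSpec = StateSpecs.value(VarLongCoder.of());

  @StateId("buffer")
  private final StateSpec<BagState<Long>> bufferSpec = StateSpecs.bag(VarLongCoder.of());

  @StateId("counts")
  private final StateSpec<MapState<String, Long>> countsSpec =
      StateSpecs.map(StringUtf8Coder.of(), VarLongCoder.of());

  @ProcessElement
  public void process(
      ProcessContext c,
      @StateId("lastSeen") ValueState<Long> lastSeen,
      @StateId("buffer") BagState<Long> buffer) {
    buffer.add(c.element().getValue());
    lastSeen.write(c.element().getValue());
  }
}
{noformat}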




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13686) OOM while logging a large pipeline even when logging level is higher

2022-02-04 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-13686:
-
Fix Version/s: 2.37.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> OOM while logging a large pipeline even when logging level is higher
> 
>
> Key: BEAM-13686
> URL: https://issues.apache.org/jira/browse/BEAM-13686
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.35.0
>Reporter: Vitaly Ivanov
>Assignee: Vitaly Ivanov
>Priority: P1
> Fix For: 2.37.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The logging message is calculated even when the logging level is higher, so 
> increasing the logging level is not a workaround. 
> {noformat}
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>     at java.base/java.util.Arrays.copyOf(Arrays.java:3745)
>     at 
> java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:172)
>     at 
> java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:538)
>     at java.base/java.lang.StringBuilder.append(StringBuilder.java:174)
>     at java.base/java.lang.StringBuilder.append(StringBuilder.java:85)
>     at 
> java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:573)
>     at java.base/java.lang.StringBuilder.append(StringBuilder.java:204)
>     at java.base/java.lang.StringBuilder.append(StringBuilder.java:85)
>     at 
> org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat$TextGenerator.print(TextFormat.java:860)
>     at 
> org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:574)
>     at 
> org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:743)
>     at 
> org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:443)
>     at 
> org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat$Printer.printMessage(TextFormat.java:705)
>     at 
> org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat$Printer.print(TextFormat.java:353)
>     at 
> org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:597)
>     at 
> org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:743)
>     at 
> org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:443)
>     at 
> org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat$Printer.printMessage(TextFormat.java:705)
>     at 
> org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat$Printer.print(TextFormat.java:353)
>     at 
> org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:597)
>     at 
> org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:743)
>     at 
> org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:435)
>     at 
> org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat$Printer.printMessage(TextFormat.java:705)
>     at 
> org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat$Printer.print(TextFormat.java:353)
>     at 
> org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:597)
>     at 
> org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:743)
>     at 
> org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:443)
>     at 
> org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat$Printer.printMessage(TextFormat.java:705)
>     at 
> org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat$Printer.print(TextFormat.java:353)
>     at 
> org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat$Printer.print(TextFormat.java:339)
>     at 
> org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat$Printer.printToString(TextFormat.java:606)
>     at 
> org.apache.beam.vendor.grpc.v1p36p0.com.google.protobuf.TextFormat.printToString(TextFormat.java:149){noformat}
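
The usual mitigation, sketched below with hypothetical class names and assuming an SLF4J logger: guard the expensive rendering with an explicit level check so the string is never built when the level is disabled.

{noformat}
// Illustrative sketch only: guard the expensive rendering of a large message
// (e.g. the TextFormat dump of a pipeline proto) behind an explicit level check
// so that nothing is allocated when DEBUG is disabled.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class PipelineLogging {
  private static final Logger LOG = LoggerFactory.getLogger(PipelineLogging.class);

  static void logLargeMessage(Object pipelineProto) {
    if (LOG.isDebugEnabled()) {
      // toString() here stands in for the expensive TextFormat rendering.
      LOG.debug("Pipeline proto:\n{}", pipelineProto.toString());
    }
  }
}
{noformat}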



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-13814) Run errorprone with Java 11 (required in 2.11.0 and newer)

2022-02-03 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-13814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17486779#comment-17486779
 ] 

Luke Cwik commented on BEAM-13814:
--

We could have a Java 11 specific PreCommit.

Also, unless someone is using Java 11+, they won't be able to reproduce locally, 
only via a PR, so having a dedicated PreCommit might make sense for speed 
reasons.

> Run errorprone with Java 11 (required in 2.11.0 and newer)
> --
>
> Key: BEAM-13814
> URL: https://issues.apache.org/jira/browse/BEAM-13814
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Brian Hulette
>Priority: P2
>
> errorprone 2.11.0 only supports JDK 11 (but can still validate Java 8 
> classes). We should find a way to upgrade errorprone while still supporting 
> Java 8. 
> Perhaps we could modify BeamModulePlugin to disable errorprone when building 
> with Java 8, and make sure that there's a Java 11 PreCommit that will run it?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (BEAM-13801) Add standard_coders.yaml cases for beam:coder:state_backed_iterable:v1

2022-02-02 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reassigned BEAM-13801:


Assignee: (was: Luke Cwik)

> Add standard_coders.yaml cases for beam:coder:state_backed_iterable:v1
> --
>
> Key: BEAM-13801
> URL: https://issues.apache.org/jira/browse/BEAM-13801
> Project: Beam
>  Issue Type: Test
>  Components: beam-model, sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Luke Cwik
>Priority: P2
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> We lack coverage for beam:coder:state_backed_iterable:v1 in 
> standard_coders.yaml



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-13801) Add standard_coders.yaml cases for beam:coder:state_backed_iterable:v1

2022-02-02 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-13801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17486004#comment-17486004
 ] 

Luke Cwik commented on BEAM-13801:
--

Java and Python are now covered. Follow up with coverage in Go.

> Add standard_coders.yaml cases for beam:coder:state_backed_iterable:v1
> --
>
> Key: BEAM-13801
> URL: https://issues.apache.org/jira/browse/BEAM-13801
> Project: Beam
>  Issue Type: Test
>  Components: beam-model, sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> We lack coverage for beam:coder:state_backed_iterable:v1 in 
> standard_coders.yaml



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (BEAM-13801) Add standard_coders.yaml cases for beam:coder:state_backed_iterable:v1

2022-02-01 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-13801:


 Summary: Add standard_coders.yaml cases for 
beam:coder:state_backed_iterable:v1
 Key: BEAM-13801
 URL: https://issues.apache.org/jira/browse/BEAM-13801
 Project: Beam
  Issue Type: Test
  Components: beam-model, sdk-go, sdk-java-harness, sdk-py-harness
Reporter: Luke Cwik
Assignee: Luke Cwik


We lack coverage for beam:coder:state_backed_iterable:v1 in standard_coders.yaml



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13777) confluent schema registry cache capacity

2022-01-31 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-13777:
-
Fix Version/s: 2.37.0
   Resolution: Fixed
   Status: Resolved  (was: In Progress)

> confluent schema registry cache capacity
> 
>
> Key: BEAM-13777
> URL: https://issues.apache.org/jira/browse/BEAM-13777
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Mostafa Aghajani
>Assignee: Mostafa Aghajani
>Priority: P2
> Fix For: 2.37.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The cache capacity should be specified as an input parameter instead of 
> defaulting to Integer.MAX_VALUE. Usage varies quite a bit case by case, and a 
> default of Integer.MAX_VALUE can lead to an error like this depending on the setup:
> {{Exception in thread "main" java.lang.OutOfMemoryError: Java heap space}}
> Some documentation on the parameter: 
> [https://docs.confluent.io/5.4.2/clients/confluent-kafka-dotnet/api/Confluent.SchemaRegistry.CachedSchemaRegistryClient.html#Confluent_SchemaRegistry_CachedSchemaRegistryClient_DefaultMaxCachedSchemas]
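
For reference, a sketch with a hypothetical registry URL and capacity; the Confluent client accepts the cache capacity as a constructor argument, which is the value this change makes configurable instead of defaulting to Integer.MAX_VALUE:

{noformat}
// Sketch only (hypothetical registry URL and capacity): the Confluent client
// takes an explicit cache capacity, which is the knob the issue asks Beam to
// expose instead of hard-coding Integer.MAX_VALUE.
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;

class SchemaRegistryClients {
  static SchemaRegistryClient boundedClient() {
    // Cache at most 1000 schemas instead of Integer.MAX_VALUE.
    return new CachedSchemaRegistryClient("http://schema-registry:8081", 1000);
  }
}
{noformat}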



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-13353) KafkaIO should raise an error if both .withReadCommitted() and .commitOffsetsInFinalize() are used

2022-01-18 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-13353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17478281#comment-17478281
 ] 

Luke Cwik commented on BEAM-13353:
--

Added details to the description.

> KafkaIO should raise an error if both .withReadCommitted() and 
> .commitOffsetsInFinalize() are used
> --
>
> Key: BEAM-13353
> URL: https://issues.apache.org/jira/browse/BEAM-13353
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>
> Read committed tells KafkaIO to only read messages that are already committed, 
> which means that committing offsets in finalize is a no-op.
> Users should be using one or the other; this is similar to making sure that 
> either auto commit is enabled in the reader or KafkaIO does the committing, 
> which we already have error checking for.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13353) KafkaIO should raise an error if both .withReadCommitted() and .commitOffsetsInFinalize() are used

2022-01-18 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-13353:
-
Description: 
Read committed tells KafkaIO to only read messages that are already committed, 
which means that committing offsets in finalize is a no-op.

Users should be using one or the other; this is similar to making sure that 
either auto commit is enabled in the reader or KafkaIO does the committing, 
which we already have error checking for.
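
For illustration, a sketch (hypothetical broker and topic names) of the combination that should be rejected at construction time rather than accepted silently:

{noformat}
// Sketch of the configuration this issue is about: withReadCommitted() restricts
// reads to already-committed records, so commitOffsetsInFinalize() adds no value
// and should raise an error instead of being accepted.
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.common.serialization.StringDeserializer;

class ReadFromKafka {
  static KafkaIO.Read<String, String> conflictingRead() {
    return KafkaIO.<String, String>read()
        .withBootstrapServers("broker:9092") // hypothetical broker address
        .withTopic("events")                 // hypothetical topic
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        .withReadCommitted()
        .commitOffsetsInFinalize();
  }
}
{noformat}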

> KafkaIO should raise an error if both .withReadCommitted() and 
> .commitOffsetsInFinalize() are used
> --
>
> Key: BEAM-13353
> URL: https://issues.apache.org/jira/browse/BEAM-13353
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>
> Read committed tells KafkaIO to only read messages that are already committed, 
> which means that committing offsets in finalize is a no-op.
> Users should be using one or the other; this is similar to making sure that 
> either auto commit is enabled in the reader or KafkaIO does the committing, 
> which we already have error checking for.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

