Build failed in Jenkins: beam_PostCommit_Python_Verify #6210

2018-10-09 Thread Apache Jenkins Server
See 


--
[...truncated 1.30 MB...]
raise RuntimeError('x')
RuntimeError: x

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File 
"
 line 131, in _execute
response = task()
  File 
"
 line 166, in 
self._execute(lambda: worker.do_instruction(work), work)
  File 
"
 line 212, in do_instruction
request.instruction_id)
  File 
"
 line 234, in process_bundle
processor.process_bundle(instruction_id)
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
  File 
"
 line 349, in process_instruction_id
yield
  File 
"
 line 234, in process_bundle
processor.process_bundle(instruction_id)
  File 
"
 line 419, in process_bundle
].process_encoded(data.data)
  File 
"
 line 124, in process_encoded
self.output(decoded_value)
  File 
"
 line 168, in output
cython.cast(Receiver, self.receivers[output_index]).receive(windowed_value)
  File 
"
 line 88, in receive
cython.cast(Operation, consumer).process(windowed_value)
  File 
"
 line 269, in process
self.output(windowed_value)
  File 
"
 line 168, in output
cython.cast(Receiver, self.receivers[output_index]).receive(windowed_value)
  File 
"
 line 88, in receive
cython.cast(Operation, consumer).process(windowed_value)
  File 
"
 line 424, in process
self.dofn_receiver.receive(o)
  File 
"
 line 673, in receive
self.process(windowed_value)
  File 
"
 line 679, in process
self._reraise_augmented(exn)
  File 
"
 line 677, in process
self.do_fn_invoker.invoke_process(windowed_value)
  File 
"
 line 414, in invoke_process
windowed_value, self.process_method(windowed_value.value))
  File 
"
 line 787, in process_outputs
self.main_receivers.receive(windowed_value)
  File 
"
 line 88, in receive
cython.cast(Operation, consumer).process(windowed_value)
  File 
"
 line 424, in process
self.dofn_receiver.receive(o)
  File 
"
 line 673, in receive
self.process(windowed_value)
  File 
"
 line 679, in process
self._reraise_augmen
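
The interleaved "During handling of the above exception, another exception occurred" sections in the log above are Python's implicit exception chaining; a minimal standalone reproduction (plain Python, unrelated to the Beam worker code):

```python
def handle():
    try:
        raise RuntimeError('x')
    except RuntimeError:
        # Raising inside an except block implicitly chains the new
        # exception to the original via __context__, which is what
        # produces the "During handling of the above exception,
        # another exception occurred" section in a traceback.
        raise ValueError('y')

try:
    handle()
except ValueError as e:
    assert isinstance(e.__context__, RuntimeError)
```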

[jira] [Work logged] (BEAM-3655) Port MaxPerKeyExamplesTest off DoFnTester

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3655?focusedWorklogId=152550&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152550
 ]

ASF GitHub Bot logged work on BEAM-3655:


Author: ASF GitHub Bot
Created on: 09/Oct/18 07:04
Start Date: 09/Oct/18 07:04
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on a change in pull request 
#6542: [BEAM-3655] Port MaxPerKeyExamplesTest off DoFnTester
URL: https://github.com/apache/beam/pull/6542#discussion_r223580557
 
 

 ##
 File path: 
examples/java/src/test/java/org/apache/beam/examples/cookbook/MaxPerKeyExamplesTest.java
 ##
 @@ -66,22 +71,27 @@
   private static final TableRow resultRow1 =
       new TableRow().set("month", 6).set("max_mean_temp", 85.3);
   private static final TableRow resultRow2 =
+      new TableRow().set("month", 6).set("max_mean_temp", 45.3);
+  private static final TableRow resultRow3 =
       new TableRow().set("month", 7).set("max_mean_temp", 75.4);
 
+  @Rule public TestPipeline p = TestPipeline.create();
+
   @Test
-  public void testExtractTempFn() throws Exception {
-    DoFnTester<TableRow, KV<Integer, Double>> extractTempFn =
-        DoFnTester.of(new ExtractTempFn());
-    List<KV<Integer, Double>> results = extractTempFn.processBundle(TEST_ROWS);
-    Assert.assertThat(results, CoreMatchers.hasItem(kv1));
-    Assert.assertThat(results, CoreMatchers.hasItem(kv2));
-    Assert.assertThat(results, CoreMatchers.hasItem(kv3));
+  @Category(ValidatesRunner.class)
 
 Review comment:
   I think it's ok for now


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152550)
Time Spent: 40m  (was: 0.5h)

> Port MaxPerKeyExamplesTest off DoFnTester
> -
>
> Key: BEAM-3655
> URL: https://issues.apache.org/jira/browse/BEAM-3655
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-java
>Reporter: Kenneth Knowles
>Assignee: Aleksandr Kokhaniukov
>Priority: Major
>  Labels: beginner, newbie, starter
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] branch master updated (e05dcef -> 372afba)

2018-10-09 Thread aromanenko
This is an automated email from the ASF dual-hosted git repository.

aromanenko pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from e05dcef  Merge pull request #6551: [BEAM-5613] Snapshot of Python 
dependency and add it to nightly snapshot job
 add 3d8d0b8  [BEAM-3655] Port MaxPerKeyExamplesTest off DoFnTester
 add 372afba  Merge pull request #6542: [BEAM-3655] Port 
MaxPerKeyExamplesTest off DoFnTester

No new revisions were added by this update.

Summary of changes:
 .../examples/cookbook/MaxPerKeyExamplesTest.java   | 38 ++
 1 file changed, 24 insertions(+), 14 deletions(-)



[jira] [Work logged] (BEAM-3655) Port MaxPerKeyExamplesTest off DoFnTester

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3655?focusedWorklogId=152552&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152552
 ]

ASF GitHub Bot logged work on BEAM-3655:


Author: ASF GitHub Bot
Created on: 09/Oct/18 07:06
Start Date: 09/Oct/18 07:06
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev closed pull request #6542: [BEAM-3655] 
Port MaxPerKeyExamplesTest off DoFnTester
URL: https://github.com/apache/beam/pull/6542
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git 
a/examples/java/src/test/java/org/apache/beam/examples/cookbook/MaxPerKeyExamplesTest.java
 
b/examples/java/src/test/java/org/apache/beam/examples/cookbook/MaxPerKeyExamplesTest.java
index 93e969c0fdb..2a9c31f3d66 100644
--- 
a/examples/java/src/test/java/org/apache/beam/examples/cookbook/MaxPerKeyExamplesTest.java
+++ 
b/examples/java/src/test/java/org/apache/beam/examples/cookbook/MaxPerKeyExamplesTest.java
@@ -22,11 +22,16 @@
 import java.util.List;
 import org.apache.beam.examples.cookbook.MaxPerKeyExamples.ExtractTempFn;
 import org.apache.beam.examples.cookbook.MaxPerKeyExamples.FormatMaxesFn;
-import org.apache.beam.sdk.transforms.DoFnTester;
+import org.apache.beam.sdk.testing.PAssert;
+import org.apache.beam.sdk.testing.TestPipeline;
+import org.apache.beam.sdk.testing.ValidatesRunner;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.ParDo;
 import org.apache.beam.sdk.values.KV;
-import org.hamcrest.CoreMatchers;
-import org.junit.Assert;
+import org.apache.beam.sdk.values.PCollection;
+import org.junit.Rule;
 import org.junit.Test;
+import org.junit.experimental.categories.Category;
 import org.junit.runner.RunWith;
 import org.junit.runners.JUnit4;
 
 @@ -66,22 +71,27 @@
   private static final TableRow resultRow1 =
       new TableRow().set("month", 6).set("max_mean_temp", 85.3);
   private static final TableRow resultRow2 =
+      new TableRow().set("month", 6).set("max_mean_temp", 45.3);
+  private static final TableRow resultRow3 =
       new TableRow().set("month", 7).set("max_mean_temp", 75.4);
 
+  @Rule public TestPipeline p = TestPipeline.create();
+
   @Test
-  public void testExtractTempFn() throws Exception {
-    DoFnTester<TableRow, KV<Integer, Double>> extractTempFn =
-        DoFnTester.of(new ExtractTempFn());
-    List<KV<Integer, Double>> results = extractTempFn.processBundle(TEST_ROWS);
-    Assert.assertThat(results, CoreMatchers.hasItem(kv1));
-    Assert.assertThat(results, CoreMatchers.hasItem(kv2));
-    Assert.assertThat(results, CoreMatchers.hasItem(kv3));
+  @Category(ValidatesRunner.class)
+  public void testExtractTempFn() {
+    PCollection<KV<Integer, Double>> results =
+        p.apply(Create.of(TEST_ROWS)).apply(ParDo.of(new ExtractTempFn()));
+    PAssert.that(results).containsInAnyOrder(ImmutableList.of(kv1, kv2, kv3));
+    p.run().waitUntilFinish();
   }
 
   @Test
-  public void testFormatMaxesFn() throws Exception {
-    DoFnTester<KV<Integer, Double>, TableRow> formatMaxesFnFn =
-        DoFnTester.of(new FormatMaxesFn());
-    List<TableRow> results = formatMaxesFnFn.processBundle(TEST_KVS);
-    Assert.assertThat(results, CoreMatchers.hasItem(resultRow1));
-    Assert.assertThat(results, CoreMatchers.hasItem(resultRow2));
+  @Category(ValidatesRunner.class)
+  public void testFormatMaxesFn() {
+    PCollection<TableRow> results =
+        p.apply(Create.of(TEST_KVS)).apply(ParDo.of(new FormatMaxesFn()));
+    PAssert.that(results).containsInAnyOrder(resultRow1, resultRow2, resultRow3);
+    p.run().waitUntilFinish();
   }
 }


 




Issue Time Tracking
---

Worklog Id: (was: 152552)
Time Spent: 50m  (was: 40m)

> Port MaxPerKeyExamplesTest off DoFnTester
> -
>
> Key: BEAM-3655
> URL: https://issues.apache.org/jira/browse/BEAM-3655
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-java
>Reporter: Kenneth Knowles
>Assignee: Aleksandr Kokhaniukov
>Priority: Major
>  Labels: beginner, newbie, starter
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






Build failed in Jenkins: beam_PostCommit_Website_Publish #142

2018-10-09 Thread Apache Jenkins Server
See 


Changes:

[alexander.kohanyukov] [BEAM-3655] Port MaxPerKeyExamplesTest off DoFnTester

--
[...truncated 7.78 KB...]
:buildSrc:assemble (Thread[Task worker for ':buildSrc' Thread 3,5,main]) 
started.

> Task :buildSrc:assemble
Skipping task ':buildSrc:assemble' as it has no actions.
:buildSrc:assemble (Thread[Task worker for ':buildSrc' Thread 3,5,main]) 
completed. Took 0.0 secs.
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 3,5,main]) 
started.

> Task :buildSrc:spotlessGroovy
file or directory 
'
 not found
file or directory 
'
 not found
file or directory 
'
 not found
Caching disabled for task ':buildSrc:spotlessGroovy': Caching has not been 
enabled for the task
Task ':buildSrc:spotlessGroovy' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovy'.
file or directory 
'
 not found
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 3,5,main]) 
completed. Took 1.383 secs.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) started.

> Task :buildSrc:spotlessGroovyCheck
Skipping task ':buildSrc:spotlessGroovyCheck' as it has no actions.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) started.

> Task :buildSrc:spotlessGroovyGradle
Caching disabled for task ':buildSrc:spotlessGroovyGradle': Caching has not 
been enabled for the task
Task ':buildSrc:spotlessGroovyGradle' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovyGradle'.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) completed. Took 0.023 secs.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) started.

> Task :buildSrc:spotlessGroovyGradleCheck
Skipping task ':buildSrc:spotlessGroovyGradleCheck' as it has no actions.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 3,5,main]) 
started.

> Task :buildSrc:spotlessCheck
Skipping task ':buildSrc:spotlessCheck' as it has no actions.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 3,5,main]) 
completed. Took 0.0 secs.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 3,5,main]) 
started.

> Task :buildSrc:compileTestJava NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestJava' as it has no source files and no 
previous output files.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 3,5,main]) 
completed. Took 0.001 secs.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) started.

> Task :buildSrc:compileTestGroovy NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestGroovy' as it has no source files and no 
previous output files.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) completed. Took 0.001 secs.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) started.

> Task :buildSrc:processTestResources NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:processTestResources' as it has no source files and no 
previous output files.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 
3,5,main]) completed. Took 0.001 secs.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 3,5,main]) 
started.

> Task :buildSrc:testClasses UP-TO-DATE
Skipping task ':buildSrc:testClasses' as it has no actions.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 3,5,main]) 
completed. Took 0.0 secs.
:buildSrc:test (Thread[Task worker for ':buildSrc' Thread 3,5,main]) started.

> Task :buildSrc:test NO-SOURCE
Skipping task ':buildSrc:test' as it has no source files and no previous output 
files.
:buildSrc:test (Thread[Task worker for ':bu

[jira] [Resolved] (BEAM-3655) Port MaxPerKeyExamplesTest off DoFnTester

2018-10-09 Thread Alexey Romanenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko resolved BEAM-3655.

   Resolution: Fixed
Fix Version/s: 2.8.0

> Port MaxPerKeyExamplesTest off DoFnTester
> -
>
> Key: BEAM-3655
> URL: https://issues.apache.org/jira/browse/BEAM-3655
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-java
>Reporter: Kenneth Knowles
>Assignee: Aleksandr Kokhaniukov
>Priority: Major
>  Labels: beginner, newbie, starter
> Fix For: 2.8.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-2887) Python SDK support for portable pipelines

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2887?focusedWorklogId=152564&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152564
 ]

ASF GitHub Bot logged work on BEAM-2887:


Author: ASF GitHub Bot
Created on: 09/Oct/18 07:21
Start Date: 09/Oct/18 07:21
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #6504: [BEAM-2887] Remove 
special FnApi version of wordcount.
URL: https://github.com/apache/beam/pull/6504#issuecomment-428087351
 
 
   Run Python Postcommit




Issue Time Tracking
---

Worklog Id: (was: 152564)
Time Spent: 2.5h  (was: 2h 20m)

> Python SDK support for portable pipelines
> -
>
> Key: BEAM-2887
> URL: https://issues.apache.org/jira/browse/BEAM-2887
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Henning Rohde
>Assignee: Ahmet Altay
>Priority: Major
>  Labels: portability
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-2887) Python SDK support for portable pipelines

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2887?focusedWorklogId=152563&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152563
 ]

ASF GitHub Bot logged work on BEAM-2887:


Author: ASF GitHub Bot
Created on: 09/Oct/18 07:21
Start Date: 09/Oct/18 07:21
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #6504: [BEAM-2887] Remove 
special FnApi version of wordcount.
URL: https://github.com/apache/beam/pull/6504#issuecomment-428087330
 
 
   Thanks. This test is passing regularly, but I'd really like to see an 
all-clean run before merging. 




Issue Time Tracking
---

Worklog Id: (was: 152563)
Time Spent: 2h 20m  (was: 2h 10m)

> Python SDK support for portable pipelines
> -
>
> Key: BEAM-2887
> URL: https://issues.apache.org/jira/browse/BEAM-2887
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Henning Rohde
>Assignee: Ahmet Altay
>Priority: Major
>  Labels: portability
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-4858) Clean up _BatchSizeEstimator in element-batching transform.

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4858?focusedWorklogId=152566&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152566
 ]

ASF GitHub Bot logged work on BEAM-4858:


Author: ASF GitHub Bot
Created on: 09/Oct/18 07:23
Start Date: 09/Oct/18 07:23
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #6375: 
[BEAM-4858] Clean up division in batch size estimator.
URL: https://github.com/apache/beam/pull/6375#discussion_r223584542
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util.py
 ##
 @@ -269,23 +270,62 @@ def record_time(self, batch_size):
     self._thin_data()
 
   def _thin_data(self):
-    sorted_data = sorted(self._data)
-    odd_one_out = [sorted_data[-1]] if len(sorted_data) % 2 == 1 else []
-    # Sort the pairs by how different they are.
-
-    def div_keys(kv1_kv2):
-      (x1, _), (x2, _) = kv1_kv2
-      return old_div(x2, x1)  # TODO(BEAM-4858)
-
-    pairs = sorted(zip(sorted_data[::2], sorted_data[1::2]),
-                   key=div_keys)
-    # Keep the top 1/3 most different pairs, average the top 2/3 most similar.
-    threshold = 2 * len(pairs) // 3
-    self._data = (
-        list(sum(pairs[threshold:], ()))
-        + [((x1 + x2) / 2.0, (t1 + t2) / 2.0)
-           for (x1, t1), (x2, t2) in pairs[:threshold]]
-        + odd_one_out)
+    # Make sure we don't change the parity of len(self._data)
+    # As it's used below to alternate jitter.
+    self._data.pop(random.randrange(len(self._data) // 4))
+    self._data.pop(random.randrange(len(self._data) // 2))
+
+  @staticmethod
+  def linear_regression_no_numpy(xs, ys):
+    # Least squares fit for y = a + bx over all points.
+    n = float(len(xs))
+    xbar = sum(xs) / n
+    ybar = sum(ys) / n
+    b = (sum([(x - xbar) * (y - ybar) for x, y in zip(xs, ys)])
+         / sum([(x - xbar)**2 for x in xs]))
+    a = ybar - b * xbar
+    return a, b
+
+  @staticmethod
+  def linear_regression_numpy(xs, ys):
+    # pylint: disable=wrong-import-order, wrong-import-position
+    import numpy as np
+    from numpy import sum
+    xs = np.asarray(xs, dtype=float)
+    ys = np.asarray(ys, dtype=float)
+
+    # First do a simple least squares fit for y = a + bx over all points.
+    b, a = np.polyfit(xs, ys, 1)
+
+    n = len(xs)
+    if n < 10:
+      return a, b
+    else:
+      # Refine this by throwing out outliers, according to Cook's distance.
+      # https://en.wikipedia.org/wiki/Cook%27s_distance
+      sum_x = sum(xs)
+      sum_x2 = sum(xs**2)
+      errs = a + b * xs - ys
+      s2 = sum(errs**2) / (n - 2)
+      if s2 == 0:
+        # It's an exact fit!
+        return a, b
+      h = (sum_x2 - 2 * sum_x * xs + n * xs**2) / (n * sum_x2 - sum_x**2)
+      cook_ds = 0.5 / s2 * errs**2 * (h / (1 - h)**2)
+
+      # Re-compute the regression, excluding those points with Cook's distance
+      # greater than 0.5, and weighting by the inverse of x to give a more
+      # stable y-intercept.
+      weight = (cook_ds <= 0.5) / xs
 
 Review comment:
   Smaller batches are more accurate predictors of the fixed cost. I'll update 
the comment. 




Issue Time Tracking
---

Worklog Id: (was: 152566)
Time Spent: 5h 10m  (was: 5h)

> Clean up _BatchSizeEstimator in element-batching transform.
> ---
>
> Key: BEAM-4858
> URL: https://issues.apache.org/jira/browse/BEAM-4858
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Robert Bradshaw
>Priority: Minor
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Beam Python 3 conversion [exposed|https://github.com/apache/beam/pull/5729] 
> non-trivial performance-sensitive logic in element-batching transform. Let's 
> take a look at 
> [util.py#L271|https://github.com/apache/beam/blob/e98ff7c96afa2f72b3a98426dc1e9a47224da5c8/sdks/python/apache_beam/transforms/util.py#L271].
>  
> Due to Python 2 language semantics, the result of {{x2 / x1}} will depend on 
> the type of the keys - whether they are integers or floats. 
> The keys of key-value pairs contained in {{self._data}} are added as integers 
> [here|https://github.com/apache/beam/blob/d2ac08da2dccce8930432fae1ec7c30953880b69/sdks/python/apache_beam/transforms/util.py#L260],
>  however, when we 'thin' the collected entries 
> [here|https://github.com/apache/beam/blob/d2ac08da2dccce8930432fae1ec7c30953880b6
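
The Python 2 vs Python 3 division difference the issue describes can be checked directly; a short illustration in plain Python 3, with hypothetical values not taken from the transform:

```python
# In Python 2, x2 / x1 was floor division when both operands were ints,
# so a key ratio like 3 / 2 silently truncated to 1. In Python 3, / is
# always true division and // is floor division.
x1, x2 = 2, 3
assert x2 / x1 == 1.5        # Python 3: true division, regardless of type
assert x2 // x1 == 1         # floor division, the old Python 2 int behavior
assert float(x2) / x1 == 1.5  # making either operand float gave 1.5 in both
```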

[jira] [Work logged] (BEAM-4858) Clean up _BatchSizeEstimator in element-batching transform.

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4858?focusedWorklogId=152569&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152569
 ]

ASF GitHub Bot logged work on BEAM-4858:


Author: ASF GitHub Bot
Created on: 09/Oct/18 07:25
Start Date: 09/Oct/18 07:25
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #6375: 
[BEAM-4858] Clean up division in batch size estimator.
URL: https://github.com/apache/beam/pull/6375#discussion_r223584542
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util.py
 ##
 @@ -269,23 +270,62 @@ def record_time(self, batch_size):
     self._thin_data()
 
   def _thin_data(self):
-    sorted_data = sorted(self._data)
-    odd_one_out = [sorted_data[-1]] if len(sorted_data) % 2 == 1 else []
-    # Sort the pairs by how different they are.
-
-    def div_keys(kv1_kv2):
-      (x1, _), (x2, _) = kv1_kv2
-      return old_div(x2, x1)  # TODO(BEAM-4858)
-
-    pairs = sorted(zip(sorted_data[::2], sorted_data[1::2]),
-                   key=div_keys)
-    # Keep the top 1/3 most different pairs, average the top 2/3 most similar.
-    threshold = 2 * len(pairs) // 3
-    self._data = (
-        list(sum(pairs[threshold:], ()))
-        + [((x1 + x2) / 2.0, (t1 + t2) / 2.0)
-           for (x1, t1), (x2, t2) in pairs[:threshold]]
-        + odd_one_out)
+    # Make sure we don't change the parity of len(self._data)
+    # As it's used below to alternate jitter.
+    self._data.pop(random.randrange(len(self._data) // 4))
+    self._data.pop(random.randrange(len(self._data) // 2))
+
+  @staticmethod
+  def linear_regression_no_numpy(xs, ys):
+    # Least squares fit for y = a + bx over all points.
+    n = float(len(xs))
+    xbar = sum(xs) / n
+    ybar = sum(ys) / n
+    b = (sum([(x - xbar) * (y - ybar) for x, y in zip(xs, ys)])
+         / sum([(x - xbar)**2 for x in xs]))
+    a = ybar - b * xbar
+    return a, b
+
+  @staticmethod
+  def linear_regression_numpy(xs, ys):
+    # pylint: disable=wrong-import-order, wrong-import-position
+    import numpy as np
+    from numpy import sum
+    xs = np.asarray(xs, dtype=float)
+    ys = np.asarray(ys, dtype=float)
+
+    # First do a simple least squares fit for y = a + bx over all points.
+    b, a = np.polyfit(xs, ys, 1)
+
+    n = len(xs)
+    if n < 10:
+      return a, b
+    else:
+      # Refine this by throwing out outliers, according to Cook's distance.
+      # https://en.wikipedia.org/wiki/Cook%27s_distance
+      sum_x = sum(xs)
+      sum_x2 = sum(xs**2)
+      errs = a + b * xs - ys
+      s2 = sum(errs**2) / (n - 2)
+      if s2 == 0:
+        # It's an exact fit!
+        return a, b
+      h = (sum_x2 - 2 * sum_x * xs + n * xs**2) / (n * sum_x2 - sum_x**2)
+      cook_ds = 0.5 / s2 * errs**2 * (h / (1 - h)**2)
+
+      # Re-compute the regression, excluding those points with Cook's distance
+      # greater than 0.5, and weighting by the inverse of x to give a more
+      # stable y-intercept.
+      weight = (cook_ds <= 0.5) / xs
 
 Review comment:
   Smaller batches are more accurate predictors of the fixed cost. I'll update 
the comment. 
   
   One way to think about this is that there is a fair amount of variance in 
processing elements. When one has y = (a+err_b) + (b+err_b)*x, a small err_b 
can greatly influence the prediction for a when x is large. 
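
The intercept-stability point can be seen numerically. A sketch with made-up data, using numpy's `polyfit` `w` parameter to weight residuals in the same spirit as the 1/x weighting in the PR (the specific numbers here are illustrative, not from Beam):

```python
import numpy as np

# Hypothetical data on the true line y = 2 + 3x, plus a large
# absolute error at the large-x point: a small relative slope
# error at x = 100 is huge in absolute terms.
xs = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])
ys = 2 + 3 * xs
ys[-1] += 500.0

# Unweighted fit: the large-x point drags both slope and intercept.
b_plain, a_plain = np.polyfit(xs, ys, 1)

# Weighting residuals by 1/x lets the small-x points, which pin down
# the fixed cost (intercept), dominate the fit.
b_w, a_w = np.polyfit(xs, ys, 1, w=1.0 / xs)

# The weighted intercept lands closer to the true fixed cost of 2.
assert abs(a_w - 2) < abs(a_plain - 2)
```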




Issue Time Tracking
---

Worklog Id: (was: 152569)
Time Spent: 5h 20m  (was: 5h 10m)

> Clean up _BatchSizeEstimator in element-batching transform.
> ---
>
> Key: BEAM-4858
> URL: https://issues.apache.org/jira/browse/BEAM-4858
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Robert Bradshaw
>Priority: Minor
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Beam Python 3 conversion [exposed|https://github.com/apache/beam/pull/5729] 
> non-trivial performance-sensitive logic in element-batching transform. Let's 
> take a look at 
> [util.py#L271|https://github.com/apache/beam/blob/e98ff7c96afa2f72b3a98426dc1e9a47224da5c8/sdks/python/apache_beam/transforms/util.py#L271].
>  
> Due to Python 2 language semantics, the result of {{x2 / x1}} will depend on 
> the type of the keys - whether they are integers or floats. 
> The keys of key-value pairs contained in {{self._data}} are added as integers 
> [here|https://github.com/apache/beam/blob

[jira] [Work logged] (BEAM-5634) Bring Dataflow Java Worker Code into Beam

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5634?focusedWorklogId=152580&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152580
 ]

ASF GitHub Bot logged work on BEAM-5634:


Author: ASF GitHub Bot
Created on: 09/Oct/18 07:37
Start Date: 09/Oct/18 07:37
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #6561: 
[BEAM-5634] Bring dataflow java worker code into beam
URL: https://github.com/apache/beam/pull/6561#discussion_r223588256
 
 

 ##
 File path: runners/google-cloud-dataflow-java/worker/build.gradle
 ##
 @@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**/
+// Apply BeamModulePlugin
+
+// Reuse project_root/buildSrc in this build.gradle file to reduce the
+// maintenance burden and simplify this file. See BeamModulePlugin for
+// documentation on default build tasks and properties that are enabled in
+// addition to natures that will be applied to worker.
+apply plugin: org.apache.beam.gradle.BeamModulePlugin
+
+group = "org.apache.beam.runners.dataflow"
+
+/**/
+// Apply Java nature with customized configurations
+
+// Set a specific version of 'com.google.apis:google-api-services-dataflow'
+// by adding -Pdataflow.version= in Gradle command. Otherwise,
+// 'google_clients_version' defined in BeamModulePlugin will be used as
+// default.
+def DATAFLOW_VERSION = "dataflow.version"
+
+// To build FnAPI or legacy worker.
+// Use -PisLegacyWorker in Gradle command if build legacy worker, otherwise,
+// FnAPI worker is considered as default.
+def is_legacy_worker = {
+  return project.hasProperty("isLegacyWorker")
+}
+
+// Get full dependency of 'com.google.apis:google-api-services-dataflow'
+def google_api_services_dataflow = project.hasProperty(DATAFLOW_VERSION) ? "com.google.apis:google-api-services-dataflow:" + getProperty(DATAFLOW_VERSION) : library.java.google_api_services_dataflow
+
+// Returns a string representing the relocated path to be used with the shadow
+// plugin when given a suffix such as "com.".
+def getWorkerRelocatedPath = { String suffix ->
+  return ("org.apache.beam.runners.dataflow.worker.repackaged."
+  + suffix)
+}
+
+// Following listed dependencies will be shaded only in fnapi worker, not
+// legacy worker
+def sdk_provided_dependencies = [
+  "org.apache.beam:beam-runners-google-cloud-dataflow-java:$version",
+  "org.apache.beam:beam-sdks-java-core:$version",
+  "org.apache.beam:beam-sdks-java-extensions-google-cloud-platform-core:$version",
+  "org.apache.beam:beam-sdks-java-io-google-cloud-platform:$version",
+  google_api_services_dataflow,
+  library.java.avro,
+  library.java.google_api_client,
+  library.java.google_http_client,
+  library.java.google_http_client_jackson,
+  library.java.jackson_annotations,
+  library.java.jackson_core,
+  library.java.jackson_databind,
+  library.java.joda_time,
+]
+
+// Exclude unneeded dependencies when building jar
+def excluded_dependencies = [
+  "com.google.auto.service:auto-service",  // Provided scope added from 
applyJavaNature
+  "com.google.auto.value:auto-value",  // Provided scope added from 
applyJavaNature
+  "org.codehaus.jackson:jackson-core-asl", // Exclude an old version of 
jackson-core-asl introduced by google-http-client-jackson
+  "org.objenesis:objenesis",   // Transitive dependency 
introduced from Beam
+  "org.tukaani:xz",// Transitive dependency 
introduced from Beam
+  library.java.commons_compress,   // Transitive dependency 
introduced from Beam
+  library.java.error_prone_annotations,// Provided scope added in 
worker
+  library.java.hamcrest_core,  // Test only
+  library.java.hamcrest_library,   // Test only
+  library.java.junit,  // Test only
+]
+
+applyJavaNature(validateShadowJar: false, shadowClosure: 
DEFAULT_SHADOW_CLOSURE << {
+  dependencies {
+

[jira] [Work logged] (BEAM-5634) Bring Dataflow Java Worker Code into Beam

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5634?focusedWorklogId=152579&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152579
 ]

ASF GitHub Bot logged work on BEAM-5634:


Author: ASF GitHub Bot
Created on: 09/Oct/18 07:37
Start Date: 09/Oct/18 07:37
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #6561: 
[BEAM-5634] Bring dataflow java worker code into beam
URL: https://github.com/apache/beam/pull/6561#discussion_r223588256
 
 

 ##
 File path: runners/google-cloud-dataflow-java/worker/build.gradle
 ##
 @@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**/
+// Apply BeamModulePlugin
+
+// Reuse project_root/buildSrc in this build.gradle file to reduce the
+// maintenance burden and simplify this file. See BeamModulePlugin for
+// documentation on default build tasks and properties that are enabled in
+// addition to natures that will be applied to worker.
+apply plugin: org.apache.beam.gradle.BeamModulePlugin
+
+group = "org.apache.beam.runners.dataflow"
+
+/**/
+// Apply Java nature with customized configurations
+
+// Set a specific version of 'com.google.apis:google-api-services-dataflow'
+// by adding -Pdataflow.version= in Gradle command. Otherwise,
+// 'google_clients_version' defined in BeamModulePlugin will be used as default.
+def DATAFLOW_VERSION = "dataflow.version"
+
+// To build the FnAPI or legacy worker.
+// Pass -PisLegacyWorker on the Gradle command line to build the legacy worker;
+// otherwise, the FnAPI worker is built by default.
+def is_legacy_worker = {
+  return project.hasProperty("isLegacyWorker")
+}
+
+// Get full dependency of 'com.google.apis:google-api-services-dataflow'
+def google_api_services_dataflow = project.hasProperty(DATAFLOW_VERSION) ? "com.google.apis:google-api-services-dataflow:" + getProperty(DATAFLOW_VERSION) : library.java.google_api_services_dataflow
+
+// Returns a string representing the relocated path to be used with the shadow
+// plugin when given a suffix such as "com.".
+def getWorkerRelocatedPath = { String suffix ->
+  return ("org.apache.beam.runners.dataflow.worker.repackaged."
+  + suffix)
+}
+
+// The following dependencies will be shaded only in the FnAPI worker, not the
+// legacy worker.
+def sdk_provided_dependencies = [
+  "org.apache.beam:beam-runners-google-cloud-dataflow-java:$version",
+  "org.apache.beam:beam-sdks-java-core:$version",
+  "org.apache.beam:beam-sdks-java-extensions-google-cloud-platform-core:$version",
+  "org.apache.beam:beam-sdks-java-io-google-cloud-platform:$version",
+  google_api_services_dataflow,
+  library.java.avro,
+  library.java.google_api_client,
+  library.java.google_http_client,
+  library.java.google_http_client_jackson,
+  library.java.jackson_annotations,
+  library.java.jackson_core,
+  library.java.jackson_databind,
+  library.java.joda_time,
+]
+
+// Exclude unneeded dependencies when building jar
+def excluded_dependencies = [
+  "com.google.auto.service:auto-service",  // Provided scope added from applyJavaNature
+  "com.google.auto.value:auto-value",  // Provided scope added from applyJavaNature
+  "org.codehaus.jackson:jackson-core-asl", // Exclude an old version of jackson-core-asl introduced by google-http-client-jackson
+  "org.objenesis:objenesis",   // Transitive dependency introduced from Beam
+  "org.tukaani:xz",  // Transitive dependency introduced from Beam
+  library.java.commons_compress,   // Transitive dependency introduced from Beam
+  library.java.error_prone_annotations,  // Provided scope added in worker
+  library.java.hamcrest_core,  // Test only
+  library.java.hamcrest_library,   // Test only
+  library.java.junit,  // Test only
+]
+
+applyJavaNature(validateShadowJar: false, shadowClosure: DEFAULT_SHADOW_CLOSURE << {
+  dependencies {
+

[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=152586&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152586
 ]

ASF GitHub Bot logged work on BEAM-5621:


Author: ASF GitHub Bot
Created on: 09/Oct/18 07:57
Start Date: 09/Oct/18 07:57
Worklog Time Spent: 10m 
  Work Description: Juta commented on a change in pull request #6602: 
[BEAM-5621] Fix unorderable types in python 3
URL: https://github.com/apache/beam/pull/6602#discussion_r223593569
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util.py
 ##
 @@ -223,7 +223,7 @@ def __init__(self,
 if target_batch_duration_secs and target_batch_duration_secs <= 0:
   raise ValueError("target_batch_duration_secs (%s) must be positive" % (
   target_batch_duration_secs))
-if max(0, target_batch_overhead, target_batch_duration_secs) == 0:
+if not (target_batch_overhead or target_batch_duration_secs):
 
 Review comment:
  In Python 3, max(0, None) gives TypeError: unorderable types: NoneType() > int(). The previous two if conditions already check whether each number given is greater than 0. All that remains to be checked is whether at least one of them is not None.
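The fix under discussion is easy to check in isolation; a minimal sketch (the function name is hypothetical, the two guard expressions are taken from the diff) contrasting the Python 2-only guard with its replacement:

```python
def both_targets_unset(target_batch_overhead, target_batch_duration_secs):
    # Old guard, Python 2 only: max(0, None) raises TypeError on Python 3
    # because None and int are not orderable:
    #   max(0, target_batch_overhead, target_batch_duration_secs) == 0
    # New guard: True exactly when neither target carries a truthy value
    # (earlier checks already reject non-positive numbers).
    return not (target_batch_overhead or target_batch_duration_secs)


print(both_targets_unset(None, None))  # True: no target configured
print(both_targets_unset(0.1, None))   # False: overhead target is set
```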
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152586)
Time Spent: 40m  (was: 0.5h)

> Several tests fail on Python 3 with TypeError: unorderable types: str() < 
> int()
> ---
>
> Key: BEAM-5621
> URL: https://issues.apache.org/jira/browse/BEAM-5621
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Juta Staes
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> ==
> ERROR: test_remove_duplicates 
> (apache_beam.transforms.ptransform_test.PTransformTest)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 677, in process
> self.do_fn_invoker.invoke_process(windowed_value)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 414, in invoke_process
> windowed_value, self.process_method(windowed_value.value))
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py",
>  line 1068, in 
> wrapper = lambda x: [fn(x)]
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py",
>  line 115, in _equal
> sorted_expected = sorted(expected)
> TypeError: unorderable types: str() < int()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-5620) Some tests use assertItemsEqual method, not available in Python 3

2018-10-09 Thread Matthias Feys (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Feys resolved BEAM-5620.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> Some tests use assertItemsEqual method, not available in Python 3
> -
>
> Key: BEAM-5620
> URL: https://issues.apache.org/jira/browse/BEAM-5620
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Matthias Feys
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> See: 
> https://github.com/apache/beam/search?q=assertItemsEqual&unscoped_q=assertItemsEqual



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
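For reference, the Python 3 spelling of this assertion is `assertCountEqual`, which passes when two sequences hold the same elements with the same multiplicities, in any order; a minimal illustrative sketch:

```python
import unittest


class ShimTest(unittest.TestCase):
    """Illustrative test using the Python 3 spelling of the assertion."""

    def test_same_elements_any_order(self):
        # assertCountEqual is the Python 3 name for Python 2's
        # assertItemsEqual: order is ignored, element counts are not.
        self.assertCountEqual([1, 2, 2, 'a'], ['a', 2, 1, 2])
```

For code that must still run under Python 2 as well, the `six` compatibility library ships an `assertCountEqual(test_case, a, b)` helper that papers over the name difference.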


[jira] [Assigned] (BEAM-5624) Avro IO does not work with avro-python3 package out-of-the-box on Python 3, several tests fail with AttributeError (module 'avro.schema' has no attribute 'parse')

2018-10-09 Thread Juta Staes (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juta Staes reassigned BEAM-5624:


Assignee: Simon

> Avro IO does not work with avro-python3 package out-of-the-box on Python 3, 
> several tests fail with AttributeError (module 'avro.schema' has no attribute 
> 'parse') 
> ---
>
> Key: BEAM-5624
> URL: https://issues.apache.org/jira/browse/BEAM-5624
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Simon
>Priority: Major
>
> ==
> ERROR: Failure: AttributeError (module 'avro.schema' has no attribute 'parse')
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/failure.py",
>  line 39, in runTest
> raise self.exc_val.with_traceback(self.tb)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/loader.py",
>  line 418, in loadTestsFromName
> addr.filename, addr.module)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/importer.py",
>  line 47, in importFromPath
> return self.importFromDir(dir_path, fqname)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/importer.py",
>  line 94, in importFromDir
> mod = load_module(part_fqname, fh, filename, desc)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/imp.py",
>  line 234, in load_module
> return load_source(name, filename, file)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/imp.py",
>  line 172, in load_source
> module = _load(spec)
>   File "", line 693, in _load
>   File "", line 673, in _load_unlocked
>   File "", line 673, in exec_module
>   File "", line 222, in _call_with_frames_removed
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/avroio_test.py",
>  line 54, in 
> class TestAvro(unittest.TestCase):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/avroio_test.py",
>  line 89, in TestAvro
> SCHEMA = avro.schema.parse('''
> AttributeError: module 'avro.schema' has no attribute 'parse'
> Note that we use a different implementation of avro/avro-python3 package 
> depending on Python version. We are also evaluating potential replacement of 
> avro with fastavro.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
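The rename behind this error is that the avro-python3 package exposes `avro.schema.Parse` (capitalized) where the Python 2 avro package exposes `avro.schema.parse`. A hedged compatibility lookup, simulated here with stand-in module objects rather than the real avro packages:

```python
from types import SimpleNamespace


def get_schema_parser(schema_module):
    """Return whichever schema-parsing entry point the module offers.

    avro (Python 2) exposes schema.parse; avro-python3 exposes
    schema.Parse. This helper hides the difference.
    """
    parser = getattr(schema_module, 'parse', None) or getattr(
        schema_module, 'Parse', None)
    if parser is None:
        raise AttributeError('module has neither parse nor Parse')
    return parser


# Stand-ins for the two packages (the real modules are not imported here).
py2_style = SimpleNamespace(parse=lambda s: ('parsed', s))
py3_style = SimpleNamespace(Parse=lambda s: ('Parsed', s))

print(get_schema_parser(py2_style)('{}'))  # ('parsed', '{}')
print(get_schema_parser(py3_style)('{}'))  # ('Parsed', '{}')
```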


Jenkins build is back to normal : beam_PostCommit_Python_Verify #6211

2018-10-09 Thread Apache Jenkins Server
See 




[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=152590&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152590
 ]

ASF GitHub Bot logged work on BEAM-5621:


Author: ASF GitHub Bot
Created on: 09/Oct/18 08:36
Start Date: 09/Oct/18 08:36
Worklog Time Spent: 10m 
  Work Description: Juta commented on a change in pull request #6602: 
[BEAM-5621] Fix unorderable types in python 3
URL: https://github.com/apache/beam/pull/6602#discussion_r223605699
 
 

 ##
 File path: sdks/python/apache_beam/testing/util.py
 ##
 @@ -112,8 +112,8 @@ def equal_to(expected):
   expected = list(expected)
 
   def _equal(actual):
-sorted_expected = sorted(expected)
-sorted_actual = sorted(actual)
+sorted_expected = sorted(expected, key=str)
 
 Review comment:
  Yes, the test fails because integers and strings happen to be inside the same collection, which is being sorted. In Python 2, sorted will order values of different types, such as strings and ints, together, while in Python 3 this raises a TypeError.
  Using `key=str` as an argument to `sorted` does not solve the problem, however, because some inputs fail when passed to the `str` function. I am not sure what would be a good solution to this?
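Both halves of this observation can be demonstrated directly; a minimal sketch (plain Python, no Beam) of the Python 3 error and of why `key=str` changes the comparison semantics:

```python
# Python 3 refuses to order values of different types:
try:
    sorted([1, 'a'])
except TypeError as exc:
    print('TypeError:', exc)

# And key=str silently switches to lexicographic comparison, which is
# wrong even for homogeneous numeric data: '10' < '9' as strings.
print(sorted([9, 10], key=str))  # [10, 9] -- not the numeric order
```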


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152590)
Time Spent: 50m  (was: 40m)

> Several tests fail on Python 3 with TypeError: unorderable types: str() < 
> int()
> ---
>
> Key: BEAM-5621
> URL: https://issues.apache.org/jira/browse/BEAM-5621
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Juta Staes
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> ==
> ERROR: test_remove_duplicates 
> (apache_beam.transforms.ptransform_test.PTransformTest)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 677, in process
> self.do_fn_invoker.invoke_process(windowed_value)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 414, in invoke_process
> windowed_value, self.process_method(windowed_value.value))
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py",
>  line 1068, in 
> wrapper = lambda x: [fn(x)]
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py",
>  line 115, in _equal
> sorted_expected = sorted(expected)
> TypeError: unorderable types: str() < int()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=152593&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152593
 ]

ASF GitHub Bot logged work on BEAM-5621:


Author: ASF GitHub Bot
Created on: 09/Oct/18 08:47
Start Date: 09/Oct/18 08:47
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #6602: 
[BEAM-5621] Fix unorderable types in python 3
URL: https://github.com/apache/beam/pull/6602#discussion_r223609509
 
 

 ##
 File path: sdks/python/apache_beam/testing/util.py
 ##
 @@ -112,8 +112,8 @@ def equal_to(expected):
   expected = list(expected)
 
   def _equal(actual):
-sorted_expected = sorted(expected)
-sorted_actual = sorted(actual)
+sorted_expected = sorted(expected, key=str)
 
 Review comment:
   Simply setting key=str is the wrong thing to do here. The contract is that 
using assert_that(..., equal_to(...)) can only be used on fully orderable 
PCollections. Likely collections with heterogeneous types are in error, but if 
we do want to support this we must change the algorithm here. 
   
   One hack would be to partition by type and then sort each bucket. A cheap 
way to (almost) do this is letting `key=lambda x: (hash(type(x)), type(x), x)`.
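The bucket-by-type idea can be written out explicitly; a hedged illustration (not Beam's actual implementation, and the one-line key below is simplified from the comment's triple, which also keeps `type(x)` as a tie-breaker):

```python
from collections import defaultdict


def sort_by_type_buckets(values):
    """Partition values by type, sort each bucket, concatenate buckets.

    Only the within-type order is meaningful; the buckets themselves are
    ordered by type name here purely for determinism.
    """
    buckets = defaultdict(list)
    for v in values:
        buckets[type(v)].append(v)
    result = []
    for t in sorted(buckets, key=lambda t: t.__name__):
        result.extend(sorted(buckets[t]))
    return result


print(sort_by_type_buckets([3, 'b', 1, 'a']))  # [1, 3, 'a', 'b']

# The cheap one-liner: hash(type(x)) almost always differs between
# distinct types, so values of different types are (almost) never
# compared directly; bucket order then varies with the id-based hashes.
hack_key = lambda x: (hash(type(x)), x)
grouped = sorted([3, 'b', 1, 'a'], key=hack_key)
```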


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152593)
Time Spent: 1h  (was: 50m)

> Several tests fail on Python 3 with TypeError: unorderable types: str() < 
> int()
> ---
>
> Key: BEAM-5621
> URL: https://issues.apache.org/jira/browse/BEAM-5621
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Juta Staes
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> ==
> ERROR: test_remove_duplicates 
> (apache_beam.transforms.ptransform_test.PTransformTest)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 677, in process
> self.do_fn_invoker.invoke_process(windowed_value)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py",
>  line 414, in invoke_process
> windowed_value, self.process_method(windowed_value.value))
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py",
>  line 1068, in 
> wrapper = lambda x: [fn(x)]
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py",
>  line 115, in _equal
> sorted_expected = sorted(expected)
> TypeError: unorderable types: str() < int()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5315) Finish Python 3 porting for io module

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5315?focusedWorklogId=152600&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152600
 ]

ASF GitHub Bot logged work on BEAM-5315:


Author: ASF GitHub Bot
Created on: 09/Oct/18 09:14
Start Date: 09/Oct/18 09:14
Worklog Time Spent: 10m 
  Work Description: splovyt commented on a change in pull request #6590: 
[BEAM-5315] Partially port io
URL: https://github.com/apache/beam/pull/6590#discussion_r223619025
 
 

 ##
 File path: sdks/python/apache_beam/io/source_test_utils_test.py
 ##
 @@ -47,12 +48,18 @@ def _create_source(self, data):
 for bundle in source.split(float('inf')):
   return bundle.source
 
+  @unittest.skipIf(os.environ.get('RUN_SKIPPED_PY3_TESTS') != '1',
+   'This test still needs to be fixed on Python 3. '
+   'TODO: BEAM-5627')
 
 Review comment:
   I believe they have a similar origin, yes. I saw that someone is assigned to 
this issue, so let's merge this PR so that the assignee can take a look. 
@tvalentyn PTAL at my updated commit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152600)
Time Spent: 3h  (was: 2h 50m)

> Finish Python 3 porting for io module
> -
>
> Key: BEAM-5315
> URL: https://issues.apache.org/jira/browse/BEAM-5315
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Simon
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] branch master updated (372afba -> 28f2090)

2018-10-09 Thread robertwb
This is an automated email from the ASF dual-hosted git repository.

robertwb pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 372afba  Merge pull request #6542: [BEAM-3655] Port 
MaxPerKeyExamplesTest off DoFnTester
 add 57c8f93  [BEAM-4858] Clean up and improve batch size estimator.
 add 28f2090  Merge pull request #6375 [BEAM-4858] Improve batch size 
estimator.

No new revisions were added by this update.

Summary of changes:
 sdks/python/apache_beam/transforms/util.py  | 113 ++--
 sdks/python/apache_beam/transforms/util_test.py |  70 +++
 2 files changed, 154 insertions(+), 29 deletions(-)



[jira] [Work logged] (BEAM-4858) Clean up _BatchSizeEstimator in element-batching transform.

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4858?focusedWorklogId=152625&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152625
 ]

ASF GitHub Bot logged work on BEAM-4858:


Author: ASF GitHub Bot
Created on: 09/Oct/18 10:54
Start Date: 09/Oct/18 10:54
Worklog Time Spent: 10m 
  Work Description: robertwb closed pull request #6375: [BEAM-4858] Clean 
up division in batch size estimator.
URL: https://github.com/apache/beam/pull/6375
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/transforms/util.py 
b/sdks/python/apache_beam/transforms/util.py
index 8a999691f03..067d4f74aaa 100644
--- a/sdks/python/apache_beam/transforms/util.py
+++ b/sdks/python/apache_beam/transforms/util.py
@@ -30,7 +30,6 @@
 from builtins import zip
 
 from future.utils import itervalues
-from past.utils import old_div
 
 from apache_beam import typehints
 from apache_beam.metrics import Metrics
@@ -213,6 +212,7 @@ def __init__(self,
max_batch_size=1000,
target_batch_overhead=.1,
target_batch_duration_secs=1,
+   variance=0.25,
clock=time.time):
 if min_batch_size > max_batch_size:
   raise ValueError("Minimum (%s) must not be greater than maximum (%s)" % (
@@ -230,6 +230,7 @@ def __init__(self,
 self._max_batch_size = max_batch_size
 self._target_batch_overhead = target_batch_overhead
 self._target_batch_duration_secs = target_batch_duration_secs
+self._variance = variance
 self._clock = clock
 self._data = []
 self._ignore_next_timing = False
@@ -269,23 +270,63 @@ def record_time(self, batch_size):
 self._thin_data()
 
   def _thin_data(self):
-sorted_data = sorted(self._data)
-odd_one_out = [sorted_data[-1]] if len(sorted_data) % 2 == 1 else []
-# Sort the pairs by how different they are.
-
-def div_keys(kv1_kv2):
-  (x1, _), (x2, _) = kv1_kv2
-  return old_div(x2, x1) # TODO(BEAM-4858)
-
-pairs = sorted(zip(sorted_data[::2], sorted_data[1::2]),
-   key=div_keys)
-# Keep the top 1/3 most different pairs, average the top 2/3 most similar.
-threshold = 2 * len(pairs) // 3
-self._data = (
-list(sum(pairs[threshold:], ()))
-+ [((x1 + x2) / 2.0, (t1 + t2) / 2.0)
-   for (x1, t1), (x2, t2) in pairs[:threshold]]
-+ odd_one_out)
+# Make sure we don't change the parity of len(self._data)
+# As it's used below to alternate jitter.
+self._data.pop(random.randrange(len(self._data) // 4))
+self._data.pop(random.randrange(len(self._data) // 2))
+
+  @staticmethod
+  def linear_regression_no_numpy(xs, ys):
+# Least squares fit for y = a + bx over all points.
+n = float(len(xs))
+xbar = sum(xs) / n
+ybar = sum(ys) / n
+b = (sum([(x - xbar) * (y - ybar) for x, y in zip(xs, ys)])
+ / sum([(x - xbar)**2 for x in xs]))
+a = ybar - b * xbar
+return a, b
+
+  @staticmethod
+  def linear_regression_numpy(xs, ys):
+# pylint: disable=wrong-import-order, wrong-import-position
+import numpy as np
+from numpy import sum
+xs = np.asarray(xs, dtype=float)
+ys = np.asarray(ys, dtype=float)
+
+# First do a simple least squares fit for y = a + bx over all points.
+b, a = np.polyfit(xs, ys, 1)
+
+n = len(xs)
+if n < 10:
+  return a, b
+else:
+  # Refine this by throwing out outliers, according to Cook's distance.
+  # https://en.wikipedia.org/wiki/Cook%27s_distance
+  sum_x = sum(xs)
+  sum_x2 = sum(xs**2)
+  errs = a + b * xs - ys
+  s2 = sum(errs**2) / (n - 2)
+  if s2 == 0:
+# It's an exact fit!
+return a, b
+  h = (sum_x2 - 2 * sum_x * xs + n * xs**2) / (n * sum_x2 - sum_x**2)
+  cook_ds = 0.5 / s2 * errs**2 * (h / (1 - h)**2)
+
+  # Re-compute the regression, excluding those points with Cook's distance
+  # greater than 0.5, and weighting by the inverse of x to give a more
+  # stable y-intercept (as small batches have relatively little information
+  # about the fixed overhead).
+  weight = (cook_ds <= 0.5) / xs
+  b, a = np.polyfit(xs, ys, 1, w=weight)
+  return a, b
+
+  try:
+# pylint: disable=wrong-import-order, wrong-import-position
+import numpy as np
+linear_regression = linear_regression_numpy
+  except ImportError:
+linear_regression = linear_regression_no_numpy
 
   def next_batch_size(self):
 if self._min_batch_size == self._max_batch_size:
@@ -300,14 +341,14 @@ def next_batch_size(self):
   self._min_batch_size * self._MAX_GROWTH_FACTOR),
   self._min_bat

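The pure-Python least-squares path in the diff above can be exercised standalone; a minimal sketch using the same closed-form slope/intercept as the `linear_regression_no_numpy` helper:

```python
def linear_regression(xs, ys):
    """Ordinary least squares for y = a + b*x (no numpy required)."""
    n = float(len(xs))
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    # Closed-form slope: covariance of (x, y) over variance of x.
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    a = ybar - b * xbar
    return a, b


# An exact linear relation y = 2 + 3x is recovered exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0 + 3.0 * x for x in xs]
a, b = linear_regression(xs, ys)
print(a, b)  # 2.0 3.0
```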
[beam] branch master updated (28f2090 -> 61c5ba0)

2018-10-09 Thread robertwb
This is an automated email from the ASF dual-hosted git repository.

robertwb pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 28f2090  Merge pull request #6375 [BEAM-4858] Improve batch size 
estimator.
 add 1bcd18d  [BEAM-2887] Remove special FnApi version of wordcount.
 add 81866ea  Actually use opts.
 new 61c5ba0  Merge pull request #6504 [BEAM-2887] Remove special FnApi 
version of wordcount.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../python/apache_beam/examples/wordcount_fnapi.py | 146 -
 .../apache_beam/examples/wordcount_it_test.py  |  20 ++-
 2 files changed, 8 insertions(+), 158 deletions(-)
 delete mode 100644 sdks/python/apache_beam/examples/wordcount_fnapi.py



[jira] [Work logged] (BEAM-2887) Python SDK support for portable pipelines

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2887?focusedWorklogId=152626&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152626
 ]

ASF GitHub Bot logged work on BEAM-2887:


Author: ASF GitHub Bot
Created on: 09/Oct/18 10:55
Start Date: 09/Oct/18 10:55
Worklog Time Spent: 10m 
  Work Description: robertwb closed pull request #6504: [BEAM-2887] Remove 
special FnApi version of wordcount.
URL: https://github.com/apache/beam/pull/6504
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/examples/wordcount_fnapi.py 
b/sdks/python/apache_beam/examples/wordcount_fnapi.py
deleted file mode 100644
index bf4998af15e..000
--- a/sdks/python/apache_beam/examples/wordcount_fnapi.py
+++ /dev/null
@@ -1,146 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-"""A word-counting workflow using the experimental FnApi.
-
-For the stable wordcount example see wordcount.py.
-"""
-
-# TODO(BEAM-2887): Merge with wordcount.py.
-
-from __future__ import absolute_import
-
-import argparse
-import logging
-import re
-
-from past.builtins import unicode
-
-import apache_beam as beam
-from apache_beam.io import ReadFromText
-# TODO(BEAM-2887): Enable after the issue is fixed.
-# from apache_beam.io import WriteToText
-from apache_beam.metrics import Metrics
-from apache_beam.metrics.metric import MetricsFilter
-from apache_beam.options.pipeline_options import DebugOptions
-from apache_beam.options.pipeline_options import PipelineOptions
-from apache_beam.options.pipeline_options import SetupOptions
-
-
-class WordExtractingDoFn(beam.DoFn):
-  """Parse each line of input text into words."""
-
-  def __init__(self):
-super(WordExtractingDoFn, self).__init__()
-self.words_counter = Metrics.counter(self.__class__, 'words')
-self.word_lengths_counter = Metrics.counter(self.__class__, 'word_lengths')
-self.word_lengths_dist = Metrics.distribution(
-self.__class__, 'word_len_dist')
-self.empty_line_counter = Metrics.counter(self.__class__, 'empty_lines')
-
-  def process(self, element):
-"""Returns an iterator over the words of this element.
-
-The element is a line of text.  If the line is blank, note that, too.
-
-Args:
-  element: the element being processed
-
-Returns:
-  The processed element.
-"""
-text_line = element.strip()
-if not text_line:
-  self.empty_line_counter.inc(1)
-words = re.findall(r'[A-Za-z\']+', text_line)
-for w in words:
-  self.words_counter.inc()
-  self.word_lengths_counter.inc(len(w))
-  self.word_lengths_dist.update(len(w))
-return words
-
-
-def run(argv=None):
-  """Main entry point; defines and runs the wordcount pipeline."""
-  parser = argparse.ArgumentParser()
-  parser.add_argument('--input',
-  dest='input',
-  default='gs://dataflow-samples/shakespeare/kinglear.txt',
-  help='Input file to process.')
-  parser.add_argument('--output',
-  dest='output',
-  required=True,
-  help='Output file to write results to.')
-  known_args, pipeline_args = parser.parse_known_args(argv)
-
-  # We use the save_main_session option because one or more DoFn's in this
-  # workflow rely on global context (e.g., a module imported at module level).
-  pipeline_options = PipelineOptions(pipeline_args)
-  pipeline_options.view_as(SetupOptions).save_main_session = True
-  p = beam.Pipeline(options=pipeline_options)
-
-  # Ensure that the experiment flag is set explicitly by the user.
-  debug_options = pipeline_options.view_as(DebugOptions)
-  use_fn_api = (
-  debug_options.experiments and 'beam_fn_api' in debug_options.experiments)
-  assert use_fn_api, 'Enable beam_fn_api experiment, in order run this 
example.'
-
-  # Read the text file[pattern] into a PCollection.
-  lines =

[beam] 01/01: Merge pull request #6504 [BEAM-2887] Remove special FnApi version of wordcount.

2018-10-09 Thread robertwb
This is an automated email from the ASF dual-hosted git repository.

robertwb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 61c5ba09bea0a373009efeea2513d5d398482b8b
Merge: 28f2090 81866ea
Author: Robert Bradshaw 
AuthorDate: Tue Oct 9 12:55:29 2018 +0200

Merge pull request #6504 [BEAM-2887] Remove special FnApi version of 
wordcount.

[BEAM-2887] Remove special FnApi version of wordcount.

 .../python/apache_beam/examples/wordcount_fnapi.py | 146 -
 .../apache_beam/examples/wordcount_it_test.py  |  20 ++-
 2 files changed, 8 insertions(+), 158 deletions(-)



Build failed in Jenkins: beam_PostCommit_Website_Publish #143

2018-10-09 Thread Apache Jenkins Server
See 


Changes:

[robertwb] [BEAM-2887] Remove special FnApi version of wordcount.

[robertwb] Actually use opts.

[robertwb] [BEAM-4858] Clean up and improve batch size estimator.

--
[...truncated 7.60 KB...]
:buildSrc:assemble (Thread[Daemon worker,5,main]) started.

> Task :buildSrc:assemble
Skipping task ':buildSrc:assemble' as it has no actions.
:buildSrc:assemble (Thread[Daemon worker,5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessGroovy (Thread[Daemon worker,5,main]) started.

> Task :buildSrc:spotlessGroovy
file or directory 
'
 not found
file or directory 
'
 not found
file or directory 
'
 not found
Caching disabled for task ':buildSrc:spotlessGroovy': Caching has not been 
enabled for the task
Task ':buildSrc:spotlessGroovy' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovy'.
file or directory 
'
 not found
:buildSrc:spotlessGroovy (Thread[Daemon worker,5,main]) completed. Took 1.46 
secs.
:buildSrc:spotlessGroovyCheck (Thread[Daemon worker,5,main]) started.

> Task :buildSrc:spotlessGroovyCheck
Skipping task ':buildSrc:spotlessGroovyCheck' as it has no actions.
:buildSrc:spotlessGroovyCheck (Thread[Daemon worker,5,main]) completed. Took 
0.0 secs.
:buildSrc:spotlessGroovyGradle (Thread[Daemon worker,5,main]) started.

> Task :buildSrc:spotlessGroovyGradle
Caching disabled for task ':buildSrc:spotlessGroovyGradle': Caching has not 
been enabled for the task
Task ':buildSrc:spotlessGroovyGradle' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovyGradle'.
:buildSrc:spotlessGroovyGradle (Thread[Daemon worker,5,main]) completed. Took 
0.024 secs.
:buildSrc:spotlessGroovyGradleCheck (Thread[Daemon worker,5,main]) started.

> Task :buildSrc:spotlessGroovyGradleCheck
Skipping task ':buildSrc:spotlessGroovyGradleCheck' as it has no actions.
:buildSrc:spotlessGroovyGradleCheck (Thread[Daemon worker,5,main]) completed. 
Took 0.0 secs.
:buildSrc:spotlessCheck (Thread[Daemon worker,5,main]) started.

> Task :buildSrc:spotlessCheck
Skipping task ':buildSrc:spotlessCheck' as it has no actions.
:buildSrc:spotlessCheck (Thread[Daemon worker,5,main]) completed. Took 0.0 secs.
:buildSrc:compileTestJava (Thread[Daemon worker,5,main]) started.

> Task :buildSrc:compileTestJava NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestJava' as it has no source files and no 
previous output files.
:buildSrc:compileTestJava (Thread[Daemon worker,5,main]) completed. Took 0.001 
secs.
:buildSrc:compileTestGroovy (Thread[Daemon worker,5,main]) started.

> Task :buildSrc:compileTestGroovy NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestGroovy' as it has no source files and no 
previous output files.
:buildSrc:compileTestGroovy (Thread[Daemon worker,5,main]) completed. Took 
0.001 secs.
:buildSrc:processTestResources (Thread[Daemon worker,5,main]) started.

> Task :buildSrc:processTestResources NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:processTestResources' as it has no source files and no 
previous output files.
:buildSrc:processTestResources (Thread[Daemon worker,5,main]) completed. Took 
0.001 secs.
:buildSrc:testClasses (Thread[Daemon worker,5,main]) started.

> Task :buildSrc:testClasses UP-TO-DATE
Skipping task ':buildSrc:testClasses' as it has no actions.
:buildSrc:testClasses (Thread[Daemon worker,5,main]) completed. Took 0.0 secs.
:buildSrc:test (Thread[Daemon worker,5,main]) started.

> Task :buildSrc:test NO-SOURCE
Skipping task ':buildSrc:test' as it has no source files and no previous output 
files.
:buildSrc:test (Thread[Daemon worker,5,main]) completed. Took 0.002 secs.
:buildSrc:check (Thread[Daemon worker,5,main]) started.

> Task :buildSrc:check
Skipping task ':buildSrc:check' as it has no actions.
:buildSrc:check (Thread[Daemon worker,5,main]) completed. Took 0.0 secs.
:buildSrc:build (Thread[Daemon worker,5,main]) started.

> Task :buildSrc:build
Skipping task ':buildSrc:build' as it has no actions.
:buildSrc:build (Thread[Daemon worker,

Build failed in Jenkins: beam_PostCommit_Python_VR_Flink #283

2018-10-09 Thread Apache Jenkins Server
See 


Changes:

[robertwb] [BEAM-4858] Clean up and improve batch size estimator.

--
[...truncated 51.06 MB...]
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem 
(12/16) (bca5ebf25c4539e1af2e15e3e71cd6c7) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 
a99270906486da2a0eeb4ded059ddabc.
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem 
(14/16) (0a4a477c708d29c4d34bc23e08a1fce4) switched from RUNNING to FINISHED.
[ToKeyedWorkItem (11/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (11/16) (20e5e5fb720aecff9a79efac11341c34) switched from 
RUNNING to FINISHED.
[ToKeyedWorkItem (8/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (8/16) (ad7b2dfac3ab290d407b8a9d518a5c63) switched from RUNNING 
to FINISHED.
[ToKeyedWorkItem (11/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (11/16) 
(20e5e5fb720aecff9a79efac11341c34).
[ToKeyedWorkItem (8/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (8/16) 
(ad7b2dfac3ab290d407b8a9d518a5c63).
[ToKeyedWorkItem (10/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (10/16) (4b579d7e4637858f1eaa2312087347ff) switched from 
RUNNING to FINISHED.
[ToKeyedWorkItem (10/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (10/16) 
(4b579d7e4637858f1eaa2312087347ff).
[ToKeyedWorkItem (13/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (13/16) (045dffe12a23f5de923e8be9761793be) switched from 
RUNNING to FINISHED.
[ToKeyedWorkItem (13/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (13/16) 
(045dffe12a23f5de923e8be9761793be).
[GroupByKey -> 24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (6/16)] 
INFO org.apache.flink.runtime.taskmanager.Task - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (6/16) 
(a86e05151790ebda5dc2bea2f8974c02) switched from RUNNING to FINISHED.
[GroupByKey -> 24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (6/16)] 
INFO org.apache.flink.runtime.taskmanager.Task - Freeing task resources for 
GroupByKey -> 24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (6/16) 
(a86e05151790ebda5dc2bea2f8974c02).
[ToKeyedWorkItem (16/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (16/16) (0754b17fa13cab384681ac8e87570ac0) switched from 
RUNNING to FINISHED.
[ToKeyedWorkItem (1/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (1/16) (4bfb8bf2ad423ff2c2b8409925265e67) switched from RUNNING 
to FINISHED.
[ToKeyedWorkItem (16/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (16/16) 
(0754b17fa13cab384681ac8e87570ac0).
[ToKeyedWorkItem (11/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Ensuring all FileSystem streams are closed for task ToKeyedWorkItem (11/16) 
(20e5e5fb720aecff9a79efac11341c34) [FINISHED]
[ToKeyedWorkItem (1/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (1/16) 
(4bfb8bf2ad423ff2c2b8409925265e67).
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 
99e699fe9c133027eb0220d0bcd02d73.
[ToKeyedWorkItem (16/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Ensuring all FileSystem streams are closed for task ToKeyedWorkItem (16/16) 
(0754b17fa13cab384681ac8e87570ac0) [FINISHED]
[GroupByKey -> 24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (6/16)] 
INFO org.apache.flink.runtime.taskmanager.Task - Ensuring all FileSystem 
streams are closed for task GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (6/16) 
(a86e05151790ebda5dc2bea2f8974c02) [FINISHED]
[ToKeyedWorkItem (5/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (5/16) (57dcdd18df1d44936b947bc333945a09) switched from RUNNING 
to FINISHED.
[ToKeyedWorkItem (5/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (5/16) 
(57dcdd18df1d44936b947bc333945a09).
[ToKeyedWorkItem (13/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Ensuring all FileSystem streams are closed for task ToKeyedWorkItem (13/16) 
(045dffe12a23f5de923e8be9761793be) [FINISHED]
[ToKey

Build failed in Jenkins: beam_PostCommit_Python_VR_Flink #284

2018-10-09 Thread Apache Jenkins Server
See 


Changes:

[robertwb] [BEAM-2887] Remove special FnApi version of wordcount.

[robertwb] Actually use opts.

--
[...truncated 51.08 MB...]
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 
9120a87f63377a103ad32be78c72d526.
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
41abffcbc36be8d134fede11e8b3038a.
[flink-akka.actor.default-dispatcher-6] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (4/16) 
(376f23e2150861debf4ed9577f4feeb5) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 
0b78d2395ffe65dd7a1a3af6cdf69b3d.
[ToKeyedWorkItem (13/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (13/16) (233821a26c5ff651170e34e65d5b6f23) switched from 
RUNNING to FINISHED.
[ToKeyedWorkItem (13/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (13/16) 
(233821a26c5ff651170e34e65d5b6f23).
[flink-akka.actor.default-dispatcher-6] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem (6/16) 
(fb0d00f6aac7e4b8df930ede83cb0703) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
2f9052d3e8af90d70f98d7e7b0b1fc69.
[ToKeyedWorkItem (13/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Ensuring all FileSystem streams are closed for task ToKeyedWorkItem (13/16) 
(233821a26c5ff651170e34e65d5b6f23) [FINISHED]
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
8ecb0cfff029c8315daa9428c0c49836.
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
de42fabad6dff8dc4ba8d64e2fcf7bdd.
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
ba4df80311c9e4c3c3ef955db90bcb99.
[flink-akka.actor.default-dispatcher-6] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem 
(12/16) (d70e10860b69418d9287513c366aeee4) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-3] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
0add7b66a97ff55457cd520f091d95fe.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
f147f2419f8fea0ae5c65ab3c5fe9790.
[ToKeyedWorkItem (11/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (11/16) (18a7a2e379faf10636b7199c25e63f64) switched from 
RUNNING to FINISHED.
[ToKeyedWorkItem (11/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (11/16) 
(18a7a2e379faf10636b7199c25e63f64).
[ToKeyedWorkItem (11/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Ensuring all FileSystem streams are closed for task ToKeyedWorkItem (11/16) 
(18a7a2e379faf10636b7199c25e63f64) [FINISHED]
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
5180f4c419e627856cfe93e7aaf1a403.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 
4d642d0bb68892eef7430f41aeb8af22.
[flink-akka.actor.default-dispatcher-6] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (8/16) 
(f4a3c7873d4f82afa92cc6a59acdbea0) switched from RUNNING to FIN

[jira] [Work logged] (BEAM-4130) Portable Flink runner JobService entry point in a Docker container

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4130?focusedWorklogId=152632&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152632
 ]

ASF GitHub Bot logged work on BEAM-4130:


Author: ASF GitHub Bot
Created on: 09/Oct/18 11:36
Start Date: 09/Oct/18 11:36
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6607: 
[BEAM-4130] Use port 0 to pick dynamic port-BugFix
URL: https://github.com/apache/beam/pull/6607#discussion_r223663160
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkJobServerDriver.java
 ##
 @@ -201,7 +201,7 @@ public void stop() {
   private GrpcFnServer createJobServer() throws 
IOException {
 InMemoryJobService service = createJobService();
 GrpcFnServer jobServiceGrpcFnServer;
-if (Strings.isNullOrEmpty(configuration.host)) {
+if (configuration.port == 0) {
 
 Review comment:
   Ok, let's do that in a follow-up.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152632)
Time Spent: 11h 20m  (was: 11h 10m)

> Portable Flink runner JobService entry point in a Docker container
> --
>
> Key: BEAM-4130
> URL: https://issues.apache.org/jira/browse/BEAM-4130
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Ben Sidhom
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 2.7.0
>
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> The portable Flink runner exists as a Job Service that runs somewhere. We 
> need a main entry point that itself spins up the job service (and artifact 
> staging service). The main program itself should be packaged into an uberjar 
> such that it can be run locally or submitted to a Flink deployment via `flink 
> run`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] branch master updated (61c5ba0 -> 21e30ae)

2018-10-09 Thread mxm
This is an automated email from the ASF dual-hosted git repository.

mxm pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 61c5ba0  Merge pull request #6504 [BEAM-2887] Remove special FnApi 
version of wordcount.
 add f23de8d  [BEAM-4130] Use port 0 to pick dynamic port-BugFix
 add 21e30ae  Merge pull request #6607: [BEAM-4130] Use port 0 to pick 
dynamic port

No new revisions were added by this update.

Summary of changes:
 .../java/org/apache/beam/runners/flink/FlinkJobServerDriver.java | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)



[jira] [Work logged] (BEAM-4130) Portable Flink runner JobService entry point in a Docker container

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4130?focusedWorklogId=152635&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152635
 ]

ASF GitHub Bot logged work on BEAM-4130:


Author: ASF GitHub Bot
Created on: 09/Oct/18 11:37
Start Date: 09/Oct/18 11:37
Worklog Time Spent: 10m 
  Work Description: mxm closed pull request #6607: [BEAM-4130] Use port 0 
to pick dynamic port-BugFix
URL: https://github.com/apache/beam/pull/6607
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git 
a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkJobServerDriver.java
 
b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkJobServerDriver.java
index 2096cb98550..34f2edb5abb 100644
--- 
a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkJobServerDriver.java
+++ 
b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkJobServerDriver.java
@@ -17,7 +17,6 @@
  */
 package org.apache.beam.runners.flink;
 
-import com.google.common.base.Strings;
 import com.google.common.util.concurrent.ListeningExecutorService;
 import com.google.common.util.concurrent.MoreExecutors;
 import com.google.common.util.concurrent.ThreadFactoryBuilder;
@@ -201,7 +200,7 @@ public void stop() {
   private GrpcFnServer createJobServer() throws 
IOException {
 InMemoryJobService service = createJobService();
 GrpcFnServer jobServiceGrpcFnServer;
-if (Strings.isNullOrEmpty(configuration.host)) {
+if (configuration.port == 0) {
   jobServiceGrpcFnServer = GrpcFnServer.allocatePortAndCreateFor(service, 
jobServerFactory);
 } else {
   Endpoints.ApiServiceDescriptor descriptor =
@@ -239,7 +238,7 @@ private InMemoryJobService createJobService() throws 
IOException {
   throws IOException {
 BeamFileSystemArtifactStagingService service = new 
BeamFileSystemArtifactStagingService();
 final GrpcFnServer 
artifactStagingService;
-if (Strings.isNullOrEmpty(configuration.host)) {
+if (configuration.artifactPort == 0) {
   artifactStagingService =
   GrpcFnServer.allocatePortAndCreateFor(service, 
artifactServerFactory);
 } else {
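
The change above keys the decision on `configuration.port == 0` (and `configuration.artifactPort == 0`) rather than on whether a host was configured, so that a port of 0 triggers dynamic allocation. Binding to port 0 is the standard OS mechanism for picking a free ephemeral port; here is a minimal sketch of that mechanism using plain Python sockets (an illustration only, independent of Beam's `GrpcFnServer` API):

```python
import socket

def pick_dynamic_port():
    """Bind to port 0 so the OS assigns any free ephemeral port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("localhost", 0))
        # getsockname() reports the address actually bound, including
        # the concrete port the OS chose in place of 0.
        return s.getsockname()[1]

if __name__ == "__main__":
    print("allocated port:", pick_dynamic_port())
```

A server created this way must report the chosen port back to clients (as `GrpcFnServer.allocatePortAndCreateFor` does via its service descriptor), since callers cannot predict it in advance.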


 




Issue Time Tracking
---

Worklog Id: (was: 152635)
Time Spent: 11.5h  (was: 11h 20m)

> Portable Flink runner JobService entry point in a Docker container
> --
>
> Key: BEAM-4130
> URL: https://issues.apache.org/jira/browse/BEAM-4130
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Ben Sidhom
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 2.7.0
>
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> The portable Flink runner exists as a Job Service that runs somewhere. We 
> need a main entry point that itself spins up the job service (and artifact 
> staging service). The main program itself should be packaged into an uberjar 
> such that it can be run locally or submitted to a Flink deployment via `flink 
> run`.





Build failed in Jenkins: beam_PostCommit_Website_Publish #144

2018-10-09 Thread Apache Jenkins Server
See 


Changes:

[ankurgoenka] [BEAM-4130] Use port 0 to pick dynamic port-BugFix

--
[...truncated 7.68 KB...]
:buildSrc:assemble (Thread[Task worker for ':buildSrc',5,main]) started.

> Task :buildSrc:assemble
Skipping task ':buildSrc:assemble' as it has no actions.
:buildSrc:assemble (Thread[Task worker for ':buildSrc',5,main]) completed. Took 
0.0 secs.
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc',5,main]) started.

> Task :buildSrc:spotlessGroovy
file or directory 
'
 not found
file or directory 
'
 not found
file or directory 
'
 not found
Caching disabled for task ':buildSrc:spotlessGroovy': Caching has not been 
enabled for the task
Task ':buildSrc:spotlessGroovy' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovy'.
file or directory 
'
 not found
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc',5,main]) 
completed. Took 1.396 secs.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc',5,main]) 
started.

> Task :buildSrc:spotlessGroovyCheck
Skipping task ':buildSrc:spotlessGroovyCheck' as it has no actions.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc',5,main]) 
completed. Took 0.0 secs.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc',5,main]) 
started.

> Task :buildSrc:spotlessGroovyGradle
Caching disabled for task ':buildSrc:spotlessGroovyGradle': Caching has not 
been enabled for the task
Task ':buildSrc:spotlessGroovyGradle' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovyGradle'.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc',5,main]) 
completed. Took 0.023 secs.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for 
':buildSrc',5,main]) started.

> Task :buildSrc:spotlessGroovyGradleCheck
Skipping task ':buildSrc:spotlessGroovyGradleCheck' as it has no actions.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for 
':buildSrc',5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc',5,main]) started.

> Task :buildSrc:spotlessCheck
Skipping task ':buildSrc:spotlessCheck' as it has no actions.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc',5,main]) completed. 
Took 0.0 secs.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc',5,main]) started.

> Task :buildSrc:compileTestJava NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestJava' as it has no source files and no 
previous output files.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc',5,main]) 
completed. Took 0.001 secs.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc',5,main]) 
started.

> Task :buildSrc:compileTestGroovy NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestGroovy' as it has no source files and no 
previous output files.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc',5,main]) 
completed. Took 0.001 secs.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc',5,main]) 
started.

> Task :buildSrc:processTestResources NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:processTestResources' as it has no source files and no 
previous output files.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc',5,main]) 
completed. Took 0.001 secs.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc',5,main]) started.

> Task :buildSrc:testClasses UP-TO-DATE
Skipping task ':buildSrc:testClasses' as it has no actions.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc',5,main]) completed. 
Took 0.0 secs.
:buildSrc:test (Thread[Task worker for ':buildSrc',5,main]) started.

> Task :buildSrc:test NO-SOURCE
Skipping task ':buildSrc:test' as it has no source files and no previous output 
files.
:buildSrc:test (Thread[Task worker for ':buildSrc',5,main]) completed. Took 
0.002 secs.
:buildSrc:check (Thread[Task worker for ':buildSrc',5,main]) started.

> Task :buildSrc:check
Skipping task ':buildSrc:check' as it has no actions.
:buildSrc:ch

Build failed in Jenkins: beam_PostCommit_Python_VR_Flink #285

2018-10-09 Thread Apache Jenkins Server
See 


Changes:

[ankurgoenka] [BEAM-4130] Use port 0 to pick dynamic port-BugFix

--
[...truncated 51.05 MB...]
[GroupByKey -> 24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (13/16)] 
INFO org.apache.flink.runtime.taskmanager.Task - Ensuring all FileSystem 
streams are closed for task GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (13/16) 
(de16378b910df2d535691e4418c251c4) [FINISHED]
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - 
assert_that/Create/Read/Reshuffle/ReshufflePerKey/GroupByKey -> 
74assert_that/Create/Read/Reshuffle/ReshufflePerKey/GroupByKey/GroupByWindow.None/beam:env:docker:v1:0
 (12/16) (036f083100d8ea23f9912acf769534c6) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem (2/16) 
(5c77fa88faf23de25538545a5a74569c) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (6/16) 
(0c3409848054a01820813e68ef613731) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem (9/16) 
(b3f740a5a68f2000a135754c1b0c8594) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (1/16) 
(d2b2bf6c2ce327d1a9dee3ed3f26416f) switched from RUNNING to FINISHED.
[ToKeyedWorkItem (16/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (16/16) (2492c64db4f38ecc6bbaf2d575d994ee) switched from 
RUNNING to FINISHED.
[ToKeyedWorkItem (16/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (16/16) 
(2492c64db4f38ecc6bbaf2d575d994ee).
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem (8/16) 
(afc0407e7e79684ca25c73ea52ccb559) switched from RUNNING to FINISHED.
[ToKeyedWorkItem (16/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Ensuring all FileSystem streams are closed for task ToKeyedWorkItem (16/16) 
(2492c64db4f38ecc6bbaf2d575d994ee) [FINISHED]
[ToKeyedWorkItem (6/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (6/16) (420b27026224e0485f3c0e2ffafdf7b0) switched from RUNNING 
to FINISHED.
[ToKeyedWorkItem (6/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (6/16) 
(420b27026224e0485f3c0e2ffafdf7b0).
[ToKeyedWorkItem (10/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (10/16) (db6e9c54b6b911de101421129f99394e) switched from 
RUNNING to FINISHED.
[ToKeyedWorkItem (10/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (10/16) 
(db6e9c54b6b911de101421129f99394e).
[ToKeyedWorkItem (1/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (1/16) (20290e0bb0185ab4b81e371afedb86cb) switched from RUNNING 
to FINISHED.
[ToKeyedWorkItem (1/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (1/16) 
(20290e0bb0185ab4b81e371afedb86cb).
[ToKeyedWorkItem (11/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (11/16) (e72b030988f7076b897b7940c047a0f9) switched from 
RUNNING to FINISHED.
[ToKeyedWorkItem (10/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Ensuring all FileSystem streams are closed for task ToKeyedWorkItem (10/16) 
(db6e9c54b6b911de101421129f99394e) [FINISHED]
[ToKeyedWorkItem (1/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Ensuring all FileSystem streams are closed for task ToKeyedWorkItem (1/16) 
(20290e0bb0185ab4b81e371afedb86cb) [FINISHED]
[ToKeyedWorkItem (12/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (12/16) (e2bf17bc4563aa8ea329e26ae8367e08) switched from 
RUNNING to FINISHED.
[ToKeyedWorkItem (12/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (12/16) 
(e2bf17bc4563aa8ea329e26ae8367e08).
[ToKeyedWorkItem (12/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Ensuring all FileSystem streams are closed for task ToKeyedWorkItem (12/16) 
(e2bf17bc4563aa8ea329e26ae8367e08) [FINISHED]
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem 
(14/16) (bb5047278a3b862d3b552dec6cff0e40) switched from RUNNING to FINISHED.
[ToKeyedWorkItem (4/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (4/16) (b573d9ecf874227f9e33fc69ee210619) switched from RUNNING 
to FINISHED.
[ToKeye

Build failed in Jenkins: beam_PostCommit_Website_Publish #145

2018-10-09 Thread Apache Jenkins Server
See 


--
[...truncated 7.74 KB...]
:buildSrc:assemble (Thread[Task worker for ':buildSrc' Thread 7,5,main]) 
started.

> Task :buildSrc:assemble
Skipping task ':buildSrc:assemble' as it has no actions.
:buildSrc:assemble (Thread[Task worker for ':buildSrc' Thread 7,5,main]) 
completed. Took 0.0 secs.
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 6,5,main]) 
started.

> Task :buildSrc:spotlessGroovy
file or directory 
'
 not found
file or directory 
'
 not found
file or directory 
'
 not found
Caching disabled for task ':buildSrc:spotlessGroovy': Caching has not been 
enabled for the task
Task ':buildSrc:spotlessGroovy' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovy'.
file or directory 
'
 not found
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 6,5,main]) 
completed. Took 1.473 secs.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 
6,5,main]) started.

> Task :buildSrc:spotlessGroovyCheck
Skipping task ':buildSrc:spotlessGroovyCheck' as it has no actions.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 
6,5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 
6,5,main]) started.

> Task :buildSrc:spotlessGroovyGradle
Caching disabled for task ':buildSrc:spotlessGroovyGradle': Caching has not 
been enabled for the task
Task ':buildSrc:spotlessGroovyGradle' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovyGradle'.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 
6,5,main]) completed. Took 0.024 secs.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
6,5,main]) started.

> Task :buildSrc:spotlessGroovyGradleCheck
Skipping task ':buildSrc:spotlessGroovyGradleCheck' as it has no actions.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
6,5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 6,5,main]) 
started.

> Task :buildSrc:spotlessCheck
Skipping task ':buildSrc:spotlessCheck' as it has no actions.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 6,5,main]) 
completed. Took 0.0 secs.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 6,5,main]) 
started.

> Task :buildSrc:compileTestJava NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestJava' as it has no source files and no 
previous output files.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 6,5,main]) 
completed. Took 0.001 secs.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 
6,5,main]) started.

> Task :buildSrc:compileTestGroovy NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestGroovy' as it has no source files and no 
previous output files.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 
6,5,main]) completed. Took 0.001 secs.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 
6,5,main]) started.

> Task :buildSrc:processTestResources NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:processTestResources' as it has no source files and no 
previous output files.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 
6,5,main]) completed. Took 0.001 secs.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 6,5,main]) 
started.

> Task :buildSrc:testClasses UP-TO-DATE
Skipping task ':buildSrc:testClasses' as it has no actions.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 6,5,main]) 
completed. Took 0.0 secs.
:buildSrc:test (Thread[Task worker for ':buildSrc' Thread 6,5,main]) started.

> Task :buildSrc:test NO-SOURCE
Skipping task ':buildSrc:test' as it has no source files and no previous output 
files.
:buildSrc:test (Thread[Task worker for ':buildSrc' Thread 6,5,main]) completed. 
Took 0.002 secs.
:buildSrc:check (Thread[Task worker for ':buil

Build failed in Jenkins: beam_PreCommit_Website_Stage_GCS_Cron #20

2018-10-09 Thread Apache Jenkins Server
See 


Changes:

[robertwb] [BEAM-2887] Remove special FnApi version of wordcount.

[alexander.kohanyukov] [BEAM-3655] Port MaxPerKeyExamplesTest off DoFnTester

[robertwb] Actually use opts.

[ankurgoenka] [BEAM-4130] Use port 0 to pick dynamic port-BugFix

[robertwb] [BEAM-4858] Clean up and improve batch size estimator.

--
[...truncated 6.88 KB...]
Skipping task ':buildSrc:classes' as it has no actions.
:buildSrc:classes (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
completed. Took 0.0 secs.
:buildSrc:jar (Thread[Task worker for ':buildSrc' Thread 8,5,main]) started.

> Task :buildSrc:jar
Build cache key for task ':buildSrc:jar' is 7445e5c45b21f8a690f2f547fcb49594
Caching disabled for task ':buildSrc:jar': Caching has not been enabled for the 
task
Task ':buildSrc:jar' is not up-to-date because:
  No history is available.
:buildSrc:jar (Thread[Task worker for ':buildSrc' Thread 8,5,main]) completed. 
Took 0.097 secs.
:buildSrc:assemble (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
started.

> Task :buildSrc:assemble
Skipping task ':buildSrc:assemble' as it has no actions.
:buildSrc:assemble (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
completed. Took 0.0 secs.
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
started.

> Task :buildSrc:spotlessGroovy
file or directory 
'
 not found
file or directory 
'
 not found
file or directory 
'
 not found
Caching disabled for task ':buildSrc:spotlessGroovy': Caching has not been 
enabled for the task
Task ':buildSrc:spotlessGroovy' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovy'.
file or directory 
'
 not found
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
completed. Took 1.174 secs.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) started.

> Task :buildSrc:spotlessGroovyCheck
Skipping task ':buildSrc:spotlessGroovyCheck' as it has no actions.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) completed. Took 0.001 secs.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) started.

> Task :buildSrc:spotlessGroovyGradle
Caching disabled for task ':buildSrc:spotlessGroovyGradle': Caching has not 
been enabled for the task
Task ':buildSrc:spotlessGroovyGradle' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovyGradle'.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) completed. Took 0.027 secs.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) started.

> Task :buildSrc:spotlessGroovyGradleCheck
Skipping task ':buildSrc:spotlessGroovyGradleCheck' as it has no actions.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
started.

> Task :buildSrc:spotlessCheck
Skipping task ':buildSrc:spotlessCheck' as it has no actions.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
completed. Took 0.0 secs.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
started.

> Task :buildSrc:compileTestJava NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestJava' as it has no source files and no 
previous output files.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 
completed. Took 0.001 secs.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) started.

> Task :buildSrc:compileTestGroovy NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestGroovy' as it has no source files and no 
previous output files.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) completed. Took 0.003 secs.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 
8,5,main]) started.

> Task :buildSrc:processTestResources NO-SOURCE
file or directory 

Build failed in Jenkins: beam_PreCommit_Website_Cron #158

2018-10-09 Thread Apache Jenkins Server
See 


Changes:

[robertwb] [BEAM-2887] Remove special FnApi version of wordcount.

[alexander.kohanyukov] [BEAM-3655] Port MaxPerKeyExamplesTest off DoFnTester

[robertwb] Actually use opts.

[ankurgoenka] [BEAM-4130] Use port 0 to pick dynamic port-BugFix

[robertwb] [BEAM-4858] Clean up and improve batch size estimator.

--
[...truncated 166.33 KB...]
  *  External link http://images/logos/runners/spark.png failed: response code 
0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/logos/sdks/go.png failed: response code 0 
means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/logos/sdks/java.png failed: response code 0 
means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/logos/sdks/python.png failed: response code 0 
means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/logos/sdks/scala.png failed: response code 0 
means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  http:// is an invalid URL (line 77)
 

  
- ./generated-content/get-started/downloads/index.html
  *  http:// is an invalid URL (line 77)
 

  
- ./generated-content/get-started/index.html
  *  External link http://get-started/beam-overview failed: response code 0 
means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://get-started/mobile-gaming-example failed: response 
code 0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  http:// is an invalid URL (line 77)
 

  
- ./generated-content/get-started/mobile-gaming-example/index.html
  *  External link http://documentation/programming-guide/ failed: response 
code 0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://documentation/programming-guide/ failed: response 
code 0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/gaming-example-basic.png failed: response code 
0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
 Either way, the return message (if any) from the server is: 
Couldn't resolve host name
  *  External link http://images/gaming-example-event-time-narrow.gif failed: 
response code 0 means something's wrong.
 It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.
 Sometimes, making too many requests at once also breaks things.
   

[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152654&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152654
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 12:09
Start Date: 09/Oct/18 12:09
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6532: 
[BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#discussion_r223669841
 
 

 ##
 File path: sdks/python/build.gradle
 ##
 @@ -361,3 +372,21 @@ task dependencyUpdates(dependsOn: ':dependencyUpdates') {
 }
   }
 }
+
+project.task('createProcessWorker') {
+  dependsOn ':beam-sdks-python-container:build'
+  dependsOn 'setupVirtualenv'
+  def outputFile = file("${project.buildDir}/sdk_worker.sh")
+  def workerScript = 
"${project(":beam-sdks-python-container:").buildDir.absolutePath}/target/launcher/linux_amd64/boot"
+  def text = "sh -c \". ${envdir}/bin/activate && ${workerScript} \$* \""
 
 Review comment:
   Could we rename this to `sdkWorkerFileCode`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152654)
Time Spent: 4h 50m  (was: 4h 40m)

> Python Flink ValidatesRunner job fixes
> --
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152655&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152655
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 12:09
Start Date: 09/Oct/18 12:09
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6532: 
[BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#discussion_r223667213
 
 

 ##
 File path: sdks/python/build.gradle
 ##
 @@ -327,24 +327,35 @@ task hdfsIntegrationTest(dependsOn: 'installGcpTest') {
   }
 }
 
+class CompatibilityMatrixConfig {
+  String type
+  String harnessType = "Docker"
+}
+
 def flinkCompatibilityMatrix = {
-  def type = it
-  def name = 'flinkCompatibilityMatrix' + type
+  def config = it ? it as CompatibilityMatrixConfig : new 
CompatibilityMatrixConfig()
+  def type = config.type
+  def harnessType = config.harnessType
+  def name = 'flinkCompatibilityMatrix' + type + harnessType
   tasks.create(name: name) {
 dependsOn 'setupVirtualenv'
-dependsOn ':beam-sdks-python-container:docker'
+dependsOn 'createProcessWorker'
 dependsOn ':beam-runners-flink_2.11-job-server:shadowJar'
+if (type.toLowerCase() == 'docker')
+  dependsOn ':beam-sdks-python-container:docker'
 doLast {
   exec {
 executable 'sh'
-args '-c', ". ${envdir}/bin/activate && pip install -e . && python -m 
apache_beam.runners.portability.flink_runner_test 
${project(":beam-runners-flink_2.11-job-server:").shadowJar.archivePath} 
${type}"
+args '-c', ". ${envdir}/bin/activate && pip install -e . && python -m 
apache_beam.runners.portability.flink_runner_test 
--flink_job_server_jar=${project(":beam-runners-flink_2.11-job-server:").shadowJar.archivePath}
 --type=${type} --harness_type=${harnessType}"
   }
 }
   }
 }
 
-flinkCompatibilityMatrix('Batch')
-flinkCompatibilityMatrix('Streaming')
+flinkCompatibilityMatrix(type: 'Batch')
+flinkCompatibilityMatrix(type:'Streaming')
 
 Review comment:
   Missing space after colon




Issue Time Tracking
---

Worklog Id: (was: 152655)

> Python Flink ValidatesRunner job fixes
> --
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  





[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152651&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152651
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 12:09
Start Date: 09/Oct/18 12:09
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6532: 
[BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#discussion_r223671137
 
 

 ##
 File path: 
.test-infra/jenkins/job_PostCommit_Python_ValidatesRunner_Flink.groovy
 ##
 @@ -33,8 +33,8 @@ 
PostcommitJobBuilder.postCommitJob('beam_PostCommit_Python_VR_Flink',
   steps {
 gradle {
   rootBuildScriptDir(commonJobProperties.checkoutDir)
-  tasks(':beam-sdks-python:flinkCompatibilityMatrixBatch')
-  tasks(':beam-sdks-python:flinkCompatibilityMatrixStreaming')
+  tasks(':beam-sdks-python:flinkCompatibilityMatrixBatchProcess')
 
 Review comment:
   +1. Appending arguments to the task name is not particularly readable; 
parameters would be clearer. The task `flinkCompatibilityMatrix` is already 
parameterized, so we just need to change the syntax here and remove the task 
generation code.




Issue Time Tracking
---

Worklog Id: (was: 152651)

> Python Flink ValidatesRunner job fixes
> --
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  





[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152650&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152650
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 12:09
Start Date: 09/Oct/18 12:09
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6532: 
[BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#discussion_r223668109
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/flink_runner_test.py
 ##
 @@ -34,12 +36,26 @@
   # Run as
   #
   # python -m apache_beam.runners.portability.flink_runner_test \
-  # /path/to/job_server.jar \
+  # --flink_job_server_jar=/path/to/job_server.jar \
+  # --type=Batch \
+  # --harness_type=docker \
   # [FlinkRunnerTest.test_method, ...]
-  flinkJobServerJar = sys.argv.pop(1)
-  streaming = sys.argv.pop(1).lower() == 'streaming'
 
-  # This is defined here to only be run when we invoke this file explicitly.
+  parser = argparse.ArgumentParser(add_help=True)
+  parser.add_argument('--flink_job_server_jar',
+  help='Job server jar to submit jobs.')
+  parser.add_argument('--type', default='batch',
+  help='Job type. batch or streaming')
 
 Review comment:
   I would make this a boolean parameter called `streaming`.
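
   The reviewer's suggestion could be sketched as follows. This is a minimal 
illustration, not the actual patch; only the `--flink_job_server_jar` option 
is taken from the diff above, and the `--streaming` flag is the hypothetical 
replacement for the string-valued `--type` option:

```python
import argparse

parser = argparse.ArgumentParser(add_help=True)
parser.add_argument('--flink_job_server_jar',
                    help='Job server jar to submit jobs.')
# A store_true flag replaces the string --type option: the flag's
# presence selects streaming mode, its absence selects batch.
parser.add_argument('--streaming', action='store_true',
                    help='Run the tests in streaming mode.')

args = parser.parse_args(['--streaming'])
assert args.streaming is True   # flag present => streaming

args = parser.parse_args([])
assert args.streaming is False  # flag absent => batch
```

   This removes the need to lowercase and compare strings at each use site.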




Issue Time Tracking
---

Worklog Id: (was: 152650)

> Python Flink ValidatesRunner job fixes
> --
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  





[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152649&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152649
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 12:09
Start Date: 09/Oct/18 12:09
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6532: 
[BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#discussion_r223670431
 
 

 ##
 File path: sdks/python/build.gradle
 ##
 @@ -327,24 +327,35 @@ task hdfsIntegrationTest(dependsOn: 'installGcpTest') {
   }
 }
 
+class CompatibilityMatrixConfig {
+  String type
 
 Review comment:
   Add a comment noting that this can be Batch or Streaming? Or convert it to 
a boolean `isStreaming`.




Issue Time Tracking
---

Worklog Id: (was: 152649)
Time Spent: 4.5h  (was: 4h 20m)

> Python Flink ValidatesRunner job fixes
> --
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  





[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152652&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152652
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 12:09
Start Date: 09/Oct/18 12:09
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6532: 
[BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#discussion_r223667225
 
 

 ##
 File path: sdks/python/build.gradle
 ##
 @@ -327,24 +327,35 @@ task hdfsIntegrationTest(dependsOn: 'installGcpTest') {
   }
 }
 
+class CompatibilityMatrixConfig {
+  String type
+  String harnessType = "Docker"
+}
+
 def flinkCompatibilityMatrix = {
-  def type = it
-  def name = 'flinkCompatibilityMatrix' + type
+  def config = it ? it as CompatibilityMatrixConfig : new 
CompatibilityMatrixConfig()
+  def type = config.type
+  def harnessType = config.harnessType
+  def name = 'flinkCompatibilityMatrix' + type + harnessType
   tasks.create(name: name) {
 dependsOn 'setupVirtualenv'
-dependsOn ':beam-sdks-python-container:docker'
+dependsOn 'createProcessWorker'
 dependsOn ':beam-runners-flink_2.11-job-server:shadowJar'
+if (type.toLowerCase() == 'docker')
+  dependsOn ':beam-sdks-python-container:docker'
 doLast {
   exec {
 executable 'sh'
-args '-c', ". ${envdir}/bin/activate && pip install -e . && python -m 
apache_beam.runners.portability.flink_runner_test 
${project(":beam-runners-flink_2.11-job-server:").shadowJar.archivePath} 
${type}"
+args '-c', ". ${envdir}/bin/activate && pip install -e . && python -m 
apache_beam.runners.portability.flink_runner_test 
--flink_job_server_jar=${project(":beam-runners-flink_2.11-job-server:").shadowJar.archivePath}
 --type=${type} --harness_type=${harnessType}"
   }
 }
   }
 }
 
-flinkCompatibilityMatrix('Batch')
-flinkCompatibilityMatrix('Streaming')
+flinkCompatibilityMatrix(type: 'Batch')
+flinkCompatibilityMatrix(type:'Streaming')
+flinkCompatibilityMatrix(type: 'Batch', harnessType: 'Process')
+flinkCompatibilityMatrix(type:'Streaming', harnessType: 'Process')
 
 Review comment:
   Missing space after colon




Issue Time Tracking
---

Worklog Id: (was: 152652)
Time Spent: 4h 40m  (was: 4.5h)

> Python Flink ValidatesRunner job fixes
> --
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  





[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152653&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152653
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 12:09
Start Date: 09/Oct/18 12:09
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6532: 
[BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#discussion_r223669759
 
 

 ##
 File path: sdks/python/build.gradle
 ##
 @@ -361,3 +372,21 @@ task dependencyUpdates(dependsOn: ':dependencyUpdates') {
 }
   }
 }
+
+project.task('createProcessWorker') {
+  dependsOn ':beam-sdks-python-container:build'
+  dependsOn 'setupVirtualenv'
+  def outputFile = file("${project.buildDir}/sdk_worker.sh")
 
 Review comment:
   Rename `outputFile` => `sdkWorkerFile`.
   
   I wonder if we could just use relative paths and skip the generation code?




Issue Time Tracking
---

Worklog Id: (was: 152653)
Time Spent: 4h 40m  (was: 4.5h)

> Python Flink ValidatesRunner job fixes
> --
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  





[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152656&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152656
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 12:09
Start Date: 09/Oct/18 12:09
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6532: 
[BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#discussion_r223669060
 
 

 ##
 File path: sdks/python/build.gradle
 ##
 @@ -361,3 +372,21 @@ task dependencyUpdates(dependsOn: ':dependencyUpdates') {
 }
   }
 }
+
+project.task('createProcessWorker') {
+  dependsOn ':beam-sdks-python-container:build'
+  dependsOn 'setupVirtualenv'
+  def outputFile = file("${project.buildDir}/sdk_worker.sh")
+  def workerScript = 
"${project(":beam-sdks-python-container:").buildDir.absolutePath}/target/launcher/linux_amd64/boot"
+  def text = "sh -c \". ${envdir}/bin/activate && ${workerScript} \$* \""
+  outputs.file outputFile
+  doLast{
 
 Review comment:
   space after `doLast`




Issue Time Tracking
---

Worklog Id: (was: 152656)
Time Spent: 5h  (was: 4h 50m)

> Python Flink ValidatesRunner job fixes
> --
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  





[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152648&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152648
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 12:09
Start Date: 09/Oct/18 12:09
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6532: 
[BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#discussion_r223670096
 
 

 ##
 File path: sdks/python/build.gradle
 ##
 @@ -327,24 +327,35 @@ task hdfsIntegrationTest(dependsOn: 'installGcpTest') {
   }
 }
 
+class CompatibilityMatrixConfig {
+  String type
+  String harnessType = "Docker"
 
 Review comment:
   Make this an enum?
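
   The enum idea, sketched in Python for illustration only (the actual 
`CompatibilityMatrixConfig` lives in a Groovy build script, and the 
`HarnessType` name here is hypothetical):

```python
from enum import Enum

class HarnessType(Enum):
    # Restricting the harness type to an enum catches typos such as
    # "Dokcer" at configuration time instead of deep inside the task.
    DOCKER = 'Docker'
    PROCESS = 'Process'

# Look up by value; an unknown value raises ValueError immediately.
assert HarnessType('Process') is HarnessType.PROCESS
assert HarnessType.DOCKER.value == 'Docker'
```

   The same pattern is available in Groovy/Gradle via a plain `enum` 
declaration in the build script.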




Issue Time Tracking
---

Worklog Id: (was: 152648)
Time Spent: 4.5h  (was: 4h 20m)

> Python Flink ValidatesRunner job fixes
> --
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  





Build failed in Jenkins: beam_PostCommit_Python_VR_Flink #286

2018-10-09 Thread Apache Jenkins Server
See 


--
[...truncated 51.05 MB...]
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (10/16) 
(34e1f10d9f9a061e66251636c1ab9a7a) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
054d756168fc3360fc71aa1ea09d20fb.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 
539e5d85fdac6d79b7d20a3b1d48474c.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
39a20e04b60fbdc11a17a0872b0e2af6.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
7738d633da1f2a0b8e2dde4abccdafbe.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 
a07c9d850cc7b0d3b95862108ad258e3.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (11/16) 
(2d817c575c01d2151b660614aafc1ae5) switched from RUNNING to FINISHED.
[ToKeyedWorkItem (16/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (16/16) (920844ff1132341c797822b21f6e19c6) switched from 
RUNNING to FINISHED.
[ToKeyedWorkItem (13/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (13/16) (d59fb35bcb21507271c3a847a613164f) switched from 
RUNNING to FINISHED.
[ToKeyedWorkItem (16/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (16/16) 
(920844ff1132341c797822b21f6e19c6).
[ToKeyedWorkItem (13/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (13/16) 
(d59fb35bcb21507271c3a847a613164f).
[ToKeyedWorkItem (9/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (9/16) (25ce91f70e96bd109cbf6171cf4d6488) switched from RUNNING 
to FINISHED.
[ToKeyedWorkItem (2/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (2/16) (d80ee8e102ec64f8ce620c90cb1c3f00) switched from RUNNING 
to FINISHED.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
43d832e7f2a91abc070a398b8e7713bd.
[ToKeyedWorkItem (13/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Ensuring all FileSystem streams are closed for task ToKeyedWorkItem (13/16) 
(d59fb35bcb21507271c3a847a613164f) [FINISHED]
[ToKeyedWorkItem (2/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (2/16) 
(d80ee8e102ec64f8ce620c90cb1c3f00).
[ToKeyedWorkItem (12/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (12/16) (f0fb2e0f4785062def6e4505ea1901ad) switched from 
RUNNING to FINISHED.
[ToKeyedWorkItem (12/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (12/16) 
(f0fb2e0f4785062def6e4505ea1901ad).
[ToKeyedWorkItem (9/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (9/16) 
(25ce91f70e96bd109cbf6171cf4d6488).
[ToKeyedWorkItem (16/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Ensuring all FileSystem streams are closed for task ToKeyedWorkItem (16/16) 
(920844ff1132341c797822b21f6e19c6) [FINISHED]
[ToKeyedWorkItem (12/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Ensuring all FileSystem streams are closed for task ToKeyedWorkItem (12/16) 
(f0fb2e0f4785062def6e4505ea1901ad) [FINISHED]
[ToKeyedWorkItem (6/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (6/16) (bd251b0636020c1694bd2c750ca7e4f4) switched from RUNNING 
to FINISHED.
[ToKeyedWorkItem (9/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Ensuring all FileSystem streams are closed for task ToKeyedWorkItem (9/16) 
(25ce91f70e96bd109cbf6171cf4d6488) [FINISHED]
[ToKeyedWorkItem (2/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Ensuring all FileSystem streams are closed for task ToKeyedWorkItem (2/16) 
(d80ee8e102ec64f8ce620c90cb1c3f

[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152677&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152677
 ]

ASF GitHub Bot logged work on BEAM-4553:


Author: ASF GitHub Bot
Created on: 09/Oct/18 13:53
Start Date: 09/Oct/18 13:53
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6569: 
[BEAM-4553] Implement graphite sink for MetricsPusher and refactor 
MetricsHttpSink test
URL: https://github.com/apache/beam/pull/6569#discussion_r223709815
 
 

 ##
 File path: 
runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java
 ##
 @@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics 
are reported with the
+ * timestamp (seconds from epoch) when the push to the sink was done (except 
with gauges that
+ * already have a timestamp value). The graphite metric name will be in the 
form of
+ * 
beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType
 For example:
+ * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code
+ * beam.distribution.throughput.nbRecordsPerSec.attempted.mean}
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";
 
 Review comment:
  It was just to give it a name, not for deduplication
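The naming scheme described in the Javadoc above (`beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType`) can be sketched quickly. This is a hypothetical Python rendition mirroring the sink's `WHITESPACE`/`SPACE_REPLACEMENT` constants; the function name and exact sanitization are my own illustration, not the Beam implementation:

```python
import re

# Mirrors the sink's WHITESPACE pattern and SPACE_REPLACEMENT constant.
WHITESPACE = re.compile(r"\s+")
SPACE_REPLACEMENT = "_"

def graphite_metric_name(metric_type, namespace, name, committed, value_type):
    """Build beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType."""
    state = "committed" if committed else "attempted"
    parts = ["beam", metric_type, namespace, name, state, value_type]
    # Graphite treats '.' as a path separator and whitespace as a field
    # delimiter, so whitespace inside any component is replaced.
    return ".".join(WHITESPACE.sub(SPACE_REPLACEMENT, p) for p in parts)
```

This reproduces the Javadoc examples, e.g. `graphite_metric_name("counter", "throughput", "nbRecords", False, "value")` gives `beam.counter.throughput.nbRecords.attempted.value`.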


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152677)
Time Spent: 1h 10m  (was: 1h)

> Implement a Graphite sink for the metrics pusher
> 
>
> Key: BEAM-4553
> URL: https://issues.apache.org/jira/browse/BEAM-4553
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-extensions-metrics
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Today only a REST HTTP sink that sends raw JSON metrics via POST requests to 
> an HTTP server is available. It is more of a POC sink. It would be good to code 
> the first real metrics sink. One of the most popular is Graphite.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
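For context on what the sink discussed above has to emit: Graphite's plaintext protocol accepts one `<path> <value> <timestamp>` line per metric over a TCP socket, which is exactly the message payload the Java code builds and writes. A minimal hedged sketch (the function names are illustrative, not Beam's API; host and port would come from pipeline options):

```python
import socket
import time

def format_graphite_line(path, value, timestamp):
    # Graphite plaintext protocol: one "<path> <value> <epoch-seconds>\n" per metric.
    return "%s %s %d\n" % (path, value, timestamp)

def push_metrics(host, port, metrics, timestamp=None):
    """Send (path, value) pairs as a single TCP payload, as the sink does."""
    ts = int(time.time()) if timestamp is None else timestamp
    payload = "".join(format_graphite_line(p, v, ts) for p, v in metrics)
    with socket.create_connection((host, port)) as sock:
        sock.sendall(payload.encode("utf-8"))
```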


[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152678&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152678
 ]

ASF GitHub Bot logged work on BEAM-4553:


Author: ASF GitHub Bot
Created on: 09/Oct/18 13:53
Start Date: 09/Oct/18 13:53
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6569: 
[BEAM-4553] Implement graphite sink for MetricsPusher and refactor 
MetricsHttpSink test
URL: https://github.com/apache/beam/pull/6569#discussion_r223709815
 
 

 ##
 File path: 
runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java
 ##
 @@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics 
are reported with the
+ * timestamp (seconds from epoch) when the push to the sink was done (except 
with gauges that
+ * already have a timestamp value). The graphite metric name will be in the 
form of
+ * 
beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType
 For example:
+ * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code
+ * beam.distribution.throughput.nbRecordsPerSec.attempted.mean}
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";
 
 Review comment:
   It was just to give it a name, not for deduplication


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152678)
Time Spent: 1h 20m  (was: 1h 10m)

> Implement a Graphite sink for the metrics pusher
> 
>
> Key: BEAM-4553
> URL: https://issues.apache.org/jira/browse/BEAM-4553
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-extensions-metrics
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Today only a REST HTTP sink that sends raw JSON metrics via POST requests to 
> an HTTP server is available. It is more of a POC sink. It would be good to code 
> the first real metrics sink. One of the most popular is Graphite.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152679&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152679
 ]

ASF GitHub Bot logged work on BEAM-5467:


Author: ASF GitHub Bot
Created on: 09/Oct/18 14:04
Start Date: 09/Oct/18 14:04
Worklog Time Spent: 10m 
  Work Description: tweise commented on a change in pull request #6532: 
[BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#discussion_r223714650
 
 

 ##
 File path: sdks/python/build.gradle
 ##
 @@ -361,3 +372,21 @@ task dependencyUpdates(dependsOn: ':dependencyUpdates') {
 }
   }
 }
+
+project.task('createProcessWorker') {
+  dependsOn ':beam-sdks-python-container:build'
+  dependsOn 'setupVirtualenv'
+  def outputFile = file("${project.buildDir}/sdk_worker.sh")
+  def workerScript = 
"${project(":beam-sdks-python-container:").buildDir.absolutePath}/target/launcher/linux_amd64/boot"
+  def text = "sh -c \". ${envdir}/bin/activate && ${workerScript} \$* \""
 
 Review comment:
   `sdkWorkerCommand`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152679)
Time Spent: 5h 10m  (was: 5h)

> Python Flink ValidatesRunner job fixes
> --
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152680&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152680
 ]

ASF GitHub Bot logged work on BEAM-4553:


Author: ASF GitHub Bot
Created on: 09/Oct/18 14:04
Start Date: 09/Oct/18 14:04
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6569: 
[BEAM-4553] Implement graphite sink for MetricsPusher and refactor 
MetricsHttpSink test
URL: https://github.com/apache/beam/pull/6569#discussion_r223714684
 
 

 ##
 File path: 
runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java
 ##
 @@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics 
are reported with the
+ * timestamp (seconds from epoch) when the push to the sink was done (except 
with gauges that
+ * already have a timestamp value). The graphite metric name will be in the 
form of
+ * 
beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType
 For example:
+ * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code
+ * beam.distribution.throughput.nbRecordsPerSec.attempted.mean}
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";
+  private final String address;
+  private final int port;
+  private final Charset charset;
+
+  public MetricsGraphiteSink(PipelineOptions pipelineOptions) {
+this.address = pipelineOptions.getMetricsGraphiteHost();
+this.port = pipelineOptions.getMetricsGraphitePort();
+this.charset = UTF_8;
+  }
+
+  @Experimental(Experimental.Kind.METRICS)
+  @Override
+  public void writeMetrics(MetricQueryResults metricQueryResults) throws 
Exception {
+final long metricTimestamp = System.currentTimeMillis() / 1000L;
+Socket socket = new Socket(InetAddress.getByName(address), port);
+BufferedWriter writer =
+new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), 
charset));
+StringBuilder messagePayload = new StringBuilder();
+Iterable<MetricResult<Long>> counters = metricQueryResults.getCounters();
+Iterable<MetricResult<GaugeResult>> gauges = metricQueryResults.getGauges();
+Iterable<MetricResult<DistributionResult>> distributions =
+    metricQueryResults.getDistributions();
+
+for (MetricResult<Long> counter : counters) {
 
 Review comment:
   I thought about it, but the only difference between the loops is the generic 
type, so with type erasure I would need to pass a class to the common method 
and use a switch on the class in the code. That would be even less maintainable


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152680)
Time Spent: 1.5h  (was: 1h 20m)

> Implement a Graphite sink for the metrics pusher
> 
>
> Key: BEAM-4553
> URL: https://issues.apac

[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152692&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152692
 ]

ASF GitHub Bot logged work on BEAM-4553:


Author: ASF GitHub Bot
Created on: 09/Oct/18 14:48
Start Date: 09/Oct/18 14:48
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6569: 
[BEAM-4553] Implement graphite sink for MetricsPusher and refactor 
MetricsHttpSink test
URL: https://github.com/apache/beam/pull/6569#discussion_r223714684
 
 

 ##
 File path: 
runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java
 ##
 @@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics 
are reported with the
+ * timestamp (seconds from epoch) when the push to the sink was done (except 
with gauges that
+ * already have a timestamp value). The graphite metric name will be in the 
form of
+ * 
beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType
 For example:
+ * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code
+ * beam.distribution.throughput.nbRecordsPerSec.attempted.mean}
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";
+  private final String address;
+  private final int port;
+  private final Charset charset;
+
+  public MetricsGraphiteSink(PipelineOptions pipelineOptions) {
+this.address = pipelineOptions.getMetricsGraphiteHost();
+this.port = pipelineOptions.getMetricsGraphitePort();
+this.charset = UTF_8;
+  }
+
+  @Experimental(Experimental.Kind.METRICS)
+  @Override
+  public void writeMetrics(MetricQueryResults metricQueryResults) throws 
Exception {
+final long metricTimestamp = System.currentTimeMillis() / 1000L;
+Socket socket = new Socket(InetAddress.getByName(address), port);
+BufferedWriter writer =
+new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), 
charset));
+StringBuilder messagePayload = new StringBuilder();
+Iterable<MetricResult<Long>> counters = metricQueryResults.getCounters();
+Iterable<MetricResult<GaugeResult>> gauges = metricQueryResults.getGauges();
+Iterable<MetricResult<DistributionResult>> distributions =
+    metricQueryResults.getDistributions();
+
+for (MetricResult<Long> counter : counters) {
 
 Review comment:
   I thought about it, but the only difference between the loops is the generic 
type, so with type erasure I would need to pass a class to the common method 
and use a switch on the class in the code. I prefer the 3 loops to that kind of 
code. WDYT, anything to suggest?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152692)
Time Spent: 1h 40m  (was: 1.5h)

> Implement a Graphite sink for the metrics pusher
> 
>
> Key: BEAM-4553
>  

[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152693&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152693
 ]

ASF GitHub Bot logged work on BEAM-4553:


Author: ASF GitHub Bot
Created on: 09/Oct/18 14:54
Start Date: 09/Oct/18 14:54
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6569: 
[BEAM-4553] Implement graphite sink for MetricsPusher and refactor 
MetricsHttpSink test
URL: https://github.com/apache/beam/pull/6569#discussion_r223736905
 
 

 ##
 File path: 
runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java
 ##
 @@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics 
are reported with the
+ * timestamp (seconds from epoch) when the push to the sink was done (except 
with gauges that
+ * already have a timestamp value). The graphite metric name will be in the 
form of
+ * 
beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType
 For example:
+ * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code
+ * beam.distribution.throughput.nbRecordsPerSec.attempted.mean}
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";
+  private final String address;
+  private final int port;
+  private final Charset charset;
+
+  public MetricsGraphiteSink(PipelineOptions pipelineOptions) {
+this.address = pipelineOptions.getMetricsGraphiteHost();
+this.port = pipelineOptions.getMetricsGraphitePort();
+this.charset = UTF_8;
+  }
+
+  @Experimental(Experimental.Kind.METRICS)
+  @Override
+  public void writeMetrics(MetricQueryResults metricQueryResults) throws 
Exception {
+final long metricTimestamp = System.currentTimeMillis() / 1000L;
+Socket socket = new Socket(InetAddress.getByName(address), port);
+BufferedWriter writer =
+new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), 
charset));
+StringBuilder messagePayload = new StringBuilder();
+Iterable<MetricResult<Long>> counters = metricQueryResults.getCounters();
+Iterable<MetricResult<GaugeResult>> gauges = metricQueryResults.getGauges();
+Iterable<MetricResult<DistributionResult>> distributions =
+    metricQueryResults.getDistributions();
+
+for (MetricResult<Long> counter : counters) {
+  // if committed metrics are not supported, exception is thrown and we 
don't append the message
+  try {
+messagePayload.append(createCounterGraphiteMessage(metricTimestamp, 
counter, true));
+  } catch (UnsupportedOperationException e) {
+if (!e.getMessage().contains("committed metrics")) {
+  throw e;
+}
+  }
+  messagePayload.append(createCounterGraphiteMessage(metricTimestamp, 
counter, false));
+}
+
+for (MetricResult<GaugeResult> gauge : gauges) {
+  try {
+messagePayload.append(createGaugeGraphiteMessage(gauge, true));
+  } catch (UnsupportedOperationException e) {
+if (!e.getMessage().contains("committed metrics")) {
+  throw e;
+}
+  }
+  messagePayload.append(createGaugeGraphiteMessage(gauge, false));
+}
+
+for (MetricResult distr
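The (truncated) excerpt above repeats the same pattern per metric type: the committed value is attempted first, an `UnsupportedOperationException` mentioning "committed metrics" is swallowed, and the attempted value is always appended. A hypothetical Python analog of that control flow (the names are mine, for illustration only):

```python
class CommittedMetricsUnsupported(Exception):
    """Stand-in for the UnsupportedOperationException some runners raise."""

def collect_messages(metrics, render):
    """render(metric, committed) returns one Graphite line; committed lines
    are skipped on runners that do not support committed metrics."""
    payload = []
    for metric in metrics:
        try:
            payload.append(render(metric, True))
        except CommittedMetricsUnsupported:
            pass  # no committed support: drop only the committed line
        payload.append(render(metric, False))  # attempted is always reported
    return "".join(payload)
```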

[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152694&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152694
 ]

ASF GitHub Bot logged work on BEAM-4553:


Author: ASF GitHub Bot
Created on: 09/Oct/18 14:55
Start Date: 09/Oct/18 14:55
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6569: 
[BEAM-4553] Implement graphite sink for MetricsPusher and refactor 
MetricsHttpSink test
URL: https://github.com/apache/beam/pull/6569#discussion_r223737331
 
 

 ##
 File path: 
runners/extensions-java/metrics/src/test/java/org/apache/beam/runners/extensions/metrics/CustomMetricQueryResults.java
 ##
 @@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.extensions.metrics;
+
+import java.util.Collections;
+import java.util.List;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricName;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.joda.time.Instant;
+
+/** Test class to be used as an input to {@link MetricsSink} implementation tests. */
+class CustomMetricQueryResults implements MetricQueryResults {
+
+  private final boolean isCommittedSupported;
+
+  CustomMetricQueryResults(boolean isCommittedSupported) {
+this.isCommittedSupported = isCommittedSupported;
+  }
+
+  @Override
+  public List<MetricResult<Long>> getCounters() {
+return Collections.singletonList(
+new MetricResult<Long>() {
+
+  @Override
+  public MetricName getName() {
+return MetricName.named("ns1", "n1");
+  }
+
+  @Override
+  public String getStep() {
+return "s1";
+  }
+
+  @Override
+  public Long getCommitted() {
+if (!isCommittedSupported) {
+  // This is what getCommitted code is like for 
AccumulatedMetricResult on runners
+  // that do not support committed metrics
+  throw new UnsupportedOperationException(
+  "This runner does not currently support committed"
+  + " metrics results. Please use 'attempted' instead.");
+}
+return 10L;
 
 Review comment:
  Yes, just a test value


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152694)
Time Spent: 2h  (was: 1h 50m)

> Implement a Graphite sink for the metrics pusher
> 
>
> Key: BEAM-4553
> URL: https://issues.apache.org/jira/browse/BEAM-4553
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-extensions-metrics
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Today only a REST HTTP sink that sends raw JSON metrics via POST requests to 
> an HTTP server is available. It is more of a POC sink. It would be good to code 
> the first real metrics sink. One of the most popular is Graphite.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5315) Finish Python 3 porting for io module

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5315?focusedWorklogId=152697&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152697
 ]

ASF GitHub Bot logged work on BEAM-5315:


Author: ASF GitHub Bot
Created on: 09/Oct/18 15:01
Start Date: 09/Oct/18 15:01
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6590: 
[BEAM-5315] Partially port io
URL: https://github.com/apache/beam/pull/6590#discussion_r223731816
 
 

 ##
 File path: sdks/python/apache_beam/io/hadoopfilesystem_test.py
 ##
 @@ -21,7 +21,9 @@
 
 import io
 import logging
+import os
 
 Review comment:
   We recently merged https://github.com/apache/beam/pull/6587. All tests in 
this file are now passing.




Issue Time Tracking
---

Worklog Id: (was: 152697)
Time Spent: 3h 10m  (was: 3h)

> Finish Python 3 porting for io module
> -
>
> Key: BEAM-5315
> URL: https://issues.apache.org/jira/browse/BEAM-5315
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Simon
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-5315) Finish Python 3 porting for io module

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5315?focusedWorklogId=152696&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152696
 ]

ASF GitHub Bot logged work on BEAM-5315:


Author: ASF GitHub Bot
Created on: 09/Oct/18 15:01
Start Date: 09/Oct/18 15:01
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6590: 
[BEAM-5315] Partially port io
URL: https://github.com/apache/beam/pull/6590#discussion_r223733107
 
 

 ##
 File path: sdks/python/apache_beam/io/filebasedsink_test.py
 ##
 @@ -75,6 +76,10 @@ def _create_temp_file(self, name='', suffix=''):
 
 class MyFileBasedSink(filebasedsink.FileBasedSink):
 
+  @unittest.skipIf(sys.version_info[0] == 3 and
+   os.environ.get('RUN_SKIPPED_PY3_TESTS') != '1',
+   'This test still needs to be fixed on Python 3.'
+   'TODO: BEAM-5627')
 
 Review comment:
   Let's add: TODO: BEAM-5627, BEAM-5618




Issue Time Tracking
---

Worklog Id: (was: 152696)
Time Spent: 3h 10m  (was: 3h)

> Finish Python 3 porting for io module
> -
>
> Key: BEAM-5315
> URL: https://issues.apache.org/jira/browse/BEAM-5315
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Simon
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152704&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152704
 ]

ASF GitHub Bot logged work on BEAM-4553:


Author: ASF GitHub Bot
Created on: 09/Oct/18 15:10
Start Date: 09/Oct/18 15:10
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6569: 
[BEAM-4553] Implement graphite sink for MetricsPusher and refactor 
MetricsHttpSink test
URL: https://github.com/apache/beam/pull/6569#discussion_r223743972
 
 

 ##
 File path: 
runners/extensions-java/metrics/src/test/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSinkTest.java
 ##
 @@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import static org.junit.Assert.assertTrue;
+
+import java.io.IOException;
+import java.net.ServerSocket;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.options.PipelineOptionsFactory;
+import org.junit.AfterClass;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+/** Test class for MetricsGraphiteSink. */
+public class MetricsGraphiteSinkTest {
+  private static NetworkMockServer graphiteServer;
+  private static int port;
+
+  @BeforeClass
+  public static void beforeClass() throws IOException, InterruptedException {
+// get free local port
+ServerSocket serverSocket = new ServerSocket(0);
+port = serverSocket.getLocalPort();
+serverSocket.close();
+graphiteServer = new NetworkMockServer(port);
+Thread.sleep(200);
+graphiteServer.clear();
+graphiteServer.start();
+  }
+
+  @Before
+  public void before() {
+graphiteServer.clear();
+  }
+
+  @AfterClass
+  public static void afterClass() throws IOException {
+graphiteServer.stop();
+  }
+
+  @Test
+  public void testWriteMetricsWithCommittedSupported() throws Exception {
+MetricQueryResults metricQueryResults = new CustomMetricQueryResults(true);
+PipelineOptions pipelineOptions = PipelineOptionsFactory.create();
+pipelineOptions.setMetricsGraphitePort(port);
+pipelineOptions.setMetricsGraphiteHost("127.0.0.1");
+MetricsGraphiteSink metricsGraphiteSink = new 
MetricsGraphiteSink(pipelineOptions);
+metricsGraphiteSink.writeMetrics(metricQueryResults);
+Thread.sleep(2000L);
 
 Review comment:
   Yes, because when we write messages to the socket, `NetworkMockServer` reads 
the socket on a different thread and adds the messages to an ArrayList so that 
they can be read in the assert. On a heavily loaded Jenkins server, the test 
might fail because `NetworkMockServer` does not have enough time to add the 
messages to the ArrayList.
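The timing issue described above can also be handled by polling with a deadline instead of a fixed `Thread.sleep`, which tolerates slow machines without always paying the full wait. A sketch under assumed names (`AsyncCollector` is hypothetical, not the Beam test code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: a list filled by a background reader thread (as NetworkMockServer
// does) plus a deadline-based wait that the asserting test can call.
public class AsyncCollector {

  // Messages appended by the background thread.
  static final List<String> received =
      Collections.synchronizedList(new ArrayList<>());

  // Poll until `expected` messages have arrived or the deadline passes.
  static boolean awaitCount(int expected, long timeoutMillis) {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (received.size() < expected && System.currentTimeMillis() < deadline) {
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
    return received.size() >= expected;
  }

  public static void main(String[] args) {
    new Thread(() ->
        received.add("beam.counter.throughput.nbRecords.attempted.value 10 0"))
        .start();
    System.out.println(awaitCount(1, 2000)); // prints true
  }
}
```

The deadline bounds the worst case while letting the common case finish in milliseconds, which is why polling is usually preferred over a fixed sleep on shared CI machines.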




Issue Time Tracking
---

Worklog Id: (was: 152704)
Time Spent: 2h 10m  (was: 2h)
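The `beforeClass` in the quoted test grabs a free port by briefly binding an ephemeral `ServerSocket`. The same trick as a self-contained sketch using try-with-resources (the class name is illustrative):

```java
import java.io.IOException;
import java.net.ServerSocket;

public class FreePort {

  // Bind port 0 so the OS picks an ephemeral port, read it back, and
  // release it immediately so the server under test can bind it.
  static int findFreePort() {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    } catch (IOException e) {
      throw new RuntimeException("could not acquire a free port", e);
    }
  }

  public static void main(String[] args) {
    int port = findFreePort();
    System.out.println(port > 0 && port <= 65535); // prints true
  }
}
```

Note the window between the close and the test server's bind, during which another process could in principle claim the port; the quoted test accepts that small race.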

> Implement a Graphite sink for the metrics pusher
> 
>
> Key: BEAM-4553
> URL: https://issues.apache.org/jira/browse/BEAM-4553
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-extensions-metrics
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Today only a REST HTTP sink that sends raw JSON metrics via POST requests to 
> an HTTP server is available. It is more of a POC sink. It would be good to code 
> the first real metrics sink. One of the most popular is Graphite.





[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152705&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152705
 ]

ASF GitHub Bot logged work on BEAM-4553:


Author: ASF GitHub Bot
Created on: 09/Oct/18 15:11
Start Date: 09/Oct/18 15:11
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6569: 
[BEAM-4553] Implement graphite sink for MetricsPusher and refactor 
MetricsHttpSink test
URL: https://github.com/apache/beam/pull/6569#discussion_r223744438
 
 

 ##
 File path: 
runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java
 ##
 @@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics 
are reported with the
+ * timestamp (seconds from epoch) when the push to the sink was done (except 
with gauges that
+ * already have a timestamp value). The graphite metric name will be in the 
form of
+ * 
beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType
 For example:
+ * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code
+ * beam.distribution.throughput.nbRecordsPerSec.attempted.mean}
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";
+  private final String address;
+  private final int port;
+  private final Charset charset;
+
+  public MetricsGraphiteSink(PipelineOptions pipelineOptions) {
+this.address = pipelineOptions.getMetricsGraphiteHost();
+this.port = pipelineOptions.getMetricsGraphitePort();
+this.charset = UTF_8;
+  }
+
+  @Experimental(Experimental.Kind.METRICS)
+  @Override
+  public void writeMetrics(MetricQueryResults metricQueryResults) throws 
Exception {
+final long metricTimestamp = System.currentTimeMillis() / 1000L;
+Socket socket = new Socket(InetAddress.getByName(address), port);
+BufferedWriter writer =
+new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), 
charset));
+StringBuilder messagePayload = new StringBuilder();
+Iterable<MetricResult<Long>> counters = metricQueryResults.getCounters();
+Iterable<MetricResult<GaugeResult>> gauges = metricQueryResults.getGauges();
+Iterable<MetricResult<DistributionResult>> distributions =
+metricQueryResults.getDistributions();
+
+for (MetricResult<Long> counter : counters) {
+  // if committed metrics are not supported, exception is thrown and we 
don't append the message
+  try {
+messagePayload.append(createCounterGraphiteMessage(metricTimestamp, 
counter, true));
+  } catch (UnsupportedOperationException e) {
+if (!e.getMessage().contains("committed metrics")) {
+  throw e;
+}
+  }
+  messagePayload.append(createCounterGraphiteMessage(metricTimestamp, 
counter, false));
+}
+
+for (MetricResult<GaugeResult> gauge : gauges) {
+  try {
+messagePayload.append(createGaugeGraphiteMessage(gauge, true));
+  } catch (UnsupportedOperationException e) {
+if (!e.getMessage().contains("committed metrics")) {
+  throw e;
+}
+  }
+  messagePayload.append(createGaugeGraphiteMessage(gauge, false));
+}
+
+for (MetricResult distr
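The Graphite plaintext protocol expects one `path value timestamp` line per metric, with the path shaped as described in the sink's Javadoc above. A minimal sketch of building such a line (the helper name is illustrative, not part of the sink):

```java
import java.util.Locale;

public class GraphiteMessage {

  // Build a Graphite plaintext line "<path> <value> <timestamp>\n" where the
  // path follows beam.counter.<namespace>.<name>.<committed|attempted>.value.
  static String counterLine(String namespace, String name, boolean committed,
                            long value, long epochSeconds) {
    String path = String.format(Locale.US, "beam.counter.%s.%s.%s.value",
        namespace, name, committed ? "committed" : "attempted");
    return String.format(Locale.US, "%s %d %d%n".replace("%n", "\n"),
        path, value, epochSeconds);
  }

  public static void main(String[] args) {
    System.out.print(
        counterLine("throughput", "nbRecords", false, 10L, 1539097200L));
    // beam.counter.throughput.nbRecords.attempted.value 10 1539097200
  }
}
```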

[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152707&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152707
 ]

ASF GitHub Bot logged work on BEAM-4553:


Author: ASF GitHub Bot
Created on: 09/Oct/18 15:15
Start Date: 09/Oct/18 15:15
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6569: 
[BEAM-4553] Implement graphite sink for MetricsPusher and refactor 
MetricsHttpSink test
URL: https://github.com/apache/beam/pull/6569#discussion_r223745814
 
 

 ##
 File path: 
runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java
 ##
 @@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics 
are reported with the
+ * timestamp (seconds from epoch) when the push to the sink was done (except 
with gauges that
+ * already have a timestamp value). The graphite metric name will be in the 
form of
+ * 
beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType
 For example:
+ * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code
+ * beam.distribution.throughput.nbRecordsPerSec.attempted.mean}
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";
+  private final String address;
+  private final int port;
+  private final Charset charset;
+
+  public MetricsGraphiteSink(PipelineOptions pipelineOptions) {
+this.address = pipelineOptions.getMetricsGraphiteHost();
+this.port = pipelineOptions.getMetricsGraphitePort();
+this.charset = UTF_8;
+  }
+
+  @Experimental(Experimental.Kind.METRICS)
+  @Override
+  public void writeMetrics(MetricQueryResults metricQueryResults) throws 
Exception {
+final long metricTimestamp = System.currentTimeMillis() / 1000L;
+Socket socket = new Socket(InetAddress.getByName(address), port);
+BufferedWriter writer =
+new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), 
charset));
+StringBuilder messagePayload = new StringBuilder();
+Iterable<MetricResult<Long>> counters = metricQueryResults.getCounters();
+Iterable<MetricResult<GaugeResult>> gauges = metricQueryResults.getGauges();
+Iterable<MetricResult<DistributionResult>> distributions =
+metricQueryResults.getDistributions();
+
+for (MetricResult<Long> counter : counters) {
+  // if committed metrics are not supported, exception is thrown and we 
don't append the message
+  try {
+messagePayload.append(createCounterGraphiteMessage(metricTimestamp, 
counter, true));
+  } catch (UnsupportedOperationException e) {
+if (!e.getMessage().contains("committed metrics")) {
+  throw e;
+}
+  }
+  messagePayload.append(createCounterGraphiteMessage(metricTimestamp, 
counter, false));
+}
+
+for (MetricResult<GaugeResult> gauge : gauges) {
+  try {
+messagePayload.append(createGaugeGraphiteMessage(gauge, true));
+  } catch (UnsupportedOperationException e) {
+if (!e.getMessage().contains("committed metrics")) {
+  throw e;
+}
+  }
+  messagePayload.append(createGaugeGraphiteMessage(gauge, false));
+}
+
+for (MetricResult distr

[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152708&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152708
 ]

ASF GitHub Bot logged work on BEAM-4553:


Author: ASF GitHub Bot
Created on: 09/Oct/18 15:15
Start Date: 09/Oct/18 15:15
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6569: 
[BEAM-4553] Implement graphite sink for MetricsPusher and refactor 
MetricsHttpSink test
URL: https://github.com/apache/beam/pull/6569#discussion_r223745814
 
 

 ##
 File path: 
runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java
 ##
 @@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics 
are reported with the
+ * timestamp (seconds from epoch) when the push to the sink was done (except 
with gauges that
+ * already have a timestamp value). The graphite metric name will be in the 
form of
+ * 
beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType
 For example:
+ * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code
+ * beam.distribution.throughput.nbRecordsPerSec.attempted.mean}
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";
+  private final String address;
+  private final int port;
+  private final Charset charset;
+
+  public MetricsGraphiteSink(PipelineOptions pipelineOptions) {
+this.address = pipelineOptions.getMetricsGraphiteHost();
+this.port = pipelineOptions.getMetricsGraphitePort();
+this.charset = UTF_8;
+  }
+
+  @Experimental(Experimental.Kind.METRICS)
+  @Override
+  public void writeMetrics(MetricQueryResults metricQueryResults) throws 
Exception {
+final long metricTimestamp = System.currentTimeMillis() / 1000L;
+Socket socket = new Socket(InetAddress.getByName(address), port);
+BufferedWriter writer =
+new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), 
charset));
+StringBuilder messagePayload = new StringBuilder();
+Iterable<MetricResult<Long>> counters = metricQueryResults.getCounters();
+Iterable<MetricResult<GaugeResult>> gauges = metricQueryResults.getGauges();
+Iterable<MetricResult<DistributionResult>> distributions =
+metricQueryResults.getDistributions();
+
+for (MetricResult<Long> counter : counters) {
+  // if committed metrics are not supported, exception is thrown and we 
don't append the message
+  try {
+messagePayload.append(createCounterGraphiteMessage(metricTimestamp, 
counter, true));
+  } catch (UnsupportedOperationException e) {
+if (!e.getMessage().contains("committed metrics")) {
+  throw e;
+}
+  }
+  messagePayload.append(createCounterGraphiteMessage(metricTimestamp, 
counter, false));
+}
+
+for (MetricResult<GaugeResult> gauge : gauges) {
+  try {
+messagePayload.append(createGaugeGraphiteMessage(gauge, true));
+  } catch (UnsupportedOperationException e) {
+if (!e.getMessage().contains("committed metrics")) {
+  throw e;
+}
+  }
+  messagePayload.append(createGaugeGraphiteMessage(gauge, false));
+}
+
+for (MetricResult distr

[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152709&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152709
 ]

ASF GitHub Bot logged work on BEAM-4553:


Author: ASF GitHub Bot
Created on: 09/Oct/18 15:16
Start Date: 09/Oct/18 15:16
Worklog Time Spent: 10m 
  Work Description: echauchot commented on issue #6569: [BEAM-4553] 
Implement graphite sink for MetricsPusher and refactor MetricsHttpSink test
URL: https://github.com/apache/beam/pull/6569#issuecomment-428233453
 
 
   @aromanenko-dev thanks for the review! I answered all your comments, PTAL.




Issue Time Tracking
---

Worklog Id: (was: 152709)
Time Spent: 2h 50m  (was: 2h 40m)

> Implement a Graphite sink for the metrics pusher
> 
>
> Key: BEAM-4553
> URL: https://issues.apache.org/jira/browse/BEAM-4553
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-extensions-metrics
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Today only a REST HTTP sink that sends raw JSON metrics via POST requests to 
> an HTTP server is available. It is more of a POC sink. It would be good to code 
> the first real metrics sink. One of the most popular is Graphite.





[jira] [Commented] (BEAM-5683) [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Failure summary

2018-10-09 Thread Scott Wegner (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643606#comment-16643606
 ] 

Scott Wegner commented on BEAM-5683:


[~pabloem] / [~robertwb] can either of you help out?

bq. Can we access pip subprocess logs?

> [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Failure summary
> --
>
> Key: BEAM-5683
> URL: https://issues.apache.org/jira/browse/BEAM-5683
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness, test-failures
>Reporter: Scott Wegner
>Assignee: Ankur Goenka
>Priority: Major
>  Labels: currently-failing
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/1289/]
>  * [Gradle Build 
> Scan|https://scans.gradle.com/s/hjmzvh4ylhs6y/console-log?task=:beam-sdks-python:validatesRunnerBatchTests]
>  * [Test source 
> code|https://github.com/apache/beam/blob/303a4275eb0a323761e1a4dec6a22fde9863acf8/sdks/python/apache_beam/runners/portability/stager.py#L390]
> Initial investigation:
> Seems to be failing on pip download.
> ==
> ERROR: test_multiple_empty_outputs 
> (apache_beam.transforms.ptransform_test.PTransformTest)
> --
> Traceback (most recent call last):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/transforms/ptransform_test.py",
>  line 277, in test_multiple_empty_outputs
> pipeline.run()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 104, in run
> result = super(TestPipeline, self).run(test_runner_api)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 403, in run
> self.to_runner_api(), self.runner, self._options).run(False)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 416, in run
> return self.runner.run_pipeline(self)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
>  line 50, in run_pipeline
> self.result = super(TestDataflowRunner, self).run_pipeline(pipeline)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 389, in run_pipeline
> self.dataflow_client.create_job(self.job), self)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/utils/retry.py",
>  line 184, in wrapper
> return fun(*args, **kwargs)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py",
>  line 490, in create_job
> self.create_job_description(job)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py",
>  line 519, in create_job_description
> resources = self._stage_resources(job.options)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py",
>  line 452, in _stage_resources
> staging_location=google_cloud_options.staging_location)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/portability/stager.py",
>  line 161, in stage_job_resources
> requirements_cache_path)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/portability/stager.py",
>  line 411, in _populate_requirements_cache
> processes.check_call(cmd_args)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/utils/processes.py",
>  line 46, in check_call
> return subprocess.check_call(*args, **kwargs)
>   File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
> raise CalledProcessError(retcode, cmd)
> CalledProcessError: Command 
> '['/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/build/gradleenv/bin/python',
>  '-m', 'pip', 'download', '--dest', '/tmp/dataflow-requirements-cache', '-r', 
> 'postcommit_requirements.txt', '--exists-action', 'i', '--no-binary', 
> ':all:']' returned non-zero exit status 1
> 
> _After you've filled out the above de

[jira] [Work logged] (BEAM-5624) Avro IO does not work with avro-python3 package out-of-the-box on Python 3, several tests fail with AttributeError (module 'avro.schema' has no attribute 'parse')

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5624?focusedWorklogId=152710&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152710
 ]

ASF GitHub Bot logged work on BEAM-5624:


Author: ASF GitHub Bot
Created on: 09/Oct/18 15:20
Start Date: 09/Oct/18 15:20
Worklog Time Spent: 10m 
  Work Description: splovyt opened a new pull request #6616: [BEAM-5624] 
Fix avro.schema parser for py3
URL: https://github.com/apache/beam/pull/6616
 
 
   Fix for the following error mentioned in BEAM-5624:
   _AttributeError (module 'avro.schema' has no attribute 'parse')_
   
   This is part of a series of PRs with the goal of making Apache Beam PY3 
compatible. The proposal with the outlined approach has been documented 
[here](https://s.apache.org/beam-python-3).
   
   @tvalentyn @Fematich @charlesccychen @aaltay @Juta @manuzhang 
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/)
 | --- | --- | ---
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152710)
Time Spent: 10m
Remaining Estimate: 0h

> Avro IO does not work with avro-python3 package out-of-the-box on Python 3, 
> several tests fail with AttributeError (module 'avro.schema' has no attribute 
> 'parse') 
> ---
>
> Key: BEAM-5624
> URL: https://issues.apache.org/jira/browse/BEAM-5624
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Simon
>Priority: 
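One known cause of the `AttributeError` above is an API rename: the Python 3 `avro-python3` package exposes `avro.schema.Parse` (capital P) where the Python 2 `avro` package exposes `avro.schema.parse`. A minimal compatibility shim, sketched here without assuming either package is installed (the helper name is invented for illustration):

```python
def get_schema_parser(schema_module):
    """Return a schema-parsing callable from either avro package flavor.

    avro-python3 exposes schema.Parse, while the Python 2 'avro' package
    exposes schema.parse; resolve whichever attribute exists at runtime.
    """
    for name in ("Parse", "parse"):
        parser = getattr(schema_module, name, None)
        if parser is not None:
            return parser
    raise AttributeError("module %r has no 'Parse' or 'parse'" % schema_module)
```

Callers would then invoke `get_schema_parser(avro.schema)(json_text)` regardless of which package is on the path.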

[jira] [Work logged] (BEAM-5326) SDK support for custom dataflow worker jar

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5326?focusedWorklogId=152728&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152728
 ]

ASF GitHub Bot logged work on BEAM-5326:


Author: ASF GitHub Bot
Created on: 09/Oct/18 15:41
Start Date: 09/Oct/18 15:41
Worklog Time Spent: 10m 
  Work Description: herohde commented on a change in pull request #6615: 
[BEAM-5326] Shim main class and fix Go artifact naming mismatch for c…
URL: https://github.com/apache/beam/pull/6615#discussion_r223756962
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/com/google/cloud/dataflow/worker/DataflowRunnerHarness.java
 ##
 @@ -0,0 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package com.google.cloud.dataflow.worker;
+
+/** Temporary redirect for 
org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness. */
 
 Review comment:
   I'll track that separately. The condition is internal to Dataflow.




Issue Time Tracking
---

Worklog Id: (was: 152728)
Time Spent: 50m  (was: 40m)

> SDK support for custom dataflow worker jar
> --
>
> Key: BEAM-5326
> URL: https://issues.apache.org/jira/browse/BEAM-5326
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Henning Rohde
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Doc: 
> https://docs.google.com/document/d/1-m-GzkYWIODKOEl1ZSUNXYbcGRvRr3QkasfHsJxbuoA/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-5281) There are several deadlinks in beam-site, please removed.

2018-10-09 Thread Scott Wegner (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wegner resolved BEAM-5281.

   Resolution: Duplicate
Fix Version/s: Not applicable

Dupe of BEAM-5681; there's something broken in the pre-commit scripts.

> There are several deadlinks in beam-site, please removed.
> -
>
> Key: BEAM-5281
> URL: https://issues.apache.org/jira/browse/BEAM-5281
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Boyuan Zhang
>Assignee: Melissa Pashniak
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Deadlinks in beam-site cause nightly build failed: 
> https://scans.gradle.com/s/nzwfwj6iqlgrg/console-log?task=:beam-website:testWebsite#L13





[jira] [Created] (BEAM-5687) Checkpointing in portable pipelines does not work

2018-10-09 Thread Maximilian Michels (JIRA)
Maximilian Michels created BEAM-5687:


 Summary: Checkpointing in portable pipelines does not work
 Key: BEAM-5687
 URL: https://issues.apache.org/jira/browse/BEAM-5687
 Project: Beam
  Issue Type: Bug
  Components: runner-flink
Reporter: Maximilian Michels
Assignee: Maximilian Michels
 Fix For: 2.9.0


Checkpoints fail:

{noformat}
AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2 
for operator Source: Custom Source -> 9TestInput.None/beam:env:docker:v1:0 -> 
ToKeyedWorkItem (1/1).}
at 
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointExceptionHandler.tryHandleCheckpointException(StreamTask.java:1154)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:948)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:885)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.Exception: Could not materialize checkpoint 2 for operator 
Source: Custom Source -> 9TestInput.None/beam:env:docker:v1:0 -> 
ToKeyedWorkItem (1/1).
at 
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:943)
... 6 more
Caused by: java.util.concurrent.ExecutionException: 
java.lang.NullPointerException
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at 
org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:53)
at 
org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.(OperatorSnapshotFinalizer.java:53)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:854)
... 5 more
Caused by: java.lang.NullPointerException
at 
org.apache.beam.runners.flink.translation.types.CoderTypeSerializer$CoderTypeSerializerConfigSnapshot.(CoderTypeSerializer.java:162)
at 
org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.snapshotConfiguration(CoderTypeSerializer.java:136)
at 
org.apache.flink.runtime.state.RegisteredOperatorBackendStateMetaInfo.snapshot(RegisteredOperatorBackendStateMetaInfo.java:93)
at 
org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:394)
at 
org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:352)
at 
org.apache.flink.runtime.io.async.AbstractAsyncCallableWithResources.call(AbstractAsyncCallableWithResources.java:75)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:50)
... 7 more
{noformat}





[jira] [Created] (BEAM-5688) [beam_PreCommit_Website_Stage_GCS_Cron] [stageWebsite] Fails on githubPullRequestId assert

2018-10-09 Thread Scott Wegner (JIRA)
Scott Wegner created BEAM-5688:
--

 Summary: [beam_PreCommit_Website_Stage_GCS_Cron] [stageWebsite] 
Fails on githubPullRequestId assert
 Key: BEAM-5688
 URL: https://issues.apache.org/jira/browse/BEAM-5688
 Project: Beam
  Issue Type: Bug
  Components: test-failures
Reporter: Scott Wegner
Assignee: Alan Myrvold


_Use this form to file an issue for test failure:_
 * [Jenkins 
Job|https://builds.apache.org/job/beam_PreCommit_Website_Stage_GCS_Cron/20/]
 * [Gradle Build Scan|https://gradle.com/s/7mqwgjegf5hge]
 * [Test source 
code|https://github.com/apache/beam/blob/a19183b05f0271f0a927aafcd778235335b7d269/website/build.gradle#L234]

Initial investigation:

This is a problem with how the website gradle scripts are implemented to accept 
a githubPullRequestId. The Cron job will not have an associated PR, so this 
currently fails.
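The guard pattern needed here can be sketched in Python rather than Gradle (the property name is the only detail taken from the report; the path scheme below is hypothetical):

```python
def staging_path(github_pull_request_id=None, build_id="manual"):
    """Pick a website staging directory whether or not a PR id is present.

    Cron-triggered builds have no associated pull request, so the PR id
    must be treated as optional instead of asserted unconditionally.
    """
    if github_pull_request_id is not None:
        return "pr-%s" % github_pull_request_id
    return "cron-%s" % build_id
```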


_After you've filled out the above details, please [assign the issue to an 
individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
 Assignee should [treat test failures as 
high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
 helping to fix the issue or find a more appropriate owner. See [Apache Beam 
Post-Commit Policies|https://beam.apache.org/contribute/postcommits-policies]._





[jira] [Created] (BEAM-5689) Remove artifact naming constraint for portable Dataflow job

2018-10-09 Thread Henning Rohde (JIRA)
Henning Rohde created BEAM-5689:
---

 Summary: Remove artifact naming constraint for portable Dataflow 
job
 Key: BEAM-5689
 URL: https://issues.apache.org/jira/browse/BEAM-5689
 Project: Beam
  Issue Type: Task
  Components: runner-dataflow
Reporter: Henning Rohde
Assignee: Henning Rohde


Artifact names/keys are not preserved in Dataflow. Remove the below workarounds 
when they are.

 * Go Dataflow runner
 * Java and Python container boot code (probably)





[jira] [Commented] (BEAM-5688) [beam_PreCommit_Website_Stage_GCS_Cron] [stageWebsite] Fails on githubPullRequestId assert

2018-10-09 Thread Scott Wegner (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643659#comment-16643659
 ] 

Scott Wegner commented on BEAM-5688:


I have https://github.com/apache/beam/pull/6608 out to fix this, but the 
current version doesn't work.

> [beam_PreCommit_Website_Stage_GCS_Cron] [stageWebsite] Fails on 
> githubPullRequestId assert
> --
>
> Key: BEAM-5688
> URL: https://issues.apache.org/jira/browse/BEAM-5688
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Scott Wegner
>Assignee: Alan Myrvold
>Priority: Major
>  Labels: currently-failing
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PreCommit_Website_Stage_GCS_Cron/20/]
>  * [Gradle Build Scan|https://gradle.com/s/7mqwgjegf5hge]
>  * [Test source 
> code|https://github.com/apache/beam/blob/a19183b05f0271f0a927aafcd778235335b7d269/website/build.gradle#L234]
> Initial investigation:
> This is a problem with how the website gradle scripts are implemented to 
> accept a githubPullRequestId. The Cron job will not have an associated PR, 
> so this currently fails.
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._





[jira] [Commented] (BEAM-5683) [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Failure summary

2018-10-09 Thread Pablo Estrada (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643661#comment-16643661
 ] 

Pablo Estrada commented on BEAM-5683:
-

I'll take a look in a bit

> [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Failure summary
> --
>
> Key: BEAM-5683
> URL: https://issues.apache.org/jira/browse/BEAM-5683
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness, test-failures
>Reporter: Scott Wegner
>Assignee: Ankur Goenka
>Priority: Major
>  Labels: currently-failing
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/1289/]
>  * [Gradle Build 
> Scan|https://scans.gradle.com/s/hjmzvh4ylhs6y/console-log?task=:beam-sdks-python:validatesRunnerBatchTests]
>  * [Test source 
> code|https://github.com/apache/beam/blob/303a4275eb0a323761e1a4dec6a22fde9863acf8/sdks/python/apache_beam/runners/portability/stager.py#L390]
> Initial investigation:
> Seems to be failing on pip download.
> ==
> ERROR: test_multiple_empty_outputs 
> (apache_beam.transforms.ptransform_test.PTransformTest)
> --
> Traceback (most recent call last):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/transforms/ptransform_test.py",
>  line 277, in test_multiple_empty_outputs
> pipeline.run()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 104, in run
> result = super(TestPipeline, self).run(test_runner_api)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 403, in run
> self.to_runner_api(), self.runner, self._options).run(False)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py",
>  line 416, in run
> return self.runner.run_pipeline(self)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
>  line 50, in run_pipeline
> self.result = super(TestDataflowRunner, self).run_pipeline(pipeline)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 389, in run_pipeline
> self.dataflow_client.create_job(self.job), self)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/utils/retry.py",
>  line 184, in wrapper
> return fun(*args, **kwargs)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py",
>  line 490, in create_job
> self.create_job_description(job)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py",
>  line 519, in create_job_description
> resources = self._stage_resour
> ces(job.options)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py",
>  line 452, in _stage_resources
> staging_location=google_cloud_options.staging_location)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/portability/stager.py",
>  line 161, in stage_job_resources
> requirements_cache_path)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/portability/stager.py",
>  line 411, in _populate_requirements_cache
> processes.check_call(cmd_args)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/utils/processes.py",
>  line 46, in check_call
> return subprocess.check_call(*args, **kwargs)
>   File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
> raise CalledProcessError(retcode, cmd)
> CalledProcessError: Command 
> '['/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/build/gradleenv/bin/python',
>  '-m', 'pip', 'download', '--dest', '/tmp/dataflow-requirements-cache', '-r', 
> 'postcommit_requirements.txt', '--exists-action', 'i', '--no-binary', 
> ':all:']' returned non-zero exit status 1
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.

[jira] [Work logged] (BEAM-5326) SDK support for custom dataflow worker jar

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5326?focusedWorklogId=152732&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152732
 ]

ASF GitHub Bot logged work on BEAM-5326:


Author: ASF GitHub Bot
Created on: 09/Oct/18 15:50
Start Date: 09/Oct/18 15:50
Worklog Time Spent: 10m 
  Work Description: herohde commented on a change in pull request #6615: 
[BEAM-5326] Shim main class and fix Go artifact naming mismatch for c…
URL: https://github.com/apache/beam/pull/6615#discussion_r223760415
 
 

 ##
 File path: sdks/go/pkg/beam/runners/dataflow/dataflow.go
 ##
 @@ -149,10 +149,12 @@ func Execute(ctx context.Context, p *beam.Pipeline) 
error {
return fmt.Errorf("failed to generate model pipeline: %v", err)
}
 
-   id := atomic.AddInt32(&unique, 1)
-   modelURL := gcsx.Join(*stagingLocation, fmt.Sprintf("model-%v-%v", id, 
time.Now().UnixNano()))
-   workerURL := gcsx.Join(*stagingLocation, fmt.Sprintf("worker-%v-%v", 
id, time.Now().UnixNano()))
-   jarURL := gcsx.Join(*stagingLocation, 
fmt.Sprintf("dataflow-worker-%v-%v.jar", id, time.Now().UnixNano()))
+   // NOTE(herohde) 10/8/2018: the last segment of the names must be 
"worker" and "dataflow-worker.jar".
 
 Review comment:
   Opened BEAM-5689




Issue Time Tracking
---

Worklog Id: (was: 152732)
Time Spent: 1h  (was: 50m)

> SDK support for custom dataflow worker jar
> --
>
> Key: BEAM-5326
> URL: https://issues.apache.org/jira/browse/BEAM-5326
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Henning Rohde
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Doc: 
> https://docs.google.com/document/d/1-m-GzkYWIODKOEl1ZSUNXYbcGRvRr3QkasfHsJxbuoA/edit?usp=sharing





[jira] [Created] (BEAM-5686) Remove DataflowRunnerHarness shim again

2018-10-09 Thread Henning Rohde (JIRA)
Henning Rohde created BEAM-5686:
---

 Summary: Remove DataflowRunnerHarness shim again
 Key: BEAM-5686
 URL: https://issues.apache.org/jira/browse/BEAM-5686
 Project: Beam
  Issue Type: Task
  Components: runner-dataflow
Reporter: Henning Rohde
Assignee: Henning Rohde








[jira] [Work logged] (BEAM-5326) SDK support for custom dataflow worker jar

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5326?focusedWorklogId=152733&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152733
 ]

ASF GitHub Bot logged work on BEAM-5326:


Author: ASF GitHub Bot
Created on: 09/Oct/18 15:51
Start Date: 09/Oct/18 15:51
Worklog Time Spent: 10m 
  Work Description: herohde commented on a change in pull request #6615: 
[BEAM-5326] Shim main class and fix Go artifact naming mismatch for c…
URL: https://github.com/apache/beam/pull/6615#discussion_r223760656
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/com/google/cloud/dataflow/worker/DataflowRunnerHarness.java
 ##
 @@ -0,0 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package com.google.cloud.dataflow.worker;
+
+/** Temporary redirect for 
org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness. */
 
 Review comment:
   Opened BEAM-5686 for when we can remove it.




Issue Time Tracking
---

Worklog Id: (was: 152733)
Time Spent: 1h 10m  (was: 1h)

> SDK support for custom dataflow worker jar
> --
>
> Key: BEAM-5326
> URL: https://issues.apache.org/jira/browse/BEAM-5326
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Henning Rohde
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Doc: 
> https://docs.google.com/document/d/1-m-GzkYWIODKOEl1ZSUNXYbcGRvRr3QkasfHsJxbuoA/edit?usp=sharing





[jira] [Work logged] (BEAM-5326) SDK support for custom dataflow worker jar

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5326?focusedWorklogId=152734&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152734
 ]

ASF GitHub Bot logged work on BEAM-5326:


Author: ASF GitHub Bot
Created on: 09/Oct/18 15:51
Start Date: 09/Oct/18 15:51
Worklog Time Spent: 10m 
  Work Description: herohde commented on issue #6615: [BEAM-5326] Shim main 
class and fix Go artifact naming mismatch for c…
URL: https://github.com/apache/beam/pull/6615#issuecomment-428246844
 
 
   Thanks @boyuanzz. PTAL




Issue Time Tracking
---

Worklog Id: (was: 152734)
Time Spent: 1h 20m  (was: 1h 10m)

> SDK support for custom dataflow worker jar
> --
>
> Key: BEAM-5326
> URL: https://issues.apache.org/jira/browse/BEAM-5326
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Henning Rohde
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Doc: 
> https://docs.google.com/document/d/1-m-GzkYWIODKOEl1ZSUNXYbcGRvRr3QkasfHsJxbuoA/edit?usp=sharing





[jira] [Created] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner

2018-10-09 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-5690:
-

 Summary: Issue with GroupByKey in BeamSql using SparkRunner
 Key: BEAM-5690
 URL: https://issues.apache.org/jira/browse/BEAM-5690
 Project: Beam
  Issue Type: Task
  Components: runner-spark
Reporter: Kenneth Knowles
Assignee: Amit Sela


Reported on user@

{quote}We are trying to setup a pipeline with using BeamSql and the trigger 
used is default (AfterWatermark crosses the window). 
Below is the pipeline:
  
   KafkaSource (KafkaIO) 
   ---> Windowing (FixedWindow 1min)
   ---> BeamSql
   ---> KafkaSink (KafkaIO)
 
We are using Spark Runner for this. 
The BeamSql query is:
{code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code}

We are grouping by Col3 which is a string. It can hold values string[0-9]. 
 
The records are getting emitted out at 1 min to kafka sink, but the output 
record in kafka is not as expected.
Below is the output observed: (WST and WET are indicators for window start time 
and window end time)

{code}
{"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00 0}
{code}
{quote}
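The suspicious part of the output above is the `count_col1: 0` rows: with a plain per-window GROUP BY count, a key only appears in a window's output if at least one record for it fell inside that window, so every emitted count should be at least 1. A small simulation of the expected semantics (timestamps and keys are invented for illustration; this is not Beam code):

```python
from collections import Counter

WINDOW_SECS = 60  # one-minute fixed windows, matching the report


def fixed_window_counts(records):
    """records: iterable of (epoch_seconds, key) pairs.

    Returns {(window_start, window_end): Counter mapping key -> count},
    mimicking a per-window GROUP BY key, COUNT(*) aggregation.
    """
    windows = {}
    for ts, key in records:
        start = ts - ts % WINDOW_SECS  # floor to the window boundary
        windows.setdefault((start, start + WINDOW_SECS), Counter())[key] += 1
    return windows


events = [(3, "string5"), (10, "string7"), (22, "string7"),
          (31, "string7"), (45, "string8"), (50, "string8")]
out = fixed_window_counts(events)
# A key with no records in the window is simply absent from the Counter,
# so no emitted count is ever 0 under these semantics.
```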






[jira] [Assigned] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner

2018-10-09 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-5690:
-

Assignee: (was: Amit Sela)

> Issue with GroupByKey in BeamSql using SparkRunner
> --
>
> Key: BEAM-5690
> URL: https://issues.apache.org/jira/browse/BEAM-5690
> Project: Beam
>  Issue Type: Task
>  Components: runner-spark
>Reporter: Kenneth Knowles
>Priority: Major
>
> Reported on user@
> {quote}We are trying to setup a pipeline with using BeamSql and the trigger 
> used is default (AfterWatermark crosses the window). 
> Below is the pipeline:
>   
>KafkaSource (KafkaIO) 
>---> Windowing (FixedWindow 1min)
>---> BeamSql
>---> KafkaSink (KafkaIO)
>  
> We are using Spark Runner for this. 
> The BeamSql query is:
> {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code}
> We are grouping by Col3 which is a string. It can hold values string[0-9]. 
>  
> The records are getting emitted out at 1 min to kafka sink, but the output 
> record in kafka is not as expected.
> Below is the output observed: (WST and WET are indicators for window start 
> time and window end time)
> {code}
> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00 0}
> {code}
> {quote}





[beam] branch master updated (21e30ae -> f6945d5)

2018-10-09 Thread anton
This is an automated email from the ASF dual-hosted git repository.

anton pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 21e30ae  Merge pull request #6607: [BEAM-4130] Use port 0 to pick 
dynamic port
 add f04a564  [BEAM-5254] Add Samza Runner translator registrar and 
refactor config generation
 add f6945d5  Merge pull request #6292 from xinyuiscool/BEAM-5254

No new revisions were added by this update.

Summary of changes:
 .../org/apache/beam/runners/samza/SamzaRunner.java |  11 +-
 .../runners/samza/adapter/BoundedSourceSystem.java |  25 +---
 .../samza/adapter/UnboundedSourceSystem.java   |  25 +---
 .../runners/samza/translation/ConfigBuilder.java   | 148 -
 .../runners/samza/translation/ConfigContext.java   |  61 +
 .../translation/ParDoBoundMultiTranslator.java |  22 ++-
 .../runners/samza/translation/ReadTranslator.java  |  47 ++-
 .../samza/translation/SamzaPipelineTranslator.java | 112 +---
 ...age-info.java => SamzaTranslatorRegistrar.java} |   8 +-
 ...anslator.java => TransformConfigGenerator.java} |   7 +-
 .../samza/translation/TransformTranslator.java |   2 +-
 .../samza/translation/TranslationContext.java  |   2 +-
 12 files changed, 262 insertions(+), 208 deletions(-)
 create mode 100644 
runners/samza/src/main/java/org/apache/beam/runners/samza/translation/ConfigContext.java
 copy 
runners/samza/src/main/java/org/apache/beam/runners/samza/translation/{package-info.java
 => SamzaTranslatorRegistrar.java} (82%)
 copy 
runners/samza/src/main/java/org/apache/beam/runners/samza/translation/{TransformTranslator.java
 => TransformConfigGenerator.java} (79%)



[jira] [Work logged] (BEAM-5254) Add Samza Runner translator registrar and refactor config generation

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5254?focusedWorklogId=152737&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152737
 ]

ASF GitHub Bot logged work on BEAM-5254:


Author: ASF GitHub Bot
Created on: 09/Oct/18 16:00
Start Date: 09/Oct/18 16:00
Worklog Time Spent: 10m 
  Work Description: akedin closed pull request #6292: [BEAM-5254] Add Samza 
Runner translator registrar and refactor config
URL: https://github.com/apache/beam/pull/6292
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaRunner.java 
b/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaRunner.java
index 6e67e385756..bba10ddd962 100644
--- a/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaRunner.java
+++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaRunner.java
@@ -33,7 +33,6 @@
 import org.apache.beam.sdk.values.PValue;
 import org.apache.samza.application.StreamApplication;
 import org.apache.samza.config.Config;
-import org.apache.samza.config.MapConfig;
 import org.apache.samza.metrics.MetricsRegistryMap;
 import org.apache.samza.operators.ContextManager;
 import org.apache.samza.operators.StreamGraph;
@@ -76,20 +75,18 @@ public SamzaPipelineResult run(Pipeline pipeline) {
 
 // Add a dummy source for use in special cases (TestStream, empty flatten)
 final PValue dummySource = pipeline.apply("Dummy Input Source", 
Create.of("dummy"));
-
final Map<PValue, String> idMap = PViewToIdMapper.buildIdMap(pipeline);
-final Map<String, String> config = ConfigBuilder.buildConfig(pipeline, 
options, idMap);
 
-final SamzaExecutionContext executionContext = new SamzaExecutionContext();
+final ConfigBuilder configBuilder = new ConfigBuilder(options);
+SamzaPipelineTranslator.createConfig(pipeline, idMap, configBuilder);
+final ApplicationRunner runner = 
ApplicationRunner.fromConfig(configBuilder.build());
 
-final ApplicationRunner runner = ApplicationRunner.fromConfig(new 
MapConfig(config));
+final SamzaExecutionContext executionContext = new SamzaExecutionContext();
 
 final StreamApplication app =
 new StreamApplication() {
   @Override
   public void init(StreamGraph streamGraph, Config config) {
-// TODO: we should probably not be creating the execution context 
this early since it needs
-// to be shipped off to various tasks.
 streamGraph.withContextManager(
 new ContextManager() {
   @Override
diff --git 
a/runners/samza/src/main/java/org/apache/beam/runners/samza/adapter/BoundedSourceSystem.java
 
b/runners/samza/src/main/java/org/apache/beam/runners/samza/adapter/BoundedSourceSystem.java
index 1bef011a34f..f653cfc934b 100644
--- 
a/runners/samza/src/main/java/org/apache/beam/runners/samza/adapter/BoundedSourceSystem.java
+++ 
b/runners/samza/src/main/java/org/apache/beam/runners/samza/adapter/BoundedSourceSystem.java
@@ -69,28 +69,6 @@
  */
 // TODO: instrumentation for the consumer
 public class BoundedSourceSystem {
-  /**
-   * Returns the configuration required to instantiate a consumer for the 
given {@link
-   * BoundedSource}.
-   *
-   * @param id a unique id for the source. Must use only valid characters for 
a system name in
-   * Samza.
-   * @param source the source
-   * @param coder a coder to deserialize messages received by the source's 
consumer
-   * @param  the type of object produced by the source consumer
-   */
-  public static  Map createConfigFor(
-  String id, BoundedSource source, Coder> coder, 
String stepName) {
-final Map config = new HashMap<>();
-final String streamPrefix = "systems." + id;
-config.put(streamPrefix + ".samza.factory", 
BoundedSourceSystem.Factory.class.getName());
-config.put(streamPrefix + ".source", 
Base64Serializer.serializeUnchecked(source));
-config.put(streamPrefix + ".coder", 
Base64Serializer.serializeUnchecked(coder));
-config.put(streamPrefix + ".stepName", stepName);
-config.put("streams." + id + ".samza.system", id);
-config.put("streams." + id + ".samza.bounded", "true");
-return config;
-  }
 
   private static  List> split(
   BoundedSource source, SamzaPipelineOptions pipelineOptions) throws 
Exception {
@@ -414,8 +392,7 @@ private void enqueueUninterruptibly(IncomingMessageEnvelope 
envelope) {
 
   /**
* A {@link SystemFactory} that produces a {@link BoundedSourceSystem} for a 
particular {@link
-   * BoundedSource} registered in {@link Config} via {@link 
#createConfigFor(String, BoundedSource,
-   * Coder, String)}.
+   * BoundedSource} regist

[beam] branch asf-site updated: Publishing website 2018/10/09 16:01:28 at commit f6945d5

2018-10-09 Thread git-site-role
This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 0b32751  Publishing website 2018/10/09 16:01:28 at commit f6945d5
0b32751 is described below

commit 0b32751176882c2ab6b035ca8317f4726974f622
Author: jenkins 
AuthorDate: Tue Oct 9 16:01:29 2018 +

Publishing website 2018/10/09 16:01:28 at commit f6945d5



Jenkins build is back to normal : beam_PostCommit_Website_Publish #146

2018-10-09 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner

2018-10-09 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643682#comment-16643682
 ] 

Kenneth Knowles commented on BEAM-5690:
---

CC [~kedin] [~apilloud] [~xumingming] [~mingmxu] [~amaliujia]

Since it is not reproduced in the Flink runner or Direct runner, the SQL 
implementation of GROUP BY is probably triggering some latent bug in the Spark 
runner's streaming mode.

> Issue with GroupByKey in BeamSql using SparkRunner
> --
>
> Key: BEAM-5690
> URL: https://issues.apache.org/jira/browse/BEAM-5690
> Project: Beam
>  Issue Type: Task
>  Components: runner-spark
>Reporter: Kenneth Knowles
>Priority: Major
>
> Reported on user@
> {quote}We are trying to setup a pipeline with using BeamSql and the trigger 
> used is default (AfterWatermark crosses the window). 
> Below is the pipeline:
>   
>KafkaSource (KafkaIO) 
>---> Windowing (FixedWindow 1min)
>---> BeamSql
>---> KafkaSink (KafkaIO)
>  
> We are using Spark Runner for this. 
> The BeamSql query is:
> {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code}
> We are grouping by Col3 which is a string. It can hold values string[0-9]. 
>  
> The records are getting emitted out at 1 min to kafka sink, but the output 
> record in kafka is not as expected.
> Below is the output observed: (WST and WET are indicators for window start 
> time and window end time)
> {code}
> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {code}
> {quote}
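For reference, the expected GROUP BY semantics can be modeled with a small pure-Python sketch (a hypothetical helper, not Beam or Spark runner code; `fixed_window_counts` and `WINDOW_MS` are illustrative names): within each fixed window, a key should appear in the output only with the count of records actually observed, so the repeated `count_col1 = 0` rows for "string6" above point at a runner bug rather than at the SQL semantics.

```python
from collections import Counter, defaultdict

WINDOW_MS = 60_000  # 1-minute fixed windows, as in the report

def fixed_window_counts(records):
    """Group (timestamp_ms, key) records into 1-minute fixed windows and
    count records per key per window -- a toy model of the GROUP BY query."""
    windows = defaultdict(Counter)
    for ts, key in records:
        window_start = ts - (ts % WINDOW_MS)  # fixed-window assignment
        windows[window_start][key] += 1
    return dict(windows)

records = [
    (5_000, "string5"),
    (10_000, "string7"), (20_000, "string7"), (30_000, "string7"),
    (40_000, "string8"), (50_000, "string8"),
]
counts = fixed_window_counts(records)
# Only keys actually seen in a window appear in that window's output;
# a correct runner never emits a zero count for an absent key.
print(counts[0])
```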



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Jenkins build is back to normal : beam_PostCommit_Python_VR_Flink #287

2018-10-09 Thread Apache Jenkins Server
See 




[jira] [Work logged] (BEAM-5687) Checkpointing in portable pipelines does not work

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5687?focusedWorklogId=152749&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152749
 ]

ASF GitHub Bot logged work on BEAM-5687:


Author: ASF GitHub Bot
Created on: 09/Oct/18 16:15
Start Date: 09/Oct/18 16:15
Worklog Time Spent: 10m 
  Work Description: mxm opened a new pull request #6617: [BEAM-5687] Fix 
checkpointing of FlinkRunner for portable pipelines 
URL: https://github.com/apache/beam/pull/6617
 
 
   ###  [BEAM-5687] Fix checkpointing of FlinkRunner for portable pipelines
   
   This provides the input WindowedValue coder to ExecutableStageDoFnOperator, which
   ensures that the buffered elements can be checkpointed correctly.
   
   ### [BEAM-3727] Do not shutdown Impulse sources to enable checkpointing
   
   Flink's checkpointing won't work properly after sources have finished. They need
   to be up and running for as long as checkpoints should be taken. This was
   already the case for the non-portable UnboundedSourceWrapper, but it also needs
   to be extended to Impulse transforms.
   
   ### [BEAM-3727] Allow sources to shutdown when checkpointing is disabled 
   
   
   CC @tweise @angoenka 
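   The keep-alive behavior described above can be sketched with a toy Python model (this is not the actual Flink `ImpulseSourceFunction`; the class and method names are illustrative): when checkpointing is enabled, the source emits its single element and then blocks until cancelled instead of returning, because a source that returns is marked finished and Flink stops taking checkpoints for the job.

```python
import threading

class ImpulseLikeSource:
    """Toy model of an impulse-style source; names are illustrative only."""

    def __init__(self, keep_alive_after_emit):
        # keep_alive_after_emit mirrors "checkpointing enabled": the
        # source must not finish while checkpoints are still being taken.
        self.keep_alive = keep_alive_after_emit
        self._cancelled = threading.Event()
        self.emitted = []

    def run(self):
        self.emitted.append(b"")  # the single impulse element
        if self.keep_alive:
            # Block instead of returning: a finished source would stop
            # Flink from taking further checkpoints.
            self._cancelled.wait()
        # keep_alive False (checkpointing disabled): the source may finish

    def cancel(self):
        self._cancelled.set()

src = ImpulseLikeSource(keep_alive_after_emit=True)
worker = threading.Thread(target=src.run)
worker.start()
src.cancel()   # job shutdown: only now is the source allowed to finish
worker.join()
print(src.emitted)  # [b'']
```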
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/)
 | --- | --- | ---
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 152749)
Time Spent: 10m
Remaining Estimate: 0h

> Checkpointing in portable pipelines does not work
> -
>
> Key: BEAM-5687
> URL: https://issues.apache.org/jira/browse/BEAM-5687
> Project: Beam
>  Issue Type: Bug
>  Compon

[jira] [Work logged] (BEAM-5687) Checkpointing in portable pipelines does not work

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5687?focusedWorklogId=152758&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152758
 ]

ASF GitHub Bot logged work on BEAM-5687:


Author: ASF GitHub Bot
Created on: 09/Oct/18 16:23
Start Date: 09/Oct/18 16:23
Worklog Time Spent: 10m 
  Work Description: tweise commented on a change in pull request #6617: 
[BEAM-5687] Fix checkpointing of FlinkRunner for portable pipelines 
URL: https://github.com/apache/beam/pull/6617#discussion_r223771625
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java
 ##
 @@ -203,6 +203,9 @@ static StreamExecutionEnvironment 
createStreamExecutionEnvironment(
 .getCheckpointConfig()
 .setMinPauseBetweenCheckpoints(minPauseBetweenCheckpoints);
   }
+} else {
+  // checkpointing is disabled, we can allow shutting down sources when 
they're done
 
 Review comment:
   great, let's add that link as a comment




Issue Time Tracking
---

Worklog Id: (was: 152758)
Time Spent: 40m  (was: 0.5h)

> Checkpointing in portable pipelines does not work
> -
>
> Key: BEAM-5687
> URL: https://issues.apache.org/jira/browse/BEAM-5687
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability-flink
> Fix For: 2.9.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Checkpoints fail:
> {noformat}
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2 
> for operator Source: Custom Source -> 9TestInput.None/beam:env:docker:v1:0 -> 
> ToKeyedWorkItem (1/1).}
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointExceptionHandler.tryHandleCheckpointException(StreamTask.java:1154)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:948)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:885)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.Exception: Could not materialize checkpoint 2 for 
> operator Source: Custom Source -> 9TestInput.None/beam:env:docker:v1:0 -> 
> ToKeyedWorkItem (1/1).
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:943)
>   ... 6 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.NullPointerException
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at 
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:53)
>   at 
> org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.(OperatorSnapshotFinalizer.java:53)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:854)
>   ... 5 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.beam.runners.flink.translation.types.CoderTypeSerializer$CoderTypeSerializerConfigSnapshot.(CoderTypeSerializer.java:162)
>   at 
> org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.snapshotConfiguration(CoderTypeSerializer.java:136)
>   at 
> org.apache.flink.runtime.state.RegisteredOperatorBackendStateMetaInfo.snapshot(RegisteredOperatorBackendStateMetaInfo.java:93)
>   at 
> org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:394)
>   at 
> org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:352)
>   at 
> org.apache.flink.runtime.io.async.AbstractAsyncCallableWithResources.call(AbstractAsyncCallableWithResources.java:75)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUti

[jira] [Work logged] (BEAM-5687) Checkpointing in portable pipelines does not work

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5687?focusedWorklogId=152759&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152759
 ]

ASF GitHub Bot logged work on BEAM-5687:


Author: ASF GitHub Bot
Created on: 09/Oct/18 16:25
Start Date: 09/Oct/18 16:25
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6617: 
[BEAM-5687] Fix checkpointing of FlinkRunner for portable pipelines 
URL: https://github.com/apache/beam/pull/6617#discussion_r223772311
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java
 ##
 @@ -203,6 +203,9 @@ static StreamExecutionEnvironment 
createStreamExecutionEnvironment(
 .getCheckpointConfig()
 .setMinPauseBetweenCheckpoints(minPauseBetweenCheckpoints);
   }
+} else {
+  // checkpointing is disabled, we can allow shutting down sources when 
they're done
 
 Review comment:
   It's also in `ImpulseSourceFunction` but doesn't hurt to add it here as well.




Issue Time Tracking
---

Worklog Id: (was: 152759)
Time Spent: 50m  (was: 40m)

> Checkpointing in portable pipelines does not work
> -
>
> Key: BEAM-5687
> URL: https://issues.apache.org/jira/browse/BEAM-5687
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability-flink
> Fix For: 2.9.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Checkpoints fail:
> {noformat}
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2 
> for operator Source: Custom Source -> 9TestInput.None/beam:env:docker:v1:0 -> 
> ToKeyedWorkItem (1/1).}
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointExceptionHandler.tryHandleCheckpointException(StreamTask.java:1154)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:948)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:885)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.Exception: Could not materialize checkpoint 2 for 
> operator Source: Custom Source -> 9TestInput.None/beam:env:docker:v1:0 -> 
> ToKeyedWorkItem (1/1).
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:943)
>   ... 6 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.NullPointerException
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at 
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:53)
>   at 
> org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.(OperatorSnapshotFinalizer.java:53)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:854)
>   ... 5 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.beam.runners.flink.translation.types.CoderTypeSerializer$CoderTypeSerializerConfigSnapshot.(CoderTypeSerializer.java:162)
>   at 
> org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.snapshotConfiguration(CoderTypeSerializer.java:136)
>   at 
> org.apache.flink.runtime.state.RegisteredOperatorBackendStateMetaInfo.snapshot(RegisteredOperatorBackendStateMetaInfo.java:93)
>   at 
> org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:394)
>   at 
> org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:352)
>   at 
> org.apache.flink.runtime.io.async.AbstractAsyncCallableWithResources.call(AbstractAsyncCallableWithResources.java:75)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> org.apache.flink.util.Futur

[jira] [Work logged] (BEAM-5687) Checkpointing in portable pipelines does not work

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5687?focusedWorklogId=152754&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152754
 ]

ASF GitHub Bot logged work on BEAM-5687:


Author: ASF GitHub Bot
Created on: 09/Oct/18 16:19
Start Date: 09/Oct/18 16:19
Worklog Time Spent: 10m 
  Work Description: tweise commented on a change in pull request #6617: 
[BEAM-5687] Fix checkpointing of FlinkRunner for portable pipelines 
URL: https://github.com/apache/beam/pull/6617#discussion_r223770313
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java
 ##
 @@ -203,6 +203,9 @@ static StreamExecutionEnvironment 
createStreamExecutionEnvironment(
 .getCheckpointConfig()
 .setMinPauseBetweenCheckpoints(minPauseBetweenCheckpoints);
   }
+} else {
+  // checkpointing is disabled, we can allow shutting down sources when 
they're done
 
 Review comment:
   Is there a Flink JIRA for this?




Issue Time Tracking
---

Worklog Id: (was: 152754)
Time Spent: 20m  (was: 10m)

> Checkpointing in portable pipelines does not work
> -
>
> Key: BEAM-5687
> URL: https://issues.apache.org/jira/browse/BEAM-5687
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability-flink
> Fix For: 2.9.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Checkpoints fail:
> {noformat}
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2 
> for operator Source: Custom Source -> 9TestInput.None/beam:env:docker:v1:0 -> 
> ToKeyedWorkItem (1/1).}
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointExceptionHandler.tryHandleCheckpointException(StreamTask.java:1154)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:948)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:885)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.Exception: Could not materialize checkpoint 2 for 
> operator Source: Custom Source -> 9TestInput.None/beam:env:docker:v1:0 -> 
> ToKeyedWorkItem (1/1).
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:943)
>   ... 6 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.NullPointerException
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at 
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:53)
>   at 
> org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.(OperatorSnapshotFinalizer.java:53)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:854)
>   ... 5 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.beam.runners.flink.translation.types.CoderTypeSerializer$CoderTypeSerializerConfigSnapshot.(CoderTypeSerializer.java:162)
>   at 
> org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.snapshotConfiguration(CoderTypeSerializer.java:136)
>   at 
> org.apache.flink.runtime.state.RegisteredOperatorBackendStateMetaInfo.snapshot(RegisteredOperatorBackendStateMetaInfo.java:93)
>   at 
> org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:394)
>   at 
> org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:352)
>   at 
> org.apache.flink.runtime.io.async.AbstractAsyncCallableWithResources.call(AbstractAsyncCallableWithResources.java:75)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:50

[jira] [Work logged] (BEAM-5687) Checkpointing in portable pipelines does not work

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5687?focusedWorklogId=152755&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152755
 ]

ASF GitHub Bot logged work on BEAM-5687:


Author: ASF GitHub Bot
Created on: 09/Oct/18 16:20
Start Date: 09/Oct/18 16:20
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6617: 
[BEAM-5687] Fix checkpointing of FlinkRunner for portable pipelines 
URL: https://github.com/apache/beam/pull/6617#discussion_r223770717
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java
 ##
 @@ -203,6 +203,9 @@ static StreamExecutionEnvironment 
createStreamExecutionEnvironment(
 .getCheckpointConfig()
 .setMinPauseBetweenCheckpoints(minPauseBetweenCheckpoints);
   }
+} else {
+  // checkpointing is disabled, we can allow shutting down sources when 
they're done
 
 Review comment:
   https://issues.apache.org/jira/browse/FLINK-2491




Issue Time Tracking
---

Worklog Id: (was: 152755)
Time Spent: 0.5h  (was: 20m)

> Checkpointing in portable pipelines does not work
> -
>
> Key: BEAM-5687
> URL: https://issues.apache.org/jira/browse/BEAM-5687
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability-flink
> Fix For: 2.9.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Checkpoints fail:
> {noformat}
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2 
> for operator Source: Custom Source -> 9TestInput.None/beam:env:docker:v1:0 -> 
> ToKeyedWorkItem (1/1).}
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointExceptionHandler.tryHandleCheckpointException(StreamTask.java:1154)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:948)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:885)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.Exception: Could not materialize checkpoint 2 for 
> operator Source: Custom Source -> 9TestInput.None/beam:env:docker:v1:0 -> 
> ToKeyedWorkItem (1/1).
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:943)
>   ... 6 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.NullPointerException
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at 
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:53)
>   at 
> org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.(OperatorSnapshotFinalizer.java:53)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:854)
>   ... 5 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.beam.runners.flink.translation.types.CoderTypeSerializer$CoderTypeSerializerConfigSnapshot.(CoderTypeSerializer.java:162)
>   at 
> org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.snapshotConfiguration(CoderTypeSerializer.java:136)
>   at 
> org.apache.flink.runtime.state.RegisteredOperatorBackendStateMetaInfo.snapshot(RegisteredOperatorBackendStateMetaInfo.java:93)
>   at 
> org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:394)
>   at 
> org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:352)
>   at 
> org.apache.flink.runtime.io.async.AbstractAsyncCallableWithResources.call(AbstractAsyncCallableWithResources.java:75)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(Fu

[jira] [Work logged] (BEAM-5660) Add dataflow java worker unit tests into precommit

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5660?focusedWorklogId=152760&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152760
 ]

ASF GitHub Bot logged work on BEAM-5660:


Author: ASF GitHub Bot
Created on: 09/Oct/18 16:27
Start Date: 09/Oct/18 16:27
Worklog Time Spent: 10m 
  Work Description: herohde commented on issue #6613: [BEAM-5660] Add both 
dataflow legacy worker and fn-api worker into JavaPreCommit
URL: https://github.com/apache/beam/pull/6613#issuecomment-428259602
 
 
   @boyuanzz Is the website failure unrelated?




Issue Time Tracking
---

Worklog Id: (was: 152760)
Time Spent: 20m  (was: 10m)

> Add dataflow java worker unit tests into precommit
> --
>
> Key: BEAM-5660
> URL: https://issues.apache.org/jira/browse/BEAM-5660
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5315) Finish Python 3 porting for io module

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5315?focusedWorklogId=152762&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152762
 ]

ASF GitHub Bot logged work on BEAM-5315:


Author: ASF GitHub Bot
Created on: 09/Oct/18 16:28
Start Date: 09/Oct/18 16:28
Worklog Time Spent: 10m 
  Work Description: splovyt commented on issue #6590: [BEAM-5315] Partially 
port io
URL: https://github.com/apache/beam/pull/6590#issuecomment-428259931
 
 
   @tvalentyn I have rebased, although the checks seem to be hanging. PTAL and
please merge if approved (I am at an event for the next two days). Thanks once
again for the review.




Issue Time Tracking
---

Worklog Id: (was: 152762)
Time Spent: 3h 20m  (was: 3h 10m)

> Finish Python 3 porting for io module
> -
>
> Key: BEAM-5315
> URL: https://issues.apache.org/jira/browse/BEAM-5315
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Simon
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>




--


Build failed in Jenkins: beam_PostCommit_Py_VR_Dataflow #1303

2018-10-09 Thread Apache Jenkins Server
See 


Changes:

[xinyuliu.us] [BEAM-5254] Add Samza Runner translator registrar and refactor 
config

--
[...truncated 74.75 KB...]
Collecting setuptools (from pyhamcrest->-r postcommit_requirements.txt (line 1))
Collecting setuptools (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/setuptools-40.4.3.zip
  File was already downloaded 
/tmp/dataflow-requirements-cache/setuptools-40.4.3.zip
Collecting pyhamcrest (from -r postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/setuptools-40.4.3.zip
  File was already downloaded 
/tmp/dataflow-requirements-cache/setuptools-40.4.3.zip
  File was already downloaded 
/tmp/dataflow-requirements-cache/setuptools-40.4.3.zip
  File was already downloaded 
/tmp/dataflow-requirements-cache/PyHamcrest-1.9.0.tar.gz
  File was already downloaded 
/tmp/dataflow-requirements-cache/setuptools-40.4.3.zip
  File was already downloaded 
/tmp/dataflow-requirements-cache/setuptools-40.4.3.zip
Collecting six (from pyhamcrest->-r postcommit_requirements.txt (line 1))
Collecting six (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded /tmp/dataflow-requirements-cache/six-1.11.0.tar.gz
  File was already downloaded /tmp/dataflow-requirements-cache/six-1.11.0.tar.gz
Collecting six (from pyhamcrest->-r postcommit_requirements.txt (line 1))
Collecting mock (from -r postcommit_requirements.txt (line 2))
Collecting six (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded /tmp/dataflow-requirements-cache/six-1.11.0.tar.gz
Collecting six (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded /tmp/dataflow-requirements-cache/mock-2.0.0.tar.gz
  File was already downloaded /tmp/dataflow-requirements-cache/six-1.11.0.tar.gz
  File was already downloaded /tmp/dataflow-requirements-cache/six-1.11.0.tar.gz
Collecting six (from pyhamcrest->-r postcommit_requirements.txt (line 1))
Collecting funcsigs>=1 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/six-1.11.0.tar.gz
Collecting funcsigs>=1 (from mock->-r postcommit_requirements.txt (line 2))
Collecting six (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/funcsigs-1.0.2.tar.gz
  File was already downloaded 
/tmp/dataflow-requirements-cache/funcsigs-1.0.2.tar.gz
  File was already downloaded /tmp/dataflow-requirements-cache/six-1.11.0.tar.gz
Collecting funcsigs>=1 (from mock->-r postcommit_requirements.txt (line 2))
Collecting funcsigs>=1 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded 
/tmp/dataflow-requirements-cache/funcsigs-1.0.2.tar.gz
  File was already downloaded 
/tmp/dataflow-requirements-cache/funcsigs-1.0.2.tar.gz
Collecting funcsigs>=1 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded 
/tmp/dataflow-requirements-cache/funcsigs-1.0.2.tar.gz
Collecting funcsigs>=1 (from mock->-r postcommit_requirements.txt (line 2))
Collecting setuptools (from pyhamcrest->-r postcommit_requirements.txt (line 1))
Collecting pbr>=0.11 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded 
/tmp/dataflow-requirements-cache/funcsigs-1.0.2.tar.gz
Collecting pbr>=0.11 (from mock->-r postcommit_requirements.txt (line 2))
Collecting funcsigs>=1 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/pbr-4.3.0.tar.gz
  File was already downloaded /tmp/dataflow-requirements-cache/pbr-4.3.0.tar.gz
  File was already downloaded 
/tmp/dataflow-requirements-cache/funcsigs-1.0.2.tar.gz
Collecting pbr>=0.11 (from mock->-r postcommit_requirements.txt (line 2))
Collecting pbr>=0.11 (from mock->-r postcommit_requirements.txt (line 2))
Collecting pbr>=0.11 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/pbr-4.3.0.tar.gz
  File was already downloaded /tmp/dataflow-requirements-cache/pbr-4.3.0.tar.gz
  File was already downloaded /tmp/dataflow-requirements-cache/pbr-4.3.0.tar.gz
  File was already downloaded 
/tmp/dataflow-requirements-cache/setuptools-40.4.3.zip
Collecting pbr>=0.11 (from mock->-r postcommit_requirements.txt (line 2))
Collecting pbr>=0.11 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/pbr-4.3.0.tar.gz
  File was already downloaded /tmp/dataflow-requirements-cache/pbr-4.3.0.tar.gz
Successfully downloaded pyhamcrest mock setuptools six funcsigs pbr
Successfully downloaded pyhamcrest mock setuptools six funcsigs pbr
Successfully downloaded pyhamcrest mock setuptools six funcsigs 

[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152771&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152771
 ]

ASF GitHub Bot logged work on BEAM-4553:


Author: ASF GitHub Bot
Created on: 09/Oct/18 16:36
Start Date: 09/Oct/18 16:36
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on a change in pull request 
#6569: [BEAM-4553] Implement graphite sink for MetricsPusher and refactor 
MetricsHttpSink test
URL: https://github.com/apache/beam/pull/6569#discussion_r223776184
 
 

 ##
 File path: 
runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java
 ##
 @@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics 
are reported with the
+ * timestamp (seconds from epoch) when the push to the sink was done (except 
with gauges that
+ * already have a timestamp value). The graphite metric name will be in the 
form of
+ * 
beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType
 For example:
+ * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code
+ * beam.distribution.throughput.nbRecordsPerSec.attempted.mean}
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";
 
 Review comment:
   If it's used only once then we could just use a character, but I'm OK with a 
constant.




Issue Time Tracking
---

Worklog Id: (was: 152771)
Time Spent: 3h  (was: 2h 50m)

> Implement a Graphite sink for the metrics pusher
> 
>
> Key: BEAM-4553
> URL: https://issues.apache.org/jira/browse/BEAM-4553
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-extensions-metrics
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Today, only a REST HTTP sink that sends raw JSON metrics via POST requests to 
> an HTTP server is available. It is more of a POC sink. It would be good to 
> code the first real metrics sink. One of the most popular is Graphite.
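
The sink under discussion speaks Graphite's plaintext protocol: one
`metric.path value timestamp` line per metric, sent over TCP (port 2003 by
default). A minimal Python sketch of that wire format; this is an illustration
separate from Beam's actual Java implementation, and the helper names
(`format_graphite_lines`, `push_to_graphite`) and the example metric name are
assumptions, not part of Beam:

```python
import socket
import time


def format_graphite_lines(metrics, timestamp=None):
    """Render (metric.path, value) pairs as Graphite plaintext lines.

    Graphite expects one "metric.path value timestamp" line per metric,
    with the timestamp in seconds since the epoch.
    """
    if timestamp is None:
        timestamp = int(time.time())
    return "".join("%s %s %d\n" % (name, value, timestamp)
                   for name, value in metrics)


def push_to_graphite(metrics, host="localhost", port=2003):
    """Send the rendered payload to a Graphite plaintext listener."""
    payload = format_graphite_lines(metrics)
    with socket.create_connection((host, port)) as sock:
        sock.sendall(payload.encode("utf-8"))
```

For example, a counter named following the
beam.metricType.metricNamespace.metricName.attempted.metricValueType scheme
would be rendered as a single line such as
`beam.counter.throughput.nbRecords.attempted.value 42 1539100000`.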



--


[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152777&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152777
 ]

ASF GitHub Bot logged work on BEAM-4553:


Author: ASF GitHub Bot
Created on: 09/Oct/18 16:42
Start Date: 09/Oct/18 16:42
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on a change in pull request 
#6569: [BEAM-4553] Implement graphite sink for MetricsPusher and refactor 
MetricsHttpSink test
URL: https://github.com/apache/beam/pull/6569#discussion_r223777983
 
 

 ##
 File path: 
runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java
 ##
 @@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics 
are reported with the
+ * timestamp (seconds from epoch) when the push to the sink was done (except 
with gauges that
+ * already have a timestamp value). The graphite metric name will be in the 
form of
+ * 
beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType
 For example:
+ * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code
+ * beam.distribution.throughput.nbRecordsPerSec.attempted.mean}
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";
+  private final String address;
+  private final int port;
+  private final Charset charset;
+
+  public MetricsGraphiteSink(PipelineOptions pipelineOptions) {
+this.address = pipelineOptions.getMetricsGraphiteHost();
+this.port = pipelineOptions.getMetricsGraphitePort();
+this.charset = UTF_8;
+  }
+
+  @Experimental(Experimental.Kind.METRICS)
+  @Override
+  public void writeMetrics(MetricQueryResults metricQueryResults) throws 
Exception {
+final long metricTimestamp = System.currentTimeMillis() / 1000L;
+Socket socket = new Socket(InetAddress.getByName(address), port);
+BufferedWriter writer =
+new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), 
charset));
+StringBuilder messagePayload = new StringBuilder();
+Iterable<MetricResult<Long>> counters = metricQueryResults.getCounters();
+Iterable<MetricResult<GaugeResult>> gauges = metricQueryResults.getGauges();
+Iterable<MetricResult<DistributionResult>> distributions =
+metricQueryResults.getDistributions();
+
+for (MetricResult<Long> counter : counters) {
+  // if committed metrics are not supported, exception is thrown and we 
don't append the message
+  try {
+messagePayload.append(createCounterGraphiteMessage(metricTimestamp, 
counter, true));
+  } catch (UnsupportedOperationException e) {
+if (!e.getMessage().contains("committed metrics")) {
+  throw e;
+}
+  }
+  messagePayload.append(createCounterGraphiteMessage(metricTimestamp, 
counter, false));
+}
+
+for (MetricResult<GaugeResult> gauge : gauges) {
+  try {
+messagePayload.append(createGaugeGraphiteMessage(gauge, true));
+  } catch (UnsupportedOperationException e) {
+if (!e.getMessage().contains("committed metrics")) {
+  throw e;
+}
+  }
+  messagePayload.append(createGaugeGraphiteMessage(gauge, false));
+}
+
+for (MetricResult 

[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152779&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152779
 ]

ASF GitHub Bot logged work on BEAM-4553:


Author: ASF GitHub Bot
Created on: 09/Oct/18 16:46
Start Date: 09/Oct/18 16:46
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on a change in pull request 
#6569: [BEAM-4553] Implement graphite sink for MetricsPusher and refactor 
MetricsHttpSink test
URL: https://github.com/apache/beam/pull/6569#discussion_r223779339
 
 

 ##
 File path: 
runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java
 ##
 @@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics 
are reported with the
+ * timestamp (seconds from epoch) when the push to the sink was done (except 
with gauges that
+ * already have a timestamp value). The graphite metric name will be in the 
form of
+ * 
beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType
 For example:
+ * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code
+ * beam.distribution.throughput.nbRecordsPerSec.attempted.mean}
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";
+  private final String address;
+  private final int port;
+  private final Charset charset;
+
+  public MetricsGraphiteSink(PipelineOptions pipelineOptions) {
+this.address = pipelineOptions.getMetricsGraphiteHost();
+this.port = pipelineOptions.getMetricsGraphitePort();
+this.charset = UTF_8;
+  }
+
+  @Experimental(Experimental.Kind.METRICS)
+  @Override
+  public void writeMetrics(MetricQueryResults metricQueryResults) throws 
Exception {
+final long metricTimestamp = System.currentTimeMillis() / 1000L;
+Socket socket = new Socket(InetAddress.getByName(address), port);
+BufferedWriter writer =
+new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), 
charset));
+StringBuilder messagePayload = new StringBuilder();
+Iterable<MetricResult<Long>> counters = metricQueryResults.getCounters();
+Iterable<MetricResult<GaugeResult>> gauges = metricQueryResults.getGauges();
+Iterable<MetricResult<DistributionResult>> distributions =
+metricQueryResults.getDistributions();
+
+for (MetricResult<Long> counter : counters) {
+  // if committed metrics are not supported, exception is thrown and we 
don't append the message
+  try {
+messagePayload.append(createCounterGraphiteMessage(metricTimestamp, 
counter, true));
+  } catch (UnsupportedOperationException e) {
+if (!e.getMessage().contains("committed metrics")) {
+  throw e;
+}
+  }
+  messagePayload.append(createCounterGraphiteMessage(metricTimestamp, 
counter, false));
+}
+
+for (MetricResult<GaugeResult> gauge : gauges) {
+  try {
+messagePayload.append(createGaugeGraphiteMessage(gauge, true));
+  } catch (UnsupportedOperationException e) {
+if (!e.getMessage().contains("committed metrics")) {
+  throw e;
+}
+  }
+  messagePayload.append(createGaugeGraphiteMessage(gauge, false));
+}
+
+for (MetricResult 

[jira] [Commented] (BEAM-5624) Avro IO does not work with avro-python3 package out-of-the-box on Python 3, several tests fail with AttributeError (module 'avro.schema' has no attribute 'parse')

2018-10-09 Thread Valentyn Tymofieiev (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643746#comment-16643746
 ] 

Valentyn Tymofieiev commented on BEAM-5624:
---

This issue is due to a small API change (see: 
https://github.com/apache/beam/pull/6616). 
However, there are some troubling reports about a bad experience with 
avro-python3; see [1,2].

That said, we may want to migrate to fastavro sooner rather than later. FYI 
[~udim] [~chamikara] [~altay].

[1] https://github.com/common-workflow-language/cwltool/issues/524
[2] https://issues.apache.org/jira/browse/AVRO-2046 
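
The API change in question is the rename of avro.schema.parse (the Python 2
avro package) to avro.schema.Parse (avro-python3). A sketch of the
compatibility pattern; `resolve_parse` is a hypothetical helper invented here
for illustration, while the actual fix in the PR uses a try/except import:

```python
def resolve_parse(schema_module):
    """Return the schema-parse callable from an avro.schema-like module.

    avro-python3 exposes Parse; the Python 2 avro package exposes
    lowercase parse. Resolving the name at runtime lets the same code run
    against either package. resolve_parse is a hypothetical helper, not
    part of either avro distribution.
    """
    fn = getattr(schema_module, "Parse", None)
    if fn is None:
        fn = getattr(schema_module, "parse")
    return fn


# Equivalent import-time shim, in the style discussed in the PR:
# try:
#     from avro.schema import Parse as parse_schema  # avro-python3
# except ImportError:
#     from avro.schema import parse as parse_schema  # Python 2 avro
```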

> Avro IO does not work with avro-python3 package out-of-the-box on Python 3, 
> several tests fail with AttributeError (module 'avro.schema' has no attribute 
> 'parse') 
> ---
>
> Key: BEAM-5624
> URL: https://issues.apache.org/jira/browse/BEAM-5624
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Simon
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ==
> ERROR: Failure: AttributeError (module 'avro.schema' has no attribute 'parse')
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/failure.py",
>  line 39, in runTest
> raise self.exc_val.with_traceback(self.tb)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/loader.py",
>  line 418, in loadTestsFromName
> addr.filename, addr.module)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/importer.py",
>  line 47, in importFromPath
> return self.importFromDir(dir_path, fqname)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/importer.py",
>  line 94, in importFromDir
> mod = load_module(part_fqname, fh, filename, desc)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/imp.py",
>  line 234, in load_module
> return load_source(name, filename, file)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/imp.py",
>  line 172, in load_source
> module = _load(spec)
>   File "", line 693, in _load
>   File "", line 673, in _load_unlocked
>   File "", line 673, in exec_module
>   File "", line 222, in _call_with_frames_removed
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/avroio_test.py",
>  line 54, in <module>
> class TestAvro(unittest.TestCase):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/avroio_test.py",
>  line 89, in TestAvro
> SCHEMA = avro.schema.parse('''
> AttributeError: module 'avro.schema' has no attribute 'parse'
> Note that we use a different implementation of avro/avro-python3 package 
> depending on Python version. We are also evaluating potential replacement of 
> avro with fastavro.



--


[jira] [Commented] (BEAM-1081) annotations should support custom messages and classes

2018-10-09 Thread Ahmet Altay (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643748#comment-16643748
 ] 

Ahmet Altay commented on BEAM-1081:
---

Adding tests would be good. There is also the "1. ability to customize message" 
part of the original issue.

> annotations should support custom messages and classes
> --
>
> Key: BEAM-1081
> URL: https://issues.apache.org/jira/browse/BEAM-1081
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Priority: Minor
>  Labels: newbie, starter
>
> Update 
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/utils/annotations.py
>  to add 2 new features:
> 1. ability to customize message
> 2. ability to tag classes (not only functions)
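
The two requested features can be sketched as a decorator. This is a
hypothetical illustration only, with invented names (`deprecated`, `since`,
`message`), not Beam's actual annotations.py implementation:

```python
import functools
import warnings


def deprecated(since, message=None):
    """Warn on use of a deprecated function or class.

    Sketches the two requested features: a caller-supplied custom message
    (feature 1) and support for decorating classes, not only functions
    (feature 2, via the isinstance branch).
    """
    def decorator(obj):
        text = message or "%s is deprecated since %s" % (obj.__name__, since)
        if isinstance(obj, type):
            # Wrap the class constructor so instantiation warns.
            orig_init = obj.__init__

            @functools.wraps(orig_init)
            def new_init(self, *args, **kwargs):
                warnings.warn(text, DeprecationWarning, stacklevel=2)
                orig_init(self, *args, **kwargs)

            obj.__init__ = new_init
            return obj

        @functools.wraps(obj)
        def wrapper(*args, **kwargs):
            warnings.warn(text, DeprecationWarning, stacklevel=2)
            return obj(*args, **kwargs)

        return wrapper
    return decorator
```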



--


[jira] [Work logged] (BEAM-5254) Add Samza Runner translator registrar and refactor config generation

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5254?focusedWorklogId=152814&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152814
 ]

ASF GitHub Bot logged work on BEAM-5254:


Author: ASF GitHub Bot
Created on: 09/Oct/18 17:12
Start Date: 09/Oct/18 17:12
Worklog Time Spent: 10m 
  Work Description: xinyuiscool commented on issue #6292: [BEAM-5254] Add 
Samza Runner translator registrar and refactor config
URL: https://github.com/apache/beam/pull/6292#issuecomment-428274726
 
 
   Thanks for merging it! I might have a few more coming in the next few days 
for Samza Runner.




Issue Time Tracking
---

Worklog Id: (was: 152814)
Time Spent: 1.5h  (was: 1h 20m)

> Add Samza Runner translator registrar and refactor config generation
> 
>
> Key: BEAM-5254
> URL: https://issues.apache.org/jira/browse/BEAM-5254
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-samza
>Reporter: Xinyu Liu
>Assignee: Xinyu Liu
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Add a registrar for transform translators in Samza Runner so we allow 
> customized translators. Also refactors the config generation part so it can 
> be extended outside open source beam.



--


[jira] [Work logged] (BEAM-5624) Avro IO does not work with avro-python3 package out-of-the-box on Python 3, several tests fail with AttributeError (module 'avro.schema' has no attribute 'parse')

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5624?focusedWorklogId=152813&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152813
 ]

ASF GitHub Bot logged work on BEAM-5624:


Author: ASF GitHub Bot
Created on: 09/Oct/18 17:12
Start Date: 09/Oct/18 17:12
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #6616: 
[BEAM-5624] Fix avro.schema parser for py3
URL: https://github.com/apache/beam/pull/6616#discussion_r223788054
 
 

 ##
 File path: sdks/python/apache_beam/io/avroio_test.py
 ##
 @@ -25,10 +25,15 @@
 from builtins import range
 
 import avro.datafile
-import avro.schema
 from avro.datafile import DataFileWriter
 from avro.io import DatumWriter
 import hamcrest as hc
+# pylint: disable=wrong-import-order, wrong-import-position, ungrouped-imports
+try:
+  from avro.schema import Parse
 
 Review comment:
   Could you add a comment here about which versions of avro support which 
version of parse? (We can use this information to remove this block later 
on.)




Issue Time Tracking
---

Worklog Id: (was: 152813)
Time Spent: 20m  (was: 10m)

> Avro IO does not work with avro-python3 package out-of-the-box on Python 3, 
> several tests fail with AttributeError (module 'avro.schema' has no attribute 
> 'parse') 
> ---
>
> Key: BEAM-5624
> URL: https://issues.apache.org/jira/browse/BEAM-5624
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Simon
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> ==
> ERROR: Failure: AttributeError (module 'avro.schema' has no attribute 'parse')
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/failure.py",
>  line 39, in runTest
> raise self.exc_val.with_traceback(self.tb)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/loader.py",
>  line 418, in loadTestsFromName
> addr.filename, addr.module)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/importer.py",
>  line 47, in importFromPath
> return self.importFromDir(dir_path, fqname)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/importer.py",
>  line 94, in importFromDir
> mod = load_module(part_fqname, fh, filename, desc)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/imp.py",
>  line 234, in load_module
> return load_source(name, filename, file)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/imp.py",
>  line 172, in load_source
> module = _load(spec)
>   File "", line 693, in _load
>   File "", line 673, in _load_unlocked
>   File "", line 673, in exec_module
>   File "", line 222, in _call_with_frames_removed
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/avroio_test.py",
>  line 54, in <module>
> class TestAvro(unittest.TestCase):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/avroio_test.py",
>  line 89, in TestAvro
> SCHEMA = avro.schema.parse('''
> AttributeError: module 'avro.schema' has no attribute 'parse'
> Note that we use a different implementation of avro/avro-python3 package 
> depending on Python version. We are also evaluating potential replacement of 
> avro with fastavro.



--


[beam] annotated tag v2.7.0 updated (e2edd49 -> 51e57fd)

2018-10-09 Thread ccy
This is an automated email from the ASF dual-hosted git repository.

ccy pushed a change to annotated tag v2.7.0
in repository https://gitbox.apache.org/repos/asf/beam.git.


*** WARNING: tag v2.7.0 was modified! ***

from e2edd49  (commit)
  to 51e57fd  (tag)
 tagging e2edd4918ac7a6f9b39b1186bbd6f6bb783568d3 (commit)
 replaces v2.7.0-RC2
  by Charles Chen
  on Wed Sep 26 02:48:38 2018 +

- Log -
[Gradle Release Plugin] - creating tag:  'v2.7.0-RC3'.
---


No new revisions were added by this update.

Summary of changes:



[jira] [Work logged] (BEAM-5624) Avro IO does not work with avro-python3 package out-of-the-box on Python 3, several tests fail with AttributeError (module 'avro.schema' has no attribute 'parse')

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5624?focusedWorklogId=152815&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152815
 ]

ASF GitHub Bot logged work on BEAM-5624:


Author: ASF GitHub Bot
Created on: 09/Oct/18 17:19
Start Date: 09/Oct/18 17:19
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6616: 
[BEAM-5624] Fix avro.schema parser for py3
URL: https://github.com/apache/beam/pull/6616#discussion_r223790260
 
 

 ##
 File path: sdks/python/apache_beam/io/avroio_test.py
 ##
 @@ -25,10 +25,15 @@
 from builtins import range
 
 import avro.datafile
-import avro.schema
 from avro.datafile import DataFileWriter
 from avro.io import DatumWriter
 import hamcrest as hc
+# pylint: disable=wrong-import-order, wrong-import-position, ungrouped-imports
+try:
+  from avro.schema import Parse
 
 Review comment:
   +1 to @aaltay's comment. Also we will need a similar change in other places 
in Beam where we use avro.schema.parse. See `apache_beam/io/avroio.py`, 
`apache_beam/examples/avro_bitcoin.py`. This can be done in another PR if you 
prefer.
   




Issue Time Tracking
---

Worklog Id: (was: 152815)
Time Spent: 0.5h  (was: 20m)

> Avro IO does not work with avro-python3 package out-of-the-box on Python 3, 
> several tests fail with AttributeError (module 'avro.schema' has no attribute 
> 'parse') 
> ---
>
> Key: BEAM-5624
> URL: https://issues.apache.org/jira/browse/BEAM-5624
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Simon
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> ==
> ERROR: Failure: AttributeError (module 'avro.schema' has no attribute 'parse')
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/failure.py",
>  line 39, in runTest
> raise self.exc_val.with_traceback(self.tb)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/loader.py",
>  line 418, in loadTestsFromName
> addr.filename, addr.module)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/importer.py",
>  line 47, in importFromPath
> return self.importFromDir(dir_path, fqname)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/importer.py",
>  line 94, in importFromDir
> mod = load_module(part_fqname, fh, filename, desc)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/imp.py",
>  line 234, in load_module
> return load_source(name, filename, file)
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/imp.py",
>  line 172, in load_source
> module = _load(spec)
>   File "", line 693, in _load
>   File "", line 673, in _load_unlocked
>   File "", line 673, in exec_module
>   File "", line 222, in _call_with_frames_removed
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/avroio_test.py",
>  line 54, in 
> class TestAvro(unittest.TestCase):
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/avroio_test.py",
>  line 89, in TestAvro
> SCHEMA = avro.schema.parse('''
> AttributeError: module 'avro.schema' has no attribute 'parse'
> Note that we use a different implementation of avro/avro-python3 package 
> depending on Python version. We are also evaluating potential replacement of 
> avro with fastavro.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner

2018-10-09 Thread Xu Mingmin (JIRA)


[ https://issues.apache.org/jira/browse/BEAM-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643769#comment-16643769 ]

Xu Mingmin commented on BEAM-5690:
--

Is this the specific error? There seem to be duplicated {{0}}-count rows here,
{code:java}
{"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00   +"}
{"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
+","WET":"2018-10-09  09-56-00 0}{code}
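The duplication can be confirmed programmatically by scanning the sink output for zero counts and repeated (Col3, WST) pairs. A minimal sketch: the field names are taken from the output above, but `find_anomalies` and the sample rows are hypothetical.

```python
import json
from collections import Counter

def find_anomalies(json_lines):
    """Flag zero-count rows and duplicate (Col3, WST) pairs in sink output."""
    rows = [json.loads(line) for line in json_lines]
    zero_counts = [r for r in rows if r["count_col1"] == 0]
    pair_freq = Counter((r["Col3"], r["WST"]) for r in rows)
    duplicates = {pair: n for pair, n in pair_freq.items() if n > 1}
    return zero_counts, duplicates

# Two rows for the same key and window start, one with a bogus zero count:
sample = [
    '{"count_col1": 1, "Col3": "string6", "WST": "2018-10-09 09-55-00"}',
    '{"count_col1": 0, "Col3": "string6", "WST": "2018-10-09 09-55-00"}',
]
zeros, dupes = find_anomalies(sample)
```

For a correct run, both `zeros` and `dupes` should come back empty; here they do not.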

> Issue with GroupByKey in BeamSql using SparkRunner
> --
>
> Key: BEAM-5690
> URL: https://issues.apache.org/jira/browse/BEAM-5690
> Project: Beam
>  Issue Type: Task
>  Components: runner-spark
>Reporter: Kenneth Knowles
>Priority: Major
>
> Reported on user@
> {quote}We are trying to setup a pipeline with using BeamSql and the trigger 
> used is default (AfterWatermark crosses the window). 
> Below is the pipeline:
>   
>KafkaSource (KafkaIO) 
>---> Windowing (FixedWindow 1min)
>---> BeamSql
>---> KafkaSink (KafkaIO)
>  
> We are using Spark Runner for this. 
> The BeamSql query is:
> {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code}
> We are grouping by Col3 which is a string. It can hold values string[0-9]. 
>  
> The records are getting emitted out at 1 min to kafka sink, but the output 
> record in kafka is not as expected.
> Below is the output observed: (WST and WET are indicators for window start 
> time and window end time)
> {code}
> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00 0}
> {code}
> {quote}
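For reference, the expected semantics of the quoted query within one fixed window can be modeled in plain Python (a sketch of GROUP BY counting, not SparkRunner code; `windowed_group_count` and the sample rows are hypothetical): each key that appears in a window should yield exactly one output row with a count of at least 1, so the duplicated zero-count rows reported above should never occur.

```python
from collections import Counter

def windowed_group_count(records, window_of):
    """Model `SELECT Col3, COUNT(*) FROM PCOLLECTION GROUP BY Col3` per window.

    `records` is an iterable of dicts with a 'Col3' key; `window_of` maps a
    record to its fixed-window label. Returns {(window, col3): count} with
    every count >= 1 and at most one entry per (window, key) pair.
    """
    return dict(Counter((window_of(r), r["Col3"]) for r in records))

# Example: three records, two keys, all in the same 1-minute window.
rows = [{"Col3": "string6", "t": 0}, {"Col3": "string6", "t": 30},
        {"Col3": "string5", "t": 10}]
out = windowed_group_count(rows, window_of=lambda r: r["t"] // 60)
# out == {(0, "string6"): 2, (0, "string5"): 1}
```

Any output containing a zero count, or more than one row per (window, key), points at the runner's trigger/pane handling rather than at the query itself.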





[jira] [Work logged] (BEAM-5684) Need a test that verifies Flattening / not-flattening of BQ nested records

2018-10-09 Thread ASF GitHub Bot (JIRA)


 [ https://issues.apache.org/jira/browse/BEAM-5684?focusedWorklogId=152821&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152821 ]

ASF GitHub Bot logged work on BEAM-5684:


Author: ASF GitHub Bot
Created on: 09/Oct/18 17:38
Start Date: 09/Oct/18 17:38
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #6609: [BEAM-5684] Adding a 
BQNestedRecords Test
URL: https://github.com/apache/beam/pull/6609#issuecomment-428283083
 
 
   Run Java PostCommit




Issue Time Tracking
---

Worklog Id: (was: 152821)
Time Spent: 1h 20m  (was: 1h 10m)

> Need a test that verifies Flattening / not-flattening of BQ nested records
> --
>
> Key: BEAM-5684
> URL: https://issues.apache.org/jira/browse/BEAM-5684
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>






[jira] [Created] (BEAM-5691) testWebsite failed owing to several url 404 not found

2018-10-09 Thread Boyuan Zhang (JIRA)
Boyuan Zhang created BEAM-5691:
--

 Summary: testWebsite failed owing to several url 404 not found
 Key: BEAM-5691
 URL: https://issues.apache.org/jira/browse/BEAM-5691
 Project: Beam
  Issue Type: Bug
  Components: test-failures
Reporter: Boyuan Zhang


test log: 
https://builds.apache.org/job/beam_PreCommit_Website_Commit/262/console




