[jira] [Created] (BEAM-1701) Add Meter metric type to Python SDK

2017-03-12 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-1701:
---

 Summary: Add Meter metric type to Python SDK
 Key: BEAM-1701
 URL: https://issues.apache.org/jira/browse/BEAM-1701
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-py
Reporter: Aviem Zur
Assignee: Ahmet Altay






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-1699) Timer metric type

2017-03-12 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-1699:
---

 Summary: Timer metric type
 Key: BEAM-1699
 URL: https://issues.apache.org/jira/browse/BEAM-1699
 Project: Beam
  Issue Type: New Feature
  Components: beam-model, sdk-java-core, sdk-py
Reporter: Aviem Zur
Assignee: Ben Chambers


Add support for Timer metric type to the SDK.
Interface should be along the lines of:
{code}
void update(Duration duration);
 T time(Callable  event);
Closeable time();
{code}
Compare to 
http://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/Timer.html

Could reuse code for this and https://issues.apache.org/jira/browse/BEAM-1613



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-1698) Comply with metrics querying semantics in Spark Runner

2017-03-12 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-1698:
---

 Summary: Comply with metrics querying semantics in Spark Runner
 Key: BEAM-1698
 URL: https://issues.apache.org/jira/browse/BEAM-1698
 Project: Beam
  Issue Type: Bug
  Components: runner-spark
Reporter: Aviem Zur
Assignee: Aviem Zur






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1344) Uniform metrics step name semantics

2017-03-12 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1344:
---

Assignee: Ben Chambers  (was: Aviem Zur)

> Uniform metrics step name semantics
> ---
>
> Key: BEAM-1344
> URL: https://issues.apache.org/jira/browse/BEAM-1344
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Ben Chambers
>
> Agree on and implement uniform metrics step name semantics which runners 
> would adhere to.
> Current discussion seems to point at a string with the pipeline graph path to 
> the step's transform. Something along the lines of: 
> "PBegin/SomeInputTransform/SomeParDo/...MyTransform.#Running_number_for_collisions".
> Also agree on and implement metrics querying semantics. Current discussion 
> seems to point at a substring or regex matching of steps on given string 
> input.
> [Original dev list 
> discussion|https://lists.apache.org/thread.html/476bf8f8b1bd63ec49a9f4f45d87402d49b9c887216f3b54cb748a12@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1344) Uniform metrics step name semantics

2017-03-12 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1344:
---

Assignee: Aviem Zur  (was: Ben Chambers)

> Uniform metrics step name semantics
> ---
>
> Key: BEAM-1344
> URL: https://issues.apache.org/jira/browse/BEAM-1344
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Agree on and implement uniform metrics step name semantics which runners 
> would adhere to.
> Current discussion seems to point at a string with the pipeline graph path to 
> the step's transform. Something along the lines of: 
> "PBegin/SomeInputTransform/SomeParDo/...MyTransform.#Running_number_for_collisions".
> Also agree on and implement metrics querying semantics. Current discussion 
> seems to point at a substring or regex matching of steps on given string 
> input.
> [Original dev list 
> discussion|https://lists.apache.org/thread.html/476bf8f8b1bd63ec49a9f4f45d87402d49b9c887216f3b54cb748a12@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1697) Allow output time that is some offset from target time of event time timer

2017-03-12 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-1697:
--
Description: Today, when an event time timer fires, the outputs are 
associated with its firing timestamp. But it might be useful to be able to have 
the output timestamp be at some offset or explicit time instead, so the output 
watermark could be held further from the input watermark.  (was: Today, when an 
event time timer fires, the outputs are associated with its firing timestamp. 
But it might be useful to be able to have the output timestamp be at some 
offset instead, so the output watermark could be held further from the input 
watermark.)

> Allow output time that is some offset from target time of event time timer
> --
>
> Key: BEAM-1697
> URL: https://issues.apache.org/jira/browse/BEAM-1697
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> Today, when an event time timer fires, the outputs are associated with its 
> firing timestamp. But it might be useful to be able to have the output 
> timestamp be at some offset or explicit time instead, so the output watermark 
> could be held further from the input watermark.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1696) Allow explicit output time for processing time timers

2017-03-12 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906835#comment-15906835
 ] 

Kenneth Knowles commented on BEAM-1696:
---

That is a fair point. I don't have a confident understanding of whether 
BEAM-1697 should just be rolled into this or not. Probably yes.

> Allow explicit output time for processing time timers
> -
>
> Key: BEAM-1696
> URL: https://issues.apache.org/jira/browse/BEAM-1696
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> Today, a processing time timer is not associated with a particular output 
> timestamp; instead the user must explicitly output to the desired timestamp.
> This has a few drawbacks:
>  - Probably need to maintain state that indicates what the timestamp should be
>  - The output watermark is not held to that timestamp
> Something like {{processingTimer.set(...).withOutputTimestamp(...)}} (or 
> perhaps some more involved API with an {{OutputTimeFn}}-like policy attached) 
> would work nicely, so when {{@OnTimer}} is called the right timestamp is 
> automatic.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Build failed in Jenkins: beam_PostCommit_Python_Verify #1493

2017-03-12 Thread Apache Jenkins Server
See 


--
[...truncated 387.60 KB...]
Collecting appdirs>=1.4.0 (from setuptools->pyhamcrest->-r 
postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/appdirs-1.4.3.tar.gz
Collecting pyparsing (from packaging>=16.8->setuptools->pyhamcrest->-r 
postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/pyparsing-2.2.0.tar.gz
Successfully downloaded pyhamcrest mock setuptools six funcsigs pbr packaging 
appdirs pyparsing
ok
test_as_list_and_as_dict_side_inputs 
(apache_beam.transforms.sideinputs_test.SideInputsTest) ... DEPRECATION: pip 
install --download has been deprecated and will be removed in the future. Pip 
now has a download command that should be used instead.
Collecting pyhamcrest (from -r postcommit_requirements.txt (line 1))
:318:
 SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name 
Indication) extension to TLS is not available on this platform. This may cause 
the server to present an incorrect TLS certificate, which can cause validation 
failures. You can upgrade to a newer version of Python to solve this. For more 
information, see 
https://urllib3.readthedocs.io/en/latest/security.html#snimissingwarning.
  SNIMissingWarning
:122:
 InsecurePlatformWarning: A true SSLContext object is not available. This 
prevents urllib3 from configuring SSL appropriately and may cause certain SSL 
connections to fail. You can upgrade to a newer version of Python to solve 
this. For more information, see 
https://urllib3.readthedocs.io/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
  File was already downloaded 
/tmp/dataflow-requirements-cache/PyHamcrest-1.9.0.tar.gz
Collecting mock (from -r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/mock-2.0.0.tar.gz
Collecting setuptools (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/setuptools-34.3.2.zip
Collecting six (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded /tmp/dataflow-requirements-cache/six-1.10.0.tar.gz
Collecting funcsigs>=1 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded 
/tmp/dataflow-requirements-cache/funcsigs-1.0.2.tar.gz
Collecting pbr>=0.11 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/pbr-2.0.0.tar.gz
Collecting packaging>=16.8 (from setuptools->pyhamcrest->-r 
postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/packaging-16.8.tar.gz
Collecting appdirs>=1.4.0 (from setuptools->pyhamcrest->-r 
postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/appdirs-1.4.3.tar.gz
Collecting pyparsing (from packaging>=16.8->setuptools->pyhamcrest->-r 
postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/pyparsing-2.2.0.tar.gz
Successfully downloaded pyhamcrest mock setuptools six funcsigs pbr packaging 
appdirs pyparsing
ok
test_as_list_with_unique_labels 
(apache_beam.transforms.sideinputs_test.SideInputsTest) ... DEPRECATION: pip 
install --download has been deprecated and will be removed in the future. Pip 
now has a download command that should be used instead.
Collecting pyhamcrest (from -r postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/PyHamcrest-1.9.0.tar.gz
Collecting mock (from -r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/mock-2.0.0.tar.gz
Collecting setuptools (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/setuptools-34.3.2.zip
Collecting six (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded /tmp/dataflow-requirements-cache/six-1.10.0.tar.gz
Collecting funcsigs>=1 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded 
/tmp/dataflow-requirements-cache/funcsigs-1.0.2.tar.gz
Collecting pbr>=0.11 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/pbr-2.0.0.tar.gz
Collecting packaging>=16.8 (from setuptools->pyhamcrest->-r 
postcommit_requirements.txt (line 1))
  File was already downloaded 

[jira] [Created] (BEAM-1697) Allow output time that is some offset from target time of event time timer

2017-03-12 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-1697:
-

 Summary: Allow output time that is some offset from target time of 
event time timer
 Key: BEAM-1697
 URL: https://issues.apache.org/jira/browse/BEAM-1697
 Project: Beam
  Issue Type: New Feature
  Components: sdk-java-core
Reporter: Kenneth Knowles
Assignee: Kenneth Knowles


Today, when an event time timer fires, the outputs are associated with its 
firing timestamp. But it might be useful to be able to have the output 
timestamp be at some offset instead, so the output watermark could be held 
further from the input watermark.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1696) Allow explicit output time for processing time timers

2017-03-12 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-1696:
--
Issue Type: New Feature  (was: Bug)

> Allow explicit output time for processing time timers
> -
>
> Key: BEAM-1696
> URL: https://issues.apache.org/jira/browse/BEAM-1696
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> Today, a processing time timer is not associated with a particular output 
> timestamp; instead the user must explicitly output to the desired timestamp.
> This has a few drawbacks:
>  - Probably need to maintain state that indicates what the timestamp should be
>  - The output watermark is not held to that timestamp
> Something like {{processingTimer.set(...).withOutputTimestamp(...)}} (or 
> perhaps some more involved API with an {{OutputTimeFn}}-like policy attached) 
> would work nicely, so when {{@OnTimer}} is called the right timestamp is 
> automatic.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-1696) Allow explicit output time for processing time timers

2017-03-12 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-1696:
-

 Summary: Allow explicit output time for processing time timers
 Key: BEAM-1696
 URL: https://issues.apache.org/jira/browse/BEAM-1696
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Reporter: Kenneth Knowles
Assignee: Kenneth Knowles


Today, a processing time timer is not associated with a particular output 
timestamp; instead the user must explicitly output to the desired timestamp.

This has a few drawbacks:

 - Probably need to maintain state that indicates what the timestamp should be
 - The output watermark is not held to that timestamp

Something like {{processingTimer.set(...).withOutputTimestamp(...)}} (or 
perhaps some more involved API with an {{OutputTimeFn}}-like policy attached) 
would work nicely, so when {{@OnTimer}} is called the right timestamp is 
automatic.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Build failed in Jenkins: beam_PostCommit_Python_Verify #1492

2017-03-12 Thread Apache Jenkins Server
See 


--
[...truncated 392.53 KB...]
ok
test_as_list_without_unique_labels 
(apache_beam.transforms.sideinputs_test.SideInputsTest) ... DEPRECATION: pip 
install --download has been deprecated and will be removed in the future. Pip 
now has a download command that should be used instead.
Collecting pyhamcrest (from -r postcommit_requirements.txt (line 1))
:318:
 SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name 
Indication) extension to TLS is not available on this platform. This may cause 
the server to present an incorrect TLS certificate, which can cause validation 
failures. You can upgrade to a newer version of Python to solve this. For more 
information, see 
https://urllib3.readthedocs.io/en/latest/security.html#snimissingwarning.
  SNIMissingWarning
:122:
 InsecurePlatformWarning: A true SSLContext object is not available. This 
prevents urllib3 from configuring SSL appropriately and may cause certain SSL 
connections to fail. You can upgrade to a newer version of Python to solve 
this. For more information, see 
https://urllib3.readthedocs.io/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
  File was already downloaded 
/tmp/dataflow-requirements-cache/PyHamcrest-1.9.0.tar.gz
Collecting mock (from -r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/mock-2.0.0.tar.gz
Collecting setuptools (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/setuptools-34.3.2.zip
Collecting six (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded /tmp/dataflow-requirements-cache/six-1.10.0.tar.gz
Collecting funcsigs>=1 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded 
/tmp/dataflow-requirements-cache/funcsigs-1.0.2.tar.gz
Collecting pbr>=0.11 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/pbr-2.0.0.tar.gz
Collecting packaging>=16.8 (from setuptools->pyhamcrest->-r 
postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/packaging-16.8.tar.gz
Collecting appdirs>=1.4.0 (from setuptools->pyhamcrest->-r 
postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/appdirs-1.4.3.tar.gz
Collecting pyparsing (from packaging>=16.8->setuptools->pyhamcrest->-r 
postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/pyparsing-2.2.0.tar.gz
Successfully downloaded pyhamcrest mock setuptools six funcsigs pbr packaging 
appdirs pyparsing
ok
test_as_singleton_with_different_defaults_with_unique_labels 
(apache_beam.transforms.sideinputs_test.SideInputsTest) ... DEPRECATION: pip 
install --download has been deprecated and will be removed in the future. Pip 
now has a download command that should be used instead.
Collecting pyhamcrest (from -r postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/PyHamcrest-1.9.0.tar.gz
Collecting mock (from -r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/mock-2.0.0.tar.gz
Collecting setuptools (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/setuptools-34.3.2.zip
Collecting six (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded /tmp/dataflow-requirements-cache/six-1.10.0.tar.gz
Collecting funcsigs>=1 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded 
/tmp/dataflow-requirements-cache/funcsigs-1.0.2.tar.gz
Collecting pbr>=0.11 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/pbr-2.0.0.tar.gz
Collecting packaging>=16.8 (from setuptools->pyhamcrest->-r 
postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/packaging-16.8.tar.gz
Collecting appdirs>=1.4.0 (from setuptools->pyhamcrest->-r 
postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/appdirs-1.4.3.tar.gz
Collecting pyparsing (from packaging>=16.8->setuptools->pyhamcrest->-r 
postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/pyparsing-2.2.0.tar.gz
Successfully downloaded pyhamcrest mock setuptools six funcsigs pbr 

svn commit: r18698 - in /dev/beam/0.6.0: apache-beam-0.6.0-python.zip.md5 apache-beam-0.6.0-python.zip.sha1

2017-03-12 Thread altay
Author: altay
Date: Sun Mar 12 21:32:08 2017
New Revision: 18698

Log:
Update hashes for the python source zip file.


Modified:
dev/beam/0.6.0/apache-beam-0.6.0-python.zip.md5
dev/beam/0.6.0/apache-beam-0.6.0-python.zip.sha1

Modified: dev/beam/0.6.0/apache-beam-0.6.0-python.zip.md5
==
--- dev/beam/0.6.0/apache-beam-0.6.0-python.zip.md5 (original)
+++ dev/beam/0.6.0/apache-beam-0.6.0-python.zip.md5 Sun Mar 12 21:32:08 2017
@@ -1 +1 @@
-7d4170e381ce0e1aa8d11bee2e63d151  apache-beam-0.6.0.zip
+7d4170e381ce0e1aa8d11bee2e63d151  apache-beam-0.6.0-python.zip

Modified: dev/beam/0.6.0/apache-beam-0.6.0-python.zip.sha1
==
--- dev/beam/0.6.0/apache-beam-0.6.0-python.zip.sha1 (original)
+++ dev/beam/0.6.0/apache-beam-0.6.0-python.zip.sha1 Sun Mar 12 21:32:08 2017
@@ -1 +1 @@
-ccece0ecca10c4c6019cba2ffb0963b187bb89d3  apache-beam-0.6.0.zip
+ccece0ecca10c4c6019cba2ffb0963b187bb89d3  apache-beam-0.6.0-python.zip




Jenkins build is back to stable : beam_PostCommit_Java_RunnableOnService_Spark #1220

2017-03-12 Thread Apache Jenkins Server
See 




[jira] [Reopened] (BEAM-1582) ResumeFromCheckpointStreamingTest flakes with what appears as a second firing.

2017-03-12 Thread Amit Sela (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amit Sela reopened BEAM-1582:
-

This test keeps flaking so I'll leave it open until we resolve it, or move to 
PostCommit and accept the fact that it flakes at times.

> ResumeFromCheckpointStreamingTest flakes with what appears as a second firing.
> --
>
> Key: BEAM-1582
> URL: https://issues.apache.org/jira/browse/BEAM-1582
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Amit Sela
> Fix For: First stable release
>
>
> See: 
> https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/org.apache.beam$beam-runners-spark/2788/testReport/junit/org.apache.beam.runners.spark.translation.streaming/ResumeFromCheckpointStreamingTest/testWithResume/
> After some digging in it appears that a second firing occurs (though only one 
> is expected) but it doesn't come from a stale state (state is empty before it 
> fires).
> Might be a retry happening for some reason, which is OK in terms of 
> fault-tolerance guarantees (at-least-once), but not so much in terms of flaky 
> tests. 
> I'm looking into this hoping to fix this ASAP.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1581) JsonIO

2017-03-12 Thread Eugene Kirpichov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906648#comment-15906648
 ] 

Eugene Kirpichov commented on BEAM-1581:


[~aviemzur] - I think XML source/sink are a bad design in retrospect - they 
were both, as far as I remember, created mostly to demonstrate the whole idea 
of file-based sources/sinks, and before the best practices for transform 
development shaped up, and did not really come from unifying an experience with 
an array of XML use cases either. We should not use them as API guidance.

The suggestion to have an abstract JsonIO seems to contradict the 
recommendation from the PTransform Style Guide (see 
https://beam.apache.org/contribute/ptransform-style-guide/#injecting-user-specified-behavior)
 to use PTransform composition as an extensibility device whenever possible 
(instead of inheritance) - and that recommendation is specifically directed at 
cases like this; the better alternative is to return String's and let the user 
compose it with a ParDo parsing the strings.

[~eljefe6a] - "File as a self-contained JSON" means there's no JSON-specific 
logic, it's simply "File as a self-contained String" - we should definitely 
have that, but under a separate JIRA issue.

Aviem / Jesse - could you perhaps come up with a list of common ways in which 
you have seen people store a collection of stuff in JSON file(s)? I think 
without that, or while keeping it implicit, we're kind of acting blindly. Let's 
list all the known use cases and abstract upward from that.

> JsonIO
> --
>
> Key: BEAM-1581
> URL: https://issues.apache.org/jira/browse/BEAM-1581
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> A new IO (with source and sink) which will read/write Json files.
> Similarly to {{XmlSource}}/{{XmlSink}}, this IO should have a 
> {{JsonSource}}/{{JonSink}} which are a {{FileBaseSource}}/{{FileBasedSink}}.
> Consider using methods/code (or refactor these) found in {{AsJsons}} and 
> {{ParseJsons}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1581) JsonIO

2017-03-12 Thread Jesse Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906637#comment-15906637
 ] 

Jesse Anderson commented on BEAM-1581:
--

I think it will have to support several modes:
Line as a self-contained JSON
File as a self-contained JSON

I think the output should support:
arrays [PCollectionLine, PCollectionLine]
key/value lists {PCollectionLine, PCollectionLine}

> JsonIO
> --
>
> Key: BEAM-1581
> URL: https://issues.apache.org/jira/browse/BEAM-1581
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> A new IO (with source and sink) which will read/write Json files.
> Similarly to {{XmlSource}}/{{XmlSink}}, this IO should have a 
> {{JsonSource}}/{{JonSink}} which are a {{FileBaseSource}}/{{FileBasedSink}}.
> Consider using methods/code (or refactor these) found in {{AsJsons}} and 
> {{ParseJsons}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1695) Improve Python-SDK's programming guide

2017-03-12 Thread Tibor Kiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tibor Kiss updated BEAM-1695:
-
Description: 
Beam's programming guide provides a tutorial-like structure to introduce the 
user to the main concepts.

Due to flaws of the snippets the copied code needs altering to work.
Some of the problems:
* Section "Creating the pipeline"
** {{import apache_beam as beam}} statement is missing from the beginning
** The command line arguments are not parsed
* Section "Creating a PCollection from in-memory data"
** {{pipeline_options}} variable is undefined
** {{my_options}} variable is undefined
* Section "ParDo": 
** It is not explained how to define {{words}} variable
* Section "Advanced combinations using CombineFn" and "Combining a PCollection 
into a single value" has the same code snippet
* Section "Combining values in a key-grouped collection":
** It is not explained how to define {{player_accuracies}}
* Section "Using Flatten and Partition"
** The code snippet contains unnecessary markers ({{[START 
model_multiple_pcollections_tuple]}})
* Section "partition":
** {{students}} variable is undefined

This list might not be complete.

The website's repo is located at: https://github.com/apache/beam-site
The snippets are taken from: 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py



  was:
Beam's programming guide provides a tutorial-like structure to introduce the 
user to the main concepts.

Due to flaws of the snippets the copied code needs altering to work.
Some of the problems per section
* Section "Creating the pipeline"
** {{import apache_beam as beam}} statement is missing from the beginning
** The command line arguments are not parsed
* Section "Creating a PCollection from in-memory data"
** {{pipeline_options}} variable is undefined
** {{my_options}} variable is undefined
* Section "ParDo": 
** It is not explained how to define {{words}} variable
* Section "Advanced combinations using CombineFn" and "Combining a PCollection 
into a single value" has the same code snippet
* Section "Combining values in a key-grouped collection":
** It is not explained how to define {{player_accuracies}}
* Section "Using Flatten and Partition"
** The code snippet contains unnecessary markers ({{[START 
model_multiple_pcollections_tuple]}})
* Section "partition":
** {{students}} variable is undefined

This list might not be complete.

The website's repo is located at: https://github.com/apache/beam-site
The snippets are taken from: 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py




> Improve Python-SDK's programming guide
> --
>
> Key: BEAM-1695
> URL: https://issues.apache.org/jira/browse/BEAM-1695
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Tibor Kiss
>Priority: Minor
>  Labels: newbie, starter
>
> Beam's programming guide provides a tutorial-like structure to introduce the 
> user to the main concepts.
> Due to flaws of the snippets the copied code needs altering to work.
> Some of the problems:
> * Section "Creating the pipeline"
> ** {{import apache_beam as beam}} statement is missing from the beginning
> ** The command line arguments are not parsed
> * Section "Creating a PCollection from in-memory data"
> ** {{pipeline_options}} variable is undefined
> ** {{my_options}} variable is undefined
> * Section "ParDo": 
> ** It is not explained how to define {{words}} variable
> * Section "Advanced combinations using CombineFn" and "Combining a 
> PCollection into a single value" has the same code snippet
> * Section "Combining values in a key-grouped collection":
> ** It is not explained how to define {{player_accuracies}}
> * Section "Using Flatten and Partition"
> ** The code snippet contains unnecessary markers ({{[START 
> model_multiple_pcollections_tuple]}})
> * Section "partition":
> ** {{students}} variable is undefined
> This list might not be complete.
> The website's repo is located at: https://github.com/apache/beam-site
> The snippets are taken from: 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1695) Improve Python-SDK's programming guide

2017-03-12 Thread Tibor Kiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tibor Kiss updated BEAM-1695:
-
Description: 
Beam's programming guide provides a tutorial-like structure to introduce the 
user to the main concepts.

Due to flaws of the snippets the copied code needs altering to work.
Some of the problems per section
* Section "Creating the pipeline"
** {{import apache_beam as beam}} statement is missing from the beginning
** The command line arguments are not parsed
* Section "Creating a PCollection from in-memory data"
** {{pipeline_options}} variable is undefined
** {{my_options}} variable is undefined
* Section "ParDo": 
** It is not explained how to define {{words}} variable
* Section "Advanced combinations using CombineFn" and "Combining a PCollection 
into a single value" has the same code snippet
* Section "Combining values in a key-grouped collection":
** It is not explained how to define {{player_accuracies}}
* Section "Using Flatten and Partition"
** The code snippet contains unnecessary markers ({{[START 
model_multiple_pcollections_tuple]}})
* Section "partition":
** {{students}} variable is undefined

This list might not be complete.

The website's repo is located at: https://github.com/apache/beam-site
The snippets are taken from: 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py



  was:
Beam's programming guide provides a tutorial-like structure to introduce the 
user to the main concepts.

Due to flaws of the snippets the copied code needs altering to work.
Some of the problems per section
1) Section "Creating the pipeline"
- {{import apache_beam as beam}} statement is missing from the beginning
- The command line arguments are not parsed
2) Section "Creating a PCollection from in-memory data"
- {{pipeline_options}} variable is undefined
- {{my_options}} variable is undefined
3) Section "ParDo": 
- It is not explained how to define {{words}} variable
4) Section "Advanced combinations using CombineFn" and "Combining a PCollection 
into a single value" has the same code snippet
5) Section "Combining values in a key-grouped collection":
   - It is not explained how to define {{player_accuracies}}
6) Section "Using Flatten and Partition"
   - The code snippet contains unnecessary markers ({{[START 
model_multiple_pcollections_tuple]}})
7) Section "partition":
   - {{students}} variable is undefined

This list might not be complete.

The website's repo is located at: https://github.com/apache/beam-site
The snippets are taken from: 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py




> Improve Python-SDK's programming guide
> --
>
> Key: BEAM-1695
> URL: https://issues.apache.org/jira/browse/BEAM-1695
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Tibor Kiss
>Priority: Minor
>  Labels: newbie, starter
>
> Beam's programming guide provides a tutorial-like structure to introduce the 
> user to the main concepts.
> Due to flaws of the snippets the copied code needs altering to work.
> Some of the problems per section
> * Section "Creating the pipeline"
> ** {{import apache_beam as beam}} statement is missing from the beginning
> ** The command line arguments are not parsed
> * Section "Creating a PCollection from in-memory data"
> ** {{pipeline_options}} variable is undefined
> ** {{my_options}} variable is undefined
> * Section "ParDo": 
> ** It is not explained how to define {{words}} variable
> * Section "Advanced combinations using CombineFn" and "Combining a 
> PCollection into a single value" has the same code snippet
> * Section "Combining values in a key-grouped collection":
> ** It is not explained how to define {{player_accuracies}}
> * Section "Using Flatten and Partition"
> ** The code snippet contains unnecessary markers ({{[START 
> model_multiple_pcollections_tuple]}})
> * Section "partition":
> ** {{students}} variable is undefined
> This list might not be complete.
> The website's repo is located at: https://github.com/apache/beam-site
> The snippets are taken from: 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-1695) Improve Python-SDK's programming guide

2017-03-12 Thread Tibor Kiss (JIRA)
Tibor Kiss created BEAM-1695:


 Summary: Improve Python-SDK's programming guide
 Key: BEAM-1695
 URL: https://issues.apache.org/jira/browse/BEAM-1695
 Project: Beam
  Issue Type: Bug
  Components: website
Reporter: Tibor Kiss
Priority: Minor


Beam's programming guide provides a tutorial-like structure to introduce the 
user to the main concepts.

Due to flaws of the snippets the copied code needs altering to work.
Some of the problems per section
1) Section "Creating the pipeline"
- {{import apache_beam as beam}} statement is missing from the beginning
- The command line arguments are not parsed
2) Section "Creating a PCollection from in-memory data"
- {{pipeline_options}} variable is undefined
- {{my_options}} variable is undefined
3) Section "ParDo": 
- It is not explained how to define {{words}} variable
4) Section "Advanced combinations using CombineFn" and "Combining a PCollection 
into a single value" has the same code snippet
5) Section "Combining values in a key-grouped collection":
   - It is not explained how to define {{player_accuracies}}
6) Section "Using Flatten and Partition"
   - The code snippet contains unnecessary markers ({{[START 
model_multiple_pcollections_tuple]}})
7) Section "partition":
   - {{students}} variable is undefined

This list might not be complete.

The website's repo is located at: https://github.com/apache/beam-site
The snippets are taken from: 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build became unstable: beam_PostCommit_Java_RunnableOnService_Spark #1219

2017-03-12 Thread Apache Jenkins Server
See 




[jira] [Comment Edited] (BEAM-1573) KafkaIO does not allow using Kafka serializers and deserializers

2017-03-12 Thread peay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906595#comment-15906595
 ] 

peay edited comment on BEAM-1573 at 3/12/17 5:23 PM:
-

[~rangadi] I have made some good progress, with running the deserializer within 
Kafka on the consumer thread. An issue with this is how to implement 
{{approxBacklogInBytes}}. The current implementation actually deserializes 
manually (i.e., outside Kafka and the consumer thread) in {{advance}}. The 0.10 
API has {{ConsumerRecord.serializedKeySize()}} and 
{{ConsumerRecord.serializedValueSize()}} which would be perfect to handle that 
computation, but the 0.9 API does not have it. What would you suggest?


was (Author: peay):
[~rangadi] I have made some good progress, with running the deserializer within 
Kafka on the consumer thread. An issue with this is how to implement 
{{approxBacklogInBytes}}. The current implementation actually deserializes 
manually (i.e., outside Kafka and the consumer thread) in {{advance}}. The 0.10 
API has {{ConsumeRecord.serializedKeySize()}} and 
{{ConsumerRecord.serializedValueSize()}} which would be perfect to handle that 
computation, but the 0.9 API does not have it. What would you suggest?

> KafkaIO does not allow using Kafka serializers and deserializers
> 
>
> Key: BEAM-1573
> URL: https://issues.apache.org/jira/browse/BEAM-1573
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Affects Versions: 0.4.0, 0.5.0
>Reporter: peay
>Assignee: Raghu Angadi
>Priority: Minor
> Fix For: Not applicable
>
>
> KafkaIO does not allow to override the serializer and deserializer settings 
> of the Kafka consumer and producers it uses internally. Instead, it allows to 
> set a `Coder`, and has a simple Kafka serializer/deserializer wrapper class 
> that calls the coder.
> I appreciate that allowing to use Beam coders is good and consistent with the 
> rest of the system. However, is there a reason to completely disallow to use 
> custom Kafka serializers instead?
> This is a limitation when working with an Avro schema registry for instance, 
> which requires custom serializers. One can write a `Coder` that wraps a 
> custom Kafka serializer, but that means two levels of un-necessary wrapping.
> In addition, the `Coder` abstraction is not equivalent to Kafka's 
> `Serializer` which gets the topic name as input. Using a `Coder` wrapper 
> would require duplicating the output topic setting in the argument to 
> `KafkaIO` and when building the wrapper, which is not elegant and error prone.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (BEAM-1573) KafkaIO does not allow using Kafka serializers and deserializers

2017-03-12 Thread peay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906595#comment-15906595
 ] 

peay edited comment on BEAM-1573 at 3/12/17 5:23 PM:
-

[~rangadi] I have made some good progress, with running the deserializer within 
Kafka on the consumer thread. An issue with this is how to implement 
{{approxBacklogInBytes}}. The current implementation actually deserializes 
manually (i.e., outside Kafka and the consumer thread) in {{advance}}. The 0.10 
API has {{ConsumeRecord.serializedKeySize()}} and 
{{ConsumerRecord.serializedValueSize()}} which would be perfect to handle that 
computation, but the 0.9 API does not have it. What would you suggest?


was (Author: peay):
[~rangadi] I have made some good progress, with running the deserializer within 
Kafka on the consumer thread. An issue with this is how to implement 
{{approxBacklogInBytes}}. The current implementation actually deserializes 
manually (i.e., outside Kafka and the consumer thread) in `advance`. The 0.10 
API has {{ConsumeRecord.serializedKeySize()}} and 
{{ConsumerRecord.serializedValueSize()}} which would be perfect to handle that 
computation, but the 0.9 API does not have it. What would you suggest?

> KafkaIO does not allow using Kafka serializers and deserializers
> 
>
> Key: BEAM-1573
> URL: https://issues.apache.org/jira/browse/BEAM-1573
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Affects Versions: 0.4.0, 0.5.0
>Reporter: peay
>Assignee: Raghu Angadi
>Priority: Minor
> Fix For: Not applicable
>
>
> KafkaIO does not allow to override the serializer and deserializer settings 
> of the Kafka consumer and producers it uses internally. Instead, it allows to 
> set a `Coder`, and has a simple Kafka serializer/deserializer wrapper class 
> that calls the coder.
> I appreciate that allowing to use Beam coders is good and consistent with the 
> rest of the system. However, is there a reason to completely disallow to use 
> custom Kafka serializers instead?
> This is a limitation when working with an Avro schema registry for instance, 
> which requires custom serializers. One can write a `Coder` that wraps a 
> custom Kafka serializer, but that means two levels of un-necessary wrapping.
> In addition, the `Coder` abstraction is not equivalent to Kafka's 
> `Serializer` which gets the topic name as input. Using a `Coder` wrapper 
> would require duplicating the output topic setting in the argument to 
> `KafkaIO` and when building the wrapper, which is not elegant and error prone.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (BEAM-1573) KafkaIO does not allow using Kafka serializers and deserializers

2017-03-12 Thread peay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906595#comment-15906595
 ] 

peay edited comment on BEAM-1573 at 3/12/17 5:23 PM:
-

[~rangadi] I have made some good progress, with running the deserializer within 
Kafka on the consumer thread. An issue with this is how to implement 
{{approxBacklogInBytes}}. The current implementation actually deserializes 
manually (i.e., outside Kafka and the consumer thread) in `advance`. The 0.10 
API has {{ConsumeRecord.serializedKeySize()}} and 
{{ConsumerRecord.serializedValueSize()}} which would be perfect to handle that 
computation, but the 0.9 API does not have it. What would you suggest?


was (Author: peay):
[~rangadi] I have made some good progress, with running the deserializer within 
Kafka on the consumer thread. An issue with this is how to implement 
`approxBacklogInBytes`. The current implementation actually deserializes 
manually (i.e., outside Kafka and the consumer thread) in `advance`. The 0.10 
API has {{ConsumeRecord.serializedKeySize()}} and 
{{ConsumerRecord.serializedValueSize()}} which would be perfect to handle that 
computation, but the 0.9 API does not have it. What would you suggest?

> KafkaIO does not allow using Kafka serializers and deserializers
> 
>
> Key: BEAM-1573
> URL: https://issues.apache.org/jira/browse/BEAM-1573
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Affects Versions: 0.4.0, 0.5.0
>Reporter: peay
>Assignee: Raghu Angadi
>Priority: Minor
> Fix For: Not applicable
>
>
> KafkaIO does not allow to override the serializer and deserializer settings 
> of the Kafka consumer and producers it uses internally. Instead, it allows to 
> set a `Coder`, and has a simple Kafka serializer/deserializer wrapper class 
> that calls the coder.
> I appreciate that allowing to use Beam coders is good and consistent with the 
> rest of the system. However, is there a reason to completely disallow to use 
> custom Kafka serializers instead?
> This is a limitation when working with an Avro schema registry for instance, 
> which requires custom serializers. One can write a `Coder` that wraps a 
> custom Kafka serializer, but that means two levels of un-necessary wrapping.
> In addition, the `Coder` abstraction is not equivalent to Kafka's 
> `Serializer` which gets the topic name as input. Using a `Coder` wrapper 
> would require duplicating the output topic setting in the argument to 
> `KafkaIO` and when building the wrapper, which is not elegant and error prone.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is back to stable : beam_PostCommit_Java_MavenInstall #2886

2017-03-12 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #2226: [BEAM-1690] Revert BigQueryIO bit of 'Make all uses...

2017-03-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2226


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1690) BigQueryTornadoesIT failing

2017-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906478#comment-15906478
 ] 

ASF GitHub Bot commented on BEAM-1690:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2226


> BigQueryTornadoesIT failing
> ---
>
> Key: BEAM-1690
> URL: https://issues.apache.org/jira/browse/BEAM-1690
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java, runner-dataflow, testing
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Blocker
>
> [Since build 
> 2857|https://builds.apache.org/job/beam_PostCommit_Java_MavenInstall/changes] 
> (with one aberration) BigQueryTornadoesIT has been failing reliably on the 
> Dataflow runner. The changes around that time do not seem related.
> Example build: 
> https://builds.apache.org/job/beam_PostCommit_Java_MavenInstall/org.apache.beam$beam-examples-java/2857/testReport/junit/org.apache.beam.examples.cookbook/BigQueryTornadoesIT/testE2EBigQueryTornadoes/



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[2/2] beam git commit: This closes #2226

2017-03-12 Thread amitsela
This closes #2226


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/781e4172
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/781e4172
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/781e4172

Branch: refs/heads/master
Commit: 781e4172c3f36863d1c9145d4c18cd0910f2436a
Parents: b6ca062 839c906
Author: Amit Sela 
Authored: Sun Mar 12 11:59:06 2017 +0200
Committer: Amit Sela 
Committed: Sun Mar 12 11:59:06 2017 +0200

--
 .../java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)
--




[1/2] beam git commit: Revert BigQueryIO bit of 'Make all uses of CountingOutputStream close their resources'

2017-03-12 Thread amitsela
Repository: beam
Updated Branches:
  refs/heads/master b6ca062fc -> 781e4172c


Revert BigQueryIO bit of 'Make all uses of CountingOutputStream close their 
resources'

This reverts the portion of commit 3115dbdca1858511e98476b5c79e6cca98782b0b
that touches BigQueryIO, which caused a double close bug.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/839c906a
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/839c906a
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/839c906a

Branch: refs/heads/master
Commit: 839c906a583bef6ef7c0739479231f096df58bef
Parents: b6ca062
Author: Kenneth Knowles 
Authored: Fri Mar 10 19:01:23 2017 -0800
Committer: Amit Sela 
Committed: Sun Mar 12 11:58:43 2017 +0200

--
 .../java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/839c906a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
--
diff --git 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
index 81aa50b..0e1c6fc 100644
--- 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
+++ 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
@@ -2272,9 +2272,7 @@ public class BigQueryIO {
 
   public final KV close() throws IOException {
 channel.close();
-KV record = KV.of(fileName, out.getCount());
-out.close();
-return record;
+return KV.of(fileName, out.getCount());
   }
 }
 



[jira] [Commented] (BEAM-1612) Support real Bundle in Flink runner

2017-03-12 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906475#comment-15906475
 ] 

Aljoscha Krettek commented on BEAM-1612:


[~kenn] I think BEAM-1312 might have been a typo.

> Support real Bundle in Flink runner
> ---
>
> Key: BEAM-1612
> URL: https://issues.apache.org/jira/browse/BEAM-1612
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Jingsong Lee
>Assignee: Jingsong Lee
>
> The Bundle is very important in the beam model. Users can use the bundle to 
> flush buffer, can reuse many heavyweight resources in a bundle. Most IO 
> plugins use the bundle to flush. 
> Moreover, FlinkRunner can also use Bundle to reduce access to the FlinkState, 
> such as first placed in JavaHeap, flush into RocksDbState when invoke 
> finishBundle , this can reduce the number of serialization.
> But now FlinkRunner calls the finishBundle every processElement. We need 
> support real Bundle.
> I think we can have the following implementations:
> 1.Invoke finishBundle and next startBundle in {{snapshot}} of Flink. But 
> sometimes this "Bundle" maybe too big. This depends on the user's checkpoint 
> configuration.
> 2.Manually control the size of the bundle. The half-bundle will be flushed to 
> a full-bundle by count or eventTime or processTime or {{snapshot}}. We do not 
> need to wait, just call the startBundle and finishBundle at the right time.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1629) Metrics/aggregators accumulators should be instantiated before traversing pipeline

2017-03-12 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1629:

Fix Version/s: First stable release

> Metrics/aggregators accumulators should be instantiated before traversing 
> pipeline
> --
>
> Key: BEAM-1629
> URL: https://issues.apache.org/jira/browse/BEAM-1629
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Aviem Zur
>Assignee: Aviem Zur
> Fix For: First stable release
>
>
> Today aggregators and metrics accumulators are instantiated after pipeline is 
> traversed, and so, if code that runs on driver accesses these singletons but 
> is not part of a closure a {{NullPointerException}} is thrown.
> This is fine for streaming branches of a pipeline as you probably did 
> something wrong, but for batch branches this should be allowed (In completely 
> batch pipelines accumulators are instantiated before traversal and everything 
> works fine, the issue occurs in streaming+batch combined pipelines) 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (BEAM-1629) Metrics/aggregators accumulators should be instantiated before traversing pipeline

2017-03-12 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur resolved BEAM-1629.
-
Resolution: Fixed

> Metrics/aggregators accumulators should be instantiated before traversing 
> pipeline
> --
>
> Key: BEAM-1629
> URL: https://issues.apache.org/jira/browse/BEAM-1629
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Aviem Zur
>Assignee: Aviem Zur
> Fix For: First stable release
>
>
> Today aggregators and metrics accumulators are instantiated after pipeline is 
> traversed, and so, if code that runs on driver accesses these singletons but 
> is not part of a closure a {{NullPointerException}} is thrown.
> This is fine for streaming branches of a pipeline as you probably did 
> something wrong, but for batch branches this should be allowed (In completely 
> batch pipelines accumulators are instantiated before traversal and everything 
> works fine, the issue occurs in streaming+batch combined pipelines) 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1693) Detect suitable Python & pip executables in Python-SDK

2017-03-12 Thread Tibor Kiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tibor Kiss updated BEAM-1693:
-
Issue Type: Improvement  (was: Bug)

> Detect suitable Python & pip executables in Python-SDK
> --
>
> Key: BEAM-1693
> URL: https://issues.apache.org/jira/browse/BEAM-1693
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Tibor Kiss
>Assignee: Tibor Kiss
>
> Python SDK currently supports Python-2.7 only.
> The Python interpreter & pip definition in pom.xml points to {{python2}} & 
> {{pip2}} respectively. 
> Users with multiple Python interpreters installed might end up having python2 
> and pip2 pointing to their 2.6 installation. (This scenario happens mostly on 
> OS X machines.)
> There is no single, valid name for the executables as different OSes install 
> those binaries in various names:
> - CentOS6/EPEL: pip (python 2.6) & pip2 (python 2.6) & pip2.6 (python 2.6)
> - CentOS7/EPEL: pip (python 2.7) & pip2 (python 2.7) & pip2.7 (python 2.7)
> - Debian7: pip (python 2.7) & pip-2.6 (python 2.6) & pip-2.7 (python 2.7)
> - Debian8: pip (python 2.7) & pip2 (python 2.7)
> - Debian9: pip (python 2.7) & pip2 (python 2.7)
> - Ubuntu1204: pip (python 2.7)
> - Ubuntu1404: pip2 (python 2.7)
> - Ubuntu1604: pip (python 2.7) & pip2 (python 2.7)
> - OS X: pip (python 2.6) & pip2 (python 2.6) & pip2.7 (brew / python 2.7)
> - Windows: pip-2.7 (python.org based installer)
> To overcome this problem the pom.xml should be extended to determine the 
> suitable Python interpreter & pip automatically, in a platform independent 
> way.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #2885

2017-03-12 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1694) Fix docstring inaccuracies in Python-SDK

2017-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906459#comment-15906459
 ] 

ASF GitHub Bot commented on BEAM-1694:
--

GitHub user tibkiss opened a pull request:

https://github.com/apache/beam/pull/2228

[BEAM-1694] Fix docstring inaccuracies in Python-SDK

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tibkiss/beam BEAM-1694

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2228.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2228


commit 551919a359e3ef16db15e2e10f98ef258f655a33
Author: Tibor Kiss 
Date:   2017-03-11T06:13:04Z

[BEAM-1694] Fix docstring inaccuracies in Python-SDK




> Fix docstring inaccuracies in Python-SDK
> 
>
> Key: BEAM-1694
> URL: https://issues.apache.org/jira/browse/BEAM-1694
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Tibor Kiss
>Assignee: Tibor Kiss
>Priority: Trivial
>
> Certain functions / classes has inaccurate documentation: 
>  - missing/non-matching arguments between code & doc
>  - arguments present in docstring but not in code
>  - single quotation used for docstring
> These issues should be fixed to not to mislead developers browsing the 
> API-doc. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2228: [BEAM-1694] Fix docstring inaccuracies in Python-SD...

2017-03-12 Thread tibkiss
GitHub user tibkiss opened a pull request:

https://github.com/apache/beam/pull/2228

[BEAM-1694] Fix docstring inaccuracies in Python-SDK

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tibkiss/beam BEAM-1694

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2228.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2228


commit 551919a359e3ef16db15e2e10f98ef258f655a33
Author: Tibor Kiss 
Date:   2017-03-11T06:13:04Z

[BEAM-1694] Fix docstring inaccuracies in Python-SDK




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1629) Metrics/aggregators accumulators should be instantiated before traversing pipeline

2017-03-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906453#comment-15906453
 ] 

ASF GitHub Bot commented on BEAM-1629:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2171


> Metrics/aggregators accumulators should be instantiated before traversing 
> pipeline
> --
>
> Key: BEAM-1629
> URL: https://issues.apache.org/jira/browse/BEAM-1629
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Today aggregators and metrics accumulators are instantiated after pipeline is 
> traversed, and so, if code that runs on driver accesses these singletons but 
> is not part of a closure a {{NullPointerException}} is thrown.
> This is fine for streaming branches of a pipeline as you probably did 
> something wrong, but for batch branches this should be allowed (In completely 
> batch pipelines accumulators are instantiated before traversal and everything 
> works fine, the issue occurs in streaming+batch combined pipelines) 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[1/2] beam git commit: [BEAM-1629] Init metrics/aggregators accumulators before traversing pipeline

2017-03-12 Thread staslevin
Repository: beam
Updated Branches:
  refs/heads/master d16715309 -> b6ca062fc


[BEAM-1629] Init metrics/aggregators accumulators before traversing pipeline


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/874c8d0d
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/874c8d0d
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/874c8d0d

Branch: refs/heads/master
Commit: 874c8d0da65568b01cd5f184e303d39c7810a8bf
Parents: d167153
Author: Aviem Zur 
Authored: Mon Mar 6 20:48:48 2017 +0200
Committer: Stas Levin 
Committed: Sun Mar 12 10:02:23 2017 +0200

--
 .../spark/SparkNativePipelineVisitor.java   |  4 --
 .../beam/runners/spark/SparkPipelineResult.java |  8 +--
 .../apache/beam/runners/spark/SparkRunner.java  | 65 ++--
 .../beam/runners/spark/SparkRunnerDebugger.java | 30 ++---
 .../beam/runners/spark/TestSparkRunner.java |  4 +-
 .../aggregators/AggregatorsAccumulator.java | 44 +
 .../spark/aggregators/SparkAggregators.java | 40 ++--
 .../spark/metrics/AggregatorMetricSource.java   | 11 ++--
 .../spark/metrics/MetricsAccumulator.java   | 38 
 .../spark/metrics/SparkBeamMetricSource.java| 11 ++--
 .../spark/metrics/SparkMetricsContainer.java| 17 ++---
 .../spark/translation/TransformTranslator.java  | 13 ++--
 .../SparkRunnerStreamingContextFactory.java |  3 +
 .../streaming/StreamingTransformTranslator.java | 10 +--
 .../metrics/sink/NamedAggregatorsTest.java  | 15 +
 .../ResumeFromCheckpointStreamingTest.java  |  4 +-
 16 files changed, 156 insertions(+), 161 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/874c8d0d/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkNativePipelineVisitor.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkNativePipelineVisitor.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkNativePipelineVisitor.java
index 056da97..c2784a2 100644
--- 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkNativePipelineVisitor.java
+++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkNativePipelineVisitor.java
@@ -19,7 +19,6 @@
 package org.apache.beam.runners.spark;
 
 import com.google.common.base.Joiner;
-import com.google.common.base.Optional;
 import com.google.common.base.Predicate;
 import com.google.common.collect.Iterables;
 import com.google.common.collect.Lists;
@@ -27,11 +26,9 @@ import java.lang.reflect.Field;
 import java.lang.reflect.InvocationTargetException;
 import java.util.ArrayList;
 import java.util.List;
-import org.apache.beam.runners.spark.metrics.MetricsAccumulator;
 import org.apache.beam.runners.spark.translation.EvaluationContext;
 import org.apache.beam.runners.spark.translation.SparkPipelineTranslator;
 import org.apache.beam.runners.spark.translation.TransformEvaluator;
-import org.apache.beam.runners.spark.translation.streaming.Checkpoint;
 import org.apache.beam.sdk.io.Read;
 import org.apache.beam.sdk.runners.TransformHierarchy;
 import org.apache.beam.sdk.transforms.MapElements;
@@ -55,7 +52,6 @@ public class SparkNativePipelineVisitor extends 
SparkRunner.Evaluator {
   SparkNativePipelineVisitor(SparkPipelineTranslator translator, 
EvaluationContext ctxt) {
 super(translator, ctxt);
 this.transforms = new ArrayList<>();
-MetricsAccumulator.init(ctxt.getSparkContext(), 
Optional.absent());
   }
 
   @Override

http://git-wip-us.apache.org/repos/asf/beam/blob/874c8d0d/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineResult.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineResult.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineResult.java
index ddc1964..ed1e0c8 100644
--- 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineResult.java
+++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineResult.java
@@ -27,7 +27,6 @@ import java.util.concurrent.TimeoutException;
 import org.apache.beam.runners.spark.aggregators.SparkAggregators;
 import org.apache.beam.runners.spark.metrics.SparkMetricResults;
 import org.apache.beam.runners.spark.translation.SparkContextFactory;
-import org.apache.beam.sdk.AggregatorRetrievalException;
 import org.apache.beam.sdk.AggregatorValues;
 import org.apache.beam.sdk.Pipeline;
 import org.apache.beam.sdk.PipelineResult;
@@ -84,13 +83,12 @@ public abstract class SparkPipelineResult implements 
PipelineResult {
   throws TimeoutException, ExecutionException, InterruptedException;
 
   

[2/2] beam git commit: This closes #2171

2017-03-12 Thread staslevin
This closes #2171


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/b6ca062f
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/b6ca062f
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/b6ca062f

Branch: refs/heads/master
Commit: b6ca062fcfa31884baf08b804d04c12dee10b62e
Parents: d167153 874c8d0
Author: Stas Levin 
Authored: Sun Mar 12 10:02:30 2017 +0200
Committer: Stas Levin 
Committed: Sun Mar 12 10:02:30 2017 +0200

--
 .../spark/SparkNativePipelineVisitor.java   |  4 --
 .../beam/runners/spark/SparkPipelineResult.java |  8 +--
 .../apache/beam/runners/spark/SparkRunner.java  | 65 ++--
 .../beam/runners/spark/SparkRunnerDebugger.java | 30 ++---
 .../beam/runners/spark/TestSparkRunner.java |  4 +-
 .../aggregators/AggregatorsAccumulator.java | 44 +
 .../spark/aggregators/SparkAggregators.java | 40 ++--
 .../spark/metrics/AggregatorMetricSource.java   | 11 ++--
 .../spark/metrics/MetricsAccumulator.java   | 38 
 .../spark/metrics/SparkBeamMetricSource.java| 11 ++--
 .../spark/metrics/SparkMetricsContainer.java| 17 ++---
 .../spark/translation/TransformTranslator.java  | 13 ++--
 .../SparkRunnerStreamingContextFactory.java |  3 +
 .../streaming/StreamingTransformTranslator.java | 10 +--
 .../metrics/sink/NamedAggregatorsTest.java  | 15 +
 .../ResumeFromCheckpointStreamingTest.java  |  4 +-
 16 files changed, 156 insertions(+), 161 deletions(-)
--




[jira] [Created] (BEAM-1694) Fix docstring inaccuracies in Python-SDK

2017-03-12 Thread Tibor Kiss (JIRA)
Tibor Kiss created BEAM-1694:


 Summary: Fix docstring inaccuracies in Python-SDK
 Key: BEAM-1694
 URL: https://issues.apache.org/jira/browse/BEAM-1694
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Tibor Kiss
Assignee: Tibor Kiss
Priority: Trivial


Certain functions / classes has inaccurate documentation: 
 - missing/non-matching arguments between code & doc
 - arguments present in docstring but not in code
 - single quotation used for docstring

These issues should be fixed to not to mislead developers browsing the API-doc. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)