[jira] [Commented] (BEAM-2477) BeamAggregationRel should use Combine.perKey instead of GroupByKey

2017-06-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061796#comment-16061796
 ] 

ASF GitHub Bot commented on BEAM-2477:
--

Github user JingsongLi closed the pull request at:

https://github.com/apache/beam/pull/3398


> BeamAggregationRel should use Combine.perKey instead of GroupByKey
> --
>
> Key: BEAM-2477
> URL: https://issues.apache.org/jira/browse/BEAM-2477
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Jingsong Lee
>Assignee: Jingsong Lee
>  Labels: dsl_sql_merge
>
> Their semantics are the same, but the implementations differ greatly in 
> efficiency: at the runner level there is a lot of optimization for 
> `Combine.perKey` that a plain `GroupByKey` cannot benefit from.
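
For illustration only, a minimal Python SDK sketch of the two equivalent formulations (BeamAggregationRel itself lives in the Java SQL DSL, so this is not the actual change, just the semantic idea and where the efficiency gap comes from):

{code:python}
import apache_beam as beam

with beam.Pipeline() as p:
  pairs = p | beam.Create([('a', 1), ('a', 2), ('b', 3)])

  # GroupByKey then fold: every value for a key is shuffled and materialized
  # before the aggregation runs.
  summed_gbk = (pairs
                | 'GBK' >> beam.GroupByKey()
                | 'SumValues' >> beam.Map(lambda kv: (kv[0], sum(kv[1]))))

  # Combine.perKey equivalent: runners can lift the combiner and pre-combine
  # values before the shuffle.
  summed_cpk = pairs | 'SumPerKey' >> beam.CombinePerKey(sum)
{code}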





Jenkins build is unstable: beam_PostCommit_Java_MavenInstall #4205

2017-06-23 Thread Apache Jenkins Server
See 




Jenkins build is back to stable : beam_PostCommit_Java_ValidatesRunner_Dataflow #3438

2017-06-23 Thread Apache Jenkins Server
See 




[jira] [Comment Edited] (BEAM-2490) ReadFromText function is not taking all data with glob operator (*)

2017-06-23 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061713#comment-16061713
 ] 

Guillermo Rodríguez Cano edited comment on BEAM-2490 at 6/24/17 1:10 AM:
-

Hello again,

As I commented before, and after fixing the shell's expansion, I think I am 
having a similar issue with both the Dataflow and Direct runners. I am not sure 
whether it is the glob operator alone or its combination with the gzip compression.
I simplified my pipeline to emulate a simple grep of some JSON files this time:

{code:python}
with beam.Pipeline(options=pipeline_options) as p:
    raw_events = p | 'Read input' >> ReadFromText(known_args.input)

    events = raw_events | 'Generate events' >> beam.ParDo(ExtractEventsFn())

    filtered_events = (events
                       | 'Filter for a specific user' >> beam.Filter(lambda e: e['user'] == '123')
                       | 'Filter for a specific video' >> beam.Filter(lambda e: e['video'] == '456')
                      )

    output = (filtered_events
              | 'Format output events' >> beam.Map(
                    lambda e: '%s @ %s (%s - %s - %s)' % (
                        datetime.fromtimestamp(e['timestamp'] / 1000).isoformat(),
                        e['type'], e['user'], e['video'], e['device']))
              | 'Write results' >> WriteToText(known_args.output)
             )
{code}

When I run the pipeline with the input files decompressed, both the Direct and 
Dataflow runners produce the expected result (verified against parsing the input 
files on the command line with grep). When I run it with the files gzip-compressed, 
both runners produce only a minimal subset (less than 2%) of the expected result.
With the Dataflow runner I was reading from Google Cloud Storage, while with the 
Direct runner I was reading from my local hard disk (for the uncompressed scenario 
I had to reduce the initial set from 48 files to the 8 files I know the result 
comes from, because of the high swap usage of the Python process on my laptop).

The similarities with [~oliviernguyenquoc]'s report are that my compressed files 
are around the same size (about 200 MB each, 1.5-2 GB decompressed) and are also 
text files (JSON in my case).

Interestingly enough, I also tried loading each compressed file as a separate 
PCollection directly in the source code and then merging them with a Flatten 
transform. I got similarly unsuccessful results with the Direct runner (I did not 
try the Dataflow runner); similar, but not identical, because the output was 
slightly different from the run that used the glob operator on the directory 
containing the files.

It feels as if Apache Beam is sampling the files when they are gzip-compressed, 
regardless of the runner used.
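
(For completeness, a minimal variant of the read step above that spells out the gzip handling explicitly; `pipeline_options` and `known_args` are the same objects as in the snippet above, and this is only a diagnostic sketch, not a confirmed fix.)

{code:python}
import apache_beam as beam
from apache_beam.io import ReadFromText
from apache_beam.io.filesystem import CompressionTypes

with beam.Pipeline(options=pipeline_options) as p:
    # Force the compression type instead of relying on auto-detection from the
    # .gz extension, to help rule out the glob/compression interaction.
    raw_events = p | 'Read input' >> ReadFromText(
        known_args.input, compression_type=CompressionTypes.GZIP)
{code}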




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2466

2017-06-23 Thread Apache Jenkins Server
See 




[GitHub] beam-site pull request #260: Mention python setup.py sdist

2017-06-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam-site/pull/260




[1/3] beam-site git commit: Mention python setup.py sdist and link to its docs

2017-06-23 Thread altay
Repository: beam-site
Updated Branches:
  refs/heads/asf-site 1c164e086 -> 7360cb748


Mention python setup.py sdist and link to its docs


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/d3fb6784
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/d3fb6784
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/d3fb6784

Branch: refs/heads/asf-site
Commit: d3fb67846b1963822c7c396dc29db995a79aa611
Parents: 1c164e0
Author: JP Martin 
Authored: Fri Jun 23 15:52:33 2017 -0700
Committer: Ahmet Altay 
Committed: Fri Jun 23 17:31:50 2017 -0700

--
 src/documentation/sdks/python-pipeline-dependencies.md | 6 ++
 1 file changed, 6 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/beam-site/blob/d3fb6784/src/documentation/sdks/python-pipeline-dependencies.md
--
diff --git a/src/documentation/sdks/python-pipeline-dependencies.md 
b/src/documentation/sdks/python-pipeline-dependencies.md
index 916a9b5..9a4ebe7 100644
--- a/src/documentation/sdks/python-pipeline-dependencies.md
+++ b/src/documentation/sdks/python-pipeline-dependencies.md
@@ -49,6 +49,12 @@ If your pipeline uses packages that are not available 
publicly (e.g. packages th
 
 --extra_package /path/to/package/package-name
 
+   where package-name is the package's tarball. If you have the `setup.py` for that
+   package then you can build the tarball with the following command:
+
+python setup.py sdist
+
+   See the [sdist documentation](https://docs.python.org/2/distutils/sourcedist.html) for more details on this command.
 
 ## Multiple File Dependencies
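
A minimal end-to-end sketch of the workflow the added documentation describes; the package and pipeline file names below are placeholders, not part of the patch:

    # Build the source tarball from the package's setup.py; the result lands in dist/.
    python setup.py sdist

    # Pass the tarball to the pipeline with --extra_package (file name is illustrative).
    python my_pipeline.py --extra_package dist/package-name-1.0.tar.gz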
 



[3/3] beam-site git commit: This closes #260

2017-06-23 Thread altay
This closes #260


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/7360cb74
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/7360cb74
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/7360cb74

Branch: refs/heads/asf-site
Commit: 7360cb7486b36cc98df6c8eacf726f4a0194275d
Parents: 1c164e0 9454311
Author: Ahmet Altay 
Authored: Fri Jun 23 17:35:48 2017 -0700
Committer: Ahmet Altay 
Committed: Fri Jun 23 17:35:48 2017 -0700

--
 .../sdks/python-pipeline-dependencies/index.html| 9 +
 src/documentation/sdks/python-pipeline-dependencies.md  | 6 ++
 2 files changed, 15 insertions(+)
--




[1/2] beam git commit: Fix a typo in function args

2017-06-23 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master a90e40ae9 -> 16f87f49f


Fix a typo in function args


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/e45f522d
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/e45f522d
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/e45f522d

Branch: refs/heads/master
Commit: e45f522d6e945899c20259ebf8faca105c2e552e
Parents: a90e40a
Author: Valentyn Tymofieiev 
Authored: Fri Jun 23 16:44:49 2017 -0700
Committer: Ahmet Altay 
Committed: Fri Jun 23 17:34:17 2017 -0700

--
 sdks/python/apache_beam/examples/streaming_wordcount.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/e45f522d/sdks/python/apache_beam/examples/streaming_wordcount.py
--
diff --git a/sdks/python/apache_beam/examples/streaming_wordcount.py 
b/sdks/python/apache_beam/examples/streaming_wordcount.py
index f2b179a..4c29f2b 100644
--- a/sdks/python/apache_beam/examples/streaming_wordcount.py
+++ b/sdks/python/apache_beam/examples/streaming_wordcount.py
@@ -33,7 +33,7 @@ import apache_beam.transforms.window as window
 
 def split_fn(lines):
   import re
-  return re.findall(r'[A-Za-z\']+', x)
+  return re.findall(r'[A-Za-z\']+', lines)
 
 
 def run(argv=None):
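
As a quick sanity check of the fixed function (illustrative only, not part of the commit):

    import re

    def split_fn(lines):
      # Same regex as the patched example: extract word tokens
      # (letters and apostrophes) from the input line.
      return re.findall(r"[A-Za-z']+", lines)

    assert split_fn("it's a test") == ["it's", 'a', 'test']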



[GitHub] beam pull request #3435: Fix a typo in function args

2017-06-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3435




[2/2] beam git commit: This closes #3435

2017-06-23 Thread altay
This closes #3435


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/16f87f49
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/16f87f49
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/16f87f49

Branch: refs/heads/master
Commit: 16f87f49f20796e29d01ed363a9097ea5420583c
Parents: a90e40a e45f522
Author: Ahmet Altay 
Authored: Fri Jun 23 17:34:21 2017 -0700
Committer: Ahmet Altay 
Committed: Fri Jun 23 17:34:21 2017 -0700

--
 sdks/python/apache_beam/examples/streaming_wordcount.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2464

2017-06-23 Thread Apache Jenkins Server
See 




Jenkins build became unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #3437

2017-06-23 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #3435: Fix a typo in function args

2017-06-23 Thread tvalentyn
GitHub user tvalentyn opened a pull request:

https://github.com/apache/beam/pull/3435

Fix a typo in function args

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`.
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tvalentyn/beam fix_typo

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3435.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3435


commit 803af24e3929fa60a2ac034a269dd427d20d99ad
Author: Valentyn Tymofieiev 
Date:   2017-06-23T23:44:49Z

Fix a typo in function args






[1/2] beam git commit: Avoid pickling the entire pipeline per-transform.

2017-06-23 Thread robertwb
Repository: beam
Updated Branches:
  refs/heads/master 9acce7150 -> a90e40ae9


Avoid pickling the entire pipeline per-transform.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/903da41a
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/903da41a
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/903da41a

Branch: refs/heads/master
Commit: 903da41ac5395e76c44ef8ae1c8a695569e23abb
Parents: 9acce71
Author: Robert Bradshaw 
Authored: Fri Jun 23 15:01:42 2017 -0700
Committer: Robert Bradshaw 
Committed: Fri Jun 23 16:39:51 2017 -0700

--
 sdks/python/apache_beam/pipeline.py  |  7 +++
 sdks/python/apache_beam/pipeline_test.py | 18 ++
 2 files changed, 25 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/903da41a/sdks/python/apache_beam/pipeline.py
--
diff --git a/sdks/python/apache_beam/pipeline.py 
b/sdks/python/apache_beam/pipeline.py
index d84a2b7..724c87d 100644
--- a/sdks/python/apache_beam/pipeline.py
+++ b/sdks/python/apache_beam/pipeline.py
@@ -466,6 +466,13 @@ class Pipeline(object):
     self.transforms_stack.pop()
     return pvalueish_result
 
+  def __reduce__(self):
+    # Some transforms contain a reference to their enclosing pipeline,
+    # which in turn reference all other transforms (resulting in quadratic
+    # time/space to pickle each transform individually).  As we don't
+    # require pickled pipelines to be executable, break the chain here.
+    return str, ('Pickled pipeline stub.',)
+
   def _verify_runner_api_compatible(self):
     class Visitor(PipelineVisitor):  # pylint: disable=used-before-assignment
       ok = True  # Really a nonlocal.

http://git-wip-us.apache.org/repos/asf/beam/blob/903da41a/sdks/python/apache_beam/pipeline_test.py
--
diff --git a/sdks/python/apache_beam/pipeline_test.py 
b/sdks/python/apache_beam/pipeline_test.py
index f9b894f..aad0143 100644
--- a/sdks/python/apache_beam/pipeline_test.py
+++ b/sdks/python/apache_beam/pipeline_test.py
@@ -480,6 +480,24 @@ class RunnerApiTest(unittest.TestCase):
     p2 = Pipeline.from_runner_api(proto, p.runner, p._options)
     p2.run()
 
+  def test_pickling(self):
+    class MyPTransform(beam.PTransform):
+      pickle_count = [0]
+
+      def expand(self, p):
+        self.p = p
+        return p | beam.Create([None])
+
+      def __reduce__(self):
+        self.pickle_count[0] += 1
+        return str, ()
+
+    p = beam.Pipeline()
+    for k in range(20):
+      p | 'Iter%s' % k >> MyPTransform()  # pylint: disable=expression-not-assigned
+    p.to_runner_api()
+    self.assertEqual(MyPTransform.pickle_count[0], 20)
+
 
 if __name__ == '__main__':
   logging.getLogger().setLevel(logging.DEBUG)
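
A tiny self-contained illustration of the pickling trick used above; the class name is made up, and only the __reduce__ pattern mirrors the commit:

    import pickle

    class Heavy(object):
      """Stand-in for an object that references a large graph of other objects."""

      def __init__(self):
        self.graph = list(range(10**6))  # expensive to pickle

      def __reduce__(self):
        # Substitute a cheap placeholder when pickled, the same way
        # Pipeline.__reduce__ breaks the transform -> pipeline reference chain.
        return str, ('Pickled stub.',)

    assert pickle.loads(pickle.dumps(Heavy())) == 'Pickled stub.'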



[2/2] beam git commit: Closes #3433

2017-06-23 Thread robertwb
Closes #3433


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/a90e40ae
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/a90e40ae
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/a90e40ae

Branch: refs/heads/master
Commit: a90e40ae9f748c1c8392c1198469f3229a06ed70
Parents: 9acce71 903da41
Author: Robert Bradshaw 
Authored: Fri Jun 23 16:39:52 2017 -0700
Committer: Robert Bradshaw 
Committed: Fri Jun 23 16:39:52 2017 -0700

--
 sdks/python/apache_beam/pipeline.py  |  7 +++
 sdks/python/apache_beam/pipeline_test.py | 18 ++
 2 files changed, 25 insertions(+)
--




Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4202

2017-06-23 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #3434: Use SDK harness container for FnAPI jobs when worke...

2017-06-23 Thread tvalentyn
GitHub user tvalentyn opened a pull request:

https://github.com/apache/beam/pull/3434

Use SDK harness container for FnAPI jobs when worker_harness_container_image 
is not specified. By default the image tag corresponds to the version of the 
release, same as for legacy images.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`.
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tvalentyn/beam default_container_name

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3434.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3434


commit 925ec45d2a59fa3f45e9d600fae594cc9d8c3536
Author: Valentyn Tymofieiev 
Date:   2017-06-23T23:15:28Z

Use SDK harness container for FnAPI jobs when 
worker_harness_container_image is not specified. By default the image tag 
corresponds to the version of the release, same as for legacy images.
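
For context, a hypothetical invocation that sets the image explicitly; the project, bucket, and image URL are placeholders, and only the --worker_harness_container_image flag name comes from this change (it is honored by the Dataflow runner):

    python my_pipeline.py \
        --runner DataflowRunner \
        --project my-project \
        --temp_location gs://my-bucket/temp \
        --worker_harness_container_image gcr.io/my-project/beam-python-harness:2.1.0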






[jira] [Created] (BEAM-2510) Link to SDK pages from the quickstarts

2017-06-23 Thread Melissa Pashniak (JIRA)
Melissa Pashniak created BEAM-2510:
--

 Summary: Link to SDK pages from the quickstarts
 Key: BEAM-2510
 URL: https://issues.apache.org/jira/browse/BEAM-2510
 Project: Beam
  Issue Type: Improvement
  Components: website
Reporter: Melissa Pashniak
Assignee: Melissa Pashniak
Priority: Minor








[GitHub] beam-site pull request #260: Mention python setup.py sdist

2017-06-23 Thread jean-philippe-martin
GitHub user jean-philippe-martin opened a pull request:

https://github.com/apache/beam-site/pull/260

Mention python setup.py sdist

I was stuck at that point, so thought I'd suggest a documentation update to 
help others who may run into the same issue.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jean-philippe-martin/beam-site 
jp_expand_on_deps

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam-site/pull/260.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #260


commit 5f3115a185ab16678b8764440abdf2c3f499ac6b
Author: JP Martin 
Date:   2017-06-23T22:30:45Z

Mention python setup.py sdist






Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4201

2017-06-23 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #3431: Fix python fn API data plane remote grpc port acces...

2017-06-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3431




[1/2] beam git commit: Fix python fn API data plane remote grpc port access

2017-06-23 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master af69e979a -> 9acce7150


Fix python fn API data plane remote grpc port access


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/32095487
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/32095487
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/32095487

Branch: refs/heads/master
Commit: 32095487e56b63b5c1aa690bb6e098375cb108d5
Parents: af69e97
Author: Vikas Kedigehalli 
Authored: Fri Jun 23 11:50:12 2017 -0700
Committer: Ahmet Altay 
Committed: Fri Jun 23 15:13:27 2017 -0700

--
 sdks/python/apache_beam/runners/worker/data_plane.py | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/32095487/sdks/python/apache_beam/runners/worker/data_plane.py
--
diff --git a/sdks/python/apache_beam/runners/worker/data_plane.py 
b/sdks/python/apache_beam/runners/worker/data_plane.py
index bc981a8..26f65ee 100644
--- a/sdks/python/apache_beam/runners/worker/data_plane.py
+++ b/sdks/python/apache_beam/runners/worker/data_plane.py
@@ -246,8 +246,8 @@ class DataChannelFactory(object):
   __metaclass__ = abc.ABCMeta
 
   @abc.abstractmethod
-  def create_data_channel(self, function_spec):
-    """Returns a ``DataChannel`` from the given function_spec."""
+  def create_data_channel(self, remote_grpc_port):
+    """Returns a ``DataChannel`` from the given RemoteGrpcPort."""
     raise NotImplementedError(type(self))
 
   @abc.abstractmethod
@@ -265,9 +265,7 @@ class GrpcClientDataChannelFactory(DataChannelFactory):
   def __init__(self):
     self._data_channel_cache = {}
 
-  def create_data_channel(self, function_spec):
-    remote_grpc_port = beam_fn_api_pb2.RemoteGrpcPort()
-    function_spec.data.Unpack(remote_grpc_port)
+  def create_data_channel(self, remote_grpc_port):
     url = remote_grpc_port.api_service_descriptor.url
     if url not in self._data_channel_cache:
       logging.info('Creating channel for %s', url)
@@ -289,7 +287,7 @@ class InMemoryDataChannelFactory(DataChannelFactory):
   def __init__(self, in_memory_data_channel):
     self._in_memory_data_channel = in_memory_data_channel
 
-  def create_data_channel(self, unused_function_spec):
+  def create_data_channel(self, unused_remote_grpc_port):
     return self._in_memory_data_channel
 
   def close(self):



[2/2] beam git commit: This closes #3431

2017-06-23 Thread altay
This closes #3431


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/9acce715
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/9acce715
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/9acce715

Branch: refs/heads/master
Commit: 9acce7150ee8fa6c9e50049155ad7b85b646f98e
Parents: af69e97 3209548
Author: Ahmet Altay 
Authored: Fri Jun 23 15:13:30 2017 -0700
Committer: Ahmet Altay 
Committed: Fri Jun 23 15:13:30 2017 -0700

--
 sdks/python/apache_beam/runners/worker/data_plane.py | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)
--




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2462

2017-06-23 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1347) Basic Java harness capable of understanding process bundle tasks and sending data over the Fn Api

2017-06-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061529#comment-16061529
 ] 

ASF GitHub Bot commented on BEAM-1347:
--

GitHub user lukecwik opened a pull request:

https://github.com/apache/beam/pull/3432

[BEAM-1347] Create a DoFnRunner specific for the Fn API

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`.
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---
This allows me to work on the Fn State API without needing to use 
StateInternals, which brings in a lot of structure that is orthogonal to how 
the Fn State API works (specifically regarding caching and state keys).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lukecwik/incubator-beam state_api

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3432.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3432


commit 0825eb4a871616239dd4984b791f34731279049e
Author: Luke Cwik 
Date:   2017-06-23T21:31:58Z

[BEAM-1347] Rename DoFnRunnerFactory to FnApiDoFnRunner.

commit b7ebfca05bf4d8ccced64f9552f7b03d01689dc3
Author: Luke Cwik 
Date:   2017-06-23T21:34:36Z

[BEAM-1347] Add DoFnRunner specific to Fn Api.




> Basic Java harness capable of understanding process bundle tasks and sending 
> data over the Fn Api
> -
>
> Key: BEAM-1347
> URL: https://issues.apache.org/jira/browse/BEAM-1347
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-fn-api
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>
> Create a basic Java harness capable of understanding process bundle requests 
> and able to stream data over the Fn Api.
> Overview: https://s.apache.org/beam-fn-api





[GitHub] beam pull request #3432: [BEAM-1347] Create a DoFnRunner specific for the Fn...

2017-06-23 Thread lukecwik
GitHub user lukecwik opened a pull request:

https://github.com/apache/beam/pull/3432

[BEAM-1347] Create a DoFnRunner specific for the Fn API

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`.
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---
This allows me to work on the Fn State API without needing to use 
StateInternals, which brings in a lot of structure that is orthogonal to how 
the Fn State API works (specifically regarding caching and state keys).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lukecwik/incubator-beam state_api

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3432.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3432


commit 0825eb4a871616239dd4984b791f34731279049e
Author: Luke Cwik 
Date:   2017-06-23T21:31:58Z

[BEAM-1347] Rename DoFnRunnerFactory to FnApiDoFnRunner.

commit b7ebfca05bf4d8ccced64f9552f7b03d01689dc3
Author: Luke Cwik 
Date:   2017-06-23T21:34:36Z

[BEAM-1347] Add DoFnRunner specific to Fn Api.






[2/2] beam git commit: This closes #3426

2017-06-23 Thread altay
This closes #3426


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/af69e979
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/af69e979
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/af69e979

Branch: refs/heads/master
Commit: af69e979ab3037274bdd78904bdeb5016563bcad
Parents: 6cef5c7 fd8f15f
Author: Ahmet Altay 
Authored: Fri Jun 23 14:03:17 2017 -0700
Committer: Ahmet Altay 
Committed: Fri Jun 23 14:03:17 2017 -0700

--
 .../jenkins/common_job_properties.groovy|  4 +-
 .../job_beam_PerformanceTests_Python.groovy | 58 
 2 files changed, 61 insertions(+), 1 deletion(-)
--




[GitHub] beam pull request #3426: [BEAM-2745] Add Jenkins Suite for Python Performanc...

2017-06-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3426




[1/2] beam git commit: [BEAM-2745] Add Jenkins Suite for Python Performance Test

2017-06-23 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master 6cef5c7b6 -> af69e979a


[BEAM-2745] Add Jenkins Suite for Python Performance Test


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/fd8f15f1
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/fd8f15f1
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/fd8f15f1

Branch: refs/heads/master
Commit: fd8f15f1ac761425dc791a455b042a8846081f48
Parents: 6cef5c7
Author: Mark Liu 
Authored: Thu Jun 22 14:04:00 2017 -0700
Committer: Ahmet Altay 
Committed: Fri Jun 23 14:03:14 2017 -0700

--
 .../jenkins/common_job_properties.groovy|  4 +-
 .../job_beam_PerformanceTests_Python.groovy | 58 
 2 files changed, 61 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/fd8f15f1/.test-infra/jenkins/common_job_properties.groovy
--
diff --git a/.test-infra/jenkins/common_job_properties.groovy 
b/.test-infra/jenkins/common_job_properties.groovy
index 6d4d68b..0e047ea 100644
--- a/.test-infra/jenkins/common_job_properties.groovy
+++ b/.test-infra/jenkins/common_job_properties.groovy
@@ -264,8 +264,10 @@ class common_job_properties {
 shell('rm -rf PerfKitBenchmarker')
 // Clone appropriate perfkit branch
 shell('git clone https://github.com/GoogleCloudPlatform/PerfKitBenchmarker.git')
-// Install job requirements.
+// Install Perfkit benchmark requirements.
 shell('pip install --user -r PerfKitBenchmarker/requirements.txt')
+// Install job requirements for Python SDK.
+shell('pip install --user -e sdks/python/[gcp,test]')
 // Launch performance test.
 shell("python PerfKitBenchmarker/pkb.py $pkbArgs")
 }

http://git-wip-us.apache.org/repos/asf/beam/blob/fd8f15f1/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy
--
diff --git a/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy 
b/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy
new file mode 100644
index 000..6a71bda
--- /dev/null
+++ b/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import common_job_properties
+
+// This job runs the Beam Python performance tests on PerfKit Benchmarker.
+job('beam_PerformanceTests_Python'){
+  // Set default Beam job properties.
+  common_job_properties.setTopLevelMainJobProperties(delegate)
+
+  // Run job in postcommit every 6 hours, don't trigger every push.
+  common_job_properties.setPostCommit(
+  delegate,
+  '0 */6 * * *',
+  false,
+  'commits@beam.apache.org')
+
+  // Allows triggering this build against pull requests.
+  common_job_properties.enablePhraseTriggeringFromPullRequest(
+  delegate,
+  'Python SDK Performance Test',
+  'Run Python Performance Test')
+
+  def pipelineArgs = [
+  project: 'apache-beam-testing',
+  staging_location: 'gs://temp-storage-for-end-to-end-tests/staging-it',
+  temp_location: 'gs://temp-storage-for-end-to-end-tests/temp-it',
+  output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output'
+  ]
+  def pipelineArgList = []
+  pipelineArgs.each({
+key, value -> pipelineArgList.add("--$key=$value")
+  })
+  def pipelineArgsJoined = pipelineArgList.join(',')
+
+  def argMap = [
+  beam_sdk : 'python',
+  benchmarks: 'beam_integration_benchmark',
+  beam_it_args: pipelineArgsJoined
+  ]
+
+  common_job_properties.buildPerformanceTest(delegate, argMap)
+}



[jira] [Commented] (BEAM-2465) Python Stream Runner

2017-06-23 Thread Willian Fuks (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061425#comment-16061425
 ] 

Willian Fuks commented on BEAM-2465:


Thanks [~altay] :)! Will keep track of those!

> Python Stream Runner
> 
>
> Key: BEAM-2465
> URL: https://issues.apache.org/jira/browse/BEAM-2465
> Project: Beam
>  Issue Type: Wish
>  Components: sdk-py
>Reporter: Willian Fuks
>Assignee: Ahmet Altay
>  Labels: newbie
> Fix For: Not applicable
>
>
> I'm sorry in advance if this is not the right place for this, but I've been 
> looking everywhere I could and couldn't find the answer yet so I'll try this 
> channel.
> Is there a time estimate for when the Python SDK will have streaming features 
> such as windows, watermarks, triggers, and so on?
> I'm asking because I'd like to start using it at work, and depending on the 
> expected release date I'll implement everything in batch for now.
> Again, sorry if this is not the right place. I had to try :)
> Thanks!





Jenkins build is unstable: beam_PostCommit_Java_MavenInstall #4200

2017-06-23 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2509) Fn API Runner hangs in grpc controller mode

2017-06-23 Thread Vikas Kedigehalli (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061379#comment-16061379
 ] 

Vikas Kedigehalli commented on BEAM-2509:
-

cc: [~altay]

> Fn API Runner hangs in grpc controller mode
> ---
>
> Key: BEAM-2509
> URL: https://issues.apache.org/jira/browse/BEAM-2509
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model-fn-api, sdk-py
>Reporter: Vikas Kedigehalli
>Assignee: Luke Cwik
>Priority: Minor
>
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/fn_api_runner.py#L312
>  The tests there only run in direct mode, but we should run them in grpc mode 
> as well. Currently grpc mode is broken and needs fixing. Once we enable it, 
> these tests can catch issues like https://github.com/apache/beam/pull/3431





[jira] [Created] (BEAM-2509) Fn API Runner hangs in grpc controller mode

2017-06-23 Thread Vikas Kedigehalli (JIRA)
Vikas Kedigehalli created BEAM-2509:
---

 Summary: Fn API Runner hangs in grpc controller mode
 Key: BEAM-2509
 URL: https://issues.apache.org/jira/browse/BEAM-2509
 Project: Beam
  Issue Type: Bug
  Components: beam-model-fn-api, sdk-py
Reporter: Vikas Kedigehalli
Assignee: Luke Cwik
Priority: Minor


https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/fn_api_runner.py#L312
 The tests there only run in direct mode, but we should run them in grpc mode 
as well. Currently grpc mode is broken and needs fixing. Once we enable it, 
these tests can catch issues like https://github.com/apache/beam/pull/3431





[GitHub] beam pull request #3431: Fix python fn API data plane remote grpc port acces...

2017-06-23 Thread vikkyrk
GitHub user vikkyrk opened a pull request:

https://github.com/apache/beam/pull/3431

Fix python fn API data plane remote grpc port access

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`.
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vikkyrk/incubator-beam py_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3431.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3431


commit 19251ebd9ddd9e4ef162bf91c5056dfe8a595e66
Author: Vikas Kedigehalli 
Date:   2017-06-23T18:50:12Z

Fix python fn API data plane remote grpc port access






Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2461

2017-06-23 Thread Apache Jenkins Server
See 




[jira] [Updated] (BEAM-2490) ReadFromText function is not taking all data with glob operator (*)

2017-06-23 Thread Chamikara Jayalath (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chamikara Jayalath updated BEAM-2490:
-
Fix Version/s: (was: 2.1.0)
   Not applicable

> ReadFromText function is not taking all data with glob operator (*) 
> 
>
> Key: BEAM-2490
> URL: https://issues.apache.org/jira/browse/BEAM-2490
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Affects Versions: 2.0.0
> Environment: Usage with Google Cloud Platform: Dataflow runner
>Reporter: Olivier NGUYEN QUOC
>Assignee: Chamikara Jayalath
> Fix For: Not applicable
>
>
> I run a very simple pipeline:
> * Read my files from Google Cloud Storage
> * Split with '\n' char
> * Write it to Google Cloud Storage
> I have 8 files that match with the pattern:
> * my_files_2016090116_20160902_060051_xx.csv.gz (229.25 MB)
> * my_files_2016090117_20160902_060051_xx.csv.gz (184.1 MB)
> * my_files_2016090118_20160902_060051_xx.csv.gz (171.73 MB)
> * my_files_2016090119_20160902_060051_xx.csv.gz (151.34 MB)
> * my_files_2016090120_20160902_060051_xx.csv.gz (129.69 MB)
> * my_files_2016090121_20160902_060051_xx.csv.gz (151.7 MB)
> * my_files_2016090122_20160902_060051_xx.csv.gz (346.46 MB)
> * my_files_2016090122_20160902_060051_xx.csv.gz (222.57 MB)
> This code should take them all:
> {code:python}
> beam.io.ReadFromText(
>   "gs://_folder1/my_files_20160901*.csv.gz",
>   skip_header_lines=1,
>   compression_type=beam.io.filesystem.CompressionTypes.GZIP
>   )
> {code}
> It runs well, but the output of this pipeline is only a 288.62 MB file 
> (instead of a roughly 1.5 GB file).
> The whole pipeline code:
> {code:python}
> data = (p | 'ReadMyFiles' >> beam.io.ReadFromText(
>   "gs://_folder1/my_files_20160901*.csv.gz",
>   skip_header_lines=1,
>   compression_type=beam.io.filesystem.CompressionTypes.GZIP
>   )
>| 'SplitLines' >> beam.FlatMap(lambda x: x.split('\n'))
> )
> output = (
>   data| "Write" >> beam.io.WriteToText('gs://XXX_folder2/test.csv', 
> num_shards=1)
> )
> {code}
> Dataflow indicates that the estimated size of the output after the 
> ReadFromText step is only 602.29 MB, which corresponds neither to any single 
> input file size nor to the overall size of the files matching the pattern.





[jira] [Commented] (BEAM-2490) ReadFromText function is not taking all data with glob operator (*)

2017-06-23 Thread Chamikara Jayalath (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061345#comment-16061345
 ] 

Chamikara Jayalath commented on BEAM-2490:
--

Sounds good. For now I'll remove this from the release 2.1.0 blockers list.

> ReadFromText function is not taking all data with glob operator (*) 
> 
>
> Key: BEAM-2490
> URL: https://issues.apache.org/jira/browse/BEAM-2490
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Affects Versions: 2.0.0
> Environment: Usage with Google Cloud Platform: Dataflow runner
>Reporter: Olivier NGUYEN QUOC
>Assignee: Chamikara Jayalath
> Fix For: 2.1.0
>
>
> I run a very simple pipeline:
> * Read my files from Google Cloud Storage
> * Split with '\n' char
> * Write it to Google Cloud Storage
> I have 8 files that match with the pattern:
> * my_files_2016090116_20160902_060051_xx.csv.gz (229.25 MB)
> * my_files_2016090117_20160902_060051_xx.csv.gz (184.1 MB)
> * my_files_2016090118_20160902_060051_xx.csv.gz (171.73 MB)
> * my_files_2016090119_20160902_060051_xx.csv.gz (151.34 MB)
> * my_files_2016090120_20160902_060051_xx.csv.gz (129.69 MB)
> * my_files_2016090121_20160902_060051_xx.csv.gz (151.7 MB)
> * my_files_2016090122_20160902_060051_xx.csv.gz (346.46 MB)
> * my_files_2016090122_20160902_060051_xx.csv.gz (222.57 MB)
> This code should take them all:
> {code:python}
> beam.io.ReadFromText(
>   "gs://_folder1/my_files_20160901*.csv.gz",
>   skip_header_lines=1,
>   compression_type=beam.io.filesystem.CompressionTypes.GZIP
>   )
> {code}
> It runs well, but the output of this pipeline is only a 288.62 MB file 
> (instead of a roughly 1.5 GB file).
> The whole pipeline code:
> {code:python}
> data = (p | 'ReadMyFiles' >> beam.io.ReadFromText(
>   "gs://_folder1/my_files_20160901*.csv.gz",
>   skip_header_lines=1,
>   compression_type=beam.io.filesystem.CompressionTypes.GZIP
>   )
>| 'SplitLines' >> beam.FlatMap(lambda x: x.split('\n'))
> )
> output = (
>   data| "Write" >> beam.io.WriteToText('gs://XXX_folder2/test.csv', 
> num_shards=1)
> )
> {code}
> Dataflow indicates that the estimated size of the output after the 
> ReadFromText step is only 602.29 MB, which corresponds neither to any single 
> input file size nor to the overall size of the files matching the pattern.





Jenkins build is back to normal : beam_PostCommit_Java_ValidatesRunner_Flink #3228

2017-06-23 Thread Apache Jenkins Server
See 




[GitHub] beam-site pull request #259: Includes Splittable DoFn in the capability matr...

2017-06-23 Thread jkff
GitHub user jkff opened a pull request:

https://github.com/apache/beam-site/pull/259

Includes Splittable DoFn in the capability matrix

R: @kennknowles 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkff/beam-site cap-sdf

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam-site/pull/259.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #259


commit adf7f2b61337aac6e094681f7e2c1f2be32ca1a2
Author: Eugene Kirpichov 
Date:   2017-06-23T18:05:10Z

Includes Splittable DoFn in the capability matrix






[jira] [Commented] (BEAM-2471) Add Amazon EMR DynamoDB example using HadoopInputFormatIO

2017-06-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061207#comment-16061207
 ] 

ASF GitHub Bot commented on BEAM-2471:
--

GitHub user seshadri-cr opened a pull request:

https://github.com/apache/beam-site/pull/258

[BEAM-2471]EMR DynamoDB example using HadoopIputFormatIO

This effort is to document an example of reading from Amazon EMR DynamoDB 
using HadoopInputFormatIO.
Please let me know if any additional details are required.
JIRA - https://issues.apache.org/jira/browse/BEAM-2471

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/seshadri-cr/beam-site hifio_dynamodb_example

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam-site/pull/258.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #258


commit f2ef5e4efe5ba210c0e6bb9966228d6f29a38342
Author: Seshadri 
Date:   2017-06-22T22:55:20Z

Merge pull request #1 from apache/asf-site

Rebasing with master

commit f6d2c5f863d38f4a4486ce4b2d8b4ff4bdccc927
Author: Seshadri Chakkravarthy 
Date:   2017-06-23T16:37:39Z

EMR DynamoDB example using HadoopInputFormatIO




> Add Amazon EMR DynamoDB example using HadoopInputFormatIO
> -
>
> Key: BEAM-2471
> URL: https://issues.apache.org/jira/browse/BEAM-2471
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Affects Versions: Not applicable
>Reporter: Seshadri Raghunathan
>Assignee: Seshadri Raghunathan
>Priority: Minor
> Fix For: Not applicable
>
>
> To document an example to read from Amazon EMR DynamoDB using 
> HadoopInputFormatIO.





[jira] [Commented] (BEAM-1187) GCP Transport not performing timed backoff after connection failure

2017-06-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061198#comment-16061198
 ] 

ASF GitHub Bot commented on BEAM-1187:
--

GitHub user lukecwik opened a pull request:

https://github.com/apache/beam/pull/3430

[BEAM-1187] Improve logging to contain the number of retries done due to 
IOException and unsuccessful response codes.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`.
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lukecwik/incubator-beam beam1187

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3430.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3430


commit f20a47c450f5a6b05cbb410506b233fee71dbb0c
Author: Luke Cwik 
Date:   2017-06-23T16:32:49Z

[BEAM-1187] Improve logging to contain the number of retries done due to 
IOException and unsuccessful response codes.




> GCP Transport not performing timed backoff after connection failure
> ---
>
> Key: BEAM-1187
> URL: https://issues.apache.org/jira/browse/BEAM-1187
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-java-core, sdk-java-gcp
>Reporter: Luke Cwik
>Priority: Minor
>
> Failed HTTP requests appear to be retried immediately after a connection 
> exception, without a timed backoff. Note that below all the timestamps are the 
> same, and also that we are logging too much. This seems to be related to the 
> chaining HTTP request initializer that combines the Credential initializer 
> with the RetryHttpRequestInitializer. Also, note that we never log "Request 
> failed with IOException, will NOT retry", which implies that the retry logic 
> never made it to the RetryHttpRequestInitializer.
> Action items are:
> 1) Ensure that the RetryHttpRequestInitializer is used
> 2) Ensure that calls do back off (an illustrative backoff sketch follows the console dump below)
> 3) Reduce the logging to one terminal statement saying that we retried X 
> times and the final failure was YYY.
> Dump of console output:
> Dec 20, 2016 9:12:20 AM 
> com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner fromOptions
> INFO: PipelineOptions.filesToStage was not specified. Defaulting to files 
> from the classpath: will stage 1 files. Enable logging at DEBUG level to see 
> which files will be staged.
> Dec 20, 2016 9:12:21 AM 
> com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner run
> INFO: Executing pipeline on the Dataflow Service, which will have billing 
> implications related to Google Compute Engine usage and other Google Cloud 
> Services.
> Dec 20, 2016 9:12:21 AM com.google.cloud.dataflow.sdk.util.PackageUtil 
> stageClasspathElements
> INFO: Uploading 1 files from PipelineOptions.filesToStage to staging location 
> to prepare for execution.
> Dec 20, 2016 9:12:21 AM com.google.cloud.dataflow.sdk.util.PackageUtil 
> stageClasspathElements
> INFO: Uploading PipelineOptions.filesToStage complete: 1 files newly 
> uploaded, 0 files cached
> Dec 20, 2016 9:12:22 AM com.google.api.client.http.HttpRequest execute
> WARNING: exception thrown while executing request
> java.net.ConnectException: Connection refused
>   at java.net.PlainSocketImpl.socketConnect(Native Method)
>   at 
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>   at 
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>   at 
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>   at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>   at java.net.Socket.connect(Socket.java:589)
>   at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
>   at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
>   at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
>   at sun.net.www.http.HttpClient.(HttpClient.java:211)
>   at sun.net.www.http.HttpClient.New(HttpClient.java:308)
>   at sun.net.www.http.HttpClient.New(HttpClient.java:326)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169)
>   at 
> 
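
For illustration, a minimal sketch of the chained-initializer pattern discussed 
above, assuming only the google-api-client HttpRequestInitializer interface; the 
class name is illustrative and this is not Beam's actual wiring of the 
Credential initializer and RetryHttpRequestInitializer:

{code:java}
import com.google.api.client.http.HttpRequest;
import com.google.api.client.http.HttpRequestInitializer;
import java.io.IOException;

/** Chains several initializers so credentials and retry/backoff both apply to each request. */
class ChainingHttpRequestInitializer implements HttpRequestInitializer {
  private final HttpRequestInitializer[] initializers;

  ChainingHttpRequestInitializer(HttpRequestInitializer... initializers) {
    this.initializers = initializers;
  }

  @Override
  public void initialize(HttpRequest request) throws IOException {
    // Every initializer in the chain must see the request; otherwise the
    // retry/backoff configuration is silently dropped.
    for (HttpRequestInitializer initializer : initializers) {
      initializer.initialize(request);
    }
  }
}
{code}

If only the credential initializer ends up installed on the transport, no 
backoff logic ever sees the IOException, which matches the symptom in the log 
above.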

[GitHub] beam pull request #3430: [BEAM-1187] Improve logging to contain the number o...

2017-06-23 Thread lukecwik
GitHub user lukecwik opened a pull request:

https://github.com/apache/beam/pull/3430

[BEAM-1187] Improve logging to contain the number of retries done due to 
IOException and unsuccessful response codes.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`.
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lukecwik/incubator-beam beam1187

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3430.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3430


commit f20a47c450f5a6b05cbb410506b233fee71dbb0c
Author: Luke Cwik 
Date:   2017-06-23T16:32:49Z

[BEAM-1187] Improve logging to contain the number of retries done due to 
IOException and unsuccessful response codes.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Assigned] (BEAM-2506) Consider bundling multiple ValidatesRunner tests into one pipeline

2017-06-23 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2506:
--

Assignee: (was: Davor Bonaci)

> Consider bundling multiple ValidatesRunner tests into one pipeline
> --
>
> Key: BEAM-2506
> URL: https://issues.apache.org/jira/browse/BEAM-2506
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Eugene Kirpichov
>
> Currently ValidatesRunner test suites run 1 pipeline per unit test. That's a 
> lot of small pipelines, and consumes a lot of resources especially in case of 
> a pretty heavyweight runner like Dataflow, so tests take a long time and 
> can't be run in parallel due to quota issues, etc.
> [~jasonkuster] says he and [~davor] discussed that we could execute multiple 
> unit tests in a single TestPipeline.
> This JIRA is to track that idea.
> To further develop it: in case of Java, we could create a custom JUnit Runner 
> http://junit.org/junit4/javadoc/4.12/org/junit/runner/Runner.html that would 
> apply all the transforms and PAsserts in unit tests to a single instance of 
> TestPipeline (per class, rather than per method), and run the whole thing at 
> the end. PAssert captures the source location of its application, so we could 
> still report which particular test failed.
> This obviously provides less isolation between unit test methods, because they 
> effectively run in parallel instead of in sequence, so things like per-method 
> setup and teardown will no longer be applicable. There'll probably be other 
> issues.
> Anyway, this seems doable and high-impact.
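
For illustration, a minimal sketch of the single-pipeline idea, assuming only 
the standard Beam Java testing APIs (Pipeline, Create, PAssert); it is not the 
proposed custom JUnit Runner, just a sketch showing that several independent 
assertions can share one pipeline run:

{code:java}
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.transforms.Create;

public class BundledAssertionsSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());

    // What would normally be two separate ValidatesRunner test pipelines...
    PAssert.that(p.apply("CreateInts", Create.of(1, 2, 3)))
        .containsInAnyOrder(1, 2, 3);
    PAssert.that(p.apply("CreateWords", Create.of("a", "b")))
        .containsInAnyOrder("a", "b");

    // ...is submitted and executed as one pipeline; each PAssert still fails
    // independently and reports the location where it was applied.
    p.run().waitUntilFinish();
  }
}
{code}

The harder part, as noted above, is preserving per-method setup/teardown 
semantics once the methods no longer run as separate pipelines.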



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2353) FileNamePolicy context parameters allow backwards compatibility where we really don't want any

2017-06-23 Thread Reuven Lax (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060978#comment-16060978
 ] 

Reuven Lax commented on BEAM-2353:
--

The dynamic FileBasedSink PR is also a breaking change to FilenamePolicy, as it 
removes baseDirectory and extension from the parameter list. I don't know 
whether that PR will be done with review by next week - but if not, we could 
pull the interface changes out and push them as a separate PR.

> FileNamePolicy context parameters allow backwards compatibility where we 
> really don't want any
> --
>
> Key: BEAM-2353
> URL: https://issues.apache.org/jira/browse/BEAM-2353
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Reuven Lax
> Fix For: 2.1.0
>
>
> Currently, in {{FileBasedSink}} the {{FileNamePolicy}} object accepts 
> parameters of type {{Context}} and {{WindowedContext}} respectively.
> These contexts are a coding technique to allow easy backwards compatibility 
> when adding new parameters. However, if a new parameter is added to the file 
> name policy, failing to incorporate it will likely cause data loss for the 
> user, so in fact that is never a safe backwards-compatible change.
> These are brand-new APIs and marked experimental. This is important enough I 
> think we should make the breaking change.
> We should inline all the parameters of the context, so that we _cannot_ add 
> parameters and maintain compatibility. Instead, if we have new ones we want 
> to add, it will have to be a new method or some such.
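
For illustration, a hypothetical sketch of the design point; the WindowedContext 
class and both method signatures below are placeholders, not Beam's actual 
FileBasedSink.FilenamePolicy API:

{code:java}
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.apache.beam.sdk.transforms.windowing.PaneInfo;

/** Placeholder parameter object, standing in for a context-style API. */
class WindowedContext {
  BoundedWindow window;
  PaneInfo pane;
  int shardNumber;
  int numShards;
}

/** Context style: a field added to WindowedContext later is silently ignored by old policies. */
abstract class ContextStylePolicy {
  abstract String windowedFilename(WindowedContext context);
}

/** Inlined style: adding a parameter is a compile error, forcing every implementer to handle it. */
abstract class InlinedStylePolicy {
  abstract String windowedFilename(
      BoundedWindow window, PaneInfo pane, int shardNumber, int numShards);
}
{code}

With the inlined form, introducing a new piece of naming information forces a 
signature change, which is exactly the deliberate incompatibility the issue 
argues for.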



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-1282) DoFnTester should allow output() calls in start/finishBundle

2017-06-23 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-1282:
-

Assignee: Rune Fevang

> DoFnTester should allow output() calls in start/finishBundle
> 
>
> Key: BEAM-1282
> URL: https://issues.apache.org/jira/browse/BEAM-1282
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Daniel Mills
>Assignee: Rune Fevang
>Priority: Minor
>  Labels: starter
> Fix For: 2.1.0
>
>
> In a DoFn, users can call output() or outputWithTimestamp() during 
> start/finishBundle. These calls attempt to deduce a window for the output 
> element based on the current WindowFn and any timestamp provided. However, 
> DoFnTester always throws an exception if these methods are called from 
> start/finishBundle (because it does not have a WindowFn, so that will have to 
> be enhanced).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Build failed in Jenkins: beam_PostCommit_Java_MavenInstall #4199

2017-06-23 Thread Apache Jenkins Server
See 


Changes:

[jbonofre] [BEAM-2489] Use dynamic ES port in HIFIOWithElasticTest

--
[...truncated 1.21 MB...]
2017-06-23T13:31:56.916 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/hive/hive-service/2.1.0/hive-service-2.1.0.jar
2017-06-23T13:31:57.006 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-hdfs/2.6.0/hadoop-hdfs-2.6.0.jar
 (7640 KB at 2465.1 KB/sec)
2017-06-23T13:31:57.006 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/hive/hive-llap-server/2.1.0/hive-llap-server-2.1.0.jar
2017-06-23T13:31:57.084 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/hive/hive-service-rpc/2.1.0/hive-service-rpc-2.1.0.jar
 (1503 KB at 472.9 KB/sec)
2017-06-23T13:31:57.084 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/slider/slider-core/0.90.2-incubating/slider-core-0.90.2-incubating.jar
2017-06-23T13:31:57.087 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/hive/hive-service/2.1.0/hive-service-2.1.0.jar
 (472 KB at 148.4 KB/sec)
2017-06-23T13:31:57.087 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/beust/jcommander/1.30/jcommander-1.30.jar
2017-06-23T13:31:57.110 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-yarn-registry/2.7.1/hadoop-yarn-registry-2.7.1.jar
2017-06-23T13:31:57.118 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/hive/hive-llap-server/2.1.0/hive-llap-server-2.1.0.jar
 (545 KB at 169.6 KB/sec)
2017-06-23T13:31:57.118 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/hbase/hbase-hadoop2-compat/1.1.1/hbase-hadoop2-compat-1.1.1.jar
2017-06-23T13:31:57.131 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/beust/jcommander/1.30/jcommander-1.30.jar
 (59 KB at 18.3 KB/sec)
2017-06-23T13:31:57.131 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/hbase/hbase-server/1.1.1/hbase-server-1.1.1.jar
2017-06-23T13:31:57.167 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/hbase/hbase-hadoop2-compat/1.1.1/hbase-hadoop2-compat-1.1.1.jar
 (80 KB at 24.3 KB/sec)
2017-06-23T13:31:57.167 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/hbase/hbase-procedure/1.1.1/hbase-procedure-1.1.1.jar
2017-06-23T13:31:57.194 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-yarn-registry/2.7.1/hadoop-yarn-registry-2.7.1.jar
 (96 KB at 29.1 KB/sec)
2017-06-23T13:31:57.194 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/hbase/hbase-common/1.1.1/hbase-common-1.1.1-tests.jar
2017-06-23T13:31:57.224 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/hbase/hbase-procedure/1.1.1/hbase-procedure-1.1.1.jar
 (100 KB at 30.0 KB/sec)
2017-06-23T13:31:57.225 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/hbase/hbase-prefix-tree/1.1.1/hbase-prefix-tree-1.1.1.jar
2017-06-23T13:31:57.255 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/hbase/hbase-common/1.1.1/hbase-common-1.1.1-tests.jar
 (206 KB at 61.3 KB/sec)
2017-06-23T13:31:57.255 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/hbase/hbase-hadoop-compat/1.1.1/hbase-hadoop-compat-1.1.1.jar
2017-06-23T13:31:57.274 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/hbase/hbase-prefix-tree/1.1.1/hbase-prefix-tree-1.1.1.jar
 (100 KB at 29.6 KB/sec)
2017-06-23T13:31:57.274 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/javax/servlet/jsp-api/2.0/jsp-api-2.0.jar
2017-06-23T13:31:57.290 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/hbase/hbase-hadoop-compat/1.1.1/hbase-hadoop-compat-1.1.1.jar
 (36 KB at 10.4 KB/sec)
2017-06-23T13:31:57.290 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/jamon/jamon-runtime/2.3.1/jamon-runtime-2.3.1.jar
2017-06-23T13:31:57.299 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/parquet/parquet-hadoop-bundle/1.8.1/parquet-hadoop-bundle-1.8.1.jar
 (2835 KB at 835.6 KB/sec)
2017-06-23T13:31:57.299 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/thrift/libthrift/0.9.3/libthrift-0.9.3.jar
2017-06-23T13:31:57.311 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/javax/servlet/jsp-api/2.0/jsp-api-2.0.jar 
(50 KB at 14.5 KB/sec)
2017-06-23T13:31:57.321 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/jamon/jamon-runtime/2.3.1/jamon-runtime-2.3.1.jar
 (21 KB at 5.9 KB/sec)
2017-06-23T13:31:57.385 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/thrift/libthrift/0.9.3/libthrift-0.9.3.jar
 (229 KB at 65.8 KB/sec)
2017-06-23T13:31:57.447 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/slider/slider-core/0.90.2-incubating/slider-core-0.90.2-incubating.jar
 (1744 KB 

Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2460

2017-06-23 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4198

2017-06-23 Thread Apache Jenkins Server
See 




[jira] [Resolved] (BEAM-2489) Use dynamic ES port in HIFIOWithElasticTest in module IO :: Hadoop :: jdk1.8-tests

2017-06-23 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré resolved BEAM-2489.

   Resolution: Fixed
Fix Version/s: 2.1.0

> Use dynamic ES port in HIFIOWithElasticTest in module IO :: Hadoop :: 
> jdk1.8-tests
> --
>
> Key: BEAM-2489
> URL: https://issues.apache.org/jira/browse/BEAM-2489
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-extensions
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Minor
> Fix For: 2.1.0
>
>
> The HIFIOWithElasticTest unit test uses the standard Elasticsearch port 9200. 
> But if there is already an instance running on the machine, the UT, and thus 
> the build, fails. It would be better to use a free port to avoid that.
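
For illustration, a minimal sketch of the free-port approach using only 
java.net.ServerSocket (class and method names here are illustrative):

{code:java}
import java.io.IOException;
import java.net.ServerSocket;

public class FreePortFinder {
  /** Bind to port 0 so the OS assigns an unused ephemeral port, then release it. */
  public static int findFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("Embedded Elasticsearch can bind to port " + findFreePort());
  }
}
{code}

There is a small race between closing the probe socket and the embedded server 
binding the port, but for a unit test this is usually an acceptable trade-off 
against hard-coding 9200.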



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Build failed in Jenkins: beam_PostCommit_Java_ValidatesRunner_Flink #3227

2017-06-23 Thread Apache Jenkins Server
See 


Changes:

[jbonofre] [BEAM-2489] Use dynamic ES port in HIFIOWithElasticTest

--
[...truncated 218.57 KB...]
2017-06-23T13:05:04.631 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-checkstyle-plugin/2.17/maven-checkstyle-plugin-2.17.pom
2017-06-23T13:05:04.665 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-checkstyle-plugin/2.17/maven-checkstyle-plugin-2.17.pom
 (14 KB at 383.4 KB/sec)
2017-06-23T13:05:04.669 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-checkstyle-plugin/2.17/maven-checkstyle-plugin-2.17.jar
2017-06-23T13:05:04.711 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-checkstyle-plugin/2.17/maven-checkstyle-plugin-2.17.jar
 (107 KB at 2524.5 KB/sec)
2017-06-23T13:05:04.720 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-surefire-plugin/2.20/maven-surefire-plugin-2.20.pom
2017-06-23T13:05:04.749 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-surefire-plugin/2.20/maven-surefire-plugin-2.20.pom
 (7 KB at 218.1 KB/sec)
2017-06-23T13:05:04.751 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire/2.20/surefire-2.20.pom
2017-06-23T13:05:04.780 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire/2.20/surefire-2.20.pom
 (21 KB at 700.5 KB/sec)
2017-06-23T13:05:04.785 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-surefire-plugin/2.20/maven-surefire-plugin-2.20.jar
2017-06-23T13:05:04.821 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-surefire-plugin/2.20/maven-surefire-plugin-2.20.jar
 (52 KB at 1443.8 KB/sec)
2017-06-23T13:05:04.828 [INFO] 
2017-06-23T13:05:04.828 [INFO] --- maven-clean-plugin:3.0.0:clean 
(default-clean) @ beam-sdks-java-build-tools ---
2017-06-23T13:05:04.831 [INFO] Deleting 

 (includes = [**/*.pyc, **/*.egg-info/, **/sdks/python/LICENSE, 
**/sdks/python/NOTICE, **/sdks/python/README.md], excludes = [])
2017-06-23T13:05:04.882 [INFO] 
2017-06-23T13:05:04.882 [INFO] --- maven-enforcer-plugin:1.4.1:enforce 
(enforce) @ beam-sdks-java-build-tools ---
2017-06-23T13:05:04.934 [INFO] 
2017-06-23T13:05:04.935 [INFO] --- maven-enforcer-plugin:1.4.1:enforce 
(enforce-banned-dependencies) @ beam-sdks-java-build-tools ---
2017-06-23T13:05:04.989 [INFO] 
2017-06-23T13:05:04.989 [INFO] --- maven-remote-resources-plugin:1.5:process 
(process-resource-bundles) @ beam-sdks-java-build-tools ---
2017-06-23T13:05:05.060 [INFO] 
2017-06-23T13:05:05.061 [INFO] --- maven-resources-plugin:3.0.2:resources 
(default-resources) @ beam-sdks-java-build-tools ---
2017-06-23T13:05:05.063 [INFO] Using 'UTF-8' encoding to copy filtered 
resources.
2017-06-23T13:05:05.064 [INFO] Copying 4 resources
2017-06-23T13:05:05.066 [INFO] Copying 3 resources
2017-06-23T13:05:05.119 [INFO] 
2017-06-23T13:05:05.119 [INFO] --- maven-compiler-plugin:3.6.1:compile 
(default-compile) @ beam-sdks-java-build-tools ---
2017-06-23T13:05:05.124 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-shared-incremental/1.1/maven-shared-incremental-1.1.pom
2017-06-23T13:05:05.152 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-shared-incremental/1.1/maven-shared-incremental-1.1.pom
 (5 KB at 165.4 KB/sec)
2017-06-23T13:05:05.157 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-shared-utils/0.1/maven-shared-utils-0.1.pom
2017-06-23T13:05:05.185 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-shared-utils/0.1/maven-shared-utils-0.1.pom
 (4 KB at 141.1 KB/sec)
2017-06-23T13:05:05.189 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/ow2/asm/asm/6.0_ALPHA/asm-6.0_ALPHA.pom
2017-06-23T13:05:05.219 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/ow2/asm/asm/6.0_ALPHA/asm-6.0_ALPHA.pom
 (2 KB at 63.1 KB/sec)
2017-06-23T13:05:05.220 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/ow2/asm/asm-parent/6.0_ALPHA/asm-parent-6.0_ALPHA.pom
2017-06-23T13:05:05.249 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/ow2/asm/asm-parent/6.0_ALPHA/asm-parent-6.0_ALPHA.pom
 (6 KB at 185.2 KB/sec)
2017-06-23T13:05:05.252 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/thoughtworks/qdox/qdox/2.0-M5/qdox-2.0-M5.pom
2017-06-23T13:05:05.288 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/thoughtworks/qdox/qdox/2.0-M5/qdox-2.0-M5.pom
 (16 KB at 422.9 KB/sec)
2017-06-23T13:05:05.293 [INFO] 

[jira] [Commented] (BEAM-2489) Use dynamic ES port in HIFIOWithElasticTest in module IO :: Hadoop :: jdk1.8-tests

2017-06-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060855#comment-16060855
 ] 

ASF GitHub Bot commented on BEAM-2489:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3411


> Use dynamic ES port in HIFIOWithElasticTest in module IO :: Hadoop :: 
> jdk1.8-tests
> --
>
> Key: BEAM-2489
> URL: https://issues.apache.org/jira/browse/BEAM-2489
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-extensions
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Minor
>
> The HIFIOWithElasticTest unit test uses the standard Elasticsearch port 9200. 
> But if there is already an instance running on the machine, the UT, and thus 
> the build, fails. It would be better to use a free port to avoid that.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3411: [BEAM-2489] Use dynamic ES port in HIFIOWithElastic...

2017-06-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3411


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] beam git commit: [BEAM-2489] This closes #3411

2017-06-23 Thread jbonofre
[BEAM-2489] This closes #3411


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/6cef5c7b
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/6cef5c7b
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/6cef5c7b

Branch: refs/heads/master
Commit: 6cef5c7b625c6b5c2078084ba611b1c2adccc998
Parents: 336b7f1 f291713
Author: Jean-Baptiste Onofré 
Authored: Fri Jun 23 15:01:42 2017 +0200
Committer: Jean-Baptiste Onofré 
Committed: Fri Jun 23 15:01:42 2017 +0200

--
 .../sdk/io/hadoop/inputformat/HIFIOWithElasticTest.java  | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)
--




[1/2] beam git commit: [BEAM-2489] Use dynamic ES port in HIFIOWithElasticTest

2017-06-23 Thread jbonofre
Repository: beam
Updated Branches:
  refs/heads/master 336b7f1cf -> 6cef5c7b6


[BEAM-2489] Use dynamic ES port in HIFIOWithElasticTest


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/f291713b
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/f291713b
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/f291713b

Branch: refs/heads/master
Commit: f291713b28e3ba0246a8c0a710c71506cd0a0f91
Parents: 336b7f1
Author: Etienne Chauchot 
Authored: Wed Jun 21 10:39:39 2017 +0200
Committer: Jean-Baptiste Onofré 
Committed: Fri Jun 23 14:25:41 2017 +0200

--
 .../sdk/io/hadoop/inputformat/HIFIOWithElasticTest.java  | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/f291713b/sdks/java/io/hadoop/jdk1.8-tests/src/test/java/org/apache/beam/sdk/io/hadoop/inputformat/HIFIOWithElasticTest.java
--
diff --git 
a/sdks/java/io/hadoop/jdk1.8-tests/src/test/java/org/apache/beam/sdk/io/hadoop/inputformat/HIFIOWithElasticTest.java
 
b/sdks/java/io/hadoop/jdk1.8-tests/src/test/java/org/apache/beam/sdk/io/hadoop/inputformat/HIFIOWithElasticTest.java
index 8745521..3f866a4 100644
--- 
a/sdks/java/io/hadoop/jdk1.8-tests/src/test/java/org/apache/beam/sdk/io/hadoop/inputformat/HIFIOWithElasticTest.java
+++ 
b/sdks/java/io/hadoop/jdk1.8-tests/src/test/java/org/apache/beam/sdk/io/hadoop/inputformat/HIFIOWithElasticTest.java
@@ -20,6 +20,7 @@ package org.apache.beam.sdk.io.hadoop.inputformat;
 import java.io.File;
 import java.io.IOException;
 import java.io.Serializable;
+import java.net.ServerSocket;
 import java.util.ArrayList;
 import java.util.Collection;
 import java.util.HashMap;
@@ -76,7 +77,7 @@ public class HIFIOWithElasticTest implements Serializable {
   private static final long serialVersionUID = 1L;
  private static final Logger LOG = LoggerFactory.getLogger(HIFIOWithElasticTest.class);
   private static final String ELASTIC_IN_MEM_HOSTNAME = "127.0.0.1";
-  private static final String ELASTIC_IN_MEM_PORT = "9200";
+  private static String elasticInMemPort = "9200";
   private static final String ELASTIC_INTERNAL_VERSION = "5.x";
   private static final String TRUE = "true";
   private static final String ELASTIC_INDEX_NAME = "beamdb";
@@ -94,6 +95,10 @@ public class HIFIOWithElasticTest implements Serializable {
   @BeforeClass
   public static void startServer()
   throws NodeValidationException, InterruptedException, IOException {
+ServerSocket serverSocket = new ServerSocket(0);
+int port = serverSocket.getLocalPort();
+serverSocket.close();
+elasticInMemPort = String.valueOf(port);
 ElasticEmbeddedServer.startElasticEmbeddedServer();
   }
 
@@ -173,7 +178,7 @@ public class HIFIOWithElasticTest implements Serializable {
   public Configuration getConfiguration() {
 Configuration conf = new Configuration();
 conf.set(ConfigurationOptions.ES_NODES, ELASTIC_IN_MEM_HOSTNAME);
-conf.set(ConfigurationOptions.ES_PORT, String.format("%s", ELASTIC_IN_MEM_PORT));
+conf.set(ConfigurationOptions.ES_PORT, String.format("%s", elasticInMemPort));
 conf.set(ConfigurationOptions.ES_RESOURCE, ELASTIC_RESOURCE);
 conf.set("es.internal.es.version", ELASTIC_INTERNAL_VERSION);
 conf.set(ConfigurationOptions.ES_NODES_DISCOVERY, TRUE);
@@ -209,7 +214,7 @@ public class HIFIOWithElasticTest implements Serializable {
   Settings settings = Settings.builder()
   .put("node.data", TRUE)
   .put("network.host", ELASTIC_IN_MEM_HOSTNAME)
-  .put("http.port", ELASTIC_IN_MEM_PORT)
+  .put("http.port", elasticInMemPort)
   .put("path.data", elasticTempFolder.getRoot().getPath())
   .put("path.home", elasticTempFolder.getRoot().getPath())
   .put("transport.type", "local")



Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2459

2017-06-23 Thread Apache Jenkins Server
See 




[jira] [Assigned] (BEAM-2389) GcpCoreApiSurfaceTest isn't testing right module

2017-06-23 Thread Michael Luckey (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Luckey reassigned BEAM-2389:


Assignee: Michael Luckey

> GcpCoreApiSurfaceTest isn't testing right module
> 
>
> Key: BEAM-2389
> URL: https://issues.apache.org/jira/browse/BEAM-2389
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Affects Versions: 2.0.0
>Reporter: Daniel Halperin
>Assignee: Michael Luckey
> Fix For: 2.1.0
>
>
> It looks like a clone of {{SdkApiSurfaceTest}} that was not updated, outside 
> of being renamed, now that it's in a new module. Even the java package of the 
> test is wrong.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (BEAM-968) Update capability matrix to include gearpump-runner

2017-06-23 Thread Manu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manu Zhang resolved BEAM-968.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> Update capability matrix to include gearpump-runner
> ---
>
> Key: BEAM-968
> URL: https://issues.apache.org/jira/browse/BEAM-968
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-gearpump, website
>Reporter: Manu Zhang
>Assignee: Huafeng Wang
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (BEAM-969) Add a gearpump runner web page under "learn/runners"

2017-06-23 Thread Manu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manu Zhang resolved BEAM-969.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> Add a gearpump runner web page under "learn/runners"
> 
>
> Key: BEAM-969
> URL: https://issues.apache.org/jira/browse/BEAM-969
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-gearpump, website
>Reporter: Manu Zhang
>Assignee: Huafeng Wang
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-2456) Introduce generic metric poll thread interceptor & generic sink

2017-06-23 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré updated BEAM-2456:
---
Summary: Introduce generic metric poll thread interceptor & generic sink  
(was: Introduce generic metric poll thread interceptor)

> Introduce generic metric poll thread interceptor & generic sink
> ---
>
> Key: BEAM-2456
> URL: https://issues.apache.org/jira/browse/BEAM-2456
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-ideas
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>
> The Spark runner provides a convenient feature: the metric sink.
> It allows us to configure a metric sink through a 
> {{metrics.properties}} file containing:
> {code}
> driver.sink.graphite.class=org.apache.beam.runners.spark.metrics.sink.GraphiteSink
> driver.sink.graphite.host=localhost
> driver.sink.graphite.port=2003
> driver.sink.graphite.prefix=spark
> driver.sink.graphite.period=1
> driver.sink.graphite.unit=SECONDS 
> {code}
> This approach makes it very convenient to send the metrics to any sink. I 
> think we can apply this logic in a generic way, working with any runner.
> The idea is to use a metric poll thread in the pipeline (that we can enable 
> via {{PipelineOptions}}) and send the metrics to a sink.
> I started a PoC about that.
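
For illustration, a minimal sketch of what a runner-agnostic poll thread plus 
generic sink could look like, assuming only the public PipelineResult and 
MetricResults APIs; the MetricsSink interface and MetricsPusher class are 
hypothetical (not the PoC mentioned above), and metrics support still varies by 
runner:

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.metrics.MetricQueryResults;
import org.apache.beam.sdk.metrics.MetricsFilter;

/** Hypothetical sink contract; a GraphiteSink, CsvSink, etc. would implement this. */
interface MetricsSink {
  void writeMetrics(MetricQueryResults metrics);
}

/** Hypothetical poll thread: periodically query the pipeline's metrics and push them to a sink. */
class MetricsPusher {
  static ScheduledExecutorService start(
      PipelineResult result, MetricsSink sink, long periodSeconds) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    executor.scheduleAtFixedRate(
        () -> sink.writeMetrics(
            result.metrics().queryMetrics(MetricsFilter.builder().build())),
        0, periodSeconds, TimeUnit.SECONDS);
    return executor;
  }
}
{code}

The period, filter, and sink would then be wired up from PipelineOptions, much 
like the Spark {{metrics.properties}} example above.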



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is unstable: beam_PostCommit_Java_MavenInstall #4197

2017-06-23 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2490) ReadFromText function is not taking all data with glob operator (*)

2017-06-23 Thread Olivier NGUYEN QUOC (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060478#comment-16060478
 ] 

Olivier NGUYEN QUOC commented on BEAM-2490:
---

I will be able to give some example files next week (not before, sorry).

> ReadFromText function is not taking all data with glob operator (*) 
> 
>
> Key: BEAM-2490
> URL: https://issues.apache.org/jira/browse/BEAM-2490
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Affects Versions: 2.0.0
> Environment: Usage with Google Cloud Platform: Dataflow runner
>Reporter: Olivier NGUYEN QUOC
>Assignee: Chamikara Jayalath
> Fix For: 2.1.0
>
>
> I run a very simple pipeline:
> * Read my files from Google Cloud Storage
> * Split with '\n' char
> * Write in on a Google Cloud Storage
> I have 8 files that match with the pattern:
> * my_files_2016090116_20160902_060051_xx.csv.gz (229.25 MB)
> * my_files_2016090117_20160902_060051_xx.csv.gz (184.1 MB)
> * my_files_2016090118_20160902_060051_xx.csv.gz (171.73 MB)
> * my_files_2016090119_20160902_060051_xx.csv.gz (151.34 MB)
> * my_files_2016090120_20160902_060051_xx.csv.gz (129.69 MB)
> * my_files_2016090121_20160902_060051_xx.csv.gz (151.7 MB)
> * my_files_2016090122_20160902_060051_xx.csv.gz (346.46 MB)
> * my_files_2016090122_20160902_060051_xx.csv.gz (222.57 MB)
> This code should take them all:
> {code:python}
> beam.io.ReadFromText(
>   "gs://_folder1/my_files_20160901*.csv.gz",
>   skip_header_lines=1,
>   compression_type=beam.io.filesystem.CompressionTypes.GZIP
>   )
> {code}
> It runs well but there is only a 288.62 MB file in the output of this pipeline 
> (instead of a 1.5 GB file).
> The whole pipeline code:
> {code:python}
> data = (p | 'ReadMyFiles' >> beam.io.ReadFromText(
>   "gs://_folder1/my_files_20160901*.csv.gz",
>   skip_header_lines=1,
>   compression_type=beam.io.filesystem.CompressionTypes.GZIP
>   )
>| 'SplitLines' >> beam.FlatMap(lambda x: x.split('\n'))
> )
> output = (
>   data| "Write" >> beam.io.WriteToText('gs://XXX_folder2/test.csv', 
> num_shards=1)
> )
> {code}
> Dataflow indicates that the estimated size of the output after the 
> ReadFromText step is only 602.29 MB, which does not correspond to any single 
> input file size nor to the overall size of the files matching the pattern.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-2508) Fix javaDoc of Stateful DoFn

2017-06-23 Thread Jingsong Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee updated BEAM-2508:
---
Description: StateSpec > StateSpec  (was: StateSpec > StateSpec)

> Fix javaDoc of Stateful DoFn
> 
>
> Key: BEAM-2508
> URL: https://issues.apache.org/jira/browse/BEAM-2508
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Jingsong Lee
>Assignee: Kenneth Knowles
> Fix For: 2.1.0
>
>
> StateSpec > StateSpec



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-2508) Fix javaDoc of Stateful DoFn

2017-06-23 Thread Jingsong Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee updated BEAM-2508:
---
Fix Version/s: 2.1.0

> Fix javaDoc of Stateful DoFn
> 
>
> Key: BEAM-2508
> URL: https://issues.apache.org/jira/browse/BEAM-2508
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Jingsong Lee
>Assignee: Kenneth Knowles
> Fix For: 2.1.0
>
>
> StateSpec > StateSpec



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (BEAM-2508) Fix javaDoc of Stateful DoFn

2017-06-23 Thread Jingsong Lee (JIRA)
Jingsong Lee created BEAM-2508:
--

 Summary: Fix javaDoc of Stateful DoFn
 Key: BEAM-2508
 URL: https://issues.apache.org/jira/browse/BEAM-2508
 Project: Beam
  Issue Type: Bug
  Components: beam-model
Reporter: Jingsong Lee
Assignee: Kenneth Knowles


StateSpec > StateSpec

Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2458

2017-06-23 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PostCommit_Java_MavenInstall #4196

2017-06-23 Thread Apache Jenkins Server
See 


Changes:

[chamikara] [BEAM-2497] Fix the reading of concat gzip files

--
[...truncated 3.44 MB...]
2017-06-23T06:23:38.005 [INFO] Replacing original artifact with shaded artifact.
2017-06-23T06:23:38.005 [INFO] Replacing 

 with 

2017-06-23T06:23:38.005 [INFO] Replacing original test artifact with shaded 
test artifact.
2017-06-23T06:23:38.005 [INFO] Replacing 

 with 

2017-06-23T06:23:38.005 [INFO] Dependency-reduced POM written at: 

2017-06-23T06:23:38.115 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/net/java/dev/jets3t/jets3t/0.6.1/jets3t-0.6.1.jar
2017-06-23T06:23:38.115 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/net/jpountz/lz4/lz4/1.2.0/lz4-1.2.0.jar
[INFO] I/O exception (java.net.SocketException) caught when processing request 
to {s}->https://repo.maven.apache.org:443: Connection reset
[INFO] Retrying request to {s}->https://repo.maven.apache.org:443
[INFO] I/O exception (java.net.SocketException) caught when processing request 
to {s}->https://repo.maven.apache.org:443: Connection reset
[INFO] Retrying request to {s}->https://repo.maven.apache.org:443
[INFO] I/O exception (java.net.SocketException) caught when processing request 
to {s}->https://repo.maven.apache.org:443: Connection reset
[INFO] Retrying request to {s}->https://repo.maven.apache.org:443
2017-06-23T06:23:38.175 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/net/java/dev/jets3t/jets3t/0.6.1/jets3t-0.6.1.jar
 (315 KB at 5237.7 KB/sec)
[JENKINS] Archiving disabled
2017-06-23T06:23:38.824 [INFO]  
   
2017-06-23T06:23:38.824 [INFO] 

2017-06-23T06:23:38.824 [INFO] Skipping Apache Beam :: Parent
2017-06-23T06:23:38.824 [INFO] This project has been banned from the build due 
to previous failures.
2017-06-23T06:23:38.824 [INFO] 

[JENKINS] Archiving disabled
2017-06-23T06:24:00.451 [INFO] 

2017-06-23T06:24:00.451 [INFO] Reactor Summary:
2017-06-23T06:24:00.451 [INFO] 
2017-06-23T06:24:00.451 [INFO] Apache Beam :: Parent 
.. SUCCESS [ 22.927 s]
2017-06-23T06:24:00.451 [INFO] Apache Beam :: SDKs :: Java :: Build Tools 
. SUCCESS [ 10.485 s]
2017-06-23T06:24:00.451 [INFO] Apache Beam :: SDKs 
 SUCCESS [  4.368 s]
2017-06-23T06:24:00.451 [INFO] Apache Beam :: SDKs :: Common 
.. SUCCESS [  1.618 s]
2017-06-23T06:24:00.451 [INFO] Apache Beam :: SDKs :: Common :: Runner API 
 SUCCESS [ 19.566 s]
2017-06-23T06:24:00.451 [INFO] Apache Beam :: SDKs :: Common :: Fn API 
 SUCCESS [ 18.717 s]

[jira] [Commented] (BEAM-2441) DSL SQL maven module location and name

2017-06-23 Thread Xu Mingmin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060442#comment-16060442
 ] 

Xu Mingmin commented on BEAM-2441:
--

As we'll have both a DSL interface and a CLI client, I'm not sure what the 
proper location is. Maybe raise it to a wider audience for comments.

> DSL SQL maven module location and name
> --
>
> Key: BEAM-2441
> URL: https://issues.apache.org/jira/browse/BEAM-2441
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Luke Cwik
>  Labels: dsl_sql_merge
>
> The current maven module location is *dsl/sql*; unfortunately, this obscures 
> the fact that this is for the Java SDK and also prevents alternative language 
> implementations.
> Some alternative locations could be:
> {code}
> sdks/java/extensions/sql
> sdks/java/dsls/sql
> dsls/sql/java
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)