Build failed in Jenkins: beam_PostCommit_Java_RunnableOnService_Dataflow #2345

2017-02-19 Thread Apache Jenkins Server
See 


--
[...truncated 3.91 MB...]
[INFO] 2017-02-20T07:29:03.359Z: (a531003d9a6ec760): Stopping worker pool...
[INFO] 2017-02-20T07:29:10.070Z: S33: (669dc62e39b4398b): Executing operation PAssert$268/GroupGlobally/Create.Values/Read(CreateSource)+PAssert$268/GroupGlobally/WindowIntoDummy+PAssert$268/GroupGlobally/RemoveDummyTriggering+PAssert$268/GroupGlobally/NeverTrigger+PAssert$268/GroupGlobally/GroupDummyAndContents/Reify+PAssert$268/GroupGlobally/GroupDummyAndContents/Write
[INFO] 2017-02-20T07:29:10.074Z: S34: (4f98a9e6a98b6e06): Executing operation PAssert$268/GroupGlobally/GatherAllOutputs/GroupByKey/Read+PAssert$268/GroupGlobally/GatherAllOutputs/GroupByKey/GroupByWindow+PAssert$268/GroupGlobally/GatherAllOutputs/Values/Values/Map+PAssert$268/GroupGlobally/RewindowActuals+PAssert$268/GroupGlobally/KeyForDummy/AddKeys/Map+PAssert$268/GroupGlobally/RemoveActualsTriggering+PAssert$268/GroupGlobally/NeverTrigger+PAssert$268/GroupGlobally/GroupDummyAndContents/Reify+PAssert$268/GroupGlobally/GroupDummyAndContents/Write
[INFO] 2017-02-20T07:29:10.077Z: S15: (3968f1a08932e385): Executing operation PAssert$271/GroupGlobally/Create.Values/Read(CreateSource)+PAssert$271/GroupGlobally/WindowIntoDummy+PAssert$271/GroupGlobally/RemoveDummyTriggering+PAssert$271/GroupGlobally/NeverTrigger+PAssert$271/GroupGlobally/GroupDummyAndContents/Reify+PAssert$271/GroupGlobally/GroupDummyAndContents/Write
[INFO] 2017-02-20T07:29:10.081Z: S16: (36c6506797658c7e): Executing operation PAssert$271/GroupGlobally/GatherAllOutputs/GroupByKey/Read+PAssert$271/GroupGlobally/GatherAllOutputs/GroupByKey/GroupByWindow+PAssert$271/GroupGlobally/GatherAllOutputs/Values/Values/Map+PAssert$271/GroupGlobally/RewindowActuals+PAssert$271/GroupGlobally/KeyForDummy/AddKeys/Map+PAssert$271/GroupGlobally/RemoveActualsTriggering+PAssert$271/GroupGlobally/NeverTrigger+PAssert$271/GroupGlobally/GroupDummyAndContents/Reify+PAssert$271/GroupGlobally/GroupDummyAndContents/Write
[INFO] 2017-02-20T07:29:10.084Z: S21: (e2943d945190f228): Executing operation PAssert$270/GroupGlobally/Create.Values/Read(CreateSource)+PAssert$270/GroupGlobally/WindowIntoDummy+PAssert$270/GroupGlobally/RemoveDummyTriggering+PAssert$270/GroupGlobally/NeverTrigger+PAssert$270/GroupGlobally/GroupDummyAndContents/Reify+PAssert$270/GroupGlobally/GroupDummyAndContents/Write
[INFO] 2017-02-20T07:29:10.087Z: S22: (bdfe3806bed2b37d): Executing operation PAssert$270/GroupGlobally/GatherAllOutputs/GroupByKey/Read+PAssert$270/GroupGlobally/GatherAllOutputs/GroupByKey/GroupByWindow+PAssert$270/GroupGlobally/GatherAllOutputs/Values/Values/Map+PAssert$270/GroupGlobally/RewindowActuals+PAssert$270/GroupGlobally/KeyForDummy/AddKeys/Map+PAssert$270/GroupGlobally/RemoveActualsTriggering+PAssert$270/GroupGlobally/NeverTrigger+PAssert$270/GroupGlobally/GroupDummyAndContents/Reify+PAssert$270/GroupGlobally/GroupDummyAndContents/Write
[INFO] 2017-02-20T07:29:10.090Z: S27: (5d7e2b5ee8e11578): Executing operation PAssert$269/GroupGlobally/Create.Values/Read(CreateSource)+PAssert$269/GroupGlobally/WindowIntoDummy+PAssert$269/GroupGlobally/RemoveDummyTriggering+PAssert$269/GroupGlobally/NeverTrigger+PAssert$269/GroupGlobally/GroupDummyAndContents/Reify+PAssert$269/GroupGlobally/GroupDummyAndContents/Write
[INFO] 2017-02-20T07:29:10.094Z: S28: (2c3df99f524c7da4): Executing operation PAssert$269/GroupGlobally/GatherAllOutputs/GroupByKey/Read+PAssert$269/GroupGlobally/GatherAllOutputs/GroupByKey/GroupByWindow+PAssert$269/GroupGlobally/GatherAllOutputs/Values/Values/Map+PAssert$269/GroupGlobally/RewindowActuals+PAssert$269/GroupGlobally/KeyForDummy/AddKeys/Map+PAssert$269/GroupGlobally/RemoveActualsTriggering+PAssert$269/GroupGlobally/NeverTrigger+PAssert$269/GroupGlobally/GroupDummyAndContents/Reify+PAssert$269/GroupGlobally/GroupDummyAndContents/Write
[INFO] 2017-02-20T07:29:10.097Z: S09: (309a711d59ac5e5): Executing operation PAssert$272/GroupGlobally/Create.Values/Read(CreateSource)+PAssert$272/GroupGlobally/WindowIntoDummy+PAssert$272/GroupGlobally/RemoveDummyTriggering+PAssert$272/GroupGlobally/NeverTrigger+PAssert$272/GroupGlobally/GroupDummyAndContents/Reify+PAssert$272/GroupGlobally/GroupDummyAndContents/Write
[INFO] 2017-02-20T07:29:10.101Z: S10: (cc3337c5af4794f): Executing operation PAssert$272/GroupGlobally/GatherAllOutputs/GroupByKey/Read+PAssert$272/GroupGlobally/GatherAllOutputs/GroupByKey/GroupByWindow+PAssert$272/GroupGlobally/GatherAllOutputs/Values/Values/Map+PAssert$272/GroupGlobally/RewindowActuals+PAssert$272/GroupGlobally/KeyForDummy/AddKeys/Map+PAssert$272/GroupGlobally/RemoveActualsTriggering+PAssert$272/GroupGlobally/NeverTrigger+PAssert$272/GroupGlobally/GroupDummyAndContents/Reify+PAssert$272/GroupGlobally/GroupDummyAndContents/Write
[INFO] 2017-02-20T07:29:09.331Z: (8dab8bb77c89b9e2): Workers have started successfully.

[jira] [Commented] (BEAM-1399) Code coverage numbers are not accurate

2017-02-19 Thread Davor Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873950#comment-15873950
 ] 

Davor Bonaci commented on BEAM-1399:


Thanks [~staslev]; great analysis.

FYI [~jasonkuster].

> Code coverage numbers are not accurate
> --
>
> Key: BEAM-1399
> URL: https://issues.apache.org/jira/browse/BEAM-1399
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, sdk-java-core, testing
>Reporter: Daniel Halperin
>  Labels: newbie, starter
>
> We've started adding Java Code Coverage numbers to PRs using the jacoco 
> plugin. However, we are getting very low coverage reported for things like 
> the Java SDK core.
> My belief is that this is happening because we test the bulk of the SDK not 
> in the SDK module , but in fact in the DirectRunner and other similar modules.
> JaCoCo has a {{report:aggregate}} target that might do the trick, but with a 
> few minutes of playing with it I wasn't able to make it work satisfactorily. 
> Basic work in https://github.com/apache/beam/pull/1800
> This is a good "random improvement" issue for anyone to pick up.
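For reference, the JaCoCo `report:aggregate` goal mentioned above is typically wired up in a dedicated aggregator module whose POM depends on the modules being measured. A minimal sketch (module layout and binding are assumptions, not the configuration from PR #1800):

```xml
<!-- Hypothetical aggregator-module POM fragment: merges coverage from the
     modules listed as dependencies (e.g. sdk-java-core and the DirectRunner)
     by binding jacoco:report-aggregate to the verify phase. -->
<plugin>
  <groupId>org.jacoco</groupId>
  <artifactId>jacoco-maven-plugin</artifactId>
  <executions>
    <execution>
      <id>report-aggregate</id>
      <phase>verify</phase>
      <goals>
        <goal>report-aggregate</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```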



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1320) Add sphinx or pydocs documentation for python-sdk

2017-02-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873939#comment-15873939
 ] 

ASF GitHub Bot commented on BEAM-1320:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2024


> Add sphinx or pydocs documentation for python-sdk
> -
>
> Key: BEAM-1320
> URL: https://issues.apache.org/jira/browse/BEAM-1320
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>




--


[2/2] beam git commit: This closes #2024

2017-02-19 Thread davor
This closes #2024


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/92190ba5
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/92190ba5
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/92190ba5

Branch: refs/heads/master
Commit: 92190ba5de64e1a72f3352976697e51e5d5624a9
Parents: d03e398 87f2052
Author: Davor Bonaci 
Authored: Sun Feb 19 17:40:55 2017 -0800
Committer: Davor Bonaci 
Committed: Sun Feb 19 17:40:55 2017 -0800

--
 sdks/python/generate_pydoc.sh | 66 ++
 sdks/python/pom.xml   |  1 -
 sdks/python/setup.py  |  5 +++
 sdks/python/tox.ini   |  2 ++
 4 files changed, 73 insertions(+), 1 deletion(-)
--




[GitHub] beam pull request #2024: [BEAM-1320] Add script to generate documentation fo...

2017-02-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2024


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: [BEAM-1320] Add script to generate documentation for the python sdk

2017-02-19 Thread davor
Repository: beam
Updated Branches:
  refs/heads/master d03e3980c -> 92190ba5d


[BEAM-1320] Add script to generate documentation for the python sdk


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/87f20520
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/87f20520
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/87f20520

Branch: refs/heads/master
Commit: 87f205207d28d54b3a4e02465c493cb8d626d56b
Parents: d03e398
Author: Sourabh Bajaj 
Authored: Thu Feb 16 16:01:36 2017 -0800
Committer: Davor Bonaci 
Committed: Sun Feb 19 17:40:33 2017 -0800

--
 sdks/python/generate_pydoc.sh | 66 ++
 sdks/python/pom.xml   |  1 -
 sdks/python/setup.py  |  5 +++
 sdks/python/tox.ini   |  2 ++
 4 files changed, 73 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/87f20520/sdks/python/generate_pydoc.sh
--
diff --git a/sdks/python/generate_pydoc.sh b/sdks/python/generate_pydoc.sh
new file mode 100755
index 000..f96a649
--- /dev/null
+++ b/sdks/python/generate_pydoc.sh
@@ -0,0 +1,66 @@
+#!/bin/bash
+#
+#Licensed to the Apache Software Foundation (ASF) under one or more
+#contributor license agreements.  See the NOTICE file distributed with
+#this work for additional information regarding copyright ownership.
+#The ASF licenses this file to You under the Apache License, Version 2.0
+#(the "License"); you may not use this file except in compliance with
+#the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+#
+
+# This script will run sphinx to create documentation for python sdk
+#
+# Use "generate_pydocs.sh" to update documentation in the docs directory.
+#
+# The exit-code of the script indicates success or a failure.
+
+# Quit on any errors
+set -e
+
+# Create docs directory if it does not exist
+mkdir -p target/docs/source
+
+# Exclude autogenerated API message definition files that aren't part of SDK.
+excluded_internal_clients=(apache_beam/internal/clients/)
+python $(type -p sphinx-apidoc) -f -o target/docs/source apache_beam \
+"${excluded_internal_clients[@]}"
+
+# Remove Cython modules from doc template; they won't load
+sed -i -e '/.. automodule:: apache_beam.coders.stream/d' \
+target/docs/source/apache_beam.coders.rst
+
+# Create the configuration and index files
+cat > target/docs/source/conf.py <<'EOF'
+import os
+import sys
+
+sys.path.insert(0, os.path.abspath('../../..'))
+
+extensions = [
+'sphinx.ext.autodoc',
+'sphinx.ext.napoleon',
+'sphinx.ext.viewcode',
+]
+master_doc = 'index'
+html_theme = 'sphinxdoc'
+project = 'Apache Beam'
+EOF
+cat > target/docs/source/index.rst <<'EOF'
+.. include:: ./modules.rst
+EOF
+
+# Build the documentation using sphinx
+python $(type -p sphinx-build) -q target/docs/source target/docs/_build \
+-c target/docs/source \
+-w "/tmp/sphinx-build.warnings.log"
+
+# Message is useful only when this script is run locally.  In a remote
+# test environment, this path will be removed when the test completes.
+echo "Browse to file://$PWD/target/docs/_build/index.html"

http://git-wip-us.apache.org/repos/asf/beam/blob/87f20520/sdks/python/pom.xml
--
diff --git a/sdks/python/pom.xml b/sdks/python/pom.xml
index 615ddc5..2b35f4d 100644
--- a/sdks/python/pom.xml
+++ b/sdks/python/pom.xml
@@ -165,5 +165,4 @@
   
 
   
-
 

http://git-wip-us.apache.org/repos/asf/beam/blob/87f20520/sdks/python/setup.py
--
diff --git a/sdks/python/setup.py b/sdks/python/setup.py
index 90a4e53..59c0994 100644
--- a/sdks/python/setup.py
+++ b/sdks/python/setup.py
@@ -103,6 +103,10 @@ REQUIRED_TEST_PACKAGES = [
 'pyhamcrest>=1.9,<2.0',
 ]
 
+EXTRA_REQUIRES = {
+  'docs': ['Sphinx>=1.5.2,<2.0'],
+}
+
 setuptools.setup(
 name=PACKAGE_NAME,
 version=PACKAGE_VERSION,
@@ -127,6 +131,7 @@ setuptools.setup(
 install_requires=REQUIRED_PACKAGES,
 test_suite='nose.collector',
 tests_require=REQUIRED_TEST_PACKAGES,
+extras_require=EXTRA_REQUIRES,
 zip_safe=False,
 # PyPI package information.
 classifiers=[

http://git-wip-us.apache.org/repos/asf/beam/blob/87f20520/sdks/python/tox.ini

[jira] [Commented] (BEAM-1218) De-Googlify Python SDK

2017-02-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873892#comment-15873892
 ] 

ASF GitHub Bot commented on BEAM-1218:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2047


> De-Googlify Python SDK
> --
>
> Key: BEAM-1218
> URL: https://issues.apache.org/jira/browse/BEAM-1218
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Mark Liu
>Assignee: Ahmet Altay
>




--


[GitHub] beam pull request #2047: [BEAM-1218] Move GCP specific IO into separate modu...

2017-02-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2047




[2/7] beam git commit: [BEAM-1218] Move GCP specific IO into separate module

2017-02-19 Thread altay
http://git-wip-us.apache.org/repos/asf/beam/blob/908c8532/sdks/python/apache_beam/io/google_cloud_platform/datastore/v1/datastoreio.py
--
diff --git a/sdks/python/apache_beam/io/google_cloud_platform/datastore/v1/datastoreio.py b/sdks/python/apache_beam/io/google_cloud_platform/datastore/v1/datastoreio.py
new file mode 100644
index 000..2eac4d5
--- /dev/null
+++ b/sdks/python/apache_beam/io/google_cloud_platform/datastore/v1/datastoreio.py
@@ -0,0 +1,391 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""A connector for reading from and writing to Google Cloud Datastore"""
+
+import logging
+
+from google.cloud.proto.datastore.v1 import datastore_pb2
+from googledatastore import helper as datastore_helper
+
+from apache_beam.io.google_cloud_platform.datastore.v1 import helper
+from apache_beam.io.google_cloud_platform.datastore.v1 import query_splitter
+from apache_beam.transforms import Create
+from apache_beam.transforms import DoFn
+from apache_beam.transforms import FlatMap
+from apache_beam.transforms import GroupByKey
+from apache_beam.transforms import Map
+from apache_beam.transforms import PTransform
+from apache_beam.transforms import ParDo
+from apache_beam.transforms.util import Values
+
+__all__ = ['ReadFromDatastore', 'WriteToDatastore', 'DeleteFromDatastore']
+
+
+class ReadFromDatastore(PTransform):
+  """A ``PTransform`` for reading from Google Cloud Datastore.
+
+  To read a ``PCollection[Entity]`` from a Cloud Datastore ``Query``, use
+  ``ReadFromDatastore`` transform by providing a `project` id and a `query` to
+  read from. You can optionally provide a `namespace` and/or specify how many
+  splits you want for the query through `num_splits` option.
+
+  Note: Normally, a runner will read from Cloud Datastore in parallel across
+  many workers. However, when the `query` is configured with a `limit` or if the
+  query contains inequality filters like `GREATER_THAN, LESS_THAN` etc., then
+  all the returned results will be read by a single worker in order to ensure
+  correct data. Since data is read from a single worker, this could have
+  significant impact on the performance of the job.
+
+  The semantics for the query splitting is defined below:
+1. If `num_splits` is equal to 0, then the number of splits will be chosen
+dynamically at runtime based on the query data size.
+
+2. Any value of `num_splits` greater than
+`ReadFromDatastore._NUM_QUERY_SPLITS_MAX` will be capped at that value.
+
+3. If the `query` has a user limit set, or contains inequality filters, then
+`num_splits` will be ignored and no split will be performed.
+
+4. Under certain cases Cloud Datastore is unable to split query to the
+requested number of splits. In such cases we just use whatever the Cloud
+Datastore returns.
+
+  See https://developers.google.com/datastore/ for more details on Google Cloud
+  Datastore.
+  """
+
+  # An upper bound on the number of splits for a query.
+  _NUM_QUERY_SPLITS_MAX = 5
+  # A lower bound on the number of splits for a query. This is to ensure that
+  # we parallelize the query even when Datastore statistics are not available.
+  _NUM_QUERY_SPLITS_MIN = 12
+  # Default bundle size of 64MB.
+  _DEFAULT_BUNDLE_SIZE_BYTES = 64 * 1024 * 1024
+
+  def __init__(self, project, query, namespace=None, num_splits=0):
+"""Initialize the ReadFromDatastore transform.
+
+Args:
+  project: The Project ID
+  query: Cloud Datastore query to be read from.
+  namespace: An optional namespace.
+  num_splits: Number of splits for the query.
+"""
+logging.warning('datastoreio read transform is experimental.')
+super(ReadFromDatastore, self).__init__()
+
+if not project:
+  raise ValueError("Project cannot be empty")
+if not query:
+  raise ValueError("Query cannot be empty")
+if num_splits < 0:
+  raise ValueError("num_splits must be greater than or equal to 0")
+
+self._project = project
+# using _namespace conflicts with DisplayData._namespace
+self._datastore_namespace = namespace
+self._query = query
+self._num_splits = num_splits
+
+  def expand(self, pcoll):
+# This is 
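The query-splitting rules spelled out in the docstring above can be sketched in plain Python. The function name and the `max_splits` parameter are illustrative only, not part of the SDK:

```python
def choose_num_splits(num_splits, max_splits):
    """Sketch of rules 1-2 above: a value of 0 defers the choice to the
    runner at runtime (represented here as None), and values above the
    cap are clamped to the cap."""
    if num_splits == 0:
        return None  # number of splits chosen dynamically at runtime
    return min(num_splits, max_splits)

# A request for more splits than the cap is clamped to the cap.
print(choose_num_splits(100, 64))  # → 64
```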

[1/7] beam git commit: [BEAM-1218] Move GCP specific IO into separate module

2017-02-19 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master 2872f8666 -> d03e3980c


http://git-wip-us.apache.org/repos/asf/beam/blob/908c8532/sdks/python/apache_beam/io/google_cloud_platform/gcsio.py
--
diff --git a/sdks/python/apache_beam/io/google_cloud_platform/gcsio.py b/sdks/python/apache_beam/io/google_cloud_platform/gcsio.py
new file mode 100644
index 000..195fafc
--- /dev/null
+++ b/sdks/python/apache_beam/io/google_cloud_platform/gcsio.py
@@ -0,0 +1,871 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+"""Google Cloud Storage client.
+
+This library evolved from the Google App Engine GCS client available at
+https://github.com/GoogleCloudPlatform/appengine-gcs-client.
+"""
+
+import cStringIO
+import errno
+import fnmatch
+import logging
+import multiprocessing
+import os
+import Queue
+import re
+import threading
+import traceback
+
+import apitools.base.py.transfer as transfer
+from apitools.base.py.batch import BatchApiRequest
+from apitools.base.py.exceptions import HttpError
+
+from apache_beam.internal import auth
+from apache_beam.utils import retry
+
+# Issue a friendlier error message if the storage library is not available.
+# TODO(silviuc): Remove this guard when storage is available everywhere.
+try:
+  # pylint: disable=wrong-import-order, wrong-import-position
+  from apache_beam.io.google_cloud_platform.internal.clients import storage
+except ImportError:
+  raise RuntimeError(
+  'Google Cloud Storage I/O not supported for this execution environment '
+  '(could not import storage API client).')
+
+# This is the size of each partial-file read operation from GCS.  This
+# parameter was chosen to give good throughput while keeping memory usage at
+# a reasonable level; the following table shows throughput reached when
+# reading files of a given size with a chosen buffer size and informed the
+# choice of the value, as of 11/2016:
+#
+# +---++-+-+-+
+# |   | 50 MB file | 100 MB file | 200 MB file | 400 MB file |
+# +---++-+-+-+
+# | 8 MB buffer   | 17.12 MB/s | 22.67 MB/s  | 23.81 MB/s  | 26.05 MB/s  |
+# | 16 MB buffer  | 24.21 MB/s | 42.70 MB/s  | 42.89 MB/s  | 46.92 MB/s  |
+# | 32 MB buffer  | 28.53 MB/s | 48.08 MB/s  | 54.30 MB/s  | 54.65 MB/s  |
+# | 400 MB buffer | 34.72 MB/s | 71.13 MB/s  | 79.13 MB/s  | 85.39 MB/s  |
+# +---++-+-+-+
+DEFAULT_READ_BUFFER_SIZE = 16 * 1024 * 1024
+
+# This is the number of seconds the library will wait for a partial-file read
+# operation from GCS to complete before retrying.
+DEFAULT_READ_SEGMENT_TIMEOUT_SECONDS = 60
+
+# This is the size of chunks used when writing to GCS.
+WRITE_CHUNK_SIZE = 8 * 1024 * 1024
+
+
+# Maximum number of operations permitted in GcsIO.copy_batch() and
+# GcsIO.delete_batch().
+MAX_BATCH_OPERATION_SIZE = 100
+
+
+def parse_gcs_path(gcs_path):
+  """Return the bucket and object names of the given gs:// path."""
+  match = re.match('^gs://([^/]+)/(.+)$', gcs_path)
+  if match is None:
+raise ValueError('GCS path must be in the form gs://<bucket>/<object>.')
+  return match.group(1), match.group(2)
+
+
+class GcsIOError(IOError, retry.PermanentException):
+  """GCS IO error that should not be retried."""
+  pass
+
+
+class GcsIO(object):
+  """Google Cloud Storage I/O client."""
+
+  def __new__(cls, storage_client=None):
+if storage_client:
+  return super(GcsIO, cls).__new__(cls, storage_client)
+else:
+  # Create a single storage client for each thread.  We would like to avoid
+  # creating more than one storage client for each thread, since each
+  # initialization requires the relatively expensive step of initializing
+  # credentials.
+  local_state = threading.local()
+  if getattr(local_state, 'gcsio_instance', None) is None:
+credentials = auth.get_service_credentials()
+storage_client = storage.StorageV1(credentials=credentials)
+local_state.gcsio_instance = (
+super(GcsIO, cls).__new__(cls, storage_client))
+
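As a standalone illustration, the `parse_gcs_path` helper from the new gcsio module behaves like this (re-stated outside the diff so it can be run directly; the bracketed placeholders in the error message are an assumption, since the archive stripped them):

```python
import re

def parse_gcs_path(gcs_path):
    """Return the (bucket, object) pair for a gs:// path, mirroring the
    module-level helper in gcsio.py above."""
    match = re.match('^gs://([^/]+)/(.+)$', gcs_path)
    if match is None:
        raise ValueError('GCS path must be in the form gs://<bucket>/<object>.')
    return match.group(1), match.group(2)

print(parse_gcs_path('gs://my-bucket/path/to/file.txt'))
# → ('my-bucket', 'path/to/file.txt')
```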

[7/7] beam git commit: This closes #2047

2017-02-19 Thread altay
This closes #2047


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/d03e3980
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/d03e3980
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/d03e3980

Branch: refs/heads/master
Commit: d03e3980cdb24f3e1ea7238112587c671349dbc2
Parents: 2872f86 908c853
Author: Ahmet Altay 
Authored: Sun Feb 19 16:21:27 2017 -0800
Committer: Ahmet Altay 
Committed: Sun Feb 19 16:21:27 2017 -0800

--
 .../apache_beam/examples/snippets/snippets.py   |4 +-
 sdks/python/apache_beam/io/__init__.py  |4 +-
 sdks/python/apache_beam/io/bigquery.py  | 1082 --
 sdks/python/apache_beam/io/bigquery_test.py |  812 -
 .../python/apache_beam/io/datastore/__init__.py |   16 -
 .../apache_beam/io/datastore/v1/__init__.py |   16 -
 .../apache_beam/io/datastore/v1/datastoreio.py  |  391 ---
 .../io/datastore/v1/datastoreio_test.py |  237 
 .../io/datastore/v1/fake_datastore.py   |   92 --
 .../apache_beam/io/datastore/v1/helper.py   |  267 -
 .../apache_beam/io/datastore/v1/helper_test.py  |  256 -
 .../io/datastore/v1/query_splitter.py   |  269 -
 .../io/datastore/v1/query_splitter_test.py  |  201 
 sdks/python/apache_beam/io/fileio.py|2 +-
 sdks/python/apache_beam/io/gcsio.py |  871 --
 sdks/python/apache_beam/io/gcsio_test.py|  786 -
 .../io/google_cloud_platform/bigquery.py| 1082 ++
 .../io/google_cloud_platform/bigquery_test.py   |  813 +
 .../google_cloud_platform/datastore/__init__.py |   16 +
 .../datastore/v1/__init__.py|   16 +
 .../datastore/v1/datastoreio.py |  391 +++
 .../datastore/v1/datastoreio_test.py|  237 
 .../datastore/v1/fake_datastore.py  |   92 ++
 .../datastore/v1/helper.py  |  267 +
 .../datastore/v1/helper_test.py |  256 +
 .../datastore/v1/query_splitter.py  |  269 +
 .../datastore/v1/query_splitter_test.py |  201 
 .../io/google_cloud_platform/gcsio.py   |  871 ++
 .../io/google_cloud_platform/gcsio_test.py  |  786 +
 .../io/google_cloud_platform/pubsub.py  |   91 ++
 .../io/google_cloud_platform/pubsub_test.py |   63 +
 sdks/python/apache_beam/io/pubsub.py|   91 --
 sdks/python/apache_beam/io/pubsub_test.py   |   62 -
 33 files changed, 5456 insertions(+), 5454 deletions(-)
--




[4/7] beam git commit: [BEAM-1218] Move GCP specific IO into separate module

2017-02-19 Thread altay
http://git-wip-us.apache.org/repos/asf/beam/blob/908c8532/sdks/python/apache_beam/io/gcsio.py
--
diff --git a/sdks/python/apache_beam/io/gcsio.py b/sdks/python/apache_beam/io/gcsio.py
deleted file mode 100644
index 195fafc..000
--- a/sdks/python/apache_beam/io/gcsio.py
+++ /dev/null
@@ -1,871 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-"""Google Cloud Storage client.
-
-This library evolved from the Google App Engine GCS client available at
-https://github.com/GoogleCloudPlatform/appengine-gcs-client.
-"""
-
-import cStringIO
-import errno
-import fnmatch
-import logging
-import multiprocessing
-import os
-import Queue
-import re
-import threading
-import traceback
-
-import apitools.base.py.transfer as transfer
-from apitools.base.py.batch import BatchApiRequest
-from apitools.base.py.exceptions import HttpError
-
-from apache_beam.internal import auth
-from apache_beam.utils import retry
-
-# Issue a friendlier error message if the storage library is not available.
-# TODO(silviuc): Remove this guard when storage is available everywhere.
-try:
-  # pylint: disable=wrong-import-order, wrong-import-position
-  from apache_beam.io.google_cloud_platform.internal.clients import storage
-except ImportError:
-  raise RuntimeError(
-  'Google Cloud Storage I/O not supported for this execution environment '
-  '(could not import storage API client).')
-
-# This is the size of each partial-file read operation from GCS.  This
-# parameter was chosen to give good throughput while keeping memory usage at
-# a reasonable level; the following table shows throughput reached when
-# reading files of a given size with a chosen buffer size and informed the
-# choice of the value, as of 11/2016:
-#
-# +---++-+-+-+
-# |   | 50 MB file | 100 MB file | 200 MB file | 400 MB file |
-# +---++-+-+-+
-# | 8 MB buffer   | 17.12 MB/s | 22.67 MB/s  | 23.81 MB/s  | 26.05 MB/s  |
-# | 16 MB buffer  | 24.21 MB/s | 42.70 MB/s  | 42.89 MB/s  | 46.92 MB/s  |
-# | 32 MB buffer  | 28.53 MB/s | 48.08 MB/s  | 54.30 MB/s  | 54.65 MB/s  |
-# | 400 MB buffer | 34.72 MB/s | 71.13 MB/s  | 79.13 MB/s  | 85.39 MB/s  |
-# +---++-+-+-+
-DEFAULT_READ_BUFFER_SIZE = 16 * 1024 * 1024
-
-# This is the number of seconds the library will wait for a partial-file read
-# operation from GCS to complete before retrying.
-DEFAULT_READ_SEGMENT_TIMEOUT_SECONDS = 60
-
-# This is the size of chunks used when writing to GCS.
-WRITE_CHUNK_SIZE = 8 * 1024 * 1024
-
-
-# Maximum number of operations permitted in GcsIO.copy_batch() and
-# GcsIO.delete_batch().
-MAX_BATCH_OPERATION_SIZE = 100
-
-
-def parse_gcs_path(gcs_path):
-  """Return the bucket and object names of the given gs:// path."""
-  match = re.match('^gs://([^/]+)/(.+)$', gcs_path)
-  if match is None:
-raise ValueError('GCS path must be in the form gs://<bucket>/<object>.')
-  return match.group(1), match.group(2)
-
-
-class GcsIOError(IOError, retry.PermanentException):
-  """GCS IO error that should not be retried."""
-  pass
-
-
-class GcsIO(object):
-  """Google Cloud Storage I/O client."""
-
-  def __new__(cls, storage_client=None):
-if storage_client:
-  return super(GcsIO, cls).__new__(cls, storage_client)
-else:
-  # Create a single storage client for each thread.  We would like to avoid
-  # creating more than one storage client for each thread, since each
-  # initialization requires the relatively expensive step of initializing
-  # credentials.
-  local_state = threading.local()
-  if getattr(local_state, 'gcsio_instance', None) is None:
-credentials = auth.get_service_credentials()
-storage_client = storage.StorageV1(credentials=credentials)
-local_state.gcsio_instance = (
-super(GcsIO, cls).__new__(cls, storage_client))
-local_state.gcsio_instance.client = storage_client
-  return local_state.gcsio_instance
-
-  def __init__(self, storage_client=None):
-# We must do this check on storage_client because 

[3/7] beam git commit: [BEAM-1218] Move GCP specific IO into separate module

2017-02-19 Thread altay
http://git-wip-us.apache.org/repos/asf/beam/blob/908c8532/sdks/python/apache_beam/io/google_cloud_platform/bigquery.py
--
diff --git a/sdks/python/apache_beam/io/google_cloud_platform/bigquery.py b/sdks/python/apache_beam/io/google_cloud_platform/bigquery.py
new file mode 100644
index 000..3beecea
--- /dev/null
+++ b/sdks/python/apache_beam/io/google_cloud_platform/bigquery.py
@@ -0,0 +1,1082 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""BigQuery sources and sinks.
+
+This module implements reading from and writing to BigQuery tables. It relies
+on several classes exposed by the BigQuery API: TableSchema, TableFieldSchema,
+TableRow, and TableCell. The default mode is to return table rows read from a
+BigQuery source as dictionaries. Similarly a Write transform to a BigQuerySink
+accepts PCollections of dictionaries. This is done for more convenient
+programming.  If desired, the native TableRow objects can be used throughout to
+represent rows (use an instance of TableRowJsonCoder as a coder argument when
+creating the sources or sinks respectively).
+
+Also, for programming convenience, instances of TableReference and TableSchema
+have a string representation that can be used for the corresponding arguments:
+
+  - TableReference can be a PROJECT:DATASET.TABLE or DATASET.TABLE string.
+  - TableSchema can be a NAME:TYPE{,NAME:TYPE}* string
+(e.g. 'month:STRING,event_count:INTEGER').
+
+The syntax supported is described here:
+https://cloud.google.com/bigquery/bq-command-line-tool-quickstart
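The two string forms above can be illustrated with plain Python (a minimal sketch of the documented syntax only; these helper names are hypothetical and not part of the Beam API):

```python
def parse_table_reference(ref):
    """Split 'PROJECT:DATASET.TABLE' or 'DATASET.TABLE' into its parts."""
    project, _, rest = ref.rpartition(':')  # project is '' when absent
    dataset, _, table = rest.partition('.')
    return project or None, dataset, table


def parse_schema_string(schema):
    """Turn 'NAME:TYPE{,NAME:TYPE}*' into a list of (name, type) pairs."""
    return [tuple(field.split(':', 1)) for field in schema.split(',')]
```

For example, parse_table_reference('my-project:my_dataset.my_table') yields ('my-project', 'my_dataset', 'my_table'), and parse_schema_string('month:STRING,event_count:INTEGER') yields [('month', 'STRING'), ('event_count', 'INTEGER')].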
+
+BigQuery sources can be used as main inputs or side inputs. A main input
+(common case) is expected to be massive and will be split into manageable
+chunks and processed in parallel. Side inputs are expected to be small and
+will be read
+completely every time a ParDo DoFn gets executed. In the example below the
+lambda function implementing the DoFn for the Map transform will get on each
+call *one* row of the main table and *all* rows of the side table. The runner
+may use some caching techniques to share the side inputs between calls in order
+to avoid excessive reading:
+
+  main_table = pipeline | 'very_big' >> beam.io.Read(beam.io.BigQuerySource())
+  side_table = pipeline | 'not_big' >> beam.io.Read(beam.io.BigQuerySource())
+  results = (
+  main_table
+  | 'process data' >> beam.Map(
+  lambda element, side_input: ..., AsList(side_table)))
+
+There is no difference in how main and side inputs are read. What makes the
+side_table a 'side input' is the AsList wrapper used when passing the table
+as a parameter to the Map transform. AsList signals to the execution framework
+that its input should be made available whole.
+
+The main and side inputs are implemented differently. Reading a BigQuery table
+as main input entails exporting the table to a set of GCS files (currently in
+JSON format) and then processing those files. Reading the same table as a side
+input entails querying the table for all its rows. The coder argument on
+BigQuerySource controls the reading of the lines in the export files (i.e.,
+transform a JSON object into a PCollection element). The coder is not involved
+when the same table is read as a side input since there is no intermediate
+format involved. We get the table rows directly from the BigQuery service with
+a query.
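The calling pattern described above, one main row per call but the whole side table every call, can be mimicked without any runner (plain-Python sketch; `map_with_side_input` is a hypothetical helper, not Beam code):

```python
def map_with_side_input(main_rows, side_rows, fn):
    """Mimic beam.Map with an AsList side input: fn receives one main row
    per call and the *entire* side table on every call."""
    side = list(side_rows)  # materialized whole, shared across calls
    return [fn(row, side) for row in main_rows]
```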
+
+Users may provide a query to read from rather than reading all of a BigQuery
+table. If specified, the result obtained by executing the specified query will
+be used as the data of the input transform.
+
+  query_results = pipeline | beam.io.Read(beam.io.BigQuerySource(
+  query='SELECT year, mean_temp FROM samples.weather_stations'))
+
+When creating a BigQuery input transform, users should provide either a query
+or a table. Pipeline construction will fail with a validation error if neither
+or both are specified.
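That construction-time check can be sketched as follows (a hypothetical helper for illustration; the real validation lives inside BigQuerySource):

```python
def validate_source_args(table=None, query=None):
    """Fail unless exactly one of `table` or `query` is supplied."""
    if (table is None) == (query is None):  # both given, or both missing
        raise ValueError('Provide exactly one of table or query')
    return table if table is not None else query
```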
+
+*** Short introduction to BigQuery concepts ***
+Tables have rows (TableRow) and each row has cells (TableCell).
+A table has a schema (TableSchema), which in turn describes the schema of each
+cell (TableFieldSchema). The terms field and cell are used interchangeably.
+
+TableSchema: Describes the schema 

[6/7] beam git commit: [BEAM-1218] Move GCP specific IO into separate module

2017-02-19 Thread altay
[BEAM-1218] Move GCP specific IO into separate module


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/908c8532
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/908c8532
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/908c8532

Branch: refs/heads/master
Commit: 908c85327ce17a0ed64401d1e86cb86396284fdb
Parents: 2872f86
Author: Sourabh Bajaj 
Authored: Sat Feb 18 22:52:06 2017 -0800
Committer: Ahmet Altay 
Committed: Sun Feb 19 16:21:25 2017 -0800

--
 .../apache_beam/examples/snippets/snippets.py   |4 +-
 sdks/python/apache_beam/io/__init__.py  |4 +-
 sdks/python/apache_beam/io/bigquery.py  | 1082 --
 sdks/python/apache_beam/io/bigquery_test.py |  812 -
 .../python/apache_beam/io/datastore/__init__.py |   16 -
 .../apache_beam/io/datastore/v1/__init__.py |   16 -
 .../apache_beam/io/datastore/v1/datastoreio.py  |  391 ---
 .../io/datastore/v1/datastoreio_test.py |  237 
 .../io/datastore/v1/fake_datastore.py   |   92 --
 .../apache_beam/io/datastore/v1/helper.py   |  267 -
 .../apache_beam/io/datastore/v1/helper_test.py  |  256 -
 .../io/datastore/v1/query_splitter.py   |  269 -
 .../io/datastore/v1/query_splitter_test.py  |  201 
 sdks/python/apache_beam/io/fileio.py|2 +-
 sdks/python/apache_beam/io/gcsio.py |  871 --
 sdks/python/apache_beam/io/gcsio_test.py|  786 -
 .../io/google_cloud_platform/bigquery.py| 1082 ++
 .../io/google_cloud_platform/bigquery_test.py   |  813 +
 .../google_cloud_platform/datastore/__init__.py |   16 +
 .../datastore/v1/__init__.py|   16 +
 .../datastore/v1/datastoreio.py |  391 +++
 .../datastore/v1/datastoreio_test.py|  237 
 .../datastore/v1/fake_datastore.py  |   92 ++
 .../datastore/v1/helper.py  |  267 +
 .../datastore/v1/helper_test.py |  256 +
 .../datastore/v1/query_splitter.py  |  269 +
 .../datastore/v1/query_splitter_test.py |  201 
 .../io/google_cloud_platform/gcsio.py   |  871 ++
 .../io/google_cloud_platform/gcsio_test.py  |  786 +
 .../io/google_cloud_platform/pubsub.py  |   91 ++
 .../io/google_cloud_platform/pubsub_test.py |   63 +
 sdks/python/apache_beam/io/pubsub.py|   91 --
 sdks/python/apache_beam/io/pubsub_test.py   |   62 -
 33 files changed, 5456 insertions(+), 5454 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/908c8532/sdks/python/apache_beam/examples/snippets/snippets.py
--
diff --git a/sdks/python/apache_beam/examples/snippets/snippets.py b/sdks/python/apache_beam/examples/snippets/snippets.py
index 6f081df..e7f28b0 100644
--- a/sdks/python/apache_beam/examples/snippets/snippets.py
+++ b/sdks/python/apache_beam/examples/snippets/snippets.py
@@ -867,8 +867,8 @@ def model_datastoreio():
   import googledatastore
   import apache_beam as beam
   from apache_beam.utils.pipeline_options import PipelineOptions
-  from apache_beam.io.datastore.v1.datastoreio import ReadFromDatastore
-  from apache_beam.io.datastore.v1.datastoreio import WriteToDatastore
+  from apache_beam.io.google_cloud_platform.datastore.v1.datastoreio import ReadFromDatastore
+  from apache_beam.io.google_cloud_platform.datastore.v1.datastoreio import WriteToDatastore
 
   project = 'my_project'
   kind = 'my_kind'

http://git-wip-us.apache.org/repos/asf/beam/blob/908c8532/sdks/python/apache_beam/io/__init__.py
--
diff --git a/sdks/python/apache_beam/io/__init__.py b/sdks/python/apache_beam/io/__init__.py
index 13ce36f..972ed53 100644
--- a/sdks/python/apache_beam/io/__init__.py
+++ b/sdks/python/apache_beam/io/__init__.py
@@ -19,13 +19,13 @@
 
 # pylint: disable=wildcard-import
 from apache_beam.io.avroio import *
-from apache_beam.io.bigquery import *
 from apache_beam.io.fileio import *
 from apache_beam.io.iobase import Read
 from apache_beam.io.iobase import Sink
 from apache_beam.io.iobase import Write
 from apache_beam.io.iobase import Writer
-from apache_beam.io.pubsub import *
 from apache_beam.io.textio import *
 from apache_beam.io.tfrecordio import *
 from apache_beam.io.range_trackers import *
+from apache_beam.io.google_cloud_platform.bigquery import *
+from apache_beam.io.google_cloud_platform.pubsub import *

http://git-wip-us.apache.org/repos/asf/beam/blob/908c8532/sdks/python/apache_beam/io/bigquery.py

[jira] [Commented] (BEAM-1513) Skip slower verifications if '-DskipTests' specified

2017-02-19 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873833#comment-15873833
 ] 

Jean-Baptiste Onofré commented on BEAM-1513:


As said on the mailing list, maybe it would make more sense to use a specific
property rather than {{skipTests}}. WDYT?

> Skip slower verifications if '-DskipTests' specified
> 
>
> Key: BEAM-1513
> URL: https://issues.apache.org/jira/browse/BEAM-1513
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>Priority: Minor
>
> Skip slower verifications (checkstyle, rat and findbugs) if '-DskipTests' was 
> specified in the maven command. Enable them otherwise.
> The reasoning behind this is usually if you're skipping tests you're in a 
> hurry to build and do not want to go through the slower verifications.
> Should still be able to force these verifications with '-Prelease' as before, 
> even if '-DskipTests' is specified.
> [Original dev list 
> discussion|https://lists.apache.org/thread.html/e1f80e54b44b4a39630d978abe79fb6a6cecf71d9821ee1881b47afb@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
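One way such a build change could look (an illustrative sketch only, not the design agreed in the thread; it assumes the skip flags honored by the checkstyle, rat, and findbugs Maven plugins, and Maven's profile activation on property presence):

```xml
<!-- Hypothetical profile: activates whenever -DskipTests is passed -->
<profile>
  <id>fast-build</id>
  <activation>
    <property><name>skipTests</name></property>
  </activation>
  <properties>
    <checkstyle.skip>true</checkstyle.skip>
    <rat.skip>true</rat.skip>
    <findbugs.skip>true</findbugs.skip>
  </properties>
</profile>
```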


[jira] [Updated] (BEAM-1513) Skip slower verifications if '-DskipTests' specified

2017-02-19 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1513:

Description: 
Skip slower verifications (checkstyle, rat and findbugs) if '-DskipTests' was 
specified in the maven command. Enable them otherwise.
The reasoning behind this is usually if you're skipping tests you're in a hurry 
to build and do not want to go through the slower verifications.
Should still be able to force these verifications with '-Prelease' as before, 
even if '-DskipTests' is specified.

[Original dev list 
discussion|https://lists.apache.org/thread.html/e1f80e54b44b4a39630d978abe79fb6a6cecf71d9821ee1881b47afb@%3Cdev.beam.apache.org%3E]

  was:
Skip slower verifications (checkstyle, rat and findbugs) if '-DskipTests' was 
specified in the maven command. Enable them otherwise.
The reasoning behind this is usually if you're skipping tests you're in a hurry 
to build and do not want to go through the slower verifications.
Should still be able to force these verifications with '-Prelease' as before, 
even if '-DskipTests' is specified.


> Skip slower verifications if '-DskipTests' specified
> 
>
> Key: BEAM-1513
> URL: https://issues.apache.org/jira/browse/BEAM-1513
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>Priority: Minor
>
> Skip slower verifications (checkstyle, rat and findbugs) if '-DskipTests' was 
> specified in the maven command. Enable them otherwise.
> The reasoning behind this is usually if you're skipping tests you're in a 
> hurry to build and do not want to go through the slower verifications.
> Should still be able to force these verifications with '-Prelease' as before, 
> even if '-DskipTests' is specified.
> [Original dev list 
> discussion|https://lists.apache.org/thread.html/e1f80e54b44b4a39630d978abe79fb6a6cecf71d9821ee1881b47afb@%3Cdev.beam.apache.org%3E]





[jira] [Commented] (BEAM-1513) Skip slower verifications if '-DskipTests' specified

2017-02-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873830#comment-15873830
 ] 

ASF GitHub Bot commented on BEAM-1513:
--

GitHub user aviemzur opened a pull request:

https://github.com/apache/beam/pull/2048

[BEAM-1513] Skip slower verifications if '-DskipTests' specified

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aviemzur/beam skip-slow-verifications

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2048.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2048


commit 86d42a6b0cc4d4322017bc5b004e74097728b9e9
Author: Aviem Zur 
Date:   2017-02-19T20:24:14Z

[BEAM-1513] Skip slower verifications if '-DskipTests' specified




> Skip slower verifications if '-DskipTests' specified
> 
>
> Key: BEAM-1513
> URL: https://issues.apache.org/jira/browse/BEAM-1513
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>Priority: Minor
>
> Skip slower verifications (checkstyle, rat and findbugs) if '-DskipTests' was 
> specified in the maven command. Enable them otherwise.
> The reasoning behind this is usually if you're skipping tests you're in a 
> hurry to build and do not want to go through the slower verifications.
> Should still be able to force these verifications with '-Prelease' as before, 
> even if '-DskipTests' is specified.





[GitHub] beam pull request #2048: [BEAM-1513] Skip slower verifications if '-DskipTes...

2017-02-19 Thread aviemzur
GitHub user aviemzur opened a pull request:

https://github.com/apache/beam/pull/2048

[BEAM-1513] Skip slower verifications if '-DskipTests' specified

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aviemzur/beam skip-slow-verifications

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2048.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2048


commit 86d42a6b0cc4d4322017bc5b004e74097728b9e9
Author: Aviem Zur 
Date:   2017-02-19T20:24:14Z

[BEAM-1513] Skip slower verifications if '-DskipTests' specified




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (BEAM-1513) Skip slower verifications if '-DskipTests' specified

2017-02-19 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1513:

Description: 
Skip slower verifications (checkstyle, rat and findbugs) if '-DskipTests' was 
specified in the maven command. Enable them otherwise.
The reasoning behind this is usually if you're skipping tests you're in a hurry 
to build and do not want to go through the slower verifications.
Should still be able to force these verifications with '-Prelease' as before, 
even if '-DskipTests' is specified.

  was:
Skip slower verifications (checkstyle, rat and findbugs) if '-DskipTests' was 
specified in the maven command.
The reasoning behind this is usually if you're skipping tests you're in a hurry 
to build and do not want to go through the slower verifications.
Should still be able to force these verifications with '-Prelease' as before, 
even if '-DskipTests' is specified.


> Skip slower verifications if '-DskipTests' specified
> 
>
> Key: BEAM-1513
> URL: https://issues.apache.org/jira/browse/BEAM-1513
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>Priority: Minor
>
> Skip slower verifications (checkstyle, rat and findbugs) if '-DskipTests' was 
> specified in the maven command. Enable them otherwise.
> The reasoning behind this is usually if you're skipping tests you're in a 
> hurry to build and do not want to go through the slower verifications.
> Should still be able to force these verifications with '-Prelease' as before, 
> even if '-DskipTests' is specified.





Jenkins build is back to stable : beam_PostCommit_Java_MavenInstall #2685

2017-02-19 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #2045: [BEAM-1511] Add wildcard import back again.

2017-02-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2045




[2/2] beam git commit: This closes #2045

2017-02-19 Thread altay
This closes #2045


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/2872f866
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/2872f866
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/2872f866

Branch: refs/heads/master
Commit: 2872f866679455003541195dcccbfc9daf6b53b8
Parents: 4063019 314f338
Author: Ahmet Altay 
Authored: Sun Feb 19 11:16:37 2017 -0800
Committer: Ahmet Altay 
Committed: Sun Feb 19 11:16:37 2017 -0800

--
 .../io/google_cloud_platform/internal/clients/bigquery/__init__.py  | 1 +
 .../io/google_cloud_platform/internal/clients/storage/__init__.py   | 1 +
 .../google_cloud_dataflow/internal/clients/dataflow/__init__.py | 1 +
 3 files changed, 3 insertions(+)
--




[1/2] beam git commit: Add wildcard import back again.

2017-02-19 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master 406301979 -> 2872f8666


Add wildcard import back again.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/314f338d
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/314f338d
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/314f338d

Branch: refs/heads/master
Commit: 314f338db94e1c298a8e4a313bc10ff7009663ab
Parents: 4063019
Author: Ahmet Altay 
Authored: Sun Feb 19 00:15:16 2017 -0800
Committer: Ahmet Altay 
Committed: Sun Feb 19 11:16:30 2017 -0800

--
 .../io/google_cloud_platform/internal/clients/bigquery/__init__.py  | 1 +
 .../io/google_cloud_platform/internal/clients/storage/__init__.py   | 1 +
 .../google_cloud_dataflow/internal/clients/dataflow/__init__.py | 1 +
 3 files changed, 3 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/314f338d/sdks/python/apache_beam/io/google_cloud_platform/internal/clients/bigquery/__init__.py
--
diff --git a/sdks/python/apache_beam/io/google_cloud_platform/internal/clients/bigquery/__init__.py b/sdks/python/apache_beam/io/google_cloud_platform/internal/clients/bigquery/__init__.py
index e8c849e..673e4d2 100644
--- a/sdks/python/apache_beam/io/google_cloud_platform/internal/clients/bigquery/__init__.py
+++ b/sdks/python/apache_beam/io/google_cloud_platform/internal/clients/bigquery/__init__.py
@@ -20,6 +20,7 @@
 
 import pkgutil
 
+from apitools.base.py import *
 from apache_beam.io.google_cloud_platform.internal.clients.bigquery.bigquery_v2_client import *
 from apache_beam.io.google_cloud_platform.internal.clients.bigquery.bigquery_v2_messages import *
 

http://git-wip-us.apache.org/repos/asf/beam/blob/314f338d/sdks/python/apache_beam/io/google_cloud_platform/internal/clients/storage/__init__.py
--
diff --git a/sdks/python/apache_beam/io/google_cloud_platform/internal/clients/storage/__init__.py b/sdks/python/apache_beam/io/google_cloud_platform/internal/clients/storage/__init__.py
index 44717c1..81eee3e 100644
--- a/sdks/python/apache_beam/io/google_cloud_platform/internal/clients/storage/__init__.py
+++ b/sdks/python/apache_beam/io/google_cloud_platform/internal/clients/storage/__init__.py
@@ -20,6 +20,7 @@
 
 import pkgutil
 
+from apitools.base.py import *
 from apache_beam.io.google_cloud_platform.internal.clients.storage.storage_v1_client import *
 from apache_beam.io.google_cloud_platform.internal.clients.storage.storage_v1_messages import *
 

http://git-wip-us.apache.org/repos/asf/beam/blob/314f338d/sdks/python/apache_beam/runners/google_cloud_dataflow/internal/clients/dataflow/__init__.py
--
diff --git a/sdks/python/apache_beam/runners/google_cloud_dataflow/internal/clients/dataflow/__init__.py b/sdks/python/apache_beam/runners/google_cloud_dataflow/internal/clients/dataflow/__init__.py
index d4d621f..eedf141 100644
--- a/sdks/python/apache_beam/runners/google_cloud_dataflow/internal/clients/dataflow/__init__.py
+++ b/sdks/python/apache_beam/runners/google_cloud_dataflow/internal/clients/dataflow/__init__.py
@@ -20,6 +20,7 @@
 
 import pkgutil
 
+from apitools.base.py import *
 from apache_beam.runners.google_cloud_dataflow.internal.clients.dataflow.dataflow_v1b3_messages import *
 from apache_beam.runners.google_cloud_dataflow.internal.clients.dataflow.dataflow_v1b3_client import *
 



[jira] [Commented] (BEAM-1218) De-Googlify Python SDK

2017-02-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873797#comment-15873797
 ] 

ASF GitHub Bot commented on BEAM-1218:
--

GitHub user sb2nov opened a pull request:

https://github.com/apache/beam/pull/2047

[BEAM-1218] Move GCP specific IO into separate module

PR 1/? for moving the GCP IO stuff out. I'll refactor the IO channel 
factory next. 
R: @aaltay PTAL



Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sb2nov/beam BEAM-1218-Move-GCP-specific-IO

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2047.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2047


commit d262a598331b26e233f6fd779b357a2dc710c6cd
Author: Sourabh Bajaj 
Date:   2017-02-19T06:52:06Z

[BEAM-1218] Move GCP specific IO into separate module




> De-Googlify Python SDK
> --
>
> Key: BEAM-1218
> URL: https://issues.apache.org/jira/browse/BEAM-1218
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Mark Liu
>Assignee: Ahmet Altay
>






[GitHub] beam pull request #2047: [BEAM-1218] Move GCP specific IO into separate modu...

2017-02-19 Thread sb2nov
GitHub user sb2nov opened a pull request:

https://github.com/apache/beam/pull/2047

[BEAM-1218] Move GCP specific IO into separate module

PR 1/? for moving the GCP IO stuff out. I'll refactor the IO channel 
factory next. 
R: @aaltay PTAL



Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sb2nov/beam BEAM-1218-Move-GCP-specific-IO

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2047.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2047


commit d262a598331b26e233f6fd779b357a2dc710c6cd
Author: Sourabh Bajaj 
Date:   2017-02-19T06:52:06Z

[BEAM-1218] Move GCP specific IO into separate module






Jenkins build became unstable: beam_PostCommit_Java_MavenInstall #2684

2017-02-19 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1512) Optimize leaf transforms materialization

2017-02-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873773#comment-15873773
 ] 

ASF GitHub Bot commented on BEAM-1512:
--

GitHub user aviemzur opened a pull request:

https://github.com/apache/beam/pull/2046

[BEAM-1512] Optimize leaf transforms materialization

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aviemzur/beam optimize-leaf-materialization

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2046.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2046


commit 8449803dce45497187a7105a22cc62251580d942
Author: Aviem Zur 
Date:   2017-02-19T17:52:22Z

[BEAM-1512] Optimize leaf transforms materialization




> Optimize leaf transforms materialization
> 
>
> Key: BEAM-1512
> URL: https://issues.apache.org/jira/browse/BEAM-1512
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Optimize leaf materialization in {{EvaluationContext}} Use register for 
> DStream leaves and an empty {{foreachPartition}} for other leaves instead of 
> the current {{count()}} which adds overhead.



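The idea behind BEAM-1512 — force evaluation with an empty per-partition action rather than an aggregating count() — in a plain-Python analogue (illustrative only; the actual change targets the Spark runner's EvaluationContext):

```python
def materialize(partitions):
    """Drain each lazy partition to trigger its computation while keeping
    nothing -- the analogue of an empty foreachPartition replacing count()."""
    for partition in partitions:
        for _ in partition:
            pass  # no aggregation, no result shipped back to the driver
```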


[GitHub] beam pull request #2046: [BEAM-1512] Optimize leaf transforms materializatio...

2017-02-19 Thread aviemzur
GitHub user aviemzur opened a pull request:

https://github.com/apache/beam/pull/2046

[BEAM-1512] Optimize leaf transforms materialization

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aviemzur/beam optimize-leaf-materialization

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2046.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2046


commit 8449803dce45497187a7105a22cc62251580d942
Author: Aviem Zur 
Date:   2017-02-19T17:52:22Z

[BEAM-1512] Optimize leaf transforms materialization






[jira] [Created] (BEAM-1513) Skip slower verifications if '-DskipTests' specified

2017-02-19 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-1513:
---

 Summary: Skip slower verifications if '-DskipTests' specified
 Key: BEAM-1513
 URL: https://issues.apache.org/jira/browse/BEAM-1513
 Project: Beam
  Issue Type: Improvement
  Components: build-system
Reporter: Aviem Zur
Assignee: Davor Bonaci


Skip slower verifications (checkstyle, rat and findbugs) if '-DskipTests' was 
specified in the maven command.
The reasoning behind this is usually if you're skipping tests you're in a hurry 
to build and do not want to go through the slower verifications.





[jira] [Assigned] (BEAM-1513) Skip slower verifications if '-DskipTests' specified

2017-02-19 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1513:
---

Assignee: Aviem Zur  (was: Davor Bonaci)

> Skip slower verifications if '-DskipTests' specified
> 
>
> Key: BEAM-1513
> URL: https://issues.apache.org/jira/browse/BEAM-1513
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>Priority: Minor
>
> Skip slower verifications (checkstyle, rat and findbugs) if '-DskipTests' was 
> specified in the maven command.
> The reasoning behind this is usually if you're skipping tests you're in a 
> hurry to build and do not want to go through the slower verifications.





[jira] [Updated] (BEAM-1513) Skip slower verifications if '-DskipTests' specified

2017-02-19 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1513:

Priority: Minor  (was: Major)

> Skip slower verifications if '-DskipTests' specified
> 
>
> Key: BEAM-1513
> URL: https://issues.apache.org/jira/browse/BEAM-1513
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Aviem Zur
>Assignee: Davor Bonaci
>Priority: Minor
>
> Skip slower verifications (checkstyle, rat and findbugs) if '-DskipTests' was 
> specified in the maven command.
> The reasoning behind this is that if you're skipping tests you're usually in a 
> hurry to build and do not want to wait for the slower verifications.





[jira] [Assigned] (BEAM-1512) Optimize leaf transforms materialization

2017-02-19 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1512:
---

Assignee: Aviem Zur  (was: Amit Sela)

> Optimize leaf transforms materialization
> 
>
> Key: BEAM-1512
> URL: https://issues.apache.org/jira/browse/BEAM-1512
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Optimize leaf materialization in {{EvaluationContext}}: use {{register}} for 
> DStream leaves and an empty {{foreachPartition}} for other leaves instead of 
> the current {{count()}}, which adds overhead.





[jira] [Updated] (BEAM-1512) Optimize leaf transforms materialization

2017-02-19 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1512:

Summary: Optimize leaf transforms materialization  (was: Optimize leaf 
transformation materialization)

> Optimize leaf transforms materialization
> 
>
> Key: BEAM-1512
> URL: https://issues.apache.org/jira/browse/BEAM-1512
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Aviem Zur
>Assignee: Amit Sela
>
> Optimize leaf materialization in {{EvaluationContext}}: use {{register}} for 
> DStream leaves and an empty {{foreachPartition}} for other leaves instead of 
> the current {{count()}}, which adds overhead.





Build failed in Jenkins: beam_PostCommit_Python_Verify #1309

2017-02-19 Thread Apache Jenkins Server
See 


--
[...truncated 274.15 KB...]
  File 
"
 line 91, in run
result = super(TestPipeline, self).run()
  File 
"
 line 163, in run
return self.runner.run(self)
  File 
"
 line 32, in run
self.result = super(TestDataflowRunner, self).run(pipeline)
  File 
"
 line 175, in run
self.dataflow_client.create_job(self.job), self)
  File 
"
 line 167, in wrapper
return fun(*args, **kwargs)
  File 
"
 line 411, in create_job
self.create_job_description(job)
  File 
"
 line 432, in create_job_description
job.options, file_copy=self._gcs_file_copy)
  File 
"
 line 274, in stage_job_resources
file_copy(setup_options.requirements_file, staged_path)
  File 
"
 line 167, in wrapper
return fun(*args, **kwargs)
  File 
"
 line 372, in _gcs_file_copy
self.stage_file(to_folder, to_name, f)
  File 
"
 line 389, in stage_file
upload = storage.Upload(stream, mime_type)
AttributeError: 'module' object has no attribute 'Upload'
 >> begin captured logging << 
root: DEBUG: PValue computed by side list (tag 1): refcount: 1 => 0
root: DEBUG: PValue computed by main input (tag 1): refcount: 1 => 0
root: DEBUG: PValue computed by ViewAsList(side 
list.None)/CreatePCollectionView (tag 1): refcount: 2 => 1
root: DEBUG: PValue computed by ViewAsList(side 
list.None)/CreatePCollectionView (tag 1): refcount: 1 => 0
root: DEBUG: PValue computed by FlatMap() 
(tag 1): refcount: 1 => 0
root: DEBUG: PValue computed by assert_that/WindowInto(WindowIntoFn) (tag 1): 
refcount: 1 => 0
root: DEBUG: PValue computed by assert_that/ToVoidKey (tag 1): refcount: 1 => 0
root: DEBUG: PValue computed by assert_that/Group (tag 1): refcount: 1 => 0
root: DEBUG: PValue computed by assert_that/UnKey (tag 1): refcount: 1 => 0
root: INFO: Starting GCS upload to 
gs://temp-storage-for-end-to-end-tests/staging-validatesrunner-test/py-validatesrunner-1487494940.1487494946.448575/requirements.txt...
- >> end captured logging << -

==
ERROR: test_as_singleton_with_different_defaults_with_unique_labels 
(apache_beam.transforms.sideinputs_test.SideInputsTest)
--
Traceback (most recent call last):
  File 
"
 line 265, in test_as_singleton_with_different_defaults_with_unique_labels
pipeline.run()
  File 
"
 line 91, in run
result = super(TestPipeline, self).run()
  File 
"
 line 163, in run
return self.runner.run(self)
  File 
"
 line 32, in run
self.result = super(TestDataflowRunner, self).run(pipeline)
  File 
"
 line 175, in run
self.dataflow_client.create_job(self.job), self)
  File 
"
 line 167, in wrapper
return fun(*args, **kwargs)
  File 

Jenkins build is back to normal : beam_PostCommit_Java_RunnableOnService_Spark #991

2017-02-19 Thread Apache Jenkins Server
See 




Jenkins build is back to normal : beam_PostCommit_Java_RunnableOnService_Dataflow #2337

2017-02-19 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1511) Python post commits failing

2017-02-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873526#comment-15873526
 ] 

ASF GitHub Bot commented on BEAM-1511:
--

GitHub user aaltay opened a pull request:

https://github.com/apache/beam/pull/2045

[BEAM-1511] Add wildcard import back again.

R: @jbonofre 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aaltay/beam jenk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2045.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2045


commit 3aa4fe35dd7351f7c9e10f81b364d1b7fcd30412
Author: Ahmet Altay 
Date:   2017-02-19T08:15:16Z

Add wildcard import back again.




> Python post commits failing
> ---
>
> Key: BEAM-1511
> URL: https://issues.apache.org/jira/browse/BEAM-1511
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>
> Post commit tests are failing with the following traceback:
> Traceback (most recent call last):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/transforms/sideinputs_test.py",
>  line 177, in test_iterable_side_input
> pipeline.run()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/test_pipeline.py",
>  line 91, in run
> result = super(TestPipeline, self).run()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/pipeline.py",
>  line 163, in run
> return self.runner.run(self)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/runners/test/test_dataflow_runner.py",
>  line 32, in run
> self.result = super(TestDataflowRunner, self).run(pipeline)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/runners/google_cloud_dataflow/dataflow_runner.py",
>  line 175, in run
> self.dataflow_client.create_job(self.job), self)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/utils/retry.py",
>  line 167, in wrapper
> return fun(*args, **kwargs)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/runners/google_cloud_dataflow/internal/apiclient.py",
>  line 411, in create_job
> self.create_job_description(job)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/runners/google_cloud_dataflow/internal/apiclient.py",
>  line 432, in create_job_description
> job.options, file_copy=self._gcs_file_copy)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/utils/dependency.py",
>  line 274, in stage_job_resources
> file_copy(setup_options.requirements_file, staged_path)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/utils/retry.py",
>  line 167, in wrapper
> return fun(*args, **kwargs)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/runners/google_cloud_dataflow/internal/apiclient.py",
>  line 372, in _gcs_file_copy
> self.stage_file(to_folder, to_name, f)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/runners/google_cloud_dataflow/internal/apiclient.py",
>  line 389, in stage_file
> upload = storage.Upload(stream, mime_type)
> AttributeError: 'module' object has no attribute 'Upload'





[GitHub] beam pull request #2045: [BEAM-1511] Add wildcard import back again.

2017-02-19 Thread aaltay
GitHub user aaltay opened a pull request:

https://github.com/apache/beam/pull/2045

[BEAM-1511] Add wildcard import back again.

R: @jbonofre 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aaltay/beam jenk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2045.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2045


commit 3aa4fe35dd7351f7c9e10f81b364d1b7fcd30412
Author: Ahmet Altay 
Date:   2017-02-19T08:15:16Z

Add wildcard import back again.






Build failed in Jenkins: beam_PostCommit_Python_Verify #1308

2017-02-19 Thread Apache Jenkins Server
See 


Changes:

[jbonofre] More informative KafkaCheckpointMark toString

--
[...truncated 274.16 KB...]
  File 
"
 line 91, in run
result = super(TestPipeline, self).run()
  File 
"
 line 163, in run
return self.runner.run(self)
  File 
"
 line 32, in run
self.result = super(TestDataflowRunner, self).run(pipeline)
  File 
"
 line 175, in run
self.dataflow_client.create_job(self.job), self)
  File 
"
 line 167, in wrapper
return fun(*args, **kwargs)
  File 
"
 line 411, in create_job
self.create_job_description(job)
  File 
"
 line 432, in create_job_description
job.options, file_copy=self._gcs_file_copy)
  File 
"
 line 274, in stage_job_resources
file_copy(setup_options.requirements_file, staged_path)
  File 
"
 line 167, in wrapper
return fun(*args, **kwargs)
  File 
"
 line 372, in _gcs_file_copy
self.stage_file(to_folder, to_name, f)
  File 
"
 line 389, in stage_file
upload = storage.Upload(stream, mime_type)
AttributeError: 'module' object has no attribute 'Upload'
 >> begin captured logging << 
root: DEBUG: PValue computed by side list (tag 1): refcount: 1 => 0
root: DEBUG: PValue computed by main input (tag 1): refcount: 1 => 0
root: DEBUG: PValue computed by ViewAsList(side 
list.None)/CreatePCollectionView (tag 1): refcount: 2 => 1
root: DEBUG: PValue computed by ViewAsList(side 
list.None)/CreatePCollectionView (tag 1): refcount: 1 => 0
root: DEBUG: PValue computed by FlatMap() 
(tag 1): refcount: 1 => 0
root: DEBUG: PValue computed by assert_that/WindowInto(WindowIntoFn) (tag 1): 
refcount: 1 => 0
root: DEBUG: PValue computed by assert_that/ToVoidKey (tag 1): refcount: 1 => 0
root: DEBUG: PValue computed by assert_that/Group (tag 1): refcount: 1 => 0
root: DEBUG: PValue computed by assert_that/UnKey (tag 1): refcount: 1 => 0
root: INFO: Starting GCS upload to 
gs://temp-storage-for-end-to-end-tests/staging-validatesrunner-test/py-validatesrunner-1487491859.1487491870.903585/requirements.txt...
- >> end captured logging << -

==
ERROR: test_as_singleton_with_different_defaults_with_unique_labels 
(apache_beam.transforms.sideinputs_test.SideInputsTest)
--
Traceback (most recent call last):
  File 
"
 line 265, in test_as_singleton_with_different_defaults_with_unique_labels
pipeline.run()
  File 
"
 line 91, in run
result = super(TestPipeline, self).run()
  File 
"
 line 163, in run
return self.runner.run(self)
  File 
"
 line 32, in run
self.result = super(TestDataflowRunner, self).run(pipeline)
  File 
"
 line 175, in run
self.dataflow_client.create_job(self.job), self)
  File 

[1/2] beam git commit: More informative KafkaCheckpointMark toString

2017-02-19 Thread jbonofre
Repository: beam
Updated Branches:
  refs/heads/master 393bbc9d1 -> 406301979


More informative KafkaCheckpointMark toString


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/7bef50a0
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/7bef50a0
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/7bef50a0

Branch: refs/heads/master
Commit: 7bef50a0b6eb1cd1e0b71db4e439d31008095b6f
Parents: 393bbc9
Author: Aviem Zur 
Authored: Fri Feb 17 12:55:43 2017 +0200
Committer: Jean-Baptiste Onofré 
Committed: Sun Feb 19 08:50:31 2017 +0100

--
 .../beam/sdk/io/kafka/KafkaCheckpointMark.java  | 16 
 1 file changed, 16 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/7bef50a0/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaCheckpointMark.java
--
diff --git 
a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaCheckpointMark.java
 
b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaCheckpointMark.java
index 763a98a..61a382d 100644
--- 
a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaCheckpointMark.java
+++ 
b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaCheckpointMark.java
@@ -17,6 +17,8 @@
  */
 package org.apache.beam.sdk.io.kafka;
 
+import com.google.common.base.Joiner;
+
 import java.io.IOException;
 import java.io.Serializable;
 import java.util.List;
@@ -52,6 +54,11 @@ public class KafkaCheckpointMark implements 
UnboundedSource.CheckpointMark {
 // is restarted (checkpoint is not available for job restarts).
   }
 
+  @Override
+  public String toString() {
+return "KafkaCheckpointMark{partitions=" + Joiner.on(",").join(partitions) 
+ '}';
+  }
+
   /**
* A tuple to hold topic, partition, and offset that comprise the checkpoint
* for a single partition.
@@ -80,6 +87,15 @@ public class KafkaCheckpointMark implements 
UnboundedSource.CheckpointMark {
 public long getNextOffset() {
   return nextOffset;
 }
+
+@Override
+public String toString() {
+  return "PartitionMark{"
+  + "topic='" + topic + '\''
+  + ", partition=" + partition
+  + ", nextOffset=" + nextOffset
+  + '}';
+}
   }
 }
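
To make the new output concrete, here is a small Python mock-up of the string 
format this patch produces (the topic, partition and offset values are invented; 
Guava's Joiner.on(",").join is mimicked with str.join):

```python
# Hypothetical values; the format strings mirror the toString() bodies
# added to KafkaCheckpointMark and PartitionMark in this commit.
def partition_mark(topic, partition, next_offset):
    return "PartitionMark{topic='%s', partition=%d, nextOffset=%d}" % (
        topic, partition, next_offset)

def checkpoint_mark(partitions):
    # Equivalent of Joiner.on(",").join(partitions) in the Java patch.
    return "KafkaCheckpointMark{partitions=%s}" % ",".join(partitions)

marks = [partition_mark("my-topic", 0, 42), partition_mark("my-topic", 1, 7)]
print(checkpoint_mark(marks))
```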
 



[2/2] beam git commit: This closes #2034

2017-02-19 Thread jbonofre
This closes #2034


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/40630197
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/40630197
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/40630197

Branch: refs/heads/master
Commit: 4063019795ea4ae60a7e5b2aa8e077ac148b2163
Parents: 393bbc9 7bef50a
Author: Jean-Baptiste Onofré 
Authored: Sun Feb 19 09:07:59 2017 +0100
Committer: Jean-Baptiste Onofré 
Committed: Sun Feb 19 09:07:59 2017 +0100

--
 .../beam/sdk/io/kafka/KafkaCheckpointMark.java  | 16 
 1 file changed, 16 insertions(+)
--




[jira] [Created] (BEAM-1511) Python post commits failing

2017-02-19 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-1511:
-

 Summary: Python post commits failing
 Key: BEAM-1511
 URL: https://issues.apache.org/jira/browse/BEAM-1511
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Ahmet Altay
Assignee: Ahmet Altay


Post commit tests are failing with the following traceback:

Traceback (most recent call last):
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/transforms/sideinputs_test.py",
 line 177, in test_iterable_side_input
pipeline.run()
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/test_pipeline.py",
 line 91, in run
result = super(TestPipeline, self).run()
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/pipeline.py",
 line 163, in run
return self.runner.run(self)
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/runners/test/test_dataflow_runner.py",
 line 32, in run
self.result = super(TestDataflowRunner, self).run(pipeline)
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/runners/google_cloud_dataflow/dataflow_runner.py",
 line 175, in run
self.dataflow_client.create_job(self.job), self)
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/utils/retry.py",
 line 167, in wrapper
return fun(*args, **kwargs)
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/runners/google_cloud_dataflow/internal/apiclient.py",
 line 411, in create_job
self.create_job_description(job)
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/runners/google_cloud_dataflow/internal/apiclient.py",
 line 432, in create_job_description
job.options, file_copy=self._gcs_file_copy)
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/utils/dependency.py",
 line 274, in stage_job_resources
file_copy(setup_options.requirements_file, staged_path)
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/utils/retry.py",
 line 167, in wrapper
return fun(*args, **kwargs)
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/runners/google_cloud_dataflow/internal/apiclient.py",
 line 372, in _gcs_file_copy
self.stage_file(to_folder, to_name, f)
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/runners/google_cloud_dataflow/internal/apiclient.py",
 line 389, in stage_file
upload = storage.Upload(stream, mime_type)
AttributeError: 'module' object has no attribute 'Upload'
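
For reference, this failure mode can be reproduced in isolation: a module object 
raises AttributeError for any name it never imported, which is exactly what 
dropping a wildcard import does. A minimal sketch (the bare `storage` stand-in 
module below is hypothetical, not the real storage client module):

```python
import types

# Stand-in for the real storage module, minus the wildcard import that
# used to pull Upload into its namespace.
storage = types.ModuleType("storage")

try:
    storage.Upload  # fails: the name was never imported into the module
    message = None
except AttributeError as exc:
    message = str(exc)

print(message)  # names the missing 'Upload' attribute
```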



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)