[jira] [Commented] (BEAM-2018) refine expression of Calcite method/function

2017-04-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980278#comment-15980278
 ] 

ASF GitHub Bot commented on BEAM-2018:
--

GitHub user XuMingmin opened a pull request:

https://github.com/apache/beam/pull/2656

[BEAM-2018] refine expression of Calcite method/function

In this PR, `BeamSQLFnExecutor` is introduced to replace 
`BeamSQLSpELExecutor`, as the executor of `SqlOperator` expressions.

Several common operators are added here as a reference; more are needed to 
cover all operators defined in `org.apache.calcite.sql.fun.SqlStdOperatorTable`.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/XuMingmin/beam BEAM-2018

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2656.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2656


commit 3ce5f7bae2b5a7e13c9dd2abf9d4111739540175
Author: mingmxu 
Date:   2017-04-23T05:20:25Z

redesign BeamSqlExpression to execute Calcite SQL expression.




> refine expression of Calcite method/function
> 
>
> Key: BEAM-2018
> URL: https://issues.apache.org/jira/browse/BEAM-2018
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>
> https://calcite.apache.org/docs/reference.html lists the methods/functions that 
> are supported in Calcite SQL statements. 
> This task defines an interface for mapping SQL expressions (those with 
> methods/functions, and those without, such as direct field references) into a 
> Java expression that can be evaluated against row records. 
> It's supposed to replace the reference implementation 
> {{BeamSQLSpELExecutor}} and provide better capability. 
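
The mapping described above can be illustrated with a minimal sketch. It is not
the code from the pull request: `RowExpression`, `field`, `literal` and
`greaterThan` are hypothetical names invented for this example, and a row is
assumed to be a plain `List<Object>` rather than a Beam record type.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: none of these names come from Beam or Calcite.
final class ExpressionSketch {

    // An expression that can be evaluated against one row record.
    interface RowExpression {
        Object evaluate(List<Object> row);
    }

    // Direct field reference: returns the value stored at a fixed column index.
    static RowExpression field(int index) {
        return row -> row.get(index);
    }

    // Constant literal: ignores the row and returns the same value every time.
    static RowExpression literal(Object value) {
        return row -> value;
    }

    // Operator expression: "left > right", assuming both operands are Integers.
    static RowExpression greaterThan(RowExpression left, RowExpression right) {
        return row -> ((Integer) left.evaluate(row)) > ((Integer) right.evaluate(row));
    }

    public static void main(String[] args) {
        // Roughly the SQL predicate "age > 18", where age is column 1.
        RowExpression isAdult = greaterThan(field(1), literal(18));

        System.out.println(isAdult.evaluate(Arrays.<Object>asList("alice", 30))); // prints true
        System.out.println(isAdult.evaluate(Arrays.<Object>asList("bob", 12)));   // prints false
    }
}
```

Each SQL operator becomes one small expression object, so covering a new
operator would mean adding one more factory method of this shape.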



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2656: [BEAM-2018] refine expression of Calcite method/fun...

2017-04-22 Thread XuMingmin
GitHub user XuMingmin opened a pull request:

https://github.com/apache/beam/pull/2656

[BEAM-2018] refine expression of Calcite method/function

In this PR, `BeamSQLFnExecutor` is introduced to replace 
`BeamSQLSpELExecutor`, as the executor of `SqlOperator` expressions.

Several common operators are added here as a reference; more are needed to 
cover all operators defined in `org.apache.calcite.sql.fun.SqlStdOperatorTable`.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/XuMingmin/beam BEAM-2018

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2656.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2656


commit 3ce5f7bae2b5a7e13c9dd2abf9d4111739540175
Author: mingmxu 
Date:   2017-04-23T05:20:25Z

redesign BeamSqlExpression to execute Calcite SQL expression.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (BEAM-2044) Downgrade HBaseIO to use the stable HBase client version (1.2.x)

2017-04-22 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré resolved BEAM-2044.

Resolution: Fixed

> Downgrade HBaseIO to use the stable HBase client version (1.2.x)
> 
>
> Key: BEAM-2044
> URL: https://issues.apache.org/jira/browse/BEAM-2044
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Affects Versions: First stable release
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Trivial
> Fix For: First stable release
>
>
> We must follow the stable version of the client dependency for HBaseIO.





[jira] [Commented] (BEAM-2044) Downgrade HBaseIO to use the stable HBase client version (1.2.x)

2017-04-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980270#comment-15980270
 ] 

ASF GitHub Bot commented on BEAM-2044:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2648


> Downgrade HBaseIO to use the stable HBase client version (1.2.x)
> 
>
> Key: BEAM-2044
> URL: https://issues.apache.org/jira/browse/BEAM-2044
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Affects Versions: First stable release
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Trivial
> Fix For: First stable release
>
>
> We must follow the stable version of the client dependency for HBaseIO.





[GitHub] beam pull request #2648: [BEAM-2044] Downgrade HBaseIO to use the stable HBa...

2017-04-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2648




[1/2] beam git commit: [BEAM-2044] Downgrade HBaseIO to use the stable HBase client version (1.2.x)

2017-04-22 Thread jbonofre
Repository: beam
Updated Branches:
  refs/heads/master be696fc20 -> 1dce98f07


[BEAM-2044] Downgrade HBaseIO to use the stable HBase client version (1.2.x)


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/f5c0121d
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/f5c0121d
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/f5c0121d

Branch: refs/heads/master
Commit: f5c0121db73f560514d548c381d337d5b57468b8
Parents: be696fc
Author: Ismaël Mejía 
Authored: Sat Apr 22 14:38:20 2017 +0200
Committer: Jean-Baptiste Onofré 
Committed: Sun Apr 23 07:00:26 2017 +0200

--
 sdks/java/io/hbase/pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/f5c0121d/sdks/java/io/hbase/pom.xml
--
diff --git a/sdks/java/io/hbase/pom.xml b/sdks/java/io/hbase/pom.xml
index 3695bcb..d8cb95f 100644
--- a/sdks/java/io/hbase/pom.xml
+++ b/sdks/java/io/hbase/pom.xml
@@ -31,7 +31,7 @@
   Library to read and write from/to HBase
 
   
-1.3.1
+1.2.5
 2.5.1
   
 



[2/2] beam git commit: [BEAM-2044] This closes #2648

2017-04-22 Thread jbonofre
[BEAM-2044] This closes #2648


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/1dce98f0
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/1dce98f0
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/1dce98f0

Branch: refs/heads/master
Commit: 1dce98f0745e9c1cc4d83a7532895f2f28ec86bf
Parents: be696fc f5c0121
Author: Jean-Baptiste Onofré 
Authored: Sun Apr 23 07:17:38 2017 +0200
Committer: Jean-Baptiste Onofré 
Committed: Sun Apr 23 07:17:38 2017 +0200

--
 sdks/java/io/hbase/pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--




[jira] [Commented] (BEAM-1940) Add Capabilities info to "Built-in I/O Transforms" page

2017-04-22 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980269#comment-15980269
 ] 

Aviem Zur commented on BEAM-1940:
-

This may be a bit much for the table in 
https://beam.apache.org/documentation/io/built-in/
However, the IO names there are links to their respective package on Beam's 
repo on GitHub. We could add `README.MD` to these packages with the information 
you suggest.

> Add Capabilities info to "Built-in I/O Transforms" page
> ---
>
> Key: BEAM-1940
> URL: https://issues.apache.org/jira/browse/BEAM-1940
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Stephen Sisk
>Assignee: Stephen Sisk
>
> It'd be helpful for users to understand the various capabilities of the I/O 
> transforms. 
> Capabilities/fields I think it'd be useful to list per datastore+language: 
> * Read - Singleton/Parallel/Dynamic Work Rebalancing supported
> * Write - Singleton/Parallel/Dynamic Work Rebalancing supported
> * Testing - unit/Small IT/Multi-node IT tests run as part of CI(when they're 
> available :)
> It may also be useful to list high-impact Jira issues, but I'm not sure if 
> we'll be keeping that up to date or not so I hesitate to add it. If there is 
> a long-standing high-impact issue, there's nothing stopping us from just 
> adding a note in that case.





[jira] [Comment Edited] (BEAM-1940) Add Capabilities info to "Built-in I/O Transforms" page

2017-04-22 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980269#comment-15980269
 ] 

Aviem Zur edited comment on BEAM-1940 at 4/23/17 5:13 AM:
--

This may be a bit much for the table in 
https://beam.apache.org/documentation/io/built-in/
However, the IO names there are links to their respective package on Beam's 
repo on GitHub. We could add {{README.MD}} to these packages with the 
information you suggest.


was (Author: aviemzur):
This may be a bit much for the table in 
https://beam.apache.org/documentation/io/built-in/
However, the IO names there are links to their respective package on Beam's 
repo on GitHub. We could add `README.MD` to these packages with the information 
you suggest.

> Add Capabilities info to "Built-in I/O Transforms" page
> ---
>
> Key: BEAM-1940
> URL: https://issues.apache.org/jira/browse/BEAM-1940
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Stephen Sisk
>Assignee: Stephen Sisk
>
> It'd be helpful for users to understand the various capabilities of the I/O 
> transforms. 
> Capabilities/fields I think it'd be useful to list per datastore+language: 
> * Read - Singleton/Parallel/Dynamic Work Rebalancing supported
> * Write - Singleton/Parallel/Dynamic Work Rebalancing supported
> * Testing - unit/Small IT/Multi-node IT tests run as part of CI(when they're 
> available :)
> It may also be useful to list high-impact Jira issues, but I'm not sure if 
> we'll be keeping that up to date or not so I hesitate to add it. If there is 
> a long-standing high-impact issue, there's nothing stopping us from just 
> adding a note in that case.





[jira] [Comment Edited] (BEAM-2027) get error sometimes while running the same code using beam0.6

2017-04-22 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980259#comment-15980259
 ] 

Aviem Zur edited comment on BEAM-2027 at 4/23/17 4:26 AM:
--

No problem, you're welcome!


was (Author: aviemzur):
No problem!

> get error sometimes while running the same code using beam0.6
> -
>
> Key: BEAM-2027
> URL: https://issues.apache.org/jira/browse/BEAM-2027
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core, runner-spark
> Environment: spark-1.6.2-bin-hadoop2.6, hadoop-2.6.0, source:hdfs 
> sink:hdfs
>Reporter: liyuntian
>Assignee: Aviem Zur
> Fix For: Not applicable
>
>
> I run a YARN job using Beam 0.6.0 that reads a file from HDFS and writes 
> records to HDFS, using spark-1.6.2-bin-hadoop2.6 and hadoop-2.6.0. I sometimes 
> get the error below: 
> 17/04/20 21:10:45 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 
> (TID 0, etl-develop-003): java.io.InvalidClassException: 
> org.apache.beam.runners.spark.coders.CoderHelpers$3; local class 
> incompatible: stream classdesc serialVersionUID = 1334222146820528045, local 
> class serialVersionUID = 5119956493581628999
>   at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
>   at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>   at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>   at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> 

[jira] [Resolved] (BEAM-2027) get error sometimes while running the same code using beam0.6

2017-04-22 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur resolved BEAM-2027.
-
   Resolution: Not A Bug
Fix Version/s: Not applicable

No problem!

> get error sometimes while running the same code using beam0.6
> -
>
> Key: BEAM-2027
> URL: https://issues.apache.org/jira/browse/BEAM-2027
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core, runner-spark
> Environment: spark-1.6.2-bin-hadoop2.6, hadoop-2.6.0, source:hdfs 
> sink:hdfs
>Reporter: liyuntian
>Assignee: Aviem Zur
> Fix For: Not applicable
>
>
> I run a YARN job using Beam 0.6.0 that reads a file from HDFS and writes 
> records to HDFS, using spark-1.6.2-bin-hadoop2.6 and hadoop-2.6.0. I sometimes 
> get the error below: 
> 17/04/20 21:10:45 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 
> (TID 0, etl-develop-003): java.io.InvalidClassException: 
> org.apache.beam.runners.spark.coders.CoderHelpers$3; local class 
> incompatible: stream classdesc serialVersionUID = 1334222146820528045, local 
> class serialVersionUID = 5119956493581628999
>   at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
>   at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>   at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>   at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>  

[jira] [Commented] (BEAM-2027) get error sometimes while running the same code using beam0.6

2017-04-22 Thread liyuntian (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980245#comment-15980245
 ] 

liyuntian commented on BEAM-2027:
-

My application's jar is a fat jar. I submit the Spark job to yarn-cluster from 
an agent, but I had earlier put the Beam 0.5 jars in SPARK_HOME by mistake. Now 
I use Beam 0.6 and forgot to delete the Beam 0.5 jars. Perhaps the driver first 
picks up the Beam 0.5 jars from SPARK_HOME and only then the jars from the fat 
jar. When the driver finds Beam 0.5 in SPARK_HOME, it serializes the job with 
those classes, and the executors deserialize the job using the fat jar, which 
contains Beam 0.6. After I deleted the Beam 0.5 jars from SPARK_HOME, the job 
ran successfully. Thank you.
It is not a streaming application. I will submit Spark jobs using Beam later. 
Thank you very much.
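
A version mix-up like the one described above can be checked from inside the
JVM. The sketch below is an illustration, not part of the original thread: the
`SerialVersionCheck` class and its `Payload` example type are hypothetical, and
only standard JDK calls (`ObjectStreamClass.lookup` and
`Class.getProtectionDomain`) are used. Running it on both the driver and an
executor for the class named in the exception would show whether the two sides
load that class from different jars.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.security.CodeSource;

// Hypothetical diagnostic helper; the class and method names are assumptions.
final class SerialVersionCheck {

    // Small serializable type used only as a default demo input.
    static final class Payload implements Serializable {
        private static final long serialVersionUID = 42L;
        int value;
    }

    // Returns the serialVersionUID that Java serialization uses for this class.
    static long serialVersionUidOf(Class<?> cls) {
        ObjectStreamClass descriptor = ObjectStreamClass.lookup(cls);
        if (descriptor == null) {
            throw new IllegalArgumentException(cls.getName() + " is not serializable");
        }
        return descriptor.getSerialVersionUID();
    }

    // Returns the jar or directory the class was loaded from, when it is known.
    static String loadedFrom(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown location)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name, or fall back to the demo type.
        Class<?> cls = args.length > 0 ? Class.forName(args[0]) : Payload.class;
        System.out.println("class:            " + cls.getName());
        System.out.println("serialVersionUID: " + serialVersionUidOf(cls));
        System.out.println("loaded from:      " + loadedFrom(cls));
    }
}
```

If the reported location or serialVersionUID differs between driver and
executor, two builds of the same class are on the classpath, which matches the
`InvalidClassException` quoted in this issue.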

> get error sometimes while running the same code using beam0.6
> -
>
> Key: BEAM-2027
> URL: https://issues.apache.org/jira/browse/BEAM-2027
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core, runner-spark
> Environment: spark-1.6.2-bin-hadoop2.6, hadoop-2.6.0, source:hdfs 
> sink:hdfs
>Reporter: liyuntian
>Assignee: Aviem Zur
>
> I run a YARN job using Beam 0.6.0 that reads a file from HDFS and writes 
> records to HDFS, using spark-1.6.2-bin-hadoop2.6 and hadoop-2.6.0. I sometimes 
> get the error below: 
> 17/04/20 21:10:45 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 
> (TID 0, etl-develop-003): java.io.InvalidClassException: 
> org.apache.beam.runners.spark.coders.CoderHelpers$3; local class 
> incompatible: stream classdesc serialVersionUID = 1334222146820528045, local 
> class serialVersionUID = 5119956493581628999
>   at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
>   at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>   at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at 

[GitHub] beam pull request #2655: Rename openWindows => windowsThatAreOpen

2017-04-22 Thread wtanaka
GitHub user wtanaka opened a pull request:

https://github.com/apache/beam/pull/2655

Rename openWindows => windowsThatAreOpen

to make its name have parallel structure with windowsThatShouldFire

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wtanaka/beam rename

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2655.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2655


commit 59a580735384fd72374f263588b41819e9a415e1
Author: wtanaka.com 
Date:   2017-04-23T01:48:32Z

Rename openWindows => windowsThatAreOpen

to make its name have parallel structure with windowsThatShouldFire






[jira] [Assigned] (BEAM-2010) expose programming interface

2017-04-22 Thread Xu Mingmin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Mingmin reassigned BEAM-2010:


Assignee: (was: Xu Mingmin)

> expose programming interface
> 
>
> Key: BEAM-2010
> URL: https://issues.apache.org/jira/browse/BEAM-2010
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Xu Mingmin
>
> Expose programming interfaces to use BeamSQL. With these APIs, developers can 
> inject SQL in their code, to help create a pipeline from SQL statements. A 
> proposed use case looks like:
> {code}
> BeamSQLEnvironment sqlEnv = BeamSQLEnvironment.create(options);
> sqlEnv.addAsSource(BeamSqlTable table);
> //...
> sqlEnv.addAsSink(BeamSqlTable table);
> //...
> Pipeline pipeline = sqlEnv.explainPipeline(String sqlStatement);
> pipeline.run().waitUntilFinish();
> {code}





[jira] [Assigned] (BEAM-2008) aggregation functions support

2017-04-22 Thread Xu Mingmin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Mingmin reassigned BEAM-2008:


Assignee: (was: Xu Mingmin)

> aggregation functions support
> -
>
> Key: BEAM-2008
> URL: https://issues.apache.org/jira/browse/BEAM-2008
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Xu Mingmin
>
> Support commonly used aggregation functions in SQL, including:
> COUNT
> SUM
> MAX
> MIN
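
For orientation only, the four aggregates listed above can be computed in plain
Java as sketched below. This is not BeamSQL code and says nothing about how the
feature would be implemented in Beam: `AggregateSketch` is a hypothetical name,
and the JDK's `LongSummaryStatistics` stands in for a SQL engine.

```java
import java.util.LongSummaryStatistics;
import java.util.stream.LongStream;

// Hypothetical sketch: plain JDK code, unrelated to the BeamSQL implementation.
final class AggregateSketch {

    // Computes COUNT, SUM, MIN and MAX over one column of values in a single pass.
    static LongSummaryStatistics aggregate(long... values) {
        return LongStream.of(values).summaryStatistics();
    }

    public static void main(String[] args) {
        LongSummaryStatistics stats = aggregate(3, 9, 4);
        System.out.println("COUNT = " + stats.getCount()); // prints COUNT = 3
        System.out.println("SUM   = " + stats.getSum());   // prints SUM   = 16
        System.out.println("MAX   = " + stats.getMax());   // prints MAX   = 9
        System.out.println("MIN   = " + stats.getMin());   // prints MIN   = 3
    }
}
```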





[jira] [Assigned] (BEAM-2055) Merge AfterWatermarkEarlyAndLate and FromEndOfWindow

2017-04-22 Thread Wesley Tanaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wesley Tanaka reassigned BEAM-2055:
---

Assignee: (was: Davor Bonaci)

> Merge AfterWatermarkEarlyAndLate and FromEndOfWindow
> 
>
> Key: BEAM-2055
> URL: https://issues.apache.org/jira/browse/BEAM-2055
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: 0.6.0
>Reporter: Wesley Tanaka
>Priority: Minor
>
> https://lists.apache.org/thread.html/6613a4d9fcf4318cc2567aa9ac67e79f3cc375877df815c75c178c42@%3Cdev.beam.apache.org%3E





[jira] [Assigned] (BEAM-2055) Merge AfterWatermarkEarlyAndLate and FromEndOfWindow

2017-04-22 Thread Wesley Tanaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wesley Tanaka reassigned BEAM-2055:
---

Assignee: Davor Bonaci  (was: Wesley Tanaka)

> Merge AfterWatermarkEarlyAndLate and FromEndOfWindow
> 
>
> Key: BEAM-2055
> URL: https://issues.apache.org/jira/browse/BEAM-2055
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: 0.6.0
>Reporter: Wesley Tanaka
>Assignee: Davor Bonaci
>Priority: Minor
>
> https://lists.apache.org/thread.html/6613a4d9fcf4318cc2567aa9ac67e79f3cc375877df815c75c178c42@%3Cdev.beam.apache.org%3E





[GitHub] beam pull request #2654: IntelliJ gives "ambiguous method call" error

2017-04-22 Thread wtanaka
GitHub user wtanaka opened a pull request:

https://github.com/apache/beam/pull/2654

IntelliJ gives "ambiguous method call" error

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wtanaka/beam intellij

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2654.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2654


commit 9745c409af6495eb360d8e481817aeb1c042f1c6
Author: wtanaka.com 
Date:   2017-04-22T22:47:38Z

IntelliJ gives "ambiguous method call" error






[GitHub] beam pull request #2653: Add comments to Trigger.java

2017-04-22 Thread wtanaka
GitHub user wtanaka opened a pull request:

https://github.com/apache/beam/pull/2653

Add comments to Trigger.java



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wtanaka/beam comments

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2653.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2653


commit 7193c0460fe741308fe16745a5090ac5dbf07c2a
Author: wtanaka.com 
Date:   2017-04-22T22:14:35Z

Add comments to Trigger.java






[GitHub] beam pull request #2652: Remove unused private fields

2017-04-22 Thread wtanaka
GitHub user wtanaka opened a pull request:

https://github.com/apache/beam/pull/2652

Remove unused private fields



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wtanaka/beam dev

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2652.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2652


commit d45df9f6265a443dfe85fdcb713cf27e751199f3
Author: wtanaka.com 
Date:   2017-04-22T21:44:17Z

Remove unused private fields






Jenkins build is back to stable : beam_PostCommit_Java_ValidatesRunner_Dataflow #2912

2017-04-22 Thread Apache Jenkins Server
See 




[jira] [Created] (BEAM-2055) Merge AfterWatermarkEarlyAndLate and FromEndOfWindow

2017-04-22 Thread Wesley Tanaka (JIRA)
Wesley Tanaka created BEAM-2055:
---

 Summary: Merge AfterWatermarkEarlyAndLate and FromEndOfWindow
 Key: BEAM-2055
 URL: https://issues.apache.org/jira/browse/BEAM-2055
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Affects Versions: 0.6.0
Reporter: Wesley Tanaka
Assignee: Wesley Tanaka
Priority: Minor


https://lists.apache.org/thread.html/6613a4d9fcf4318cc2567aa9ac67e79f3cc375877df815c75c178c42@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build became unstable: beam_PostCommit_Java_ValidatesRunner_Spark #1752

2017-04-22 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #2911

2017-04-22 Thread Apache Jenkins Server
See 




[jira] [Closed] (BEAM-1428) KinesisIO should comply with PTransform style guide

2017-04-22 Thread Eugene Kirpichov (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov closed BEAM-1428.
--
   Resolution: Fixed
Fix Version/s: First stable release

> KinesisIO should comply with PTransform style guide
> ---
>
> Key: BEAM-1428
> URL: https://issues.apache.org/jira/browse/BEAM-1428
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
>  Labels: backward-incompatible, starter
> Fix For: First stable release
>
>
> Suggested changes:
> - KinesisIO.Read should be a PTransform itself
> - It should have builder methods .withBlah() for setting the parameters, 
> instead of the current somewhat strange combination of the from() factory 
> methods and the using() methods
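
The builder-method style suggested above can be sketched in plain Java, without any Beam dependency. This is only an illustration under stated assumptions: `ReadSketch`, `withStreamName`, `withInitialPosition` and `expand` are hypothetical names and are not the actual `KinesisIO.Read` API. The sketch shows each `with...()` call returning a new immutable instance, with validation deferred until the configured object is used.

```java
import java.util.Objects;

/**
 * Hypothetical, simplified stand-in for a read transform that is configured
 * through immutable with*() methods. This is NOT the real KinesisIO.Read.
 */
final class ReadSketch {
  private final String streamName;      // null until withStreamName() is called
  private final String initialPosition; // null until withInitialPosition() is called

  private ReadSketch(String streamName, String initialPosition) {
    this.streamName = streamName;
    this.initialPosition = initialPosition;
  }

  /** Entry point that returns an unconfigured instance. */
  static ReadSketch read() {
    return new ReadSketch(null, null);
  }

  /** Returns a copy with the stream name set; the receiver is left unchanged. */
  ReadSketch withStreamName(String name) {
    return new ReadSketch(Objects.requireNonNull(name, "name"), initialPosition);
  }

  /** Returns a copy with the initial position set; the receiver is left unchanged. */
  ReadSketch withInitialPosition(String position) {
    return new ReadSketch(streamName, Objects.requireNonNull(position, "position"));
  }

  /** Validates the configuration at use time rather than at configuration time. */
  String expand() {
    if (streamName == null || initialPosition == null) {
      throw new IllegalStateException("stream name and initial position are required");
    }
    return "read(" + streamName + ", " + initialPosition + ")";
  }
}
```

Because every `with...()` method returns a new object, a partially configured instance can be shared and specialised in several places without one caller affecting another.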





[jira] [Commented] (BEAM-1428) KinesisIO should comply with PTransform style guide

2017-04-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980094#comment-15980094
 ] 

ASF GitHub Bot commented on BEAM-1428:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2620


> KinesisIO should comply with PTransform style guide
> ---
>
> Key: BEAM-1428
> URL: https://issues.apache.org/jira/browse/BEAM-1428
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
>  Labels: backward-incompatible, starter
>
> Suggested changes:
> - KinesisIO.Read should be a PTransform itself
> - It should have builder methods .withBlah() for setting the parameters, 
> instead of the current somewhat strange combination of the from() factory 
> methods and the using() methods





[GitHub] beam pull request #2620: [BEAM-1428] KinesisIO should comply with PTransform...

2017-04-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2620




[2/2] beam git commit: This closes #2620

2017-04-22 Thread jkff
This closes #2620


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/be696fc2
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/be696fc2
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/be696fc2

Branch: refs/heads/master
Commit: be696fc20a1c2f87a9125f46b31e72901cc06d28
Parents: 4a8e5d5 f420e62
Author: Eugene Kirpichov 
Authored: Sat Apr 22 12:14:12 2017 -0700
Committer: Eugene Kirpichov 
Committed: Sat Apr 22 12:14:12 2017 -0700

--
 sdks/java/io/kinesis/pom.xml|   6 +
 .../apache/beam/sdk/io/kinesis/KinesisIO.java   | 148 +--
 .../sdk/io/kinesis/KinesisMockReadTest.java |  14 +-
 .../beam/sdk/io/kinesis/KinesisReaderIT.java|  10 +-
 4 files changed, 119 insertions(+), 59 deletions(-)
--




[1/2] beam git commit: [BEAM-1428] KinesisIO should comply with PTransform style guide

2017-04-22 Thread jkff
Repository: beam
Updated Branches:
  refs/heads/master 4a8e5d5f9 -> be696fc20


[BEAM-1428] KinesisIO should comply with PTransform style guide


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/f420e62a
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/f420e62a
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/f420e62a

Branch: refs/heads/master
Commit: f420e62a12706a667883cead315c9fe4cc482220
Parents: 4a8e5d5
Author: Eugene Kirpichov 
Authored: Thu Apr 20 16:48:23 2017 -0700
Committer: Eugene Kirpichov 
Committed: Sat Apr 22 12:13:57 2017 -0700

--
 sdks/java/io/kinesis/pom.xml|   6 +
 .../apache/beam/sdk/io/kinesis/KinesisIO.java   | 148 +--
 .../sdk/io/kinesis/KinesisMockReadTest.java |  14 +-
 .../beam/sdk/io/kinesis/KinesisReaderIT.java|  10 +-
 4 files changed, 119 insertions(+), 59 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/f420e62a/sdks/java/io/kinesis/pom.xml
--
diff --git a/sdks/java/io/kinesis/pom.xml b/sdks/java/io/kinesis/pom.xml
index 7ed2f95..25600ff 100644
--- a/sdks/java/io/kinesis/pom.xml
+++ b/sdks/java/io/kinesis/pom.xml
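@@ -120,6 +120,12 @@
     </dependency>
 
     <dependency>
+      <groupId>com.google.auto.value</groupId>
+      <artifactId>auto-value</artifactId>
+      <scope>provided</scope>
+    </dependency>
+
+    <dependency>
       <groupId>junit</groupId>
       <artifactId>junit</artifactId>
       <scope>test</scope>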

http://git-wip-us.apache.org/repos/asf/beam/blob/f420e62a/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
--
diff --git 
a/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
 
b/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
index 45a7b2d..c97316d 100644
--- 
a/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
+++ 
b/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
@@ -18,6 +18,7 @@
 package org.apache.beam.sdk.io.kinesis;
 
 
+import static com.google.common.base.Preconditions.checkArgument;
 import static com.google.common.base.Preconditions.checkNotNull;
 
 import com.amazonaws.auth.AWSCredentialsProvider;
@@ -27,26 +28,27 @@ import com.amazonaws.regions.Regions;
 import com.amazonaws.services.kinesis.AmazonKinesis;
 import com.amazonaws.services.kinesis.AmazonKinesisClient;
 import 
com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream;
+import com.google.auto.value.AutoValue;
+import javax.annotation.Nullable;
 import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.io.BoundedReadFromUnboundedSource;
 import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.joda.time.Duration;
 import org.joda.time.Instant;
 
 /**
  * {@link PTransform}s for reading from
  * <a href="https://aws.amazon.com/kinesis/">Kinesis</a> streams.
  *
- * Usage
- *
- * Main class you're going to operate is called {@link KinesisIO}.
- * It follows the usage conventions laid out by other *IO classes like
- * BigQueryIO or PubsubIOLet's see how you can set up a simple Pipeline, which 
reads from Kinesis:
+ * Example usage:
  *
  * {@code
- * p.
- *   apply(KinesisIO.Read.
- * from("streamName", InitialPositionInStream.LATEST).
- * using("AWS_KEY", _"AWS_SECRET", STREAM_REGION).
- * apply( ... ) // other transformations
+ * p.apply(KinesisIO.read()
+ * .from("streamName", InitialPositionInStream.LATEST)
+ * .withClientProvider("AWS_KEY", _"AWS_SECRET", STREAM_REGION)
+ *  .apply( ... ) // other transformations
  * }
  *
  * As you can see you need to provide 3 things:
@@ -81,80 +83,132 @@ import org.joda.time.Instant;
  * Usage is pretty straightforward:
  *
  * {@code
- * p.
- *   apply(KinesisIO.Read.
- *from("streamName", InitialPositionInStream.LATEST).
- *using(MyCustomKinesisClientProvider()).
- *apply( ... ) // other transformations
+ * p.apply(KinesisIO.read()
+ *.from("streamName", InitialPositionInStream.LATEST)
+ *.withClientProvider(new MyCustomKinesisClientProvider())
+ *  .apply( ... ) // other transformations
  * }
  *
  * There’s also possibility to start reading using arbitrary point in 
time -
  * in this case you need to provide {@link Instant} object:
  *
  * {@code
- * p.
- *   apply(KinesisIO.Read.
- * from("streamName", instant).
- * using(MyCustomKinesisClientProvider()).
- * apply( ... ) // other transformations
+ * p.apply(KinesisIO.read()
+ * .from("streamName", instant)
+ * .withClientProvider(new MyCustomKinesisClientProvider())
+ *  .apply( ... ) // other transformations
  * }
  *
  */
 @Experimental
 public final 

[jira] [Commented] (BEAM-2054) Upgrade dataflow.version to v1b3-rev196-1.22.0

2017-04-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980069#comment-15980069
 ] 

ASF GitHub Bot commented on BEAM-2054:
--

GitHub user drieber opened a pull request:

https://github.com/apache/beam/pull/2651

[BEAM-2054] Update pom.xml: upgrade dataflow.version to v1b3-rev196-1.22.0

Upgrade dataflow.version to v1b3-rev196-1.22.0



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/drieber/incubator-beam patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2651.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2651


commit 3a5f8de54f1cb0555eaeffb6e34dd4392368be1a
Author: drieber 
Date:   2017-04-22T18:20:13Z

Update pom.xml

Upgrade dataflow.version to v1b3-rev196-1.22.0




> Upgrade dataflow.version to v1b3-rev196-1.22.0
> --
>
> Key: BEAM-2054
> URL: https://issues.apache.org/jira/browse/BEAM-2054
> Project: Beam
>  Issue Type: New Feature
>  Components: build-system
>Reporter: David Rieber
>Assignee: Davor Bonaci
>






[GitHub] beam pull request #2651: [BEAM-2054] Update pom.xml: upgrade dataflow.versio...

2017-04-22 Thread drieber
GitHub user drieber opened a pull request:

https://github.com/apache/beam/pull/2651

[BEAM-2054] Update pom.xml: upgrade dataflow.version to v1b3-rev196-1.22.0

Upgrade dataflow.version to v1b3-rev196-1.22.0



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/drieber/incubator-beam patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2651.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2651


commit 3a5f8de54f1cb0555eaeffb6e34dd4392368be1a
Author: drieber 
Date:   2017-04-22T18:20:13Z

Update pom.xml

Upgrade dataflow.version to v1b3-rev196-1.22.0






[jira] [Created] (BEAM-2054) Upgrade dataflow.version to v1b3-rev196-1.22.0

2017-04-22 Thread David Rieber (JIRA)
David Rieber created BEAM-2054:
--

 Summary: Upgrade dataflow.version to v1b3-rev196-1.22.0
 Key: BEAM-2054
 URL: https://issues.apache.org/jira/browse/BEAM-2054
 Project: Beam
  Issue Type: New Feature
  Components: build-system
Reporter: David Rieber
Assignee: Davor Bonaci








[GitHub] beam pull request #2650: Remove unnecessary semicolons

2017-04-22 Thread wtanaka
GitHub user wtanaka opened a pull request:

https://github.com/apache/beam/pull/2650

Remove unnecessary semicolons


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wtanaka/beam semicolons

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2650.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2650


commit a1e99b16bdca8a1832037ac6b49a20541c3e0f3c
Author: wtanaka.com 
Date:   2017-04-22T18:18:58Z

Remove unnecessary semicolons






[jira] [Updated] (BEAM-1672) Accumulable MetricsContainers.

2017-04-22 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1672:

Description: Make {{MetricsContainer}} accumulable. This can reduce 
duplication between runners and make implementing metrics easier for runners 
which have accumulators.  (was: Make {{MetricsContainer}} accumulable. This 
can reduce duplication between runners and make implementing metrics easier.)

> Accumulable MetricsContainers.
> --
>
> Key: BEAM-1672
> URL: https://issues.apache.org/jira/browse/BEAM-1672
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Make {{MetricsContainer}} accumulable. This can reduce duplication between 
> runners and make implementing metrics easier for runners which have 
> accumulators.





[jira] [Commented] (BEAM-1672) Accumulable MetricsContainers.

2017-04-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980004#comment-15980004
 ] 

ASF GitHub Bot commented on BEAM-1672:
--

GitHub user aviemzur opened a pull request:

https://github.com/apache/beam/pull/2649

[BEAM-1672] Accumulable MetricsContainers.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aviemzur/beam accumulable-metricscontainer

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2649.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2649


commit 9f81af4b1a71af97672956e75298cff115c63fad
Author: Aviem Zur 
Date:   2017-04-22T14:45:35Z

[BEAM-1672] Make MetricsContainers accumulable.

commit 1a208e2dd7128a25cf62ee75fae4078b2af665f9
Author: Aviem Zur 
Date:   2017-04-22T16:00:11Z

[BEAM-1672] AccumulatedMetricsResults

commit f3f9b588eb473705506dc4bbc2020e3628e2e843
Author: Aviem Zur 
Date:   2017-04-22T16:00:51Z

[BEAM-1672] Use Accumulable MetricsContainers in Spark runner.

commit f41466b23ba475e141fde6d3db8c5c2fc14470c9
Author: Aviem Zur 
Date:   2017-04-22T16:01:38Z

[BEAM-1672] Use Accumulable MetricsContainers in Flink runner.




> Accumulable MetricsContainers.
> --
>
> Key: BEAM-1672
> URL: https://issues.apache.org/jira/browse/BEAM-1672
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Make {{MetricsContainer}} accumulable. This can reduce duplication between 
> runners and make implementing metrics easier.
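
As a rough illustration of what "accumulable" could mean here, the sketch below shows a container whose contents can be folded into another container of the same type, which is the shape that accumulator mechanisms in runners typically expect. It is an assumption-laden sketch: `CounterContainerSketch` and its methods are hypothetical and do not reflect Beam's actual `MetricsContainer` API, and only simple counters are modelled.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified container; NOT Beam's MetricsContainer. */
final class CounterContainerSketch {
  // Counter name -> current value.
  private final Map<String, Long> counters = new HashMap<>();

  /** Adds delta to the named counter, creating it if it does not exist yet. */
  void inc(String name, long delta) {
    counters.merge(name, delta, Long::sum);
  }

  /** Returns the current value of the named counter, or 0 if it was never updated. */
  long get(String name) {
    return counters.getOrDefault(name, 0L);
  }

  /** Folds the counters of another container into this one and returns this. */
  CounterContainerSketch accumulate(CounterContainerSketch other) {
    other.counters.forEach((name, value) -> counters.merge(name, value, Long::sum));
    return this;
  }
}
```

Summing counters is associative and commutative, so partial containers produced on different workers can be merged in any order. Other metric types would need their own merge rule.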





[GitHub] beam pull request #2649: [BEAM-1672] Accumulable MetricsContainers.

2017-04-22 Thread aviemzur
GitHub user aviemzur opened a pull request:

https://github.com/apache/beam/pull/2649

[BEAM-1672] Accumulable MetricsContainers.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aviemzur/beam accumulable-metricscontainer

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2649.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2649


commit 9f81af4b1a71af97672956e75298cff115c63fad
Author: Aviem Zur 
Date:   2017-04-22T14:45:35Z

[BEAM-1672] Make MetricsContainers accumulable.

commit 1a208e2dd7128a25cf62ee75fae4078b2af665f9
Author: Aviem Zur 
Date:   2017-04-22T16:00:11Z

[BEAM-1672] AccumulatedMetricsResults

commit f3f9b588eb473705506dc4bbc2020e3628e2e843
Author: Aviem Zur 
Date:   2017-04-22T16:00:51Z

[BEAM-1672] Use Accumulable MetricsContainers in Spark runner.

commit f41466b23ba475e141fde6d3db8c5c2fc14470c9
Author: Aviem Zur 
Date:   2017-04-22T16:01:38Z

[BEAM-1672] Use Accumulable MetricsContainers in Flink runner.






[jira] [Updated] (BEAM-1672) Accumulable MetricsContainers.

2017-04-22 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1672:

Description: Make {{MetricsContainer}} accumulable. This can reduce 
duplication between runners and make implementing metrics easier.  (was: 
Refactor metrics implementations to reduce duplication.

For example: Making {{MetricsContainer}} accumulable could reduce LOC in 
several runners and make implementing metrics easier.)

> Accumulable MetricsContainers.
> --
>
> Key: BEAM-1672
> URL: https://issues.apache.org/jira/browse/BEAM-1672
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Make {{MetricsContainer}} accumulable. This can reduce duplication between 
> runners and make implementing metrics easier.





Jenkins build became unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #2910

2017-04-22 Thread Apache Jenkins Server
See 




[jira] [Updated] (BEAM-1672) Reduce duplication in runners metrics implementations

2017-04-22 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1672:

Description: 
Refactor metrics implementations to reduce duplication.

For example: Making {{MetricsContainer}} accumulable could reduce LOC in 
several runners.

  was:
Extract interface {{MetricData}} with common functionality in 
{{DistributionData}} and {{GaugeData}}. Create a similar {{CounterData}} class.
Try to refactor code which uses these methods to use the common interface 
(without compromising performance).


> Reduce duplication in runners metrics implementations
> -
>
> Key: BEAM-1672
> URL: https://issues.apache.org/jira/browse/BEAM-1672
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Refactor metrics implementations to reduce duplication.
> For example: Making {{MetricsContainer}} accumulable could reduce LOC in 
> several runners.





[jira] [Updated] (BEAM-1672) Reduce duplication in runners metrics implementations

2017-04-22 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1672:

Description: 
Refactor metrics implementations to reduce duplication.

For example: Making {{MetricsContainer}} accumulable could reduce LOC in 
several runners and make implementing metrics easier.

  was:
Refactor metrics implementations to reduce duplication.

For example: Making {{MetricsContainer}} accumulable could reduce LOC in 
several runners.


> Reduce duplication in runners metrics implementations
> -
>
> Key: BEAM-1672
> URL: https://issues.apache.org/jira/browse/BEAM-1672
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Refactor metrics implementations to reduce duplication.
> For example: Making {{MetricsContainer}} accumulable could reduce LOC in 
> several runners and make implementing metrics easier.





[jira] [Updated] (BEAM-1672) Reduce duplication in runners metrics implementations

2017-04-22 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1672:

Summary: Reduce duplication in runners metrics implementations  (was: 
Extract interface MetricData)

> Reduce duplication in runners metrics implementations
> -
>
> Key: BEAM-1672
> URL: https://issues.apache.org/jira/browse/BEAM-1672
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Extract interface {{MetricData}} with common functionality in 
> {{DistributionData}} and {{GaugeData}}. Create a similar {{CounterData}} 
> class.
> Try to refactor code which uses these methods to use the common interface 
> (without compromising performance).





[jira] [Commented] (BEAM-2044) Downgrade HBaseIO to use the stable HBase client version (1.2.x)

2017-04-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979907#comment-15979907
 ] 

ASF GitHub Bot commented on BEAM-2044:
--

GitHub user iemejia opened a pull request:

https://github.com/apache/beam/pull/2648

[BEAM-2044] Downgrade HBaseIO to use the stable HBase client version …

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/iemejia/beam BEAM-2044-downgrade-hbase-client

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2648.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2648


commit ed77a0ef33308cc1f2c8de61cf651b47b20bb2cc
Author: Ismaël Mejía 
Date:   2017-04-22T12:38:20Z

[BEAM-2044] Downgrade HBaseIO to use the stable HBase client version (1.2.x)




> Downgrade HBaseIO to use the stable HBase client version (1.2.x)
> 
>
> Key: BEAM-2044
> URL: https://issues.apache.org/jira/browse/BEAM-2044
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Affects Versions: First stable release
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Trivial
> Fix For: First stable release
>
>
> We must follow the stable version of the client dependency for HBaseIO.





[jira] [Updated] (BEAM-2044) Downgrade HBaseIO to use the stable HBase client version (1.2.x)

2017-04-22 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-2044:
---
Description: We must follow the stable version of the client dependency for 
HBaseIO.  (was: An interesting fix on Scans on exhausted regions was added so 
this is worth the upgrade.)

> Downgrade HBaseIO to use the stable HBase client version (1.2.x)
> 
>
> Key: BEAM-2044
> URL: https://issues.apache.org/jira/browse/BEAM-2044
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Affects Versions: First stable release
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Trivial
> Fix For: First stable release
>
>
> We must follow the stable version of the client dependency for HBaseIO.





[jira] [Updated] (BEAM-2044) Downgrade HBaseIO to use the stable HBase client version (1.2.x)

2017-04-22 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-2044:
---
Summary: Downgrade HBaseIO to use the stable HBase client version (1.2.x)  
(was: Upgrade HBaseIO to use HBase client version 1.3.1)

> Downgrade HBaseIO to use the stable HBase client version (1.2.x)
> 
>
> Key: BEAM-2044
> URL: https://issues.apache.org/jira/browse/BEAM-2044
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Affects Versions: First stable release
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Trivial
> Fix For: First stable release
>
>
> An interesting fix on Scans on exhausted regions was added so this is worth 
> the upgrade.





[jira] [Comment Edited] (BEAM-2027) get error sometimes while running the same code using beam0.6

2017-04-22 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979797#comment-15979797
 ] 

Aviem Zur edited comment on BEAM-2027 at 4/22/17 7:22 AM:
--

{{local class incompatible}} happens when trying to deserialize a Java class of 
an older version than the one in your classpath.
The class in question is 
{{org.apache.beam.runners.spark.coders.CoderHelpers$3}} which is a part of Beam.

I believe that in your application the version of Beam in the driver's classpath 
may differ from the one in the executor's classpath. 

Spark is serializing an instance of this class in the driver at one version and 
deserializing it in an executor which expects a different version.

A few questions about your setup:

Did you package your application's jar as a fat jar containing Beam jars?
If not: do you distribute Beam jars to your executors in some other way?

Is this a streaming application? (from your description it didn't sound like 
you used streaming).
If so, Spark streaming checkpoints certain serialized instances of classes. 
When changing these in your application and resuming from checkpoint (for 
example, upgrading Beam version) you can experience failures such as this. The 
solution for this is deleting the checkpoint and starting fresh.
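
For background on the two numbers in the error message: when a serializable class does not declare a `serialVersionUID`, Java serialization computes one from the class's structure (name, interfaces, fields, methods), so two differently compiled versions of the same class can end up with different values. The sketch below shows how to inspect the value the local JVM would use. `PinnedUid`, `DefaultUid` and `SerialUidDemo` are hypothetical names for illustration; whether `CoderHelpers$3` declares an explicit UID is not known from this thread.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

/** Declares its UID explicitly, so it stays fixed while the class evolves. */
final class PinnedUid implements Serializable {
  private static final long serialVersionUID = 1L;
  int value;
}

/** Declares no UID, so the runtime computes one from the class structure. */
final class DefaultUid implements Serializable {
  int value;
}

final class SerialUidDemo {
  /** Returns the serialVersionUID the local JVM associates with the given serializable class. */
  static long uidOf(Class<?> type) {
    return ObjectStreamClass.lookup(type).getSerialVersionUID();
  }

  public static void main(String[] args) {
    System.out.println("PinnedUid:  " + uidOf(PinnedUid.class));
    System.out.println("DefaultUid: " + uidOf(DefaultUid.class));
  }
}
```

Running the same lookup against the Beam jar on the driver and against the jar on an executor is one way to confirm whether the two classpaths really contain different builds of the class.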


was (Author: aviemzur):
{{local class incompatible}} happens when trying to deserialize a Java class of 
an older version than the one in your classpath.
The class in question is 
{{org.apache.beam.runners.spark.coders.CoderHelpers$3}} which is a part of Beam.

I believe in your case the version of Beam in the driver's classpath is 
different from the one in the executor's classpath. 

Spark is serializing an instance of this class in the driver at one version and 
deserializing it in an executor which expects a different version.

A few questions about your setup:

Did you package your application's jar as a fat jar containing Beam jars?
If not: do you distribute Beam jars to your executors in some other way?

Is this a streaming application? (from your description it didn't sound like 
you used streaming).
If so, Spark streaming checkpoints certain serialized instances of classes. 
When changing these in your application and resuming from checkpoint (for 
example, upgrading Beam version) you can experience failures such as this. The 
solution for this is deleting the checkpoint and starting fresh.

> get error sometimes while running the same code using beam0.6
> -
>
> Key: BEAM-2027
> URL: https://issues.apache.org/jira/browse/BEAM-2027
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core, runner-spark
> Environment: spark-1.6.2-bin-hadoop2.6, hadoop-2.6.0, source:hdfs 
> sink:hdfs
>Reporter: liyuntian
>Assignee: Aviem Zur
>
> I run a YARN job using Beam 0.6.0 that reads files from HDFS and writes 
> records to HDFS, with spark-1.6.2-bin-hadoop2.6 and hadoop-2.6.0. I sometimes 
> get the error below:
> 17/04/20 21:10:45 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 
> (TID 0, etl-develop-003): java.io.InvalidClassException: 
> org.apache.beam.runners.spark.coders.CoderHelpers$3; local class 
> incompatible: stream classdesc serialVersionUID = 1334222146820528045, local 
> class serialVersionUID = 5119956493581628999
>   at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
>   at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> 

[jira] [Comment Edited] (BEAM-2027) get error sometimes while running the same code using beam0.6

2017-04-22 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979797#comment-15979797
 ] 

Aviem Zur edited comment on BEAM-2027 at 4/22/17 7:21 AM:
--

{{local class incompatible}} happens when trying to deserialize a Java class of 
an older version than the one in your classpath.
The class in question is 
{{org.apache.beam.runners.spark.coders.CoderHelpers$3}} which is a part of Beam.

I believe in your case the version of Beam in the driver's classpath is 
different from the one in the executor's classpath. 

Spark is serializing an instance of this class in the driver at one version and 
deserializing it in an executor which expects a different version.

A few questions about your setup:

Did you package your application's jar as a fat jar containing Beam jars?
If not: do you distribute Beam jars to your executors in some other way?

Is this a streaming application? (from your description it didn't sound like 
you used streaming).
If so, Spark streaming checkpoints certain serialized instances of classes. 
When changing these in your application and resuming from checkpoint (for 
example, upgrading Beam version) you can experience failures such as this. The 
solution for this is deleting the checkpoint and starting fresh.


was (Author: aviemzur):
{{local class incompatible}} happens when trying to deserialize a Java class of 
an older version than the one in your classpath.
The class in question is 
{{org.apache.beam.runners.spark.coders.CoderHelpers$3}} which is a part of Beam.

In your case - the version of Beam in the driver's classpath is different than 
the one in the executor's classpath. 

Spark is serializing an instance of this class in the driver at one version and 
deserializing it in an executor which expects a different version.

A few questions about your setup:

Did you package your application's jar as a fat jar containing Beam jars?
If not: do you distribute Beam jars to your executors in some other way?

Is this a streaming application? (from your description it didn't sound like 
you used streaming).
If so, Spark streaming checkpoints certain serialized instances of classes. 
When changing these in your application and resuming from checkpoint (for 
example, upgrading Beam version) you can experience failures such as this. The 
solution for this is deleting the checkpoint and starting fresh.

> get error sometimes while running the same code using beam0.6
> -
>
> Key: BEAM-2027
> URL: https://issues.apache.org/jira/browse/BEAM-2027
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core, runner-spark
> Environment: spark-1.6.2-bin-hadoop2.6, hadoop-2.6.0, source:hdfs 
> sink:hdfs
>Reporter: liyuntian
>Assignee: Aviem Zur
>
> I run a YARN job using Beam 0.6.0 that reads files from HDFS and writes 
> records to HDFS, with spark-1.6.2-bin-hadoop2.6 and hadoop-2.6.0. I sometimes 
> get the error below:
> 17/04/20 21:10:45 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 
> (TID 0, etl-develop-003): java.io.InvalidClassException: 
> org.apache.beam.runners.spark.coders.CoderHelpers$3; local class 
> incompatible: stream classdesc serialVersionUID = 1334222146820528045, local 
> class serialVersionUID = 5119956493581628999
>   at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
>   at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at 

[2/3] beam git commit: [BEAM-1899] Add JStormRunnerRegistrar and empty implementations of PipelineRunner, RunnerResult, PipelineOptions.

2017-04-22 Thread pei
[BEAM-1899] Add JStormRunnerRegistrar and empty implementations of 
PipelineRunner, RunnerResult, PipelineOptions.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/15ebaf0f
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/15ebaf0f
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/15ebaf0f

Branch: refs/heads/jstorm-runner
Commit: 15ebaf0f77c6194f4f676644a2ff79fb24a1
Parents: 9d4de1b
Author: Pei He 
Authored: Thu Apr 6 14:55:25 2017 +0800
Committer: Pei He 
Committed: Sat Apr 22 15:05:20 2017 +0800

--
 .../runners/jstorm/JStormPipelineOptions.java   | 26 
 .../beam/runners/jstorm/JStormRunner.java   | 33 ++
 .../runners/jstorm/JStormRunnerRegistrar.java   | 55 +
 .../beam/runners/jstorm/JStormRunnerResult.java | 63 
 .../beam/runners/jstorm/package-info.java   | 22 +++
 .../jstorm/JStormRunnerRegistrarTest.java   | 49 +++
 6 files changed, 248 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/15ebaf0f/runners/jstorm/src/main/java/org/apache/beam/runners/jstorm/JStormPipelineOptions.java
--
diff --git a/runners/jstorm/src/main/java/org/apache/beam/runners/jstorm/JStormPipelineOptions.java b/runners/jstorm/src/main/java/org/apache/beam/runners/jstorm/JStormPipelineOptions.java
new file mode 100644
index 000..cc0aed5
--- /dev/null
+++ b/runners/jstorm/src/main/java/org/apache/beam/runners/jstorm/JStormPipelineOptions.java
@@ -0,0 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.jstorm;
+
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * {@link PipelineOptions} that configures the JStorm pipeline.
+ */
+public interface JStormPipelineOptions extends PipelineOptions {
+}

http://git-wip-us.apache.org/repos/asf/beam/blob/15ebaf0f/runners/jstorm/src/main/java/org/apache/beam/runners/jstorm/JStormRunner.java
--
diff --git a/runners/jstorm/src/main/java/org/apache/beam/runners/jstorm/JStormRunner.java b/runners/jstorm/src/main/java/org/apache/beam/runners/jstorm/JStormRunner.java
new file mode 100644
index 000..78df4ca
--- /dev/null
+++ b/runners/jstorm/src/main/java/org/apache/beam/runners/jstorm/JStormRunner.java
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.jstorm;
+
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.runners.PipelineRunner;
+
+/**
+ * A {@link PipelineRunner} that translates the {@link Pipeline} to a JStorm DAG and executes it
+ * either locally or on a JStorm cluster.
+ */
+public class JStormRunner extends PipelineRunner<JStormRunnerResult> {
+
+  @Override
+  public JStormRunnerResult run(Pipeline pipeline) {
+    throw new UnsupportedOperationException("This method is not yet implemented.");
+  }
+}

http://git-wip-us.apache.org/repos/asf/beam/blob/15ebaf0f/runners/jstorm/src/main/java/org/apache/beam/runners/jstorm/JStormRunnerRegistrar.java
--
diff --git 

[1/3] beam git commit: [BEAM-1899] Start jstorm runner module in feature branch.

2017-04-22 Thread pei
Repository: beam
Updated Branches:
  refs/heads/jstorm-runner a8edbb81f -> f6a89b0fc


[BEAM-1899] Start jstorm runner module in feature branch.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/9d4de1b2
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/9d4de1b2
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/9d4de1b2

Branch: refs/heads/jstorm-runner
Commit: 9d4de1b2f5bf4816cb834e08aae8c127d2dca4cf
Parents: a8edbb8
Author: Pei He 
Authored: Thu Apr 6 14:53:04 2017 +0800
Committer: Pei He 
Committed: Sat Apr 22 14:23:53 2017 +0800

--
 runners/jstorm/pom.xml | 194 
 runners/pom.xml|   1 +
 2 files changed, 195 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/9d4de1b2/runners/jstorm/pom.xml
--
diff --git a/runners/jstorm/pom.xml b/runners/jstorm/pom.xml
new file mode 100644
index 000..31a9b22
--- /dev/null
+++ b/runners/jstorm/pom.xml
@@ -0,0 +1,194 @@
+
+
+http://maven.apache.org/POM/4.0.0; 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; 
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd;>
+
+  4.0.0
+
+  
+org.apache.beam
+beam-runners-parent
+0.7.0-SNAPSHOT
+../pom.xml
+  
+  
+  beam-runners-jstorm
+
+  Apache Beam :: Runners :: JStorm
+
+  jar
+
+  
+2.2.1
+  
+  
+  
+
+  
+  local-validates-runner-tests
+  false
+  
+
+  
+org.apache.maven.plugins
+maven-surefire-plugin
+
+  
+validates-runner-tests
+integration-test
+
+  test
+
+
+  
+org.apache.beam.sdk.testing.ValidatesRunner
+  
+  none
+  true
+  
+
org.apache.beam:beam-sdks-java-core
+  
+  
+
+  [
+"--runner=org.apache.beam.runners.jstorm.JStormRunner"
+  ]
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+
+  com.alibaba.jstorm
+  jstorm-core
+  ${jstorm.core.version}
+  provided
+
+
+
+
+  org.apache.beam
+  beam-sdks-java-core
+  
+
+
+  org.slf4j
+  slf4j-jdk14
+
+  
+
+
+
+  org.apache.beam
+  beam-runners-core-java
+  
+
+
+  org.slf4j
+  slf4j-jdk14
+
+  
+
+
+
+  org.apache.beam
+  beam-runners-core-construction-java
+  
+
+
+  org.slf4j
+  slf4j-jdk14
+
+  
+
+
+
+
+com.google.auto.service
+auto-service
+true
+
+
+com.google.auto.value
+auto-value
+
+
+
+
+  org.apache.beam
+  beam-sdks-java-core
+  tests
+  test
+  
+
+
+  org.slf4j
+  slf4j-jdk14
+
+  
+
+
+
+
+  com.fasterxml.jackson.dataformat
+  jackson-dataformat-yaml
+  test
+
+
+
+
+junit
+junit
+test
+
+
+  org.hamcrest
+  hamcrest-all
+  test
+
+
+  org.mockito
+  mockito-all
+  test
+
+  
+
+  
+  
+
+  org.apache.maven.plugins
+  maven-dependency-plugin
+  
+
+  analyze-only
+  
+
+false
+  
+
+  
+
+  
+  
+

http://git-wip-us.apache.org/repos/asf/beam/blob/9d4de1b2/runners/pom.xml
--
diff --git a/runners/pom.xml b/runners/pom.xml
index 150e987..e06c39e 100644
--- a/runners/pom.xml
+++ b/runners/pom.xml
@@ -40,6 +40,7 @@
 google-cloud-dataflow-java
 spark
 apex
+jstorm
   
 
   



[3/3] beam git commit: This closes #2457

2017-04-22 Thread pei
This closes #2457


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/f6a89b0f
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/f6a89b0f
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/f6a89b0f

Branch: refs/heads/jstorm-runner
Commit: f6a89b0fc2428d2f85e087525a6ddb5361eed4cb
Parents: a8edbb8 15ebaf0
Author: Pei He 
Authored: Sat Apr 22 15:12:45 2017 +0800
Committer: Pei He 
Committed: Sat Apr 22 15:12:45 2017 +0800

--
 runners/jstorm/pom.xml  | 194 +++
 .../runners/jstorm/JStormPipelineOptions.java   |  26 +++
 .../beam/runners/jstorm/JStormRunner.java   |  33 
 .../runners/jstorm/JStormRunnerRegistrar.java   |  55 ++
 .../beam/runners/jstorm/JStormRunnerResult.java |  63 ++
 .../beam/runners/jstorm/package-info.java   |  22 +++
 .../jstorm/JStormRunnerRegistrarTest.java   |  49 +
 runners/pom.xml |   1 +
 8 files changed, 443 insertions(+)
--




[jira] [Commented] (BEAM-2027) get error sometimes while running the same code using beam0.6

2017-04-22 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979797#comment-15979797
 ] 

Aviem Zur commented on BEAM-2027:
-

{{local class incompatible}} happens when trying to deserialize a Java class of 
an older version than the one in your classpath.
The class in question is 
{{org.apache.beam.runners.spark.coders.CoderHelpers$3}} which is a part of Beam.

In your case - the version of Beam in the driver's classpath is different than 
the one in the executor's classpath. 

Spark is serializing an instance of this class in the driver at one version and 
deserializing it in an executor which expects a different version.

A few questions about your setup:

Did you package your application's jar as a fat jar containing Beam jars?
If not: do you distribute Beam jars to your executors in some other way?

Is this a streaming application? (from your description it didn't sound like 
you used streaming).
If so, Spark streaming checkpoints certain serialized instances of classes. 
When changing these in your application and resuming from checkpoint (for 
example, upgrading Beam version) you can experience failures such as this. The 
solution for this is deleting the checkpoint and starting fresh.
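
To help answer the packaging questions above, it can be useful to see which jar a class was actually loaded from on the driver and on an executor. The following is only a sketch under stated assumptions: `ClassOrigin` and `locationOf` are hypothetical names, not Beam or Spark APIs, and it uses only the standard `Class.getProtectionDomain()` and `java.security.CodeSource` calls.

```java
import java.security.CodeSource;

public class ClassOrigin {

    /**
     * Returns the location (typically a jar or directory URL) the class was
     * loaded from, or a placeholder when the JVM does not report one, as is
     * the case for core JDK classes.
     */
    public static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "<bootstrap or unknown>";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass fully qualified class names as arguments to inspect them.
        for (String name : args) {
            System.out.println(name + " -> " + locationOf(Class.forName(name)));
        }
    }
}
```

Logging this value for a Beam class from both the driver and a task running on an executor would show whether the two sides load Beam from the same fat jar or from different jars.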

> get error sometimes while running the same code using beam0.6
> -
>
> Key: BEAM-2027
> URL: https://issues.apache.org/jira/browse/BEAM-2027
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core, runner-spark
> Environment: spark-1.6.2-bin-hadoop2.6, hadoop-2.6.0, source:hdfs 
> sink:hdfs
>Reporter: liyuntian
>Assignee: Aviem Zur
>
> I run a YARN job using Beam 0.6.0 that reads files from HDFS and writes 
> records to HDFS, with spark-1.6.2-bin-hadoop2.6 and hadoop-2.6.0. I sometimes 
> get the error below:
> 17/04/20 21:10:45 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 
> (TID 0, etl-develop-003): java.io.InvalidClassException: 
> org.apache.beam.runners.spark.coders.CoderHelpers$3; local class 
> incompatible: stream classdesc serialVersionUID = 1334222146820528045, local 
> class serialVersionUID = 5119956493581628999
>   at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
>   at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>   at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at