[GitHub] beam pull request #2140: [BEAM-1590] DataflowRunner: experimental support fo...

2017-03-01 Thread dhalperi
GitHub user dhalperi opened a pull request:

https://github.com/apache/beam/pull/2140

[BEAM-1590] DataflowRunner: experimental support for issuing FnAPI based 
jobs

R: @kennknowles 
CC: @herohde @lukecwik 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dhalperi/beam fnapi-submission

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2140.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2140


commit 43e958beb47744a77d5296f956c78fd78bc24c2e
Author: Dan Halperin 
Date:   2017-03-02T07:06:11Z

DataflowRunner: experimental support for issuing FnAPI based jobs

Also clean up some code around checking for the existence of experiments.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (BEAM-1411) Unable to downgrade to lower guava version after upgrade to beam-0.5

2017-03-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1411:
--
Fix Version/s: First stable release

> Unable to downgrade to lower guava version after upgrade to beam-0.5
> 
>
> Key: BEAM-1411
> URL: https://issues.apache.org/jira/browse/BEAM-1411
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 0.5.0
>Reporter: Michael Luckey
>Assignee: Davor Bonaci
> Fix For: First stable release
>
>
> While testing an upgrade to the 0.5 version of Beam, we ran into NoSuchMethodErrors
> {noformat}
> java.lang.NoSuchMethodError: 
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
>  at 
> org.apache.beam.sdk.io.hdfs.HDFSFileSource$SerializableSplit.(HDFSFileSource.java:473)
>  at 
> org.apache.beam.sdk.io.hdfs.AvroHDFSFileSource$1.apply(AvroHDFSFileSource.java:81)
>  at 
> org.apache.beam.sdk.io.hdfs.AvroHDFSFileSource$1.apply(AvroHDFSFileSource.java:78)
>  at 
> com.google.common.collect.Lists$TransformingRandomAccessList.get(Lists.java:451)
>  at java.util.AbstractList$Itr.next(AbstractList.java:358)
>  at 
> org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$InputProvider.getInitialInputs(BoundedReadEvaluatorFactory.java:202)
>  at 
> org.apache.beam.runners.direct.RootProviderRegistry.getInitialInputs(RootProviderRegistry.java:65)
>  at 
> org.apache.beam.runners.direct.ExecutorServiceParallelExecutor.start(ExecutorServiceParallelExecutor.java:168)
>  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:329)
>  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:71)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:178)
>  at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:258)
> {noformat}
> This seems to be caused by the [HDFS 
> IO|https://github.com/apache/beam/tree/master/sdks/java/io/hdfs] component's 
> guava dependency not being shaded - in contrast to the core components - and 
> it was revealed by the [recent update to 
> guava-20.0|https://github.com/apache/beam/commit/0b4b2becb45b9f637ba31f599ebe8be0331bd633]
>  and the API changes that update incorporated by [overloading 
> methods|https://github.com/google/guava/commit/892e323fca32945cdfb25395ca6e346dd0fffa5b#diff-fe7358934fa6eba23c2791eb40cec030].
> I did not check whether more components are affected, too.
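For context on the failure mode: this is a link-time error. guava 20.0 added non-varargs overloads such as checkArgument(boolean, String, Object) (the ZLjava/lang/String;Ljava/lang/Object;)V descriptor in the stack trace), so code compiled against 20.0 binds to a method that does not exist in older guava jars. A minimal sketch (not from this thread) of probing for a signature via reflection; java.util.Objects stands in for com.google.common.base.Preconditions so the example runs without guava on the classpath.

```java
import java.lang.reflect.Method;

public class SignatureCheck {
    /** Returns true if className declares a public method with the given name and parameter types. */
    public static boolean hasMethod(String className, String method, Class<?>... params) {
        try {
            Class<?> cls = Class.forName(className);
            Method m = cls.getMethod(method, params);
            return m != null;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // In the real scenario you would probe com.google.common.base.Preconditions for
        // checkArgument(boolean, String, Object) to detect an incompatible guava version.
        System.out.println(hasMethod("java.util.Objects", "requireNonNull", Object.class)); // prints true
        System.out.println(hasMethod("java.util.Objects", "requireNonNull", int.class));    // prints false
    }
}
```

Shading (relocating guava into a Beam-internal package) avoids the problem entirely, which is what the core components already do.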



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1550) HBaseIO tests timeout precommit

2017-03-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin reassigned BEAM-1550:
-

Assignee: Ismaël Mejía  (was: Kenneth Knowles)

> HBaseIO tests timeout precommit
> ---
>
> Key: BEAM-1550
> URL: https://issues.apache.org/jira/browse/BEAM-1550
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Kenneth Knowles
>Assignee: Ismaël Mejía
>Priority: Blocker
>
> Since build 7819 of the Jenkins precommit, the HBaseIOTest 
> (beam-sdks-java-io-hbase) has been timing out:
> https://builds.apache.org/view/Beam/job/beam_PreCommit_Java_MavenInstall/7819/console





[jira] [Resolved] (BEAM-1467) Use well-known coder types for known window coders

2017-03-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin resolved BEAM-1467.
---
   Resolution: Fixed
Fix Version/s: 0.6.0

> Use well-known coder types for known window coders
> --
>
> Key: BEAM-1467
> URL: https://issues.apache.org/jira/browse/BEAM-1467
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-fn-api, beam-model-runner-api
>Reporter: Daniel Halperin
>Assignee: Vikas Kedigehalli
>Priority: Minor
> Fix For: 0.6.0
>
>
> Known window types include:
> * GlobalWindow
> * IntervalWindow
> * WindowedValueCoder
> Standardizing the name and encodings of these windows will enable many more 
> pipelines to work across the Fn API with low overhead.





[jira] [Assigned] (BEAM-1565) Update Spark runner PostCommit Jenkins job.

2017-03-01 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1565:
---

Assignee: Aviem Zur

> Update Spark runner PostCommit Jenkins job.
> ---
>
> Key: BEAM-1565
> URL: https://issues.apache.org/jira/browse/BEAM-1565
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Aviem Zur
>Priority: Minor
>  Labels: beginner, low-hanging-fruit
>
> Should only activate {{runnable-on-service}} profile (and 
> {{streaming-runnable-on-service}} once enabled) and remove 
> {{-Dspark.port.maxRetries=64}}. 





Build failed in Jenkins: beam_PostCommit_Java_RunnableOnService_Dataflow #2445

2017-03-01 Thread Apache Jenkins Server
See 


--
[...truncated 3.87 MB...]
[INFO] 2017-03-02T07:35:47.270Z: (715e168a75870ca2): Fusing consumer 
PAssert$280/RunChecks into PAssert$280/GetPane/Map
[INFO] 2017-03-02T07:35:47.276Z: (715e168a75870338): Unzipping flatten s18 for 
input s13.13
[INFO] 2017-03-02T07:35:47.279Z: (715e168a7587056a): Fusing unzipped copy of 
PAssert$280/GroupGlobally/GroupDummyAndContents/Reify, through flatten , into 
producer PAssert$280/GroupGlobally/KeyForDummy/AddKeys/Map
[INFO] 2017-03-02T07:35:47.281Z: (715e168a7587079c): Fusing consumer 
PAssert$280/GroupGlobally/GroupDummyAndContents/Reify into 
PAssert$280/GroupGlobally/WindowIntoDummy/Window.Assign
[INFO] 2017-03-02T07:35:47.289Z: (715e168a75870064): Unzipping flatten s18-u40 
for input s20-reify-value19-c38
[INFO] 2017-03-02T07:35:47.291Z: (715e168a75870296): Fusing unzipped copy of 
PAssert$280/GroupGlobally/GroupDummyAndContents/Write, through flatten , into 
producer PAssert$280/GroupGlobally/GroupDummyAndContents/Reify
[INFO] 2017-03-02T07:35:47.293Z: (715e168a758704c8): Fusing consumer 
PAssert$280/GroupGlobally/GroupDummyAndContents/Write into 
PAssert$280/GroupGlobally/GroupDummyAndContents/Reify
[INFO] 2017-03-02T07:35:47.295Z: (715e168a758706fa): Fusing consumer 
PAssert$280/GroupGlobally/GroupDummyAndContents/GroupByWindow into 
PAssert$280/GroupGlobally/GroupDummyAndContents/Read
[INFO] 2017-03-02T07:35:47.297Z: (715e168a7587092c): Fusing consumer 
PAssert$280/GroupGlobally/GatherAllOutputs/Values/Values/Map into 
PAssert$280/GroupGlobally/GatherAllOutputs/GroupByKey/GroupByWindow
[INFO] 2017-03-02T07:35:47.299Z: (715e168a75870b5e): Fusing consumer 
Distinct/Combine.perKey(Anonymous)/Combine.GroupedValues/Extract into 
Distinct/Combine.perKey(Anonymous)/Combine.GroupedValues
[INFO] 2017-03-02T07:35:47.302Z: (715e168a75870d90): Fusing consumer 
Distinct/Combine.perKey(Anonymous)/Combine.GroupedValues into 
Distinct/Combine.perKey(Anonymous)/GroupByKey/Read
[INFO] 2017-03-02T07:35:47.304Z: (715e168a75870fc2): Fusing consumer 
PAssert$280/GroupGlobally/GatherAllOutputs/WithKeys/AddKeys/Map into 
PAssert$280/GroupGlobally/GatherAllOutputs/ParDo(ReifyTimestampsAndWindows)
[INFO] 2017-03-02T07:35:47.306Z: (715e168a758701f4): Fusing consumer 
PAssert$280/GroupGlobally/GatherAllOutputs/ParDo(ReifyTimestampsAndWindows) 
into PAssert$280/GroupGlobally/Window.Into()/Window.Assign
[INFO] 2017-03-02T07:35:47.308Z: (715e168a75870426): Fusing consumer 
PAssert$280/GroupGlobally/KeyForDummy/AddKeys/Map into 
PAssert$280/GroupGlobally/RewindowActuals/Window.Assign
[INFO] 2017-03-02T07:35:47.310Z: (715e168a75870658): Fusing consumer 
Distinct/Keys/Keys/Map into 
Distinct/Combine.perKey(Anonymous)/Combine.GroupedValues/Extract
[INFO] 2017-03-02T07:35:47.312Z: (715e168a7587088a): Fusing consumer 
PAssert$280/GroupGlobally/RewindowActuals/Window.Assign into 
PAssert$280/GroupGlobally/GatherAllOutputs/Values/Values/Map
[INFO] 2017-03-02T07:35:47.314Z: (715e168a75870abc): Fusing consumer 
PAssert$280/GroupGlobally/GatherAllOutputs/GroupByKey/Write into 
PAssert$280/GroupGlobally/GatherAllOutputs/GroupByKey/Reify
[INFO] 2017-03-02T07:35:47.316Z: (715e168a75870cee): Fusing consumer 
Distinct/Combine.perKey(Anonymous)/GroupByKey/Reify into 
Distinct/Combine.perKey(Anonymous)/GroupByKey+Distinct/Combine.perKey(Anonymous)/Combine.GroupedValues/Partial
[INFO] 2017-03-02T07:35:47.318Z: (715e168a75870f20): Fusing consumer 
Distinct/CreateIndex/Map into Create.Values/Read(CreateSource)
[INFO] 2017-03-02T07:35:47.320Z: (715e168a75870152): Fusing consumer 
Distinct/Combine.perKey(Anonymous)/GroupByKey/Write into 
Distinct/Combine.perKey(Anonymous)/GroupByKey/Reify
[INFO] 2017-03-02T07:35:47.322Z: (715e168a75870384): Fusing consumer 
PAssert$280/GroupGlobally/GatherAllOutputs/GroupByKey/GroupByWindow into 
PAssert$280/GroupGlobally/GatherAllOutputs/GroupByKey/Read
[INFO] 2017-03-02T07:35:47.325Z: (715e168a758705b6): Fusing consumer 
PAssert$280/GroupGlobally/Window.Into()/Window.Assign into 
Distinct/Keys/Keys/Map
[INFO] 2017-03-02T07:35:47.327Z: (715e168a758707e8): Fusing consumer 
PAssert$280/GroupGlobally/GatherAllOutputs/GroupByKey/Reify into 
PAssert$280/GroupGlobally/GatherAllOutputs/Window.Into()/Window.Assign
[INFO] 2017-03-02T07:35:47.332Z: (715e168a75870a1a): Fusing consumer 
PAssert$280/GroupGlobally/GatherAllOutputs/Window.Into()/Window.Assign into 
PAssert$280/GroupGlobally/GatherAllOutputs/WithKeys/AddKeys/Map
[INFO] 2017-03-02T07:35:47.334Z: (715e168a75870c4c): Fusing consumer 
Distinct/Combine.perKey(Anonymous)/GroupByKey+Distinct/Combine.perKey(Anonymous)/Combine.GroupedValues/Partial
 into Distinct/CreateIndex/Map
[INFO] 2017-03-02T07:35:47.471Z: S01: (2e98163713366cd9): Executing operation 
Distinct/Combine.perKey(Anonymous)/GroupByKey/Create
[INFO] 2017-03-02T07:35:47.677Z: (af143fd80179dc71): Starting 1 workers...
[INFO] 201

[GitHub] beam pull request #2139: Update Dataflow container version for release 0.6.0

2017-03-01 Thread dhalperi
GitHub user dhalperi opened a pull request:

https://github.com/apache/beam/pull/2139

Update Dataflow container version for release 0.6.0



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dhalperi/beam release-0.6.0-patch

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2139.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2139


commit bb28c9eca412514d8dd08f2ac4309f1087bd2ee9
Author: Dan Halperin 
Date:   2017-03-02T05:37:11Z

Update Dataflow container version for release 0.6.0






[GitHub] beam-site pull request #168: Add windowing section to programming guide

2017-03-01 Thread melap
GitHub user melap opened a pull request:

https://github.com/apache/beam-site/pull/168

Add windowing section to programming guide

R: @kennknowles 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/melap/beam-site windowing

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam-site/pull/168.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #168


commit 87df39874aea738cc323fedc89fa575267dc5b6a
Author: melissa 
Date:   2017-03-02T03:54:17Z

Add windowing section to programming guide






[jira] [Created] (BEAM-1589) Add OnWindowExpiration method to Stateful DoFn

2017-03-01 Thread Jingsong Lee (JIRA)
Jingsong Lee created BEAM-1589:
--

 Summary: Add OnWindowExpiration method to Stateful DoFn
 Key: BEAM-1589
 URL: https://issues.apache.org/jira/browse/BEAM-1589
 Project: Beam
  Issue Type: New Feature
  Components: runner-core, sdk-java-core
Reporter: Jingsong Lee


See BEAM-1517
This would allow the user to do some work before the state's garbage collection.
It may seem like a minor annoyance, but forgetting to set a final 
timer to flush state usually means data loss.
The FlinkRunner can do this work very simply, but other runners, such as the 
DirectRunner, would need to traverse all of their states to do it, which may be a 
little hard.
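As a sketch of the requested hook (all names here are hypothetical, not Beam API): a runner-side table that hands each expiring window's state to a user callback before discarding it, so the flush happens even if the user forgot a final timer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/** Toy model: per-window state with a flush hook invoked before garbage collection. */
public class WindowStateTable<W, S> {
    private final Map<W, S> state = new HashMap<>();
    private final BiConsumer<W, S> onWindowExpiration; // hypothetical user callback

    public WindowStateTable(BiConsumer<W, S> onWindowExpiration) {
        this.onWindowExpiration = onWindowExpiration;
    }

    public void put(W window, S value) {
        state.put(window, value);
    }

    public S get(W window) {
        return state.get(window);
    }

    /** Called by the runner when the watermark passes the window's GC time. */
    public void expire(W window) {
        S s = state.remove(window);
        if (s != null) {
            // Last chance for user code to see the state before it is dropped.
            onWindowExpiration.accept(window, s);
        }
    }

    public static void main(String[] args) {
        WindowStateTable<String, Integer> counts =
            new WindowStateTable<>((w, sum) -> System.out.println("flush " + w + " -> " + sum));
        counts.put("[0,10)", 3);
        counts.expire("[0,10)"); // prints: flush [0,10) -> 3
    }
}
```

The DirectRunner concern in the issue corresponds to having to iterate every entry of such a table per window, rather than getting the notification from the state backend as Flink can.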





[jira] [Assigned] (BEAM-1476) Support MapState in Flink runner

2017-03-01 Thread Jingsong Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee reassigned BEAM-1476:
--

Assignee: Jingsong Lee

> Support MapState in Flink runner
> 
>
> Key: BEAM-1476
> URL: https://issues.apache.org/jira/browse/BEAM-1476
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Jingsong Lee
>






[jira] [Assigned] (BEAM-1483) Support SetState in Flink runner

2017-03-01 Thread Jingsong Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee reassigned BEAM-1483:
--

Assignee: Jingsong Lee

> Support SetState in Flink runner
> 
>
> Key: BEAM-1483
> URL: https://issues.apache.org/jira/browse/BEAM-1483
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Jingsong Lee
>






[jira] [Created] (BEAM-1588) Reuse StateNamespace.stringKey in Flink States

2017-03-01 Thread Jingsong Lee (JIRA)
Jingsong Lee created BEAM-1588:
--

 Summary: Reuse StateNamespace.stringKey in Flink States
 Key: BEAM-1588
 URL: https://issues.apache.org/jira/browse/BEAM-1588
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Reporter: Jingsong Lee
Assignee: Jingsong Lee


See BEAM-1587
StateNamespace.stringKey does two things: the base64 encoding of the window, and 
then a String.format. Neither is cheap. We can cache the result in the State and 
reuse it.
Furthermore, we can cache the state itself.
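A minimal sketch of the proposed caching, under stated assumptions: the two steps named above (base64 of the coder-encoded window, then String.format) are reproduced with an assumed "/%s/" key shape, and the result is memoized on first use.

```java
import java.util.Base64;

/** Sketch: lazily cache the expensive stringKey instead of recomputing it per state access. */
public class WindowNamespace {
    private final byte[] encodedWindow; // stand-in for the coder-encoded window
    private String cachedStringKey;     // computed once, reused on every later access

    public WindowNamespace(byte[] encodedWindow) {
        this.encodedWindow = encodedWindow;
    }

    public String stringKey() {
        if (cachedStringKey == null) {
            // The two costs BEAM-1588 points at: base64 encoding plus String.format.
            String base64 = Base64.getEncoder().encodeToString(encodedWindow);
            cachedStringKey = String.format("/%s/", base64);
        }
        return cachedStringKey;
    }

    public static void main(String[] args) {
        WindowNamespace ns = new WindowNamespace(new byte[] {1, 2, 3});
        System.out.println(ns.stringKey());                  // prints /AQID/
        System.out.println(ns.stringKey() == ns.stringKey()); // prints true: cached instance
    }
}
```

Since a namespace is immutable, the cached key never needs invalidation; the trade-off is holding one extra String per live window.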





[jira] [Assigned] (BEAM-1587) Use StringBuilder to stringKey of StateNamespace instead of String.format

2017-03-01 Thread Jingsong Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee reassigned BEAM-1587:
--

Assignee: Jingsong Lee  (was: Kenneth Knowles)

> Use StringBuilder to stringKey of StateNamespace instead of String.format
> -
>
> Key: BEAM-1587
> URL: https://issues.apache.org/jira/browse/BEAM-1587
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Jingsong Lee
>Assignee: Jingsong Lee
>
> In the Flink runner, each state access calls the namespace's stringKey once. 
> Since stringKey relies on String.format, the impact on performance is 
> relatively large.
> In some extreme cases, stringKey accounts for up to 2% of the run time.
> Here is a test on StringBuilder and String.format:
> {code}
>   public static void main(String[] args) throws Exception {
> String[] strs = new String[1000_000];
> for (int i = 0; i < strs.length; i++) {
>   strs[i] = getRandomString(10);
> }
> {
>   long start = System.nanoTime();
>   for (int i = 0; i < strs.length; i++) {
> strs[i] = testFormat(strs[i]);
>   }
>   System.out.println("testStringFormat: " + ((System.nanoTime() - 
> start)/1000_000) + "ms");
> }
> {
>   long start = System.nanoTime();
>   for (int i = 0; i < strs.length; i++) {
> strs[i] = testStringBuild(strs[i]);
>   }
>   System.out.println("testStringBuilder: " + ((System.nanoTime() - 
> start)/1000_000) + "ms");
> }
>   }
> {code}
> testStringFormat: 2312ms
> testStringBuilder: 266ms
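The quoted benchmark is not self-contained: getRandomString, testFormat and testStringBuild are elided. Below is a runnable reconstruction with assumed helper bodies (the "/%s/" key shape is a guess; what matters is that both variants produce identical keys so the comparison is fair). Absolute timings will differ by machine, but the ordering should match the numbers above.

```java
import java.util.Random;

/** Self-contained version of the quoted micro-benchmark; helper bodies are reconstructions. */
public class StateKeyBench {
    private static final Random RANDOM = new Random(42);

    public static String getRandomString(int len) {
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) {
            sb.append((char) ('a' + RANDOM.nextInt(26)));
        }
        return sb.toString();
    }

    // String.format variant -- the shape BEAM-1587 wants off the hot path.
    public static String testFormat(String s) {
        return String.format("/%s/", s);
    }

    // StringBuilder variant producing the identical key.
    public static String testStringBuild(String s) {
        return new StringBuilder(s.length() + 2).append('/').append(s).append('/').toString();
    }

    public static void main(String[] args) {
        String[] strs = new String[1_000_000];
        for (int i = 0; i < strs.length; i++) {
            strs[i] = getRandomString(10);
        }
        long start = System.nanoTime();
        for (int i = 0; i < strs.length; i++) {
            testFormat(strs[i]);
        }
        System.out.println("testStringFormat: " + ((System.nanoTime() - start) / 1_000_000) + "ms");
        start = System.nanoTime();
        for (int i = 0; i < strs.length; i++) {
            testStringBuild(strs[i]);
        }
        System.out.println("testStringBuilder: " + ((System.nanoTime() - start) / 1_000_000) + "ms");
    }
}
```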





[jira] [Updated] (BEAM-1587) Use StringBuilder to stringKey of StateNamespace instead of String.format

2017-03-01 Thread Jingsong Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee updated BEAM-1587:
---
Description: 
In the Flink runner, each state access calls the namespace's stringKey once. Since 
stringKey relies on String.format, the impact on performance is relatively large.
In some extreme cases, stringKey accounts for up to 2% of the run time.
Here is a test on StringBuilder and String.format:
{code}
  public static void main(String[] args) throws Exception {
String[] strs = new String[1000_000];
for (int i = 0; i < strs.length; i++) {
  strs[i] = getRandomString(10);
}
{
  long start = System.nanoTime();
  for (int i = 0; i < strs.length; i++) {
strs[i] = testFormat(strs[i]);
  }
  System.out.println("testStringFormat: " + ((System.nanoTime() - 
start)/1000_000) + "ms");
}
{
  long start = System.nanoTime();
  for (int i = 0; i < strs.length; i++) {
strs[i] = testStringBuild(strs[i]);
  }
  System.out.println("testStringBuilder: " + ((System.nanoTime() - 
start)/1000_000) + "ms");
}
  }
{code}
testStringFormat: 2312ms
testStringBuilder: 266ms

> Use StringBuilder to stringKey of StateNamespace instead of String.format
> -
>
> Key: BEAM-1587
> URL: https://issues.apache.org/jira/browse/BEAM-1587
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Jingsong Lee
>Assignee: Kenneth Knowles
>
> In the Flink runner, each state access calls the namespace's stringKey once. 
> Since stringKey relies on String.format, the impact on performance is 
> relatively large.
> In some extreme cases, stringKey accounts for up to 2% of the run time.
> Here is a test on StringBuilder and String.format:
> {code}
>   public static void main(String[] args) throws Exception {
> String[] strs = new String[1000_000];
> for (int i = 0; i < strs.length; i++) {
>   strs[i] = getRandomString(10);
> }
> {
>   long start = System.nanoTime();
>   for (int i = 0; i < strs.length; i++) {
> strs[i] = testFormat(strs[i]);
>   }
>   System.out.println("testStringFormat: " + ((System.nanoTime() - 
> start)/1000_000) + "ms");
> }
> {
>   long start = System.nanoTime();
>   for (int i = 0; i < strs.length; i++) {
> strs[i] = testStringBuild(strs[i]);
>   }
>   System.out.println("testStringBuilder: " + ((System.nanoTime() - 
> start)/1000_000) + "ms");
> }
>   }
> {code}
> testStringFormat: 2312ms
> testStringBuilder: 266ms





[jira] [Created] (BEAM-1587) Use StringBuilder to stringKey of StateNamespace instead of String.format

2017-03-01 Thread Jingsong Lee (JIRA)
Jingsong Lee created BEAM-1587:
--

 Summary: Use StringBuilder to stringKey of StateNamespace instead 
of String.format
 Key: BEAM-1587
 URL: https://issues.apache.org/jira/browse/BEAM-1587
 Project: Beam
  Issue Type: Improvement
  Components: runner-core
Reporter: Jingsong Lee
Assignee: Kenneth Knowles








Jenkins build is back to stable : beam_PostCommit_Java_RunnableOnService_Spark #1105

2017-03-01 Thread Apache Jenkins Server
See 




Jenkins build became unstable: beam_PostCommit_Java_RunnableOnService_Spark #1104

2017-03-01 Thread Apache Jenkins Server
See 




beam git commit: [maven-release-plugin] prepare branch release-0.6.0

2017-03-01 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master 599a8ed65 -> 1a770ef2f


[maven-release-plugin] prepare branch release-0.6.0


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/1a770ef2
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/1a770ef2
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/1a770ef2

Branch: refs/heads/master
Commit: 1a770ef2f21b96fbdc5ff06ea8642351f136328f
Parents: 599a8ed
Author: Ahmet Altay 
Authored: Wed Mar 1 18:21:40 2017 -0800
Committer: Ahmet Altay 
Committed: Wed Mar 1 18:21:40 2017 -0800

--
 pom.xml|  2 +-
 runners/core-construction-java/pom.xml |  4 +---
 sdks/common/fn-api/pom.xml |  4 +---
 sdks/common/runner-api/pom.xml |  4 +---
 sdks/java/extensions/jackson/pom.xml   |  4 +---
 sdks/java/harness/pom.xml  |  4 +---
 sdks/java/javadoc/pom.xml  | 10 +-
 7 files changed, 11 insertions(+), 21 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/1a770ef2/pom.xml
--
diff --git a/pom.xml b/pom.xml
index a37f1af..eded684 100644
--- a/pom.xml
+++ b/pom.xml
@@ -48,7 +48,7 @@
 
     <connection>scm:git:https://git-wip-us.apache.org/repos/asf/beam.git</connection>
     
<developerConnection>scm:git:https://git-wip-us.apache.org/repos/asf/beam.git</developerConnection>
     <url>https://git-wip-us.apache.org/repos/asf?p=beam.git;a=summary</url>
-    <tag>HEAD</tag>
+    <tag>release-0.6.0</tag>
   
 
   

http://git-wip-us.apache.org/repos/asf/beam/blob/1a770ef2/runners/core-construction-java/pom.xml
--
diff --git a/runners/core-construction-java/pom.xml 
b/runners/core-construction-java/pom.xml
index 868f743..b602f5d 100644
--- a/runners/core-construction-java/pom.xml
+++ b/runners/core-construction-java/pom.xml
@@ -17,9 +17,7 @@
   ~ limitations under the License.
   -->
 
-<project xmlns="http://maven.apache.org/POM/4.0.0"
-         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
-         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+<project xmlns="http://maven.apache.org/POM/4.0.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd">
 
   <modelVersion>4.0.0</modelVersion>
 

http://git-wip-us.apache.org/repos/asf/beam/blob/1a770ef2/sdks/common/fn-api/pom.xml
--
diff --git a/sdks/common/fn-api/pom.xml b/sdks/common/fn-api/pom.xml
index 1f6193f..5a41d9e 100644
--- a/sdks/common/fn-api/pom.xml
+++ b/sdks/common/fn-api/pom.xml
@@ -15,9 +15,7 @@
 See the License for the specific language governing permissions and
 limitations under the License.
 -->
-<project xmlns="http://maven.apache.org/POM/4.0.0"
-         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
-         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+<project xmlns="http://maven.apache.org/POM/4.0.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd">
   <modelVersion>4.0.0</modelVersion>
 
   <packaging>jar</packaging>

http://git-wip-us.apache.org/repos/asf/beam/blob/1a770ef2/sdks/common/runner-api/pom.xml
--
diff --git a/sdks/common/runner-api/pom.xml b/sdks/common/runner-api/pom.xml
index 8eaeb8e..9c6de1e 100644
--- a/sdks/common/runner-api/pom.xml
+++ b/sdks/common/runner-api/pom.xml
@@ -15,9 +15,7 @@
 See the License for the specific language governing permissions and
 limitations under the License.
 -->
-<project xmlns="http://maven.apache.org/POM/4.0.0"
-         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
-         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+<project xmlns="http://maven.apache.org/POM/4.0.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd">
   <modelVersion>4.0.0</modelVersion>
 
   <packaging>jar</packaging>

http://git-wip-us.apache.org/repos/asf/beam/blob/1a770ef2/sdks/java/extensions/jackson/pom.xml
--
diff --git a/sdks/java/extensions/jackson/pom.xml 
b/sdks/java/extensions/jackson/pom.xml
index be5c953..1dfbd72 100644
--- a/sdks/java/extensions/jackson/pom.xml
+++ b/sdks/java/extensions/jackson/pom.xml
@@ -15,9 +15,7 @@
 See the License for the specific language governing permissions and
 limitations under the License.
 -->
-<project xmlns="http://maven.apache.org/POM/4.0.0"
-         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
-         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+h

beam git commit: [maven-release-plugin] prepare branch release-0.6.0

2017-03-01 Thread altay
Repository: beam
Updated Branches:
  refs/heads/release-0.6.0 [created] c4e002bfa


[maven-release-plugin] prepare branch release-0.6.0


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/c4e002bf
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/c4e002bf
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/c4e002bf

Branch: refs/heads/release-0.6.0
Commit: c4e002bfa00b1fef87c7ba2efeaf05ee82692b92
Parents: ca678d8
Author: Ahmet Altay 
Authored: Wed Mar 1 18:06:34 2017 -0800
Committer: Ahmet Altay 
Committed: Wed Mar 1 18:06:34 2017 -0800

--
 pom.xml|  2 +-
 runners/core-construction-java/pom.xml |  4 +---
 sdks/common/fn-api/pom.xml |  4 +---
 sdks/common/runner-api/pom.xml |  4 +---
 sdks/java/extensions/jackson/pom.xml   |  4 +---
 sdks/java/harness/pom.xml  |  4 +---
 sdks/java/javadoc/pom.xml  | 10 +-
 7 files changed, 11 insertions(+), 21 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/c4e002bf/pom.xml
--
diff --git a/pom.xml b/pom.xml
index a37f1af..eded684 100644
--- a/pom.xml
+++ b/pom.xml
@@ -48,7 +48,7 @@
 
     <connection>scm:git:https://git-wip-us.apache.org/repos/asf/beam.git</connection>
     
<developerConnection>scm:git:https://git-wip-us.apache.org/repos/asf/beam.git</developerConnection>
     <url>https://git-wip-us.apache.org/repos/asf?p=beam.git;a=summary</url>
-    <tag>HEAD</tag>
+    <tag>release-0.6.0</tag>
   
 
   

http://git-wip-us.apache.org/repos/asf/beam/blob/c4e002bf/runners/core-construction-java/pom.xml
--
diff --git a/runners/core-construction-java/pom.xml 
b/runners/core-construction-java/pom.xml
index 868f743..b602f5d 100644
--- a/runners/core-construction-java/pom.xml
+++ b/runners/core-construction-java/pom.xml
@@ -17,9 +17,7 @@
   ~ limitations under the License.
   -->
 
-<project xmlns="http://maven.apache.org/POM/4.0.0"
-         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
-         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+<project xmlns="http://maven.apache.org/POM/4.0.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd">
 
   <modelVersion>4.0.0</modelVersion>
 

http://git-wip-us.apache.org/repos/asf/beam/blob/c4e002bf/sdks/common/fn-api/pom.xml
--
diff --git a/sdks/common/fn-api/pom.xml b/sdks/common/fn-api/pom.xml
index 1f6193f..5a41d9e 100644
--- a/sdks/common/fn-api/pom.xml
+++ b/sdks/common/fn-api/pom.xml
@@ -15,9 +15,7 @@
 See the License for the specific language governing permissions and
 limitations under the License.
 -->
-<project xmlns="http://maven.apache.org/POM/4.0.0"
-         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
-         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+<project xmlns="http://maven.apache.org/POM/4.0.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd">
   <modelVersion>4.0.0</modelVersion>
 
   <packaging>jar</packaging>

http://git-wip-us.apache.org/repos/asf/beam/blob/c4e002bf/sdks/common/runner-api/pom.xml
--
diff --git a/sdks/common/runner-api/pom.xml b/sdks/common/runner-api/pom.xml
index 8eaeb8e..9c6de1e 100644
--- a/sdks/common/runner-api/pom.xml
+++ b/sdks/common/runner-api/pom.xml
@@ -15,9 +15,7 @@
 See the License for the specific language governing permissions and
 limitations under the License.
 -->
-<project xmlns="http://maven.apache.org/POM/4.0.0"
-         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
-         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+<project xmlns="http://maven.apache.org/POM/4.0.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd">
   <modelVersion>4.0.0</modelVersion>
 
   <packaging>jar</packaging>

http://git-wip-us.apache.org/repos/asf/beam/blob/c4e002bf/sdks/java/extensions/jackson/pom.xml
--
diff --git a/sdks/java/extensions/jackson/pom.xml 
b/sdks/java/extensions/jackson/pom.xml
index be5c953..1dfbd72 100644
--- a/sdks/java/extensions/jackson/pom.xml
+++ b/sdks/java/extensions/jackson/pom.xml
@@ -15,9 +15,7 @@
 See the License for the specific language governing permissions and
 limitations under the License.
 -->
-<project xmlns="http://maven.apache.org/POM/4.0.0"
-         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
-         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

[jira] [Resolved] (BEAM-1548) Window should not reassign windows if no WindowFn is specified

2017-03-01 Thread Thomas Groh (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Groh resolved BEAM-1548.
---
   Resolution: Fixed
Fix Version/s: 0.6.0

> Window should not reassign windows if no WindowFn is specified
> --
>
> Key: BEAM-1548
> URL: https://issues.apache.org/jira/browse/BEAM-1548
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
> Fix For: 0.6.0
>
>






[jira] [Commented] (BEAM-1548) Window should not reassign windows if no WindowFn is specified

2017-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891453#comment-15891453
 ] 

ASF GitHub Bot commented on BEAM-1548:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2101


> Window should not reassign windows if no WindowFn is specified
> --
>
> Key: BEAM-1548
> URL: https://issues.apache.org/jira/browse/BEAM-1548
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>






[GitHub] beam pull request #2101: [BEAM-1548] Do not Reassign Windows when WindowFn i...

2017-03-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2101


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] beam git commit: Do not Reassign Windows when WindowFn is null

2017-03-01 Thread tgroh
Do not Reassign Windows when WindowFn is null

Adjusting the Windowing Strategy should not change any elements of the
data. This is also potentially type-unsafe, as the upstream WindowFn may
only take elements of a type which is not the input element of the
downstream PTransform.

Introduce Window.Assign, which replaces Window.Bound as the primitive to
"assign elements to windows based on the WindowFn". This converts
Window.Bound into a composite in all cases.

Use a Flatten to improve performance on many runners, without needing an
opaque DoFn.
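The safety property described above can be sketched in miniature. The block below is an illustrative model only, with hypothetical stand-in types rather than the actual Beam Window.Assign/Window.Bound classes: adjusting the windowing strategy without a WindowFn must leave the elements themselves untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/**
 * Minimal model of the composite described in the commit message above.
 * Hypothetical types, not the Beam API: the point is that when no WindowFn
 * is specified, the transform must be a pass-through over the elements.
 */
final class WindowComposite<T> {
  /** Stand-in for a WindowFn: maps an element to its window labels. */
  interface WindowFn<T> {
    List<String> assignWindows(T element);
  }

  private final WindowFn<T> windowFn; // null means "no WindowFn specified"

  WindowComposite(WindowFn<T> windowFn) {
    this.windowFn = windowFn;
  }

  /** Expand to either a real assignment step or an identity pass-through. */
  UnaryOperator<List<T>> expand() {
    if (windowFn == null) {
      // No WindowFn: do not touch the elements. Reassigning here was the
      // bug, and is also type-unsafe if an upstream WindowFn expects a
      // different element type than the downstream transform's input.
      return elements -> elements;
    }
    return elements -> {
      List<T> out = new ArrayList<>(elements);
      out.forEach(windowFn::assignWindows); // assign windows, keep values
      return out;
    };
  }
}
```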


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/eaf9b9b3
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/eaf9b9b3
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/eaf9b9b3

Branch: refs/heads/master
Commit: eaf9b9b36dec1cc421335b27f225663ce42d0cca
Parents: ca678d8
Author: Thomas Groh 
Authored: Fri Feb 24 11:29:42 2017 -0800
Committer: Thomas Groh 
Committed: Wed Mar 1 17:51:19 2017 -0800

--
 .../translation/ApexPipelineTranslator.java |   2 +-
 .../translation/WindowAssignTranslator.java |  78 +++
 .../apex/translation/WindowBoundTranslator.java |  78 ---
 .../direct/TransformEvaluatorRegistry.java  |   2 +-
 .../runners/direct/WindowEvaluatorFactory.java  |  11 +-
 .../direct/WindowEvaluatorFactoryTest.java  |  46 +--
 .../flink/FlinkBatchTransformTranslators.java   |   8 +-
 .../FlinkStreamingTransformTranslators.java |   8 +-
 .../functions/FlinkAssignWindows.java   |   2 +-
 .../dataflow/DataflowPipelineTranslator.java|   9 +-
 .../spark/translation/TransformTranslator.java  |   8 +-
 .../spark/translation/TranslationUtils.java |   4 +-
 .../streaming/StreamingTransformTranslator.java |   8 +-
 .../beam/sdk/transforms/windowing/Window.java   |  43 +-
 .../sdk/transforms/windowing/WindowTest.java| 136 +++
 15 files changed, 290 insertions(+), 153 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/eaf9b9b3/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/ApexPipelineTranslator.java
--
diff --git 
a/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/ApexPipelineTranslator.java
 
b/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/ApexPipelineTranslator.java
index e9d6571..951a286 100644
--- 
a/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/ApexPipelineTranslator.java
+++ 
b/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/ApexPipelineTranslator.java
@@ -71,7 +71,7 @@ public class ApexPipelineTranslator implements 
Pipeline.PipelineVisitor {
 new CreateApexPCollectionViewTranslator());
 registerTransformTranslator(CreatePCollectionView.class,
 new CreatePCollectionViewTranslator());
-registerTransformTranslator(Window.Bound.class, new 
WindowBoundTranslator());
+registerTransformTranslator(Window.Assign.class, new 
WindowAssignTranslator());
   }
 
   public ApexPipelineTranslator(ApexPipelineOptions options) {

http://git-wip-us.apache.org/repos/asf/beam/blob/eaf9b9b3/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/WindowAssignTranslator.java
--
diff --git 
a/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/WindowAssignTranslator.java
 
b/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/WindowAssignTranslator.java
new file mode 100644
index 000..b3aef8d
--- /dev/null
+++ 
b/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/WindowAssignTranslator.java
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.apex.translation;
+
+import java.util.Collections;
+import org.apache.beam.runners.apex.ApexPipelineOptions;
+import org.apache.beam.runners.apex.translation.operators.ApexParDoOperator;
+import org.apache.beam.runne

[1/2] beam git commit: This closes #2101

2017-03-01 Thread tgroh
Repository: beam
Updated Branches:
  refs/heads/master ca678d887 -> 599a8ed65


This closes #2101


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/599a8ed6
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/599a8ed6
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/599a8ed6

Branch: refs/heads/master
Commit: 599a8ed65befb2c42ba0488b556bfb42ce77e215
Parents: ca678d8 eaf9b9b
Author: Thomas Groh 
Authored: Wed Mar 1 17:51:19 2017 -0800
Committer: Thomas Groh 
Committed: Wed Mar 1 17:51:19 2017 -0800

--
 .../translation/ApexPipelineTranslator.java |   2 +-
 .../translation/WindowAssignTranslator.java |  78 +++
 .../apex/translation/WindowBoundTranslator.java |  78 ---
 .../direct/TransformEvaluatorRegistry.java  |   2 +-
 .../runners/direct/WindowEvaluatorFactory.java  |  11 +-
 .../direct/WindowEvaluatorFactoryTest.java  |  46 +--
 .../flink/FlinkBatchTransformTranslators.java   |   8 +-
 .../FlinkStreamingTransformTranslators.java |   8 +-
 .../functions/FlinkAssignWindows.java   |   2 +-
 .../dataflow/DataflowPipelineTranslator.java|   9 +-
 .../spark/translation/TransformTranslator.java  |   8 +-
 .../spark/translation/TranslationUtils.java |   4 +-
 .../streaming/StreamingTransformTranslator.java |   8 +-
 .../beam/sdk/transforms/windowing/Window.java   |  43 +-
 .../sdk/transforms/windowing/WindowTest.java| 136 +++
 15 files changed, 290 insertions(+), 153 deletions(-)
--




[jira] [Commented] (BEAM-1586) DataflowRunner mixes up cached and uploaded staged file counters

2017-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891426#comment-15891426
 ] 

ASF GitHub Bot commented on BEAM-1586:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2138


> DataflowRunner mixes up cached and uploaded staged file counters
> 
>
> Key: BEAM-1586
> URL: https://issues.apache.org/jira/browse/BEAM-1586
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 0.5.0
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>Priority: Minor
>
> We print the "cached" counter as the number of files uploaded, and vice versa.





[GitHub] beam pull request #2138: [BEAM-1586] DataflowRunner: fix message about uploa...

2017-03-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2138




[2/2] beam git commit: This closes #2138

2017-03-01 Thread tgroh
This closes #2138


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/ca678d88
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/ca678d88
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/ca678d88

Branch: refs/heads/master
Commit: ca678d887813fbe60f7d2794e892ee09962d1017
Parents: 7d22dbc 05a006a
Author: Thomas Groh 
Authored: Wed Mar 1 17:33:57 2017 -0800
Committer: Thomas Groh 
Committed: Wed Mar 1 17:33:57 2017 -0800

--
 .../java/org/apache/beam/runners/dataflow/util/PackageUtil.java| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--




[1/2] beam git commit: Fix DataflowRunner message about uploaded vs cached files

2017-03-01 Thread tgroh
Repository: beam
Updated Branches:
  refs/heads/master 7d22dbcab -> ca678d887


Fix DataflowRunner message about uploaded vs cached files


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/05a006a1
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/05a006a1
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/05a006a1

Branch: refs/heads/master
Commit: 05a006a14de2bc42d31321f34a49706ff3814962
Parents: 7d22dbc
Author: Dan Halperin 
Authored: Wed Mar 1 17:17:10 2017 -0800
Committer: Thomas Groh 
Committed: Wed Mar 1 17:33:46 2017 -0800

--
 .../java/org/apache/beam/runners/dataflow/util/PackageUtil.java| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/05a006a1/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PackageUtil.java
--
diff --git 
a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PackageUtil.java
 
b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PackageUtil.java
index 685d48c..482ddd9 100644
--- 
a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PackageUtil.java
+++ 
b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PackageUtil.java
@@ -330,7 +330,7 @@ class PackageUtil {
 
 LOG.info(
 "Staging files complete: {} files cached, {} files newly uploaded",
-numUploaded.get(), numCached.get());
+numCached.get(), numUploaded.get());
 
 return packages;
   }
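The one-line fix above restores the argument order to match the message template. As a standalone illustration (using plain `String.format` rather than the SLF4J `LOG.info` call in `PackageUtil`, and reusing the counter names from the diff for clarity only):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class StagingReport {
  /**
   * Positional placeholders silently accept arguments in any order, so the
   * argument list must match the template: cached count fills the first
   * placeholder, uploaded count fills the second.
   */
  static String report(AtomicInteger numCached, AtomicInteger numUploaded) {
    return String.format(
        "Staging files complete: %d files cached, %d files newly uploaded",
        numCached.get(), numUploaded.get());
  }

  public static void main(String[] args) {
    // Swapping the two arguments here would reproduce the original bug.
    System.out.println(report(new AtomicInteger(3), new AtomicInteger(7)));
    // -> Staging files complete: 3 files cached, 7 files newly uploaded
  }
}
```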



[GitHub] beam pull request #2137: Update javadoc ant to include runners/ and exclude ...

2017-03-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2137




[2/2] beam git commit: This closes #2137

2017-03-01 Thread altay
This closes #2137


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/7d22dbca
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/7d22dbca
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/7d22dbca

Branch: refs/heads/master
Commit: 7d22dbcabacd456eccedb0e68b5a019571c76108
Parents: bc1a301 71a3fd7
Author: Ahmet Altay 
Authored: Wed Mar 1 17:08:40 2017 -0800
Committer: Ahmet Altay 
Committed: Wed Mar 1 17:08:40 2017 -0800

--
 sdks/java/javadoc/ant.xml | 33 +++--
 1 file changed, 19 insertions(+), 14 deletions(-)
--




[1/2] beam git commit: Update javadoc ant to include runners/ and exclude modules with a wildcard

2017-03-01 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master bc1a301b0 -> 7d22dbcab


Update javadoc ant to include runners/ and exclude modules with a wildcard


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/71a3fd76
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/71a3fd76
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/71a3fd76

Branch: refs/heads/master
Commit: 71a3fd769ff4622df07633b5e7b27b5c309de29e
Parents: bc1a301
Author: Ahmet Altay 
Authored: Wed Mar 1 16:04:49 2017 -0800
Committer: Ahmet Altay 
Committed: Wed Mar 1 17:08:30 2017 -0800

--
 sdks/java/javadoc/ant.xml | 33 +++--
 1 file changed, 19 insertions(+), 14 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/71a3fd76/sdks/java/javadoc/ant.xml
--
diff --git a/sdks/java/javadoc/ant.xml b/sdks/java/javadoc/ant.xml
index 48b8913..80dbdc2 100644
--- a/sdks/java/javadoc/ant.xml
+++ b/sdks/java/javadoc/ant.xml
@@ -35,6 +35,16 @@
results in one Java source tree. -->
   
 
+
+
+  
+
+  
+  
+  
+
+
 
<link href="http://junit.sourceforge.net/javadoc/"
offline="true" packageListLoc="junit-docs"/>
   
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
-  
+  
+  
+  
+  
+  
+  
+  
+  
+  
 
 
 



[jira] [Updated] (BEAM-1586) DataflowRunner mixes up cached and uploaded staged file counters

2017-03-01 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1586:
--
Affects Version/s: 0.5.0

> DataflowRunner mixes up cached and uploaded staged file counters
> 
>
> Key: BEAM-1586
> URL: https://issues.apache.org/jira/browse/BEAM-1586
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 0.5.0
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>Priority: Minor
>
> We print the "cached" counter as the number of files uploaded, and vice versa.





[jira] [Created] (BEAM-1586) DataflowRunner mixes up cached and uploaded staged file counters

2017-03-01 Thread Daniel Halperin (JIRA)
Daniel Halperin created BEAM-1586:
-

 Summary: DataflowRunner mixes up cached and uploaded staged file 
counters
 Key: BEAM-1586
 URL: https://issues.apache.org/jira/browse/BEAM-1586
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow
Reporter: Daniel Halperin
Assignee: Daniel Halperin
Priority: Minor


We print the "cached" counter as the number of files uploaded, and vice versa.





[GitHub] beam pull request #2138: DataflowRunner: fix message about uploaded vs cache...

2017-03-01 Thread dhalperi
GitHub user dhalperi opened a pull request:

https://github.com/apache/beam/pull/2138

DataflowRunner: fix message about uploaded vs cached files

Relevant branch name.

R: @tgroh OR @kennknowles or @bjchambers or anyone else.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dhalperi/beam dan-is-stupid

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2138.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2138


commit 7f9c40553823e3e63c146ff737f1de9849b1661b
Author: Dan Halperin 
Date:   2017-03-02T01:17:10Z

DataflowRunner: fix message about uploaded vs cached files






[jira] [Updated] (BEAM-1585) Ability to add new file systems to beamFS in the python sdk

2017-03-01 Thread Sourabh Bajaj (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sourabh Bajaj updated BEAM-1585:

Description: 
BEAM-1441 implements the new BeamFileSystem in the Python SDK, but it currently 
lacks the ability to add user-implemented file systems.

This code needs to be executed in the worker, so it should be packaged 
correctly with the pipeline code.

> Ability to add new file systems to beamFS in the python sdk
> ---
>
> Key: BEAM-1585
> URL: https://issues.apache.org/jira/browse/BEAM-1585
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py
>Reporter: Sourabh Bajaj
>Assignee: Sourabh Bajaj
>
> BEAM-1441 implements the new BeamFileSystem in the Python SDK, but it 
> currently lacks the ability to add user-implemented file systems.
> This code needs to be executed in the worker, so it should be packaged 
> correctly with the pipeline code.
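A scheme-keyed registry is one common shape for this kind of pluggability. The sketch below is written in Java purely for illustration and is not the Python SDK's BeamFileSystem API; the class and method names are hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Generic sketch of the registration problem described above (hypothetical
 * names, not a Beam API). User code plugs in a file system under its URI
 * scheme; anything registered this way must also ship with the pipeline
 * code so workers can resolve the same schemes at execution time.
 */
final class FileSystemRegistry {
  interface FileSystem {
    String scheme(); // e.g. "gs", "hdfs", "myfs"
  }

  private static final Map<String, FileSystem> REGISTRY =
      new ConcurrentHashMap<>();

  static void register(FileSystem fs) {
    REGISTRY.put(fs.scheme(), fs);
  }

  /** Resolve "scheme://path" to the registered implementation. */
  static FileSystem forPath(String path) {
    int sep = path.indexOf("://");
    String scheme = sep >= 0 ? path.substring(0, sep) : "file";
    FileSystem fs = REGISTRY.get(scheme);
    if (fs == null) {
      throw new IllegalArgumentException(
          "No file system registered for scheme: " + scheme);
    }
    return fs;
  }
}
```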





[jira] [Created] (BEAM-1585) Ability to add new file systems to beamFS in the python sdk

2017-03-01 Thread Sourabh Bajaj (JIRA)
Sourabh Bajaj created BEAM-1585:
---

 Summary: Ability to add new file systems to beamFS in the python 
sdk
 Key: BEAM-1585
 URL: https://issues.apache.org/jira/browse/BEAM-1585
 Project: Beam
  Issue Type: New Feature
  Components: sdk-py
Reporter: Sourabh Bajaj
Assignee: Sourabh Bajaj








Jenkins build is back to stable : beam_PostCommit_Java_MavenInstall #2792

2017-03-01 Thread Apache Jenkins Server
See 




[jira] [Updated] (BEAM-1584) Support clean-up step in integration test

2017-03-01 Thread Mark Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Liu updated BEAM-1584:
---
Description: 
Idea comes from: 
https://github.com/apache/beam/pull/2064/files/628fafed098ac5550356a201c6ccdcdcc2e9604e

Integration tests in all SDKs should be able to perform clean-up at the end of 
each run.

> Support clean-up step in integration test
> -
>
> Key: BEAM-1584
> URL: https://issues.apache.org/jira/browse/BEAM-1584
> Project: Beam
>  Issue Type: Task
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>
> Idea comes from: 
> https://github.com/apache/beam/pull/2064/files/628fafed098ac5550356a201c6ccdcdcc2e9604e
> Integration tests in all SDKs should be able to perform clean-up at the end 
> of each run.
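One way to support such a clean-up step is a try/finally harness that always runs registered teardown actions, even when the test body fails. This is a hypothetical sketch, not an existing Beam testing API:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical integration-test harness with guaranteed clean-up. */
final class ItHarness {
  private final Deque<Runnable> cleanups = new ArrayDeque<>();

  /** Register a clean-up step; steps run in reverse registration order. */
  void deferCleanup(Runnable step) {
    cleanups.push(step);
  }

  /** Run the test body, then always run every registered clean-up step. */
  void run(Runnable testBody) {
    try {
      testBody.run();
    } finally {
      while (!cleanups.isEmpty()) {
        Runnable step = cleanups.pop();
        try {
          step.run();
        } catch (RuntimeException e) {
          // Keep tearing down even if one step fails; a real harness
          // would log the failure here.
        }
      }
    }
  }
}
```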





[jira] [Created] (BEAM-1584) Support clean-up step in integration test

2017-03-01 Thread Mark Liu (JIRA)
Mark Liu created BEAM-1584:
--

 Summary: Support clean-up step in integration test
 Key: BEAM-1584
 URL: https://issues.apache.org/jira/browse/BEAM-1584
 Project: Beam
  Issue Type: Task
  Components: testing
Reporter: Mark Liu
Assignee: Mark Liu








[jira] [Updated] (BEAM-1583) Separate GCP test required packages from general GCP dependencies

2017-03-01 Thread Mark Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Liu updated BEAM-1583:
---
Issue Type: Task  (was: Bug)

> Separate GCP test required packages from general GCP dependencies
> -
>
> Key: BEAM-1583
> URL: https://issues.apache.org/jira/browse/BEAM-1583
> Project: Beam
>  Issue Type: Task
>  Components: project-management, sdk-py
>Reporter: Mark Liu
>Assignee: Mark Liu
>
> This issue comes from discussion under:
> https://github.com/apache/beam/pull/2064#discussion_r103755653
> If more GCP dependencies are introduced for test-only purposes, consider 
> moving them to a separate group.





[jira] [Created] (BEAM-1583) Separate GCP test required packages from general GCP dependencies

2017-03-01 Thread Mark Liu (JIRA)
Mark Liu created BEAM-1583:
--

 Summary: Separate GCP test required packages from general GCP 
dependencies
 Key: BEAM-1583
 URL: https://issues.apache.org/jira/browse/BEAM-1583
 Project: Beam
  Issue Type: Bug
  Components: project-management, sdk-py
Reporter: Mark Liu
Assignee: Mark Liu


This issue comes from discussion under:
https://github.com/apache/beam/pull/2064#discussion_r103755653

If more GCP dependencies are introduced for test-only purposes, consider 
moving them to a separate group.





[GitHub] beam pull request #2137: Update javadoc ant to include runners/ and exclude ...

2017-03-01 Thread aaltay
GitHub user aaltay opened a pull request:

https://github.com/apache/beam/pull/2137

Update javadoc ant to include runners/ and exclude modules with a wildcard

R: @davorbonaci 

(cc: @sb2nov)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aaltay/beam pom

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2137.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2137


commit a6850bc7991973e6a40bddb37f7aa7d5907b6de2
Author: Ahmet Altay 
Date:   2017-03-02T00:04:49Z

Update javadoc ant to include runners/ and exclude modules with a wildcard






Jenkins build became unstable: beam_PostCommit_Java_MavenInstall #2791

2017-03-01 Thread Apache Jenkins Server
See 




[jira] [Resolved] (BEAM-1577) Python BigqueryTornadoes Example Failed

2017-03-01 Thread Mark Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Liu resolved BEAM-1577.

   Resolution: Fixed
Fix Version/s: 0.6.0

> Python BigqueryTornadoes Example Failed
> ---
>
> Key: BEAM-1577
> URL: https://issues.apache.org/jira/browse/BEAM-1577
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Affects Versions: 0.6.0
>Reporter: Mark Liu
>Assignee: Mark Liu
> Fix For: 0.6.0
>
>
> Running bigquery_tornadoes on DataflowRunner failed with the following command:
> python -m apache_beam.examples.cookbook.bigquery_tornadoes \
> --project google.com:clouddfe \
> --runner DataflowRunner \
> --staging_location gs://.../tmp/python/staging \
> --temp_location gs://.../tmp/python/temp \
> --output BigQueryTornadoesIT.monthly_tornadoes_0001 \
> --sdk_location dist/apache-beam-0.6.0.dev0.tar.gz
> JobName: beamapp-markliu-0301012437-463604
> Job Link: 
> https://pantheon.corp.google.com/dataflow/job/2017-02-28_17_24_39-6087587560567828169?project=google.com:clouddfe&organizationId=433637338589
> Full stack trace:
> {code}
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 
> 544, in do_work
> work_executor.execute()
>   File "dataflow_worker/executor.py", line 971, in 
> dataflow_worker.executor.MapTaskExecutor.execute 
> (dataflow_worker/executor.c:30533)
> with op.scoped_metrics_container:
>   File "dataflow_worker/executor.py", line 972, in 
> dataflow_worker.executor.MapTaskExecutor.execute 
> (dataflow_worker/executor.c:30481)
> op.start()
>   File "dataflow_worker/executor.py", line 207, in 
> dataflow_worker.executor.ReadOperation.start (dataflow_worker/executor.c:8758)
> def start(self):
>   File "dataflow_worker/executor.py", line 208, in 
> dataflow_worker.executor.ReadOperation.start (dataflow_worker/executor.c:8663)
> with self.scoped_start_state:
>   File "dataflow_worker/executor.py", line 213, in 
> dataflow_worker.executor.ReadOperation.start (dataflow_worker/executor.c:8579)
> with self.spec.source.reader() as reader:
>   File "dataflow_worker/executor.py", line 223, in 
> dataflow_worker.executor.ReadOperation.start (dataflow_worker/executor.c:8524)
> self.output(windowed_value)
>   File "dataflow_worker/executor.py", line 151, in 
> dataflow_worker.executor.Operation.output (dataflow_worker/executor.c:6317)
> cython.cast(Receiver, 
> self.receivers[output_index]).receive(windowed_value)
>   File "dataflow_worker/executor.py", line 84, in 
> dataflow_worker.executor.ConsumerSet.receive (dataflow_worker/executor.c:4021)
> cython.cast(Operation, consumer).process(windowed_value)
>   File "dataflow_worker/executor.py", line 544, in 
> dataflow_worker.executor.DoOperation.process 
> (dataflow_worker/executor.c:18474)
> with self.scoped_process_state:
>   File "dataflow_worker/executor.py", line 545, in 
> dataflow_worker.executor.DoOperation.process 
> (dataflow_worker/executor.c:18428)
> self.dofn_receiver.receive(o)
>   File "apache_beam/runners/common.py", line 195, in 
> apache_beam.runners.common.DoFnRunner.receive 
> (apache_beam/runners/common.c:5142)
> self.process(windowed_value)
>   File "apache_beam/runners/common.py", line 267, in 
> apache_beam.runners.common.DoFnRunner.process 
> (apache_beam/runners/common.c:7201)
> self.reraise_augmented(exn)
>   File "apache_beam/runners/common.py", line 279, in 
> apache_beam.runners.common.DoFnRunner.reraise_augmented 
> (apache_beam/runners/common.c:7590)
> raise type(exn), args, sys.exc_info()[2]
>   File "apache_beam/runners/common.py", line 263, in 
> apache_beam.runners.common.DoFnRunner.process 
> (apache_beam/runners/common.c:7090)
> self._dofn_simple_invoker(element)
>   File "apache_beam/runners/common.py", line 198, in 
> apache_beam.runners.common.DoFnRunner._dofn_simple_invoker 
> (apache_beam/runners/common.c:5288)
> self._process_outputs(element, self.dofn_process(element.value))
>   File "apache_beam/runners/common.py", line 326, in 
> apache_beam.runners.common.DoFnRunner._process_outputs 
> (apache_beam/runners/common.c:8563)
> self.main_receivers.receive(windowed_value)
>   File "dataflow_worker/executor.py", line 82, in 
> dataflow_worker.executor.ConsumerSet.receive (dataflow_worker/executor.c:3987)
> self.update_counters_start(windowed_value)
>   File "dataflow_worker/executor.py", line 88, in 
> dataflow_worker.executor.ConsumerSet.update_counters_start 
> (dataflow_worker/executor.c:4207)
> self.opcounter.update_from(windowed_value)
>   File "dataflow_worker/opcounters.py", line 57, in 
> dataflow_worker.opcounters.OperationCounters.update_from 
> (dataflow_worker/opcounters.c:2396)
> self.do_sample(windowed_value)
>   File "datafl

[jira] [Closed] (BEAM-1577) Python BigqueryTornadoes Example Failed

2017-03-01 Thread Mark Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Liu closed BEAM-1577.
--

> Python BigqueryTornadoes Example Failed
> ---
>
> Key: BEAM-1577
> URL: https://issues.apache.org/jira/browse/BEAM-1577
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Affects Versions: 0.6.0
>Reporter: Mark Liu
>Assignee: Mark Liu
> Fix For: 0.6.0
>
>
> Running bigquery_tornadoes on DataflowRunner failed with the following command:
> python -m apache_beam.examples.cookbook.bigquery_tornadoes \
> --project google.com:clouddfe \
> --runner DataflowRunner \
> --staging_location gs://.../tmp/python/staging \
> --temp_location gs://.../tmp/python/temp \
> --output BigQueryTornadoesIT.monthly_tornadoes_0001 \
> --sdk_location dist/apache-beam-0.6.0.dev0.tar.gz
> JobName: beamapp-markliu-0301012437-463604
> Job Link: 
> https://pantheon.corp.google.com/dataflow/job/2017-02-28_17_24_39-6087587560567828169?project=google.com:clouddfe&organizationId=433637338589
> Full stack trace:
> {code}
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 
> 544, in do_work
> work_executor.execute()
>   File "dataflow_worker/executor.py", line 971, in 
> dataflow_worker.executor.MapTaskExecutor.execute 
> (dataflow_worker/executor.c:30533)
> with op.scoped_metrics_container:
>   File "dataflow_worker/executor.py", line 972, in 
> dataflow_worker.executor.MapTaskExecutor.execute 
> (dataflow_worker/executor.c:30481)
> op.start()
>   File "dataflow_worker/executor.py", line 207, in 
> dataflow_worker.executor.ReadOperation.start (dataflow_worker/executor.c:8758)
> def start(self):
>   File "dataflow_worker/executor.py", line 208, in 
> dataflow_worker.executor.ReadOperation.start (dataflow_worker/executor.c:8663)
> with self.scoped_start_state:
>   File "dataflow_worker/executor.py", line 213, in 
> dataflow_worker.executor.ReadOperation.start (dataflow_worker/executor.c:8579)
> with self.spec.source.reader() as reader:
>   File "dataflow_worker/executor.py", line 223, in 
> dataflow_worker.executor.ReadOperation.start (dataflow_worker/executor.c:8524)
> self.output(windowed_value)
>   File "dataflow_worker/executor.py", line 151, in 
> dataflow_worker.executor.Operation.output (dataflow_worker/executor.c:6317)
> cython.cast(Receiver, 
> self.receivers[output_index]).receive(windowed_value)
>   File "dataflow_worker/executor.py", line 84, in 
> dataflow_worker.executor.ConsumerSet.receive (dataflow_worker/executor.c:4021)
> cython.cast(Operation, consumer).process(windowed_value)
>   File "dataflow_worker/executor.py", line 544, in 
> dataflow_worker.executor.DoOperation.process 
> (dataflow_worker/executor.c:18474)
> with self.scoped_process_state:
>   File "dataflow_worker/executor.py", line 545, in 
> dataflow_worker.executor.DoOperation.process 
> (dataflow_worker/executor.c:18428)
> self.dofn_receiver.receive(o)
>   File "apache_beam/runners/common.py", line 195, in 
> apache_beam.runners.common.DoFnRunner.receive 
> (apache_beam/runners/common.c:5142)
> self.process(windowed_value)
>   File "apache_beam/runners/common.py", line 267, in 
> apache_beam.runners.common.DoFnRunner.process 
> (apache_beam/runners/common.c:7201)
> self.reraise_augmented(exn)
>   File "apache_beam/runners/common.py", line 279, in 
> apache_beam.runners.common.DoFnRunner.reraise_augmented 
> (apache_beam/runners/common.c:7590)
> raise type(exn), args, sys.exc_info()[2]
>   File "apache_beam/runners/common.py", line 263, in 
> apache_beam.runners.common.DoFnRunner.process 
> (apache_beam/runners/common.c:7090)
> self._dofn_simple_invoker(element)
>   File "apache_beam/runners/common.py", line 198, in 
> apache_beam.runners.common.DoFnRunner._dofn_simple_invoker 
> (apache_beam/runners/common.c:5288)
> self._process_outputs(element, self.dofn_process(element.value))
>   File "apache_beam/runners/common.py", line 326, in 
> apache_beam.runners.common.DoFnRunner._process_outputs 
> (apache_beam/runners/common.c:8563)
> self.main_receivers.receive(windowed_value)
>   File "dataflow_worker/executor.py", line 82, in 
> dataflow_worker.executor.ConsumerSet.receive (dataflow_worker/executor.c:3987)
> self.update_counters_start(windowed_value)
>   File "dataflow_worker/executor.py", line 88, in 
> dataflow_worker.executor.ConsumerSet.update_counters_start 
> (dataflow_worker/executor.c:4207)
> self.opcounter.update_from(windowed_value)
>   File "dataflow_worker/opcounters.py", line 57, in 
> dataflow_worker.opcounters.OperationCounters.update_from 
> (dataflow_worker/opcounters.c:2396)
> self.do_sample(windowed_value)
>   File "dataflow_worker/opcounters.py", line 75, in 
> dataflow_work

[jira] [Commented] (BEAM-1577) Python BigqueryTornadoes Example Failed

2017-03-01 Thread Mark Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891276#comment-15891276
 ] 

Mark Liu commented on BEAM-1577:


PR #2128 is merged, and the new job passed:
https://pantheon.corp.google.com/dataflow/job/2017-03-01_15_15_31-6567849312534451671?project=google.com:clouddfe&organizationId=433637338589

Will mark as fixed and close it.

> Python BigqueryTornadoes Example Failed
> ---
>
> Key: BEAM-1577
> URL: https://issues.apache.org/jira/browse/BEAM-1577
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Affects Versions: 0.6.0
>Reporter: Mark Liu
>Assignee: Mark Liu
>
> Running bigquery_tornadoes on DataflowRunner failed with the following command:
> python -m apache_beam.examples.cookbook.bigquery_tornadoes \
> --project google.com:clouddfe \
> --runner DataflowRunner \
> --staging_location gs://.../tmp/python/staging \
> --temp_location gs://.../tmp/python/temp \
> --output BigQueryTornadoesIT.monthly_tornadoes_0001 \
> --sdk_location dist/apache-beam-0.6.0.dev0.tar.gz
> JobName: beamapp-markliu-0301012437-463604
> Job Link: 
> https://pantheon.corp.google.com/dataflow/job/2017-02-28_17_24_39-6087587560567828169?project=google.com:clouddfe&organizationId=433637338589
> Full stack trace:
> {code}
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 544, in do_work
>     work_executor.execute()
>   File "dataflow_worker/executor.py", line 971, in dataflow_worker.executor.MapTaskExecutor.execute (dataflow_worker/executor.c:30533)
>     with op.scoped_metrics_container:
>   File "dataflow_worker/executor.py", line 972, in dataflow_worker.executor.MapTaskExecutor.execute (dataflow_worker/executor.c:30481)
>     op.start()
>   File "dataflow_worker/executor.py", line 207, in dataflow_worker.executor.ReadOperation.start (dataflow_worker/executor.c:8758)
>     def start(self):
>   File "dataflow_worker/executor.py", line 208, in dataflow_worker.executor.ReadOperation.start (dataflow_worker/executor.c:8663)
>     with self.scoped_start_state:
>   File "dataflow_worker/executor.py", line 213, in dataflow_worker.executor.ReadOperation.start (dataflow_worker/executor.c:8579)
>     with self.spec.source.reader() as reader:
>   File "dataflow_worker/executor.py", line 223, in dataflow_worker.executor.ReadOperation.start (dataflow_worker/executor.c:8524)
>     self.output(windowed_value)
>   File "dataflow_worker/executor.py", line 151, in dataflow_worker.executor.Operation.output (dataflow_worker/executor.c:6317)
>     cython.cast(Receiver, self.receivers[output_index]).receive(windowed_value)
>   File "dataflow_worker/executor.py", line 84, in dataflow_worker.executor.ConsumerSet.receive (dataflow_worker/executor.c:4021)
>     cython.cast(Operation, consumer).process(windowed_value)
>   File "dataflow_worker/executor.py", line 544, in dataflow_worker.executor.DoOperation.process (dataflow_worker/executor.c:18474)
>     with self.scoped_process_state:
>   File "dataflow_worker/executor.py", line 545, in dataflow_worker.executor.DoOperation.process (dataflow_worker/executor.c:18428)
>     self.dofn_receiver.receive(o)
>   File "apache_beam/runners/common.py", line 195, in apache_beam.runners.common.DoFnRunner.receive (apache_beam/runners/common.c:5142)
>     self.process(windowed_value)
>   File "apache_beam/runners/common.py", line 267, in apache_beam.runners.common.DoFnRunner.process (apache_beam/runners/common.c:7201)
>     self.reraise_augmented(exn)
>   File "apache_beam/runners/common.py", line 279, in apache_beam.runners.common.DoFnRunner.reraise_augmented (apache_beam/runners/common.c:7590)
>     raise type(exn), args, sys.exc_info()[2]
>   File "apache_beam/runners/common.py", line 263, in apache_beam.runners.common.DoFnRunner.process (apache_beam/runners/common.c:7090)
>     self._dofn_simple_invoker(element)
>   File "apache_beam/runners/common.py", line 198, in apache_beam.runners.common.DoFnRunner._dofn_simple_invoker (apache_beam/runners/common.c:5288)
>     self._process_outputs(element, self.dofn_process(element.value))
>   File "apache_beam/runners/common.py", line 326, in apache_beam.runners.common.DoFnRunner._process_outputs (apache_beam/runners/common.c:8563)
>     self.main_receivers.receive(windowed_value)
>   File "dataflow_worker/executor.py", line 82, in dataflow_worker.executor.ConsumerSet.receive (dataflow_worker/executor.c:3987)
>     self.update_counters_start(windowed_value)
>   File "dataflow_worker/executor.py", line 88, in dataflow_worker.executor.ConsumerSet.update_counters_start (dataflow_worker/executor.c:4207)
>     self.opcounter.update_from(windowed_value)
> dataflow

[GitHub] beam pull request #2122: [BEAM-1572] Adding per-stage level matching of metr...

2017-03-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2122


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1572) Add per-stage matching of scope in metrics for the DirectRunner

2017-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891273#comment-15891273
 ] 

ASF GitHub Bot commented on BEAM-1572:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2122


> Add per-stage matching of scope in metrics for the DirectRunner
> ---
>
> Key: BEAM-1572
> URL: https://issues.apache.org/jira/browse/BEAM-1572
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>
> e.g. Metrics with scope "Top/Outer/Inner" should be matched by queries with 
> "Top/Outer" scope.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[1/2] beam git commit: Closes #2122

2017-03-01 Thread bchambers
Repository: beam
Updated Branches:
  refs/heads/master afd710674 -> bc1a301b0


Closes #2122


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/bc1a301b
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/bc1a301b
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/bc1a301b

Branch: refs/heads/master
Commit: bc1a301b0a38f354658639c39289903c4e260564
Parents: afd7106 b34e50c
Author: bchambers 
Authored: Wed Mar 1 14:59:22 2017 -0800
Committer: bchambers 
Committed: Wed Mar 1 14:59:22 2017 -0800

--
 .../beam/runners/direct/DirectMetrics.java  | 31 -
 .../beam/runners/direct/DirectMetricsTest.java  | 70 
 sdks/python/apache_beam/metrics/metric.py   | 53 ---
 sdks/python/apache_beam/metrics/metric_test.py  | 43 
 4 files changed, 187 insertions(+), 10 deletions(-)
--




[2/2] beam git commit: Adding per-stage matching to metrics filters

2017-03-01 Thread bchambers
Adding per-stage matching to metrics filters


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/b34e50c6
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/b34e50c6
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/b34e50c6

Branch: refs/heads/master
Commit: b34e50c649adc8861670c42228cb96688abf4038
Parents: afd7106
Author: Pablo 
Authored: Mon Feb 27 17:28:26 2017 -0800
Committer: bchambers 
Committed: Wed Mar 1 14:59:22 2017 -0800

--
 .../beam/runners/direct/DirectMetrics.java  | 31 -
 .../beam/runners/direct/DirectMetricsTest.java  | 70 
 sdks/python/apache_beam/metrics/metric.py   | 53 ---
 sdks/python/apache_beam/metrics/metric_test.py  | 43 
 4 files changed, 187 insertions(+), 10 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/b34e50c6/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectMetrics.java
--
diff --git a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectMetrics.java b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectMetrics.java
index 145326f..fa8f9c3 100644
--- a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectMetrics.java
+++ b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectMetrics.java
@@ -275,13 +275,40 @@ class DirectMetrics extends MetricResults {
         && matchesScope(key.stepName(), filter.steps());
   }
 
-  private boolean matchesScope(String actualScope, Set<String> scopes) {
+  /**
+   * {@code subPathMatches(haystack, needle)} returns true if {@code needle}
+   * represents a path within {@code haystack}. For example, "foo/bar" is in "a/foo/bar/b",
+   * but not "a/fool/bar/b" or "a/foo/bart/b".
+   */
+  public boolean subPathMatches(String haystack, String needle) {
+    int location = haystack.indexOf(needle);
+    int end = location + needle.length();
+    if (location == -1) {
+      return false;  // needle not found
+    } else if (location != 0 && haystack.charAt(location - 1) != '/') {
+      return false; // the first entry in needle wasn't exactly matched
+    } else if (end != haystack.length() && haystack.charAt(end) != '/') {
+      return false; // the last entry in needle wasn't exactly matched
+    } else {
+      return true;
+    }
+  }
+
+  /**
+   * {@code matchesScope(actualScope, scopes)} returns true if the scope of a metric is matched
+   * by any of the filters in {@code scopes}. A metric scope is a path of type "A/B/D". A
+   * path is matched by a filter if the filter is equal to the path (e.g. "A/B/D"), or
+   * if it represents a subpath within it (e.g. "A/B" or "B/D", but not "A/D").
+   */
+  public boolean matchesScope(String actualScope, Set<String> scopes) {
     if (scopes.isEmpty() || scopes.contains(actualScope)) {
       return true;
     }
 
+    // If there is no perfect match, a stage name-level match is tried.
+    // This is done by a substring search over the levels of the scope.
+    // e.g. a scope "A/B/C/D" is matched by "A/B", but not by "A/C".
     for (String scope : scopes) {
-      if (actualScope.startsWith(scope)) {
+      if (subPathMatches(actualScope, scope)) {
         return true;
       }
     }
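
The matching rules in this hunk can be illustrated with a short, hypothetical Python port (the names `sub_path_matches` and `matches_scope` are mine, not part of the patch; the logic mirrors the Java above, including its use of the first substring occurrence only):

```python
def sub_path_matches(haystack, needle):
    """True if `needle` is a run of whole path segments inside `haystack`.

    Mirrors the Java logic: "foo/bar" matches "a/foo/bar/b",
    but not "a/fool/bar/b" or "a/foo/bart/b".
    """
    location = haystack.find(needle)
    end = location + len(needle)
    if location == -1:
        return False  # needle not found at all
    if location != 0 and haystack[location - 1] != "/":
        return False  # needle starts in the middle of a segment
    if end != len(haystack) and haystack[end] != "/":
        return False  # needle ends in the middle of a segment
    return True


def matches_scope(actual_scope, scopes):
    """A metric scope matches if any filter equals it or is a sub-path of it."""
    if not scopes or actual_scope in scopes:
        return True
    return any(sub_path_matches(actual_scope, scope) for scope in scopes)
```

With this sketch, a scope "A/B/C/D" is matched by the filter "A/B" but not by "A/C", matching the comment in the Java hunk.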

http://git-wip-us.apache.org/repos/asf/beam/blob/b34e50c6/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectMetricsTest.java
--
diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectMetricsTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectMetricsTest.java
index 3ad2bdc..77229bf 100644
--- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectMetricsTest.java
+++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/DirectMetricsTest.java
@@ -23,9 +23,13 @@ import static org.apache.beam.sdk.metrics.MetricMatchers.committedMetricsResult;
 import static org.apache.beam.sdk.metrics.MetricNameFilter.inNamespace;
 import static org.hamcrest.Matchers.contains;
 import static org.hamcrest.Matchers.containsInAnyOrder;
+import static org.junit.Assert.assertFalse;
 import static org.junit.Assert.assertThat;
+import static org.junit.Assert.assertTrue;
 
 import com.google.common.collect.ImmutableList;
+import java.util.HashSet;
+import java.util.Set;
 import org.apache.beam.runners.direct.DirectRunner.CommittedBundle;
 import org.apache.beam.sdk.metrics.DistributionData;
 import org.apache.beam.sdk.metrics.DistributionResult;
@@ -125,6 +129,72 @@ public class DirectMetricsTest {
 committedMetricsResult("ns1", "name1", "step2", 0L)));
   }
 
+  private boolean matchesSubPath(String actualSc

[jira] [Commented] (BEAM-1582) ResumeFromCheckpointStreamingTest flakes with what appears as a second firing.

2017-03-01 Thread Amit Sela (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891188#comment-15891188
 ] 

Amit Sela commented on BEAM-1582:
-

Looks like the flake happens when the entire input is re-read.
We inject 4 elements into Kafka before the first run, and 2 more before the 
second. When all is well, printing the number of elements read by 
SparkUnboundedSource's {{readUnboundedStream}} JavaDStream says 4 (sometimes 1 
followed by 3) in the first run, and 2 in the second; but in failures, it reads 
6 in the second.
This would happen if the checkpoints of the readers are not persisted for some 
reason, causing KafkaIO to fall back to the default "earliest" and so re-read 
everything. This happens even though the checkpoint interval equals the batch 
interval. I will check if there's a way to guarantee/block on checkpointing.

> ResumeFromCheckpointStreamingTest flakes with what appears as a second firing.
> --
>
> Key: BEAM-1582
> URL: https://issues.apache.org/jira/browse/BEAM-1582
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Amit Sela
>
> See: 
> https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/org.apache.beam$beam-runners-spark/2788/testReport/junit/org.apache.beam.runners.spark.translation.streaming/ResumeFromCheckpointStreamingTest/testWithResume/
> After some digging in, it appears that a second firing occurs (though only one 
> is expected), but it doesn't come from a stale state (the state is empty before 
> it fires).
> This might be a retry happening for some reason, which is OK in terms of 
> fault-tolerance guarantees (at-least-once), but not so much in terms of flaky 
> tests. 
> I'm looking into this, hoping to fix it ASAP.





[GitHub] beam pull request #2088: Upgrade dill to 0.2.6 and pin it

2017-03-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2088




[2/2] beam git commit: This closes #2088

2017-03-01 Thread altay
This closes #2088


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/afd71067
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/afd71067
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/afd71067

Branch: refs/heads/master
Commit: afd710674add3c23e4a149c090be4317934eafaf
Parents: b2428ea 018b6ff
Author: Ahmet Altay 
Authored: Wed Mar 1 14:20:28 2017 -0800
Committer: Ahmet Altay 
Committed: Wed Mar 1 14:20:28 2017 -0800

--
 sdks/python/setup.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--




[1/2] beam git commit: Upgrade dill to 0.2.6 and pin it

2017-03-01 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master b2428ea60 -> afd710674


Upgrade dill to 0.2.6 and pin it

Upgrade dill to the latest version and pin it. There were potential
compatibility issues between 0.2.5 and 0.2.6, and keeping this as a
range requirement is risky going forward.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/018b6ffb
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/018b6ffb
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/018b6ffb

Branch: refs/heads/master
Commit: 018b6ffbf85fca3fb3cd4f37bef4931dad3925a0
Parents: b2428ea
Author: Ahmet Altay 
Authored: Thu Feb 23 14:00:54 2017 -0800
Committer: Ahmet Altay 
Committed: Wed Mar 1 14:20:23 2017 -0800

--
 sdks/python/setup.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/018b6ffb/sdks/python/setup.py
--
diff --git a/sdks/python/setup.py b/sdks/python/setup.py
index 6856e00..022d69d 100644
--- a/sdks/python/setup.py
+++ b/sdks/python/setup.py
@@ -86,7 +86,7 @@ else:
 REQUIRED_PACKAGES = [
 'avro>=1.7.7,<2.0.0',
 'crcmod>=1.7,<2.0',
-'dill>=0.2.5,<0.3',
+'dill==0.2.6',
 'httplib2>=0.8,<0.10',
 'mock>=1.0.1,<3.0.0',
 'oauth2client>=2.0.1,<4.0.0',
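
As a sketch of why the exact pin is safer than the range, here is a naive, illustrative version check (real pip resolution follows PEP 440 semantics; this toy parser, whose names are mine, only handles plain numeric versions):

```python
def parse(version):
    # naive numeric parser: "0.2.6" -> (0, 2, 6); illustration only
    return tuple(int(part) for part in version.split("."))

def satisfies_range(version, low="0.2.5", high="0.3"):
    # the old requirement, dill>=0.2.5,<0.3: any future 0.2.x may slip in
    return parse(low) <= parse(version) < parse(high)

def satisfies_pin(version, pinned="0.2.6"):
    # the new requirement, dill==0.2.6: exactly one version can install
    return version == pinned
```

Under the range, both 0.2.5 and 0.2.6 are acceptable even though they may be incompatible with each other; under the pin, only 0.2.6 installs.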



[jira] [Commented] (BEAM-1025) User guide - "How to create Beam IO Transforms"

2017-03-01 Thread Stephen Sisk (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891174#comment-15891174
 ] 

Stephen Sisk commented on BEAM-1025:


thanks JB! 

More thinking as I've been looking into what we already have: Pipeline IO 
probably has enough content associated with it that we're going to want it to 
have its own page (or set of pages).

I'm going to move in that direction, and try to reconcile the python 
documentation as we go.

cc [~melap] and [~chamikara] who I believe are also interested in this.

> User guide - "How to create Beam IO Transforms"
> ---
>
> Key: BEAM-1025
> URL: https://issues.apache.org/jira/browse/BEAM-1025
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Stephen Sisk
>Assignee: Stephen Sisk
>
> Beam has javadocs for how to create a read or write transform, but no 
> friendly user guide on how to get started using BoundedSource/BoundedReader.
> This should cover:
> * background on beam's source/sink API design 
> * design patterns
> * evaluating different data sources (e.g., what are the properties of a pub/sub 
> system that affect how you should write your UnboundedSource? What is the 
> best design for reading from a NoSQL-style source?)
> * testing - how to write unit, integration (and once we have them, 
> performance tests)
> * public API recommendations
> This is related, but not strictly overlapping with: 
> https://issues.apache.org/jira/browse/BEAM-193
> - the Dataflow SDK documentation for "Custom Sources and Sinks"  contains 
> some info about writing Sources/Sinks, but it is somewhat out of date, and 
> doesn't reflect the things we've learned recently.





[jira] [Commented] (BEAM-1567) hashStream should be closed in PackageUtil#createPackageAttributes()

2017-03-01 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891145#comment-15891145
 ] 

Ismaël Mejía commented on BEAM-1567:


I just took a look and gave my comments; we will have to wait for Dan's final 
word. Dan is quite efficient at this, but I have seen that he is reviewing a 
ton of stuff, so sorry if we make you wait.
For next time, the convention for reviewers at Beam is to mention them in the 
PR, something like R: @iemejia, to call in the appropriate person as a 
reviewer.


> hashStream should be closed in PackageUtil#createPackageAttributes()
> 
>
> Key: BEAM-1567
> URL: https://issues.apache.org/jira/browse/BEAM-1567
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Ted Yu
>Priority: Minor
>  Labels: newbie, starter
>
> Here is related code:
> {code}
>   OutputStream hashStream = Funnels.asOutputStream(hasher);
> {code}
> hashStream should be closed upon return from the method
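
The resource-leak pattern being reported can be sketched in Python (the `HashStream` class below is a hypothetical stand-in for the stream returned by Guava's `Funnels.asOutputStream`; in the Java code the usual fix would be try-with-resources):

```python
import contextlib
import hashlib

class HashStream:
    """Minimal file-like wrapper that feeds every write into a hasher.

    Illustrative stand-in for Funnels.asOutputStream(hasher); not real Beam code.
    """
    def __init__(self, hasher):
        self._hasher = hasher
        self.closed = False

    def write(self, data):
        self._hasher.update(data)

    def close(self):
        self.closed = True

hasher = hashlib.sha256()
# contextlib.closing guarantees close() runs even on early return or exception,
# the same guarantee try-with-resources would give the Java hashStream.
with contextlib.closing(HashStream(hasher)) as stream:
    stream.write(b"package-bytes")
```

The point is structural: the close happens on every exit path, not only on the happy path.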





Jenkins build is back to stable : beam_PostCommit_Java_RunnableOnService_Spark #1099

2017-03-01 Thread Apache Jenkins Server
See 




Jenkins build is back to stable : beam_PostCommit_Java_MavenInstall #2789

2017-03-01 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-849) Redesign PipelineResult API

2017-03-01 Thread Eugene Kirpichov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891137#comment-15891137
 ] 

Eugene Kirpichov commented on BEAM-849:
---

On your first paragraph: yes, I'm not saying that waitUntilFinish should 
trigger the termination - I'm just saying that such a pipeline must terminate 
regardless of whether it is a "streaming runner" or not (and this is currently 
not the case, e.g., for the Dataflow runner); so if somebody calls 
waitUntilFinish, this call should wait for that termination and complete as 
well. I guess, though, at this point we're discussing "when should pipelines 
terminate" rather than "what should the API be for detecting that".

In the example I gave, the "end" is not necessarily known ahead of execution. 
E.g. imagine a use case where we continually tail the file and stream data from 
it until the file is marked with a read-only attribute. It might get marked 
soon, tomorrow, or never at all - then we should keep running and processing 
new records as they arrive; but when it's marked read-only, the pipeline should 
terminate.

Unbounded collections are part of the SDK, but unbounded pipelines are not. I 
guess one could introduce terminology that an unbounded pipeline is a pipeline 
that has at least one unbounded collection?... but again, it seems like the 
only use case for that would be when a runner that only supports bounded 
collections validates whether a pipeline being submitted satisfies this.

Ideally I would like to stop using terminology such as "batch" and "streaming" 
altogether, except when referring to a particular runner ("batch Dataflow 
runner", "streaming Spark runner").

> Redesign PipelineResult API
> ---
>
> Key: BEAM-849
> URL: https://issues.apache.org/jira/browse/BEAM-849
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Pei He
>
> Current state: 
> Jira https://issues.apache.org/jira/browse/BEAM-443 addresses 
> waitUntilFinish() and cancel(). 
> However, there are additional work around PipelineResult: 
> need clearly defined contract and verification across all runners 
> need to revisit how to handle metrics/aggregators 
> need to be able to get logs





[jira] [Created] (BEAM-1582) ResumeFromCheckpointStreamingTest flakes with what appears as a second firing.

2017-03-01 Thread Amit Sela (JIRA)
Amit Sela created BEAM-1582:
---

 Summary: ResumeFromCheckpointStreamingTest flakes with what 
appears as a second firing.
 Key: BEAM-1582
 URL: https://issues.apache.org/jira/browse/BEAM-1582
 Project: Beam
  Issue Type: Bug
  Components: runner-spark
Reporter: Amit Sela
Assignee: Amit Sela


See: 
https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/org.apache.beam$beam-runners-spark/2788/testReport/junit/org.apache.beam.runners.spark.translation.streaming/ResumeFromCheckpointStreamingTest/testWithResume/

After some digging in, it appears that a second firing occurs (though only one 
is expected), but it doesn't come from a stale state (the state is empty before 
it fires).
This might be a retry happening for some reason, which is OK in terms of 
fault-tolerance guarantees (at-least-once), but not so much in terms of flaky 
tests.

I'm looking into this, hoping to fix it ASAP.





Jenkins build became unstable: beam_PostCommit_Java_RunnableOnService_Spark #1098

2017-03-01 Thread Apache Jenkins Server
See 




[jira] [Resolved] (BEAM-1173) TestSparkRunner should use a PipelineVisitor to determine expected assertions.

2017-03-01 Thread Thomas Groh (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Groh resolved BEAM-1173.
---
   Resolution: Fixed
Fix Version/s: 0.6.0

> TestSparkRunner should use a PipelineVisitor to determine expected assertions.
> --
>
> Key: BEAM-1173
> URL: https://issues.apache.org/jira/browse/BEAM-1173
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Thomas Groh
>Priority: Minor
> Fix For: 0.6.0
>
>
> Instead of checking each transform on execution, the {{TestSparkRunner}} 
> should use a {{PipelineVisitor}} to determine the number of assertions in the 
> pipeline in order to validate the same number of {{PAssert.SUCCESS_COUNTER}}. 
>   





[jira] [Resolved] (BEAM-646) Get runners out of the apply()

2017-03-01 Thread Thomas Groh (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Groh resolved BEAM-646.
--
   Resolution: Fixed
Fix Version/s: 0.6.0

> Get runners out of the apply()
> --
>
> Key: BEAM-646
> URL: https://issues.apache.org/jira/browse/BEAM-646
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api, sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Thomas Groh
>  Labels: backwards-incompatible
> Fix For: 0.6.0
>
>
> Right now, the runner intercepts calls to apply() and replaces transforms as 
> we go. This means that there is no "original" user graph. For portability and 
> misc architectural benefits, we would like to build the original graph first, 
> and have the runner override later.
> Some runners already work in this manner, but we could integrate it more 
> smoothly, with more validation, via some handy APIs on e.g. the Pipeline 
> object.





[jira] [Commented] (BEAM-646) Get runners out of the apply()

2017-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891039#comment-15891039
 ] 

ASF GitHub Bot commented on BEAM-646:
-

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2119


> Get runners out of the apply()
> --
>
> Key: BEAM-646
> URL: https://issues.apache.org/jira/browse/BEAM-646
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api, sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Thomas Groh
>  Labels: backwards-incompatible
>
> Right now, the runner intercepts calls to apply() and replaces transforms as 
> we go. This means that there is no "original" user graph. For portability and 
> misc architectural benefits, we would like to build the original graph first, 
> and have the runner override later.
> Some runners already work in this manner, but we could integrate it more 
> smoothly, with more validation, via some handy APIs on e.g. the Pipeline 
> object.





[jira] [Resolved] (BEAM-111) Use SDK implementation of WritableCoder

2017-03-01 Thread Amit Sela (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amit Sela resolved BEAM-111.

   Resolution: Fixed
Fix Version/s: 0.6.0

> Use SDK implementation of WritableCoder
> ---
>
> Key: BEAM-111
> URL: https://issues.apache.org/jira/browse/BEAM-111
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Ismaël Mejía
>  Labels: easy, starter
> Fix For: 0.6.0
>
>
> The Spark runner currently uses its own implementation of WritableCoder; 
> it should use the one in {{io-hdfs}}.
> Remove {{org.apache.beam.runners.spark.coders.WritableCoder}}.





[1/2] beam git commit: This closes #2119

2017-03-01 Thread tgroh
Repository: beam
Updated Branches:
  refs/heads/master a81c45781 -> b2428ea60


This closes #2119


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/b2428ea6
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/b2428ea6
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/b2428ea6

Branch: refs/heads/master
Commit: b2428ea60fdc4363bb2734abe1b247f36d0cfe44
Parents: a81c457 3408f60
Author: Thomas Groh 
Authored: Wed Mar 1 13:05:49 2017 -0800
Committer: Thomas Groh 
Committed: Wed Mar 1 13:05:49 2017 -0800

--
 .../beam/runners/apex/TestApexRunner.java   | 10 ---
 .../beam/runners/flink/TestFlinkRunner.java |  9 ---
 .../dataflow/testing/TestDataflowRunner.java| 17 ++---
 .../testing/TestDataflowRunnerTest.java |  3 +-
 runners/spark/pom.xml   |  4 ++
 .../beam/runners/spark/TestSparkRunner.java | 76 
 .../beam/runners/spark/ForceStreamingTest.java  |  2 +
 .../main/java/org/apache/beam/sdk/Pipeline.java |  4 +-
 .../apache/beam/sdk/runners/PipelineRunner.java | 14 
 .../org/apache/beam/sdk/testing/PAssert.java| 64 +
 .../apache/beam/sdk/testing/PAssertTest.java| 30 
 11 files changed, 154 insertions(+), 79 deletions(-)
--




[GitHub] beam pull request #2119: [BEAM-646][BEAM-1173] Remove PipelineRunner#apply

2017-03-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2119




[2/2] beam git commit: Remove PipelineRunner#apply

2017-03-01 Thread tgroh
Remove PipelineRunner#apply

All existing Pipeline Runners that use the Java SDK modify Pipeline
graphs with the Pipeline Surgery APIs. Apply is now superfluous.

Add an AssertionCountingVisitor to enable TestRunners to track the
number of assertions in the Pipeline.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/3408f604
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/3408f604
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/3408f604

Branch: refs/heads/master
Commit: 3408f6049ba3692f9edbbeead75626125954d4b6
Parents: a81c457
Author: Thomas Groh 
Authored: Thu Feb 23 17:32:01 2017 -0800
Committer: Thomas Groh 
Committed: Wed Mar 1 13:05:49 2017 -0800

--
 .../beam/runners/apex/TestApexRunner.java   | 10 ---
 .../beam/runners/flink/TestFlinkRunner.java |  9 ---
 .../dataflow/testing/TestDataflowRunner.java| 17 ++---
 .../testing/TestDataflowRunnerTest.java |  3 +-
 runners/spark/pom.xml   |  4 ++
 .../beam/runners/spark/TestSparkRunner.java | 76 
 .../beam/runners/spark/ForceStreamingTest.java  |  2 +
 .../main/java/org/apache/beam/sdk/Pipeline.java |  4 +-
 .../apache/beam/sdk/runners/PipelineRunner.java | 14 
 .../org/apache/beam/sdk/testing/PAssert.java| 64 +
 .../apache/beam/sdk/testing/PAssertTest.java| 30 
 11 files changed, 154 insertions(+), 79 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/3408f604/runners/apex/src/main/java/org/apache/beam/runners/apex/TestApexRunner.java
--
diff --git a/runners/apex/src/main/java/org/apache/beam/runners/apex/TestApexRunner.java b/runners/apex/src/main/java/org/apache/beam/runners/apex/TestApexRunner.java
index e447e37..a64ac54 100644
--- a/runners/apex/src/main/java/org/apache/beam/runners/apex/TestApexRunner.java
+++ b/runners/apex/src/main/java/org/apache/beam/runners/apex/TestApexRunner.java
@@ -18,14 +18,10 @@
 package org.apache.beam.runners.apex;
 
 import java.io.IOException;
-
 import org.apache.beam.sdk.Pipeline;
 import org.apache.beam.sdk.options.PipelineOptions;
 import org.apache.beam.sdk.options.PipelineOptionsValidator;
 import org.apache.beam.sdk.runners.PipelineRunner;
-import org.apache.beam.sdk.transforms.PTransform;
-import org.apache.beam.sdk.values.PInput;
-import org.apache.beam.sdk.values.POutput;
 import org.joda.time.Duration;
 
 /**
@@ -49,12 +45,6 @@ public class TestApexRunner extends PipelineRunner<ApexRunnerResult> {
   }
 
   @Override
-  public <OutputT extends POutput, InputT extends PInput>
-  OutputT apply(PTransform<InputT, OutputT> transform, InputT input) {
-    return delegate.apply(transform, input);
-  }
-
-  @Override
   public ApexRunnerResult run(Pipeline pipeline) {
     ApexRunnerResult result = delegate.run(pipeline);
     try {

http://git-wip-us.apache.org/repos/asf/beam/blob/3408f604/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/TestFlinkRunner.java
--
diff --git a/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/TestFlinkRunner.java b/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/TestFlinkRunner.java
index 30a94a1..ef56b55 100644
--- a/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/TestFlinkRunner.java
+++ b/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/TestFlinkRunner.java
@@ -24,10 +24,7 @@ import org.apache.beam.sdk.options.PipelineOptions;
 import org.apache.beam.sdk.options.PipelineOptionsFactory;
 import org.apache.beam.sdk.options.PipelineOptionsValidator;
 import org.apache.beam.sdk.runners.PipelineRunner;
-import org.apache.beam.sdk.transforms.PTransform;
 import org.apache.beam.sdk.util.UserCodeException;
-import org.apache.beam.sdk.values.PInput;
-import org.apache.beam.sdk.values.POutput;
 
 /**
  * Test Flink runner.
@@ -56,12 +53,6 @@ public class TestFlinkRunner extends PipelineRunner<PipelineResult> {
   }
 
   @Override
-  public <OutputT extends POutput, InputT extends PInput>
-  OutputT apply(PTransform<InputT, OutputT> transform, InputT input) {
-    return delegate.apply(transform, input);
-  }
-
-  @Override
   public PipelineResult run(Pipeline pipeline) {
 try {
   return delegate.run(pipeline);

http://git-wip-us.apache.org/repos/asf/beam/blob/3408f604/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/testing/TestDataflowRunner.java
--
diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/testing/TestDataflowRunner.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/testing/TestDataflowRunner.java
index 0564448..5315671 100644
--- 
a/runners/google-cloud-dataflow-java/src/main/java/or

[jira] [Commented] (BEAM-111) Use SDK implementation of WritableCoder

2017-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891037#comment-15891037
 ] 

ASF GitHub Bot commented on BEAM-111:
-

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2118


> Use SDK implementation of WritableCoder
> ---
>
> Key: BEAM-111
> URL: https://issues.apache.org/jira/browse/BEAM-111
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Ismaël Mejía
>  Labels: easy, starter
>
> The Spark runner currently uses its own implementation of WritableCoder, 
> should use the one in {{io-hdfs}}.
> Remove {{org.apache.beam.runners.spark.coders.WritableCoder}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[2/2] beam git commit: This closes #2118

2017-03-01 Thread amitsela
This closes #2118


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/a81c4578
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/a81c4578
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/a81c4578

Branch: refs/heads/master
Commit: a81c45781491185569f8ff75f5256d6de84a7150
Parents: b49ec3f 44624c3
Author: Sela 
Authored: Wed Mar 1 22:47:40 2017 +0200
Committer: Sela 
Committed: Wed Mar 1 22:47:40 2017 +0200

--
 runners/spark/pom.xml   |   6 +
 .../runners/spark/coders/NullWritableCoder.java |  76 
 .../runners/spark/coders/WritableCoder.java | 122 ---
 .../runners/spark/coders/WritableCoderTest.java |  45 ---
 .../io/hadoop/HadoopFileFormatPipelineTest.java |   2 +-
 sdks/java/io/hadoop-common/pom.xml  |  10 ++
 .../beam/sdk/io/hadoop/WritableCoder.java   | 116 ++
 .../beam/sdk/io/hadoop/WritableCoderTest.java   |  45 +++
 sdks/java/io/hdfs/pom.xml   |   5 -
 .../apache/beam/sdk/io/hdfs/HDFSFileSource.java |   1 +
 .../apache/beam/sdk/io/hdfs/WritableCoder.java  | 116 --
 .../beam/sdk/io/hdfs/WritableCoderTest.java |  45 ---
 12 files changed, 179 insertions(+), 410 deletions(-)
--




[GitHub] beam pull request #2118: [BEAM-111] Move WritableCoder to hadoop-common

2017-03-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2118


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: [BEAM-111] Move WritableCoder to hadoop-common

2017-03-01 Thread amitsela
Repository: beam
Updated Branches:
  refs/heads/master b49ec3fa2 -> a81c45781


[BEAM-111] Move WritableCoder to hadoop-common


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/44624c38
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/44624c38
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/44624c38

Branch: refs/heads/master
Commit: 44624c382ac5ff062191d41df1dcd008839352e0
Parents: b49ec3f
Author: Ismaël Mejía 
Authored: Sat Feb 25 05:20:58 2017 +0100
Committer: Sela 
Committed: Wed Mar 1 22:47:16 2017 +0200

--
 runners/spark/pom.xml   |   6 +
 .../runners/spark/coders/NullWritableCoder.java |  76 
 .../runners/spark/coders/WritableCoder.java | 122 ---
 .../runners/spark/coders/WritableCoderTest.java |  45 ---
 .../io/hadoop/HadoopFileFormatPipelineTest.java |   2 +-
 sdks/java/io/hadoop-common/pom.xml  |  10 ++
 .../beam/sdk/io/hadoop/WritableCoder.java   | 116 ++
 .../beam/sdk/io/hadoop/WritableCoderTest.java   |  45 +++
 sdks/java/io/hdfs/pom.xml   |   5 -
 .../apache/beam/sdk/io/hdfs/HDFSFileSource.java |   1 +
 .../apache/beam/sdk/io/hdfs/WritableCoder.java  | 116 --
 .../beam/sdk/io/hdfs/WritableCoderTest.java |  45 ---
 12 files changed, 179 insertions(+), 410 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/44624c38/runners/spark/pom.xml
--
diff --git a/runners/spark/pom.xml b/runners/spark/pom.xml
index 409fc27..8c35178 100644
--- a/runners/spark/pom.xml
+++ b/runners/spark/pom.xml
@@ -306,6 +306,12 @@
     </dependency>
 
     <dependency>
+      <groupId>org.apache.beam</groupId>
+      <artifactId>beam-sdks-java-io-hadoop-common</artifactId>
+      <scope>test</scope>
+    </dependency>
+
+    <dependency>
       <groupId>org.mockito</groupId>
       <artifactId>mockito-all</artifactId>
       <scope>test</scope>

http://git-wip-us.apache.org/repos/asf/beam/blob/44624c38/runners/spark/src/main/java/org/apache/beam/runners/spark/coders/NullWritableCoder.java
--
diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/coders/NullWritableCoder.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/coders/NullWritableCoder.java
deleted file mode 100644
index ebbab1a..000
--- a/runners/spark/src/main/java/org/apache/beam/runners/spark/coders/NullWritableCoder.java
+++ /dev/null
@@ -1,76 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.beam.runners.spark.coders;
-
-import com.fasterxml.jackson.annotation.JsonCreator;
-import java.io.InputStream;
-import java.io.OutputStream;
-import org.apache.beam.sdk.coders.Coder;
-import org.apache.hadoop.io.NullWritable;
-
-/**
- * Simple writable coder for Null.
- */
-public final class NullWritableCoder extends WritableCoder<NullWritable> {
-  private static final long serialVersionUID = 1L;
-
-  @JsonCreator
-  public static NullWritableCoder of() {
-    return INSTANCE;
-  }
-
-  private static final NullWritableCoder INSTANCE = new NullWritableCoder();
-
-  private NullWritableCoder() {
-    super(NullWritable.class);
-  }
-
-  @Override
-  public void encode(NullWritable value, OutputStream outStream, Context context) {
-    // nothing to write
-  }
-
-  @Override
-  public NullWritable decode(InputStream inStream, Context context) {
-    return NullWritable.get();
-  }
-
-  @Override
-  public boolean consistentWithEquals() {
-    return true;
-  }
-
-  /**
-   * Returns true since registerByteSizeObserver() runs in constant time.
-   */
-  @Override
-  public boolean isRegisterByteSizeObserverCheap(NullWritable value, Context context) {
-    return true;
-  }
-
-  @Override
-  protected long getEncodedElementByteSize(NullWritable value, Context context) {
-    return 0;
-  }
-
-  @Override
-  public void verifyDeterministic() throws Coder.NonDeterministicException {
-    // NullWritableCoder is deterministic
-  }
-}
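The coder removed above is a compact example of the zero-payload coder pattern: a value type with no state needs no bytes on the wire, so `encode` writes nothing and `decode` hands back a singleton. A stand-alone, JDK-only sketch of the same idea (the `UnitCoder` class and `UNIT` singleton below are illustrative, not part of Beam's `Coder` API):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

// Illustrative sketch of the zero-payload coder pattern used by
// NullWritableCoder: a stateless value needs no bytes on the wire.
final class UnitCoder {
  static final UnitCoder INSTANCE = new UnitCoder();
  static final Object UNIT = new Object(); // the single possible value

  private UnitCoder() {}

  void encode(Object value, OutputStream out) {
    // nothing to write: the type carries no information
  }

  Object decode(InputStream in) {
    return UNIT; // always the same singleton, regardless of input
  }

  long encodedByteSize(Object value) {
    return 0; // constant, so byte-size observation is trivially cheap
  }

  public static void main(String[] args) throws Exception {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    INSTANCE.encode(UNIT, out);
    System.out.println(out.size()); // 0 bytes written
    System.out.println(INSTANCE.decode(new ByteArrayInputStream(out.toByteArray())) == UNIT); // true
  }
}
```

Because encoding is a no-op, such a coder is deterministic and consistent with equals by construction, which is why the deleted class could override those checks so simply.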

http://git-wip-us.apache.org/repos/asf/beam/blob/44624c38/runners/spark/src/main/java/org

[jira] [Resolved] (BEAM-1297) Update maven build plugins and improve maven performance

2017-03-01 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-1297.

   Resolution: Fixed
Fix Version/s: Not applicable

> Update maven build plugins and improve maven performance
> 
>
> Key: BEAM-1297
> URL: https://issues.apache.org/jira/browse/BEAM-1297
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: Not applicable
>
>
> I was investigating ways to speed up the build / test execution. And I found 
> that a common source of delay comes from outdated maven plugins.
> Running this command
> mvn versions:display-plugin-updates
> I found that we have some outdated maven plugins, and additionally some minor 
> configuration issues, so I fixed them.
> In my local tests I got an improvement of 50s (over an average 11m run) for 
> the "mvn clean verify -Prelease" execution so I think this might be worth 
> checking.





[jira] [Closed] (BEAM-1297) Update maven build plugins and improve maven performance

2017-03-01 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía closed BEAM-1297.
--

> Update maven build plugins and improve maven performance
> 
>
> Key: BEAM-1297
> URL: https://issues.apache.org/jira/browse/BEAM-1297
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: Not applicable
>
>
> I was investigating ways to speed up the build / test execution. And I found 
> that a common source of delay comes from outdated maven plugins.
> Running this command
> mvn versions:display-plugin-updates
> I found that we have some outdated maven plugins, and additionally some minor 
> configuration issues, so I fixed them.
> In my local tests I got an improvement of 50s (over an average 11m run) for 
> the "mvn clean verify -Prelease" execution so I think this might be worth 
> checking.





[jira] [Comment Edited] (BEAM-849) Redesign PipelineResult API

2017-03-01 Thread Amit Sela (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890953#comment-15890953
 ] 

Amit Sela edited comment on BEAM-849 at 3/1/17 8:14 PM:


A continually growing file is something else, which I agree on, but in that 
case {{waitUntilFinish()}} would terminate after the file is done, right ? by 
moving the WM to end-of-time ? so if that happens implicitly, why call 
explicitly ?

All I'm saying is that in batch, beginning and end are known ahead of 
execution, so blocking until termination is natural. In streaming however, the 
end is unknown, so it's a bit awkward - some pipelines will behave like batch, 
like SDF-log-tail, and some won't like reading from Pubsub/Kafka.
I will agree that for the sake of a unified model it makes sense, but still a 
bit un-natural, so that's why I think this ticket is for - to try and reason 
about this and make "feel" more natural, no ? 

As for "unbounded pipelines" not being a part of the model, it's a bit 
confusing because it's all over the SDK.



was (Author: amitsela):
A continually growing file is something else, which I agree on, but in that 
case {{waitUntilFinish()}} would terminate after the file is done, right ? by 
moving the WM to end-of-time ? so if that happens implicitly, why call 
explicitly ?

All I'm saying is that in batch, beginning and end are known ahead of 
execution, so blocking until termination is natural. In streaming however, the 
end is unknown, so it's a bit awkward - some pipelines will behave the same, 
like SDF-log-tail, and some won't like reading from Pubsub/Kafka.
I will agree that for the sake of a unified model it makes sense, but still a 
bit un-natural, so that's why I think this ticket is for - to try and reason 
about this and make "feel" more natural, no ? 

As for "unbounded pipelines" not being a part of the model, it's a bit 
confusing because it's all over the SDK.


> Redesign PipelineResult API
> ---
>
> Key: BEAM-849
> URL: https://issues.apache.org/jira/browse/BEAM-849
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Pei He
>
> Current state: 
> Jira https://issues.apache.org/jira/browse/BEAM-443 addresses 
> waitUntilFinish() and cancel(). 
> However, there is additional work around PipelineResult: 
> need clearly defined contract and verification across all runners 
> need to revisit how to handle metrics/aggregators 
> need to be able to get logs





Jenkins build became unstable: beam_PostCommit_Java_MavenInstall #2788

2017-03-01 Thread Apache Jenkins Server
See 




Jenkins build is back to stable : beam_PostCommit_Java_RunnableOnService_Spark #1097

2017-03-01 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-849) Redesign PipelineResult API

2017-03-01 Thread Amit Sela (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890953#comment-15890953
 ] 

Amit Sela commented on BEAM-849:


A continually growing file is something else, which I agree on, but in that 
case {{waitUntilFinish()}} would terminate after the file is done, right ? by 
moving the WM to end-of-time ? so if that happens implicitly, why call 
explicitly ?

All I'm saying is that in batch, beginning and end are known ahead of 
execution, so blocking until termination is natural. In streaming however, the 
end is unknown, so it's a bit awkward - some pipelines will behave the same, 
like SDF-log-tail, and some won't like reading from Pubsub/Kafka.
I will agree that for the sake of a unified model it makes sense, but still a 
bit un-natural, so that's why I think this ticket is for - to try and reason 
about this and make "feel" more natural, no ? 

As for "unbounded pipelines" not being a part of the model, it's a bit 
confusing because it's all over the SDK.


> Redesign PipelineResult API
> ---
>
> Key: BEAM-849
> URL: https://issues.apache.org/jira/browse/BEAM-849
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Pei He
>
> Current state: 
> Jira https://issues.apache.org/jira/browse/BEAM-443 addresses 
> waitUntilFinish() and cancel(). 
> However, there is additional work around PipelineResult: 
> need clearly defined contract and verification across all runners 
> need to revisit how to handle metrics/aggregators 
> need to be able to get logs





[jira] [Commented] (BEAM-849) Redesign PipelineResult API

2017-03-01 Thread Eugene Kirpichov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890915#comment-15890915
 ] 

Eugene Kirpichov commented on BEAM-849:
---

Sorry, I don't understand the question. This use case - representing reading 
from a file, including a continually growing file, as a file-reading splittable 
ParDo applied to a PCollection of filenames - is one of the main motivating use 
cases of splittable DoFn https://s.apache.org/splittable-do-fn ; all I'm saying 
here is that, even if the file eventually stops growing forever and we would 
like the pipeline to terminate in that case, this makes sense regardless of 
whether the runner is a "batch" or a "streaming" runner. And a user may want to 
choose to run this pipeline using their favorite runner in "streaming" mode for 
various reasons; one being that a "streaming" runner might process the data 
from the file immediately as it appears, whereas a "batch" runner might think 
it's ok to wait for the whole file to appear and then process it. This is of 
course runner-specific; my main point is that there is no such thing as 
"unbounded pipelines" in the Beam model, and semantics of waitForFinish() 
should be defined in terms of the Beam model and treated equally by all runners 
regardless of whether the runner has a distinction between "batch/streaming" 
modes.

> Redesign PipelineResult API
> ---
>
> Key: BEAM-849
> URL: https://issues.apache.org/jira/browse/BEAM-849
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Pei He
>
> Current state: 
> Jira https://issues.apache.org/jira/browse/BEAM-443 addresses 
> waitUntilFinish() and cancel(). 
> However, there is additional work around PipelineResult: 
> need clearly defined contract and verification across all runners 
> need to revisit how to handle metrics/aggregators 
> need to be able to get logs





[jira] [Comment Edited] (BEAM-849) Redesign PipelineResult API

2017-03-01 Thread Amit Sela (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890885#comment-15890885
 ] 

Amit Sela edited comment on BEAM-849 at 3/1/17 7:35 PM:


I disagree with stating that "Create.of(filename) + ParDo(tail file) + 
ParDo(process records)" in streaming leverages "low-latency", that's very 
runner specific, and generally inaccurate - how is sending a "filename" to 
worker/s + data locality slower than streaming the file to processors ? 


was (Author: amitsela):
I disagree with stating that "Create.of(filename) + ParDo(tail file) + 
ParDo(process records)" in streaming leverages "low-latency", that's very 
runner specific, and generally inaccurate - how is sending a "filename" to 
worker/s + data locality faster than streaming the file to processors ? 

> Redesign PipelineResult API
> ---
>
> Key: BEAM-849
> URL: https://issues.apache.org/jira/browse/BEAM-849
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Pei He
>
> Current state: 
> Jira https://issues.apache.org/jira/browse/BEAM-443 addresses 
> waitUntilFinish() and cancel(). 
> However, there is additional work around PipelineResult: 
> need clearly defined contract and verification across all runners 
> need to revisit how to handle metrics/aggregators 
> need to be able to get logs





[jira] [Commented] (BEAM-1562) Use a "signal" to stop streaming tests as they finish.

2017-03-01 Thread Amit Sela (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890889#comment-15890889
 ] 

Amit Sela commented on BEAM-1562:
-

I am using Watermarks ;-) but I persisted things to give all the information.

> Use a "signal" to stop streaming tests as they finish.
> --
>
> Key: BEAM-1562
> URL: https://issues.apache.org/jira/browse/BEAM-1562
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Amit Sela
>
> Streaming tests use a timeout that has to take a large enough buffer to avoid 
> slow runtimes stopping before test completes.
> We can introduce a "poison pill" based on an {{EndOfStream}} element that 
> would be counted in a Metrics/Aggregator to know all data was processed.
> Another option would be to follow the slowest WM and stop once it hits 
> end-of-time.
> This requires the Spark runner to stop blocking the execution of a streaming 
> pipeline until it's complete - which is something relevant to 
> {{PipelineResult}} (re)design that is being discussed in BEAM-849   
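The "poison pill" option described in the issue can be sketched outside Spark with plain JDK concurrency primitives. This is illustrative only, not the runner's actual mechanism: the `PoisonPill` class and `__EOS__` sentinel are made-up names. A sentinel element marks end-of-stream so the consumer stops deterministically instead of waiting out a timeout.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Stand-alone sketch of the "poison pill" termination idea: a sentinel
// element injected after the real data lets the consumer stop exactly when
// all preceding elements have been processed.
final class PoisonPill {
  static final String END_OF_STREAM = "__EOS__"; // hypothetical sentinel

  // Consume elements until the sentinel arrives; return how many were real.
  static int drain(BlockingQueue<String> queue) throws InterruptedException {
    int processed = 0;
    while (true) {
      String element = queue.take();
      if (END_OF_STREAM.equals(element)) {
        return processed; // everything before the pill has been processed
      }
      processed++;
    }
  }

  public static void main(String[] args) throws Exception {
    BlockingQueue<String> queue = new ArrayBlockingQueue<>(8);
    queue.put("a");
    queue.put("b");
    queue.put(END_OF_STREAM);
    System.out.println(drain(queue)); // prints 2
  }
}
```

The watermark-based alternative discussed in the comments avoids the sentinel entirely: instead of injecting an element into the data, the test observes the slowest watermark and stops once it reaches end-of-time.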





[jira] [Commented] (BEAM-849) Redesign PipelineResult API

2017-03-01 Thread Amit Sela (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890885#comment-15890885
 ] 

Amit Sela commented on BEAM-849:


I disagree with stating that "Create.of(filename) + ParDo(tail file) + 
ParDo(process records)" in streaming leverages "low-latency", that's very 
runner specific, and generally inaccurate - how is sending a "filename" to 
worker/s + data locality faster than streaming the file to processors ? 

> Redesign PipelineResult API
> ---
>
> Key: BEAM-849
> URL: https://issues.apache.org/jira/browse/BEAM-849
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Pei He
>
> Current state: 
> Jira https://issues.apache.org/jira/browse/BEAM-443 addresses 
> waitUntilFinish() and cancel(). 
> However, there is additional work around PipelineResult: 
> need clearly defined contract and verification across all runners 
> need to revisit how to handle metrics/aggregators 
> need to be able to get logs





[jira] [Commented] (BEAM-1562) Use a "signal" to stop streaming tests as they finish.

2017-03-01 Thread Eugene Kirpichov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890861#comment-15890861
 ] 

Eugene Kirpichov commented on BEAM-1562:


See my comment on BEAM-849: as an alternative, we can use "watermarks 
progressed to infinity" as a termination signal.

> Use a "signal" to stop streaming tests as they finish.
> --
>
> Key: BEAM-1562
> URL: https://issues.apache.org/jira/browse/BEAM-1562
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Amit Sela
>
> Streaming tests use a timeout that has to take a large enough buffer to avoid 
> slow runtimes stopping before test completes.
> We can introduce a "poison pill" based on an {{EndOfStream}} element that 
> would be counted in a Metrics/Aggregator to know all data was processed.
> Another option would be to follow the slowest WM and stop once it hits 
> end-of-time.
> This requires the Spark runner to stop blocking the execution of a streaming 
> pipeline until it's complete - which is something relevant to 
> {{PipelineResult}} (re)design that is being discussed in BEAM-849   





[jira] [Commented] (BEAM-849) Redesign PipelineResult API

2017-03-01 Thread Eugene Kirpichov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890855#comment-15890855
 ] 

Eugene Kirpichov commented on BEAM-849:
---

I disagree that "unbounded pipelines" can't finish successfully.

- Dataflow runner supports draining of pipelines, which leads to successful 
termination.
- It is possible to run a pipeline like Create.of(1, 2, 3) + ParDo(do nothing) 
using a streaming runner, and it should terminate rather than hang.
- One might ask "why run such a pipeline with a streaming runner", but it makes 
a lot more sense if the ParDo is splittable. E.g. Create.of(filename) + 
ParDo(tail file) + ParDo(process records) could use the low-latency 
capabilities of a streaming runner, but successfully terminate when the file is 
somehow "finalized". As a more mundane example - tests in 
https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java
 should pass in streaming runners as well as batch runners.
- "Unbounded pipeline" is in general not a Beam concept - we should have a 
batch/streaming-agnostic meaning of "finished" in "waitUntilFinished". I 
propose the one that Dataflow runner uses for deciding when drain is completed: 
"all watermarks have progressed to infinity".
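The proposed rule can be modeled with a small, JDK-only sketch (illustrative only; this is not the Dataflow runner's implementation): a pipeline counts as finished once every source's watermark has advanced to end-of-time, regardless of whether the runner is a "batch" or a "streaming" one.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative model of the termination rule "all watermarks have
// progressed to infinity": waitUntilFinish() may return only once every
// watermark has reached end-of-time.
final class WatermarkTermination {
  static final long END_OF_TIME = Long.MAX_VALUE; // models "infinity"

  static boolean isFinished(List<Long> watermarks) {
    return watermarks.stream().allMatch(wm -> wm == END_OF_TIME);
  }

  public static void main(String[] args) {
    // One source still mid-stream: the pipeline is not finished.
    System.out.println(isFinished(Arrays.asList(END_OF_TIME, 1234L))); // false
    // Every watermark at end-of-time: the pipeline has finished.
    System.out.println(isFinished(Arrays.asList(END_OF_TIME, END_OF_TIME))); // true
  }
}
```

Under this model a bounded source run on a streaming runner terminates naturally (its watermark jumps to end-of-time when the data is exhausted), which is exactly the batch/streaming-agnostic behavior argued for above.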

> Redesign PipelineResult API
> ---
>
> Key: BEAM-849
> URL: https://issues.apache.org/jira/browse/BEAM-849
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Pei He
>
> Current state: 
> Jira https://issues.apache.org/jira/browse/BEAM-443 addresses 
> waitUntilFinish() and cancel(). 
> However, there is additional work around PipelineResult: 
> need clearly defined contract and verification across all runners 
> need to revisit how to handle metrics/aggregators 
> need to be able to get logs





Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Spark #1096

2017-03-01 Thread Apache Jenkins Server
See 




Jenkins build is back to stable : beam_PostCommit_Java_RunnableOnService_Flink #1801

2017-03-01 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1297) Update maven build plugins and improve maven performance

2017-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890798#comment-15890798
 ] 

ASF GitHub Bot commented on BEAM-1297:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/1985


> Update maven build plugins and improve maven performance
> 
>
> Key: BEAM-1297
> URL: https://issues.apache.org/jira/browse/BEAM-1297
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>
> I was investigating ways to speed up the build / test execution. And I found 
> that a common source of delay comes from outdated maven plugins.
> Running this command
> mvn versions:display-plugin-updates
> I found that we have some outdated maven plugins, and additionally some minor 
> configuration issues, so I fixed them.
> In my local tests I got an improvement of 50s (over an average 11m run) for 
> the "mvn clean verify -Prelease" execution so I think this might be worth 
> checking.





[GitHub] beam pull request #1985: [BEAM-1297] Update maven build plugins and various ...

2017-03-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/1985




[1/2] beam git commit: [BEAM-1297] Update maven shade plugin, fix typo and remove unneeded version

2017-03-01 Thread davor
Repository: beam
Updated Branches:
  refs/heads/master 3b3d6b81a -> b49ec3fa2


[BEAM-1297] Update maven shade plugin, fix typo and remove unneeded version

[BEAM-1297] Enable tiered compilation to make the JVM startup times faster

This makes the build faster because every time maven forks it does not
necessarily reuse the JVM instance.

[BEAM-1297] Enable parallel builds in travis (1C = 1 per core)


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/4256801a
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/4256801a
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/4256801a

Branch: refs/heads/master
Commit: 4256801a6c4a0e7c15232c5fae22f7eef5fdf914
Parents: 3b3d6b8
Author: Ismaël Mejía 
Authored: Sun Feb 12 06:52:45 2017 +0100
Committer: Davor Bonaci 
Committed: Wed Mar 1 10:47:57 2017 -0800

--
 .jenkins/common_job_properties.groovy | 4 
 .travis.yml   | 7 ---
 pom.xml   | 5 ++---
 sdks/java/javadoc/pom.xml | 1 -
 4 files changed, 10 insertions(+), 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/4256801a/.jenkins/common_job_properties.groovy
--
diff --git a/.jenkins/common_job_properties.groovy 
b/.jenkins/common_job_properties.groovy
index 19cf471..bcac19e 100644
--- a/.jenkins/common_job_properties.groovy
+++ b/.jenkins/common_job_properties.groovy
@@ -166,6 +166,10 @@ class common_job_properties {
 context.mavenInstallation('Maven 3.3.3')
 context.mavenOpts('-Dorg.slf4j.simpleLogger.showDateTime=true')
 
context.mavenOpts('-Dorg.slf4j.simpleLogger.dateTimeFormat=-MM-dd\\\'T\\\'HH:mm:ss.SSS')
+// The -XX:+TieredCompilation -XX:TieredStopAtLevel=1 JVM options enable
+// tiered compilation to make the JVM startup times faster during the 
tests.
+context.mavenOpts('-XX:+TieredCompilation')
+context.mavenOpts('-XX:TieredStopAtLevel=1')
 context.rootPOM('pom.xml')
 // Use a repository local to the workspace for better isolation of jobs.
 context.localRepository(LocalRepositoryLocation.LOCAL_TO_WORKSPACE)

http://git-wip-us.apache.org/repos/asf/beam/blob/4256801a/.travis.yml
--
diff --git a/.travis.yml b/.travis.yml
index a392f7d..c896431 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -58,9 +58,10 @@ matrix:
 - os: linux
   env: TEST_PYTHON="1"
 
-
 before_install:
-  - echo 'MAVEN_OPTS="$MAVEN_OPTS -Xmx1024m -XX:MaxPermSize=512m 
-XX:+BytecodeVerificationLocal"' >> ~/.mavenrc
+  # The -XX:+TieredCompilation -XX:TieredStopAtLevel=1 JVM options enable
+  # tiered compilation to make the JVM startup times faster during the tests.
+  - echo 'MAVEN_OPTS="$MAVEN_OPTS -Xmx1024m -XX:MaxPermSize=512m 
-XX:+TieredCompilation -XX:TieredStopAtLevel=1 -XX:+BytecodeVerificationLocal"' 
>> ~/.mavenrc
   - echo $'MAVEN_OPTS="$MAVEN_OPTS -Dorg.slf4j.simpleLogger.showDateTime=true 
-Dorg.slf4j.simpleLogger.dateTimeFormat=-MM-dd\'T\'HH:mm:ss.SSS"' >> 
~/.mavenrc
   - cat ~/.mavenrc
   - if [ "$TRAVIS_OS_NAME" == "osx" ]; then export 
JAVA_HOME=$(/usr/libexec/java_home); fi
@@ -80,7 +81,7 @@ install:
 
 script:
   - if [ "$TEST_PYTHON" ]; then travis_retry $TOX_HOME/tox -e $TOX_ENV -c 
sdks/python/tox.ini; fi
-  - if [ ! "$TEST_PYTHON" ]; then travis_retry mvn --batch-mode 
--update-snapshots --no-snapshot-updates $MAVEN_OVERRIDE install && 
travis_retry bash -ex .travis/test_wordcount.sh; fi
+  - if [ ! "$TEST_PYTHON" ]; then travis_retry mvn --batch-mode 
--update-snapshots --no-snapshot-updates --threads 1C $MAVEN_OVERRIDE install 
&& travis_retry bash -ex .travis/test_wordcount.sh; fi
 
 cache:
   directories:

http://git-wip-us.apache.org/repos/asf/beam/blob/4256801a/pom.xml
--
diff --git a/pom.xml b/pom.xml
index 65f6723..a37f1af 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1310,7 +1310,7 @@
 
   org.apache.maven.plugins
   maven-shade-plugin
-  2.4.3
+  3.0.0
 
 
 
@@ -1448,7 +1448,7 @@
   [1.7,)
 
 
-  
+  
   [3.2,)
 
   
@@ -1483,7 +1483,6 @@
   
 
   
-
   
 
 3.2

http://git-wip-us.apache.org/repos/asf/beam/blob/4256801a/sdks/java/javadoc/pom.xml
--
diff --git a/sdks/java/javadoc/pom.xml b/sdks/java/javadoc/pom.xml
index 145dcf0..243dae5 100644
--- a/sdks/java/javadoc/pom.xml
+++ b/sdks/java/javadoc/pom.xml
@@ -232,7 +232,6 @@
   
 org.apache.maven.plugins
 maven-enforcer-plugin
- 

[2/2] beam git commit: This closes #1985

2017-03-01 Thread davor
This closes #1985


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/b49ec3fa
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/b49ec3fa
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/b49ec3fa

Branch: refs/heads/master
Commit: b49ec3fa28c8da3bc431931f2ed4645a711e75eb
Parents: 3b3d6b8 4256801
Author: Davor Bonaci 
Authored: Wed Mar 1 10:54:29 2017 -0800
Committer: Davor Bonaci 
Committed: Wed Mar 1 10:54:29 2017 -0800

--
 .jenkins/common_job_properties.groovy | 4 
 .travis.yml   | 7 ---
 pom.xml   | 5 ++---
 sdks/java/javadoc/pom.xml | 1 -
 4 files changed, 10 insertions(+), 7 deletions(-)
--




Jenkins build became unstable: beam_PostCommit_Java_RunnableOnService_Flink #1800

2017-03-01 Thread Apache Jenkins Server
See 




[jira] [Closed] (BEAM-1545) Python sdk example run failed

2017-03-01 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay closed BEAM-1545.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> Python sdk example run failed
> -
>
> Key: BEAM-1545
> URL: https://issues.apache.org/jira/browse/BEAM-1545
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Haoxiang
>Assignee: Sourabh Bajaj
> Fix For: Not applicable
>
>
> When I run the Python SDK example shown at
> https://beam.apache.org/get-started/quickstart-py/ using the command:
> python -m apache_beam.examples.wordcount --input 
> gs://dataflow-samples/shakespeare/kinglear.txt --output output.txt
> it failed with the following logs:
> INFO:root:Missing pipeline option (runner). Executing pipeline using the 
> default runner: DirectRunner.
> Traceback (most recent call last):
>   File 
> "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py",
>  line 162, in _run_module_as_main
> "__main__", fname, loader, pkg_name)
>   File 
> "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py",
>  line 72, in _run_code
> exec code in run_globals
>   File 
> "/Users/haoxiang/InterestingGitProject/beam/sdks/python/apache_beam/examples/wordcount.py",
>  line 107, in <module>
> run()
>   File 
> "/Users/haoxiang/InterestingGitProject/beam/sdks/python/apache_beam/examples/wordcount.py",
>  line 83, in run
> lines = p | 'read' >> ReadFromText(known_args.input)
>   File "apache_beam/io/textio.py", line 378, in __init__
> skip_header_lines=skip_header_lines)
>   File "apache_beam/io/textio.py", line 87, in __init__
> validate=validate)
>   File "apache_beam/io/filebasedsource.py", line 97, in __init__
> self._validate()
>   File "apache_beam/io/filebasedsource.py", line 171, in _validate
> if len(fileio.ChannelFactory.glob(self._pattern, limit=1)) <= 0:
>   File "apache_beam/io/fileio.py", line 281, in glob
> return gcsio.GcsIO().glob(path, limit)
> AttributeError: 'NoneType' object has no attribute 'GcsIO'
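
The `AttributeError: 'NoneType' object has no attribute 'GcsIO'` above is the classic optional-dependency failure mode: `fileio.py` falls back to `gcsio = None` when the GCS extras are not installed, and a later `gs://`-style path dereferences it anyway. A minimal sketch of the guard pattern that turns this into an actionable error (module and message names are illustrative, not Beam's actual code):

```python
# Sketch of the optional-dependency pattern behind the traceback above.
# 'gcs_support' is a hypothetical optional module standing in for the
# real gcsio import inside apache_beam.io.fileio.
try:
    import gcs_support as gcsio  # optional extra; absent in a bare install
except ImportError:
    gcsio = None


def glob_path(pattern, limit=None):
    """Glob a gs:// pattern, failing fast when GCS support is missing."""
    if gcsio is None:
        raise RuntimeError(
            'gs:// paths require the optional GCS dependency, '
            'which is not installed.')
    return gcsio.GcsIO().glob(pattern, limit)
```

With a guard like this, a bare install reports the missing extra instead of the opaque `NoneType` error the user hit.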



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1570) broken link on website (ostatic.com)

2017-03-01 Thread Sourabh Bajaj (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890755#comment-15890755
 ] 

Sourabh Bajaj commented on BEAM-1570:
-

This is fixed.

> broken link on website (ostatic.com)
> 
>
> Key: BEAM-1570
> URL: https://issues.apache.org/jira/browse/BEAM-1570
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Stephen Sisk
>Assignee: Sourabh Bajaj
>
> I see the following error when running the website test suite locally and on 
> jenkins: 
>   *  External link 
> http://ostatic.com/blog/apache-beam-unifies-batch-and-streaming-data-processing
>  failed: got a time out (response code 0)
> It looks like ostatic's website is down. 





[jira] [Commented] (BEAM-1545) Python sdk example run failed

2017-03-01 Thread Sourabh Bajaj (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890756#comment-15890756
 ] 

Sourabh Bajaj commented on BEAM-1545:
-

This is fixed so we should close this.

> Python sdk example run failed
> -
>
> Key: BEAM-1545
> URL: https://issues.apache.org/jira/browse/BEAM-1545
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Haoxiang
>Assignee: Sourabh Bajaj
>
> When I run the Python SDK example shown at
> https://beam.apache.org/get-started/quickstart-py/ using the command:
> python -m apache_beam.examples.wordcount --input 
> gs://dataflow-samples/shakespeare/kinglear.txt --output output.txt
> it failed with the following logs:
> INFO:root:Missing pipeline option (runner). Executing pipeline using the 
> default runner: DirectRunner.
> Traceback (most recent call last):
>   File 
> "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py",
>  line 162, in _run_module_as_main
> "__main__", fname, loader, pkg_name)
>   File 
> "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py",
>  line 72, in _run_code
> exec code in run_globals
>   File 
> "/Users/haoxiang/InterestingGitProject/beam/sdks/python/apache_beam/examples/wordcount.py",
>  line 107, in <module>
> run()
>   File 
> "/Users/haoxiang/InterestingGitProject/beam/sdks/python/apache_beam/examples/wordcount.py",
>  line 83, in run
> lines = p | 'read' >> ReadFromText(known_args.input)
>   File "apache_beam/io/textio.py", line 378, in __init__
> skip_header_lines=skip_header_lines)
>   File "apache_beam/io/textio.py", line 87, in __init__
> validate=validate)
>   File "apache_beam/io/filebasedsource.py", line 97, in __init__
> self._validate()
>   File "apache_beam/io/filebasedsource.py", line 171, in _validate
> if len(fileio.ChannelFactory.glob(self._pattern, limit=1)) <= 0:
>   File "apache_beam/io/fileio.py", line 281, in glob
> return gcsio.GcsIO().glob(path, limit)
> AttributeError: 'NoneType' object has no attribute 'GcsIO'





Jenkins build became unstable: beam_PostCommit_Java_RunnableOnService_Spark #1095

2017-03-01 Thread Apache Jenkins Server
See 




Jenkins build is back to stable : beam_PostCommit_Java_MavenInstall #2786

2017-03-01 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-325) Add Slack details to website

2017-03-01 Thread Nathan Florea (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890674#comment-15890674
 ] 

Nathan Florea commented on BEAM-325:


Could I get added to the Slack as well?  There is still a dearth of information 
available about Beam on the website, so I'm hoping I can use it to answer some 
of my newbie questions.  I'm signing up for the mailing list, too.  Thanks.

> Add Slack details to website
> 
>
> Key: BEAM-325
> URL: https://issues.apache.org/jira/browse/BEAM-325
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: James Malone
>Assignee: Jean-Baptiste Onofré
>Priority: Minor
>
> Need to add details on the public Slack channel to the Beam website.





Jenkins build became unstable: beam_PostCommit_Java_MavenInstall #2785

2017-03-01 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-351) Add DisplayData to KafkaIO

2017-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890653#comment-15890653
 ] 

ASF GitHub Bot commented on BEAM-351:
-

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2111


> Add DisplayData to KafkaIO
> --
>
> Key: BEAM-351
> URL: https://issues.apache.org/jira/browse/BEAM-351
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Ben Chambers
>Assignee: Aviem Zur
>Priority: Minor
>  Labels: starter
>
> Any interesting parameters of the sources/sinks should be exposed as display 
> data. See any of the sources/sinks that already export this (BigQuery, 
> PubSub, etc.) for examples. Also look at the DisplayData builder and 
> HasDisplayData interface for how to wire these up.
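
For a sense of what "wire these up" means, here is a minimal, self-contained sketch of the display-data pattern; `HasDisplayData`, `DisplayDataBuilder`, and `KafkaRead` below are simplified Python stand-ins for Beam's actual Java interfaces (`HasDisplayData`, `DisplayData.Builder`), not the real API:

```python
class HasDisplayData:
    """Stand-in for Beam's HasDisplayData: transforms override this hook."""
    def populate_display_data(self, builder):
        pass


class DisplayDataBuilder:
    """Stand-in for Beam's DisplayData.Builder: collects key/value items."""
    def __init__(self):
        self.items = {}

    def add(self, key, value):
        self.items[key] = value
        return self  # builder-style chaining, as in the Java API


class KafkaRead(HasDisplayData):
    """Toy source exposing its interesting parameters as display data."""
    def __init__(self, topics, bootstrap_servers):
        self.topics = topics
        self.bootstrap_servers = bootstrap_servers

    def populate_display_data(self, builder):
        builder.add('topics', ','.join(self.topics)) \
               .add('bootstrapServers', self.bootstrap_servers)


builder = DisplayDataBuilder()
KafkaRead(['events'], 'broker:9092').populate_display_data(builder)
```

Runners and monitoring UIs then read the collected items, which is why sources such as BigQuery and PubSub already populate them.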





[GitHub] beam pull request #2111: [BEAM-351] Add DisplayData to KafkaIO

2017-03-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2111




  1   2   >