[jira] [Resolved] (BEAM-1178) Make naming of logger objects consistent

2016-12-21 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-1178.

Resolution: Fixed

> Make naming of logger objects consistent
> 
>
> Key: BEAM-1178
> URL: https://issues.apache.org/jira/browse/BEAM-1178
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-java-extensions
>Affects Versions: 0.5.0-incubating
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Trivial
> Fix For: Not applicable
>
>
> Logger objects are used in different instances in Beam, around 90% of the 
> current classes that use loggers use the convention name 'LOG', however there 
> are instances that use 'logger' and others that uses 'LOGGER', this issue is 
> to make the logger naming consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (BEAM-1178) Make naming of logger objects consistent

2016-12-21 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reopened BEAM-1178:


> Make naming of logger objects consistent
> 
>
> Key: BEAM-1178
> URL: https://issues.apache.org/jira/browse/BEAM-1178
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-java-extensions
>Affects Versions: 0.5.0-incubating
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Trivial
> Fix For: Not applicable
>
>
> Logger objects are used in different instances in Beam, around 90% of the 
> current classes that use loggers use the convention name 'LOG', however there 
> are instances that use 'logger' and others that uses 'LOGGER', this issue is 
> to make the logger naming consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-1165) Unexpected file created when checking dependencies on clean repo

2016-12-21 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-1165.

Resolution: Fixed

> Unexpected file created when checking dependencies on clean repo
> 
>
> Key: BEAM-1165
> URL: https://issues.apache.org/jira/browse/BEAM-1165
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 0.5.0-incubating
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 0.5.0-incubating
>
>
> I just found a weird behavior when I was checking for the latest release,
> nothing breaking, but when I start with a clean repo clone and I do:
> mvn dependency:tree
> It creates a new file runners/flink/examples/wordcounts.txt with the
> dependencies.
> This error happens because maven-dependency-plugin asumes the property output
> used by the flink tests as the export file for the command.
> Ref.
> https://maven.apache.org/plugins/maven-dependency-plugin/tree-mojo.html#output



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (BEAM-1165) Unexpected file created when checking dependencies on clean repo

2016-12-21 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reopened BEAM-1165:


> Unexpected file created when checking dependencies on clean repo
> 
>
> Key: BEAM-1165
> URL: https://issues.apache.org/jira/browse/BEAM-1165
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 0.5.0-incubating
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 0.5.0-incubating
>
>
> I just found a weird behavior when I was checking for the latest release,
> nothing breaking, but when I start with a clean repo clone and I do:
> mvn dependency:tree
> It creates a new file runners/flink/examples/wordcounts.txt with the
> dependencies.
> This error happens because maven-dependency-plugin asumes the property output
> used by the flink tests as the export file for the command.
> Ref.
> https://maven.apache.org/plugins/maven-dependency-plugin/tree-mojo.html#output



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request #1668: [BEAM-27, BEAM-362] Remove deprecated InM...

2016-12-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1668


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-27) Add user-ready API for interacting with timers

2016-12-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767452#comment-15767452
 ] 

ASF GitHub Bot commented on BEAM-27:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1668


> Add user-ready API for interacting with timers
> --
>
> Key: BEAM-27
> URL: https://issues.apache.org/jira/browse/BEAM-27
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> Pipeline authors will benefit from a different factorization of interaction 
> with underlying timers. The current APIs are targeted at runner implementers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[2/2] incubator-beam git commit: This closes #1668: Remove deprecated InMemoryTimerInternals from SDK

2016-12-21 Thread kenn
This closes #1668: Remove deprecated InMemoryTimerInternals from SDK


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/0d0a5e28
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/0d0a5e28
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/0d0a5e28

Branch: refs/heads/master
Commit: 0d0a5e2872aeba7a1069927408b3a9607709cf11
Parents: aadcf3a 9f1d3d1
Author: Kenneth Knowles 
Authored: Wed Dec 21 08:16:00 2016 -0800
Committer: Kenneth Knowles 
Committed: Wed Dec 21 08:16:00 2016 -0800

--
 .../sdk/util/state/InMemoryTimerInternals.java  | 275 ---
 1 file changed, 275 deletions(-)
--




[1/2] incubator-beam git commit: Remove deprecated InMemoryTimerInternals from SDK

2016-12-21 Thread kenn
Repository: incubator-beam
Updated Branches:
  refs/heads/master aadcf3a12 -> 0d0a5e287


Remove deprecated InMemoryTimerInternals from SDK


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/9f1d3d15
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/9f1d3d15
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/9f1d3d15

Branch: refs/heads/master
Commit: 9f1d3d155303bd3d1069541be704d5f3e74926eb
Parents: 6a05d7f
Author: Kenneth Knowles 
Authored: Tue Dec 20 14:07:00 2016 -0800
Committer: Kenneth Knowles 
Committed: Tue Dec 20 15:16:32 2016 -0800

--
 .../sdk/util/state/InMemoryTimerInternals.java  | 275 ---
 1 file changed, 275 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/9f1d3d15/sdks/java/core/src/main/java/org/apache/beam/sdk/util/state/InMemoryTimerInternals.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/state/InMemoryTimerInternals.java
 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/state/InMemoryTimerInternals.java
deleted file mode 100644
index a910d64..000
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/state/InMemoryTimerInternals.java
+++ /dev/null
@@ -1,275 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.beam.sdk.util.state;
-
-import static com.google.common.base.Preconditions.checkNotNull;
-import static com.google.common.base.Preconditions.checkState;
-
-import com.google.common.base.MoreObjects;
-import java.util.HashSet;
-import java.util.PriorityQueue;
-import java.util.Set;
-import javax.annotation.Nullable;
-import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
-import org.apache.beam.sdk.util.TimeDomain;
-import org.apache.beam.sdk.util.TimerInternals;
-import org.apache.beam.sdk.util.WindowTracing;
-import org.joda.time.Instant;
-
-/**
- * @deprecated use {@code org.apache.beam.runners.core.InMemoryTimerInternals}.
- */
-@Deprecated
-public class InMemoryTimerInternals implements TimerInternals {
-
-  /** At most one timer per timestamp is kept. */
-  private Set existingTimers = new HashSet<>();
-
-  /** Pending input watermark timers, in timestamp order. */
-  private PriorityQueue watermarkTimers = new PriorityQueue<>(11);
-
-  /** Pending processing time timers, in timestamp order. */
-  private PriorityQueue processingTimers = new PriorityQueue<>(11);
-
-  /** Pending synchronized processing time timers, in timestamp order. */
-  private PriorityQueue synchronizedProcessingTimers = new 
PriorityQueue<>(11);
-
-  /** Current input watermark. */
-  private Instant inputWatermarkTime = BoundedWindow.TIMESTAMP_MIN_VALUE;
-
-  /** Current output watermark. */
-  @Nullable private Instant outputWatermarkTime = null;
-
-  /** Current processing time. */
-  private Instant processingTime = BoundedWindow.TIMESTAMP_MIN_VALUE;
-
-  /** Current synchronized processing time. */
-  private Instant synchronizedProcessingTime = 
BoundedWindow.TIMESTAMP_MIN_VALUE;
-
-  @Override
-  @Nullable
-  public Instant currentOutputWatermarkTime() {
-return outputWatermarkTime;
-  }
-
-  /**
-   * Returns when the next timer in the given time domain will fire, or {@code 
null}
-   * if there are no timers scheduled in that time domain.
-   */
-  @Nullable
-  public Instant getNextTimer(TimeDomain domain) {
-final TimerData data;
-switch (domain) {
-  case EVENT_TIME:
-data = watermarkTimers.peek();
-break;
-  case PROCESSING_TIME:
-data = processingTimers.peek();
-break;
-  case SYNCHRONIZED_PROCESSING_TIME:
-data = synchronizedProcessingTimers.peek();
-break;
-  default:
-throw new IllegalArgumentException("Unexpected time domain: " + 
domain);
-}
-return (data == null) ? null : data.getTimestamp();
-  }
-
-  private PriorityQueue queue(TimeDomain domain) {
-switch 

[jira] [Commented] (BEAM-1146) Decrease spark runner startup overhead

2016-12-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767378#comment-15767378
 ] 

ASF GitHub Bot commented on BEAM-1146:
--

GitHub user aviemzur opened a pull request:

https://github.com/apache/incubator-beam/pull/1674

[BEAM-1146] Decrease spark runner startup overhead

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
Replace finding all `Source` and `Coder` implementations for serialization 
registration with wrapper classes.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aviemzur/incubator-beam 
decrease-spark-runner-startup-overhead

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1674.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1674


commit 8501cdc88ee9c89f643120e34381ec9bc2562965
Author: Aviem Zur 
Date:   2016-12-21T15:49:34Z

[BEAM-1146] Decrease spark runner startup overhead

Replace finding all `Source` and `Coder` implementations for serialization 
registration with wrapper classes.




> Decrease spark runner startup overhead
> --
>
> Key: BEAM-1146
> URL: https://issues.apache.org/jira/browse/BEAM-1146
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> BEAM-921 introduced a lazy singleton instantiated once in each machine 
> (driver & executors) which utilizes reflection to find all subclasses of 
> Source and Coder
> While this is beneficial in it's own right, the change added about one minute 
> of overhead in spark runner startup time (which cause the first job/stage to 
> take up to a minute).
> The change is in class {{BeamSparkRunnerRegistrator}}
> The reason reflection (specifically reflections library) was used here is 
> because  there is no current way of knowing all the source and coder classes 
> at runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request #1674: [BEAM-1146] Decrease spark runner startup...

2016-12-21 Thread aviemzur
GitHub user aviemzur opened a pull request:

https://github.com/apache/incubator-beam/pull/1674

[BEAM-1146] Decrease spark runner startup overhead

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
Replace finding all `Source` and `Coder` implementations for serialization 
registration with wrapper classes.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aviemzur/incubator-beam 
decrease-spark-runner-startup-overhead

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1674.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1674


commit 8501cdc88ee9c89f643120e34381ec9bc2562965
Author: Aviem Zur 
Date:   2016-12-21T15:49:34Z

[BEAM-1146] Decrease spark runner startup overhead

Replace finding all `Source` and `Coder` implementations for serialization 
registration with wrapper classes.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Assigned] (BEAM-1145) Remove classifier from shaded spark runner artifact

2016-12-21 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1145:
---

Assignee: Aviem Zur  (was: Amit Sela)

> Remove classifier from shaded spark runner artifact
> ---
>
> Key: BEAM-1145
> URL: https://issues.apache.org/jira/browse/BEAM-1145
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Shade plugin configured in spark runner's pom adds a classifier to spark 
> runner shaded jar
> {code:xml}
> true
> spark-app
> {code}
> This means, that in order for a user application that is dependent on 
> spark-runner to work in cluster mode, they have to add the classifier in 
> their dependency declaration, like so:
> {code:xml}
> 
> org.apache.beam
> beam-runners-spark
> 0.4.0-incubating-SNAPSHOT
> spark-app
> 
> {code}
> Otherwise, if they do not specify classifier, the jar they get is unshaded, 
> which in cluster mode, causes collisions between different guava versions.
> Example exception in cluster mode when adding the dependency without 
> classifier:
> {code}
> 16/12/12 06:58:56 WARN TaskSetManager: Lost task 4.0 in stage 8.0 (TID 153, 
> lvsriskng02.lvs.paypal.com): java.lang.NoSuchMethodError: 
> com.google.common.base.Stopwatch.createStarted()Lcom/google/common/base/Stopwatch;
>   at 
> org.apache.beam.runners.spark.stateful.StateSpecFunctions$1.apply(StateSpecFunctions.java:137)
>   at 
> org.apache.beam.runners.spark.stateful.StateSpecFunctions$1.apply(StateSpecFunctions.java:98)
>   at 
> org.apache.spark.streaming.StateSpec$$anonfun$1.apply(StateSpec.scala:180)
>   at 
> org.apache.spark.streaming.StateSpec$$anonfun$1.apply(StateSpec.scala:179)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDDRecord$$anonfun$updateRecordWithData$1.apply(MapWithStateRDD.scala:57)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDDRecord$$anonfun$updateRecordWithData$1.apply(MapWithStateRDD.scala:55)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at 
> org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDDRecord$.updateRecordWithData(MapWithStateRDD.scala:55)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDD.compute(MapWithStateRDD.scala:155)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>   at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:275)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDD.compute(MapWithStateRDD.scala:149)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>   at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:275)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>   at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:275)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> I would suggest that the classifier be removed from the shaded jar, to avoid 
> confusion among users, and have a better user experience.
> P.S. Looks like Dataflow runner does not add a classifier to its shaded jar.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (BEAM-1144) Spark runner fails to deserialize MicrobatchSource in cluster mode

2016-12-21 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1144:
---

Assignee: Aviem Zur  (was: Amit Sela)

> Spark runner fails to deserialize MicrobatchSource in cluster mode
> --
>
> Key: BEAM-1144
> URL: https://issues.apache.org/jira/browse/BEAM-1144
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> When running in cluster mode (yarn), spark runner fails on deserialization of 
> {{MicrobatchSource}}
> After changes made in BEAM-921 spark runner fails in cluster mode with the 
> following:
> {code}
> 16/12/12 04:27:01 ERROR ApplicationMaster: User class threw exception: 
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> com.esotericsoftware.kryo.KryoException: Error during Java deserialization.
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> com.esotericsoftware.kryo.KryoException: Error during Java deserialization.
>   at 
> org.apache.beam.runners.spark.SparkPipelineResult.beamExceptionFrom(SparkPipelineResult.java:72)
>   at 
> org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:115)
>   at 
> org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:101)
>   at 
> com.paypal.risk.platform.aleph.example.MapOnlyExample.main(MapOnlyExample.java:38)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:559)
> Caused by: com.esotericsoftware.kryo.KryoException: Error during Java 
> deserialization.
>   at 
> com.esotericsoftware.kryo.serializers.JavaSerializer.read(JavaSerializer.java:42)
>   at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>   at 
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:228)
>   at 
> org.apache.spark.serializer.DeserializationStream.readKey(Serializer.scala:169)
>   at 
> org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:201)
>   at 
> org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:198)
>   at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at 
> org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDDRecord$.updateRecordWithData(MapWithStateRDD.scala:55)
>   at 
> org.apache.spark.streaming.rdd.MapWithStateRDD.compute(MapWithStateRDD.scala:155)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>   at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:275)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>   at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:275)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.beam.runners.spark.io.MicrobatchSource
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>  

[jira] [Assigned] (BEAM-1146) Decrease spark runner startup overhead

2016-12-21 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1146:
---

Assignee: Aviem Zur  (was: Amit Sela)

> Decrease spark runner startup overhead
> --
>
> Key: BEAM-1146
> URL: https://issues.apache.org/jira/browse/BEAM-1146
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> BEAM-921 introduced a lazy singleton instantiated once in each machine 
> (driver & executors) which utilizes reflection to find all subclasses of 
> Source and Coder
> While this is beneficial in it's own right, the change added about one minute 
> of overhead in spark runner startup time (which cause the first job/stage to 
> take up to a minute).
> The change is in class {{BeamSparkRunnerRegistrator}}
> The reason reflection (specifically reflections library) was used here is 
> because  there is no current way of knowing all the source and coder classes 
> at runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Jenkins build is back to normal : beam_PostCommit_Java_RunnableOnService_Spark #519

2016-12-21 Thread Apache Jenkins Server
See 




Jenkins build is back to normal : beam_PostCommit_Java_RunnableOnService_Spark #519

2016-12-21 Thread Apache Jenkins Server
See 




<    1   2   3