Re: Java 8 vs Scala

2015-07-15 Thread Reinis Vicups
We have a complex application that has been running in production for a couple of 
months and heavily uses Spark in Scala.


Just to give you some insight into the complexity - our source data is not 
that huge (only about 500'000 complex elements), but our machine learning 
algorithms produce more than a billion transformations and intermediate 
data elements.
Our current spark/mesos cluster consists of 120 CPUs, 190 GB RAM and 
plenty of HDD space.


Now regarding your question:

- Scala is just a beautiful language in its own right, independently of 
Spark;


- the Spark API fits very naturally into Scala semantics because map/reduce 
transformations are written more or less identically for local 
collections and RDDs (see the short sketch after this list);


- as with any religious topic, there is a controversial discussion about which 
language is better, and most of the time (I have read quite a lot of 
blog/forum posts on this) the argumentation is based on which religion one 
belongs to (e.g. Java vs Scala vs Python);


- we have checked the supposed performance issues and limitations of Scala 
described here (http://www.infoq.com/news/2011/11/yammer-scala) by 
refactoring to the "best practices" described in the article, and have 
observed a performance increase in some places and, at the same time, a 
performance decrease in other places. Thus I would say there is no 
noticeable performance difference between Scala and Java in our use case 
(of course there are and always will be applications where one or the other 
language performs better);
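
To illustrate the point about map/reduce symmetry, here is a minimal sketch 
(assuming an existing SparkContext named sc; the data is purely illustrative):

import org.apache.spark.SparkContext

def wordLengthSum(sc: SparkContext): Unit = {
  val words = Seq("spark", "scala", "mesos")

  // plain Scala collection
  val localSum = words.map(_.length).reduce(_ + _)

  // the same transformations on an RDD look (almost) identical
  val rddSum = sc.parallelize(words).map(_.length).reduce(_ + _)

  println(s"local: $localSum, rdd: $rddSum")
}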


Hope this helps,
reinis


On 15.07.2015 09:27, 诺铁 wrote:
I think different teams will get different answers to this question. My 
team uses Scala, and we are happy with it.


On Wed, Jul 15, 2015 at 1:31 PM, Tristan Blakers 
<tris...@blackfrog.org> wrote:


We have had excellent results operating on RDDs using Java 8 with
Lambdas. It’s slightly more verbose than Scala, but I haven’t
found this an issue, and haven’t missed any functionality.

The new DataFrame API makes the Spark platform even more language
agnostic.
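
For example, a minimal Scala sketch (assuming Spark 1.4's SQLContext is
available as sqlContext and a hypothetical people.json input file); the
equivalent Java and Python code reads nearly line-for-line the same:

val people = sqlContext.read.json("people.json")   // hypothetical input file
people.filter(people("age") > 21)                  // SQL-like, language-neutral operations
  .select("name", "age")
  .show()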

Tristan

On 15 July 2015 at 06:40, Vineel Yalamarthy
<vineelyalamar...@gmail.com> wrote:

Good question. Like you, many are in the same boat (coming
from a Java background). Looking forward to responses from the
community.

Regards
Vineel

On Tue, Jul 14, 2015 at 2:30 PM, spark user
<spark_u...@yahoo.com.invalid> wrote:

Hi All

To start a new project in Spark, which technology is good:
Java 8 or Scala?

I am a Java developer. Can I start with Java 8, or do I need
to learn Scala?

Which one is the better technology for quickly starting a POC
project?

Thanks

- su




-- 


Thanks and Regards,
Venkata Vineel, Student, School of Computing
Mobile : +1-385-2109-788

- Innovation is the ability to convert ideas into invoices







Re: Tasks randomly stall when running on mesos

2015-05-26 Thread Reinis Vicups

Hi,

I just configured my cluster to run with 1.4.0-rc2; alas, the dependency 
jungle does not let one just download, configure and start. Instead one 
will have to fiddle with sbt settings for the upcoming couple of nights:


2015-05-26 14:50:52,686 WARN  a.r.ReliableDeliverySupervisor - 
Association with remote system 
[akka.tcp://driverPropsFetcher@app03:44805] has failed, address is now 
gated for [5000] ms. Reason is: [org.apache.spark.rpc.akka.AkkaMessage].
2015-05-26 14:52:55,707 ERROR Remoting - 
org.apache.spark.rpc.akka.AkkaMessage

java.lang.ClassNotFoundException: org.apache.spark.rpc.akka.AkkaMessage
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at 
java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:626)
at 
akka.util.ClassLoaderObjectInputStream.resolveClass(ClassLoaderObjectInputStream.scala:19)
at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
at 
java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
at 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)

at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at 
akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)

at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at 
akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136)
at 
akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)

at scala.util.Try$.apply(Try.scala:161)
at 
akka.serialization.Serialization.deserialize(Serialization.scala:98)
at 
akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23)
at 
akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:58)
at 
akka.remote.DefaultMessageDispatcher.payload$1(Endpoint.scala:58)

at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:76)
at 
akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937)

at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at 
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


kind regards
reinis

On 25.05.2015 23:09, Reinis Vicups wrote:

Great hints, you guys!

Yes, spark-shell worked fine with mesos as master. I haven't tried to 
execute multiple RDD actions in a row though (I did a couple of 
successful counts on the HBase tables I am working with in several 
experiments, but nothing that would compare to the stuff my spark jobs 
are doing), but I will check whether the shell stalls on some decent RDD action.


Also thanks a bunch for the links to binaries. This will literally 
save me hours!


kind regards
reinis

On 25.05.2015 21:00, Dean Wampler wrote:

Here is a link for builds of 1.4 RC2:

http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc2-bin/

For a mvn repo, I believe the RC2 artifacts are here:

https://repository.apache.org/content/repositories/orgapachespark-1104/

A few experiments you might try:

1. Does spark-shell work? It might start fine, but make sure you can 
create an RDD and use it, e.g., something like:


val rdd = sc.parallelize(Seq(1,2,3,4,5,6))
rdd foreach println

2. Try coarse grained mode, which has different logic for executor 
management.


You can set it in $SPARK_HOME/conf/spark-defaults.conf file:

spark.mesos.coarse   true

Or, from this page 
<http://spark.apache.org/docs/latest/running-on-mesos.html>, set the 
property in a SparkConf object used to construct the SparkContext:


conf.set("spark.mesos.coarse", "true")

dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition 
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)

Typesafe <ht

Re: Tasks randomly stall when running on mesos

2015-05-25 Thread Reinis Vicups

Great hints, you guys!

Yes, spark-shell worked fine with mesos as master. I haven't tried to 
execute multiple RDD actions in a row though (I did a couple of 
successful counts on the HBase tables I am working with in several 
experiments, but nothing that would compare to the stuff my spark jobs 
are doing), but I will check whether the shell stalls on some decent RDD action.


Also thanks a bunch for the links to binaries. This will literally save 
me hours!


kind regards
reinis

On 25.05.2015 21:00, Dean Wampler wrote:

Here is a link for builds of 1.4 RC2:

http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc2-bin/


For a mvn repo, I believe the RC2 artifacts are here:

https://repository.apache.org/content/repositories/orgapachespark-1104/
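
If you build with sbt, a minimal sketch of pulling the RC2 artifacts from that
staging repository could look like this (the 1.4.0 version string and the
provided scope are assumptions; double-check them against the repository):

// build.sbt
resolvers += "Apache Spark 1.4.0 RC2 staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1104/"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.0" % "provided"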

A few experiments you might try:

1. Does spark-shell work? It might start fine, but make sure you can 
create an RDD and use it, e.g., something like:


val rdd = sc.parallelize(Seq(1,2,3,4,5,6))
rdd foreach println

2. Try coarse grained mode, which has different logic for executor 
management.


You can set it in $SPARK_HOME/conf/spark-defaults.conf file:

spark.mesos.coarse   true

Or, from this page 
<http://spark.apache.org/docs/latest/running-on-mesos.html>, set the 
property in a SparkConf object used to construct the SparkContext:


conf.set("spark.mesos.coarse", "true")

dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition 
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)

Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com

On Mon, May 25, 2015 at 12:06 PM, Reinis Vicups <sp...@orbit-x.de> wrote:


Hello,

I assume I am running spark in a fine-grained mode since I haven't
changed the default here.

One question regarding 1.4.0-RC1 - is there a mvn snapshot
repository I could use for my project config? (I know that I have
to download source and make-distribution for executor as well)

thanks
reinis


On 25.05.2015 17:07, Iulian Dragoș wrote:


On Mon, May 25, 2015 at 2:43 PM, Reinis Vicups <sp...@orbit-x.de> wrote:

Hello,

I am using Spark 1.3.1-hadoop2.4 with Mesos 0.22.1 with
zookeeper and running on a cluster with 3 nodes on 64bit ubuntu.

My application is compiled with spark 1.3.1 (apparently with a
mesos 0.21.0 dependency), hadoop 2.5.1-mapr-1503 and akka
2.3.10. Only with this combination have I succeeded in running
spark jobs on mesos at all. Different versions cause
class loader issues.

I am submitting spark jobs with spark-submit with
mesos://zk://.../mesos.


Are you using coarse grained or fine grained mode?

The sandbox log of slave-node app01 (the one that stalls) shows
the following:

10:01:25.815506 35409 fetcher.cpp:214] Fetching URI
'hdfs://dev-hadoop01/apps/spark-1.3.1-bin-hadoop2.4.tgz'
10:01:26.497764 35409 fetcher.cpp:99] Fetching URI
'hdfs://dev-hadoop01/apps/spark-1.3.1-bin-hadoop2.4.tgz'
using Hadoop Client
10:01:26.497869 35409 fetcher.cpp:109] Downloading resource
from 'hdfs://dev-hadoop01/apps/spark-1.3.1-bin-hadoop2.4.tgz'
to

'/tmp/mesos/slaves/20150511-150924-3410235146-5050-1903-S3/frameworks/20150511-150924-3410235146-5050-1903-0249/executors/20150511-150924-3410235146-5050-1903-S3/runs/ec3a0f13-2f44-4952-bb23-4d48abaacc05/spark-1.3.1-bin-hadoop2.4.tgz'
10:01:32.877717 35409 fetcher.cpp:78] Extracted resource

'/tmp/mesos/slaves/20150511-150924-3410235146-5050-1903-S3/frameworks/20150511-150924-3410235146-5050-1903-0249/executors/20150511-150924-3410235146-5050-1903-S3/runs/ec3a0f13-2f44-4952-bb23-4d48abaacc05/spark-1.3.1-bin-hadoop2.4.tgz'
into

'/tmp/mesos/slaves/20150511-150924-3410235146-5050-1903-S3/frameworks/20150511-150924-3410235146-5050-1903-0249/executors/20150511-150924-3410235146-5050-1903-S3/runs/ec3a0f13-2f44-4952-bb23-4d48abaacc05'
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
10:01:34 INFO MesosExecutorBackend: Registered signal
handlers for [TERM, HUP, INT]
10:01:34.459292 35730 exec.cpp:132] Version: 0.22.0
*10:01:34 ERROR MesosExecutorBackend: Received launchTask but
executor was null*
10:01:34.540870 35765 exec.cpp:206] Executor registered on
slave 20150511-150924-3410235146-5050-1903-S3
10:01:34 INFO MesosExecutorBackend: Registered with Mesos as
executor ID 20150511-150924-3410235146-5050-1903-S3 with 1 cpus


It looks like an inconsistent state on the Mesos scheduler. It
tries to launch a task on a given slave before the 

Re: Tasks randomly stall when running on mesos

2015-05-25 Thread Reinis Vicups

Hello,

I assume I am running spark in a fine-grained mode since I haven't 
changed the default here.


One question regarding 1.4.0-RC1 - is there a mvn snapshot repository I 
could use for my project config? (I know that I have to download source 
and make-distribution for executor as well)


thanks
reinis

On 25.05.2015 17:07, Iulian Dragoș wrote:


On Mon, May 25, 2015 at 2:43 PM, Reinis Vicups <sp...@orbit-x.de> wrote:


Hello,

I am using Spark 1.3.1-hadoop2.4 with Mesos 0.22.1 with zookeeper
and running on a cluster with 3 nodes on 64bit ubuntu.

My application is compiled with spark 1.3.1 (apparently with a mesos
0.21.0 dependency), hadoop 2.5.1-mapr-1503 and akka 2.3.10. Only
with this combination have I succeeded in running spark jobs on mesos
at all. Different versions cause class loader issues.

I am submitting spark jobs with spark-submit with
mesos://zk://.../mesos.


Are you using coarse grained or fine grained mode?

The sandbox log of slave-node app01 (the one that stalls) shows the following:

10:01:25.815506 35409 fetcher.cpp:214] Fetching URI
'hdfs://dev-hadoop01/apps/spark-1.3.1-bin-hadoop2.4.tgz'
10:01:26.497764 35409 fetcher.cpp:99] Fetching URI
'hdfs://dev-hadoop01/apps/spark-1.3.1-bin-hadoop2.4.tgz' using
Hadoop Client
10:01:26.497869 35409 fetcher.cpp:109] Downloading resource from
'hdfs://dev-hadoop01/apps/spark-1.3.1-bin-hadoop2.4.tgz' to

'/tmp/mesos/slaves/20150511-150924-3410235146-5050-1903-S3/frameworks/20150511-150924-3410235146-5050-1903-0249/executors/20150511-150924-3410235146-5050-1903-S3/runs/ec3a0f13-2f44-4952-bb23-4d48abaacc05/spark-1.3.1-bin-hadoop2.4.tgz'
10:01:32.877717 35409 fetcher.cpp:78] Extracted resource

'/tmp/mesos/slaves/20150511-150924-3410235146-5050-1903-S3/frameworks/20150511-150924-3410235146-5050-1903-0249/executors/20150511-150924-3410235146-5050-1903-S3/runs/ec3a0f13-2f44-4952-bb23-4d48abaacc05/spark-1.3.1-bin-hadoop2.4.tgz'
into

'/tmp/mesos/slaves/20150511-150924-3410235146-5050-1903-S3/frameworks/20150511-150924-3410235146-5050-1903-0249/executors/20150511-150924-3410235146-5050-1903-S3/runs/ec3a0f13-2f44-4952-bb23-4d48abaacc05'
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
10:01:34 INFO MesosExecutorBackend: Registered signal handlers for
[TERM, HUP, INT]
10:01:34.459292 35730 exec.cpp:132] Version: 0.22.0
*10:01:34 ERROR MesosExecutorBackend: Received launchTask but
executor was null*
10:01:34.540870 35765 exec.cpp:206] Executor registered on slave
20150511-150924-3410235146-5050-1903-S3
10:01:34 INFO MesosExecutorBackend: Registered with Mesos as
executor ID 20150511-150924-3410235146-5050-1903-S3 with 1 cpus


It looks like an inconsistent state on the Mesos scheduler. It tries 
to launch a task on a given slave before the executor has registered. 
This code was improved/refactored in 1.4, could you try 1.4.0-RC1?


iulian

10:01:34 INFO SecurityManager: Changing view acls to...
10:01:35 INFO Slf4jLogger: Slf4jLogger started
10:01:35 INFO Remoting: Starting remoting
10:01:35 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://sparkExecutor@app01:xxx]
10:01:35 INFO Utils: Successfully started service 'sparkExecutor'
on port xxx.
10:01:35 INFO AkkaUtils: Connecting to MapOutputTracker:
akka.tcp://sparkDriver@dev-web01/user/MapOutputTracker
10:01:35 INFO AkkaUtils: Connecting to BlockManagerMaster:
akka.tcp://sparkDriver@dev-web01/user/BlockManagerMaster
10:01:36 INFO DiskBlockManager: Created local directory at

/tmp/spark-52a6585a-f9f2-4ab6-bebc-76be99b0c51c/blockmgr-e6d79818-fe30-4b5c-bcd6-8fbc5a201252
10:01:36 INFO MemoryStore: MemoryStore started with capacity 88.3 MB
10:01:36 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where
applicable
10:01:36 INFO AkkaUtils: Connecting to OutputCommitCoordinator:
akka.tcp://sparkDriver@dev-web01/user/OutputCommitCoordinator
10:01:36 INFO Executor: Starting executor ID
20150511-150924-3410235146-5050-1903-S3 on host app01
10:01:36 INFO NettyBlockTransferService: Server created on XXX
10:01:36 INFO BlockManagerMaster: Trying to register BlockManager
10:01:36 INFO BlockManagerMaster: Registered BlockManager
10:01:36 INFO AkkaUtils: Connecting to HeartbeatReceiver:
akka.tcp://sparkDriver@dev-web01/user/HeartbeatReceiver

As soon as spark-driver is aborted, following log entries are
added to the sandbox log of slave-node app01:

10:17:29.559433 35772 exec.cpp:379] Executor asked to shutdown
10:17:29 WARN ReliableDeliverySupervisor: Association with remote
system [akka.tcp://sparkDriver@dev-web01] has failed, address is
now gated for [5000] ms. R

Tasks randomly stall when running on mesos

2015-05-25 Thread Reinis Vicups

Hello,

I am using Spark 1.3.1-hadoop2.4 with Mesos 0.22.1 with zookeeper and 
running on a cluster with 3 nodes on 64bit ubuntu.


My application is compiled with spark 1.3.1 (apparently with a mesos 
0.21.0 dependency), hadoop 2.5.1-mapr-1503 and akka 2.3.10. Only with 
this combination have I succeeded in running spark jobs on mesos at all. 
Different versions cause class loader issues.


I am submitting spark jobs with spark-submit with mesos://zk://.../mesos.

About 50% of all jobs stall forever (or until I kill the spark driver).
The error occurs randomly on different slave nodes. It can happen that 4 
spark jobs in a row run completely without problems and then the problem 
suddenly occurs.
I am always testing the same set of 5 different jobs, singly and combined, and 
the error always occurs in different job/node/stage/task combinations.


Whenever a slave node stalls, this message appears in the sandbox log of the 
failing slave:
10:01:34 ERROR MesosExecutorBackend: Received launchTask but executor 
was null


Any hints on how to address this issue are greatly appreciated

kind regards
reinis


The job that stalls shows the following in the spark-driver log (as one can see, 
task 1.0 is never finished):


10:01:25,620 INFO  o.a.s.s.DAGScheduler - Submitting 4 missing tasks 
from Stage 0 (MapPartitionsRDD[1] at groupBy at 
ImportExtensionFieldsSparkJob.scala:57)
10:01:25,621 INFO  o.a.s.s.TaskSchedulerImpl - Adding task set 0.0 with 
4 tasks
10:01:25,656 INFO  o.a.s.s.TaskSetManager - Starting task 0.0 in stage 
0.0 (TID 0, app03, PROCESS_LOCAL, 1140 bytes)
10:01:25,660 INFO  o.a.s.s.TaskSetManager - Starting task 1.0 in stage 
0.0 (TID 1, app01, PROCESS_LOCAL, 1140 bytes)
10:01:25,661 INFO  o.a.s.s.TaskSetManager - Starting task 2.0 in stage 
0.0 (TID 2, app02, PROCESS_LOCAL, 1140 bytes)
10:01:25,662 INFO  o.a.s.s.TaskSetManager - Starting task 3.0 in stage 
0.0 (TID 3, app03, PROCESS_LOCAL, 1140 bytes)
10:01:36,842 INFO  o.a.s.s.BlockManagerMasterActor - Registering block 
manager app02 with 88.3 MB RAM, 
BlockManagerId(20150511-150924-3410235146-5050-1903-S1, app02, 59622)
10:01:36,862 INFO  o.a.s.s.BlockManagerMasterActor - Registering block 
manager app03 with 88.3 MB RAM, 
BlockManagerId(20150511-150924-3410235146-5050-1903-S2, app03, 39420)
10:01:36,917 INFO  o.a.s.s.BlockManagerMasterActor - Registering block 
manager app01 with 88.3 MB RAM, 
BlockManagerId(20150511-150924-3410235146-5050-1903-S3, app01, 45605)
10:01:38,701 INFO  o.a.s.s.BlockManagerInfo - Added broadcast_2_piece0 
in memory on app03 (size: 2.6 KB, free: 88.3 MB)
10:01:38,702 INFO  o.a.s.s.BlockManagerInfo - Added broadcast_2_piece0 
in memory on app02 (size: 2.6 KB, free: 88.3 MB)
10:01:41,400 INFO  o.a.s.s.TaskSetManager - Finished task 0.0 in stage 
0.0 (TID 0) in 15721 ms on app03 (1/4)
10:01:41,539 INFO  o.a.s.s.TaskSetManager - Finished task 2.0 in stage 
0.0 (TID 2) in 15870 ms on app02 (2/4)
10:01:41,697 INFO  o.a.s.s.TaskSetManager - Finished task 3.0 in stage 
0.0 (TID 3) in 16029 ms on app03 (3/4)


The sandbox log of slave-node app01 (the one that stalls) shows the following:

10:01:25.815506 35409 fetcher.cpp:214] Fetching URI 
'hdfs://dev-hadoop01/apps/spark-1.3.1-bin-hadoop2.4.tgz'
10:01:26.497764 35409 fetcher.cpp:99] Fetching URI 
'hdfs://dev-hadoop01/apps/spark-1.3.1-bin-hadoop2.4.tgz' using Hadoop Client
10:01:26.497869 35409 fetcher.cpp:109] Downloading resource from 
'hdfs://dev-hadoop01/apps/spark-1.3.1-bin-hadoop2.4.tgz' to 
'/tmp/mesos/slaves/20150511-150924-3410235146-5050-1903-S3/frameworks/20150511-150924-3410235146-5050-1903-0249/executors/20150511-150924-3410235146-5050-1903-S3/runs/ec3a0f13-2f44-4952-bb23-4d48abaacc05/spark-1.3.1-bin-hadoop2.4.tgz'
10:01:32.877717 35409 fetcher.cpp:78] Extracted resource 
'/tmp/mesos/slaves/20150511-150924-3410235146-5050-1903-S3/frameworks/20150511-150924-3410235146-5050-1903-0249/executors/20150511-150924-3410235146-5050-1903-S3/runs/ec3a0f13-2f44-4952-bb23-4d48abaacc05/spark-1.3.1-bin-hadoop2.4.tgz' 
into 
'/tmp/mesos/slaves/20150511-150924-3410235146-5050-1903-S3/frameworks/20150511-150924-3410235146-5050-1903-0249/executors/20150511-150924-3410235146-5050-1903-S3/runs/ec3a0f13-2f44-4952-bb23-4d48abaacc05'
Using Spark's default log4j profile: 
org/apache/spark/log4j-defaults.properties
10:01:34 INFO MesosExecutorBackend: Registered signal handlers for 
[TERM, HUP, INT]

10:01:34.459292 35730 exec.cpp:132] Version: 0.22.0
*10:01:34 ERROR MesosExecutorBackend: Received launchTask but executor 
was null*
10:01:34.540870 35765 exec.cpp:206] Executor registered on slave 
20150511-150924-3410235146-5050-1903-S3
10:01:34 INFO MesosExecutorBackend: Registered with Mesos as executor ID 
20150511-150924-3410235146-5050-1903-S3 with 1 cpus

10:01:34 INFO SecurityManager: Changing view acls to...
10:01:35 INFO Slf4jLogger: Slf4jLogger started
10:01:35 INFO Remoting: Starting remoting
10:01:35 INFO Remoting: Remoting started; listening on addresses 
:[akka.tcp://sparkExecutor@app01:xxx]
10:01:35 INFO Utils: 

Spark 1.1.0: weird spark-shell behavior

2014-12-01 Thread Reinis Vicups

Hello,

I have two weird effects when working with spark-shell:


1. This code, executed in spark-shell, causes the exception below. At the 
same time it works perfectly when submitted with spark-submit:


import org.apache.hadoop.hbase.{HConstants, HBaseConfiguration}
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.client.Result
import org.apache.mahout.math.VectorWritable
import com.google.common.io.ByteStreams
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.SparkContext.rddToPairRDDFunctions
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

val hConf = HBaseConfiguration.create()
hConf.set("hbase.defaults.for.version.skip", "true")
hConf.set("hbase.defaults.for.version", "0.98.6-cdh5.2.0")
hConf.set(HConstants.ZOOKEEPER_QUORUM, "myserv")
hConf.set(HConstants.ZOOKEEPER_CLIENT_PORT, "2181")
hConf.set(TableInputFormat.INPUT_TABLE, "MyNS:MyTable")
val rdd = sc.newAPIHadoopRDD(hConf, classOf[TableInputFormat], 
classOf[ImmutableBytesWritable], classOf[Result])

rdd.count()

--- Exception ---

14/12/01 10:45:24 ERROR ExecutorUncaughtExceptionHandler: Uncaught 
exception in thread Thread[Executor task launch worker-0,5,main]

 java.lang.ExceptionInInitializerError
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:197)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:159)
at 
org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:101)
at 
org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:113)

at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:180)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: hbase-default.xml file seems to 
be for and old version of HBase (null), this version is 0.98.6-cdh5.2.0
at 
org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:73)
at 
org.apache.hadoop.hbase.HBaseConfiguration.addHbaseResources(HBaseConfiguration.java:105)
at 
org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:116)
at 
org.apache.hadoop.hbase.client.HConnectionManager.<clinit>(HConnectionManager.java:222)

... 14 more

We have already checked most of the trivial stuff with class paths and the 
existence of tables and column families, enabled the HBase-specific settings to 
avoid version checking, and so on. It appears that the supplied HBase 
configuration is completely ignored by the context. We tried to solve this 
issue by instantiating our own spark context and encountered the second 
weird effect:


2. When attempting to instantiate our own SparkContext, we get the exception 
below:


// imports block
...

val conf = new SparkConf().setAppName("Simple Application")
val sc = new SparkContext(conf)

--- Exception ---

2014-12-01 10:42:24,966 WARN  o.e.j.u.c.AbstractLifeCycle - FAILED 
SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Die Adresse 
wird bereits verwendet

java.net.BindException: Die Adresse wird bereits verwendet ("Address already in use")
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:444)
at sun.nio.ch.Net.bind(Net.java:436)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)

at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at 
org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
at 
org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
at 
org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)

at org.eclipse.jetty.server.Server.doStart(Server.java:293)
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
at 
org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$connect$1(JettyUtils.scala:199)
at 
org.apache.spark.ui.JettyUtils$$anonfun$4.apply(JettyUtils.scala:209)
at 
org.apache.spark.ui.JettyUtils$$anonfun$4.apply(JettyUtils.scala:209)
at 
org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1449)

at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)

Re: HBase 0.96+ with Spark 1.0+

2014-09-18 Thread Reinis Vicups
I am humbly bumping this since even after another week of trying I 
haven't had any luck fixing this yet.


On 14.09.2014 19:21, Reinis Vicups wrote:
I did actually try Seans suggestion just before I posted for the first 
time in this thread. I got an error when doing this and thought that I 
am not understanding what Sean was suggesting.


Now I re-attempted your suggestions with spark 1.0.0-cdh5.1.0, hbase 
0.98.1-cdh5.1.0 and hadoop 2.3.0-cdh5.1.0, which I am currently using.


I used following:

  val mortbayEnforce = "org.mortbay.jetty" % "servlet-api" % "3.0.20100224"
  val mortbayExclusion = ExclusionRule(organization = "org.mortbay.jetty", name = "servlet-api-2.5")


and applied this to hadoop and hbase dependencies e.g. like this:

val hbase = Seq(HBase.server, HBase.common, HBase.compat, 
HBase.compat2, HBase.protocol, 
mortbayEnforce).map(_.excludeAll(HBase.exclusions: _*))


private object HBase {
  val server = "org.apache.hbase" % "hbase-server" % Version.HBase
  ...
  val exclusions = Seq(ExclusionRule("org.apache.ant"), mortbayExclusion)
}

I still get the error I got the last time I tried this experiment:

14/09/14 18:28:09 ERROR metrics.MetricsSystem: Sink class 
org.apache.spark.metrics.sink.MetricsServlet cannot be instantialized

java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at 
org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:136)
at 
org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:130)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at 
scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)

at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
at 
org.apache.spark.metrics.MetricsSystem.registerSinks(MetricsSystem.scala:130)
at 
org.apache.spark.metrics.MetricsSystem.<init>(MetricsSystem.scala:84)
at 
org.apache.spark.metrics.MetricsSystem$.createMetricsSystem(MetricsSystem.scala:167)

at org.apache.spark.SparkEnv$.create(SparkEnv.scala:230)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:202)
at 
d.s.f.s.t.SimpleTicketTextSimilaritySparkJobSpec$$anonfun$1.apply$mcV$sp(SimpleTicketTextSimilaritySparkJobSpec.scala:29)
at 
d.s.f.s.t.SimpleTicketTextSimilaritySparkJobSpec$$anonfun$1.apply(SimpleTicketTextSimilaritySparkJobSpec.scala:21)
at 
d.s.f.s.t.SimpleTicketTextSimilaritySparkJobSpec$$anonfun$1.apply(SimpleTicketTextSimilaritySparkJobSpec.scala:21)
at 
org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)

at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.FlatSpecLike$$anon$1.apply(FlatSpecLike.scala:1647)
at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
at org.scalatest.FlatSpec.withFixture(FlatSpec.scala:1683)
at 
org.scalatest.FlatSpecLike$class.invokeWithFixture$1(FlatSpecLike.scala:1644)
at 
org.scalatest.FlatSpecLike$$anonfun$runTest$1.apply(FlatSpecLike.scala:1656)
at 
org.scalatest.FlatSpecLike$$anonfun$runTest$1.apply(FlatSpecLike.scala:1656)

at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
at org.scalatest.FlatSpecLike$class.runTest(FlatSpecLike.scala:1656)
at org.scalatest.FlatSpec.runTest(FlatSpec.scala:1683)
at 
org.scalatest.FlatSpecLike$$anonfun$runTests$1.apply(FlatSpecLike.scala:1714)
at 
org.scalatest.FlatSpecLike$$anonfun$runTests$1.apply(FlatSpecLike.scala:1714)
at 
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
at 
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)

at scala.collection.immutable.List.foreach(List.scala:318)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
at 
org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)

at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
at org.scalatest.FlatSpecLike$class.runTests(FlatSpecLike.scala:1714)
at org.scalatest.FlatSpec.runTests(FlatSpec.scala:1683)
at org.scalatest.Suite$class.run(Suite.scala:1424)
at 
org.scalatest.FlatSpec.org$scalatest$FlatSpecLike$$s

Re: HBase 0.96+ with Spark 1.0+

2014-09-14 Thread Reinis Vicups
I did actually try Sean's suggestion just before I posted for the first 
time in this thread. I got an error when doing it and thought that I 
was not understanding what Sean was suggesting.


Now I re-attempted your suggestions with spark 1.0.0-cdh5.1.0, hbase 
0.98.1-cdh5.1.0 and hadoop 2.3.0-cdh5.1.0, which I am currently using.


I used following:

  val mortbayEnforce = "org.mortbay.jetty" % "servlet-api" % "3.0.20100224"
  val mortbayExclusion = ExclusionRule(organization = 
"org.mortbay.jetty", name = "servlet-api-2.5")


and applied this to hadoop and hbase dependencies e.g. like this:

val hbase = Seq(HBase.server, HBase.common, HBase.compat, HBase.compat2, 
HBase.protocol, mortbayEnforce).map(_.excludeAll(HBase.exclusions: _*))


private object HBase {
  val server = "org.apache.hbase" % "hbase-server" % Version.HBase
  ...
  val exclusions = Seq(ExclusionRule("org.apache.ant"), mortbayExclusion)
}
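
For reference, a self-contained sketch of how these pieces fit together in an 
sbt 0.13 build (the module list is abbreviated to two HBase artifacts; the 
compat/protocol modules elided above would be declared the same way):

// project/Dependencies.scala
import sbt._

object Dependencies {
  object Version { val HBase = "0.98.1-cdh5.1.0" }

  // enforce the newer servlet-api and exclude the conflicting 2.5 artifact
  val mortbayEnforce   = "org.mortbay.jetty" % "servlet-api" % "3.0.20100224"
  val mortbayExclusion = ExclusionRule(organization = "org.mortbay.jetty", name = "servlet-api-2.5")

  object HBase {
    val server     = "org.apache.hbase" % "hbase-server" % Version.HBase
    val common     = "org.apache.hbase" % "hbase-common" % Version.HBase
    val exclusions = Seq(ExclusionRule("org.apache.ant"), mortbayExclusion)
  }

  // exclusions applied to every HBase artifact, plus the enforced servlet-api
  val hbase = Seq(HBase.server, HBase.common, mortbayEnforce)
    .map(_.excludeAll(HBase.exclusions: _*))
}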

I still get the error I got the last time I tried this experiment:

14/09/14 18:28:09 ERROR metrics.MetricsSystem: Sink class 
org.apache.spark.metrics.sink.MetricsServlet cannot be instantialized

java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at 
org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:136)
at 
org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:130)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at 
scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)

at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
at 
org.apache.spark.metrics.MetricsSystem.registerSinks(MetricsSystem.scala:130)
at 
org.apache.spark.metrics.MetricsSystem.<init>(MetricsSystem.scala:84)
at 
org.apache.spark.metrics.MetricsSystem$.createMetricsSystem(MetricsSystem.scala:167)

at org.apache.spark.SparkEnv$.create(SparkEnv.scala:230)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:202)
at 
d.s.f.s.t.SimpleTicketTextSimilaritySparkJobSpec$$anonfun$1.apply$mcV$sp(SimpleTicketTextSimilaritySparkJobSpec.scala:29)
at 
d.s.f.s.t.SimpleTicketTextSimilaritySparkJobSpec$$anonfun$1.apply(SimpleTicketTextSimilaritySparkJobSpec.scala:21)
at 
d.s.f.s.t.SimpleTicketTextSimilaritySparkJobSpec$$anonfun$1.apply(SimpleTicketTextSimilaritySparkJobSpec.scala:21)
at 
org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)

at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.FlatSpecLike$$anon$1.apply(FlatSpecLike.scala:1647)
at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
at org.scalatest.FlatSpec.withFixture(FlatSpec.scala:1683)
at 
org.scalatest.FlatSpecLike$class.invokeWithFixture$1(FlatSpecLike.scala:1644)
at 
org.scalatest.FlatSpecLike$$anonfun$runTest$1.apply(FlatSpecLike.scala:1656)
at 
org.scalatest.FlatSpecLike$$anonfun$runTest$1.apply(FlatSpecLike.scala:1656)

at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
at org.scalatest.FlatSpecLike$class.runTest(FlatSpecLike.scala:1656)
at org.scalatest.FlatSpec.runTest(FlatSpec.scala:1683)
at 
org.scalatest.FlatSpecLike$$anonfun$runTests$1.apply(FlatSpecLike.scala:1714)
at 
org.scalatest.FlatSpecLike$$anonfun$runTests$1.apply(FlatSpecLike.scala:1714)
at 
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
at 
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)

at scala.collection.immutable.List.foreach(List.scala:318)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
at 
org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)

at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
at org.scalatest.FlatSpecLike$class.runTests(FlatSpecLike.scala:1714)
at org.scalatest.FlatSpec.runTests(FlatSpec.scala:1683)
at org.scalatest.Suite$class.run(Suite.scala:1424)
at 
org.scalatest.FlatSpec.org$scalatest$FlatSpecLike$$super$run(FlatSpec.scala:1683)
at 
org.scalatest.FlatSpecLike$$anonfun$run$1.apply(FlatSpecLike.scala:1760)
at 
org.scalatest.FlatSpecLike$$anonfun$run$1.apply(FlatSpecLike.scala:1760)

at org.scalatest.SuperEngine.runImpl(Engine.scala