Re: Profiling in YourKit

2015-02-07 Thread Enno Shioji
1. You have 4 CPU cores and 34 threads (system-wide you likely have many more, by the way). Think of it as having 4 espresso machines and 34 baristas. Does the fact that you have only 4 espresso machines mean you can only have 4 baristas? Of course not; there's plenty more work other than making
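The analogy is easy to see in code: threads that are sleeping or blocked occupy no core, so the OS can happily schedule far more threads than cores. A minimal, self-contained Java sketch (the numbers and names are purely illustrative):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class Baristas {
        public static void main(String[] args) {
            int machines = Runtime.getRuntime().availableProcessors(); // the "espresso machines"
            ExecutorService baristas = Executors.newFixedThreadPool(34); // the "baristas"
            for (int i = 0; i < 34; i++) {
                baristas.submit(new Runnable() {
                    @Override public void run() {
                        try {
                            while (true) {
                                Thread.sleep(100); // waiting (I/O, locks, timers) uses no core
                            }
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    }
                });
            }
            System.out.println(machines + " cores schedule 34 mostly-waiting threads just fine");
        }
    }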

Re: Profiling in YourKit

2015-02-07 Thread Sean Owen
If you look at the threads, the other 30 are almost surely not Spark worker threads. They're the JVM finalizer, GC threads, Jetty listeners, etc. Nothing wrong with this. Your OS has hundreds of threads running now, most of which are idle, and up to 4 of which can be executing. In a one-machine
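An easy way to verify this is to enumerate the live threads in any JVM and look at their names and states; a minimal Java sketch:

    public class ListThreads {
        public static void main(String[] args) {
            System.out.println("Cores: " + Runtime.getRuntime().availableProcessors());
            // Thread.getAllStackTraces() keys are all live threads in this JVM
            for (Thread t : Thread.getAllStackTraces().keySet()) {
                System.out.printf("%-35s state=%-13s daemon=%b%n",
                        t.getName(), t.getState(), t.isDaemon());
            }
        }
    }

Even a bare JVM with no Spark at all will show Finalizer, Reference Handler and GC-related threads, most of them in WAITING state.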

Re: Is the pre-built version of spark 1.2.0 with --hive option?

2015-02-07 Thread Sean Owen
https://github.com/apache/spark/blob/master/dev/create-release/create-release.sh#L217 Yes, except the 'without hive' version. On Sat, Feb 7, 2015 at 3:45 PM, guxiaobo1982 guxiaobo1...@qq.com wrote: Hi, After various problems with the binaries built by myself, I want to try the pre-built

getting error when submit spark with master as yarn

2015-02-07 Thread sachin Singh
Hi, when I try to execute my program with spark-submit --master yarn --class com.mytestpack.analysis.SparkTest sparktest-1.jar, I get the error below: java.lang.IllegalArgumentException: Required executor memory (1024+384 MB) is above the max threshold (1024 MB) of this cluster!

Re: getting error when submit spark with master as yarn

2015-02-07 Thread Sandy Ryza
Hi Sachin, In your YARN configuration, either yarn.nodemanager.resource.memory-mb is 1024 on your nodes or yarn.scheduler.maximum-allocation-mb is set to 1024. If you have more than 1024 MB on each node, you should bump these properties. Otherwise, you should request fewer resources by setting
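If bumping the YARN limits isn't an option, the request can be shrunk on the application side instead. A hedged Java sketch (the 512m value is illustrative; the "+384" in the error message is the off-heap overhead, which Spark 1.2 exposes as spark.yarn.executor.memoryOverhead, in MB):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class SmallExecutors {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("SparkTest")
                    // executor heap; together with the overhead it must fit
                    // under yarn.scheduler.maximum-allocation-mb
                    .set("spark.executor.memory", "512m")
                    // off-heap overhead in MB (the "+384" in the error)
                    .set("spark.yarn.executor.memoryOverhead", "384");
            JavaSparkContext sc = new JavaSparkContext(conf);
            System.out.println("Requested: " + conf.get("spark.executor.memory"));
            sc.stop();
        }
    }

Passing --executor-memory 512m to spark-submit has the same effect as the first property.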

Re: Spark impersonation

2015-02-07 Thread Sandy Ryza
https://issues.apache.org/jira/browse/SPARK-5493 currently tracks this. -Sandy On Mon, Feb 2, 2015 at 9:37 PM, Zhan Zhang zzh...@hortonworks.com wrote: I think you can configure hadoop/hive to do impersonation. There is no difference between secure or insecure hadoop cluster by using kinit.

Re: Can't access remote Hive table from spark

2015-02-07 Thread guxiaobo1982
Hi Zhan Zhang, With the pre-built version 1.2.0 of Spark against the YARN cluster installed by Ambari 1.7.0, I get the following errors: [xiaobogu@lix1 spark]$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m

Similar code in Java

2015-02-07 Thread Eduardo Costa Alfaia
Hi guys, how could I write the Scala code below in Java? val KafkaDStreams = (1 to numStreams) map { _ => KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicMap, storageLevel = StorageLevel.MEMORY_ONLY).map(_._2) } val unifiedStream =
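A possible translation using Spark 1.2's receiver-based Kafka Java API; this is a sketch, and the topic, group, batch interval and numStreams values below are placeholders:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import kafka.serializer.StringDecoder;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.storage.StorageLevel;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaPairDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka.KafkaUtils;
    import scala.Tuple2;

    public class KafkaUnion {
        public static void main(String[] args) throws Exception {
            SparkConf conf = new SparkConf().setAppName("KafkaUnion");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(2));

            int numStreams = 4; // placeholder; match the Scala numStreams
            Map<String, String> kafkaParams = new HashMap<String, String>();
            kafkaParams.put("zookeeper.connect", "localhost:2181"); // placeholder
            kafkaParams.put("group.id", "example-group");           // placeholder
            Map<String, Integer> topicMap = new HashMap<String, Integer>();
            topicMap.put("my-topic", 1);                            // placeholder

            // One receiver-based stream per iteration, keeping only the value (_._2)
            List<JavaDStream<String>> streams = new ArrayList<JavaDStream<String>>();
            for (int i = 0; i < numStreams; i++) {
                JavaPairDStream<String, String> stream = KafkaUtils.createStream(
                        jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
                        kafkaParams, topicMap, StorageLevel.MEMORY_ONLY());
                streams.add(stream.map(new Function<Tuple2<String, String>, String>() {
                    @Override public String call(Tuple2<String, String> kv) { return kv._2(); }
                }));
            }

            // Equivalent of the Scala union over all per-receiver streams
            JavaDStream<String> unifiedStream =
                    jssc.union(streams.get(0), streams.subList(1, streams.size()));
            unifiedStream.print();

            jssc.start();
            jssc.awaitTermination();
        }
    }

The union at the end merges the per-receiver streams so downstream processing sees a single DStream, just as in the Scala version.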

Re: Can't access remote Hive table from spark

2015-02-07 Thread Ted Yu
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=xiaobogu, access=WRITE, inode=/user:hdfs:hdfs:drwxr-xr-x Looks like a permission issue. Can you give access to 'xiaobogu'? Cheers On Sat, Feb 7, 2015 at 8:15 AM,

Re: Can't access remote Hive table from spark

2015-02-07 Thread Zhan Zhang
Yes. You need to create xiaobogu under /user and give the right permissions to xiaobogu. Thanks. Zhan Zhang On Feb 7, 2015, at 8:15 AM, guxiaobo1982 guxiaobo1...@qq.com wrote: Hi Zhan Zhang, With the pre-built version 1.2.0 of Spark against the YARN cluster installed
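This is usually done once as the HDFS superuser with hdfs dfs -mkdir and -chown; the same thing can be scripted through the Hadoop FileSystem API. A hedged Java sketch (must run as a user allowed to write under /user; the group name is illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    public class CreateHome {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration(); // picks up core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);
            Path home = new Path("/user/xiaobogu");
            fs.mkdirs(home, new FsPermission((short) 0755)); // create the home directory
            fs.setOwner(home, "xiaobogu", "xiaobogu");        // hand it over to the user
            fs.close();
        }
    }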

Re: Profiling in YourKit

2015-02-07 Thread Deep Pradhan
So, can I increase the number of threads by coding it manually in the Spark code? On Sat, Feb 7, 2015 at 6:52 PM, Sean Owen so...@cloudera.com wrote: If you look at the threads, the other 30 are almost surely not Spark worker threads. They're the JVM finalizer, GC threads, Jetty listeners, etc.

no space left at worker node

2015-02-07 Thread ey-chih chow
Hi, I submitted a Spark job to an EC2 cluster using spark-submit. At a worker node, there is a 'no space left on device' exception, as follows: 15/02/08 01:53:38 ERROR logging.FileAppender: Error writing stream to file
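A common cause of this on EC2 (an assumption here, since the full log is truncated) is that spark.local.dir points at the small root volume while the large ephemeral disk sits unused at /mnt. A hedged Java sketch of redirecting scratch space (the path is illustrative):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class LocalDirConfig {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("MyJob")
                    // Point shuffle/spill scratch files at a volume with room;
                    // /mnt/spark is illustrative (the big ephemeral disk on many EC2 types).
                    .set("spark.local.dir", "/mnt/spark");
            JavaSparkContext sc = new JavaSparkContext(conf);
            sc.stop();
        }
    }

Note that on some deployments the worker-side SPARK_LOCAL_DIRS environment variable takes precedence over this property.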

When using SparkFiles.get("GeoIP.dat"), got exception in thread "main": java.io.FileNotFoundException

2015-02-07 Thread Gmail
Hi there, Spark version: 1.2. /home/hadoop/spark/bin/spark-submit --class com.litb.bi.CSLog2ES --master yarn --executor-memory 1G --jars
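A common cause of this exception (an assumption, since the command above is truncated) is resolving a file with SparkFiles.get that was never shipped to the node, or passing the full original path instead of the bare file name. A hedged Java sketch of the ship-then-resolve pattern (the HDFS path is illustrative):

    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.SparkFiles;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;

    public class GeoIpLookup {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("GeoIpLookup"));
            // Ship the file to every node; "--files /path/GeoIP.dat" on spark-submit is equivalent.
            sc.addFile("hdfs:///data/GeoIP.dat"); // path is illustrative
            List<String> paths = sc.parallelize(Arrays.asList(1, 2, 3))
                    .map(new Function<Integer, String>() {
                        @Override public String call(Integer i) {
                            // Resolve by bare file name, on the executor, after addFile
                            return SparkFiles.get("GeoIP.dat");
                        }
                    })
                    .collect();
            System.out.println(paths);
            sc.stop();
        }
    }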

ERROR EndpointWriter: AssociationError

2015-02-07 Thread Lan
Hello, I'm new to Spark and tried to set up a Spark cluster with one master VM (SparkV1) and one worker VM (SparkV4); the error is the same if I have 2 workers. They are now connected without a problem, but when I submit a job (as in https://spark.apache.org/docs/latest/quick-start.html) at the master:

Re: Spark impersonation

2015-02-07 Thread Chester Chen
Sorry for the many typos; I was typing from my cell phone. Hope you can still get the idea. On Sat, Feb 7, 2015 at 1:55 PM, Chester @work ches...@alpinenow.com wrote: I just implemented this in our application. The impersonation is done before the job is submitted. In Spark on YARN (we are
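For reference, the mechanism Chester describes maps onto Hadoop's proxy-user API. A hedged Java sketch (the target user name is illustrative, and the hadoop.proxyuser.* properties must be configured in core-site.xml for the submitting superuser):

    import java.security.PrivilegedExceptionAction;

    import org.apache.hadoop.security.UserGroupInformation;

    public class SubmitAsUser {
        public static void main(String[] args) throws Exception {
            // The real (kinit'd or keytab) identity doing the submitting
            UserGroupInformation realUser = UserGroupInformation.getLoginUser();
            // Impersonate the end user before the job is submitted
            UserGroupInformation proxy =
                    UserGroupInformation.createProxyUser("xiaobogu", realUser); // name is illustrative
            proxy.doAs(new PrivilegedExceptionAction<Void>() {
                @Override public Void run() throws Exception {
                    // ... build and submit the YARN application here;
                    // everything inside runs with the proxied identity ...
                    return null;
                }
            });
        }
    }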

Custom streaming receiver slow on YARN

2015-02-07 Thread Jong Wook Kim
Hello people, I have an issue where my streaming receiver is laggy on YARN. Can anyone reply to my question on Stack Overflow? http://stackoverflow.com/questions/28370362/spark-streaming-receiver-particularly-slow-on-yarn Thanks, Jong Wook

[GraphX] Excessive value recalculations during aggregateMessages cycles

2015-02-07 Thread Kyle Ellrott
I'm trying to set up a simple iterative message/update problem in GraphX (Spark 1.2.0), but I'm running into issues with the caching and recalculation of data. I'm trying to follow the example found in the Pregel implementation of materializing and caching messages and graphs and then
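The pattern the Pregel implementation uses, and that iterative GraphX code generally needs, is: cache the new value, force materialization with an action, then unpersist the previous iteration so later actions don't replay the whole lineage. A hedged Java sketch of that pattern with plain RDDs (the map function is a stand-in for the per-iteration update; GraphX's Graph.cache and unpersistVertices play the analogous roles):

    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;

    public class IterativeCaching {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("IterativeCaching"));
            JavaRDD<Double> values = sc.parallelize(Arrays.asList(1.0, 2.0, 3.0)).cache();
            for (int i = 0; i < 10; i++) {
                JavaRDD<Double> next = values.map(new Function<Double, Double>() {
                    @Override public Double call(Double v) { return v * 0.85 + 0.15; }
                }).cache();
                next.count();            // materialize before discarding the old data
                values.unpersist(false); // drop the previous iteration's cache
                values = next;
            }
            System.out.println(values.collect());
            sc.stop();
        }
    }

Without the unpersist step, each iteration stacks another cached copy (and without the count, unpersisting too early forces a full lineage recomputation later).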