Re: What else is needed to set up native support of BLAS/LAPACK with Spark?

2015-07-24 Thread Arun Ahuja
, Arun Ahuja aahuj...@gmail.com wrote: Yes, I imagine it's the driver's classpath - I'm pulling those screenshots straight from the Spark UI environment page. Is there somewhere else to grab the executor classpath? Also, the warning only prints once, so it's also not clear whether

Re: What else is needed to set up native support of BLAS/LAPACK with Spark?

2015-07-21 Thread Arun Ahuja
it references the assembly you built locally and from which you're launching the driver. I think we're concerned with the executors and what they have on the classpath. I suspect there is still a problem somewhere in there. On Mon, Jul 20, 2015 at 4:59 PM, Arun Ahuja aahuj...@gmail.com wrote

Re: What else is needed to set up native support of BLAS/LAPACK with Spark?

2015-07-20 Thread Arun Ahuja
Ryza sandy.r...@cloudera.com wrote: Can you try setting the spark.yarn.jar property to make sure it points to the jar you're thinking of? -Sandy On Fri, Jul 17, 2015 at 11:32 AM, Arun Ahuja aahuj...@gmail.com wrote: Yes, it's a YARN cluster, and I'm using spark-submit to run. I have SPARK_HOME
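
For anyone replaying this thread: a hedged sketch of pinning the assembly so the executors use the same build as the driver, assuming a yarn-client application and an illustrative HDFS path (spark.yarn.jar is the Spark 1.x property name; it can equally go in spark-defaults.conf or a --conf flag):

    import org.apache.spark.{SparkConf, SparkContext}

    object AssemblyPin {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("assembly-pin")
          // Illustrative path: point this at the assembly you actually built.
          .set("spark.yarn.jar",
               "hdfs:///user/spark/spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar")
        val sc = new SparkContext(conf)
        sc.stop()
      }
    }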

Re: What else is needed to set up native support of BLAS/LAPACK with Spark?

2015-07-17 Thread Arun Ahuja
this assembly you built for your job -- like, it's actually the assembly the executors are using. On Tue, Jul 7, 2015 at 8:47 PM, Arun Ahuja aahuj...@gmail.com wrote: Is there more documentation on what is needed to set up BLAS/LAPACK native support with Spark? I’ve built Spark

Re: What else is needed to set up native support of BLAS/LAPACK with Spark?

2015-07-17 Thread Arun Ahuja
you are using this assembly across your cluster. On Fri, Jul 17, 2015 at 6:26 PM, Arun Ahuja aahuj...@gmail.com wrote: Hi Sean, Thanks for the reply! I did double-check that the jar is the one I think I am running: jar tf /hpc/users/ahujaa01/src/spark/assembly

What else is needed to set up native support of BLAS/LAPACK with Spark?

2015-07-07 Thread Arun Ahuja
Is there more documentation on what is needed to set up BLAS/LAPACK native support with Spark? I’ve built Spark with the -Pnetlib-lgpl flag and see that the netlib classes are in the assembly jar. jar tvf spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar | grep netlib | grep Native 6625 Tue Jul 07
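
A minimal sketch for verifying which BLAS/LAPACK implementation netlib-java actually resolved at runtime, assuming the netlib-lgpl assembly is on the classpath (the object name is illustrative; running the same check inside a job exercises the executors too):

    import com.github.fommil.netlib.{BLAS, LAPACK}

    object NetlibCheck {
      def main(args: Array[String]): Unit = {
        // Prints e.g. NativeSystemBLAS when native libraries were found,
        // or F2jBLAS when it fell back to the pure-Java implementation.
        println(s"BLAS:   ${BLAS.getInstance().getClass.getName}")
        println(s"LAPACK: ${LAPACK.getInstance().getClass.getName}")
      }
    }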

Re: unable to bring up cluster with ec2 script

2015-07-07 Thread Arun Ahuja
Sorry, I can't help with this issue, but if you are interested in a simple way to launch a Spark cluster on Amazon, Spark is now offered as an application in Amazon EMR. With this you can have a full cluster with a few clicks: https://aws.amazon.com/blogs/aws/new-apache-spark-on-amazon-emr/ -

Re: Spark on YARN memory utilization

2014-12-06 Thread Arun Ahuja
Hi Denny, This is due to the spark.yarn.memoryOverhead parameter; depending on what version of Spark you are on, the default may differ, but it should be the larger of 1024mb per executor or .07 * executorMemory. When you set executor memory, the YARN resource request is executorMemory +
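
To make the arithmetic concrete, a small sketch using the defaults as described in this thread (the constants are version-dependent, so treat them as assumptions):

    // Illustrative only: YARN is asked for the executor heap plus
    // spark.yarn.executor.memoryOverhead; the default overhead below
    // follows the description above and varies across Spark versions.
    def yarnContainerRequestMb(executorMemoryMb: Int): Int = {
      val overheadMb = math.max(1024, (0.07 * executorMemoryMb).toInt)
      executorMemoryMb + overheadMb
    }

    // e.g. a 10g executor: 10240 + max(1024, 716) = 11264mb requested from YARN
    println(yarnContainerRequestMb(10240))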

Re: spark-submit on YARN is slow

2014-12-05 Thread Arun Ahuja
Hey Sandy, What are those sleeps for, and do they still exist? We have seen executor startup times of about 1:00 to 1:30, which is a large chunk for jobs that run in ~10min. Thanks, Arun On Fri, Dec 5, 2014 at 3:20 PM, Sandy Ryza sandy.r...@cloudera.com wrote: Hi Denny, Those sleeps were only

Re: Nightly releases

2014-11-21 Thread Arun Ahuja
, myself included. I think Patrick's recent work on the build scripts for 1.2.0 will make delivering nightly builds to a public Maven repo easier. On Tue, Nov 18, 2014 at 10:22 AM, Arun Ahuja aahuj...@gmail.com wrote: Of course we can run this as well to get the latest, but the build is fairly

Re: Nightly releases

2014-11-21 Thread Arun Ahuja
Great - posted here https://issues.apache.org/jira/browse/SPARK-4542 On Fri, Nov 21, 2014 at 1:03 PM, Andrew Ash and...@andrewash.com wrote: Yes you should file a Jira and echo it out here so others can follow and comment on it. Thanks Arun! On Fri, Nov 21, 2014 at 12:02 PM, Arun Ahuja

Nightly releases

2014-11-18 Thread Arun Ahuja
Are nightly releases posted anywhere? There are quite a few vital bugfixes and performance improvements being committed to Spark, and using the latest commits is useful (or even necessary for some jobs). Is there a place to post them? It doesn't seem like it would be difficult to run make-dist nightly

Re: Nightly releases

2014-11-18 Thread Arun Ahuja
Of course we can run this as well to get the latest, but the build is fairly long and this seems like a resource many would need. On Tue, Nov 18, 2014 at 10:21 AM, Arun Ahuja aahuj...@gmail.com wrote: Are nightly releases posted anywhere? There are quite a few vital bugfixes and performance

Re: Increase Executor Memory on YARN

2014-11-10 Thread Arun Ahuja
If you are using spark-submit with --master yarn, you can also pass --executor-memory as a flag. On Mon, Nov 10, 2014 at 8:58 AM, Mudassar Sarwar mudassar.sar...@northbaysolutions.net wrote: Hi, How can we increase the executor memory of a running Spark cluster on YARN? We want to increase
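
A sketch of the programmatic equivalent, assuming the value is set before the SparkContext is created (executor memory cannot be changed on an application that is already running, which is likely the answer to the question as asked):

    import org.apache.spark.{SparkConf, SparkContext}

    object BiggerExecutors {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("bigger-executors")
          .set("spark.executor.memory", "10g") // same effect as --executor-memory 10g
        val sc = new SparkContext(conf)
        sc.stop()
      }
    }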

Re: Viewing web UI after fact

2014-11-07 Thread Arun Ahuja
We are running our applications through YARN and are only sometimes seeing them in the History Server. Most do not seem to have the APPLICATION_COMPLETE file. Specifically, any job that ends because of yarn application -kill does not show up. For other ones, what would be a reason for them not

Re: Larger heap leads to perf degradation due to GC

2014-10-06 Thread Arun Ahuja
We have used the strategy that you suggested, Andrew - using many workers per machine and keeping the heaps small (< 20gb). Using a large heap resulted in workers hanging or not responding (leading to timeouts). The same dataset/job for us will fail (most often due to akka disassociated or fetch

Re: IOException running streaming job

2014-09-29 Thread Arun Ahuja
We are also seeing this PARSING_ERROR(2) error due to Caused by: java.io.IOException: failed to uncompress the chunk: PARSING_ERROR(2) at org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:362) at

java.io.IOException Error in task deserialization

2014-09-26 Thread Arun Ahuja
Has anyone else seen this error in task deserialization? The task is processing a small amount of data and doesn't seem to have much data hanging off the closure. I've only seen this with Spark 1.1. Job aborted due to stage failure: Task 975 in stage 8.0 failed 4 times, most recent failure: Lost

Re: java.io.IOException Error in task deserialization

2014-09-26 Thread Arun Ahuja
, which is a bit interesting since the error message shows that the same stage has failed multiple times. Are you able to consistently reproduce the bug across multiple invocations at the same place? On Fri, Sep 26, 2014 at 6:11 AM, Arun Ahuja aahuj...@gmail.com wrote: Has anyone else seen

Specifying Spark Executor Java options using Spark Submit

2014-09-24 Thread Arun Ahuja
What is the proper way to specify Java options for the Spark executors using spark-submit? We had done this previously using export SPARK_JAVA_OPTS='..', for example to attach a debugger to each executor or add -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps. On spark-submit I
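
Since SPARK_JAVA_OPTS was deprecated around Spark 1.0, a hedged sketch of the spark-submit-era route is to put executor JVM flags in spark.executor.extraJavaOptions, shown here via SparkConf (the GC flags mirror the ones in the question; the same key works with a --conf flag):

    import org.apache.spark.{SparkConf, SparkContext}

    object GcLogging {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("gc-logging")
          // JVM flags applied to every executor process.
          .set("spark.executor.extraJavaOptions",
               "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
        val sc = new SparkContext(conf)
        sc.stop()
      }
    }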

TorrentBroadcast causes java.io.IOException: unexpected exception type

2014-09-23 Thread Arun Ahuja
Since upgrading to Spark 1.1 we have been seeing the following error in the logs: 14/09/23 02:14:42 ERROR executor.Executor: Exception in task 1087.0 in stage 0.0 (TID 607) java.io.IOException: unexpected exception type at

General question on persist

2014-09-23 Thread Arun Ahuja
I have a general question on when persisting will be beneficial and when it won't. I have a task that runs as follows: keyedRecordPieces = records.flatMap(record => Seq((key, recordPieces))) partitioned = keyedRecordPieces.partitionBy(KeyPartitioner) partitioned.mapPartitions(doComputation).save()

Re: General question on persist

2014-09-23 Thread Arun Ahuja
to persist RDDs, and one allows you to specify the storage level. Thanks, Liquan On Tue, Sep 23, 2014 at 2:08 PM, Arun Ahuja aahuj...@gmail.com wrote: I have a general question on when persisting will be beneficial and when it won't: I have a task that runs as follows: keyedRecordPieces
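
A minimal, self-contained sketch of the two variants the reply alludes to, shaped like the pipeline in the question (the data and partition count are made up; persisting only pays off when the partitioned RDD feeds more than one action):

    import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
    import org.apache.spark.SparkContext._ // pair-RDD functions on pre-1.3 Spark
    import org.apache.spark.storage.StorageLevel

    object PersistSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("persist-sketch"))
        val records = sc.parallelize(Seq("a" -> 1, "b" -> 2, "a" -> 3))

        val partitioned = records.partitionBy(new HashPartitioner(4))
        // cache() is shorthand for persist(StorageLevel.MEMORY_ONLY);
        // persist(level) additionally lets you spill to disk.
        partitioned.persist(StorageLevel.MEMORY_AND_DISK)

        partitioned.mapPartitions(_.map { case (k, v) => (k, v * 2) }).count() // materializes
        partitioned.count() // reuses the persisted copy instead of re-shuffling
        sc.stop()
      }
    }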

Input Field in Spark 1.1 Web UI

2014-09-08 Thread Arun Ahuja
Is there more information on what the Input column on the Spark UI means? How is this computed? I am processing a fairly small (but zipped) file and see the value in the attached inline screenshot, which does not seem correct. Thanks, Arun

Re: Failed jobs show up as succeeded in YARN?

2014-08-19 Thread Arun Ahuja
We see this all the time as well. I don't believe there is much of a relationship between the Spark job status and what YARN shows as the status. On Mon, Aug 11, 2014 at 3:17 PM, Shay Rojansky r...@roji.org wrote: Spark 1.0.2, Python, Cloudera 5.1 (Hadoop 2.3.0) It seems that Python jobs

spark-submit with Yarn

2014-08-19 Thread Arun Ahuja
Is there more documentation on using spark-submit with YARN? Trying to launch a simple job does not seem to work. My run command is as follows:

    /opt/cloudera/parcels/CDH/bin/spark-submit \
      --master yarn \
      --deploy-mode client \
      --executor-memory 10g \
      --driver-memory 10g \

Re: spark-submit with Yarn

2014-08-19 Thread Arun Ahuja
, Marcelo Vanzin van...@cloudera.com wrote: On Tue, Aug 19, 2014 at 2:34 PM, Arun Ahuja aahuj...@gmail.com wrote: /opt/cloudera/parcels/CDH/bin/spark-submit \ --master yarn \ --deploy-mode client \ This should be enough. But when I view the job's 4040 page (the Spark UI), there is a single

java.net.SocketTimeoutException: Read timed out and java.io.IOException: Filesystem closed on Spark 1.0

2014-06-20 Thread Arun Ahuja
Hi all, I'm running a job that seems to continually fail with the following exception: java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:152) at

Re: Yarn configuration file doesn't work when run with yarn-client mode

2014-05-20 Thread Arun Ahuja
I was actually able to get this to work. I was NOT setting the classpath properly originally. Simply running java -cp /etc/hadoop/conf/:<yarn and hadoop jars> com.domain.JobClass and setting yarn-client as the Spark master worked for me. Originally I had not put the configuration on the classpath.
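
A sketch of what that launch looks like from the application side, assuming the Hadoop/YARN configuration directory is on the java -cp path as described (the class and app names are illustrative, and yarn-client was the master string in this era of Spark):

    import org.apache.spark.{SparkConf, SparkContext}

    object JobClass {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setMaster("yarn-client") // picked up alongside /etc/hadoop/conf/ on the classpath
          .setAppName("classpath-check")
        val sc = new SparkContext(conf)
        // Quick sanity check that the Hadoop configuration was actually read.
        println(sc.hadoopConfiguration.get("fs.defaultFS"))
        sc.stop()
      }
    }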

Re: advice on maintaining a production spark cluster?

2014-05-20 Thread Arun Ahuja
Hi Matei, Unfortunately, I don't have more detailed information, but we have seen the loss of workers in standalone mode as well. If a job is killed through CTRL-C, we will often see the number of workers and cores decrease on the Spark Master page. They are still alive and well in the Cloudera

Re: Yarn configuration file doesn't work when run with yarn-client mode

2014-05-20 Thread Arun Ahuja
-1.0.0-rc7-docs/configuration.html http://spark.apache.org/docs/0.9.1/configuration.html 2014-05-20 11:30 GMT-07:00 Arun Ahuja aahuj...@gmail.com: I was actually able to get this to work. I was NOT setting the classpath properly originally. Simply running java -cp /etc/hadoop/conf/:yarn

Re: Yarn configuration file doesn't work when run with yarn-client mode

2014-05-19 Thread Arun Ahuja
I am encountering the same thing. Basic YARN apps work, as does the SparkPi example, but my custom application gives this result. I am using compute-classpath to create the proper classpath for my application, same as with SparkPi - was there a resolution to this issue? Thanks, Arun On Wed, Feb