Re: Starting with spark

2014-08-03 Thread Mahebub Sayyed
Hello, I have enabled Spark in the Quickstart VM and am running SparkPi in Standalone Mode. Reference: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Installation-Guide/cdh5ig_running_spark_apps.html
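
For readers following along, a minimal sketch in the spirit of the bundled SparkPi example (a sketch, not the example's exact source; the object name and sample size here are illustrative):

    import scala.math.random
    import org.apache.spark.{SparkConf, SparkContext}

    // Cut-down Pi estimator: sample random points in the unit square
    // and count how many land inside the unit circle.
    object MiniPi {
      def main(args: Array[String]) {
        val sc = new SparkContext(new SparkConf().setAppName("MiniPi"))
        val n = 100000
        val inside = sc.parallelize(1 to n).map { _ =>
          val x = random * 2 - 1
          val y = random * 2 - 1
          if (x * x + y * y < 1) 1 else 0
        }.reduce(_ + _)
        println("Pi is roughly " + 4.0 * inside / n)
        sc.stop()
      }
    }

No master is hard-coded: in standalone mode it is supplied at launch time, which is the detail the Cloudera page above walks through.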

Re: Starting with spark

2014-08-03 Thread Sean Owen
This question is likely about the Quickstart VM, so it's better to ask in the VM forum: https://community.cloudera.com/t5/Apache-Hadoop-Concepts-and/bd-p/ApacheHadoopConcepts Please give more detail, though; it's not clear what is not working. On Sun, Aug 3, 2014 at 10:09 AM, Mahebub

error while running kafka-spark-example

2014-08-03 Thread Mahebub Sayyed
Hello, I am getting the following error while running kafka-spark-example: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/api/java/function/Function at java.lang.Class.getDeclaredMethods0(Native Method) at java.lang.Class.privateGetDeclaredMethods(Class.java:2531) at

Re: error while running kafka-spark-example

2014-08-03 Thread Sameer Sayyed
I have the jar file kafka-spark-example.jar. Where should the jar file be located when running kafka-spark-example using *cloudera-quickstart-vm-5.0.0-0-vmware*? On Sun, Aug 3, 2014 at 2:47 PM, Mahebub Sayyed mahebub...@gmail.com wrote: Hello, I am getting the following error while running

Re: error while running kafka-spark-example

2014-08-03 Thread Sean Owen
You have marked Spark dependencies as 'provided', but are evidently not 'providing' them at runtime. You haven't said how you are running them. Running with spark-submit should set up the classpath correctly. On Sun, Aug 3, 2014 at 12:47 PM, Mahebub Sayyed mahebub...@gmail.com wrote: Hello, I
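
To make Sean's point concrete, here is a hedged sketch of an sbt build for this kind of job (sbt build definitions are Scala; the artifact names and the 1.0.0 version are assumptions from the CDH 5 / Spark 1.0 era, not taken from the thread):

    // Spark core is 'provided': spark-submit puts these classes on the
    // classpath at runtime, so they are not bundled into the application jar.
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0" % "provided"
    // The Kafka integration is not part of the Spark assembly, so it is bundled.
    libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka" % "1.0.0"

Launching the jar with ./bin/spark-submit then supplies the 'provided' classes; launching it with a bare java command does not, which matches the NoClassDefFoundError above.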

pyspark script fails on EMR with an ERROR in configuring object.

2014-08-03 Thread Rahul Bhojwani
Hi, I used to run Spark scripts on my local machine. Now I am porting my code to EMR and I am facing lots of problems. The main one right now is that a script which runs properly on my local machine gives an error when run on the Amazon EMR cluster. Here is the error: [image: Inline image 1]

Re: disable log4j for spark-shell

2014-08-03 Thread Sean Owen
That's just a template. Nothing consults that file by default. It's looking inside the Spark .jar. If you edit core/src/main/resources/org/apache/spark/log4j-defaults.properties and rebuild Spark, it will pick up those changes. I think you could also use the JVM argument
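
Sean's message is cut off before naming the argument; presumably it is -Dlog4j.configuration, which points log4j 1.x at an alternate config file. A minimal sketch of the equivalent programmatic form (an assumption, not from the thread):

    // Assumed equivalent of -Dlog4j.configuration=file:conf/log4j.properties.
    // log4j 1.x reads this system property when it first initializes, so this
    // must run before any logger (and hence any Spark class) is touched.
    System.setProperty("log4j.configuration", "file:conf/log4j.properties")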

Re: pyspark script fails on EMR with an ERROR in configuring object.

2014-08-03 Thread Rahul Bhojwani
The logs provided in the image may not be enough to help; here I have copied the whole logs: WARNING: Running python applications through ./bin/pyspark is deprecated as of Spark 1.0. Use ./bin/spark-submit <python file> 14/08/03 11:10:57 INFO SparkConf: Using Spark's default log4j profile:

Re: GraphX runs without Spark?

2014-08-03 Thread Deep Pradhan
We need to pass the URL only when we are using the interactive shell, right? I am not using the interactive shell now; I am just running ./bin/run-example.. from the Spark directory. If not, Spark may be ignoring your single-node cluster and defaulting to local mode. What does this
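
For reference, a hedged sketch of where the master URL goes in a standalone application, as opposed to the shell ("master-host" is a placeholder; run-example in this era typically takes the master from the MASTER environment variable instead):

    import org.apache.spark.{SparkConf, SparkContext}

    // Target the standalone master explicitly rather than
    // defaulting to local mode.
    val conf = new SparkConf()
      .setAppName("GraphXJob")
      .setMaster("spark://master-host:7077")
    val sc = new SparkContext(conf)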

Re: disable log4j for spark-shell

2014-08-03 Thread Patrick Wendell
If you want to customize the logging behavior, the simplest way is to copy conf/log4j.properties.template to conf/log4j.properties. Then you can modify the log level in there. The Spark shells should pick this up. On Sun, Aug 3, 2014 at 6:16 AM, Sean Owen so...@cloudera.com wrote:
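
As an alternative that avoids editing files at all (a technique not mentioned in this thread, offered as a hedged sketch), log levels can also be raised programmatically from the driver or the shell:

    import org.apache.log4j.{Level, Logger}

    // Silence Spark's INFO chatter; WARN and above still print.
    Logger.getLogger("org").setLevel(Level.WARN)
    Logger.getLogger("akka").setLevel(Level.WARN)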

Cached RDD Block Size - Uneven Distribution

2014-08-03 Thread iramaraju
I am running Spark 1.0.0, Tachyon 0.5 and Hadoop 1.0.4. I am selecting a subset of a large dataset and trying to run queries on the cached schema RDD. Strangely, in the web UI I see the following table of 150 partitions (columns: Block Name, Storage Level, Size in Memory, Size on Disk, Executors)

Re: How to share a NonSerializable variable among tasks in the same worker node?

2014-08-03 Thread Ron's Yahoo!
I think you're going to have to make it serializable by registering it with a Kryo registrator. Multiple workers run as separate JVMs, so Spark might need to serialize and deserialize broadcast variables for the different executors. Thanks, Ron On Aug 3, 2014, at 6:38
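
A hedged sketch of what Ron is describing, using the Spark 1.x KryoRegistrator API (the resource class and package name are hypothetical):

    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.SparkConf
    import org.apache.spark.serializer.KryoRegistrator

    // Hypothetical class that does not implement java.io.Serializable;
    // Kryo can still serialize it once it is registered.
    class HeavyResource(val id: Int)

    class MyRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo) {
        kryo.register(classOf[HeavyResource])
      }
    }

    // Wire the registrator into the conf before creating the SparkContext.
    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "mypackage.MyRegistrator")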

Kafka and Spark application after polling twice.

2014-08-03 Thread salemi
Hi All, My application works when I use spark-submit with master=local[*]. But if I deploy the application to a standalone cluster with master=spark://master:7077, the application polls twice from the Kafka topic and then stops working. I don't get any error logs. I can see the application

Re: Tasks fail when ran in cluster but they work fine when submited using local local

2014-08-03 Thread salemi
Let me share the solution to this problem: I had to set spark.httpBroadcast.uri to the FQDN of the driver. Ali
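
For readers who hit the same symptom, a hedged sketch of the setting Ali describes (the hostname and port are placeholders; the config key is the one from his fix):

    import org.apache.spark.SparkConf

    // Point executors at the driver's HTTP broadcast server by FQDN,
    // so worker nodes do not fail to resolve a short or internal hostname.
    val conf = new SparkConf()
      .set("spark.httpBroadcast.uri", "http://driver-host.example.com:12345")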