Re: Starting with spark

2014-08-03 Thread Mahebub Sayyed
Hello, I have enabled Spark in the Quickstart VM and am running SparkPi in Standalone Mode. Reference: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Installation-Guide/cdh5ig_running_spark_apps.html
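
For readers following along, a minimal sketch in the spirit of the bundled SparkPi example (a sketch, not the example's exact source; the object name and sample size here are illustrative):

    import scala.math.random
    import org.apache.spark.{SparkConf, SparkContext}

    // Cut-down Pi estimator: sample random points in the unit square
    // and count how many land inside the unit circle.
    object MiniPi {
      def main(args: Array[String]) {
        val sc = new SparkContext(new SparkConf().setAppName("MiniPi"))
        val n = 100000
        val inside = sc.parallelize(1 to n).map { _ =>
          val x = random * 2 - 1
          val y = random * 2 - 1
          if (x * x + y * y < 1) 1 else 0
        }.reduce(_ + _)
        println("Pi is roughly " + 4.0 * inside / n)
        sc.stop()
      }
    }

No master is hard-coded: in standalone mode it is supplied at launch time, which is the detail the Cloudera page above walks through.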

Re: Starting with spark

2014-08-03 Thread Sean Owen
This question is likely about the Quickstart VM, so it's better to ask in the VM forum: https://community.cloudera.com/t5/Apache-Hadoop-Concepts-and/bd-p/ApacheHadoopConcepts Please give more detail, though; it's not clear what is not working. On Sun, Aug 3, 2014 at 10:09 AM, Mahebub

error while running kafka-spark-example

2014-08-03 Thread Mahebub Sayyed
Hello, I am getting the following error while running kafka-spark-example: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/api/java/function/Function at java.lang.Class.getDeclaredMethods0(Native Method) at java.lang.Class.privateGetDeclaredMethods(Class.java:2531) at

Re: error while running kafka-spark-example

2014-08-03 Thread Sameer Sayyed
I have the jar file kafka-spark-example.jar. Where should the jar file be located when running kafka-spark-example using *cloudera-quickstart-vm-5.0.0-0-vmware*? On Sun, Aug 3, 2014 at 2:47 PM, Mahebub Sayyed mahebub...@gmail.com wrote: Hello, I am getting the following error while running

Re: error while running kafka-spark-example

2014-08-03 Thread Sean Owen
You have marked Spark dependencies as 'provided', but are evidently not 'providing' them at runtime. You haven't said how you are running them. Running with spark-submit should set up the classpath correctly. On Sun, Aug 3, 2014 at 12:47 PM, Mahebub Sayyed mahebub...@gmail.com wrote: Hello, I
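
To make Sean's point concrete, here is a hedged sketch of an sbt build for this kind of job (sbt build definitions are Scala; the artifact names and the 1.0.0 version are assumptions from the CDH 5 / Spark 1.0 era, not taken from the thread):

    // Spark core is 'provided': spark-submit puts these classes on the
    // classpath at runtime, so they are not bundled into the application jar.
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0" % "provided"
    // The Kafka integration is not part of the Spark assembly, so it is bundled.
    libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka" % "1.0.0"

Launching the jar with ./bin/spark-submit then supplies the 'provided' classes; launching it with a bare java command does not, which matches the NoClassDefFoundError above.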

pyspark script fails on EMR with an ERROR in configuring object.

2014-08-03 Thread Rahul Bhojwani
Hi, I used to run Spark scripts on my local machine. Now I am porting my code to EMR and I am facing lots of problems. The main one right now is that a script which runs properly on my local machine gives an error when run on the Amazon EMR cluster. Here is the error: [image: Inline image 1]

Re: disable log4j for spark-shell

2014-08-03 Thread Sean Owen
That's just a template. Nothing consults that file by default. It's looking inside the Spark .jar. If you edit core/src/main/resources/org/apache/spark/log4j-defaults.properties and rebuild Spark, it will pick up those changes. I think you could also use the JVM argument
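
Sean's message is cut off before naming the argument; presumably it is -Dlog4j.configuration, which points log4j 1.x at an alternate config file. A minimal sketch of the equivalent programmatic form (an assumption, not from the thread):

    // Assumed equivalent of -Dlog4j.configuration=file:conf/log4j.properties.
    // log4j 1.x reads this system property when it first initializes, so this
    // must run before any logger (and hence any Spark class) is touched.
    System.setProperty("log4j.configuration", "file:conf/log4j.properties")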

Re: pyspark script fails on EMR with an ERROR in configuring object.

2014-08-03 Thread Rahul Bhojwani
The logs provided in the image may not be enough to help; here I have copied the whole logs: WARNING: Running python applications through ./bin/pyspark is deprecated as of Spark 1.0. Use ./bin/spark-submit <python file> 14/08/03 11:10:57 INFO SparkConf: Using Spark's default log4j profile:

Re: GraphX runs without Spark?

2014-08-03 Thread Deep Pradhan
We need to pass the URL only when we are using the interactive shell, right? I am not using the interactive shell now; I am just running ./bin/run-example.. from the Spark directory. If not, Spark may be ignoring your single-node cluster and defaulting to local mode. What does this
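
For reference, a hedged sketch of where the master URL goes in a standalone application, as opposed to the shell ("master-host" is a placeholder; run-example in this era typically takes the master from the MASTER environment variable instead):

    import org.apache.spark.{SparkConf, SparkContext}

    // Target the standalone master explicitly rather than
    // defaulting to local mode.
    val conf = new SparkConf()
      .setAppName("GraphXJob")
      .setMaster("spark://master-host:7077")
    val sc = new SparkContext(conf)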

Re: disable log4j for spark-shell

2014-08-03 Thread Patrick Wendell
If you want to customize the logging behavior, the simplest way is to copy conf/log4j.properties.template to conf/log4j.properties. Then you can modify the log level in there. The Spark shells should pick this up. On Sun, Aug 3, 2014 at 6:16 AM, Sean Owen so...@cloudera.com wrote:
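
As an alternative that avoids editing files at all (a technique not mentioned in this thread, offered as a hedged sketch), log levels can also be raised programmatically from the driver or the shell:

    import org.apache.log4j.{Level, Logger}

    // Silence Spark's INFO chatter; WARN and above still print.
    Logger.getLogger("org").setLevel(Level.WARN)
    Logger.getLogger("akka").setLevel(Level.WARN)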

Cached RDD Block Size - Uneven Distribution

2014-08-03 Thread iramaraju
I am running Spark 1.0.0, Tachyon 0.5 and Hadoop 1.0.4. I am selecting a subset of a large dataset and trying to run queries on the cached schema RDD. Strangely, in the web UI I see the following table of 150 partitions (columns: Block Name, Storage Level, Size in Memory, Size on Disk, Executors)

Re: How to share a NonSerializable variable among tasks in the same worker node?

2014-08-03 Thread Ron's Yahoo!
I think you're going to have to make it serializable by registering it with a Kryo registrator. Multiple workers run as separate JVMs, so Spark might need to serialize and deserialize broadcast variables for the different executors. Thanks, Ron On Aug 3, 2014, at 6:38
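
A hedged sketch of what Ron is describing, using the Spark 1.x KryoRegistrator API (the resource class and package name are hypothetical):

    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.SparkConf
    import org.apache.spark.serializer.KryoRegistrator

    // Hypothetical class that does not implement java.io.Serializable;
    // Kryo can still serialize it once it is registered.
    class HeavyResource(val id: Int)

    class MyRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo) {
        kryo.register(classOf[HeavyResource])
      }
    }

    // Wire the registrator into the conf before creating the SparkContext.
    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "mypackage.MyRegistrator")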

Kafka and Spark application after polling twice.

2014-08-03 Thread salemi
Hi All, My application works when I use spark-submit with master=local[*]. But if I deploy the application to a standalone cluster with master=spark://master:7077, the application polls twice from the Kafka topic and then stops working. I don't get any error logs. I can see the application

Re: Tasks fail when ran in cluster but they work fine when submited using local local

2014-08-03 Thread salemi
Let me share the solution to this problem: I had to set spark.httpBroadcast.uri to the FQDN of the driver. Ali
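
For readers who hit the same symptom, a hedged sketch of the setting Ali describes (the hostname and port are placeholders; the config key is the one from his fix):

    import org.apache.spark.SparkConf

    // Point executors at the driver's HTTP broadcast server by FQDN,
    // so worker nodes do not fail to resolve a short or internal hostname.
    val conf = new SparkConf()
      .set("spark.httpBroadcast.uri", "http://driver-host.example.com:12345")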