spark.logConf with log4j.rootCategory=WARN

2015-05-01 Thread roy
Hi, I have recently enabled log4j.rootCategory=WARN, console in the Spark configuration, but after that spark.logConf=true has become ineffective. So I just want to confirm: is this because of log4j.rootCategory=WARN? Thanks
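A plausible explanation: spark.logConf=true emits the configuration dump at INFO level, so a WARN root category swallows it. A minimal sketch of a workaround, assuming log4j 1.x (the version bundled with Spark 1.x) and assuming SparkContext is the logger the dump goes through:

    import org.apache.log4j.{Level, Logger}

    // Keep the root logger at WARN, but re-enable INFO for SparkContext,
    // which is where the spark.logConf=true configuration dump is written.
    Logger.getRootLogger.setLevel(Level.WARN)
    Logger.getLogger("org.apache.spark.SparkContext").setLevel(Level.INFO)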

ClassNotFoundException for Kryo serialization

2015-05-01 Thread Akshat Aranya
Hi, I'm getting a ClassNotFoundException at the executor when trying to register a class for Kryo serialization: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at
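For context, a registration along these lines is what typically triggers the class lookup on the executor (a sketch; com.example.MyRow is a placeholder for the real class being registered):

    import org.apache.spark.SparkConf

    // Enable Kryo and register the application class with it up front.
    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .registerKryoClasses(Array(classOf[com.example.MyRow]))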

Spark SQL ThriftServer Impersonation Support

2015-05-01 Thread Night Wolf
Hi guys, I'm trying to use the SparkSQL Thrift server with the Hive metastore. Hive metastore impersonation works fine (when running Hive tasks). However, when spinning up the SparkSQL Thrift server, impersonation doesn't seem to work... What settings do I need to enable impersonation? I've copied the

Error when saving as parquet to S3

2015-05-01 Thread Cosmin Cătălin Sanda
After repartitioning a DataFrame in Spark 1.3.0, I get a .parquet exception when saving to Amazon's S3. The data that I try to write is 10G. logsForDate .repartition(10) .saveAsParquetFile(destination) // -- Exception here The exception I receive is: java.io.IOException: The file being
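The failing pattern from the post, sketched out (Spark 1.3 DataFrame API; the S3 path is a placeholder):

    import org.apache.spark.sql.DataFrame

    val logsForDate: DataFrame = ??? // ~10G of data for one day
    logsForDate
      .repartition(10)
      .saveAsParquetFile("s3n://some-bucket/logs/2015-05-01/") // <-- exception reported here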

Re: How to add a column to a spark RDD with many columns?

2015-05-01 Thread ayan guha
Do you have an RDD or a DataFrame? RDDs are kind of like tuples. You can add a new column to one with a map. RDDs are immutable, so you will get another RDD. On 1 May 2015 14:59, Carter gyz...@hotmail.com wrote: Hi all, I have an RDD with *MANY* columns (e.g., *hundreds*); how do I add one more column at the
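A sketch of the map-based approach; Seq-typed rows are an assumption here, chosen because Scala tuples cap out at 22 fields and this RDD has hundreds of columns:

    import org.apache.spark.rdd.RDD

    // Each row is a Seq of column values; appending one more value yields
    // a new, immutable RDD that carries the extra column.
    val rdd: RDD[Seq[Any]] = ???
    val withNewColumn: RDD[Seq[Any]] =
      rdd.map(row => row :+ 0.0) // replace 0.0 with a value derived from the row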

Re: How to group multiple row data ?

2015-05-01 Thread Bipin Nag
OK, consider the case where there are multiple event triggers for a given customer/vendor/product, like 1,1,2,2,3, arranged in the order of *event* *occurrence* (time stamp). So the output should be two groups, (1,2) and (1,2,3). The doublet would be the first occurrences of 1,2 and the triplet the later occurrences

Help with publishing to Kafka from Spark Streaming?

2015-05-01 Thread Pavan Sudheendra
Link to the question: http://stackoverflow.com/questions/29974017/spark-kafka-producer-not-serializable-exception Thanks for any pointers.
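The usual shape of the fix for that NotSerializableException (a hedged sketch, not necessarily the answer accepted on Stack Overflow) is to construct the producer inside foreachPartition, so it is created on the executor rather than serialized from the driver; the broker list and topic below are placeholders:

    import java.util.Properties
    import kafka.producer.{KeyedMessage, Producer, ProducerConfig}
    import org.apache.spark.streaming.dstream.DStream

    val dstream: DStream[String] = ??? // the stream being published
    dstream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // Built per partition on the executor, so nothing non-serializable
        // crosses the driver/executor boundary.
        val props = new Properties()
        props.put("metadata.broker.list", "broker1:9092")
        props.put("serializer.class", "kafka.serializer.StringEncoder")
        val producer = new Producer[String, String](new ProducerConfig(props))
        records.foreach(r => producer.send(new KeyedMessage[String, String]("output-topic", r)))
        producer.close()
      }
    }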

Re: ClassNotFoundException for Kryo serialization

2015-05-01 Thread Ted Yu
bq. Caused by: java.lang.ClassNotFoundException: com.example.Schema$MyRow So the above class is in the jar which was on the classpath? Can you tell us a bit more about Schema$MyRow? On Fri, May 1, 2015 at 8:05 AM, Akshat Aranya aara...@gmail.com wrote: Hi, I'm getting a

Re: ClassNotFoundException for Kryo serialization

2015-05-01 Thread Akshat Aranya
Yes, this class is present in the jar that was loaded in the classpath of the executor Java process -- it wasn't even lazily added as a part of the task execution. Schema$MyRow is a protobuf-generated class. After doing some digging around, I think I might be hitting up against SPARK-5470, the

Re: ClassNotFoundException for Kryo serialization

2015-05-01 Thread Akshat Aranya
I cherry-picked the fix for SPARK-5470 and the problem has gone away. On Fri, May 1, 2015 at 9:15 AM, Akshat Aranya aara...@gmail.com wrote: Yes, this class is present in the jar that was loaded in the classpath of the executor Java process -- it wasn't even lazily added as a part of the task

Fwd: Event generator for SPARK-Streaming from csv

2015-05-01 Thread anshu shukla
I have the real DEBS taxi data in a CSV file. In order to operate over it, how do I simulate a Spout-like event generator using the timestamps in the CSV file? -- Thanks Regards, Anshu Shukla

Re: Event generator for SPARK-Streaming from csv

2015-05-01 Thread Juan Rodríguez Hortalá
Hi, maybe you could use streamingContext.fileStream, like in the example from https://spark.apache.org/docs/latest/streaming-programming-guide.html#input-dstreams-and-receivers; you can read from files on any file system compatible with the HDFS API (that is, HDFS, S3, NFS, etc.). You could split
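A minimal sketch of the suggestion (the directory path and batch interval are placeholders): watch a directory and treat each newly arrived CSV file as a batch of events.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(new SparkConf().setAppName("csv-events"), Seconds(1))
    // Each new file dropped into this directory becomes part of the stream.
    val lines = ssc.textFileStream("hdfs:///incoming/debs-taxi/")
    val events = lines.map(_.split(","))
    events.print()
    ssc.start()
    ssc.awaitTermination()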

Re: Enabling Event Log

2015-05-01 Thread James King
Oops! well spotted. Many thanks Shixiong. On Fri, May 1, 2015 at 1:25 AM, Shixiong Zhu zsxw...@gmail.com wrote: spark.history.fs.logDirectory is for the history server. For Spark applications, they should use spark.eventLog.dir. Since you commented out spark.eventLog.dir, it will be
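A sketch of the distinction Shixiong points out (the directory is a placeholder): the application writes event logs via spark.eventLog.*, while the history server reads them via spark.history.fs.logDirectory.

    // Set on the application side:
    val conf = new org.apache.spark.SparkConf()
      .set("spark.eventLog.enabled", "true")
      .set("spark.eventLog.dir", "hdfs:///spark-events")
    // And in the history server's configuration (spark-defaults.conf):
    // spark.history.fs.logDirectory=hdfs:///spark-events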

Re: how to pass configuration properties from driver to executor?

2015-05-01 Thread Michael Ryabtsev
Hi, we've had a similar problem, but with a log4j properties file. The only working way we've found was to deploy the properties file externally on the worker machine, into the Spark conf folder, and to configure the executor JVM options with: sparkConf.set(spark.executor.extraJavaOptions,
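A sketch of that workaround spelled out (the file path is a placeholder; the log4j.properties file must already exist at that path on every worker machine):

    val sparkConf = new org.apache.spark.SparkConf()
      .set("spark.executor.extraJavaOptions",
        "-Dlog4j.configuration=file:/opt/spark/conf/log4j.properties")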

Re: Spark on Mesos

2015-05-01 Thread Tim Chen
Hi Stephen, it looks like the Mesos slave was most likely not able to launch some Mesos helper processes (the fetcher, probably?). How did you install Mesos? Did you build it from source yourself? Please install Mesos through a package, or, if building from source, run make install and run from the installed

Spark worker error on standalone cluster

2015-05-01 Thread Michael Ryabtsev (Totango)
Hi everyone, I have a Spark application that works fine on a standalone Spark cluster that runs on my laptop (master and one worker), but fails when I try to run it on a standalone Spark cluster deployed on EC2 (master and worker are on different machines). The application structure goes in the

Driver memory default setting stops background jobs

2015-05-01 Thread Andreas Marfurt
Hi all, I encountered strange behavior with the driver memory setting, and was wondering if some of you experienced it as well, or know what the problem is. I want to start a Spark job in the background with spark-submit. If I have the driver memory setting in my spark-defaults.conf:

Exiting driver main() method...

2015-05-01 Thread James Carman
In all the examples, it seems that the Spark application doesn't really do anything special in order to exit. When I run my application, however, the spark-submit script just hangs there at the end. Is there something special I need to do to get it to exit normally?
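A minimal sketch of the usual remedy (an assumption about the cause, not a confirmed diagnosis of this case): stopping the SparkContext explicitly shuts down its non-daemon threads, which are typically what keeps the JVM, and hence spark-submit, alive after main() returns.

    import org.apache.spark.{SparkConf, SparkContext}

    object MyApp {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("my-app"))
        try {
          // ... job logic ...
        } finally {
          sc.stop() // release Spark's threads so the JVM can exit cleanly
        }
      }
    }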

NullPointerException with Avro + Spark.

2015-05-01 Thread ๏̯͡๏
I have this Spark app that simply needs to do a regular join between two datasets. It works fine with a tiny data set (2.5G input for each dataset). When I run against 25G of each input, and with .partitionBy(new org.apache.spark.HashPartitioner(200)), I see a NullPointerException with this trace
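The join pattern from the post, sketched with placeholder key/value types: both sides share one partitioner, so the join itself avoids an extra shuffle.

    import org.apache.spark.HashPartitioner
    import org.apache.spark.rdd.RDD

    val left: RDD[(String, String)] = ???
    val right: RDD[(String, String)] = ???
    val partitioner = new HashPartitioner(200)
    // Co-partition both inputs, then join on the shared partitioning.
    val joined = left.partitionBy(partitioner).join(right.partitionBy(partitioner))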

Spark Streaming Kafka Avro NPE on deserialization of payload

2015-05-01 Thread Todd Nist
*Resending as I do not see that this made it to the mailing list; sorry if in fact it did and is just not reflected online yet.* I'm very perplexed by the following. I have a set of Avro-generated objects that are sent to a Spark Streaming job via Kafka. The Spark Streaming job follows the