Task not serializable: java.io.NotSerializableException: org.apache.spark.SparkContext

2014-07-19 Thread lihu
Hi, everyone. I have the following piece of code. When I run it, I get the error below; it seems that the SparkContext is not serializable, but I do not use the SparkContext except for the broadcast. [In fact, this code is in MLlib; I am just trying to broadcast the
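The usual cause is a closure that captures an object holding the SparkContext (often the enclosing class) rather than the broadcast value itself. A minimal sketch of the safe pattern, with illustrative names (bcWeights, scored) that are not from the original post:

    import org.apache.spark.{SparkConf, SparkContext}

    object BroadcastExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("broadcast-example"))
        val bcWeights = sc.broadcast(Array(0.1, 0.2, 0.3)) // created once on the driver

        val data = sc.parallelize(1 to 100)
        // Only the broadcast handle is referenced inside the closure.
        // Referencing sc here, or a field of a class that owns sc, would
        // drag the SparkContext into the task and raise the exception above.
        val scored = data.map(i => i * bcWeights.value.sum)
        println(scored.count())
        sc.stop()
      }
    }

Note that copying bcWeights.value into a local val before the map would serialize the whole array into every task; referencing the handle inside the closure is what keeps tasks small.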

registerAsTable can't be compiled

2014-07-19 Thread junius
Hello, I am writing code to practice Spark SQL on the latest Spark version, but I get the following compilation error; it seems the implicit conversion from RDD to SchemaRDD doesn't work. Could anybody help me fix it? Thanks a lot. value registerAsTable is not a member of

Re: NullPointerException When Reading Avro Sequence Files

2014-07-19 Thread Sparky
I see Spark is using AvroRecordReaderBase, which is used to read Avro Container Files; those are different from Sequence Files. If anyone is using Avro Sequence Files with success and has an example, please let me know.

Re: NullPointerException When Reading Avro Sequence Files

2014-07-19 Thread Sparky
To be more specific, I'm working with a system that stores data in org.apache.avro.hadoop.io.AvroSequenceFile format. An AvroSequenceFile is "a wrapper around a Hadoop SequenceFile that also supports reading and writing Avro data." It seems that Spark does not support this out of the box.

Re: NullPointerException When Reading Avro Sequence Files

2014-07-19 Thread Nick Pentreath
I got this working locally a little while ago when playing around with AvroKeyInputFormat: https://gist.github.com/MLnick/5864741781b9340cb211 But I'm not sure about AvroSequenceFile. Any chance you have an example data file or records? On Sat, Jul 19, 2014 at 11:00 AM, Sparky gullo_tho...@bah.com

Re: NullPointerException When Reading Avro Sequence Files

2014-07-19 Thread Sparky
Thanks for the gist. I'm just now learning about Avro. I think when you use a DataFileWriter you are writing to an Avro Container File (which is different from an Avro Sequence File). I have a system where data was written to an HDFS Sequence File using AvroSequenceFile.Writer (which is a wrapper
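For anyone hitting the same wall, a sketch of one possible approach using avro-mapred's AvroSequenceFileInputFormat; this is an assumption-laden outline, not a tested recipe: the path and schemas are illustrative, and the key/value wrapper types must match what AvroSequenceFile.Writer wrote.

    import org.apache.avro.Schema
    import org.apache.avro.generic.GenericRecord
    import org.apache.avro.mapred.{AvroKey, AvroValue}
    import org.apache.avro.mapreduce.{AvroJob, AvroSequenceFileInputFormat}
    import org.apache.hadoop.mapreduce.Job
    import org.apache.spark.{SparkConf, SparkContext}

    object ReadAvroSequenceFile {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("avro-seqfile"))
        val job = Job.getInstance()
        // Schemas must match the ones used by AvroSequenceFile.Writer;
        // these are placeholders.
        AvroJob.setInputKeySchema(job, Schema.create(Schema.Type.STRING))
        AvroJob.setInputValueSchema(job, new Schema.Parser().parse(
          """{"type":"record","name":"Rec","fields":[{"name":"x","type":"int"}]}"""))

        val rdd = sc.newAPIHadoopFile(
          "hdfs:///path/to/data.seq", // hypothetical path
          classOf[AvroSequenceFileInputFormat[AvroKey[CharSequence], AvroValue[GenericRecord]]],
          classOf[AvroKey[CharSequence]],
          classOf[AvroValue[GenericRecord]],
          job.getConfiguration)

        rdd.map { case (k, v) => (k.datum().toString, v.datum().toString) }
          .take(5).foreach(println)
        sc.stop()
      }
    }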

Uber jar with SBT

2014-07-19 Thread boci
Hi guys, I'm trying to create a Spark uber jar with sbt but I'm having a lot of problems... I want to use the following: Spark Streaming, Kafka, Elasticsearch, and HBase. The current jar size is ca. 60 MB and it's not working. When I deploy with spark-submit, it runs and exits without any error. When I
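A minimal build.sbt sketch for this kind of uber jar, assuming the sbt-assembly plugin (0.12+ key names); versions and merge rules are illustrative, not taken from the original post:

    // project/plugins.sbt:
    //   addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.12.0")

    name := "streaming-uber-jar"

    scalaVersion := "2.10.4"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-streaming"       % "1.0.1" % "provided",
      "org.apache.spark" %% "spark-streaming-kafka" % "1.0.1"
    )

    // Duplicate META-INF entries are the most common cause of assembly failures.
    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", _*) => MergeStrategy.discard
      case _                        => MergeStrategy.first
    }

Marking Spark itself as "provided" keeps the jar well under the problematic size, since spark-submit supplies those classes at runtime.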

Re: Uber jar with SBT

2014-07-19 Thread Sean Owen
Are you building / running with Java 6? I imagine your .jar file has more than 65536 entries, and Java 6 has various issues with jars this large. If possible, use Java 7 everywhere. https://issues.apache.org/jira/browse/SPARK-1520 On Sat, Jul 19, 2014 at 2:30 PM, boci boci.b...@gmail.com wrote:

Re: Uber jar with SBT

2014-07-19 Thread boci
Hi! I'm using Java 7. I found the problem: I was not calling start() and awaitTermination() on the streaming context. Now it works, BUT spark-submit never returns (it runs in the foreground and receives the Kafka streams)... what am I missing? (I want to send the job to a standalone cluster worker process) b0c1
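For reference, a skeleton of the pattern in question; awaitTermination() holds the driver for the job's lifetime by design, so a streaming driver never returns on its own. (For a standalone cluster, spark-submit's cluster deploy mode runs the driver on a worker instead of the submitting machine.) The queueStream source below is a stand-in for the real Kafka input:

    import scala.collection.mutable
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object StreamingSkeleton {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(
          new SparkConf().setAppName("kafka-to-es"), Seconds(5))

        // Stand-in source; the real job would create a Kafka stream here.
        val queue = mutable.Queue(ssc.sparkContext.parallelize(1 to 10))
        ssc.queueStream(queue).print()

        ssc.start()             // nothing runs until start() is called
        ssc.awaitTermination()  // blocks the driver while the job runs
      }
    }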

Re: SparkSQL operator priority

2014-07-19 Thread Eric Friedman
Can position be null? Looks like there may be constraints with predicate push down in that case. https://github.com/apache/spark/pull/511/ On Jul 18, 2014, at 8:04 PM, Christos Kozanitis kozani...@berkeley.edu wrote: Hello What is the order with which SparkSQL deserializes parquet

Need help with coalesce

2014-07-19 Thread Madhura
Hi, I have a file called "out" with random numbers, one number per line. I am loading the complete file into an RDD, and I want to create partitions with the help of the coalesce function. This is my code snippet. import scala.math.Ordered import org.apache.spark.rdd.CoalescedRDD
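Note that CoalescedRDD is Spark-internal; the public entry point is RDD.coalesce. A minimal sketch, with a hypothetical HDFS path:

    import org.apache.spark.{SparkConf, SparkContext}

    object CoalesceExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("coalesce-example"))
        val numbers = sc.textFile("hdfs:///path/to/out").map(_.trim.toDouble)

        val narrowed   = numbers.coalesce(4)                  // no shuffle; merges partitions
        val rebalanced = numbers.coalesce(16, shuffle = true) // shuffles; can also grow partitions

        println(s"${narrowed.partitions.length} / ${rebalanced.partitions.length}")
        sc.stop()
      }
    }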

Re: Java null pointer exception while saving hadoop file

2014-07-19 Thread Madhura
Hi, you can try setting the heap space to a higher value. Are you using an Ubuntu machine? In .bashrc, set the following option: export _JAVA_OPTIONS=-Xmx2g. This should set your heap size to a higher value. Regards, Madhura
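Inside a Spark application the analogous knob is per-executor memory; a sketch with illustrative sizes:

    import org.apache.spark.{SparkConf, SparkContext}

    object SaveExample {
      def main(args: Array[String]): Unit = {
        // Illustrative value; the driver heap must be set before the JVM
        // starts (e.g. spark-submit --driver-memory 2g), so only the
        // executor setting belongs here.
        val conf = new SparkConf()
          .setAppName("save-example")
          .set("spark.executor.memory", "2g") // heap per executor
        val sc = new SparkContext(conf)
        // ... job ...
        sc.stop()
      }
    }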

Re: registerAsTable can't be compiled

2014-07-19 Thread Michael Armbrust
Can you provide the code? Is Record a case class, and is it defined as a top-level object? Also, have you done import sqlContext._? On Sat, Jul 19, 2014 at 3:39 AM, junius junius.z...@gmail.com wrote: Hello, I am writing code to practice Spark SQL on the latest Spark version, but I get
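Putting those three requirements together, a minimal sketch against the Spark 1.0.x API (Record and the table name are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Must be a top-level case class for the implicit RDD -> SchemaRDD
    // conversion (createSchemaRDD) to apply.
    case class Record(key: Int, value: String)

    object RegisterAsTableExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("sql-example"))
        val sqlContext = new SQLContext(sc)
        import sqlContext._ // brings createSchemaRDD and sql(...) into scope

        val records = sc.parallelize(1 to 10).map(i => Record(i, s"val_$i"))
        records.registerAsTable("records") // Spark 1.0.x name; renamed in later releases
        sql("SELECT key, value FROM records WHERE key > 5").collect().foreach(println)
        sc.stop()
      }
    }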

Real-time segmentation with SPARK

2014-07-19 Thread Mahesh Govind
Hi experts, could you please help me get some insights into doing real-time segmentation (segmentation on demand) using Spark? My use case is like this: 1) I am running a campaign 2) Customers are subscribing to the campaign 3) The campaign runs for 2-3 hours 4) Estimated target

Re: Server IPC version 7 cannot communicate with client version 4 with Spark Streaming 1.0.0 in Java and CH4 quickstart in local mode

2014-07-19 Thread Juan Rodríguez Hortalá
Hi Sean, I was launching the Spark Streaming program from Eclipse, but now I'm running it with the spark-submit script from the Spark distribution for CDH4 at http://spark.apache.org/downloads.html, and it works just fine. Thanks a lot for your help, Greetings, Juan 2014-07-16 12:58

Caching issue with msg: RDD block could not be dropped from memory as it does not exist

2014-07-19 Thread rindra
Hi, I am working with a small dataset (about 13 MB) in the spark-shell. After doing a groupBy on the RDD, I wanted to cache the RDD in memory, but I keep getting these warnings: scala> rdd.cache() res28: rdd.type = MappedRDD[63] at repartition at <console>:28 scala> rdd.count() 14/07/19 12:45:18 WARN
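For context, a small spark-shell sketch of the caching lifecycle: cache() only marks the RDD, and the first action materializes it (the path is hypothetical):

    val rdd = sc.textFile("hdfs:///path/to/data")
      .map(_.split(","))
      .groupBy(_.head)
      .cache()    // lazy: only marks the RDD for caching

    rdd.count()   // first action actually populates the block store
    rdd.count()   // served from memory if the partitions fit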

Re: Java null pointer exception while saving hadoop file

2014-07-19 Thread durga
Thanks for the reply. I am trying to save a huge file, in my case 60 GB. I think l.toSeq is going to collect all the data into the driver, where I don't have that much space. Is there any possibility of using something like a MultipleOutputFormat class for a large file? Thanks, Durga.
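That instinct is right: Hadoop's MultipleTextOutputFormat writes each key to its own path with nothing collected on the driver. A sketch under that assumption; the key-per-directory naming and the sample data are illustrative:

    import org.apache.hadoop.io.Text
    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._

    // Routes each record to a file under a directory named after its key.
    class KeyBasedOutput extends MultipleTextOutputFormat[Text, Text] {
      override def generateFileNameForKeyValue(key: Text, value: Text, name: String): String =
        key.toString + "/" + name
    }

    object SaveByKey {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("save-by-key"))
        val pairs = sc.parallelize(Seq("a" -> "1", "b" -> "2", "a" -> "3"))
        pairs
          .map { case (k, v) => (new Text(k), new Text(v)) }
          .saveAsHadoopFile("hdfs:///out", classOf[Text], classOf[Text], classOf[KeyBasedOutput])
        sc.stop()
      }
    }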

java.net.ConnectException: Connection timed out

2014-07-19 Thread Soren Macbeth
Hello, I get a lot of these exceptions on my Mesos cluster when running Spark jobs: 14/07/19 16:29:43 WARN spark.network.SendingConnection: Error finishing connection to prd-atl-mesos-slave-010/10.88.160.200:37586 java.net.ConnectException: Connection timed out at

Out of any idea

2014-07-19 Thread boci
Hi guys! I've run out of ideas... I created a Spark Streaming job (Kafka -> Spark -> ES). If I start my app on my local machine (inside the editor, but connected to the real Kafka and ES), the application works correctly. If I start it in my Docker container (same Kafka and ES, local mode (local[4]) like inside

Debugging spark

2014-07-19 Thread Ruchir Jha
I am a newbie and am looking for pointers on how to start debugging my Spark app, and did not find a straightforward tutorial. Any help is appreciated. Sent from my iPhone
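One concrete starting point, sketched under the assumption that the app can run locally: use a local master so breakpoints work in the IDE, and enable event logging so the web UI (localhost:4040 while the job runs) records what each stage did:

    import org.apache.spark.{SparkConf, SparkContext}

    object DebugMyApp {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("debug-my-app")
          .setMaster("local[2]")                 // in-process, breakpoint-friendly
          .set("spark.eventLog.enabled", "true") // keep a record of finished stages
        val sc = new SparkContext(conf)
        // ... job under test ...
        sc.stop()
      }
    }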

Spark 1.0.1 SQL on 160 G parquet file (snappy compressed, made by cloudera impala), 23 core and 60G mem / node, yarn-client mode, always failed

2014-07-19 Thread chutium
160 GB of parquet files (ca. 30 files, snappy compressed, made by Cloudera Impala). Ca. 30 full table scans, taking 3-5 columns out, then some normal Scala operations like substring, groupBy, filter; at the end, saved as a file in HDFS. yarn-client mode, 23 cores and 60 GB mem / node, but it always fails!
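A sketch of the kind of pipeline described, against the Spark 1.0.x SQL API; paths, table and column names are illustrative:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._
    import org.apache.spark.sql.SQLContext

    object ParquetScan {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("parquet-scan"))
        val sqlContext = new SQLContext(sc)

        sqlContext.parquetFile("hdfs:///warehouse/big_table").registerAsTable("t")

        sqlContext.sql("SELECT col1, col2, col3 FROM t")
          .map(r => (r.getString(0).take(8), 1L)) // substring as the grouping key
          .reduceByKey(_ + _)                     // cheaper than groupBy for counts
          .map { case (k, n) => s"$k\t$n" }
          .saveAsTextFile("hdfs:///out/summary")
        sc.stop()
      }
    }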

Re: Large Task Size?

2014-07-19 Thread Kyle Ellrott
I'm still having trouble with this one. Watching it, I've noticed that the first time around the task size is large but not terrible (199 KB). It's on the second iteration of the optimization that the task size goes crazy (120 MB). Does anybody have any ideas why this might be happening? Is there
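One common cause of a task size that grows across iterations is per-iteration state being captured in the task closure, or a lineage that keeps accumulating. A hedged sketch of the broadcast-per-iteration workaround; gradient, the toy update, and the data are hypothetical stand-ins, not the original code:

    import org.apache.spark.{SparkConf, SparkContext}

    object IterativeBroadcast {
      // Hypothetical stand-ins for the real update logic.
      def gradient(w: Array[Double], p: Array[Double]): Array[Double] =
        w.zip(p).map { case (a, b) => a - b }
      def add(a: Array[Double], b: Array[Double]): Array[Double] =
        a.zip(b).map { case (x, y) => x + y }

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("iter-broadcast"))
        val data = sc.parallelize(Seq.fill(100)(Array.fill(10)(1.0)))
        var weights = Array.fill(10)(0.0)

        for (_ <- 1 to 5) {
          val bc = sc.broadcast(weights) // ship the state once per iteration
          val grad = data.map(p => gradient(bc.value, p)).reduce(add)
          weights = add(weights, grad.map(_ * -0.01)) // toy update step
          bc.unpersist()                 // release the previous iteration's blocks
        }
        println(weights.mkString(", "))
        sc.stop()
      }
    }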

Re: Spark 1.0.1 SQL on 160 G parquet file (snappy compressed, made by cloudera impala), 23 core and 60G mem / node, yarn-client mode, always failed

2014-07-19 Thread Yin Huai
Can you attach your code? Thanks, Yin On Sat, Jul 19, 2014 at 4:10 PM, chutium teng@gmail.com wrote: 160 GB of parquet files (ca. 30 files, snappy compressed, made by Cloudera Impala). Ca. 30 full table scans, taking 3-5 columns out, then some normal Scala operations like substring, groupBy,

spark1.0.1 hadoop2.2.0 issue

2014-07-19 Thread Hu, Leo
Hi all, has anyone encountered the problem below, and how can it be solved? Any help would be appreciated. Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V at

Re: SparkSQL operator priority

2014-07-19 Thread Christos Kozanitis
Thanks Eric. That is the case as most of my fields are optional. So it seems that the problem comes from Parquet. On Sat, Jul 19, 2014 at 8:27 AM, Eric Friedman eric.d.fried...@gmail.com wrote: Can position be null? Looks like there may be constraints with predicate push down in that case.

Re: Out of any idea

2014-07-19 Thread Tathagata Das
Could you collect debug-level logs and send them to us? Without logs it's hard to speculate anything. :) TD On Sat, Jul 19, 2014 at 2:39 PM, boci boci.b...@gmail.com wrote: Hi guys! I've run out of ideas... I created a Spark Streaming job (Kafka -> Spark -> ES). If I start my app on my local machine (inside

Re: Out of any idea

2014-07-19 Thread Krishna Sankar
You probably have already, but if not, try a very simple app in the Docker container and make sure it works. Sometimes resource contention/allocation can get in the way; this happened to me in a YARN container. Also try a single worker thread. Cheers, k/ On Sat, Jul 19, 2014 at 2:39 PM, boci
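A minimal smoke test along those lines, using a single worker thread (local[1]) instead of local[4]:

    import org.apache.spark.{SparkConf, SparkContext}

    object SmokeTest {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("smoke-test").setMaster("local[1]"))
        println(sc.parallelize(1 to 1000).count()) // trivial job to validate the container
        sc.stop()
      }
    }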

Re: spark1.0.1 hadoop2.2.0 issue

2014-07-19 Thread Debasish Das
I compiled Spark 1.0.1 with Hadoop 2.3.0-cdh5.0.2 today... No issues with the mvn compilation, but my sbt build keeps failing on the sql module... I just saw that my Scala is at 2.11.0 (after a brew update)... not sure if that's why the sbt compilation is failing... retrying. On Sat, Jul 19, 2014 at 6:16 PM,