Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Tristan Nixon
Right, well I don’t think the issue is with how you’re compiling the Scala. I
think it’s a conflict between different versions of several libs.
I had similar issues with my Spark modules. You need to make sure you’re not
loading a different version of the same lib that is clobbering another
dependency. It’s very frustrating, but with patience you can weed them out.
You’ll want to find the offending libs and put them into an <exclusions> block
under the associated dependency. I am still working with Spark 1.5 and Scala 2.10,
and for me the presence of scalap was the problem; this resolved it:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.5.1</version>
  <exclusions>
    <exclusion>
      <groupId>org.json4s</groupId>
      <artifactId>json4s-core_2.10</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<dependency>
  <groupId>org.json4s</groupId>
  <artifactId>json4s-core_2.10</artifactId>
  <version>3.2.10</version>
  <exclusions>
    <exclusion>
      <groupId>org.scala-lang</groupId>
      <artifactId>scalap</artifactId>
    </exclusion>
  </exclusions>
</dependency>


Unfortunately scalap is a dependency of json4s, which I want to keep. So what I 
do is exclude json4s from spark-core, then add it back in, but with its 
troublesome scalap dependency removed.
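
One more thing that may help: since the NoSuchMethodError is on scala.Predef, it is worth confirming which scala-library actually gets loaded at runtime. The following is only a diagnostic sketch (the ClasspathCheck object and its output are illustrative, not from this thread); run it on the same classpath as the failing app:

import scala.util.Properties

object ClasspathCheck {
  def main(args: Array[String]): Unit = {
    // Which Scala runtime is actually on the classpath?
    println("Scala version: " + Properties.versionString)

    // Which jar did scala.Predef come from? If this is not the scala-library
    // you expect, some dependency is dragging in a conflicting version.
    val source = scala.Predef.getClass.getProtectionDomain.getCodeSource
    val location = Option(source).map(_.getLocation.toString).getOrElse("bootstrap classpath")
    println("scala.Predef loaded from: " + location)
  }
}

If the version printed there doesn’t match the _2.10/_2.11 suffix of your Spark artifacts, that mismatch alone can produce errors like the one quoted below.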


> On Mar 11, 2016, at 6:34 PM, Vasu Parameswaran  wrote:
> 
> Added these to the pom and still the same error :-(. I will look into sbt as 
> well.
> 
> 
> 
> On Fri, Mar 11, 2016 at 2:31 PM, Tristan Nixon  wrote:
> You must be relying on IntelliJ to compile your scala, because you haven’t 
> set up any scala plugin to compile it from maven.
> You should have something like this in your plugins:
> 
> <plugin>
>   <groupId>net.alchim31.maven</groupId>
>   <artifactId>scala-maven-plugin</artifactId>
>   <executions>
>     <execution>
>       <id>scala-compile-first</id>
>       <phase>process-resources</phase>
>       <goals>
>         <goal>compile</goal>
>       </goals>
>     </execution>
>     <execution>
>       <id>scala-test-compile</id>
>       <phase>process-test-resources</phase>
>       <goals>
>         <goal>testCompile</goal>
>       </goals>
>     </execution>
>   </executions>
> </plugin>
> 
> PS - I use maven to compile all my scala and haven’t had a problem with it. I 
> know that sbt has some wonderful things, but I’m just set in my ways ;)
> 
>> On Mar 11, 2016, at 2:02 PM, Jacek Laskowski  wrote:
>> 
>> Hi,
>> 
>> Doh! My eyes are bleeding to go through XMLs... 😁
>> 
>> Where did you specify Scala version? Dunno how it's in maven.
>> 
>> p.s. I *strongly* recommend sbt.
>> 
>> Jacek
>> 
>> 11.03.2016 8:04 PM "Vasu Parameswaran"  wrote:
>> Thanks Jacek.  Pom is below (currently set to Spark 1.6.1, but I started out
>> with 1.6.0 and had the same problem).
>> 
>> 
>> 
>> <project xmlns="http://maven.apache.org/POM/4.0.0"
>>          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>          xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
>> 
>>   <parent>
>>     <artifactId>spark</artifactId>
>>     <groupId>com.test</groupId>
>>     <version>1.0-SNAPSHOT</version>
>>   </parent>
>>   <modelVersion>4.0.0</modelVersion>
>> 
>>   <artifactId>sparktest</artifactId>
>> 
>>   <properties>
>>     <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
>>   </properties>
>> 
>>   <dependencies>
>>     <dependency>
>>       <groupId>junit</groupId>
>>       <artifactId>junit</artifactId>
>>     </dependency>
>>     <dependency>
>>       <groupId>commons-cli</groupId>
>>       <artifactId>commons-cli</artifactId>
>>     </dependency>
>>     <dependency>
>>       <groupId>com.google.code.gson</groupId>
>>       <artifactId>gson</artifactId>
>>       <version>2.3.1</version>
>>       <scope>compile</scope>
>>     </dependency>
>>     <dependency>
>>       <groupId>org.apache.spark</groupId>
>>       <artifactId>spark-core_2.11</artifactId>
>>       <version>1.6.1</version>
>>     </dependency>
>>   </dependencies>
>> 
>>   <build>
>>     <plugins>
>>       <plugin>
>>         <groupId>org.apache.maven.plugins</groupId>
>>         <artifactId>maven-shade-plugin</artifactId>
>>         <version>2.4.2</version>
>>         <executions>
>>           <execution>
>>             <phase>package</phase>
>>             <goals>
>>               <goal>shade</goal>
>>             </goals>
>>             <configuration>
>>               <finalName>${project.artifactId}-${project.version}-with-dependencies</finalName>
>>             </configuration>
>>           </execution>
>>         </executions>
>>       </plugin>
>>     </plugins>
>>   </build>
>> </project>
>> 
>> 
>> On Fri, Mar 11, 2016 at 10:46 AM, Jacek Laskowski  wrote:
>> Hi,
>> 
>> Why do you use maven not sbt for Scala?
>> 
>> Can you show the entire pom.xml and the command to execute the app?
>> 
>> Jacek
>> 
>> 11.03.2016 7:33 PM "vasu20" <vas...@gmail.com> wrote:
>> Hi
>> 
>> Any help appreciated on this.  I am trying to write a Spark program using
>> IntelliJ.  I get a run time error as soon as new SparkConf() is called from
>> main.  Top few lines of the exception are pasted below.
>> 
>> These are the following versions:
>> 
>> Spark jar:  spark-assembly-1.6.0-hadoop2.6.0.jar
>> pom:  spark-core_2.11
>>  1.6.0
>> 
>> I have installed the Scala plugin in IntelliJ and added a dependency.
>> 
>> I have also added a library dependency in the project structure.
>> 
>> Thanks for any help!
>> 
>> Vasu
>> 
>> 
>> Exception in thread "main" java.lang.NoSuchMethodError:
>> scala.Predef$.augmentString(Ljava/lang/String;)Ljava/lang/String;
>> at org.apache.spark.util.Utils$.<init>(Utils.scala:1682)
>> at org.apache.spark.util.Utils$.<clinit>(Utils.scala)
>> at org.apache.spark.SparkConf.<init>(SparkConf.scala:59)
>> 
>> 
>> 
>> 
>> 
>> 
>> --

Spark Streaming: java.lang.NoClassDefFoundError: org/apache/kafka/common/message/KafkaLZ4BlockOutputStream

2016-03-11 Thread Siva
Hi Everyone,

All of a sudden we are encountering the below error from one of the Spark
consumers. It used to work before without any issues.

When I restart the consumer with the latest offsets, it works fine for
some time (it executes a few batches) and then fails again; the issue is
intermittent.

Did anyone come across this issue?

16/03/11 19:44:50 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0
(TID 3, ip-172-31-32-183.us-west-2.compute.internal):
java.lang.NoClassDefFoundError:
org/apache/kafka/common/message/KafkaLZ4BlockOutputStream
at
kafka.message.ByteBufferMessageSet$.decompress(ByteBufferMessageSet.scala:65)
at
kafka.message.ByteBufferMessageSet$$anon$1.makeNextOuter(ByteBufferMessageSet.scala:179)
at
kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:192)
at
kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:146)
at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:66)
at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:58)
at scala.collection.Iterator$$anon$1.hasNext(Iterator.scala:847)
at scala.collection.Iterator$$anon$19.hasNext(Iterator.scala:615)
at
org.apache.spark.streaming.kafka.KafkaRDD$KafkaRDDIterator.getNext(KafkaRDD.scala:160)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
at
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:56)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException:
org.apache.kafka.common.message.KafkaLZ4BlockOutputStream
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 23 more

Container id: container_1456361466298_0236_01_02
Exit code: 50
Stack trace: ExitCodeException exitCode=50:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 50

16/03/11 19:44:55 INFO yarn.YarnAllocator: Completed container
container_1456361466298_0236_01_03 (state: COMPLETE, exit status:
50)
16/03/11 19:44:55 INFO yarn.YarnAllocator: Container marked as failed:
container_1456361466298_0236_01_03. Exit status: 50. Diagnostics:
Exception from container-launch.
Container id: container_1456361466298_0236_01_03
Exit code: 50
Stack trace: ExitCodeException exitCode=50:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.Th

Re: Spark Streaming: java.lang.NoClassDefFoundError: org/apache/kafka/common/message/KafkaLZ4BlockOutputStream

2016-03-11 Thread Ted Yu
KafkaLZ4BlockOutputStream is in kafka-clients jar :

$ jar tvf kafka-clients-0.8.2.0.jar | grep KafkaLZ4BlockOutputStream
  1609 Wed Jan 28 22:30:36 PST 2015
org/apache/kafka/common/message/KafkaLZ4BlockOutputStream$BD.class
  2918 Wed Jan 28 22:30:36 PST 2015
org/apache/kafka/common/message/KafkaLZ4BlockOutputStream$FLG.class
  4578 Wed Jan 28 22:30:36 PST 2015
org/apache/kafka/common/message/KafkaLZ4BlockOutputStream.class

Can you check whether kafka-clients jar was in the classpath of the
container ?

Thanks
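
If it is easier than digging through the container logs, here is a small diagnostic sketch (illustrative only; it assumes a spark-shell started with the same --jars/classpath settings as the failing streaming job, so `sc` is already defined) that asks each executor where that class resolves from:

// Ask each executor which jar, if any, provides KafkaLZ4BlockOutputStream.
val className = "org.apache.kafka.common.message.KafkaLZ4BlockOutputStream"

val report = sc.parallelize(1 to sc.defaultParallelism, sc.defaultParallelism).map { _ =>
  val host = java.net.InetAddress.getLocalHost.getHostName
  try {
    val clazz = Class.forName(className)
    val jar = Option(clazz.getProtectionDomain.getCodeSource).map(_.getLocation.toString)
    host + ": " + jar.getOrElse("class loaded, source jar unknown")
  } catch {
    case _: ClassNotFoundException => host + ": class NOT found on the classpath"
  }
}.collect().distinct

report.foreach(println)

If any executor reports the class missing, or an older kafka jar than expected, that would line up with the intermittent NoClassDefFoundError above.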

On Fri, Mar 11, 2016 at 5:00 PM, Siva  wrote:

> Hi Everyone,
>
> All of sudden we are encountering the below error from one of the spark
> consumer. It used to work before without any issues.
>
> When I restart the consumer with latest offsets, it is working fine for
> sometime (it executed few batches) and it fails again, this issue is
> intermittent.
>
> Did any one come across this issue?
>
> 16/03/11 19:44:50 WARN scheduler.TaskSetManager: Lost task 0.0 in stage
> 1.0 (TID 3, ip-172-31-32-183.us-west-2.compute.internal):
> java.lang.NoClassDefFoundError:
> org/apache/kafka/common/message/KafkaLZ4BlockOutputStream
> at
> kafka.message.ByteBufferMessageSet$.decompress(ByteBufferMessageSet.scala:65)
> at
> kafka.message.ByteBufferMessageSet$$anon$1.makeNextOuter(ByteBufferMessageSet.scala:179)
> at
> kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:192)
> at
> kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:146)
> at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:66)
> at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:58)
> at scala.collection.Iterator$$anon$1.hasNext(Iterator.scala:847)
> at scala.collection.Iterator$$anon$19.hasNext(Iterator.scala:615)
> at
> org.apache.spark.streaming.kafka.KafkaRDD$KafkaRDDIterator.getNext(KafkaRDD.scala:160)
> at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> at
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
> at
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:56)
> at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
> at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:70)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.kafka.common.message.KafkaLZ4BlockOutputStream
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> ... 23 more
>
> Container id: container_1456361466298_0236_01_02
> Exit code: 50
> Stack trace: ExitCodeException exitCode=50:
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
>   at org.apache.hadoop.util.Shell.run(Shell.java:455)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
>
>
> Container exited with a non-zero exit code 50
>
> 16/03/11 19:44:55 INFO yarn.YarnAllocator: Completed container 
> container_1456361466298_0236_01_03 (state: COMPLETE, exit status: 50)
> 16/03/11 19:44:55 INFO yarn.YarnAllocator: Container marked as failed: 
> container_1456361466298_0236_01_03. Exit status: 50. Diagnostics: 
> Exception from container-launch.
> Container id: container_1456361466298_0236_01_03
> Exit code: 50
> Stack trace: ExitCodeException exitCode=50:
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
>  

Re: Spark Streaming: java.lang.NoClassDefFoundError: org/apache/kafka/common/message/KafkaLZ4BlockOutputStream

2016-03-11 Thread Pankaj Wahane
Next thing you may want to check is if the jar has been provided to all the 
executors in your cluster. Most of the class not found errors got resolved for 
me after making required jars available in the SparkContext.

Thanks.
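
For reference, a minimal sketch of what that can look like in code (the path and names below are placeholders, not taken from this thread; the same effect can be achieved with --jars on spark-submit):

import org.apache.spark.{SparkConf, SparkContext}

object SubmitWithKafkaJar {
  def main(args: Array[String]): Unit = {
    // Placeholder path: point this at the kafka-clients jar matching your Kafka version.
    val kafkaClientsJar = "/path/to/kafka-clients-0.8.2.0.jar"

    val conf = new SparkConf()
      .setAppName("streaming-consumer")
      .setJars(Seq(kafkaClientsJar)) // shipped to every executor when the context starts

    val sc = new SparkContext(conf)

    // Jars can also be added after the context exists; executors fetch them on demand.
    sc.addJar(kafkaClientsJar)

    // ... build the StreamingContext and the rest of the job here ...

    sc.stop()
  }
}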

From: Ted Yu <yuzhih...@gmail.com>
Date: Saturday, 12 March 2016 at 7:17 AM
To: Siva <sbhavan...@gmail.com>
Cc: spark users <user@spark.apache.org>
Subject: Re: Spark Streaming: java.lang.NoClassDefFoundError: 
org/apache/kafka/common/message/KafkaLZ4BlockOutputStream

KafkaLZ4BlockOutputStream is in kafka-clients jar :

$ jar tvf kafka-clients-0.8.2.0.jar | grep KafkaLZ4BlockOutputStream
  1609 Wed Jan 28 22:30:36 PST 2015 
org/apache/kafka/common/message/KafkaLZ4BlockOutputStream$BD.class
  2918 Wed Jan 28 22:30:36 PST 2015 
org/apache/kafka/common/message/KafkaLZ4BlockOutputStream$FLG.class
  4578 Wed Jan 28 22:30:36 PST 2015 
org/apache/kafka/common/message/KafkaLZ4BlockOutputStream.class

Can you check whether kafka-clients jar was in the classpath of the container ?

Thanks

On Fri, Mar 11, 2016 at 5:00 PM, Siva <sbhavan...@gmail.com> wrote:
Hi Everyone,

All of sudden we are encountering the below error from one of the spark 
consumer. It used to work before without any issues.

When I restart the consumer with latest offsets, it is working fine for 
sometime (it executed few batches) and it fails again, this issue is 
intermittent.

Did any one come across this issue?

16/03/11 19:44:50 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 
(TID 3, ip-172-31-32-183.us-west-2.compute.internal): 
java.lang.NoClassDefFoundError: 
org/apache/kafka/common/message/KafkaLZ4BlockOutputStream
at kafka.message.ByteBufferMessageSet$.decompress(ByteBufferMessageSet.scala:65)
at 
kafka.message.ByteBufferMessageSet$$anon$1.makeNextOuter(ByteBufferMessageSet.scala:179)
at 
kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:192)
at 
kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:146)
at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:66)
at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:58)
at scala.collection.Iterator$$anon$1.hasNext(Iterator.scala:847)
at scala.collection.Iterator$$anon$19.hasNext(Iterator.scala:615)
at 
org.apache.spark.streaming.kafka.KafkaRDD$KafkaRDDIterator.getNext(KafkaRDD.scala:160)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at 
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:56)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: 
org.apache.kafka.common.message.KafkaLZ4BlockOutputStream
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 23 more


Container id: container_1456361466298_0236_01_02
Exit code: 50
Stack trace: ExitCodeException exitCode=50:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 50

16/03/11 19:44:55 INFO yarn.YarnAllocator: Completed container 
container_145636

Spark Serializer VS Hadoop Serializer

2016-03-11 Thread Fei Hu
Hi,

I am trying to migrate a program from Hadoop to Spark, but I ran into a problem
with serialization. In the Hadoop program, the key and value classes implement
org.apache.hadoop.io.WritableComparable, which handles their serialization. In the
Spark program, I used newAPIHadoopRDD to read the data out of HDFS, and the key and
value are those same WritableComparable classes. When I call reduceByKey, it reports
the error “object not serializable”. It seems that Spark does not support the
serialization provided by Hadoop for types such as Text and other Writables.

Is there any convenient way to make the Hadoop serialization classes work in
Spark? Or do I need to refactor them to use Kryo?

Thanks in advance,
Fei
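
For what it's worth, the usual workaround is to copy or convert the Writables into plain values right after reading them, before any shuffle such as reduceByKey. A minimal sketch, assuming a SequenceFile of Text keys and IntWritable values (adjust to the actual input format and Writable types in the job):

import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object WritableCopyExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("writable-copy"))

    // Assumed input: a SequenceFile of (Text, IntWritable); swap in your own
    // input format and Writable types.
    val raw = sc.newAPIHadoopFile(
      "hdfs:///path/to/input", // placeholder path
      classOf[SequenceFileInputFormat[Text, IntWritable]],
      classOf[Text],
      classOf[IntWritable])

    // Hadoop's RecordReader re-uses the same Writable instances and they are not
    // java.io.Serializable, so convert (or copy) them before any cache or shuffle.
    val counts = raw
      .map { case (k, v) => (k.toString, v.get()) } // Text -> String, IntWritable -> Int
      .reduceByKey(_ + _)

    counts.take(10).foreach(println)
    sc.stop()
  }
}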



Spark session dies in about 2 days: HDFS_DELEGATION_TOKEN token can't be found

2016-03-11 Thread Ruslan Dautkhanov
Spark session dies out after ~40 hours when running against Hadoop Secure
cluster.

spark-submit has --principal and --keytab so kerberos ticket renewal works
fine according to logs.

Does something happen with the HDFS (dfs) connection?

These messages come up every 1 second:
  See complete stack: http://pastebin.com/QxcQvpqm

16/03/11 16:04:59 WARN hdfs.LeaseRenewer: Failed to renew lease for
> [DFSClient_NONMAPREDUCE_1534318438_13] for 2802 seconds.  Will retry
> shortly ...
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
> token (HDFS_DELEGATION_TOKEN token 1349 for rdautkha) can't be found in
> cache


Then in 1 hour it stops trying:

16/03/11 16:18:17 WARN hdfs.DFSClient: Failed to renew lease for
> DFSClient_NONMAPREDUCE_1534318438_13 for 3600 seconds (>= hard-limit =3600
> seconds.) Closing all files being written ...
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
> token (HDFS_DELEGATION_TOKEN token 1349 for rdautkha) can't be found in
> cache


It doesn't look like a Kerberos principal ticket renewal problem, because
that would expire much sooner (by default we have 12 hours), and from the
logs the Spark Kerberos ticket renewer works fine.

Is it some other HDFS delegation token renewal process that breaks?

RHEL 6.7
> Spark 1.5
> Hadoop 2.6


Found HDFS-5322 and YARN-2648, which seem relevant, but I am not sure if it's
the same problem.
It seems to be a Spark problem, as I have only seen this in Spark.
This is a reproducible problem: just wait ~40 hours and the Spark session
is no good.


Thanks,
Ruslan


Spark with Yarn Client

2016-03-11 Thread Divya Gehlot
Hi,
I am trying to understand the behaviour/configuration of Spark with YARN
client mode on a Hadoop cluster.
Can somebody help me, or point me to documents/blogs/books that give a deeper
understanding of the two?
Thanks,
Divya


Re: Spark with Yarn Client

2016-03-11 Thread Alexander Pivovarov
Check doc - http://spark.apache.org/docs/latest/running-on-yarn.html

also you can start EMR-4.2.0 or 4.3.0 cluster with Spark app and see how
it's configured

On Fri, Mar 11, 2016 at 7:50 PM, Divya Gehlot 
wrote:

> Hi,
> I am trying to understand behaviour /configuration of spark with yarn
> client on hadoop cluster .
> Can somebody help me or point me document /blog/books which has deeper
> understanding of above two.
> Thanks,
> Divya
>


Re: Repeating Records w/ Spark + Avro?

2016-03-11 Thread Peyman Mohajerian
Here is the reason for the behavior:
'''Note:''' Because Hadoop's RecordReader class re-uses the same Writable
object for each record, directly caching the returned RDD or directly
passing it to an aggregation or shuffle operation will create many
references to the same object. If you plan to directly cache, sort, or
aggregate Hadoop writable objects, you should first copy them using a map
 function.

https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/SparkContext.html

So it is Hadoop related.
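
Given that, one way to get an independent copy of each record is Avro's GenericData.deepCopy rather than clone() (which is protected, as the error in the quoted message shows). A rough sketch only, mirroring the snippet below and meant for spark-shell where `sc` is defined; the path is a placeholder:

import org.apache.avro.generic.{GenericData, GenericRecord}
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable

val path = "/path/to/data.avro" // placeholder

val rdd = sc.newAPIHadoopFile(path,
  classOf[AvroKeyInputFormat[GenericRecord]],
  classOf[AvroKey[GenericRecord]], classOf[NullWritable])

// deepCopy yields a fresh record each time, so the re-used datum no longer
// aliases every element; converted to String here only for driver-side printing.
val copies = rdd.map { case (key, _) =>
  val datum = key.datum()
  GenericData.get().deepCopy(datum.getSchema, datum).toString
}

copies.take(10).foreach(println)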

On Fri, Mar 11, 2016 at 3:19 PM, Chris Miller 
wrote:

> I have a bit of a strange situation:
>
> *
> import org.apache.avro.generic.{GenericData, GenericRecord}
> import org.apache.avro.mapred.{AvroInputFormat, AvroWrapper, AvroKey}
> import org.apache.avro.mapreduce.AvroKeyInputFormat
> import org.apache.hadoop.io.{NullWritable, WritableUtils}
>
> val path = "/path/to/data.avro"
>
> val rdd = sc.newAPIHadoopFile(path,
> classOf[AvroKeyInputFormat[GenericRecord]],
> classOf[AvroKey[GenericRecord]], classOf[NullWritable])
> rdd.take(10).foreach( x => println( x._1.datum() ))
> *
>
> In this situation, I get the right number of records returned, and if I
> look at the contents of rdd I see the individual records as tuple2's...
> however, if I println on each one as shown above, I get the same result
> every time.
>
> Apparently this has to do with something in Spark or Avro keeping a
> reference to the item its iterating over, so I need to clone the object
> before I use it. However, if I try to clone it (from the spark-shell
> console), I get:
>
> *
> rdd.take(10).foreach( x => {
>   val clonedDatum = x._1.datum().clone()
>   println(clonedDatum.datum())
> })
>
> <console>:37: error: method clone in class Object cannot be accessed in
> org.apache.avro.generic.GenericRecord
>  Access to protected method clone not permitted because
>  prefix type org.apache.avro.generic.GenericRecord does not conform to
>  class $iwC where the access take place
> val clonedDatum = x._1.datum().clone()
> *
>
> So, how can I clone the datum?
>
> Seems I'm not the only one who ran into this problem:
> https://github.com/GoogleCloudPlatform/DataflowJavaSDK/issues/102. I
> can't figure out how to fix it in my case without hacking away like the
> person in the linked PR did.
>
> Suggestions?
>
> --
> Chris Miller
>


spark-submit with cluster deploy mode fails with ClassNotFoundException (jars are not passed around properley?)

2016-03-11 Thread Hiroyuki Yamada
Hi,

I am trying to work with spark-submit in cluster deploy mode on a single
node,
but I keep getting a ClassNotFoundException as shown below.
(In this case, snakeyaml.jar is not found by the Spark cluster.)

===

16/03/12 14:19:12 INFO Remoting: Starting remoting
16/03/12 14:19:12 INFO Remoting: Remoting started; listening on
addresses :[akka.tcp://Driver@192.168.1.2:52993]
16/03/12 14:19:12 INFO util.Utils: Successfully started service
'Driver' on port 52993.
16/03/12 14:19:12 INFO worker.WorkerWatcher: Connecting to worker
akka.tcp://sparkWorker@192.168.1.2:52985/user/Worker
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
at 
org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.lang.NoClassDefFoundError: org/yaml/snakeyaml/Yaml
at 
com.analytics.config.YamlConfigLoader.loadConfig(YamlConfigLoader.java:30)
at 
com.analytics.api.DeclarativeAnalyticsFactory.create(DeclarativeAnalyticsFactory.java:21)
at com.analytics.program.QueryExecutor.main(QueryExecutor.java:12)
... 6 more
Caused by: java.lang.ClassNotFoundException: org.yaml.snakeyaml.Yaml
at java.lang.ClassLoader.findClass(ClassLoader.java:530)
at 
org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at 
org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
at 
org.apache.spark.util.ChildFirstURLClassLoader.liftedTree1$1(MutableURLClassLoader.scala:75)
at 
org.apache.spark.util.ChildFirstURLClassLoader.loadClass(MutableURLClassLoader.scala:71)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 9 more
16/03/12 14:19:12 INFO util.Utils: Shutdown hook called



I can submit a job successfully in client mode, but I can't in cluster
mode,
so it is a matter of the jars (snakeyaml) not being properly passed to the cluster.

The actual command I tried is:

$ spark-submit --master spark://192.168.1.2:6066 --deploy-mode cluster
--jars all-the-jars(with comma separated) --class
com.analytics.program.QueryExecutor analytics.jar
(of course, snakeyaml.jar is specified after --jars)

I tried spark.executor.extraClassPath and spark.driver.extraClassPath in
spark-defaults.conf to specify snakeyaml.jar,
but neither of those worked.


I also found a couple of similar issues posted on the mailing list and other
sites, but they either got no proper response or the suggestions didn't work
for me.

<
https://mail-archives.apache.org/mod_mbox/spark-user/201505.mbox/%3CCAGSyEuApEkfO_2-iiiuyS2eeg=w_jkf83vcceguns4douod...@mail.gmail.com%3E
>
<
http://stackoverflow.com/questions/34272426/how-to-give-dependent-jars-to-spark-submit-in-cluster-mode
>
<
https://support.datastax.com/hc/en-us/articles/207442243-Spark-submit-fails-with-class-not-found-when-deploying-in-cluster-mode
>


Could anyone give me some help?

Best regards,
Hiro


Re: Spark Streaming: java.lang.NoClassDefFoundError: org/apache/kafka/common/message/KafkaLZ4BlockOutputStream

2016-03-11 Thread Siva
Thanks a lot Ted and Pankaj for your responses. Changing the classpath to the
correct version of the Kafka jars resolved the issue.

Thanks,
Sivakumar Bhavanari.

On Fri, Mar 11, 2016 at 5:59 PM, Pankaj Wahane 
wrote:

> Next thing you may want to check is if the jar has been provided to all
> the executors in your cluster. Most of the class not found errors got
> resolved for me after making required jars available in the SparkContext.
>
> Thanks.
>
> From: Ted Yu 
> Date: Saturday, 12 March 2016 at 7:17 AM
> To: Siva 
> Cc: spark users 
> Subject: Re: Spark Streaming: java.lang.NoClassDefFoundError:
> org/apache/kafka/common/message/KafkaLZ4BlockOutputStream
>
> KafkaLZ4BlockOutputStream is in kafka-clients jar :
>
> $ jar tvf kafka-clients-0.8.2.0.jar | grep KafkaLZ4BlockOutputStream
>   1609 Wed Jan 28 22:30:36 PST 2015
> org/apache/kafka/common/message/KafkaLZ4BlockOutputStream$BD.class
>   2918 Wed Jan 28 22:30:36 PST 2015
> org/apache/kafka/common/message/KafkaLZ4BlockOutputStream$FLG.class
>   4578 Wed Jan 28 22:30:36 PST 2015
> org/apache/kafka/common/message/KafkaLZ4BlockOutputStream.class
>
> Can you check whether kafka-clients jar was in the classpath of the
> container ?
>
> Thanks
>
> On Fri, Mar 11, 2016 at 5:00 PM, Siva  wrote:
>
>> Hi Everyone,
>>
>> All of sudden we are encountering the below error from one of the spark
>> consumer. It used to work before without any issues.
>>
>> When I restart the consumer with latest offsets, it is working fine for
>> sometime (it executed few batches) and it fails again, this issue is
>> intermittent.
>>
>> Did any one come across this issue?
>>
>> 16/03/11 19:44:50 WARN scheduler.TaskSetManager: Lost task 0.0 in stage
>> 1.0 (TID 3, ip-172-31-32-183.us-west-2.compute.internal):
>> java.lang.NoClassDefFoundError:
>> org/apache/kafka/common/message/KafkaLZ4BlockOutputStream
>> at
>> kafka.message.ByteBufferMessageSet$.decompress(ByteBufferMessageSet.scala:65)
>> at
>> kafka.message.ByteBufferMessageSet$$anon$1.makeNextOuter(ByteBufferMessageSet.scala:179)
>> at
>> kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:192)
>> at
>> kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:146)
>> at
>> kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:66)
>> at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:58)
>> at scala.collection.Iterator$$anon$1.hasNext(Iterator.scala:847)
>> at scala.collection.Iterator$$anon$19.hasNext(Iterator.scala:615)
>> at
>> org.apache.spark.streaming.kafka.KafkaRDD$KafkaRDDIterator.getNext(KafkaRDD.scala:160)
>> at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>> at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>> at
>> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
>> at
>> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:56)
>> at
>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
>> at
>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>> at org.apache.spark.scheduler.Task.run(Task.scala:70)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.kafka.common.message.KafkaLZ4BlockOutputStream
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>> ... 23 more
>>
>> Container id: container_1456361466298_0236_01_02
>> Exit code: 50
>> Stack trace: ExitCodeException exitCode=50:
>>  at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
>>  at org.apache.hadoop.util.Shell.run(Shell.java:455)
>>  at 
>> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
>>  at 
>> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
>>  at 
>> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>>  at 
>> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>  at 
>> java.util.concurrent.Thr

NullPointerException

2016-03-11 Thread Saurabh Guru
I am seeing the following exception in my Spark Cluster every few days in 
production.

2016-03-12 05:30:00,541 - WARN  TaskSetManager - Lost task 0.0 in stage 12528.0
(TID 18792, ip-1X-1XX-1-1XX.us-west-1.compute.internal): java.lang.NullPointerException
   at 
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:192)
   at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
   at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
   at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
   at org.apache.spark.scheduler.Task.run(Task.scala:89)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)


I have debugged on a local machine but haven’t been able to pinpoint the cause
of the error. Does anyone know why this might occur? Any suggestions?


Thanks,
Saurabh





Correct way to use spark streaming with apache zeppelin

2016-03-11 Thread trung kien
Hi all,

I've just viewed some of Zeppelin's videos. The integration between
Zeppelin and Spark is really amazing and I want to use it for my
application.

In my app, I will have a Spark Streaming app to do some basic realtime
aggregation (intermediate data). Then I want to use Zeppelin to do some
realtime analytics on the intermediate data.

My question is: what's the most efficient storage engine for the realtime
intermediate data? Is a Parquet file somewhere suitable?


Re: NullPointerException

2016-03-11 Thread Prabhu Joseph
Looking at ExternalSorter.scala line 192:

189  while (records.hasNext) {
190    addElementsRead()
191    kv = records.next()
192    map.changeValue((getPartition(kv._1), kv._1), update)
193    maybeSpillCollection(usingMap = true)
194  }

On Sat, Mar 12, 2016 at 12:31 PM, Saurabh Guru 
wrote:

> I am seeing the following exception in my Spark Cluster every few days in
> production.
>
> 2016-03-12 05:30:00,541 - WARN  TaskSetManager - Lost task 0.0 in stage
> 12528.0 (TID 18792, ip-1X-1XX-1-1XX.us 
> -west-1.compute.internal
> ): java.lang.NullPointerException
>at
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:192)
>at
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
>at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>at org.apache.spark.scheduler.Task.run(Task.scala:89)
>at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>at java.lang.Thread.run(Thread.java:745)
>
>
> I have debugged in local machine but haven’t been able to pin point the
> cause of the error. Anyone knows why this might occur? Any suggestions?
>
>
> Thanks,
> Saurabh
>
>
>
>


Re: Correct way to use spark streaming with apache zeppelin

2016-03-11 Thread Mich Talebzadeh
Hi,

I use Zeppelin as well and in the notebook mode you can do analytics much
like what you do in Spark-shell.

You can store your intermediate data in Parquet if you wish and then
analyse data the way you like.
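
As a rough illustration of the pattern (a hedged sketch only; the batch interval, socket source, aggregation, and output path are all placeholders, and it assumes Spark 1.x with a SQLContext available), the streaming job can append each batch of aggregates to a Parquet directory that Zeppelin then queries:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamToParquet {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("realtime-aggregates"))
    val ssc = new StreamingContext(sc, Seconds(30)) // placeholder batch interval
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Placeholder source: a socket stream of "key value" lines; swap in Kafka etc.
    val lines = ssc.socketTextStream("localhost", 9999)

    val aggregates = lines
      .map(_.split(" "))
      .map(parts => (parts(0), parts(1).toDouble))
      .reduceByKey(_ + _)

    // Append each batch of intermediate aggregates to Parquet; Zeppelin can then
    // read the same directory with sqlContext.read.parquet(...) for ad hoc analytics.
    aggregates.foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) {
        rdd.toDF("key", "total")
          .write.mode("append")
          .parquet("hdfs:///tmp/realtime-aggregates") // placeholder output path
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}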

What is your use case here? Zeppelin as I use it is a web UI to your
spark-shell, accessible from anywhere.

HTH

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com



On 12 March 2016 at 07:13, trung kien  wrote:

> Hi all,
>
> I've just viewed some Zeppenlin's videos. The intergration between
> Zeppenlin and Spark is really amazing and i want to use it for my
> application.
>
> In my app, i will have a Spark streaming app to do some basic realtime
> aggregation ( intermediate data). Then i want to use Zeppenlin to do some
> realtime analytics on the intermediate data.
>
> My question is what's the most efficient storage engine to store realtime
> intermediate data? Is parquet file somewhere is suitable?
>


Re: NullPointerException

2016-03-11 Thread Prabhu Joseph
Looking at ExternalSorter.scala line 192, i suspect some input record has
Null key.

189  while (records.hasNext) {
190    addElementsRead()
191    kv = records.next()
192    map.changeValue((getPartition(kv._1), kv._1), update)
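
In case it helps to confirm that, a quick check in the job that feeds this shuffle (a spark-shell sketch only; the tiny `pairs` RDD below is made up to illustrate the filter, so substitute the real pair RDD from the failing stage):

// Spark-shell sketch: a pair RDD with one null key to illustrate the check.
val pairs = sc.parallelize(Seq(("a", 1), (null: String, 2), ("b", 3)))

// How many records arrive with a null key?
val nullKeyCount = pairs.filter { case (k, _) => k == null }.count()
println("records with a null key: " + nullKeyCount)

// Drop (or repair) them before any shuffle such as reduceByKey / groupByKey.
val reduced = pairs.filter { case (k, _) => k != null }.reduceByKey(_ + _)
reduced.collect().foreach(println)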



On Sat, Mar 12, 2016 at 12:48 PM, Prabhu Joseph 
wrote:

> Looking at ExternalSorter.scala line 192
>
> 189
> while (records.hasNext) { addElementsRead() kv = records.next()
> map.changeValue((getPartition(kv._1), kv._1), update)
> maybeSpillCollection(usingMap = true) }
>
> On Sat, Mar 12, 2016 at 12:31 PM, Saurabh Guru 
> wrote:
>
>> I am seeing the following exception in my Spark Cluster every few days in
>> production.
>>
>> 2016-03-12 05:30:00,541 - WARN  TaskSetManager - Lost task 0.0 in stage
>> 12528.0 (TID 18792, ip-1X-1XX-1-1XX.us 
>> -west-1.compute.internal
>> ): java.lang.NullPointerException
>>at
>> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:192)
>>at
>> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
>>at
>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>>at
>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>at
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>at java.lang.Thread.run(Thread.java:745)
>>
>>
>> I have debugged in local machine but haven’t been able to pin point the
>> cause of the error. Anyone knows why this might occur? Any suggestions?
>>
>>
>> Thanks,
>> Saurabh
>>
>>
>>
>>
>


Re: NullPointerException

2016-03-11 Thread Ted Yu
Which Spark release do you use ?

I wonder if the following may have fixed the problem:
SPARK-8029 Robust shuffle writer

JIRA is down, cannot check now.

On Fri, Mar 11, 2016 at 11:01 PM, Saurabh Guru 
wrote:

> I am seeing the following exception in my Spark Cluster every few days in
> production.
>
> 2016-03-12 05:30:00,541 - WARN  TaskSetManager - Lost task 0.0 in stage
> 12528.0 (TID 18792, ip-1X-1XX-1-1XX.us 
> -west-1.compute.internal
> ): java.lang.NullPointerException
>at
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:192)
>at
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
>at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>at org.apache.spark.scheduler.Task.run(Task.scala:89)
>at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>at java.lang.Thread.run(Thread.java:745)
>
>
> I have debugged in local machine but haven’t been able to pin point the
> cause of the error. Anyone knows why this might occur? Any suggestions?
>
>
> Thanks,
> Saurabh
>
>
>
>


Re: NullPointerException

2016-03-11 Thread Saurabh Guru
I am using the following versions:


<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.6.0</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka_2.10</artifactId>
    <version>1.6.0</version>
</dependency>

<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-spark_2.10</artifactId>
    <version>2.2.0</version>
</dependency>


Thanks,
Saurabh

:)



> On 12-Mar-2016, at 12:56 PM, Ted Yu  wrote:
> 
> Which Spark release do you use ?
> 
> I wonder if the following may have fixed the problem:
> SPARK-8029 Robust shuffle writer
> 
> JIRA is down, cannot check now.
> 
> On Fri, Mar 11, 2016 at 11:01 PM, Saurabh Guru  wrote:
> I am seeing the following exception in my Spark Cluster every few days in 
> production.
> 
> 2016-03-12 05:30:00,541 - WARN  TaskSetManager - Lost task 0.0 in stage 
> 12528.0 (TID 18792, ip-1X-1XX-1-1XX.us 
> -west-1.compute.internal
> ): java.lang.NullPointerException
>at 
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:192)
>at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
>at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>at org.apache.spark.scheduler.Task.run(Task.scala:89)
>at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>at java.lang.Thread.run(Thread.java:745)
> 
> 
> I have debugged in local machine but haven’t been able to pin point the cause 
> of the error. Anyone knows why this might occur? Any suggestions? 
> 
> 
> Thanks,
> Saurabh
> 
> 
> 
> 


