My Spark (1.3.0) job is failing with:

com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 1
        at com.esotericsoftware.kryo.io.Output.require(Output.java:138)
        at com.esotericsoftware.kryo.io.Output.writeByte(Output.java:194)
        at com.esotericsoftware.kryo.Kryo.writeReferenceOrNull(Kryo.java:599)
        at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:566)
        at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318)
        at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293)
        at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
        at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:161)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)


This is how I am creating the SparkContext (only once):

    val conf = new SparkConf()
      .setAppName(detail)
      // Use Kryo instead of the default Java serializer
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Kryo buffer size in MB, taken from the "buffersize" argument (200 here)
      .set("spark.kryoserializer.buffer.mb", arguments.get("buffersize").get)
      .registerKryoClasses(Array(classOf[com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum]))
    val sc = new SparkContext(conf)
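
In case it matters, the 1.3.0 configuration page also seems to list a separate cap, spark.kryoserializer.buffer.max.mb (if I am reading it correctly), which the buffer is allowed to grow up to. A rough sketch of what setting both would look like, with purely illustrative values:

    // Sketch only: assumes spark.kryoserializer.buffer.max.mb is the upper bound
    // the Kryo buffer may grow to in Spark 1.3.0, while spark.kryoserializer.buffer.mb
    // is the initial size. The values below are illustrative, not a recommendation.
    val conf = new SparkConf()
      .setAppName(detail)
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryoserializer.buffer.mb", "8")        // initial buffer size (MB)
      .set("spark.kryoserializer.buffer.max.mb", "512")  // maximum buffer size (MB)
      .registerKryoClasses(Array(classOf[com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum]))

I am not sure whether that is the relevant knob, hence the questions below.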


Command:

    ./bin/spark-submit -v --master yarn-cluster \
      --driver-class-path /apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/guava-11.0.2.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/hdfs/hadoop-hdfs-2.4.1-EBAY-2.jar \
      --jars /apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/hdfs/hadoop-hdfs-2.4.1-EBAY-2.jar,/home/dvasthimal/spark1.3/spark_reporting_dep_only-1.0-SNAPSHOT.jar \
      --num-executors 100 --driver-memory 12g --driver-java-options "-XX:MaxPermSize=6G" \
      --executor-memory 12g --executor-cores 1 --queue hdmi-express \
      --class com.ebay.ep.poc.spark.reporting.SparkApp \
      /home/dvasthimal/spark1.3/spark_reporting-1.0-SNAPSHOT.jar \
      startDate=2015-04-6 endDate=2015-04-7 \
      input=/user/dvasthimal/epdatasets_small/exptsession subcommand=viewItem \
      output=/user/dvasthimal/epdatasets/viewItem buffersize=200

(spark-submit also prints: "Spark assembly has been built with Hive, including Datanucleus jars on classpath".)

So the buffer size I am passing is 200.
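
For context, the trailing key=value pairs on the command line are what end up in the `arguments` map used when building the SparkConf. Roughly it is just a split on "=" (simplified sketch, not the exact code):

    // Simplified sketch of how the trailing key=value args become the
    // `arguments` map referenced above -- not the exact parsing code, just the idea.
    val arguments: Map[String, String] =
      args.map { kv =>
        val Array(key, value) = kv.split("=", 2)
        key -> value
      }.toMap
    // e.g. arguments.get("buffersize") == Some("200")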

1. What is this buffer?
2. What should the value of this buffer be?
3. My Spark job has many stages; does the value above need to be different for each stage?


Please clarify.

Regards.
Deepak
