Re: PARSING_ERROR from kryo
Hi Andrew, No, I could not figure out the root cause. It seems to be a non-deterministic error: I didn't see the same error after rerunning the same program, but I did notice the same error on a different program. At first I thought it might be related to SPARK-2878, but @Graham replied that it looks unrelated.
Re: [SPARK-2878] Kryo serialisation with custom Kryo registrator failing
I have both SPARK-2878 and SPARK-2893.
Re: [SPARK-2878] Kryo serialisation with custom Kryo registrator failing
I am running the code with @rxin's patch in standalone mode. In my case I am registering org.apache.spark.graphx.GraphKryoRegistrator. Recently I started to see com.esotericsoftware.kryo.KryoException: java.io.IOException: failed to uncompress the chunk: PARSING_ERROR. Has anyone seen this? Could it be related to this issue? Here is the trace:
--
vids (org.apache.spark.graphx.impl.VertexAttributeBlock)
com.esotericsoftware.kryo.io.Input.fill(Input.java:142)
com.esotericsoftware.kryo.io.Input.require(Input.java:169)
com.esotericsoftware.kryo.io.Input.readLong_slow(Input.java:710)
com.esotericsoftware.kryo.io.Input.readLong(Input.java:665)
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$LongArraySerializer.read(DefaultArraySerializers.java:127)
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$LongArraySerializer.read(DefaultArraySerializers.java:107)
com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:43)
com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34)
com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:133)
org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
org.apache.spark.storage.BlockManager$LazyProxyIterator$1.hasNext(BlockManager.scala:1054)
scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
scala.collection.Iterator$class.foreach(Iterator.scala:727)
scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
org.apache.spark.graphx.impl.VertexPartitionBaseOps.innerJoinKeepLeft(VertexPartitionBaseOps.scala:192)
org.apache.spark.graphx.impl.EdgePartition.updateVertices(EdgePartition.scala:78)
org.apache.spark.graphx.impl.ReplicatedVertexView$$anonfun$2$$anonfun$apply$1.apply(ReplicatedVertexView.scala:75)
org.apache.spark.graphx.impl.ReplicatedVertexView$$anonfun$2$$anonfun$apply$1.apply(ReplicatedVertexView.scala:73)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
org.apache.spark.graphx.EdgeRDD$$anonfun$mapEdgePartitions$1.apply(EdgeRDD.scala:87)
org.apache.spark.graphx.EdgeRDD$$anonfun$mapEdgePartitions$1.apply(EdgeRDD.scala:85)
org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:202)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
--
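(For context: a custom registrator in Spark 1.x implements org.apache.spark.serializer.KryoRegistrator. A minimal hypothetical sketch; the registered classes here are illustrative, not from this post:)
--
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical registrator in the spirit of GraphKryoRegistrator.
// Registering classes up front lets Kryo write compact ids instead of
// fully qualified class names.
class MyKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[Array[Long]])        // e.g. vertex id arrays
    kryo.register(classOf[Tuple2[Long, Long]]) // e.g. edge endpoint pairs
  }
}
--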
GraphX seems to be broken while creating a large graph (6B nodes in my case)
While creating a graph with 6B nodes and 12B edges, I noticed that *'numVertices' api returns an incorrect result*; 'numEdges' reports the correct number. A few times (with a different dataset of 2.5B nodes) I have also noticed that numVertices is returned as a negative number, so I suspect there is some overflow (maybe we are using Int for some field?).

Environment: standalone mode running on EC2, using the latest code from the master branch up to commit db56f2df1b8027171da1b8d2571d1f2ef1e103b6.

Here are some details of the experiments I have done so far:
1. Input: numNodes=6101995593 ; noEdges=12163784626. Graph returns: numVertices=1807028297 ; numEdges=12163784626.
2. Input: numNodes=*2157586441* ; noEdges=2747322705. Graph returns: numVertices=*-2137380855* ; numEdges=2747322705.
3. Input: numNodes=1725060105 ; noEdges=204176821. Graph returns: numVertices=1725060105 ; numEdges=2041768213.

You can find the code to reproduce this bug here: https://gist.github.com/npanj/92e949d86d08715bf4bf (I have also filed a JIRA ticket: https://issues.apache.org/jira/browse/SPARK-3190)
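(A quick arithmetic check of the Int-overflow suspicion: truncating the 64-bit input counts to 32 bits reproduces the reported values exactly, which is consistent with an Int being used for a vertex-count field somewhere. Minimal Scala:)
--
// 32-bit truncation of the input counts matches the bad output.
println(6101995593L.toInt)   // 1807028297,  the numVertices from experiment 1
println(2157586441L.toInt)   // -2137380855, the negative numVertices from experiment 2
--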
PARSING_ERROR from kryo
Hi All, I am getting PARSING_ERROR while running my job on the code checked out up to commit db56f2df1b8027171da1b8d2571d1f2ef1e103b6. I am running this job on EC2. Any idea if there is something wrong with my config? Here is my config:
--
.set("spark.executor.extraJavaOptions", "-XX:+UseCompressedOops -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
.set("spark.storage.memoryFraction", "0.2")
.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
.set("spark.kryo.registrator", "org.apache.spark.graphx.GraphKryoRegistrator")
.set("spark.akka.frameSize", "20")
.set("spark.akka.timeout", "300")
.set("spark.shuffle.memoryFraction", "0.5")
.set("spark.core.connection.ack.wait.timeout", "1800")
--
And here is the failure:
--
Job aborted due to stage failure: Task 947 in stage 11.0 failed 4 times, most recent failure: Lost task 947.3 in stage 11.0 (TID 12750, ip-10-167-149-118.ec2.internal): com.esotericsoftware.kryo.KryoException: java.io.IOException: failed to uncompress the chunk: PARSING_ERROR(2)
Serialization trace:
vids (org.apache.spark.graphx.impl.VertexAttributeBlock)
com.esotericsoftware.kryo.io.Input.fill(Input.java:142)
com.esotericsoftware.kryo.io.Input.require(Input.java:169)
com.esotericsoftware.kryo.io.Input.readLong_slow(Input.java:719)
com.esotericsoftware.kryo.io.Input.readLong(Input.java:665)
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$LongArraySerializer.read(DefaultArraySerializers.java:127)
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$LongArraySerializer.read(DefaultArraySerializers.java:107)
com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:43)
com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34)
com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:119)
org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:129)
org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
org.apache.spark.storage.BlockManager$LazyProxyIterator$1.hasNext(BlockManager.scala:1038)
scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
scala.collection.Iterator$class.foreach(Iterator.scala:727)
scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
org.apache.spark.graphx.impl.VertexPartitionBaseOps.innerJoinKeepLeft(VertexPartitionBaseOps.scala:192)
org.apache.spark.graphx.impl.EdgePartition.updateVertices(EdgePartition.scala:78)
org.apache.spark.graphx.impl.ReplicatedVertexView$$anonfun$2$$anonfun$apply$1.apply(ReplicatedVertexView.scala:75)
org.apache.spark.graphx.impl.ReplicatedVertexView$$anonfun$2$$anonfun$apply$1.apply(ReplicatedVertexView.scala:73)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
scala.collection.Iterator$class.foreach(Iterator.scala:727)
scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:57)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:147)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
org.apache.spark.scheduler.Task.run(Task.scala:51)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:189)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
--
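(For completeness, a minimal sketch of how a .set(...) chain like the one above attaches to a driver; the app name is a placeholder of mine, and only the two Kryo-related settings are repeated:)
--
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver wiring; the remaining .set(...) calls from the
// config above chain on in exactly the same way.
val conf = new SparkConf()
  .setAppName("graphx-job")  // placeholder name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "org.apache.spark.graphx.GraphKryoRegistrator")
val sc = new SparkContext(conf)
--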
How to set Java options -Xmn
Hi, I am trying to set -Xmn to control GC in spark.executor.extraJavaOptions (as recommended by the tuning guide), but I am getting an error that spark.executor.extraJavaOptions is not allowed to alter memory settings. It seems that extraJavaOptions takes just one value, not a list of Java options. How can I set -Xmn?
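(A sketch under my own reading, not from the thread: extraJavaOptions does accept a single space-separated string of multiple JVM flags, and the validation error quoted above is about memory-sizing flags specifically; the total executor heap is configured through spark.executor.memory. Values below are placeholders.)
--
import org.apache.spark.SparkConf

// Non-memory GC flags pass validation as one space-separated string;
// the executor heap itself is sized separately.
val conf = new SparkConf()
  .set("spark.executor.memory", "8g")   // heap size goes here, not in extraJavaOptions
  .set("spark.executor.extraJavaOptions",
    "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
--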
Spark 1.1-snapshot: java.io.FileNotFoundException from ShuffleMapTask
Quite often I notice that a shuffle file is missing and thus a FileNotFoundException is thrown. Any idea why a shuffle file would be missing? Am I running low on memory? (I am using the latest code from the master branch on yarn-hadoop-2.2.)
--
java.io.FileNotFoundException: /var/storage/sda3/nm-local/usercache/npanj/appcache/application_1401394632504_0131/spark-local-20140603050956-6728/20/shuffle_0_2_97 (No such file or directory)
  at java.io.FileOutputStream.open(Native Method)
  at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
  at org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:116)
  at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:177)
  at org.apache.spark.scheduler.ShuffleMapTask$$anonfun$runTask$1.apply(ShuffleMapTask.scala:161)
  at org.apache.spark.scheduler.ShuffleMapTask$$anonfun$runTask$1.apply(ShuffleMapTask.scala:158)
  at scala.collection.Iterator$class.foreach(Iterator.scala:727)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
  at org.apache.spark.scheduler.Task.run(Task.scala:51)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:744)
--
Re: Spark 1.0: outerJoinVertices seems to return null for vertex attributes when input was partitioned and vertex attribute type is changed
Thanks Ankur. With your fix I see the expected results.
Spark 1.0: outerJoinVertices seems to return null for vertex attributes when input was partitioned and vertex attribute type is changed
I am seeing something strange with outerJoinVertices (and triangle count, which relies on this API). Here is what I am doing:

1) Created a graph with multiple partitions, i.e. created a graph with minEdgePartitions (in the API GraphLoader.edgeListFile), where minEdgePartitions >= 1, and used partitionBy(PartitionStrategy.RandomVertexCut) on the generated graph. Note: the vertex attribute type is Int in this case.
2) Next I built neighborhood ids by calling collectNeighborIds, i.e. the returned vertex attribute type is Array[VertexId] (a VertexRDD[Array[VertexId]]).
3) Then I joined the vertex ids from step 2 back to the graph (generated in step 1) via outerJoinVertices.
4) Created a subgraph of the joined graph from step 3 where I only keep the edges with ed.srcAttr != -1 && ed.dstAttr != -1, i.e. filter out null-attribute vertices.
5) Finally checked the number of edges left in the subgraph from step 4.

I ran this program in a loop where minEdgePartitions is changed in each iteration. When minEdgePartitions == 1 I see the correct number of edges; when minEdgePartitions == 2 the result is ~1/2 the number of edges; when minEdgePartitions == 3 the result is ~1/3 the number of edges; and so on. It seems that outerJoinVertices is returning srcAttr (and dstAttr) = null for many attributes, and from the numbers it seems that it might be returning null for vertices residing on other partitions?

Environment: I am using RC5 and 22 executors.

BUT I get the correct number of edges in each iteration when I repeat the experiment keeping the vertex attribute type Int in step 2 (i.e. just keeping the number of neighbors instead of the array of neighbor ids), which is the same as the vertex attribute type in the graph before the join.

Is this a known bug fixed recently? Or are we supposed to set some flags when updating the vertex attribute type?
Re: Spark 1.0: outerJoinVertices seems to return null for vertex attributes when input was partitioned and vertex attribute type is changed
Correction: in step 4) the predicate is ed.srcAttr != null && ed.dstAttr != null (I used -1 when the attribute type was changed to Int).
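(For anyone trying to reproduce this, a condensed sketch of steps 1-5 incorporating the correction above; the path, partition count, and neighbor direction are placeholders, since the posts do not give them:)
--
import org.apache.spark.SparkContext
import org.apache.spark.graphx._

// Condensed repro sketch. Assumptions: EdgeDirection.Either for
// collectNeighborIds; path and parts supplied by the caller.
def countSurvivingEdges(sc: SparkContext, path: String, parts: Int): Long = {
  // 1) load with minEdgePartitions and repartition; vertex attr type is Int
  val g = GraphLoader.edgeListFile(sc, path, minEdgePartitions = parts)
    .partitionBy(PartitionStrategy.RandomVertexCut)
  // 2) neighborhood ids: VertexRDD[Array[VertexId]]
  val nbrs = g.collectNeighborIds(EdgeDirection.Either)
  // 3) join back, changing the vertex attribute type from Int to Array
  val joined = g.outerJoinVertices(nbrs) { (vid, attr, opt) => opt.orNull }
  // 4) keep only edges whose endpoints received a non-null attribute
  val sub = joined.subgraph(epred = t => t.srcAttr != null && t.dstAttr != null)
  // 5) surviving edge count; expected to equal g.numEdges
  sub.numEdges
}
--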
Graphx: GraphLoader.edgeListFile with edge weight
Hi, For my project I needed to load a graph with edge weights; for this I have updated GraphLoader.edgeListFile to consider a third column in the input file. I would like to submit my patch for review so that it can be merged into the master branch. What is the process for submitting patches?
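(A sketch of the idea, not the actual patch: parse a third whitespace-separated column as a Double edge weight and build the graph with Graph.fromEdges; the default vertex attribute of 1 mirrors what GraphLoader uses.)
--
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph}

// Hypothetical weighted variant of GraphLoader.edgeListFile.
def weightedEdgeListFile(sc: SparkContext, path: String): Graph[Int, Double] = {
  val edges = sc.textFile(path)
    .filter(line => !line.isEmpty && line(0) != '#')  // skip blank and comment lines
    .map { line =>
      val fields = line.split("\\s+")
      // columns: srcId dstId weight
      Edge(fields(0).toLong, fields(1).toLong, fields(2).toDouble)
    }
  Graph.fromEdges(edges, defaultValue = 1)
}
--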