[ https://issues.apache.org/jira/browse/SPARK-16087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258627#comment-17258627 ]
Rob Russo commented on SPARK-16087:
-----------------------------------

I know this ticket is old now, but Spark 3 seems to have resurfaced the issue. I had a suite of tests that worked fine on Spark 2.x, but I spent more than a month intermittently debugging why a number of my tests hung only on Spark 3. As [~kevinconaway] said in his comment, it may be one refactor away from resurfacing, and it seems that might be what happened.

For anyone running into this issue, here is the resolution I finally discovered from this ticket: based on [~kevinconaway]'s comment that setting _spark.driver.host=localhost_ forces the problem, I found that setting _spark.driver.host=127.0.0.1_ completely fixes it. Hopefully this helps anyone else who runs into this.

Because this issue has popped up again, I'm going to reopen the ticket and mark Spark 3 as an affected version.

> Spark Hangs When Using Union With Persisted Hadoop RDD
> ------------------------------------------------------
>
>                 Key: SPARK-16087
>                 URL: https://issues.apache.org/jira/browse/SPARK-16087
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.4.1, 1.6.1, 2.0.1
>            Reporter: Kevin Conaway
>            Priority: Critical
>              Labels: bulk-closed
>         Attachments: SPARK-16087.dump.log, SPARK-16087.log, Screen Shot 2016-06-21 at 4.27.26 PM.png, Screen Shot 2016-06-21 at 4.27.35 PM.png, part-00000, part-00001, spark-16087.tar.gz
>
> Spark hangs when materializing a persisted RDD that was built from a Hadoop sequence file and then union-ed with a similar RDD.
> Below is a small file that exhibits the issue:
> {code:java}
> import org.apache.hadoop.io.BytesWritable;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.spark.SparkConf;
> import org.apache.spark.api.java.JavaPairRDD;
> import org.apache.spark.api.java.JavaSparkContext;
> import org.apache.spark.api.java.function.PairFunction;
> import org.apache.spark.serializer.KryoSerializer;
> import org.apache.spark.storage.StorageLevel;
> import scala.Tuple2;
>
> public class SparkBug {
>     public static void main(String [] args) throws Exception {
>         JavaSparkContext sc = new JavaSparkContext(
>             new SparkConf()
>                 .set("spark.serializer", KryoSerializer.class.getName())
>                 .set("spark.master", "local[*]")
>                 .setAppName(SparkBug.class.getName())
>         );
>
>         JavaPairRDD<LongWritable, BytesWritable> rdd1 = sc.sequenceFile(
>             "hdfs://localhost:9000/part-00000",
>             LongWritable.class,
>             BytesWritable.class
>         ).mapToPair(new PairFunction<Tuple2<LongWritable, BytesWritable>, LongWritable, BytesWritable>() {
>             @Override
>             public Tuple2<LongWritable, BytesWritable> call(Tuple2<LongWritable, BytesWritable> tuple) throws Exception {
>                 return new Tuple2<>(
>                     new LongWritable(tuple._1.get()),
>                     new BytesWritable(tuple._2.copyBytes())
>                 );
>             }
>         }).persist(
>             StorageLevel.MEMORY_ONLY()
>         );
>
>         System.out.println("Before union: " + rdd1.count());
>
>         JavaPairRDD<LongWritable, BytesWritable> rdd2 = sc.sequenceFile(
>             "hdfs://localhost:9000/part-00001",
>             LongWritable.class,
>             BytesWritable.class
>         );
>
>         JavaPairRDD<LongWritable, BytesWritable> joined = rdd1.union(rdd2);
>
>         System.out.println("After union: " + joined.count());
>     }
> }
> {code}
> You'll need to upload the attached part-00000 and part-00001 to a local HDFS instance (I'm just using a dummy [Single Node Cluster|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html] locally).
> Some things to note:
> - It does not hang if rdd1 is not persisted
> - It does not hang if rdd1 is not materialized (via calling rdd1.count()) before the union-ed RDD is materialized
> - It does not hang if the mapToPair() transformation is removed.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
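As a practical note on the workaround mentioned in the comment above: _spark.driver.host_ is an ordinary Spark configuration property, so it can be applied without code changes. The snippet below is an illustrative sketch of the usual ways to set it (the property name comes from this thread; paths and the rest of the command line are placeholders):

{code}
# On the command line when submitting:
spark-submit --conf spark.driver.host=127.0.0.1 ...

# Or persistently, in conf/spark-defaults.conf:
spark.driver.host    127.0.0.1
{code}

For a test suite like the one described in the comment, the same property can be set programmatically on the SparkConf before the context is created, e.g. conf.set("spark.driver.host", "127.0.0.1"), alongside the other .set(...) calls shown in the reproduction above.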