LIFULONG created SPARK-24928:
--------------------------------

             Summary: spark sql cross join running time too long
                 Key: SPARK-24928
                 URL: https://issues.apache.org/jira/browse/SPARK-24928
             Project: Spark
          Issue Type: Bug
          Components: Optimizer
    Affects Versions: 1.6.2
            Reporter: LIFULONG
Spark SQL takes far too long to run when the left and right input tables are both small text-format data. The SQL is:

    select * from t1 cross join t2

t1 has 499999 lines with three columns, and t2 has 1 line with only one column. The query runs for more than 30 minutes and then fails.

Spark's CartesianRDD has the same problem. Example test code:

    // CartesianRDD is private[spark]; outside package org.apache.spark the public equivalent is twos.cartesian(ones)
    import org.apache.spark.rdd.CartesianRDD

    val ones = sc.textFile("file:///Users/moses/4paradigm/data/cartesian_data/t1b") // 1 line, 1 column
    val twos = sc.textFile("file:///Users/moses/4paradigm/data/cartesian_data/t2b") // 499999 lines, 3 columns
    val cartesian = new CartesianRDD(sc, twos, ones)
    cartesian.count()

This runs for more than 5 minutes, while CartesianRDD(sc, ones, twos) takes less than 10 seconds.

-- 
This message was sent by Atlassian JIRA
(v7.6.3#76005)
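One plausible explanation for the asymmetry, assuming the nested-loop shape of CartesianRDD.compute (roughly `for (x <- rdd1.iterator(s1, context); y <- rdd2.iterator(s2, context)) yield (x, y)`), is that the right-hand RDD's partition is iterated again for every record of the left-hand one. Because `ones` and `twos` are not cached, each of those iterations would re-read the text file. The pure-Scala sketch below models only that cost, without Spark. `RecomputedSource` and `cartesianCount` are hypothetical names for illustration, not Spark APIs.

```scala
// Pure-Scala model (not Spark code) of a nested-loop cartesian product in which
// the inner side is recomputed, e.g. re-read from disk, for every outer element.
object CartesianCostModel {

  // Hypothetical stand-in for an uncached partition: every call to iterator()
  // counts as one full recomputation of that partition.
  final class RecomputedSource[A](data: Seq[A]) {
    var recomputations: Long = 0L
    def iterator(): Iterator[A] = {
      recomputations += 1
      data.iterator
    }
  }

  // Same shape as the assumed for-comprehension over rdd1/rdd2 iterators:
  // the inner iterator is requested once per outer element.
  def cartesianCount[A, B](outer: RecomputedSource[A], inner: RecomputedSource[B]): Long = {
    val pairs = for {
      x <- outer.iterator()
      y <- inner.iterator()
    } yield (x, y)
    pairs.size.toLong
  }

  def main(args: Array[String]): Unit = {
    val big = (1 to 499999).map(i => s"a$i,b$i,c$i") // like t2b: 499999 lines
    val small = Seq("x")                            // like t1b: 1 line

    // Order used in the slow case: CartesianRDD(sc, twos, ones)
    val slowInner = new RecomputedSource(small)
    val n1 = cartesianCount(new RecomputedSource(big), slowInner)
    println(s"(twos, ones): pairs=$n1, inner recomputations=${slowInner.recomputations}")

    // Order used in the fast case: CartesianRDD(sc, ones, twos)
    val fastInner = new RecomputedSource(big)
    val n2 = cartesianCount(new RecomputedSource(small), fastInner)
    println(s"(ones, twos): pairs=$n2, inner recomputations=${fastInner.recomputations}")
  }
}
```

In this model both orders produce 499999 pairs, but the (twos, ones) order recomputes the inner source 499999 times while the (ones, twos) order recomputes it once, which is consistent with the timings reported above.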