[ https://issues.apache.org/jira/browse/SPARK-24928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
LIFULONG updated SPARK-24928:
-
Description:
A Spark SQL cross join takes far too long even though the left and right input
tables are both small text-format datasets on HDFS.
The SQL is: select * from t1 cross join t2
t1 has 49 rows and three columns.
t2 has 1 row and a single column.
The query ran for more than 30 minutes and then failed.
Spark's CartesianRDD has the same problem. Example test code:
val ones = sc.textFile("hdfs://host:port/data/cartesian_data/t1b") // 1 line, 1 column
val twos = sc.textFile("hdfs://host:port/data/cartesian_data/t2b") // 49 lines, 3 columns
val cartesian = new CartesianRDD(sc, twos, ones)
cartesian.count()
This runs for more than 5 minutes, whereas CartesianRDD(sc, ones, twos) takes
less than 10 seconds.
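The comparison above can be sketched as a small standalone program. This is only a minimal sketch under stated assumptions, not the reporter's actual setup. It uses the public RDD.cartesian method instead of constructing CartesianRDD directly, a local master instead of a cluster, and in-memory data in place of the HDFS files, so it is not expected to reproduce the reported timings by itself. The object name CartesianOrderCheck and the helper timed are hypothetical names introduced for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CartesianOrderCheck {
  // Hypothetical helper: runs `body` and returns its result with the elapsed milliseconds.
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master and in-memory data stand in for the HDFS files.
    val conf = new SparkConf().setMaster("local[2]").setAppName("cartesian-order-check")
    val sc = new SparkContext(conf)
    try {
      val ones = sc.parallelize(Seq("a"), 1)                      // 1 line, 1 column
      val twos = sc.parallelize((1 to 49).map(i => s"$i,x,y"), 1) // 49 lines, 3 columns

      // Same argument order as new CartesianRDD(sc, twos, ones).
      val (twosFirstCount, twosFirstMs) = timed(twos.cartesian(ones).count())

      // Same argument order as new CartesianRDD(sc, ones, twos); the map
      // swaps each pair back so the element order matches the first variant.
      val (onesFirstCount, onesFirstMs) = timed(ones.cartesian(twos).map(_.swap).count())

      println(s"twos x ones: $twosFirstCount pairs in $twosFirstMs ms")
      println(s"ones x twos: $onesFirstCount pairs in $onesFirstMs ms")
    } finally {
      sc.stop()
    }
  }
}
```

To test against the reported case, the two parallelize calls would be replaced with the sc.textFile calls from the example above. Both variants should count the same number of pairs; only the elapsed time is expected to differ.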
was:
A Spark SQL cross join takes far too long even though the left and right input
tables are both small text-format datasets.
The SQL is: select * from t1 cross join t2
t1 has 49 rows and three columns.
t2 has 1 row and a single column.
The query ran for more than 30 minutes and then failed.
Spark's CartesianRDD has the same problem. Example test code:
val ones = sc.textFile("file:///Users/moses/4paradigm/data/cartesian_data/t1b") // 1 line, 1 column
val twos = sc.textFile("file:///Users/moses/4paradigm/data/cartesian_data/t2b") // 49 lines, 3 columns
val cartesian = new CartesianRDD(sc, twos, ones)
cartesian.count()
This runs for more than 5 minutes, whereas CartesianRDD(sc, ones, twos) takes
less than 10 seconds.
> spark sql cross join running time too long
> --
>
> Key: SPARK-24928
> URL: https://issues.apache.org/jira/browse/SPARK-24928
> Project: Spark
> Issue Type: Bug
> Components: Optimizer
> Affects Versions: 1.6.2
> Reporter: LIFULONG
> Priority: Minor
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)