[jira] [Updated] (SPARK-24928) spark sql cross join running time too long

LIFULONG (JIRA) Thu, 26 Jul 2018 01:39:46 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-24928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


LIFULONG updated SPARK-24928:
-----------------------------
    Description: 
Spark SQL takes too long to run a cross join even though the left and right 
input tables are both small text-format data on HDFS.

The SQL is:  select * from t1 cross join t2

t1 has 499999 rows and three columns.

t2 has 1 row and only one column.

The query ran for more than 30 minutes and then failed.

 

 

Spark's CartesianRDD has the same problem. Example test code:

val ones = sc.textFile("hdfs://host:port/data/cartesian_data/t1b")  // 1 line, 1 column
val twos = sc.textFile("hdfs://host:port/data/cartesian_data/t2b")  // 499999 lines, 3 columns
val cartesian = new CartesianRDD(sc, twos, ones)

cartesian.count()

This runs for more than 5 minutes, while the same count with 
CartesianRDD(sc, ones, twos) takes less than 10 seconds.
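
One possible explanation for the asymmetry, offered as an assumption and not 
as a confirmed diagnosis: if the Cartesian product is computed as a nested 
loop in which the second (inner) input is re-read from its source for every 
element of the first (outer) input, then putting the 499999-line input first 
forces 499999 separate reads of the 1-line file. The sketch below simulates 
that cost model in plain Scala without Spark. CountingSource, cartesian, and 
run are hypothetical names introduced only for illustration.

```scala
// Minimal simulation (no Spark required) of a nested-loop Cartesian product.
// Assumption: the inner input is re-opened for every outer element, the way
// an uncached text-file input would be re-read from storage.
object CartesianCostSketch {

  /** Hypothetical data source that counts how many times it is opened. */
  final class CountingSource[T](data: Seq[T]) {
    var opens: Long = 0L
    def iterator: Iterator[T] = { opens += 1; data.iterator }
  }

  /** Nested-loop product: opens `inner` once per element of `outer`. */
  def cartesian[A, B](outer: CountingSource[A],
                      inner: CountingSource[B]): Iterator[(A, B)] =
    for (x <- outer.iterator; y <- inner.iterator) yield (x, y)

  /** Returns (number of pairs produced, number of times inner was opened). */
  def run(outerSize: Int, innerSize: Int): (Long, Long) = {
    val outer = new CountingSource(0 until outerSize)
    val inner = new CountingSource(0 until innerSize)
    val pairs = cartesian(outer, inner).size.toLong
    (pairs, inner.opens)
  }

  def main(args: Array[String]): Unit = {
    // Large input first, as in CartesianRDD(sc, twos, ones).
    val (slowPairs, slowOpens) = run(outerSize = 499999, innerSize = 1)
    // Small input first, as in CartesianRDD(sc, ones, twos).
    val (fastPairs, fastOpens) = run(outerSize = 1, innerSize = 499999)
    println(s"large outer: pairs=$slowPairs, inner opens=$slowOpens")
    println(s"small outer: pairs=$fastPairs, inner opens=$fastOpens")
  }
}
```

Under this model both orderings produce 499999 pairs, but the inner source is 
opened 499999 times when the large input comes first and only once when the 
small input comes first. If that is what happens here, caching the small 
input or placing it first would be the natural things to try.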

  was:
Spark SQL takes too long to run a cross join even though the left and right 
input tables are both small text-format data.

The SQL is:  select * from t1 cross join t2

t1 has 499999 rows and three columns.

t2 has 1 row and only one column.

The query ran for more than 30 minutes and then failed.

 

 

Spark's CartesianRDD has the same problem. Example test code:

val ones = sc.textFile("file:///Users/moses/4paradigm/data/cartesian_data/t1b")  // 1 line, 1 column
val twos = sc.textFile("file:///Users/moses/4paradigm/data/cartesian_data/t2b")  // 499999 lines, 3 columns
val cartesian = new CartesianRDD(sc, twos, ones)

cartesian.count()

This runs for more than 5 minutes, while the same count with 
CartesianRDD(sc, ones, twos) takes less than 10 seconds.


> spark sql cross join running time too long
> ------------------------------------------
>
>                 Key: SPARK-24928
>                 URL: https://issues.apache.org/jira/browse/SPARK-24928
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 1.6.2
>            Reporter: LIFULONG
>            Priority: Minor



