Antonio Piccolboni created SPARK-13169:
------------------------------------------

             Summary: CROSS JOIN slow or fails on tiny table
                 Key: SPARK-13169
                 URL: https://issues.apache.org/jira/browse/SPARK-13169
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.6.0
            Reporter: Antonio Piccolboni


I am running a cross join with a distinct select on both sides. Table is tiny 
(32 X 16). Running query through the thriftserver. Data is here 
(https://vincentarelbundock.github.io/Rdatasets/csv/datasets/mtcars.csv). Query 
never terminates before 200s, mostly fails (TTransportException) while all 
cores are being used (single machine).
Query is

{code}
SELECT `gwtlyrpywf`.`gear`,`gwtlyrpywf`.`cyl`,`vs` FROM (
           SELECT DISTINCT * FROM (
             SELECT `gear` AS `gear`, `cyl` AS `cyl`FROM `mtcars`) 
              AS `zzz1`)
         AS `gwtlyrpywf`
       CROSS JOIN (
           SELECT DISTINCT * FROM (
             SELECT `vs` AS `vs` FROM `mtcars`)
           AS `zzz3`)
       AS `arytvfispy`
{code}

I know it can be simplified, but it comes from a generator and the generator 
counts on the optimizer to do the right thing. EXPLAIN shows the following

{code}

plan
1                                                                               
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                     == Physical Plan ==
2                                                                               
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
          Project [gear#21,cyl#22,vs#23]
3                                                                               
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                     +- CartesianProduct
4                                                                               
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                        :- ConvertToSafe
5                                                                               
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                    :  +- 
TungstenAggregate(key=[gear#21,cyl#22], functions=[], output=[gear#21,cyl#22])
6                                                                               
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                    :     +- TungstenExchange 
hashpartitioning(gear#21,cyl#22,200), None
7                                                                               
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                              :        +- 
TungstenAggregate(key=[gear#21,cyl#22], functions=[], output=[gear#21,cyl#22])
8                                                                               
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                             :           +- 
Project [gear#17 AS gear#21,cyl#9 AS cyl#22]
9     :              +- Scan 
CsvRelation(<function0>,Some(/var/folders/_p/1gx4vy311_x4syn2xq6f2xtc0000gr/T//RtmpeDwNvS/file168c154ef10e),true,,,",null,#,FAILFAST,commons,false,false,false,StructType(StructField(mpg,DoubleType,true),
 StructField(cyl,DoubleType,true), StructField(disp,DoubleType,true), 
StructField(hp,DoubleType,true), StructField(drat,DoubleType,true), 
StructField(wt,DoubleType,true), StructField(qsec,DoubleType,true), 
StructField(vs,DoubleType,true), StructField(am,DoubleType,true), 
StructField(gear,DoubleType,true), 
StructField(carb,DoubleType,true)),true,null)[gear#17,cyl#9] 
10                                                                              
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                        +- ConvertToSafe
11                                                                              
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                         +- 
TungstenAggregate(key=[vs#23], functions=[], output=[vs#23])
12                                                                              
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                   +- 
TungstenExchange hashpartitioning(vs#23,200), None
13                                                                              
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                         +- 
TungstenAggregate(key=[vs#23], functions=[], output=[vs#23])
14                                                                              
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
             +- Project [vs#15 AS vs#23]
15                            +- Scan 
CsvRelation(<function0>,Some(/var/folders/_p/1gx4vy311_x4syn2xq6f2xtc0000gr/T//RtmpeDwNvS/file168c154ef10e),true,,,",null,#,FAILFAST,commons,false,false,false,StructType(StructField(mpg,DoubleType,true),
 StructField(cyl,DoubleType,true), StructField(disp,DoubleType,true), 
StructField(hp,DoubleType,true), StructField(drat,DoubleType,true), 
StructField(wt,DoubleType,true), StructField(qsec,DoubleType,true), 
StructField(vs,DoubleType,true), StructField(am,DoubleType,true), 
StructField(gear,DoubleType,true), 
StructField(carb,DoubleType,true)),true,null)[vs#15]
{code}

Thanks




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to