[jira] [Created] (SPARK-16827) Query with Join produces excessive shuffle data

Sital Kedia (JIRA) Sun, 31 Jul 2016 17:25:23 -0700

Sital Kedia created SPARK-16827:
-----------------------------------

             Summary: Query with Join produces excessive shuffle data
                 Key: SPARK-16827
                 URL: https://issues.apache.org/jira/browse/SPARK-16827
             Project: Spark
          Issue Type: Bug
          Components: Shuffle, Spark Core
    Affects Versions: 2.0.0
            Reporter: Sital Kedia



One of our Hive jobs looks like this:

 SELECT userid
   FROM table1 a
   JOIN table2 b
     ON a.ds = '2016-07-15'
    AND b.ds = '2016-07-15'
    AND a.source_id = b.id

After upgrading to Spark 2.0, the job is significantly slower. Digging a 
little into it, we found that one of the stages produces an excessive amount 
of shuffle data. Please note that this is a regression from Spark 1.6.



