Can you reproduce it on master?

I can't reproduce it with the following code:

>>> t2 = sqlContext.range(50).selectExpr("concat('A', id) as id")
>>> t1 = sqlContext.range(10).selectExpr("concat('A', id) as id")
>>> t1.join(t2).where(t1.id == t2.id).explain()
ShuffledHashJoin [id#21], [id#19], BuildRight
 TungstenExchange hashpartitioning(id#21,200)
  TungstenProject [concat(A,cast(id#20L as string)) AS id#21]
   Scan PhysicalRDD[id#20L]
 TungstenExchange hashpartitioning(id#19,200)
  TungstenProject [concat(A,cast(id#18L as string)) AS id#19]
   Scan PhysicalRDD[id#18L]

>>> t1.join(t2).where(t1.id == t2.id).count()
10
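
For completeness, here is a sketch of the same join with the two settings from
your report applied first (same sqlContext and DataFrames as above; the plan
should still be ShuffledHashJoin):

>>> sqlContext.setConf("spark.sql.planner.sortMergeJoin", "false")   # disable SortMergeJoin
>>> sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", "0")  # disable BroadcastHashJoin
>>> t1.join(t2).where(t1.id == t2.id).count()                        # expected: 10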


On Mon, Oct 19, 2015 at 2:59 AM, gsvic <victora...@gmail.com> wrote:
> Hi Hao,
>
> Each table is created with the following Python code snippet:
>
> from math import ceil
> from random import random
> import json
>
> data = [{'id': 'A%d' % i, 'value': ceil(random() * 10)} for i in range(0, 50)]
> with open('A.json', 'w+') as output:
>     json.dump(data, output)
>
> Tables A and B contain 10 and 50 tuples, respectively.
>
> In the Spark shell I set:
>
> sqlContext.setConf("spark.sql.planner.sortMergeJoin", "false") to disable
> SortMergeJoin, and
> sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", "0") to disable
> BroadcastHashJoin, because the tables are small enough that it would
> otherwise be selected.
>
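> Here t1 and t2 would be the DataFrames read back from the generated JSON
> files, e.g. (the file name B.json is an assumption):
>
> val t1 = sqlContext.read.json("A.json")
> val t2 = sqlContext.read.json("B.json")
>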
> Finally I run the following query:
> t1.join(t2).where(t1("id").equalTo(t2("id"))).count
>
> and the result I get equals zero, while BroadcastHashJoin and
> SortMergeJoin return the right result (10).
>
>
>
> --
> View this message in context: 
> http://apache-spark-developers-list.1001551.n3.nabble.com/ShuffledHashJoin-Possible-Issue-tp14672p14682.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
