Re: SPARK -SQL Understanding BroadcastNestedLoopJoin and number of partitions

2016-12-21 Thread David Hodeffi
Do you know who can I talk to about this code? I am rally curious to know why there is a join and why number of partition for join is the sum of both of them, I expected to see that number of partitions should be the same as the streamed table ,or worst case multiplied. Sent from my iPhone

SPARK -SQL Understanding BroadcastNestedLoopJoin and number of partitions

2016-12-21 Thread David Hodeffi
I have two dataframes which I am joining. small and big size dataframess. The optimizer suggest to use BroadcastNestedLoopJoin. number of partitions for the big Dataframe is 200 while small Dataframe has 5 partitions. The joined dataframe results with 205 partitions