[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-12-01 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 cc @wzhfy

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-12-01 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 Do we need to handle this scenario? Do we have any PR for handling this issue?

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-11-08 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 I think this issue should not be in the Improvement category; it should be a Critical bug, since it affects normal join query performance. I hope we address this issue. "Insert query flow
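
For readers new to the thread, the performance concern can be illustrated with a minimal sketch of a size-based join choice. This is not Spark's planner code: `choose_join` and its arguments are hypothetical names, and the only Spark detail assumed is that `spark.sql.autoBroadcastJoinThreshold` defaults to 10 MB and that a negative value disables broadcasting. It shows how an inflated size estimate for a small table can flip the plan away from a broadcast hash join.

```python
# Hypothetical illustration only; names below are not Spark APIs.
# Assumed default of spark.sql.autoBroadcastJoinThreshold: 10 MB.
DEFAULT_BROADCAST_THRESHOLD = 10 * 1024 * 1024


def choose_join(left_size_bytes, right_size_bytes,
                threshold=DEFAULT_BROADCAST_THRESHOLD):
    """Pick a join strategy from the estimated sizes of the two sides.

    A negative threshold disables broadcasting, so the sort merge
    join is always chosen in that case.
    """
    smaller_side = min(left_size_bytes, right_size_bytes)
    if threshold >= 0 and smaller_side <= threshold:
        return "broadcast hash join"
    return "sort merge join"


big_table = 5 * 1024 ** 3          # 5 GB fact table (made-up size)
accurate_small_table = 1024        # 1 KB, when statistics are recorded
inflated_small_table = 2 ** 63 - 1 # very large fallback estimate (illustrative)

print(choose_join(accurate_small_table, big_table))
print(choose_join(inflated_small_table, big_table))
```

With an accurate estimate the small table falls under the threshold; with the inflated estimate neither side does, so the sketch falls back to the sort merge join.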

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-11-01 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 Thanks for the comment, Sean. There are certain areas where I found inconsistencies; if I get some input from experts, I think I can update the PR, if we are planning to tackle this

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-11-01 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22758 I don't know this code well enough to review. I think there is skepticism from people who know this code about whether this change is correct and beneficial. If there's doubt, I think it should be

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-11-01 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 cc @srowen

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-11-01 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 @cloud-fan @HyukjinKwon @wangyum Any suggestions on this issue? Because of this defect we are facing performance issues in our customer environment. Requesting you all to please have a

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-10-28 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 @cloud-fan @HyukjinKwon @srowen As a result of my observations above: a) I have a doubt: if we expect the stats to estimate the data size from the files, then why in the

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-10-22 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 @cloud-fan Shall I update this PR based on the second approach? Will that be fine? I also tested with the second approach, and the use cases mentioned in this JIRA work fine.

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-10-18 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 > In order to make this flow consistent, either > a) we need to record HiveStats for the insert command flow and always consider these stats while computing > OR > b) As mentioned above in

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-10-18 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 In order to make this flow consistent, either a) we need to record HiveStats for the insert command flow and always consider these stats while computing OR b) As mentioned above in

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-10-18 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 > I think the cost of getting the stats from `HadoopFileSystem` may be quite high. Then we should always depend on HiveStats to get the statistics, which is what happens now as well, but

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-10-18 Thread wangyum
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/22758 I think the cost of getting the stats from `HadoopFileSystem` may be quite high.

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-10-18 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 @cloud-fan One solution I can think of: in the DetermineStats flow we can add one more condition to not update the stats for convertible relations, since we always get the stats from

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-10-18 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 @cloud-fan Please find my understanding of the flow below; it's a bit tricky :) Let's elaborate on this flow; we might get more suggestions. Step 1: insert command

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-10-17 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/22758 @srowen @cloud-fan @HyukjinKwon @felixcheung @wangyum I think this PR also solves the problem mentioned in SPARK-25403. Please review and give me any suggestions. Thanks all

[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22758 Can one of the admins verify this patch?
