[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-06-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Merged build finished. Test PASSed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-06-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78390/ Test PASSed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-06-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #78390 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78390/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-06-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #78390 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78390/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-06-21 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 @jiangxb1987 Sure.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-06-21 Thread jiangxb1987
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/16677 @viirya Could you please bring this up to date?

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-21 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 @watermen Are you sure `create table t1 using parquet select * from src limit 1000;` will invoke `CollectLimitExec.doExecute`? Seems that the writing is delegated to `FileFormatWriter`

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77079/ Test PASSed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Merged build finished. Test PASSed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #77079 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77079/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #77079 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77079/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-18 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 retest this please.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77054/ Test FAILed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Merged build finished. Test FAILed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #77054 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77054/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #77054 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77054/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Merged build finished. Test PASSed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76809/ Test PASSed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #76809 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76809/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #76809 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76809/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-10 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16677 @hvanhovell @cloud-fan We have seen the value of this PR in our customer scenarios, which is why we started a discussion on the dev list earlier. Thanks to @viirya for discussing it with us and implementing it.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-02 Thread watermen
Github user watermen commented on the issue: https://github.com/apache/spark/pull/16677 When we create a DataSource table like below: `create table t1 using parquet select * from src limit 1000;` It will call `CollectLimitExec.doExecute`, it also use

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Merged build finished. Test PASSed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75699/ Test PASSed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-04-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #75699 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75699/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-04-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #75699 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75699/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-04-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 retest this please.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75694/ Test FAILed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Merged build finished. Test FAILed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-04-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #75694 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75694/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-04-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 retest this please.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75688/ Test FAILed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Merged build finished. Test FAILed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-04-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #75688 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75688/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-04-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #75688 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75688/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-03-07 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 @hvanhovell Thanks for replying! This hadn't received feedback from committers for a long time, which is why I had that feeling. It is good to know you will take a look in the future. Take your time.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-03-07 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/16677 @viirya there is interest. I am just lacking the time; I will try to take a pass at some point in the next few weeks.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-03-07 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 Although this brings a good improvement to the `Limit` operation, it looks like there is not much interest from the committers...

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Merged build finished. Test PASSed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74086/ Test PASSed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-03-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #74086 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74086/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74082/ Test FAILed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Merged build finished. Test FAILed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-03-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #74086 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74086/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-03-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #74082 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74082/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-03-07 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 retest this please.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73662/ Test FAILed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Merged build finished. Test FAILed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #73662 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73662/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #73662 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73662/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73655/ Test FAILed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Merged build finished. Test FAILed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #73655 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73655/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #73655 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73655/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-28 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 retest this please.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Merged build finished. Test FAILed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #73518 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73518/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73518/ Test FAILed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #73518 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73518/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-22 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 ping @cloud-fan @rxin @hvanhovell Can you provide some feedback for this? Thanks.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-15 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 ping @cloud-fan Any suggestions on this? Thanks.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-09 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 @waterman But we need to know the number of output rows in each partition. `resultSize` is actually updated at the driver with the size of the data transmitted back to the driver. It is doable if we

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-08 Thread watermen
Github user watermen commented on the issue: https://github.com/apache/spark/pull/16677 @viirya We had better not modify the API. `TaskMetrics` already has `resultSize`, so we can add `resultNum` like it.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-08 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 @watermen Thanks for the review. What is the advantage of adding it in `TaskMetrics` instead of `MapStatus`?

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-08 Thread watermen
Github user watermen commented on the issue: https://github.com/apache/spark/pull/16677 @viirya LGTM, but can we add the statistics of the number of rows in `TaskMetrics` instead of `MapStatus`?

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-06 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 @sujith71955 Thanks for the test! The test numbers look promising!

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-06 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/16677 @viirya I tested the above-mentioned approach with sample data, and it improved performance by almost 3X. Please find the test report: Total No of Executors = 3 Total

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72373/ Test PASSed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Merged build finished. Test PASSed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #72373 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72373/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #72373 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72373/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-04 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 @cloud-fan Compared with the sizes of each block, we only send back an Int for each partition. I think the overhead should be very low.
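
Editor's illustration of the point above, not code from the PR: if the driver receives one row count per partition, a global limit can decide how many rows to read from each partition without first funnelling everything into a single partition. The function name `rows_to_take` and the greedy in-order policy are assumptions made for this sketch; the actual allocation strategy in the PR is not shown in this thread.

```python
from typing import List


def rows_to_take(partition_row_counts: List[int], limit: int) -> List[int]:
    """Hypothetical helper: decide how many rows to read from each partition.

    Walks the partitions in order and takes rows greedily until the global
    limit is satisfied. The input is one integer per partition, which mirrors
    the "one Int per partition" statistic discussed in the thread.
    """
    remaining = max(limit, 0)
    takes = []
    for count in partition_row_counts:
        take = min(count, remaining)  # never take more than the partition holds
        takes.append(take)
        remaining -= take
    return takes


if __name__ == "__main__":
    # Four partitions with known row counts and a global limit of 1000 rows.
    print(rows_to_take([400, 0, 350, 500], 1000))
```

With per-partition counts available, each partition can be truncated independently and in parallel; the extra state sent to the driver is a single integer per partition.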

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71977/ Test PASSed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Merged build finished. Test PASSed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #71977 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71977/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #71977 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71977/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-25 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 retest this please.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71973/ Test FAILed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Merged build finished. Test FAILed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71972/ Test FAILed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Merged build finished. Test FAILed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #71973 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71973/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #71972 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71972/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71955/ Test PASSed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Merged build finished. Test PASSed.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #71955 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71955/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #71955 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71955/testReport)** for PR 16677 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-24 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 also cc @cloud-fan and @hvanhovell