[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-10-05 Thread ScrapCodes
Github user ScrapCodes commented on the issue: https://github.com/apache/spark/pull/22339 Thank you @srowen and @steveloughran. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-10-04 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22339 Merged to master

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22339 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96887/ Test PASSed.

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-10-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22339 **[Test build #96887 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96887/testReport)** for PR 22339 at commit

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22339 Merged build finished. Test PASSed.

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22339 Merged build finished. Test PASSed.

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22339 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96885/ Test PASSed.

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-10-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22339 **[Test build #96885 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96885/testReport)** for PR 22339 at commit

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-10-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22339 **[Test build #96887 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96887/testReport)** for PR 22339 at commit

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22339 Merged build finished. Test PASSed.

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22339 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22339 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96886/ Test FAILed.

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-10-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22339 **[Test build #96886 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96886/testReport)** for PR 22339 at commit

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22339 Merged build finished. Test FAILed.

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-10-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22339 **[Test build #96886 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96886/testReport)** for PR 22339 at commit

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22339 Merged build finished. Test PASSed.

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22339 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-10-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22339 **[Test build #96885 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96885/testReport)** for PR 22339 at commit

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22339 Merged build finished. Test PASSed.

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22339 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22339 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96843/ Test FAILed.

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22339 Merged build finished. Test FAILed.

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-10-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22339 **[Test build #96843 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96843/testReport)** for PR 22339 at commit

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22339 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22339 Merged build finished. Test PASSed.

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-10-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22339 **[Test build #96843 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96843/testReport)** for PR 22339 at commit

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-10-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22339 Retest this please.

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-09-28 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/22339 No, no cost penalties. Slightly lower NameNode load too. If you had many, many Spark Streaming clients scanning directories, HDFS ops teams would eventually get upset. This will postpone the

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-09-28 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22339 Yeah, I agree; I was saying I do think it will speed things up. If it's a non-trivial win, it's worthwhile even if it isn't the last optimization here. Is there any downside to this?

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-09-28 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/22339 Why the speedups? They come from the glob filter calling getFileStatus() on every entry, which is 1-3 HTTP requests and a few hundred milliseconds per call, when instead that can be handled later.
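
The request-count effect described in this comment can be illustrated with a toy model. The sketch below is an assumption-laden illustration in Python, not the patch itself and not the Hadoop `FileSystem` API: `CountingStore`, `eager_scan`, `deferred_scan`, and `make_entries` are hypothetical names, a listing is assumed to cost one request, and each `get_file_status` call is assumed to cost 2 requests (the comment gives a range of 1-3).

```python
from fnmatch import fnmatch


class CountingStore:
    """Toy object store that counts simulated remote requests (hypothetical)."""

    def __init__(self, entries):
        # entries maps path -> True if the path is a directory
        self.entries = dict(entries)
        self.requests = 0

    def list_status(self, prefix):
        # Assumption: one request returns every entry with its metadata.
        self.requests += 1
        return [(p, d) for p, d in sorted(self.entries.items())
                if p.startswith(prefix)]

    def get_file_status(self, path):
        # Assumption: 2 requests per call (the thread says 1-3).
        self.requests += 2
        return (path, self.entries[path])


def eager_scan(store, prefix, pattern):
    """Probe every listed entry with get_file_status while filtering."""
    matches = []
    for path, _ in store.list_status(prefix):
        _, is_dir = store.get_file_status(path)
        if not is_dir and fnmatch(path, pattern):
            matches.append(path)
    return matches


def deferred_scan(store, prefix, pattern):
    """Filter using the metadata already returned by the listing."""
    return [p for p, is_dir in store.list_status(prefix)
            if not is_dir and fnmatch(p, pattern)]


def make_entries(n_files, n_dirs):
    """Build a synthetic directory with n_files files and n_dirs directories."""
    entries = {f"data/part-{i:03d}.csv": False for i in range(n_files)}
    entries.update({f"data/dir-{i:03d}": True for i in range(n_dirs)})
    return entries
```

With 40 files and 10 directories, `eager_scan` in this model issues 1 + 50 × 2 = 101 simulated requests and `deferred_scan` issues 1, while both return the same 40 paths. These counts are properties of the toy model only; they are not the measurements reported elsewhere in this thread.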

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-09-28 Thread ScrapCodes
Github user ScrapCodes commented on the issue: https://github.com/apache/spark/pull/22339 Hi @srowen, would you like to take a look? Is there anything I can do if this patch is missing something? I have tested it thoroughly against an object store.

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-09-26 Thread ScrapCodes
Github user ScrapCodes commented on the issue: https://github.com/apache/spark/pull/22339 For numbers: while testing with an object store holding 50 files/dirs, it took 130 REST requests for 2 batches to complete without this patch and 56 REST requests with it. So number

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22339 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96047/ Test PASSed.

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22339 Merged build finished. Test PASSed.

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-09-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22339 **[Test build #96047 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96047/testReport)** for PR 22339 at commit

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-09-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22339 **[Test build #96047 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96047/testReport)** for PR 22339 at commit

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22339 Merged build finished. Test PASSed.

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22339 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark issue #22339: [SPARK-17159][STREAM] Significant speed up for running s...

2018-09-13 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22339 Retest this please.