[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2017-04-24 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 Reynold, I know very much about the time of reviewers, I put 1+h a day on the hadoop codebase reviewing stuff, generally trying to review the work of non-colleagues, so as to pull in the

2017-04-24 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14731 Steve I think the main point is you should also respect the time of reviewers. The way most of your pull requests manifest have been suboptimal: they often start with a very early WIP (which is not

2017-04-24 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 Ok. what is the way? Do I write a formal proposal? Because right now there is no reliable way to get the full dependency graph of Spark + hadoop cloud JARs + direct cloud provider

2017-04-08 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 @srowen anything else I need to do here?

2017-03-29 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 Is there anything else I need to do here?

2017-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74990/

2017-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #74990 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74990/testReport)** for PR 14731 at commit

2017-03-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #74990 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74990/testReport)** for PR 14731 at commit

2017-03-20 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 Any more comments?

2017-03-10 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 The Hadoop FS Spec has now been updated to declare exactly what HDFS does w.r.t timestamps, and warn that what other filesystems and object stores do are implementation and installation

2017-03-02 Thread uncleGen
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/14731 @srowen Waiting for your final OK

2017-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73434/

2017-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.

2017-02-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #73434 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73434/testReport)** for PR 14731 at commit

2017-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.

2017-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73433/

2017-02-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #73433 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73433/testReport)** for PR 14731 at commit

2017-02-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #73434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73434/testReport)** for PR 14731 at commit

2017-02-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #73433 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73433/testReport)** for PR 14731 at commit

2017-02-24 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 @uncleGen: reviewed this, tweaked the docs slightly but otherwise, there's nothing left to do that I can see

2017-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71866/

2017-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.

2017-01-23 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #71866 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71866/testReport)** for PR 14731 at commit

2017-01-23 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 @uncleGen I've updated it. Note that [HADOOP-13946](https://issues.apache.org/jira/browse/HADOOP-13946) tracks the changes in the Hadoop docs, which writes down what HDFS actually does, then

2017-01-23 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #71866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71866/testReport)** for PR 14731 at commit

2017-01-23 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 let me do a quick review & update

2017-01-21 Thread uncleGen
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/14731 @steveloughran Are you still working on this?

2017-01-04 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 Sean, I think I've managed to delete the lines where you were asking about globs > Am I right that the net change here is not an optimization but an expansion of the behavior to

2017-01-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.

2017-01-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70819/

2017-01-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #70819 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70819/testReport)** for PR 14731 at commit

2017-01-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #70819 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70819/testReport)** for PR 14731 at commit

2016-10-14 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 @srowen have you got any comments on the last patch?

2016-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66656/

2016-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #66656 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66656/consoleFull)** for PR 14731 at commit

2016-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.

2016-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #66656 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66656/consoleFull)** for PR 14731 at commit

2016-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65592/

2016-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.

2016-09-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #65592 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65592/consoleFull)** for PR 14731 at commit

2016-09-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #65592 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65592/consoleFull)** for PR 14731 at commit

2016-09-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.

2016-09-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65498/

2016-09-16 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #65498 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65498/consoleFull)** for PR 14731 at commit

2016-09-16 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #65498 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65498/consoleFull)** for PR 14731 at commit

2016-09-01 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 The latest patch pulls out the shortcutting of the globStatus call if there's no wildcard chars in the path; closer to the original patch

2016-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.

2016-08-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64662/

2016-08-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64662 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64662/consoleFull)** for PR 14731 at commit

2016-08-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64662 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64662/consoleFull)** for PR 14731 at commit

2016-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64534/

2016-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test FAILed.

2016-08-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64534 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64534/consoleFull)** for PR 14731 at commit

2016-08-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64534 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64534/consoleFull)** for PR 14731 at commit

2016-08-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.

2016-08-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64488/

2016-08-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64488 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64488/consoleFull)** for PR 14731 at commit

2016-08-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64488 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64488/consoleFull)** for PR 14731 at commit

2016-08-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64486 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64486/consoleFull)** for PR 14731 at commit

2016-08-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test FAILed.

2016-08-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64486 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64486/consoleFull)** for PR 14731 at commit

2016-08-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64486/

2016-08-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test FAILed.

2016-08-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64368/

2016-08-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64368 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64368/consoleFull)** for PR 14731 at commit

2016-08-24 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 Having looked at the source code, `FileSystem.globStatus()` uses the glob patterns, which are not the same as the posix regexp ones.
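A quick way to see the difference between the two syntaxes is the Python sketch below. It uses the standard-library `fnmatch` module only as a stand-in for glob matching; that substitution is an assumption made for illustration, since Hadoop's glob syntax also has features, such as `{a,b}` alternation, that `fnmatch` does not support.

```python
import fnmatch
import re


def glob_matches(name: str, pattern: str) -> bool:
    # Glob semantics (approximated with fnmatch): "*" is "any run of
    # characters" and "." is a literal dot.
    return fnmatch.fnmatchcase(name, pattern)


def regex_matches(name: str, pattern: str) -> bool:
    # Regex semantics: "." is "any single character" and "*" is a
    # quantifier that must follow something it can repeat.
    return re.fullmatch(pattern, name) is not None


def is_valid_regex(pattern: str) -> bool:
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


if __name__ == "__main__":
    print(glob_matches("part-0001.txt", "*.txt"))  # True
    print(is_valid_regex("*.txt"))                 # False: nothing to repeat
    print(glob_matches("dataXtxt", "data.txt"))    # False: "." is literal
    print(regex_matches("dataXtxt", "data.txt"))   # True: "." is any char
```

The same string can therefore be a valid glob and an invalid regex, or match different names under each interpretation.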

2016-08-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64368/consoleFull)** for PR 14731 at commit

2016-08-23 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 The logic has got complex enough that it merits unit tests. Pulling into SparkHadoopUtils itself and writing some for the possible: simple, glob matches one, glob matches 1+, glob doesn't match,
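The cases named in that comment (the list is truncated in this archive, so there may have been more) could be written as table-driven checks along these lines. This is a hypothetical Python sketch, not the tests from the PR: `resolve`, `has_glob_chars` and the in-memory `DIRS` set are illustrative names, the set of glob metacharacters is an assumption, and `fnmatch` stands in for Hadoop glob matching, applied one path component at a time so that `*` never spans a `/`.

```python
import fnmatch

# Assumed glob metacharacters; a path with none of them is taken literally.
GLOB_CHARS = frozenset("{}[]*?\\")


def has_glob_chars(path: str) -> bool:
    return any(c in GLOB_CHARS for c in path)


def resolve(pattern: str, existing: set) -> list:
    """Hypothetical resolver over an in-memory set of directory paths."""
    if not has_glob_chars(pattern):
        # Simple path: no pattern matching, just an existence check.
        return [pattern] if pattern in existing else []
    parts = pattern.strip("/").split("/")
    result = []
    for path in sorted(existing):
        comps = path.strip("/").split("/")
        # Match component by component so "*" never crosses a "/".
        if len(comps) == len(parts) and all(
            fnmatch.fnmatchcase(c, p) for c, p in zip(comps, parts)
        ):
            result.append(path)
    return result


DIRS = {"/logs/2016-08-20", "/logs/2016-08-21", "/tmp/scratch"}

# (description, pattern, expected) for the four visible cases.
CASES = [
    ("simple", "/tmp/scratch", ["/tmp/scratch"]),
    ("glob matches one", "/logs/*-20", ["/logs/2016-08-20"]),
    ("glob matches 1+", "/logs/2016-08-*",
     ["/logs/2016-08-20", "/logs/2016-08-21"]),
    ("glob doesn't match", "/logs/2017-*", []),
]

for desc, pattern, expected in CASES:
    actual = resolve(pattern, DIRS)
    assert actual == expected, f"{desc}: {actual} != {expected}"
```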

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.

2016-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64296/

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64296 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64296/consoleFull)** for PR 14731 at commit

2016-08-23 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 1. updated the code to bypass the glob routine when there is no wildcard; this bypasses something fairly inefficient. 1. reporting FNFE on that base dir differently; skip the stack trace
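Both changes described in that comment can be sketched as follows. This is a hypothetical Python illustration, not the patch itself: `list_candidate_files`, `list_status` and `glob_then_list` are made-up names for a direct directory listing and a glob-expand-then-list step, and the set of wildcard characters is an assumption.

```python
import logging
from typing import Callable, List

log = logging.getLogger("file-scan-sketch")

# Assumed set of glob metacharacters; a path with none of them is
# treated as a literal directory path.
GLOB_CHARS = frozenset("{}[]*?\\")


def has_glob_chars(path: str) -> bool:
    return any(c in GLOB_CHARS for c in path)


def list_candidate_files(
    directory: str,
    list_status: Callable[[str], List[str]],
    glob_then_list: Callable[[str], List[str]],
) -> List[str]:
    try:
        if has_glob_chars(directory):
            # Only pay for pattern expansion when there is a pattern.
            return glob_then_list(directory)
        # No wildcards: list the directory directly, skipping the glob step.
        return list_status(directory)
    except FileNotFoundError:
        # A missing base directory becomes a one-line warning with no
        # stack trace, and is treated as "no new files".
        log.warning("Directory %s not found; no new files", directory)
        return []
```

Passing the two filesystem operations in as functions keeps the sketch independent of any real filesystem client and makes it easy to count calls in a test.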

2016-08-23 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64296/consoleFull)** for PR 14731 at commit

2016-08-23 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 I've now done the [s3a streaming test/example](https://github.com/steveloughran/spark/blob/features/SPARK-7481-cloud/cloud/src/main/scala/org/apache/spark/cloud/s3/examples/S3Streaming.scala)

2016-08-23 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 Actually, I've just noticed that DStream behaviour isn't in sync with the streaming programming guide, which says "files written in nested directories not supported)". That is: SPARK-14796

2016-08-23 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 LGTM. I was trying to see if there was a way to create a good test here by triggering the takes-too-long codepath and having a counter, but there's no obvious way to do that

2016-08-23 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14731 This is ready to go right @steveloughran ? LGTM

2016-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.

2016-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64156 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64156/consoleFull)** for PR 14731 at commit

2016-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64156/

2016-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64156 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64156/consoleFull)** for PR 14731 at commit

2016-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.

2016-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64142/

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2016-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64142 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64142/consoleFull)** for PR 14731 at commit

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2016-08-20 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14731 Ah right, you already have the modification time for free. Sounds good, remove the caching.

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2016-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64142 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64142/consoleFull)** for PR 14731 at commit

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2016-08-20 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 To be precise: the caching of file modification times is superfluous. It's there to avoid the cost of executing `getFileStatus()` on previously scanned files. Once you use the FileStatus
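The point about modification times can be illustrated with a small sketch. It is not Spark's `FileInputDStream` code: `CountingFs`, `FileStatus`, and both helper functions are hypothetical stand-ins, loosely modelled on Hadoop's `FileSystem.listStatus()`, `FileSystem.getFileStatus()` and `FileStatus.getModificationTime()`, and every method call is assumed to cost one remote metadata request. A path-based filter only sees the path, so it needs a `getFileStatus()`-style lookup (hence the cache), whereas filtering the statuses the listing already returned needs no extra calls.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FileStatus:
    """Hypothetical stand-in for Hadoop's FileStatus: a path plus its mtime."""
    path: str
    modification_time: int


class CountingFs:
    """Hypothetical in-memory file system that counts metadata calls.

    Each call is assumed to be one remote request (e.g. on an object store).
    """

    def __init__(self, files: Dict[str, int]):
        self.files = files  # path -> modification time in ms
        self.calls = 0

    def list_status(self, directory: str) -> List[FileStatus]:
        self.calls += 1  # one listing call returns every entry with its mtime
        return [FileStatus(p, t) for p, t in sorted(self.files.items())
                if p.startswith(directory + "/")]

    def get_file_status(self, path: str) -> FileStatus:
        self.calls += 1  # one extra call per looked-up file
        return FileStatus(path, self.files[path])


def new_files_via_path_filter(fs, directory, threshold, mtime_cache):
    """Path-filter shape: the filter sees only a path, so it must look up the
    mtime, caching it to avoid repeating the lookup on later scans."""
    def accept(path: str) -> bool:
        if path not in mtime_cache:
            mtime_cache[path] = fs.get_file_status(path).modification_time
        return mtime_cache[path] > threshold

    paths = [s.path for s in fs.list_status(directory)]
    return [p for p in paths if accept(p)]


def new_files_via_status(fs, directory, threshold):
    """Status shape: filter the FileStatus entries the listing already
    returned, so no per-file lookup and no cache are needed."""
    return [s.path for s in fs.list_status(directory)
            if s.modification_time > threshold]


files = {"/in/a": 100, "/in/b": 200, "/in/c": 300}
fs = CountingFs(files)
print(new_files_via_path_filter(fs, "/in", 150, {}), fs.calls)
fs2 = CountingFs(files)
print(new_files_via_status(fs2, "/in", 150), fs2.calls)
```

With three files and a cold cache, the path-filter version makes four metadata calls and the status-based version makes one; both return `['/in/b', '/in/c']`.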

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2016-08-20 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14731 Why is the caching superfluous -- because no file is evaluated more than once here?

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2016-08-20 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 I'm going to scan through and tune them elsewhere; really, I'm going by uses of the listFiles calls. There's actually no significant use elsewhere that I can see; just a couple of

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2016-08-20 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14731 LGTM. Does this sort of change make sense elsewhere where `PathFilter` is used? I glanced at the others and it looked like a wash in other cases.

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2016-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2016-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64140/ Test PASSed.

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2016-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64140 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64140/consoleFull)** for PR 14731 at commit

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2016-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64140 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64140/consoleFull)** for PR 14731 at commit