Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/14731
Reynold, I know very much about the time of reviewers, I put 1+h a day on
the hadoop codebase reviewing stuff, generally trying to review the work of
non-colleagues, so as to pull in the
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/14731
Steve I think the main point is you should also respect the time of
reviewers. The way most of your pull requests manifest has been suboptimal:
they often start with a very early WIP (which is not
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/14731
Ok. what is the way? Do I write a formal proposal?
Because right now there is no reliable way to get the full dependency graph
of Spark + hadoop cloud JARs + direct cloud provider
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/14731
@srowen anything else I need to do here?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/14731
Is there anything else I need to do here?
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74990/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #74990 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74990/testReport)**
for PR 14731 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #74990 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74990/testReport)**
for PR 14731 at commit
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/14731
Any more comments?
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/14731
The Hadoop FS Spec has now been updated to declare exactly what HDFS does
w.r.t timestamps, and warn that what other filesystems and object stores do are
implementation and installation
Github user uncleGen commented on the issue:
https://github.com/apache/spark/pull/14731
@srowen Waiting for your final OK
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73434/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #73434 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73434/testReport)**
for PR 14731 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73433/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #73433 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73433/testReport)**
for PR 14731 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #73434 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73434/testReport)**
for PR 14731 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #73433 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73433/testReport)**
for PR 14731 at commit
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/14731
@uncleGen: reviewed this, tweaked the docs slightly but otherwise, there's
nothing left to do that I can see
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71866/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #71866 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71866/testReport)**
for PR 14731 at commit
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/14731
@uncleGen I've updated it. Note that
[HADOOP-13946](https://issues.apache.org/jira/browse/HADOOP-13946) tracks the
changes in the Hadoop docs, which writes down what HDFS actually does, then
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #71866 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71866/testReport)**
for PR 14731 at commit
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/14731
let me do a quick review & update
Github user uncleGen commented on the issue:
https://github.com/apache/spark/pull/14731
@steveloughran Are you still working on this?
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/14731
Sean, I think I've managed to delete the lines where you were asking about
globs
> Am I right that the net change here is not an optimization but an
expansion of the behavior to
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70819/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #70819 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70819/testReport)**
for PR 14731 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #70819 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70819/testReport)**
for PR 14731 at commit
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/14731
@srowen have you got any comments on the last patch?
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66656/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #66656 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66656/consoleFull)**
for PR 14731 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #66656 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66656/consoleFull)**
for PR 14731 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65592/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #65592 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65592/consoleFull)**
for PR 14731 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #65592 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65592/consoleFull)**
for PR 14731 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65498/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #65498 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65498/consoleFull)**
for PR 14731 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #65498 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65498/consoleFull)**
for PR 14731 at commit
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/14731
The latest patch pulls out the shortcutting of the globStatus call if
there's no wildcard chars in the path; closer to the original patch
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64662/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #64662 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64662/consoleFull)**
for PR 14731 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #64662 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64662/consoleFull)**
for PR 14731 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64534/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #64534 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64534/consoleFull)**
for PR 14731 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #64534 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64534/consoleFull)**
for PR 14731 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64488/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #64488 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64488/consoleFull)**
for PR 14731 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #64488 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64488/consoleFull)**
for PR 14731 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #64486 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64486/consoleFull)**
for PR 14731 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #64486 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64486/consoleFull)**
for PR 14731 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64486/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64368/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #64368 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64368/consoleFull)**
for PR 14731 at commit
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/14731
Having looked at the source code, `FileSystem.globStatus()` uses the glob
patterns, which are not the same as the POSIX regexp ones.
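To illustrate the glob-versus-regexp distinction, here is a minimal sketch. It is an assumption-laden stand-in: it uses the JDK's own `PathMatcher` glob syntax and `java.util.regex` rather than Hadoop's glob implementation or POSIX regular expressions, and the class `GlobVsRegex` and its methods are hypothetical names. It only shows that the same pattern text can mean different things under the two syntaxes.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper: JDK globs and java.util.regex stand in for
// Hadoop globs and POSIX regexps, which are NOT identical to these.
final class GlobVsRegex {
    /** Matches a file name against a pattern in the given syntax ("glob" or "regex"). */
    static boolean matches(String syntax, String pattern, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher(syntax + ":" + pattern);
        return matcher.matches(Paths.get(fileName));
    }

    /** Returns true when the text compiles as a java.util.regex pattern. */
    static boolean isValidRegex(String pattern) {
        try {
            Pattern.compile(pattern);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // As a glob, "*" means "any run of characters within one path element".
        System.out.println(matches("glob", "*.txt", "a.txt"));
        // As a java.util.regex pattern, a leading "*" has nothing to repeat.
        System.out.println(isValidRegex("*.txt"));
        // In a glob, "." is a literal dot; in a regex it matches any character.
        System.out.println(matches("glob", "a.txt", "abtxt"));
        System.out.println(matches("regex", "a.txt", "abtxt"));
    }
}
```

The practical consequence is that a path string should not be run through a regexp engine to decide what a glob call would match.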
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #64368 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64368/consoleFull)**
for PR 14731 at commit
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/14731
The logic has got complex enough it merits unit tests. Pulling into
SparkHadoopUtils itself and writing some for the possible: simple, glob matches
one, glob matches 1+, glob doesn't match,
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64296/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #64296 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64296/consoleFull)**
for PR 14731 at commit
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/14731
1. updated the code to bypass the glob routine when there is no wildcard;
this bypasses something fairly inefficient.
1. reporting FNFE on that base dir differently; skip the stack trace
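The wildcard bypass in the first item can be sketched roughly as follows. This is not the code from the patch: the class `GlobShortcut`, its method names, and the exact set of characters treated as glob metacharacters are assumptions made for illustration, and the sketch only decides which lookup to use rather than calling any Hadoop API.

```java
// Hypothetical sketch: decide whether a path needs glob expansion at all.
final class GlobShortcut {
    // Characters treated as glob metacharacters here (an assumption).
    private static final String GLOB_CHARS = "*?[]{}\\";

    /** Returns true when the path contains at least one glob metacharacter. */
    static boolean hasWildcard(String path) {
        for (int i = 0; i < path.length(); i++) {
            if (GLOB_CHARS.indexOf(path.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    /** Names the lookup a caller would pick: full glob expansion or a direct status call. */
    static String chooseLookup(String path) {
        return hasWildcard(path) ? "glob" : "direct";
    }

    public static void main(String[] args) {
        String[] paths = {"s3a://bucket/logs", "s3a://bucket/logs/2016-*/part-?"};
        for (String p : paths) {
            System.out.println(p + " -> " + chooseLookup(p));
        }
    }
}
```

A plain string scan costs nothing compared with a glob expansion, which can involve several round trips against an object store, so the check pays for itself whenever the common case is a literal directory path.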
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #64296 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64296/consoleFull)**
for PR 14731 at commit
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/14731
I've now done the [s3a streaming
test/example](https://github.com/steveloughran/spark/blob/features/SPARK-7481-cloud/cloud/src/main/scala/org/apache/spark/cloud/s3/examples/S3Streaming.scala)
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/14731
Actually, I've just noticed that DStream behaviour isn't in sync with the
streaming programming guide, which says "files written in nested directories
not supported)". That is: SPARK-14796
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/14731
LGTM. I was trying to see if there was a way to create a good test here by
triggering the takes-too-long codepath and having a counter, but there's no
obvious way to do that
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14731
This is ready to go right @steveloughran ? LGTM
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #64156 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64156/consoleFull)**
for PR 14731 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64156/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #64156 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64156/consoleFull)**
for PR 14731 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64142/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #64142 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64142/consoleFull)**
for PR 14731 at commit
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14731
Ah right, you already have the modification time for free. Sounds good,
remove the caching.
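The point about getting the modification time "for free" can be sketched with the JDK alone. This is an analogy, not the Spark or Hadoop code: `java.nio.file.Files.find` stands in for a Hadoop directory listing that returns status objects, and the class `ListingWithAttributes` and method `modifiedSince` are hypothetical names. The filter receives each entry's attributes along with its path, so there is no second per-file lookup and nothing worth caching.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: filter on attributes delivered with the listing itself.
final class ListingWithAttributes {
    /**
     * Lists regular files directly under dir modified at or after sinceMillis.
     * The attributes arrive with each entry, so no per-file status call or cache is needed.
     */
    static List<Path> modifiedSince(Path dir, long sinceMillis) throws IOException {
        try (Stream<Path> found = Files.find(dir, 1,
                (path, attrs) -> attrs.isRegularFile()
                        && attrs.lastModifiedTime().toMillis() >= sinceMillis)) {
            return found.sorted().collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Directory to scan; defaults to the working directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        long oneHourAgo = System.currentTimeMillis() - 3_600_000L;
        for (Path p : modifiedSince(dir, oneHourAgo)) {
            System.out.println(p);
        }
    }
}
```

A filter that sees only the path would have to fetch the status of every file itself, which is the cost a modification-time cache tries to hide.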
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #64142 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64142/consoleFull)**
for PR 14731 at commit
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/14731
to be precise: the caching of file modification times is superfluous. It's
there to avoid the cost of executing `getFileStatus()` on previously scanned
files. Once you use the FileStatus
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14731
Why is the caching superfluous -- because no file is evaluated more than
once here?
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/14731
# I'm going to scan through and tune them elsewhere; really I'm going by
uses of the listFiles calls
There's actually no significant use elsewhere that I can see; just a couple
of
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14731
LGTM. Does this sort of change make sense elsewhere where `PathFilter` is
used? I glanced at the others and it looked like a wash in other cases.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14731
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64140/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #64140 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64140/consoleFull)**
for PR 14731 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14731
**[Test build #64140 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64140/consoleFull)**
for PR 14731 at commit