cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-729033449
@HeartSaVioR @maropu
Unfortunately, I don't have time to work further on this right now. If one
of you two would like to pick up remaining work in a subsequent PR, please
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-725771277
> The problem is feedback cycle, not whether @cchighman is busy or not. We
are requiring contributors for multiple months to keep on focus, whereas
reviewers don't promise any
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-725767515
@maropu
Thank you for your feedback. I will finish the merge pieces this evening.
If one of you two would like to pick up any remaining effort, please feel free
to do so
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-725766878
My ti
This is an automated message from the Apache Git Service.
To respond to the message, please log on to Gi
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-720961219
> Sigh. Now it has conflicts so I can't go with giving +1 and merging.
>
> @cchighman Could you please fix the conflicts? It would be totally OK if
you'd like to let me
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-708213623
> ping @cchighman
Thank you for checking in. Working on this further tonight.
This is an automated mes
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-704426096
> @cchighman Gentle ping in case notice wasn't sent for my review comments.
I just noticed this. Thank you. I will resolve this evening.
-
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-697913023
@gengliangwang @cloud-fan @zsxwing @maropu @HyukjinKwon @HeartSaVioR
@Dooyoung-Hwang
Gentle ping. I summarized key feedback above from the review on this PR to
help organ
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-695165544
> BTW, welcome to the Apache Spark community and thank you for your first
contribution, @cchighman .
@Dooyoung-Hwang
Thank you very much! Hopefully after compiling
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-695164039
> I roughly checked the code and I think you need to brush up more by
referring the other code and the style doc. Could you? The proposed idea itself
looks good and I think it
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-695161492
> Shall we use `modificationTimeFilter`? the word `date` is not accurate
since it can be referred to date without any hour/minute/seconds
Semantics for the new options w
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-695161242
> @cchighman
> Thanks for understanding. I still think even in SS case we want to provide
timestamp to effectively filter out files - the point is whether the timestamp
is
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-695160233
> These timestamps should be inclusive or exclusive? e.g., `(modifiedAfter,
modifiedBefore)`, `[modifiedAfter, modifiedBefore]`... Looks like the current
code follows `(modifi
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-695159898
> Should we specify the timezone here? What should the default timezone be?
> @cloud-fan @MaxGekk
@gengliangwang's comment here drew quick attention to the fact timezo
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-695158666
> > I haven't looked into the detail yet, but Hadoop FS and a PathFilter
API, does Spark file source support the path filter option?
>
> Hadoop's PathFilter only gives t
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-695158320
> > The need arises since we only have one filtering strategy where
pathGlobFilter is used in PartitioningAwareFileIndex and handling cleanly if we
now have more than one appr
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-695158061
> `f` here is `FileStatus`. It looks inefficient to get the `FileStatus`
again in
https://github.com/apache/spark/pull/28841/files#diff-3c816e65d0bbdf16d46c7ea4e0b8cbaeR55
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-695157152
> @cchighman I like the idea of such filtering.
> How about we follow the approach of the option `pathGlobFilter`:
https://github.com/apache/spark/pull/24518/files#diff-e458
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-695155611
@gengliangwang @cloud-fan @maropu @Dooyoung-Hwang @HeartSaVioR @HyukjinKwon
Following up on this PR for next steps. We've seen a strong amount of
constructive feedback dur
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-692617087
@HeartSaVioR @maropu @Dooyoung-Hwang @cloud-fan @gengliangwang
Following up as last feedback was 15 days ago, anything else remaining for
this one?
--
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-690007273
@maropu @HeartSaVioR @gengliangwang @cloud-fan @Dooyoung-Hwang
Gentle ping. Thank you for your time!
This
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-687671088
Are we looking good ?
This is an automated message from the Apache Git Service.
To respond to the message, ple
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-683336845
@maropu @HeartSaVioR @cloud-fan @gengliangwang @dongjoon-hyun
Gentle ping to confirm whether all looks good. Thanks!
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-681769930
@maropu
I took some time this evening to add the additional tests and refactoring
you requested.
@HeartSaVioR
Gentle ping for any further consideration as well.
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-679991891
@maropu All requested improvements have been completed. Thank you for the
opportunity to follow and learn from your analytical process.
-
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-678908616
I intend to update the PR based on comments, I'll try to swing around to it
this evening.
This is an automate
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-673165268
@HeartSaVioR @maropu @gengliangwang @cloud-fan
There was some great feedback provided by maropu that's now completed with
all tests passing.
At this point, nothing
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-672566232
@maropu
Everything should be updated as requested. I appreciate your thorough
review and believe we arrived at a better result in the process.
-
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-672520195
@maropu
You will be very pleased to hear that I have discovered the maven task for
scalafmt!
This is an a
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-671990105
> Ah, one more; could you add fine-grained tests for the path filter
implementations? Probably, you can create a dedicated test suite like
`test/scala/org/apache/spark/sql/exe
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-668096119
> Anyway, thanks for the valable work, @cchighman !
I appreciate all your feedback and will address each comment.
--
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-667501680
@HeartSaVioR @gengliangwang @cloud-fan
Gentle ping to see if everything looks good. I feel like the final design
after incorporating feedback worked out really well here.
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-85340
There are quite a bit of commits here. Would it be preferred if I re-opened
this PR for a cleaner merge to master?
--
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-666133472
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub an
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-664699010
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub an
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-664140348
sparkr passed this time too.
This is an automated message from the Apache Git Service.
To respond to the mes
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-664140210
Hmm...doubt this is related. Doesn't happen locally either.
```
==
FAIL: test_train_prediction
cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-664115981
> Test failure looks related.
I'm a little baffled on the failures from SparkR for SparkSQL Arrow
optimization failing. Hoping they clear up with the sql tests resolvin
38 matches
Mail list logo