[ https://issues.apache.org/jira/browse/CRUNCH-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772440#comment-16772440 ]
Andrew Olson commented on CRUNCH-678:
-------------------------------------
Here's a pull request: https://github.com/apache/crunch/pull/18
> Avoid unnecessary retrieval of last modified time
> -------------------------------------------------
>
> Key: CRUNCH-678
> URL: https://issues.apache.org/jira/browse/CRUNCH-678
> Project: Crunch
> Issue Type: Improvement
> Components: Core
> Reporter: Andrew Olson
> Assignee: Josh Wills
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> There is no assurance that the last modified time can be retrieved
> efficiently for all file systems. In particular, with object stores and large
> data sets it could be very slow. Since this information is actually not
> always needed, we should only retrieve it when necessary (i.e. when the write
> mode is checkpoint) for sources and targets.
> CRUNCH-658 expressed similar concerns for the getSize method. This would be a
> simpler and safer optimization to make.
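
For illustration only, the idea in the description (look up the last modified time only when the write mode is checkpoint) could be sketched roughly as follows. This is not the code from the pull request. The class name, the lastModifiedIfNeeded helper, and the UNKNOWN sentinel are hypothetical, and the WriteMode enum is redefined locally so the sketch has no Hadoop or Crunch dependency.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

public class LazyLastModified {
    // Local stand-in for the write modes; defined here so the sketch is self-contained.
    public enum WriteMode { DEFAULT, OVERWRITE, APPEND, CHECKPOINT }

    // Hypothetical sentinel meaning "last modified time was not retrieved".
    public static final long UNKNOWN = -1L;

    // Invokes the (potentially slow) lookup only when the write mode is CHECKPOINT.
    public static long lastModifiedIfNeeded(WriteMode mode, LongSupplier lookup) {
        return mode == WriteMode.CHECKPOINT ? lookup.getAsLong() : UNKNOWN;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Stand-in for an expensive file system or object store call; the value is arbitrary.
        LongSupplier slowLookup = () -> {
            calls.incrementAndGet();
            return 1550000000000L;
        };

        long skipped = lastModifiedIfNeeded(WriteMode.OVERWRITE, slowLookup);
        long fetched = lastModifiedIfNeeded(WriteMode.CHECKPOINT, slowLookup);

        System.out.println("OVERWRITE:  " + skipped);
        System.out.println("CHECKPOINT: " + fetched);
        System.out.println("lookups performed: " + calls.get());
    }
}
```

Passing the lookup as a supplier keeps the expensive call out of the non-checkpoint paths entirely, rather than computing the value up front and discarding it.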