[
https://issues.apache.org/jira/browse/BLUR-5?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478473#comment-13478473
]
Patrick Hunt commented on BLUR-5:
---------------------------------
Small snag on this.
The BlockDirectory/BlockCache is using getFileCacheName to uniquely identify
the particular file for a set of cached blocks. The cache name includes not
only the file name but also the "last modified" date of the file. For
CachedIndexInput this makes sense: the file shouldn't change, but if it did,
the new date would invalidate the cached data.
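As a rough illustration of why a changed date invalidates the cache, here is a minimal sketch. The class name, method name, and the ":" separator are hypothetical and are not taken from Blur's actual getFileCacheName.

```java
/** Hypothetical helper; not Blur's actual getFileCacheName implementation. */
final class CacheNameSketch {
    /** Builds a cache name from the file name plus its last modified time. */
    static String fileCacheName(String fileName, long lastModifiedMillis) {
        // Any change to the modification time yields a different name,
        // so blocks cached under the old name are never looked up again.
        return fileName + ":" + lastModifiedMillis;
    }
}
```

Two calls with the same file name but different modification times return different cache names, so stale blocks are simply never found.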
On write this is a problem, as we don't know the last modified date and it's
changing on every write.
Given we can rely on HDFS being append-only, it seems that we don't have to
worry about the written parts of a file changing. Therefore we can use the file
name only as the cache name during write. On close of the CachedIndexOutput
we can close the HDFS file, get the last modified date, and use that to update
the cache's file name to file id mapping to include the last modified date,
which will then be used by the CachedIndexInput.
One concern is that someone might start reading the file before it is
closed, which could be a problem. I don't think that case is possible
here, but I'm not sure.
Does this sound like the right approach?
> Write through caching for the BlockCache
> ----------------------------------------
>
> Key: BLUR-5
> URL: https://issues.apache.org/jira/browse/BLUR-5
> Project: Apache Blur
> Issue Type: Improvement
> Reporter: Aaron McCurry
>
> This will allow for better NRT update performance because the writer will not
> have to read the NRT segments from HDFS.