[ https://issues.apache.org/jira/browse/BLUR-5?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478473#comment-13478473 ]

Patrick Hunt commented on BLUR-5:
---------------------------------

Small snag on this. 

The BlockDirectory/BlockCache uses getFileCacheName to uniquely identify the 
file that a set of cached blocks belongs to. The cache name includes not only 
the file name but also the "last modified" date of the file. For 
CachedIndexInput this makes sense: the file shouldn't change, but if it did, 
the new date would invalidate the cached data.
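A minimal sketch of the naming idea described above, assuming the cache name is simply the file name joined to the last-modified timestamp. The "@" separator and the class name FileCacheNames are hypothetical; Blur's actual getFileCacheName may build the name differently.

```java
import java.util.Objects;

// Hypothetical sketch: the "@" separator and this class are illustrative only,
// not Blur's real getFileCacheName implementation.
public final class FileCacheNames {

  private FileCacheNames() {}

  // Builds the key under which a file's cached blocks are tracked.
  // If the file were rewritten, its last-modified time would change,
  // producing a different name, so stale cached blocks would never be hit.
  public static String getFileCacheName(String fileName, long lastModified) {
    Objects.requireNonNull(fileName, "fileName");
    return fileName + "@" + lastModified;
  }

  public static void main(String[] args) {
    String before = getFileCacheName("_0.fdt", 1000L);
    String after = getFileCacheName("_0.fdt", 2000L);
    System.out.println(before);               // prints "_0.fdt@1000"
    System.out.println(before.equals(after)); // prints "false"
  }
}
```

Because the timestamp is part of the key, invalidation needs no explicit eviction step: a changed file simply maps to a name that has no cached blocks yet.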

On write this is a problem, because we don't know the last modified date, and 
it changes on every write.

Given that we can rely on HDFS being append-only, it seems that we don't have 
to worry about the already-written parts of a file changing. Therefore we can 
use the file name alone as the cache name during the write. On close of the 
CachedIndexOutput we can close the HDFS file, get its last modified date, and 
use that date to update the Cache filename-to-file-id mapping. The updated 
name will then be used by CachedIndexInput.
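A rough sketch of the proposed re-keying, assuming the cache keeps a simple map from cache name to file id and that the final name is the file name plus "@" plus the last-modified time. CacheNameRegistry, fileIdForWrite and onClose are hypothetical names chosen for illustration, not existing Blur APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: class and method names are illustrative, not Blur APIs.
// While a file is written, its blocks are cached under the bare file name.
// On close, the same file id is re-keyed to name + "@" + lastModified, so a
// reader computing the full cache name finds the blocks cached during the write.
public final class CacheNameRegistry {
  private final Map<String, Long> nameToFileId = new HashMap<>();
  private long nextFileId;

  // Returns the file id for a cache name, assigning a fresh one if absent.
  public synchronized long getFileId(String cacheName) {
    return nameToFileId.computeIfAbsent(cacheName, k -> nextFileId++);
  }

  // Used while writing: HDFS is append-only, so blocks already written
  // cannot change and can safely be cached under the file name alone.
  public synchronized long fileIdForWrite(String fileName) {
    return getFileId(fileName);
  }

  // Used after the HDFS file is closed and its final last-modified time is known.
  public synchronized void onClose(String fileName, long lastModified) {
    Long fileId = nameToFileId.remove(fileName);
    if (fileId != null) {
      nameToFileId.put(fileName + "@" + lastModified, fileId);
    }
  }

  public static void main(String[] args) {
    CacheNameRegistry registry = new CacheNameRegistry();
    long writeId = registry.fileIdForWrite("_0.fdt");
    registry.onClose("_0.fdt", 1000L);
    long readId = registry.getFileId("_0.fdt@1000");
    System.out.println(writeId == readId); // prints "true"
  }
}
```

In this sketch, a reader that computed "name@lastModified" before onClose ran would be assigned a fresh file id and miss the blocks cached during the write, which is the kind of window the concern below is about.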

One concern is that it might be a problem if someone started reading the file 
before it was closed. I don't think that case is possible here, but I'm not 
sure.

Does this sound like the right approach?
                
> Write through caching for the BlockCache
> ----------------------------------------
>
>                 Key: BLUR-5
>                 URL: https://issues.apache.org/jira/browse/BLUR-5
>             Project: Apache Blur
>          Issue Type: Improvement
>            Reporter: Aaron McCurry
>
> This will allow for better NRT update performance because the writer will not 
> have to read the NRT segments from HDFS.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira