[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

Michael McCandless (JIRA) Thu, 06 Sep 2012 11:47:11 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449931#comment-13449931
 ]


Michael McCandless commented on LUCENE-4123:
--------------------------------------------

bq. I am not sure if we really need that directory. With my changes in 
LUCENE-3659 we can handle that easily (also for files > 2 GiB). LUCENE-3659 
makes the buffer size of RAMDir configurable (depending on IOContext while 
writing), and when you do new RAMDirectory(otherDir) to cache the whole dir in 
RAM, it will use the maximum possible buffer size for the underlying file (2 
GiB), since we don't write and need no smaller buffer size.

Actually I think the two dirs have different use cases.

So I think we should do both: 1) fix RAMDir to do better buffering
(LUCENE-3659) and 2) add this new dir.

RAMDir is good for pure in-memory indices (for testing, or transient
usage, etc.) or for pulling in a read-only index from disk, while
CachingRAMDir (I think we should rename it to CachingDirWrapper) is
good if you want to write to the index but also want persistence,
since all writes go straight to the wrapped directory.
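
To make the write-through / read-from-RAM split concrete, here is a minimal sketch in plain JDK Java. It is not the attached patch and does not use Lucene's Directory API; the class name CachingDirSketch and its methods are hypothetical, chosen only for illustration. Writes go to files in the wrapped directory (so they persist), and the first read of a file pulls it into a single byte[] that serves later reads.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration only; not Lucene's Directory API and not the patch. */
final class CachingDirSketch {
    private final Path delegateDir;                       // stands in for the wrapped directory
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

    CachingDirSketch(Path delegateDir) {
        this.delegateDir = delegateDir;
    }

    /** Writes go straight to the wrapped directory, so they are persistent. */
    void writeFile(String name, byte[] data) throws IOException {
        Files.write(delegateDir.resolve(name), data);
        cache.remove(name);                               // drop any stale in-RAM copy
    }

    /** The first read loads the whole file into one byte[]; later reads reuse it. */
    byte[] openForRead(String name) throws IOException {
        byte[] cached = cache.get(name);
        if (cached == null) {
            cached = Files.readAllBytes(delegateDir.resolve(name));
            cache.put(name, cached);
        }
        return cached;
    }

    /** Serves a single byte from the cached array. */
    byte readByte(String name, int pos) throws IOException {
        return openForRead(name)[pos];
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("caching-dir-sketch");
        CachingDirSketch d = new CachingDirSketch(dir);
        d.writeFile("seg.dat", new byte[] {1, 2, 3});
        System.out.println(d.readByte("seg.dat", 1));          // served from RAM
        System.out.println(Files.exists(dir.resolve("seg.dat"))); // still on disk
    }
}
```

Because a Java array is indexed by int, one byte[] per file caps the file size at roughly 2^31 - 1 bytes, which is where a limit of about 2.1 GB comes from in a design like this.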

I don't think the limitations of this dir (max 2.1 GB file size) need
to block committing ... the javadocs call this out, and we can improve
it later.  It could be that wrapping the byte[] in a ByteBuffer and
using ByteBufferII doesn't lose any perf; that would be great. But we
can explore that after committing.
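
As a rough illustration of the ByteBuffer idea, the sketch below uses only java.nio and shows that ByteBuffer.wrap shares the existing byte[] rather than copying it, and that an absolute read through the buffer returns the same value as decoding the array by hand. The class and method names are hypothetical, and this says nothing about whether perf is actually preserved; that would need the benchmark.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration of wrapping a cached byte[] in a ByteBuffer. */
final class WrapSketch {
    /** Decodes a big-endian int directly from the array. */
    static int readIntFromArray(byte[] b, int pos) {
        return ((b[pos] & 0xFF) << 24)
             | ((b[pos + 1] & 0xFF) << 16)
             | ((b[pos + 2] & 0xFF) << 8)
             |  (b[pos + 3] & 0xFF);
    }

    /** Reads the same int through the buffer (ByteBuffer defaults to big-endian). */
    static int readIntFromBuffer(ByteBuffer buf, int pos) {
        return buf.getInt(pos);   // absolute read, does not move the position
    }

    public static void main(String[] args) {
        byte[] file = {0, 0, 1, 2, 9};
        ByteBuffer view = ByteBuffer.wrap(file);   // backed by 'file', no copy
        System.out.println(readIntFromArray(file, 0));
        System.out.println(readIntFromBuffer(view, 0));
        System.out.println(view.array() == file);
    }
}
```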

But definitely +1 to get LUCENE-3659 in...

                
> Add CachingRAMDirectory
> -----------------------
>
>                 Key: LUCENE-4123
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4123
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/store
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-4123.patch, LUCENE-4123.patch, LUCENE-4123.patch, 
> LUCENE-4123.patch
>
>
> The directory is very simple and useful if you have an index that you
> know fully fits into available RAM.  You could also use FileSwitchDir if
> you want to leave some files (eg stored fields or term vectors) on disk.
> It wraps any other Directory and delegates all writing (IndexOutput) to
> it, but for reading (IndexInput), it allocates a single byte[] and fully
> reads the file in and then serves requests off that single byte[].  It's
> more GC friendly than RAMDir since it only allocates a single array per
> file.
> It has a few nocommits still, but all tests pass if I wrap the delegate
> inside MockDirectoryWrapper using this.
> I tested with a 1M-doc English Wikipedia index (would like to test w/ 10M
> docs but I don't have enough RAM...); it seems to give a nice speedup:
> {noformat}
>             Task    QPS base  StdDev base  QPS cached  StdDev cached      Pct diff
>          Respell      197.00         7.27      203.19           8.17    -4% -  11%
>         PKLookup      121.12         2.80      125.46           3.20    -1% -   8%
>           Fuzzy2       66.62         2.62       69.91           2.85    -3% -  13%
>           Fuzzy1      206.20         6.47      222.21           6.52     1% -  14%
>    TermGroup100K      160.14         6.62      175.71           3.79     3% -  16%
>           Phrase       34.85         0.40       38.75           0.61     8% -  14%
>   TermBGroup100K      363.75        15.74      406.98          13.23     3% -  20%
>         SpanNear       53.08         1.11       59.53           2.94     4% -  20%
> TermBGroup100K1P      222.53         9.78      252.86           5.96     6% -  21%
>     SloppyPhrase       70.36         2.05       79.95           4.48     4% -  23%
>         Wildcard      238.10         4.29      272.78           4.97    10% -  18%
>        OrHighMed      123.49         4.85      149.32           4.66    12% -  29%
>          Prefix3      288.46         8.10      350.40           5.38    16% -  26%
>       OrHighHigh       76.46         3.27       93.13           2.96    13% -  31%
>           IntNRQ       92.25         2.12      113.47           5.74    14% -  32%
>             Term      757.12        39.03      958.62          22.68    17% -  36%
>      AndHighHigh      103.03         4.48      133.89           3.76    21% -  39%
>       AndHighMed      376.36        16.58      493.99          10.00    23% -  40%
> {noformat}
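
The issue does not say how the Pct diff range is derived. The numbers in the table do look consistent with a conservative worst case / best case built from the means and standard deviations, truncated to whole percents, and the sketch below reproduces a few rows under that assumption. The formula is inferred from the table, not taken from the benchmark tool, and the class and method names are hypothetical.

```java
/** Hypothetical reconstruction of the Pct diff range; the formula is an inference. */
final class PctDiffSketch {
    /**
     * Worst case: cached runs one stddev slow, base runs one stddev fast.
     * Best case:  cached runs one stddev fast, base runs one stddev slow.
     * Results are truncated toward zero to whole percents.
     */
    static int[] pctDiffRange(double qpsBase, double sdBase,
                              double qpsCached, double sdCached) {
        double worst = (qpsCached - sdCached) / (qpsBase + sdBase) - 1.0;
        double best  = (qpsCached + sdCached) / (qpsBase - sdBase) - 1.0;
        return new int[] { (int) (100.0 * worst), (int) (100.0 * best) };
    }

    public static void main(String[] args) {
        int[] respell = pctDiffRange(197.00, 7.27, 203.19, 8.17);
        int[] andHighMed = pctDiffRange(376.36, 16.58, 493.99, 10.00);
        System.out.println(respell[0] + "% - " + respell[1] + "%");       // table: -4% - 11%
        System.out.println(andHighMed[0] + "% - " + andHighMed[1] + "%"); // table: 23% - 40%
    }
}
```

Read this way, a row whose range straddles zero (Respell, PKLookup, Fuzzy2) is within noise, while rows such as AndHighMed show a gain even in the pessimistic case.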

