> About the RAMDirectory comparison, as you said yourself the bytes
> aren't read constantly but just at index reopen so I wouldn't be too
> worried about the "bunch of methods" as they're executed once per
> segment loading;

The bytes /are/ read constantly (readByte() method). I believe that is
the innermost loop you can hope to find in Lucene.
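For illustration, this is roughly the logic of Lucene's variable-length
int decoding (a simplified sketch, not the exact source), which sits
directly on top of readByte() and runs for practically every value pulled
from the postings lists while a query executes:

  // readVInt() calls readByte() once per encoded byte; with term, doc and
  // position data stored as VInts, readByte() is about the hottest call
  // there is during a search.
  public int readVInt() throws IOException {
    byte b = readByte();
    int i = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = readByte();
      i |= (b & 0x7F) << shift;
    }
    return i;
  }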
> A RAMDirectory is AFAIK not recommended as you could hit memory limits and
> because it's basically a synchronized HashMap;

On the other hand, just as I mentioned - the only access to said
synchronized HashMap happens when you open an InputStream on a file. That,
unlike readByte(), happens rarely, as InputStreams are cloned after
creation as needed. As for memory limits, your unbounded local cache hits
them just as easily.

> Instances of ChunkCacheKey are not created for each single byte read
> but for each byte[] buffer, being the size of these buffers configurable.

No, they are! :-)

InfinispanIndexIO.java, rev. 1103:

  120    public byte readByte() throws IOException {
         .........
  132        buffer = getChunkFromPosition(cache, fileKey, filePosition, bufferSize);
         .........
  141    }

getChunkFromPosition() is called every time readByte() is invoked, and
each call creates 1-2 instances of ChunkCacheKey (see the P.S. below for a
sketch of a per-chunk alternative).

> This was decided after observations that it was
> improving performance to "chunk" segments in smaller pieces rather
> than have huge arrays of bytes, but if you like you can configure it
> to degenerate to approach the one key per segment ratio.

Locally, it's better not to chunk segments at all (unless you hit the 2 GB
barrier). When shuffling them over the network - I can't say.

> Comparing to a RAMDirectory is unfair, as with InfinispanDirectory I can
> scale :-)

I'm just following two of your initial comparisons. And the only
characteristic that scales with such an approach is queries/s. Index
size - definitely not; updates/s - questionable.

> About JGroups I'm not technically prepared for a match, but I've heard
> of different stories of much bigger than 20 nodes business critical
> clusters working very well. Sure, it won't scale without a proper
> configuration at all levels: os, jgroups and infrastructure.

What matters is the volume of messages travelling around, the length of GC
pauses versus the cluster size, and the messaging mode. They used reliable
synchronous multicasts, so once one node starts collecting, all the others
wait (or, worse, send retries). Then another node starts collecting, then
another, partially delivered messages hold threads - kaboom! How is
locking handled here? With a central broker it probably can work.

--
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785
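P.S. To make the "one key per buffer" point concrete, here is a
hypothetical sketch of what a per-chunk (rather than per-byte) lookup
could look like. The fields currentChunk and chunkStart are mine, not the
actual Infinispan code, and I'm assuming chunks are aligned on bufferSize
boundaries:

  // Hypothetical sketch, not the real InfinispanIndexIO: the current chunk
  // is kept in the reader, so a ChunkCacheKey (and a cache lookup) is only
  // created when the read position crosses into another chunk.
  private byte[] currentChunk;   // chunk containing filePosition
  private long chunkStart = -1;  // absolute file position of currentChunk[0]

  public byte readByte() throws IOException {
    if (currentChunk == null
        || filePosition < chunkStart
        || filePosition >= chunkStart + bufferSize) {
      // boundary crossed: one key / one cache lookup per chunk, not per byte
      currentChunk = getChunkFromPosition(cache, fileKey, filePosition, bufferSize);
      chunkStart = (filePosition / bufferSize) * bufferSize;  // assumes aligned chunks
    }
    return currentChunk[(int) (filePosition++ - chunkStart)];
  }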