> About the RAMDirectory comparison, as you said yourself the bytes
> aren't read constantly but just at index reopen so I wouldn't be too
> worried about the "bunch of methods" as they're executed once per
> segment loading;

The bytes /are/ read constantly (readByte() method). I believe that is
the innermost loop you can hope to find in Lucene.
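For illustration, this is roughly the logic of Lucene's variable-length
int decoding (a simplified sketch, not the exact source), which sits
directly on top of readByte() and runs for practically every value pulled
from the postings lists while a query executes:

  // readVInt() calls readByte() once per encoded byte; with term, doc and
  // position data stored as VInts, readByte() is about the hottest call
  // there is during a search.
  public int readVInt() throws IOException {
    byte b = readByte();
    int i = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = readByte();
      i |= (b & 0x7F) << shift;
    }
    return i;
  }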
> A RAMDirectory is AFAIK not recommended as you could hit memory limits and
> because it's basically a synchronized HashMap;

On the other hand, just as I mentioned - the only access to said
synchronized HashMap happens when you open an InputStream on a file. That,
unlike readByte(), happens rarely, as InputStreams are cloned after
creation as needed. As for memory limits, your unbounded local cache hits
them just as easily.

> Instances of ChunkCacheKey are not created for each single byte read
> but for each byte[] buffer, being the size of these buffers configurable.

No, they are! :-)

InfinispanIndexIO.java, rev. 1103:

  120    public byte readByte() throws IOException {
         .........
  132        buffer = getChunkFromPosition(cache, fileKey, filePosition, bufferSize);
         .........
  141    }

getChunkFromPosition() is called every time readByte() is invoked, and
each call creates 1-2 instances of ChunkCacheKey (see the P.S. below for a
sketch of a per-chunk alternative).

> This was decided after observations that it was
> improving performance to "chunk" segments in smaller pieces rather
> than have huge arrays of bytes, but if you like you can configure it
> to degenerate to approach the one key per segment ratio.

Locally, it's better not to chunk segments at all (unless you hit the 2 GB
barrier). When shuffling them over the network - I can't say.

> Comparing to a RAMDirectory is unfair, as with InfinispanDirectory I can
> scale :-)

I'm just following two of your initial comparisons. And the only
characteristic that scales with such an approach is queries/s. Index
size - definitely not; updates/s - questionable.

> About JGroups I'm not technically prepared for a match, but I've heard
> of different stories of much bigger than 20 nodes business critical
> clusters working very well. Sure, it won't scale without a proper
> configuration at all levels: os, jgroups and infrastructure.

What matters is the volume of messages travelling around, the length of GC
pauses versus the cluster size, and the messaging mode. They used reliable
synchronous multicasts, so once one node starts collecting, all the others
wait (or, worse, send retries). Then another node starts collecting, then
another, partially delivered messages hold threads - kaboom! How is
locking handled here? With a central broker it probably can work.

--
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785
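P.S. To make the "one key per buffer" point concrete, here is a
hypothetical sketch of what a per-chunk (rather than per-byte) lookup
could look like. The fields currentChunk and chunkStart are mine, not the
actual Infinispan code, and I'm assuming chunks are aligned on bufferSize
boundaries:

  // Hypothetical sketch, not the real InfinispanIndexIO: the current chunk
  // is kept in the reader, so a ChunkCacheKey (and a cache lookup) is only
  // created when the read position crosses into another chunk.
  private byte[] currentChunk;   // chunk containing filePosition
  private long chunkStart = -1;  // absolute file position of currentChunk[0]

  public byte readByte() throws IOException {
    if (currentChunk == null
        || filePosition < chunkStart
        || filePosition >= chunkStart + bufferSize) {
      // boundary crossed: one key / one cache lookup per chunk, not per byte
      currentChunk = getChunkFromPosition(cache, fileKey, filePosition, bufferSize);
      chunkStart = (filePosition / bufferSize) * bufferSize;  // assumes aligned chunks
    }
    return currentChunk[(int) (filePosition++ - chunkStart)];
  }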