@Sanne, thanks for announcing this, good stuff!

@Earwin, note that this is a tech preview and hardly production-ready
code yet. The more eyes that scan the code, try it out, and report bugs
and bottlenecks, the better. So thanks for spotting ISPN-276; we look
forward to more feedback/patches.  :)  As for your comments on locking,
cluster-wide syncs, performance and tuning JGroups, I agree with Sanne
that you should post your concerns on infinispan-...@lists.jboss.org so
we can discuss them in greater depth there while keeping things relevant.

Cheers
Manik

--
Manik Surtani
ma...@jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org


On 15 Nov 2009, at 16:11, Sanne Grinovero wrote:

> Hi again Earwin,
> thank you very much for spotting the byte reading issue; it's
> definitely not what I intended.
> https://jira.jboss.org/jira/browse/ISPN-276
> 
> I never claimed an improved updates/s ratio, except maybe compared
> to scheduled rsyncs :-)
> Our goal is to scale on queries/sec while the usage semantics stay
> unchanged, so you can open an IndexWriter as if it were local and
> make updates cluster-wide. That's very useful for clustering the many
> products already using Lucene which currently resort to exotic
> index-management workarounds or shared filesystems, as they weren't
> designed for clustering from the beginning the way Solr was.
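> 
> To make that concrete, a minimal sketch (Lucene 2.9-era API; don't
> take the InfinispanDirectory constructor arguments as final, this is a
> tech preview and they may change):
> 
>   import java.io.IOException;
>   import org.apache.lucene.analysis.standard.StandardAnalyzer;
>   import org.apache.lucene.document.Document;
>   import org.apache.lucene.document.Field;
>   import org.apache.lucene.index.IndexWriter;
>   import org.apache.lucene.store.Directory;
>   import org.apache.lucene.util.Version;
>   import org.infinispan.Cache;
>   import org.infinispan.lucene.InfinispanDirectory;
> 
>   public class ClusteredIndexing {
>     public static void addOne(Cache cache) throws IOException {
>       // the Directory lives in the (possibly clustered) cache,
>       // but the Lucene code is the same as for a local index
>       Directory dir = new InfinispanDirectory(cache, "myIndex");
>       IndexWriter writer = new IndexWriter(dir,
>           new StandardAnalyzer(Version.LUCENE_29),
>           IndexWriter.MaxFieldLength.UNLIMITED);
>       Document doc = new Document();
>       doc.add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
>       writer.addDocument(doc);
>       writer.close(); // the change becomes visible to readers on every node
>     }
>   }
> 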
> I mentioned JIRA: have you noticed how slow it can get on larger
> deployments? That's because there's currently no way to deploy it
> clustered (besides using Terracotta), as it relies heavily on Lucene
> and index changes need to be applied in real time.
> 
> About locking and JGroups: please switch over to
> infinispan-...@lists.jboss.org so you can get better answers and I
> don't have to spam the Lucene developers.
> 
> Regards,
> Sanne
> 
> 
> 
> On Sun, Nov 15, 2009 at 3:43 PM, Earwin Burrfoot <ear...@gmail.com> wrote:
>>> About the RAMDirectory comparison: as you said yourself, the bytes
>>> aren't read constantly but only at index reopen, so I wouldn't be too
>>> worried about the "bunch of methods" as they're executed once per
>>> segment load;
>> The bytes /are/ read constantly (in the readByte() method). I believe
>> that is the innermost loop you can hope to find in Lucene.
>> 
>>> A RAMDirectory is AFAIK not recommended as you could hit memory limits and 
>>> because it's basically a synchronized HashMap;
>> On the other hand, as I mentioned, the only access to that
>> synchronized HashMap happens when you open an InputStream on a file.
>> That, unlike readByte(), happens rarely, as InputStreams are cloned
>> after creation as needed.
>> As for memory limits, your unbounded local cache hits them just as easily.
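>> 
>> To make the access pattern concrete (Lucene 2.9-era API; the loop is
>> just a sketch):
>> 
>>   import java.io.IOException;
>>   import org.apache.lucene.store.IndexInput;
>>   import org.apache.lucene.store.RAMDirectory;
>> 
>>   public class RamDirAccessPattern {
>>     public static void readAll(RAMDirectory dir, String file) throws IOException {
>>       IndexInput in = dir.openInput(file);        // the only synchronized map lookup
>>       IndexInput clone = (IndexInput) in.clone(); // clones share the underlying buffers
>>       while (clone.getFilePointer() < clone.length()) {
>>         clone.readByte();                         // hot loop, never touches the map
>>       }
>>       in.close();
>>     }
>>   }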
>> 
>>> Instances of ChunkCacheKey are not created for each single byte read
>>> but for each byte[] buffer, the size of these buffers being configurable.
>> No, they are! :-)
>> InfinispanIndexIO.java, rev. 1103:
>> 120           public byte readByte() throws IOException {
>> .........
>> 132              buffer = getChunkFromPosition(cache, fileKey, filePosition, bufferSize);
>> .........
>> 141           }
>> getChunkFromPosition() is called each time readByte() is invoked. It
>> creates 1-2 instances of ChunkCacheKey.
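>> 
>> A sketch of what I'd expect instead (illustrative names only, not a
>> patch): buffer the current chunk and only go back to the cache when
>> the position crosses a chunk boundary, so a key object is created once
>> per chunk rather than once per byte.
>> 
>>   // not the Infinispan code; fetchChunk() stands in for the cache lookup
>>   abstract class ChunkedInput {
>>     private final int chunkSize;
>>     private byte[] currentChunk;
>>     private int currentChunkIndex = -1;
>>     private long position;
>> 
>>     ChunkedInput(int chunkSize) { this.chunkSize = chunkSize; }
>> 
>>     // one cache lookup (and one key instance) per chunk
>>     protected abstract byte[] fetchChunk(int chunkIndex);
>> 
>>     public byte readByte() {
>>       int chunkIndex = (int) (position / chunkSize);
>>       if (chunkIndex != currentChunkIndex) {
>>         currentChunk = fetchChunk(chunkIndex);
>>         currentChunkIndex = chunkIndex;
>>       }
>>       int offset = (int) (position % chunkSize);
>>       position++;
>>       return currentChunk[offset];
>>     }
>>   }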
>> 
>>> This was decided after observing that "chunking" segments into
>>> smaller pieces improved performance compared to keeping huge byte
>>> arrays, but if you like you can configure it to degenerate towards a
>>> one-key-per-segment ratio.
>> Locally, it's better not to chunk segments (unless you hit the 2GB
>> barrier). When shuffling them over the network, I can't say.
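>> 
>> For reference, the arithmetic behind that trade-off: a file of
>> "length" bytes stored in "chunkSize"-byte chunks needs
>> ceil(length / chunkSize) cache entries, so raising chunkSize past the
>> segment size degenerates to one key per file.
>> 
>>   static int chunkCount(long length, int chunkSize) {
>>     return (int) ((length + chunkSize - 1) / chunkSize);
>>   }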
>> 
>>> Comparing to a RAMDirectory is unfair, as with InfinispanDirectory I can 
>>> scale :-)
>> I'm just following two of your initial comparisons. And the only
>> characteristic that can be scaled with such an approach is queries/s.
>> Index size - definitely not; updates/s - questionable.
>> 
>>> About JGroups I'm not technically prepared for a match, but I've
>>> heard several stories of business-critical clusters much bigger than
>>> 20 nodes working very well. Sure, it won't scale without proper
>>> configuration at all levels: OS, JGroups and infrastructure.
>> The volume of messages travelling around, the length of GC pauses
>> versus cluster size, and the messaging mode all matter.
>> They used reliable synchronous multicasts, so once one node starts
>> collecting, all the others wait (or worse, send retries).
>> Then another node starts collecting, then another, partially
>> delivered messages hold threads - kaboom!
>> How is locking handled here? With a central broker it can probably work.
>> 
>> --
>> Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
>> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
>> ICQ: 104465785
>> 
> 
> 
> 
> -- 
> Sanne Grinovero
> Sourcesense - making sense of Open  Source: http://www.sourcesense.com
> 
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
