@Sanne, thanks for announcing this, good stuff! @Earwin, note that this is a
tech preview and hardly production-ready code yet. The more eyes that scan
the code, try it out, and report bugs and bottlenecks, the better - so thanks
for spotting ISPN-276; we look forward to more feedback/patches. :)
Regarding your comments on locking, cluster-wide syncs, performance and
tuning JGroups, I agree with Sanne that you should post your concerns on
infinispan-...@lists.jboss.org so we can discuss them in greater depth there
while keeping things relevant.
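And since ISPN-276 comes up in the quoted discussion below: the obvious
direction for a fix is to buffer the current chunk so that readByte() only
touches the cache when it crosses a chunk boundary, rather than building a
ChunkCacheKey per byte. A minimal sketch of the idea - a plain Map stands in
for the cache, and all names are mine, not the actual InfinispanIndexIO code:

   import java.io.IOException;
   import java.util.Map;

   // Sketch: one cache lookup per chunk instead of one per byte.
   class BufferedChunkReader {
      private final Map<String, byte[]> cache; // stand-in for the Infinispan cache
      private final String fileName;
      private final int chunkSize;
      private long position = 0;
      private byte[] chunk;                    // currently buffered chunk
      private int chunkIndex = -1;             // index of the buffered chunk

      BufferedChunkReader(Map<String, byte[]> cache, String fileName, int chunkSize) {
         this.cache = cache;
         this.fileName = fileName;
         this.chunkSize = chunkSize;
      }

      public byte readByte() throws IOException {
         int needed = (int) (position / chunkSize);
         if (needed != chunkIndex) {           // hit the cache only on a chunk boundary
            chunk = cache.get(fileName + "|" + needed);
            if (chunk == null)
               throw new IOException("read past EOF: " + fileName);
            chunkIndex = needed;
         }
         return chunk[(int) (position++ % chunkSize)];
      }
   }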
Cheers,
Manik

--
Manik Surtani
ma...@jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org

On 15 Nov 2009, at 16:11, Sanne Grinovero wrote:

> Hi again Earwin,
> thank you very much for spotting the byte-reading issue; it's
> definitely not as I intended it.
> https://jira.jboss.org/jira/browse/ISPN-276
>
> I never tried to defend an improved updates/s ratio - except maybe
> compared to scheduled rsyncs :-)
> Our goal is to scale on queries/sec while the usage semantics stay
> unchanged, so you can open an IndexWriter as if it were local and have
> the updates applied cluster-wide. That is very useful for clustering the
> many products already using Lucene which currently implement exotic
> index-management workarounds or shared filesystems, since unlike Solr
> they weren't designed for clustering from the beginning.
> I mentioned JIRA; have you noticed how slow it can get on larger
> deployments? That's because there's currently no way to deploy it
> clustered (besides using Terracotta), as it relies heavily on Lucene
> and index changes need to be applied in real time.
>
> About locking and JGroups: please switch over to
> infinispan-...@lists.jboss.org so you can get better answers and I
> don't have to spam the Lucene developers.
>
> Regards,
> Sanne
>
> On Sun, Nov 15, 2009 at 3:43 PM, Earwin Burrfoot <ear...@gmail.com> wrote:
>>> About the RAMDirectory comparison, as you said yourself the bytes
>>> aren't read constantly but just at index reopen, so I wouldn't be too
>>> worried about the "bunch of methods", as they're executed once per
>>> segment load;
>> The bytes /are/ read constantly (the readByte() method). I believe that
>> is the innermost loop you can hope to find in Lucene.
>>
>>> A RAMDirectory is AFAIK not recommended, as you could hit memory limits
>>> and because it's basically a synchronized HashMap;
>> On the other hand, just as I mentioned, the only access to said
>> synchronized HashMap happens when you open an InputStream on a file.
>> That, unlike readByte(), happens rarely, as InputStreams are cloned
>> after creation as needed.
>> As for memory limits, your unbounded local cache hits them just as easily.
>>
>>> Instances of ChunkCacheKey are not created for each single byte read
>>> but for each byte[] buffer, the size of these buffers being configurable.
>> No, they are! :-)
>> InfinispanIndexIO.java, rev. 1103:
>> 120 public byte readByte() throws IOException {
>> .........
>> 132 buffer = getChunkFromPosition(cache, fileKey, filePosition, bufferSize);
>> .........
>> 141 }
>> getChunkFromPosition() is called each time readByte() is invoked. It
>> creates 1-2 instances of ChunkCacheKey.
>>
>>> This was decided after observing that it improved performance to "chunk"
>>> segments into smaller pieces rather than keep huge arrays of bytes, but
>>> if you like you can configure it to degenerate towards a
>>> one-key-per-segment ratio.
>> Locally, it's better not to chunk segments (unless you hit the 2GB
>> barrier). When shuffling them over the network - I can't say.
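>>
>> To put rough numbers on that trade-off (illustrative arithmetic only,
>> not taken from the code):
>>
>>    long segmentBytes = 10L * 1024 * 1024;  // say, a 10MB segment
>>    int bufferSize = 16 * 1024;             // the configurable chunk size
>>    long keys = (segmentBytes + bufferSize - 1) / bufferSize; // = 640 ChunkCacheKeys
>>    // With bufferSize = Integer.MAX_VALUE, keys == 1: the
>>    // "one key per segment" ratio. But then you're back to the huge
>>    // byte arrays that were observed to hurt performance.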
>>
>>> Comparing to a RAMDirectory is unfair, as with InfinispanDirectory I
>>> can scale :-)
>> I'm just following two of your initial comparisons. And the only
>> characteristic that can be scaled with such an approach is queries/s.
>> Index size - definitely not; updates/s - questionable.
>>
>>> About JGroups I'm not technically prepared for a match, but I've heard
>>> various stories of business-critical clusters much bigger than 20 nodes
>>> working very well. Sure, it won't scale without proper configuration at
>>> all levels: OS, JGroups and infrastructure.
>> The volume of messages travelling around, the length of GC delays versus
>> cluster size, and the messaging mode all matter.
>> They used reliable synchronous multicasts, so once one node starts
>> collecting, all the others wait (or worse, send retries). Another node
>> starts collecting, then another; partially delivered messages hold
>> threads - kaboom!
>> How is locking handled here? With a central broker it can probably work.
>>
>> --
>> Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
>> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
>> ICQ: 104465785
>
> --
> Sanne Grinovero
> Sourcesense - making sense of Open Source: http://www.sourcesense.com
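P.S. To make Sanne's "open an IndexWriter as if it were local" point
concrete, usage is meant to look like plain Lucene. A rough sketch - treat
the InfinispanDirectory constructor, the cache setup and the config file
name as my shorthand for the tech-preview API, which may well change:

   import org.apache.lucene.analysis.standard.StandardAnalyzer;
   import org.apache.lucene.document.Document;
   import org.apache.lucene.document.Field;
   import org.apache.lucene.index.IndexWriter;
   import org.apache.lucene.util.Version;
   import org.infinispan.Cache;
   import org.infinispan.lucene.InfinispanDirectory;
   import org.infinispan.manager.DefaultCacheManager;

   public class ClusteredIndexDemo {
      public static void main(String[] args) throws Exception {
         // Cache configuration (replication/distribution, chunk size, etc.)
         // lives in the XML file; the file name here is a placeholder.
         Cache<Object, Object> cache =
               new DefaultCacheManager("infinispan-config.xml").getCache();
         InfinispanDirectory dir = new InfinispanDirectory(cache, "myIndex");

         // From here on it is plain Lucene; writes become visible cluster-wide.
         IndexWriter writer = new IndexWriter(dir,
               new StandardAnalyzer(Version.LUCENE_29),
               IndexWriter.MaxFieldLength.UNLIMITED);
         Document doc = new Document();
         doc.add(new Field("title", "hello cluster",
               Field.Store.YES, Field.Index.ANALYZED));
         writer.addDocument(doc);
         writer.close();
      }
   }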