[ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Rutherglen updated LUCENE-2575:
-------------------------------------

    Attachment: LUCENE-2575.patch

Here's a start at concurrency, the terms dictionary, and iterating over doc ids.

* It needs concurrency unit tests.
* At an as-yet-undetermined interval, we need to consolidate the existing terms into a sorted int[] rather than continue to use the ConcurrentSkipListMap, which consumes far more RAM. The tradeoff, and the reason for using the CSLM, is the level of concurrency it provides, at the cost of greater memory consumption compared with the sorted int[] of term ids.
* An int[] based term enum needs to be implemented, and in addition a multi term enum. Maybe there's one we can use; I'm not familiar enough with the new flex code base.
* Copy on write is used to obtain a read-only version of the ByteBlockPool and IntBlockPool. In the case of the byte blocks, a boolean[] marks which elements need to be copied prior to writing by the DocumentsWriterPerThread on byte slice forwarding address rewrite.
* A write lock on each DWPT guarantees that as reference copies are made, arrays being copied will not be altered in flight. Getting a complete IndexReader[] means waiting for each document to finish flushing, but that shouldn't be an issue: we're not blocking indexing, only the obtaining of the IRs. I can't see this being a problem for most use cases.
* Similarly, a reference to the ParallelPostingsArray is copied (rather than a full copy being made) for use by the RAM buffer based IndexReader. It is OK for the PPA to be changed during future doc adds, as only the elements greater than the IR's max term id will be altered, i.e., we're not going to run into JMM thread issues because the writing and the read-only array reference copies occur in a reentrant lock.
* Recycling of byte[]s becomes a bit more complex as RAM IRs will likely hold references to them. When the RAM IR is closed, however, the byte[]s can be recycled.
The user could experience unusual RAM usage spikes if IRs are not closed properly.

> Concurrent byte and int block implementations
> ---------------------------------------------
>
>                 Key: LUCENE-2575
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2575
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Realtime Branch
>            Reporter: Jason Rutherglen
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2575.patch
>
>
> The current *BlockPool implementations aren't quite concurrent.
> We really need something that has a locking flush method, where
> flush is called at the end of adding a document. Once flushed,
> the newly written data would be available to all other reading
> threads (ie, postings etc). I'm not sure I understand the slices
> concept, it seems like it'd be easier to implement a seekable
> random access file like API. One'd seek to a given position,
> then read or write from there. The underlying management of byte
> arrays could then be hidden?
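
For readers following the copy-on-write bullet in the comment above, here is a minimal sketch of the idea: a reader takes a reference copy of the block array, and a boolean[] marks which blocks the writer must copy before its next write. The class CowBlockPool and its methods are hypothetical names made up for illustration; they are not taken from the patch or from Lucene's ByteBlockPool, and plain synchronized methods stand in for the per-DWPT write lock.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not the LUCENE-2575 patch. */
final class CowBlockPool {
    private byte[][] blocks;
    // shared[i] == true means a reader snapshot may still reference blocks[i],
    // so the writer has to copy that block before writing into it.
    private final boolean[] shared;

    CowBlockPool(int numBlocks, int blockSize) {
        blocks = new byte[numBlocks][blockSize];
        shared = new boolean[numBlocks];
    }

    /** Writer side: copy the block first if a snapshot may still see it. */
    synchronized void write(int block, int offset, byte value) {
        if (shared[block]) {
            blocks[block] = Arrays.copyOf(blocks[block], blocks[block].length);
            shared[block] = false; // the fresh copy is private to the writer
        }
        blocks[block][offset] = value;
    }

    /** Writer-side view of the current contents. */
    synchronized byte read(int block, int offset) {
        return blocks[block][offset];
    }

    /** Reader side: copy only the references, then mark every block shared. */
    synchronized byte[][] snapshot() {
        Arrays.fill(shared, true);
        return blocks.clone(); // shallow copy: new outer array, same inner byte[]s
    }
}
```

With this shape, taking a snapshot costs one small array copy, and a block is duplicated only the first time the writer touches it afterwards. A block handed out in a snapshot stays referenced until the reader drops it, which matches the recycling concern raised in the last bullet.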