[
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878617#action_12878617
]
Mark Harwood commented on LUCENE-2454:
--------------------------------------
Yep, I can see an app with a thousand cached filters would have a problem with
this impl as it stands.
Maintaining parallel indexes always feels a little flaky to me, not least
because of the loss of transactional integrity you can get from using a single
index.
Would another approach be to make your cached filters document-type-specific?
I.e. they only hold numbers in the range of zero to number-of-docs-of-this-type.
To use a cached doc ID in such a filter you would need to make use of mapping
arrays to project the type-specific doc id numbers into global doc-id
references and back.
Let's imagine an index with a mix of "A", "B" and "C" doc types organised as
follows:
docId docType
===== =======
0     A
1     B
2     C
3     A
4     C
5     C
The mapping arrays for docType "C" would look as follows (doc IDs are
zero-based):
{code:title=Bar.java|borderStyle=solid}
// sparse, one entry per doc in the overall index; -1 means "not a type C doc"
int[] globalDocIdToTypeCLookUp = {-1, -1, 0, -1, 1, 2};
// dense, one entry per type C doc in the overall index
int[] typeCToGlobalDocIdLookUp = {2, 4, 5};
{code}
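As a rough sketch of how the two mapping arrays could be derived, assuming the doc type of every document is available as a plain array indexed by zero-based global doc ID (the class name, method names and the char[] input are all hypothetical, not part of the attached patch):

```java
import java.util.Arrays;

public class TypeMappingSketch {
    /** Sparse lookup: global doc ID -> type-local ID, or -1 for docs of another type. */
    static int[] buildGlobalToType(char[] docTypes, char type) {
        int[] globalToType = new int[docTypes.length];
        int nextLocalId = 0;
        for (int docId = 0; docId < docTypes.length; docId++) {
            globalToType[docId] = (docTypes[docId] == type) ? nextLocalId++ : -1;
        }
        return globalToType;
    }

    /** Dense lookup: type-local ID -> global doc ID, built by inverting the sparse array. */
    static int[] buildTypeToGlobal(int[] globalToType) {
        int count = 0;
        for (int localId : globalToType) {
            if (localId >= 0) count++;
        }
        int[] typeToGlobal = new int[count];
        for (int docId = 0; docId < globalToType.length; docId++) {
            if (globalToType[docId] >= 0) {
                typeToGlobal[globalToType[docId]] = docId;
            }
        }
        return typeToGlobal;
    }

    public static void main(String[] args) {
        char[] docTypes = {'A', 'B', 'C', 'A', 'C', 'C'};
        int[] globalToC = buildGlobalToType(docTypes, 'C');
        System.out.println(Arrays.toString(globalToC));                    // [-1, -1, 0, -1, 1, 2]
        System.out.println(Arrays.toString(buildTypeToGlobal(globalToC))); // [2, 4, 5]
    }
}
```

In a real index the doc types would presumably come from enumerating the postings of a type field rather than from an in-memory array.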
Your cached filters would be created as follows:
{code:title=Bar.java|borderStyle=solid}
// this line is hopefully where you save RAM!
OpenBitSet myTypeCBitSet = new OpenBitSet(numberOfTypeCDocs);
// for all matching type C docs...
myTypeCBitSet.set(globalDocIdToTypeCLookUp[realDocId]);
{code}
Your filters can then be used by dereferencing the child doc IDs as follows:
{code:title=Bar.java|borderStyle=solid}
int typeCDocId = myTypeCBitSet.nextSetBit(0); // first set bit, or -1 if none
int nextRealDocId = typeCToGlobalDocIdLookUp[typeCDocId];
{code}
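Putting the create and use steps together, a minimal runnable sketch might look like the following. It uses java.util.BitSet as a stand-in for OpenBitSet so that it runs without Lucene on the classpath; the class and method names are hypothetical, and the lookup arrays are the type "C" ones from the example index with zero-based doc IDs.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class TypeLocalFilterSketch {
    /** Records matching global doc IDs as bits in the smaller type-local ID space. */
    static BitSet buildFilter(int[] matchingGlobalDocIds, int[] globalToType, int numTypeDocs) {
        BitSet bits = new BitSet(numTypeDocs); // sized by docs of this type, not the whole index
        for (int globalDocId : matchingGlobalDocIds) {
            int localId = globalToType[globalDocId];
            if (localId >= 0) { // ignore docs of other types
                bits.set(localId);
            }
        }
        return bits;
    }

    /** Walks the set bits and projects each type-local ID back to a global doc ID. */
    static List<Integer> toGlobalDocIds(BitSet bits, int[] typeToGlobal) {
        List<Integer> globalDocIds = new ArrayList<>();
        for (int localId = bits.nextSetBit(0); localId >= 0; localId = bits.nextSetBit(localId + 1)) {
            globalDocIds.add(typeToGlobal[localId]);
        }
        return globalDocIds;
    }

    public static void main(String[] args) {
        int[] globalToC = {-1, -1, 0, -1, 1, 2};
        int[] cToGlobal = {2, 4, 5};
        BitSet filter = buildFilter(new int[] {2, 5}, globalToC, cToGlobal.length);
        System.out.println(filter);                            // {0, 2}
        System.out.println(toGlobalDocIds(filter, cToGlobal)); // [2, 5]
    }
}
```

The loop in toGlobalDocIds stops when nextSetBit returns -1, which is the case a single unguarded lookup would need to check for.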
Clearly the mapping arrays come at a cost of 4 bytes * num docs, which is
non-trivial. The sparse globalDocIdToTypeCLookUp array shown here could be
avoided by reading TermDocs and counting at cached-filter-create time.
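One way the counting idea might work, sketched under the assumption that both the type's doc IDs (what a TermDocs enumeration for the type term would yield) and the matching doc IDs arrive in ascending order. Plain int arrays stand in for the TermDocs streams here, java.util.BitSet stands in for OpenBitSet, and all names are hypothetical:

```java
import java.util.BitSet;

public class CountingFilterSketch {
    /**
     * Builds a type-local bitset without the sparse global -> type-local array.
     * The type-local ID of a doc is simply its position in the type's ascending
     * doc ID list, so it can be counted while both lists are walked in step.
     */
    static BitSet buildFilterByCounting(int[] typeDocIds, int[] matchingDocIds) {
        BitSet bits = new BitSet(typeDocIds.length);
        int m = 0; // cursor into matchingDocIds
        for (int localId = 0; localId < typeDocIds.length && m < matchingDocIds.length; localId++) {
            // Skip matches that fall before the current doc of this type.
            while (m < matchingDocIds.length && matchingDocIds[m] < typeDocIds[localId]) {
                m++;
            }
            if (m < matchingDocIds.length && matchingDocIds[m] == typeDocIds[localId]) {
                bits.set(localId);
                m++;
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        int[] typeCDocIds = {2, 4, 5}; // global doc IDs of type C docs, ascending
        int[] matches = {0, 2, 5};     // global doc IDs matching the filter, ascending
        System.out.println(buildFilterByCounting(typeCDocIds, matches)); // {0, 2}
    }
}
```

This trades the 4 bytes per doc of the sparse array for an extra pass over the type's postings each time a filter is created.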
> Nested Document query support
> -----------------------------
>
> Key: LUCENE-2454
> URL: https://issues.apache.org/jira/browse/LUCENE-2454
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Search
> Affects Versions: 3.0.2
> Reporter: Mark Harwood
> Assignee: Mark Harwood
> Priority: Minor
> Attachments: LuceneNestedDocumentSupport-1.zip
>
>
> A facility for querying nested documents in a Lucene index as outlined in
> http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene