Hi all.
We've been using Lucene to index our dynamic data structure and so far
Lucene has been flexible enough to accommodate our requirements.
Now we have a requirement to search repeating fields, and it is not clear
to us how to implement it.
Our data records have a dynamic, tree-like structure.
For example, a portion of a record looks like this:
-company-data
---financial-data
------revenue-info
--------year
--------amount
For the record portion above, we create the Lucene fields
"/company-data/financial-data/revenue-info/year" and
"/company-data/financial-data/revenue-info/amount".
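A minimal plain-Java sketch of this path-based flattening, leaving Lucene out entirely. The `Node` type and the `flatten` method are hypothetical, not the actual indexing code: the tree is walked recursively and each leaf value is emitted under its full path, and each resulting pair would then become one Lucene field.

```java
import java.util.ArrayList;
import java.util.List;

public class PathFlattener {
    // Hypothetical tree node: a name, an optional leaf value, and child nodes.
    record Node(String name, String value, List<Node> children) {
        static Node leaf(String name, String value) {
            return new Node(name, value, List.of());
        }

        static Node branch(String name, Node... children) {
            return new Node(name, null, List.of(children));
        }
    }

    // Walks the tree depth-first and collects a (path, value) pair for every leaf.
    static void flatten(Node node, String parentPath, List<String[]> out) {
        String path = parentPath + "/" + node.name();
        if (node.value() != null) {
            out.add(new String[] {path, node.value()});
        }
        for (Node child : node.children()) {
            flatten(child, path, out);
        }
    }

    public static void main(String[] args) {
        Node record = Node.branch("company-data",
            Node.branch("financial-data",
                Node.branch("revenue-info",
                    Node.leaf("year", "2000"),
                    Node.leaf("amount", "1000000"))));

        List<String[]> fields = new ArrayList<>();
        flatten(record, "", fields);
        for (String[] field : fields) {
            // Each line corresponds to one field name/value pair to index.
            System.out.println(field[0] + " = " + field[1]);
        }
    }
}
```

Note that once a repeating node appears twice, both copies produce the same field names, which is exactly where the grouping information gets lost.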
Here, 'revenue-info' is a repeating node, so we can have records like these:
Record 1
---financial-data
------revenue-info
--------year = 2000
--------amount = 1000000
------revenue-info
--------year = 2001
--------amount = 2000000
Record 2
---financial-data
------revenue-info
--------year = 2000
--------amount = 2000000
Now, we need to find records where 'year=2000' and 'amount=2000000', but
only when both values **belong to the same revenue-info node**.
So the search should match Record 2 above, but NOT Record 1.
A simple BooleanQuery with two required term clauses
(/company-data/financial-data/revenue-info/year:2000 and
/company-data/financial-data/revenue-info/amount:2000000) cannot do
this, since it would also match Record 1.
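To show why, here is a small plain-Java model (not Lucene code) of a document with multi-valued fields and a query whose term clauses are all required. Each field simply keeps a list of all its values, which mirrors how adding the same field name several times to one document behaves for term matching; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;

public class FlatFieldMatch {
    static final String YEAR = "/company-data/financial-data/revenue-info/year";
    static final String AMOUNT = "/company-data/financial-data/revenue-info/amount";

    // A document keeps every value of a repeated field, but nothing records
    // which year and which amount came from the same revenue-info node.
    static boolean matchesAll(Map<String, List<String>> doc, Map<String, String> requiredTerms) {
        for (Map.Entry<String, String> term : requiredTerms.entrySet()) {
            List<String> values = doc.getOrDefault(term.getKey(), List.of());
            if (!values.contains(term.getValue())) {
                return false; // one required clause fails, so the document does not match
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<String>> record1 = Map.of(
            YEAR, List.of("2000", "2001"),
            AMOUNT, List.of("1000000", "2000000"));
        Map<String, List<String>> record2 = Map.of(
            YEAR, List.of("2000"),
            AMOUNT, List.of("2000000"));
        Map<String, String> query = Map.of(YEAR, "2000", AMOUNT, "2000000");

        System.out.println("Record 1 matches: " + matchesAll(record1, query)); // true (unwanted)
        System.out.println("Record 2 matches: " + matchesAll(record2, query)); // true
    }
}
```

Record 1 matches because 2000 comes from its first revenue-info node and 2000000 from its second.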
A couple of clarifications: each of the nodes (company-data,
financial-data, revenue-info, etc.) has a physical record in the DB and
is individually accessible.
Also, we generate Lucene queries programmatically; we don't use the
query parser at all.
We have thought of a couple of approaches to implement this, but both
have some limitations:
Approach 1: Generate a separate dummy Document for each 'revenue-info'
node. That way, we can ensure that both matched fields belong to the
same revenue-info node. The 'revenue-info' Document can then be linked
to the actual record Document using some other 'id' field.
However, this approach would increase the index size (one additional
Document per repeating node). [We would have a few dozen repeating nodes
in each record.]
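A plain-Java model of Approach 1, again not Lucene code: each revenue-info node becomes its own small document carrying a link back to its record. The `RevenueInfoDoc` type, its `recordId` link, and `findRecords` are hypothetical stand-ins for the dummy Document and the 'id' field. Because the condition is evaluated against one node document at a time, both values always come from the same node.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

public class NodeDocumentJoin {
    // Hypothetical dummy document for one revenue-info node; recordId plays the linking 'id' role.
    record RevenueInfoDoc(String recordId, int year, long amount) {}

    // Tests the condition per node document, then maps the hits back to distinct record ids.
    static Set<String> findRecords(List<RevenueInfoDoc> docs, Predicate<RevenueInfoDoc> condition) {
        Set<String> recordIds = new TreeSet<>();
        for (RevenueInfoDoc doc : docs) {
            if (condition.test(doc)) {
                recordIds.add(doc.recordId());
            }
        }
        return recordIds;
    }

    public static void main(String[] args) {
        // Record 1 yields two node documents and Record 2 yields one; in an index these
        // 3 node documents would sit alongside the 2 record documents (5 in total).
        List<RevenueInfoDoc> docs = List.of(
            new RevenueInfoDoc("record-1", 2000, 1_000_000L),
            new RevenueInfoDoc("record-1", 2001, 2_000_000L),
            new RevenueInfoDoc("record-2", 2000, 2_000_000L));

        System.out.println(findRecords(docs, d -> d.year() == 2000 && d.amount() == 2_000_000L)); // [record-2]
        // Range conditions remain possible because year and amount stay separate values.
        System.out.println(findRecords(docs, d -> d.year() == 2000 && d.amount() > 1_000_000L)); // [record-2]
    }
}
```

The correctness comes at the cost counted in the comment: one extra document per repeating node.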
Approach 2: Merge the values of 'year' and 'amount' into a single field,
e.g. /company-data/financial-data/revenue-info-dummy-field = 2000#2000000
However, the problem is that we may need to do range queries on some
fields, so for queries like year=2000 AND amount>1000000 we can't
search this composite field value. So this approach may be unusable for us.
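A plain-Java illustration of the composite encoding, using a hypothetical `composite` helper for the "year#amount" format above. Exact matches keep the two values together, but a range over such terms would compare them as strings (assuming, as with Lucene's term range queries, lexicographic term order), and string order does not follow the numeric order of the amount part.

```java
import java.util.List;

public class CompositeField {
    // Hypothetical encoding for Approach 2: one term holding "year#amount".
    static String composite(int year, long amount) {
        return year + "#" + amount;
    }

    public static void main(String[] args) {
        List<String> record1Terms = List.of(composite(2000, 1_000_000L), composite(2001, 2_000_000L));
        List<String> record2Terms = List.of(composite(2000, 2_000_000L));

        // An exact term keeps year and amount from the same node together.
        String wanted = composite(2000, 2_000_000L);
        System.out.println(record1Terms.contains(wanted)); // false
        System.out.println(record2Terms.contains(wanted)); // true

        // String comparison is not numeric: 900000 < 2000000, yet
        // "2000#900000" sorts after "2000#2000000" because '9' > '2'.
        System.out.println(composite(2000, 900_000L).compareTo(composite(2000, 2_000_000L)) > 0); // true
    }
}
```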
Has anyone encountered something similar? Can it be implemented some
other way?
Any suggestions would be greatly appreciated.
-Subodh