Multivalued fields are the other approach to keyword value pairs.

And if you can denormalize your data, storing the structure as separate documents can make sense and support more powerful queries, although the join capabilities are rather limited.

-- Jack Krupansky

-----Original Message----- From: Paul Bell
Sent: Sunday, March 31, 2013 8:52 AM
To: java-user@lucene.apache.org
Subject: Re: Indexing a long list

Hi Jack,

Thanks for the reply. I am very new to Lucene.

Your timing is a bit uncanny. I was just coming to the conclusion that
there's nothing special about this case for Lucene, i.e., a tokenized field
should work, when I looked up and saw your e-mail.

In re the larger context: yeah, the properties in question here belong to
some kind of node, e.g., maybe a vertex in a graph DB. Possible properties
include 'name', 'type', 'inEdges', 'outEdges', etc. Most properties are
simple k=v pairs. But a few, notably the 'edge' properties, could be long
lists.

My intent was to create a Lucene Document for each node. The Fields in this
Document would represent all of the node's properties. A generic (not in
Lucene syntax) query should be able to ask after any property, e.g.,

   ('name' equals "vol1" AND 'outEdges.name' startsWith "hasMirror")

Note that 'outEdges.name' represents multiple elements, where 'name'
represents only one. That is, the generic query syntax is trying to match
any out-edge whose name property starts with "hasMirror". I haven't quite
crystallized the generic query syntax and don't know how best to map it to
both a Lucene query and to an appropriate Lucene index structure. Please
let me know if you've any suggestions!
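One way to pin down what the generic query above should mean is a plain-Java model of a node document whose fields may hold several values. This is only a sketch under stated assumptions: `NodeDoc`, `anyEquals`, `anyStartsWith`, and the sample values "volume" and "hasSnapshot" are hypothetical and are not Lucene classes or data from this thread. In Lucene terms the two clauses would roughly correspond to a TermQuery and a PrefixQuery combined as required clauses of a BooleanQuery, with 'outEdges.name' added to the Document once per edge.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NodeQuerySketch {
    // Hypothetical stand-in for a document: field name -> every value added under that name.
    static final class NodeDoc {
        private final Map<String, List<String>> fields = new HashMap<>();

        // Adding the same field name more than once makes the field multivalued.
        NodeDoc add(String field, String value) {
            fields.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
            return this;
        }

        List<String> values(String field) {
            return fields.getOrDefault(field, Collections.<String>emptyList());
        }
    }

    // 'field' equals value: true if any value of the field is exactly equal.
    static boolean anyEquals(NodeDoc doc, String field, String value) {
        return doc.values(field).contains(value);
    }

    // 'field' startsWith prefix: true if any value of the field has the prefix.
    static boolean anyStartsWith(NodeDoc doc, String field, String prefix) {
        for (String v : doc.values(field)) {
            if (v.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // ('name' equals "vol1" AND 'outEdges.name' startsWith "hasMirror")
    static boolean matches(NodeDoc doc) {
        return anyEquals(doc, "name", "vol1")
            && anyStartsWith(doc, "outEdges.name", "hasMirror");
    }

    public static void main(String[] args) {
        NodeDoc vol1 = new NodeDoc()
            .add("name", "vol1")
            .add("type", "volume")                  // hypothetical value
            .add("outEdges.name", "hasMirror1")
            .add("outEdges.name", "hasSnapshot");   // hypothetical value
        NodeDoc vol2 = new NodeDoc()
            .add("name", "vol2")
            .add("outEdges.name", "hasMirror1");

        System.out.println(matches(vol1)); // true: name matches and one out-edge has the prefix
        System.out.println(matches(vol2)); // false: name is not "vol1"
    }
}
```

Note that with this flat layout the two clauses are checked independently per field, so a condition that ties two properties of the same edge together cannot be expressed; that is the case where separate documents per edge would be worth considering.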

Thanks again.

-Paul



On Sun, Mar 31, 2013 at 8:33 AM, Jack Krupansky <j...@basetechnology.com> wrote:

The first question is how do you want to access the data? What do you want
your queries to look like?

What is the larger context? Are these properties of larger documents? Are
there more than one per document? Etc.

Why not just store the property as a tokenized field? Then you can query
whether v(i) or v(j) are or are not present as keywords.
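As a rough illustration of why a tokenized field is enough for this, here is a plain-Java model of the idea: the field value is split into terms, each term maps to the documents that contain it, and presence or absence of a value becomes a lookup. This is a sketch, not Lucene code; the class `TokenizedFieldSketch`, its methods, and the whitespace tokenization are assumptions made for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TokenizedFieldSketch {
    // term -> ids of the documents whose field contains that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Split the field value on whitespace and record each term for this document.
    void index(int docId, String fieldValue) {
        for (String term : fieldValue.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // True if the given term is present in the field of the given document.
    boolean contains(int docId, String term) {
        return postings.getOrDefault(term, Collections.<Integer>emptySet()).contains(docId);
    }

    public static void main(String[] args) {
        TokenizedFieldSketch idx = new TokenizedFieldSketch();
        idx.index(1, "v1 v2 v3");
        idx.index(2, "v2 v4");

        System.out.println(idx.contains(1, "v3")); // true: v3 is a term of document 1
        System.out.println(idx.contains(2, "v3")); // false: v3 is absent from document 2
    }
}
```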

-- Jack Krupansky

-----Original Message----- From: Paul Bell
Sent: Sunday, March 31, 2013 8:21 AM
To: java-user@lucene.apache.org
Subject: Indexing a long list


Hi All,

Suppose I need to index a property whose value is a long list of terms. For
example,

   someProperty = ["v1", "v2", .... , "v1000000"]

Please note that I could drop the leading "v" and index these as numbers
instead of strings.

But the question is what's the best practice in Lucene when dealing with a
case like this? I need to be able to retrieve the list. This makes me think
that I need to store it. And I suppose that the list could be stored in the
index itself or in the "content" to which the index points.

So there are really two parts to this question:

1. Lucene "best practices" for long lists
2. Where to store such a list

Thanks for your help.

-Paul

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



