RE: One item, multiple fields, and range queries

2011-04-08 Thread wojtekpia
Hi Hoss,
I realize I'm reviving a really old thread, but I have the same need, and
SpanNumericRangeQuery sounds like a good solution for me. Can you give me
some guidance on how to implement that?

Thanks,

Wojtek

--
View this message in context: 
http://lucene.472066.n3.nabble.com/One-item-multiple-fields-and-range-queries-tp475030p2796613.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: One item, multiple fields, and range queries

2010-03-30 Thread Chris Hostetter

: parallel arrays, one array per address-part field.  The parallel array 
: alignment is effected via alignment of position increments.  What's 
: missing from Solr/Lucene is the ability to constrain matches such that 
: the position increment of all matching address-part fields is the same.

It exists using Span Queries...

http://lucene.apache.org/java/3_0_1/api/all/org/apache/lucene/search/spans/FieldMaskingSpanQuery.html


...this lets you construct a SpanNearQuery requiring that a span on 
fieldA occurs "near" a span on fieldB (in terms of position value, even 
though the fields are different).  

The only thing that's really missing as far as i can see is a 
"SpanNumericRangeQuery"


-Hoss



RE: One item, multiple fields, and range queries

2010-03-29 Thread Steven A Rowe
Hi David,

On 03/29/2010 at 4:54 PM, David Smiley (@MITRE.org) wrote:
> Did you read my original message where I suggested perhaps a solution
> might lie in intersecting different queries based on common multi-value
> field offsets derived from matching term positions?  I have no idea how
> far off the current codebase is to exposing enough information to make
> such an approach possible.

AFAICT, your above-described solution addresses the "one-to-many problem" by 
representing multiple records within a single document via parallel arrays, one 
array per address-part field.  The parallel array alignment is effected via 
alignment of position increments.  What's missing from Solr/Lucene is the 
ability to constrain matches such that the position increment of all matching 
address-part fields is the same.

I suspect that the Flexible Indexing branch would allow a slightly less 
involved index usage pattern: you could add a new term attribute that 
explicitly represents the record index.  That way you wouldn't have to fiddle 
around with increment gaps and guess about maximum record size.

You still need to perform the equivalent of an SQL table join across the 
matching address-part fields (in addition to any non-address constraints), 
using parallel array index equality as the join predicate.  I don't know how 
hard it would be to implement this, but you'd need to: add the ability to 
express this kind of constraint in the query language; make a new Similarity 
implementation that could handle it; and, if you go the route of adding a new 
record index term attribute, add a new postings codec that handles 
writing/reading it.

Steve



RE: One item, multiple fields, and range queries

2010-03-29 Thread David Smiley (@MITRE.org)

Steven,

The composite doc idea is an interesting avenue to a solution here that I 
didn't think of.  What's missing is code to do the group by and then do an 
intersection in order to get boolean AND behavior between the addresses and 
primary documents, and  then filter out the non-primary documents.  Perhaps 
Solr's popular field-collapsing patch would be a starting point.

I realize of course that Lucene/Solr isn't a database but there is plenty of 
gray area in-between.

Did you read my original message where I suggested perhaps a solution might lie 
in intersecting different queries based on common multi-value field offsets 
derived from matching term positions?  I have no idea how far off the current 
codebase is to exposing enough information to make such an approach possible.

~ David Smiley

-
 Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book


RE: One item, multiple fields, and range queries

2010-03-29 Thread Steven A Rowe
Hi David,

On 03/29/2010 at 3:36 PM, David Smiley (@MITRE.org) wrote:
> I'm not sure what to make of "or index using a heterogeneous field
> schema, grouping the different doc type instances with a unique key
> (the one) to form a composite doc"

Lucene is schema-free - you can mix and match different document types in a 
single index.  You could emulate this in Solr by merging the two document types 
and leaving blank the parts that are inapplicable to a given instance.  E.g.:

Address-doc-type: 
Field: Unique-key
Field: Street
Field: City
...

Everything-else-doc-type:
Field: Unique-key
Field: Blob-o'-text
...

Doc1: Unique-key: 1; Blob-o'-text: blobbedy-blah-blob; ...
Doc2: Unique-key: 1; Street: 12 Main St; City: Somewheresville; ...
Doc3: Unique-key: 1; Street: 243 13th St; City: Bogdownton; ...
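
A minimal sketch of how such a composite doc could be resolved at query time, assuming the grouping is done outside the index over plain in-memory records. The field names (`key`, `text`, `street`, `city`) and the second group of documents (key 2) are hypothetical, added only for illustration; nothing here is Solr or Lucene API.

```python
# Heterogeneous documents sharing a unique key, as in the layout above.
# Key 2 is an invented second item so the intersection has something to exclude.
docs = [
    {"key": 1, "text": "blobbedy-blah-blob"},
    {"key": 1, "street": "12 Main St", "city": "Somewheresville"},
    {"key": 1, "street": "243 13th St", "city": "Bogdownton"},
    {"key": 2, "text": "blobbedy-blah-blob"},
    {"key": 2, "street": "12 Main St", "city": "Bogdownton"},
]


def keys_matching(predicate):
    """Collect the unique keys of every document that satisfies the predicate."""
    return {d["key"] for d in docs if predicate(d)}


def composite_search(text_term, street, city):
    """AND a text constraint with an address constraint, grouped by unique key.

    The street and city must match inside the same address document, so values
    from two different addresses of one item cannot combine into a false hit.
    """
    text_keys = keys_matching(lambda d: text_term in d.get("text", ""))
    addr_keys = keys_matching(
        lambda d: d.get("street") == street and d.get("city") == city
    )
    return sorted(text_keys & addr_keys)


print(composite_search("blah", "12 Main St", "Bogdownton"))
```

Item 1 has both "12 Main St" and "Bogdownton", but in different address documents, so only item 2 is returned for that query.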


> I could use the scheme you mention provided with the spanNear query but
> it conflates different fields into one indexed field which will mess
> with the scoring and make queries like range queries if there are dates
> involved next to impossible.

I agree, dimensional reduction can be an issue, though I'm sure there are use 
cases where the attendant scoring distortion would be acceptable, e.g. 
non-scoring filters.  (Stuffing a variable number of addresses into a single 
document will also "mess with the scoring" unless you turn off norms, which is 
of course another form of scoring-messing.)

I've seen a couple of different mentions of private SpanRangeQuery 
implementations on the mailing lists, so range queries likely wouldn't be a 
problem for long, should it become a general issue.

> This "solution" is really a hack workaround to a limitation in
> Lucene/Solr.  I was hoping to start a conversation to a more
> truer resolution to this problem rather than these workarounds
> which aren't always satisfactory.

Limitation: Solr/Lucene is not a database.  

"Solutions":
1. Hack workaround
2. Rewrite Solr/Lucene to be a database
3. ? (fill in "more truer resolution" here)

Good luck,
Steve



RE: One item, multiple fields, and range queries

2010-03-29 Thread David Smiley (@MITRE.org)

I'm not going to index each address as its own document because the
"one-side" that I have currently has loads of text and there are many
addresses.  Furthermore, it doesn't really address the general case of my
problem statement.

I'm not sure what to make of "or index using a heterogeneous field schema,
grouping the different doc type instances with a unique key (the one) to
form a composite doc".

I could use the scheme you mention provided with the spanNear query but it
conflates different fields into one indexed field which will mess with the
scoring and make queries like range queries if there are dates involved next
to impossible.  This "solution" is really a hack workaround to a limitation
in Lucene/Solr.  I was hoping to start a conversation to a more truer
resolution to this problem rather than these workarounds which aren't always
satisfactory.

~ David Smiley



RE: One item, multiple fields, and range queries

2010-03-29 Thread Steven A Rowe
David,

The standard one-to-many solution is to index each address (the many) as its 
own document, and then either copy the other fields from your current schema to 
these documents, or index using a heterogeneous field schema, grouping the 
different doc type instances with a unique key (the one) to form a composite 
doc.  (These solutions address your discomfort with a single address field.)

Also, while you say that you don't have a hierarchy, I think you do; what you 
have described could be expressed in XML as:


  ...
  ...
  

  ...
  ...
  ...
  ...

  ...
  ...
  ...
  ...

...
  


I believe you could use the scheme I described on the other thread, using a 
single address field, if you encoded it like so:

  _ADDRESS_ _STREET_ 12 Main Street _CITY_ Metripilos _STATE_ MZ _ZIP_ 0
  _ADDRESS_ _STREET_ 512 23rd Avenue _CITY_ Carmtwon _STATE_ XB _ZIP_ 1
  ...

Then to find the docs associated with Carmtwon, XB:


  

  
_CITY_
Carmtwon
_STATE_
XB
  

  
  
_ADDRESS_
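
A minimal sketch of the single-field encoding and a lookup against it, assuming a plain contiguous-token check stands in for the span query (it is not the span query itself, and it uses no Lucene API). The function names and the dict layout of an address are hypothetical.

```python
def encode(addresses):
    """Flatten a list of addresses into one token stream with marker tokens."""
    tokens = []
    for a in addresses:
        tokens += ["_ADDRESS_", "_STREET_", *a["street"].split(),
                   "_CITY_", *a["city"].split(),
                   "_STATE_", a["state"], "_ZIP_", a["zip"]]
    return tokens


def has_city_state(tokens, city, state):
    """True if '_CITY_ <city> _STATE_ <state>' occurs as one contiguous run.

    Because the markers are part of the phrase, a city from one address
    cannot pair up with the state of another address.
    """
    phrase = ["_CITY_", *city.split(), "_STATE_", state]
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


tokens = encode([
    {"street": "12 Main Street", "city": "Metripilos", "state": "MZ", "zip": "0"},
    {"street": "512 23rd Avenue", "city": "Carmtwon", "state": "XB", "zip": "1"},
])
print(has_city_state(tokens, "Carmtwon", "XB"))
print(has_city_state(tokens, "Carmtwon", "MZ"))
```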
  


Steve





Re: One item, multiple fields, and range queries

2010-03-29 Thread Lukas Kahwe Smith

On 29.03.2010, at 15:11, David Smiley (@MITRE.org) wrote:

> 
> Sorry, I intended to design my post so that one wouldn't have to read the
> thread for context but it seems I failed to do that.  Don't bother reading
> the thread.  The use-case I'm pondering modifying Lucene/Solr to solve is
> the one-to-many problem.  Imagine a document that contains multiple
> addresses where each field of an address (like street, state, zipcode) goes in
> different multi-valued fields.  The main difficulty is considering how
> Lucene might be modified to have query results across different fields be
> intersected by a matching term position offset (which is designed in these
> fields to refer to a known value offset).


i posted another use case the other day as well .. then again i hope the 
spatial support in 1.5 will make this use case obsolete soon. basically we have 
an app where we have offers that can be available in multiple stores. now in 
order to have a speedy compact index the idea was to simply store the geo 
location of the stores along with the offers in a multi valued field. however 
in order to filter on the x-y geo coordinates we would have to filter on the 
pairs. this is i guess similar to your above example as well with multiple 
addresses.

here is the link to my post:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201003.mbox/%3cfb3f49c8-31d9-48fc-b416-73a1bbd3f...@pooteeweet.org%3e

btw: i was mailed offlist asking if i have found an answer to the above 
question. so it's not some crazy use case ..

regards,
Lukas Kahwe Smith
m...@pooteeweet.org





RE: One item, multiple fields, and range queries

2010-03-29 Thread David Smiley (@MITRE.org)

Sorry, I intended to design my post so that one wouldn't have to read the
thread for context but it seems I failed to do that.  Don't bother reading
the thread.  The use-case I'm pondering modifying Lucene/Solr to solve is
the one-to-many problem.  Imagine a document that contains multiple
addresses where each field of an address (like street, state, zipcode) goes in
different multi-valued fields.  The main difficulty is considering how
Lucene might be modified to have query results across different fields be
intersected by a matching term position offset (which is designed in these
fields to refer to a known value offset).

Following the link you gave is interesting though the general case I'm
talking about doesn't have a hierarchy.  And I find the use of a single
multi-valued field unpalatable for a variety of reasons.

~ David Smiley



RE: One item, multiple fields, and range queries

2010-03-28 Thread Steven A Rowe
Hi David,

I confess that even after looking at earlier posts in the thread your subject 
refers to, I'm not entirely sure exactly what problem you're trying to solve.

However, aspects of your desired solution seem quite similar to what the OP on 
this thread over on java-user was trying to do:

http://www.lucidimagination.com/search/document/61851fe5651331cc/increase_number_of_available_positions

If the solution described over there is not applicable to what you're trying to 
do, I apologize for the noise.

Steve



Re: One item, multiple fields, and range queries

2010-03-28 Thread David Smiley (@MITRE.org)

It's been three years since this discussion and I'm unaware of any work that
has plugged this capability gap in Lucene/Solr.  In summary, it would be
very, *very*, useful to be able to query multiple multi-valued fields and
require that such matches occur at the same index offset.  I'm working on an
app where I should be able to get away with a single multi-valued field and
query with slop.  If I have time to get fancy, I could induce a delta
position increment gap scheme since I know my inner fields can't be very
long, and thus I can avoid the slop (a performance win) but still use a
phrase query.  But for those of you wanting numeric range queries or other
things where the data is indexed differently, this isn't going to work. 
Using multiple fields is cleaner but there's no way to cross-query
multi-valued fields with restraining the position increment gap.  Has anyone
out there done this yet?  

I think it's a tough problem.  One piece of the solution would be to
configure a position increment gap such that the gap between values isn't
fixed; instead it'd be the delta to the next multiple of 1000 (where 1000 is
configurable). This would allow you to know which value offset a given
searched term is from based on the term's position as queried from Lucene. 
That's the easy part.  But then somehow you'd have to cross-correlate/filter
multiple query results taking the intersection based on common offsets. 
Surely that would take some serious hacking and I have no clue how feasible
that is.  Thoughts?
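
A minimal sketch of the "easy part", assuming whitespace tokenization and a block size of 1000. The names `BLOCK`, `assign_positions` and `value_offset` are hypothetical; this only models the position arithmetic and does not touch Lucene's analysis chain.

```python
BLOCK = 1000  # configurable block size; each value must hold fewer tokens than this


def assign_positions(values, block=BLOCK):
    """Give the tokens of value i the positions i*block, i*block + 1, ...

    This is the same as a variable gap: after the last token of a value,
    the next value starts at the next multiple of the block size.
    """
    positions = []
    for i, value in enumerate(values):
        tokens = value.split()
        if len(tokens) >= block:
            raise ValueError("value is too long for the block size")
        for offset, token in enumerate(tokens):
            positions.append((token, i * block + offset))
    return positions


def value_offset(position, block=BLOCK):
    """Recover which value of the multi-valued field a term position came from."""
    return position // block


streets = assign_positions(["12 Main Street", "512 23rd Avenue"])
print(streets)
print(value_offset(1001))
```

With positions laid out this way, two matches in different fields belong to the same value exactly when their positions give the same `value_offset`.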

~ David Smiley


Re: One item, multiple fields, and range queries

2007-01-18 Thread Chris Hostetter

: Now I follow.  I was misreading the first comments, thinking that the field
: content would be deconstructed to smaller components or pieces.  Too much
: (or not enough) coffee.

that's my bad .. i was trying to explain the concept by simplifying the
numeric range part out of the discussion and just tell you about the
multifield phrase query idea.

: I'm expecting the index doc needs to be constructed with lat/long/dates in
: sequential order, i.e.:

there's no requirement that you actually interleave them in the file, but
yes: the first value you add to the lat field would need to correspond to the
first value you add to the lon field and the when field as a single
event instance.  the second value you add to each field would all need to
correspond to each other as the next instance.

: Assuming slop count of 0, while the intention is to match lat/long/when in
: that order, could it possibly match long/when/lat, or when/lat/long?  Does
: PhraseQuery enforce order and starting point as well?

the key is that you aren't storing the lat/lon/when in the same field so
you'll only match the time in the when field, the lat in the lat field
etc...

: Assuming all of this, how does range query come into play?  Or could the
: PhraseQuery portion be applied as a filter?

this is why i said it was pretty theoretical ... not only would you need a
modified version of PhraseQuery to work across multiple fields, you'd need
to change it to match on ranges as well.


-Hoss


Re: One item, multiple fields, and range queries

2007-01-17 Thread Jeff Rodenburg

Now I follow.  I was misreading the first comments, thinking that the field
content would be deconstructed to smaller components or pieces.  Too much
(or not enough) coffee.

I'm expecting the index doc needs to be constructed with lat/long/dates in
sequential order, i.e.:



  123

  32.123456
  -88.987654
  01/31/2007

  42.123456
  -98.987654
  01/31/2007

  40.123456
  -108.987654
  01/30/2007
.etc.

Assuming slop count of 0, while the intention is to match lat/long/when in
that order, could it possibly match long/when/lat, or when/lat/long?  Does
PhraseQuery enforce order and starting point as well?

Assuming all of this, how does range query come into play?  Or could the
PhraseQuery portion be applied as a filter?







Re: One item, multiple fields, and range queries

2007-01-17 Thread Chris Hostetter

: OK, you lost me.  It sounds as if this PhraseQuery-ish approach involves
: breaking datetime and lat/long values into pieces, and evaluation occurs
: with positioning.  Is that accurate?

i'm not sure what you mean by pieces ... the idea is that you would have a
single "latitude" field and a single "longitude" field and a single "when"
field, and if an item had a single event, you would store a single value
in each field ... but if the item has multiple events, you would store
them in the same relative ordering, and then use the same kind of logic
PhraseQuery uses to verify that if the "latitude" field has a value in the
right range, and the "longitude" field has a value in the right range, and
the "when" field has a value in the right range, that all of those values
have the same position (specifically: are within a set amount of slop from
each other, which you would always set to "0")

: > It seems like this could even be done in the same field if one had a
: > query type that allowed querying for tokens at the same position.
: > Just index "_noun" at the same position as "house" (and make sure
: > there can't be collisions between real terms and markers via escaping,
: > or use \0 instead of _, etc).

true ... but the point doug made way back when is that with a generalized
multi-field phrase query you wouldn't have to do that escaping ... the
hard part in this case is the numeric ranges.


-Hoss



Re: One item, multiple fields, and range queries

2007-01-16 Thread Jeff Rodenburg

Yonik/Hoss -

OK, you lost me.  It sounds as if this PhraseQuery-ish approach involves
breaking datetime and lat/long values into pieces, and evaluation occurs
with positioning.  Is that accurate?






Re: One item, multiple fields, and range queries

2007-01-16 Thread Yonik Seeley

On 1/15/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:

> PhraseQuery artificially enforces that the Terms you add to it are
> in the same field ... you could easily write a PhraseQuery-ish query that
> takes Terms from different fields, and ensures that they appear "near"
> each other in terms of their token sequence -- the context of that comment
> was searching for instances of words with specific usage (ie: "house" used
> as a noun) by putting the usage type of each term in a different term in a
> separate parallel field, but with identical token positions.


It seems like this could even be done in the same field if one had a
query type that allowed querying for tokens at the same position.
Just index "_noun" at the same position as "house" (and make sure
there can't be collisions between real terms and markers via escaping,
or use \0 instead of _, etc).

-Yonik


Re: One item, multiple fields, and range queries

2007-01-15 Thread Chris Hostetter

: I've not yet used dynamic fields in this manner.  With that number range,
: what limitations could I encounter?  Given the size of that, I would need

very little, yonik recently listed the "costs" of dynamic fields...
http://www.nabble.com/Searching-multiple-indices-%28solr-newbie%29-tf2903899.html#a8245621
..as he points out, with omitNorms="true" you can have thousands of
dynamic fields and not even notice.

: the solr engine to formulate that query, correct?  I can't imagine I could
: pass that entire subquery statement in the http request, as the character
: limit would likely be exceeded.

yeah ... if you wanted to try the approach i described, and your "N"
wasn't a single digit number, i would recommend putting the query
building code into a custom RequestHandler ... it could even inspect the
list of field names from the IndexReader and know exactly how big N is at
any given moment.  i have no idea how efficient this approach would be if
N really does get up into the hundreds.


A completely different approach you could take if you want to get into
Lucene Query internals would be to take advantage of something Doug
mentioned once that has stayed in the back of my mind for almost a year
now:  PhraseQuery artificially enforces that the Terms you add to it are
in the same field ... you could easily write a PhraseQuery-ish query that
takes Terms from different fields, and ensures that they appear "near"
each other in terms of their token sequence -- the context of that comment
was searching for instances of words with specific usage (ie: "house" used
as a noun) by putting the usage type of each term in a different term in a
separate parallel field, but with identical token positions.

if you forget for a moment about the ranges you need to do, and imagine
instead that you store the "quadrant number" and "hour of day" for each
event, where e1q is the quadrant of event1 for an item, and e1h is the
hour of the day that event1 happened at, then for an item with multiple
events you could index the field/terms lists
quadrant:  e1q   e2q   e3q
hour:      e1h   e2h   e3h

and query for your input quadrant at a term position equal to the term
position of your input hour.

if you got *that* working, you could conceivably change the query to take
in a range for each field -- using TermEnum to get the list of all
latitude Terms in your latitude range, then for each of those Terms get
the list of documents and the term position within that document, and then
look for the longitude terms in the same relative term position which are
in your longitude range, and time terms in the same relative term position
in your time range.
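
a rough sketch of that idea, assuming a toy in-memory postings structure instead of a real Lucene index. the names `postings`, `index_doc`, `slots_in_range` and `search` are made up for illustration, and the sorted walk over a field's terms is only a stand-in for what TermEnum would give you.

```python
from collections import defaultdict

# Toy postings: field -> term value -> list of (doc_id, position).
# Position i in every field belongs to event i of that document.
postings = defaultdict(lambda: defaultdict(list))


def index_doc(doc_id, events):
    """Index each event's values at the same position in every field."""
    for pos, event in enumerate(events):
        for field, value in event.items():
            postings[field][value].append((doc_id, pos))


def slots_in_range(field, low, high):
    """Walk the field's terms in order and collect (doc, position) pairs in range."""
    hits = set()
    for value in sorted(postings[field]):
        if low <= value <= high:
            hits.update(postings[field][value])
    return hits


def search(ranges):
    """Intersect the (doc, position) pairs of every field, then report the docs."""
    common = None
    for field, (low, high) in ranges.items():
        slots = slots_in_range(field, low, high)
        common = slots if common is None else common & slots
    return sorted({doc_id for doc_id, _ in (common or set())})


index_doc(1, [{"lat": 32.1, "lon": -88.9, "when": 20070131},
              {"lat": 42.1, "lon": -98.9, "when": 20070130}])
index_doc(2, [{"lat": 32.1, "lon": -98.9, "when": 20070131}])

print(search({"lat": (30, 35), "lon": (-100, -95)}))
```

doc 1 has a latitude in range at position 0 and a longitude in range at position 1, so it is left out; only doc 2 has both at the same position.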

does that make any sense?

this is all purely theoretical, it just seems like it *should* be
possible, but i haven't thought through how it would be implemented.  if
you actually wanted to tackle it, i would start a discussion on
[EMAIL PROTECTED] first, so people smarter than me can tell you if i'm
smoking crack or not.

-Hoss



Re: One item, multiple fields, and range queries

2007-01-15 Thread Jeff Rodenburg

Thanks Hoss.  Interesting approach, but the "N" bound could be well in the
hundreds, and the N bound would be variable (some maximum number, but
different across events.)

I've not yet used dynamic fields in this manner.  With that number range,
what limitations could I encounter?  Given the size of that, I would need
the solr engine to formulate that query, correct?  I can't imagine I could
pass that entire subquery statement in the http request, as the character
limit would likely be exceeded.

Some of my comments may not make sense, so I'll check into dynamic fields
and such in the meantime.

thanks,
j






Re: One item, multiple fields, and range queries

2007-01-14 Thread Chris Hostetter

: 2) use multivalued fields as correlated vectors, so the first start
: date corresponds
:to the first end date corresponds to the first lat and long value.
: You get them all back
:in a query though, so your app would need to do extra work to sort
: out which matched.

if you expect a bounded number of correlated "events" per item, you can
use dynamic fields, and build up N correlated subqueries where N is the
upper bound on the number of events you expect any item to have, ie...

  (+lat1:[x TO y] +lon1:[w TO z] +time1:[a TO b])
   OR (+lat2:[x TO y] +lon2:[w TO z] +time2:[a TO b])
   OR (+lat3:[x TO y] +lon3:[w TO z] +time3:[a TO b])
   ...
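
a small sketch of generating that query string for a given N, assuming the dynamic fields are named lat1..latN, lon1..lonN and time1..timeN as in the example above. the function name `correlated_query` is made up, and the range endpoints are passed through as-is without any escaping.

```python
def correlated_query(n, lat, lon, time):
    """Build one clause per event slot 1..n and OR them together."""
    clauses = []
    for i in range(1, n + 1):
        clauses.append(
            f"(+lat{i}:[{lat[0]} TO {lat[1]}] "
            f"+lon{i}:[{lon[0]} TO {lon[1]}] "
            f"+time{i}:[{time[0]} TO {time[1]}])"
        )
    return " OR ".join(clauses)


query = correlated_query(2, ("x", "y"), ("w", "z"), ("a", "b"))
print(query)
```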




-Hoss



Re: One item, multiple fields, and range queries

2007-01-13 Thread Jeff Rodenburg

Thanks Yonik.


> 1) model a single document as a single event at a single place with a start
> and end date.

This was my first approach, but at presentation time I need to display the
event once -- with multiple start/end dates and locations beneath it.

Is treating the given event uniqueId as a facet the way to go?

thanks,
jeff





Re: One item, multiple fields, and range queries

2007-01-12 Thread Yonik Seeley

On 1/12/07, Jeff Rodenburg <[EMAIL PROTECTED]> wrote:

> I'm stuck with a query issue that at present seems unresolvable.  Hoping the
> community has some insight to this.
>
> My index contains events that have multiple beginning/ending date ranges and
> multiple locations.  For example, event A (uniqueId = 123) occurs every
> weekend, sometimes in one location, sometimes in many locations.  Dates have
> a beginning and ending date, and locations have a latitude & longitude.  I
> need to query for the set of events for a given "area", where area =
> bounding box.  So, a single event has multiple beginning and ending dates
> and multiple locations.
>
> So, the beginning date, ending date, latitude and longitude values only
> apply collectively as a unit.  However, I need to do range queries on both
> the dates and the lat/long values.


1) model a single document as a single event at a single place with a
   start and end date.
 OR
2) use multivalued fields as correlated vectors, so the first start date
   corresponds to the first end date corresponds to the first lat and
   long value.  You get them all back in a query though, so your app
   would need to do extra work to sort out which matched.
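
A minimal sketch of the "extra work" for option (2), assuming the app receives the multi-valued fields as parallel lists. The field names, the sample values and the function `matching_occurrences` are hypothetical, and this runs on the returned document, not inside Solr.

```python
from datetime import date

# Hypothetical document: each multi-valued field is a list, and index i in
# every list describes the same occurrence of the event.
doc = {
    "uniqueId": 123,
    "start": [date(2007, 1, 6), date(2007, 1, 13)],
    "end":   [date(2007, 1, 7), date(2007, 1, 14)],
    "lat":   [32.123456, 42.123456],
    "lon":   [-88.987654, -98.987654],
}


def matching_occurrences(doc, lat_range, lon_range, window):
    """Return the indexes whose lat/lon fall in the box and whose dates overlap the window."""
    hits = []
    rows = zip(doc["start"], doc["end"], doc["lat"], doc["lon"])
    for i, (start, end, lat, lon) in enumerate(rows):
        in_box = (lat_range[0] <= lat <= lat_range[1]
                  and lon_range[0] <= lon <= lon_range[1])
        overlaps = start <= window[1] and end >= window[0]
        if in_box and overlaps:
            hits.append(i)
    return hits


hits = matching_occurrences(
    doc, (30.0, 35.0), (-90.0, -85.0), (date(2007, 1, 1), date(2007, 1, 10))
)
print(hits)
```

An empty result means the document only matched because values from different occurrences satisfied different ranges, so the app can drop it.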

I'd do (1) if you can... it's simpler.

-Yonik