Re: Document-Map, Hits-List

2004-12-01 Thread Luke Francl
On Wed, 2004-12-01 at 10:27, Otis Gospodnetic wrote:

> This is very similar to what I do - I create a List of Maps from Hits
> and its Documents.  So I think this change may be handy, if doable (I
> didn't look into changing the two Lucene classes, actually).


How do you avoid the problem Erik just mentioned, iterating through all
the Hits at once to populate this data structure?

I do a similar thing, creating a List of asset references from a field
in each Lucene Document in my Hits list (actual data for display
retrieved from a separate datastore). I was not aware of any performance
problems from doing this, but now I am wondering about the implications.

Thanks,
Luke


Re: Document-Map, Hits-List

2004-12-01 Thread petite_abeille
On Dec 01, 2004, at 17:31, Luke Francl wrote:

> How do you avoid the problem Erik just mentioned, iterating through all
> the Hits at once to populate this data structure?

You don't need to iterate through anything upfront... you simply do it
on-demand... e.g. when invoking List.get() you would invoke the
underlying Hits.doc()...

In other words, there is _no_ new data structure... simply an 
additional interface...

PA.
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Document-Map, Hits-List

2004-12-01 Thread Luke Francl
On Wed, 2004-12-01 at 10:39, petite_abeille wrote:


> You don't need to iterate through anything upfront... you simply do it 
> on-demand... eg when invoking List.get() you would invoke the 
> underlying Hits.doc()...
> 
> In other words, there is _no_ new data structure... simply an 
> additional interface...


Yes, but Otis hasn't implemented that interface. He's wrapping his Hits
with a List of Maps. 

Luke


Re: Document-Map, Hits-List

2004-12-01 Thread petite_abeille
On Dec 01, 2004, at 17:41, Luke Francl wrote:

> Yes, but Otis hasn't implemented that interface. He's wrapping his Hits
> with a List of Maps.

Right... I'm sure that Otis knows what he is doing :)

As far as implementation goes, you have at least 3 options:

- Implement List and Map directly in Lucene's relevant objects (e.g.
  Hits and Document)
- Extend Hits and Document to achieve the same
- Wrap Hits and Document in another class which implements the relevant
  interfaces (e.g. HitsList and DocumentMap)

The point of the exercise being to provide a standard interface while 
still benefiting from the underlying Lucene optimizations.
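
A minimal sketch of the third option for the Document side, assuming a
read-only Map view is enough. StoredFields is a hypothetical stand-in for
the few Document calls the view needs (it is not a Lucene type), and
DocumentMap here is only one way such a wrapper could look:

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the parts of a Lucene Document used here. */
interface StoredFields {
    List<String> fieldNames();      // names of the stored fields
    String get(String fieldName);   // value of a field, or null if absent
}

/** Read-only Map view over a document; values are looked up on demand. */
final class DocumentMap extends AbstractMap<String, String> {
    private final StoredFields doc;

    DocumentMap(StoredFields doc) {
        this.doc = doc;
    }

    @Override
    public String get(Object key) {
        // Delegate to the wrapped document instead of scanning entrySet().
        return (key instanceof String) ? doc.get((String) key) : null;
    }

    @Override
    public boolean containsKey(Object key) {
        return get(key) != null;
    }

    @Override
    public Set<Map.Entry<String, String>> entrySet() {
        // Built only when a caller actually iterates over the map.
        Set<Map.Entry<String, String>> entries = new LinkedHashSet<>();
        for (String name : doc.fieldNames()) {
            entries.add(new AbstractMap.SimpleImmutableEntry<String, String>(
                    name, doc.get(name)));
        }
        return Collections.unmodifiableSet(entries);
    }
}
```

Because put() is not overridden, AbstractMap rejects writes with an
UnsupportedOperationException, so the view cannot drift away from the
wrapped document.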

PA.


Re: Document-Map, Hits-List

2004-12-01 Thread Erik Hatcher
On Dec 1, 2004, at 11:31 AM, Luke Francl wrote:

> I do a similar thing, creating a List of asset references from a field
> in each Lucene Document in my Hits list (actual data for display
> retrieved from a separate datastore). I was not aware of any performance
> problems from doing this, but now I am wondering about the implications.

The performance "concern" (let's not say "problem") is when you get
10,000,000 (or so :) results back from a search.  No user wants to see
all of that, only the first 20, perhaps.  Calling Hits.doc(i) pulls the
document data from the index and populates a Document instance.  There
is file I/O involved, and doing lots of unnecessary Hits.doc(i) calls
may potentially be noticeable.  If you're only getting 100 hits back
then you'll likely not even notice.  (all numbers quoted here are just
random figures - don't quote me on actual performance numbers :).

In my current application, I have a paging feature.  Each new page does 
a search again using the same query, but I only iterate through the 20 
that should display on that page and build a highlighted data structure 
to hand to the presentation of only the appropriate ones for the range.
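
The paging approach described above could be sketched roughly as follows.
HitSource and Pager are hypothetical names used only for illustration.
HitSource stands in for Lucene's Hits (whose doc(i) can also throw an
IOException, left out here to keep the sketch short), and the highlighting
step is omitted:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Lucene's Hits: a count plus per-hit loading. */
interface HitSource {
    int length();          // total number of matching documents
    String doc(int index); // loads one document; this is the costly call
}

final class Pager {
    private Pager() {}

    /** Loads only the hits of one zero-based page; the rest are never fetched. */
    static List<String> page(HitSource hits, int pageNumber, int pageSize) {
        if (pageNumber < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("bad page request");
        }
        // Use long arithmetic so a large page number cannot overflow int.
        long first = (long) pageNumber * pageSize;
        int start = (int) Math.min(first, (long) hits.length());
        int end = (int) Math.min(first + pageSize, (long) hits.length());

        List<String> result = new ArrayList<>(end - start);
        for (int i = start; i < end; i++) {
            result.add(hits.doc(i)); // one load per displayed hit
        }
        return result;
    }
}
```

With a page size of 20, each request costs at most 20 document loads no
matter how many hits the query matched.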

Erik


Re: Document-Map, Hits-List

2004-12-01 Thread Otis Gospodnetic
Hello,

--- Erik Hatcher <[EMAIL PROTECTED]> wrote:

> On Dec 1, 2004, at 11:31 AM, Luke Francl wrote:
> > I do a similar thing, creating a List of asset references from a field
> > in each Lucene Document in my Hits list (actual data for display
> > retrieved from a separate datastore). I was not aware of any performance
> > problems from doing this, but now I am wondering about the implications.
>
> The performance "concern" (let's not say "problem") is when you get
> 10,000,000 (or so :) results back from a search.  No user wants to see
> all of that, only the first 20, perhaps.  Calling Hits.doc(i) pulls the
> document data from the index and populates a Document instance.  There
> is file I/O involved, and doing lots of unnecessary Hits.doc(i) calls
> may potentially be noticeable.  If you're only getting 100 hits back
> then you'll likely not even notice.  (all numbers quoted here are just
> random figures - don't quote me on actual performance numbers :).

Somewhat related and interesting post from Tim Bray:
  http://tbray.org/ongoing/When/200x/2004/11/26/SearchSort

> In my current application, I have a paging feature.  Each new page does
> a search again using the same query, but I only iterate through the 20
> that should display on that page and build a highlighted data structure
> to hand to the presentation of only the appropriate ones for the range.

Same here.  I make use of List's subList method a lot.
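
A rough sketch of why subList fits well with an on-demand list: the view
returned by AbstractList.subList delegates each get() to the backing list,
so nothing is loaded until an element is actually read. LazyList and its
loader function are hypothetical names for illustration; the loader stands
in for a call such as Hits.doc(i):

```java
import java.util.AbstractList;
import java.util.List;
import java.util.function.IntFunction;

/** Lazy read-only list: element i is produced only when get(i) is called. */
final class LazyList<T> extends AbstractList<T> {
    private final int size;
    private final IntFunction<T> loader; // stand-in for e.g. Hits.doc(i)

    LazyList(int size, IntFunction<T> loader) {
        this.size = size;
        this.loader = loader;
    }

    @Override
    public T get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return loader.apply(index); // loaded on demand, not cached
    }

    @Override
    public int size() {
        return size;
    }
}
```

For example, `new LazyList<String>(1000000, i -> "doc-" + i).subList(20, 40)`
yields a 20-element view, and iterating over it invokes the loader only for
indexes 20 through 39.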

Otis





Re: Document-Map, Hits-List

2004-12-03 Thread Otis Gospodnetic
Yes, it's not wise to just pull all Document instances from a Hits
instance, unless you really need them all.  I don't do that; I really
just provide a wrapper, like this:

import java.io.IOException;
import java.util.AbstractList;

import org.apache.lucene.search.Hits;

/**
 * A simple List implementation wrapping a Hits object.
 *
 * @author Otis Gospodnetic
 * @version $Id: HitList.java,v 1.4 2004/11/11 14:08:33 otis Exp $
 */
public class HitList extends AbstractList
{
    private Hits _hits;

    /**
     * Creates a new HitList instance.
     *
     * @param hits Hits to wrap
     */
    public HitList(Hits hits)
    {
        _hits = hits;
    }

    /**
     * @see java.util.List#get(int)
     */
    public Object get(int index)
    {
        try {
            // Loads the Document on demand; nothing is fetched up front.
            return _hits.doc(index);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    /**
     * @see java.util.List#size()
     */
    public int size() {
        return _hits.length();
    }


...
...

Otis


--- Luke Francl <[EMAIL PROTECTED]> wrote:

> On Wed, 2004-12-01 at 10:27, Otis Gospodnetic wrote:
>
> > This is very similar to what I do - I create a List of Maps from Hits
> > and its Documents.  So I think this change may be handy, if doable (I
> > didn't look into changing the two Lucene classes, actually).
>
> How do you avoid the problem Erik just mentioned, iterating through all
> the Hits at once to populate this data structure?
>
> I do a similar thing, creating a List of asset references from a field
> in each Lucene Document in my Hits list (actual data for display
> retrieved from a separate datastore). I was not aware of any performance
> problems from doing this, but now I am wondering about the implications.
>
> Thanks,
> Luke

