Re: [Ferret-talk] How can I do my own search limits?

David Balmain Mon, 16 Oct 2006 09:24:48 -0700

On 10/16/06, Charlie Hubbard <[EMAIL PROTECTED]> wrote:
> David Balmain wrote:
>
> >> If that's the case then it's doable to create this type of search, but
> >> it would make more sense to modify ferret to support this type of query.
> >
> > I don't see a way to add this feature cleanly. It is just as easy for
> > you to iterate through all the results yourself. Besides, you still
> > haven't explained why you can't add all Pages to each Book document?
> > As I said, the field length limit isn't an issue. This would be the
> > best way to solve this problem.
>
> There is no reason why I couldn't.  I was just trying to figure out a
> way to avoid it.  The big drawback to indexing all the pages into a
> single field in Book is that I'd have to pick a maximum field size up
> front.  I don't have a lot of data yet, but I tried running some
> tests.  A 94-chapter book comes to somewhere around 100,000.  But
> that's a smaller book.  It's just something you have to watch closely,
> which is what I was trying to avoid.  Right now you're right, the best
> approach is to store it twice.


Set it to Ferret::FIX_INT_MAX. This is the largest number that you can
set any of the properties to, and it effectively removes the limit on
field length. I'll add :all as an option at some point.
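
A minimal sketch of what that setting could look like, assuming the
limit in question is the :max_field_length index option and that an
in-memory index is enough for a quick test. The field names :title and
:pages are made up for illustration:

```ruby
require 'rubygems'
require 'ferret'

# Assumption: :max_field_length is the option that caps the number of
# terms indexed per field. Ferret::FIX_INT_MAX effectively removes the cap.
index = Ferret::Index::Index.new(
  :max_field_length => Ferret::FIX_INT_MAX
)

# Hypothetical Book document with the text of every page in one field.
index << {
  :title => 'Lucene in Action',
  :pages => 'text of page 1 ... text of page 2 ...'
}
```

With acts_as_ferret the same option would have to be passed through
whatever hook the plugin provides for Ferret index options, which is not
shown here.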

> > In my suggested database approach the search would be the equivalent
> > of a simple SQL join query. By adding a feature like this to
> > acts_as_ferret you'll need to pull all the matching page ids out of
> > the index and perform a much slower SQL query for all books that
> > include those page ids. I'm not sure it is feasible but I'll leave
> > that decision to the acts_as_ferret developers. The best solution is
> > definitely to index all the pages with the book document, even if it
> > means indexing each page twice.
>
> I was thinking it would be more like a SQL union.  In other words the
> query didn't have to match the Book document in order to be included.
> It just had to match the Page object to be included.  For example, say I
> have a book titled Lucene in Action, but you'd expect a query for "java"
> to pull that one back.  Java is probably mentioned in the text of
> that book.  I sort of saw it as a multi_index query, since aaf maps the
> objects that way, where you'd first query Book documents, then query the
> Page documents.  Instead of adding those Page documents to the resulting
> array, a Page match would only add a new entry if its Book was not
> already there.  I suppose I could do that in Ruby, but it seems like it
> might be more optimized if ferret understood this type of relationship,
> since it is already iterating over the results.

Trust me, Ferret is complex enough as it is without having to
understand relationships between different documents. I need to draw
the line somewhere. If I want to add features like this I need to
design Ferret from the ground up to be more like a database which is
exactly what I intend to do with the Ferret object database. I hope
that makes sense.
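
For anyone who wants to do the union in Ruby instead, here is a rough
sketch of the merge step discussed above. It assumes the two searches
have already been run and that each hit has been reduced to a hash
carrying a :book_id (for Page hits, the id of the owning Book). The
method name and hash keys are hypothetical, not part of Ferret or
acts_as_ferret:

```ruby
# Merge Book hits and Page hits into one ordered list of book ids.
# Book hits come first; a Page hit only adds an entry when its Book
# is not already in the list.
def merge_book_and_page_hits(book_hits, page_hits)
  seen = {}
  merged = []
  (book_hits + page_hits).each do |hit|
    id = hit[:book_id]
    next if seen[id]   # Book already present, skip the duplicate
    seen[id] = true
    merged << id
  end
  merged
end

book_hits = [{ :book_id => 1 }, { :book_id => 2 }]
page_hits = [{ :book_id => 2, :page => 14 }, { :book_id => 3, :page => 7 }]
p merge_book_and_page_hits(book_hits, page_hits)  # prints [1, 2, 3]
```

This keeps the Book ordering from the first search and ignores scores,
so any ranking across the two result sets would need extra work.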

Dave
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk