On 10/16/06, Jens Kraemer <[EMAIL PROTECTED]> wrote:
> Hi!
>
> On Mon, Oct 16, 2006 at 04:23:28PM +0900, David Balmain wrote:
> > On 10/16/06, Charlie Hubbard <[EMAIL PROTECTED]> wrote:
> [..]
> > > I'm interested in your database approach. It could help simplify this
> > > problem.  It seems doable to add this to acts_as_ferret without needing
> > > a separate project.  Not to mention it's really needed in Rails apps as
> > > well.
> > >
> >
> > In my suggested database approach the search would be the equivalent
> > of a simple SQL join query. By adding a feature like this to
> > acts_as_ferret you'll need to pull all the matching page ids out of
> > the index and perform a much slower SQL query for all books that
> > include those page ids. I'm not sure it is feasible but I'll leave
> > that decision to the acts_as_ferret developers. The best solution is
> > definitely to index all the pages with the book document, even if it
> > means indexing each page twice.
>
> I'd suggest going that route, too.
>
> An imho interesting question around this is, how much the size of the
> value for that pages field containing all pages of a book really would
> influence the total index size (when not storing the contents and not
> storing term vectors), i.e. will the index size grow in a linear way, or
> will it grow slower over time, as with bigger size of the value of a
> field more terms occur more than once?
>
> Jens

That is an interesting question. I haven't done any tests to back this
up, but I would guess you are correct. Indexing the content as a single
field in Book will take up a lot less space than it would if separated
into multiple documents as pages, so indexing the content twice as I
suggested shouldn't double the size of your index. In fact, if you
give the fields the same name (i.e. :content for both Page and Book)
then the increase in index size will be negligible. There will, however,
be a noticeable difference in indexing time, but again, it shouldn't
double. As far as search goes, this solution will probably be orders of
magnitude better.
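
To make the size argument concrete, here is a rough Ruby sketch. It is
not Ferret code: ToyIndex is a hypothetical class invented for this
example, and it only counts term dictionary entries and (term, document)
postings. It ignores term frequencies, positions and compression, all of
which a real index stores, so treat the numbers as an illustration of
the trend rather than a measurement.

```ruby
require 'set'

# Hypothetical toy inverted index (not part of Ferret): maps each term
# to the set of document ids that contain it.
class ToyIndex
  def initialize
    @postings = Hash.new { |hash, term| hash[term] = Set.new }
  end

  # Tokenize on word characters and record one posting per (term, doc).
  def add(doc_id, text)
    text.downcase.scan(/\w+/).each { |term| @postings[term] << doc_id }
  end

  # Number of distinct terms in the term dictionary.
  def term_count
    @postings.size
  end

  # Total number of (term, document) pairs.
  def posting_count
    @postings.values.sum(&:size)
  end
end

pages = [
  "the quick brown fox",
  "the lazy dog sleeps",
  "the fox and the dog"
]

# One document per page.
page_index = ToyIndex.new
pages.each_with_index { |text, i| page_index.add("page-#{i}", text) }

# One document for the whole book.
book_index = ToyIndex.new
book_index.add("book-0", pages.join(" "))

# Pages and book in the same index, sharing one field (one dictionary).
combined = ToyIndex.new
pages.each_with_index { |text, i| combined.add("page-#{i}", text) }
combined.add("book-0", pages.join(" "))

puts "pages:    #{page_index.term_count} terms, #{page_index.posting_count} postings"
puts "book:     #{book_index.term_count} terms, #{book_index.posting_count} postings"
puts "combined: #{combined.term_count} terms, #{combined.posting_count} postings"
```

In this toy model the book document adds no new dictionary entries,
because every term already came from a page, and it adds only one
posting per distinct term rather than one per page that uses the term.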

Dave
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
