Re: Postgres + Xapian (was Re: [HACKERS] fulltext searching via a custom index type )

Eric Ridge Mon, 05 Jan 2004 09:04:35 -0800

On Jan 2, 2004, at 4:54 PM, Alvaro Herrera wrote:

> I think your approach is too ugly.  You will have tons of problems the
> minute you start thinking about concurrency (unless you want to allow
> only a single user accessing the index)

It might be ugly, but it's very fast. Surprisingly fast, actually.

Concerning concurrency, Xapian internally supports multiple readers and only 1 concurrent writer. So the locking requirements should be far less complex than a true concurrent solution. Now, I'm not arguing that this is ideal, but if Xapian is a search engine you're interested in, then you've already made up your mind that you're willing to deal with 1 writer at a time.
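
A minimal sketch of what "1 writer at a time" could mean for locking, assuming readers need no coordination at all and only writers must be serialized. The class name `SingleWriterGate` and its `writing()` method are hypothetical; nothing here is Xapian's or PostgreSQL's API.

```python
import threading
from contextlib import contextmanager


class SingleWriterGate:
    """Serializes writers; readers never take this lock (an assumption)."""

    def __init__(self):
        self._writer_lock = threading.Lock()

    @contextmanager
    def writing(self, timeout=-1):
        # Wait (forever by default, or up to `timeout` seconds) until no
        # other writer is active, then hold the lock for the whole block.
        if not self._writer_lock.acquire(timeout=timeout):
            raise TimeoutError("another writer holds the index")
        try:
            yield
        finally:
            self._writer_lock.release()
```

A writer would wrap its index update in `with gate.writing(): ...`, while index scans would proceed without touching the gate.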

However, Xapian does have built-in support for searching multiple databases at once. One thought I've had is to simply create a new 1-document database on every INSERT/UPDATE beyond the initial CREATE INDEX. Then whenever you do an index scan, tell Xapian to use all the little databases that exist in the index. This would give some bit of concurrency. Then on VACUUM (or VACUUM FULL), all these little databases could be merged back into the main index.
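
The idea above can be modelled with a toy in-memory inverted index. This is only a sketch under stated assumptions: `TinyIndex` and `DeltaIndex` are hypothetical stand-ins for Xapian databases, and nothing here calls Xapian or PostgreSQL. It also ignores the question of how an UPDATE would retire the old version of a document.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index standing in for one Xapian database (hypothetical)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term):
        return set(self.postings.get(term.lower(), ()))


class DeltaIndex:
    """Main index plus one small index per write after the initial build."""

    def __init__(self, initial_docs):
        self.main = TinyIndex()  # built once, as CREATE INDEX would
        for doc_id, text in initial_docs.items():
            self.main.add(doc_id, text)
        self.deltas = []  # one 1-document index per later INSERT/UPDATE

    def insert(self, doc_id, text):
        # Each write builds its own tiny index and never modifies main.
        delta = TinyIndex()
        delta.add(doc_id, text)
        self.deltas.append(delta)

    def search(self, term):
        # An index scan consults the main index and every little index.
        hits = self.main.search(term)
        for delta in self.deltas:
            hits |= delta.search(term)
        return hits

    def vacuum(self):
        # Merge all little indexes back into main, then discard them.
        for delta in self.deltas:
            for term, doc_ids in delta.postings.items():
                self.main.postings[term] |= doc_ids
        self.deltas.clear()
```

The cost of this design shows up in `search`: every scan touches every outstanding little index, so scans slow down as writes accumulate between vacuums.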

> and recovery (unless you want to force users to REINDEX when the system crashes).

I don't yet understand how the WAL stuff works. I haven't looked at the APIs yet, but if something you can record is "write these bytes to this BlockNumber at this offset", or if you can say, "index Tuple X from Relation Y", then it seems like recovery is still possible.

If ya can't do any of that, then I need to go look at WAL further.
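
To make the two kinds of log record mentioned above concrete, here is a toy replay loop. It is a sketch under assumptions, not PostgreSQL's WAL API: the record layout, the `replay` function, and the `heap` dictionary are all hypothetical, and `PAGE_SIZE` is set to PostgreSQL's default 8 kB block size.

```python
PAGE_SIZE = 8192  # PostgreSQL's default block size


def replay(log, heap):
    """Rebuild state from scratch by applying log records in order.

    log  -- list of dicts; each has a "kind" of "write_bytes" or "index_tuple"
    heap -- maps (relation, tuple_id) to the tuple's text (hypothetical)
    """
    pages = {}  # block number -> bytearray, rebuilt from physical records
    index = {}  # term -> set of tuple ids, rebuilt from logical records
    for record in log:
        if record["kind"] == "write_bytes":
            # Physical record: "write these bytes to this block at this offset".
            data, start = record["data"], record["offset"]
            if start < 0 or start + len(data) > PAGE_SIZE:
                raise ValueError("write falls outside the page")
            page = pages.setdefault(record["block"], bytearray(PAGE_SIZE))
            page[start:start + len(data)] = data
        elif record["kind"] == "index_tuple":
            # Logical record: "index Tuple X from Relation Y" by re-reading it.
            text = heap[(record["relation"], record["tuple_id"])]
            for term in text.lower().split():
                index.setdefault(term, set()).add(record["tuple_id"])
        else:
            raise ValueError("unknown record kind: %r" % record["kind"])
    return pages, index
```

The physical form needs no knowledge of the index structure at replay time, while the logical form depends on the heap tuple still being readable when recovery runs.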

> I think one way of attacking the problem would be using the existing
> nbtree by allowing it to store the five btrees.  First read the README
> in the nbtree dir, and then poke at the metapage's only structure.  You
> will see that it has a BlockNumber to the root page of the index.

Right, I had gotten this far in my investigation already. The daunting thing about trying to use the nbtree code is the code itself. It's very complex. Plus, I just don't know how well the rest of Xapian would respond to all of a sudden having a concurrent backend. It's likely that it would make no difference, but it's just an unknown to me at this time.

> Try modifying that to make it have a BlockNumber to every index's root page. You will have to provide ways to access each root page and maybe other nonstandard things (such as telling the root split operation what root page are you going to split), but you will get recovery and concurrency (at least to a point) for free.

And I'm not convinced that recovery and concurrency would be "for free" in this case either. The need to keep essentially 5 different trees in sync greatly complicates the concurrency issue, I would think.
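
For reference, the metapage suggestion quoted above could look roughly like the following. This is a sketch only: the layout, the magic number, and the function names are hypothetical and do not match nbtree's real metapage structure. The one thing it illustrates is that a root change has to say which of the five trees it applies to.

```python
import struct

NUM_TREES = 5                 # the five btrees discussed above
INVALID_BLOCK = 0xFFFFFFFF    # assumed marker for "this tree has no root yet"
META_MAGIC = 0x58415049       # hypothetical magic number
META_VERSION = 1              # hypothetical version
# Little-endian: magic, version, then one 32-bit root block number per tree.
META_FORMAT = "<II" + "I" * NUM_TREES


def pack_meta(roots):
    """Serialize a metapage holding one root block number per tree."""
    if len(roots) != NUM_TREES:
        raise ValueError("expected %d root block numbers" % NUM_TREES)
    return struct.pack(META_FORMAT, META_MAGIC, META_VERSION, *roots)


def unpack_meta(buf):
    """Return the list of root block numbers stored in a metapage."""
    magic, version, *roots = struct.unpack(META_FORMAT, buf)
    if magic != META_MAGIC or version != META_VERSION:
        raise ValueError("not a metapage in the expected format")
    return roots


def set_root(buf, tree_no, new_root):
    """Record a new root for one tree, as a root split would have to."""
    if not 0 <= tree_no < NUM_TREES:
        raise IndexError("no such tree: %d" % tree_no)
    roots = unpack_meta(buf)
    roots[tree_no] = new_root
    return pack_meta(roots)
```

Note that `set_root` updates a single tree in isolation; it does nothing to keep the five trees consistent with each other, which is the concern raised above.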

thanks for your time!

eric

