Re: Clustered Indexing on common network filesystem

Zach Bailey Thu, 02 Aug 2007 08:59:37 -0700

Rajesh,

I forgot to mention this, but we did investigate this option as well and even prototyped it for an internal project. It ended up being too slow for us.

It was adding a lot of overhead even to small updates, IIRC, mainly due to the fact that the index was essentially stored as a filesystem in the database. As you can probably imagine, using a database as a filesystem is not very performant.


Rajesh parab wrote:

One more alternative, though I am not sure if anyone
is using it.

Compass has added a plug-in to allow storing
Lucene index files inside the database. This should
work in a clustered environment, as all nodes will
share the same database instance.

I am not sure what impact it will have on performance.

Is anyone using a DB for index storage? Are there any
drawbacks to this approach?

Regards,
Rajesh

--- Zach Bailey <[EMAIL PROTECTED]> wrote:
Thanks for your response --

Based on my understanding, Hadoop and Nutch are
closely related, with Hadoop having been split out of
Nutch, and are primarily intended to be standalone
applications.
We are not looking for a standalone application;
rather, we must use a framework to implement search
inside our current content management application.
Currently the application's search functionality is
designed and built around Lucene, so migrating
frameworks at this point is not feasible.
We are currently reworking our back end to support
clustering (in Tomcat), and we are looking for
information on migrating Lucene from a single-node
filesystem index (which is what we use now and hope
to continue to use for clients with a single-node
deployment) to a shared filesystem index on a mounted
network share.
We prefer this strategy because it means we do not
have to maintain two disparate methods of managing
indexes: one for clients who run in a single-node,
non-clustered environment and another for clients who
run in a multi-node, clustered environment.
So, hopefully here are some easy questions someone
could shed some light on:

Is this not a recommended method of managing indexes
across multiple nodes?

At this point, would people recommend storing an
individual index on each node and propagating index
updates via a JMS framework, rather than attempting
to handle it transparently with a single shared index?
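To make the per-node option in the question above concrete, here is a minimal, self-contained Java sketch under stated assumptions: each node owns a private index and applies every update it receives from a shared topic. `UpdateTopic` is a hypothetical in-process stand-in for a JMS topic (it is not the JMS API), `Node` holds a toy inverted index rather than a real Lucene index, and `IndexUpdate` is an invented message format used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class ReplicatedIndexSketch {

    /** One index change, as it might be carried in a JMS message body (hypothetical format). */
    static final class IndexUpdate {
        final String docId;
        final String text; // null means "delete this document"

        private IndexUpdate(String docId, String text) {
            this.docId = docId;
            this.text = text;
        }

        static IndexUpdate add(String docId, String text) { return new IndexUpdate(docId, text); }
        static IndexUpdate delete(String docId) { return new IndexUpdate(docId, null); }
    }

    /** Hypothetical in-process stand-in for a JMS topic: every subscriber gets every update. */
    static final class UpdateTopic {
        private final List<Node> subscribers = new CopyOnWriteArrayList<>();

        void subscribe(Node node) { subscribers.add(node); }

        // Delivered synchronously and in order here; real JMS delivery is asynchronous.
        void publish(IndexUpdate update) {
            for (Node node : subscribers) {
                node.apply(update);
            }
        }
    }

    /** A cluster node owning a private index (a toy inverted index standing in for Lucene). */
    static final class Node {
        final String name;
        private final Map<String, String> docs = new ConcurrentHashMap<>();
        private final Map<String, Set<String>> postings = new ConcurrentHashMap<>();

        Node(String name) { this.name = name; }

        // Treat every update as "remove the old version, then add the new one if present".
        synchronized void apply(IndexUpdate update) {
            String old = docs.remove(update.docId);
            if (old != null) {
                for (String term : tokenize(old)) {
                    Set<String> ids = postings.get(term);
                    if (ids != null) ids.remove(update.docId);
                }
            }
            if (update.text != null) {
                docs.put(update.docId, update.text);
                for (String term : tokenize(update.text)) {
                    postings.computeIfAbsent(term, t -> ConcurrentHashMap.newKeySet()).add(update.docId);
                }
            }
        }

        // Returns a sorted copy of the matching document ids.
        Set<String> search(String term) {
            return new TreeSet<>(postings.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of()));
        }

        private static List<String> tokenize(String text) {
            List<String> terms = new ArrayList<>();
            for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!t.isEmpty()) terms.add(t);
            }
            return terms;
        }
    }

    public static void main(String[] args) {
        UpdateTopic topic = new UpdateTopic();
        Node a = new Node("node-a");
        Node b = new Node("node-b");
        topic.subscribe(a);
        topic.subscribe(b);

        topic.publish(IndexUpdate.add("doc1", "Clustered Lucene search"));
        topic.publish(IndexUpdate.add("doc2", "Shared filesystem index"));
        topic.publish(IndexUpdate.delete("doc1"));

        // Both nodes reach the same contents without sharing any index files.
        System.out.println(a.name + " " + a.search("index") + ", " + b.name + " " + b.search("lucene"));
        // prints "node-a [doc2], node-b []"
    }
}
```

The appeal of this design is that each node writes only to its own local index, so nothing depends on lock files or delete semantics over a network share. The trade-off is that with real, asynchronous JMS delivery, nodes can briefly return different results until every update has been applied, and a node that was down needs some way to catch up (for example a durable subscription or a full re-index).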

Is the Lucene index code so intimately tied to
filesystem semantics that using a shared/networked
file system is infeasible at this point in time?

What would be the quickest time-to-implementation of
these strategies (JMS vs. shared FS)? The most
robust/least error-prone?

I really appreciate any insight or response anyone
can provide, even if it is a short answer to any of
the related topics, e.g. "we implemented clustered
search using per-node indexing with JMS update
propagation and it works great", or even something
as simple as "don't use a shared filesystem at this
point".
Cheers,
-Zach

testn wrote:
Why don't you check out Hadoop and Nutch? They
should provide what you are looking for.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]