Re: Clustered Indexing on common network filesystem

Zach Bailey Thu, 02 Aug 2007 08:36:25 -0700

Thanks for your response --

Based on my understanding, Hadoop and Nutch are essentially the same thing, with Nutch being derived from Hadoop, and are primarily intended to be standalone applications.

We are not looking for a standalone application; rather, we must use a framework to implement search inside our current content management application. Currently the application's search functionality is designed and built around Lucene, so migrating frameworks at this point is not feasible.

We are currently re-working our back-end to support clustering (in Tomcat) and we are looking for information on the migration of Lucene from a single-node filesystem index (which is what we use now and hope to continue to use for clients with a single-node deployment) to a shared filesystem index on a mounted network share.

We prefer to use this strategy because it means we do not have to have two disparate methods of managing indexes for clients who run in a single-node, non-clustered environment versus clients who run in a multiple-node, clustered environment.


So, hopefully here are some easy questions someone could shed some light on:

Is this not a recommended method of managing indexes across multiple nodes?

At this point would people recommend storing an individual index on each node and propagating index updates via a JMS framework rather than attempting to handle it transparently with a single shared index?

Is the Lucene index code so intimately tied to filesystem semantics that using a shared/networked file system is infeasible at this point in time?

Which of these strategies (JMS vs. shared FS) would have the quickest time-to-implementation? Which would be the most robust and least error-prone?
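
To make the per-node option concrete, here is a rough sketch of the shape it might take. Everything in it is an assumption for illustration: `UpdateTopic` is an in-memory stand-in for a JMS topic, `NodeIndex` is a stand-in for a node's private Lucene index (a plain map, so the sketch has no dependencies), and none of the names come from Lucene or JMS. It only shows the flow being asked about: a change saved on one node is broadcast, and every node applies it to its own local index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo entry point; all class names here are illustrative only.
class PerNodeIndexSketch {
    public static void main(String[] args) {
        UpdateTopic topic = new UpdateTopic();
        NodeIndex nodeA = new NodeIndex(topic);
        NodeIndex nodeB = new NodeIndex(topic);
        topic.subscribe(nodeA);
        topic.subscribe(nodeB);

        // A content change arrives on node A and is broadcast to every node.
        nodeA.saveDocument("doc-1", "clustered indexing");

        System.out.println("node A sees: " + nodeA.lookup("doc-1"));
        System.out.println("node B sees: " + nodeB.lookup("doc-1"));
    }
}

// In-memory stand-in for a JMS topic: delivers each update to all subscribers.
class UpdateTopic {
    private final List<NodeIndex> subscribers = new ArrayList<>();

    void subscribe(NodeIndex node) {
        subscribers.add(node);
    }

    void publish(String docId, String text) {
        for (NodeIndex node : subscribers) {
            node.applyUpdate(docId, text);
        }
    }
}

// Stand-in for one node's private index; a real node would write to Lucene here.
class NodeIndex {
    private final UpdateTopic topic;
    private final Map<String, String> docs = new ConcurrentHashMap<>();

    NodeIndex(UpdateTopic topic) {
        this.topic = topic;
    }

    // Called by the application on whichever node handles the write.
    void saveDocument(String docId, String text) {
        topic.publish(docId, text); // the originating node also receives its own update
    }

    // Called by the topic on every subscribed node.
    void applyUpdate(String docId, String text) {
        docs.put(docId, text);
    }

    String lookup(String docId) {
        return docs.get(docId);
    }
}
```

With a real message broker, delivery would be asynchronous, so nodes could briefly disagree about index contents; the sketch delivers synchronously and hides that.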

I would really appreciate any insight or response anyone can provide, even if it is a short answer to any of the related topics, e.g. "we implemented clustered search using per-node indexing with JMS update propagation and it works great", or even something as simple as "don't use a shared filesystem at this point".


Cheers,
-Zach

testn wrote:

Why don't you check out Hadoop and Nutch? It should provide what you are
looking for.


