Re: Clustered Indexing on common network filesystem

Zach Bailey Thu, 02 Aug 2007 09:13:55 -0700

Mark,

Thanks so much for your response.

Unfortunately, I am not sure the leader of the project would feel good about running code from trunk without an explicit endorsement from a majority of the developers or contributors for that particular code (do those people keep up with this list, anyway?). Is there any word on the possible timeframe in which the code required to work with NFS might be released?

Thanks for your other insight about hardlinks and rsync. I will look into that; unfortunately it does not cover our userbase who may be clustering in a Windows Server environment. I still have not heard/seen any evidence (anecdotal or otherwise) about how well Lucene might work sharing indexes over a mounted Windows share.


-Zach

Mark Miller wrote:

Some quick info:
NFS should work, but I think you'll want to be working off the trunk. Also, sharing an index over NFS is supposed to be slow. The standard so far, if you are not partitioning the index, is to use a Unix/Linux filesystem and hardlinks + rsync to efficiently share index changes across nodes (hard links for instant copy, rsync to only transfer changed index files; search the mailing list). If you look at Solr you can see scripts that give an example of this. I don't think the scripts rely on Solr. This kind of setup should be quick and simple to implement. Same with NFS. An RMI solution that allowed for index partitioning would probably be the longest to do.
-Mark
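
The hard-link step described above could be sketched roughly as follows in Python. This is only an illustration and is not taken from the Solr scripts; the function name `snapshot_index` and the directory layout are hypothetical. It assumes both directories are on the same Unix/Linux filesystem and that index files are not modified in place after the snapshot is taken.

```python
import os
from pathlib import Path


def snapshot_index(index_dir, snapshot_dir):
    """Hard-link every regular file in index_dir into a new snapshot_dir.

    Hard links share data blocks with the originals, so the snapshot is
    created almost instantly and file contents are not copied.
    Assumption: both directories live on the same filesystem.
    """
    src = Path(index_dir)
    dst = Path(snapshot_dir)
    # Refuse to reuse an existing snapshot directory.
    dst.mkdir(parents=True, exist_ok=False)
    linked = []
    for entry in sorted(src.iterdir()):
        if entry.is_file():
            os.link(entry, dst / entry.name)  # new name, same inode
            linked.append(entry.name)
    return linked
```

The resulting snapshot directory is what would then be handed to rsync, so that only files missing or changed on the other nodes are transferred.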



Zach Bailey wrote:
Thanks for your response --
Based on my understanding, Hadoop and Nutch are essentially the same thing, with Nutch being derived from Hadoop, and are primarily intended to be standalone applications.
We are not looking for a standalone application; rather, we must use a framework to implement search inside our current content management application. Currently the application search functionality is designed and built around Lucene, so migrating frameworks at this point is not feasible.
We are currently re-working our back-end to support clustering (in Tomcat) and we are looking for information on the migration of Lucene from a single-node filesystem index (which is what we use now and hope to continue to use for clients with a single-node deployment) to a shared filesystem index on a mounted network share.
We prefer to use this strategy because it means we do not have to have two disparate methods of managing indexes for clients who run in a single-node, non-clustered environment versus clients who run in a multiple-node, clustered environment.
So, hopefully here are some easy questions someone could shed some light on:
Is this not a recommended method of managing indexes across multiple nodes?
At this point would people recommend storing an individual index on each node and propagating index updates via a JMS framework rather than attempting to handle it transparently with a single shared index?
Is the Lucene index code so intimately tied to filesystem semantics that using a shared/networked file system is infeasible at this point in time?
What would be the quickest time-to-implementation of these strategies (JMS vs. shared FS)? The most robust/least error-prone?
I really appreciate any insight or response anyone can provide, even if it is a short answer to any of the related topics, e.g. "we implemented clustered search using per-node indexing with JMS update propagation and it works great", or even something as simple as "don't use a shared filesystem at this point".
Cheers,
-Zach

testn wrote:
Why don't you check out Hadoop and Nutch? It should provide what you are
looking for.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
