I can answer a few of these. If you haven't yet, you'd do yourself a favor
to pick up the book "Lucene in Action". It's written to the 1.4 code-base,
the examples compile but give deprecated warnings for the 1.9 code base, and
need a few more tweaks for the 2.0 code base.

Also, download a copy of Luke. It's invaluable for looking at an index to
answer questions like "why isn't this query returning what I thought it
should?".

That said, see below, and understand that I have exactly one 500K document
(growing to 1M over the next few months) project under my belt with Lucene,
so my opinions are those of an relative novice....

On 7/11/06, Tomi NA <[EMAIL PROTECTED]> wrote:

I plan to make lucene (and nutch) a key element in an intranet
solution, but I only know about lucene what I've read in the last
couple of days.
Here's what I'd like opinions about.

I would like to build a single point of access to data on intranet web
pages and LAN shared documents.

I've looked into the file:// vs. http:// issue and it seems that it
doesn't make sense securitywise to use file:// so I'd make the shares
accessible through a web server and *then* index them.
However, if I leverage the web inbuilt lucene search, I'd build two
"indices" (please pardon the lack of knowledge of lucene terminology).
Can I search over two indices? Do I have to merge them? How would I do
the former or the latter?


I'll let folks who understand security answer this one <G>.



Is there a good explanation somewhere how to set up incremental
indexing, rather than e.g. building the whole index over nightly?


See Luncen in Action (LIA). The short answer is yes. In the simplest form,
you can just add new data to a currently-existing index. You probably want
to make a copy just in case your power goes out or something.

Then there's IndexMergeTool which I haven't used, but looks interesting.

Then there's the possibility of using a MultiSearcher, but I'll leave it to
the experts to talk about how that combines results....

Will lucene produce usable results, knowing that there'll be no rich
interlinking between a major part of the content (shared docs) to base
ranking upon? I know lucene uses a number of criteria to rank
documents upon: I'm asking about real-world experiences and subjective
grading quality impressions.
Does anyone have similar experiences with intranet  enterprise data
searching?

Cheers,
Tomi

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Reply via email to