OK, there is a patch (http://issues.apache.org/jira/browse/LUCENE-532). That
is what I saw. But I still have a question, and I guess it's better to ask it
on the Hadoop mailing list anyway: the Hadoop project implements a DFS and
the whole MapReduce paradigm. Is it possible to do the indexing and
searching using all of that? The point of a DFS and MapReduce is to have an
app that is scalable and fast, like Google. I'm sure this is not easy at
all, but it would be nice if one day this kind of thing became a commodity.
[]s
Rossini
On 7/27/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
Rossini,
I think what you read was that searching a Lucene index that lives in HDFS
would be slow. As far as I understand things, the thing to do is to copy the
index out of HDFS to a local disk and then search it with Lucene from there.
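For example, something along these lines should do it (the paths are just
placeholders, and I'm assuming the Hadoop FileSystem API here):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.lucene.search.IndexSearcher;

    // Pull the index directory out of HDFS onto the local file system...
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    fs.copyToLocalFile(new Path("/dfs/indexes/mail-index"),     // placeholder HDFS path
                       new Path("/local/indexes/mail-index"));  // placeholder local path

    // ...and then search it with plain Lucene off the local disk.
    IndexSearcher searcher = new IndexSearcher("/local/indexes/mail-index");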
Otis
----- Original Message ----
From: Rafael Rossini <[EMAIL PROTECTED]>
To: [email protected]; Otis Gospodnetic <
[EMAIL PROTECTED]>
Sent: Thursday, July 27, 2006 4:23:56 PM
Subject: Re: Indexing large sets of documents?
Otis,
You mentioned the Hadoop project. I checked it out not long ago and read
something saying it did not support the Lucene index. Is it possible to
index and then search in HDFS?
[]s
Rossini
On 7/27/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
>
> Michael,
>
> Certainly parallelizing on a set of servers would work (hmm... hadoop?),
> but if you want to do this on a single machine you should tune some of the
> IndexWriter params. You didn't mention them, so I assume you haven't tuned
> anything yet. If you have Lucene in Action, check out 2.7.1, "Tuning
> indexing performance", which starts on page 42 under section 2.7
> (Controlling the indexing process) in chapter 2 (Indexing)
> (found via: http://lucenebook.com/search?query=index+tuning )
>
> If not, check maxBufferedDocs and mergeFactor in the IndexWriter
> javadocs. This is likely in the FAQ, too, but I didn't check.
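>
> For example, the tuning might look roughly like this (the values are only
> illustrative, and this assumes the Lucene 2.x IndexWriter setters; check
> the javadocs for what suits your data):
>
>     import org.apache.lucene.analysis.standard.StandardAnalyzer;
>     import org.apache.lucene.index.IndexWriter;
>
>     IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
>     writer.setMergeFactor(50);        // merge segments less often (fewer, larger merges)
>     writer.setMaxBufferedDocs(1000);  // buffer more docs in RAM before flushing to disk
>     // ... writer.addDocument(doc) for each document ...
>     writer.optimize();                // optional: merge everything down at the end
>     writer.close();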
>
> Otis
>
> ----- Original Message ----
> From: Michael J. Prichard
> To: [email protected]
> Sent: Thursday, July 27, 2006 12:29:31 PM
> Subject: Indexing large sets of documents?
>
> I built an indexer that runs through email and its attachments, rips out
> the content and what not, and then creates a Document and adds it to an
> index. It works w/ no problem. The issue is that it takes around 3-5
> seconds per email, and I have seen up to 10-15 seconds for email w/
> attachments. I need to index 750k emails, and at that rate it will
> take FOREVER! I am trying to find places to cut a second or two here
> and there, but are there any suggestions as to what I can do? Should I
> look into parallelizing the indexing? Help?!
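>
> In case it helps, the loop is roughly this shape (field names and the
> EmailMessage/extractContent bits are simplified placeholders for my own
> classes, not the exact code):
>
>     import org.apache.lucene.analysis.standard.StandardAnalyzer;
>     import org.apache.lucene.document.Document;
>     import org.apache.lucene.document.Field;
>     import org.apache.lucene.index.IndexWriter;
>
>     IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
>     for (EmailMessage msg : messages) {
>         String body = extractContent(msg);  // parses body + attachments (the slow part)
>         Document doc = new Document();
>         doc.add(new Field("subject", msg.getSubject(), Field.Store.YES, Field.Index.TOKENIZED));
>         doc.add(new Field("body", body, Field.Store.NO, Field.Index.TOKENIZED));
>         writer.addDocument(doc);
>     }
>     writer.close();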
>
> Thanks,
> Michael
>
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]