Subject: Re: Why is lucene so slow indexing in nfs file system ?
Thanks for your suggestions.
I'm sorry, I didn't know: what do you mean by "SAN"
and "FC"?
Another thing: I have visited the Lucene home page and there is no
release
--
> > From: Ariel <[EMAIL PROTECTED]>
> > To: java-user@lucene.apache.org
> > Sent: Thursday, January 10, 2008 10:05:28 AM
> > Subject: Re: Why is lucene so slow indexing in nfs file system ?
> >
> > In a distributed environment the application should make an exhaustive use of the network [...]
Ariel
On Jan 10, 2008 2:59 PM, Otis Gospodnetic <[EMAIL PROTECTED]>
wrote:
> Ariel,
>
> Comments inline.
>
>
> - Original Message
> From: Ariel <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Thursday, January 10, 2008 10:05:28 AM
I am indexing into RAM and then merging explicitly because my application
demands it: I have designed it as a distributed environment, so many threads or
workers on different machines index into RAM and serialize to disk, and
another thread on another machine accesses the segment index to merge it with
If possible you should also test the soon-to-be-released version 2.3,
which has a number of speedups to indexing.
Also try the steps here:
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
You should also try an A/B test: A) writing your index to the NFS
directory and then B) to a
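A quick way to run that A/B comparison outside Lucene is to time raw file writes against both locations. This is a minimal sketch, assuming you pass your NFS mount and a local directory as arguments (the defaults below just use the temp dir so it runs anywhere; the file count and size are arbitrary):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class DirWriteBench {
    // Writes `count` files of `size` bytes under `dir`; returns elapsed millis.
    static long timeWrites(File dir, int count, int size) throws IOException {
        dir.mkdirs();
        byte[] data = new byte[size];
        long start = System.currentTimeMillis();
        for (int i = 0; i < count; i++) {
            FileOutputStream out = new FileOutputStream(new File(dir, "seg" + i + ".bin"));
            try {
                out.write(data);
            } finally {
                out.close();
            }
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical paths: pass your NFS mount as args[0] and a local dir as args[1].
        File a = new File(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"), "bench-a");
        File b = new File(args.length > 1 ? args[1] : System.getProperty("java.io.tmpdir"), "bench-b");
        System.out.println("A: " + timeWrites(a, 200, 64 * 1024) + " ms");
        System.out.println("B: " + timeWrites(b, 200, 64 * 1024) + " ms");
    }
}
```

If A comes out several times slower than B, the NFS mount itself, not Lucene, is the bottleneck.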
This seems really clunky. Especially if your merge step also optimizes.
There's not much point in indexing into RAM then merging explicitly.
Just use an FSDirectory rather than a RAMDirectory. There is *already*
buffering built in to FSDirectory, and your merge factor etc. control
how much RAM is
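For reference, the direct approach Erick describes looks roughly like this against the Lucene 2.3 API. This is a sketch, not runnable standalone (it needs lucene-core on the classpath), and the index path, analyzer choice, and buffer settings are assumptions:

```java
// Sketch against the Lucene 2.3 API; requires lucene-core on the classpath.
IndexWriter writer = new IndexWriter(
    FSDirectory.getDirectory("/path/to/index"),  // local disk, not the NFS mount
    new StandardAnalyzer(), true);
writer.setRAMBufferSizeMB(48);  // how much is buffered in RAM before a flush to disk
writer.setMergeFactor(10);      // how many segments accumulate before a merge
// ... writer.addDocument(doc) for each document ...
writer.close();
```

FSDirectory does the RAM buffering for you; there is no separate RAMDirectory or explicit merge step to manage.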
In a distributed environment the application should make an exhaustive use of
the network, and there is no other way to access the documents in a
remote repository but over the NFS file system.
One thing I must clarify: I index the documents in memory, I use
RAMDirectory to do that, then wh
Thanks to all of you for your answers; I am going to change a few things in my
application and run tests.
One thing: I haven't found another good pdf-to-text converter like PDFBox. Do you
know any other faster one?
Greetings
Thanks for your answers
Ariel
On Jan 9, 2008 11:08 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
Ariel,
I believe PDFBox is not the fastest thing and was built more to handle all
possible PDFs than for speed (just my impression - Ben, PDFBox's author might
still be on this list and might comment). Pulling data from NFS to index seems
like a bad idea. I hope at least the indices are local
Ariel wrote:
The problem I have is that my application spends a lot of time to index all
the documents, the delay to index 10 gb of pdf documents is about 2 days (to
convert pdf to text I am using pdfbox) that is of course a lot of time,
others applications based in lucene, for instance ibm omni
There's also Nutch. However, 10GB isn't that big... Perhaps you can
index on the machine where the docs/index live, then just make the index
available via NFS? Or, better yet, use rsync to replicate it, like Solr does.
-Grant
On Jan 9, 2008, at 10:49 AM, Steven A Rowe wrote:
Hi Ariel,
On 01/09/2008 at 8:50 AM, Ariel wrote:
> Do you know of other distributed applications that
> use Lucene to index big amounts of documents?
Apache Solr is an open source enterprise search server based on the Lucene Java
search library, with XML/HTTP and JSON APIs, hit highlighting
<<< would like to find out why my application has this big
delay to index >>>
Well, then you have to measure. The first thing I'd do
is pinpoint where the time is being spent. Until you have
that answered, you simply cannot take any meaningful action.
1> don't do any of the indexing. No new Doc
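A concrete way to do that measurement is to wrap each stage in a timer and report the totals, so you can see whether the two days go to PDFBox or to the IndexWriter. A minimal sketch; the stage names and the stub Runnables (which stand in for the real extraction and indexing calls) are assumptions:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StageTimer {
    private final Map<String, Long> totals = new LinkedHashMap<String, Long>();

    // Runs `work` and adds its wall-clock time to the running total for `stage`.
    public void time(String stage, Runnable work) {
        long start = System.nanoTime();
        work.run();
        Long prev = totals.get(stage);
        totals.put(stage, (prev == null ? 0L : prev) + (System.nanoTime() - start));
    }

    public long totalNanos(String stage) {
        Long t = totals.get(stage);
        return t == null ? 0L : t;
    }

    // Prints per-stage totals and each stage's share of the overall time.
    public void report() {
        long all = 0;
        for (long t : totals.values()) all += t;
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            System.out.printf("%-10s %6d ms (%.0f%%)%n",
                e.getKey(), e.getValue() / 1000000,
                all == 0 ? 0.0 : 100.0 * e.getValue() / all);
        }
    }

    public static void main(String[] args) {
        StageTimer t = new StageTimer();
        for (int doc = 0; doc < 3; doc++) {
            // Stubs: replace with the real PDFBox extraction and IndexWriter.addDocument calls.
            t.time("pdfToText", () -> pause(20));
            t.time("index", () -> pause(5));
        }
        t.report();
    }

    static void pause(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```

Once the totals show which stage dominates, Erick's step 1 (run everything except the indexing) confirms it by elimination.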