Re: Apache Lucene file search

2012-02-06 Thread Dheeraj Kv
Hi,

The file name search issue is resolved with some modifications to
SearchFiles.java.
A field named path has been added in the code:
String field = "path";
I also added parser.setAllowLeadingWildcard(true) so that queries can start
with a wildcard, which is not allowed by default.
If setAllowLeadingWildcard is set to false, the code can't find the
file name.
If the file name contains the string xyz, the query should be *xyz*.
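
To illustrate why the query needs a wildcard on both sides, here is a minimal JDK-only sketch. It is not Lucene code, and the class name, method names and file path are made up for illustration. It assumes the path is indexed as one untokenized term, so a wildcard pattern has to cover the whole path rather than a part of it:

```java
import java.util.regex.Pattern;

// Illustration only: mimics whole-term wildcard matching with java.util.regex.
class WildcardSketch {
    // Translate a wildcard pattern (* = any run of chars, ? = one char) into a regex.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') sb.append(".*");
            else if (c == '?') sb.append('.');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    // The pattern must match the whole term, not just a substring of it.
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        String path = "/Documents/report-xyz-2011.pdf"; // hypothetical file path
        System.out.println(matches("xyz", path));   // false
        System.out.println(matches("xyz*", path));  // false
        System.out.println(matches("*xyz*", path)); // true
    }
}
```

Only the last pattern starts with a wildcard, which is the kind of query that setAllowLeadingWildcard(true) permits.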


Dheeraj KV




-----Original Message-----
From: Dheeraj Kv 
To: java-user 
Sent: Wed, Feb 1, 2012 1:10 pm
Subject: Apache Lucene  file search


Hi,
I learnt about Lucene from Google and thought of implementing it at my
company.
I don't want to use Lucene as a web search application. I have a large backup
storage which consists of HTML files, doc files and PDF files.
I need to search inside files as well as search for file names. For that
purpose I thought I would try Lucene. I'm not good at Java.
I installed it on the server and it worked for normal search inside the
files, but I found out that file name search is not supported.
Also, updating the index of a directory takes the same amount of time as
building the index of that directory in the first place.
Can you shed some light on this? I used the IndexFiles and SearchFiles demo
classes for this purpose.
The commands I used were:


Indexing
java org.apache.lucene.demo.IndexFiles -index /mnt1 -docs /Documents 


Searching
java org.apache.lucene.demo.SearchFiles -index /mnt1 


Regards

Dheeraj KV


 


Re: weightage of each word according to precedence in document

2012-02-06 Thread Ian Lea
At least it doesn't give the same score for a doc which doesn't have
all the terms, which I think you claimed at one point.

So to try and simplify this, you've got one field called content and

doc1: pqrst uvwx abcd
doc2: abcd pqrst uvwx

and the query "abcd^10.0 content:pqrst^5.0" gives the same score for
doc1 and doc2.  That is to be expected since both docs are the same
length and both contain both search terms.
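
The arithmetic can be checked against the explain output quoted below. This is a minimal sketch, assuming the formula visible in that output (each clause is queryWeight times fieldWeight, and the clause scores are summed); the class and method names are made up for illustration:

```java
// Illustration only: recomputes the score from the numbers in the explain output.
class ExplainArithmetic {
    // queryWeight = boost * idf * queryNorm; fieldWeight = tf * idf * fieldNorm
    static double clause(double boost, double idf, double queryNorm,
                         double tf, double fieldNorm) {
        double queryWeight = boost * idf * queryNorm;
        double fieldWeight = tf * idf * fieldNorm;
        return queryWeight * fieldWeight;
    }

    static double score() {
        double queryNorm = 0.092562854;
        double abcd  = clause(10.0, 1.0,        queryNorm, 1.0, 0.5);
        double pqrst = clause(5.0,  0.81767845, queryNorm, 1.0, 0.5);
        return abcd + pqrst;
    }

    public static void main(String[] args) {
        // None of the inputs depends on where a term occurs in the document.
        System.out.println(score()); // about 0.6175326
    }
}
```

Term position never enters the calculation, so any document with the same tf and fieldNorm values ends up with the same score.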

As I said before, if you want the order of matched terms to matter,
see PhraseQuery or SpanQuery.

Or store positional info in a Payload and factor that in somehow.
Powerful but complicated.  See
http://www.lucidimagination.com/blog/2010/04/18/refresh-getting-started-with-payloads/
for an example.

I can't think of another way to make, in your case, abcd score higher
if it is the first rather than the third term in the doc.  I'd try a SpanQuery
with some reasonable slop value and add it as an optional clause to
your query, possibly with a boost.
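
The idea of an optional, position-sensitive clause can be sketched without Lucene. This is only an illustration of the principle, not the SpanQuery API: the class, the bonus value of 0.5 and the cut-off of one position are all assumptions made up for the example.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: an optional, position-sensitive clause added to a base score.
class PositionBonusSketch {
    // Returns the bonus if term first occurs within the first `end` positions, else 0.
    static double optionalBonus(List<String> tokens, String term, int end, double bonus) {
        int pos = tokens.indexOf(term);
        return (pos >= 0 && pos < end) ? bonus : 0.0;
    }

    static double score(String doc, double baseScore) {
        List<String> tokens = Arrays.asList(doc.split("\\s+"));
        // The clause is optional: documents that miss it keep their base score.
        return baseScore + optionalBonus(tokens, "abcd", 1, 0.5);
    }

    public static void main(String[] args) {
        System.out.println(score("pqrst uvwx abcd", 0.6175326)); // base score only
        System.out.println(score("abcd pqrst uvwx", 0.6175326)); // base score plus bonus
    }
}
```

Because the clause is optional, both documents still match; only the one with abcd at the front gets the extra weight.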


--
Ian.


On Sat, Feb 4, 2012 at 10:11 AM, A Z <4azfri...@gmail.com> wrote:
> Hi Ian,
>
> Sorry for the late reply.
>
> It is a simple search with the default similarity only.
> Here it gives the same score for every doc which has both tokens, abcd and pqrst;
> there is no extra weight for a doc in which abcd takes precedence in the document.
>
> Here is the output with score and searcher.explain:
>
>
> Query content:abcd^10.0 content:pqrst^5.0
>
> *title -> pqrst uvwx abcd ::: content -> pqrst uvwx abcd ::: Score -> 0.6175326*
>
> Searcher.explain -> 0.6175326 = (MATCH) sum of:
>   0.46281427 = (MATCH) weight(content:abcd^10.0 in 0), product of:
>     0.92562854 = queryWeight(content:abcd^10.0), product of:
>       10.0 = boost
>       1.0 = idf(docFreq=4, maxDocs=5)
>       0.092562854 = queryNorm
>     0.5 = (MATCH) fieldWeight(content:abcd in 0), product of:
>       1.0 = tf(termFreq(content:abcd)=1)
>       1.0 = idf(docFreq=4, maxDocs=5)
>       0.5 = fieldNorm(field=content, doc=0)
>   0.15471835 = (MATCH) weight(content:pqrst^5.0 in 0), product of:
>     0.37843326 = queryWeight(content:pqrst^5.0), product of:
>       5.0 = boost
>       0.81767845 = idf(docFreq=5, maxDocs=5)
>       0.092562854 = queryNorm
>     0.40883923 = (MATCH) fieldWeight(content:pqrst in 0), product of:
>       1.0 = tf(termFreq(content:pqrst)=1)
>       0.81767845 = idf(docFreq=5, maxDocs=5)
>       0.5 = fieldNorm(field=content, doc=0)
>
> *title -> abcd pqrst uvwx ::: content -> abcd pqrst uvwx ::: Score -> 0.6175326*
>
> Searcher.explain -> 0.6175326 = (MATCH) sum of:
>   0.46281427 = (MATCH) weight(content:abcd^10.0 in 1), product of:
>     0.92562854 = queryWeight(content:abcd^10.0), product of:
>       10.0 = boost
>       1.0 = idf(docFreq=4, maxDocs=5)
>       0.092562854 = queryNorm
>     0.5 = (MATCH) fieldWeight(content:abcd in 1), product of:
>       1.0 = tf(termFreq(content:abcd)=1)
>       1.0 = idf(docFreq=4, maxDocs=5)
>       0.5 = fieldNorm(field=content, doc=1)
>   0.15471835 = (MATCH) weight(content:pqrst^5.0 in 1), product of:
>     0.37843326 = queryWeight(content:pqrst^5.0), product of:
>       5.0 = boost
>       0.81767845 = idf(docFreq=5, maxDocs=5)
>       0.092562854 = queryNorm
>     0.40883923 = (MATCH) fieldWeight(content:pqrst in 1), product of:
>       1.0 = tf(termFreq(content:pqrst)=1)
>       0.81767845 = idf(docFreq=5, maxDocs=5)
>       0.5 = fieldNorm(field=content, doc=1)
>
> *title -> pqrst uvwx lmn abcd ::: content -> pqrst uvwx lmn abcd ::: Score -> 0.6175326*
>
> Searcher.explain -> 0.6175326 = (MATCH) sum of:
>   0.46281427 = (MATCH) weight(content:abcd^10.0 in 3), product of:
>     0.92562854 = queryWeight(content:abcd^10.0), product of:
>       10.0 = boost
>       1.0 = idf(docFreq=4, maxDocs=5)
>       0.092562854 = queryNorm
>     0.5 = (MATCH) fieldWeight(content:abcd in 3), product of:
>       1.0 = tf(termFreq(content:abcd)=1)
>       1.0 = idf(docFreq=4, maxDocs=5)
>       0.5 = fieldNorm(field=content, doc=3)
>   0.15471835 = (MATCH) weight(content:pqrst^5.0 in 3), product of:
>     0.37843326 = queryWeight(content:pqrst^5.0), product of:
>       5.0 = boost
>       0.81767845 = idf(docFreq=5, maxDocs=5)
>       0.092562854 = queryNorm
>     0.40883923 = (MATCH) fieldWeight(content:pqrst in 3), product of:
>       1.0 = tf(termFreq(content:pqrst)=1)
>       0.81767845 = idf(docFreq=5, maxDocs=5)
>       0.5 = fieldNorm(field=content, doc=3)
>
> *title -> pqrst abcd uvwx lmn ::: content -> pqrst abcd uvwx lmn ::: Score -> 0.6175326*
>
> Searcher.explain -> 0.6175326 = (MATCH) sum of:
>   0.46281427 = (MATCH) weight(content:abcd^10.0 in 4), product of:
>     0.92562854 = queryWeight(content:abcd^10.0), product of:
>       10.0 = boost
>       1.0 = idf(docFreq=4, maxDocs=5)
>       0.092562854 = queryNorm
>     0.5 = (MATCH) fieldWeight(content:abcd in 4), product of:
>       1.0 = tf(termFreq(content:abcd)=1)
>       1.0 = idf(docFreq=4, maxDocs=5)
>       0.5 = fieldNorm(field=content, doc=4)
>   0.15471835 = (MATCH) weight(content:pqrst^5.0 in 4), product of:
>     0.37843326 = queryWeight(content:pqrst^5.0), product of:
>       5.0 = boost
>       0.81767845 = idf(docFreq=5, maxDocs=5)
>       0.092562854 = queryNorm
>     0.40883923 = (MATCH) fieldWeight(content

Re: recording a universal ID from DocID in a CustomScoreQuery

2012-02-06 Thread Ian Lea
The int doc will be relative to the subreader, not to the entire index.
org.apache.lucene.search.Collector has setNextReader(IndexReader reader, int
docBase), which you might somehow be able to use.  Failing that I'd go
for FieldCache, or store the docids in a Set in a Map keyed by the current
reader, if that would give you what you need for the subsequent
messing around.
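
The relation between the two kinds of doc number is simple addition. A minimal JDK-only sketch, assuming you have the docBase of the current subreader (or an array of all segment start offsets); the class, method names and offsets are made up for illustration:

```java
import java.util.Arrays;

// Illustration only: converting between segment-local and index-wide doc numbers.
class DocBaseSketch {
    // docBase is the start offset of the current subreader within the whole index.
    static int toGlobal(int docBase, int localDoc) {
        return docBase + localDoc;
    }

    // Finds which subreader an index-wide doc number belongs to.
    // Assumes docStarts is strictly increasing and starts with 0.
    static int subIndex(int globalDoc, int[] docStarts) {
        int i = Arrays.binarySearch(docStarts, globalDoc);
        return i >= 0 ? i : -i - 2; // insertion point minus one
    }

    public static void main(String[] args) {
        int[] docStarts = {0, 100, 250}; // hypothetical: three segments
        int global = toGlobal(docStarts[1], 7);
        System.out.println(global);                      // 107
        System.out.println(subIndex(global, docStarts)); // 1
    }
}
```

Recording docBase + doc at collection time avoids having to work out afterwards which reader a local number came from.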


--
Ian.


On Sat, Feb 4, 2012 at 12:09 AM, Paul Allan Hill  wrote:
> My Index does NOT have a simple UID, it uses the file PATH to the file as the 
> unique key.
> I was implementing a CustomScoreQuery which not only tweaked the score it 
> also wanted to write down which documents had passed through this part of 
> overall rebuilt query, so that I could further mess with those particular 
> documents later.
> I was hoping to do it without loading all the PATHs from my index into a
> field cache, but maybe that is a false way to try to save memory.
>
> I thought I could write down the docId provided in the call to customScore
>
> public float customScore(int doc, float subQueryScore, float valSrcScore)
>     throws IOException {
>   docIds.add(doc);  // the parameter is named doc, not docId
>   return ...;
> }
>
> private Set<Integer> docIds = new HashSet<Integer>();
>
> While I thought I had this working, apparently I had not taken into 
> consideration the subreader and segment problem.
> The int called doc is not the docId for the entire index, just the local 
> reader doc number.  Is that right?
> So is there a standard way to convert back to the index wide DocID?
>
> If there is no standard way, I _might_ create a small subclass of 
> IndexSearcher and provide a method to:
>
>
> (1)    Find the right reader by looping through all 
> IndexSearcher.subReaders[] to find what reader called the CustomScoreQuery
>
> (2)    Add an offset of the proper value from IndexSearcher.docStarts[iReader]
>
> But I am thinking this is prone to the problem that a subreader can be made of
> more subreaders etc., so I really don't have a clue where to find the current
> reader and then map back to docStarts.
>
> I also think I'm doing this wrong, because ReaderUtil has nothing like this?
>
> Is there some way to note for later that a particular document came through 
> this function query or should I just accept the fact of using the field cache?
>
> -Paul
>
>
>
>

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Ian Lea
If you can use NRTManager and SearcherManager things should be easy
and blazingly fast rather than unbearably slow.  The latter phrase is
not one often associated with Lucene.

IndexWriter iw = new IndexWriter(whatever - some standard disk index);
NRTManager nrtm = new NRTManager(iw, null);
NRTManagerReopenThread ropt = new NRTManagerReopenThread(nrtm, ...);
ropt.setXxx(...);
...
ropt.start();

SearcherManager srchm = nrtm.getSearcherManager(b);

Then add docs to your index via nrtm.addDocument(d), update with
nrtm.updateDocument(...), and to search use

IndexSearcher searcher = srchm.acquire();
try {
  // run your searches with searcher here
} finally {
  srchm.release(searcher);
}

All thread safe so you don't have to worry about any complications
there.  And I bet it'll be blindingly fast.
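
The reason for the try/finally around acquire and release can be shown with a small JDK-only sketch. This is not how SearcherManager is implemented; the class and its counter are assumptions made up to show that the release still happens when the search throws:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: why release() belongs in a finally block.
class AcquireReleaseSketch {
    private final AtomicInteger inUse = new AtomicInteger();

    Object acquire() {
        inUse.incrementAndGet();
        return new Object(); // stands in for a shared searcher
    }

    void release(Object resource) {
        inUse.decrementAndGet();
    }

    int inUse() {
        return inUse.get();
    }

    // Acquire, run the task, and always release, even if the task throws.
    void withResource(Runnable task) {
        Object searcher = acquire();
        try {
            task.run();
        } finally {
            release(searcher);
        }
    }

    public static void main(String[] args) {
        AcquireReleaseSketch manager = new AcquireReleaseSketch();
        try {
            manager.withResource(() -> { throw new IllegalStateException("search failed"); });
        } catch (IllegalStateException expected) {
            // the failure is reported, but the resource was still released
        }
        System.out.println(manager.inUse()); // 0
    }
}
```

Without the finally block a failing search would leave the count above zero, which is the kind of leak the pattern is there to prevent.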

Don't forget to close() things down at the end.


--
Ian.



On Mon, Feb 6, 2012 at 12:15 AM, Cheng  wrote:
> I was trying to, but I don't know how, even though I have read some of your blog posts.
>
> On Sun, Feb 5, 2012 at 10:22 PM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> Are you using near-real-time readers?
>>
>> (IndexReader.open(IndexWriter))
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Sun, Feb 5, 2012 at 9:03 AM, Cheng  wrote:
>> > Hi Uwe,
>> >
>> > My challenge is that I need to update/modify the indexes frequently while
>> > providing the search capability. I was trying to use FSDirectory, but found
>> > out that reading from and writing to FSDirectory is unbearably slow. So
>> > I am now trying RAMDirectory, which is fast.
>> >
>> > I don't know MMapDirectory, and wonder if it is as fast as RAMDirectory.
>> >
>> >
>> > On Sun, Feb 5, 2012 at 4:14 PM, Uwe Schindler  wrote:
>> >
>> >> Hi Cheng,
>> >>
>> >> It seems that you use a RAMDirectory for *caching*, otherwise it makes no
>> >> sense to write changes back. In recent Lucene versions, this is not a good
>> >> idea, especially for large indexes (RAMDirectory eats your heap space,
>> >> allocates millions of small byte[] arrays,...). If you need something like a
>> >> caching Directory and you are working on a 64-bit platform, you can use
>> >> MMapDirectory (where the operating system kernel manages the reads and writes
>> >> between disk and memory). MMapDirectory is returned by default by
>> >> FSDirectory.open() on most 64-bit platforms. The good thing: the "caching"
>> >> space is outside your JVM heap, so it does not slow down the garbage collector.
>> >> So be sure *not* to allocate too much heap space (-Xmx) to your search app,
>> >> only the minimum needed to run it, and leave the rest of your RAM
>> >> available for the OS kernel to manage the FS cache.
>> >>
>> >> Uwe
>> >>
>> >> -
>> >> Uwe Schindler
>> >> H.-H.-Meier-Allee 63, D-28213 Bremen
>> >> http://www.thetaphi.de
>> >> eMail: u...@thetaphi.de
>> >>
>> >>
>> >> > -----Original Message-----
>> >> > From: Cheng [mailto:zhoucheng2...@gmail.com]
>> >> > Sent: Sunday, February 05, 2012 7:56 AM
>> >> > To: java-user@lucene.apache.org
>> >> > Subject: Configure writer to write to FSDirectory?
>> >> >
>> >> > Hi,
>> >> >
>> >> > I built a RAMDirectory on top of an FSDirectory, and would like the writer
>> >> > associated with the RAMDirectory to periodically write to the hard drive.
>> >> >
>> >> > Is this achievable?
>> >> >
>> >> > Thanks.




Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Cheng
I don't understand the following portion:

IndexWriter iw = new IndexWriter(whatever - some standard disk index);
NRTManager nrtm = new NRTManager(iw, null);
NRTManagerReopenThread ropt = new NRTManagerReopenThread(nrtm, ...);
ropt.setXxx(...);

ropt.start();

I have a Java ExecutorService instance running which takes care of my own
applications. I don't know how this NRTManagerReopenThread works with my
own ExecutorService instance.

Can both work together? How can the NRTManagerReopenThread instance ropt be
plugged into my own multithreading framework?

On Mon, Feb 6, 2012 at 8:17 PM, Ian Lea  wrote:

> If you can use NRTManager and SearcherManager things should be easy
> and blazingly fast rather than unbearably slow.  The latter phrase is
> not one often associated with lucene.
>
> IndexWriter iw = new IndexWriter(whatever - some standard disk index);
> NRTManager nrtm = new NRTManager(iw, null);
> NRTManagerReopenThread ropt = new NRTManagerReopenThread(nrtm, ...);
> ropt.setXxx(...);
> ...
> ropt.start();
>
> SearcherManager srchm = nrtm.getSearcherManager(b);
>
> Then add docs to your index via nrtm.addDocument(d), update with
> nrtm.updateDocument(...), and to search use
>
> IndexSearcher searcher = srchm.acquire();
> try {
>  search ...
> } finally {
>  srchm.release(searcher);
> }
>
> All thread safe so you don't have to worry about any complications
> there.  And I bet it'll be blindingly fast.
>
> Don't forget to close() things down at the end.
>
>
> --
> Ian.


Re: Custom Payload Analyzer and Query

2012-02-06 Thread Ian Lea
Not sure if you got an answer to this or not.  Don't recall seeing one
and gmail threading says not.

> Is the use of payloads I've described appropriate?

Sounds OK to me, although I'm not sure why you can't store the
metadata as a Document Field.

> Can I exclude/filter the matching terms based on the payload within a query
> itself?

I think not.  Could if the metadata was an indexed Field.



--
Ian.


On Mon, Jan 30, 2012 at 10:24 PM,   wrote:
> I'm working on providing advanced searching for annotated Medical
> Documents (using UIMA).  In the context of an annotated document, I
> identify relevant medical terms, as well as the negation of certain terms.
>  Following what I've read and seen in Lucene examples, I've been able to
> provide a search that takes into account the metadata contained in the
> payload.  Although very primitive, I've implemented a search which returns
> the payloads (using PayloadSpanUtil), and then excludes those terms where
> the payload doesn't meet the criteria.
>
> Is the use of payloads I've described appropriate?  Can I exclude/filter
> the matching terms based on the payload within a query itself ?   Are
> there any examples that do this?
>
> Cheers,
> Kyley




Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Ian Lea
You would use NRTManagerReopenThread as a standalone thread, not
plugged into your Executor stuff.  It is a utility class which you
don't have to use.  See the javadocs.

But in your case I'd use it, to start with anyway.  Fire it up with
suitable settings and forget about it, except to call close()
eventually. Once you've got things up and running you can tweak things
as much as you want but you appear to be having trouble getting up and
running.

So ... somewhere in the initialisation code of your app, create an
IndexWriter, NRTManager + ReopenThread and SearcherManager as outlined
before.  Then pass the NRTManager to any/all write methods or threads
and the SearcherManager instance to any/all search methods or threads
and you're done.  If you want to use threads that are part of your
ExecutorService, fine.  Just wrap it all together in whatever
combination of Thread or Runnable instances you want.
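
The arrangement can be sketched with plain JDK classes. This is only an illustration of the threading layout, not Lucene code: SharedIndex stands in for the manager objects, the refresher thread stands in for the reopen thread, and all names, the pool size and the sleep interval are assumptions made up for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: a standalone background thread next to an ExecutorService.
class StandaloneThreadSketch {
    // Stands in for the shared manager objects; counts writes and refreshes.
    static class SharedIndex {
        final AtomicInteger writes = new AtomicInteger();
        final AtomicInteger refreshes = new AtomicInteger();
    }

    // Returns {writes, refreshes} after all work has finished.
    static int[] run(int tasks) {
        SharedIndex index = new SharedIndex();
        CountDownLatch firstRefresh = new CountDownLatch(1);

        // The standalone thread: created once at start-up, independent of the pool.
        Thread refresher = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                index.refreshes.incrementAndGet();
                firstRefresh.countDown();
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    return; // asked to stop
                }
            }
        });
        refresher.setDaemon(true);
        refresher.start();

        // The application's own pool is handed the same shared object.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> index.writes.incrementAndGet());
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
            firstRefresh.await();
            refresher.interrupt(); // plays the role of the final close()
            refresher.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return new int[] {index.writes.get(), index.refreshes.get()};
    }

    public static void main(String[] args) {
        int[] result = run(8);
        System.out.println("writes=" + result[0] + " refreshes=" + result[1]);
    }
}
```

The pool never needs to know about the background thread; both simply share the same object, which is the point of passing the manager instances around.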


Does that help?


--
Ian.


> I don't understand the following portion:
>
> IndexWriter iw = new IndexWriter(whatever - some standard disk index);
> NRTManager nrtm = new NRTManager(iw, null);
> NRTManagerReopenThread ropt = new NRTManagerReopenThread(nrtm, ...);
> ropt.setXxx(...);
> 
> ropt.start();
>
> I have a Java ExecutorService instance running which takes care of my own
> applications. I don't know how this NRTManagerReopenThread works with my
> own ExecutorService instance.
>
> Can both work together? How can the NRTManagerReopenThread instance ropt be
> plugged into my own multithreading framework?

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Cheng
That really helps! I will try it out.

Thanks.

On Mon, Feb 6, 2012 at 10:12 PM, Ian Lea  wrote:

> You would use NRTManagerReopenThread as a standalone thread, not
> plugged into your Executor stuff.  It is a utility class which you
> don't have to use.  See the javadocs.
>
> But in your case I'd use it, to start with anyway.  Fire it up with
> suitable settings and forget about it, except to call close()
> eventually. Once you've got things up and running you can tweak things
> as much as you want but you appear to be having trouble getting up and
> running.
>
> So ... somewhere in the initialisation code of your app, create an
> IndexWriter, NRTManager + ReopenThread and SearcherManager as outlined
> before.  Then pass the NRTManager to any/all write methods or threads
> and the SearcherManager instance to any/all search methods or threads
> and you're done.  If you want to use threads that are part of your
> ExecutorService, fine.  Just wrap it all together in whatever
> combination of Thread or Runnable instances you want.
>
>
> Does that help?
>
>
> --
> Ian.

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Cheng
Ian,

I have run into an issue: I need to update the index frequently. NRTManager
does not seem very helpful on this front, as it is slower than when
RAMDirectory is used.

Any advice on improving this?



On Mon, Feb 6, 2012 at 10:24 PM, Cheng  wrote:

> That really helps! I will try it out.
>
> Thanks.
>> can
>> >> use
>> >> >> >> MMapDirectory (where the operating system kernel manages the
>> >> read/write
>> >> >> >> between disk an memory). MMapDirectory is returned by default for
>> >> >> >> FSDirectory.open() on most 64 bit platforms. The good thing: the
>> >> >> "cachi

RE: Configure writer to write to FSDirectory?

2012-02-06 Thread Uwe Schindler
Please review the following articles about NRT. Absolutely instant updates
that are visible as they are made are almost impossible (even with
RAMDirectory):

http://goo.gl/mzAHt 
http://goo.gl/5RoPx
http://goo.gl/vSJ7x

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Cheng [mailto:zhoucheng2...@gmail.com]
> Sent: Monday, February 06, 2012 4:27 PM
> To: java-user@lucene.apache.org
> Subject: Re: Configure writer to write to FSDirectory?
> 
> Ian,
> 
> I encountered an issue that I need to frequently update the index. The
> NRTManager seems not very helpful on this front as the speed is slower
> than RAMDirectory is used.
> 
> Any improvement advice?
> 
> 
> 
> On Mon, Feb 6, 2012 at 10:24 PM, Cheng  wrote:
> 
> > That really helps! I will try it out.
> >
> > Thanks.
> >
> >
> > On Mon, Feb 6, 2012 at 10:12 PM, Ian Lea  wrote:
> >
> >> You would use NRTManagerReopenThread as a standalone thread, not
> >> plugged into your Executor stuff.  It is a utility class which you
> >> don't have to use.  See the javadocs.
> >>
> >> But in your case I'd use it, to start with anyway.  Fire it up with
> >> suitable settings and forget about it, except to call close()
> >> eventually. Once you've got things up and running you can tweak
> >> things as much as you want but you appear to be having trouble
> >> getting up and running.
> >>
> >> So ... somewhere in the initialisation code of your app, create an
> >> IndexWriter, NRTManager + ReopenThread and SearcherManager as
> >> outlined before.  Then pass the NRTManager to any/all write methods
> >> or threads and the SearcherManager instance to any/all search methods
> >> or threads and you're done.  If you want to use threads that are part
> >> of your ExecutorService, fine.  Just wrap it all together in whatever
> >> combination of Thread or Runnable instances you want.
> >>
> >>
> >> Does that help?
> >>
> >>
> >> --
> >> Ian.
> >>
> >>
> >> > I don't understand this following portion:
> >> >
> >> > IndexWriter iw = new IndexWriter(whatever - some standard disk
> >> > index); NRTManager nrtm = new NRTManager(iw, null);
> >> > NRTManagerReopenThread ropt = new NRTManagerReopenThread(nrtm,
> >> > ...); ropt.setXxx(...); 
> >> > ropt.start();
> >> >
> >> > I have a java ExecutorServices instance running which take care of
> >> > my
> >> own
> >> > applications. I don't know how this NRTManagerReopenThread works
> >> > with my own ExecutorService instance.
> >> >
> >> > Can both work together? How can the NRTManagerReopenThread
> instance
> >> ropt be
> >> > plugged into my own multithreading framework?
> >> >
> >> > On Mon, Feb 6, 2012 at 8:17 PM, Ian Lea  wrote:
> >> >
> >> >> If you can use NRTManager and SearcherManager things should be
> >> >> easy and blazingly fast rather than unbearably slow.  The latter
> >> >> phrase is not one often associated with lucene.
> >> >>
> >> >> IndexWriter iw = new IndexWriter(whatever - some standard disk
> >> >> index); NRTManager nrtm = new NRTManager(iw, null);
> >> >> NRTManagerReopenThread ropt = new
> NRTManagerReopenThread(nrtm,
> >> >> ...); ropt.setXxx(...); ...
> >> >> ropt.start();
> >> >>
> >> >> SearcherManager srchm = nrtm.getSearcherManager(b);
> >> >>
> >> >> Then add docs to your index via nrtm.addDocument(d), update with
> >> >> nrtm.updateDocument(...), and to search use
> >> >>
> >> >> IndexSearcher searcher = srchm.acquire(); try {  search ...
> >> >> } finally {
> >> >>  srchm.release(searcher);
> >> >> }
> >> >>
> >> >> All thread safe so you don't have to worry about any complications
> >> >> there.  And I bet it'll be blindingly fast.
> >> >>
> >> >> Don't forget to close() things down at the end.
> >> >>
> >> >>
> >> >> --
> >> >> Ian.
> >> >>
> >> >>
> >> >>
> >> >> On Mon, Feb 6, 2012 at 12:15 AM, Cheng 
> >> wrote:
> >> >> > I was trying to, but don't know how to even I read some of your
> >> blogs.
> >> >> >
> >> >> > On Sun, Feb 5, 2012 at 10:22 PM, Michael McCandless <
> >> >> > luc...@mikemccandless.com> wrote:
> >> >> >
> >> >> >> Are you using near-real-time readers?
> >> >> >>
> >> >> >> (IndexReader.open(IndexWriter))
> >> >> >>
> >> >> >> Mike McCandless
> >> >> >>
> >> >> >> http://blog.mikemccandless.com
> >> >> >>
> >> >> >> On Sun, Feb 5, 2012 at 9:03 AM, Cheng 
> >> wrote:
> >> >> >> > Hi Uwe,
> >> >> >> >
> >> >> >> > My challenge is that I need to update/modify the indexes
> >> frequently
> >> >> while
> >> >> >> > providing the search capability. I was trying to use
> >> >> >> > FSDirectory,
> >> but
> >> >> >> found
> >> >> >> > out that the reading and writing from/to FSDirectory is
> >> >> >> > unbearably
> >> >> slow.
> >> >> >> So
> >> >> >> > I now am trying the RAMDirectory, which is fast.
> >> >> >> >
> >> >> >> > I don't know of  MMapDirectory, and wonder if it is as fast
> >> >> >> > as
> >> >> >> RAMDirectory.
> >> >> >> >
> >> >> >> >
> >> >> >> > On Sun, Feb 5, 2012 at 4:14 

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Ian Lea
What exactly do you mean by the "speed is slower"?  Time taken to
update the index?  Time taken for updates to become visible in search
results?  Time taken for searches to run on the IndexSearcher returned
from SearcherManager?  Something else?
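
One way to answer these questions is to time each phase on its own. The
following is only a minimal sketch in plain Java with no Lucene dependency;
PhaseTimer and its method names are hypothetical, chosen for illustration.

```java
/** Hypothetical helper (not part of Lucene): times one phase of the update/search cycle. */
final class PhaseTimer {
    private PhaseTimer() {}

    /** Runs the task once and returns the elapsed wall-clock time in nanoseconds. */
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    /** Runs the task `runs` times and returns the mean elapsed time in milliseconds. */
    static double meanMillis(Runnable task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long total = 0;
        for (int i = 0; i < runs; i++) {
            total += timeNanos(task);
        }
        return total / 1_000_000.0 / runs;
    }
}
```

Wrapping the update call, the commit() call and the search call in three
separate Runnable instances and timing each one should show which of the
three is actually slow, rather than guessing from the overall run time.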


--
Ian.


On Mon, Feb 6, 2012 at 3:27 PM, Cheng  wrote:
> Ian,
>
> I encountered an issue that I need to frequently update the index. The
> NRTManager seems not very helpful on this front as the speed is slower than
> RAMDirectory is used.
>
> Any improvement advice?
>
>
>
> On Mon, Feb 6, 2012 at 10:24 PM, Cheng  wrote:
>
>> That really helps! I will try it out.
>>
>> Thanks.
>>
>>
>> On Mon, Feb 6, 2012 at 10:12 PM, Ian Lea  wrote:
>>
>>> You would use NRTManagerReopenThread as a standalone thread, not
>>> plugged into your Executor stuff.  It is a utility class which you
>>> don't have to use.  See the javadocs.
>>>
>>> But in your case I'd use it, to start with anyway.  Fire it up with
>>> suitable settings and forget about it, except to call close()
>>> eventually. Once you've got things up and running you can tweak things
>>> as much as you want but you appear to be having trouble getting up and
>>> running.
>>>
>>> So ... somewhere in the initialisation code of your app, create an
>>> IndexWriter, NRTManager + ReopenThread and SearcherManager as outlined
>>> before.  Then pass the NRTManager to any/all write methods or threads
>>> and the SearcherManager instance to any/all search methods or threads
>>> and you're done.  If you want to use threads that are part of your
>>> ExecutorService, fine.  Just wrap it all together in whatever
>>> combination of Thread or Runnable instances you want.
>>>
>>>
>>> Does that help?
>>>
>>>
>>> --
>>> Ian.
>>>
>>>
>>> > I don't understand this following portion:
>>> >
>>> > IndexWriter iw = new IndexWriter(whatever - some standard disk index);
>>> > NRTManager nrtm = new NRTManager(iw, null);
>>> > NRTManagerReopenThread ropt = new NRTManagerReopenThread(nrtm, ...);
>>> > ropt.setXxx(...);
>>> > 
>>> > ropt.start();
>>> >
>>> > I have a java ExecutorServices instance running which take care of my
>>> own
>>> > applications. I don't know how this NRTManagerReopenThread works with my
>>> > own ExecutorService instance.
>>> >
>>> > Can both work together? How can the NRTManagerReopenThread instance
>>> ropt be
>>> > plugged into my own multithreading framework?
>>> >
>>> > On Mon, Feb 6, 2012 at 8:17 PM, Ian Lea  wrote:
>>> >
>>> >> If you can use NRTManager and SearcherManager things should be easy
>>> >> and blazingly fast rather than unbearably slow.  The latter phrase is
>>> >> not one often associated with lucene.
>>> >>
>>> >> IndexWriter iw = new IndexWriter(whatever - some standard disk index);
>>> >> NRTManager nrtm = new NRTManager(iw, null);
>>> >> NRTManagerReopenThread ropt = new NRTManagerReopenThread(nrtm, ...);
>>> >> ropt.setXxx(...);
>>> >> ...
>>> >> ropt.start();
>>> >>
>>> >> SearcherManager srchm = nrtm.getSearcherManager(b);
>>> >>
>>> >> Then add docs to your index via nrtm.addDocument(d), update with
>>> >> nrtm.updateDocument(...), and to search use
>>> >>
>>> >> IndexSearcher searcher = srchm.acquire();
>>> >> try {
>>> >>  search ...
>>> >> } finally {
>>> >>  srchm.release(searcher);
>>> >> }
>>> >>
>>> >> All thread safe so you don't have to worry about any complications
>>> >> there.  And I bet it'll be blindingly fast.
>>> >>
>>> >> Don't forget to close() things down at the end.
>>> >>
>>> >>
>>> >> --
>>> >> Ian.
>>> >>
>>> >>
>>> >>
>>> >> On Mon, Feb 6, 2012 at 12:15 AM, Cheng 
>>> wrote:
>>> >> > I was trying to, but don't know how to even I read some of your
>>> blogs.
>>> >> >
>>> >> > On Sun, Feb 5, 2012 at 10:22 PM, Michael McCandless <
>>> >> > luc...@mikemccandless.com> wrote:
>>> >> >
>>> >> >> Are you using near-real-time readers?
>>> >> >>
>>> >> >> (IndexReader.open(IndexWriter))
>>> >> >>
>>> >> >> Mike McCandless
>>> >> >>
>>> >> >> http://blog.mikemccandless.com
>>> >> >>
>>> >> >> On Sun, Feb 5, 2012 at 9:03 AM, Cheng 
>>> wrote:
>>> >> >> > Hi Uwe,
>>> >> >> >
>>> >> >> > My challenge is that I need to update/modify the indexes
>>> frequently
>>> >> while
>>> >> >> > providing the search capability. I was trying to use FSDirectory,
>>> but
>>> >> >> found
>>> >> >> > out that the reading and writing from/to FSDirectory is unbearably
>>> >> slow.
>>> >> >> So
>>> >> >> > I now am trying the RAMDirectory, which is fast.
>>> >> >> >
>>> >> >> > I don't know of  MMapDirectory, and wonder if it is as fast as
>>> >> >> RAMDirectory.
>>> >> >> >
>>> >> >> >
>>> >> >> > On Sun, Feb 5, 2012 at 4:14 PM, Uwe Schindler 
>>> >> wrote:
>>> >> >> >
>>> >> >> >> Hi Cheng,
>>> >> >> >>
>>> >> >> >> It seems that you use a RAMDirectory for *caching*, otherwise it
>>> >> makes
>>> >> >> no
>>> >> >> >> sense to write changes back. In recent Lucene versions, this is
>>> not a
>>> >> >> good
>>> >> >> >> idea, especially for large indexes (RAMDirectory eats your heap
>>> >> space,
>>> >> >

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Cheng
Uwe, when I said the speed is slow, I wasn't referring to instant visibility
of changes, but to the changes being synchronized with FSDirectory when I
use writer.commit().

When I use RAMDirectory, writer.commit() seems much faster than when I use
NRTManager built upon FSDirectory. So I am guessing the difference is the
index synchronization.



On Mon, Feb 6, 2012 at 11:40 PM, Uwe Schindler  wrote:

> Please review the following articles about NRT, absolutely instant updates
> that are visible as they are done are almost impossible (even with
> RAMDirectory):
>
> http://goo.gl/mzAHt
> http://goo.gl/5RoPx
> http://goo.gl/vSJ7x
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -Original Message-
> > From: Cheng [mailto:zhoucheng2...@gmail.com]
> > Sent: Monday, February 06, 2012 4:27 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Configure writer to write to FSDirectory?
> >
> > Ian,
> >
> > I encountered an issue that I need to frequently update the index. The
> > NRTManager seems not very helpful on this front as the speed is slower
> > than RAMDirectory is used.
> >
> > Any improvement advice?
> >
> >
> >
> > On Mon, Feb 6, 2012 at 10:24 PM, Cheng  wrote:
> >
> > > That really helps! I will try it out.
> > >
> > > Thanks.
> > >
> > >
> > > On Mon, Feb 6, 2012 at 10:12 PM, Ian Lea  wrote:
> > >
> > >> You would use NRTManagerReopenThread as a standalone thread, not
> > >> plugged into your Executor stuff.  It is a utility class which you
> > >> don't have to use.  See the javadocs.
> > >>
> > >> But in your case I'd use it, to start with anyway.  Fire it up with
> > >> suitable settings and forget about it, except to call close()
> > >> eventually. Once you've got things up and running you can tweak
> > >> things as much as you want but you appear to be having trouble
> > >> getting up and running.
> > >>
> > >> So ... somewhere in the initialisation code of your app, create an
> > >> IndexWriter, NRTManager + ReopenThread and SearcherManager as
> > >> outlined before.  Then pass the NRTManager to any/all write methods
> > >> or threads and the SearcherManager instance to any/all search methods
> > >> or threads and you're done.  If you want to use threads that are part
> > >> of your ExecutorService, fine.  Just wrap it all together in whatever
> > >> combination of Thread or Runnable instances you want.
> > >>
> > >>
> > >> Does that help?
> > >>
> > >>
> > >> --
> > >> Ian.
> > >>
> > >>
> > >> > I don't understand this following portion:
> > >> >
> > >> > IndexWriter iw = new IndexWriter(whatever - some standard disk
> > >> > index); NRTManager nrtm = new NRTManager(iw, null);
> > >> > NRTManagerReopenThread ropt = new NRTManagerReopenThread(nrtm,
> > >> > ...); ropt.setXxx(...); 
> > >> > ropt.start();
> > >> >
> > >> > I have a java ExecutorServices instance running which take care of
> > >> > my
> > >> own
> > >> > applications. I don't know how this NRTManagerReopenThread works
> > >> > with my own ExecutorService instance.
> > >> >
> > >> > Can both work together? How can the NRTManagerReopenThread
> > instance
> > >> ropt be
> > >> > plugged into my own multithreading framework?
> > >> >
> > >> > On Mon, Feb 6, 2012 at 8:17 PM, Ian Lea  wrote:
> > >> >
> > >> >> If you can use NRTManager and SearcherManager things should be
> > >> >> easy and blazingly fast rather than unbearably slow.  The latter
> > >> >> phrase is not one often associated with lucene.
> > >> >>
> > >> >> IndexWriter iw = new IndexWriter(whatever - some standard disk
> > >> >> index); NRTManager nrtm = new NRTManager(iw, null);
> > >> >> NRTManagerReopenThread ropt = new
> > NRTManagerReopenThread(nrtm,
> > >> >> ...); ropt.setXxx(...); ...
> > >> >> ropt.start();
> > >> >>
> > >> >> SearcherManager srchm = nrtm.getSearcherManager(b);
> > >> >>
> > >> >> Then add docs to your index via nrtm.addDocument(d), update with
> > >> >> nrtm.updateDocument(...), and to search use
> > >> >>
> > >> >> IndexSearcher searcher = srchm.acquire(); try {  search ...
> > >> >> } finally {
> > >> >>  srchm.release(searcher);
> > >> >> }
> > >> >>
> > >> >> All thread safe so you don't have to worry about any complications
> > >> >> there.  And I bet it'll be blindingly fast.
> > >> >>
> > >> >> Don't forget to close() things down at the end.
> > >> >>
> > >> >>
> > >> >> --
> > >> >> Ian.
> > >> >>
> > >> >>
> > >> >>
> > >> >> On Mon, Feb 6, 2012 at 12:15 AM, Cheng 
> > >> wrote:
> > >> >> > I was trying to, but don't know how to even I read some of your
> > >> blogs.
> > >> >> >
> > >> >> > On Sun, Feb 5, 2012 at 10:22 PM, Michael McCandless <
> > >> >> > luc...@mikemccandless.com> wrote:
> > >> >> >
> > >> >> >> Are you using near-real-time readers?
> > >> >> >>
> > >> >> >> (IndexReader.open(IndexWriter))
> > >> >> >>
> > >> >> >> Mike McCandless
> > >> >> >>
> > >> >> >> http://blog.mikemccandless.com
> > >> >> >>
> > >> >> >> On Sun, Feb 5, 2

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Cheng
I meant that when I use NRTManager and call commit(), the speed is slower
than when I use RAMDirectory.

In my case, the NRTManager instance not only performs searches but also
updates/modifies indexes, and those changes should be visible to other
threads. With RAMDirectory, commit() doesn't synchronize indexes with the
FSDirectory. The slower speed of NRTManager built upon FSDirectory may be
caused by the frequent updates or modifications of indexes.

That is my guess.

On Mon, Feb 6, 2012 at 11:41 PM, Ian Lea  wrote:

> What exactly do you mean by the "speed is slower"?  Time taken to
> update the index?  Time taken for updates to become visible in search
> results?  Time taken for searches to run on the IndexSearcher returned
> from SearcherManager?  Something else?
>
>
> --
> Ian.
>
>
> On Mon, Feb 6, 2012 at 3:27 PM, Cheng  wrote:
> > Ian,
> >
> > I encountered an issue that I need to frequently update the index. The
> > NRTManager seems not very helpful on this front as the speed is slower
> > than RAMDirectory is used.
> >
> > Any improvement advice?
> >
> >
> >
> > On Mon, Feb 6, 2012 at 10:24 PM, Cheng  wrote:
> >
> >> That really helps! I will try it out.
> >>
> >> Thanks.
> >>
> >>
> >> On Mon, Feb 6, 2012 at 10:12 PM, Ian Lea  wrote:
> >>
> >>> You would use NRTManagerReopenThread as a standalone thread, not
> >>> plugged into your Executor stuff.  It is a utility class which you
> >>> don't have to use.  See the javadocs.
> >>>
> >>> But in your case I'd use it, to start with anyway.  Fire it up with
> >>> suitable settings and forget about it, except to call close()
> >>> eventually. Once you've got things up and running you can tweak things
> >>> as much as you want but you appear to be having trouble getting up and
> >>> running.
> >>>
> >>> So ... somewhere in the initialisation code of your app, create an
> >>> IndexWriter, NRTManager + ReopenThread and SearcherManager as outlined
> >>> before.  Then pass the NRTManager to any/all write methods or threads
> >>> and the SearcherManager instance to any/all search methods or threads
> >>> and you're done.  If you want to use threads that are part of your
> >>> ExecutorService, fine.  Just wrap it all together in whatever
> >>> combination of Thread or Runnable instances you want.
> >>>
> >>>
> >>> Does that help?
> >>>
> >>>
> >>> --
> >>> Ian.
> >>>
> >>>
> >>> > I don't understand this following portion:
> >>> >
> >>> > IndexWriter iw = new IndexWriter(whatever - some standard disk
> index);
> >>> > NRTManager nrtm = new NRTManager(iw, null);
> >>> > NRTManagerReopenThread ropt = new NRTManagerReopenThread(nrtm, ...);
> >>> > ropt.setXxx(...);
> >>> > 
> >>> > ropt.start();
> >>> >
> >>> > I have a java ExecutorServices instance running which take care of my
> >>> own
> >>> > applications. I don't know how this NRTManagerReopenThread works
> with my
> >>> > own ExecutorService instance.
> >>> >
> >>> > Can both work together? How can the NRTManagerReopenThread instance
> >>> ropt be
> >>> > plugged into my own multithreading framework?
> >>> >
> >>> > On Mon, Feb 6, 2012 at 8:17 PM, Ian Lea  wrote:
> >>> >
> >>> >> If you can use NRTManager and SearcherManager things should be easy
> >>> >> and blazingly fast rather than unbearably slow.  The latter phrase
> is
> >>> >> not one often associated with lucene.
> >>> >>
> >>> >> IndexWriter iw = new IndexWriter(whatever - some standard disk
> index);
> >>> >> NRTManager nrtm = new NRTManager(iw, null);
> >>> >> NRTManagerReopenThread ropt = new NRTManagerReopenThread(nrtm, ...);
> >>> >> ropt.setXxx(...);
> >>> >> ...
> >>> >> ropt.start();
> >>> >>
> >>> >> SearcherManager srchm = nrtm.getSearcherManager(b);
> >>> >>
> >>> >> Then add docs to your index via nrtm.addDocument(d), update with
> >>> >> nrtm.updateDocument(...), and to search use
> >>> >>
> >>> >> IndexSearcher searcher = srchm.acquire();
> >>> >> try {
> >>> >>  search ...
> >>> >> } finally {
> >>> >>  srchm.release(searcher);
> >>> >> }
> >>> >>
> >>> >> All thread safe so you don't have to worry about any complications
> >>> >> there.  And I bet it'll be blindingly fast.
> >>> >>
> >>> >> Don't forget to close() things down at the end.
> >>> >>
> >>> >>
> >>> >> --
> >>> >> Ian.
> >>> >>
> >>> >>
> >>> >>
> >>> >> On Mon, Feb 6, 2012 at 12:15 AM, Cheng 
> >>> wrote:
> >>> >> > I was trying to, but don't know how to even I read some of your
> >>> blogs.
> >>> >> >
> >>> >> > On Sun, Feb 5, 2012 at 10:22 PM, Michael McCandless <
> >>> >> > luc...@mikemccandless.com> wrote:
> >>> >> >
> >>> >> >> Are you using near-real-time readers?
> >>> >> >>
> >>> >> >> (IndexReader.open(IndexWriter))
> >>> >> >>
> >>> >> >> Mike McCandless
> >>> >> >>
> >>> >> >> http://blog.mikemccandless.com
> >>> >> >>
> >>> >> >> On Sun, Feb 5, 2012 at 9:03 AM, Cheng 
> >>> wrote:
> >>> >> >> > Hi Uwe,
> >>> >> >> >
> >>> >> >> > My challenge is that I need to update/modify the indexes
> >>> frequently
> >>> >> while
> >>> >> >> > providing the search capability. I was 

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Ian Lea
Well, yes.  What would you expect?  From the javadocs for IndexWriter.commit():

Commits all pending changes (added & deleted documents, segment
merges, added indexes, etc.) to the index, and syncs all referenced
index files ... This may be a costly operation, so you should test the
cost in your application and do it only when really necessary.

If you are using NRTManager why do you care how long this takes?  How
often are you calling it?  Why?
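
To make "only when really necessary" concrete, here is a minimal sketch of
one way to decouple updates from commits. It is plain Java and an
assumption on my part, not something from Lucene: CommitThrottle,
recordUpdate() and commitIfDirty() are hypothetical names, and the
Runnable passed in stands for whatever wraps writer.commit() in your app.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper (not part of Lucene): counts pending updates and runs
 * an expensive commit action only when asked, and only if something changed.
 * commitIfDirty() is meant to be called from a single timer thread.
 */
final class CommitThrottle {
    private final Runnable commitAction;   // e.g. a wrapper around writer.commit()
    private final AtomicInteger pending = new AtomicInteger();
    private final AtomicInteger commits = new AtomicInteger();

    CommitThrottle(Runnable commitAction) {
        this.commitAction = commitAction;
    }

    /** Call after every add/update; cheap, never touches the disk. */
    void recordUpdate() {
        pending.incrementAndGet();
    }

    /** Commits once for all updates recorded since the last successful call. */
    boolean commitIfDirty() {
        int n = pending.getAndSet(0);
        if (n == 0) {
            return false;                  // nothing changed, skip the costly commit
        }
        try {
            commitAction.run();
        } catch (RuntimeException e) {
            pending.addAndGet(n);          // stay dirty so a later tick retries
            throw e;
        }
        commits.incrementAndGet();
        return true;
    }

    /** Number of commits actually performed so far. */
    int commitCount() {
        return commits.get();
    }
}
```

Writer threads would call recordUpdate() after each nrtm.addDocument(d) or
nrtm.updateDocument(...), and one scheduled thread (for example via
ScheduledExecutorService.scheduleWithFixedDelay) would call commitIfDirty()
every so often. The interval is something to tune: it bounds how much work
could be lost on a crash, not how quickly changes become searchable.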


--
Ian.


On Mon, Feb 6, 2012 at 3:45 PM, Cheng  wrote:
> Uwe, when I meant speed is slow, I didn't refer to instant visibility of
> changes, but that the changes may be synchronized with FSDirectory when I
> use writer.commit().
>
> When I use RAMDirectory, the writer.commit() seems much faster than using
> NRTManager built upon FSDirectory. So, I am guessing the difference is the
> index synchronization.
>
>
>
> On Mon, Feb 6, 2012 at 11:40 PM, Uwe Schindler  wrote:
>
>> Please review the following articles about NRT, absolutely instant updates
>> that are visible as they are done are almost impossible (even with
>> RAMDirectory):
>>
>> http://goo.gl/mzAHt
>> http://goo.gl/5RoPx
>> http://goo.gl/vSJ7x
>>
>> Uwe
>>
>> -
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
>> > -Original Message-
>> > From: Cheng [mailto:zhoucheng2...@gmail.com]
>> > Sent: Monday, February 06, 2012 4:27 PM
>> > To: java-user@lucene.apache.org
>> > Subject: Re: Configure writer to write to FSDirectory?
>> >
>> > Ian,
>> >
>> > I encountered an issue that I need to frequently update the index. The
>> > NRTManager seems not very helpful on this front as the speed is slower
>> > than RAMDirectory is used.
>> >
>> > Any improvement advice?
>> >
>> >
>> >
>> > On Mon, Feb 6, 2012 at 10:24 PM, Cheng  wrote:
>> >
>> > > That really helps! I will try it out.
>> > >
>> > > Thanks.
>> > >
>> > >
>> > > On Mon, Feb 6, 2012 at 10:12 PM, Ian Lea  wrote:
>> > >
>> > >> You would use NRTManagerReopenThread as a standalone thread, not
>> > >> plugged into your Executor stuff.  It is a utility class which you
>> > >> don't have to use.  See the javadocs.
>> > >>
>> > >> But in your case I'd use it, to start with anyway.  Fire it up with
>> > >> suitable settings and forget about it, except to call close()
>> > >> eventually. Once you've got things up and running you can tweak
>> > >> things as much as you want but you appear to be having trouble
>> > >> getting up and running.
>> > >>
>> > >> So ... somewhere in the initialisation code of your app, create an
>> > >> IndexWriter, NRTManager + ReopenThread and SearcherManager as
>> > >> outlined before.  Then pass the NRTManager to any/all write methods
>> > >> or threads and the SearcherManager instance to any/all search methods
>> > >> or threads and you're done.  If you want to use threads that are part
>> > >> of your ExecutorService, fine.  Just wrap it all together in whatever
>> > >> combination of Thread or Runnable instances you want.
>> > >>
>> > >>
>> > >> Does that help?
>> > >>
>> > >>
>> > >> --
>> > >> Ian.
>> > >>
>> > >>
>> > >> > I don't understand this following portion:
>> > >> >
>> > >> > IndexWriter iw = new IndexWriter(whatever - some standard disk
>> > >> > index); NRTManager nrtm = new NRTManager(iw, null);
>> > >> > NRTManagerReopenThread ropt = new NRTManagerReopenThread(nrtm,
>> > >> > ...); ropt.setXxx(...); 
>> > >> > ropt.start();
>> > >> >
>> > >> > I have a java ExecutorServices instance running which take care of
>> > >> > my
>> > >> own
>> > >> > applications. I don't know how this NRTManagerReopenThread works
>> > >> > with my own ExecutorService instance.
>> > >> >
>> > >> > Can both work together? How can the NRTManagerReopenThread
>> > instance
>> > >> ropt be
>> > >> > plugged into my own multithreading framework?
>> > >> >
>> > >> > On Mon, Feb 6, 2012 at 8:17 PM, Ian Lea  wrote:
>> > >> >
>> > >> >> If you can use NRTManager and SearcherManager things should be
>> > >> >> easy and blazingly fast rather than unbearably slow.  The latter
>> > >> >> phrase is not one often associated with lucene.
>> > >> >>
>> > >> >> IndexWriter iw = new IndexWriter(whatever - some standard disk
>> > >> >> index); NRTManager nrtm = new NRTManager(iw, null);
>> > >> >> NRTManagerReopenThread ropt = new
>> > NRTManagerReopenThread(nrtm,
>> > >> >> ...); ropt.setXxx(...); ...
>> > >> >> ropt.start();
>> > >> >>
>> > >> >> SearcherManager srchm = nrtm.getSearcherManager(b);
>> > >> >>
>> > >> >> Then add docs to your index via nrtm.addDocument(d), update with
>> > >> >> nrtm.updateDocument(...), and to search use
>> > >> >>
>> > >> >> IndexSearcher searcher = srchm.acquire(); try {  search ...
>> > >> >> } finally {
>> > >> >>  srchm.release(searcher);
>> > >> >> }
>> > >> >>
>> > >> >> All thread safe so you don't have to worry about any complications
>> > >> >> there.  And I bet it'll be blindingly fast.
>> > >> >>
>> > >> >> Don't forget to close() th

RE: Configure writer to write to FSDirectory?

2012-02-06 Thread Uwe Schindler
Hi Cheng,

All the pros and cons are explained in those articles written by Mike! As
soon as hard disks are involved, there is a slowdown. What do you expect?
If you need it faster, buy SSDs! :-)

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Cheng [mailto:zhoucheng2...@gmail.com]
> Sent: Monday, February 06, 2012 4:45 PM
> To: java-user@lucene.apache.org
> Subject: Re: Configure writer to write to FSDirectory?
> 
> Uwe, when I meant speed is slow, I didn't refer to instant visibility of
> changes, but that the changes may be synchronized with FSDirectory when I use
> writer.commit().
> 
> When I use RAMDirectory, the writer.commit() seems much faster than using
> NRTManager built upon FSDirectory. So, I am guessing the difference is the
> index synchronization.
> 
> 
> 
> On Mon, Feb 6, 2012 at 11:40 PM, Uwe Schindler  wrote:
> 
> > Please review the following articles about NRT, absolutely instant
> > updates that are visible as they are done are almost impossible (even
> > with
> > RAMDirectory):
> >
> > http://goo.gl/mzAHt
> > http://goo.gl/5RoPx
> > http://goo.gl/vSJ7x
> >
> > Uwe
> >
> > -
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> > > -Original Message-
> > > From: Cheng [mailto:zhoucheng2...@gmail.com]
> > > Sent: Monday, February 06, 2012 4:27 PM
> > > To: java-user@lucene.apache.org
> > > Subject: Re: Configure writer to write to FSDirectory?
> > >
> > > Ian,
> > >
> > > I encountered an issue that I need to frequently update the index.
> > > The NRTManager seems not very helpful on this front as the speed is
> > > slower than RAMDirectory is used.
> > >
> > > Any improvement advice?
> > >
> > >
> > >
> > > On Mon, Feb 6, 2012 at 10:24 PM, Cheng 
> wrote:
> > >
> > > > That really helps! I will try it out.
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > On Mon, Feb 6, 2012 at 10:12 PM, Ian Lea  wrote:
> > > >
> > > >> You would use NRTManagerReopenThread as a standalone thread, not
> > > >> plugged into your Executor stuff.  It is a utility class which
> > > >> you don't have to use.  See the javadocs.
> > > >>
> > > >> But in your case I'd use it, to start with anyway.  Fire it up
> > > >> with suitable settings and forget about it, except to call
> > > >> close() eventually. Once you've got things up and running you can
> > > >> tweak things as much as you want but you appear to be having
> > > >> trouble getting up and running.
> > > >>
> > > >> So ... somewhere in the initialisation code of your app, create
> > > >> an IndexWriter, NRTManager + ReopenThread and SearcherManager as
> > > >> outlined before.  Then pass the NRTManager to any/all write
> > > >> methods or threads and the SearcherManager instance to any/all
> > > >> search methods or threads and you're done.  If you want to use
> > > >> threads that are part of your ExecutorService, fine.  Just wrap
> > > >> it all together in whatever combination of Thread or Runnable
instances
> you want.
> > > >>
> > > >>
> > > >> Does that help?
> > > >>
> > > >>
> > > >> --
> > > >> Ian.
> > > >>
> > > >>
> > > >> > I don't understand this following portion:
> > > >> >
> > > >> > IndexWriter iw = new IndexWriter(whatever - some standard disk
> > > >> > index); NRTManager nrtm = new NRTManager(iw, null);
> > > >> > NRTManagerReopenThread ropt = new
> NRTManagerReopenThread(nrtm,
> > > >> > ...); ropt.setXxx(...); 
> > > >> > ropt.start();
> > > >> >
> > > >> > I have a java ExecutorServices instance running which take care
> > > >> > of my
> > > >> own
> > > >> > applications. I don't know how this NRTManagerReopenThread
> > > >> > works with my own ExecutorService instance.
> > > >> >
> > > >> > Can both work together? How can the NRTManagerReopenThread
> > > instance
> > > >> ropt be
> > > >> > plugged into my own multithreading framework?
> > > >> >
> > > >> > On Mon, Feb 6, 2012 at 8:17 PM, Ian Lea 
wrote:
> > > >> >
> > > >> >> If you can use NRTManager and SearcherManager things should be
> > > >> >> easy and blazingly fast rather than unbearably slow.  The
> > > >> >> latter phrase is not one often associated with lucene.
> > > >> >>
> > > >> >> IndexWriter iw = new IndexWriter(whatever - some standard disk index);
> > > >> >> NRTManager nrtm = new NRTManager(iw, null);
> > > >> >> NRTManagerReopenThread ropt = new NRTManagerReopenThread(nrtm, ...);
> > > >> >> ropt.setXxx(...); ...
> > > >> >> ropt.start();
> > > >> >>
> > > >> >> SearcherManager srchm = nrtm.getSearcherManager(b);
> > > >> >>
> > > >> >> Then add docs to your index via nrtm.addDocument(d), update
> > > >> >> with nrtm.updateDocument(...), and to search use
> > > >> >>
> > > >> >> IndexSearcher searcher = srchm.acquire();
> > > >> >> try {
> > > >> >>   search ...
> > > >> >> } finally {
> > > >> >>   srchm.release(searcher);
> > > >> >> }
> > > >> >>
> > > >> >> All thread safe so yo

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Cheng
My original question was whether there is a way to configure when the writer
writes to the FSDirectory. I think there may be something in
IndexWriterConfig that can help.

On Mon, Feb 6, 2012 at 11:50 PM, Ian Lea  wrote:

> Well, yes.  What would you expect?  From the javadocs for
> IndexWriter.commit()
>
> Commits all pending changes (added & deleted documents, segment
> merges, added indexes, etc.) to the index, and syncs all referenced
> index files ... This may be a costly operation, so you should test the
> cost in your application and do it only when really necessary.
>
> If you are using NRTManager why do you care how long this takes?  How
> often are you calling it?  Why?
>
>
> --
> Ian.

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Cheng
Agree.

On Mon, Feb 6, 2012 at 11:53 PM, Uwe Schindler  wrote:

> Hi Cheng,
>
> All the pros and cons are explained in those articles written by Mike! As soon
> as hard disks are involved there is a slowdown; what do you expect?
> If you need it faster, buy SSDs! :-)
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Michael McCandless
You shouldn't call IW.commit when using NRT; that's the point of NRT
(making changes visible w/o calling commit).

Only call commit when you require that all changes be durable (survive
OS / JVM crash, power loss, etc.) on disk.

Also, you can use NRTCachingDirectory which acts like RAMDirectory for
small flushed segments.

Mike McCandless

http://blog.mikemccandless.com
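
The distinction drawn above, between making changes visible and making them durable, can be sketched with a toy model. This is NOT Lucene code: ToyNrtIndex and all of its methods are hypothetical names invented for illustration, and the real NRTManager and IndexWriter work quite differently internally. The sketch only shows why a cheap refresh can run after every update while the expensive commit is kept for the moments when durability is required.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model only (hypothetical, not Lucene): visibility vs. durability. */
public class ToyNrtIndex {
    // Every document added so far, whether or not searchers can see it yet.
    private final List<String> buffered = new ArrayList<>();
    // Read-only snapshot that searches run against.
    private volatile List<String> visible = Collections.emptyList();
    // Number of documents that would survive a crash in this model.
    private int durableCount = 0;
    // How many times the (notionally expensive) sync step ran.
    private int syncCalls = 0;

    public synchronized void addDocument(String doc) {
        buffered.add(doc);
    }

    /** Cheap step: publish a new snapshot. Stands in for an NRT reopen. */
    public synchronized void refresh() {
        visible = Collections.unmodifiableList(new ArrayList<>(buffered));
    }

    /** Expensive step: stands in for a commit that syncs files to disk. */
    public synchronized void commit() {
        syncCalls++;
        durableCount = buffered.size();
    }

    public List<String> search() {
        return visible;
    }

    public synchronized int durableCount() {
        return durableCount;
    }

    public synchronized int syncCalls() {
        return syncCalls;
    }

    public static void main(String[] args) {
        ToyNrtIndex index = new ToyNrtIndex();
        for (int i = 0; i < 3; i++) {
            index.addDocument("doc" + i);
            index.refresh();              // visible right away, no sync
        }
        System.out.println("visible=" + index.search().size()
                + " durable=" + index.durableCount()
                + " syncs=" + index.syncCalls());
        index.commit();                   // one sync, only when durability matters
        System.out.println("visible=" + index.search().size()
                + " durable=" + index.durableCount()
                + " syncs=" + index.syncCalls());
    }
}
```

In this model, calling commit() inside the loop would make every update pay for a sync, which is the pattern the advice above warns against.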

On Mon, Feb 6, 2012 at 10:45 AM, Cheng  wrote:
> Uwe, when I said the speed is slow, I wasn't referring to instant visibility
> of changes, but to the cost of synchronizing changes to the FSDirectory when
> I call writer.commit().
>
> With RAMDirectory, writer.commit() seems much faster than with
> NRTManager built upon FSDirectory. So I am guessing the difference is the
> index synchronization.
>
>
>
> On Mon, Feb 6, 2012 at 11:40 PM, Uwe Schindler  wrote:
>
>> Please review the following articles about NRT. Absolutely instant updates
>> that are visible as they are done are almost impossible (even with
>> RAMDirectory):
>>
>> http://goo.gl/mzAHt
>> http://goo.gl/5RoPx
>> http://goo.gl/vSJ7x
>>
>> Uwe
>>
>> -
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
>> > -Original Message-
>> > From: Cheng [mailto:zhoucheng2...@gmail.com]
>> > Sent: Monday, February 06, 2012 4:27 PM
>> > To: java-user@lucene.apache.org
>> > Subject: Re: Configure writer to write to FSDirectory?
>> >
>> > Ian,
>> >
>> > I ran into an issue: I need to update the index frequently. The
>> > NRTManager does not seem very helpful on this front, as it is slower
>> > than when RAMDirectory is used.
>> >
>> > Any improvement advice?
>> >
>> >
>> >
>> > On Mon, Feb 6, 2012 at 10:24 PM, Cheng  wrote:
>> >
>> > > That really helps! I will try it out.
>> > >
>> > > Thanks.
>> > >
>> > >
>> > > On Mon, Feb 6, 2012 at 10:12 PM, Ian Lea  wrote:
>> > >
>> > >> You would use NRTManagerReopenThread as a standalone thread, not
>> > >> plugged into your Executor stuff.  It is a utility class which you
>> > >> don't have to use.  See the javadocs.
>> > >>
>> > >> But in your case I'd use it, to start with anyway.  Fire it up with
>> > >> suitable settings and forget about it, except to call close()
>> > >> eventually. Once you've got things up and running you can tweak
>> > >> things as much as you want but you appear to be having trouble
>> > >> getting up and running.
>> > >>
>> > >> So ... somewhere in the initialisation code of your app, create an
>> > >> IndexWriter, NRTManager + ReopenThread and SearcherManager as
>> > >> outlined before.  Then pass the NRTManager to any/all write methods
>> > >> or threads and the SearcherManager instance to any/all search methods
>> > >> or threads and you're done.  If you want to use threads that are part
>> > >> of your ExecutorService, fine.  Just wrap it all together in whatever
>> > >> combination of Thread or Runnable instances you want.
>> > >>
>> > >>
>> > >> Does that help?
>> > >>
>> > >>
>> > >> --
>> > >> Ian.
>> > >>
>> > >>
>> > >> > I don't understand this following portion:
>> > >> >
>> > >> > IndexWriter iw = new IndexWriter(whatever - some standard disk index);
>> > >> > NRTManager nrtm = new NRTManager(iw, null);
>> > >> > NRTManagerReopenThread ropt = new NRTManagerReopenThread(nrtm, ...);
>> > >> > ropt.setXxx(...);
>> > >> > ropt.start();
>> > >> >
>> > >> > I have a Java ExecutorService instance running which takes care of
>> > >> > my own applications. I don't know how this NRTManagerReopenThread
>> > >> > works with my own ExecutorService instance.
>> > >> >
>> > >> > Can both work together? How can the NRTManagerReopenThread
>> > >> > instance ropt be plugged into my own multithreading framework?
>> > >> >
>> > >> > On Mon, Feb 6, 2012 at 8:17 PM, Ian Lea  wrote:
>> > >> >
>> > >> >> If you can use NRTManager and SearcherManager things should be
>> > >> >> easy and blazingly fast rather than unbearably slow.  The latter
>> > >> >> phrase is not one often associated with lucene.
>> > >> >>
>> > >> >> IndexWriter iw = new IndexWriter(whatever - some standard disk index);
>> > >> >> NRTManager nrtm = new NRTManager(iw, null);
>> > >> >> NRTManagerReopenThread ropt = new NRTManagerReopenThread(nrtm, ...);
>> > >> >> ropt.setXxx(...); ...
>> > >> >> ropt.start();
>> > >> >>
>> > >> >> SearcherManager srchm = nrtm.getSearcherManager(b);
>> > >> >>
>> > >> >> Then add docs to your index via nrtm.addDocument(d), update with
>> > >> >> nrtm.updateDocument(...), and to search use
>> > >> >>
>> > >> >> IndexSearcher searcher = srchm.acquire();
>> > >> >> try {
>> > >> >>   search ...
>> > >> >> } finally {
>> > >> >>   srchm.release(searcher);
>> > >> >> }
>> > >> >>
>> > >> >> All thread safe so you don't have to worry about any complications
>> > >> >> there.  And I bet it'll be blindingly fast.
>> > >> >>
>> > >> >> Don't forget to close() things down at the end.
>> > >> >>
>> > >> >>
>> > >> >> --
>> > >> >> Ian.
>> > >> >>
>

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Cheng
Good point. I should remove the commits.

Is there any difference between NRTCachingDirectory and RAMDirectory? How is
"small" defined?

On Tue, Feb 7, 2012 at 12:42 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> You shouldn't call IW.commit when using NRT; that's the point of NRT
> (making changes visible w/o calling commit).
>
> Only call commit when you require that all changes be durable (survive
> OS / JVM crash, power loss, etc.) on disk.
>
> Also, you can use NRTCachingDirectory which acts like RAMDirectory for
> small flushed segments.
>
> Mike McCandless
>
> http://blog.mikemccandless.com

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Michael McCandless
You tell NRTCachingDirectory how much RAM it's allowed to use, and it
then caches newly flushed segments in a private RAMDirectory.

But you should first test performance w/o it (after removing the
commit calls).  NRT is very fast...

Mike McCandless

http://blog.mikemccandless.com

On Mon, Feb 6, 2012 at 11:46 AM, Cheng  wrote:
> Good point. I should remove the commits.
>
> Is there any difference between NRTCachingDirectory and RAMDirectory? How is
> "small" defined?

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Cheng
Will do.

On Tue, Feb 7, 2012 at 12:52 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> You tell NRTCachingDirectory how much RAM it's allowed to use, and it
> then caches newly flushed segments in a private RAMDirectory.
>
> But you should first test performance w/o it (after removing the
> commit calls).  NRT is very fast...
>
> Mike McCandless
>
> http://blog.mikemccandless.com

Need to enforce logging of Lucene queries

2012-02-06 Thread Charles Bearden
I have a set of Lucene indexes for which I need to log all accesses and
possibly queries. I can use kernel-level auditing to record file accesses, but
what would be the best approach to logging the strings for all queries against
these indexes?


What comes to mind is a Lucene analogue of a database server, such that the
index files are owned by a special user and readable only by software running
as that user. Could Solr provide a good audit trail?


Thanks for any pointers.
Chuck
--
Chuck Bearden
Programmer Analyst IV
The University of Texas Health Science Center at Houston
School of Biomedical Informatics
Email: charles.f.bear...@uth.tmc.edu
Phone: 713.500.9672


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
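
One application-level way to approach the question above is to route every search through a single wrapper that records the query string before delegating to the real search call. The sketch below is a minimal illustration under stated assumptions: AuditedSearch, Entry and the backend function are hypothetical names, the backend stands in for whatever actually runs the Lucene query, and the log is kept in memory rather than in a tamper-resistant store. It only audits queries that pass through the wrapper, so it would still depend on file permissions of the kind described in the message to stop other code from opening the index directly.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch: a single audited entry point for all searches. */
public class AuditedSearch {

    /** One audit record: who asked what, and when. */
    public static final class Entry {
        public final Instant when;
        public final String user;
        public final String query;

        Entry(Instant when, String user, String query) {
            this.when = when;
            this.user = user;
            this.query = query;
        }
    }

    // Stands in for the real search call (an assumption, not a Lucene API).
    private final Function<String, List<String>> backend;
    // In-memory log for illustration; a real audit trail would be persisted.
    private final List<Entry> auditLog =
            Collections.synchronizedList(new ArrayList<>());

    public AuditedSearch(Function<String, List<String>> backend) {
        this.backend = backend;
    }

    public List<String> search(String user, String queryString) {
        // Record before delegating so that failing queries are logged too.
        auditLog.add(new Entry(Instant.now(), user, queryString));
        return backend.apply(queryString);
    }

    /** Returns a copy so callers cannot alter the log. */
    public List<Entry> auditLog() {
        synchronized (auditLog) {
            return new ArrayList<>(auditLog);
        }
    }

    public static void main(String[] args) {
        AuditedSearch searcher = new AuditedSearch(
                q -> Collections.singletonList("hit for " + q));
        searcher.search("chuck", "title:lucene");
        for (Entry e : searcher.auditLog()) {
            System.out.println(e.when + " " + e.user + " " + e.query);
        }
    }
}
```

Running the wrapper inside a separate server process that owns the index files would be one way to make this entry point the only path to the index.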



RE: recording a universal ID from DocID in a CustomScoreQuery

2012-02-06 Thread Paul Allan Hill
To complete this thread: I read the document itself with a one-field 
fieldSelector, so as not to load anything beyond exactly what I needed at 
this point in the code (in particular, not the text body).

Then I saved the primary key (the path) of each document that passed through 
this CustomScoreQuery (function query) in a Set called seenDocs:

seenDocs.add(reader.document(docId, fieldSelector)
    .getFieldable(KEY_FIELD).stringValue());

If we do introduce a short globally unique ID field, the code needs little 
change to move to a different field.

Once the entire query has gathered all the results, the code determines which 
ones came through that function query by consulting the set of seenDocs.
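
A minimal sketch of the bookkeeping described above, assuming the key strings 
have already been read from the index. The names SeenDocs, record and 
isSpecial are hypothetical, and the Lucene calls are left out so that the 
example stands alone:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not part of Lucene): remembers which primary keys
// (file paths) passed through the function query so that results can be
// flagged afterwards.
class SeenDocs {
    private final Set<String> seenDocs = new HashSet<String>();

    // Called from the scoring hook with the key already loaded from the
    // stored field (in the post: the value of KEY_FIELD).
    void record(String key) {
        seenDocs.add(key);
    }

    // Called once the whole query has collected its results.
    boolean isSpecial(String key) {
        return seenDocs.contains(key);
    }

    public static void main(String[] args) {
        SeenDocs seen = new SeenDocs();
        seen.record("/docs/a.pdf");
        for (String path : Arrays.asList("/docs/a.pdf", "/docs/b.html")) {
            System.out.println(path + " special=" + seen.isSpecial(path));
        }
    }
}
```

Keying on the stored path rather than on the doc number sidesteps the 
per-segment numbering, at the cost of one stored-field read per visited 
document.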

I decided NOT to use the FieldCache for this particular application, because 
the number of documents that result from this part of the query is very 
small compared to the total number of documents. Their rarity was the point: 
I wanted to know which ones they were so that I could mark those results as 
'special' for other parts of the application. Such special documents get 
different treatment in the UI, but that's not my concern; identifying them 
was the useful part for the index layer.

As usual thanks for the feedback.

-Paul
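
On the subreader point raised in the quoted reply below: an index-wide 
document number is the segment's starting offset (the docBase passed to 
Collector.setNextReader) plus the segment-local number. A minimal sketch of 
that arithmetic with made-up segment sizes; DocBaseMath, docBases and 
toGlobal are hypothetical names, not Lucene API:

```java
import java.util.Arrays;

// Hypothetical illustration of docBase arithmetic; not Lucene API.
class DocBaseMath {
    // Computes each segment's starting offset from the segment sizes.
    static int[] docBases(int[] segmentMaxDocs) {
        int[] bases = new int[segmentMaxDocs.length];
        int next = 0;
        for (int i = 0; i < segmentMaxDocs.length; i++) {
            bases[i] = next;
            next += segmentMaxDocs[i];
        }
        return bases;
    }

    // Index-wide doc number = segment offset + segment-local doc number.
    static int toGlobal(int docBase, int localDoc) {
        return docBase + localDoc;
    }

    public static void main(String[] args) {
        int[] sizes = {100, 50, 25};          // made-up segment sizes
        int[] bases = docBases(sizes);
        System.out.println(Arrays.toString(bases));
        // Local doc 7 in the third segment.
        System.out.println(toGlobal(bases[2], 7));
    }
}
```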

> -Original Message-
> From: Ian Lea [mailto:ian@gmail.com]
> Sent: Monday, February 06, 2012 3:54 AM
> To: java-user@lucene.apache.org
> Subject: Re: recording a universal ID from DocID in a CustomScoreQuery
> 
> int doc will be for the subreader, not for the entire index.
> oal.search.Collector has setNextReader(IndexReader reader, int docBase)
> which you might somehow be able to use.  Failing that I'd go for
> FieldCache, or store the docids in a Set in a Map keyed by current Reader,
> if that would give you what you needed for the subsequent messing around.
> 
> 
> --
> Ian.
> 
> 
> On Sat, Feb 4, 2012 at 12:09 AM, Paul Allan Hill  wrote:
> > My index does NOT have a simple UID; it uses the file PATH as the unique
> > key. I was implementing a CustomScoreQuery which not only tweaked the
> > score but also wanted to write down which documents had passed through
> > this part of the overall rebuilt query, so that I could further mess with
> > those particular documents later. I was hoping to do it without loading
> > all the PATHs from my index into a field cache, but maybe that is a false
> > way to try to save memory.
> >
> > I thought I could write down the docId provided in the call to
> > customScore
> >
> > public float customScore(int doc, float subQueryScore, float valSrcScore)
> >     throws IOException {
> >   docIds.add(doc);  // record the doc number passed in by the caller
> >   return ...;
> > }
> >
> > private Set<Integer> docIds = new HashSet<Integer>();
> >
> > While I thought I had this working, apparently I had not taken into
> > consideration the subreader and segment problem.
> > The int called doc is not the docId for the entire index, just the local
> > reader doc number.  Is that right?
> > So is there a standard way to convert back to the index wide DocID?
> >
> > If there is no standard way, I _might_ create a small subclass of
> > IndexSearcher and provide a method to:
> >
> >
> > (1)    Find the right reader by looping through all
> > IndexSearcher.subReaders[] to find what reader called the
> > CustomScoreQuery
> >
> > (2)    Add an offset of the proper value from
> > IndexSearcher.docStarts[iReader]
> >
> > But I am thinking this is prone to the problem that a subreader can be
> > made of more subreaders etc., so I really don't have a clue where to find
> > the current reader and then to map back to docStarts.
> >
> > I also think I'm doing this wrong, because ReaderUtil has nothing like this?
> >
> > Is there some way to note for later that a particular document came
> > through this function query or should I just accept the fact of using
> > the field cache?
> >
> > -Paul
> >
> >
> >
> >
> 



Re: Need to enforce logging of Lucene queries

2012-02-06 Thread Erick Erickson
Solr already logs the queries themselves, although there isn't any way
that I know of to associate them with a user.

That said, in Solr land, whatever servlet container you use for Solr
should be able to log all the URLs that hit the server.
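
For plain Lucene without Solr, one option is to route every search through a
thin wrapper that writes the query string to an audit log before delegating.
A minimal sketch using only java.util.logging; the SearchBackend interface,
the AuditedSearch class, the "lucene.audit" logger name and the user argument
are all hypothetical stand-ins for the application's own search code. Note
that this only audits callers that actually go through the wrapper:

```java
import java.util.Collections;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical stand-in for whatever code actually runs the Lucene search.
interface SearchBackend {
    List<String> search(String queryString);
}

// Wrapper that logs who asked what, then delegates to the real search.
class AuditedSearch {
    private static final Logger AUDIT = Logger.getLogger("lucene.audit");
    private final SearchBackend delegate;

    AuditedSearch(SearchBackend delegate) {
        this.delegate = delegate;
    }

    List<String> search(String user, String queryString) {
        // Log before searching so that failed queries are recorded too.
        AUDIT.info("user=" + user + " query=" + queryString);
        return delegate.search(queryString);
    }

    public static void main(String[] args) {
        // Dummy backend that returns a single fake hit.
        AuditedSearch searcher = new AuditedSearch(
            q -> Collections.singletonList("hit for " + q));
        System.out.println(searcher.search("chuck", "title:lucene"));
    }
}
```

Restricting the index files to the account that runs this code, as suggested
in the original question, would be what makes the log hard to bypass.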

Best
Erick

On Mon, Feb 6, 2012 at 5:45 PM, Charles Bearden
 wrote:
> I have a set of Lucene indexes for which I need to log all accesses and
> possibly queries. I can use kernel-level auditing to record file accesses,
> but what would be the best approach to logging the strings for all queries
> against these indexes?
>
> What comes to mind is a Lucene analogy to a database server, such that the
> index files are owned by a special user and readable only by software
> running as that user. Could Solr provide a good audit trail?
>
> Thanks for any pointers.
> Chuck
> --
> Chuck Bearden
> Programmer Analyst IV
> The University of Texas Health Science Center at Houston
> School of Biomedical Informatics
> Email: charles.f.bear...@uth.tmc.edu
> Phone: 713.500.9672
>
>



Re: Why read past EOF

2012-02-06 Thread superruiye
OK, thanks.
I modified my program as you suggested, but another problem has appeared:

java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.lucene.index.TermInfosReader.seekEnum(TermInfosReader.java:203)
    at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:273)
    at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:210)
    at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:507)
    at org.apache.lucene.search.TermQuery$TermWeight$1.add(TermQuery.java:56)
    at org.apache.lucene.util.ReaderUtil$Gather.run(ReaderUtil.java:77)
    at org.apache.lucene.util.ReaderUtil$Gather.run(ReaderUtil.java:82)
    at org.apache.lucene.util.ReaderUtil$Gather.run(ReaderUtil.java:66)
    at org.apache.lucene.search.TermQuery$TermWeight.<init>(TermQuery.java:53)
    at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:198)
    at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:176)
    at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:354)
    at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:176)
    at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:354)
    at org.apache.lucene.search.Searcher.createNormalizedWeight(Searcher.java:168)
    at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:661)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:356)
    at com.ableskysearch.migration.search.IndexManagerImpl$1.getResultList(IndexManagerImpl.java:608)

It appears infrequently. Searches fail while it is happening, but they work
again shortly afterwards.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-read-past-EOF-tp3639401p3721594.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




RE: How best to handle a reasonable amount to data (25TB+)

2012-02-06 Thread Peter Miller
Thanks for the response. Actually, I am more concerned with trying to use an 
Object Store for the indexes. The next concern is the use of a local index 
versus the sharded ones, but I'm more relaxed about that now after thinking 
about it. I see that index shards could be up to 100 million documents, so that 
makes the 1.25 trillion number look reasonable.
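
As a rough check of those figures, here is a small sketch of the arithmetic. 
It assumes decimal terabytes and takes the numbers from this thread (1.25 
trillion documents, 100 million documents per shard, 25TB of attributes plus 
50TB of raw data, and 5 machines x 24 drives x 2TB with 3 copies); the class 
and method names are hypothetical:

```java
// Hypothetical back-of-the-envelope calculator for the figures in this thread.
class CapacityMath {
    // Number of shards needed, rounding up.
    static long shardCount(long totalDocs, long docsPerShard) {
        return (totalDocs + docsPerShard - 1) / docsPerShard;
    }

    // Average attribute bytes available per document.
    static long bytesPerDoc(long totalBytes, long totalDocs) {
        return totalBytes / totalDocs;
    }

    // Usable capacity in TB once every object is stored 'copies' times.
    static long usableTb(int machines, int drivesPerMachine, int tbPerDrive, int copies) {
        return (long) machines * drivesPerMachine * tbPerDrive / copies;
    }

    public static void main(String[] args) {
        long docs = 1_250_000_000_000L;            // 1.25 trillion documents
        long attributeBytes = 25_000_000_000_000L; // 25TB, decimal
        System.out.println("shards: " + shardCount(docs, 100_000_000L));
        System.out.println("bytes/doc: " + bytesPerDoc(attributeBytes, docs));
        System.out.println("usable TB: " + usableTb(5, 24, 2, 3));
    }
}
```

Under those assumptions this works out to 12,500 shards, an average of 20 
bytes of search attributes per document, and 80TB of usable storage against 
75TB of data.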

Any other thoughts?

Thanks,
The Captn.

-Original Message-
From: ppp c [mailto:peter.c.e...@gmail.com] 
Sent: Monday, 6 February 2012 5:29 PM
To: java-user@lucene.apache.org
Subject: Re: How best to handle a reasonable amount to data (25TB+)

This sounds like an issue with your app's logic rather than with Lucene.
If you're afraid of having too many docs in one index, you can create
multiple indexes, search across them, and then merge the results.
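
The "search across them, then merge" step can be sketched as a top-N merge of 
per-index hit lists. This is a minimal illustration, not Lucene code: Hit, 
ShardMerge and mergeTopN are hypothetical names, and it assumes scores from 
different indexes are comparable, which is not strictly true when each index 
has its own term statistics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical hit: a document key plus its score from one index.
class Hit {
    final String id;
    final float score;

    Hit(String id, float score) {
        this.id = id;
        this.score = score;
    }
}

class ShardMerge {
    // Merges per-index hit lists into the global top n, best score first.
    static List<Hit> mergeTopN(List<List<Hit>> perShard, int n) {
        // Min-heap on score: the weakest of the current top n sits on top.
        PriorityQueue<Hit> heap =
            new PriorityQueue<Hit>((a, b) -> Float.compare(a.score, b.score));
        for (List<Hit> hits : perShard) {
            for (Hit h : hits) {
                heap.offer(h);
                if (heap.size() > n) {
                    heap.poll(); // drop the current weakest hit
                }
            }
        }
        List<Hit> top = new ArrayList<Hit>(heap);
        top.sort((a, b) -> Float.compare(b.score, a.score));
        return top;
    }

    public static void main(String[] args) {
        List<List<Hit>> shards = Arrays.asList(
            Arrays.asList(new Hit("a", 1.0f), new Hit("b", 3.0f)),
            Arrays.asList(new Hit("c", 2.0f)));
        for (Hit h : mergeTopN(shards, 2)) {
            System.out.println(h.id + " " + h.score);
        }
    }
}
```

Lucene's MultiReader can also present several indexes as one reader, which 
may be simpler when all the indexes are reachable from the same JVM.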

On Mon, Feb 6, 2012 at 10:50 AM, Peter Miller < 
peter.mil...@objectconsulting.com.au> wrote:

> Hi,
>
> I have a little bit of an unusual set of requirements, and I am 
> looking for advice. I have researched the archives, and seen some 
> relevant posts, but they are fairly old and not specifically a match, 
> so I thought I would give this a try.
>
> We will eventually have about 50TB raw, non-searchable data and 25TB 
> of search attributes to handle in Lucene, across about 1.25 trillion 
> documents. The app is write once, read many. There are many document 
> types involved that have to be able to be searched separately or 
> together, with some common attributes, but also unique ones per type. 
> I plan on using a JCR implementation that uses Lucene under the 
> covers. The data itself is not searchable, only the attributes. I plan 
> to hook the JCR repo
> (ModeShape) up to the OpenStack Object Storage on commodity hardware 
> eventually with 5 machines, each with 24 x 2TB drives. This should 
> allow for redundancy (3 copies), although I would suppose we would add 
> bigger drives as we go on.
>
> Since there is such a lot of data to index (not outrageous amounts for 
> these days, but a bit chunky), I was sort of assuming that the Lucene 
> indexes would go on the object storage solution too, to handle 
> availability and other infrastructure issues. Most of the searches 
> would be date-constrained, so I thought that the indexes could be sharded by 
> date.
>
> There would be a local disk index being built near real time on the 
> JCR hardware that could be regularly merged in with the main indexes 
> on the object storage, I suppose.
>
> Does that make sense, and would it work? Sorry, but this is just 
> theoretical at the moment and I'm not experienced in Lucene, as you 
> can no doubt tell.
>
> I came across a piece that was talking about Hadoop and distributed 
> Solr, http://blog.mgm-tp.com/2010/09/hadoop-log-management-part4/, and 
> I'm now wondering if that would be a superior approach? Or any other 
> suggestions?
>
> Many Thanks,
> The Captn
>
