date:20111201

Design qs: search for multiple terms in document collection

2011-12-01 Thread Ilya Zavorin

I am trying to make some high-level (and not so high-level) design decisions
for an app that is supposed to check a collection of documents against a set of
terms/queries. Basically, I need to perform a triage of sorts: find only those
docs in the collection that contain at least one term from the term list. For
those docs, I also need to find where in the document each occurrence is, since
I then need to collect a small amount of surrounding text for a more detailed
analysis.

Clearly, I will need to index the document collection using Lucene's indexing
classes. This is pretty straightforward.

Then I will need to use the highlighting classes. In some sample code I found
online, a query is first searched for and hits are returned. Then doc IDs are
extracted for the hits and the query is highlighted. Some questions:

Q1: Does Lucene perform essentially the same searching operation twice, first 
to find hits, then to highlight? If so, does this mean that if I expect most of 
the docs in my collection to contain at least one of the search terms, it might 
be faster for me to skip searching and simply go over all docs, applying 
highlighting? Then for those docs where no hits occurred I would simply get an 
empty list of relevant fragments. 

Q2: Is the same scoring mechanism used during search and during highlighting?
That is, can I be sure that if I get a hit during search, the corresponding
document indeed contains my query, which will then be found during highlighting?

Q3: Are there any mechanisms in Lucene that would facilitate merging of 
highlighting results for two different queries against a single document? 

Q4: I did some small tests of highlighting and noticed that some of the 
fragments returned for a query contained highlighted text that was quite far 
from the original query. For instance, I was looking for a 3-word term and it 
highlighted a sequence of only 2 of these 3 words. How can I control how close 
highlighted fragments should be to the original query?



Thanks much,

Ilya Zavorin



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
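
The triage described in this message (keep only documents with at least one term, then collect a little text around each occurrence) can be sketched in plain Java. This is a minimal illustration and not Lucene's highlighter: the class name `TermContextFinder`, the `Hit` type and the `window` parameter are made up for the example, matching is a simple case-insensitive substring scan with no analysis or stemming, and multi-word terms only match as an exact phrase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
final class TermContextFinder {

    /** One occurrence of a term: character offsets plus surrounding text. */
    static final class Hit {
        final String term;
        final int start;      // inclusive offset of the match in the document
        final int end;        // exclusive offset of the match
        final String context; // the match plus up to `window` chars on each side

        Hit(String term, int start, int end, String context) {
            this.term = term;
            this.start = start;
            this.end = end;
            this.context = context;
        }
    }

    /**
     * Finds every case-insensitive occurrence of each term in the document.
     * Hits for all terms are returned in one list ordered by start offset.
     * An empty list means the document does not pass the triage.
     */
    static List<Hit> find(String doc, List<String> terms, int window) {
        List<Hit> hits = new ArrayList<>();
        for (String term : terms) {
            if (term.isEmpty()) {
                continue; // an empty term would match everywhere
            }
            int i = 0;
            while (i + term.length() <= doc.length()) {
                // regionMatches keeps offsets valid for the original text
                if (doc.regionMatches(true, i, term, 0, term.length())) {
                    int end = i + term.length();
                    int ctxStart = Math.max(0, i - window);
                    int ctxEnd = Math.min(doc.length(), end + window);
                    hits.add(new Hit(term, i, end, doc.substring(ctxStart, ctxEnd)));
                    i = end;
                } else {
                    i++;
                }
            }
        }
        hits.sort((a, b) -> Integer.compare(a.start, b.start));
        return hits;
    }
}
```

Because all terms are scanned in one pass over a single document and the hits share one offset-ordered list, results for several terms against the same document are merged by construction. Whether this is faster than search followed by highlighting depends on the collection and is not something this sketch measures.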

Re: Boost more recent document

2011-12-01 Thread Simon Willnauer

On Thu, Dec 1, 2011 at 7:36 AM, Zhang, Lisheng
 wrote:
> Hi Simon,
>
> Sorry I found that I cannot use payload for this purpose because payload
> can be accessed only through term positions but we did not use timestamp
> for query. Ideally it would be great if we can have some doc-level "payload"
> accessible through docId?

lucene 4 has a feature called IndexDocValues which is essentially a
payload per document per field.

you can read about it here:
http://www.searchworkings.org/blog/-/blogs/introducing-lucene-index-doc-values
http://www.searchworkings.org/blog/-/blogs/apache-lucene-flexiblescoring-with-indexdocvalues
http://www.searchworkings.org/blog/-/blogs/indexdocvalues-their-applications
>
> Then your initial suggestion to use CustomScoreQuery would be our solution,
> from source code I see sort is implemented by FieldCache and its performance
> seems OK even though we did not cache reader. So we will use CustomScoreQuery
> without cache for now (cutting time stamp to hour or day may help), if too
> slow we may consider selected cache.

what do you mean by cache readers?

simon
>
> Thanks very much for all your great helps, please point out if you see wrong
> in above statements?
>
> Best regards, Lisheng
>
> -----Original Message-----
> From: Zhang, Lisheng [mailto:lisheng.zh...@broadvision.com]
> Sent: Wednesday, November 30, 2011 1:40 PM
> To: java-user@lucene.apache.org; simon.willna...@gmail.com
> Subject: RE: Boost more recent document
>
>
> Hi,
>
> Thanks for the very interesting idea!
>
> Currently we use lucene 2.3.2 and we just use default merge policy (at
> any time we have a few segments and after some accumulation small segments
> are merged into big ones). I need to double check if docId can reflect doc
> age.
>
> But I have one concern: docId may not reflect true age interval, like docId
> difference by 2 may reflect 2m or 1h. If no better choice I may just use
> payload and adapt a few query classes?
>
> Thanks very much for helps, Lisheng
>
> -----Original Message-----
> From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
> Sent: Wednesday, November 30, 2011 1:02 PM
> To: java-user@lucene.apache.org
> Subject: Re: Boost more recent document
>
>
> If you use LogMergePolicy, i.e. do merges in order, you could use the
> absolute docID as a relative age value. Smaller docIDs mean older
> documents. Maybe this works for you?
>
> simon
>
> On Wed, Nov 30, 2011 at 9:08 PM, Zhang, Lisheng
>  wrote:
>> Thanks very much for your help! I got the point, only problem is that
>> I cannot afford to use FieldCache because in our app we have many
>> lucene index data folders, is there another simple way?
>>
>> Thanks again, Lisheng
>>
>> -----Original Message-----
>> From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
>> Sent: Wednesday, November 30, 2011 11:40 AM
>> To: java-user@lucene.apache.org
>> Subject: Re: Boost more recent document
>>
>>
>> On Wed, Nov 30, 2011 at 6:59 PM, Zhang, Lisheng
>>  wrote:
>>> Hi,
>>>
>>> We need to boost document which is more recent (each doc has time stamp
>>> attribute). It seems that we cannot use doc boost at index time because
>>> it will be condensed into one byte (cannot differentiate 365 days), so we
>>> may use payload (save time stamp as payload) to boost at search time.
>>>
>>> In our app we let user enter query at browser and use QueryParser to
>>> generate query, the query can be different types (TermQuery, BooleanQuery,
>>> WildcardQuery, ...), then it seems we need to create each customized query
>>> class similar to PayloadTermQuery, is there another simpler way?
>>
>> you can simply index your timestamp (untokenized) and wrap your query
>> in a CustomScoreQuery. This query accepts your user query and a
>> ValueSource. During search CustomScoreQuery calls your valuesource for
>> each document that the user query scores and multiplies the result of
>> the ValueSource into the score. Inside your valuesource you can simply
>> get the timestamps from the FieldCache and calculate your custom
>> boost...
>>
>> hope that helps
>>
>> simon
>>>
>>> Thanks very much for helps, Lisheng
>>

RE: Boost more recent document

2011-12-01 Thread Zhang, Lisheng

Hi Simon,

1) Thanks for suggesting lucene 4.0 feature, we will make use of it as soon as 
   we upgrade lucene.

2) Currently we recreate IndexSearcher for each query, which means recreate
   underlying IndexReader for each query (I should have said IndexReader), but
   sort performance is OK, so I would like to try CustomScoreQuery without
   cache first?

Thanks very much for helps, Lisheng
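
The thread describes CustomScoreQuery as multiplying a per-document value into the score of the user query, and mentions cutting the time stamp to hour or day. The plain-Java sketch below only shows one possible shape for that per-document value; it is not Lucene code. The class name `RecencyBoost`, the reciprocal decay formula and the `halfLifeDays` parameter are assumptions chosen for illustration, not something proposed in the thread.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: computes the multiplier a value source could return.
final class RecencyBoost {

    private static final long DAY_MILLIS = TimeUnit.DAYS.toMillis(1);

    /** Rounds an epoch timestamp down to the start of its (UTC) day. */
    static long truncateToDay(long epochMillis) {
        return Math.floorDiv(epochMillis, DAY_MILLIS) * DAY_MILLIS;
    }

    /**
     * Returns a multiplier in (0, 1]: 1.0 for documents dated today (or in
     * the future), 0.5 for documents that are halfLifeDays old, and slowly
     * approaching 0 for older ones.
     */
    static float boost(long docMillis, long nowMillis, double halfLifeDays) {
        long ageMillis = Math.max(0L, truncateToDay(nowMillis) - truncateToDay(docMillis));
        double ageDays = ageMillis / (double) DAY_MILLIS;
        return (float) (1.0 / (1.0 + ageDays / halfLifeDays));
    }
}
```

Truncating to the day means every document from the same day gets the same multiplier, which also keeps the number of distinct timestamp values small. With `halfLifeDays` set to 30, a 30-day-old document keeps half of its original score.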


Re: Boost more recent document

2011-12-01 Thread Simon Willnauer

On Thu, Dec 1, 2011 at 8:30 PM, Zhang, Lisheng
 wrote:
> Hi Simon,
>
> 1) Thanks for suggesting lucene 4.0 feature, we will make use of it as soon as
>   we upgrade lucene.
>
> 2) Currently we recreate IndexSearcher for each query, which means recreate
>   underlying IndexReader for each query (I should have said IndexReader), but
>   sort performance is OK, so I would like to try CustomScoreQuery without
>   cache first?

WOW - why do you do this? Can't you use the SearcherManager in Lucene 3.5?

simon
>
> Thanks very much for helps, Lisheng

RE: Boost more recent document

2011-12-01 Thread Zhang, Lisheng

Currently we use Lucene 2.3.2. The reason we recreate the searcher each time
is that within one server we manage a few thousand independent Lucene index
data folders. Those folders have different sizes; the large ones have about
200K docs (but growing).

Thanks very much for helps, Lisheng
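
With a few thousand index folders on one server, one middle ground between reopening on every query and keeping everything open is a bounded cache that keeps only the most recently used searchers open. The sketch below is plain Java under stated assumptions, not the SearcherManager API: `SearcherCache` and `Opener` are hypothetical names, and the cached type is any `Closeable` standing in for a searcher or reader.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache of open searchers, keyed by index directory.
final class SearcherCache<S extends Closeable> {

    /** Hypothetical factory; real code would open a reader/searcher here. */
    interface Opener<T> {
        T open(String indexDir) throws IOException;
    }

    private final int maxOpen;
    private final Opener<S> opener;
    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, S> open = new LinkedHashMap<>(16, 0.75f, true);

    SearcherCache(int maxOpen, Opener<S> opener) {
        if (maxOpen < 1) {
            throw new IllegalArgumentException("maxOpen must be >= 1");
        }
        this.maxOpen = maxOpen;
        this.opener = opener;
    }

    /** Returns the cached searcher for the directory, opening it if needed. */
    synchronized S get(String indexDir) {
        S searcher = open.get(indexDir);
        if (searcher != null) {
            return searcher;
        }
        try {
            searcher = opener.open(indexDir);
            open.put(indexDir, searcher);
            if (open.size() > maxOpen) {
                // Close and drop the least recently used entry.
                Iterator<Map.Entry<String, S>> it = open.entrySet().iterator();
                S evicted = it.next().getValue();
                it.remove();
                evicted.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return searcher;
    }

    synchronized int size() {
        return open.size();
    }
}
```

Two things this sketch leaves out on purpose: it never reopens a searcher after the index changes, and it closes an evicted searcher even if another thread is still using it. Production code would need reference counting or a similar acquire/release scheme for that.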


lucene-core-3.3.0 not optimizing

2011-12-01 Thread KARTHIK SHIVAKUMAR

Hi

Spec
OS: Windows 7
JDK: 1.6.0_29
Lucene: lucene-core-3.3.0

After indexing finishes successfully, why does this code not optimize the
index? (sample code)

INDEX_WRITER.optimize(100);
INDEX_WRITER.commit();
INDEX_WRITER.close();


*N.S.KARTHIK
R.M.S.COLONY
BEHIND BANK OF INDIA
R.M.V 2ND STAGE
BANGALORE
560094*
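
One possible reading of the snippet above: in Lucene 3.x the int argument of `optimize` is `maxNumSegments`, so `optimize(100)` asks for at most 100 segments, and an index that already has fewer than that is left as it is. The toy model below is not Lucene code; `OptimizeModel` is a hypothetical name and the method only restates that upper bound. Whether this is what happened in the poster's case is an assumption, since the message does not say how many segments the index had.

```java
// Toy model of the "at most maxNumSegments" contract; not Lucene code.
final class OptimizeModel {

    /**
     * Upper bound on the segment count after asking for at most
     * maxNumSegments segments. An index that is already at or below
     * the limit keeps its current segment count.
     */
    static int maxSegmentsAfterOptimize(int currentSegments, int maxNumSegments) {
        if (maxNumSegments < 1) {
            throw new IllegalArgumentException("maxNumSegments must be >= 1");
        }
        return Math.min(currentSegments, maxNumSegments);
    }
}
```

Under this model an index with 12 segments stays at 12 after `optimize(100)`, while passing 1 as the limit would bring it down to a single segment.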

Re: Lucene index inside of a web app?

2011-12-01 Thread KARTHIK SHIVAKUMAR

Hi

>> generated Lucene index

What if you need to update this index with more docs?

The best approach is to inject the real path of the index (c:/temp/Indexes)
into the web application via "web.xml".

With this approach you can also achieve:

1) Load balancing of multiple web servers pointing to the same index files
2) Update/delete/re-index without the web application being interrupted



with regards
Karthik

On Tue, Nov 29, 2011 at 12:25 AM, okayndc  wrote:

> Awesome.  Thanks guys!
>
> On Mon, Nov 28, 2011 at 12:19 PM, Uwe Schindler  wrote:
>
> > You can store the index in the WEB-INF directory, just use something like:
> > ServletContext.getRealPath("/WEB-INF/data/myIndexName");
> >
> > -
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >
> > > -----Original Message-----
> > > From: Ian Lea [mailto:ian@gmail.com]
> > > Sent: Monday, November 28, 2011 6:11 PM
> > > To: java-user@lucene.apache.org
> > > Subject: Re: Lucene index inside of a web app?
> > >
> > > Using a static string is fine - it just wasn't clear from your original
> > > post what it was.
> > >
> > > I usually use a full path read from a properties file so that I can
> > > change it without a recompile, have different settings on
> > > test/live/whatever systems, etc. Works for me, but isn't the only way to
> > > do it.
> > >
> > > If you know where your app lives, you could use a full path pointing to
> > > somewhere within that tree, or you could use a partial path that the app
> > > server will interpret relative to something.  Which is fine too - take
> > > your pick of whatever works for you.
> > >
> > >
> > > --
> > > Ian.
> > >
> > >
> > > On Mon, Nov 28, 2011 at 4:40 PM, okayndc  wrote:
> > > > Hi,
> > > >
> > > > Thanks for your response.  Yes, LUCENE_INDEX_DIRECTORY is a static
> > > > string which contains the file system path of the index (for example,
> > > > c:\\index).  Is this good practice?  If not, what should the full path
> > > > to an index look like?
> > > >
> > > > Thanks
> > > >
> > > > On Mon, Nov 28, 2011 at 4:54 AM, Ian Lea  wrote:
> > > >
> > > >> What is LUCENE_INDEX_DIRECTORY?  Some static string in your app?
> > > >>
> > > >> Lucene knows nothing about your app, JSP, or what app server you are
> > > >> using.  It requires a file system path and it is up to you to provide
> > > >> that.  I always use a full path since I prefer to store indexes
> > > >> outside the app and it avoids complications with what the app server
> > > >> considers the default directory. But if you want to store it inside,
> > > >> without specifying full path, look at the docs for your app server.
> > > >>
> > > >>
> > > >> --
> > > >> Ian.
> > > >>
> > > >>
> > > >> On Sun, Nov 27, 2011 at 2:10 AM, okayndc wrote:
> > > >> > Hello,
> > > >> >
> > > >> > I want to store the generated Lucene index inside of my Java
> > > >> > application, preferably within a folder where my JSP files are
> > > >> > located.  I also want to be able to search from the index within
> > > >> > the web app. I've been using the LUCENE_INDEX_DIRECTORY but, this
> > > >> > is on a file system (currently my hard drive).  Should I continue
> > > >> > to use LUCENE_INDEX_DIRECTORY if I want the Lucene index inside the
> > > >> > app or use something else.  I was a bit confused about this.  Btw,
> > > >> > the Lucene index content comes from a database.
> > > >> >
> > > >> > Any help is appreciated
> > > >> >



-- 
*N.S.KARTHIK
R.M.S.COLONY
BEHIND BANK OF INDIA
R.M.V 2ND STAGE
BANGALORE
560094*
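
The advice in this thread (read a full index path from a properties file, or inject it through configuration, instead of hard-coding it) can be sketched in plain Java. The class name `IndexLocation` and the property key `lucene.index.dir` are hypothetical, chosen only for the example; the servlet-specific route via web.xml is not shown.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper for locating the index directory from configuration.
final class IndexLocation {

    /** Hypothetical property name; pick whatever fits your deployment. */
    static final String KEY = "lucene.index.dir";

    /** Loads settings from any character source, e.g. a FileReader. */
    static Properties load(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /**
     * Returns the configured index directory as an absolute path, or the
     * fallback (also made absolute) when the property is missing or blank.
     */
    static Path resolve(Properties props, Path fallback) {
        String configured = props.getProperty(KEY);
        Path chosen = (configured == null || configured.trim().isEmpty())
                ? fallback
                : Paths.get(configured.trim());
        return chosen.toAbsolutePath().normalize();
    }
}
```

Keeping the path in configuration lets test and live systems point at different directories without a recompile. Note that in a .properties file a backslash is an escape character, so Windows paths are easier to write with forward slashes.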

Re: lucene-core-3.3.0 not optimizing

2011-12-01 Thread Simon Willnauer

What do you mean when you say optimize? Unless you tell us what this code
does in your case and what you'd expect it to do, it's impossible to give
you any reasonable answer.

simon

On Fri, Dec 2, 2011 at 4:54 AM, KARTHIK SHIVAKUMAR
 wrote:
> Hi
>
> Spec
> O/s win os 7
> Jdk : 1.6.0_29
> Lucene  lucene-core-3.3.0
>
>
>
> Finally after Indexing successfully ,Why this Code does not optimize (
> sample code )
>
>            INDEX_WRITER.optimize(100);
>            INDEX_WRITER.commit();
>            INDEX_WRITER.close();
>
>
> *N.S.KARTHIK
> R.M.S.COLONY
> BEHIND BANK OF INDIA
> R.M.V 2ND STAGE
> BANGALORE
> 560094*
