RE: Lucene refresh index function (incremental indexing).

2003-11-24 Thread Tun Lin
Does it support indexing the contents of PDF files? I have found a project
called PDFBox that can be integrated with Lucene to search inside PDF
files. Currently, Lucene can only search on the PDF filename. I tried
PDFBox and got the following messages when I ran the command:

java org.apache.lucene.demo.IndexHTML -create -index c:\\index ..

log4j:WARN No appenders could be found for logger (org.pdfbox.pdfparser.PDFParser).
log4j:WARN Please initialize the log4j system properly.

Can anyone advise?

-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, November 25, 2003 5:01 AM
To: Lucene Users List
Subject: Re: Lucene refresh index function (incremental indexing).

Tun Lin wrote:
> These are the steps I took:
> 
> 1) I compile all the files in a particular directory using the command: 
> java org.apache.lucene.demo.IndexHTML -create -index c:\\index .. 
> , putting all the indexed files in c:\\index.
> 2) Everytime, I added an additional file in that directory. I need to 
> reindex/recompile that directory to generate the indexes again. As the 
> directory gets larger, the indexing takes a longer time.
> 
> My question is how do I generate the indexes automatically everytime a 
> new document is added in that directory without me recompiling everytime
> manually?

To update, try removing the '-create' from the command line.  The demo code
supports incremental updates.  It will re-scan the directory and figure out
which files have changed, what new files have appeared and which previously
existing files have been removed.

Doug


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]







RE: Lucene refresh index function (incremental indexing).

2003-11-24 Thread Tun Lin
 
Will the final version 1.3 include an application that does the incremental
updates automatically?

-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, November 25, 2003 5:01 AM
To: Lucene Users List
Subject: Re: Lucene refresh index function (incremental indexing).

Tun Lin wrote:
> These are the steps I took:
> 
> 1) I compile all the files in a particular directory using the command: 
> java org.apache.lucene.demo.IndexHTML -create -index c:\\index .. 
> , putting all the indexed files in c:\\index.
> 2) Everytime, I added an additional file in that directory. I need to 
> reindex/recompile that directory to generate the indexes again. As the 
> directory gets larger, the indexing takes a longer time.
> 
> My question is how do I generate the indexes automatically everytime a 
> new document is added in that directory without me recompiling everytime
> manually?

To update, try removing the '-create' from the command line.  The demo code
supports incremental updates.  It will re-scan the directory and figure out
which files have changed, what new files have appeared and which previously
existing files have been removed.

Doug





RE: Lucene version 1.3.

2003-11-24 Thread Tun Lin
I am now using 1.3RC2.

-Original Message-
From: Scott Smith [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, November 25, 2003 4:04 AM
To: 'Lucene Users List'; '[EMAIL PROTECTED]'
Subject: RE: Lucene version 1.3.

If you had to be in production in January, would you be using 1.3RC2 or 1.2?

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Monday, November 24, 2003 4:03 AM
To: Lucene Users List; [EMAIL PROTECTED]
Subject: Re: Lucene version 1.3.


Sorry, no firm date.  However, 1.3 RC2 is pretty solid, so I suggest you
just use that until 1.3 final is out.

Otis

--- Tun Lin <[EMAIL PROTECTED]> wrote:
> Hi,
> 
> Anyone knows when the full version of Lucene version 1.3 will be 
> released?
> 
> Please advise.
> 
> Thanks.
> 





Re: Dash Confusion in QueryParser - Bug? Feature?

2003-11-24 Thread Jianshuo Niu
Dear Victor:

Finally, I got search results. I had made a mistake when creating the index files.

Thanks a lot

Jianshuo

On Tue, 25 Nov 2003 09:24:23 +1100, Victor Hadianto wrote:

> Odd ...
> 
> This is working fine for us. You have to use the patched Lucene to both
> build and do the search and to make sure to use the same analyser. Failing
> that, email me the code that you use to build/search the index and I'll have
> a look.
> 
> HTH,
> 
> victor
> 
> - Original Message - 
> From: "Jianshuo Niu" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Tuesday, November 25, 2003 7:29 AM
> Subject: Re: Dash Confusion in QueryParser - Bug? Feature?
> 
> 
>> Dear Victor:
>>
>> I applied the changed based on the patch. Also, I got t-shirt in the
>> search query.
>> I rebuilt the search index using the modified lucene-1.3-rc2.jar and did
>> the search by the modified jar as well.
>> The search field was specified as indexed, tokenized and stored.
>> When I do the search, I did not get any results. I also tried to use the
>> modified jar to create search query and did the search on the index files
>> which was built by original lucene-1.3-rc2.jar. It did not get search
>> results as well. Could you tell me which part I did wrong?
>>
>> Thanks
>>
>> Jianshuo
>>
>> On Mon, 24 Nov 2003 11:15:38 +1100, Victor Hadianto wrote:
>>
>> > Hi,
>> >
>> > You missed another change in the file, if you follow that thread I later
>> > attached a patch that changes another file (standard tokenizer). Hangon
>> > let me try to find the patch for you.
>> >
>> > http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]&msgId=764036
>> >
>> > You also need to change standard tokenizer.
>> >
>> > Hope this help.
>> >
>> > /victor
>> >
>> > - Original Message - 
>> > From: "Jianshuo Niu" <[EMAIL PROTECTED]>
>> > To: <[EMAIL PROTECTED]>
>> > Sent: Saturday, November 22, 2003 9:34 AM
>> > Subject: Re: Dash Confusion in QueryParser - Bug? Feature?
>> >
>> >
>> >> Dear Victor:
>> >>
>> >> I read your post on lucene bug list. However, I try the change you
>> >> suggested, but it just changed "t-shirts" to "shirt".
>> >>
>> >> I downloaded lucene1.3-rc1 source, changed the above line in
>> >> QueryParser.jj, and recompiled the source. After the change, the query
>> >> I got is:
>> >>
>> >> +(name:shirt)
>> >>
>> >> before the change, the query was:
>> >>
>> >> -(name:shirt)
>> >>
>> >> I have the following two questions:
>> >>
>> >> 1. Did I get the results it supposes to be?
>> >> 2. in your post, you mentioned only one line change: <#_TERM_CHAR: (
>> >> <_TERM_START_CHAR> | <_ESCAPED_CHAR> | "-" ) >
>> >> is this only line needs to change?
>> >>
>> >>
>> >>
>> >>
>> >> Thank you for time and help
>> >>
>> >>
>> >> Jianshuo
>> >>
>> >>

Re: Dash Confusion in QueryParser - Bug? Feature?

2003-11-24 Thread Victor Hadianto
Odd ...

This is working fine for us. You have to use the patched Lucene to both
build and do the search and to make sure to use the same analyser. Failing
that, email me the code that you use to build/search the index and I'll have
a look.
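
Why the same analyser has to be used on both sides can be seen without
Lucene at all. What follows is only a minimal sketch under stated
assumptions, not Lucene code: the class AnalyzerMismatchSketch and the two
tokenizers splitOnDash and keepDash are invented for illustration and merely
imitate an unpatched and a patched tokenizer.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Lucene.
class AnalyzerMismatchSketch {

    // Imitates an unpatched tokenizer: a dash separates tokens.
    static List<String> splitOnDash(String text) {
        return tokenize(text, "[^a-z0-9]+");
    }

    // Imitates a patched tokenizer: a dash is kept inside a token.
    static List<String> keepDash(String text) {
        return tokenize(text, "[^a-z0-9-]+");
    }

    private static List<String> tokenize(String text, String separatorRegex) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split(separatorRegex)) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // A query matches only if every query token is present in the index.
    static boolean matches(List<String> indexedTokens, List<String> queryTokens) {
        Set<String> index = new HashSet<>(indexedTokens);
        return index.containsAll(queryTokens);
    }

    public static void main(String[] args) {
        String doc = "Blue t-shirt";
        String query = "t-shirt";
        // Index built with one tokenizer, query parsed with the other.
        System.out.println(matches(splitOnDash(doc), keepDash(query)));
        // Same tokenizer on both sides.
        System.out.println(matches(keepDash(doc), keepDash(query)));
    }
}
```

In this toy model the mixed case finds nothing because the index holds "t"
and "shirt" while the query asks for "t-shirt", which is one plausible way
to end up with an empty result after patching only one side.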

HTH,

victor

- Original Message - 
From: "Jianshuo Niu" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, November 25, 2003 7:29 AM
Subject: Re: Dash Confusion in QueryParser - Bug? Feature?


> Dear Victor:
>
> I applied the changed based on the patch. Also, I got t-shirt in the
> search query.
> I rebuilt the search index using the modified lucene-1.3-rc2.jar and did
> the search by the modified jar as well.
> The search field was specified as indexed, tokenized and stored.
> When I do the search, I did not get any results. I also tried to use the
> modified jar to create search query and did the search on the index files
> which was built by original lucene-1.3-rc2.jar. It did not get search
> results as well. Could you tell me which part I did wrong?
>
> Thanks
>
> Jianshuo
>
> On Mon, 24 Nov 2003 11:15:38 +1100, Victor Hadianto wrote:
>
> > Hi,
> >
> > You missed another change in the file, if you follow that thread I later
> > attached a patch that changes another file (standard tokenizer). Hangon
> > let me try to find the patch for you.
> >
> > http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]&msgId=764036
> >
> > You also need to change standard tokenizer.
> >
> > Hope this help.
> >
> > /victor
> >
> > - Original Message - 
> > From: "Jianshuo Niu" <[EMAIL PROTECTED]>
> > To: <[EMAIL PROTECTED]>
> > Sent: Saturday, November 22, 2003 9:34 AM
> > Subject: Re: Dash Confusion in QueryParser - Bug? Feature?
> >
> >
> >> Dear Victor:
> >>
> >> I read your post on lucene bug list. However, I try the change you
> >> suggested, but it just changed "t-shirts" to "shirt".
> >>
> >> I downloaded lucene1.3-rc1 source, changed the above line in
> >> QueryParser.jj, and recompiled the source. After the change, the query
> >> I got is:
> >>
> >> +(name:shirt)
> >>
> >> before the change, the query was:
> >>
> >> -(name:shirt)
> >>
> >> I have the following two questions:
> >>
> >> 1. Did I get the results it supposes to be?
> >> 2. in your post, you mentioned only one line change: <#_TERM_CHAR: (
> >> <_TERM_START_CHAR> | <_ESCAPED_CHAR> | "-" ) >
> >> is this only line needs to change?
> >>
> >>
> >>
> >>
> >> Thank you for time and help
> >>
> >>
> >> Jianshuo
> >>
> >>

Re: permissions or lock problem?

2003-11-24 Thread DMGoodstein
Resolved. It was the new feature that writes locks to java.io.tmpdir, which my
servlet engine set to $CATALINA_HOME/temp, a directory that didn't exist.
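
A quick way to rule this out before opening the index is to check the
directory that java.io.tmpdir points to. The following is a minimal sketch
using only the JDK; the class name TmpDirCheck and the helper
ensureDirectory are invented for illustration, and whether creating the
directory is appropriate depends on the deployment.

```java
import java.io.File;

// Hypothetical helper; the class and method names are illustrative only.
class TmpDirCheck {

    // Returns true if the directory exists, creating it first if necessary.
    static boolean ensureDirectory(File dir) {
        return dir.isDirectory() || dir.mkdirs();
    }

    public static void main(String[] args) {
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        System.out.println("java.io.tmpdir = " + tmp.getAbsolutePath());
        System.out.println("usable = " + ensureDirectory(tmp));
    }
}
```

Running the check inside the servlet engine rather than from the command
line matters here, since the container may set the property differently.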

--D


- Original Message -
From: <[EMAIL PROTECTED]>
Date: Monday, November 24, 2003 1:30 pm
Subject: permissions or lock problem?

> I'm having difficulty creating an IndexSearcher from an FSDirectory 
> in 1.3-rc2.  The code is as follows (log.writeToLog is a 
> convenience method):
> 
> log.writeToLog(Log.DEBUG,"directory path ="+hitPath);
> File f = new File(hitPath);
> log.writeToLog(Log.DEBUG,"file exists = "+String.valueOf(f.exists()));
> 
> IndexSearcher t = new 
> IndexSearcher(FSDirectory.getDirectory(f,false));
> the output is:
> [2003-11-24 13:23:09] [--ERROR--] Error building multisearcher
> java.io.IOException: No such file or directory
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:827)
>   at org.apache.lucene.store.FSDirectory$1.obtain(FSDirectory.java:324)
>   at org.apache.lucene.store.Lock.obtain(Lock.java:92)
>   at org.apache.lucene.store.Lock$With.run(Lock.java:147)
>   at org.apache.lucene.index.IndexReader.open(IndexReader.java:110)
>   at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:80)
>   at jgi.util.search.LuceneSearch.<init>(LuceneSearch.java:110)
> 
> 
> Since file.exists() is returning true, the No Such file or 
> directory error is a bit surprising...is something getting mangled 
> on its way from FSDirectory.getDirectory to 
> java.io.UnixFileSystem.createFileExclusively?
> thx,
> --David
> 
> 
> 
> 





permissions or lock problem?

2003-11-24 Thread DMGoodstein
I'm having difficulty creating an IndexSearcher from an FSDirectory in 1.3-rc2.  The 
code is as follows (log.writeToLog is a convenience method):

log.writeToLog(Log.DEBUG,"directory path ="+hitPath);
File f = new File(hitPath);
log.writeToLog(Log.DEBUG,"file exists = "+String.valueOf(f.exists()));

IndexSearcher t = new IndexSearcher(FSDirectory.getDirectory(f,false));

the output is:
[2003-11-24 13:23:09] [--ERROR--] Error building multisearcher
java.io.IOException: No such file or directory
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:827)
at org.apache.lucene.store.FSDirectory$1.obtain(FSDirectory.java:324)
at org.apache.lucene.store.Lock.obtain(Lock.java:92)
at org.apache.lucene.store.Lock$With.run(Lock.java:147)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:110)
at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:80)
at jgi.util.search.LuceneSearch.<init>(LuceneSearch.java:110)


Since f.exists() returns true, the "No such file or directory" error is a bit
surprising...is something getting mangled on its way from FSDirectory.getDirectory to
java.io.UnixFileSystem.createFileExclusively?

thx,
--David








Re: Fields with same name but different boosts

2003-11-24 Thread Doug Cutting
Andrzej Bialecki wrote:
> Now, I'm wondering how do I encode the weight of keywords... If I do the
> following:
>
> Field f = Field.Keyword("kw", "value1");
> f.setBoost(10.0f);
> doc.add(f);
> f = Field.Keyword("kw", "value2");
> f.setBoost(20.0f);
> doc.add(f);
>
> Now the question is: what is the boost value for the fields when I
> search? Is it equivalent to "value1^10.0 value2^20.0" (which is my
> intention), or rather "value1^20.0 value2^20.0"?

I think the boost will be 200.0.  Boosts are multiplicative.

> If the latter, do you have any suggestions how to achieve the original
> effect?

The only way to do this is to repeat occurrences of each word.

So you might:

  for (int i = 0; i < 10; i++) {
    doc.add(new Field("kw", "value1", false, true, false));
  }
  for (int i = 0; i < 20; i++) {
    doc.add(new Field("kw", "value2", false, true, false));
  }

You might also consider using a custom Similarity implementation so that
you can control the interpretation of these frequencies.  For example,
you might, instead of 10 and 20, be able to just use 1 and 2 and then,
in your Similarity.tf() implementation, turn this into whatever value
you want used in the scoring.
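
To make the tf() idea concrete, here is a minimal stand-alone sketch. It
does not use the Lucene classes: the interface TfFunction and the class
TfSketch are invented for illustration, the square-root mapping is only my
understanding of what the default implementation does, and the factor of
ten is an arbitrary choice for this example.

```java
// Hypothetical stand-in for the tf() hook; this is not the Lucene class.
interface TfFunction {
    float tf(float freq);
}

class TfSketch {

    // Square-root damping, assumed here to mirror the default behaviour.
    static final TfFunction SQRT = freq -> (float) Math.sqrt(freq);

    // Custom mapping: each occurrence counts as a weight of ten,
    // so adding a keyword once or twice yields 10 or 20.
    static final TfFunction WEIGHT_TIMES_TEN = freq -> freq * 10.0f;

    public static void main(String[] args) {
        float[] freqs = {1f, 2f, 10f, 20f};
        for (float freq : freqs) {
            System.out.println("freq=" + freq
                    + " sqrt=" + SQRT.tf(freq)
                    + " custom=" + WEIGHT_TIMES_TEN.tf(freq));
        }
    }
}
```

With a damped mapping, 10 versus 20 repetitions do not produce a 1:2 ratio
in the term-frequency factor, which is the reason for taking control of the
mapping instead of relying on repetition counts alone.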

Doug



Re: Lucene refresh index function (incremental indexing).

2003-11-24 Thread Doug Cutting
Tun Lin wrote:
> These are the steps I took:
>
> 1) I compile all the files in a particular directory using the command:
> java org.apache.lucene.demo.IndexHTML -create -index c:\\index ..
> , putting all the indexed files in c:\\index.
> 2) Everytime, I added an additional file in that directory. I need to
> reindex/recompile that directory to generate the indexes again. As the
> directory gets larger, the indexing takes a longer time.
>
> My question is how do I generate the indexes automatically everytime a
> new document is added in that directory without me recompiling everytime
> manually?

To update, try removing the '-create' from the command line.  The demo
code supports incremental updates.  It will re-scan the directory and
figure out which files have changed, what new files have appeared and
which previously existing files have been removed.
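
With the command from the question, that would be
java org.apache.lucene.demo.IndexHTML -index c:\\index ..

The re-scan can be pictured as a comparison of two snapshots of the
directory. The sketch below is not the demo's code: the class
IncrementalScanSketch, its method names, and the use of a path to
modification-time map are assumptions made for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch; this is not the code of the Lucene demo.
class IncrementalScanSketch {

    // Paths present now but absent from the previous scan.
    static Set<String> added(Map<String, Long> previous, Map<String, Long> current) {
        Set<String> result = new TreeSet<>(current.keySet());
        result.removeAll(previous.keySet());
        return result;
    }

    // Paths present in the previous scan but gone now.
    static Set<String> removed(Map<String, Long> previous, Map<String, Long> current) {
        return added(current, previous);
    }

    // Paths present in both scans whose modification time differs.
    static Set<String> changed(Map<String, Long> previous, Map<String, Long> current) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Long> e : current.entrySet()) {
            Long before = previous.get(e.getKey());
            if (before != null && !before.equals(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> previous = new TreeMap<>();
        previous.put("a.html", 100L);
        previous.put("b.html", 100L);
        previous.put("c.html", 100L);

        Map<String, Long> current = new TreeMap<>();
        current.put("a.html", 100L);   // unchanged
        current.put("b.html", 250L);   // modified since the last run
        current.put("d.html", 300L);   // new file

        System.out.println("added   = " + added(previous, current));
        System.out.println("changed = " + changed(previous, current));
        System.out.println("removed = " + removed(previous, current));
    }
}
```

Only the added and changed files need to be indexed again, and removed or
changed files need their old entries deleted, so the work per run depends
on what changed and not on the size of the directory.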

Doug



Re: Lucene version 1.3.

2003-11-24 Thread Francesco Bellomi
I'm using 1.3RC2 in production right now.

Francesco

Scott Smith <[EMAIL PROTECTED]> wrote:
> If you had to be in production in January, would you be using 1.3RC2 or
> 1.2? 
> 

-
Francesco Bellomi
"Use truth to show illusion,
and illusion to show truth."






Re: Dash Confusion in QueryParser - Bug? Feature?

2003-11-24 Thread Jianshuo Niu
Dear Victor:

I applied the change based on the patch, and I now get t-shirt in the search query.
I rebuilt the search index using the modified lucene-1.3-rc2.jar and ran the search
with the modified jar as well.
The search field was specified as indexed, tokenized and stored.
When I run the search, I do not get any results. I also tried using the modified jar
to create the search query and searching the index files that were built by the
original lucene-1.3-rc2.jar. That did not return any results either. Could you tell me
which part I did wrong?

Thanks

Jianshuo

On Mon, 24 Nov 2003 11:15:38 +1100, Victor Hadianto wrote:

> Hi,
> 
> You missed another change in the file, if you follow that thread I later
> attached a patch that changes another file (standard tokenizer). Hangon let
> me try to find the patch for you.
> 
> http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]&msgId=764036
> 
> You also need to change standard tokenizer.
> 
> Hope this help.
> 
> /victor
> 
> - Original Message - 
> From: "Jianshuo Niu" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Saturday, November 22, 2003 9:34 AM
> Subject: Re: Dash Confusion in QueryParser - Bug? Feature?
> 
> 
>> Dear Victor:
>>
>> I read your post on lucene bug list. However, I try the change you
>> suggested, but it just changed "t-shirts" to "shirt".
>>
>> I downloaded lucene1.3-rc1 source, changed the above line in
>> QueryParser.jj, and recompiled the source. After the change, the query I
>> got is:
>>
>> +(name:shirt)
>>
>> before the change, the query was:
>>
>> -(name:shirt)
>>
>> I have the following two questions:
>>
>> 1. Did I get the results it supposes to be?
>> 2. in your post, you mentioned only one line change: <#_TERM_CHAR: (
>> <_TERM_START_CHAR> | <_ESCAPED_CHAR> | "-" ) >
>> is this only line needs to change?
>>
>>
>>
>>
>> Thank you for time and help
>>
>>
>> Jianshuo
>>
>>
>> On Wed, 15 Oct 2003 10:51:28 +1000, Victor Hadianto wrote:
>>
>> >
>> >
>> >> On Tuesday, October 14, 2003, at 08:38  PM, Victor Hadianto wrote:
>> >> > I believe this is the same problem that I had the other day. If you
>> >> > search
>> >> > the mailing list for "t-shirt" you should get some threads discussing
>> >> > this
>> >> > problem.
>> >>
>> >> Haha!  Better search for "shirt", not "t-shirt" :))
>> >
>> > If the QueryParser implemented the solution that I suggested then
>> > "t-shirt" will get you the correct hits :)
>> >
>> >
>> > /vh
>>
>>
>>

Re: Score

2003-11-24 Thread Gerret Apelt
Tracey --

it would help if you could give more detail on the types of documents, 
fields and analyzers you're using. Also what do you mean by "Multi Field 
Search"? I presume you're using the MultiFieldQueryParser to have query 
terms in a user-submitted query be searched for in each field in your index.

If I am understanding your problem, then it might be the same one I had 
a few weeks ago -- highly relevant matches would not receive a high 
ranking. (This paragraph will apply to you only if you use more than 
just one Analyzer for the set of your fields). I had six fields in my 
index, most of which were populated with a standard analyzer. I used 
self-made Analyzers for two of the fields. This turned out to be my 
problem when using MultiFieldQueryParser: I told my 
MultiFieldQueryParser instance to use only the standard analyzer. 
Instead I discovered that I needed to make use of 
org.apache.lucene.analysis.PerFieldAnalyzerWrapper and feed that to the 
MultiFieldQueryParser. Unless you do this, your problem is what's
described here:
http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.indexing&toc=faq#q15.
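
The idea behind a per-field wrapper can be sketched with the JDK alone. The
code below is not org.apache.lucene.analysis.PerFieldAnalyzerWrapper and
does not reproduce its API: the class PerFieldSketch, the method names, and
the two toy analyzers are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of per-field analyzer dispatch; not the Lucene class.
class PerFieldSketch {

    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldSketch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Registers a special analyzer for one field.
    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    // Uses the field's own analyzer if one is registered, else the default.
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    // Toy analyzer: lower-case the text and split it on whitespace.
    static List<String> lowercaseWords(String text) {
        return Arrays.asList(text.toLowerCase().split("\\s+"));
    }

    // Toy analyzer: keep the whole value as a single token.
    static List<String> wholeValue(String text) {
        return Collections.singletonList(text);
    }

    public static void main(String[] args) {
        PerFieldSketch wrapper = new PerFieldSketch(PerFieldSketch::lowercaseWords);
        wrapper.addAnalyzer("id", PerFieldSketch::wholeValue);
        System.out.println(wrapper.analyze("title", "Big Dog"));
        System.out.println(wrapper.analyze("id", "AB-12 X"));
    }
}
```

Handing one such wrapper to both the indexing code and the query parser
keeps each field's query terms in the same form as its indexed terms.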

Most likely, if your scoring is off, you're "doing something wrong" in
the way you use the Lucene API -- at least, that's what I've discovered
to be the case when my ranking is off.

If you're interested in the nitty-gritty of how scoring is done, check 
this FAQ entry:
http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.search&toc=faq#q31

cheers,
Gerret

Pleasant, Tracy wrote:

> Hi,
>
> I'm using the Multi Field Search to search all the fields of my
> documents during the search.
>
> When it returns results the scores are numerically low - .06, .17, etc.
> I would think if I searched for "Dog" and there was a doc with "Dog" in
> the title and several times in the contents of a document that it would
> receive a score more like 1.0 or close to it.
>
> Is there a way that I can tweak the score?
>
> I tried using Boost but that did absolutely nothing.



RE: Lucene version 1.3.

2003-11-24 Thread Scott Smith
If you had to be in production in January, would you be using 1.3RC2 or 1.2?

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 24, 2003 4:03 AM
To: Lucene Users List; [EMAIL PROTECTED]
Subject: Re: Lucene version 1.3.


Sorry, no firm date.  However, 1.3 RC2 is pretty solid, so I suggest you
just use that until 1.3 final is out.

Otis

--- Tun Lin <[EMAIL PROTECTED]> wrote:
> Hi,
> 
> Anyone knows when the full version of Lucene version 1.3 will be 
> released?
> 
> Please advise.
> 
> Thanks.
> 





Re: Dates and others

2003-11-24 Thread Ype Kingma
Erik,

On Sunday 23 November 2003 12:51, Erik Hatcher wrote:
> On Saturday, November 22, 2003, at 06:33  PM, Dion Almaer wrote:
> > 3. I have some fields such as title, owner, etc as well as the content
> > blob which I index and use as
> > the default search field.  Is there an easy way to extend the
> > QueryParser to merge it with a
> > MultiTermQuery which can also search this meta data and give them
> > certain weights?  Or, if you go
> > down this path do you have to leave the QueryParser behind and build
> > your own queries?  Any best
> > practices would be great.
>
> And Ype said:
> You can provide field weights at document indexing time (norms) and use
> a MultiTermQuery for searching multiple fields. At query time you can
> again use field weights.
> I don't know how the scoring of the MultiTermQuery is done,
> it might use the max. score over the fields of a document, or combine
> the scores in the fields of a document.
>  end Ype's reply cut and paste
>
> I'm a little confused with this question and Ype's reply.
> MultiTermQuery is an abstract base class under Query, which is the
> parent for WildcardQuery and FuzzyQuery.
>
> What I think you're after is using MultiFieldQueryParser, but you want

Thanks for the correction,

> to weight the fields differently.  You can add the boosts at indexing
> time using Field.setBoost.  Unfortunately at the moment

and thanks for explaining how to provide field weights.

Ype





Re: StandardAnalyzer defaults to OR on query

2003-11-24 Thread Dror Matalon
I see

setOperator(DEFAULT_OPERATOR_AND);

Maybe the docs should be changed not to say
"The only method that clients should need to call is parse()."

Thanks for the pointer,

Dror
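
The difference between the two defaults is easy to model outside Lucene.
The following is a minimal sketch of the matching semantics only: the class
DefaultOperatorSketch, the Operator enum and both methods are invented for
illustration and are not part of QueryParser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of OR versus AND as the default operator.
class DefaultOperatorSketch {

    enum Operator { OR, AND }

    // OR: any query term may match. AND: every query term must match.
    static boolean matches(Set<String> docTerms, List<String> queryTerms, Operator op) {
        for (String term : queryTerms) {
            boolean present = docTerms.contains(term);
            if (op == Operator.OR && present) {
                return true;
            }
            if (op == Operator.AND && !present) {
                return false;
            }
        }
        return op == Operator.AND;
    }

    // Counts the documents that match the query under the given operator.
    static int countHits(List<Set<String>> docs, List<String> queryTerms, Operator op) {
        int hits = 0;
        for (Set<String> doc : docs) {
            if (matches(doc, queryTerms, op)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Set<String>> docs = new ArrayList<>();
        docs.add(new HashSet<>(Arrays.asList("foo")));
        docs.add(new HashSet<>(Arrays.asList("foo", "bar")));
        docs.add(new HashSet<>(Arrays.asList("bar")));
        docs.add(new HashSet<>(Arrays.asList("baz")));

        List<String> query = Arrays.asList("foo", "bar");
        System.out.println("OR  hits: " + countHits(docs, query, Operator.OR));
        System.out.println("AND hits: " + countHits(docs, query, Operator.AND));
    }
}
```

Adding a term can only grow or keep the OR result set and can only shrink
or keep the AND result set, which matches the behaviour described in the
quoted message below.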

On Mon, Nov 24, 2003 at 08:14:12PM +1100, Victor Hadianto wrote:
> There is an attribute in QueryParser that will let you set AND as the
> default operator.
> 
> victor
> 
> - Original Message - 
> From: "Dror Matalon" <[EMAIL PROTECTED]>
> To: "Lucene Users List" <[EMAIL PROTECTED]>
> Sent: Monday, November 24, 2003 7:33 PM
> Subject: StandardAnalyzer defaults to OR on query
> 
> 
> > Hi,
> >
> > From
> > http://jakarta.apache.org/lucene/docs/queryparsersyntax.html
> >
> > The OR operator is the default conjunction operator. This means that
> > if there is no Boolean operator between two terms, the OR operator is
> > used.
> >
> > So "foo bar" is equivalent to "foo OR bar" and will probably return more
> > hits than just plain "foo" .
> >
> > On the other hand when I go to google and type "foo" I get 4 million
> > hits, and if I type "foo bar" I get around 1 million hits, which seems
> > to indicate that it's equivalent to "foo AND bar" . That's my experience
> > of using google that the more keywords you add, the more specific the
> > query is, the fewer documents we get.
> >
> > Personally, I don't mind too much, I can add the "AND" when I need them
> > and Lucene does return the documents that match both terms first. But
> > I'm worried that it'll confuse some users who are used to the google
> > approach.
> >
> >
> > Regards,
> >
> > Dror
> >
> >
> >
> > -- 
> > Dror Matalon
> > Zapatec Inc
> > 1700 MLK Way
> > Berkeley, CA 94709
> > http://www.fastbuzz.com
> > http://www.zapatec.com
> >

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Score

2003-11-24 Thread Pleasant, Tracy
Hi,

I'm using the Multi Field Search to search all the fields of my
documents during the search. 

When it returns results the scores are numerically low - .06, .17, etc.
I would think if I searched for "Dog" and there was a doc with "Dog" in
the title and several times in the contents of a document that it would
receive a score more like 1.0 or close to it.

Is there a way that I can tweak the score?

I tried using Boost but that did absolutely nothing.




Re: Similarity class

2003-11-24 Thread Erik Hatcher
On Monday, November 24, 2003, at 12:22  PM, Ralf B wrote:
> One question:
>
> The Similarity class is abstract. Are there default implementations
> available, as in other parts of this API (Analyzers, for example), and how
> can I use one, e.g. to calculate weights? Are there default implementations
> hidden in other classes (e.g. in Hits) that calculate the term weighting?
> Does somebody have some code which demonstrates the usage?

Well, nothing is "hidden" when you have a nice IDE :))

In IntelliJ, I did this:

- Ctrl-n, typed "simi" followed by return, which took me to 
Similarity.java
- Ctrl-h

Two subclasses appeared: DefaultSimilarity, and 
TestSimilarity.SimpleSimilarity.

So there are two subclasses within Lucene's codebase, one used for 
testing.  To see its usage, have a look at TestSimilarity.  The test 
cases are a great place to look for "documentation".

As for weighting - the scoring process uses weighting based on the type 
of query - look at the Query class and its subclasses to see those 
details.

	Erik



Re: 1.2 javadoc

2003-11-24 Thread Erik Hatcher
On Monday, November 24, 2003, at 12:57  PM, [EMAIL PROTECTED] wrote:
> Is there a url that will take me to the javadocs for Lucene 1.2,
> rather than 1.3-rc2?

No, but the 1.2 binary distribution ships with the javadocs, I believe.
And, of course, they would be easy to generate from the 1.2 source
distribution.

	Erik



1.2 javadoc

2003-11-24 Thread DMGoodstein
Is there a url that will take me to the javadocs for Lucene 1.2, rather than 1.3-rc2?

thanks,
--David


- Original Message -
From: Dion Almaer <[EMAIL PROTECTED]>
Date: Monday, November 24, 2003 6:07 am
Subject: RE: Dates and others

> Erik -
> 
> Spot on. I should have listened to your advice from the talk and 
> just used MMDD :)
> 
> Everything works nicely now that I do the conversion.
> 
> Thanks for the great ideas.
> 
> Dion 
> 
> > -Original Message-
> > From: Erik Hatcher [EMAIL PROTECTED] 
> > Sent: Sunday, November 23, 2003 11:41 PM
> > To: Lucene Users List
> > Subject: Re: Dates and others
> > 
> > On Sunday, November 23, 2003, at 03:33  PM, Dion Almaer wrote:
> > > This leads me to another issue actually.  On certain range 
> > queries I 
> > > get exceptions:
> > >
> > > Query: modifieddate:[1/1/03 TO 12/31/03]
> > >
> > > org.apache.lucene.search.BooleanQuery$TooManyClauses
> > 
> > I'm guessing you're using Field.Keyword(String, Date) for 
> > modifieddate? 
> >   The date field stuff in Lucene is really a timestamp, and 
> > doing a range query enumerates all the terms for that field 
> > in that range, making a big ol' boolean OR query of all the 
> > individual ones.  Since you want this to be just a date, use 
> > Field.Keyword(String, "MMDD") instead.  But you'll 
> > want to subclass QueryParser and override getRangeQuery to do 
> > the right date format parsing from "MM/DD/" 
> > into "MMDD" rather than the internal Date representation 
> > Lucene uses for "date" fields.
> > 
> > Erik
> > 
> > 
> > 
> > 
> > 
> 
> 
> 
> 
> 





Re: Similarity class

2003-11-24 Thread Otis Gospodnetic
It sounds like you missed this:

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/DefaultSimilarity.html

You can write your own implementations and use it during indexing and
searching.
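
As a rough illustration of what "writing your own implementation" can look like, here is a minimal sketch in plain Java that does not need the Lucene jar. The class names SimpleScoring, DefaultScoring and FlatScoring are hypothetical stand-ins, not Lucene's API, and the formulas are the usual tf-idf conventions (square-root term frequency, logarithmic inverse document frequency); the javadoc linked above remains the authority on the real method names and signatures.

```java
// Hypothetical demo, not Lucene code: an abstract scoring class, a default
// implementation, and a custom subclass that overrides one factor.
class ScoringDemo {
    public static void main(String[] args) {
        SimpleScoring standard = new DefaultScoring();
        SimpleScoring flat = new FlatScoring();
        // A term occurring 4 times in a document and in 9 of 1000 documents.
        System.out.println("default: " + standard.weight(4, 9, 1000));
        System.out.println("flat:    " + flat.weight(4, 9, 1000));
    }
}

// Stand-in for an abstract similarity class (assumed shape, for illustration).
abstract class SimpleScoring {
    /** Weight contributed by a term that occurs freq times in one document. */
    abstract double tf(int freq);

    /** Weight reflecting how rare the term is across the whole index. */
    abstract double idf(int docFreq, int numDocs);

    /** Combined weight of one term in one document. */
    double weight(int freq, int docFreq, int numDocs) {
        return tf(freq) * idf(docFreq, numDocs);
    }
}

// Default implementation using common tf-idf conventions.
class DefaultScoring extends SimpleScoring {
    @Override
    double tf(int freq) {
        return Math.sqrt(freq);
    }

    @Override
    double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }
}

// Custom variant: repeating a term inside a document earns no extra weight.
class FlatScoring extends DefaultScoring {
    @Override
    double tf(int freq) {
        return freq > 0 ? 1.0 : 0.0;
    }
}
```

The design point is the same one made in this thread: the abstract class fixes how the factors are combined, and a subclass only overrides the factor it wants to change.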

Otis

--- Ralf B  <[EMAIL PROTECTED]> wrote:
> Hi,
> 
> I am a complete beginner with Lucene and have started to look into some
> articles and the API documentation. I know the theories behind Information
> Retrieval and want to find out about Lucene. I think it is the best Java IR
> package available nowadays.
> 
> One question:
> 
> The Similarity class is abstract. Are there default implementations
> available, as in other parts of this API (Analyzers, for example), and how
> can I use one, e.g. to calculate weights? Are there default implementations
> hidden in other classes (e.g. in Hits) that calculate the term weighting?
> Does somebody have some code which demonstrates the usage?
> 
> Kind Regards,
> Ralf
> 
> 





Similarity class

2003-11-24 Thread Ralf B
Hi,

I am a complete beginner with Lucene and have started to look into some
articles and the API documentation. I know the theories behind Information
Retrieval and want to find out about Lucene. I think it is the best Java IR
package available nowadays.

One question:

The Similarity class is abstract. Are there default implementations
available, as in other parts of this API (Analyzers, for example), and how
can I use one, e.g. to calculate weights? Are there default implementations
hidden in other classes (e.g. in Hits) that calculate the term weighting?
Does somebody have some code which demonstrates the usage?

Kind Regards,
Ralf




RE: Dates and others

2003-11-24 Thread Dion Almaer
Erik -

Spot on. I should have listened to your advice from the talk and just used MMDD :)

Everything works nicely now that I do the conversion.

Thanks for the great ideas.

Dion 
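
For anyone following along, the conversion could be sketched roughly as below. This is a plain-Java illustration, not the code actually used here: the class name DateKeys is hypothetical, and the target format (four-digit year, then month, then day, written as the java.time pattern yyyyMMdd) is an assumption about the sortable day-granular string discussed in this thread. In a real setup this logic would sit in the overridden getRangeQuery of a QueryParser subclass, which is not shown.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: turns user-entered dates into sortable day keys.
class DateKeys {
    // Input style of the range query quoted in this thread, e.g. 1/1/03.
    private static final DateTimeFormatter INPUT =
            DateTimeFormatter.ofPattern("M/d/yy");
    // Assumed output: one sortable string per calendar day.
    private static final DateTimeFormatter SORTABLE =
            DateTimeFormatter.ofPattern("yyyyMMdd");

    /** Converts a user-entered date into a day-granular, sortable key. */
    static String toKey(String userDate) {
        return LocalDate.parse(userDate, INPUT).format(SORTABLE);
    }

    public static void main(String[] args) {
        String from = toKey("1/1/03");
        String to = toKey("12/31/03");
        System.out.println(from + " TO " + to);
        // String order now matches date order, which a term range relies on.
        System.out.println(from.compareTo(to) < 0);
    }
}
```

Because every document modified on the same day shares one term, a one-year range expands to a few hundred terms at most instead of one term per distinct timestamp.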

> -Original Message-
> From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
> Sent: Sunday, November 23, 2003 11:41 PM
> To: Lucene Users List
> Subject: Re: Dates and others
> 
> On Sunday, November 23, 2003, at 03:33  PM, Dion Almaer wrote:
> > This leads me to another issue actually.  On certain range 
> queries I 
> > get exceptions:
> >
> > Query: modifieddate:[1/1/03 TO 12/31/03]
> >
> > org.apache.lucene.search.BooleanQuery$TooManyClauses
> 
> I'm guessing you're using Field.Keyword(String, Date) for 
> modifieddate? 
>   The date field stuff in Lucene is really a timestamp, and 
> doing a range query enumerates all the terms for that field 
> in that range, making a big ol' boolean OR query of all the 
> individual ones.  Since you want this to be just a date, use 
> Field.Keyword(String, "MMDD") instead.  But you'll 
> want to subclass QueryParser and override getRangeQuery to do 
> the right date format parsing from "MM/DD/" 
> into "MMDD" rather than the internal Date representation 
> Lucene uses for "date" fields.
> 
>   Erik
> 
> 



Re: Lucene version 1.3.

2003-11-24 Thread Otis Gospodnetic
Sorry, no firm date.  However, 1.3 RC2 is pretty solid, so I suggest
you just use that until 1.3 final is out.

Otis

--- Tun Lin <[EMAIL PROTECTED]> wrote:
> Hi,
> 
> Does anyone know when the final version of Lucene 1.3 will be
> released?
> 
> Please advise.
> 
> Thanks.
> 





Fields with same name but different boosts

2003-11-24 Thread Andrzej Bialecki
Hello,

I have the following problem: in my application I'm trying to store a set 
of keywords and their weights. Since the number of keywords is variable 
(and can be as high as 40) I decided to use a single field to store it. 
In other words, I want to use a single field with multiple values to 
store a keyword histogram of the document.

Now, I'm wondering how to encode the weight of the keywords... If I do the 
following:

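// First value of the multi-valued "kw" field, boosted by 10.
Field f = Field.Keyword("kw", "value1");
f.setBoost(10.0f);  // setBoost takes a float, so the literal needs the f suffix
doc.add(f);
// Second value of the same field, boosted by 20.
f = Field.Keyword("kw", "value2");
f.setBoost(20.0f);
doc.add(f);
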
Now the question is: what is the boost value for the fields when I 
search? Is it equivalent to "value1^10.0 value2^20.0" (which is my 
intention), or rather "value1^20.0 value2^20.0"?

If the latter, do you have any suggestions how to achieve the original 
effect?

Thanks in advance!

--
Best regards,
Andrzej Bialecki
-
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-
FreeBSD developer (http://www.freebsd.org)




Re: StandardAnalyzer defaults to OR on query

2003-11-24 Thread Victor Hadianto
There is an attribute in QueryParser that will let you set AND as the
default operator.

victor

- Original Message - 
From: "Dror Matalon" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Monday, November 24, 2003 7:33 PM
Subject: StandardAnalyzer defaults to OR on query


> Hi,
>
> From
> http://jakarta.apache.org/lucene/docs/queryparsersyntax.html
>
> The OR operator is the default conjunction operator. This means that
> if there is no Boolean operator between two terms, the OR operator is
> used.
>
> So "foo bar" is equivalent to "foo OR bar" and will probably return more
> hits than just plain "foo" .
>
> On the other hand, when I go to Google and type "foo" I get 4 million
> hits, and if I type "foo bar" I get around 1 million hits, which seems
> to indicate that it's equivalent to "foo AND bar". My experience with
> Google is that the more keywords you add, the more specific the query
> becomes and the fewer documents you get.
>
> Personally, I don't mind too much, I can add the "AND" when I need them
> and Lucene does return the documents that match both terms first. But
> I'm worried that it'll confuse some users who are used to the google
> approach.
>
>
> Regards,
>
> Dror
>
>
>
> -- 
> Dror Matalon
> Zapatec Inc
> 1700 MLK Way
> Berkeley, CA 94709
> http://www.fastbuzz.com
> http://www.zapatec.com
>



Re: Lucene refresh index function (incremental indexing).

2003-11-24 Thread Victor Hadianto
> 1) I compile all the files in a particular directory using the command:
> java org.apache.lucene.demo.IndexHTML -create -index c:\\index ..
> , putting all the indexed files in c:\\index.
> 2) Every time I add a file to that directory, I need to
> reindex/recompile that directory to generate the indexes again. As the
> directory gets larger, the indexing takes longer.
>
> My question is: how do I generate the indexes automatically every time a new
> document is added to that directory, without recompiling manually each time?

You can't, unless you write your own application that monitors the directory
for newly added documents. That application would then index just the new
document, without reindexing the entire document set.

If you do incremental indexing, indexing does take longer as the
document base grows, but you shouldn't really notice this until your
index size reaches hundreds of megabytes.
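
One way such a monitoring application could decide what to index is to compare two snapshots of the directory. The sketch below is plain Java under stated assumptions: DirectoryDiff and its methods are hypothetical names, each snapshot is a map from file path to last-modified time (in a real program it would be filled from java.io.File.lastModified()), and the actual calls that add documents to or delete them from the index are left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene: compares two directory snapshots
// (file path -> last-modified time) to decide what needs (re)indexing.
class DirectoryDiff {

    /** Paths present now that were absent from the previous snapshot. */
    static Set<String> added(Map<String, Long> previous, Map<String, Long> current) {
        Set<String> result = new TreeSet<>(current.keySet());
        result.removeAll(previous.keySet());
        return result;
    }

    /** Paths present in both snapshots whose modification time differs. */
    static Set<String> changed(Map<String, Long> previous, Map<String, Long> current) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Long> entry : current.entrySet()) {
            Long before = previous.get(entry.getKey());
            if (before != null && !before.equals(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    /** Paths from the previous snapshot that no longer exist. */
    static Set<String> removed(Map<String, Long> previous, Map<String, Long> current) {
        Set<String> result = new TreeSet<>(previous.keySet());
        result.removeAll(current.keySet());
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> previous = new HashMap<>();
        previous.put("a.html", 100L);
        previous.put("b.html", 100L);

        Map<String, Long> current = new HashMap<>();
        current.put("a.html", 100L);   // unchanged
        current.put("b.html", 250L);   // modified since the last run
        current.put("c.html", 300L);   // newly added

        System.out.println("index:   " + added(previous, current));
        System.out.println("reindex: " + changed(previous, current));
        System.out.println("delete:  " + removed(previous, current));
    }
}
```

Only the files reported as added or changed need to be passed to the indexing code, so the cost of each run depends on what changed rather than on the size of the directory.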


victor


>
> -Original Message-
> From: Victor Hadianto [mailto:[EMAIL PROTECTED]
> Sent: Monday, November 24, 2003 1:07 PM
> To: Lucene Users List
> Subject: Re: Lucene refresh index function (incremental indexing).
>
> Ah .. ic,
>
> But you don't need to do that even if you can do it. Lucene does
incremental
> indexing. So you would create a new program to add your document manually
using
> IndexWriter, not blatting the index and doing it again.
>
> Seems like you just trying out Lucene, I suggest having a look in the
source
> code of IndexHTML and you will see that there is no magic there, it just
> traverse the directory and index the HTML file one by one using
IndexWriter.
>
> BTW you don't compile directory using Lucene .. :)
>
> /victor
>
> - Original Message -
> From: "Tun Lin" <[EMAIL PROTECTED]>
> To: "'Lucene Users List'" <[EMAIL PROTECTED]>
> Sent: Monday, November 24, 2003 3:45 PM
> Subject: RE: Lucene refresh index function (incremental indexing).
>
>
> > Hi,
> >
> > Thanks for your reply.
> >
> > What if I add a new document into the directory that I have compiled
using
> the
> > following command: java org.apache.lucene.demo.IndexHTML -create -index
> > {index-dir} ..
> >
> > Will it automatically reindex like I did manually to reflect the new
> document
> > being added in that particular directory?
> >
> > Please advise.
> >
> > -Original Message-
> > From: Victor Hadianto [mailto:[EMAIL PROTECTED]
> > Sent: Monday, November 24, 2003 12:36 PM
> > To: Lucene Users List
> > Subject: Re: Lucene refresh index function (incremental indexing).
> >
> > > I delete the old ones and add them again manually. But how do I
> > > reindex
> > the
> > > documents automatically without doing it manually?
> >
> > You don't need to reindex the documents again. Lucene does incremental
> indexing.
> > Just add your document to the index and that's it. You need to create a
> new
> > IndexSearcher to reflect the new changes into the your search result.
> >
> > /victor
> >
> >
> > >
> > > -Original Message-
> > > From: Dror Matalon [mailto:[EMAIL PROTECTED]
> > > Sent: Sunday, November 23, 2003 4:44 AM
> > > To: Lucene Users List
> > > Subject: Re: Lucene refresh index function (incremental indexing).
> > >
> > > Hi,
> > >
> > > It's not clear what you mean when you say "refresh indexes"  or
> > "re-compiling."
> > > If you're adding new documents just use the add() method. If you are
> > replacing
> > > documents, you need to first delete the old ones and then add them
> again.
> > Look
> > > at the mailing list archive for this, since it's been discussed
> > > several
> > times.
> > >
> > >
> > > On Sun, Nov 23, 2003 at 12:22:40AM +0800, Tun Lin wrote:
> > > > Hi,
> > > >
> > > > I am new here.
> > > >
> > > > May I know how to refresh indexes in Lucene immediately after new
> > > > documents have been added without re-compiling again to reindex the
> > > > documents in that particular directory?
> > > >
> > > > Thanks.
> > >
> > > --
> > > Dror Matalon
> > > Zapatec Inc
> > > 1700 MLK Way
> > > Berkeley, CA 94709
> > > http://www.fastbuzz.com
> > > http://www.zapatec.com
> > >

RE: Lucene refresh index function (incremental indexing).

2003-11-24 Thread Tun Lin
Can you elaborate on "you don't compile directory using Lucene"? 

These are the steps I took:

1) I compile all the files in a particular directory using the command: 
java org.apache.lucene.demo.IndexHTML -create -index c:\\index .. 
, putting all the indexed files in c:\\index.
2) Every time I add a file to that directory, I need to
reindex/recompile that directory to generate the indexes again. As the directory
gets larger, the indexing takes longer.

My question is: how do I generate the indexes automatically every time a new
document is added to that directory, without recompiling manually each time?

How does Lucene detect new documents to be added to the indexes?

I looked at the code, but the indexes for that directory are only generated
after I run the command mentioned above.

Is there code or a built-in function that allows Lucene to detect new documents
and build the indexes on its own?

-Original Message-
From: Victor Hadianto [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 24, 2003 1:07 PM
To: Lucene Users List
Subject: Re: Lucene refresh index function (incremental indexing).

Ah .. ic,

But you don't need to do that even if you can do it. Lucene does incremental
indexing. So you would create a new program to add your document manually using
IndexWriter, not blatting the index and doing it again.

Seems like you're just trying out Lucene. I suggest having a look at the source
code of IndexHTML and you will see that there is no magic there; it just
traverses the directory and indexes the HTML files one by one using IndexWriter.

BTW you don't compile directory using Lucene .. :)

/victor

- Original Message -
From: "Tun Lin" <[EMAIL PROTECTED]>
To: "'Lucene Users List'" <[EMAIL PROTECTED]>
Sent: Monday, November 24, 2003 3:45 PM
Subject: RE: Lucene refresh index function (incremental indexing).


> Hi,
>
> Thanks for your reply.
>
> What if I add a new document into the directory that I have compiled using
the
> following command: java org.apache.lucene.demo.IndexHTML -create -index
> {index-dir} ..
>
> Will it automatically reindex like I did manually to reflect the new
document
> being added in that particular directory?
>
> Please advise.
>
> -Original Message-
> From: Victor Hadianto [mailto:[EMAIL PROTECTED]
> Sent: Monday, November 24, 2003 12:36 PM
> To: Lucene Users List
> Subject: Re: Lucene refresh index function (incremental indexing).
>
> > I delete the old ones and add them again manually. But how do I
> > reindex
> the
> > documents automatically without doing it manually?
>
> You don't need to reindex the documents again. Lucene does incremental
indexing.
> Just add your document to the index and that's it. You need to create a
new
> IndexSearcher to reflect the new changes into the your search result.
>
> /victor
>
>
> >
> > -Original Message-
> > From: Dror Matalon [mailto:[EMAIL PROTECTED]
> > Sent: Sunday, November 23, 2003 4:44 AM
> > To: Lucene Users List
> > Subject: Re: Lucene refresh index function (incremental indexing).
> >
> > Hi,
> >
> > It's not clear what you mean when you say "refresh indexes"  or
> "re-compiling."
> > If you're adding new documents just use the add() method. If you are
> replacing
> > documents, you need to first delete the old ones and then add them
again.
> Look
> > at the mailing list archive for this, since it's been discussed
> > several
> times.
> >
> >
> > On Sun, Nov 23, 2003 at 12:22:40AM +0800, Tun Lin wrote:
> > > Hi,
> > >
> > > I am new here.
> > >
> > > May I know how to refresh indexes in Lucene immediately after new
> > > documents have been added without re-compiling again to reindex the
> > > documents in that particular directory?
> > >
> > > Thanks.
> >
> > --
> > Dror Matalon
> > Zapatec Inc
> > 1700 MLK Way
> > Berkeley, CA 94709
> > http://www.fastbuzz.com
> > http://www.zapatec.com
> >



StandardAnalyzer defaults to OR on query

2003-11-24 Thread Dror Matalon
Hi,

From
http://jakarta.apache.org/lucene/docs/queryparsersyntax.html

The OR operator is the default conjunction operator. This means that
if there is no Boolean operator between two terms, the OR operator is
used.

So "foo bar" is equivalent to "foo OR bar" and will probably return more
hits than just plain "foo" .

On the other hand, when I go to Google and type "foo" I get 4 million
hits, and if I type "foo bar" I get around 1 million hits, which seems
to indicate that it's equivalent to "foo AND bar". My experience with
Google is that the more keywords you add, the more specific the query
becomes and the fewer documents you get.

Personally, I don't mind too much, I can add the "AND" when I need them
and Lucene does return the documents that match both terms first. But
I'm worried that it'll confuse some users who are used to the google
approach.
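
To make the difference concrete, here is a small sketch in plain Java rather than Lucene itself. The class DefaultOperatorDemo, its search method and the tiny hand-built index are all hypothetical; the only point is that the same two-term query returns a union under an OR default and an intersection under an AND default.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo, not Lucene code: a toy inverted index queried with
// either OR or AND as the implicit operator between terms.
class DefaultOperatorDemo {

    /** Returns ids of documents matching the whitespace-separated terms. */
    static Set<Integer> search(Map<String, Set<Integer>> index,
                               String query, boolean requireAll) {
        Set<Integer> result = null;
        for (String term : query.trim().toLowerCase().split("\\s+")) {
            Set<Integer> postings = index.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(postings);   // first term seeds the result
            } else if (requireAll) {
                result.retainAll(postings);         // AND: keep the intersection
            } else {
                result.addAll(postings);            // OR: grow the union
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index = new HashMap<>();
        index.put("foo", new TreeSet<>(Set.of(1, 2, 3)));
        index.put("bar", new TreeSet<>(Set.of(2, 4)));

        System.out.println("foo        -> " + search(index, "foo", false));
        System.out.println("foo OR bar -> " + search(index, "foo bar", false));
        System.out.println("foo AND bar-> " + search(index, "foo bar", true));
    }
}
```

With OR as the default, adding a term can only keep or enlarge the hit list; with AND it can only keep or shrink it, which is the behaviour described for Google above.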


Regards,

Dror



-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com




Re: lucene web demo problems

2003-11-24 Thread Holger Klawitter
Hi,

> org.apache.jasper.JasperException: Unable to compile class for JSP
> An error occurred at line: 36 in the jsp file: /web/results.jsp
> Generated servlet error:
> [javac] Compiling 1 source file
> [javac] /web/jakarta-tomcat-4.1.27/work/bomEng/localhost/jsp_search_luceneweb/web/results_jsp.java:11: package org.apache.lucene.analysis does not exist
> [javac] import org.apache.lucene.analysis.Analyzer;
> [javac]
>
> Any ideas on where I'm going wrong would be appreciated.

* Is the directory (and the class file) readable by the user under which 
tomcat is running?

* By the way, you can put the whole .jar file into WEB-INF/lib - that way you 
don't have to transfer class files one-by-one.

Mit freundlichem Gruß / With kind regards
Holger Klawitter
- --
[EMAIL PROTECTED]

