Full List of Stop Words for Standard Analyzer.

2002-08-01 Thread Suneetha Rao

Hi,
I would like to include all the stop words in my documentation.
Can somebody tell me where to find the list for the Standard Analyzer?

Thanks in Advance,
Suneetha


--
To unsubscribe, e-mail:   
For additional commands, e-mail: 




Re: Hit Navigation in Lucene?

2002-08-01 Thread Peter Carlson

This clicking to the next highlighted term is all done in javascript, not by
the backend system.

So if you get permission, you can use their code and hook it in with the
Lucene highlighter. I'll bet that the highlighting is being done via
JavaScript too, so you may not need the Lucene highlighting code.

Although, the Lucene highlighting code works with wildcards.
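
For what it's worth, here is a rough sketch of the hit navigation idea. It is
not the Fulcrum code and not part of the Lucene highlighter contribution. It
also swaps the JavaScript approach for plain server-side HTML anchors: each
hit gets a named anchor plus a link to the next hit. The class name
HitNavigator and the method markHits are made up for illustration, and the
sketch assumes the text is already HTML-escaped and that the term is matched
as a literal, case-insensitive substring.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of Lucene or its highlighter contribution. */
class HitNavigator {

    /**
     * Wraps each case-insensitive occurrence of term in a named anchor
     * (hit1, hit2, ...) followed by a link that jumps to the next hit.
     * Assumes text is already HTML-escaped and term is a literal string.
     */
    static String markHits(String text, String term) {
        Pattern p = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE);

        // First pass: count the hits so the last one can link back to the first.
        Matcher counter = p.matcher(text);
        int total = 0;
        while (counter.find()) {
            total++;
        }
        if (total == 0) {
            return text;
        }

        // Second pass: copy the text, decorating each hit as we go.
        Matcher m = p.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        int n = 0;
        while (m.find()) {
            n++;
            int next = (n == total) ? 1 : n + 1;
            out.append(text, last, m.start());
            out.append("<a name=\"hit").append(n).append("\"></a>")
               .append("<b>").append(m.group()).append("</b>")
               .append("<a href=\"#hit").append(next).append("\">&gt;</a>");
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(markHits("He was fired. Fired twice.", "fired"));
    }
}
```

A frame with "next hit" buttons, as on the site Bruce describes, would need
a little script or a frameset on top of this; the anchors are only the part
that the navigation jumps between.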

--Peter


On 8/1/02 12:36 PM, "Bruce Best (CRO)" <[EMAIL PROTECTED]> wrote:

> I am looking at Lucene as the search engine for our office's legal research
> site. We have been looking at some of the commercial offerings, but Lucene
> seems to offer most of what we need, and we may end up using it and spending
> money on paying someone to customize it to our needs.
> 
> For our purposes, one feature that is probably indispensable is hit
> highlighting and hit navigation. I see the former has already been added to
> the contributions section.
> 
> With respect to hit navigation, the kind of thing I am looking at is along
> the lines of that used by the Fulcrum search engine; if anyone is not
> familiar with Fulcrum, a good example site is the Government of Canada
> Employment Insurance Jurisprudence Library at
> http://www.ei-ae.gc.ca/easyk/search.asp. Do a search for any term (try
> "fired"), then click on any of the resulting documents. The resulting page
> has the search terms highlighted, much as they would be in Lucene with the
> hit highlighting added, with a narrow frame at the top of the window with
> hit navigation buttons to allow users to jump to the next search term in the
> document. 
> 
> Would it be difficult to implement something similar with Lucene? I am not
> familiar with the technologies involved (I am not a coder), so do not know
> if this is trivial or impossible or somewhere in between.
> 
> Any thoughts would be appreciated,
> 
> Bruce






RE: Hit Navigation in Lucene?

2002-08-01 Thread Bruce Best (CRO)

To avoid confusion:

The "Fulcrum" I mentioned in my previous message has nothing whatsoever to
do with the Fulcrum services framework that is part of the Apache Jakarta
Turbine project. I was talking about what is now known as Hummingbird
Fulcrum KnowledgeServer, a commercial search engine. 

Bruce





Hit Navigation in Lucene?

2002-08-01 Thread Bruce Best (CRO)

I am looking at Lucene as the search engine for our office's legal research
site. We have been looking at some of the commercial offerings, but Lucene
seems to offer most of what we need, and we may end up using it and spending
money on paying someone to customize it to our needs.

For our purposes, one feature that is probably indispensable is hit
highlighting and hit navigation. I see the former has already been added to
the contributions section.

With respect to hit navigation, the kind of thing I am looking at is along
the lines of that used by the Fulcrum search engine; if anyone is not
familiar with Fulcrum, a good example site is the Government of Canada
Employment Insurance Jurisprudence Library at
http://www.ei-ae.gc.ca/easyk/search.asp. Do a search for any term (try
"fired"), then click on any of the resulting documents. The resulting page
has the search terms highlighted, much as they would be in Lucene with the
hit highlighting added, with a narrow frame at the top of the window with
hit navigation buttons to allow users to jump to the next search term in the
document. 

Would it be difficult to implement something similar with Lucene? I am not
familiar with the technologies involved (I am not a coder), so do not know
if this is trivial or impossible or somewhere in between.

Any thoughts would be appreciated,

Bruce





Re: Deleting Problem

2002-08-01 Thread Doug Cutting

Terry Steichen wrote:
> fine now. (I thought I read someplace that you didn't have to optimize after
> a delete, but if I don't, it doesn't seem to work.)

You don't need to optimize after delete for search results to be 
correct.  However IndexReader.docFreq() may be incorrect until you've 
optimized.  So if your application requires that IndexReader.docFreq() 
is always correct, you'll need to optimize after deletes.

An alternative, if you sometimes need to know the actual current 
frequency, but don't want to optimize after deletes, is to make a single 
term query and use Hits.length().  This is slower than docFreq(), but 
does take account of deleted documents.
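
To make the difference concrete, here is a toy model in plain Java. It is
not Lucene code. It assumes only what is described above: the stored
frequency counts every posting for the term, while a search skips documents
that have been marked deleted. The class and method names are invented for
the illustration.

```java
import java.util.BitSet;

/** Toy model only; not Lucene code. */
class DocFreqVsLiveCount {

    /** Stored statistic: number of postings for the term, deleted or not. */
    static int docFreq(int[] postings) {
        return postings.length;
    }

    /** What a single-term search sees: postings minus docs marked deleted. */
    static int liveCount(int[] postings, BitSet deleted) {
        int n = 0;
        for (int doc : postings) {
            if (!deleted.get(doc)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        int[] postings = {0, 2, 5, 7};   // doc numbers containing the term
        BitSet deleted = new BitSet();   // docs marked deleted, not yet merged away
        deleted.set(2);
        deleted.set(7);
        System.out.println("docFreq   = " + docFreq(postings));
        System.out.println("liveCount = " + liveCount(postings, deleted));
    }
}
```

In this model liveCount plays the role of the single term query with
Hits.length(): it has to walk the postings, which is why it is slower than
reading the stored docFreq.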

Doug






Re: Deleting Problem

2002-08-01 Thread Terry Steichen

Thanks Ian.  You are right.  Actually, I did originally create an
IndexWriter in my code, but neglected to set the 'create' parameter (the third
constructor argument) to 'false'.  So when I ran it, the whole index disappeared.
So I stopped using
it.  Got a gotcha no matter which way I goofed.  Thanks again - it works
fine now. (I thought I read someplace that you didn't have to optimize after
a delete, but if I don't, it doesn't seem to work.)

Regards,

Terry

- Original Message -
From: "Ian Lea" <[EMAIL PROTECTED]>
To: "Terry Steichen" <[EMAIL PROTECTED]>
Cc: "Lucene Users Group" <[EMAIL PROTECTED]>
Sent: Thursday, August 01, 2002 11:26 AM
Subject: Re: Deleting Problem


> http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html
> says, for delete:
>
> "Deletes the document numbered docNum. Once a document is deleted it will
> not appear in TermDocs or TermPositions enumerations. Attempts to read its
> field with the document(int) method will result in an error. The presence of
> this document may still be reflected in the
> docFreq(org.apache.lucene.index.Term) statistic, though this will be
> corrected eventually as the index is further modified."
>
> This is from the delete(int) method rather than delete(Term) but I would
> expect that it still holds true.
>
> If you want the deleted documents to really disappear for good, now,
> optimize the index.
>
>
> --
> Ian.
> [EMAIL PROTECTED]
>
>
> > [EMAIL PROTECTED] (Terry Steichen) wrote
> >
> > I'm having difficulty deleting documents from my index.
> >
> > Here's code snippet 1:
> >
> > IndexReader reader = IndexReader.open(index_dir);
> > Term dterm = new Term("pub_date",pub_date);
> > int docs = reader.docFreq(dterm);
> > reader.close();
> > System.out.println("Found "+docs+" docs matching term pub_date = "+pub_date);
> >
> > It reports back that I have 48 matching documents.  Then I run code snippet 2:
> >
> > IndexReader reader = IndexReader.open(index_dir);
> > Term dterm = new Term("pub_date",pub_date);
> > int docs = reader.delete(dterm);
> > reader.close();
> > System.out.println("Deleted "+docs+" docs matching term pub_date = "+pub_date);
> >
> > It reports back that I deleted 48 documents.
> >
> > But when I run snippet 1 once again, it reports 48 matching documents still exist.
> >
> > If I run snippet 2 again, it reports that it (this time) deleted 0 docs.
> >
> > Obviously I'm overlooking something (probably obvious and simple), but I
> > can't seem to delete the selected documents.  Ideas/help would be welcome.
> >
> > Regards,
> >
> > Terry
>
> --
> Searchable personal storage and archiving from http://www.digimem.net/
>
>










Lucene and Slide

2002-08-01 Thread Anand Krishnan

Has anyone had experience using both Slide and Lucene (for CM and
search capabilities)? If so, do let me know whether you faced any
integration issues.

regards
Anand





Re: Deleting Problem

2002-08-01 Thread Ian Lea

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html
says, for delete:

"Deletes the document numbered docNum. Once a document is deleted it will not appear 
in TermDocs or TermPositions enumerations. Attempts to read its field with the 
document(int) method will result in an error. The presence of this document may still 
be reflected in the docFreq(org.apache.lucene.index.Term) statistic, though this will 
be corrected eventually as the index is further modified."

This is from the delete(int) method rather than delete(Term) but I would
expect that it still holds true.

If you want the deleted documents to really disappear for good, now, optimize
the index.
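
The sequence in the original report (48 found, 48 deleted, 48 still found,
0 deleted) can be reproduced with a toy model in plain Java. This is not
how Lucene is implemented. It only assumes, as the documentation quoted
above suggests, that a delete marks documents rather than removing their
postings, and that optimizing rewrites the postings without the marked
documents. ToyIndex and its methods are invented names.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy model only; names and structure are illustrative, not Lucene internals. */
class ToyIndex {
    private List<Integer> postings = new ArrayList<>(); // docs containing the term
    private final BitSet deleted = new BitSet();        // docs marked as deleted

    ToyIndex(int docCount) {
        for (int i = 0; i < docCount; i++) {
            postings.add(i);
        }
    }

    /** Stored count; ignores the deleted marks. */
    int docFreq() {
        return postings.size();
    }

    /** Marks every live matching doc as deleted; returns how many were marked. */
    int delete() {
        int n = 0;
        for (int doc : postings) {
            if (!deleted.get(doc)) {
                deleted.set(doc);
                n++;
            }
        }
        return n;
    }

    /** Rewrites the postings without the deleted docs, as a merge would. */
    void optimize() {
        List<Integer> live = new ArrayList<>();
        for (int doc : postings) {
            if (!deleted.get(doc)) {
                live.add(doc);
            }
        }
        postings = live;
        deleted.clear();
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex(48);
        System.out.println("docFreq: " + index.docFreq());
        System.out.println("deleted: " + index.delete());
        System.out.println("docFreq: " + index.docFreq());
        System.out.println("deleted: " + index.delete());
        index.optimize();
        System.out.println("docFreq after optimize: " + index.docFreq());
    }
}
```

The second delete() returns 0 because the documents are already marked,
even though docFreq() has not caught up yet.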


--
Ian.
[EMAIL PROTECTED]


> [EMAIL PROTECTED] (Terry Steichen) wrote 
>
> I'm having difficulty deleting documents from my index.
> 
> Here's code snippet 1:
> 
> IndexReader reader = IndexReader.open(index_dir);
> Term dterm = new Term("pub_date",pub_date);
> int docs = reader.docFreq(dterm);
> reader.close();
> System.out.println("Found "+docs+" docs matching term pub_date = "+pub_date);
> 
> It reports back that I have 48 matching documents.  Then I run code snippet 2:
> 
> IndexReader reader = IndexReader.open(index_dir);
> Term dterm = new Term("pub_date",pub_date);
> int docs = reader.delete(dterm);
> reader.close();
> System.out.println("Deleted "+docs+" docs matching term pub_date = "+pub_date);
> 
> It reports back that I deleted 48 documents.  
> 
> But when I run snippet 1 once again, it reports 48 matching documents still exist. 
> 
> If I run snippet 2 again, it reports that it (this time) deleted 0 docs.
> 
> Obviously I'm overlooking something (probably obvious and simple), but I can't seem
> to delete the selected documents.  Ideas/help would be welcome.
> 
> Regards,
> 
> Terry

--
Searchable personal storage and archiving from http://www.digimem.net/





Deleting Problem

2002-08-01 Thread Terry Steichen

I'm having difficulty deleting documents from my index.

Here's code snippet 1:

IndexReader reader = IndexReader.open(index_dir);
Term dterm = new Term("pub_date",pub_date);
int docs = reader.docFreq(dterm);
reader.close();
System.out.println("Found "+docs+" docs matching term pub_date = "+pub_date);

It reports back that I have 48 matching documents.  Then I run code snippet 2:

IndexReader reader = IndexReader.open(index_dir);
Term dterm = new Term("pub_date",pub_date);
int docs = reader.delete(dterm);
reader.close();
System.out.println("Deleted "+docs+" docs matching term pub_date = "+pub_date);

It reports back that I deleted 48 documents.  

But when I run snippet 1 once again, it reports 48 matching documents still exist. 

If I run snippet 2 again, it reports that it (this time) deleted 0 docs.

Obviously I'm overlooking something (probably obvious and simple), but I can't seem to 
delete the selected documents.  Ideas/help would be welcome.

Regards,

Terry