Re: Indexing in a CBD Environment

2002-12-10 Thread petite_abeille

On Wednesday, Dec 11, 2002, at 07:16 Europe/Zurich, Otis Gospodnetic 
wrote:

> It uses Lucene as an
> object store, of sorts, I believe, with various relations between
> objects (I did not look at the source, but I suspect it does this based
> on the functionality it offers).


Yep. The basic approach ZOE takes is to create one index per class and 
index the primary and foreign keys as keywords. It then queries the 
different indexes to simulate "relational" storage... Which is all 
handy-dandy... On the other hand, if you already have a relational 
database, there is no reason to go through this circus in the first 
place...

> You may want to look at its source.


If you are so inclined, you can check the alt.dev.szobject package for 
more gory details. In particular, SZIndex deals with Lucene directly.

You can find the app and its source here:

http://guests.evectors.it/zoe/

Cheers,

PA.
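The following is a rough, dependency-free Java illustration of the one-index-per-class approach described above. It is not ZOE's code; SZIndex is the place to look for that. A hypothetical `KeywordIndex` class stands in for a per-class Lucene index whose keys are indexed as untokenized keywords. The field names `id`, `login`, `userId` and `text` are made up for the example.

```java
import java.util.*;

/** Hypothetical stand-in for one per-class index: exact-match (keyword) lookups only. */
final class KeywordIndex {
    // field name -> keyword value -> numbers of the documents carrying that value
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();
    private final List<Map<String, String>> docs = new ArrayList<>();

    /** Stores a document (field -> value), indexes every field as a keyword, returns its number. */
    int add(Map<String, String> doc) {
        int docNum = docs.size();
        docs.add(new HashMap<>(doc));
        for (Map.Entry<String, String> field : doc.entrySet()) {
            postings.computeIfAbsent(field.getKey(), k -> new HashMap<>())
                    .computeIfAbsent(field.getValue(), v -> new TreeSet<>())
                    .add(docNum);
        }
        return docNum;
    }

    /** Returns the stored documents whose field exactly equals value, in insertion order. */
    List<Map<String, String>> find(String field, String value) {
        Map<String, Set<Integer>> byValue = postings.get(field);
        List<Map<String, String>> hits = new ArrayList<>();
        if (byValue == null || !byValue.containsKey(value)) return hits;
        for (int docNum : byValue.get(value)) hits.add(docs.get(docNum));
        return hits;
    }
}

/** Simulates a foreign-key "join": a key found in one index becomes the query for another. */
final class ZoeStyleJoin {
    static List<String> noteTextsForUser(KeywordIndex users, KeywordIndex notes, String login) {
        List<String> texts = new ArrayList<>();
        for (Map<String, String> user : users.find("login", login)) {
            // The user's primary key is looked up as a foreign-key keyword in the note index.
            for (Map<String, String> note : notes.find("userId", user.get("id"))) {
                texts.add(note.get("text"));
            }
        }
        return texts;
    }
}
```

In Lucene 1.x itself, the key fields would typically be added with `Field.Keyword`, so they are indexed without tokenization. Each `find` call would then correspond to a `TermQuery` against that class's index.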



--
To unsubscribe, e-mail:   
For additional commands, e-mail: 



RE: Indexing in a CBD Environment

2002-12-10 Thread Otis Gospodnetic
Hm.  Is Lucene really the best tool for this job?
Anyhow, I'm not going to get into details, but this makes me think of
Zoe.  Zoe is a nice app that handles email in 'novel' ways, and is
built on top of Lucene, among other things.  It uses Lucene as an
object store, of sorts, I believe, with various relations between
objects (I did not look at the source, but I suspect it does this based
on the functionality it offers).  You may want to look at its source.
Bed.

Otis



RE: Indexing in a CBD Environment

2002-12-10 Thread Cohan, Sean
I was thinking that might help, but I'm not sure it will completely solve our
problems.  I'll try to give an overly simplified example.

Say we have a user table in a user component with columns of varying types
(int, char, varchar, date.)  Say we have a note table in note component with
date and varchar note columns.  A user can create notes so we want to
associate notes in the note component to users in the user component, (there
could be other apps or parts of our app creating other kinds of notes [i.e.,
not user notes.])  To map the two, say we have an associative table in the
user component containing user foreign keys and note foreign keys.

I think we want several types of indexes, but I may be wrong given my
limited knowledge of search engine design.  I think we may want an index
of the user columns and an index of note columns.  Easy enough.  The tricky
one is an index of user info with their associated note info.  We want to be
able to search for note info tied to a specific user or specific lists of
users.  If that note info changes (say by some means other than the user to
which it is associated), how do we refresh the index tying the user to the
new note info?  How do we keep things in sync in the index if the data is
spread out among several databases?

Perhaps there is a better index design than what I stated above.  

In actuality, we will have a core component containing objects related to
several other components, so we will have several associative tables.  We
would want to be able to tie the core component objects to each of the other
related components within the index(es).  To further compound things, the
subcomponent objects could be related to their own subcomponents (D, E,
and F below).

Hopefully, I've kind of clarified what it is we're trying to do, and
hopefully, someone can aid us in coming up with a good approach using
Lucene.

Thanks. 


-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, December 10, 2002 5:51 PM
To: Lucene Users List
Subject: Re: Indexing in a CBD Environment






Re: Indexing in a CBD Environment

2002-12-10 Thread Doug Cutting
I'm not sure I understand the question, but I'll hazard an answer 
anyway.  Might it work to maintain separate indexes for B, C, E and F, 
then use a MultiSearcher to search them all?  That would keep updates 
local...

Doug
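Below is a dependency-free sketch of that layout. It assumes each component (B, C, E, F) keeps its own small index, and a federating search fans a query out to all of them. `ComponentIndex` and `FederatedSearch` are hypothetical names. In Lucene the federating role belongs to `MultiSearcher`, which wraps an array of `Searchable`s and merges their hits into one ranked list; this toy simply concatenates them. The point it illustrates is the one above: re-indexing one component's object touches only that component's index.

```java
import java.util.*;

/** Hypothetical per-component index: document key -> lowercased words of its text. */
final class ComponentIndex {
    final String component;
    private final Map<String, Set<String>> docs = new TreeMap<>();

    ComponentIndex(String component) { this.component = component; }

    /** Adds or replaces one document; no other component's index is touched. */
    void put(String key, String text) {
        docs.put(key, new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\W+"))));
    }

    /** Returns "component:key" for every document containing the word. */
    List<String> search(String word) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : docs.entrySet()) {
            if (e.getValue().contains(word.toLowerCase(Locale.ROOT))) hits.add(component + ":" + e.getKey());
        }
        return hits;
    }
}

/** Hypothetical federating search: queries every component index and merges the hits. */
final class FederatedSearch {
    static List<String> search(List<ComponentIndex> indexes, String word) {
        List<String> all = new ArrayList<>();
        for (ComponentIndex index : indexes) all.addAll(index.search(word));
        return all;
    }
}
```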





Indexing in a CBD Environment

2002-12-10 Thread Cohan, Sean
I am a total newbie to Lucene.  We are developing using a Component-Based
Development (CBD) approach (J2EE, Oracle, Linux) where our app is built
using separate stand-alone components.  The standalone components may reside
on separate boxes and will typically have their own databases.  

From what I understand, Lucene operates on a collection of flat documents
(or objects) of a single type at one time.  For our project, we need a
search that will operate on a diverse range of objects that are interrelated
by foreign keys.  

We have thought of constructing a flat multi-field document that represents
the tree of all dependent objects we wish to search.  Unfortunately, that is
difficult to do with CBD.

Object Hierarchy      Flattened Document

        A             A.A-field1
        |             A.A-field2
    +---+---+         A.B-field1
    |   |   |         A.B-field2
    B   C   D         A.C-field1
            +--+      A.D-field1
            |  |      A.D-E-field1
            E  F      A.D-F-field1

In the example above, if you want to index the object tree indicated by the
diagram at left, you can do so easily upon an update of A, by traversing the
tree, to produce something that looks like the flattened document at right.
The problem comes when you want to individually update objects B-F.
Assuming these objects are in other components (i.e., databases) that have
no knowledge of A, there is no way to update their data within the context
of the hierarchy.

We can't think of any way to make the flat structure of Lucene work with
CBD.

We greatly appreciate any ideas or suggestions.  Thanks.
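The traversal described above can be sketched in plain Java. The `Node` and `Flattener` types are hypothetical. The code reproduces the diagram's field-naming scheme (`A.A-field1`, `A.B-field1`, `A.D-E-field1`, and so on) as a map of field names to values; with Lucene, each entry would become one field of a single Document. The sketch also makes the stated problem concrete: `flatten` needs the whole tree reachable from A. A component that owns only one of B through F therefore has nothing to call it with.

```java
import java.util.*;

/** Hypothetical object-tree node: a name, its own fields, and its child objects. */
final class Node {
    final String name;
    final Map<String, String> fields = new LinkedHashMap<>();
    final List<Node> children = new ArrayList<>();
    Node(String name) { this.name = name; }
}

final class Flattener {
    /** Flattens the tree rooted at root into one field map, named as in the diagram. */
    static Map<String, String> flatten(Node root) {
        Map<String, String> doc = new LinkedHashMap<>();
        // The root's own fields use the root name as their path, e.g. "A.A-field1".
        addFields(doc, root.name + "." + root.name, root);
        for (Node child : root.children) walk(doc, root.name + "." + child.name, child);
        return doc;
    }

    // Depth-first: a node's fields first, then each child's subtree with an extended path.
    private static void walk(Map<String, String> doc, String path, Node node) {
        addFields(doc, path, node);
        for (Node child : node.children) walk(doc, path + "-" + child.name, child);
    }

    private static void addFields(Map<String, String> doc, String path, Node node) {
        for (Map.Entry<String, String> f : node.fields.entrySet()) {
            doc.put(path + "-" + f.getKey(), f.getValue());
        }
    }
}
```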







RE: Accentuated characters

2002-12-10 Thread Alex Murzaku
IBM's ICU4J has a normalizer which should do what you need. It's a big
library, but if you deal with multilingual text often, it might make your
life easier.


-- 
Alex Murzaku
___
 alex(at)lissus.com  http://www.lissus.com

-Original Message-
From: stephane vaucher [mailto:[EMAIL PROTECTED]] 
Sent: Tuesday, December 10, 2002 2:58 PM
To: [EMAIL PROTECTED]
Subject: Accentuated characters




RE: Accentuated characters

2002-12-10 Thread Eric Isakson
If you really want to make your own TokenFilter, have a look at 
org.apache.lucene.analysis.LowerCaseFilter.next()

It does:
  public final Token next() throws java.io.IOException {
    Token t = input.next();

    if (t == null)
      return null;

    t.termText = t.termText.toLowerCase();

    return t;
  }

The termText member of the Token class is package scoped, so you will have to 
implement your filter in the org.apache.lucene.analysis package. No worries about 
encoding, as the termText is already a Java (Unicode) string. You will just have to 
provide the mechanism to get the accented characters converted to their non-accented 
equivalents. java.text.Collator has some magic that does this for string comparisons 
but I couldn't find any public methods that give you access to convert a string to its 
non-accented equivalent.

Eric
--
Eric D. IsaksonSAS Institute Inc.
Application Developer  SAS Campus Drive
XML Technologies   Cary, NC 27513
(919) 531-3639 http://www.sas.com
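On a modern JDK, `java.text.Normalizer` can supply the missing mechanism. It was added in Java 6, so it was not an option when this thread was written. The approach is to decompose the text to NFD and then drop the combining marks. The sketch below assumes that approach. It is not from Lucene or from this thread, and `AccentStripper` is a hypothetical name.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

/** Hypothetical helper: strips accents by decomposing characters and dropping combining marks. */
final class AccentStripper {
    // \p{M} matches Unicode combining marks such as U+0301 COMBINING ACUTE ACCENT.
    private static final Pattern COMBINING_MARKS = Pattern.compile("\\p{M}+");

    static String strip(String text) {
        // NFD splits a precomposed letter (e.g. U+00E9) into its base letter plus a combining mark.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return COMBINING_MARKS.matcher(decomposed).replaceAll("");
    }
}
```

Inside a filter modeled on `LowerCaseFilter.next()`, `strip` would be applied to each token's text. Building a new `Token` from `t.termText()`, `t.startOffset()`, `t.endOffset()` and `t.type()` is one way to avoid writing the package-scoped `termText` field directly. That assumes the public four-argument `Token` constructor of that era. NFD does not split ligatures such as œ, so those would still need an explicit mapping.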



-Original Message-
From: stephane vaucher [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, December 10, 2002 2:58 PM
To: [EMAIL PROTECTED]
Subject: Accentuated characters






Re: Accentuated characters

2002-12-10 Thread Joshua O'Madadhain
On Tue, 10 Dec 2002, stephane vaucher wrote:

> I wish to implement a TokenFilter that will remove accentuated
> characters so for example 'é' will become 'e'. As I would rather not
> reinvent the wheel, I've tried to find something on the web and on the
> mailing lists. I saw a mention of a contrib that could do this (see
> http://www.mail-archive.com/lucene-user%40jakarta.apache.org/msg02146.html),
> but I don't see anything applicable.

It may depend on what kind of encoding you're working with.  (E.g., HTML
documents represent such characters differently than PostScript
documents do.)  Probably the easiest way to handle this, if you want to
avoid such questions, would be to convert all your input documents (and
query text) to Java (Unicode) strings, and then do a search-and-replace
with the appropriate character-pair arguments.  After this is done, you
would then do whatever Lucene processing (indexing, query parsing, etc.)
was appropriate.  I am not aware of any code that does this, but it
should be straightforward.
 
Good luck--

Joshua O'Madadhain

 [EMAIL PROTECTED] Per Obscurius...www.ics.uci.edu/~jmadden
  Joshua O'Madadhain: Information Scientist, Musician, Philosopher-At-Tall
 It's that moment of dawning comprehension that I live for--Bill Watterson
My opinions are too rational and insightful to be those of any organization.
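Here is a minimal sketch of the character-pair search-and-replace described above. It assumes the text has already been converted to a Java string and that only French accented letters matter. `FrenchAccentFolder` and its mapping table are illustrative, not an existing Lucene class. The same function would need to run on query text as well as on document text, for example inside the Analyzer used for both indexing and query parsing.

```java
/** Hypothetical helper: replaces French accented letters with plain ones via explicit pairs. */
final class FrenchAccentFolder {
    // The accented character at index i maps to the plain character at the same index.
    // Written as Unicode escapes so the source compiles under any file encoding.
    private static final String ACCENTED =
            "\u00e0\u00e2\u00e7\u00e8\u00e9\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00ff"
          + "\u00c0\u00c2\u00c7\u00c8\u00c9\u00ca\u00cb\u00ce\u00cf\u00d4\u00d9\u00db\u00dc\u0178";
    private static final String PLAIN = "aaceeeeiiouuuy" + "AACEEEEIIOUUUY";

    static String fold(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            int idx = ACCENTED.indexOf(c);
            // Characters outside the table pass through unchanged.
            out.append(idx >= 0 ? PLAIN.charAt(idx) : c);
        }
        return out.toString();
    }
}
```

Ligatures such as œ and æ correspond to two letters each. They would therefore need a string replacement rather than a single character pair.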








RE: Accentuated characters

2002-12-10 Thread Eric Isakson
I don't know whether any of the code in the French analyzer contributed by Patrick 
Talbot applies, but is there any reason you don't just use it? See 
http://nagoya.apache.org/eyebrowse/ReadMsg?[EMAIL PROTECTED]&msgNo=870

Eric
--
Eric D. IsaksonSAS Institute Inc.
Application Developer  SAS Campus Drive
XML Technologies   Cary, NC 27513
(919) 531-3639 http://www.sas.com


-Original Message-
From: stephane vaucher [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, December 10, 2002 2:58 PM
To: [EMAIL PROTECTED]
Subject: Accentuated characters






Accentuated characters

2002-12-10 Thread stephane vaucher
Hello everyone,

I wish to implement a TokenFilter that will remove accents from characters,
so that, for example, 'é' becomes 'e'. As I would rather not
reinvent the wheel, I've tried to find something on the web and on the
mailing lists. I saw a mention of a contrib that could do this (see
http://www.mail-archive.com/lucene-user%40jakarta.apache.org/msg02146.html),
but I don't see anything applicable.

Has anyone done this yet? If so, I would much appreciate some pointers
(or code); otherwise, I'll be happy to contribute whatever I produce
(but it might be very simple since I'll only need to deal with French).

Cheers,
Stephane





Integrate lucene with an editor

2002-12-10 Thread alex
Hi all

I have used the Lucene demo to index my own files by typing

 java org.apache.lucene.demo.IndexFiles c:\docs

I have a simple editor and wish to integrate this demo with the editor so that
searching is invoked automatically.

Is it possible to do this? Any suggestions on how it could be done?

Thanks

Alex



RE: QueryParser searches

2002-12-10 Thread Eric Isakson
Hi Roy,

One of the things I do with my query tool is show the user what happened to the query 
after it went through the query parser and analyzer. You might try producing your 
Query object and then dumping it out somewhere for examination using the 
Query.toString(String defaultField) method.

I get results like:

report AND title:search
becomes
+report +title:search

report AND title:search OR as
becomes
+report +title:search
Note that here "as" is a stop word, so the query is the same.

(report OR HaPPy) AND (title:SeArCH OR Ju?k)
becomes
+(report happy) +(title:search Ju?k)
Notice that the analyzer normalized the case of all the tokens except the one with the 
wildcard character.

Eric
--
Eric D. IsaksonSAS Institute Inc.
Application Developer  SAS Campus Drive
XML Technologies   Cary, NC 27513
(919) 531-3639 http://www.sas.com



-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, December 10, 2002 11:00 AM
To: '[EMAIL PROTECTED]'
Subject: QueryParser searches






Re: QueryParser searches

2002-12-10 Thread Peter Carlson
The QueryParser does no optimization.

It just parses a query string into Lucene Query objects.

How long is your search string (how many terms)?
How large is your index (number of terms would be great to know also)?



On Tuesday, December 10, 2002, at 07:59 AM, [EMAIL PROTECTED] 
wrote:





QueryParser searches

2002-12-10 Thread roy-lucene-user
Hey guys,

We have a question about how the QueryParser class optimizes searches, if it
does at all.  We have some searches that are taking an abnormal amount of
time.  Our search string specifies 3-4 fields separated by ORs with a single
term each, and another field searching for many terms separated by a bunch
of ORs.  So it looks kind of like this:

Field1:( term1 term2 term3 term4 ... termN ) && ( Field2: termx || Field3: termy || Field4: termz ) )

Does the QueryParser class do anything to optimize its search for this?  Or
is there a better way to do this?

Thanx.

Roy.


This email and any attachments are confidential and may be 
legally privileged. No confidentiality or privilege is waived 
or lost by any transmission in error.  If you are not the 
intended recipient you are hereby notified that any use, 
printing, copying or disclosure is strictly prohibited.  
Please delete this email and any attachments, without 
printing, copying, forwarding or saving them and notify the 
sender immediately by reply e-mail.  Zurich Capital Markets 
and its affiliates reserve the right to monitor all e-mail 
communications through its networks.  Unless otherwise 
stated, any pricing information in this e-mail is indicative 
only, is subject to change and does not constitute an offer 
to enter into any transaction at such price and any terms in 
relation to any proposed transaction are indicative only and 
subject to express final confirmation.



Re: has this exception been seen before

2002-12-10 Thread Avi Drissman
At 9:55 AM -0800 11/12/02, you wrote:


> A self-contained, reproducible test case is required before someone 
> can really start looking at it.

Yep, I know how that goes. Still trying to get my hands on the 
database that this seems to happen to.

The reason I write is that there's a new stack trace available. Same 
place, but this time I have line numbers (corresponding to the 
lucene-1.2-src tarball release). Perhaps that might help a bit.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=14355

Avi

--
Avi 'rlwimi' Drissman
[EMAIL PROTECTED]
Argh! This darn mail server is trunca




Re: When do documents become searchable?

2002-12-10 Thread Ashley Collins

Excellent. Thanks.



From: Otis Gospodnetic <[EMAIL PROTECTED]>
Reply-To: "Lucene Users List" <[EMAIL PROTECTED]>
To: Lucene Users List <[EMAIL PROTECTED]>
Subject: Re: When do documents become searchable?
Date: Tue, 10 Dec 2002 05:00:36 -0800 (PST)




Re: When do documents become searchable?

2002-12-10 Thread Otis Gospodnetic
I believe you can just call optimize() and then open a new
IndexSearcher (when you detect that the index changed).  That should
let you find the newly added docs, too.

Otis
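The sketch below illustrates "open a new IndexSearcher when you detect that the index changed". It is written against two hypothetical hooks so that it runs without Lucene. `currentVersion` returns a stamp that changes whenever the index is modified, and `opener` builds a searcher. With Lucene 1.x, the stamp might come from `IndexReader.lastModified(...)` and the opener might call `new IndexSearcher(...)`; both are assumptions about how one would wire it up. The advice about calling optimize() first still applies: a new searcher only sees documents the writer has already flushed to the index.

```java
import java.util.function.LongFunction;
import java.util.function.LongSupplier;

/** Caches one searcher and opens a replacement only when the index version stamp changes. */
final class ReopeningSearcherCache<S> {
    private final LongSupplier currentVersion; // hypothetical hook: stamp that changes on every index update
    private final LongFunction<S> opener;      // hypothetical hook: opens a searcher for that stamp
    private S searcher;
    private long openedVersion;

    ReopeningSearcherCache(LongSupplier currentVersion, LongFunction<S> opener) {
        this.currentVersion = currentVersion;
        this.opener = opener;
    }

    /** Returns the cached searcher, reopening first if the index changed since it was opened. */
    synchronized S get() {
        long version = currentVersion.getAsLong();
        if (searcher == null || version != openedVersion) {
            // A real implementation would also close the old searcher once no search is using it.
            searcher = opener.apply(version);
            openedVersion = version;
        }
        return searcher;
    }
}
```

Compared with opening a new IndexSearcher for every user search, this keeps one searcher per index version and pays the opening cost only after a change.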





Re: stemming feature

2002-12-10 Thread Otis Gospodnetic
Check out PorterStemFilter class and Analyzer class.  Then look at some
Analyzer implementations and see how to implement your own
PorterAnalyzer.

Otis
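The PorterStemFilter javadoc suggests wiring the filter into an Analyzer whose `tokenStream(String fieldName, Reader reader)` returns `new PorterStemFilter(new LowerCaseTokenizer(reader))`. The dependency-free sketch below mirrors only the shape of that chain: split on letters, lowercase, then stem. Its `crudeStem` is a deliberately simplistic suffix stripper standing in for the Porter algorithm, which it does not implement. `ToyStemAnalyzer` is a hypothetical name. Whatever analyzer is used, it has to be applied both when indexing and when parsing queries, so that stemmed terms match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical toy model of an analyzer chain: letter tokenizer, lowercase, then a stemmer. */
final class ToyStemAnalyzer {
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Split on runs of non-letters, roughly what a letter-based tokenizer does.
        for (String token : text.split("[^\\p{L}]+")) {
            if (token.isEmpty()) continue; // a leading separator yields one empty string
            // Lowercase before stemming, since PorterStemFilter expects lowercased input.
            terms.add(crudeStem(token.toLowerCase(Locale.ROOT)));
        }
        return terms;
    }

    /** NOT the Porter algorithm: strips one common English suffix, for illustration only. */
    static String crudeStem(String word) {
        String[] suffixes = {"ing", "ed", "es", "s"};
        for (String suffix : suffixes) {
            // Keep at least three letters so short words like "is" are left alone.
            if (word.length() >= suffix.length() + 3 && word.endsWith(suffix)) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }
}
```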





stemming feature

2002-12-10 Thread M Srinivas Rao
Hi all

Can anyone tell me where I can get process flow diagrams or something similar
for Lucene? I want to know how Lucene works.

Thanks
rgds
srinivas





stemming feature

2002-12-10 Thread M Srinivas Rao
Hi all

Does Lucene do stemming of words?  If yes, can anyone
say how to do it in Java using the Lucene API?

Thanks
rgds
srinivas





When do documents become searchable?

2002-12-10 Thread Ashley Collins

I'm keeping an IndexWriter open so new documents can be indexed as they 
arrive.

I open a new IndexSearcher every time a user runs a search.

It seems that search results don't include all documents until I restart the 
application (which calls IndexWriter.optimize() then IndexWriter.close()).

Should I be able to keep an IndexWriter open all the time? And, is calling 
optimize() periodically enough to flush data to disk and make it searchable?

Thanks in advance.
Ashley





