RE: Re[2]: Is IndexSearcher thread safe?

2005-03-01 Thread Cocula Remi

I probably had the same trouble (but I'm not sure).
I ran a test program that created a lot of IndexSearchers (and also closed and 
released them).
It ended with an OutOfMemoryError.
But I'm not finished with that problem (I need to use a profiler).


>But I have discovered one strange fact. When you have an IndexSearcher on
>a big index, the IndexSearcher object takes a lot of memory (900 MB), and
>when you create a new IndexSearcher after deleting all references to the
>old IndexSearcher, the memory consumed by the old IndexSearcher is never
>freed.
>What can the community say about this strange fact?

Yura Smolsky.



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





RE: Is IndexSearcher thread safe?

2005-03-01 Thread Cocula Remi


>Additional question.
>If I'm sharing one instance of IndexSearcher between different threads, 
>is it good to just drop this instance to the GC?
>Because I don't know if some thread is still using this searcher or is done 
>with it.

Note that as long as one of the threads keeps a reference to the IndexSearcher, 
it cannot be garbage collected.
Perhaps you meant that you do not know how a thread can declare that it no 
longer needs the IndexSearcher.

To cope with that, I created an IndexSearcher pool.
The pool contains a list of IndexSearchers, and each one is associated with a 
counter. 
To get an IndexSearcher reference, one must request it from the pool, and the 
counter is then incremented.
(To make it cleaner, I had the idea of replacing the IndexSearcher references 
in the pool with proxy objects, so that the pool never hands out references to 
IndexSearchers to client objects.
The counter can be managed inside the proxy.)

The pool is able to close and dereference an IndexSearcher when it is 
no longer used (counter=0).
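
The counter idea described above could be sketched roughly as follows. This is only a minimal illustration, not the actual pool code: the class name CountedResource and its methods are hypothetical, it wraps any java.io.Closeable rather than a real IndexSearcher, and it assumes the pool explicitly "retires" a searcher (for example after a newer one has been opened) before the counter is allowed to trigger the close.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

/**
 * Reference-counting holder for one shared resource (for example an IndexSearcher).
 * Hypothetical helper for illustration; it is not part of Lucene.
 */
class CountedResource<T extends Closeable> {
    private final T resource;
    private int users = 0;           // clients currently holding the resource
    private boolean retired = false; // true once the pool no longer wants to hand it out
    private boolean closed = false;  // true once close() has been called on the resource

    CountedResource(T resource) {
        this.resource = resource;
    }

    /** Hands out the resource and increments the counter. */
    synchronized T acquire() {
        if (retired) {
            throw new IllegalStateException("resource has been retired");
        }
        users++;
        return resource;
    }

    /** Decrements the counter; closes the resource if it is retired and now unused. */
    synchronized void release() {
        if (users == 0) {
            throw new IllegalStateException("release without matching acquire");
        }
        users--;
        closeIfUnused();
    }

    /** Stops handing the resource out; it is closed as soon as the counter is 0. */
    synchronized void retire() {
        retired = true;
        closeIfUnused();
    }

    synchronized boolean isClosed() {
        return closed;
    }

    // Called only from synchronized methods.
    private void closeIfUnused() {
        if (retired && users == 0 && !closed) {
            closed = true;
            try {
                resource.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

A client would call acquire() before searching and release() in a finally block; the pool would call retire() on the old entry when it opens a replacement searcher.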

Hope it helps.






RE: rackmount lucene/nutch - Re: google mini? who needs it when Lucene is there

2005-01-28 Thread Cocula Remi

In addition to this discussion, I would like to mention my efforts in creating 
a wrapper around Lucene with the LuceneServer project 
(http://sourceforge.net/projects/luceneserver/).
It uses RMI to make indexes available over a network and includes automation 
tasks.
I am currently working on a search session mechanism that includes pagination 
and highlighting.
I have many ideas for making it a real standalone intranet search engine.

I would be very glad if some people could help me make it grow.
Perhaps it could be a starting point for what Erik mentions.



-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Friday, January 28, 2005 02:02
To: Lucene Users List
Subject: Re: rackmount lucene/nutch - Re: google mini? who needs it when
Lucene is there


I've often said that there is a business to be had in packaging up 
Lucene (and now Nutch) into a cute little box with user friendly 
management software to search your intranet.  SearchBlox is already 
there (except they don't include the box).

I really hope that an application like SearchBlox/Zilverline can be 
created as part of the Lucene project itself, replacing the sad demos 
that currently ship with Lucene.  I've got so many things on my plate 
that I don't foresee myself getting to this as soon as I'd like, but I 
would most definitely support and contribute what time I could to such 
an effort.  If the web UI used Tapestry, I'd be very inclined to dig in 
hardcore to it.  Any other web UI technology would likely turn me off.  
One of these days I'll Tapestry-ify Nutch just for grins and submit it 
as a replacement for the JSPs.

And I'm even more sold on it if Mac Mini's are involved!  :)

Erik





RE: Reloading an index

2005-01-27 Thread Cocula Remi
Make sure that the old searcher is not referenced anywhere else; if it is not, 
the garbage collector should reclaim it.
Just remember that the garbage collector runs when memory is needed, not 
immediately after a reference is set to null.


-Original Message-
From: Greg Gershman [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 27, 2005 17:29
To: lucene-user@jakarta.apache.org
Subject: Reloading an index


I have an index that is frequently updated.  When
indexing is completed, an event triggers a new
Searcher to be opened.  When the new Searcher is
opened, incoming searches are redirected to the new
Searcher, the old Searcher is closed and nulled, but I
still see about twice the amount of memory in use well
after the original searcher has been closed.   Is
there something else I can do to get this memory
reclaimed?  Should I explicitly call garbage
collection?  Any ideas?

Thanks.

Greg Gershman 






RE: closing an IndexSearcher

2005-01-20 Thread Cocula Remi
As [EMAIL PROTECTED] said, I was opening multiple instances of IndexSearcher.

Now the IndexReader seems to be closed, but I am surprised that searching 
over this closed IndexReader still works; that was the original subject of this 
thread.


-Original Message-
From: Morus Walter [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 20, 2005 12:52
To: Lucene Users List
Subject: RE: closing an IndexSearcher


Hi Cocula,
> 
> And now here is code that works: the only difference from the previous one 
> is the QueryParser call before new IndexWriter. The QueryParser.parse 
> statement seems to close the IndexReader, but I really can't figure out how.
>  
I rather suspect your OS/filesystem to delay the effect of the close.
QueryParser does not even know about your searcher.

What OS are you using?

Morus




RE: closing an IndexSearcher

2005-01-20 Thread Cocula Remi
You are right!
I didn't notice that.


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 20, 2005 12:50
To: lucene-user@jakarta.apache.org
Subject: RE: closing an IndexSearcher


>   IndexSearcher searcher = new IndexSearcher("c:\\tmp\\index");
>   searcher = new IndexSearcher("c:\\tmp\\index");
>   searcher.close();


Wouldn't that mean you have two IndexSearcher instances, of which
you only close the last one?

Try just the first and the third line, thus:

>   IndexSearcher searcher = new IndexSearcher("c:\\tmp\\index");
>   searcher.close();

Greetz, 
Nick
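
The point about the two instances can be illustrated without Lucene. The sketch below is only a stand-in: TrackedResource and ReassignmentDemo are hypothetical names, and the class merely counts how many instances are still open, which is an assumption about how one might observe the leak rather than anything IndexSearcher itself provides.

```java
/** Stand-in for a resource such as IndexSearcher; hypothetical, not Lucene code. */
class TrackedResource {
    static int openCount = 0;    // number of instances that have not been closed
    private boolean open = true;

    TrackedResource() {
        openCount++;
    }

    void close() {
        if (open) {
            open = false;
            openCount--;
        }
    }
}

class ReassignmentDemo {
    /** Mirrors the three-line snippet: two instances created, only the second closed. */
    static int leakyVersion() {
        int before = TrackedResource.openCount;
        TrackedResource r = new TrackedResource();
        r = new TrackedResource();   // the first instance is now unreachable but still open
        r.close();
        return TrackedResource.openCount - before;   // instances left open by this method
    }

    /** Mirrors the two-line fix: one instance created and closed. */
    static int fixedVersion() {
        int before = TrackedResource.openCount;
        TrackedResource r = new TrackedResource();
        r.close();
        return TrackedResource.openCount - before;
    }
}
```

In the leaky version one instance stays open after close() is called on the variable, which is consistent with the index files still being held when the IndexWriter is created.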






RE: closing an IndexSearcher

2005-01-20 Thread Cocula Remi

Further to my previous mail, I noticed some strange behaviour of 
IndexSearcher.close().
Here is code that does not work: the new IndexWriter() statement throws 
"java.io.IOException: Cannot delete _3.cfs", as if the IndexSearcher's 
underlying IndexReader were not closed.  

// "ana" is an Analyzer instance created elsewhere
IndexSearcher searcher = new IndexSearcher("c:\\tmp\\index");
searcher = new IndexSearcher("c:\\tmp\\index");
searcher.close();

IndexWriter writer = new IndexWriter("c:\\tmp\\index",ana,true);
Document doc = new Document();
doc.add(Field.Text("text","toto li toto"));
writer.addDocument(doc);
writer.close();

And now here is code that works: the only difference from the previous one 
is the QueryParser call before new IndexWriter. The QueryParser.parse 
statement seems to close the IndexReader, but I really can't figure out how.
 
// "ana" is an Analyzer instance created elsewhere
IndexSearcher searcher = new IndexSearcher("c:\\tmp\\index");
searcher = new IndexSearcher("c:\\tmp\\index");
searcher.close();

Query query = QueryParser.parse("toto","text",ana); 


IndexWriter writer = new IndexWriter("c:\\tmp\\index",ana,true);
Document doc = new Document();
doc.add(Field.Text("text","toto li toto"));
writer.addDocument(doc);
writer.close();


Note : I use Lucene 1.4 Final

Does it seem to be a bug ? 




RE: closing an IndexSearcher

2005-01-20 Thread Cocula Remi
I ran my code in the Eclipse debugger, and the IndexReader is closed (I mean 
that it steps into the reader.close() statement), 
but searching over this IndexReader still works.

Should a query work on a closed IndexReader, or should it throw an IOException?


-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 19, 2005 18:31
To: Lucene Users List
Subject: Re: closing an IndexSearcher



On Jan 19, 2005, at 12:14 PM, Cocula Remi wrote:

>
> Hi ,
>
> I noticed that after closing an IndexSearcher, queries on this 
> Searcher will still run.
> My question is: why does closing an IndexSearcher not always close it?

IndexSearcher.close:

   public void close() throws IOException {
     if (closeReader)
       reader.close();
   }

However, you open it with a String:

> -
> searcher = new IndexSearcher("c:\\tmp\\index");

Which should close the underlying IndexReader.

Maybe this was a bug that has since been fixed in CVS (which is the 
code I'm referencing)?

Erik





RE: help in indexing

2005-01-20 Thread Cocula Remi
What is LucenePDFDocument ?
Is it a sample class ?

In any case, LucenePDFDocument.getDocument(myPdfFile) should create a document 
containing an indexed field.
You can achieve that, for instance, by using Field.Text().

// "ana" is an Analyzer instance created elsewhere
IndexWriter writer = new IndexWriter("c:\\tmp\\index",ana,true);
Document doc = new Document();
doc.add(Field.Text("text","toto li toto"));
writer.addDocument(doc);
writer.close();

Please refer to the static methods of the Field class.


-Original Message-
From: chetan minajagi [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 20, 2005 10:15
To: Lucene Users List
Subject: RE: help in indexing


Hi Karthik/Cocula,

Luke didn't work, but Limo helped. I seem to get results when I use Limo for my 
text/xls files.
Now the problem is with PDF search.
The problem that I see is that the "summary" field, as seen through Limo, is not 
indexed, and hence there are no hits.
I'm using the default document returned by 
 LucenePDFDocument.getDocument(myPdfFile);
So how do I ensure that the few fields in it which are not indexed are 
set to indexed?
As far as I can see, I can only check whether a field is indexed by using 
Field.isIndexed(), but is there a method by which I can set it to indexed?
Can someone provide any help or pointers in this regard?
 
Thanks & Regards,
Chetan

Karthik N S <[EMAIL PROTECTED]> wrote:
Hi

You probably need to use the Luke software to peek inside your index. Use it, then
come back for more help.


Karthik


-Original Message-
From: chetan minajagi [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 20, 2005 12:05 PM
To: lucene-user@jakarta.apache.org
Subject: help in indexing


Hi ,

It might seem elementary to most of you.
I am trying to build a search tool for internal use using lucene.
I have used the following
for
.pdf --> PDFBOx
.html --> demo file of lucene(HTMLDocument)
.xls --> poi

The indexing seems to work without throwing any errors.
But when I try to search, I always end up with zero hits.
I have tried to use the same string that I see (System.out.print(Document)),
but in vain.
Can somebody let me know where and what could be wrong?
Regards,
Chetan





RE: help in indexing

2005-01-20 Thread Cocula Remi
You haven't said how you created the fields of your documents.
Please post some code.


-Original Message-
From: chetan minajagi [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 20, 2005 07:35
To: lucene-user@jakarta.apache.org
Subject: help in indexing


Hi ,

It might seem elementary to most of you.
I am trying to build a search tool for internal use using lucene.
I have used the following
 for 
 .pdf   -->  PDFBOx
 .html -->  demo file of lucene(HTMLDocument)
 .xls   -->  poi
 
The indexing seems to work without throwing any errors.
But when I try to search, I always end up with zero hits.
I have tried to use the same string that I see (System.out.print(Document)), but 
in vain.
Can somebody let me know where and what could be wrong?
Regards,
Chetan





closing an IndexSearcher

2005-01-19 Thread Cocula Remi

Hi,

I noticed that after closing an IndexSearcher, queries on this Searcher will 
still run.
My question is: why does closing an IndexSearcher not always close it?
In my case I need to close all IndexSearchers when I want to rebuild the index.



Sample code 
-
searcher = new IndexSearcher("c:\\tmp\\index");
searcher.close();

Query query = QueryParser.parse("toto","text",ana);


Hits hits = searcher.search(query);
System.out.println(hits.length());
for (int i=0;i

Merry Christmas to every one concerned.

2004-12-24 Thread Cocula Remi


Question about multi-searching [re-post]

2004-11-22 Thread Cocula Remi



> Hi,
> 
> (First of all: what is the plural of index in English; indexes or indices?)
> 
> 
> I want to search several indexes (indices?).
> For that, I parse a new query using QueryParser or MultiFieldQueryParser.
> Then I search my indexes using the MultiSearcher class.
> 
> OK, but the problem comes when a different analyzer is used for each index.
> QueryParser requires an analyzer to parse the query, but a query parsed with 
> one analyzer is not suitable for searching an index that uses another 
> analyzer. 
> 
> Does anyone know a trick to cope with this problem?
> 
> Alternatively, I could run a different query on each index to obtain several Hits 
> objects. 
> Then I could write some collector that collects the hits in order of highest 
> score.
> I wonder if this could work and if it would be as efficient as the 
> MultiSearcher. In this situation, does it make sense to compare the scores 
> of two different Hits?


RE: Searching and indexing from different processes (applications)

2004-11-16 Thread Cocula Remi
I have created a tool that could answer your question.
It is called "Lucene Server" (http://luceneserver.sourceforge.net/)
It is a tool for integrating Lucene into distributed environments (via RMI). 

A new release is under development. It will include a paginated search 
service using XML.

If you are interested in this project, just try it; I will support you.

Regards.



-Original Message-
From: K Kim [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 16, 2004 16:11
To: [EMAIL PROTECTED]
Subject: Searching and indexing from different processes (applications)



Hi.

I just started to play around with Lucene.  I was
wondering if searching and indexing can be done
simultaneously from different processes (two different
processes.)  For example, searching is serviced from a
web application, while indexing is done periodically
from a stand-alone application.

What would be the best way to implement this?  

Thanks.









Question about multi-searching

2004-11-03 Thread Cocula Remi

Hi,

(First of all: what is the plural of index in English; indexes or indices?)


I want to search several indexes (indices?).
For that, I parse a new query using QueryParser or MultiFieldQueryParser.
Then I search my indexes using the MultiSearcher class.

OK, but the problem comes when a different analyzer is used for each index.
QueryParser requires an analyzer to parse the query, but a query parsed with one 
analyzer is not suitable for searching an index that uses another analyzer. 

Does anyone know a trick to cope with this problem?

Alternatively, I could run a different query on each index to obtain several Hits objects. 
Then I could write some collector that collects the hits in order of highest score.
I wonder if this could work and if it would be as efficient as the MultiSearcher. In 
this situation, does it make sense to compare the scores of two different Hits?
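
The "collect the hits in order of highest score" idea could look roughly like the sketch below. It is only an illustration under stated assumptions: ScoredHit and HitMerger are hypothetical stand-ins for values copied out of each per-index result, not Lucene classes, and the sketch simply sorts by the raw score. It does not answer the open question above of whether scores produced by different queries on different indexes are actually comparable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical value object: one hit copied out of a per-index result. */
class ScoredHit {
    final String docId;
    final float score;

    ScoredHit(String docId, float score) {
        this.docId = docId;
        this.score = score;
    }
}

/** Hypothetical helper that merges per-index hit lists into one ranked list. */
class HitMerger {
    /** Returns all hits from all lists, ordered by descending raw score. */
    static List<ScoredHit> mergeByScore(List<List<ScoredHit>> perIndexHits) {
        List<ScoredHit> all = new ArrayList<ScoredHit>();
        for (List<ScoredHit> hits : perIndexHits) {
            all.addAll(hits);
        }
        Collections.sort(all, new Comparator<ScoredHit>() {
            public int compare(ScoredHit a, ScoredHit b) {
                return Float.compare(b.score, a.score);   // highest score first
            }
        });
        return all;
    }
}
```

Because the merge trusts the raw scores, any normalisation between indexes would have to happen before the hits are handed to mergeByScore.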


RE: Search Help in word doc

2004-10-19 Thread Cocula Remi
In my case, at search time.
But it is probably best to do it at indexing time.


-Original Message-
From: Natarajan.T [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 19, 2004 11:41
To: 'Lucene Users List'
Subject: RE: Search Help in word doc


Are you doing this in the indexing part or the search part?

-Original Message-
From: Cocula Remi [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 19, 2004 2:37 PM
To: Lucene Users List
Subject: RE: Search Help in word doc

This sample code changes undesired characters into underscores.


Document doc = 

char[] cs = doc.get("content").toCharArray();
StringBuffer sb = new StringBuffer();
for (int j = 0; j < cs.length; j++)
{
    if (!Character.isISOControl(cs[j]))
    {
        // keep ordinary characters as they are
        sb.append(cs[j]);
    }
    else
    {
        // replace each control character with an underscore padded by spaces
        sb.append(" _ ");
    }
}

System.out.println(sb.toString());

-Original Message-
From: Natarajan.T [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 19, 2004 11:06
To: 'Lucene Users List'
Subject: RE: Search Help in word doc


Hi Remi,

Thanks for your response...
Pls send me the jar name with sample code.

Thanks,
Natarajan.



-Original Message-
From: Cocula Remi [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 19, 2004 2:26 PM
To: Lucene Users List
Subject: RE: Search Help in word doc


Seen that.
I use the Character.isISOControl() function to identify and remove these
characters.


-Original Message-
From: Natarajan.T [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 19, 2004 10:37
To: [EMAIL PROTECTED]
Subject: Search Help in word doc


Hi FFI,

I am indexing multiple document types (Word, Excel, HTML, PPT, PDF); at
indexing time there is no problem.

In my search results, the contents (description) come with small boxes (this
happens only with Word documents).

I think this is happening because of some special characters
like bullets and symbols.

How can I rectify this problem?

Regards,

Natarajan.

 





RE: Search Help in word doc

2004-10-19 Thread Cocula Remi
This sample code changes undesired characters into underscores.


Document doc = 

char[] cs = doc.get("content").toCharArray();
StringBuffer sb = new StringBuffer();
for (int j = 0; j < cs.length; j++)
{
    if (!Character.isISOControl(cs[j]))
    {
        // keep ordinary characters as they are
        sb.append(cs[j]);
    }
    else
    {
        // replace each control character with an underscore padded by spaces
        sb.append(" _ ");
    }
}

System.out.println(sb.toString());
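
For anyone who wants to try this outside of an indexing program, the same loop can be wrapped in a small self-contained helper. This is just a sketch: the class and method names (ControlCharFilter, replaceControlChars) are made up for the example, and it takes a plain String instead of reading the "content" field from a Document.

```java
/** Hypothetical helper wrapping the loop above; not part of Lucene. */
class ControlCharFilter {
    /** Returns the text with every ISO control character replaced by " _ ". */
    static String replaceControlChars(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isISOControl(c)) {
                sb.append(" _ ");   // same replacement string as in the snippet above
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Note that Character.isISOControl() covers the ranges U+0000 to U+001F and U+007F to U+009F, so tabs and line breaks are replaced as well; whether that is desirable depends on how the description is displayed.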

-Original Message-
From: Natarajan.T [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 19, 2004 11:06
To: 'Lucene Users List'
Subject: RE: Search Help in word doc


Hi Remi,

Thanks for your response...
Pls send me the jar name with sample code.

Thanks,
Natarajan.



-Original Message-
From: Cocula Remi [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 19, 2004 2:26 PM
To: Lucene Users List
Subject: RE: Search Help in word doc


Seen that.
I use the Character.isISOControl() function to identify and remove these
characters.


-Original Message-
From: Natarajan.T [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 19, 2004 10:37
To: [EMAIL PROTECTED]
Subject: Search Help in word doc


Hi FFI,

I am indexing multiple document types (Word, Excel, HTML, PPT, PDF); at
indexing time there is no problem.

In my search results, the contents (description) come with small boxes (this
happens only with Word documents).

I think this is happening because of some special characters
like bullets and symbols.

How can I rectify this problem?

Regards,

Natarajan.

 





RE: Search Help in word doc

2004-10-19 Thread Cocula Remi

Seen that.
I use the Character.isISOControl() function to identify and remove these characters.


-Original Message-
From: Natarajan.T [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 19, 2004 10:37
To: [EMAIL PROTECTED]
Subject: Search Help in word doc


Hi FFI,

I am indexing multiple document types (Word, Excel, HTML, PPT, PDF); at
indexing time there is no problem.

In my search results, the contents (description) come with small boxes (this
happens only with Word documents).

I think this is happening because of some special characters
like bullets and symbols.

How can I rectify this problem?

Regards,

Natarajan.

 





RE: Memory usage: IndexSearcher & Sort

2004-09-30 Thread Cocula Remi


-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 29, 2004 18:28
To: Lucene Users List
Subject: RE: Memory usage: IndexSearcher & Sort



>> 2.  How does this approach work with multiple, simultaneous users?

>IndexSearcher is thread-safe.

You mean that one can invoke the search method of a single Searchable from 
two different threads at the same time, 
don't you?






RE: Memory usage: IndexSearcher & Sort

2004-09-29 Thread Cocula Remi
My solution is:

I have bound one RemoteSearchable object for each index in an RMI registry.
Thus I do not have to create any IndexSearcher, and I can execute queries from any 
application.
This has been implemented in the Lucene Server that I have just begun to create.
http://sourceforge.net/projects/luceneserver/

I use it in a web app.

It would be nice if some people could test it (would anyone like to?).
  

-Original Message-
From: Bryan Dotzour [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 29, 2004 15:11
To: '[EMAIL PROTECTED]'
Subject: Memory usage: IndexSearcher & Sort


I have been investigating a serious memory problem in our web app (using
Tapestry, Hibernate, & Lucene) and have reduced it to being the way in which
we are using Lucene to search on things.  Being a webapp, we have focused on
doing our work within a user's request.  So we basically end up opening at
least one new IndexSearcher on each individual page view.  In one particular
case, we were doing this in a loop, eventually opening ~20-~40
IndexSearchers which caused our memory usage to skyrocket.  After viewing
that one page 3 or 4 times we would exhaust the server's memory allocation.
 
Most helpful in this search was the following thread from Bugzilla:
 
http://issues.apache.org/bugzilla/show_bug.cgi?id=30628
 
 
From this thread, it sounds like constantly opening and closing
IndexSearcher objects is a "BAD THING", but it is exactly what we are doing
in our app.  
There are a few things that puzzle me and I'd love it if anyone has some
input that might clear up some of these questions.
 
1.  According to the Bugzilla thread, and from my own testing, you can open
lots of IndexSearchers in a loop and do a search WITHOUT SORTING and not
have this memory problem.  Is there an issue with the Sort code?
2.  Can anyone give a brief, technical explanation as to why opening
multiple IndexSearcher objects is bad?
3.  Certainly some of you on this list are using Lucene in a web-app
environment.  Can anyone list some best practices on managing
reading/writing/searching a Lucene index in that context?
 
 
Thank you all
Bryan
---
Some extra information about my Lucene setup:
 
Lucene 1.4.1
We maintain 5 different indexes, all in RAMDirectories.  The indexes aren't
especially big (< 100,000 total objects combined).
  
 




[ANNOUNCE] : Lucene Server

2004-09-23 Thread Cocula Remi
I am glad to introduce a new project on SourceForge that is related to Lucene.

"Lucene Server is a java server application for simply create and manage Jakarta 
Lucene Indexes. It is designed to help you integrate Lucene in distributed 
environnements."
The first release 0.1 is available for download.
I hope it will be useful to somebody.
http://sourceforge.net/projects/luceneserver/

Remi COCULA.



RE: Help for text based indexing

2004-09-15 Thread Cocula Remi
No.

group:Group1 AND Hello

The group: prefix means that the word Group1 has to be searched for in the group field.


-Original Message-
From: mahaveer jain [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 14, 2004 18:24
To: Lucene Users List
Subject: RE: Help for text based indexing


If I have understood correctly, you mean that the search query has to be 
 
"Group1" AND "Hello" (if hello is what I want to search ?)
 
Cocula Remi <[EMAIL PROTECTED]> wrote:
A Keyword field is not tokenized; that's why you won't be able to search on a part of it. 
You should use a Text field instead.

About creating a special field: 

IndexWriter Ir = 

File f = 
Document doc = new Document();
if (f.toString().startsWith("C:\\tomcat\\webapps\\Root\\Group1"))
{
    doc.add(Field.Text("group", "Group1"));
}
if (f.toString().startsWith("C:\\tomcat\\webapps\\Root\\Group2"))
{
    doc.add(Field.Text("group", "Group2"));
}
doc.add(Field.Text("content", getContent(f)));
Ir.addDocument(doc);



Then you can search in group1 with a query like this: 

group:Group1 AND rest_of_the_query.



-Original Message-
From: mahaveer jain [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 14, 2004 18:03
To: Lucene Users List
Subject: RE: Help for text based indexing


Well, in my case the path is a Keyword. I had tried that earlier and it does not seem to 
work in a single index file. 

Can you explain a bit more about adding group1 and group2?

Cocula Remi wrote:
Well you could add a field to each of your Documents whose value would be either 
"group1" or "group2".
Or you could use the path to your files ...



-Original Message-
From: mahaveer jain [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 14, 2004 17:49
To: [EMAIL PROTECTED]
Subject: RE: Help for text based indexing


I am clear about looping recursively to index all the files under the Root folder.
But the problem is that I want to search only in group1 or group2. Is it possible to 
search only in one of the group folders?

Cocula Remi wrote:
You just have to loop recursively over the C:\tomcat\webapps\Root tree to create your 
index.
Yes, you can index databases; you will just have to write a mechanism that is able to 
create an org.apache.lucene.document.Document from the database.
For instance: 
- connect via JDBC
- run a query to obtain a ResultSet
- loop over each row of that ResultSet:
Create a new org.apache.lucene.document.Document from the ResultSet data
and add this document to the index.
end loop.
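
The row loop in the steps above might be sketched as follows. To keep the example free of any Lucene dependency, each row is turned into a plain map of field names to values; the class and method names (RowMapper, rowsToFieldMaps) and the idea of passing the column names in as an array are assumptions made for the illustration. In a real indexer, the body of the loop would instead build an org.apache.lucene.document.Document and pass it to IndexWriter.addDocument().

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: turns each row of a ResultSet into a field map. */
class RowMapper {
    /**
     * Walks the ResultSet row by row and copies the named columns of each row
     * into one map, which stands in for one document to be indexed.
     */
    static List<Map<String, String>> rowsToFieldMaps(ResultSet rs, String[] columns)
            throws SQLException {
        List<Map<String, String>> docs = new ArrayList<Map<String, String>>();
        while (rs.next()) {                       // loop over each row
            Map<String, String> fields = new LinkedHashMap<String, String>();
            for (String column : columns) {
                fields.put(column, rs.getString(column));
            }
            docs.add(fields);                     // here a real indexer would add a Document
        }
        return docs;
    }
}
```

The ResultSet would come from an ordinary JDBC Statement.executeQuery() call; opening the connection and closing it afterwards are left out of the sketch.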

For incremental indexing, I suppose you have to store some timestamp field in your 
index; but it's up to you.
Note that Lucene is very fast, and I don't think that incremental indexing is required 
for small or medium amounts of data.



-Original Message-
From: mahaveer jain [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 14, 2004 17:22
To: [EMAIL PROTECTED]
Subject: Help for text based indexing



Hi

I have implemented text-based search using Lucene. It was wonderful playing around with 
it.

Now I want to enhance the application.

I have a Root folder, and under it I have many other folders that are group specific, 
say (group1, group2, .. so on). The Root folder is in C:\tomcat\webapps\Root and the group 
folders are within it.

Right now I index these groups separately, i.e., I have indexes at C:/index/group1, 
C:/index/group2, C:/index/group3 and so on.

I want to know if I can have only one index for all of these, say C:/index/Root (this has 
the index for all the folders), and still be able to search using 
C:\tomcat\webapps\Root\group1 (if I want to search group1), and similarly for the other 
groups.

Let me know if this is possible and whether anybody has tried this.

2nd question

Is Lucene good for indexing databases? How do we support incremental indexing?

(Right now I am using LIKE for searching.)

Thanks in Advance

Mahaveer






RE: Help for text based indexing

2004-09-14 Thread Cocula Remi
A Keyword field is not tokenized; that's why you won't be able to search on a part of it. 
You should use a Text field instead.

About creating a special field: 

IndexWriter Ir = 

File f = 
Document doc = new Document();
if (f.toString().startsWith("C:\\tomcat\\webapps\\Root\\Group1"))
{
    doc.add(Field.Text("group", "Group1"));
}
if (f.toString().startsWith("C:\\tomcat\\webapps\\Root\\Group2"))
{
    doc.add(Field.Text("group", "Group2"));
}
doc.add(Field.Text("content", getContent(f)));
Ir.addDocument(doc);



Then you can search in group1 with a query like this: 

 group:Group1 AND rest_of_the_query.



-Original Message-
From: mahaveer jain [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 14, 2004 18:03
To: Lucene Users List
Subject: RE: Help for text based indexing


Well, in my case the path is a Keyword. I had tried that earlier and it does not seem to 
work in a single index file. 
 
Can you explain a bit more about adding group1 and group2?
 
Cocula Remi <[EMAIL PROTECTED]> wrote:
Well you could add a field to each of your Documents whose value would be either 
"group1" or "group2".
Or you could use the path to your files ...






RE: Help for text based indexing

2004-09-14 Thread Cocula Remi
Well you could add a field to each of your Documents whose value would be either 
"group1" or "group2".
Or you could use the path to your files ...



- Original Message -
From: mahaveer jain [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 14 September 2004 17:49
To: [EMAIL PROTECTED]
Subject: RE: Help for text based indexing


I am clear on looping recursively to index all the files under the Root folder.
But the problem is that I want to search only in group1 or group2. Is it possible to 
search only in one of the group folders?
 



RE: Help for text based indexing

2004-09-14 Thread Cocula Remi
You just have to loop recursively over the C:\tomcat\webapps\Root tree to create your 
index.
Yes, you can index databases; you just have to write a mechanism that creates 
org.apache.lucene.document.Document objects from the database.
For instance: 
- connect via JDBC
- run a query to obtain a ResultSet
- loop over each row of that ResultSet:
    create a new org.apache.lucene.document.Document from the ResultSet data
    and add this document to the index.
end loop.

For incremental indexing, I suppose you have to store some timestamp field in your 
index; but it's up to you.
Note that Lucene is very fast, and I don't think incremental indexing is required 
for small or medium amounts of data.
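
To illustrate the timestamp idea, here is a minimal sketch of selecting only 
the rows that changed since the previous indexing run. It uses plain Java 
objects as a stand-in for database rows; the names IncrementalSelector, Row and 
lastModified are assumptions made for the example. With a real database the 
same comparison would more likely live in the WHERE clause of the query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: picks the rows that changed since the previous indexing run.
final class IncrementalSelector {

    /** Minimal stand-in for a database row; the field names are assumptions. */
    static final class Row {
        final String id;
        final long lastModified; // for example, milliseconds since the epoch

        Row(String id, long lastModified) {
            this.id = id;
            this.lastModified = lastModified;
        }
    }

    /** Returns the rows modified strictly after lastIndexed, in their original order. */
    static List<Row> changedSince(List<Row> rows, long lastIndexed) {
        List<Row> changed = new ArrayList<Row>();
        for (Row row : rows) {
            if (row.lastModified > lastIndexed) {
                changed.add(row);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(new Row("a", 100L), new Row("b", 200L));
        // Only rows touched after the last run (timestamp 150) need re-indexing.
        for (Row row : changedSince(rows, 150L)) {
            System.out.println("re-index " + row.id);
        }
    }
}
```

After each run, the application would record the time of that run and pass it 
as lastIndexed next time.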



- Original Message -
From: mahaveer jain [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 14 September 2004 17:22
To: [EMAIL PROTECTED]
Subject: Help for text based indexing



Hi

I have implemented text-based search using Lucene. It was wonderful playing around with 
it.

Now I want to enhance the application.

I have a Root folder; under that I have many other folders that are group specific, 
say (group1, group2, and so on). The Root folder is in C:\tomcat\webapps\Root and the group 
folders are within that.

Right now I index these groups separately, i.e. I have indexes at C:/index/group1, 
C:/index/group2, C:/index/group3 and so on.

I want to know if I can have only one index for all of these, say C:/index/Root (this has 
the index for all the folders), and I should be able to search using 
C:\tomcat\webapps\Root\group1 (if I want to search group1), and similarly for the other 
groups.

Let me know if this is possible and whether anybody has tried this.

2nd question

Is Lucene good for indexing databases? How do we support incremental indexing?

(Right now I am using LIKE for searching.)

Thanks in advance

Mahaveer






RE: Search PharseQuery

2004-09-14 Thread Cocula Remi
Use QueryParser. 
Please take a look at 
http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html
It's pretty clear.


- Original Message -
From: Natarajan.T [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 14 September 2004 11:26
To: 'Lucene Users List'
Subject: Search PharseQuery


Hi All,

How do I use the PhraseQuery API? Please send me some sample code. (How
can I handle "java is platform" as a single word?)

Regards,

Natarajan.

 

 





RE: question on Hits.doc

2004-09-13 Thread Cocula Remi
Hi,

I recently had the same kind of problem, but it was due to the way I was dealing with 
Hits.
Obtaining a Hits object from a Query is very fast, but then I was looping over ALL the 
hits to retrieve information on the documents before displaying the result to the 
user.
That was not necessary because, in my case, the display of search results is paginated.
Now I extract documents from Hits "on demand" (i.e. only the few I need to display 
a page of results). It's much better.
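
The "on demand" approach can be sketched as follows. This is only an 
illustration using the JDK: the DocLoader interface is a hypothetical stand-in 
for an indexed lookup such as Hits.doc(i), and ResultPager and its zero-based 
pageNumber parameter are names I made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: loads only the documents needed for one page of results.
final class ResultPager {

    /** Stand-in for an indexed lookup such as Hits.doc(i). */
    interface DocLoader<T> {
        T load(int index);
    }

    /**
     * Returns the documents of the requested page (pageNumber is zero-based).
     * The loader is called only for the hits that fall inside that page.
     */
    static <T> List<T> page(DocLoader<T> loader, int totalHits, int pageNumber, int pageSize) {
        if (pageNumber < 0 || pageSize <= 0 || totalHits < 0) {
            throw new IllegalArgumentException("invalid paging arguments");
        }
        // Use long arithmetic so large page numbers cannot overflow an int.
        int start = (int) Math.min((long) pageNumber * pageSize, totalHits);
        int end = (int) Math.min((long) start + pageSize, totalHits);
        List<T> docs = new ArrayList<T>(end - start);
        for (int i = start; i < end; i++) {
            docs.add(loader.load(i));
        }
        return docs;
    }

    public static void main(String[] args) {
        DocLoader<String> loader = new DocLoader<String>() {
            public String load(int index) {
                return "doc" + index;
            }
        };
        // Third page (zero-based 2) of 25 hits, 10 per page.
        System.out.println(page(loader, 25, 2, 10));
    }
}
```

The cost of a page then depends on the page size, not on the total number of 
hits.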


- Original Message -
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Saturday, 11 September 2004 00:20
To: [EMAIL PROTECTED]
Subject: question on Hits.doc


Hey guys,

We were noticing some speed problems on our searches and after adding some
debug statements to the lucene source code, we have determined that the
Hits.doc(x) is the problem.  (BTW, we are using Lucene 1.2 [with plans to
upgrade]).  It seems that retrieving the actual Document from the search is
very slow.

We think it might be our "Message" field which stores a huge amount of text. 
We are currently running a test in which we won't "store" the "Message" field,
however, I was wondering if any of you guys would know if that would be the
reason why we're having the performance problems?  If so, could anyone also
please explain it?  It seemed that we weren't having these performance
problems before.  Has anyone else experienced this?  Our environment is NT 4,
JDK 1.4.2, and PIIIs.

I know that for large text fields, storing the field is not a good practice,
however, it held certain conveniences for us that I hope to not get rid of.

Roy.




RE: Existing Parsers

2004-09-09 Thread Cocula Remi
For Word see the tm-extractor at www.text-mining.org (based on POI). Pretty simple to 
use.


- Original Message -
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Thursday, 9 September 2004 15:47
To: Lucene Users List
Subject: Existing Parsers


Does anyone know of any reliable parsers out there for PDF, Word, 
Excel or PowerPoint?




Question about remote searching

2004-07-02 Thread Cocula Remi
Hi,

I am trying to do remote searching via RMI.
As a first step I wrote my own remote search method that returns results as an 
object of type Hits.
But it does not work, as the Hits class is not Serializable.
Then I took a look at the RemoteSearchable class and realized that it implements 
search using the low-level API (i.e. public void search(Query query, Filter filter, 
HitCollector results)).

Elsewhere in the Lucene source code I read that using the high-level API (the one that deals 
with Hits) is much more efficient.

Question: would it be possible to make the Hits class Serializable so it could be used 
through RMI mechanisms?


RE: Analysis of wildcard queries

2004-05-10 Thread Cocula Remi
You have to write a special analyzer that includes an accent filter.
Then use this analyzer for both indexing and querying.
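
As an illustration of what the accent-removal step itself might do, here is a 
minimal sketch using java.text.Normalizer from the JDK. It is not a Lucene 
TokenFilter; AccentFolding and fold are hypothetical names, and wiring the 
function into an analyzer is left out. The same function could also be applied 
to the raw query text, which may matter for wildcard terms if the query parser 
does not pass them through the analyzer, as the question below suggests.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper: lowercases text and strips accents from letters.
final class AccentFolding {

    /** Decomposes accented characters, drops the combining marks, then lowercases. */
    static String fold(String text) {
        // NFD splits a character like e-acute into 'e' plus a combining accent.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // Remove the combining accents, leaving the base letters.
        String stripped = decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        return stripped.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "Comptabilit\u00e9" is "Comptabilite" with an acute accent on the last letter.
        System.out.println(fold("Comptabilit\u00e9"));
    }
}
```

Characters that have no canonical decomposition (ligatures, for example) pass 
through unchanged and would need their own mapping.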

- Original Message -
From: Stephane James Vaucher [mailto:[EMAIL PROTECTED]
Sent: Monday, 10 May 2004 10:05
To: Lucene Users List
Subject: Analysis of wildcard queries


I've seen this:
http://www.jguru.com/faq/view.jsp?EID=538312

I've seen in the code that there is a method to set lowercasing, but I
need to remove accented chars as well. Any suggestions as to which is
preferable: preprocessing the input, or subclassing QueryParser and
redefining getWildcardQuery?

cheers,
sv





RE: need info for database based Lucene but not flat file

2004-04-27 Thread Cocula Remi
As Lucene implements its own concept of a document, it is not tied to indexing a 
particular type of data source.
It's up to you to write a tool that browses your database and then submits 
the data as Lucene documents to the Lucene indexer.

For example, if your database contains a "customer" entity and you want to index all 
information about these customers, you can create a module that performs a select 
on the customer table and, for each row returned, creates a Lucene Document and then adds 
it to the IndexWriter.
It is recommended that your Lucene Document contain a keyword Field that represents 
the unique id of the customer in the database.

As a first step you should become familiar with the concepts of Document and Field. See 
the Lucene short intro documentation.
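
To show why the unique id is useful, here is a minimal in-memory sketch. It 
does not use Lucene at all: a map keyed by the customer id stands in for the 
index, and the class CustomerIndexSketch and the columns "name" and "city" are 
assumptions made for the example. The point is only that, with an id stored on 
each document, re-indexing a changed row can replace the old document instead 
of adding a duplicate.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for an index of customer documents.
final class CustomerIndexSketch {

    // Document fields keyed by the customer's unique database id.
    private final Map<String, Map<String, String>> documentsById =
            new LinkedHashMap<String, Map<String, String>>();

    /** Adds the customer, replacing any document previously stored under the same id. */
    void index(String id, String name, String city) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("id", id);     // would be the untokenized keyword field
        fields.put("name", name); // would be a tokenized text field
        fields.put("city", city); // would be a tokenized text field
        documentsById.put(id, fields);
    }

    /** Number of distinct customers currently indexed. */
    int size() {
        return documentsById.size();
    }

    /** Returns the value of a field for the given customer id, or null if absent. */
    String field(String id, String fieldName) {
        Map<String, String> fields = documentsById.get(id);
        return fields == null ? null : fields.get(fieldName);
    }

    public static void main(String[] args) {
        CustomerIndexSketch index = new CustomerIndexSketch();
        index.index("42", "Alice Martin", "Lyon");
        index.index("42", "Alice Martin", "Nantes"); // same id: replaces the first entry
        System.out.println(index.size() + " " + index.field("42", "city"));
    }
}
```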


- Original Message -
From: Yukun Song [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 27 April 2004 02:35
To: [EMAIL PROTECTED]
Subject: need info for database based Lucene but not flat file


As is known, Lucene currently uses flat files to store information for
indexing. 

Does anyone have ideas or resources for combining a database (like MySQL or
PostgreSQL) with Lucene instead of the current flat index file formats?

Regards,

Yukun Song






RE: lucene usage without website

2004-03-24 Thread Cocula Remi
Lucene is not dedicated to a special application type. 
You can integrate its functionality in any program that can invoke Java APIs.

That said, I don't think that Lucene can be invoked from an applet, as the applet 
API does not permit reading and writing local files.



- Original Message -
From: Pleasant, Tracy [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 24 March 2004 17:41
To: Lucene Users List
Subject: lucene usage without website



I want to create a knowledge base, but it needs to be something that does
not require a server to run constantly (as with JSP). It just
needs to run on the Windows platform. Lucene works well with Windows
using an applet, right?




RE: Wildcards problem

2004-03-09 Thread Cocula Remi
As you seem to be French (so am I), I suppose that your classes AccentFilter, 
SpecialFilter and PlurielFilter are dedicated to the analysis of French documents. 
I would be interested in these classes (could you send them to me?).
I was about to create an accent filter and propose it to the Lucene sandbox.

- Original Message -
From: Stephane NOBILET [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 9 March 2004 17:55
To: Lucene Users List
Subject: Re: Wildcards problem


classic indexing:

writer = new IndexWriter( indexPath , new MagicAnalyzer(), false );
writer.mergeFactor = 20;
writer.addDocument(PublicationDocument.getDocument(publi.getId(), filter) );
writer.close();

search:
QueryParser queryParser = new QueryParser( "content", new MagicAnalyzer() );
queryParser.setOperator( QueryParser.DEFAULT_OPERATOR_AND );
query = queryParser.parse( text );


in MagicAnalyzer, tokenStream():
result = new AccentFilter(result);
result = new StandardFilter(result);
result = new LowerCaseFilter(result);
result = new StopFilter(result, stopTable);
result = new SpecialFilter(result);
result = new PlurielFilter(result);

- Original Message -
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Tuesday, March 09, 2004 5:45 PM
Subject: Re: Wildcards problem


> That doesn't sound right :(
> Can you send a self-sufficient code that adds an example document to
> the index and then runs the comp* query that shows this problem?
>
> Thanks,
> Otis
>
>
> --- Stephane NOBILET <[EMAIL PROTECTED]> wrote:
> > Hello !
> >
> > version : lucene 1.3 final
> >
> > I search for: comptable, and I find the document.
> > I search for: compt*, and I find it too,
> > but with: comp*, I don't find my document.
> >
> > Have you met this problem?
> >
> > Thanks
> >
> > Excuse my English...
> >
>
>
>

