Re: Multiuser environments

2003-07-14 Thread Tatu Saloranta
On Monday 14 July 2003 08:52, Guilherme Barile wrote:
> Hi
> I'm writing a web application which will index files using
> textmining to extract text and lucene to store it. I do have the
> following implementation questions:
>
> 1) Only one user can write to an index at a time. How are you people
> dealing with this? Maybe some kind of connection pooling?

Two obvious candidates are locking the bottleneck methods, so that index
writing happens in a critical section, or having a background thread that
does the reindexing while other threads add requests to a queue. In the CMS
I'm working on, we are doing the latter (so as not to block the actual
request threads, which could happen with the first approach; adding/deleting
documents is done as post-processing when documents are created/edited/deleted).

In either case you usually have a singleton instance that represents the 
search engine functionality (assuming single index), and from there on it's 
reasonably easy to reuse IndexReader as necessary.
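
The queue-based approach described above could be sketched roughly as follows. This is only a minimal illustration using the standard library, not the CMS code itself: the names IndexUpdateQueue and IndexTask are hypothetical, and the body of each task (the actual Lucene add/delete call) is left to the caller.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical single-writer queue: request threads enqueue, one thread applies. */
final class IndexUpdateQueue {
    /** Stand-in for a real index operation (for example, adding or deleting a document). */
    interface IndexTask {
        void apply() throws Exception;
    }

    // Sentinel task that tells the worker to stop once earlier requests are done.
    private static final IndexTask STOP = () -> { };

    private final BlockingQueue<IndexTask> queue = new LinkedBlockingQueue<>();
    private final Thread worker;

    IndexUpdateQueue() {
        worker = new Thread(this::drain, "index-writer");
        worker.setDaemon(true);
        worker.start();
    }

    /** Called from request threads; returns immediately without touching the index. */
    void submit(IndexTask task) {
        queue.add(task);
    }

    /** Waits until everything submitted so far has been processed, then stops the worker. */
    void shutdown() {
        queue.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Only this thread ever runs tasks, so index writes never overlap.
    private void drain() {
        try {
            while (true) {
                IndexTask task = queue.take();
                if (task == STOP) {
                    return;
                }
                try {
                    task.apply();
                } catch (Exception e) {
                    // A failed task is reported and skipped so the worker keeps running.
                    System.err.println("index task failed: " + e);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A request thread would call something like queue.submit(() -> addToIndex(doc)), where addToIndex is whatever method wraps the index write in your application. Since a single thread drains the queue, the tasks run one at a time in submission order.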

-+ Tatu +-


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Delete files

2003-07-14 Thread Victor Hadianto
On Mon, 14 Jul 2003 07:11 pm, [EMAIL PROTECTED] wrote:
> Hi, I am new to Lucene. I have a problem with my code. Can somebody help me
> see why I can't delete some files? Maybe I am missing something. Thanks in advance.

You have an IndexWriter open while trying to delete a document. Use an
IndexReader to delete the document, and then open an IndexWriter to optimize.

> Regards,
>
> Michel
victor





RE: Files getting deleted when optimize is killed?

2003-07-14 Thread Steve Rajavuori
Upon further examination what I found is this:

- Killing the process while optimize() is still working does NOT cause the
index files to be deleted, HOWEVER --

- Once the index is opened again by a new process (now apparently in an
unstable state due to the incomplete optimize()), at that time all existing
files are deleted and only a file called "segments" remains.

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Saturday, July 12, 2003 7:06 AM
To: Lucene Users List
Subject: Re: Files getting deleted when optimize is killed?



--- Steve Rajavuori <[EMAIL PROTECTED]> wrote:
> I've had a problem on several occasions where my entire index is
> deleted -- that is, EVERY file (except 'segments') is gone. There were
> many users on the system each time, so it's a little hard to tell for
> sure what was going on, but my theory is this:
>
> My code will automatically call optimize( ) periodically. Because the
> index is very large, it can take a long time. It looks like an
> administrator may have killed my process, and it's possible that it was
> killed while an optimize( ) was in progress.
>
> I have two questions:
>
> 1) Does anyone know if killing an optimize( ) in progress could wipe
> out all files like this? (New index created in temporary files that
> were not saved properly, while old index files were already deleted???)

I highly doubt it.

> 2) Does anyone know of any other way all files in an index could be
> inadvertently deleted (e.g. through killing a process)? For example,
> if you kill the process during an 'add' would that cause all files to
> be deleted?

Same as above.  You can create an artificial, large index for testing
purposes.  Call optimize once in a while, and then kill the process.  I
don't think Lucene will remove your files.

Otis





Re: Luke - Lucene Index Browser

2003-07-14 Thread Andrzej Bialecki
Scott Ganyo wrote:
> Nifty cool!  I'm gonna like this, I can tell already!
>
> I'm having a really hard time actually using Luke, though, as all the
> window panes and table columns are apparently of fixed size.  Do you
> think you could throw in the ability to resize the various window
> panes and table columns?  This would make the tool truly useful.  Pretty
> please? :)

Well, you can resize the main window.. :-) Regarding column sizes: 
that's a limitation of the GUI toolkit (soon to be fixed) - if you can 
wait patiently a couple of weeks for the new release of that toolkit, I can 
add this as well...

In any case, if you're referring to the "Search" panel, then you can 
always double-click on one of the search results, and it will be 
displayed in the "Documents" panel, where you can not only see all the 
fields, but also copy them to clipboard...

--
Best regards,
Andrzej Bialecki
-
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-
FreeBSD developer (http://www.freebsd.org)




Re: Luke - Lucene Index Browser

2003-07-14 Thread Scott Ganyo
Nifty cool!  I'm gonna like this, I can tell already!

I'm having a really hard time actually using Luke, though, as all the 
window panes and table columns are apparently of fixed size.  Do you 
think you could throw in the ability to resize the various window 
panes and table columns?  This would make the tool truly useful.  Pretty 
please? :)

Thanks,
Scott

Andrzej Bialecki wrote:
> Dear Lucene Users,
>
> Luke is a diagnostic tool for Lucene
> (http://jakarta.apache.org/lucene) indexes. It enables you to browse
> documents in existing indexes, perform queries, navigate through
> terms, optimize indexes and more.
>
> Please go to http://www.getopt.org/luke and give it a try. A Java
> WebStart version will be available soon.





Luke - Lucene Index Browser

2003-07-14 Thread Andrzej Bialecki
Dear Lucene Users,

Luke is a diagnostic tool for Lucene (http://jakarta.apache.org/lucene) 
indexes. It enables you to browse documents in existing indexes, perform 
queries, navigate through terms, optimize indexes and more.

Please go to http://www.getopt.org/luke and give it a try. A Java 
WebStart version will be available soon.

--
Best regards,
Andrzej Bialecki
-
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-
FreeBSD developer (http://www.freebsd.org)




Multiuser environments

2003-07-14 Thread Guilherme Barile
Hi
I'm writing a web application which will index files using
textmining to extract text and lucene to store it. I do have the
following implementation questions:

1) Only one user can write to an index at a time. How are you people
dealing with this? Maybe some kind of connection pooling?

2) OutOfMemory issues - I've read about many on the list. Which
workarounds are you using?

Thanks in advance

gui



Field.Text(string, string) and Queryparser

2003-07-14 Thread di99mwo
Hello

When I use QueryParser.parse(String query, String field, Analyzer analyzer) and
have added the field with the type Field.Text(string, string), I can't search
in a specific field with a query like

component:call

It can't find the word call in the field component.
But if I use Field.Text(string, Reader) instead, the query works.


Another problem is that I have to use Field.Text(string, string) if I want to
know whether the hits contain a field with the name component. Then I use this code:

Vector component = new Vector();
if(doc.getField("component")!=null){
    component.addElement(doc);
}

To use getField(string) I have to use Field.Text(string, string).

When I display the hits I want to divide them into categories according to the
field name. If I have fields called component, interface and datatypes I want to
display the hits like:

component
hit1
hit2
hit3

interface
hit4

data type
hit5

But I also want to search in a specific field at the same time.
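
The grouping described above could be sketched along these lines. This is only an illustration under stated assumptions: each hit is modeled as a plain field-name-to-value map standing in for a stored document, and HitGrouper is a hypothetical helper, not part of Lucene. A hit is placed under the first listed field it contains, and hits with none of the fields are skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that sorts hits into categories by the field they contain. */
final class HitGrouper {
    /**
     * Groups hits by the first category field present in each hit.
     * The returned map keeps the categories in the order they were given,
     * including categories that ended up with no hits.
     */
    static Map<String, List<Map<String, String>>> group(
            List<Map<String, String>> hits, List<String> categoryFields) {
        Map<String, List<Map<String, String>>> groups = new LinkedHashMap<>();
        for (String field : categoryFields) {
            groups.put(field, new ArrayList<>());
        }
        for (Map<String, String> hit : hits) {
            for (String field : categoryFields) {
                if (hit.containsKey(field)) {
                    groups.get(field).add(hit);
                    break; // each hit is listed under one category only
                }
            }
        }
        return groups;
    }
}
```

With the category list component, interface, datatypes, iterating over the returned map yields the headings in that order, each followed by its hits.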

Help me, please.

I hope that I have explained my problem so you can understand it.

Thanks
/Michelle



RE: HELP in QueyParsing !!

2003-07-14 Thread Bharatbhushan_Shetty
Thanks Victor. I'll look into your earlier postings for the solution.
But I was wondering: there might be many more scenarios for what a user
might search for.




Delete files

2003-07-14 Thread MMachado
Hi, I am new to Lucene. I have a problem with my code. Can somebody help me
see why I can't delete some files? Maybe I am missing something. Thanks in advance.

Regards,

Michel

 

Here is the code that I use for indexing files and for deleting files. The
error message is the following:

java.io.IOException: Index locked for write: [EMAIL PROTECTED]:\yo\write.lock
  at org.apache.lucene.index.IndexReader.delete(Unknown Source)
  at indice.deletefile.main(deletefile.java:15)
Exception in thread "main"


package indice;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import java.io.*;

public class IndexarFile {
  public static void main(String[] args) throws Exception{
    String indexPath = "c:\\yo";
    IndexWriter writer;
    String f = "c:\\PROCEDURES\\BCNDMPRD.xls";
    writer = new IndexWriter(indexPath, new SimpleAnalyzer(), false);

    //for (int i=0; i

Re: HELP in QueyParsing !!

2003-07-14 Thread Victor Hadianto
> Input:   QueryCreated Remarks
> c\+\+  c   (Escape character not working)

The StandardTokenizer and QueryParser will drop the ++ sign. This problem is
similar to one in a recent thread. Search the archive for the following subject:
'-' character not interpreted correctly in field names

You may be able to implement a solution similar to the one that I've posted.

Actually your query got me interested. I've tried my solution for c-- and the
-- signs are dropped. This is because I define DASHESWORD as

|  ("-" )+ >

This will search for t-shirt, but not tshirt-. Yet another QueryParser 
peculiarity :)

If you absolutely have to search for c++ then I suggest you define another
token which encompasses all alphanumeric characters and the plus sign. For
example (modify StandardTokenizer.jj):

|"+")+ >

add the line:

token = 

in the next() method. This may work.

> c++   -   (Parser throws an exception) [NOTE-1]
As expected.

> *c   -   (throws an exception)   [NOTE-2]
There have been a number of discussions on this subject; search the mailing
list for more information.

> Does that mean that the program should take care of validating the
> User input and then pass the query string to QueryParser?

Depends on how you look at it. QueryParser will throw a ParseException if it
has parsing issues, so you can in some way treat this as the validation.


HTH,
victor





HELP in QueyParsing !!

2003-07-14 Thread Bharatbhushan_Shetty
Hi 

I need some help with query parsing. There are a few things that don't
seem to work as I expected. Have a look at the code at the end
before reading my observations.

The documents contain the following information:
D1 = c++ hello bharat
D2 = c hello sharat 
D3 = hello bharat

Observations

Input      QueryCreated   Remarks
c\+\+      c              (Escape character not working)
c++        -              (Parser throws an exception) [NOTE-1]
c*         c*             (Wild card works perfectly fine)
*c         -              (throws an exception) [NOTE-2]
                          (org.apache.lucene.queryParser.TokenMgrError:)
"c         -              (throws an exception) [NOTE-3]
Hello ""   -              (throws an exception)


[NOTE-1] : - ( ) { } ! [ ] etc characters behave in the same manner 
  as "+" shown above.
[NOTE-2] : Looks like wildcard cannot be the first character of the
   query

[NOTE-3] : I guess this validation can be done after accepting user
   input. 


My Comments/Questions
=
Does that mean that the program should take care of validating the
user input before passing the query string to QueryParser?

If yes, I guess there might be some more validations that should be
done that I have missed. Can anyone shed some light on the
validations that the program should take care of?
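
A minimal sketch of the kind of pre-check asked about here, covering only the failing inputs listed in the observations (a leading wildcard, an unclosed quote, an empty phrase). QueryInputCheck is a hypothetical helper written with the standard library only; it makes no claim to cover everything QueryParser rejects, so the parse call should still be guarded against parse failures.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-check for raw user queries, run before the query parser. */
final class QueryInputCheck {
    /** Returns the problems found in the raw query; the list is empty if none were found. */
    static List<String> problems(String query) {
        List<String> found = new ArrayList<>();
        String q = query.trim();
        if (q.isEmpty()) {
            found.add("empty query");
            return found;
        }
        // NOTE-2 case: a wildcard as the first character of the query.
        char first = q.charAt(0);
        if (first == '*' || first == '?') {
            found.add("leading wildcard");
        }
        // NOTE-3 case: walk the quotes, ignoring any character escaped with a backslash.
        boolean inPhrase = false;
        int phraseStart = -1;
        for (int i = 0; i < q.length(); i++) {
            char c = q.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character
                continue;
            }
            if (c == '"') {
                if (inPhrase && i == phraseStart + 1) {
                    found.add("empty phrase");
                }
                if (!inPhrase) {
                    phraseStart = i;
                }
                inPhrase = !inPhrase;
            }
        }
        if (inPhrase) {
            found.add("unbalanced quote");
        }
        return found;
    }
}
```

For the inputs in the table, problems("*c") reports a leading wildcard, problems("\"c") an unbalanced quote, and problems("Hello \"\"") an empty phrase, while c* and c\+\+ pass the check.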


Code


import java.io.IOException;
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.*;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.search.*;

public class TestQueryParser
{
  public static void main(String[] argv)
  {
    try
    {
      IndexWriter writer = new IndexWriter("indexbbs", new StandardAnalyzer(), true);

      Document d1 = new Document();
      d1.add(Field.Text("f1", "c++ hello bharat"));
      writer.addDocument(d1);

      Document d2 = new Document();
      d2.add(Field.Text("f1", "c hello sharat"));
      writer.addDocument(d2);

      Document d3 = new Document();
      d3.add(Field.Text("f1", "hello bharat"));
      writer.addDocument(d3);

      writer.optimize();
      writer.close();

      String qString = "";
      try
      {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        System.out.print("Input for f1: ");
        qString = in.readLine();
      }
      catch(Exception e)
      { System.out.println("Exiting..." + e.getMessage()); return; }

      System.out.println("");

      Searcher searcher = new IndexSearcher("indexbbs");
      Analyzer analyzer = new StandardAnalyzer();

      QueryParser qp = new QueryParser("f1", analyzer);

      Query query = qp.parse(qString);

      System.out.println("QueryInput:" + qString);
      System.out.println("QueryCreated:" + query.toString("f1"));

      Hits hits = searcher.search(query);
      for (int i=0; i