Bug in QueryParser ?

2003-06-13 Thread Borkenhagen, Michael (ofd-ko zdfin)
I got the following exception during my tests with a query like
word1 || word2 || word3
if one of the words, e.g. word2, is in the stop-word list of my Analyzer:

java.lang.ArrayIndexOutOfBoundsException: -1 < 0
    at java.util.Vector.elementAt(Vector.java:427)
    at org.apache.lucene.queryParser.QueryParser.addClause(QueryParser.java:171)
    at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:463)
    at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:113)

I'm using Lucene 1.3 RC1.
Is this a bug?

Michael


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



How to search for ':'

2003-06-13 Thread grohmann . andreas
Hey,

Is there a way to search for phrases that include the ':' character, e.g. in
file paths?

Thanks




How to get field contents

2003-06-13 Thread Ulrich Mayring
Hello,

I'd like to build a list of all values of a certain field that occur
in an index. Looking at the API, there's a method getFieldNames(), but I
already know the field name; what I want is a list of all the values.
Also, I can enumerate terms by giving a Term, but a Term means I have to
have both the field name and the value.

Can this be done?

Kind regards,

Ulrich





Limited range in RangeQuery

2003-06-13 Thread Eric Jain
It seems that Lucene can't handle RangeQueries with a range of something
over 1024. Is this a limitation or a bug (or am I doing something
wrong)?

  +length:[null TO 01026] -> OK
  +length:[null TO 01027] -> BooleanQuery$TooManyClauses

  +mass:[001 TO 0011026] -> OK
  +mass:[001 TO 0011027] -> BooleanQuery$TooManyClauses

--
Eric Jain





Re: Bug in QueryParser ?

2003-06-13 Thread Otis Gospodnetic
Yes, this is a known bug.  It's in Bugzilla.

Otis




Re: How to search for ':'

2003-06-13 Thread Otis Gospodnetic
Yes, using \ as the escape character, and an Analyzer that doesn't
discard \.
Look at TestQueryParser.java for examples.

Otis
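
A minimal sketch of the escaping step described above, for anyone who
builds query strings in code. The helper name QueryEscaper.escape and the
exact set of special characters are assumptions made for illustration and
are not taken from this thread; check the query syntax of your Lucene
version. Escaping only gets the character past the query parser: the
Analyzer must still keep it, as noted above.

```java
final class QueryEscaper {
    // Characters treated as query syntax here. This set is an assumption
    // for illustration, not an authoritative list for any Lucene release.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    /** Returns text with a backslash inserted before each special character. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() * 2);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints c\:/docs/report.txt
        System.out.println(escape("c:/docs/report.txt"));
    }
}
```

The escaped string would then be handed to the query parser in place of
the raw path.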




Re: Limited range in RangeQuery

2003-06-13 Thread Eric Jain
> It seems that Lucene can't handle RangeQueries with a range of
> something over 1024.

Solved:

  BooleanQuery.setMaxClauseCount(Integer.MAX_VALUE);

It seems Lucene needs to expand [1000 - 2000] into '1000 or 1001 or 1003
or ...' (assuming 1002 does not occur in the index). Correct?


--
Eric Jain
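
The expansion guessed at above can be illustrated without Lucene. The
sketch below is not Lucene's implementation: RangeExpansionSketch, expand
and sampleIndex are hypothetical names, and a sorted set of zero-padded
strings stands in for the term dictionary. It only shows why a range that
covers more indexed terms than the clause limit (1024 by default in
BooleanQuery) fails, while a narrower one does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

final class RangeExpansionSketch {
    // Stand-in for the BooleanQuery clause limit (1024 by default in Lucene).
    static int maxClauseCount = 1024;

    /** Expands [lower TO upper] (inclusive) into one clause per indexed term in the range. */
    static List<String> expand(SortedSet<String> indexedTerms, String lower, String upper) {
        List<String> clauses = new ArrayList<>();
        for (String term : indexedTerms.tailSet(lower)) {
            if (term.compareTo(upper) > 0) {
                break; // past the upper bound
            }
            if (clauses.size() >= maxClauseCount) {
                throw new IllegalStateException("too many clauses, limit is " + maxClauseCount);
            }
            clauses.add(term);
        }
        return clauses;
    }

    /** Hypothetical index content: zero-padded 00001..02000, with 01002 missing. */
    static SortedSet<String> sampleIndex() {
        SortedSet<String> terms = new TreeSet<>();
        for (int i = 1; i <= 2000; i++) {
            if (i != 1002) {
                terms.add(String.format(Locale.ROOT, "%05d", i));
            }
        }
        return terms;
    }
}
```

With this sample data, expand(sampleIndex(), "01000", "01003") yields the
three clauses 01000, 01001 and 01003, while a range from 00001 to 02000
exceeds the limit. Only terms that occur in the index count towards it.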





RE: How to get field contents

2003-06-13 Thread Neil Couture

As far as I know you will have to go through the index and check the
"field" field (that is not an error) of each term:

    IndexReader reader = IndexReader.open("index");
    TermEnum te = reader.terms();

    while (te.next()) {
        Term tt = te.term();
        String fieldOfTerm = tt.field();

        // compareTo() returns an int, so compare field names with equals()
        if (fieldOfTerm.equals("contents")) {
            do_something_silly();
        } else if (fieldOfTerm.equals("path")) {
            do_something_silly2();
        }
    }


-neil







Re: How to get field contents

2003-06-13 Thread Ulrich Mayring
Neil Couture wrote:
> As far as I know you will have to go through the index and verify the
> field field (that is not an error) of each term:

Works like a charm, many thanks :)

Ulrich





Re: How to get field contents

2003-06-13 Thread Doug Cutting
This can be done more efficiently if you only want to enumerate the 
terms of a particular field.  Term enumerations are ordered first by 
field, then by the term text.  You can also specify the initial position 
of a term enumeration.  Thus an efficient enumeration of the terms in 
"myField" can be done with something like:

  IndexReader reader = IndexReader.open("index");
  TermEnum te = reader.terms(new Term("myField", ""));
  while (te.term() != null && "myField".equals(te.term().field())) {
    ... do something silly ...
    te.next();
  }

Doug
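
The ordering argument above can be tried without an index. The following
is only a sketch under stated assumptions: FieldTermsSketch, its nested
Term class and valuesOf are hypothetical stand-ins, and a TreeSet plays
the role of the term dictionary. It shows the same idea: seek to
(field, ""), read forward, and stop at the first term of another field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class FieldTermsSketch {
    /** Stand-in for a (field, text) term, ordered by field first, then by text. */
    static final class Term implements Comparable<Term> {
        final String field;
        final String text;

        Term(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public int compareTo(Term other) {
            int byField = field.compareTo(other.field);
            return byField != 0 ? byField : text.compareTo(other.text);
        }
    }

    /** Collects the texts of all terms in one field without scanning other fields' terms. */
    static List<String> valuesOf(TreeSet<Term> dictionary, String field) {
        List<String> values = new ArrayList<>();
        // Seek: (field, "") sorts before every real term of that field.
        for (Term t : dictionary.tailSet(new Term(field, ""), true)) {
            if (!field.equals(t.field)) {
                break; // first term of the next field, so we are done
            }
            values.add(t.text);
        }
        return values;
    }
}
```

Compared with a scan over every term, the loop touches only the terms of
the requested field plus at most one term of the following field.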



RE: How to get field contents

2003-06-13 Thread Neil Couture
Thanks for the clarification.

-neil




Strange problem while indexing?

2003-06-13 Thread Rishabh Bajpai

I am using Lucene to index XML + HTML files. The XML contains the metadata associated
with the HTML file.

The process, at a high level, is:
- create a list of all XML files in a folder
- parse each XML file using a SAX parser
- create name:value pairs out of the tags and values, and index them
- one of the tags contains the URL of the HTML page
- when that tag is encountered, parse the HTML file
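
For reference, the "parse with SAX and build name:value pairs" step can be
sketched as follows. This is a hypothetical reconstruction, not the code
used here: MetadataPairsSketch and parse are invented names, the input is
a string rather than a file, and the indexing and HTML steps are left out.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class MetadataPairsSketch {
    /** Parses one metadata document and returns tag name -> trimmed text for leaf elements. */
    static Map<String, String> parse(String xml) throws Exception {
        final Map<String, String> pairs = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                text.setLength(0); // start collecting text for this element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may be called several times per element
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String value = text.toString().trim();
                if (!value.isEmpty()) {
                    pairs.put(qName, value);
                }
                text.setLength(0);
            }
        };
        // A new parser is created per call, so no parser state is shared between documents.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return pairs;
    }
}
```

Each returned pair would then become one field of the document to index,
and the pair holding the URL would trigger the HTML parsing step.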

When I do this for a few files, it seems to work fine. However, as the number of files
increases, it starts to throw errors.
Initially, I get a "SAXException: Content is not allowed in trailing section.", but I
checked and the XML file seems to be well-formed. I even tried indexing this file
individually, and it worked.
Then I get "Index locked for write: Lock@/export/home.../write.lock".
At times, I also get a "Timed out waiting for: Lock@/export/home/.../commit.lock".

As a result, the index doesn't get updated and the results are incorrect. I
also observed once that while the index is being built I get results, but when it
exits I stop getting them. My hunch is that the index update did not get
committed?

What is particularly interesting is that this problem occurs only some of the
time. Another observation is that it worked fine for around 50 files, but not for
about 100 files.

Can anyone help me, or give pointers as to what is going on here?

-rishabh


 


