document with parent-child relationship

2011-04-29 Thread svonec
Hello,

I need advice on how to create a document that has a parent-child
relationship. Here is an example:

low pressure - engine
             - wheel
             -

The "low pressure" string is the parent, and "engine" and "wheel" are
children. I'd like to be able to search for strings such as "low pressure
in engine", or just "low" or "engine", and the result should be the ID of
the parent. How do I create fields in the Lucene document to express
this relationship?

Any advice appreciated.

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: document with parent-child relationship

2011-04-29 Thread harsh srivastava
Hi,

You can create three fields for each document you index, e.g.:

Fields:    parent_id   parent_text    child_text
Contents:  1           low pressure   engine wheel, etc.
           2           Electronics    laptop pc ...


Hope it helps.

Harsh
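
The three-field layout above can be sketched in plain Java. This is only a model of the idea, not Lucene code: the class `FlatParentIndex`, its `add` and `search` methods, and the whitespace tokenizer are all hypothetical names made up for illustration. Each parent becomes one record whose children are flattened into a single text field, so a match on either parent or child words yields the parent's ID.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only: models one "document" per parent with
// parent_id, parent_text and child_text, without using any Lucene classes.
final class FlatParentIndex {
    private static final class Doc {
        final String parentId;
        // Tokens from parent_text and child_text, merged for simplicity.
        final Set<String> tokens = new LinkedHashSet<String>();
        Doc(String parentId) { this.parentId = parentId; }
    }

    private final List<Doc> docs = new ArrayList<Doc>();

    // Lower-case and split on whitespace; a real analyzer does far more.
    private static List<String> tokenize(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) return new ArrayList<String>();
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Add one parent with all of its children flattened into the same record.
    void add(String parentId, String parentText, String... children) {
        Doc doc = new Doc(parentId);
        doc.tokens.addAll(tokenize(parentText));
        for (String child : children) doc.tokens.addAll(tokenize(child));
        docs.add(doc);
    }

    // Return the IDs of parents whose text fields contain every query word.
    List<String> search(String query) {
        List<String> words = tokenize(query);
        List<String> hits = new ArrayList<String>();
        for (Doc doc : docs) {
            if (!words.isEmpty() && doc.tokens.containsAll(words)) hits.add(doc.parentId);
        }
        return hits;
    }

    public static void main(String[] args) {
        FlatParentIndex index = new FlatParentIndex();
        index.add("1", "low pressure", "engine", "wheel");
        index.add("2", "Electronics", "laptop", "pc");
        System.out.println(index.search("low pressure engine")); // prints [1]
        System.out.println(index.search("engine"));              // prints [1]
        System.out.println(index.search("laptop"));              // prints [2]
    }
}
```

Note that a query such as "low pressure in engine" would not match in this sketch, because the word "in" is not indexed anywhere; in a real setup that would presumably be handled by stop-word removal in the analyzer.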






Lucene 3.0.3 with debug information

2011-04-29 Thread Paul Taylor
Is there a prebuilt debug version of Lucene 3.0.3, so I can profile it
properly and find which part of the search is taking the time?


Note: I've already profiled my application and determined that it is the
lucene/Search that is taking the time. I also had another attempt using
Luke, but found it incredibly buggy and of little use.


thanks Paul




Re: Are Okapi BM25 scores normalized into 0 and 1 ?

2011-04-29 Thread Patrick Diviacco
Can anybody give me some information about this? Even a small clue would help;
I'm kind of stuck on this, and the owner of the library does not answer emails.

Thanks


On 28 April 2011 13:49, Patrick Diviacco patrick.divia...@gmail.com wrote:

 Does Okapi BM25 (its implementation in Lucene:
 nlp.uned.es/~jperezi/Lucene-BM25) return normalized query scores
 (between 0 and 1)?

 According to the Okapi formula, the final score should be normalized. Could
 you give some information about that?

 thanks





Re: SorterTemplate.quickSort causes StackOverflowError

2011-04-29 Thread Otis Gospodnetic
Hi,

OK, so it looks like it's not MemoryIndex and its Comparator that are funky.
After switching the quickSort call in MemoryIndex to mergeSort, the problem
persists:

'1205215856@qtp-684754483-7' Id=18, RUNNABLE on lock=, total cpu time=497060.ms user time=495210.ms
at org.apache.lucene.util.SorterTemplate.quickSort(SorterTemplate.java:105)
at org.apache.lucene.util.SorterTemplate.quickSort(SorterTemplate.java:104)
at org.apache.lucene.util.SorterTemplate.quickSort(SorterTemplate.java:104)
at org.apache.lucene.util.SorterTemplate.quickSort(SorterTemplate.java:104)

So something else is calling quickSort when it gets stuck.  Weirdly, when I get
a thread dump and get the above, I don't see the original caller.  Maybe because
the stack is already too deep and the printout is limited to N lines per call
stack?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message -----
 From: Uwe Schindler u...@thetaphi.de
 To: java-user@lucene.apache.org
 Sent: Thu, April 28, 2011 5:54:44 PM
 Subject: RE: SorterTemplate.quickSort causes StackOverflowError

  Thanks for confirming, Javier! :)

  Uwe, I assume you are referring to this line 528 in MemoryIndex?

    528: if (size > 1) ArrayUtil.quickSort(entries, termComparator);

  And this funky Comparator from MemoryIndex:

    private static final Comparator<Object> termComparator = new Comparator<Object>() {
      @SuppressWarnings("unchecked")
      public int compare(Object o1, Object o2) {
        if (o1 instanceof Map.Entry<?,?>) o1 = ((Map.Entry<?,?>) o1).getKey();
        if (o2 instanceof Map.Entry<?,?>) o2 = ((Map.Entry<?,?>) o2).getKey();
        if (o1 == o2) return 0;
        return ((Comparable) o1).compareTo((Comparable) o2);
      }
    };

  Will try, thanks!

 Yeah, simply try with mergeSort in line 528. If that helps, this comparator
 is buggy.

 Uwe
 
 
  ----- Original Message -----
   From: Uwe Schindler u...@thetaphi.de
   To: java-user@lucene.apache.org
   Sent: Thu, April 28, 2011 5:36:13 PM
   Subject: RE: SorterTemplate.quickSort causes StackOverflowError

   Hi Otis,

   Can you reproduce this somehow and send test code? I could look into
   it. I don't expect the error to be in the quicksort algorithm itself, as
   it is also used by e.g. BytesRefHash / TermsHash; if there were a bug we
   would have seen it a long time ago.

   I have not seen this before, but I suspect a problem in this very
   strange comparator in MemoryIndex (which is very broken, if you look
   at its code - it can compare Strings with Map.Entry and so on);
   maybe the comparator is not stable? In that case, quicksort
   can easily loop endlessly and overflow the stack. In Lucene 3.0 this used
   the stock Java sort (which is mergesort); maybe replace
   ArrayUtil.quickSort with ArrayUtil.mergeSort() and see if the problem is
   still there?

   Uwe

   -----
   Uwe Schindler
   H.-H.-Meier-Allee 63, D-28213 Bremen
   http://www.thetaphi.de
   eMail: u...@thetaphi.de
  
   
   -----Original Message-----
   From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
   Sent: Thursday, April 28, 2011 11:17 PM
   To: java-user@lucene.apache.org
   Subject: SorterTemplate.quickSort causes StackOverflowError

   Hi,

   I'm looking at some code that uses MemoryIndex (Lucene 3.1) and
   that's exhibiting a strange behaviour - it slows down over time.
   The MemoryIndex contains 1 doc, of course, and executes a set of a
   few thousand queries against it.  The set of queries does not
   change - the same set of queries gets executed on all incoming documents.
   This code runs very quickly... in the beginning.  But with time it gets
   slower and slower and slower... and then I get this:

   4/28/11 10:32:52 PM (S) SolrException.log :
   java.lang.StackOverflowError
   at org.apache.lucene.util.SorterTemplate.quickSort(SorterTemplate.java:104)
   at org.apache.lucene.util.SorterTemplate.quickSort(SorterTemplate.java:104)
   at org.apache.lucene.util.SorterTemplate.quickSort(SorterTemplate.java:104)

   I haven't profiled this code yet (remote server, firewall in between,
   can't use YourKit...), but does the above look familiar to anyone?
   I've looked at the code and obviously there is the recursive call
   that's problematic here - it looks like the recursion just gets
   deeper and deeper and gets stuck, eventually getting too deep for the
   JVM's taste.

   Thanks,
   Otis

   Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
   Lucene ecosystem search :: http://search-lucene.com/

Re: Are Okapi BM25 scores normalized into 0 and 1 ?

2011-04-29 Thread Paul Libbrecht
Patrick, if the question is about the code snippet on the page you mention,
which I copy below, I believe the answer is no, and the author is aware of it,
since he adds a comment about "not-normalized" in the second example.

ScoreDocs and TopDocs do not return normalized scores.
Normalized scores tend to be rare in Lucene nowadays; I believe the earlier
strategy was to divide by the max score when the latter was bigger than 1.

paul

IndexSearcher searcher = new IndexSearcher(IndexPath);

//Load average length
BM25Parameters.load(avgLengthPath);
BM25BooleanQuery query = new BM25BooleanQuery("This is my Query",
    "Search-Field",
    new StandardAnalyzer());

TopDocs top = searcher.search(query, null, 10);
ScoreDoc[] docs = top.scoreDocs;

//Print results
for (int i = 0; i < top.scoreDocs.length; i++) {
  System.out.println(docs[i].doc + ":" + docs[i].score);
}
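
For readers who need scores between 0 and 1, the "divide by the max score" strategy mentioned above could be sketched as follows. This is a plain-Java illustration operating on an array of raw scores; `ScoreNormalizer` and `normalize` are hypothetical names, not part of Lucene or the BM25 library, and the rule (only rescale when the maximum exceeds 1) is taken from the description in this message rather than from any source code.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene or the BM25 library.
final class ScoreNormalizer {
    // Divide every score by the maximum, but only when the maximum exceeds 1.
    // Scores that already fit into [0, 1] are returned unchanged (as a copy).
    static float[] normalize(float[] scores) {
        float max = 0f;
        for (float s : scores) max = Math.max(max, s);
        float[] out = Arrays.copyOf(scores, scores.length);
        if (max > 1f) {
            for (int i = 0; i < out.length; i++) out[i] = out[i] / max;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] raw = {4.0f, 2.0f, 1.0f};
        System.out.println(Arrays.toString(normalize(raw))); // prints [1.0, 0.5, 0.25]
    }
}
```

Scores rescaled this way are only comparable within one result list, since the divisor changes from query to query.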





Re: SorterTemplate.quickSort causes StackOverflowError

2011-04-29 Thread jm
Maybe http://youdebug.kenai.com/ could be useful. If you are lucky, you could
get it to set a breakpoint when the recursive call has reached depth X.


Re: SorterTemplate.quickSort causes StackOverflowError

2011-04-29 Thread Dawid Weiss
Don't know if this helps, but when debugging stuff like this I simply add a
(manually inserted or AspectJ-injected) recursion count, add a breakpoint
inside an if checking for recursion count > X, and run the VM with an
attached socket debugger. This lets you run at (nearly) full speed and, once
you hit the breakpoint, inspect the stack, variables, etc.

Dawid
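
A manually inserted recursion count of the kind described above might look roughly like this. The sketch uses a deliberately naive quicksort (last element as pivot) as a stand-in; it is not Lucene's SorterTemplate, and the class name `DepthGuardedQuickSort`, the `MAX_DEPTH` value and the `tooDeep` hook are assumptions chosen for illustration. In a debugging session one would put the breakpoint inside `tooDeep`; here it throws so the effect is visible without a debugger.

```java
// Hypothetical stand-in for a recursive sort; not Lucene's SorterTemplate.
final class DepthGuardedQuickSort {
    // The "X" threshold; the value is arbitrary and chosen for the demo.
    static final int MAX_DEPTH = 200;

    static void sort(int[] a) {
        quickSort(a, 0, a.length - 1, 1);
    }

    private static void quickSort(int[] a, int lo, int hi, int depth) {
        if (lo >= hi) return;
        if (depth > MAX_DEPTH) {
            tooDeep(depth, lo, hi);
        }
        int p = partition(a, lo, hi);
        quickSort(a, lo, p - 1, depth + 1);
        quickSort(a, p + 1, hi, depth + 1);
    }

    // Set a debugger breakpoint here to inspect the stack and variables.
    private static void tooDeep(int depth, int lo, int hi) {
        throw new IllegalStateException(
            "recursion depth " + depth + " exceeded " + MAX_DEPTH
            + " while sorting range [" + lo + ", " + hi + "]");
    }

    // Lomuto partition with the last element as pivot (worst case on sorted input).
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
                i++;
            }
        }
        int tmp = a[i]; a[i] = a[hi]; a[hi] = tmp;
        return i;
    }

    public static void main(String[] args) {
        int[] sorted = new int[1000];
        for (int i = 0; i < sorted.length; i++) sorted[i] = i;
        try {
            sort(sorted); // already-sorted input drives this naive pivot choice deep
        } catch (IllegalStateException e) {
            System.out.println("guard tripped: " + e.getMessage());
        }
    }
}
```

With a remote JVM, the same guard works together with a socket debugger attached to the process, so the program runs at close to full speed until the threshold is crossed.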


ComplexPhraseQueryParser with multiple fields

2011-04-29 Thread Chris Salem
Hi,
I've just started using the ComplexPhraseQueryParser, and it works great with
one field, but is there a way for it to work with multiple fields?  For example,
right now the query:

job_title: "sales man*" AND NOT contact_name: "Chris Salem"

throws this exception:

Caused by: org.apache.lucene.queryParser.ParseException: Cannot have clause for
field "job_title" nested in phrase for field "contact_name"

What is the best way to work around this?
Sincerely,
Chris Salem


Re: Lucene 3.0.3 with debug information

2011-04-29 Thread Simon Willnauer
Hey Paul,

you can simply check out the tag or download the sources, right?
http://svn.apache.org/repos/asf/lucene/java/tags/lucene_3_0_3/
or http://ftp.download-by.net/apache//lucene/java/3.0.3/

simon




Re: Lucene 3.0.3 with debug information

2011-04-29 Thread Dawid Weiss
 lucene/Search that is taking the time, I also had another attempt using
 luke but find it incredibly buggy and of little use


Can you expand on this too? What kind of incredible bugs did you see?
Without feedback there is little progress, so bug reports count.

Dawid


RE: Lucene 3.0.3 with debug information

2011-04-29 Thread Steven A Rowe
Hi Paul,

What did you find about Luke that's buggy?  Bug reports are very useful; please 
contribute in this way.

The official Lucene 3.0.3 distribution jars were compiled using the -g cmdline 
argument to javac - by default, though, only line number and source file 
information is generated.  If you want local variable information too, you 
could download the source and make your own debug-enabled jar(s), right?:

0. Install Ant 1.7.1: http://archive.apache.org/dist/ant/binaries/

1. svn checkout http://svn.apache.org/repos/asf/lucene/java/tags/lucene_3_0_3

2. Add 'debuglevel="lines,source,vars"' to the compile macrodef in
common-build.xml
(http://svn.apache.org/viewvc/lucene/java/tags/lucene_3_0_3/common-build.xml?revision=1040994&view=markup#l536),
in the javac task invocation, e.g.:

    <javac
      encoding="${build.encoding}"
      srcdir="@{srcdir}"
      destdir="@{destdir}"
      deprecation="${javac.deprecation}"
      debug="${javac.debug}"
      debuglevel="lines,source,vars"     (add this line)
      ...

3. Run "ant clean jar" from the command line.  The Lucene core jar will be in
the build/ directory.  (If you need one of the contrib jars, run "ant package"
instead.)

Steve




Re: SorterTemplate.quickSort causes StackOverflowError

2011-04-29 Thread Otis Gospodnetic
Hi,

Yeah, that's what we were going to do, but instead we did the following:
* changed MemoryIndex to use ArrayUtil.mergeSort
* ran the app and did a thread dump, which showed SorterTemplate.quickSort in
deep recursion again!
* looked for other places where this call is made - found it in
MultiPhraseQuery$MultiPhraseWeight and changed that call from
ArrayUtil.quickSort to ArrayUtil.mergeSort
* now we no longer see SorterTemplate.quickSort in deep recursion when we do a
thread dump
* we now occasionally catch SorterTemplate.mergeSort in our thread dumps, but
only a few levels deep, which looks healthy

I don't think we'll be able to reproduce this easily - this happens with
MemoryIndex and a few thousand stored queries that are confidential customer
data :(

I'll be back if, after a while, mergeSort starts behaving the same as quickSort.

Thanks!
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/




RE: SorterTemplate.quickSort causes StackOverflowError

2011-04-29 Thread Uwe Schindler
Hi Otis,

Thanks for trying this out. From what I see, the problem is not in
MemoryIndex at all, so I suggest that you replace the mergeSort with quickSort
again (for MemoryIndex, see below). The problem seems to be the comparators
in those queries, which have no tie-breaker. MergeSort can handle them
better, because mergeSort is stable, unlike quickSort.

I did some testing with random data and did not get a stack overflow at all
(with standard terms / integers). An integer sort showed that even 200
million integers a) sorted much faster with quickSort and b) did not overflow
the stack (in reality, for good comparators, integers should need at most 31
recursions, and that only with 2^31 integers in an array!), so quickSort is
fine for strings and integers.

Mike McCandless did some tests in TermsHash/BytesRefHash (Lucene core) that
showed that quickSort is 20% faster than mergeSort. The code is similar to
MemoryIndex, which is why I suggest not changing MemoryIndex at all. From
your description of the issue it's also unlikely that MemoryIndex is causing
this, because sorting is only done when building the index, not when queries
are running! So the bad guys are the PhraseQueries. We should fix them ASAP,
as this may affect other users, too.

More on https://issues.apache.org/jira/browse/LUCENE-3054.
Thanks Robert!

I will review later; I am very busy at the moment.
Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
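
To illustrate what "no tie-breaker" means here, the sketch below compares two comparators over a made-up `Entry` type (a term plus a position). It is a generic Java example under stated assumptions: the names `TieBreakDemo`, `Entry`, `BY_POSITION` and `BY_POSITION_THEN_TERM` are hypothetical, and this is not the comparator used in Lucene's phrase queries nor the change made in LUCENE-3054. The first comparator treats all entries at the same position as equal; the second falls back to the term text, so distinct entries never compare as equal.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical example types; not taken from Lucene.
final class TieBreakDemo {
    static final class Entry {
        final String term;
        final int position;
        Entry(String term, int position) {
            this.term = term;
            this.position = position;
        }
        @Override public String toString() { return term + "@" + position; }
    }

    // Compares on position only: entries sharing a position all compare as equal.
    static final Comparator<Entry> BY_POSITION = new Comparator<Entry>() {
        public int compare(Entry a, Entry b) {
            return a.position < b.position ? -1 : (a.position == b.position ? 0 : 1);
        }
    };

    // Same primary key, with the term text as a tie-breaker.
    static final Comparator<Entry> BY_POSITION_THEN_TERM = new Comparator<Entry>() {
        public int compare(Entry a, Entry b) {
            int c = BY_POSITION.compare(a, b);
            return c != 0 ? c : a.term.compareTo(b.term);
        }
    };

    public static void main(String[] args) {
        Entry[] entries = {
            new Entry("wheel", 0), new Entry("engine", 0), new Entry("pressure", 1)
        };
        Arrays.sort(entries, BY_POSITION_THEN_TERM);
        System.out.println(Arrays.toString(entries)); // prints [engine@0, wheel@0, pressure@1]
    }
}
```

With the tie-breaker the resulting order no longer depends on whether the sort algorithm is stable, which is the property the message above relies on when it contrasts mergeSort with quickSort.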



Re: Lucene 3.0.3 with debug information

2011-04-29 Thread Paul Taylor

On 29/04/2011 16:03, Steven A Rowe wrote:

Hi Paul,

What did you find about Luke that's buggy?  Bug reports are very useful; please 
contribute in this way.

Please see previous post, in summary mistake on my part.

The official Lucene 3.0.3 distribution jars were compiled using the -g cmdline 
argument to javac - by default, though, only line number and source file 
information is generated.  If you want local variable information too, you 
could download the source and make your own debug-enabled jar(s), right?:

Hmm, maybe that is enough, I'm not sure. I'm profiling with 
YourKit Profiler and it doesn't show anything within the Lucene classes, so 
I assumed this meant they didn't contain the necessary debugging info, 
but I would have thought that -g is all I need.


thanks Paul




Re: Lucene 3.0.3 with debug information

2011-04-29 Thread Dawid Weiss
Instead of profiling, provide some more info about the following:

- what are the problematic (slow) queries -- are they generated from the
code, are they parsed from text? What are they? Certain query types are
slow(er) than other query types.

- what is the index built from? Natural language (text)? Something else?

If you describe the above, folks may tell you right away why your queries are
slow -- people on this list continue to amaze me with the insight they have
even without looking at the code ;)

Dawid

On Fri, Apr 29, 2011 at 10:11 PM, Paul Taylor paul_t...@fastmail.fm wrote:

  On 29/04/2011 15:17, Dawid Weiss wrote:



   lucene/Search that is taking the time, I also had another attempt using
   luke but find it incredibly buggy and of little use


  Can you expand on this too? What kind of incredible bugs did you see?
 Without feedback there is little progress, so bug reports count.

 Dawid

 Sorry, I'll withdraw that. I was getting all kinds of stacktraces and
 exceptions when I tried to do searches, but the problem was my fault. Because
 I wanted to use my own analyzer, I had a shell script that added it to the
 classpath when I ran Luke. However, I had put it before the ant jar, and my
 jar built with Maven also included Lucene 3.0.3. Because Luke 1.0.1 is
 packaged with 3.0.0, this was confusing it, but I didn't realize this until I
 noticed one exception complaining that a Lucene method was missing.

 But having got it working, I cannot see anything to help me work out why the
 queries are taking too long. Is it useful for this, or just for refining your
 queries?

 Paul
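
A small diagnostic sketch for the kind of classpath clash described above (two Lucene versions visible at once). It asks the system class loader for every location that provides a given class file; more than one hit suggests duplicate jars. ClasspathCheck, resourceName and locationsOf are hypothetical helper names. org.apache.lucene.index.IndexReader is used only as a default example of a Lucene class, and the check assumes the jars are on the ordinary application classpath rather than loaded by a custom class loader.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Sketch only: ClasspathCheck and its methods are hypothetical helper names.
class ClasspathCheck {
    // Converts a class name such as "a.b.C" to the resource path "a/b/C.class".
    static String resourceName(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Lists every location visible to the system class loader that
    // provides the class file for the given class name.
    static List<URL> locationsOf(String className) {
        try {
            return Collections.list(ClassLoader.getSystemClassLoader()
                    .getResources(resourceName(className)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0
                ? args[0]
                : "org.apache.lucene.index.IndexReader";
        List<URL> hits = locationsOf(name);
        System.out.println(name + " found in " + hits.size() + " location(s):");
        for (URL url : hits) {
            System.out.println("  " + url);
        }
    }
}
```

Run with the same classpath as the failing program; if two different jars are listed, the one that appears first on the classpath is normally the one that wins.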



Re: Lucene 3.0.3 with debug information

2011-04-29 Thread Paul Taylor

On 29/04/2011 21:14, Paul Taylor wrote:


Hmm, maybe that is enough, I'm not sure. I'm profiling with 
YourKit Profiler and it doesn't show anything within the Lucene classes, 
so I assumed this meant they didn't contain the necessary debugging 
info, but I would have thought that -g is all I need.


thanks Paul

Aah, I was not using the filter in YourKit Profiler properly; 
getting the info now.





RE: Lucene 3.0.3 with debug information

2011-04-29 Thread Steven A Rowe
Hi Paul,

On 4/29/2011 at 4:14 PM, Paul Taylor wrote:
 On 29/04/2011 16:03, Steven A Rowe wrote:
  What did you find about Luke that's buggy?  Bug reports are very
  useful; please contribute in this way.

 Please see previous post, in summary mistake on my part.

Okay... Which previous post?  I searched for posts by you to Lucene mailing 
lists, and found no mention of Luke other than the one complaining about bugs.

Steve



Lucene 3.0.3 with debug information

2011-04-29 Thread Dawid Weiss
This is the e-mail you're looking for, Steven (it wasn't forwarded to the
list, apparently).

Dawid

-- Forwarded message --
From: Paul Taylor paul_t...@fastmail.fm
Date: Fri, Apr 29, 2011 at 10:11 PM
Subject: Re: Lucene 3.0.3 with debug information
To: Dawid Weiss dawid.we...@gmail.com


 On 29/04/2011 15:17, Dawid Weiss wrote:



  lucene/Search that is taking the time, I also had another attempt using
  luke but find it incredibly buggy and of little use


 Can you expand on this too? What kind of incredible bugs did you see?
Without feedback there is little progress, so bug reports count.

Dawid

Sorry, I'll withdraw that. I was getting all kinds of stacktraces and
exceptions when I tried to do searches, but the problem was my fault. Because
I wanted to use my own analyzer, I had a shell script that added it to the
classpath when I ran Luke. However, I had put it before the ant jar, and my
jar built with Maven also included Lucene 3.0.3. Because Luke 1.0.1 is
packaged with 3.0.0, this was confusing it, but I didn't realize this until I
noticed one exception complaining that a Lucene method was missing.

But having got it working, I cannot see anything to help me work out why the
queries are taking too long. Is it useful for this, or just for refining your
queries?

Paul


Link to nightly build test reports on main Lucene site needs updating

2011-04-29 Thread Burton-West, Tom
Hello,

I went to look at the Hudson nightly builds and tried to follow the link from 
the main Lucene page
http://lucene.apache.org/java/docs/developer-resources.html#Nightly


The links to the Clover Test Coverage Reports point to 
http://hudson.zones.apache.org/hudson/view/Lucene/job/Lucene-trunk/lastSuccessfulBuild/clover/
but apparently hudson.zones.apache.org is no longer being used. I think the 
link should point to somewhere on 
https://builds.apache.org/hudson/job/Lucene-trunk/.

Is this the right list to alert whoever maintains the main Lucene pages on 
lucene.apache.org?

Tom




Re: Lucene 3.0.3 with debug information

2011-04-29 Thread Michael McCandless
On Fri, Apr 29, 2011 at 4:25 PM, Paul Taylor paul_t...@fastmail.fm wrote:

 Hmm, maybe that is enough, I'm not sure. I'm profiling with YourKit Profiler
 and it doesn't show anything within the Lucene classes, so I assumed this
 meant they didn't contain the necessary debugging info, but I would have
 thought that -g is all I need.

 thanks Paul

 Aah, I was not using the filter in YourKit Profiler properly; getting
 the info now.

Right, YourKit filters out org.apache.* by default ;)  I find it amusing!

Mike




RE: Lucene 3.0.3 with debug information

2011-04-29 Thread Steven A Rowe
Thanks Dawid. – Steve




[ANN] Luke 3.1.0 released

2011-04-29 Thread Andrzej Bialecki

Hi,

I'm happy to announce the release of Luke 3.1.0. This release is based 
on Lucene 3.1.0. Binaries and source code are available from the 
project's page at Google Code:


http://code.google.com/p/luke/

Changes in version 3.1.0 (released on 2011.04.30):

* Issue 35: Lucene 3.1 compatible luke version (oss.akk)
* Issue 36: XMLExporter generating invalid XML, when special characters 
are present in a TermVector field (Craig.Stires)
* Issue 17: Recent changes to DocReconstructor sometimes cause null ref 
(solrtrey)
* Issue 19: Custom directory implementation must be inherited from 
FSDirectory (mitja.lenic)
* Issue 21: luke tarball needs to extract to a luke directory 
(bevan.koopman, Photodeus)

* Issue 33: Term Positions increment incorrect (karolina.bernat)
* Issue 27: Cannot add or edit documents using StandardAnalyzer 
(dean.thrasher)


Thank you for contributing bug reports, patches and comments.

Happy Luke-ing!

--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com





Reusing Query instances

2011-04-29 Thread Otis Gospodnetic
Hi,

Is there any reason why one would *not* want to reuse Query instances?

I'm using MemoryIndex with a fixed set of queries and I'm executing them all on 
each new document that comes in.  Because each document needs to have many tens 
of thousands of queries executed against it, I thought I'd just run all queries 
through QueryParser once at the beginning, and then just reuse Query instances 
on each incoming document.  What I've noticed is that my fixed set of queries 
takes longer and longer to execute as time passes (more and more time is spent 
inside memoryIndex.search() somewhere).  The problem is not heap/memory - 
there is no crazy GCing and the heap is not full, but the CPU is 100% busy.

I should note that queries I'm dealing with are ugly and big, using lots of 
wildcards, but trailing and prefix ones (and this is Lucene 3.1, so no faster 
Wildcard impl).
I should also emphasize that at this point I only *suspect* that maaaybe the 
gradual slowdown I'm seeing has something to do with the fact that I'm reusing 
Query instances.

Is there any reason why one should not reuse Query instances?

Thanks,
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
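
Before blaming Query reuse, it may help to confirm that the slowdown really is gradual and that it disappears when queries are parsed afresh. The sketch below is a plain timing harness, not Lucene code: SlowdownProbe and DocumentMatcher are hypothetical names, and DocumentMatcher stands in for "run the whole query set against one document" (for example, a wrapper around memoryIndex.search()). Running it once with reused Query instances and once with freshly parsed ones would give two series of per-window means to compare.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: SlowdownProbe and DocumentMatcher are hypothetical names.
class SlowdownProbe {
    // Stand-in for "execute every query in the fixed set against one document".
    interface DocumentMatcher {
        void matchAll(String document);
    }

    // Feeds the documents through the matcher in consecutive windows and
    // returns the mean wall-clock nanoseconds per document for each window.
    // A steadily rising series points at a gradual slowdown.
    static long[] meanNanosPerWindow(DocumentMatcher matcher,
                                     List<String> docs, int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        int windows = (docs.size() + windowSize - 1) / windowSize;
        long[] means = new long[windows];
        for (int w = 0; w < windows; w++) {
            int from = w * windowSize;
            int to = Math.min(docs.size(), from + windowSize);
            long start = System.nanoTime();
            for (int i = from; i < to; i++) {
                matcher.matchAll(docs.get(i));
            }
            means[w] = (System.nanoTime() - start) / (to - from);
        }
        return means;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            docs.add("document number " + i);
        }
        // Dummy matcher; replace with the real query execution.
        long[] means = meanNanosPerWindow(doc -> doc.hashCode(), docs, 100);
        for (int w = 0; w < means.length; w++) {
            System.out.println("window " + w + ": " + means[w] + " ns/doc");
        }
    }
}
```

Wall-clock timing is noisy, so only a consistent upward trend across many windows is meaningful.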
