document with parent-child relationship
Hello,

I need advice on how to create a document that has a parent-child relationship. Here is an example:

low pressure
 - engine
 - wheel

The "low pressure" string is the parent, and "engine" and "wheel" are its children. I'd like to be able to search for strings such as "low pressure in engine", or just "low" or "engine", and the result should be the ID of the parent.

How do I create fields in the Lucene document to express this relationship? Any advice appreciated.

- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: document with parent-child relationship
Hi,

You can create three fields for each document you index, e.g.:

parent_id   parent_text    child_text
1           low pressure   engine wheel, etc.
2           Electronics    laptop pc ...

Hope it helps.
Harsh

On Fri, Apr 29, 2011 at 12:59 PM, svo...@gmail.com wrote:
> Hello, I need an advice on how to create an document that has parent-child relationship. [...]
Lucene 3.0.3 with debug information
Is there a pre-built debug version of Lucene 3.0.3, so that I can profile it properly and find out which part of the search is taking the time?

Note: I've already profiled my application and determined that it is the lucene/Search that is taking the time. I also had another attempt at using Luke, but find it incredibly buggy and of little use.

thanks
Paul
Re: Are Okapi BM25 scores normalized into 0 and 1 ?
Can anybody provide me with some information about this? Even a small clue would help; I'm kind of stuck, and the owner of the library does not answer e-mails.

Thanks

On 28 April 2011 13:49, Patrick Diviacco patrick.divia...@gmail.com wrote:
> Is Okapi BM25 (its implementation in Lucene: nlp.uned.es/~jperezi/Lucene-BM25) returning normalized query scores (between 0 and 1)? According to the Okapi formula, the final score should be normalized. Could you give some information about that?
>
> thanks
Re: SorterTemplate.quickSort causes StackOverflowError
Hi,

OK, so it looks like it's not MemoryIndex and its Comparator that are funky. After switching from the quickSort call in MemoryIndex to mergeSort, the problem persists:

'1205215856@qtp-684754483-7' Id=18, RUNNABLE on lock=, total cpu time=497060.ms user time=495210.ms
    at org.apache.lucene.util.SorterTemplate.quickSort(SorterTemplate.java:105)
    at org.apache.lucene.util.SorterTemplate.quickSort(SorterTemplate.java:104)
    at org.apache.lucene.util.SorterTemplate.quickSort(SorterTemplate.java:104)
    at org.apache.lucene.util.SorterTemplate.quickSort(SorterTemplate.java:104)

So something else is calling quickSort when it gets stuck. Weirdly, when I get a thread dump and get the above, I don't see the original caller. Maybe because the stack is already too deep and the printout is limited to N lines per call stack?

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message -----
From: Uwe Schindler u...@thetaphi.de
To: java-user@lucene.apache.org
Sent: Thu, April 28, 2011 5:54:44 PM
Subject: RE: SorterTemplate.quickSort causes StackOverflowError

> Thanks for confirming, Javier! :)
>
> Uwe, I assume you are referring to this line 528 in MemoryIndex?
>
>     if (size > 1) ArrayUtil.quickSort(entries, termComparator);
>
> And this funky Comparator from MemoryIndex:
>
>     private static final Comparator<Object> termComparator = new Comparator<Object>() {
>       @SuppressWarnings("unchecked")
>       public int compare(Object o1, Object o2) {
>         if (o1 instanceof Map.Entry<?,?>) o1 = ((Map.Entry<?,?>) o1).getKey();
>         if (o2 instanceof Map.Entry<?,?>) o2 = ((Map.Entry<?,?>) o2).getKey();
>         if (o1 == o2) return 0;
>         return ((Comparable) o1).compareTo((Comparable) o2);
>       }
>     };
>
> Will try, thanks!

Yeah, simply try with mergeSort in line 528. If that helps, this comparator is buggy.

Uwe

----- Original Message -----
From: Uwe Schindler u...@thetaphi.de
To: java-user@lucene.apache.org
Sent: Thu, April 28, 2011 5:36:13 PM
Subject: RE: SorterTemplate.quickSort causes StackOverflowError

Hi Otis,

Can you reproduce this somehow and send test code? I could look into it. I don't expect the error to be in the quicksort algorithm itself, as this one is also used in e.g. BytesRefHash / TermsHash; if there were a bug, we would have seen it a long time ago.

I have not seen this before, but I suspect a problem in this very strange comparator in MemoryIndex (which is very broken, if you look at its code - it can compare Strings with Map.Entry and so on, b), maybe the comparator is not stable? In this case, quicksort can easily loop endlessly and overflow the stack. In Lucene 3.0 this used the stock Java sort (which is mergesort); maybe replace ArrayUtil.quickSort by ArrayUtil.mergeSort() and see if the problem is still there?

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Thursday, April 28, 2011 11:17 PM
To: java-user@lucene.apache.org
Subject: SorterTemplate.quickSort causes StackOverflowError

Hi,

I'm looking at some code that uses MemoryIndex (Lucene 3.1) and that's exhibiting a strange behaviour - it slows down over time. The MemoryIndex contains 1 doc, of course, and executes a set of a few thousand queries against it. The set of queries does not change - the same set of queries gets executed on all incoming documents.

This code runs very quickly in the beginning, but with time it gets slower and slower and slower, and then I get this:

4/28/11 10:32:52 PM (S) SolrException.log : java.lang.StackOverflowError
    at org.apache.lucene.util.SorterTemplate.quickSort(SorterTemplate.java:104)
    at org.apache.lucene.util.SorterTemplate.quickSort(SorterTemplate.java:104)
    at org.apache.lucene.util.SorterTemplate.quickSort(SorterTemplate.java:104)

I haven't profiled this code yet (remote server, firewall in between, can't use YourKit...), but does the above look familiar to anyone? I've looked at the code and obviously there is the recursive call that's problematic here - it looks like the recursion just gets deeper and deeper and gets stuck, eventually getting too deep for the JVM's taste.

Thanks,
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
Re: Are Okapi BM25 scores normalized into 0 and 1 ?
Patrick,

if the question is about the code snippet on the page you mention, which I copy below, I believe the answer is no, and the author is aware of it, since he adds a comment about "not-normalized" in the second example. ScoreDocs and TopDocs do not return normalized scores.

Normalized scores tend to be rare in Lucene nowadays; I believe the earlier strategy was to divide by the max score when the latter was bigger than 1.

paul

    IndexSearcher searcher = new IndexSearcher(IndexPath);
    // Load average length
    BM25Parameters.load(avgLengthPath);
    BM25BooleanQuery query = new BM25BooleanQuery("This is my Query", "Search-Field", new StandardAnalyzer());
    TopDocs top = searcher.search(query, null, 10);
    ScoreDoc[] docs = top.scoreDocs;
    // Print results: document number and raw (not normalized) score
    for (int i = 0; i < top.scoreDocs.length; i++) {
        System.out.println(docs[i].doc + ":" + docs[i].score);
    }

On 29 Apr 2011, at 13:20, Patrick Diviacco wrote:
> Can anybody provide me some information about it? [...]
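The max-score strategy mentioned in the reply above can be sketched in a few lines of plain Java. This is only an illustration, not code from Lucene or from the BM25 library: the class name ScoreNormalizer and the method normalize are made up for this example, and it assumes the raw scores are non-negative floats such as the values found in ScoreDoc.score.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Lucene or the BM25 library.
class ScoreNormalizer {

    // Divides every score by the largest one, but only when that maximum
    // is bigger than 1. Scores that are already small are returned unchanged.
    static float[] normalize(float[] rawScores) {
        float max = 0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        float[] result = Arrays.copyOf(rawScores, rawScores.length);
        if (max > 1f) {
            for (int i = 0; i < result.length; i++) {
                result[i] = result[i] / max;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        float[] raw = {7.5f, 3.0f, 1.5f}; // made-up raw scores
        System.out.println(Arrays.toString(normalize(raw)));
    }
}
```

Scores normalized this way are only comparable within one result list, because the divisor changes from query to query.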
Re: SorterTemplate.quickSort causes StackOverflowError
Maybe http://youdebug.kenai.com/ could be useful. If you are lucky, you could get it to set a breakpoint when the recursive call has reached depth X.

On Fri, Apr 29, 2011 at 1:40 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
> Hi, OK, so it looks like it's not MemoryIndex and its Comparator that are funky. After switching from quickSort call in MemoryIndex to mergeSort, the problem persists: [...]
Re: SorterTemplate.quickSort causes StackOverflowError
Don't know if this helps, but when debugging stuff like this I simply add a (manually inserted or AspectJ-injected) recursion counter, put a breakpoint inside an "if" that checks whether the recursion count has reached X, and run the VM with an attached socket debugger. This lets you run at (nearly) full speed and, once you hit the breakpoint, inspect the stack, variables, etc.

Dawid

On Fri, Apr 29, 2011 at 1:40 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
> Hi, OK, so it looks like it's not MemoryIndex and its Comparator that are funky. After switching from quickSort call in MemoryIndex to mergeSort, the problem persists: [...]
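The recursion-counter idea from the message above could look roughly like the following. This is a minimal sketch on a toy quicksort, not Lucene's SorterTemplate; the class name DepthGuardedSort, the MAX_DEPTH threshold and the extra depth parameter are all assumptions made for the example.

```java
// Toy quicksort with a manually inserted recursion counter.
// Illustration only; this is not Lucene's SorterTemplate.
class DepthGuardedSort {

    // Arbitrary threshold (the "X" in the message); tune it to the data size.
    static final int MAX_DEPTH = 64;

    static void quickSort(int[] a, int lo, int hi, int depth) {
        if (depth > MAX_DEPTH) {
            // Put the breakpoint on the next line; lo, hi and the array
            // can then be inspected in the debugger.
            throw new IllegalStateException(
                "recursion depth " + depth + " on range [" + lo + ", " + hi + "]");
        }
        if (lo >= hi) {
            return;
        }
        int p = partition(a, lo, hi);
        quickSort(a, lo, p - 1, depth + 1);
        quickSort(a, p + 1, hi, depth + 1);
    }

    // Lomuto partition with the last element as pivot. This pivot choice
    // degrades to linear recursion depth on already sorted input.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                swap(a, i, j);
                i++;
            }
        }
        swap(a, i, hi);
        return i;
    }

    static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        int[] data = {5, 3, 9, 1, 7};
        quickSort(data, 0, data.length - 1, 0);
        System.out.println(java.util.Arrays.toString(data));
    }
}
```

To attach a socket debugger, the JVM can be started with the standard JDWP agent, for example -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 (the port number here is arbitrary).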
ComplexPhraseQueryParser with multiple fields
Hi,

I've just started using the ComplexPhraseQueryParser and it works great with one field, but is there a way to make it work with multiple fields? For example, right now the query:

    job_title: sales man* AND NOT contact_name: Chris Salem

throws this exception:

    Caused by: org.apache.lucene.queryParser.ParseException: Cannot have clause for field job_title nested in phrase for field contact_name

What is the best way to work around this?

Sincerely,
Chris Salem
Re: Lucene 3.0.3 with debug information
Hey Paul,

you can simply check out the tag or download the sources, right?

http://svn.apache.org/repos/asf/lucene/java/tags/lucene_3_0_3/
or
http://ftp.download-by.net/apache//lucene/java/3.0.3/

simon

On Fri, Apr 29, 2011 at 1:09 PM, Paul Taylor paul_t...@fastmail.fm wrote:
> Is there a built debug version of lucene 3.0.3 so I can profile it properly to find what part of the search is taking the time. [...]
Re: Lucene 3.0.3 with debug information
> lucene/Search that is taking the time, I also had another attempt using luke but find it incredibly buggy and of little use

Can you expand on this too? What kind of incredible bugs did you see? Without feedback there is little progress, so bug reports count.

Dawid
RE: Lucene 3.0.3 with debug information
Hi Paul,

What did you find about Luke that's buggy? Bug reports are very useful; please contribute in this way.

The official Lucene 3.0.3 distribution jars were compiled using the -g cmdline argument to javac - by default, though, only line number and source file information is generated. If you want local variable information too, you could download the source and make your own debug-enabled jar(s), right?:

0. Install Ant 1.7.1: http://archive.apache.org/dist/ant/binaries/

1. svn checkout http://svn.apache.org/repos/asf/lucene/java/tags/lucene_3_0_3

2. Add debuglevel="lines,source,vars" to the javac task invocation in the compile macrodef in common-build.xml (http://svn.apache.org/viewvc/lucene/java/tags/lucene_3_0_3/common-build.xml?revision=1040994&view=markup#l536), e.g.:

             <javac
               encoding="${build.encoding}"
               srcdir="@{srcdir}"
               destdir="@{destdir}"
               deprecation="${javac.deprecation}"
               debug="${javac.debug}"
     Add -->   debuglevel="lines,source,vars"
               ...

3. Run "ant clean jar" from the command line. The Lucene core jar will be in the build/ directory. (If you need one of the contrib jars, run "ant package" instead.)

Steve

-----Original Message-----
From: Paul Taylor [mailto:paul_t...@fastmail.fm]
Sent: Friday, April 29, 2011 7:09 AM
To: java-user@lucene.apache.org
Subject: Lucene 3.0.3 with debug information

Is there a built debug version of lucene 3.0.3 so I can profile it properly to find what part of the search is taking the time. [...]
Re: SorterTemplate.quickSort causes StackOverflowError
Hi,

Yeah, that's what we were going to do, but instead we did the following:

* changed MemoryIndex to use ArrayUtil.mergeSort
* ran the app and did a thread dump, which showed SorterTemplate.quickSort in deep recursion again!
* looked for other places where this call is made - found it in MultiPhraseQuery$MultiPhraseWeight and changed that call from ArrayUtil.quickSort to ArrayUtil.mergeSort
* now we no longer see SorterTemplate.quickSort in deep recursion when we do a thread dump
* we now occasionally catch SorterTemplate.mergeSort in our thread dumps, but only a few levels deep, which looks healthy

I don't think we'll be able to reproduce this easily - this happens with MemoryIndex and a few thousand stored queries that are confidential customer data :(

I'll be back if, after a while, mergeSort starts behaving the same as quickSort.

Thanks!
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message -----
From: Dawid Weiss dawid.we...@gmail.com
To: java-user@lucene.apache.org
Sent: Fri, April 29, 2011 7:51:39 AM
Subject: Re: SorterTemplate.quickSort causes StackOverflowError

Don't know if this helps, but debugging stuff like this I simply add a (manually inserted or aspectj-injected) recursion count, [...]
RE: SorterTemplate.quickSort causes StackOverflowError
Hi Otis,

Thanks for trying this out. From what I see, the problem is not in MemoryIndex at all, so I suggest that you replace mergeSort with quickSort again (for MemoryIndex, see below). The problem seems to be the comparators in those queries, which have no tie-breaker. mergeSort can handle them better because, in contrast to quickSort, it is stable.

I did some testing with random data and did not get a stack overflow at all (with standard terms / integers). An integer sort showed that even 200 million integers sorted much faster with quickSort and did not overflow the stack (in reality, for good comparators, integers should need at most 31 recursions, and that only with 2^31 integers in an array!!!), so quickSort is fine for strings and integers. Mike McCandless did some tests in TermsHash/BytesRefHash (Lucene core) that showed quickSort to be 20% faster than mergeSort. The code is similar to MemoryIndex, which is why I suggest not changing MemoryIndex at all. From your description of the issue it is also unlikely that MemoryIndex is causing this, because sorting is only done when building the index, not when queries are running!

So the bad guys are the PhraseQueries. We should fix them ASAP, as this may affect other users too. More at https://issues.apache.org/jira/browse/LUCENE-3054. Thanks, Robert! I will review later; I am very busy at the moment.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Friday, April 29, 2011 7:13 PM
To: java-user@lucene.apache.org
Subject: Re: SorterTemplate.quickSort causes StackOverflowError

Hi, Yeah, that's what we were going to do, but instead we did: * changed MemoryIndex to use ArrayUtil.mergeSort [...]
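What "a comparator with a tie-breaker" means can be shown with a small plain-Java sketch. Everything here is hypothetical: the Posting class, its docFreq and position fields, and the two comparators are made up for illustration, and the sketch makes no claim about what the actual fix in LUCENE-3054 looks like.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the objects a query sorts; not a Lucene class.
class Posting {
    final int docFreq;
    final int position;

    Posting(int docFreq, int position) {
        this.docFreq = docFreq;
        this.position = position;
    }
}

class TieBreakDemo {

    // Primary key only: all postings with the same docFreq compare as equal,
    // so their relative order is left to the sort algorithm.
    static final Comparator<Posting> BY_FREQ_ONLY = new Comparator<Posting>() {
        public int compare(Posting a, Posting b) {
            return Integer.compare(a.docFreq, b.docFreq);
        }
    };

    // Same primary key, but ties are broken by position, so two distinct
    // postings never compare as equal unless both fields match.
    static final Comparator<Posting> BY_FREQ_THEN_POSITION = new Comparator<Posting>() {
        public int compare(Posting a, Posting b) {
            int byFreq = Integer.compare(a.docFreq, b.docFreq);
            if (byFreq != 0) {
                return byFreq;
            }
            return Integer.compare(a.position, b.position);
        }
    };

    public static void main(String[] args) {
        List<Posting> postings = new ArrayList<Posting>();
        postings.add(new Posting(5, 2));
        postings.add(new Posting(5, 0));
        postings.add(new Posting(3, 1));
        postings.add(new Posting(5, 1));
        Collections.sort(postings, BY_FREQ_THEN_POSITION);
        for (Posting p : postings) {
            System.out.println(p.docFreq + " @ " + p.position);
        }
    }
}
```

With the tie-breaker the resulting order is fully determined by the data, whichever sort algorithm is used; with BY_FREQ_ONLY only a stable sort preserves the input order of equal elements.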
Re: Lucene 3.0.3 with debug information
On 29/04/2011 16:03, Steven A Rowe wrote:
> Hi Paul,
>
> What did you find about Luke that's buggy? Bug reports are very useful; please contribute in this way.

Please see my previous post; in summary, it was a mistake on my part.

> The official Lucene 3.0.3 distribution jars were compiled using the -g cmdline argument to javac - by default, though, only line number and source file information is generated. If you want local variable information too, you could download the source and make your own debug-enabled jar(s), right?:

Hmm, maybe that is enough, I'm not sure. I'm profiling with YourKit Profiler and it doesn't show anything within the Lucene classes, so I assumed this meant they didn't contain the necessary debugging info, but I would have thought that -g is all I need.

thanks
Paul
Re: Lucene 3.0.3 with debug information
Instead of profiling, provide some more info about the following:

- What are the problematic (slow) queries? Are they generated from the code, or are they parsed from text? What are they? Certain query types are slow(er) than other query types.
- What is the index built from? Natural language (text)? Something else?

If you describe the above, folks may tell you right away why your queries are slow -- people on this list continue to amaze me with the insight they have even without looking at the code ;)

Dawid

On Fri, Apr 29, 2011 at 10:11 PM, Paul Taylor paul_t...@fastmail.fm wrote:
> On 29/04/2011 15:17, Dawid Weiss wrote:
>>> lucene/Search that is taking the time, I also had another attempt using luke but find it incredibly buggy and of little use
>>
>> Can you expand on this too? What kind of incredible bugs did you see? Without feedback there is little progress, so bug reports count.
>>
>> Dawid
>
> Sorry, I'll withdraw that. I was getting all kinds of stack traces and exceptions when I tried to do searches, but the problem was my fault. Because I wanted to use my own analyzer, I had a shell script that added it to the classpath when I ran Luke. However, I had put it before the ant jar, and my jar built with Maven also included Lucene 3.0.3; because Luke 1.0.1 is packaged with 3.0.0, this was confusing it, but I didn't realize this until I noticed one exception complaining that a Lucene method was missing.
>
> But having got it working, I cannot see anything to help me work out why the queries are taking too long. Is it useful for this, or just for refining your queries?
>
> Paul
Re: Lucene 3.0.3 with debug information
On 29/04/2011 21:14, Paul Taylor wrote: Hmm, maybe that is enough, I'm not sure. I'm profiling with YourKit Profiler and it doesn't show anything within the Lucene classes, so I assumed this meant they didn't contain the necessary debugging info, but I would have thought that -g is all I need. Thanks, Paul Aah, I was not using the filter in YourKit Profiler correctly; getting the info now.
RE: Lucene 3.0.3 with debug information
Hi Paul, On 4/29/2011 at 4:14 PM, Paul Taylor wrote: On 29/04/2011 16:03, Steven A Rowe wrote: What did you find about Luke that's buggy? Bug reports are very useful; please contribute in this way. Please see previous post, in summary mistake on my part. Okay... Which previous post? I searched for posts by you to Lucene mailing lists, and found no mention of Luke other than the one complaining about bugs? Steve
Lucene 3.0.3 with debug information
This is the e-mail you're looking for, Steven (it wasn't forwarded to the list, apparently). Dawid -- Forwarded message -- From: Paul Taylor paul_t...@fastmail.fm Date: Fri, Apr 29, 2011 at 10:11 PM Subject: Re: Lucene 3.0.3 with debug information To: Dawid Weiss dawid.we...@gmail.com On 29/04/2011 15:17, Dawid Weiss wrote: lucene/Search that is taking the time, I also had another attempt using luke but find it incredibly buggy and of little use Can you expand on this too? What kind of incredible bugs did you see? Without feedback there is little progress, so bug reports count. Dawid Sorry, I'll withdraw that. I was getting all kinds of stack traces and exceptions when I tried to do searches, but the problem was my fault. Because I wanted to use my own analyzer, I had a shell script that added it to the classpath when I ran Luke; however, I had put it before the ant jar, and my jar built with Maven also included Lucene 3.0.3. Because Luke 1.0.1 is packaged with Lucene 3.0.0, the two versions conflicted, but I didn't realize this until I noticed one exception complaining that a Lucene method was missing. But having got it working, I cannot see anything to help me work out why the queries are taking too long. Is it useful for this, or just for refining your queries? Paul
Link to nightly build test reports on main Lucene site needs updating
Hello, I went to look at the Hudson nightly builds and tried to follow the link from the main Lucene page http://lucene.apache.org/java/docs/developer-resources.html#Nightly The links to the Clover Test Coverage Reports point to http://hudson.zones.apache.org/hudson/view/Lucene/job/Lucene-trunk/lastSuccessfulBuild/clover/ but apparently hudson.zones.apache.org is no longer being used. I think the link should point to somewhere on https://builds.apache.org/hudson/job/Lucene-trunk/. Is this the right list to alert whoever maintains the main Lucene pages on lucene.apache.org? Tom
Re: Lucene 3.0.3 with debug information
On Fri, Apr 29, 2011 at 4:25 PM, Paul Taylor paul_t...@fastmail.fm wrote: Hmm, maybe that is enough, I'm not sure. I'm profiling with YourKit Profiler and it doesn't show anything within the Lucene classes, so I assumed this meant they didn't contain the necessary debugging info, but I would have thought that -g is all I need. Thanks, Paul Aah, I was not using the filter in YourKit Profiler correctly; getting the info now. Right, YourKit filters out org.apache.* by default ;) I find it amusing! Mike
RE: Lucene 3.0.3 with debug information
Thanks Dawid. - Steve From: dawid.we...@gmail.com [mailto:dawid.we...@gmail.com] On Behalf Of Dawid Weiss Sent: Friday, April 29, 2011 4:45 PM To: java-user@lucene.apache.org Cc: Steven A Rowe Subject: Lucene 3.0.3 with debug information This is the e-mail you're looking for, Steven (it wasn't forwarded to the list, apparently). Dawid -- Forwarded message -- From: Paul Taylor paul_t...@fastmail.fm Date: Fri, Apr 29, 2011 at 10:11 PM Subject: Re: Lucene 3.0.3 with debug information To: Dawid Weiss dawid.we...@gmail.com On 29/04/2011 15:17, Dawid Weiss wrote: lucene/Search that is taking the time, I also had another attempt using luke but find it incredibly buggy and of little use Can you expand on this too? What kind of incredible bugs did you see? Without feedback there is little progress, so bug reports count. Dawid Sorry, I'll withdraw that. I was getting all kinds of stack traces and exceptions when I tried to do searches, but the problem was my fault. Because I wanted to use my own analyzer, I had a shell script that added it to the classpath when I ran Luke; however, I had put it before the ant jar, and my jar built with Maven also included Lucene 3.0.3. Because Luke 1.0.1 is packaged with Lucene 3.0.0, the two versions conflicted, but I didn't realize this until I noticed one exception complaining that a Lucene method was missing. But having got it working, I cannot see anything to help me work out why the queries are taking too long. Is it useful for this, or just for refining your queries? Paul
[ANN] Luke 3.1.0 released
Hi, I'm happy to announce the release of Luke 3.1.0. This release is based on Lucene 3.1.0. Binaries and source code are available from the project's page at Google Code: http://code.google.com/p/luke/
Changes in version 3.1.0 (released on 2011.04.30):
* Issue 35: Lucene 3.1 compatible luke version (oss.akk)
* Issue 36: XMLExporter generating invalid XML, when special characters are present in a TermVector field (Craig.Stires)
* Issue 17: Recent changes to DocReconstructor sometimes cause null ref (solrtrey)
* Issue 19: Custom directory implementation must be inherited from FSDirectory (mitja.lenic)
* Issue 21: luke tarball needs to extract to a luke directory (bevan.koopman, Photodeus)
* Issue 33: Term Positions increment incorrect (karolina.bernat)
* Issue 27: Cannot add or edit documents using StandardAnalyzer (dean.thrasher)
Thank you for contributing bug reports, patches and comments. Happy Luke-ing!
-- Best regards, Andrzej Bialecki
Information Retrieval, Semantic Web, Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
Reusing Query instances
Hi, Is there any reason why one would *not* want to reuse Query instances? I'm using MemoryIndex with a fixed set of queries and I'm executing them all on each new document that comes in. Because each document needs to have many tens of thousands of queries executed against it, I thought I'd just run all queries through QueryParser once at the beginning, and then just reuse Query instances on each incoming document. What I've noticed is that my fixed set of queries takes longer and longer to execute as time passes (more and more time is spent inside memoryIndex.search() somewhere). The problem is not heap/memory - there is no crazy GCing and the heap is not full, but the CPU is 100% busy. I should note that queries I'm dealing with are ugly and big, using lots of wildcards, but trailing and prefix ones (and this is Lucene 3.1, so no faster Wildcard impl). I should also emphasize that at this point I only *suspect* that maaaybe the gradual slowdown I'm seeing has something to do with the fact that I'm reusing Query instances. Is there any reason why one should not reuse Query instances? Thanks, Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/