Re: [Lucene.Net] VOTE: .NET 2.0 Framework Support After Apache Lucene.Net 2.9.4
Do it, if you need it. +1

On 10/05/11 20:02, Lombard, Scott wrote:

> +1

-----Original Message-----
From: Troy Howard [mailto:thowar...@gmail.com]
Sent: Monday, May 09, 2011 4:05 PM
To: lucene-net-dev@lucene.apache.org; lucene-net-u...@lucene.apache.org
Subject: [Lucene.Net] VOTE: .NET 2.0 Framework Support After Apache Lucene.Net 2.9.4

All,

Please cast your votes regarding the topic of .Net Framework support. The question on the table is: Should Apache Lucene.Net 2.9.4 be the last release which supports the .Net 2.0 Framework?

Some options are:

[+1] - Yes, move forward to the latest .Net Framework version and drop support for 2.0 completely. New features and performance are more important than backwards compatibility.

[0] - Yes, focus on the latest .Net Framework, but also include patches and/or preprocessor directives and conditional compilation blocks to support 2.0 when needed. New features, performance, and backwards compatibility are all equally important, and it's worth the additional complexity and coding work to meet all of those goals.

[-1] - No, .Net Framework 2.0 should remain our target platform. Backwards compatibility is more important than new features and performance.

This vote is not limited to the Apache Lucene.Net IPMC. All users/contributors/committers/mailing list lurkers are welcome to cast their votes with equal weight. This has been cross-posted to both the dev and user mailing lists.

Thanks,
Troy

This message (and any associated files) is intended only for the use of the individual or entity to which it is addressed and may contain information that is confidential, subject to copyright or constitutes a trade secret. If you are not the intended recipient you are hereby notified that any dissemination, copying or distribution of this message, or files associated with this message, is strictly prohibited. If you have received this message in error, please notify us immediately by replying to the message and deleting it from your computer. Thank you, King Industries, Inc.
Re: [Lucene.Net] Lucene.Net Hackathon (5/13-/516)
If any of you follow Hanselman on twitter, please take a second and retweet his post on the lucene.net hackathon listed below, or even send a thanks.

Wanna get involved in Open Source? Why not help with the Lucene.NET HackAThon? http://hnsl.mn/lucenehackathon

Cheers,
- Michael

On Mon, May 9, 2011 at 7:12 PM, Troy Howard thowar...@gmail.com wrote:

Here's the wiki page: https://cwiki.apache.org/confluence/x/Go6OAQ

Thanks,
Troy

On Mon, May 9, 2011 at 1:59 PM, Troy Howard thowar...@gmail.com wrote:

Michael, that worked! I'm in the process of making a wiki page for the event now.

On Mon, May 9, 2011 at 1:38 PM, Michael Herndon mhern...@wickedsoftware.net wrote:

Log out and log back in and verify permission changes.

On Mon, May 9, 2011 at 4:22 PM, Troy Howard thowar...@gmail.com wrote:

Re: "I'm not sure if there is a coding difference between the C# stuff and the other directory stuff."

There are a few minor code changes in the new branch vs the C# branch, but those are things like framework target, copyright notices, etc. I didn't change code significantly, and unit tests still pass.

Re: "we can probably branch C# to something like pre_NewStructure"

I made a tag right before committing the directory changes for this exact purpose. It's here: https://svn.apache.org/repos/asf/incubator/lucene.net/tags/pre-layout-change

Regarding the hackathon next week, I'd like to put together a list of tasks specifically for this weekend to give people some focus on where they can contribute. Some of these will be major tasks with high priority (like finishing up the 2.9.4 release) and others will be of lower priority, like working on the samples/wiki/website. Those with great skills in creating GUI apps, but less skill writing back-end libraries, might want to contribute to Luke.Net, even if it's not a high priority. I agree with Michael that we should tweet/blog/wiki/mailing-list the details of the event.

I would make a wiki page on the topic, but it seems I don't have sufficient privileges on our Confluence wiki to do that. Can whoever the admin is give me rights to add/edit wiki pages? My login is 'thoward'.

Thanks,
Troy

On Mon, May 9, 2011 at 1:15 AM, Prescott Nasser geobmx...@hotmail.com wrote:

I think Troy has the structure ready to roll - I'm not sure if there is a coding difference between the C# stuff and the other directory stuff. If there isn't, then we can probably branch C# to something like pre_NewStructure (someone help me with a better name), then remove it from the trunk. Troy, I believe, was investigating the legal task - perhaps he can update us if he ever got an answer.

If you want to jump into a smaller task, take a look at https://issues.apache.org/jira/browse/LUCENENET-372 (currently assigned to me). I updated a ton of the analyzers, but I believe them to be out of date from the java 2.9.4 branch because I used the attached files from Pasha without paying attention to their age. So those could use a review. I also never ported the test cases, which we definitely should have.

Date: Mon, 9 May 2011 10:04:03 +0200
From: ma...@rotselleri.com
To: lucene-net-dev@lucene.apache.org
Subject: Re: [Lucene.Net] Lucene.Net Hackathon (5/13-/516)

On Mon, May 9, 2011 at 1:12 AM, Prescott Nasser wrote:

> +1 to getting 2.9.4 ready to roll + the changes to the directory structure we have going

+1 for 2.9.4 and directory structure. To make that happen, I'd like to know what needs to be done and in what way I could be of any help. There are 10 open issues for 2.9.4, and (apart from the Luke issues mentioned below) none of them makes me feel that I can grab it and start coding.

> - Sharpen stuff - I haven't had time to get it really working (not to mention I don't know eclipse from a hole in the ground). I haven't heard from Alex in a while, who I think is the most knowledgeable on the subject. Also most important to get closer to the java version.
> - .NET syntax.

+1, the API often feels quite awkward to use.

> That said, I think Luke is important. If we're left with the idea that you could run Luke in java just fine, we could also just say use lucene/solr and the api provided - no need for the Lucene.Net project. (I know it's a bit different.) That said, I don't think it's top priority, but it would be nice to have a .net implementation.

Agree, it would be nice to have. Sergey was working on a port of this in WPF - can he perhaps provide an update on what's going on with that? I think it was located
Re: [Lucene.Net] Lucene.Net Hackathon (5/13-/516)
You never know. Personally, I generally have most tech people on a list rather than directly following them. But thanks.

On Wed, May 11, 2011 at 2:43 PM, Wyatt Barnett wyatt.barn...@gmail.com wrote:

Retweeted. Though I doubt any of the ~100 people following me aren't in the 36 following him . . .

On 5/11/11 2:39 PM, Michael Herndon mhern...@wickedsoftware.net wrote:

If any of you follow Hanselman on twitter, please take a second and retweet his post on the lucene.net hackathon listed below, or even send a thanks. Wanna get involved in Open Source? Why not help with the Lucene.NET HackAThon? http://hnsl.mn/lucenehackathon

Cheers,
- Michael
RE: [Lucene.Net] Lucene.Net Hackathon (5/13-/516)
Just FYI, we will need to update the website to have documentation if we do this. I figured we'd use Confluence as our weak documentation store for the time being: http://incubator.apache.org/guides/sites.html

"Using A Wiki To Create Documentation: Podlings may use a wiki to create documentation (including the website) provided that they follow the guidelines. In particular, care must be taken to ensure that access to the wiki used to create documentation is restricted to only those with filed CLAs. The PPMC MUST review all changes and ensure that trust is not abused."

Also see: https://cwiki.apache.org/CWIKI/#Index-Butwhatifwewouldlikethecommunityatlargetohelpmaintainthespace%253F

From: thowar...@gmail.com
Date: Wed, 11 May 2011 13:38:47 -0700
To: lucene-net-dev@lucene.apache.org
Subject: Re: [Lucene.Net] Lucene.Net Hackathon (5/13-/516)

No problem. I set up the permissions such that any user account can edit/add pages in the wiki. This should make things a lot easier on us.

Thanks,
Troy

On Wed, May 11, 2011 at 12:50 PM, Michael Herndon wrote:

Troy, Confluence admin is not my forte, but I can look at the privileges tonight and see if we can change that. You and Prescott also have admin privileges as of right now. I'm pretty much giving those privileges to all committers who have forwarded their username. I've also added a snippet to the page for people to e-mail me in the meantime if they are unable to edit the page to add to the table on the hack-a-thon page. (And there are some who may just not want to join yet another wiki.) Do keep an eye out for spam once we elevate privileges.

- Michael

On Wed, May 11, 2011 at 3:37 PM, Troy Howard wrote:

Thanks Michael! One quick question -- the Wiki seems to be really locked down for public editing. That's kind of strange. Anyone should be able to log in and whip up a new page or edit an existing one, committer or otherwise. I didn't have access until just the other day, and Chris Currens doesn't have access now (I had to add him to the page manually). Can we open up the permissions on our wiki?

Thanks,
Troy
Re: [Lucene.Net] VOTE: .NET 2.0 Framework Support After Apache Lucene.Net 2.9.4
+1

Troy Howard thowar...@gmail.com, 10/05/2011 7:44 AM:

My goal with moving forward to .Net 4.0 specifically is that 4.0 brings major improvements to the .NET GC, which, as we have already found in our company's testing, improve Lucene.Net's memory management and overall speed significantly. This is without any code changes - just compiling for the .Net 4.0 framework target vs 2.0 or 3.5.

Thanks,
Troy

On Mon, May 9, 2011 at 2:40 PM, Aaron Powell m...@aaron-powell.com wrote:

+1

PS: If you are supporting .NET 3.5 then you get .NET 2.0 support anyway; you just have to bin-deploy the .NET 3.5 dependencies (System.Core, etc.) since they all run on the same CLR.

Aaron Powell
MVP - Internet Explorer (Development) | Umbraco Core Team Member | FunnelWeb Team Member
http://apowell.me | http://twitter.com/slace | Skype: aaron.l.powell | MSN: aaz...@hotmail.com
segfault in JCCEnv::deleteGlobalRef
Hello,

I've updated our software stack from Python 2.6.6 to Python 2.7.1. Since the update I'm seeing random segfaults, all related to JCCEnv::deleteGlobalRef() and Python's GC. At first I thought the bug was an incompatibility between Python 2.7 and JCC 2.7. However, an update to JCC 2.8 and Lucene 3.1.0 didn't resolve the issue.

So far all segfaults have the same pattern: the creation or removal of a Python object triggers a cyclic GC run, which runs into t_JObject_dealloc() and crashes inside JCCEnv::deleteGlobalRef(). At least some of the crashing code paths run inside threads with an attached JCC thread.

(gdb) bt
#10 <signal handler called>
#11 0x2ba7deb380c9 in JCCEnv::deleteGlobalRef(_jobject*, int) () from /opt/vlspy27/lib/python2.7/site-packages/JCC-2.8-py2.7-linux-x86_64.egg/libjcc.so
#12 0x2ba7de36c649 in t_JObject_dealloc(t_JObject*) () from /opt/vlspy27/lib/python2.7/site-packages/lucene-3.1.0-py2.7-linux-x86_64.egg/lucene/_lucene.so
#13 0x2ba7cee851eb in dict_dealloc (mp=0x9975720) at Objects/dictobject.c:985
#14 0x2ba7cee86edb in PyDict_Clear (op=<value optimized out>) at Objects/dictobject.c:891
#15 0x2ba7cee86f49 in dict_tp_clear (op=0x3) at Objects/dictobject.c:2088
#16 0x2ba7cef27b7e in delete_garbage (generation=<value optimized out>) at Modules/gcmodule.c:769
#17 collect (generation=<value optimized out>) at Modules/gcmodule.c:930
#18 0x2ba7cef283ae in collect_generations (basicsize=<value optimized out>) at Modules/gcmodule.c:996
#19 _PyObject_GC_Malloc (basicsize=<value optimized out>) at Modules/gcmodule.c:1457
#20 0x2ba7cef2844d in _PyObject_GC_New (tp=0x2ba7cf197fa0) at Modules/gcmodule.c:1467
#21 0x2ba7cee84bbc in PyDict_New () at Objects/dictobject.c:277
#22 0x2ba7cee8b188 in _PyObject_GenericSetAttrWithDict (obj=<value optimized out>, name=0x12d5ae8, value=0x7c636b0, dict=0x0) at Objects/object.c:1510
#23 0x2ba7cee8b537 in PyObject_SetAttr (v=0x77704d0, name=0x12d5ae8, value=0x7c636b0) at Objects/object.c:1245
#24 0x2ba7c4b4 in PyEval_EvalFrameEx (f=0x50d7520, throwflag=<value optimized out>) at Python/ceval.c:2003
#25 0x2ba7ceef28b8 in PyEval_EvalCodeEx (co=0x2199ab0, globals=<value optimized out>, locals=<value optimized out>, args=0x8bd7b58,

(gdb) select-frame 24
(gdb) pyframe
/opt/vlspy27/lib/python2.7/site-packages/kinterbasdb-3.3.0-py2.7-linux-x86_64.egg/kinterbasdb/__init__.py (1499): __init__

    class _RowMapping(object):
        def __init__(self, description, row):
            self._description = description
            fields = self._fields = {}   # <-- 1499
            pos = 0

(gdb) bt
#11 0x2ba90298b0c9 in JCCEnv::deleteGlobalRef(_jobject*, int) () from /opt/vlspy27/lib/python2.7/site-packages/JCC-2.8-py2.7-linux-x86_64.egg/libjcc.so
#12 0x2ba9021bf649 in t_JObject_dealloc(t_JObject*) () from /opt/vlspy27/lib/python2.7/site-packages/lucene-3.1.0-py2.7-linux-x86_64.egg/lucene/_lucene.so
#13 0x2ba8f2cd81eb in dict_dealloc (mp=0x105df800) at Objects/dictobject.c:985
#14 0x2ba8f2cd9edb in PyDict_Clear (op=<value optimized out>) at Objects/dictobject.c:891
#15 0x2ba8f2cd9f49 in dict_tp_clear (op=0x3) at Objects/dictobject.c:2088
#16 0x2ba8f2d7ab7e in delete_garbage (generation=<value optimized out>) at Modules/gcmodule.c:769
#17 collect (generation=<value optimized out>) at Modules/gcmodule.c:930
#18 0x2ba8f2d7b3ae in collect_generations (basicsize=<value optimized out>) at Modules/gcmodule.c:996
#19 _PyObject_GC_Malloc (basicsize=<value optimized out>) at Modules/gcmodule.c:1457
#20 0x2ba8f2d7b44d in _PyObject_GC_New (tp=0x2ba8f2fddfc0) at Modules/gcmodule.c:1467
#21 0x2ba8f2cb0aa8 in PyWrapper_New (d=0x1e5e140, self=0x2ba9242509e0) at Objects/descrobject.c:1051
#22 0x2ba8f2cb0be3 in wrapperdescr_call (descr=0x1e5e140, args=0x28f87520, kwds=0x0) at Objects/descrobject.c:296
#23 0x2ba8f2c93533 in PyObject_Call (func=0x1e5e140, arg=0x2ba9242509e0, kw=0x2ba928815a40) at Objects/abstract.c:2529
#24 0x2ba8f89b8d6c in __pyx_pf_4lxml_5etree_9_ErrorLog___init__ (__pyx_v_self=0x229db820, __pyx_args=<value optimized out>, __pyx_kwds=<value optimized out>) at src/lxml/lxml.etree.c:28498
#25 0x2ba8f2cf6068 in type_call (type=<value optimized out>, args=0x2ba8f3c64050, kwds=0x0) at Objects/typeobject.c:728
#26 0x2ba8f2c93533 in PyObject_Call (func=0x2ba8f8cbb1e0, arg=0x2ba9242509e0, kw=0x2ba928815a40) at Objects/abstract.c:2529
#27 0x2ba8f89b91c0 in __pyx_pf_4lxml_5etree_19_XPathEvaluatorBase___cinit__ (__pyx_v_self=0x6c5cdb8, __pyx_args=<value optimized out>, __pyx_kwds=<value optimized out>) at src/lxml/lxml.etree.c:111873
#28 0x2ba8f89bcb7c in __pyx_tp_new_4lxml_5etree__XPathEvaluatorBase (t=<value optimized out>, a=<value optimized out>, k=<value optimized out>) at src/lxml/lxml.etree.c:149259
#29 __pyx_tp_new_4lxml_5etree_XPath (t=<value optimized out>, a=<value optimized out>, k=<value optimized out>) at src/lxml/lxml.etree.c:18769
#30 0x2ba8f2cf6023 in type_call (type=0x3, args=0x20515510,
Re: segfault in JCCEnv::deleteGlobalRef
Am 11.05.2011 17:36, schrieb Andi Vajda:

>> As you can clearly see, the JNIEnv_ instance is a NULL pointer. Contrary to my initial assumption, the thread doesn't have a JCC thread-local object. Since any thread may trigger a GC collect run, and not just threads that use JCC, this looks like a bug in JCC to me.
>
> Any thread that is going to call into the JVM must call attachCurrentThread() first. This includes a thread doing GC of objects wrapping java refs which it is going to delete.

I'm well aware of the requirement to call attachCurrentThread() in every thread that uses wrapped objects. This segfault is not caused by passing JVM objects between threads explicitly. It's Python's cyclic GC that breaks reference cycles containing JVM objects and collects them in random threads.

Something in Python 2.7's gc must have been altered to increase the chance that a cyclic GC collect run is started inside a thread that isn't attached to the JVM. As far as I know the implementation of Python's cyclic GC detection, it's not possible to restrict the cyclic GC to certain threads. So any unattached thread that creates objects allocated with _PyObject_GC_New() has a chance to trigger the segfault. Almost all Python objects are allocated with _PyObject_GC_New(). Only very simple types like str and int, which can't reference other objects, are not tracked. Everything else (including bound methods of simple types) is tracked. In a few words: any unattached thread has a chance to crash the interpreter unless the code is very, very limited.

This can be easily reproduced with a small script:

---
import lucene
import threading
import time
import gc

lucene.initVM()

def alloc():
    while 1:
        gc.collect()
        time.sleep(0.011)

t = threading.Thread(target=alloc)
t.daemon = True
t.start()

while 1:
    obj = {}
    # create cycle
    obj["obj"] = obj
    obj["jcc"] = lucene.JArray('object')(1, lucene.File)
    time.sleep(0.001)
---

I wonder why it wasn't noticed earlier.

Christian
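[Editor's note] Christian's claim - that a cyclic collection, and therefore the deallocation of the objects in the cycle, runs in whichever thread happens to trigger the GC - can be demonstrated without a JVM at all. A minimal, JVM-free sketch (the Node class and the thread name are invented for illustration):

```python
import gc
import threading

gc.disable()  # keep automatic collection out of the picture
finalized_in = []

class Node:
    def __del__(self):
        # Record which thread ran the finalizer during collection.
        finalized_in.append(threading.current_thread().name)

def make_cycle():
    n = Node()
    n.ref = n  # reference cycle: only the cyclic GC can reclaim it

make_cycle()  # the cycle is now unreachable garbage

# A completely unrelated thread triggers the collection...
t = threading.Thread(target=gc.collect, name="collector")
t.start()
t.join()

# ...and that is the thread the finalizer ran in.
print(finalized_in)  # ['collector']
```

If the finalizer were a t_JObject_dealloc() deleting a JNI global reference, it would execute in the "collector" thread, attached or not - which is exactly the crash pattern in the backtraces above.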
Re: segfault in JCCEnv::deleteGlobalRef
Am 11.05.2011 18:14, schrieb Christian Heimes:

> [the reproduction script quoted in the previous message]

The example also crashes with functions like the following, though it takes a bit longer:

def alloc():
    while 1:
        a = {}, {}, {}, {}, {}, {}
        time.sleep(0.011)

def alloc():
    while 1:
        # create 500 bound methods to exceed PyMethod_MAXFREELIST (256)
        methods = []
        for i in xrange(500):
            methods.append(str("abc").strip)
        time.sleep(0.011)

Christian
Re: segfault in JCCEnv::deleteGlobalRef
On Wed, 11 May 2011, Christian Heimes wrote:

>>> As you can clearly see, the JNIEnv_ instance is a NULL pointer. Contrary to my initial assumption, the thread doesn't have a JCC thread-local object. Since any thread may trigger a GC collect run, and not just threads that use JCC, this looks like a bug in JCC to me.
>>
>> Any thread that is going to call into the JVM must call attachCurrentThread() first. This includes a thread doing GC of objects wrapping java refs which it is going to delete.
>
> I'm well aware of the requirement to call attachCurrentThread() in every thread that uses wrapped objects. This segfault is not caused by passing JVM objects between threads explicitly. It's Python's cyclic GC that breaks reference cycles containing JVM objects and collects them in random threads.

There shouldn't be any random threads. Threads don't just appear out of thin air. You create them. If there is a chance that they call into the JVM, then attachCurrentThread().

> Something in Python 2.7's gc must have been altered to increase the chance that a cyclic GC collect run is started inside a thread that isn't attached to the JVM. As far as I know the implementation of Python's cyclic GC detection, it's not possible to restrict the cyclic GC to certain threads. So any unattached thread that creates objects allocated with _PyObject_GC_New() has a chance to trigger the segfault. Almost all Python objects are allocated with _PyObject_GC_New(). Only very simple types like str and int, which can't reference other objects, are not tracked. Everything else (including bound methods of simple types) is tracked. In a few words: any unattached thread has a chance to crash the interpreter unless the code is very, very limited.
>
> This can be easily reproduced with a small script: [script quoted above]
>
> I wonder why it wasn't noticed earlier.

Did anything else change in your application besides the Python version? 32-bit to 64-bit? (More memory used, more frequent GCs.) Something in the code?

Andi..
Re: segfault in JCCEnv::deleteGlobalRef
On Wed, 11 May 2011, Christian Heimes wrote:

> The example also crashes with functions like the following, though it takes a bit longer:
>
> def alloc():
>     while 1:
>         a = {}, {}, {}, {}, {}, {}
>         time.sleep(0.011)
>
> def alloc():
>     while 1:
>         # create 500 bound methods to exceed PyMethod_MAXFREELIST (256)
>         methods = []
>         for i in xrange(500):
>             methods.append(str("abc").strip)
>         time.sleep(0.011)

Does it crash as easily with Python 2.6? If not, then that could be an answer as to why this wasn't noticed before.

Andi..
Re: segfault in JCCEnv::deleteGlobalRef
> There shouldn't be any random threads. Threads don't just appear out of thin air. You create them. If there is a chance that they call into the JVM, then attachCurrentThread().

I've already made sure that all our code and threads call a hook which attaches the thread to the JVM. But I don't have control over all threads. Some threads are created in third-party libraries; I would have to check and patch every third-party tool we are using.

>> I wonder why it wasn't noticed earlier.
>
> Did anything else change in your application besides the Python version? 32-bit to 64-bit? (More memory used, more frequent GCs.) Something in the code?

I did the testing with the same code base on a single machine. The Python 2.7 branch of our application has just a few changes like python2.6 -> python2.7; nothing else is different. JCC and Lucene are compiled from the very same tarball with the same version of GCC.

We had very few segfaults in our test suite over the past months (more than five test runs every day, less than one crash per week). With Python 2.7 I'm seeing crashes in three of five test runs. The example code crashes both Python 2.6.6 + JCC 2.7 + PyLucene 3.0.3 and Python 2.7.1 + JCC 2.8 + PyLucene 3.1.0 on my laptop (Ubuntu 10.10 x86_64).

Christian
Re: segfault in JCCEnv::deleteGlobalRef
Am 11.05.2011 18:27, schrieb Andi Vajda:

> Does it crash as easily with Python 2.6? If not, then that could be an answer as to why this wasn't noticed before.

With 20 test samples, it seems like Python 2.6 survives about 50% longer than Python 2.7 (seconds until the crash, per run):

python2.6: 1.089, 2.688, 1.066, 6.416, 0.921, 1.859, 0.896, 0.910, 1.851, 1.042, 1.110, 1.040, 1.072, 1.825, 3.720, 1.822, 0.983, 1.931, 0.998, 1.105
cnt: 20, min: 0.896, max: 6.416, avg: 1.717

python2.7: 1.795, 0.953, 1.802, 1.022, 0.906, 1.841, 1.080, 0.958, 1.110, 0.924, 0.894, 1.958, 0.898, 1.846, 0.936, 1.859, 1.036, 1.092, 0.920, 0.949
cnt: 20, min: 0.894, max: 1.958, avg: 1.239

The driver script:

---
import subprocess
from time import time

log = open("log.txt", "w")
cnt = 20
for py in ("python2.6", "python2.7"):
    log.write(py + "\n")
    dur = []
    for i in range(cnt):
        start = time()
        subprocess.call([py, "cyclic.py"])
        run = time() - start
        dur.append(run)
        log.write("%i, %0.3f\n" % (i, run))
        print i
    log.write("cnt: %i, min: %0.3f, max: %0.3f, avg: %0.3f\n\n"
              % (cnt, min(dur), max(dur), sum(dur) / cnt))
---

And cyclic.py:

---
import lucene
import threading
import time
import gc

lucene.initVM()

def alloc():
    while 1:
        a = {}, {}, {}, {}, {}, {}
        time.sleep(0.011)

t = threading.Thread(target=alloc)
t.daemon = True
t.start()

while 1:
    obj = {}
    # create cycle
    obj["obj"] = obj
    obj["jcc"] = lucene.JArray('object')(1, lucene.File)
    time.sleep(0.001)
---
Re: segfault in JCCEnv::deleteGlobalRef
Am 11.05.2011 19:03, schrieb Andi Vajda: If these libraries use Python's Thread class you have some control. Create a subclass of Thread that runs your hook and insert it into the threading module (threading.Thread = YourThreadSubclass) before anyone else gets a chance to create threads.

One library is using thread.start_new_thread() and another uses Python's C API to create an internal monitor thread. This makes it even harder to fix the issue.

How would you feel about another approach?

* factor out the attach routine of t_jccenv_attachCurrentThread() as a C function:

int jccenv_attachCurrentThread(char *name, int asDaemon)
{
    int result;
    JNIEnv *jenv = NULL;
    JavaVMAttachArgs attach = { JNI_VERSION_1_4, name, NULL };

    if (asDaemon)
        result = env->vm->AttachCurrentThreadAsDaemon((void **) &jenv, &attach);
    else
        result = env->vm->AttachCurrentThread((void **) &jenv, &attach);

    env->set_vm_env(jenv);
    return result;
}

* modify JCCEnv::deleteGlobalRef() to check get_vm_env() for NULL:

if (iter->second.count == 1)
{
    JNIEnv *vm_env = get_vm_env();
    if (!vm_env)
    {
        jccenv_attachCurrentThread(NULL, 0);
        vm_env = get_vm_env();
    }
    vm_env->DeleteGlobalRef(iter->second.global);
    refs.erase(iter);
}

Christian
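Andi's monkey-patching suggestion can be sketched as below. This is only an illustration: `attach_hook` is a stand-in for the real attach call (in PyLucene that would typically be `lucene.getVMEnv().attachCurrentThread()`), and the subclass only covers code that goes through the `threading` module:

```python
import threading

_OriginalThread = threading.Thread

def attach_hook():
    # Stand-in for the real JVM attach call, e.g.
    # lucene.getVMEnv().attachCurrentThread()
    attach_hook.calls += 1
attach_hook.calls = 0

class AttachedThread(_OriginalThread):
    """Runs the attach hook inside the new thread before the target."""
    def run(self):
        attach_hook()
        _OriginalThread.run(self)

# Install the subclass before any third-party code creates threads.
threading.Thread = AttachedThread

results = []
t = threading.Thread(target=lambda: results.append("ran"))
t.start()
t.join()
print(attach_hook.calls, results)  # 1 ['ran']
```

As the reply above points out, threads started via thread.start_new_thread() or the C API bypass this subclass entirely, which is exactly the gap described in this message.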
Re: segfault in JCCEnv::deleteGlobalRef
Am 11.05.2011 19:41, schrieb Andi Vajda: If these functions eventually instantiate a Thread class, even indirectly, the monkey-patching may still work. Some of the code doesn't use the threading module at all, just thread or the internal C API. I'd have to patch the modules and the C code. That may cover this case but what about all the others? There is a reason the call has to be manual. I've not been able to automate it before. Over time, I've added checks where I could, but I've not found it possible to cover all cases where attachCurrentThread() wasn't called. Anyhow, try it and see if it fixes the problem you're seeing. If any of the objects being freed invoke user code that eventually calls into the JVM, the problem is going to appear again elsewhere. I understand your reluctance to automate the attaching of Python threads to the JVM. Explicit is better than implicit. However, this is a special case. CPython doesn't allow control over which thread runs the cyclic garbage collector, nor does CPython have a hook that is called for newly created threads. It's hard to debug a segfault when even code like a = [] can trigger the bug. The attached patch doesn't trigger the bug in my artificial test code. I'm going to run our test suite several times. That's going to take a while.
Christian

Index: jcc/sources/jcc.cpp
===================================================================
--- jcc/sources/jcc.cpp (Revision 1088091)
+++ jcc/sources/jcc.cpp (Arbeitskopie)
@@ -33,6 +33,25 @@

 /* JCCEnv */

+int jccenv_attachCurrentThread(char *name, int asDaemon)
+{
+    int result;
+    JNIEnv *jenv = NULL;
+
+    JavaVMAttachArgs attach = {
+        JNI_VERSION_1_4, name, NULL
+    };
+
+    if (asDaemon)
+        result = env->vm->AttachCurrentThreadAsDaemon((void **) &jenv, &attach);
+    else
+        result = env->vm->AttachCurrentThread((void **) &jenv, &attach);
+
+    env->set_vm_env(jenv);
+
+    return result;
+}
+
 class t_jccenv {
 public:
     PyObject_HEAD
@@ -154,21 +173,11 @@
 {
     char *name = NULL;
     int asDaemon = 0, result;
-    JNIEnv *jenv = NULL;

     if (!PyArg_ParseTuple(args, "|si", &name, &asDaemon))
         return NULL;

-    JavaVMAttachArgs attach = {
-        JNI_VERSION_1_4, name, NULL
-    };
-
-    if (asDaemon)
-        result = env->vm->AttachCurrentThreadAsDaemon((void **) &jenv, &attach);
-    else
-        result = env->vm->AttachCurrentThread((void **) &jenv, &attach);
-
-    env->set_vm_env(jenv);
+    result = jccenv_attachCurrentThread(name, asDaemon);

     return PyInt_FromLong(result);
 }
Index: jcc/sources/JCCEnv.cpp
===================================================================
--- jcc/sources/JCCEnv.cpp (Revision 1088091)
+++ jcc/sources/JCCEnv.cpp (Arbeitskopie)
@@ -318,6 +318,16 @@
     {
         if (iter->second.count == 1)
         {
+            JNIEnv *vm_env = get_vm_env();
+            if (!vm_env)
+            {
+                /* Python's cyclic garbage collector may remove
+                 * an object inside a thread that is not attached
+                 * to the JVM. This makes sure JCC doesn't segfault.
+                 */
+                jccenv_attachCurrentThread(NULL, 0);
+                vm_env = get_vm_env();
+            }
             get_vm_env()->DeleteGlobalRef(iter->second.global);
             refs.erase(iter);
         }
Index: jcc/sources/JCCEnv.h
===================================================================
--- jcc/sources/JCCEnv.h (Revision 1088091)
+++ jcc/sources/JCCEnv.h (Arbeitskopie)
@@ -72,6 +72,8 @@

 typedef jclass (*getclassfn)(void);

+int jccenv_attachCurrentThread(char *name, int asDaemon);
+
 class countedRef {
 public:
     jobject global;
[jira] [Created] (SOLR-2508) Master/Slave replication can leave slave in inconsistent state of NullPointerException in solrHighligher.java 102
Master/Slave replication can leave slave in inconsistent state of NullPointerException in solrHighligher.java 102 -- Key: SOLR-2508 URL: https://issues.apache.org/jira/browse/SOLR-2508 Project: Solr Issue Type: Bug Components: highlighter, replication (java) Affects Versions: 4.0 Environment: Centos 5.6 with Java1.7.0b137 Reporter: Xing Li Using Solr 4/Trunk snapshot build of 5/10/2011. Setup: -- 1) 1 Master + 4 Slaves 2) Multicore setup with 8 cores. 3) Replication Poll Interval: 00:30:20 Summary of Issue: --- When a slave completes a replication pull from the master, the data index pull completes, but based on the logs it appears that subsequent index warming and other post-replication cleanup actions leave the core/db in an inconsistent state. Frequency of occurrence: Very high but not 100%. I have 1 master and 4 slaves, and for each replication pull cycle around 50% of the slaves get affected. Each slave has 8 multi-cores, but the problem always affects this particular mysolr_blogs db/core. Please note the mysolr_blogs data index is 1.4GB, the largest of the 8 by a wide margin. Attached are the schema.xml and solrconfig.xml for the mysolr_blogs core. Temp fix: - 1) Stop and restart the Solr server when this happens. 2) Stop using automatic replication on this core. Logging: - * begins automatic replication pull {code} May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master. May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master. May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master. May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master. May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master. May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master. 
May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Master's version: 1302675975227, generation: 694 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master. May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave's version: 1302675975222, generation: 692 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Starting replication process May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Number of files in latest index in master: 10 {code} * 65 seconds passed and I cut out the query logs in between. Here it's pulling the 1.4GB mysolr_blogs index data. {code} May 10, 2011 10:18:45 PM org.apache.solr.handler.SnapPuller downloadIndexFiles INFO: Skipping download for /db/solr-master/multicore/mysolr_blogs/data/index/1.fnx May 10, 2011 10:18:45 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Total time taken for download : 65 secs May 10, 2011 10:18:45 PM org.apache.solr.core.SolrCore execute INFO: [mysolr_users] webapp=/solr path=/select params={sort=&indent=off&start=0&q=%2Buname:inlove*&q.op=and&hl.fl=*&facet.field=pcategoryid&facet.field=categoryid&facet.field=languageid&wt=json&hl=true&rows=51} hits=0 status=0 QTime=1 May 10, 2011 10:18:45 PM org.apache.solr.core.SolrCore execute INFO: [mysolr_blogs] webapp=/solr path=/select/ params={q=solr} hits=0 status=0 QTime=0 May 10, 2011 10:18:45 PM org.apache.solr.core.SolrCore execute INFO: [mysolr_blogs] webapp=/solr path=/select/ params={q=solr} hits=0 status=0 QTime=0 May 10, 2011 10:18:46 PM org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false) May 10, 2011 10:18:46 PM org.apache.solr.search.SolrIndexSearcher init INFO: Opening Searcher@4f83f9df main May 10, 2011 10:18:46 PM org.apache.solr.update.DirectUpdateHandler2 commit INFO: end_commit_flush May 10, 2011 
10:18:46 PM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming Searcher@4f83f9df main from Searcher@5f7808af main
[jira] [Updated] (SOLR-2508) Master/Slave replication can leave slave in inconsistent state of NullPointerException in solrHighligher.java 102
[ https://issues.apache.org/jira/browse/SOLR-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xing Li updated SOLR-2508: -- Attachment: (was: schema.xml)
[jira] [Updated] (SOLR-2508) Master/Slave replication can leave slave in inconsistent state of NullPointerException in solrHighligher.java 102
[ https://issues.apache.org/jira/browse/SOLR-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xing Li updated SOLR-2508: -- Attachment: solrconfig.xml, schema.xml
[jira] [Commented] (SOLR-2508) Master/Slave replication can leave slave in inconsistent state of NullPointerException in solrHighligher.java 102
[ https://issues.apache.org/jira/browse/SOLR-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031590#comment-13031590 ] Xing Li commented on SOLR-2508: --- Got the problem more isolated. The queries affected are those using hl.fl=* with hl=true. All queries run fine until something in the post-replication step starts triggering NullPointerException failures in solrHighligher.java 102 when a wildcard is used for the highlighting field selection (hl.fl=*). Replacing those queries with a specific highlight field such as hl.fl=uname makes the query work again after the sudden failure.
[jira] [Updated] (LUCENE-2984) Move hasVectors() hasProx() responsibility out of SegmentInfo to FieldInfos
[ https://issues.apache.org/jira/browse/LUCENE-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2984: Attachment: LUCENE-2984.patch Here is a new patch that should fix selckin's failure. I added javadoc, some comments, and TODOs to remove the hasProx/hasVectors flags once we don't need to support them anymore. I also added a test case for the vector flags in the exception case. Move hasVectors() hasProx() responsibility out of SegmentInfo to FieldInfos -- Key: LUCENE-2984 URL: https://issues.apache.org/jira/browse/LUCENE-2984 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-2984.patch, LUCENE-2984.patch Spin-off from LUCENE-2881, which had this change already, but due to some random failures related to this change I removed this part of the patch to make it more isolated and easier to test. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2448) Upgrade Carrot2 to version 3.5.0
[ https://issues.apache.org/jira/browse/SOLR-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stanislaw Osinski updated SOLR-2448: Attachment: (was: SOLR-2448-2449-2450-2505-trunk.zip) Upgrade Carrot2 to version 3.5.0 Key: SOLR-2448 URL: https://issues.apache.org/jira/browse/SOLR-2448 Project: Solr Issue Type: Task Components: contrib - Clustering Reporter: Stanislaw Osinski Assignee: Stanislaw Osinski Priority: Minor Fix For: 3.2, 4.0 Attachments: SOLR-2448-2449-2450-2505-branch_3x.patch, SOLR-2448-2449-2450-2505-trunk.patch, carrot2-core-3.5.0.jar Carrot2 version 3.5.0 should be available very soon. After the upgrade, it will be possible to implement a few improvements to the clustering plugin; I'll file separate issues for these. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2448) Upgrade Carrot2 to version 3.5.0
[ https://issues.apache.org/jira/browse/SOLR-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stanislaw Osinski updated SOLR-2448: Attachment: carrot2-core-3.5.0.jar SOLR-2448-2449-2450-2505-trunk.patch SOLR-2448-2449-2450-2505-branch_3x.patch Hi, here's another set of patches (svn this time) against trunk and branch_3x. I've corrected Maven configs and checked that the project builds fine using mvn install. After applying the patches you'd need to manually update the JARs: In trunk, delete: trunk/solr/contrib/clustering/lib/carrot2-core-3.4.2.jar trunk/solr/contrib/clustering/lib/hppc-0.3.1.jar and replace them with new versions: http://repo1.maven.org/maven2/org/carrot2/carrot2-core/3.5.0/carrot2-core-3.5.0.jar http://repo1.maven.org/maven2/com/carrotsearch/hppc/0.3.3/hppc-0.3.3.jar In branch_3x, delete: branch_3x/solr/contrib/clustering/lib/carrot2-core-3.4.2.jar branch_3x/solr/contrib/clustering/lib/hppc-0.3.1.jar and replace them with new versions: carrot2-core-3.5.0.jar attached (jdk15 backport) http://repo1.maven.org/maven2/com/carrotsearch/hppc/0.3.4/hppc-0.3.4-jdk15.jar It'd be great if someone could review these before I make the commit. Thanks! S. Upgrade Carrot2 to version 3.5.0 Key: SOLR-2448 URL: https://issues.apache.org/jira/browse/SOLR-2448 Project: Solr Issue Type: Task Components: contrib - Clustering Reporter: Stanislaw Osinski Assignee: Stanislaw Osinski Priority: Minor Fix For: 3.2, 4.0 Attachments: SOLR-2448-2449-2450-2505-branch_3x.patch, SOLR-2448-2449-2450-2505-trunk.patch, carrot2-core-3.5.0.jar Carrot2 version 3.5.0 should be available very soon. After the upgrade, it will be possible to implement a few improvements to the clustering plugin; I'll file separate issues for these. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (SOLR-2504) Combined usage of Synonyms/SpellChecker causes java.lang.NullPointerException, when searching for a word out of synonyms.txt
[ https://issues.apache.org/jira/browse/SOLR-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jens Bertheau closed SOLR-2504. --- Combined usage of Synonyms/SpellChecker causes java.lang.NullPointerException, when searching for a word out of synonyms.txt Key: SOLR-2504 URL: https://issues.apache.org/jira/browse/SOLR-2504 Project: Solr Issue Type: Bug Components: clients - java, spellchecker Affects Versions: 3.1 Reporter: Jens Bertheau Assignee: Uwe Schindler After migrating from 1.4 to 3.1 we experience the following behaviour: When SpellChecking is turned off, everything works fine. When Synonyms are *not* being used, everything works fine. When both SpellChecking and Synonyms are being used and a search is triggered that contains at least one of the words from synonyms.txt, the following error is thrown: java.lang.NullPointerException at org.apache.lucene.util.AttributeSource.cloneAttributes(AttributeSource.java:542) at org.apache.solr.analysis.SynonymFilter.incrementToken(SynonymFilter.java:132) at org.apache.lucene.analysis.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:58) at org.apache.solr.handler.component.SpellCheckComponent.getTokens(SpellCheckComponent.java:485) at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:131) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188) at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:874) at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665) at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528) at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689) at java.lang.Thread.run(Thread.java:619) The problem has been described already here: http://osdir.com/ml/solr-user.lucene.apache.org/2010-09/msg00945.html I have a report of a third person, experiencing the same problem. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3087) highlighting exact phrase with overlapping tokens fails.
highlighting exact phrase with overlapping tokens fails. Key: LUCENE-3087 URL: https://issues.apache.org/jira/browse/LUCENE-3087 Project: Lucene - Java Issue Type: Bug Components: contrib/highlighter Affects Versions: 3.1, 2.9.4 Reporter: Pierre Gossé Priority: Minor Fields with overlapping tokens are not highlighted in search results when searching exact phrases with TermVector.WITH_OFFSET. The document built in MemoryIndex for highlighting does not preserve token positions in this case. Overlapping tokens get flattened (position increment always set to 1), so the SpanQuery used for locating the relevant fragment fails to identify the correct token sequence because of the position shift. I corrected this by adding a position increment calculation in the subclass StoredTokenStream, and I added a JUnit test covering this case. I used the Eclipse code style from trunk, but the style adds quite a few formatting differences between repository and working copy files. I tried to reduce them, but some line-wrapping rules still don't match. Correction patch attached. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
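The position-flattening problem described above can be illustrated with a small model (not Lucene code, just the arithmetic): Lucene stores a position *increment* per token, and an overlapping token uses an increment of 0 so it shares a position with the previous token. If the increments are flattened to 1, a phrase query sees every token at a distinct position and the exact phrase no longer matches.

```python
def absolute_positions(increments):
    """Turn per-token position increments into absolute token positions."""
    pos, out = -1, []
    for inc in increments:
        pos += inc      # increment 0 keeps the token at the same position
        out.append(pos)
    return out

# Overlapping tokens (e.g. a synonym): the overlap gets increment 0.
print(absolute_positions([1, 1, 0, 1]))  # [0, 1, 1, 2]
# Flattened increments (the bug described above): the overlap is lost.
print(absolute_positions([1, 1, 1, 1]))  # [0, 1, 2, 3]
```

In the first case tokens 2 and 3 share position 1, so a phrase spanning positions 0..2 still matches; in the flattened case the same tokens span 0..3 and the phrase check fails.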
[jira] [Updated] (LUCENE-3087) highlighting exact phrase with overlapping tokens fails.
[ https://issues.apache.org/jira/browse/LUCENE-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pierre Gossé updated LUCENE-3087: - Attachment: LUCENE-3087.patch correction patch with junit tests -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2984) Move hasVectors() hasProx() responsibility out of SegmentInfo to FieldInfos
[ https://issues.apache.org/jira/browse/LUCENE-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2984: Attachment: LUCENE-2984.patch new patch! I was running tests with the previous patch and tripped a very nifty exception. {noformat} [junit] Testsuite: org.apache.lucene.store.TestLockFactory [junit] Testcase: testStressLocksNativeFSLockFactory(org.apache.lucene.store.TestLockFactory): FAILED [junit] IndexWriter hit unexpected exceptions [junit] junit.framework.AssertionFailedError: IndexWriter hit unexpected exceptions [junit] at org.apache.lucene.store.TestLockFactory._testStressLocks(TestLockFactory.java:164) [junit] at org.apache.lucene.store.TestLockFactory.testStressLocksNativeFSLockFactory(TestLockFactory.java:144) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211) [junit] [junit] [junit] Tests run: 11, Failures: 1, Errors: 0, Time elapsed: 7.092 sec [junit] [junit] - Standard Output --- [junit] Stress Test Index Writer: creation hit unexpected IOException: java.io.FileNotFoundException: _u.fnm [junit] java.io.FileNotFoundException: _u.fnm [junit] at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:386) [junit] at org.apache.lucene.index.FieldInfos.init(FieldInfos.java:273) [junit] at org.apache.lucene.index.SegmentInfo.loadFieldInfos(SegmentInfo.java:264) [junit] at org.apache.lucene.index.SegmentInfo.getFieldInfos(SegmentInfo.java:315) [junit] at org.apache.lucene.index.SegmentInfo.files(SegmentInfo.java:603) [junit] at org.apache.lucene.index.SegmentInfos.files(SegmentInfos.java:873) [junit] at org.apache.lucene.index.IndexFileDeleter$CommitPoint.init(IndexFileDeleter.java:625) [junit] at org.apache.lucene.index.IndexFileDeleter.init(IndexFileDeleter.java:199) [junit] at 
org.apache.lucene.index.IndexWriter.init(IndexWriter.java:830) [junit] at org.apache.lucene.store.TestLockFactory$WriterThread.run(TestLockFactory.java:283) [junit] Stress Test Index Writer: creation hit unexpected IOException: java.io.FileNotFoundException: _u.fnm [junit] - Standard Error - [junit] NOTE: reproduce with: ant test -Dtestcase=TestLockFactory -Dtestmethod=testStressLocksNativeFSLockFactory -Dtests.seed=9223296054268232625:-7758089421938554917 [junit] NOTE: test params are: codec=RandomCodecProvider: {content=MockFixedIntBlock(blockSize=1397)}, locale=ar_MA, timezone=Indian/Antananarivo [junit] NOTE: all tests run in this JVM: [junit] [TestDateTools, Test2BTerms, TestAddIndexes, TestFilterIndexReader, TestIndexWriterExceptions, TestIndexWriterMerging, TestMaxTermFrequency, TestParallelReaderEmptyIndex, TestParallelTermEnum, TestPerSegmentDeletes, TestPersistentSnapshotDeletionPolicy, TestSegmentReader, TestStressAdvance, TestConstantScoreQuery, TestDateFilter, TestDateSort, TestDocIdSet, TestNot, TestPrefixQuery, TestSetNorm, TestTopScoreDocCollector, TestBasics, TestSpansAdvanced2, TestDirectory, TestLockFactory] [junit] NOTE: Linux 2.6.37-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 (64-bit)/cpus=8,threads=1,free=136724544,total=292618240 {noformat} that is caused by MockDirectoryWrapper behaving like Windows not deleting files if they are still open. So there might be a segments_x file around but the _x.fnm has already been deleted. That wasn't a problem before but since we now need FIs to decide if a segment is storing vectors or not this file is required. To work around this I had to add some code to IndexFileDeleter which makes me worry a little. Now I drop a commit-point if either I can't load the SIS or I can not load one of the FIs from the loaded SI. I still try to delete all files of the broken?! segment though but the question is if there could be cases where I should rather throw an exception in such a case. 
Maybe some infoStream output would be helpful here too. Any comments greatly appreciated. Move hasVectors() hasProx() responsibility out of SegmentInfo to FieldInfos -- Key: LUCENE-2984 URL: https://issues.apache.org/jira/browse/LUCENE-2984 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-2984.patch, LUCENE-2984.patch, LUCENE-2984.patch Spin-off from LUCENE-2881 which had this change already but due to some
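The core idea of LUCENE-2984 — that segment-level flags like hasVectors()/hasProx() can be derived from per-field flags instead of being stored on SegmentInfo — can be sketched with stand-in types. The FieldInfo record below is a hypothetical simplification, not Lucene's actual class:

```java
import java.util.List;

// Sketch (with a stand-in FieldInfo, not Lucene's) of deriving the
// segment-level hasVectors()/hasProx() answers from per-field flags,
// instead of persisting them on SegmentInfo.
public class FieldInfosSketch {
    record FieldInfo(String name, boolean storeTermVector, boolean omitTermFreqAndPositions) {}

    /** A segment stores vectors if any of its fields does. */
    static boolean hasVectors(List<FieldInfo> infos) {
        return infos.stream().anyMatch(FieldInfo::storeTermVector);
    }

    /** A segment stores positions unless every field omits them. */
    static boolean hasProx(List<FieldInfo> infos) {
        return infos.stream().anyMatch(fi -> !fi.omitTermFreqAndPositions());
    }

    public static void main(String[] args) {
        List<FieldInfo> infos = List.of(
            new FieldInfo("title", false, false),
            new FieldInfo("content", true, false));
        System.out.println(hasVectors(infos) + " " + hasProx(infos));
    }
}
```

This also explains the test failure above: once the answer lives in the field infos, the `_x.fnm` file must be readable before the segment's file set can be computed.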
[jira] [Created] (SOLR-2509) String index out of range: -1
String index out of range: -1 - Key: SOLR-2509 URL: https://issues.apache.org/jira/browse/SOLR-2509 Project: Solr Issue Type: Bug Affects Versions: 3.1 Environment: Debian Lenny JAVA Version 1.6.0_20 Reporter: Thomas Gambier Priority: Blocker Hi, I'm a French user of Solr and I've encountered a problem since I installed Solr 3.1. I get an error with this query: cle_frbr:LYSROUGE1149-73190 The error is: HTTP ERROR 500 Problem accessing /solr/select. Reason: String index out of range: -1 java.lang.StringIndexOutOfBoundsException: String index out of range: -1 at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:797) at java.lang.StringBuilder.replace(StringBuilder.java:271) at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:131) at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:69) at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:179) at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:157) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) I tried escaping the minus character and the query worked: cle_frbr:LYSROUGE1149\-73190 But, strangely, if I change one letter in my query it works: cle_frbr:LASROUGE1149-73190 I tested the same query on Solr 1.4 and it works! Can someone run the query on the next line on a Solr 3.1 installation and tell me if they see the same problem? yourfield:LYSROUGE1149-73190 Where does the problem come from? Thank you in advance for your help. Tom
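The workaround the reporter found — backslash-escaping the minus — can be generalized. SolrJ ships a helper for this (ClientUtils.escapeQueryChars); the standalone sketch below is in that spirit but is our own simplified version, not the SolrJ source:

```java
// Minimal sketch (modeled on SolrJ's ClientUtils.escapeQueryChars, but not
// its actual source): backslash-escape characters that are special in the
// Lucene query syntax, such as the '-' that triggers the collation error.
public class QueryEscape {
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            // Characters with special meaning in the Lucene query syntax.
            if ("\\+-!():^[]\"{}~*?|&;/ ".indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("LYSROUGE1149-73190"));
    }
}
```

Escaping sidesteps the symptom, but the underlying bug is in SpellCheckCollator's offset arithmetic, not in the query itself.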
[jira] [Updated] (SOLR-2509) String index out of range: -1
[ https://issues.apache.org/jira/browse/SOLR-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Gambier updated SOLR-2509: - Description: Hi, I'm a french user of SOLR and i've encountered a problem since i've installed SOLR 3.1. I've got an error with this query : cle_frbr:LYSROUGE1149-73190 The error is : HTTP ERROR 500 Problem accessing /solr/select. Reason: String index out of range: -1 java.lang.StringIndexOutOfBoundsException: String index out of range: -1 at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:797) at java.lang.StringBuilder.replace(StringBuilder.java:271) at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:131) at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:69) at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:179) at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:157) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) I've tested to escape the minus char and the query worked : cle_frbr:LYSROUGE1149\-73190 But, strange fact, if i change one letter in my query it works : cle_frbr:LASROUGE1149-73190 I've tested the same query on SOLR 1.4 and it works ! Can someone test the query on next line on a 3.1 SOLR version and tell me if he have the same problem ? yourfield:LYSROUGE1149-73190 Where do the problem come from ? Thank you by advance for your help. Tom was: Hi, I'm a french user of SOLR and i've encountered a problem since i've installed SOLR 3.1. I've got an error with this query : cle_frbr:LYSROUGE1149-73190 The error is : HTTP ERROR 500 Problem accessing /solr/select. 
Reason: String index out of range: -1 java.lang.StringIndexOutOfBoundsException: String index out of range: -1 at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:797) at java.lang.StringBuilder.replace(StringBuilder.java:271) at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:131) at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:69) at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:179) at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:157) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at
[jira] [Commented] (SOLR-17) XSD for solr requests/responses
[ https://issues.apache.org/jira/browse/SOLR-17?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031847#comment-13031847 ] David Barnes commented on SOLR-17: -- Strongly second the comment from Bill Bell. We are working on a commercial project integrating Solr 3.1, and the lack of an XSD is making integration with our business service back end a royal pain. We do have XSDs from all other third parties we are integrating with. Using the Solr commons connector is not an option for us. XSD for solr requests/responses --- Key: SOLR-17 URL: https://issues.apache.org/jira/browse/SOLR-17 Project: Solr Issue Type: Improvement Reporter: Mike Baranczak Priority: Minor Attachments: SOLR-17.Mattmann.121709.patch.txt, UselessRequestHandler.java, solr-complex.xml, solr-rev2.xsd, solr.xsd Attaching an XML schema definition for the responses and the update requests. I needed to do this for myself anyway, so I might as well contribute it to the project. At the moment, I have no plans to write an XSD for the config documents, but it wouldn't be a bad idea. TODO: change the schema URL. I'm guessing that Apache already has some sort of naming convention for these?
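What an XSD buys an integrator is machine-checkable responses. The sketch below shows how a client could validate a Solr XML response against a schema using only the JDK's javax.xml.validation API; the tiny inline schema is hypothetical, standing in for the attached solr.xsd:

```java
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import java.io.StringReader;

// Sketch: validating an XML response against an XSD with the JDK's
// javax.xml.validation API. The one-element schema here is hypothetical,
// not the solr.xsd attached to the issue.
public class ValidateResponse {
    static boolean isValid(String xsd, String xml) {
        try {
            SchemaFactory f = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = f.newSchema(new StreamSource(new StringReader(xsd)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true; // no SAXException: document conforms to the schema
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String xsd = "<?xml version=\"1.0\"?>"
            + "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
            + "<xs:element name=\"response\"/></xs:schema>";
        System.out.println(isValid(xsd, "<response/>"));
        System.out.println(isValid(xsd, "<notAResponse/>"));
    }
}
```

This is exactly the kind of contract check that is impossible while the project publishes no official schema.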
Re: segfault in JCCEnv::deleteGlobalRef
On 11.05.2011 18:56, Christian Heimes wrote: On 11.05.2011 18:27, Andi Vajda wrote: Does it crash as easily with Python 2.6? If not, then that could be an answer as to why this wasn't noticed before. With 20 test samples, it seems like Python 2.6 survives 50% longer than Python 2.7. 100 samples: python2.6 cnt: 100, min: 0.886, max: 3.700, avg: 1.260 python2.7 cnt: 100, min: 0.888, max: 3.793, avg: 1.426
[jira] [Commented] (SOLR-2451) Add assertQScore() to SolrTestCaseJ4 to account for small deltas
[ https://issues.apache.org/jira/browse/SOLR-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031983#comment-13031983 ] David Smiley commented on SOLR-2451: Ok, I like it. Add assertQScore() to SolrTestCaseJ4 to account for small deltas - Key: SOLR-2451 URL: https://issues.apache.org/jira/browse/SOLR-2451 Project: Solr Issue Type: Improvement Affects Versions: 3.2 Reporter: David Smiley Priority: Minor Attachments: SOLR-2451.patch, SOLR-2451.patch, SOLR-2451_assertQScore.patch Attached is a patch that adds the following method to SolrTestCaseJ4 (just the javadoc and signature shown): {code:java} /** * Validates that the document at the specified index in the results has the specified score, within 0.0001. */ public static void assertQScore(SolrQueryRequest req, int docIdx, float targetScore) { {code} This is especially useful for geospatial, in which slightly different precision deltas might occur when different geospatial indexing strategies are used, assuming the score is some geospatial distance. This patch makes a simple modification to DistanceFunctionTest to use it.
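The motivation for assertQScore is generic: floating-point scores from different indexing strategies can differ in the last decimal places, so tests should compare with a tolerance rather than exact equality. A standalone sketch of the idea (not the patch itself, which works against Solr's response objects):

```java
// Sketch of the tolerance comparison behind assertQScore: accept any actual
// score within `delta` of the expected score, instead of requiring exact
// floating-point equality.
public class ScoreAssert {
    static boolean withinDelta(float expected, float actual, float delta) {
        return Math.abs(expected - actual) <= delta;
    }

    static void assertScore(float expected, float actual, float delta) {
        if (!withinDelta(expected, actual, delta)) {
            throw new AssertionError("expected " + expected + " +/- " + delta
                + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        assertScore(1.2345f, 1.23451f, 0.0001f); // passes: within tolerance
        System.out.println("ok");
    }
}
```

JUnit's assertEquals(float, float, float) overload does the same thing, which is presumably what the patch delegates to.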
Re: [Lucene.Net] Lucene.Net Hackathon (5/13-5/16)
Thanks Michael! One quick question -- the Wiki seems to be really locked down for public editing. That's kind of strange. Anyone should be able to log in and whip up a new page or edit an existing one, committer or otherwise. I didn't have access until just the other day, and Chris Currens doesn't have access now (I had to add him to the page manually). Can we open up the permissions on our wiki? Thanks, Troy On Wed, May 11, 2011 at 11:51 AM, Michael Herndon mhern...@wickedsoftware.net wrote: You never know. Personally I generally have most tech people on a list rather directly following them. But thanks. On Wed, May 11, 2011 at 2:43 PM, Wyatt Barnett wyatt.barn...@gmail.comwrote: Retweeted. Though I doubt any of the ~100 people following me aren't in the 36 following him . . . On 5/11/11 2:39 PM, Michael Herndon mhern...@wickedsoftware.net wrote: If any of you follow Hanselman on twitter, please take a second a retweet his on the lucene.net hackathon listed below or even send a thanks. Wanna get involved in Open Source? Why not help with the Lucene.NET HackAThon? http://hnsl.mn/lucenehackathon Cheers, - Michael On Mon, May 9, 2011 at 7:12 PM, Troy Howard thowar...@gmail.com wrote: Here's the wiki page: https://cwiki.apache.org/confluence/x/Go6OAQ Thanks, Troy On Mon, May 9, 2011 at 1:59 PM, Troy Howard thowar...@gmail.com wrote: Michael, That worked! I'm in the process of making a wiki page for the event now. Thanks, Troy On Mon, May 9, 2011 at 1:38 PM, Michael Herndon mhern...@wickedsoftware.net wrote: log out and log back in and verify permission changes. On Mon, May 9, 2011 at 4:22 PM, Troy Howard thowar...@gmail.com wrote: Re: I'm not sure if there is a coding difference between the C# stuff and the other directory stuff. There are a few minor code changes in the new branch vs the C# branch, but those are things like framework target, copyright notices, etc.. I didn't change code significantly, and unit tests still pass. 
Re: we can probably branch C# to something like pre_NewStructure I made a tag right before committing the directory changes for this exact purpose. It's here: https://svn.apache.org/repos/asf/incubator/lucene.net/tags/pre-layout-change Regarding the hackathon next week, I'd like to put together a list of tasks specifically for this weekend to give people some focus on where they can contribute. Some of these will be major tasks with high priority (like finishing up the 2.9.4 release) and others will be of lower priority like working on the samples/wiki/website... Those with great skills in creating GUI apps, but less skill writing back-end libraries, might want to contribute to Luke.Net, even if it's not a high priority. I agree with Michael that we should tweet/blog/wiki/mailing list the details of the event. I would make a wiki page on the topic, but it seems I don't have sufficient privileges on our Confluence wiki to do that. Can whoever the admin is give me rights to add/edit wiki pages? My login is 'thoward'. Thanks, Troy On Mon, May 9, 2011 at 1:15 AM, Prescott Nasser geobmx...@hotmail.com wrote: I think Troy has the structure ready to roll - I'm not sure if there is a coding difference between the C# stuff and the other directory stuff. If there isn't then we can probably branch C# to something like pre_NewStructure (someone help me with a better name), then remove it from the trunk. Troy I believe was investigating the legal task - perhaps he can update us if he ever got an answer. If you want to jump into a smaller task take a look at https://issues.apache.org/jira/browse/LUCENENET-372 (currently assigned to me). I updated a ton of the analyzers, but I believe them to be out of date from the java 2.9.4 branch because I used the attached files from Pasha without paying attention to the age of them. So those could use a review. I also never ported the test cases, which we definitely should have. 
Date: Mon, 9 May 2011 10:04:03 +0200 From: ma...@rotselleri.com To: lucene-net-...@lucene.apache.org Subject: Re: [Lucene.Net] Lucene.Net Hackathon (5/13-5/16) On Mon, May 9, 2011 at 1:12 AM, Prescott Nasser wrote: +1 to getting 2.9.4 ready to roll + the changes to the directory structure we have going +1 for 2.9.4 and directory structure. To make that happen, I'd like to know what needs to be done and in what way I could be of any help. There are 10 open issues for 2.9.4, and (apart from the Luke issues mentioned below)
Re: [Lucene.Net] Lucene.Net Hackathon (5/13-5/16)
Troy, Confluence admin is not my forte, but I can look at the privileges tonight and see if we change that. You and Prescott also have admin privileges as of right now. I'm pretty much giving all committers who have forwarded their username those privileges. I've also added a snippet to the page for people to e-mail me in the meantime if they are unable to edit the page to add to the table on the hack-a-thon page. (And there are some who may just not want to join yet another wiki). Do keep an eye out for spam once we elevate privileges. - Michael On Wed, May 11, 2011 at 3:37 PM, Troy Howard thowar...@gmail.com wrote: Thanks Michael! One quick question -- the Wiki seems to be really locked down for public editing. That's kind of strange. Anyone should be able to log in and whip up a new page or edit an existing one, committer or otherwise. I didn't have access until just the other day, and Chris Currens doesn't have access now (I had to add him to the page manually). Can we open up the permissions on our wiki? Thanks, Troy On Wed, May 11, 2011 at 11:51 AM, Michael Herndon mhern...@wickedsoftware.net wrote: You never know. Personally I generally have most tech people on a list rather directly following them. But thanks. On Wed, May 11, 2011 at 2:43 PM, Wyatt Barnett wyatt.barn...@gmail.com wrote: Retweeted. Though I doubt any of the ~100 people following me aren't in the 36 following him . . . On 5/11/11 2:39 PM, Michael Herndon mhern...@wickedsoftware.net wrote: If any of you follow Hanselman on twitter, please take a second a retweet his on the lucene.net hackathon listed below or even send a thanks. Wanna get involved in Open Source? Why not help with the Lucene.NET HackAThon? http://hnsl.mn/lucenehackathon Cheers, - Michael On Mon, May 9, 2011 at 7:12 PM, Troy Howard thowar...@gmail.com wrote: Here's the wiki page: https://cwiki.apache.org/confluence/x/Go6OAQ Thanks, Troy On Mon, May 9, 2011 at 1:59 PM, Troy Howard thowar...@gmail.com wrote: Michael, That worked! 
[jira] [Updated] (LUCENE-3084) MergePolicy.OneMerge.segments should be List&lt;SegmentInfo&gt; not SegmentInfos
[ https://issues.apache.org/jira/browse/LUCENE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3084: -- Attachment: LUCENE-3084-trunk-only.patch MergePolicy.OneMerge.segments should be List&lt;SegmentInfo&gt; not SegmentInfos -- Key: LUCENE-3084 URL: https://issues.apache.org/jira/browse/LUCENE-3084 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3084-trunk-only.patch, LUCENE-3084.patch SegmentInfos carries a bunch of fields beyond the list of SIs, but for merging purposes these fields are unused. We should cut over to List&lt;SegmentInfo&gt; instead.
[jira] [Reopened] (LUCENE-3084) MergePolicy.OneMerge.segments should be List&lt;SegmentInfo&gt; not SegmentInfos
[ https://issues.apache.org/jira/browse/LUCENE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reopened LUCENE-3084: --- Lucene Fields: [New, Patch Available] (was: [New]) After some discussion with Mike we decided to make some further API changes in 4.0: - No longer subclass java.util.Vector; use ArrayList instead - Rename SegmentInfos.range to cloneSubList() and let it also return List&lt;SegmentInfo&gt; - Make OneMerge's list unmodifiable to protect against changes in consumers of the MergeSpecification (this item should, in my opinion, also be backported to 3.x) I'll attach a simple patch. MergePolicy.OneMerge.segments should be List&lt;SegmentInfo&gt; not SegmentInfos -- Key: LUCENE-3084 URL: https://issues.apache.org/jira/browse/LUCENE-3084 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3084-trunk-only.patch, LUCENE-3084.patch SegmentInfos carries a bunch of fields beyond the list of SIs, but for merging purposes these fields are unused. We should cut over to List&lt;SegmentInfo&gt; instead.
[jira] [Commented] (LUCENE-3084) MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos
[ https://issues.apache.org/jira/browse/LUCENE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032032#comment-13032032 ] Uwe Schindler commented on LUCENE-3084: --- The above patch shows the problem with the current merge policy code: it seems that the list returned in OneMerge is sometimes modified; we should fix that (so the patch is not yet committable). MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos -- Key: LUCENE-3084 URL: https://issues.apache.org/jira/browse/LUCENE-3084 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3084-trunk-only.patch, LUCENE-3084.patch SegmentInfos carries a bunch of fields beyond the list of SI, but for merging purposes these fields are unused. We should cutover to List<SI> instead.
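The unmodifiable-list change Uwe describes can be sketched in plain Java. This is a hypothetical illustration (the class and field names are stand-ins, not the actual patch): the merge takes a defensive copy of the segment list and only ever hands out an unmodifiable view, so consumers of a MergeSpecification cannot mutate it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the discussed change: OneMerge snapshots the
// segment list and exposes only an unmodifiable view.
public class OneMergeSketch {
    private final List<String> segments; // stand-in for List<SegmentInfo>

    public OneMergeSketch(List<String> segments) {
        // Defensive copy + unmodifiable wrapper: neither later changes by
        // the caller nor writes by consumers can corrupt the merge.
        this.segments = Collections.unmodifiableList(new ArrayList<>(segments));
    }

    public List<String> getSegments() {
        return segments;
    }

    public static void main(String[] args) {
        List<String> sis = new ArrayList<>(List.of("_0", "_1", "_2"));
        OneMergeSketch merge = new OneMergeSketch(sis);
        sis.add("_3"); // mutating the original list does not affect the merge
        System.out.println(merge.getSegments().size()); // 3
        try {
            merge.getSegments().add("_4");
        } catch (UnsupportedOperationException e) {
            System.out.println("unmodifiable"); // writes by consumers fail fast
        }
    }
}
```

This fails fast at the point of the illegal write, which is exactly the kind of bug Uwe's comment says the test runs uncovered.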
Re: [Lucene.Net] Lucene.Net Hackathon (5/13-5/16)
No problem. I set up the permissions such that any user account can edit/add pages in the wiki. This should make things a lot easier on us. Thanks, Troy On Wed, May 11, 2011 at 12:50 PM, Michael Herndon mhern...@wickedsoftware.net wrote: Troy, Confluence admin is not my forte, but I can look at the privileges tonight and see if we can change that. You and Prescott also have admin privileges as of right now. I'm pretty much giving all committers who have forwarded their username those privileges. I've also added a snippet to the page for people to e-mail me in the meantime if they are unable to edit the page to add to the table on the hack-a-thon page. (And there are some who may just not want to join yet another wiki). Do keep an eye out for spam once we elevate privileges. - Michael On Wed, May 11, 2011 at 3:37 PM, Troy Howard thowar...@gmail.com wrote: Thanks Michael! One quick question -- the Wiki seems to be really locked down for public editing. That's kind of strange. Anyone should be able to log in and whip up a new page or edit an existing one, committer or otherwise. I didn't have access until just the other day, and Chris Currens doesn't have access now (I had to add him to the page manually). Can we open up the permissions on our wiki? Thanks, Troy On Wed, May 11, 2011 at 11:51 AM, Michael Herndon mhern...@wickedsoftware.net wrote: You never know. Personally I generally have most tech people on a list rather than directly following them. But thanks. On Wed, May 11, 2011 at 2:43 PM, Wyatt Barnett wyatt.barn...@gmail.com wrote: Retweeted. Though I doubt any of the ~100 people following me aren't in the 36 following him . . . On 5/11/11 2:39 PM, Michael Herndon mhern...@wickedsoftware.net wrote: If any of you follow Hanselman on twitter, please take a second and retweet his tweet on the lucene.net hackathon listed below, or even send a thanks. Wanna get involved in Open Source? Why not help with the Lucene.NET HackAThon?
http://hnsl.mn/lucenehackathon Cheers, - Michael On Mon, May 9, 2011 at 7:12 PM, Troy Howard thowar...@gmail.com wrote: Here's the wiki page: https://cwiki.apache.org/confluence/x/Go6OAQ Thanks, Troy On Mon, May 9, 2011 at 1:59 PM, Troy Howard thowar...@gmail.com wrote: Michael, That worked! I'm in the process of making a wiki page for the event now. Thanks, Troy On Mon, May 9, 2011 at 1:38 PM, Michael Herndon mhern...@wickedsoftware.net wrote: log out and log back in and verify permission changes. On Mon, May 9, 2011 at 4:22 PM, Troy Howard thowar...@gmail.com wrote: Re: I'm not sure if there is a coding difference between the C# stuff and the other directory stuff. There are a few minor code changes in the new branch vs the C# branch, but those are things like framework target, copyright notices, etc. I didn't change code significantly, and unit tests still pass. Re: we can probably branch C# to something like pre_NewStructure I made a tag right before committing the directory changes for this exact purpose. It's here: https://svn.apache.org/repos/asf/incubator/lucene.net/tags/pre-layout-change Regarding the hackathon next week, I'd like to put together a list of tasks specifically for this weekend to give people some focus on where they can contribute. Some of these will be major tasks with high priority (like finishing up the 2.9.4 release) and others will be of lower priority, like working on the samples/wiki/website... Those with great skills in creating GUI apps, but less skill writing back-end libraries, might want to contribute to Luke.Net, even if it's not a high priority. I agree with Michael that we should tweet/blog/wiki/mailing list the details of the event. I would make a wiki page on the topic, but it seems I don't have sufficient privileges on our Confluence wiki to do that. Can whoever the admin is give me rights to add/edit wiki pages? My login is 'thoward'.
Thanks, Troy On Mon, May 9, 2011 at 1:15 AM, Prescott Nasser geobmx...@hotmail.com wrote: I think Troy has the structure ready to roll - I'm not sure if there is a coding difference between the C# stuff and the other directory stuff. If there isn't then we can probably branch C# to something like pre_NewStructure (someone help me with a better name), then remove it from the trunk. Troy I believe was investigating the legal task - perhaps he can update us if he ever got an answer If you want to jump into a smaller task take a look at
[jira] [Commented] (LUCENE-3084) MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos
[ https://issues.apache.org/jira/browse/LUCENE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032046#comment-13032046 ] Earwin Burrfoot commented on LUCENE-3084: - * Speaking logically, merges operate on Sets of SIs, not Lists? * Let's stop subclassing random things? : ) SIS can contain a List of SIs (and maybe a Set, or whatever we need in the future), and only expose operations its clients really need. MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos -- Key: LUCENE-3084 URL: https://issues.apache.org/jira/browse/LUCENE-3084 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3084-trunk-only.patch, LUCENE-3084.patch SegmentInfos carries a bunch of fields beyond the list of SI, but for merging purposes these fields are unused. We should cutover to List<SI> instead.
[jira] [Commented] (LUCENE-1421) Ability to group search results by field
[ https://issues.apache.org/jira/browse/LUCENE-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032082#comment-13032082 ] Martijn van Groningen commented on LUCENE-1421: --- Nice work Michael! I also think that the two-pass mechanism is definitely the preferred way to go. I think we also need a strategy mechanism (or at least a GroupCollector class hierarchy) inside this module. The mechanism should select the right group collector(s) for a certain request. Some users maybe only care about the top group document, so a second pass won't be necessary. Another example, with faceting in mind: when group-based faceting is necessary, the top N groups don't suffice. You'll need all group docs (I currently don't see another way). These group docs are then used to create a grouped Solr DocSet. But this should be a completely different implementation. Ability to group search results by field Key: LUCENE-1421 URL: https://issues.apache.org/jira/browse/LUCENE-1421 Project: Lucene - Java Issue Type: New Feature Components: Search Reporter: Artyom Sokolov Assignee: Michael McCandless Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-1421.patch, lucene-grouping.patch It would be awesome to group search results by specified field. Some functionality was provided for Apache Solr but I think it should be done in Core Lucene. There could be some useful information like total hits about collapsed data like total count and so on. Thanks, Artyom
[jira] [Created] (SOLR-2510) Proximity search is not symmetric
Proximity search is not symmetric - Key: SOLR-2510 URL: https://issues.apache.org/jira/browse/SOLR-2510 Project: Solr Issue Type: Bug Components: search, web gui Affects Versions: 3.1 Environment: Ubuntu 10.04 Reporter: mark risher The proximity search is incorrect on words occurring *before* the matching term. It matches documents that are less-than N words before and less-than-or-equal-to N words after. For example, use the following document: WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G Expected result: Both of the following queries should match: 1) WORD_D WORD_G~3 2) WORD_D WORD_A~3 Actual result: Only #1 matches.
[jira] [Updated] (SOLR-2510) Proximity search is not symmetric
[ https://issues.apache.org/jira/browse/SOLR-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mark risher updated SOLR-2510: -- Description: The proximity search is incorrect on words occurring *before* the matching term. It matches documents that are _less-than_ N words before and _less-than-or-equal-to_ N words after. For example, use the following document: WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G Expected result: Both of the following queries should match: 1) WORD_D WORD_G~3 2) WORD_D WORD_A~3 Actual result: Only #1 matches. was: The proximity search is incorrect on words occurring *before* the matching term. It matches documents that are less-than N words before and less-than-or-equal-to N words after. For example, use the following document: WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G Expected result: Both of the following queries should match: 1) WORD_D WORD_G~3 2) WORD_D WORD_A~3 Actual result: Only #1 matches. Proximity search is not symmetric - Key: SOLR-2510 URL: https://issues.apache.org/jira/browse/SOLR-2510 Project: Solr Issue Type: Bug Components: search, web gui Affects Versions: 3.1 Environment: Ubuntu 10.04 Reporter: mark risher The proximity search is incorrect on words occurring *before* the matching term. It matches documents that are _less-than_ N words before and _less-than-or-equal-to_ N words after. For example, use the following document: WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G Expected result: Both of the following queries should match: 1) WORD_D WORD_G~3 2) WORD_D WORD_A~3 Actual result: Only #1 matches. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2510) Proximity search is not symmetric
[ https://issues.apache.org/jira/browse/SOLR-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mark risher updated SOLR-2510: -- Description: The proximity search is incorrect on words occurring *before* the matching term. It matches documents that are _less-than_ N words before and _less-than-or-equal-to_ N words after. For example, use the following document: {{WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G}} __Expected result:__ Both of the following queries should match: 1) {{WORD_D WORD_G~3}} 2) {{WORD_D WORD_A~3}} __Actual result:__ Only #1 matches. was: The proximity search is incorrect on words occurring *before* the matching term. It matches documents that are _less-than_ N words before and _less-than-or-equal-to_ N words after. For example, use the following document: {{ WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G}} __Expected result:__ Both of the following queries should match: 1) {{WORD_D WORD_G~3}} 2) {{WORD_D WORD_A~3}} __Actual result:__ Only #1 matches. Proximity search is not symmetric - Key: SOLR-2510 URL: https://issues.apache.org/jira/browse/SOLR-2510 Project: Solr Issue Type: Bug Components: search, web gui Affects Versions: 3.1 Environment: Ubuntu 10.04 Reporter: mark risher The proximity search is incorrect on words occurring *before* the matching term. It matches documents that are _less-than_ N words before and _less-than-or-equal-to_ N words after. For example, use the following document: {{WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G}} __Expected result:__ Both of the following queries should match: 1) {{WORD_D WORD_G~3}} 2) {{WORD_D WORD_A~3}} __Actual result:__ Only #1 matches. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2510) Proximity search is not symmetric
[ https://issues.apache.org/jira/browse/SOLR-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mark risher updated SOLR-2510: -- Description: The proximity search is incorrect on words occurring *before* the matching term. It matches documents that are _less-than_ N words before and _less-than-or-equal-to_ N words after. For example, use the following document: {{WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G}} *Expected result:* Both of the following queries should match: 1) {{WORD_D WORD_G~3}} 2) {{WORD_D WORD_A~3}} *Actual result:* Only #1 matches. was: The proximity search is incorrect on words occurring *before* the matching term. It matches documents that are _less-than_ N words before and _less-than-or-equal-to_ N words after. For example, use the following document: {{WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G}} __Expected result:__ Both of the following queries should match: 1) {{WORD_D WORD_G~3}} 2) {{WORD_D WORD_A~3}} __Actual result:__ Only #1 matches. Proximity search is not symmetric - Key: SOLR-2510 URL: https://issues.apache.org/jira/browse/SOLR-2510 Project: Solr Issue Type: Bug Components: search, web gui Affects Versions: 3.1 Environment: Ubuntu 10.04 Reporter: mark risher The proximity search is incorrect on words occurring *before* the matching term. It matches documents that are _less-than_ N words before and _less-than-or-equal-to_ N words after. For example, use the following document: {{WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G}} *Expected result:* Both of the following queries should match: 1) {{WORD_D WORD_G~3}} 2) {{WORD_D WORD_A~3}} *Actual result:* Only #1 matches. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2510) Proximity search is not symmetric
[ https://issues.apache.org/jira/browse/SOLR-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mark risher updated SOLR-2510: -- Description: The proximity search is incorrect on words occurring *before* the matching term. It matches documents that are _less-than_ N words before and _less-than-or-equal-to_ N words after. For example, use the following document: {{ WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G}} __Expected result:__ Both of the following queries should match: 1) {{WORD_D WORD_G~3}} 2) {{WORD_D WORD_A~3}} __Actual result:__ Only #1 matches. was: The proximity search is incorrect on words occurring *before* the matching term. It matches documents that are _less-than_ N words before and _less-than-or-equal-to_ N words after. For example, use the following document: WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G Expected result: Both of the following queries should match: 1) WORD_D WORD_G~3 2) WORD_D WORD_A~3 Actual result: Only #1 matches. Proximity search is not symmetric - Key: SOLR-2510 URL: https://issues.apache.org/jira/browse/SOLR-2510 Project: Solr Issue Type: Bug Components: search, web gui Affects Versions: 3.1 Environment: Ubuntu 10.04 Reporter: mark risher The proximity search is incorrect on words occurring *before* the matching term. It matches documents that are _less-than_ N words before and _less-than-or-equal-to_ N words after. For example, use the following document: {{ WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G}} __Expected result:__ Both of the following queries should match: 1) {{WORD_D WORD_G~3}} 2) {{WORD_D WORD_A~3}} __Actual result:__ Only #1 matches. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3084) MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos
[ https://issues.apache.org/jira/browse/LUCENE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032099#comment-13032099 ] Earwin Burrfoot commented on LUCENE-3084: - bq. Merges are ordered Hmm.. Why should they be? bq. SegmentInfos itself must be list It may contain a list as a field instead. And have a much cleaner API as a consequence. On another note, I wonder: is the fact that Vector is internally synchronized used somewhere within SegmentInfos client code? MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos -- Key: LUCENE-3084 URL: https://issues.apache.org/jira/browse/LUCENE-3084 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, LUCENE-3084.patch SegmentInfos carries a bunch of fields beyond the list of SI, but for merging purposes these fields are unused. We should cutover to List<SI> instead.
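The composition-over-inheritance shape Earwin argues for can be sketched with plain JDK collections. All names here are illustrative stand-ins, not the actual SegmentInfos API: the class holds a private ArrayList and exposes only the operations its clients need, including an analogue of the proposed cloneSubList().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: instead of extending Vector, hold a list internally
// and expose a narrow, deliberate API.
public class SegmentInfosSketch {
    private final List<String> infos = new ArrayList<>(); // stand-in for SegmentInfo entries

    public void add(String si) { infos.add(si); }
    public int size() { return infos.size(); }

    // Read-only view for merge policies and other consumers.
    public List<String> asList() { return Collections.unmodifiableList(infos); }

    // Analogue of the proposed cloneSubList(): an independent copy of a range,
    // so callers can't accidentally mutate the original through a subList view.
    public List<String> cloneSubList(int from, int to) {
        return new ArrayList<>(infos.subList(from, to));
    }

    public static void main(String[] args) {
        SegmentInfosSketch sis = new SegmentInfosSketch();
        sis.add("_0"); sis.add("_1"); sis.add("_2");
        List<String> sub = sis.cloneSubList(0, 2);
        sub.add("_9"); // modifying the clone does not touch the original
        System.out.println(sis.size()); // 3
    }
}
```

Note this also answers the Vector-synchronization question by omission: if clients need synchronization, it can be added deliberately at the call site rather than inherited invisibly.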
[jira] [Assigned] (SOLR-2511) Make it easier to override SolrContentHandler newDocument
[ https://issues.apache.org/jira/browse/SOLR-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reassigned SOLR-2511: - Assignee: Grant Ingersoll Make it easier to override SolrContentHandler newDocument - Key: SOLR-2511 URL: https://issues.apache.org/jira/browse/SOLR-2511 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor The SolrContentHandler's newDocument method does a variety of things: adds metadata, literals, content and captured content. We could split this out into protected methods for each, making it easier to override.
[jira] [Created] (SOLR-2511) Make it easier to override SolrContentHandler newDocument
Make it easier to override SolrContentHandler newDocument - Key: SOLR-2511 URL: https://issues.apache.org/jira/browse/SOLR-2511 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Priority: Minor The SolrContentHandler's newDocument method does a variety of things: adds metadata, literals, content and captured content. We could split this out into protected methods for each, making it easier to override.
[jira] [Commented] (LUCENE-3084) MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos
[ https://issues.apache.org/jira/browse/LUCENE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032137#comment-13032137 ] Michael McCandless commented on LUCENE-3084: I would love to cutover to Set<SI>, but I don't think we can. There are apps out there that want merges to remain contiguous (so docIDs keep their monotonicity). But I do think we should not keep that by default (I reopened LUCENE-1076 to switch to TieredMP in 3.x by default). MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos -- Key: LUCENE-3084 URL: https://issues.apache.org/jira/browse/LUCENE-3084 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, LUCENE-3084.patch SegmentInfos carries a bunch of fields beyond the list of SI, but for merging purposes these fields are unused. We should cutover to List<SI> instead.
[jira] [Commented] (LUCENE-3084) MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos
[ https://issues.apache.org/jira/browse/LUCENE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032139#comment-13032139 ] Michael McCandless commented on LUCENE-3084: Patch looks good -- thanks Uwe! MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos -- Key: LUCENE-3084 URL: https://issues.apache.org/jira/browse/LUCENE-3084 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, LUCENE-3084.patch SegmentInfos carries a bunch of fields beyond the list of SI, but for merging purposes these fields are unused. We should cutover to List<SI> instead.
[jira] [Updated] (LUCENE-1421) Ability to group search results by field
[ https://issues.apache.org/jira/browse/LUCENE-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1421: --- Attachment: LUCENE-1421.patch Patch w/ next iteration... I beefed up the overview.html, added test case coverage of null groupValue. I think it's ready to commit and then back-port to 3.x! Ability to group search results by field Key: LUCENE-1421 URL: https://issues.apache.org/jira/browse/LUCENE-1421 Project: Lucene - Java Issue Type: New Feature Components: Search Reporter: Artyom Sokolov Assignee: Michael McCandless Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-1421.patch, LUCENE-1421.patch, lucene-grouping.patch It would be awesome to group search results by specified field. Some functionality was provided for Apache Solr but I think it should be done in Core Lucene. There could be some useful information like total hits about collapsed data like total count and so on. Thanks, Artyom
[jira] [Commented] (LUCENE-1421) Ability to group search results by field
[ https://issues.apache.org/jira/browse/LUCENE-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032145#comment-13032145 ] Michael McCandless commented on LUCENE-1421: {quote} I think we also need a strategy mechanism (or at least a GroupCollector class hierarchy) inside this module. The mechanism should select the right group collector(s) for a certain request. Some users maybe only care about the top group document, so a second pass won't be necessary. Another example, with faceting in mind: when group-based faceting is necessary, the top N groups don't suffice. You'll need all group docs (I currently don't see another way). These group docs are then used to create a grouped Solr DocSet. But this should be a completely different implementation. {quote} I agree, there's much more we could do here! Specialized collection for the maxDocsPerGroup=1 case, and for the "I want all groups" case, would be nice. For the "not many unique values in the group field" case we could do a single-pass collector, I think. Grouping by a multi-valued field should be possible (we now have DocTermOrds in Lucene, but it doesn't load the term byte[] data), as well as support for sharding, ie, by merging top groups and docs w/in each group (but I think we need an addition to the FieldComparator API for this). I think we should commit this starting point, today, and then iterate from there... Martijn, thank you for persisting for so long on SOLR-236! We are finally getting grouping functionality accessible from Lucene and Solr... Ability to group search results by field Key: LUCENE-1421 URL: https://issues.apache.org/jira/browse/LUCENE-1421 Project: Lucene - Java Issue Type: New Feature Components: Search Reporter: Artyom Sokolov Assignee: Michael McCandless Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-1421.patch, LUCENE-1421.patch, lucene-grouping.patch It would be awesome to group search results by specified field.
Some functionality was provided for Apache Solr but I think it should be done in Core Lucene. There could be some useful information like total hits about collapsed data like total count and so on. Thanks, Artyom
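The two-pass mechanism Martijn and Michael discuss can be sketched with plain collections. This is a hedged illustration of the idea only (names like `groupTopDocs` are hypothetical, not the patch's API): pass 1 finds the top-N group keys by best score, pass 2 collects the top documents within those groups.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative two-pass grouping over an in-memory hit list.
public class TwoPassGroupingSketch {
    public record Doc(int id, String group, float score) {}

    public static Map<String, List<Doc>> groupTopDocs(List<Doc> hits, int topNGroups, int maxDocsPerGroup) {
        // Pass 1: best score per group, then pick the top-N groups.
        Map<String, Float> best = new HashMap<>();
        for (Doc d : hits) best.merge(d.group(), d.score(), Math::max);
        List<String> topGroups = best.entrySet().stream()
            .sorted((a, b) -> Float.compare(b.getValue(), a.getValue()))
            .limit(topNGroups).map(Map.Entry::getKey).toList();

        // Pass 2: collect up to maxDocsPerGroup docs for each selected group.
        Map<String, List<Doc>> result = new LinkedHashMap<>();
        for (String g : topGroups) result.put(g, new ArrayList<>());
        for (Doc d : hits) {
            List<Doc> bucket = result.get(d.group());
            if (bucket != null && bucket.size() < maxDocsPerGroup) bucket.add(d);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Doc> hits = List.of(new Doc(1, "a", 0.9f), new Doc(2, "a", 0.5f),
                                 new Doc(3, "b", 0.8f), new Doc(4, "c", 0.1f));
        System.out.println(groupTopDocs(hits, 2, 1).keySet()); // [a, b]
    }
}
```

In real Lucene the two passes are separate Collectors over the index rather than loops over a list, which is why the "all groups" faceting case Martijn mentions needs a different implementation entirely.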
[jira] [Resolved] (LUCENE-3086) add ElisionsFilter to ItalianAnalyzer
[ https://issues.apache.org/jira/browse/LUCENE-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3086. - Resolution: Fixed Committed revision 1102120, 1102127 add ElisionsFilter to ItalianAnalyzer - Key: LUCENE-3086 URL: https://issues.apache.org/jira/browse/LUCENE-3086 Project: Lucene - Java Issue Type: Improvement Reporter: Robert Muir Fix For: 3.2, 4.0 Attachments: LUCENE-3086.patch we set this up for french by default, but we don't for italian. we should enable it with the standard italian contractions (e.g. definite articles). the various stemmers for these languages assume this is already being taken care of and don't do anything about it... in general things like snowball assume really dumb tokenization, that you will split on the word-internal ', and they add these to stoplists.
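What the elision filter does can be shown with a minimal string-level sketch. This is an illustration only: the real filter operates on a TokenStream, and the article set below is a small assumed sample, not the committed default list.

```java
import java.util.Set;

// Minimal sketch of elision handling for Italian: strip a leading elided
// article before the apostrophe, e.g. "dell'arte" -> "arte".
public class ElisionSketch {
    // A few common Italian contractions (assumed sample; the real default set is larger).
    static final Set<String> ARTICLES = Set.of("l", "un", "dell", "all", "dall", "nell", "sull");

    public static String stripElision(String token) {
        int apos = token.indexOf('\'');
        if (apos >= 0 && ARTICLES.contains(token.substring(0, apos).toLowerCase()))
            return token.substring(apos + 1); // keep only the word after the apostrophe
        return token;
    }

    public static void main(String[] args) {
        System.out.println(stripElision("dell'arte")); // arte
        System.out.println(stripElision("casa"));      // casa
    }
}
```

This is exactly the normalization the downstream stemmers assume has already happened; without it, "dell'arte" would reach the stemmer as a single unstemmed token.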
[jira] [Updated] (LUCENE-3064) add checks to MockTokenizer to enforce proper consumption
[ https://issues.apache.org/jira/browse/LUCENE-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3064: Attachment: LUCENE-3064.patch Updated patch: I think this is ready to commit. I added a boolean to allow the workflow checks to be disabled in very exceptional cases (e.g. TestIndexWriterExceptions's CrashingTokenFilter), so in general we can do pretty good checking. add checks to MockTokenizer to enforce proper consumption - Key: LUCENE-3064 URL: https://issues.apache.org/jira/browse/LUCENE-3064 Project: Lucene - Java Issue Type: Test Reporter: Robert Muir Fix For: 4.0 Attachments: LUCENE-3064.patch, LUCENE-3064.patch, LUCENE-3064.patch we can enforce things like a consumer properly iterating through the TokenStream lifecycle via MockTokenizer. this could catch bugs in consumers that don't call reset(), etc.
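The workflow check being added can be pictured as a small state machine. The sketch below is a simplified illustration (states and names are mine, not MockTokenizer's): a consumer must call reset() before incrementToken(), and end() only after incrementToken() has returned false.

```java
// Illustrative state machine for the TokenStream consumption contract:
// reset() -> incrementToken()* -> (incrementToken() returns false) -> end().
public class TokenStreamStateSketch {
    enum State { SETREADER, RESET, INCREMENT, INCREMENT_FALSE, END }
    private State state = State.SETREADER;
    private int remainingTokens;

    public TokenStreamStateSketch(int tokens) { this.remainingTokens = tokens; }

    public void reset() {
        if (state != State.SETREADER) throw new IllegalStateException("reset() called out of order");
        state = State.RESET;
    }

    public boolean incrementToken() {
        if (state != State.RESET && state != State.INCREMENT)
            throw new IllegalStateException("incrementToken() without reset()");
        if (remainingTokens-- > 0) { state = State.INCREMENT; return true; }
        state = State.INCREMENT_FALSE;
        return false;
    }

    public void end() {
        if (state != State.INCREMENT_FALSE)
            throw new IllegalStateException("end() before incrementToken() returned false");
        state = State.END;
    }

    public static void main(String[] args) {
        TokenStreamStateSketch ts = new TokenStreamStateSketch(2);
        ts.reset();
        while (ts.incrementToken()) { /* consume token attributes here */ }
        ts.end(); // correct workflow completes silently
        try {
            new TokenStreamStateSketch(1).incrementToken(); // bug: no reset()
        } catch (IllegalStateException e) {
            System.out.println("caught missing reset()");
        }
    }
}
```

A broken consumer fails immediately with a descriptive exception instead of silently producing wrong tokens, which is the point of doing this in a mock used by the test suite.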
[jira] [Commented] (SOLR-2445) unknown handler: standard
[ https://issues.apache.org/jira/browse/SOLR-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032216#comment-13032216 ] Gabriele Kahlout commented on SOLR-2445: I've attached a trivial patch that just modifies the form.jsp (useful for scripts). unknown handler: standard - Key: SOLR-2445 URL: https://issues.apache.org/jira/browse/SOLR-2445 Project: Solr Issue Type: Bug Affects Versions: 1.4.1, 3.1, 3.2, 4.0 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 3.2, 4.0 Attachments: SOLR-2445.patch To reproduce the problem using the example config, go to form.jsp, use standard for qt (it is the default), then click Search.
[jira] [Updated] (SOLR-2445) unknown handler: standard
[ https://issues.apache.org/jira/browse/SOLR-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriele Kahlout updated SOLR-2445: --- Attachment: qt-form-jsp.patch A trivial patch to form.jsp that leaves qt empty (useful for setup scripts and those that need to stick to a 3.1.0 revision). unknown handler: standard - Key: SOLR-2445 URL: https://issues.apache.org/jira/browse/SOLR-2445 Project: Solr Issue Type: Bug Affects Versions: 1.4.1, 3.1, 3.2, 4.0 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 3.2, 4.0 Attachments: SOLR-2445.patch, qt-form-jsp.patch To reproduce the problem using the example config, go to form.jsp, use standard for qt (it is the default), then click Search.
[jira] [Updated] (SOLR-2510) Proximity search is not symmetric
[ https://issues.apache.org/jira/browse/SOLR-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mark risher updated SOLR-2510:
------------------------------

    Description:

The proximity search is incorrect on words occurring *before* the matching term. It matches documents that are _less-than_ N words before and _less-than-or-equal-to_ N words after.

For example, use the following document:

{{WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G}}

*Expected result:* Both of the following queries should match:
1) {{WORD_D WORD_G~3}}
2) {{WORD_G WORD_D~3}}

*Actual result:* Only #1 matches. For some reason, it thinks the distance from D to G is 3, but from G to D is 4.

  was:

The proximity search is incorrect on words occurring *before* the matching term. It matches documents that are _less-than_ N words before and _less-than-or-equal-to_ N words after.

For example, use the following document:

{{WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G}}

*Expected result:* Both of the following queries should match:
1) {{WORD_D WORD_G~3}}
2) {{WORD_D WORD_A~3}}

*Actual result:* Only #1 matches.

> Proximity search is not symmetric
> ---------------------------------
>
>                 Key: SOLR-2510
>                 URL: https://issues.apache.org/jira/browse/SOLR-2510
>             Project: Solr
>          Issue Type: Bug
>          Components: search, web gui
>    Affects Versions: 3.1
>         Environment: Ubuntu 10.04
>            Reporter: mark risher
>
> The proximity search is incorrect on words occurring *before* the matching term. It matches documents that are _less-than_ N words before and _less-than-or-equal-to_ N words after.
> For example, use the following document:
> {{WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G}}
> *Expected result:* Both of the following queries should match:
> 1) {{WORD_D WORD_G~3}}
> 2) {{WORD_G WORD_D~3}}
> *Actual result:* Only #1 matches. For some reason, it thinks the distance from D to G is 3, but from G to D is 4.
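The asymmetry reported above can be modeled in a few lines. This is not Lucene's actual sloppy-phrase scoring code; it only encodes the distances the reporter describes (positions A..G = 0..6, so D=3 and G=6), with the reversed-order query charged one extra move:

```java
// Model of the reported behavior, not Lucene's SloppyPhraseScorer.
public class ProximityAsymmetry {
    // Distance between the two query terms as the report describes it:
    // the position gap when the document order matches the query order,
    // plus one extra "move" when the document order is reversed.
    static int reportedDistance(int posFirstQueryTerm, int posSecondQueryTerm) {
        int gap = posSecondQueryTerm - posFirstQueryTerm;
        return gap > 0 ? gap : -gap + 1; // reversed order costs one more
    }

    public static void main(String[] args) {
        int d = 3, g = 6; // WORD_D and WORD_G in the example document
        System.out.println(reportedDistance(d, g)); // query "WORD_D WORD_G": 3
        System.out.println(reportedDistance(g, d)); // query "WORD_G WORD_D": 4
        // With ~3, only the first query matches: 3 <= 3, but 4 > 3.
    }
}
```

Under this model the ~3 query matches in one direction only, which is exactly the reported symptom.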
[jira] [Commented] (LUCENE-1421) Ability to group search results by field
[ https://issues.apache.org/jira/browse/LUCENE-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032238#comment-13032238 ]

Bill Bell commented on LUCENE-1421:
-----------------------------------

Say we have 4 documents:

docid=1 hgid=1 age=10
docid=2 hgid=1 age=10
docid=3 hgid=2 age=12
docid=4 hgid=4 age=11

If we group by hgid, we would get:

hgid=1
  docid=1 hgid=1 age=10
  docid=2 hgid=1 age=10
hgid=2
  docid=3 hgid=2 age=12
hgid=4
  docid=4 hgid=4 age=11

If I set Facet Counts = POST:

age: 10 (1 document)
age: 11 (1 document)
age: 12 (1 document)

If I set Facet Counts = PRE:

age: 10 (2 documents)
age: 11 (1 document)
age: 12 (1 document)

The only way grouping works in Solr now is Facet Counts = PRE.

Thanks.

> Ability to group search results by field
> ----------------------------------------
>
>                 Key: LUCENE-1421
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1421
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>            Reporter: Artyom Sokolov
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>         Attachments: LUCENE-1421.patch, LUCENE-1421.patch, lucene-grouping.patch
>
> It would be awesome to group search results by a specified field. Some functionality was provided for Apache Solr, but I think it should be done in core Lucene. There could be some useful information about the collapsed data, like total hit count and so on.
> Thanks,
> Artyom
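The PRE/POST distinction in the comment above can be sketched directly from its four example documents: PRE facets over all matching documents, POST keeps one representative document per group and facets over those. The helpers below are illustrative, not Solr's grouping or faceting code:

```java
import java.util.*;

// Sketch of pre- vs post-grouping facet counts on the example docs.
public class GroupFacetSketch {
    static class Doc {
        final int docid, hgid, age;
        Doc(int docid, int hgid, int age) {
            this.docid = docid; this.hgid = hgid; this.age = age;
        }
    }

    // Count documents per distinct age value (the "age" facet).
    static Map<Integer, Integer> facetAge(Collection<Doc> docs) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Doc d : docs) counts.merge(d.age, 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
            new Doc(1, 1, 10), new Doc(2, 1, 10),
            new Doc(3, 2, 12), new Doc(4, 4, 11));

        // PRE: facet over all matching documents, before grouping.
        System.out.println("PRE=" + facetAge(docs));  // {10=2, 11=1, 12=1}

        // POST: keep one representative document per hgid, then facet.
        Map<Integer, Doc> groups = new LinkedHashMap<>();
        for (Doc d : docs) groups.putIfAbsent(d.hgid, d);
        System.out.println("POST=" + facetAge(groups.values())); // {10=1, 11=1, 12=1}
    }
}
```

This reproduces the counts in the comment: PRE gives age 10 two documents; POST gives it one, because the two age-10 docs collapse into the hgid=1 group.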
[jira] [Commented] (SOLR-2444) Update fl syntax to support: pseudo fields, AS, transformers, and wildcards
[ https://issues.apache.org/jira/browse/SOLR-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032239#comment-13032239 ]

Koji Sekiguchi commented on SOLR-2444:
--------------------------------------

Does this issue cover wildcard syntax like fl=*_s ? Because SOLR-2503 has been committed, I want the wildcard syntax for fl.

fl=*_s

{code}
<doc>
  <str name="PERSON_S">Barack Obama</str>
  <str name="TITLE_S">the President</str>
</doc>
{code}

> Update fl syntax to support: pseudo fields, AS, transformers, and wildcards
> ---------------------------------------------------------------------------
>
>                 Key: SOLR-2444
>                 URL: https://issues.apache.org/jira/browse/SOLR-2444
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Ryan McKinley
>         Attachments: SOLR-2444-fl-parsing.patch, SOLR-2444-fl-parsing.patch
>
> The ReturnFields parsing needs to be improved. It should also support wildcards.
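A sketch of what the requested wildcard fl matching could do: keep only the stored fields whose names match the glob. The field names below and the glob-to-regex translation (only `*` is handled) are assumptions for illustration, not Solr's actual ReturnFields implementation:

```java
import java.util.*;
import java.util.regex.Pattern;

// Illustrative wildcard field selection for an fl value like "*_s".
public class FlWildcardSketch {
    static List<String> select(String glob, List<String> fieldNames) {
        // Translate the glob into a regex; only "*" is supported here.
        Pattern p = Pattern.compile(glob.replace("*", ".*"));
        List<String> kept = new ArrayList<>();
        for (String name : fieldNames)
            if (p.matcher(name).matches()) kept.add(name);
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical stored field names for a document.
        List<String> stored = List.of("id", "person_s", "title_s", "score");
        System.out.println(select("*_s", stored)); // [person_s, title_s]
    }
}
```

With fl=*_s, only the two `_s` fields would be returned, matching the behavior asked about in the comment.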
[JENKINS] Lucene-trunk - Build # 1559 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-trunk/1559/

1 tests failed.
FAILED:  org.apache.lucene.index.TestNRTThreads.testNRTThreads

Error Message:
this writer hit an OutOfMemoryError; cannot commit

Stack Trace:
java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
	at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2456)
	at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2538)
	at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2520)
	at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2504)
	at org.apache.lucene.index.TestNRTThreads.testNRTThreads(TestNRTThreads.java:223)
	at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282)
	at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211)

Build Log (for compile errors):
[...truncated 11983 lines...]
Re: Small issue in queryparser // ParametricRangeQueryNode.java
Hi Karsten,

Sorry for taking so long to reply. I am still not 100% sure what behavior you expect exactly from the ParametricRangeQueryNode constructor. To help solve our understanding problem, I created a simple JUnit test (attached) that tests the behavior I expect. Please go ahead and change it the way you expect ;)

Please, on your next replies, copy the Lucene dev mailing list; they might help with your questions as well.

Best Regards,
Adriano Crestani

On Mon, May 9, 2011 at 7:05 AM, karsten-s...@gmx.de wrote:

  Hi Adriano,

  At the moment, ParametricRangeQueryNode(lowerBound, upperBound) works only if both parameters have the same instance as field name (==). If it is only the same text (equals), the IllegalArgumentException is thrown.

  Why a contradiction: because upperBound == lowerBound (and lowerBound != null) implies upperBound.equals(lowerBound).

  My suggestion and upper.getField() is NULL: in this case the IllegalArgumentException would be thrown (and also for lower.getField() is NULL).

  Best regards,
  Karsten

  Datum: Sun, 8 May 2011 16:24:48 -0400
  Von: Adriano Crestani adrianocrest...@gmail.com
  Subject: Re: Small issue in queryparser // ParametricRangeQueryNode.java

  Hi Karsten,

  No, AFAIK no one is working on such a feature; feel free to work on it. I am sure there are many people waiting for such a feature :)

  Now, about the contradiction you mentioned below, I can't see it in the code:

    because (upperBound == lowerBound && !upperBound.getField().equals(lowerBound.getField())) is a contradiction

  Can you explain more about this problem you see in the code? Also, the condition you think should be the constraint does not make sense to me. The constraint asserts that the upper and lower bounds have the same field name, correct?! The condition you proposed below would not throw an exception if upper.getField() is NULL and lower is something else, which is wrong; an exception should be thrown, since the field names are different.

    most possibly it should be

    if (upperBound.getField() == null
        || (upperBound.getField() != lowerBound.getField()
            && !upperBound.getField().equals(lowerBound.getField()))) {
      throw new IllegalArgumentException( ...

  Am I missing something?

  Best Regards,
  Adriano Crestani

  On Sun, May 8, 2011 at 1:00 PM, karsten-s...@gmx.de wrote:

    Hi Michael Busch,

    The class ParametricRangeQueryNode was inserted into svn with LUCENE-1567 "New flexible query parser". The constructor

      public ParametricRangeQueryNode(ParametricQueryNode lowerBound, ParametricQueryNode upperBound) {

    has a constraint on its parameters:

      if (upperBound.getField() != lowerBound.getField()
          || (upperBound.getField() != null
              && !upperBound.getField().equals(lowerBound.getField()))) {
        throw new IllegalArgumentException( ...

    Most possibly it should be

      if (upperBound.getField() == null
          || (upperBound.getField() != lowerBound.getField()
              && !upperBound.getField().equals(lowerBound.getField()))) {
        throw new IllegalArgumentException( ...

    (because (upperBound == lowerBound && !upperBound.getField().equals(lowerBound.getField())) is a contradiction)

    Best regards,
    Karsten

    P.S. Currently I am working with SpanQueries in the queryparser module, so I wrote e.g. SpanNearQueryNode. Is this work already done by someone else?

[Attachment: TestParametricRangeQueryNode.java]
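The difference between the two if-conditions in the thread can be checked in isolation. In this sketch, plain String values stand in for the getField() results of the two bounds; currentThrows and proposedThrows mirror the current and proposed conditions quoted above:

```java
// Compares the constructor constraints discussed in the thread.
public class RangeFieldCheck {
    // Current condition (per Karsten's report): throws unless the two
    // field references are identical (==).
    static boolean currentThrows(String upper, String lower) {
        return upper != lower || (upper != null && !upper.equals(lower));
    }

    // Proposed condition: equal-but-distinct strings are accepted,
    // while a null upper field is rejected.
    static boolean proposedThrows(String upper, String lower) {
        return upper == null || (upper != lower && !upper.equals(lower));
    }

    public static void main(String[] args) {
        String a = "date";
        String b = new String("date"); // same text, different instance

        System.out.println(currentThrows(a, b));  // true: the reported bug
        System.out.println(proposedThrows(a, b)); // false: equal text accepted
        System.out.println(proposedThrows(null, "date")); // true: null rejected
    }
}
```

This confirms Karsten's observation: the current condition rejects bounds whose field names are equals() but not the same instance, while the proposed one accepts them.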
[jira] [Created] (SOLR-2512) uima: add an ability to skip runtime error in AnalysisEngine
uima: add an ability to skip runtime error in AnalysisEngine
------------------------------------------------------------

                 Key: SOLR-2512
                 URL: https://issues.apache.org/jira/browse/SOLR-2512
             Project: Solr
          Issue Type: Improvement
    Affects Versions: 3.1
            Reporter: Koji Sekiguchi
            Priority: Minor
             Fix For: 3.2, 4.0


Currently, if the AnalysisEngine throws an exception while processing a text, the whole add of documents fails. Because online NLP services are error-prone, users should be able to choose whether Solr skips the text processing for that document (the source text can still be indexed) or throws a runtime exception so that Solr stops adding documents entirely.
[jira] [Updated] (SOLR-2512) uima: add an ability to skip runtime error in AnalysisEngine
[ https://issues.apache.org/jira/browse/SOLR-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi updated SOLR-2512:
---------------------------------

    Attachment: SOLR-2512.patch

A draft patch attached. It doesn't include the switch.

> uima: add an ability to skip runtime error in AnalysisEngine
> ------------------------------------------------------------
>
>                 Key: SOLR-2512
>                 URL: https://issues.apache.org/jira/browse/SOLR-2512
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 3.1
>            Reporter: Koji Sekiguchi
>            Priority: Minor
>             Fix For: 3.2, 4.0
>         Attachments: SOLR-2512.patch
>
> Currently, if the AnalysisEngine throws an exception while processing a text, the whole add of documents fails. Because online NLP services are error-prone, users should be able to choose whether Solr skips the text processing for that document (the source text can still be indexed) or throws a runtime exception so that Solr stops adding documents entirely.
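The switch described in the issue might look like the following sketch. Engine, enrich, and ignoreErrors are illustrative names, not the actual patch or the UIMA API; the point is only the skip-versus-rethrow choice:

```java
// Sketch of the skip-on-error switch described in SOLR-2512.
public class SkipOnErrorSketch {
    // Stand-in for an UIMA AnalysisEngine call (hypothetical interface).
    interface Engine { String process(String text); }

    static String enrich(Engine engine, String text, boolean ignoreErrors) {
        try {
            return engine.process(text);
        } catch (RuntimeException e) {
            if (ignoreErrors) return text; // skip NLP; index source text anyway
            throw e;                       // propagate: the whole add fails
        }
    }

    public static void main(String[] args) {
        Engine flaky = t -> { throw new RuntimeException("NLP service down"); };

        // Switch on: the document keeps its source text despite the error.
        System.out.println(enrich(flaky, "some text", true)); // some text

        // Switch off: the error propagates and the add fails.
        boolean threw = false;
        try { enrich(flaky, "some text", false); }
        catch (RuntimeException e) { threw = true; }
        System.out.println("propagated: " + threw);
    }
}
```

The draft patch attached to the issue does not yet include this switch, so the boolean here marks exactly the decision point the issue asks to make configurable.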