Re: [Lucene.Net] VOTE: .NET 2.0 Framework Support After Apache Lucene.Net 2.9.4

2011-05-11 Thread Vincent DARON

Do it, if you need it. +1



On 10/05/11 20:02, Lombard, Scott wrote:

+1


-Original Message-
From: Troy Howard [mailto:thowar...@gmail.com]
Sent: Monday, May 09, 2011 4:05 PM
To: lucene-net-dev@lucene.apache.org; lucene-net-u...@lucene.apache.org
Subject: [Lucene.Net] VOTE: .NET 2.0 Framework Support After Apache
Lucene.Net 2.9.4

All,

Please cast your votes regarding the topic of .Net Framework support.

The question on the table is:

Should Apache Lucene.Net 2.9.4 be the last release which supports the
.Net 2.0 Framework?

Some options are:

[+1] - Yes, move forward to the latest .Net Framework version, and drop
support for 2.0 completely. New features and performance are more
important
than backwards compatibility.
[0] - Yes, focus on the latest .Net Framework, but also include patches
and/or preprocessor directives and conditional compilation blocks to
include
support for 2.0 when needed. New features, performance, and backwards
compatibility are all equally important and it's worth the additional
complexity and coding work to meet all of those goals.
[-1] No, .Net Framework 2.0 should remain our target platform. Backwards
compatibility is more important than new features and performance.


This vote is not limited to the Apache Lucene.Net IPMC. All
users/contributors/committers/mailing list lurkers are welcome to cast
their
votes with an equal weight. This has been cross posted to both the dev and
user mailing lists.

Thanks,
Troy





Re: [Lucene.Net] Lucene.Net Hackathon (5/13-5/16)

2011-05-11 Thread Michael Herndon
If any of you follow Hanselman on twitter, please take a second to retweet
his tweet on the lucene.net hackathon listed below, or even send a thanks.

Wanna get involved in Open Source? Why not help with the Lucene.NET
HackAThon? http://hnsl.mn/lucenehackathon 

Cheers,
- Michael

On Mon, May 9, 2011 at 7:12 PM, Troy Howard thowar...@gmail.com wrote:

 Here's the wiki page:

 https://cwiki.apache.org/confluence/x/Go6OAQ

 Thanks,
 Troy


 On Mon, May 9, 2011 at 1:59 PM, Troy Howard thowar...@gmail.com wrote:
  Michael,
 
  That worked!
 
  I'm in the process of making a wiki page for the event now.
 
  Thanks,
  Troy
 
 
  On Mon, May 9, 2011 at 1:38 PM, Michael Herndon
  mhern...@wickedsoftware.net wrote:
 
  log out and log back in and verify permission changes.
 
  On Mon, May 9, 2011 at 4:22 PM, Troy Howard thowar...@gmail.com
 wrote:
 
   Re: I'm not sure if there is a coding difference between the C# stuff
 and
   the other directory stuff.
  
   There are a few minor code changes in the new branch vs the C# branch,
 but
   those are things like framework target, copyright notices, etc.. I
 didn't
   change code significantly, and unit tests still pass.
  
   Re: we can probably branch C# to something like pre_NewStructure
  
   I made a tag right before committing the directory changes for this
 exact
   purpose. It's here:
  
  
  
 https://svn.apache.org/repos/asf/incubator/lucene.net/tags/pre-layout-change
  
  
   Regarding the hackathon next week, I'd like to put together a list of
 tasks
   specifically for this weekend to give people some focus on where they
 can
   contribute. Some of these will be major tasks with high priority (like
   finishing up the 2.9.4 release) and others will be of lower priority
 like
    working on the samples/wiki/website... Those with great skills in creating
    GUI apps, but less skill in writing back-end libraries, might want to
    contribute to Luke.Net, even if it's not a high priority.
  
   I agree with Michael that we should tweet/blog/wiki/mailing list the
   details
   of the event. I would make a wiki page on the topic, but it seems I
 don't
   have sufficient privileges on our Confluence wiki to do that. Can
 whoever
   the admin is give me rights to add/edit wiki pages? My login is
 'thoward'.
  
   Thanks,
   Troy
  
   On Mon, May 9, 2011 at 1:15 AM, Prescott Nasser 
 geobmx...@hotmail.com
   wrote:
  
   
I think Troy has the structure ready to roll - I'm not sure if there
 is a
coding difference between the C# stuff and the other directory
 stuff. If
there isn't then we can probably branch C# to something like
pre_NewStructure (someone help me with a better name), then remove
 it
   from
the trunk.
   
Troy I believe was investigating the legal task - perhaps he can
 update
   us
if he ever got an answer
   
If you want to jump into a smaller task take a look at
https://issues.apache.org/jira/browse/LUCENENET-372 (currently
 assigned
   to
me). I updated a ton of the analyzers, but I believe them to be out of date
from the java 2.9.4 branch because I used the attached files from Pasha
without paying attention to the age of them. So those could use a review. I
also never ported the test cases, which we definitely should have.
   
   
   

 Date: Mon, 9 May 2011 10:04:03 +0200
 From: ma...@rotselleri.com
 To: lucene-net-dev@lucene.apache.org
 Subject: Re: [Lucene.Net] Lucene.Net Hackathon (5/13-5/16)

 On Mon, May 9, 2011 at 1:12 AM, Prescott Nasser wrote:
 
  +1 to getting 2.9.4 ready to roll + the changes to the directory
structure we have
  going

 +1 for 2.9.4 and directory structure.
 To make that happen, I'd like to know what needs to be done and in
 what way I could be of any help. There are 10 open issues for
 2.9.4,
 and (apart from the Luke issues mentioned below) none of them
 makes me
 feel that I can grab it and start coding.

  -Sharpen stuff - I haven't had time to get it really working
 (not to
mention I don't know
  eclipse from a hole in the ground). I haven't heard from Alex in
 a
while, who I think is
  the most knowledgeable on the subject.

 Also most important to get closer to the java version.

  -.NET syntax.
 +1, the API often feels quite awkward to use.

  That said, I think Luke is important. If we left it with the idea that you
  could run Luke in java just fine, we could also just say use lucene/solr and
  the api provided, no need for the Lucene.Net project. (I know it's a bit
  different). That said, I don't think it's top priority, but it would be nice
  to have a .net implementation.

 Agree, it would be nice to have.

  Sergey was working on a port of this in WPF - can he perhaps
 provide
   an
update on
  what's going on with that? I think it was located 

Re: [Lucene.Net] Lucene.Net Hackathon (5/13-5/16)

2011-05-11 Thread Michael Herndon
You never know. Personally I generally have most tech people on a list
rather than directly following them.

But thanks.

On Wed, May 11, 2011 at 2:43 PM, Wyatt Barnett wyatt.barn...@gmail.com wrote:

 Retweeted. Though I doubt any of the ~100 people following me aren't in
 the 36 following him . . .

 On 5/11/11 2:39 PM, Michael Herndon mhern...@wickedsoftware.net wrote:

 If any of you follow Hanselman on twitter, please take a second to retweet
 his tweet on the lucene.net hackathon listed below, or even send a thanks.
 
 Wanna get involved in Open Source? Why not help with the Lucene.NET
 HackAThon? http://hnsl.mn/lucenehackathon 
 
 Cheers,
 - Michael
 
 On Mon, May 9, 2011 at 7:12 PM, Troy Howard thowar...@gmail.com wrote:
 
  Here's the wiki page:
 
  https://cwiki.apache.org/confluence/x/Go6OAQ
 
  Thanks,
  Troy
 
 
  On Mon, May 9, 2011 at 1:59 PM, Troy Howard thowar...@gmail.com
 wrote:
   Michael,
  
   That worked!
  
   I'm in the process of making a wiki page for the event now.
  
   Thanks,
   Troy
  
  
   On Mon, May 9, 2011 at 1:38 PM, Michael Herndon
   mhern...@wickedsoftware.net wrote:
  
   log out and log back in and verify permission changes.
  
   On Mon, May 9, 2011 at 4:22 PM, Troy Howard thowar...@gmail.com
  wrote:
  
Re: I'm not sure if there is a coding difference between the C#
 stuff
  and
the other directory stuff.
   
There are a few minor code changes in the new branch vs the C#
 branch,
  but
those are things like framework target, copyright notices, etc.. I
  didn't
change code significantly, and unit tests still pass.
   
Re: we can probably branch C# to something like pre_NewStructure
   
I made a tag right before committing the directory changes for this
  exact
purpose. It's here:
   
   
   
 
 
  https://svn.apache.org/repos/asf/incubator/lucene.net/tags/pre-layout-change
   
   
Regarding the hackathon next week, I'd like to put together a list
 of
  tasks
specifically for this weekend to give people some focus on where
 they
  can
contribute. Some of these will be major tasks with high priority
 (like
finishing up the 2.9.4 release) and others will be of lower
 priority
  like
working on the samples/wiki/website... Those with great skills in creating
GUI apps, but less skill in writing back-end libraries, might want to
contribute to Luke.Net, even if it's not a high priority.
   
I agree with Michael that we should tweet/blog/wiki/mailing list
 the
details
of the event. I would make a wiki page on the topic, but it seems I
  don't
have sufficient privileges on our Confluence wiki to do that. Can
  whoever
the admin is give me rights to add/edit wiki pages? My login is
  'thoward'.
   
Thanks,
Troy
   
On Mon, May 9, 2011 at 1:15 AM, Prescott Nasser 
  geobmx...@hotmail.com
wrote:
   

 I think Troy has the structure ready to roll - I'm not sure if
 there
  is a
 coding difference between the C# stuff and the other directory
  stuff. If
 there isn't then we can probably branch C# to something like
 pre_NewStructure (someone help me with a better name), then
 remove
  it
from
 the trunk.

 Troy I believe was investigating the legal task - perhaps he can
  update
us
 if he ever got an answer

 If you want to jump into a smaller task take a look at
 https://issues.apache.org/jira/browse/LUCENENET-372 (currently
  assigned
to
 me). I updated a ton of the analyzers, but I believe them to be out of date
 from the java 2.9.4 branch because I used the attached files from Pasha
 without paying attention to the age of them. So those could use a review. I
 also never ported the test cases, which we definitely should have.



 
  Date: Mon, 9 May 2011 10:04:03 +0200
  From: ma...@rotselleri.com
  To: lucene-net-dev@lucene.apache.org
  Subject: Re: [Lucene.Net] Lucene.Net Hackathon (5/13-5/16)
 
  On Mon, May 9, 2011 at 1:12 AM, Prescott Nasser wrote:
  
   +1 to getting 2.9.4 ready to roll + the changes to the
 directory
 structure we have
   going
 
  +1 for 2.9.4 and directory structure.
  To make that happen, I'd like to know what needs to be done
 and in
  what way I could be of any help. There are 10 open issues for
  2.9.4,
  and (apart from the Luke issues mentioned below) none of them
  makes me
  feel that I can grab it and start coding.
 
   -Sharpen stuff - I haven't had time to get it really working
  (not to
 mention I don't know
   eclipse from a hole in the ground). I haven't heard from
 Alex in
  a
 while, who I think is
   the most knowledgeable on the subject.
 
  Also most important to get closer to the java version.
 
   -.NET syntax.
  +1, the API often feels quite awkward to use.
 
   That said, I think 

RE: [Lucene.Net] Lucene.Net Hackathon (5/13-5/16)

2011-05-11 Thread Prescott Nasser

Just fyi, we will need to update the website to have documentation if we do 
this. I figured we'd use confluence as our weak documentation store for the 
time being:
 
 
http://incubator.apache.org/guides/sites.html
 
Using A Wiki To Create Documentation 
Podlings may use a wiki to create documentation (including the website) 
providing that they follow the guidelines. In particular, care must be taken to 
ensure that access to the wiki used to create documentation is restricted to 
only those with filed CLAs. The PPMC MUST review all changes and ensure that 
trust is not abused.


Also see: 
https://cwiki.apache.org/CWIKI/#Index-Butwhatifwewouldlikethecommunityatlargetohelpmaintainthespace%253F


 From: thowar...@gmail.com
 Date: Wed, 11 May 2011 13:38:47 -0700
 To: lucene-net-dev@lucene.apache.org
 Subject: Re: [Lucene.Net] Lucene.Net Hackathon (5/13-5/16)

 No problem. I set up the permissions such that any user account can
 edit/add pages in the wiki.

 This should make things a lot easier on us.

 Thanks,
 Troy


 On Wed, May 11, 2011 at 12:50 PM, Michael Herndon
 wrote:
  Troy,
 
   Confluence admin is not my forte, but I can look at the privileges tonight
   and see if we can change that.
 
  You and Prescott also have admin privileges as of right now. I'm pretty much
  giving all committers who have forwarded their username those privileges.
 
   I've also added a snippet to the hack-a-thon page asking people to e-mail me
   in the meantime if they are unable to edit the page and add themselves to
   the table. (And there are some who may just not want to join yet
   another wiki).
 
  Do keep an eye out for spam once we elevate privileges.
 
  - Michael
 
  On Wed, May 11, 2011 at 3:37 PM, Troy Howard wrote:
 
  Thanks Michael!
 
  One quick question -- the Wiki seems to be really locked down for
  public editing. That's kind of strange. Anyone should be able to log
  in and whip up a new page or edit an existing one, committer or
  otherwise. I didn't have access until just the other day, and Chris
  Currens doesn't have access now (I had to add him to the page
  manually).
 
  Can we open up the permissions on our wiki?
 
  Thanks,
  Troy
 
 
 
  On Wed, May 11, 2011 at 11:51 AM, Michael Herndon
  wrote:
   You never know. Personally I generally have most tech people on a list
   rather than directly following them.
  
   But thanks.
  
   On Wed, May 11, 2011 at 2:43 PM, Wyatt Barnett   wrote:
  
   Retweeted. Though I doubt any of the ~100 people following me aren't in
   the 36 following him . . .
  
   On 5/11/11 2:39 PM, Michael Herndon 
  wrote:
  
   If any of you follow Hanselman on twitter, please take a second to retweet
   his tweet on the lucene.net hackathon listed below, or even send a thanks.
   
   Wanna get involved in Open Source? Why not help with the Lucene.NET
   HackAThon? http://hnsl.mn/lucenehackathon 
   
   Cheers,
   - Michael
   
   On Mon, May 9, 2011 at 7:12 PM, Troy Howard 
  wrote:
   
Here's the wiki page:
   
https://cwiki.apache.org/confluence/x/Go6OAQ
   
Thanks,
Troy
   
   
On Mon, May 9, 2011 at 1:59 PM, Troy Howard 
   wrote:
 Michael,

 That worked!

 I'm in the process of making a wiki page for the event now.

 Thanks,
 Troy


 On Mon, May 9, 2011 at 1:38 PM, Michael Herndon
 wrote:

 log out and log back in and verify permission changes.

 On Mon, May 9, 2011 at 4:22 PM, Troy Howard 
wrote:

  Re: I'm not sure if there is a coding difference between the C#
   stuff
and
  the other directory stuff.
 
  There are a few minor code changes in the new branch vs the C#
   branch,
but
  those are things like framework target, copyright notices, etc..
  I
didn't
  change code significantly, and unit tests still pass.
 
  Re: we can probably branch C# to something like
  pre_NewStructure
 
  I made a tag right before committing the directory changes for
  this
exact
  purpose. It's here:
 
 
 
   
   
  
  https://svn.apache.org/repos/asf/incubator/lucene.net/tags/pre-layout-change
 
 
  Regarding the hackathon next week, I'd like to put together a
  list
   of
tasks
  specifically for this weekend to give people some focus on where
   they
can
  contribute. Some of these will be major tasks with high priority
   (like
  finishing up the 2.9.4 release) and others will be of lower
   priority
like
  working on the samples/wiki/website... Those with great skills in creating
  GUI apps, but less skill in writing back-end libraries, might want to
  contribute to Luke.Net, even if it's not a high priority.
 
  I agree with Michael that we should tweet/blog/wiki/mailing list
   the
  details
  of the event. I would make a wiki page on the topic, but it
  seems I
don't
  have sufficient privileges on our 

Re: [Lucene.Net] VOTE: .NET 2.0 Framework Support After Apache Lucene.Net 2.9.4

2011-05-11 Thread Gregory Bell
+1

 Troy Howard thowar...@gmail.com 10/05/2011 7:44 AM 
My goal with moving forward to .Net 4.0 specifically is that 4.0 brings
major improvements to the .NET GC which, as we have already found in our
company's testing, improve Lucene.Net's memory management and overall
speed significantly. This is without any code changes, just compiling for
the .Net 4.0 framework target vs 2.0 or 3.5...

Thanks,
Troy


On Mon, May 9, 2011 at 2:40 PM, Aaron Powell m...@aaron-powell.com wrote:
 +1

 PS: If you are supporting .NET 3.5 then you get .NET 2.0 support anyway, you 
 just have to bin-deploy the .NET 3.5 dependencies (System.Core, etc) since 
 they are all the same CLR

 Aaron Powell
 MVP - Internet Explorer (Development) | Umbraco Core Team Member | FunnelWeb 
 Team Member

 http://apowell.me | http://twitter.com/slace | Skype: aaron.l.powell | MSN: 
 aaz...@hotmail.com

 -Original Message-
 From: Troy Howard [mailto:thowar...@gmail.com]
 Sent: Tuesday, 10 May 2011 6:05 AM
 To: lucene-net-dev@lucene.apache.org; lucene-net-u...@lucene.apache.org
 Subject: [Lucene.Net] VOTE: .NET 2.0 Framework Support After Apache 
 Lucene.Net 2.9.4

 All,

 Please cast your votes regarding the topic of .Net Framework support.

 The question on the table is:

 Should Apache Lucene.Net 2.9.4 be the last release which supports the .Net 
 2.0 Framework?

 Some options are:

 [+1] - Yes, move forward to the latest .Net Framework version, and drop 
 support for 2.0 completely. New features and performance are more important 
 than backwards compatibility.
 [0] - Yes, focus on the latest .Net Framework, but also include patches 
 and/or preprocessor directives and conditional compilation blocks to include 
 support for 2.0 when needed. New features, performance, and backwards 
 compatibility are all equally important and it's worth the additional 
 complexity and coding work to meet all of those goals.
 [-1] No, .Net Framework 2.0 should remain our target platform. Backwards 
 compatibility is more important than new features and performance.


 This vote is not limited to the Apache Lucene.Net IPMC. All 
 users/contributors/committers/mailing list lurkers are welcome to cast their 
 votes with an equal weight. This has been cross posted to both the dev and 
 user mailing lists.

 Thanks,
 Troy




segfault in JCCEnv::deleteGlobalRef

2011-05-11 Thread Christian Heimes
Hello,

I've updated our software stack from Python 2.6.6 to Python 2.7.1. Since
the update I'm seeing random segfaults all related to
JCCEnv::deleteGlobalRef() and Python's GC. At first I thought the bug was
an incompatibility between Python 2.7 and JCC 2.7. However, an update to
JCC 2.8 and Lucene 3.1.0 didn't resolve my issue.

So far all segfaults have the same pattern. The creation or removal of a
Python object triggers a cyclic GC run which runs into
t_JObject_dealloc() and crashes inside JCCEnv::deleteGlobalRef(). At
least some of the crashing code paths run inside threads with an
attached JCC thread.

(gdb) bt
#10 <signal handler called>
#11 0x2ba7deb380c9 in JCCEnv::deleteGlobalRef(_jobject*, int) ()
   from
/opt/vlspy27/lib/python2.7/site-packages/JCC-2.8-py2.7-linux-x86_64.egg/libjcc.so
#12 0x2ba7de36c649 in t_JObject_dealloc(t_JObject*) ()
   from
/opt/vlspy27/lib/python2.7/site-packages/lucene-3.1.0-py2.7-linux-x86_64.egg/lucene/_lucene.so
#13 0x2ba7cee851eb in dict_dealloc (mp=0x9975720) at
Objects/dictobject.c:985
#14 0x2ba7cee86edb in PyDict_Clear (op=<value optimized out>) at
Objects/dictobject.c:891
#15 0x2ba7cee86f49 in dict_tp_clear (op=0x3) at
Objects/dictobject.c:2088
#16 0x2ba7cef27b7e in delete_garbage (generation=<value optimized out>)
at Modules/gcmodule.c:769
#17 collect (generation=<value optimized out>) at Modules/gcmodule.c:930
#18 0x2ba7cef283ae in collect_generations (basicsize=<value optimized out>)
at Modules/gcmodule.c:996
#19 _PyObject_GC_Malloc (basicsize=<value optimized out>) at
Modules/gcmodule.c:1457
#20 0x2ba7cef2844d in _PyObject_GC_New (tp=0x2ba7cf197fa0) at
Modules/gcmodule.c:1467
#21 0x2ba7cee84bbc in PyDict_New () at Objects/dictobject.c:277
#22 0x2ba7cee8b188 in _PyObject_GenericSetAttrWithDict (obj=<value optimized out>,
name=0x12d5ae8, value=0x7c636b0, dict=0x0)
at Objects/object.c:1510
#23 0x2ba7cee8b537 in PyObject_SetAttr (v=0x77704d0, name=0x12d5ae8,
value=0x7c636b0) at Objects/object.c:1245
#24 0x2ba7c4b4 in PyEval_EvalFrameEx (f=0x50d7520,
throwflag=<value optimized out>) at Python/ceval.c:2003
#25 0x2ba7ceef28b8 in PyEval_EvalCodeEx (co=0x2199ab0,
globals=<value optimized out>, locals=<value optimized out>,
args=0x8bd7b58,
(gdb) select-frame 24
(gdb) pyframe
/opt/vlspy27/lib/python2.7/site-packages/kinterbasdb-3.3.0-py2.7-linux-x86_64.egg/kinterbasdb/__init__.py
(1499): __init__


class _RowMapping(object):
    def __init__(self, description, row):
        self._description = description
        fields = self._fields = {}  # <-- 1499
        pos = 0

(gdb) bt
#11 0x2ba90298b0c9 in JCCEnv::deleteGlobalRef(_jobject*, int) ()
   from
/opt/vlspy27/lib/python2.7/site-packages/JCC-2.8-py2.7-linux-x86_64.egg/libjcc.so
#12 0x2ba9021bf649 in t_JObject_dealloc(t_JObject*) ()
   from
/opt/vlspy27/lib/python2.7/site-packages/lucene-3.1.0-py2.7-linux-x86_64.egg/lucene/_lucene.so
#13 0x2ba8f2cd81eb in dict_dealloc (mp=0x105df800) at
Objects/dictobject.c:985
#14 0x2ba8f2cd9edb in PyDict_Clear (op=<value optimized out>) at
Objects/dictobject.c:891
#15 0x2ba8f2cd9f49 in dict_tp_clear (op=0x3) at
Objects/dictobject.c:2088
#16 0x2ba8f2d7ab7e in delete_garbage (generation=<value optimized out>)
at Modules/gcmodule.c:769
#17 collect (generation=<value optimized out>) at Modules/gcmodule.c:930
#18 0x2ba8f2d7b3ae in collect_generations (basicsize=<value optimized out>)
at Modules/gcmodule.c:996
#19 _PyObject_GC_Malloc (basicsize=<value optimized out>) at
Modules/gcmodule.c:1457
#20 0x2ba8f2d7b44d in _PyObject_GC_New (tp=0x2ba8f2fddfc0) at
Modules/gcmodule.c:1467
#21 0x2ba8f2cb0aa8 in PyWrapper_New (d=0x1e5e140,
self=0x2ba9242509e0) at Objects/descrobject.c:1051
#22 0x2ba8f2cb0be3 in wrapperdescr_call (descr=0x1e5e140,
args=0x28f87520, kwds=0x0) at Objects/descrobject.c:296
#23 0x2ba8f2c93533 in PyObject_Call (func=0x1e5e140,
arg=0x2ba9242509e0, kw=0x2ba928815a40) at Objects/abstract.c:2529
#24 0x2ba8f89b8d6c in __pyx_pf_4lxml_5etree_9_ErrorLog___init__
(__pyx_v_self=0x229db820, __pyx_args=<value optimized out>,
__pyx_kwds=<value optimized out>) at src/lxml/lxml.etree.c:28498
#25 0x2ba8f2cf6068 in type_call (type=<value optimized out>,
args=0x2ba8f3c64050, kwds=0x0) at Objects/typeobject.c:728
#26 0x2ba8f2c93533 in PyObject_Call (func=0x2ba8f8cbb1e0,
arg=0x2ba9242509e0, kw=0x2ba928815a40) at Objects/abstract.c:2529
#27 0x2ba8f89b91c0 in
__pyx_pf_4lxml_5etree_19_XPathEvaluatorBase___cinit__
(__pyx_v_self=0x6c5cdb8, __pyx_args=<value optimized out>,
__pyx_kwds=<value optimized out>) at src/lxml/lxml.etree.c:111873
#28 0x2ba8f89bcb7c in __pyx_tp_new_4lxml_5etree__XPathEvaluatorBase
(t=<value optimized out>, a=<value optimized out>,
k=<value optimized out>) at src/lxml/lxml.etree.c:149259
#29 __pyx_tp_new_4lxml_5etree_XPath (t=<value optimized out>,
a=<value optimized out>, k=<value optimized out>)
at src/lxml/lxml.etree.c:18769
#30 0x2ba8f2cf6023 in type_call (type=0x3, args=0x20515510,

Re: segfault in JCCEnv::deleteGlobalRef

2011-05-11 Thread Christian Heimes
On 11.05.2011 17:36, Andi Vajda wrote:
 As you can clearly see, the JNIEnv_ instance is a NULL pointer. Contrary
 to my initial assumption, the thread doesn't have a JCC thread local
 object. Since any thread may trigger a GC collect run, and not just
 threads that use JCC, this looks like a bug in JCC to me.
 
 Any thread that is going to call into the JVM must call attachCurrentThread() 
 first. This includes a thread doing GC of object wrapping java refs which it 
 is going to delete.

I'm well aware of the requirement to call attachCurrentThread() in every
thread that uses wrapped objects. This segfault is not caused by passing
JVM objects between threads explicitly. It's Python's cyclic GC that
breaks and collects reference cycles involving JVM objects in random threads.

Something in Python 2.7's gc must have been altered to increase the
chance that a cyclic GC collection run is started inside a thread that
isn't attached to the JVM. As far as I know the implementation of
Python's cyclic GC detection, it's not possible to restrict the cyclic
GC to some threads. So any unattached thread that creates objects that
are allocated with _PyObject_GC_New() has a chance to trigger the
segfault. Almost all Python objects use _PyObject_GC_New(). Only
very simple types like str and int, which can't reference other objects,
are not tracked. Everything else (including bound methods of simple types)
is tracked.
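
For illustration, gc.is_tracked() (available in Python 2.7) is a quick way to
see which objects participate in cyclic GC; a small sketch, not tied to any
particular application:

import gc

gc.is_tracked(42)           # False: simple scalar, cannot reference other objects
gc.is_tracked("abc")        # False
gc.is_tracked([])           # True: containers are tracked by the cyclic collector
gc.is_tracked("abc".strip)  # True: even a bound method of a simple type is tracked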

In a few words: Any unattached thread has the chance to crash the
interpreter unless the code is very, very limited. This can be easily
reproduced with a small script:

---
import lucene
import threading
import time
import gc

lucene.initVM()

def alloc():
    while 1:
        gc.collect()
        time.sleep(0.011)

t = threading.Thread(target=alloc)
t.daemon = True

t.start()

while 1:
    obj = {}
    # create cycle
    obj[obj] = obj
    obj["jcc"] = lucene.JArray('object')(1, lucene.File)
    time.sleep(0.001)

---

I wonder why it wasn't noticed earlier.

Christian


Re: segfault in JCCEnv::deleteGlobalRef

2011-05-11 Thread Christian Heimes
On 11.05.2011 18:14, Christian Heimes wrote:
 ---
 import lucene
 import threading
 import time
 import gc
 
 lucene.initVM()
 
 def alloc():
 while 1:
 gc.collect()
 time.sleep(0.011)
 
 t = threading.Thread(target=alloc)
 t.daemon = True
 
 t.start()
 
 while 1:
 obj = {}
 # create cycle
 obj[obj] = obj
 obj[jcc] = lucene.JArray('object')(1, lucene.File)
 time.sleep(0.001)
 
 ---

The example also crashes with alloc() functions like the ones below, but it
takes a bit longer:

def alloc():
    while 1:
        a = {}, {}, {}, {}, {}, {}
        time.sleep(0.011)

def alloc():
    while 1:
        # create 500 bound methods to exceed PyMethod_MAXFREELIST 256
        methods = []
        for i in xrange(500):
            methods.append(str("abc").strip)
        time.sleep(0.011)

Christian


Re: segfault in JCCEnv::deleteGlobalRef

2011-05-11 Thread Andi Vajda



On Wed, 11 May 2011, Christian Heimes wrote:


Am 11.05.2011 17:36, schrieb Andi Vajda:

As you can clearly see, the JNIEnv_ instance is a NULL pointer. Contrary
to my initial assumption, the thread doesn't have a JCC thread local
object. Since any thread may trigger a GC collect run, and not just
threads, that use JCC, this looks like a bug in JCC to me.


Any thread that is going to call into the JVM must call attachCurrentThread() 
first. This includes a thread doing GC of object wrapping java refs which it is 
going to delete.


I'm well aware of requirement to call attachCurrentThread() in every
thread that uses wrapped objects. This segfault is not caused by passing
JVM objects between threads explicitly. It's Python's cyclic GC that
breaks and collects reference cyclic with JVM objects in random threads.


There shouldn't be any random threads. Threads don't just appear out of thin 
air. You create them. If there is a chance that they call into the JVM, then 
attachCurrentThread().
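
A minimal sketch of that pattern in a worker thread, assuming the standard
lucene.initVM() / lucene.getVMEnv() entry points; adjust to your own setup:

import threading
import lucene

lucene.initVM()

def worker():
    # attach this thread to the JVM before it touches any wrapped Java object
    lucene.getVMEnv().attachCurrentThread()
    # ... safe to use wrapped lucene classes from this thread now ...

t = threading.Thread(target=worker)
t.start()
t.join()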



Something in Python 2.7's gc must have been altered to increase the
chance, that a cyclic GC collect run is started inside a thread that
isn't attached to the JVM. As far as I know the implementation of
Python's cyclic GC detection, it's not possible to restrict the cyclic
GC to some threads. So any unattached thread that creates objects, that
are allocated with _PyObject_GC_New(), has a chance to trigger the
segfault. Almost all Python objects are using _PyObject_GC_New(). Only
very simple types like str, int, that can't reference other objects, are
not tracked. Everything else (including bound methods of simple types)
is tracked.

In a few words: Any unattached thread has the chance to crash the
interpreter unless the code is very, very limited. This can be easily
reproduced with a small script:

---
import lucene
import threading
import time
import gc

lucene.initVM()

def alloc():
   while 1:
   gc.collect()
   time.sleep(0.011)

t = threading.Thread(target=alloc)
t.daemon = True

t.start()

while 1:
   obj = {}
   # create cycle
   obj[obj] = obj
   obj[jcc] = lucene.JArray('object')(1, lucene.File)
   time.sleep(0.001)

---

I wonder, why it wasn't noticed earlier.


Did anything else change in your application besides the Python version ?
32-bit to 64-bit ? (more memory used, more frequent GCs)
Something in the code ?

Andi..


Re: segfault in JCCEnv::deleteGlobalRef

2011-05-11 Thread Andi Vajda


On Wed, 11 May 2011, Christian Heimes wrote:


On 11.05.2011 18:14, Christian Heimes wrote:

---
import lucene
import threading
import time
import gc

lucene.initVM()

def alloc():
while 1:
gc.collect()
time.sleep(0.011)

t = threading.Thread(target=alloc)
t.daemon = True

t.start()

while 1:
obj = {}
# create cycle
obj[obj] = obj
obj[jcc] = lucene.JArray('object')(1, lucene.File)
time.sleep(0.001)

---


The example crashes also with functions like but it takes a bit longer

def alloc():
   while 1:
   a = {}, {}, {}, {}, {}, {}
   time.sleep(0.011)

def alloc():
   while 1:
   # create 500 bound methods to exceed PyMethod_MAXFREELIST 256
   methods = []
   for i in xrange(500):
   methods.append(str(abc).strip)
   time.sleep(0.011)


Does it crash as easily with Python 2.6 ?
If not, then that could be an answer as to why this wasn't noticed before.

Andi..


Re: segfault in JCCEnv::deleteGlobalRef

2011-05-11 Thread Christian Heimes
On 11.05.2011 18:27, Andi Vajda wrote:
 Does it crash as easily with Python 2.6 ?
 If not, then that could be an answer as to why this wasn't noticed before.

With 20 test samples, it seems like Python 2.6 survives 50% longer than
Python 2.7.

python2.6
0, 1.089
1, 2.688
2, 1.066
3, 6.416
4, 0.921
5, 1.859
6, 0.896
7, 0.910
8, 1.851
9, 1.042
10, 1.110
11, 1.040
12, 1.072
13, 1.825
14, 3.720
15, 1.822
16, 0.983
17, 1.931
18, 0.998
19, 1.105
cnt: 20, min: 0.896, max: 6.416, avg: 1.717

python2.7
0, 1.795
1, 0.953
2, 1.802
3, 1.022
4, 0.906
5, 1.841
6, 1.080
7, 0.958
8, 1.110
9, 0.924
10, 0.894
11, 1.958
12, 0.898
13, 1.846
14, 0.936
15, 1.859
16, 1.036
17, 1.092
18, 0.920
19, 0.949
cnt: 20, min: 0.894, max: 1.958, avg: 1.239
import subprocess
from time import time

log = open("log.txt", "w")
cnt = 100

for py in ("python2.6", "python2.7"):
    log.write(py + "\n")
    dur = []
    for i in range(cnt):
        start = time()
        # run the crasher script under the interpreter being measured
        subprocess.call([py, "cyclic.py"])
        run = time() - start
        dur.append(run)
        log.write("%i, %0.3f\n" % (i, run))
        print i
    log.write("cnt: %i, min: %0.3f, max: %0.3f, avg: %0.3f\n\n" %
              (cnt, min(dur), max(dur), sum(dur) / cnt))



import lucene
import threading
import time
import gc

lucene.initVM()

def alloc():
    while 1:
        a = {}, {}, {}, {}, {}, {}
        time.sleep(0.011)

t = threading.Thread(target=alloc)
t.daemon = True

t.start()

while 1:
    obj = {}
    # create cycle
    obj[obj] = obj
    obj["jcc"] = lucene.JArray('object')(1, lucene.File)
    time.sleep(0.001)


Re: segfault in JCCEnv::deleteGlobalRef

2011-05-11 Thread Christian Heimes
On 11.05.2011 18:26, Andi Vajda wrote:
 There shouldn't be any random threads. Threads don't just appear out of thin 
 air. You create them. If there is a chance that they call into the JVM, then 
 attachCurrentThread().

I've already made sure that all our code and threads call a hook which
attaches the thread to the JVM. But I don't have control over all threads.
Some threads are created in third party libraries. I would have to check
and patch every third party tool we are using.

 I wonder, why it wasn't noticed earlier.
 
 Did anything else change in your application besides the Python version ?
 32-bit to 64-bit ? (more memory used, more frequent GCs)
 Something in the code ?

I did the testing with the same code base on a single machine. The Python
2.7 branch of our application just has a few changes like python2.6 ->
python2.7. Nothing else is different. JCC and Lucene are compiled from
the very same tar ball with the same version of GCC. We had very few
segfaults in our test suite over the past months (more than five test
runs every day, less than one crash per week). With Python 2.7 I'm
seeing crashes in three of five test runs.

The example code crashes both Python 2.6.6 + JCC 2.7 + PyLucene 3.0.3
and Python 2.7.1 + JCC 2.8 + PyLucene 3.1.0 on my laptop (Ubuntu 10.10
x86_64).

Christian


Re: segfault in JCCEnv::deleteGlobalRef

2011-05-11 Thread Christian Heimes
On 11.05.2011 19:03, Andi Vajda wrote:
 If these libraries use Python's Thread class you have some control.
 
 Create a subclass of Thread that runs your hook and insert it into the 
 threading module (threading.Thread = YourThreadSubclass) before anyone else 
 gets a chance to create threads.
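
A minimal sketch of that subclass-and-replace idea, assuming the attach hook
simply calls attachCurrentThread() on the JCC env; the class name is
illustrative and this only covers code that goes through threading.Thread:

import threading
import lucene

lucene.initVM()

_OriginalThread = threading.Thread

class AttachedThread(_OriginalThread):
    def run(self):
        # run() executes in the new thread, so attach before any target code runs
        lucene.getVMEnv().attachCurrentThread()
        _OriginalThread.run(self)

# install the subclass before any other module gets a chance to create threads
threading.Thread = AttachedThread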

One library is using thread.start_new_thread() and another uses Python's
C API to create an internal monitor thread. This makes it even harder to
fix the issue.

How would you feel about another approach?

* factor out the attach routine of t_jccenv_attachCurrentThread() as a C
function

int jccenv_attachCurrentThread(char *name, int asDaemon) {
    int result;
    JNIEnv *jenv = NULL;

    JavaVMAttachArgs attach = {
        JNI_VERSION_1_4, name, NULL
    };

    if (asDaemon)
        result = env->vm->AttachCurrentThreadAsDaemon((void **) &jenv,
                                                      &attach);
    else
        result = env->vm->AttachCurrentThread((void **) &jenv, &attach);

    env->set_vm_env(jenv);

    return result;
}


* modify JCCEnv::deleteGlobalRef() to check get_vm_env() for NULL

if (iter->second.count == 1)
{
    JNIEnv *vm_env = get_vm_env();
    if (!vm_env) {
        jccenv_attachCurrentThread(NULL, 0);
        vm_env = get_vm_env();
    }
    vm_env->DeleteGlobalRef(iter->second.global);
    refs.erase(iter);
}

Christian


Re: segfault in JCCEnv::deleteGlobalRef

2011-05-11 Thread Christian Heimes
On 11.05.2011 19:41, Andi Vajda wrote:
 If these functions eventually instantiate a Thread class, even indirectly, 
 the monkey-patching may still work.

Some of the code doesn't use the threading module at all, just thread or
the internal C API. I'd have to patch the modules and C code.

 That may cover this case but what about all the others ?
 There is a reason the call has to be manual.
 
 I've not been able to automate it before.
 Over time, I've added checks where I could but I've not found it possible to 
 cover all cases where attachCurrentThread() wasn't called.
 
 Anyhow, try it and see if it fixes the problem you're seeing.
 If any of the objects being freed invoke user code that eventually call into 
 the JVM, the problem is going to appear again elsewhere.

I understand your reluctance to automate the attaching of Python threads
to the JVM. Explicit is better than implicit. However this is a special
case. CPython doesn't allow control over which thread the cyclic garbage
collector runs in, nor does CPython have a hook that is called for newly
created threads. It's hard to debug a segfault when even code like
a = [] can trigger the bug.

The attached patch doesn't trigger the bug in my artificial test code.
I'm going to run our test suite several times. That's going to take a
while.

Christian
Index: jcc/sources/jcc.cpp
===================================================================
--- jcc/sources/jcc.cpp	(Revision 1088091)
+++ jcc/sources/jcc.cpp	(Arbeitskopie)
@@ -33,6 +33,25 @@
 
 /* JCCEnv */
 
+int jccenv_attachCurrentThread(char *name, int asDaemon)
+{
+    int result;
+    JNIEnv *jenv = NULL;
+
+    JavaVMAttachArgs attach = {
+        JNI_VERSION_1_4, name, NULL
+    };
+
+    if (asDaemon)
+        result = env->vm->AttachCurrentThreadAsDaemon((void **) &jenv, &attach);
+    else
+        result = env->vm->AttachCurrentThread((void **) &jenv, &attach);
+
+    env->set_vm_env(jenv);
+
+    return result;
+}
+
 class t_jccenv {
 public:
     PyObject_HEAD
@@ -154,21 +173,11 @@
 {
     char *name = NULL;
     int asDaemon = 0, result;
-    JNIEnv *jenv = NULL;
 
     if (!PyArg_ParseTuple(args, "|si", &name, &asDaemon))
         return NULL;
 
-    JavaVMAttachArgs attach = {
-        JNI_VERSION_1_4, name, NULL
-    };
-
-    if (asDaemon)
-        result = env->vm->AttachCurrentThreadAsDaemon((void **) &jenv, &attach);
-    else
-        result = env->vm->AttachCurrentThread((void **) &jenv, &attach);
-
-    env->set_vm_env(jenv);
+    result = jccenv_attachCurrentThread(name, asDaemon);
 
     return PyInt_FromLong(result);
 }
Index: jcc/sources/JCCEnv.cpp
===================================================================
--- jcc/sources/JCCEnv.cpp	(Revision 1088091)
+++ jcc/sources/JCCEnv.cpp	(Arbeitskopie)
@@ -318,6 +318,16 @@
 {
     if (iter->second.count == 1)
     {
+        JNIEnv *vm_env = get_vm_env();
+        if (!vm_env)
+        {
+            /* Python's cyclic garbage collector may remove
+             * an object inside a thread that is not attached
+             * to the JVM. This makes sure JCC doesn't segfault.
+             */
+            jccenv_attachCurrentThread(NULL, 0);
+            vm_env = get_vm_env();
+        }
         get_vm_env()->DeleteGlobalRef(iter->second.global);
         refs.erase(iter);
     }
Index: jcc/sources/JCCEnv.h
===================================================================
--- jcc/sources/JCCEnv.h	(Revision 1088091)
+++ jcc/sources/JCCEnv.h	(Arbeitskopie)
@@ -72,6 +72,8 @@
 
 typedef jclass (*getclassfn)(void);
 
+int jccenv_attachCurrentThread(char *name, int asDaemon);
+
 class countedRef {
 public:
     jobject global;


[jira] [Created] (SOLR-2508) Master/Slave replication can leave slave in inconsistent state of NullPointerException in solrHighligher.java 102

2011-05-11 Thread Xing Li (JIRA)
Master/Slave replication can leave slave in inconsistent state of  
NullPointerException in solrHighligher.java 102
--

 Key: SOLR-2508
 URL: https://issues.apache.org/jira/browse/SOLR-2508
 Project: Solr
  Issue Type: Bug
  Components: highlighter, replication (java)
Affects Versions: 4.0
 Environment: Centos 5.6 with Java1.7.0b137
Reporter: Xing Li


Using Solr 4/Trunk snapshot build of 5/10/2011. 

Setup:
--
1) 1 Master + 4 Slaves
2) Multicore setup with 8 cores.
3) Replication Poll Interval: 00:30:20

Summary of Issue:
---
When a slave completes a replication pull from master, it will complete the
data index pull, but based on the logs it appears that subsequent index warming
and other post-replication cleanup actions leave the core/db in an
inconsistent state.

Frequency of occurrence: Very high but not 100%. I have 1 master and 4 slaves,
and for each replication pull cycle around 50% of the slaves get affected.
Each slave has 8 multi-cores but the problem always affects this particular
mysolr_blogs db/core.

Please note the mysolr_blogs data index is 1.4GB and the largest of the 8 by 
a wide margin.

Attached is the schema.xml and solrconfig.xml for the mysolr_blogs core.


Temp fix:
-
1) Stop and restart the solr server when this happens.
2) Stop using automatic replication on this core.


Logging:
-

* begins automatic replication  pull

{code}
May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Master's version: 1302675975227, generation: 694
May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave's version: 1302675975222, generation: 692
May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Starting replication process
May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Number of files in latest index in master: 10
{code}

* 65 seconds passed and I cut out the query logs in between. Here it's pulling 
the 1.4GB mysolr_blogs index data. 

{code}
May 10, 2011 10:18:45 PM org.apache.solr.handler.SnapPuller downloadIndexFiles
INFO: Skipping download for 
/db/solr-master/multicore/mysolr_blogs/data/index/1.fnx
May 10, 2011 10:18:45 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Total time taken for download : 65 secs
May 10, 2011 10:18:45 PM org.apache.solr.core.SolrCore execute
INFO: [mysolr_users] webapp=/solr path=/select 
params={sort=&indent=off&start=0&q=%2Buname:inlove*&q.op=and&hl.fl=*&facet.field=pcategoryid&facet.field=categoryid&facet.field=languageid&wt=json&hl=true&rows=51}
 hits=0 status=0 QTime=1 
May 10, 2011 10:18:45 PM org.apache.solr.core.SolrCore execute
INFO: [mysolr_blogs] webapp=/solr path=/select/ params={q=solr} hits=0 status=0 
QTime=0 
May 10, 2011 10:18:45 PM org.apache.solr.core.SolrCore execute
INFO: [mysolr_blogs] webapp=/solr path=/select/ params={q=solr} hits=0 status=0 
QTime=0 
May 10, 2011 10:18:46 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start 
commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
May 10, 2011 10:18:46 PM org.apache.solr.search.SolrIndexSearcher init
INFO: Opening Searcher@4f83f9df main
May 10, 2011 10:18:46 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
May 10, 2011 10:18:46 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming Searcher@4f83f9df main from Searcher@5f7808af main


[jira] [Updated] (SOLR-2508) Master/Slave replication can leave slave in inconsistent state of NullPointerException in solrHighligher.java 102

2011-05-11 Thread Xing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xing Li updated SOLR-2508:
--

Attachment: (was: schema.xml)

 Master/Slave replication can leave slave in inconsistent state of  
 NullPointerException in solrHighligher.java 102
 --

 Key: SOLR-2508
 URL: https://issues.apache.org/jira/browse/SOLR-2508
 Project: Solr
  Issue Type: Bug
  Components: highlighter, replication (java)
Affects Versions: 4.0
 Environment: Centos 5.6 with Java1.7.0b137
Reporter: Xing Li

 Using Solr 4/Trunk snapshot build of 5/10/2011. 
 Setup:
 --
 1) 1 Master + 4 Slaves
 2) Multicore setup with 8 cores.
 3) Replication Poll Interval: 00:30:20
 Summary of Issue:
 ---
 When a slave completes a replication pull from master, it will complete the 
 data index pull but 
 based on logs it appears subsequent index warming and other actions post 
 replication 
 cleanup leaves the core/db in an inconsistent state.
 Frequency of occurrence: Very high but not 100%. I have 1 master and 4 slaves 
 and for each replication 
 pull cycle, around 50% of the gets affected. Each slave has 8 multi-cores but
 the problem always affects this particular mysolr_blogs db/core.
 Please note the mysolr_blogs data index is 1.4GB and the largest of the 8 
 by a wide margin.
 Attached is the schema.xml and solrconfig.xml for the mysolr_blogs core.
 Temp fix:
 -
 1) Stop and restart the solr server when this happens.
 2) Stop using automatic replication on this core.
 Logging:
 -
 * begins automatic replication  pull
 {code}
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Slave in sync with master.
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Slave in sync with master.
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Slave in sync with master.
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Slave in sync with master.
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Slave in sync with master.
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Slave in sync with master.
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Master's version: 1302675975227, generation: 694
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Slave in sync with master.
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Slave's version: 1302675975222, generation: 692
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Starting replication process
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Number of files in latest index in master: 10
 {code}
 * 65 seconds past and I cut out the query logs in between. Here it's pulling 
 the 1.4GB mysolr_blogs index data. 
 {code}
 May 10, 2011 10:18:45 PM org.apache.solr.handler.SnapPuller downloadIndexFiles
 INFO: Skipping download for 
 /db/solr-master/multicore/mysolr_blogs/data/index/1.fnx
 May 10, 2011 10:18:45 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Total time taken for download : 65 secs
 May 10, 2011 10:18:45 PM org.apache.solr.core.SolrCore execute
 INFO: [mysolr_users] webapp=/solr path=/select 
 params={sort=indent=offstart=0q=%2Buname:inlove*q.op=andhl.fl=*facet.field=pcategoryidfacet.field=categoryidfacet.field=languageidwt=jsonhl=truerows=51}
  hits=0 status=0 QTime=1 
 May 10, 2011 10:18:45 PM org.apache.solr.core.SolrCore execute
 INFO: [mysolr_blogs] webapp=/solr path=/select/ params={q=solr} hits=0 
 status=0 QTime=0 
 May 10, 2011 10:18:45 PM org.apache.solr.core.SolrCore execute
 INFO: [mysolr_blogs] webapp=/solr path=/select/ params={q=solr} hits=0 
 status=0 QTime=0 
 May 10, 2011 10:18:46 PM org.apache.solr.update.DirectUpdateHandler2 commit
 INFO: start 
 commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
 May 10, 2011 10:18:46 PM org.apache.solr.search.SolrIndexSearcher init
 INFO: Opening Searcher@4f83f9df main
 May 10, 2011 10:18:46 PM org.apache.solr.update.DirectUpdateHandler2 commit
 INFO: end_commit_flush
 May 10, 2011 10:18:46 PM org.apache.solr.search.SolrIndexSearcher warm
 INFO: autowarming Searcher@4f83f9df main from Searcher@5f7808af main
   
 

[jira] [Updated] (SOLR-2508) Master/Slave replication can leave slave in inconsistent state of NullPointerException in solrHighligher.java 102

2011-05-11 Thread Xing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xing Li updated SOLR-2508:
--

Attachment: solrconfig.xml
schema.xml

 Master/Slave replication can leave slave in inconsistent state of  
 NullPointerException in solrHighligher.java 102
 --

 Key: SOLR-2508
 URL: https://issues.apache.org/jira/browse/SOLR-2508
 Project: Solr
  Issue Type: Bug
  Components: highlighter, replication (java)
Affects Versions: 4.0
 Environment: Centos 5.6 with Java1.7.0b137
Reporter: Xing Li

 Using Solr 4/Trunk snapshot build of 5/10/2011. 
 Setup:
 --
 1) 1 Master + 4 Slaves
 2) Multicore setup with 8 cores.
 3) Replication Poll Interval: 00:30:20
 Summary of Issue:
 ---
 When a slave completes a replication pull from master, it will complete the 
 data index pull but 
 based on logs it appears subsequent index warming and other actions post 
 replication 
 cleanup leaves the core/db in an inconsistent state.
 Frequency of occurrence: Very high but not 100%. I have 1 master and 4 slaves 
 and for each replication 
 pull cycle, around 50% of the gets affected. Each slave has 8 multi-cores but
 the problem always affects this particular mysolr_blogs db/core.
 Please note the mysolr_blogs data index is 1.4GB and the largest of the 8 
 by a wide margin.
 Attached is the schema.xml and solrconfig.xml for the mysolr_blogs core.
 Temp fix:
 -
 1) Stop and restart the solr server when this happens.
 2) Stop using automatic replication on this core.
 Logging:
 -
 * begins automatic replication  pull
 {code}
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Slave in sync with master.
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Slave in sync with master.
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Slave in sync with master.
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Slave in sync with master.
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Slave in sync with master.
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Slave in sync with master.
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Master's version: 1302675975227, generation: 694
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Slave in sync with master.
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Slave's version: 1302675975222, generation: 692
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Starting replication process
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Number of files in latest index in master: 10
 {code}
 * 65 seconds past and I cut out the query logs in between. Here it's pulling 
 the 1.4GB mysolr_blogs index data. 
 {code}
 May 10, 2011 10:18:45 PM org.apache.solr.handler.SnapPuller downloadIndexFiles
 INFO: Skipping download for 
 /db/solr-master/multicore/mysolr_blogs/data/index/1.fnx
 May 10, 2011 10:18:45 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Total time taken for download : 65 secs
 May 10, 2011 10:18:45 PM org.apache.solr.core.SolrCore execute
 INFO: [mysolr_users] webapp=/solr path=/select 
 params={sort=indent=offstart=0q=%2Buname:inlove*q.op=andhl.fl=*facet.field=pcategoryidfacet.field=categoryidfacet.field=languageidwt=jsonhl=truerows=51}
  hits=0 status=0 QTime=1 
 May 10, 2011 10:18:45 PM org.apache.solr.core.SolrCore execute
 INFO: [mysolr_blogs] webapp=/solr path=/select/ params={q=solr} hits=0 
 status=0 QTime=0 
 May 10, 2011 10:18:45 PM org.apache.solr.core.SolrCore execute
 INFO: [mysolr_blogs] webapp=/solr path=/select/ params={q=solr} hits=0 
 status=0 QTime=0 
 May 10, 2011 10:18:46 PM org.apache.solr.update.DirectUpdateHandler2 commit
 INFO: start 
 commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
 May 10, 2011 10:18:46 PM org.apache.solr.search.SolrIndexSearcher init
 INFO: Opening Searcher@4f83f9df main
 May 10, 2011 10:18:46 PM org.apache.solr.update.DirectUpdateHandler2 commit
 INFO: end_commit_flush
 May 10, 2011 10:18:46 PM org.apache.solr.search.SolrIndexSearcher warm
 INFO: autowarming Searcher@4f83f9df main from Searcher@5f7808af main
   
 

[jira] [Commented] (SOLR-2508) Master/Slave replication can leave slave in inconsistent state of NullPointerException in solrHighligher.java 102

2011-05-11 Thread Xing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13031590#comment-13031590
 ] 

Xing Li commented on SOLR-2508:
---

Got the problem more isolated. 

The queries affected are those using hl.fl=* with hl=true.

All queries run fine until something post-replication triggers
NullPointerException failures in solrHighligher.java 102 when a wildcard is
used for the highlight field selection hl.fl=*. Replacing those queries with a
specific highlight field such as hl.fl=uname will then make the query work
again after the sudden failure.
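
For illustration, a minimal sketch of the two query shapes against a slave
core; the host, port and core URL are assumptions, not from the report:

import urllib
import urllib2

# hypothetical slave core URL; adjust host/port/core to your setup
BASE = "http://localhost:8983/solr/mysolr_blogs/select"

def search(hl_fl):
    params = urllib.urlencode({"q": "solr", "hl": "true", "hl.fl": hl_fl, "wt": "json"})
    return urllib2.urlopen(BASE + "?" + params).read()

search("*")      # wildcard highlight field list: hits the NullPointerException after replication
search("uname")  # explicit highlight field: keeps working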

 

 Master/Slave replication can leave slave in inconsistent state of  
 NullPointerException in solrHighligher.java 102
 --

 Key: SOLR-2508
 URL: https://issues.apache.org/jira/browse/SOLR-2508
 Project: Solr
  Issue Type: Bug
  Components: highlighter, replication (java)
Affects Versions: 4.0
 Environment: Centos 5.6 with Java1.7.0b137
Reporter: Xing Li
 Attachments: schema.xml, solrconfig.xml


 Using Solr 4/Trunk snapshot build of 5/10/2011. 
 Setup:
 --
 1) 1 Master + 4 Slaves
 2) Multicore setup with 8 cores.
 3) Replication Poll Interval: 00:30:20
 Summary of Issue:
 ---
 When a slave completes a replication pull from master, it will complete the 
 data index pull but 
 based on logs it appears subsequent index warming and other actions post 
 replication 
 cleanup leaves the core/db in an inconsistent state.
 Frequency of occurrence: Very high but not 100%. I have 1 master and 4 slaves 
 and for each replication 
 pull cycle, around 50% of the gets affected. Each slave has 8 multi-cores but
 the problem always affects this particular mysolr_blogs db/core.
 Please note the mysolr_blogs data index is 1.4GB and the largest of the 8 
 by a wide margin.
 Attached is the schema.xml and solrconfig.xml for the mysolr_blogs core.
 Temp fix:
 -
 1) Stop and restart the solr server when this happens.
 2) Stop using automatic replication on this core.
 Logging:
 -
 * begins automatic replication  pull
 {code}
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Slave in sync with master.
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Slave in sync with master.
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Slave in sync with master.
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Slave in sync with master.
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Slave in sync with master.
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Slave in sync with master.
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Master's version: 1302675975227, generation: 694
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Slave in sync with master.
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Slave's version: 1302675975222, generation: 692
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Starting replication process
 May 10, 2011 10:17:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Number of files in latest index in master: 10
 {code}
 * 65 seconds past and I cut out the query logs in between. Here it's pulling 
 the 1.4GB mysolr_blogs index data. 
 {code}
 May 10, 2011 10:18:45 PM org.apache.solr.handler.SnapPuller downloadIndexFiles
 INFO: Skipping download for 
 /db/solr-master/multicore/mysolr_blogs/data/index/1.fnx
 May 10, 2011 10:18:45 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 INFO: Total time taken for download : 65 secs
 May 10, 2011 10:18:45 PM org.apache.solr.core.SolrCore execute
 INFO: [mysolr_users] webapp=/solr path=/select 
 params={sort=indent=offstart=0q=%2Buname:inlove*q.op=andhl.fl=*facet.field=pcategoryidfacet.field=categoryidfacet.field=languageidwt=jsonhl=truerows=51}
  hits=0 status=0 QTime=1 
 May 10, 2011 10:18:45 PM org.apache.solr.core.SolrCore execute
 INFO: [mysolr_blogs] webapp=/solr path=/select/ params={q=solr} hits=0 
 status=0 QTime=0 
 May 10, 2011 10:18:45 PM org.apache.solr.core.SolrCore execute
 INFO: [mysolr_blogs] webapp=/solr path=/select/ params={q=solr} hits=0 
 status=0 QTime=0 
 May 10, 2011 10:18:46 PM org.apache.solr.update.DirectUpdateHandler2 commit
 INFO: start 
 commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
 May 10, 2011 10:18:46 PM org.apache.solr.search.SolrIndexSearcher init
 INFO: Opening Searcher@4f83f9df main
 May 10, 2011 10:18:46 PM 

[jira] [Updated] (LUCENE-2984) Move hasVectors() & hasProx() responsibility out of SegmentInfo to FieldInfos

2011-05-11 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2984:


Attachment: LUCENE-2984.patch

Here is a new patch that should fix selckin's failure. I added javadoc, some 
comments and TODOs to remove the hasProx/hasVector flags once we don't need to 
support them anymore.

I also added a testcase for the vector flags in the exception case.

 Move hasVectors() & hasProx() responsibility out of SegmentInfo to FieldInfos 
 --

 Key: LUCENE-2984
 URL: https://issues.apache.org/jira/browse/LUCENE-2984
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-2984.patch, LUCENE-2984.patch


 Spin-off from LUCENE-2881 which had this change already but due to some 
 random failures related to this change I remove this part of the patch to 
 make it more isolated and easier to test. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2448) Upgrade Carrot2 to version 3.5.0

2011-05-11 Thread Stanislaw Osinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stanislaw Osinski updated SOLR-2448:


Attachment: (was: SOLR-2448-2449-2450-2505-trunk.zip)

 Upgrade Carrot2 to version 3.5.0
 

 Key: SOLR-2448
 URL: https://issues.apache.org/jira/browse/SOLR-2448
 Project: Solr
  Issue Type: Task
  Components: contrib - Clustering
Reporter: Stanislaw Osinski
Assignee: Stanislaw Osinski
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: SOLR-2448-2449-2450-2505-branch_3x.patch, 
 SOLR-2448-2449-2450-2505-trunk.patch, carrot2-core-3.5.0.jar


 Carrot2 version 3.5.0 should be available very soon. After the upgrade, it 
 will be possible to implement a few improvements to the clustering plugin; 
 I'll file separate issues for these.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2448) Upgrade Carrot2 to version 3.5.0

2011-05-11 Thread Stanislaw Osinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stanislaw Osinski updated SOLR-2448:


Attachment: carrot2-core-3.5.0.jar
SOLR-2448-2449-2450-2505-trunk.patch
SOLR-2448-2449-2450-2505-branch_3x.patch

Hi, here's another set of patches (svn this time) against trunk and branch_3x. 
I've corrected Maven configs and checked that the project builds fine using mvn 
install.

After applying the patches you'd need to manually update the JARs:

In trunk, delete:

trunk/solr/contrib/clustering/lib/carrot2-core-3.4.2.jar
trunk/solr/contrib/clustering/lib/hppc-0.3.1.jar

and replace them with new versions:

http://repo1.maven.org/maven2/org/carrot2/carrot2-core/3.5.0/carrot2-core-3.5.0.jar
http://repo1.maven.org/maven2/com/carrotsearch/hppc/0.3.3/hppc-0.3.3.jar


In branch_3x, delete:

branch_3x/solr/contrib/clustering/lib/carrot2-core-3.4.2.jar
branch_3x/solr/contrib/clustering/lib/hppc-0.3.1.jar

and replace them with new versions:

carrot2-core-3.5.0.jar attached (jdk15 backport)
http://repo1.maven.org/maven2/com/carrotsearch/hppc/0.3.4/hppc-0.3.4-jdk15.jar


It'd be great if someone could review these before I make the commit.

Thanks!

S.

 Upgrade Carrot2 to version 3.5.0
 

 Key: SOLR-2448
 URL: https://issues.apache.org/jira/browse/SOLR-2448
 Project: Solr
  Issue Type: Task
  Components: contrib - Clustering
Reporter: Stanislaw Osinski
Assignee: Stanislaw Osinski
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: SOLR-2448-2449-2450-2505-branch_3x.patch, 
 SOLR-2448-2449-2450-2505-trunk.patch, carrot2-core-3.5.0.jar


 Carrot2 version 3.5.0 should be available very soon. After the upgrade, it 
 will be possible to implement a few improvements to the clustering plugin; 
 I'll file separate issues for these.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (SOLR-2504) Combined usage of Synonyms/SpellChecker causes java.lang.NullPointerException, when searching for a word out of synonyms.txt

2011-05-11 Thread Jens Bertheau (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jens Bertheau closed SOLR-2504.
---


 Combined usage of Synonyms/SpellChecker causes 
 java.lang.NullPointerException, when searching for a word out of synonyms.txt
 

 Key: SOLR-2504
 URL: https://issues.apache.org/jira/browse/SOLR-2504
 Project: Solr
  Issue Type: Bug
  Components: clients - java, spellchecker
Affects Versions: 3.1
Reporter: Jens Bertheau
Assignee: Uwe Schindler

 After migrating from 1.4 to 3.1 we experience the following behaviour:
 When SpellChecking is turned off, everything works fine.
 When Synonyms are *not* being used, everything works fine.
 When both SpellChecking and Synonyms are being used and a search is 
 triggered that contains at least one of the words from synonyms.txt, the 
 following error is thrown:
 java.lang.NullPointerException
 at 
 org.apache.lucene.util.AttributeSource.cloneAttributes(AttributeSource.java:542)
 at 
 org.apache.solr.analysis.SynonymFilter.incrementToken(SynonymFilter.java:132)
 at 
 org.apache.lucene.analysis.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:58)
 at 
 org.apache.solr.handler.component.SpellCheckComponent.getTokens(SpellCheckComponent.java:485)
 at 
 org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:131)
 at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
 at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
 at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
 at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
 at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
 at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
 at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
 at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
 at 
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:874)
 at 
 org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
 at 
 org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
 at 
 org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
 at 
 org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
 at java.lang.Thread.run(Thread.java:619)
 The problem has already been described here:
 http://osdir.com/ml/solr-user.lucene.apache.org/2010-09/msg00945.html
 I have a report from a third person experiencing the same problem.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3087) highlighting exact phrase with overlapping tokens fails.

2011-05-11 Thread JIRA
highlighting exact phrase with overlapping tokens fails.


 Key: LUCENE-3087
 URL: https://issues.apache.org/jira/browse/LUCENE-3087
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/highlighter
Affects Versions: 3.1, 2.9.4
Reporter: Pierre Gossé
Priority: Minor


Fields with overlapping tokens are not highlighted in search results when 
searching for exact phrases while using TermVector.WITH_OFFSET.

The document built in MemoryIndex for highlighting does not preserve token positions 
in this case. Overlapping tokens get flattened (position increment always set to 1), 
so the SpanQuery used to find the relevant fragment fails to identify the correct 
token sequence because of the position shift.

I corrected this by adding a position increment calculation in the StoredTokenStream 
subclass (the idea is sketched below), and added a JUnit test covering this case.

I used the Eclipse code style from trunk, but the style adds quite a few formatting 
differences between the repository and working-copy files. I tried to reduce them, 
but some line-wrapping rules still don't match.

Correction patch attached.
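
Conceptually, the fix computes each increment from the absolute token positions instead of hard-coding 1; a minimal plain-Java sketch of that calculation (illustrative only, not the actual patch):
{code:java}
import java.util.*;

// Given tokens with absolute positions (as recoverable alongside the term vector),
// compute the position increments a TokenStream should emit so that overlapping
// tokens keep an increment of 0 instead of being flattened to 1.
public class PositionIncrementSketch {
    static final class Tok {
        final String text; final int position;
        Tok(String text, int position) { this.text = text; this.position = position; }
    }

    static int[] increments(List<Tok> sortedByPosition) {
        int[] incs = new int[sortedByPosition.size()];
        int lastPos = -1;
        for (int i = 0; i < sortedByPosition.size(); i++) {
            int pos = sortedByPosition.get(i).position;
            incs[i] = (lastPos == -1) ? pos + 1 : pos - lastPos; // 0 for overlapping tokens
            lastPos = pos;
        }
        return incs;
    }

    public static void main(String[] args) {
        // "new york" with a synonym "ny" overlapping "new": absolute positions 0, 0, 1
        List<Tok> toks = Arrays.asList(new Tok("new", 0), new Tok("ny", 0), new Tok("york", 1));
        System.out.println(Arrays.toString(increments(toks))); // prints [1, 0, 1]
    }
}
{code}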

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3087) highlighting exact phrase with overlapping tokens fails.

2011-05-11 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/LUCENE-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Gossé updated LUCENE-3087:
-

Attachment: LUCENE-3087.patch

Correction patch with JUnit tests.

 highlighting exact phrase with overlapping tokens fails.
 

 Key: LUCENE-3087
 URL: https://issues.apache.org/jira/browse/LUCENE-3087
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/highlighter
Affects Versions: 2.9.4, 3.1
Reporter: Pierre Gossé
Priority: Minor
 Attachments: LUCENE-3087.patch


 Fields with overlapping tokens are not highlighted in search results when 
 searching for exact phrases while using TermVector.WITH_OFFSET.
 The document built in MemoryIndex for highlighting does not preserve token positions 
 in this case. Overlapping tokens get flattened (position 
 increment always set to 1), so the SpanQuery used to find the relevant 
 fragment fails to identify the correct token sequence because of the 
 position shift.
 I corrected this by adding a position increment calculation in the 
 StoredTokenStream subclass, and added a JUnit test covering this case.
 I used the Eclipse code style from trunk, but the style adds quite a few formatting 
 differences between the repository and working-copy files. I tried to reduce 
 them, but some line-wrapping rules still don't match.
 Correction patch attached.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-2984) Move hasVectors() & hasProx() responsibility out of SegmentInfo to FieldInfos

2011-05-11 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2984:


Attachment: LUCENE-2984.patch

new patch!
I was running tests with the previous patch and tripped a very nifty exception.
{noformat}
 [junit] Testsuite: org.apache.lucene.store.TestLockFactory
[junit] Testcase: 
testStressLocksNativeFSLockFactory(org.apache.lucene.store.TestLockFactory):
  FAILED
[junit] IndexWriter hit unexpected exceptions
[junit] junit.framework.AssertionFailedError: IndexWriter hit unexpected 
exceptions
[junit] at 
org.apache.lucene.store.TestLockFactory._testStressLocks(TestLockFactory.java:164)
[junit] at 
org.apache.lucene.store.TestLockFactory.testStressLocksNativeFSLockFactory(TestLockFactory.java:144)
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282)
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211)
[junit] 
[junit] 
[junit] Tests run: 11, Failures: 1, Errors: 0, Time elapsed: 7.092 sec
[junit] 
[junit] - Standard Output ---
[junit] Stress Test Index Writer: creation hit unexpected IOException: 
java.io.FileNotFoundException: _u.fnm
[junit] java.io.FileNotFoundException: _u.fnm
[junit] at 
org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:386)
[junit] at 
org.apache.lucene.index.FieldInfos.init(FieldInfos.java:273)
[junit] at 
org.apache.lucene.index.SegmentInfo.loadFieldInfos(SegmentInfo.java:264)
[junit] at 
org.apache.lucene.index.SegmentInfo.getFieldInfos(SegmentInfo.java:315)
[junit] at 
org.apache.lucene.index.SegmentInfo.files(SegmentInfo.java:603)
[junit] at 
org.apache.lucene.index.SegmentInfos.files(SegmentInfos.java:873)
[junit] at 
org.apache.lucene.index.IndexFileDeleter$CommitPoint.init(IndexFileDeleter.java:625)
[junit] at 
org.apache.lucene.index.IndexFileDeleter.init(IndexFileDeleter.java:199)
[junit] at 
org.apache.lucene.index.IndexWriter.init(IndexWriter.java:830)
[junit] at 
org.apache.lucene.store.TestLockFactory$WriterThread.run(TestLockFactory.java:283)
[junit] Stress Test Index Writer: creation hit unexpected IOException: 
java.io.FileNotFoundException: _u.fnm
[junit] - Standard Error -
[junit] NOTE: reproduce with: ant test -Dtestcase=TestLockFactory 
-Dtestmethod=testStressLocksNativeFSLockFactory 
-Dtests.seed=9223296054268232625:-7758089421938554917
[junit] NOTE: test params are: codec=RandomCodecProvider: 
{content=MockFixedIntBlock(blockSize=1397)}, locale=ar_MA, 
timezone=Indian/Antananarivo
[junit] NOTE: all tests run in this JVM:
[junit] [TestDateTools, Test2BTerms, TestAddIndexes, TestFilterIndexReader, 
TestIndexWriterExceptions, TestIndexWriterMerging, TestMaxTermFrequency, 
TestParallelReaderEmptyIndex, TestParallelTermEnum, TestPerSegmentDeletes, 
TestPersistentSnapshotDeletionPolicy, TestSegmentReader, TestStressAdvance, 
TestConstantScoreQuery, TestDateFilter, TestDateSort, TestDocIdSet, TestNot, 
TestPrefixQuery, TestSetNorm, TestTopScoreDocCollector, TestBasics, 
TestSpansAdvanced2, TestDirectory, TestLockFactory]
[junit] NOTE: Linux 2.6.37-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 
(64-bit)/cpus=8,threads=1,free=136724544,total=292618240
{noformat}

That is caused by MockDirectoryWrapper behaving like Windows, i.e. not deleting files 
if they are still open. So there might be a segments_x file around although the 
_x.fnm has already been deleted. That wasn't a problem before, but since we now 
need FIs to decide whether a segment is storing vectors or not, this file is 
required. 

To work around this I had to add some code to IndexFileDeleter, which makes me 
worry a little. Now I drop a commit point if either I can't load the SIS or I 
can't load one of the FIs from the loaded SI. I still try to delete all files 
of the broken(?!) segment, but the question is whether there could be cases 
where I should rather throw an exception. Maybe some infoStream 
output would be helpful here too.

Any comments largely appreciated.
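
For illustration, a rough plain-Java sketch of the defensive pattern described above (not the actual IndexFileDeleter code; all names are made up):
{code:java}
import java.io.FileNotFoundException;
import java.util.*;

// When scanning commit points, skip a commit whose per-segment files can no longer
// be opened instead of failing the whole IndexWriter startup, and report it.
public class CommitScanSketch {
    interface CommitLoader { List<String> load(String commitName) throws FileNotFoundException; }

    static List<String> usableCommits(List<String> commitNames, CommitLoader loader, StringBuilder infoStream) {
        List<String> usable = new ArrayList<String>();
        for (String name : commitNames) {
            try {
                loader.load(name);   // e.g. read segments_N plus each segment's .fnm
                usable.add(name);
            } catch (FileNotFoundException e) {
                // files of this (presumably already deleted) commit are gone: skip it, but say so
                infoStream.append("skipping unreadable commit ").append(name).append(": ").append(e).append('\n');
            }
        }
        return usable;
    }
}
{code}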

 Move hasVectors() & hasProx() responsibility out of SegmentInfo to FieldInfos 
 --

 Key: LUCENE-2984
 URL: https://issues.apache.org/jira/browse/LUCENE-2984
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-2984.patch, LUCENE-2984.patch, LUCENE-2984.patch


 Spin-off from LUCENE-2881 which had this change already but due to some 
 

[jira] [Created] (SOLR-2509) String index out of range: -1

2011-05-11 Thread Thomas Gambier (JIRA)
String index out of range: -1
-

 Key: SOLR-2509
 URL: https://issues.apache.org/jira/browse/SOLR-2509
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.1
 Environment: Debian Lenny
JAVA Version 1.6.0_20

Reporter: Thomas Gambier
Priority: Blocker


Hi,

I'm a French user of Solr and I've encountered a problem since installing 
Solr 3.1.

I get an error with this query: 
cle_frbr:LYSROUGE1149-73190

The error is:
HTTP ERROR 500

Problem accessing /solr/select. Reason:

String index out of range: -1

java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at 
java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:797)
at java.lang.StringBuilder.replace(StringBuilder.java:271)
at 
org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:131)
at 
org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:69)
at 
org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:179)
at 
org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:157)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)


I tried escaping the minus character and the query worked:
cle_frbr:LYSROUGE1149\-73190

But, strangely, if I change one letter in my query it works:
cle_frbr:LASROUGE1149-73190


I've tested the same query on Solr 1.4 and it works!

Can someone test the query on the next line on a Solr 3.1 version and tell me if 
they have the same problem? 
yourfield:LYSROUGE1149-73190

Where does the problem come from?

Thank you in advance for your help.

Tom
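
As a possible client-side workaround while this is open, SolrJ users can escape query special characters (including '-') before building the query; a minimal sketch (assuming SolrJ's ClientUtils is on the classpath):
{code:java}
import org.apache.solr.client.solrj.util.ClientUtils;

public class EscapeExample {
    public static void main(String[] args) {
        String raw = "LYSROUGE1149-73190";
        // escapeQueryChars backslash-escapes Lucene/Solr query syntax characters
        String q = "cle_frbr:" + ClientUtils.escapeQueryChars(raw);
        System.out.println(q); // cle_frbr:LYSROUGE1149\-73190
    }
}
{code}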

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2509) String index out of range: -1

2011-05-11 Thread Thomas Gambier (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Gambier updated SOLR-2509:
-

Description: 
Hi,

I'm a French user of Solr and I've encountered a problem since installing 
Solr 3.1.

I get an error with this query: 
cle_frbr:LYSROUGE1149-73190

The error is:
HTTP ERROR 500

Problem accessing /solr/select. Reason:

String index out of range: -1

java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at 
java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:797)
at java.lang.StringBuilder.replace(StringBuilder.java:271)
at 
org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:131)
at 
org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:69)
at 
org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:179)
at 
org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:157)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)


I tried escaping the minus character and the query worked:
cle_frbr:LYSROUGE1149\-73190

But, strangely, if I change one letter in my query it works:
cle_frbr:LASROUGE1149-73190


I've tested the same query on Solr 1.4 and it works!

Can someone test the query on the next line on a Solr 3.1 version and tell me if 
they have the same problem? 
yourfield:LYSROUGE1149-73190

Where does the problem come from?

Thank you in advance for your help.

Tom

  was:
Hi,

I'm a french user of SOLR and i've encountered a problem since i've installed 
SOLR 3.1.

I've got an error with this query : 
cle_frbr:LYSROUGE1149-73190

The error is :
HTTP ERROR 500

Problem accessing /solr/select. Reason:

String index out of range: -1

java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at 
java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:797)
at java.lang.StringBuilder.replace(StringBuilder.java:271)
at 
org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:131)
at 
org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:69)
at 
org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:179)
at 
org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:157)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 

[jira] [Updated] (SOLR-2509) String index out of range: -1

2011-05-11 Thread Thomas Gambier (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Gambier updated SOLR-2509:
-

Description: 
Hi,

I'm a French user of Solr and I've encountered a problem since installing 
Solr 3.1.

I get an error with this query: 
cle_frbr:LYSROUGE1149-73190

The error is:
HTTP ERROR 500

Problem accessing /solr/select. Reason:

String index out of range: -1

java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at 
java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:797)
at java.lang.StringBuilder.replace(StringBuilder.java:271)
at 
org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:131)
at 
org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:69)
at 
org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:179)
at 
org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:157)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)


I tried escaping the minus character and the query worked:
cle_frbr:LYSROUGE1149(BACKSLASH)-73190

But, strangely, if I change one letter in my query it works:
cle_frbr:LASROUGE1149-73190


I've tested the same query on Solr 1.4 and it works!

Can someone test the query on the next line on a Solr 3.1 version and tell me if 
they have the same problem? 
yourfield:LYSROUGE1149-73190

Where does the problem come from?

Thank you in advance for your help.

Tom

  was:
Hi,

I'm a french user of SOLR and i've encountered a problem since i've installed 
SOLR 3.1.

I've got an error with this query : 
cle_frbr:LYSROUGE1149-73190

The error is :
HTTP ERROR 500

Problem accessing /solr/select. Reason:

String index out of range: -1

java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at 
java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:797)
at java.lang.StringBuilder.replace(StringBuilder.java:271)
at 
org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:131)
at 
org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:69)
at 
org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:179)
at 
org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:157)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 

[jira] [Commented] (SOLR-17) XSD for solr requests/responses

2011-05-11 Thread David Barnes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-17?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13031847#comment-13031847
 ] 

David Barnes commented on SOLR-17:
--

Strongly second the comment from Bill Bell. We are working on a commercial project 
integrating Solr 3.1.
The lack of an XSD is making integration with our business-service back end a royal 
pain.
We do have XSDs from all the other third parties we are integrating with. Using the Solr 
commons connector is not an option for us.
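
For what it's worth, a published XSD would let clients validate responses with the JDK's built-in validator; a minimal sketch (solr-response.xsd and select-response.xml are hypothetical file names):
{code:java}
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.*;

public class ValidateResponse {
    public static void main(String[] args) throws Exception {
        // load the (hypothetical) schema once, then validate a stored Solr XML response against it
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                                     .newSchema(new File("solr-response.xsd"));
        Validator validator = schema.newValidator();
        validator.validate(new StreamSource(new File("select-response.xml"))); // throws SAXException on violation
        System.out.println("response is valid");
    }
}
{code}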



 XSD for solr requests/responses
 ---

 Key: SOLR-17
 URL: https://issues.apache.org/jira/browse/SOLR-17
 Project: Solr
  Issue Type: Improvement
Reporter: Mike Baranczak
Priority: Minor
 Attachments: SOLR-17.Mattmann.121709.patch.txt, 
 UselessRequestHandler.java, solr-complex.xml, solr-rev2.xsd, solr.xsd


 Attaching an XML schema definition for the responses and the update requests. 
 I needed to do this for myself anyway, so I might as well contribute it to 
 the project.
 At the moment, I have no plans to write an XSD for the config documents, but 
 it wouldn't be a bad idea.
 TODO: change the schema URL. I'm guessing that Apache already has some sort 
 of naming convention for these?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: segfault in JCCEnv::deleteGlobalRef

2011-05-11 Thread Christian Heimes
On 11.05.2011 18:56, Christian Heimes wrote:
 On 11.05.2011 18:27, Andi Vajda wrote:
 Does it crash as easily with Python 2.6 ?
 If not, then that could be an answer as to why this wasn't noticed before.
 
 With 20 test samples, it seems like Python 2.6 survives 50% longer than
 Python 2.7.

100 samples:

python2.6
cnt: 100, min: 0.886, max: 3.700, avg: 1.260

python2.7
cnt: 100, min: 0.888, max: 3.793, avg: 1.426


[jira] [Commented] (SOLR-2451) Add assertQScore() to SolrTestCaseJ4 to account for small deltas

2011-05-11 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13031983#comment-13031983
 ] 

David Smiley commented on SOLR-2451:


Ok, I like it.

 Add assertQScore() to SolrTestCaseJ4 to account for small deltas 
 -

 Key: SOLR-2451
 URL: https://issues.apache.org/jira/browse/SOLR-2451
 Project: Solr
  Issue Type: Improvement
Affects Versions: 3.2
Reporter: David Smiley
Priority: Minor
 Attachments: SOLR-2451.patch, SOLR-2451.patch, 
 SOLR-2451_assertQScore.patch


 Attached is a patch that adds the following method to SolrTestCaseJ4 (just 
 javadoc & signature shown):
 {code:java}
   /**
* Validates that the document at the specified index in the results has 
 the specified score, within 0.0001.
*/
   public static void assertQScore(SolrQueryRequest req, int docIdx, float 
 targetScore) {
 {code}
 This is especially useful for geospatial, where slightly different 
 precision deltas might occur when different geospatial indexing 
 strategies are used, assuming the score is some geospatial distance.  This 
 patch makes a simple modification to DistanceFunctionTest to use it.
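 A hypothetical usage sketch once the patch is applied (the query and expected score are made up; assumes a core initialized via initCore(...) in a @BeforeClass method):
{code:java}
import org.apache.solr.SolrTestCaseJ4;
import org.junit.Test;

public class AssertQScoreExample extends SolrTestCaseJ4 {
  @Test
  public void testTopHitScore() throws Exception {
    // first result (docIdx 0) of this query should score ~1.7249, within the 0.0001 delta
    assertQScore(req("q", "{!func}geodist(store,45.15,-93.85)", "fl", "*,score"), 0, 1.7249f);
  }
}
{code}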

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [Lucene.Net] Lucene.Net Hackathon (5/13-/516)

2011-05-11 Thread Troy Howard
Thanks Michael!

One quick question -- the Wiki seems to be really locked down for
public editing. That's kind of strange. Anyone should be able to log
in and whip up a new page or edit an existing one, committer or
otherwise. I didn't have access until just the other day, and Chris
Currens doesn't have access now (I had to add him to the page
manually).

Can we open up the permissions on our wiki?

Thanks,
Troy



On Wed, May 11, 2011 at 11:51 AM, Michael Herndon
mhern...@wickedsoftware.net wrote:
 You never know.  Personally I generally have most tech people on a list
 rather directly following them.

 But thanks.

 On Wed, May 11, 2011 at 2:43 PM, Wyatt Barnett wyatt.barn...@gmail.comwrote:

 Retweeted. Though I doubt any of the ~100 people following me aren't in
 the 36 following him . . .

 On 5/11/11 2:39 PM, Michael Herndon mhern...@wickedsoftware.net wrote:

 If any of you follow Hanselman on twitter, please take a second a retweet
 his on the lucene.net hackathon listed below or even send a thanks.
 
 Wanna get involved in Open Source? Why not help with the Lucene.NET
 HackAThon? http://hnsl.mn/lucenehackathon 
 
 Cheers,
 - Michael
 
 On Mon, May 9, 2011 at 7:12 PM, Troy Howard thowar...@gmail.com wrote:
 
  Here's the wiki page:
 
  https://cwiki.apache.org/confluence/x/Go6OAQ
 
  Thanks,
  Troy
 
 
  On Mon, May 9, 2011 at 1:59 PM, Troy Howard thowar...@gmail.com
 wrote:
   Michael,
  
   That worked!
  
   I'm in the process of making a wiki page for the event now.
  
   Thanks,
   Troy
  
  
   On Mon, May 9, 2011 at 1:38 PM, Michael Herndon
   mhern...@wickedsoftware.net wrote:
  
   log out and log back in and verify permission changes.
  
   On Mon, May 9, 2011 at 4:22 PM, Troy Howard thowar...@gmail.com
  wrote:
  
Re: I'm not sure if there is a coding difference between the C#
 stuff
  and
the other directory stuff.
   
There are a few minor code changes in the new branch vs the C#
 branch,
  but
those are things like framework target, copyright notices, etc.. I
  didn't
change code significantly, and unit tests still pass.
   
Re: we can probably branch C# to something like pre_NewStructure
   
I made a tag right before committing the directory changes for this
  exact
purpose. It's here:
   
   
   
 
 
 https://svn.apache.org/repos/asf/incubator/lucene.net/tags/pre-layout-cha
 nge
   
   
Regarding the hackathon next week, I'd like to put together a list
 of
  tasks
specifically for this weekend to give people some focus on where
 they
  can
contribute. Some of these will be major tasks with high priority
 (like
finishing up the 2.9.4 release) and others will be of lower
 priority
  like
working on the samples/wiki/website... Those will great skills in
  creating
GUI apps, but less skills with writing back-end libraries might
 want
  to
contribute to Luke.Net, even if it's not a high priority.
   
I agree with Michael that we should tweet/blog/wiki/mailing list
 the
details
of the event. I would make a wiki page on the topic, but it seems I
  don't
have sufficient privileges on our Confluence wiki to do that. Can
  whoever
the admin is give me rights to add/edit wiki pages? My login is
  'thoward'.
   
Thanks,
Troy
   
On Mon, May 9, 2011 at 1:15 AM, Prescott Nasser 
  geobmx...@hotmail.com
wrote:
   

 I think Troy has the structure ready to roll - I'm not sure if
 there
  is a
 coding difference between the C# stuff and the other directory
  stuff. If
 there isn't then we can probably branch C# to something like
 pre_NewStructure (someone help me with a better name), then
 remove
  it
from
 the trunk.

 Troy I believe was investigating the legal task - perhaps he can
  update
us
 if he ever got an answer

 If you want to jump into a smaller task take a look at
 https://issues.apache.org/jira/browse/LUCENENET-372 (currently
  assigned
to
 me). I updated a ton of the analyers, but I believe them to be
 out
  of
date
 from the java 2.9.4 branch because I used the attached files from
  Pasha
 without paying attention to the age of them. So those could use a
  review.
I
 also never ported the test cases, which we definately should
 have.



 
  Date: Mon, 9 May 2011 10:04:03 +0200
  From: ma...@rotselleri.com
  To: lucene-net-...@lucene.apache.org
  Subject: Re: [Lucene.Net] Lucene.Net Hackathon (5/13-/516)
 
  On Mon, May 9, 2011 at 1:12 AM, Prescott Nasser wrote:
  
   +1 to getting 2.9.4 ready to roll + the changes to the
 directory
 structure we have
   going
 
  +1 for 2.9.4 and directory structure.
  To make that happen, I'd like to know what needs to be done
 and in
  what way I could be of any help. There are 10 open issues for
  2.9.4,
  and (apart from the Luke issues mentioned below) 

Re: [Lucene.Net] Lucene.Net Hackathon (5/13-/516)

2011-05-11 Thread Michael Herndon
Troy,

Confluence admin is not my forte, but I can look at the privileges tonight
and see if we can change that.

You and Prescott also have admin privileges as of right now. I'm pretty much
giving all committers who have forwarded their username those privileges.

I've also added a snippet to the page for people to e-mail me in the
meantime if they are unable to edit the page to add to the table on the
hack-a-thon page.  (And there are some who may just not want to join yet
another wiki).

Do keep an eye out for spam once we elevate privileges.

- Michael

On Wed, May 11, 2011 at 3:37 PM, Troy Howard thowar...@gmail.com wrote:

 Thanks Michael!

 One quick question -- the Wiki seems to be really locked down for
 public editing. That's kind of strange. Anyone should be able to log
 in and whip up a new page or edit an existing one, committer or
 otherwise. I didn't have access until just the other day, and Chris
 Currens doesn't have access now (I had to add him to the page
 manually).

 Can we open up the permissions on our wiki?

 Thanks,
 Troy



 On Wed, May 11, 2011 at 11:51 AM, Michael Herndon
 mhern...@wickedsoftware.net wrote:
  You never know.  Personally I generally have most tech people on a list
  rather directly following them.
 
  But thanks.
 
  On Wed, May 11, 2011 at 2:43 PM, Wyatt Barnett wyatt.barn...@gmail.com
 wrote:
 
  Retweeted. Though I doubt any of the ~100 people following me aren't in
  the 36 following him . . .
 
  On 5/11/11 2:39 PM, Michael Herndon mhern...@wickedsoftware.net
 wrote:
 
  If any of you follow Hanselman on twitter, please take a second a
 retweet
  his on the lucene.net hackathon listed below or even send a thanks.
  
  Wanna get involved in Open Source? Why not help with the Lucene.NET
  HackAThon? http://hnsl.mn/lucenehackathon 
  
  Cheers,
  - Michael
  
  On Mon, May 9, 2011 at 7:12 PM, Troy Howard thowar...@gmail.com
 wrote:
  
   Here's the wiki page:
  
   https://cwiki.apache.org/confluence/x/Go6OAQ
  
   Thanks,
   Troy
  
  
   On Mon, May 9, 2011 at 1:59 PM, Troy Howard thowar...@gmail.com
  wrote:
Michael,
   
That worked!
   
I'm in the process of making a wiki page for the event now.
   
Thanks,
Troy
   
   
On Mon, May 9, 2011 at 1:38 PM, Michael Herndon
mhern...@wickedsoftware.net wrote:
   
log out and log back in and verify permission changes.
   
On Mon, May 9, 2011 at 4:22 PM, Troy Howard thowar...@gmail.com
   wrote:
   
 Re: I'm not sure if there is a coding difference between the C#
  stuff
   and
 the other directory stuff.

 There are a few minor code changes in the new branch vs the C#
  branch,
   but
 those are things like framework target, copyright notices, etc..
 I
   didn't
 change code significantly, and unit tests still pass.

 Re: we can probably branch C# to something like
 pre_NewStructure

 I made a tag right before committing the directory changes for
 this
   exact
 purpose. It's here:



  
  
 
 https://svn.apache.org/repos/asf/incubator/lucene.net/tags/pre-layout-cha
  nge


 Regarding the hackathon next week, I'd like to put together a
 list
  of
   tasks
 specifically for this weekend to give people some focus on where
  they
   can
 contribute. Some of these will be major tasks with high priority
  (like
 finishing up the 2.9.4 release) and others will be of lower
  priority
   like
 working on the samples/wiki/website... Those will great skills
 in
   creating
 GUI apps, but less skills with writing back-end libraries might
  want
   to
 contribute to Luke.Net, even if it's not a high priority.

 I agree with Michael that we should tweet/blog/wiki/mailing list
  the
 details
 of the event. I would make a wiki page on the topic, but it
 seems I
   don't
 have sufficient privileges on our Confluence wiki to do that.
 Can
   whoever
 the admin is give me rights to add/edit wiki pages? My login is
   'thoward'.

 Thanks,
 Troy

 On Mon, May 9, 2011 at 1:15 AM, Prescott Nasser 
   geobmx...@hotmail.com
 wrote:

 
  I think Troy has the structure ready to roll - I'm not sure if
  there
   is a
  coding difference between the C# stuff and the other directory
   stuff. If
  there isn't then we can probably branch C# to something like
  pre_NewStructure (someone help me with a better name), then
  remove
   it
 from
  the trunk.
 
  Troy I believe was investigating the legal task - perhaps he
 can
   update
 us
  if he ever got an answer
 
  If you want to jump into a smaller task take a look at
  https://issues.apache.org/jira/browse/LUCENENET-372(currently
   assigned
 to
  me). I updated a ton of the analyers, but I believe them to be
  out
   of
 date
  from the java 2.9.4 branch because I used the attached files
 from
   Pasha
  without paying attention to the age of 

[jira] [Updated] (LUCENE-3084) MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos

2011-05-11 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3084:
--

Attachment: LUCENE-3084-trunk-only.patch

 MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos
 --

 Key: LUCENE-3084
 URL: https://issues.apache.org/jira/browse/LUCENE-3084
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3084-trunk-only.patch, LUCENE-3084.patch


 SegmentInfos carries a bunch of fields beyond the list of SI, but for merging 
 purposes these fields are unused.
 We should cutover to List<SI> instead.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Reopened] (LUCENE-3084) MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos

2011-05-11 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reopened LUCENE-3084:
---

Lucene Fields: [New, Patch Available]  (was: [New])

After some discussion with Mike we decided to make some further API changes in 
4.0:

- No longer subclass java.util.Vector; subclass ArrayList instead
- Rename SegmentInfos.range to cloneSubList() and let it also return 
List<SegmentInfo>
- Make OneMerge's list unmodifiable to protect against changes in consumers of 
the MergeSpecification (this item should, in my opinion, also be backported to 3.x; 
see the sketch below)

I'll attach a simple patch.
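
A minimal sketch of the unmodifiable-list idea in plain Java (illustrative only, not the actual patch):
{code:java}
import java.util.*;

// Hand consumers of the MergeSpecification a read-only view so they cannot
// mutate the segment list behind the merge policy's back.
public class OneMergeSketch<SI> {
    private final List<SI> segments;

    public OneMergeSketch(List<SI> segments) {
        // defensive copy + unmodifiable view
        this.segments = Collections.unmodifiableList(new ArrayList<SI>(segments));
    }

    public List<SI> getSegments() {
        return segments; // add()/remove() on this view throws UnsupportedOperationException
    }
}
{code}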

 MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos
 --

 Key: LUCENE-3084
 URL: https://issues.apache.org/jira/browse/LUCENE-3084
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3084-trunk-only.patch, LUCENE-3084.patch


 SegmentInfos carries a bunch of fields beyond the list of SI, but for merging 
 purposes these fields are unused.
 We should cutover to List<SI> instead.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3084) MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos

2011-05-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032032#comment-13032032
 ] 

Uwe Schindler commented on LUCENE-3084:
---

The above patch shows the problem with the current merge policy code: it seems 
that the list returned in OneMerge is sometimes modified; we should fix that 
(so the patch is not yet committable).

 MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos
 --

 Key: LUCENE-3084
 URL: https://issues.apache.org/jira/browse/LUCENE-3084
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3084-trunk-only.patch, LUCENE-3084.patch


 SegmentInfos carries a bunch of fields beyond the list of SI, but for merging 
 purposes these fields are unused.
 We should cutover to List<SI> instead.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [Lucene.Net] Lucene.Net Hackathon (5/13-/516)

2011-05-11 Thread Troy Howard
No problem. I set up the permissions such that any user account can
edit/add pages in the wiki.

This should make things a lot easier on us.

Thanks,
Troy


On Wed, May 11, 2011 at 12:50 PM, Michael Herndon
mhern...@wickedsoftware.net wrote:
 Troy,

 Confluence admin is not my forte, but I can look at the privileges tonight
 and see if we change that.

 You and Prescott also have admin privileges as of right now. I'm pretty much
 giving all committers who have forwarded their username those privileges.

 I've also added a snippet to the page for people to e-mail me in the
 meantime if they are unable to edit the page to add to the table on the
 hack-a-thon page.  (And there are some who may just not want to join yet
 another wiki).

 Do keep an eye out for spam once we elevate privileges.

 - Michael

 On Wed, May 11, 2011 at 3:37 PM, Troy Howard thowar...@gmail.com wrote:

 Thanks Michael!

 One quick question -- the Wiki seems to be really locked down for
 public editing. That's kind of strange. Anyone should be able to log
 in and whip up a new page or edit an existing one, committer or
 otherwise. I didn't have access until just the other day, and Chris
 Currens doesn't have access now (I had to add him to the page
 manually).

 Can we open up the permissions on our wiki?

 Thanks,
 Troy



 On Wed, May 11, 2011 at 11:51 AM, Michael Herndon
 mhern...@wickedsoftware.net wrote:
  You never know.  Personally I generally have most tech people on a list
  rather directly following them.
 
  But thanks.
 
  On Wed, May 11, 2011 at 2:43 PM, Wyatt Barnett wyatt.barn...@gmail.com
 wrote:
 
  Retweeted. Though I doubt any of the ~100 people following me aren't in
  the 36 following him . . .
 
  On 5/11/11 2:39 PM, Michael Herndon mhern...@wickedsoftware.net
 wrote:
 
  If any of you follow Hanselman on twitter, please take a second a
 retweet
  his on the lucene.net hackathon listed below or even send a thanks.
  
  Wanna get involved in Open Source? Why not help with the Lucene.NET
  HackAThon? http://hnsl.mn/lucenehackathon 
  
  Cheers,
  - Michael
  
  On Mon, May 9, 2011 at 7:12 PM, Troy Howard thowar...@gmail.com
 wrote:
  
   Here's the wiki page:
  
   https://cwiki.apache.org/confluence/x/Go6OAQ
  
   Thanks,
   Troy
  
  
   On Mon, May 9, 2011 at 1:59 PM, Troy Howard thowar...@gmail.com
  wrote:
Michael,
   
That worked!
   
I'm in the process of making a wiki page for the event now.
   
Thanks,
Troy
   
   
On Mon, May 9, 2011 at 1:38 PM, Michael Herndon
mhern...@wickedsoftware.net wrote:
   
log out and log back in and verify permission changes.
   
On Mon, May 9, 2011 at 4:22 PM, Troy Howard thowar...@gmail.com
   wrote:
   
 Re: I'm not sure if there is a coding difference between the C#
  stuff
   and
 the other directory stuff.

 There are a few minor code changes in the new branch vs the C#
  branch,
   but
 those are things like framework target, copyright notices, etc..
 I
   didn't
 change code significantly, and unit tests still pass.

 Re: we can probably branch C# to something like
 pre_NewStructure

 I made a tag right before committing the directory changes for
 this
   exact
 purpose. It's here:



  
  
 
 https://svn.apache.org/repos/asf/incubator/lucene.net/tags/pre-layout-cha
  nge


 Regarding the hackathon next week, I'd like to put together a
 list
  of
   tasks
 specifically for this weekend to give people some focus on where
  they
   can
 contribute. Some of these will be major tasks with high priority
  (like
 finishing up the 2.9.4 release) and others will be of lower
  priority
   like
 working on the samples/wiki/website... Those will great skills
 in
   creating
 GUI apps, but less skills with writing back-end libraries might
  want
   to
 contribute to Luke.Net, even if it's not a high priority.

 I agree with Michael that we should tweet/blog/wiki/mailing list
  the
 details
 of the event. I would make a wiki page on the topic, but it
 seems I
   don't
 have sufficient privileges on our Confluence wiki to do that.
 Can
   whoever
 the admin is give me rights to add/edit wiki pages? My login is
   'thoward'.

 Thanks,
 Troy

 On Mon, May 9, 2011 at 1:15 AM, Prescott Nasser 
   geobmx...@hotmail.com
 wrote:

 
  I think Troy has the structure ready to roll - I'm not sure if
  there
   is a
  coding difference between the C# stuff and the other directory
   stuff. If
  there isn't then we can probably branch C# to something like
  pre_NewStructure (someone help me with a better name), then
  remove
   it
 from
  the trunk.
 
  Troy I believe was investigating the legal task - perhaps he
 can
   update
 us
  if he ever got an answer
 
  If you want to jump into a smaller task take a look at
  

[jira] [Commented] (LUCENE-3084) MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos

2011-05-11 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032046#comment-13032046
 ] 

Earwin Burrfoot commented on LUCENE-3084:
-

* Speaking logically, merges operate on Sets of SIs, not List?
* Let's stop subclassing random things? : ) SIS can contain a List of SIs (and 
maybe a Set, or whatever we need in the future), and only expose operations its 
clients really need.

 MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos
 --

 Key: LUCENE-3084
 URL: https://issues.apache.org/jira/browse/LUCENE-3084
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3084-trunk-only.patch, LUCENE-3084.patch


 SegmentInfos carries a bunch of fields beyond the list of SI, but for merging 
 purposes these fields are unused.
 We should cutover to List<SI> instead.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-1421) Ability to group search results by field

2011-05-11 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032082#comment-13032082
 ] 

Martijn van Groningen commented on LUCENE-1421:
---

Nice work Michael! I also think that the two-pass mechanism is definitely the 
preferred way to go. 

I think we also need a strategy mechanism (or at least a GroupCollector class 
hierarchy) inside this module. The mechanism should select the right group 
collector(s) for a certain request. Some users may only care about the top 
group document, so a second pass won't be necessary. Another example, with 
faceting in mind: when group-based faceting is needed, the top N groups 
don't suffice. You'll need all group docs (I currently don't see another way). 
These group docs are then used to create a grouped Solr DocSet. But this 
should be a completely different implementation. 
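
For illustration, a concept sketch of the two-pass idea using plain collections (this is not the module's API; names and structure are made up, and "hits" is assumed to already be sorted by descending score):
{code:java}
import java.util.*;

// Pass 1 finds the top-N group keys by best score; pass 2 collects docs for those groups only.
public class TwoPassGroupingSketch {
    static final class Doc {
        final int id; final String group; final float score;
        Doc(int id, String group, float score) { this.id = id; this.group = group; this.score = score; }
    }

    static Map<String, List<Doc>> groupTopDocs(List<Doc> hits, int topNGroups, int docsPerGroup) {
        // pass 1: remember the best score seen per group
        final Map<String, Float> best = new HashMap<String, Float>();
        for (Doc d : hits) {
            Float s = best.get(d.group);
            if (s == null || d.score > s.floatValue()) best.put(d.group, d.score);
        }
        // keep only the topNGroups group keys, ordered by their best score
        List<String> keys = new ArrayList<String>(best.keySet());
        Collections.sort(keys, new Comparator<String>() {
            public int compare(String a, String b) { return Float.compare(best.get(b), best.get(a)); }
        });
        keys = keys.subList(0, Math.min(topNGroups, keys.size()));

        // pass 2: collect up to docsPerGroup docs for the selected groups only
        Map<String, List<Doc>> result = new LinkedHashMap<String, List<Doc>>();
        for (String k : keys) result.put(k, new ArrayList<Doc>());
        for (Doc d : hits) {
            List<Doc> bucket = result.get(d.group);
            if (bucket != null && bucket.size() < docsPerGroup) bucket.add(d);
        }
        return result;
    }
}
{code}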

 Ability to group search results by field
 

 Key: LUCENE-1421
 URL: https://issues.apache.org/jira/browse/LUCENE-1421
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Reporter: Artyom Sokolov
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-1421.patch, lucene-grouping.patch


 It would be awesome to group search results by specified field. Some 
 functionality was provided for Apache Solr but I think it should be done in 
 Core Lucene. There could be some useful information like total hits about 
 collapsed data like total count and so on.
 Thanks,
 Artyom

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-2510) Proximity search is not symmetric

2011-05-11 Thread mark risher (JIRA)
Proximity search is not symmetric
-

 Key: SOLR-2510
 URL: https://issues.apache.org/jira/browse/SOLR-2510
 Project: Solr
  Issue Type: Bug
  Components: search, web gui
Affects Versions: 3.1
 Environment: Ubuntu 10.04
Reporter: mark risher


The proximity search is incorrect on words occurring *before* the matching 
term. It matches documents that are less-than N words before and 
less-than-or-equal-to N words after.

For example, use the following document:
   WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G

Expected result: Both of the following queries should match:
1) WORD_D WORD_G~3
2) WORD_D WORD_A~3

Actual result: Only #1 matches.
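
For anyone reproducing this outside Solr, the two queries expressed directly against the Lucene 3.x PhraseQuery API would look roughly like this (field name "text" is assumed, and terms are assumed lowercased by the analyzer):
{code:java}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;

public class SlopSketch {
    public static void main(String[] args) {
        PhraseQuery forward = new PhraseQuery();   // "WORD_D WORD_G"~3
        forward.add(new Term("text", "word_d"));
        forward.add(new Term("text", "word_g"));
        forward.setSlop(3);

        PhraseQuery backward = new PhraseQuery();  // "WORD_D WORD_A"~3
        backward.add(new Term("text", "word_d"));
        backward.add(new Term("text", "word_a"));
        backward.setSlop(3);

        System.out.println(forward);   // roughly: text:"word_d word_g"~3
        System.out.println(backward);  // roughly: text:"word_d word_a"~3
    }
}
{code}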



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2510) Proximity search is not symmetric

2011-05-11 Thread mark risher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mark risher updated SOLR-2510:
--

Description: 
The proximity search is incorrect on words occurring *before* the matching 
term. It matches documents that are _less-than_ N words before and 
_less-than-or-equal-to_ N words after.

For example, use the following document:
   WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G

Expected result: Both of the following queries should match:
1) WORD_D WORD_G~3
2) WORD_D WORD_A~3

Actual result: Only #1 matches.



  was:
The proximity search is incorrect on words occurring *before* the matching 
term. It matches documents that are less-than N words before and 
less-than-or-equal-to N words after.

For example, use the following document:
   WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G

Expected result: Both of the following queries should match:
1) WORD_D WORD_G~3
2) WORD_D WORD_A~3

Actual result: Only #1 matches.




 Proximity search is not symmetric
 -

 Key: SOLR-2510
 URL: https://issues.apache.org/jira/browse/SOLR-2510
 Project: Solr
  Issue Type: Bug
  Components: search, web gui
Affects Versions: 3.1
 Environment: Ubuntu 10.04
Reporter: mark risher

 The proximity search is incorrect on words occurring *before* the matching 
 term. It matches documents that are _less-than_ N words before and 
 _less-than-or-equal-to_ N words after.
 For example, use the following document:
WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G
 Expected result: Both of the following queries should match:
 1) WORD_D WORD_G~3
 2) WORD_D WORD_A~3
 Actual result: Only #1 matches.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2510) Proximity search is not symmetric

2011-05-11 Thread mark risher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mark risher updated SOLR-2510:
--

Description: 
The proximity search is incorrect on words occurring *before* the matching 
term. It matches documents that are _less-than_ N words before and 
_less-than-or-equal-to_ N words after.

For example, use the following document:
   {{WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G}}

__Expected result:__ Both of the following queries should match:
1) {{WORD_D WORD_G~3}}
2) {{WORD_D WORD_A~3}}

__Actual result:__ Only #1 matches.



  was:
The proximity search is incorrect on words occurring *before* the matching 
term. It matches documents that are _less-than_ N words before and 
_less-than-or-equal-to_ N words after.

For example, use the following document:
{{   WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G}}

__Expected result:__ Both of the following queries should match:
1) {{WORD_D WORD_G~3}}
2) {{WORD_D WORD_A~3}}

__Actual result:__ Only #1 matches.




 Proximity search is not symmetric
 -

 Key: SOLR-2510
 URL: https://issues.apache.org/jira/browse/SOLR-2510
 Project: Solr
  Issue Type: Bug
  Components: search, web gui
Affects Versions: 3.1
 Environment: Ubuntu 10.04
Reporter: mark risher

 The proximity search is incorrect on words occurring *before* the matching 
 term. It matches documents that are _less-than_ N words before and 
 _less-than-or-equal-to_ N words after.
 For example, use the following document:
{{WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G}}
 __Expected result:__ Both of the following queries should match:
 1) {{WORD_D WORD_G~3}}
 2) {{WORD_D WORD_A~3}}
 __Actual result:__ Only #1 matches.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2510) Proximity search is not symmetric

2011-05-11 Thread mark risher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mark risher updated SOLR-2510:
--

Description: 
The proximity search is incorrect on words occurring *before* the matching 
term. It matches documents that are _less-than_ N words before and 
_less-than-or-equal-to_ N words after.

For example, use the following document:
   {{WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G}}

*Expected result:* Both of the following queries should match:
1) {{WORD_D WORD_G~3}}
2) {{WORD_D WORD_A~3}}

*Actual result:* Only #1 matches.



  was:
The proximity search is incorrect on words occurring *before* the matching 
term. It matches documents that are _less-than_ N words before and 
_less-than-or-equal-to_ N words after.

For example, use the following document:
   {{WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G}}

__Expected result:__ Both of the following queries should match:
1) {{WORD_D WORD_G~3}}
2) {{WORD_D WORD_A~3}}

__Actual result:__ Only #1 matches.




 Proximity search is not symmetric
 -

 Key: SOLR-2510
 URL: https://issues.apache.org/jira/browse/SOLR-2510
 Project: Solr
  Issue Type: Bug
  Components: search, web gui
Affects Versions: 3.1
 Environment: Ubuntu 10.04
Reporter: mark risher

 The proximity search is incorrect on words occurring *before* the matching 
 term. It matches documents that are _less-than_ N words before and 
 _less-than-or-equal-to_ N words after.
 For example, use the following document:
{{WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G}}
 *Expected result:* Both of the following queries should match:
 1) {{WORD_D WORD_G~3}}
 2) {{WORD_D WORD_A~3}}
 *Actual result:* Only #1 matches.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2510) Proximity search is not symmetric

2011-05-11 Thread mark risher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mark risher updated SOLR-2510:
--

Description: 
The proximity search is incorrect on words occurring *before* the matching 
term. It matches documents that are _less-than_ N words before and 
_less-than-or-equal-to_ N words after.

For example, use the following document:
{{   WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G}}

__Expected result:__ Both of the following queries should match:
1) {{WORD_D WORD_G~3}}
2) {{WORD_D WORD_A~3}}

__Actual result:__ Only #1 matches.



  was:
The proximity search is incorrect on words occurring *before* the matching 
term. It matches documents that are _less-than_ N words before and 
_less-than-or-equal-to_ N words after.

For example, use the following document:
   WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G

Expected result: Both of the following queries should match:
1) WORD_D WORD_G~3
2) WORD_D WORD_A~3

Actual result: Only #1 matches.




 Proximity search is not symmetric
 -

 Key: SOLR-2510
 URL: https://issues.apache.org/jira/browse/SOLR-2510
 Project: Solr
  Issue Type: Bug
  Components: search, web gui
Affects Versions: 3.1
 Environment: Ubuntu 10.04
Reporter: mark risher

 The proximity search is incorrect on words occurring *before* the matching 
 term. It matches documents that are _less-than_ N words before and 
 _less-than-or-equal-to_ N words after.
 For example, use the following document:
 {{   WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G}}
 __Expected result:__ Both of the following queries should match:
 1) {{WORD_D WORD_G~3}}
 2) {{WORD_D WORD_A~3}}
 __Actual result:__ Only #1 matches.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3084) MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos

2011-05-11 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032099#comment-13032099
 ] 

Earwin Burrfoot commented on LUCENE-3084:
-

bq. Merges are ordered
Hmm.. Why should they be?

bq. SegmentInfos itself must be list
It could contain a list as a field instead, and have a much cleaner API as a 
consequence.

On another note, I wonder, is the fact that Vector is internally synchronized 
used somewhere within SegmentInfos client code?
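
As a rough sketch of that composition-over-inheritance shape (names are 
illustrative and String stands in for SegmentInfo; this is not the real class): 
the class holds a plain unsynchronized list and exposes only the operations it 
wants to support, instead of inheriting everything from Vector.

{code}
import java.util.*;

public class SegmentInfosSketch implements Iterable<String> {
  // Plain unsynchronized list; callers synchronize externally if they need to,
  // instead of relying on Vector's per-call locking.
  private final List<String> infos = new ArrayList<String>();

  public void add(String si) { infos.add(si); }
  public String info(int i) { return infos.get(i); }
  public int size() { return infos.size(); }
  public Iterator<String> iterator() { return Collections.unmodifiableList(infos).iterator(); }
  public List<String> asList() { return Collections.unmodifiableList(infos); }
}
{code}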

 MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos
 --

 Key: LUCENE-3084
 URL: https://issues.apache.org/jira/browse/LUCENE-3084
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084.patch


 SegmentInfos carries a bunch of fields beyond the list of SI, but for merging 
 purposes these fields are unused.
 We should cutover to List<SI> instead.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-2511) Make it easier to override SolrContentHandler newDocument

2011-05-11 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned SOLR-2511:
-

Assignee: Grant Ingersoll

 Make it easier to override SolrContentHandler newDocument
 -

 Key: SOLR-2511
 URL: https://issues.apache.org/jira/browse/SOLR-2511
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor

 The SolrContentHandler's newDocument method does a variety of things: adds 
 metadata, literals, content and captured content.  We could split this out 
 into protected methods for each, which would make it easier to override.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-2511) Make it easier to override SolrContentHandler newDocument

2011-05-11 Thread Grant Ingersoll (JIRA)
Make it easier to override SolrContentHandler newDocument
-

 Key: SOLR-2511
 URL: https://issues.apache.org/jira/browse/SOLR-2511
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor


The SolrContentHandler's newDocument method does a variety of things: adds 
metadata, literals, content and captured content. We could split this out into 
protected methods for each, which would make it easier to override.
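
A hedged sketch of the shape such a refactor could take (method names below are 
illustrative, not the committed API): newDocument() stays the entry point and 
delegates each concern to a protected hook that subclasses can override.

{code}
import org.apache.solr.common.SolrInputDocument;

public abstract class SplittableContentHandlerSketch {

  public SolrInputDocument newDocument() {
    SolrInputDocument doc = new SolrInputDocument();
    addMetadata(doc);
    addLiterals(doc);
    addContent(doc);
    addCapturedContent(doc);
    return doc;
  }

  // Subclasses override only the pieces they care about.
  protected void addMetadata(SolrInputDocument doc) {}
  protected void addLiterals(SolrInputDocument doc) {}
  protected void addContent(SolrInputDocument doc) {}
  protected void addCapturedContent(SolrInputDocument doc) {}
}
{code}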

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3084) MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos

2011-05-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032137#comment-13032137
 ] 

Michael McCandless commented on LUCENE-3084:


I would love to cut over to Set<SI>, but I don't think we can.  There are apps 
out there that want merges to remain contiguous (so docIDs keep their 
monotonicity).

But I do think we should not keep that by default (I reopened LUCENE-1076 to 
switch to TieredMP in 3.x by default).

 MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos
 --

 Key: LUCENE-3084
 URL: https://issues.apache.org/jira/browse/LUCENE-3084
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084.patch


 SegmentInfos carries a bunch of fields beyond the list of SI, but for merging 
 purposes these fields are unused.
 We should cutover to List<SI> instead.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3084) MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos

2011-05-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032139#comment-13032139
 ] 

Michael McCandless commented on LUCENE-3084:


Patch looks good -- thanks Uwe!

 MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos
 --

 Key: LUCENE-3084
 URL: https://issues.apache.org/jira/browse/LUCENE-3084
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084.patch


 SegmentInfos carries a bunch of fields beyond the list of SI, but for merging 
 purposes these fields are unused.
 We should cutover to List<SI> instead.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-1421) Ability to group search results by field

2011-05-11 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1421:
---

Attachment: LUCENE-1421.patch

Patch w/ next iteration... I beefed up the overview.html, added test case 
coverage of null groupValue.

I think it's ready to commit and then back-port to 3.x!

 Ability to group search results by field
 

 Key: LUCENE-1421
 URL: https://issues.apache.org/jira/browse/LUCENE-1421
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Reporter: Artyom Sokolov
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-1421.patch, LUCENE-1421.patch, 
 lucene-grouping.patch


 It would be awesome to group search results by a specified field. Some 
 functionality was provided for Apache Solr, but I think it should be done in 
 core Lucene. It could also expose useful information about the collapsed 
 data, such as total hit counts per group.
 Thanks,
 Artyom

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-1421) Ability to group search results by field

2011-05-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032145#comment-13032145
 ] 

Michael McCandless commented on LUCENE-1421:


{quote}
I think we also need a strategy mechanism (or at least a GroupCollector class 
hierarchy) inside this module. The mechanism should select the right group 
collector(s) for a certain request. Some users may only care about the top 
group document, so a second pass won't be necessary. Another example, with 
faceting in mind: when group-based faceting is needed, the top N groups don't 
suffice. You'll need all group docs (I currently don't see another way). 
These group docs are then used to create a grouped Solr DocSet. But this 
should be a completely different implementation.
{quote}

I agree, there's much more we could do here!  Specialized collection for the 
maxDocsPerGroup=1 case, and for the "I want all groups" case, would be nice.  
For the "not many unique values in the group field" case we could do a 
single-pass collector, I think.

Grouping by a multi-valued field should be possible (we now have DocTermOrds in 
Lucene, but it doesn't load the term byte[] data), as well as support for 
sharding, ie, by merging top groups and docs w/in each group (but I think we 
need an addition to FieldComparator API for this).

I think we should commit this starting point, today, and then iterate from 
there...

Martijn, thank you for persisting for so long on SOLR-236!  We are
finally getting grouping functionality accessible from Lucene and
Solr...
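
On the sharding point, a framework-free sketch of the merge step (illustrative 
names only, not the module's API): each shard reports its top groups with a 
best score, and the coordinator keeps the maximum per group and re-sorts.

{code}
import java.util.*;

public class MergeShardGroupsSketch {
  public static void main(String[] args) {
    // Per-shard results: group value -> best score on that shard.
    Map<String, Float> shard1 = new HashMap<String, Float>();
    shard1.put("a", 2.0f); shard1.put("b", 1.1f);
    Map<String, Float> shard2 = new HashMap<String, Float>();
    shard2.put("b", 1.8f); shard2.put("c", 0.9f);

    // Merge: keep the max score seen for each group across shards.
    final Map<String, Float> merged = new HashMap<String, Float>();
    for (Map<String, Float> shard : Arrays.asList(shard1, shard2)) {
      for (Map.Entry<String, Float> e : shard.entrySet()) {
        Float prev = merged.get(e.getKey());
        if (prev == null || e.getValue() > prev) merged.put(e.getKey(), e.getValue());
      }
    }

    // Re-sort the merged groups by their best score.
    List<String> order = new ArrayList<String>(merged.keySet());
    Collections.sort(order, new Comparator<String>() {
      public int compare(String x, String y) { return Float.compare(merged.get(y), merged.get(x)); }
    });
    System.out.println(order); // [a, b, c]
  }
}
{code}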


 Ability to group search results by field
 

 Key: LUCENE-1421
 URL: https://issues.apache.org/jira/browse/LUCENE-1421
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Reporter: Artyom Sokolov
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-1421.patch, LUCENE-1421.patch, 
 lucene-grouping.patch


 It would be awesome to group search results by a specified field. Some 
 functionality was provided for Apache Solr, but I think it should be done in 
 core Lucene. It could also expose useful information about the collapsed 
 data, such as total hit counts per group.
 Thanks,
 Artyom

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3086) add ElisionsFilter to ItalianAnalyzer

2011-05-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3086.
-

Resolution: Fixed

Committed revision 1102120, 1102127

 add ElisionsFilter to ItalianAnalyzer
 -

 Key: LUCENE-3086
 URL: https://issues.apache.org/jira/browse/LUCENE-3086
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3086.patch


 We set this up for French by default, but we don't for Italian.
 We should enable it with the standard Italian contractions (e.g. definite 
 articles).
 The various stemmers for these languages assume this is already being taken 
 care of and don't do anything about it... in general, things like Snowball 
 assume really dumb tokenization (that you will split on the word-internal 
 apostrophe), and they add these to stoplists.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3064) add checks to MockTokenizer to enforce proper consumption

2011-05-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3064:


Attachment: LUCENE-3064.patch

Updated patch: I think this is ready to commit.

I added a boolean to allow the workflow checks to be disabled in very 
exceptional cases (e.g. TestIndexWriterExceptions's CrashingTokenFilter), so in 
general we can do pretty good checking.
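
As a hedged sketch of the kind of workflow check described (field and class 
names below are illustrative, not MockTokenizer's actual implementation): the 
tokenizer tracks where it is in the reset()/incrementToken()/end()/close() 
lifecycle and fails fast on misuse, with a boolean escape hatch for exceptional 
tests.

{code}
public class LifecycleCheckSketch {
  enum State { CREATED, RESET, INCREMENTING, EXHAUSTED, ENDED, CLOSED }

  private State state = State.CREATED;
  private final boolean enableChecks; // escape hatch for exceptional tests
  private int remainingTokens = 3;    // stand-in for real token production

  public LifecycleCheckSketch(boolean enableChecks) { this.enableChecks = enableChecks; }

  public void reset() { state = State.RESET; }

  public boolean incrementToken() {
    if (enableChecks && state != State.RESET && state != State.INCREMENTING) {
      throw new IllegalStateException("incrementToken() called without reset(): " + state);
    }
    if (remainingTokens-- > 0) { state = State.INCREMENTING; return true; }
    state = State.EXHAUSTED;
    return false;
  }

  public void end() {
    if (enableChecks && state != State.EXHAUSTED) {
      throw new IllegalStateException("end() called before the stream was exhausted: " + state);
    }
    state = State.ENDED;
  }

  public void close() { state = State.CLOSED; }
}
{code}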


 add checks to MockTokenizer to enforce proper consumption
 -

 Key: LUCENE-3064
 URL: https://issues.apache.org/jira/browse/LUCENE-3064
 Project: Lucene - Java
  Issue Type: Test
Reporter: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-3064.patch, LUCENE-3064.patch, LUCENE-3064.patch


 We can enforce things like "the consumer properly iterates through the 
 tokenstream lifecycle" via MockTokenizer. This could catch bugs in consumers 
 that don't call reset(), etc.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2445) unknown handler: standard

2011-05-11 Thread Gabriele Kahlout (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032216#comment-13032216
 ] 

Gabriele Kahlout commented on SOLR-2445:


I've attached a trivial patch that just modifies the form.jsp (useful for 
scripts).

 unknown handler: standard
 -

 Key: SOLR-2445
 URL: https://issues.apache.org/jira/browse/SOLR-2445
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4.1, 3.1, 3.2, 4.0
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: SOLR-2445.patch


 To reproduce the problem using the example config, go to form.jsp, use standard 
 for qt (it is the default), then click Search.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2445) unknown handler: standard

2011-05-11 Thread Gabriele Kahlout (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriele Kahlout updated SOLR-2445:
---

Attachment: qt-form-jsp.patch

trivial patch to form.jsp that leaves qt empty (useful for setup scripts and 
those that need to stick to a 3.1.0 revision).

 unknown handler: standard
 -

 Key: SOLR-2445
 URL: https://issues.apache.org/jira/browse/SOLR-2445
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4.1, 3.1, 3.2, 4.0
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: SOLR-2445.patch, qt-form-jsp.patch


 To reproduce the problem using the example config, go to form.jsp, use standard 
 for qt (it is the default), then click Search.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2510) Proximity search is not symmetric

2011-05-11 Thread mark risher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mark risher updated SOLR-2510:
--

Description: 
The proximity search is incorrect on words occurring *before* the matching 
term. It matches documents that are _less-than_ N words before and 
_less-than-or-equal-to_ N words after.

For example, use the following document:
   {{WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G}}

*Expected result:* Both of the following queries should match:
1) {{WORD_D WORD_G~3}}
2) {{WORD_G WORD_D~3}}

*Actual result:* Only #1 matches. For some reason, it thinks the distance from 
D to G is 3, but from G to D is 4.



  was:
The proximity search is incorrect on words occurring *before* the matching 
term. It matches documents that are _less-than_ N words before and 
_less-than-or-equal-to_ N words after.

For example, use the following document:
   {{WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G}}

*Expected result:* Both of the following queries should match:
1) {{WORD_D WORD_G~3}}
2) {{WORD_D WORD_A~3}}

*Actual result:* Only #1 matches.




 Proximity search is not symmetric
 -

 Key: SOLR-2510
 URL: https://issues.apache.org/jira/browse/SOLR-2510
 Project: Solr
  Issue Type: Bug
  Components: search, web gui
Affects Versions: 3.1
 Environment: Ubuntu 10.04
Reporter: mark risher

 The proximity search is incorrect on words occurring *before* the matching 
 term. It matches documents that are _less-than_ N words before and 
 _less-than-or-equal-to_ N words after.
 For example, use the following document:
{{WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G}}
 *Expected result:* Both of the following queries should match:
 1) {{WORD_D WORD_G~3}}
 2) {{WORD_G WORD_D~3}}
 *Actual result:* Only #1 matches. For some reason, it thinks the distance 
 from D to G is 3, but from G to D is 4.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-1421) Ability to group search results by field

2011-05-11 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032238#comment-13032238
 ] 

Bill Bell commented on LUCENE-1421:
---

Say we have 4 documents:

docid=1
hgid=1
age=10

docid=2
hgid=1
age=10

docid=3
hgid=2
age=12

docid=4
hgid=4
age=11

If we group by hgid, we would get:

hgid=1
  docid=1
    hgid=1
    age=10
  docid=2
    hgid=1
    age=10

hgid=2
  docid=3
    hgid=2
    age=12

hgid=4
  docid=4
    hgid=4
    age=11

If I set Facet Counts = POST

age: 10 (1 document)
age: 11 (1 document)
age: 12 (1 document)

If I set Facet Counts = PRE

age: 10 (2 document)
age: 11 (1 document)
age: 12 (1 document)

The only way grouping works in Solr now is Facet Counts = PRE.

Thanks.
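
A framework-free sketch of the difference described above, on the same four 
documents (purely illustrative, not Solr's faceting code): "PRE" facets count 
every matching document, "POST" facets count only the representative document 
kept for each group.

{code}
import java.util.*;

public class PrePostFacetSketch {
  public static void main(String[] args) {
    int[][] docs = { /* {docid, hgid, age} */ {1, 1, 10}, {2, 1, 10}, {3, 2, 12}, {4, 4, 11} };

    Map<Integer, Integer> pre = new TreeMap<Integer, Integer>();   // age -> count over all docs
    Map<Integer, Integer> post = new TreeMap<Integer, Integer>();  // age -> count over group heads
    Set<Integer> seenGroups = new HashSet<Integer>();

    for (int[] d : docs) {
      int hgid = d[1], age = d[2];
      pre.put(age, pre.containsKey(age) ? pre.get(age) + 1 : 1);
      if (seenGroups.add(hgid)) {            // first doc of the group = representative
        post.put(age, post.containsKey(age) ? post.get(age) + 1 : 1);
      }
    }
    System.out.println("PRE  " + pre);   // {10=2, 11=1, 12=1}
    System.out.println("POST " + post);  // {10=1, 11=1, 12=1}
  }
}
{code}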

 Ability to group search results by field
 

 Key: LUCENE-1421
 URL: https://issues.apache.org/jira/browse/LUCENE-1421
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Reporter: Artyom Sokolov
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-1421.patch, LUCENE-1421.patch, 
 lucene-grouping.patch


 It would be awesome to group search results by a specified field. Some 
 functionality was provided for Apache Solr, but I think it should be done in 
 core Lucene. It could also expose useful information about the collapsed 
 data, such as total hit counts per group.
 Thanks,
 Artyom

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2444) Update fl syntax to support: pseudo fields, AS, transformers, and wildcards

2011-05-11 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032239#comment-13032239
 ] 

Koji Sekiguchi commented on SOLR-2444:
--

Does this issue cover wildcard syntax like fl=*_s? Since SOLR-2503 has been 
committed, I want the wildcard syntax for fl.

fl=*_s
{code}
<doc>
  <str name="PERSON_S">Barack Obama</str>
  <str name="TITLE_S">the President</str>
</doc>
{code}
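
A hedged sketch of the kind of matching an fl wildcard would need (method names 
are illustrative, not Solr's ReturnFields API): turn the requested pattern, 
e.g. "*_S", into a regex and keep only the stored fields whose names match.

{code}
import java.util.*;
import java.util.regex.Pattern;

public class FlWildcardSketch {
  static List<String> select(String flPattern, Collection<String> fieldNames) {
    // Quote the pattern, then let each "*" match any run of characters.
    Pattern p = Pattern.compile(Pattern.quote(flPattern).replace("*", "\\E.*\\Q"));
    List<String> out = new ArrayList<String>();
    for (String name : fieldNames) {
      if (p.matcher(name).matches()) out.add(name);
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> fields = Arrays.asList("PERSON_S", "TITLE_S", "id", "score");
    System.out.println(select("*_S", fields)); // [PERSON_S, TITLE_S]
  }
}
{code}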


 Update fl syntax to support: pseudo fields, AS, transformers, and wildcards
 ---

 Key: SOLR-2444
 URL: https://issues.apache.org/jira/browse/SOLR-2444
 Project: Solr
  Issue Type: New Feature
Reporter: Ryan McKinley
 Attachments: SOLR-2444-fl-parsing.patch, SOLR-2444-fl-parsing.patch


 The ReturnFields parsing needs to be improved.  It should also support 
 wildcards

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-trunk - Build # 1559 - Still Failing

2011-05-11 Thread Apache Jenkins Server
Build: https://builds.apache.org/hudson/job/Lucene-trunk/1559/

1 tests failed.
FAILED:  org.apache.lucene.index.TestNRTThreads.testNRTThreads

Error Message:
this writer hit an OutOfMemoryError; cannot commit

Stack Trace:
java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot 
commit
at 
org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2456)
at 
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2538)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2520)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2504)
at 
org.apache.lucene.index.TestNRTThreads.testNRTThreads(TestNRTThreads.java:223)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211)




Build Log (for compile errors):
[...truncated 11983 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Small issue in queryparser // ParametricRangeQueryNode.java

2011-05-11 Thread Adriano Crestani
Hi Karsten,

Sorry for taking so long to reply.

I am still not 100% sure what behavior you expect exactly from the
ParametricRangeQueryNode constructor.

To help clear up the misunderstanding, I created a simple JUnit test
(attached) that tests the behavior I expect. Please go ahead and change it
the way you expect ;)

Please copy the Lucene dev mailing list on your next replies; they might help
with your questions as well.

Best Regards,
Adriano Crestani

On Mon, May 9, 2011 at 7:05 AM, karsten-s...@gmx.de wrote:

 Hi Adriano,

 At the moment, ParametricRangeQueryNode(lowerBound, upperBound) works only if both
 parameters have the same field-name instance (==). If it is only the same
 text (equals), the IllegalArgumentException is thrown.

 Why a contradiction:
 because upperBound == lowerBound (and lowerBound != null) implies
 upperBound.equals(lowerBound).

 My suggestion and upper.getField() is NULL:
 In this case the IllegalArgumentException would be thrown
 (and also for lower.getField() is NULL).

 Best regards
  Karsten

  
  Date: Sun, 8 May 2011 16:24:48 -0400
  From: Adriano Crestani adrianocrest...@gmail.com
  Subject: Re: Small issue in queryparser // ParametricRangeQueryNode.java

  Hi Karsten,
 
  No, AFAIK, no one is working on such a feature. Feel free to work on it; I
  am sure there are many people waiting for such a feature :)
 
  Now, about the contradiction you mentioned below, I can't see it in the
  code:
 
  because
   (upperBound == lowerBound
    && !upperBound.getField().equals(lowerBound.getField()))
   is a contradiction)
 
  Can you explain more on this problem you see in the code?
 
  Also, what you think the constraint condition should be does not make
  sense to me. The constraint asserts whether the upper and lower bounds have
  the same field name, correct?! The condition you proposed below would not
  throw an exception if upper.getField() is NULL and lower is something else,
  which is wrong; an exception should be thrown, since the field names are
  different.
 
  most possible it should be

    if (upperBound.getField() == null
        || (upperBound.getField() != lowerBound.getField()
            && !upperBound.getField().equals(lowerBound.getField()))) {
      throw new IllegalArgumentException(
      ...
 
 
  Am I missing something?
 
  Best Regards,
  Adriano Crestani
 
 
  On Sun, May 8, 2011 at 1:00 PM, karsten-s...@gmx.de
  wrote:
 
   Hi Michael Busch,
  
   The class ParametricRangeQueryNode was introduced in svn with LUCENE-1567
   "New flexible query parser".
  
   The constructor
public ParametricRangeQueryNode(ParametricQueryNode lowerBound,
ParametricQueryNode upperBound) {
   has a constraint on its parameters:

     if (upperBound.getField() != lowerBound.getField()
         || (upperBound.getField() != null
             && !upperBound.getField().equals(lowerBound.getField()))) {
       throw new IllegalArgumentException(
       ...
  
   most possible it should be

     if (upperBound.getField() == null
         || (upperBound.getField() != lowerBound.getField()
             && !upperBound.getField().equals(lowerBound.getField()))) {
       throw new IllegalArgumentException(
       ...
   (
     because
     (upperBound == lowerBound
      && !upperBound.getField().equals(lowerBound.getField()))
     is a contradiction)
   )
  
   Best regards
  
Karsten
  
  
   P.S. Currently I am working with SpanQueries in the queryparser module, so
   I wrote e.g. SpanNearQueryNode. Is this work already done by someone
   else?
  



TestParametricRangeQueryNode.java
Description: Binary data
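
The attachment itself isn't reproduced in the archive. As a rough, hypothetical
stand-in for the kind of check being discussed (identity vs. text equality of
the field names), a minimal JUnit test might look like the following; it is not
the attached test.

    import org.junit.Assert;
    import org.junit.Test;

    // Two field names that are equal as text but are distinct String
    // instances: exactly the case the ==-based part of the constraint
    // treats differently from equals().
    public class FieldNameEqualitySketch {
      @Test
      public void equalTextButDifferentInstances() {
        String lowerField = "date";
        String upperField = new String("date"); // equals() but not ==
        Assert.assertTrue(lowerField.equals(upperField));
        Assert.assertFalse(lowerField == upperField);
      }
    }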

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Created] (SOLR-2512) uima: add an ability to skip runtime error in AnalysisEngine

2011-05-11 Thread Koji Sekiguchi (JIRA)
uima: add an ability to skip runtime error in AnalysisEngine


 Key: SOLR-2512
 URL: https://issues.apache.org/jira/browse/SOLR-2512
 Project: Solr
  Issue Type: Improvement
Affects Versions: 3.1
Reporter: Koji Sekiguchi
Priority: Minor
 Fix For: 3.2, 4.0


Currently, if the AnalysisEngine throws an exception while processing a text, 
the whole document add fails. Because online NLP services are error-prone, users 
should be able to choose whether Solr skips the text processing for that 
document (the source text can still be indexed) or throws a runtime exception 
so that Solr stops adding documents entirely.
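
A hedged sketch of the switch being proposed (names are illustrative, not the 
Solr UIMA update processor's actual fields or API): when errors are ignored, a 
failing engine only logs and the document continues on without the UIMA-derived 
fields.

{code}
public class SkippableAnalysisSketch {
  private final boolean ignoreErrors;

  public SkippableAnalysisSketch(boolean ignoreErrors) { this.ignoreErrors = ignoreErrors; }

  public void process(String text) {
    try {
      analyze(text); // stand-in for the AnalysisEngine call
    } catch (RuntimeException e) {
      if (ignoreErrors) {
        System.err.println("Skipping UIMA processing for this document: " + e.getMessage());
        return; // the source text still gets indexed, just without enrichment
      }
      throw e; // current behaviour: fail the whole add
    }
  }

  private void analyze(String text) {
    if (text == null) throw new RuntimeException("simulated analysis failure");
  }
}
{code}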

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2512) uima: add an ability to skip runtime error in AnalysisEngine

2011-05-11 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-2512:
-

Attachment: SOLR-2512.patch

A draft patch is attached. It doesn't include the switch yet.

 uima: add an ability to skip runtime error in AnalysisEngine
 

 Key: SOLR-2512
 URL: https://issues.apache.org/jira/browse/SOLR-2512
 Project: Solr
  Issue Type: Improvement
Affects Versions: 3.1
Reporter: Koji Sekiguchi
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: SOLR-2512.patch


 Currently, if the AnalysisEngine throws an exception while processing a text, 
 the whole document add fails. Because online NLP services are error-prone, users 
 should be able to choose whether Solr skips the text processing for that 
 document (the source text can still be indexed) or throws a runtime exception 
 so that Solr stops adding documents entirely.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org