[Lucene.Net] Reminder we need to get a board report in

2011-11-30 Thread Prescott Nasser
FYI, the process seems to have changed: board reports are now due on the first 
of the month (if you have to report that month) to give people time to review.

I can handle the report if nobody else does, but I won't be able to get to it 
for a day or so.

[Lucene.Net] Re: Memory Leak in code (mine or Lucene?) 2.9.2.2

2011-11-30 Thread Trevor Watson
I was wrong; the analyzer does have a Close function.  I closed my analyzer,
but the steady climb in memory is still there.

I wonder if I should create a global analyzer variable, guard it with a lock
to make sure there aren't any threading issues, and use that instead.
Could it be a leak in the analyzer itself?
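
(A minimal sketch of that idea, assuming the usual Lucene.Net.Analysis using
directive and reusing this thread's clsLuceneFunctions helper; the shape is an
illustration only:)

    private static readonly object analyzerLock = new object();
    private static Analyzer sharedAnalyzer;

    private static Analyzer GetSharedAnalyzer()
    {
        lock (analyzerLock)
        {
            // Create the analyzer once and hand the same instance to every
            // caller, instead of allocating one per document.
            if (sharedAnalyzer == null)
                sharedAnalyzer = clsLuceneFunctions.getAnalyzer();
            return sharedAnalyzer;
        }
    }

(Note the lock only guards creation; if the analyzer itself is not
thread-safe, callers would also need to lock around its use.)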




On Tue, Nov 29, 2011 at 12:16 PM, Trevor Watson 
powersearchsoftw...@gmail.com wrote:

 I don't recall seeing a close function on the analyzer.  But I will
 definitely take a look. Thanks!


 On Tuesday, November 29, 2011, Oren Eini (Ayende Rahien) 
 aye...@ayende.com wrote:
  You need to close the analyzer.
 
  On Tue, Nov 29, 2011 at 12:32 AM, Trevor Watson 
  powersearchsoftw...@gmail.com wrote:
 
  I'm using the following block of code.  The document is created in another
  function and written to the Lucene index via an IndexWriter:
 
  private void SaveFileToFileInfo(Lucene.Net.Index.IndexWriter iw,
      bool delayCommit, string sDataPath)
  {
      Document doc = getFileInfoDoc(sDataPath);
      Analyzer analyzer = clsLuceneFunctions.getAnalyzer();
      if (this.FileID == 0)
      {
          string s = "";  // presumably a breakpoint anchor for debugging
      }
      iw.UpdateDocument(new Lucene.Net.Index.Term("FileId",
          this.fileID.ToString("0")), doc, analyzer);
      analyzer = null;
      doc = null;
      if (!delayCommit)
          iw.Commit();
  }
 
  When the UpdateDocument line is commented out, everything seems to run
  fine.  When that line of code is run, memory slowly creeps up.  It used to
  work on some computers; now it works on only one or two, and fails on our
  clients' computers.
 
  Is there an issue with UpdateDocument that I am not aware of in 2.9.2.2?
 
  Thanks in advance.
 
 



[Lucene.Net] Re: Memory Leak in 2.9.2.2

2011-11-30 Thread Trevor Watson
You said "pre 2.9.3".  I checked the Apache Lucene.Net page to try to see if
I could get a copy of 2.9.3, but it was never on the site, just 2.9.2.2 and
2.9.4(g).  Was this an unreleased version?  Or am I looking in the wrong
spot for updates to Lucene.Net?

Thanks for all your help

On Tue, Nov 29, 2011 at 2:59 PM, Trevor Watson 
powersearchsoftw...@gmail.com wrote:

 I can send you the dll that I am using if you would like.  The documents
 are _mostly_ small documents: emails and office docs, the size of plain text.


 On Tuesday, November 29, 2011, Christopher Currens 
 currens.ch...@gmail.com wrote:
  Do you know how big the documents are that you are trying to
 delete/update?
   I'm trying to find a copy of 2.9.2 to see if I can reproduce it.
 
 
  Thanks,
  Christopher
 
  On Tue, Nov 29, 2011 at 9:11 AM, Trevor Watson 
  powersearchsoftw...@gmail.com wrote:
 
   Sorry for the duplicate post. I was on the road and posted both via my web
   mail and office mail by mistake.
 
   The increase is very gradual: the program starts at about 160,000K
   according to Task Manager (I know that's not entirely accurate, but it was
   the best I had at the time) and would, after adding 25,000-40,000
   documents, result in an out-of-memory exception (800,000K according to
   Task Manager). I tried building a copy of 2.9.4 to test, but could not
   find one that worked in Visual Studio 2005.

   I did notice, using ANTS memory profiler, that there were a number of
   byte[32789] arrays in memory that I didn't know the origin of.
 
   On Monday, November 28, 2011, Christopher Currens 
   currens.ch...@gmail.com wrote:
   Hi Trevor,
  
    What kind of memory increase are we talking about?  Also, how big are the
    documents that you are indexing, the ones returned from getFileInfoDoc()?
    Is it putting an entire file into the index?  Pre-2.9.3 versions had
    issues with holding onto allocated byte arrays far beyond when they were
    used.  The memory could only be freed by closing the IndexWriter.
   
    I'm a little unclear on exactly what's happening.  Are you noticing memory
    spike and stay constant at that level, or is it a gradual increase?  Is it
    causing your application to error (i.e. OutOfMemory exception, etc.)?
  
  
   Thanks,
   Christopher
  
   On Mon, Nov 28, 2011 at 5:59 PM, Trevor Watson 
   powersearchsoftw...@gmail.com wrote:
  
    I'm attempting to use Lucene.Net v2.9.2.2 in a Visual Studio 2005 (.NET
    2.0) environment.  We had a piece of software that WAS working.  I'm not
    sure what has changed; however, the following code results in a memory
    leak in the Lucene.Net component (or a failure to clean up used memory).
   
    The code in issue is here:
   
    private void SaveFileToFileInfo(Lucene.Net.Index.IndexWriter iw,
        bool delayCommit, string sDataPath)
    {
        Document doc = getFileInfoDoc(sDataPath);
        Analyzer analyzer = clsLuceneFunctions.getAnalyzer();
        if (this.FileID == 0)
        {
            string s = "";  // presumably a breakpoint anchor for debugging
        }
        iw.UpdateDocument(new Lucene.Net.Index.Term("FileId",
            this.fileID.ToString("0")), doc, analyzer);

        analyzer = null;
        doc = null;
        if (!delayCommit)
            iw.Commit();
    }
   
    Commenting out the iw.UpdateDocument line resulted in no memory increase.
    I also tried replacing it with a deleteDocument and AddDocument, and the
    memory increased the same as when using the UpdateDocument function.
   
    The getAnalyzer() function returns an ExtendedStandardAnalyzer, but it's
    the UpdateDocument line specifically that gives me the issue.
  
   Any assistance would be greatly appreciated.
  
   Trevor Watson
  
  
 
 



RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

2011-11-30 Thread Digy
FYI, 2.9.4 can be compiled against .NET 2.0 with a few minor changes in
CloseableThreadLocal (like uncommenting the ThreadLocal<T> class and replacing
extension-method calls with static calls to CloseableThreadLocalExtensions).
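
(A sketch of what that change looks like; the helper body follows the Set<T>
method quoted later in this thread, and the class layout is an assumption:)

    // .NET 2.0 build: the same helper without the 'this' modifier, so no
    // System.Core / ExtensionAttribute is required.
    public static class CloseableThreadLocalExtensions
    {
        public static void Set<T>(ThreadLocal<T> t, T val)
        {
            t.Value = val;  // ThreadLocal<T> is Lucene.Net's own backport
        }
    }

    // Call sites change from the extension form
    //     t.Set(val);
    // to the plain static call
    //     CloseableThreadLocalExtensions.Set(t, val);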

 

 

DIGY

 

 

-Original Message-
From: Christopher Currens [mailto:currens.ch...@gmail.com] 
Sent: Wednesday, November 30, 2011 7:26 PM
To: lucene-net-dev@lucene.apache.org
Subject: Re: [Lucene.Net] Re: Memory Leak in 2.9.2.2

 

Trevor,

I'm not sure if you can use 2.9.4, though; it looks like you're using
VS2005 and .NET 2.0.  2.9.4 targets .NET 4.0, and I'm fairly certain we use
classes only available in 4.0 (or 3.5?).  However, if you can, I would
suggest updating, as 2.9.4 should be a fairly stable release.

The leak I'm talking about is addressed here:
https://issues.apache.org/jira/browse/LUCENE-2467, and the ported code
isn't available in 2.9.2, but I've confirmed the patch is in 2.9.4.  It may
or may not be what your issue is.  You say that it was at one time working
fine; I assume you mean no memory leak.  I would take some time to see what
else in your code has changed.  Make sure you're calling Close on whatever
needs to be closed (IndexWriter/IndexReader/Analyzers, etc).
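
(A minimal sketch of the cleanup being described, assuming a Directory and
Analyzer created elsewhere and the usual Lucene.Net.Index / Lucene.Net.Analysis
using directives; Lucene.Net 2.9-era API:)

    static void IndexWithCleanup(Lucene.Net.Store.Directory dir, Analyzer analyzer)
    {
        IndexWriter writer = new IndexWriter(dir, analyzer,
            IndexWriter.MaxFieldLength.UNLIMITED);
        try
        {
            // ... AddDocument / UpdateDocument calls go here ...
            writer.Commit();
        }
        finally
        {
            writer.Close();   // releases the writer's buffered byte[] blocks
            analyzer.Close(); // this thread confirms Analyzer has Close()
        }
    }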

 

Unfortunately for us, memory leaks are hard to debug over email, and it's
difficult for us to tell if it's a change to your code or an issue with
Lucene.NET.  As far as I can tell, this is the only memory leak I can find
that affects 2.9.2.

Thanks,
Christopher

 

On Wed, Nov 30, 2011 at 8:25 AM, Prescott Nasser geobmx...@hotmail.com wrote:

 We just released 2.9.4 - the website didn't update last night, so I'll have
 to try and update it later today. But if you follow the link to download the
 2.9.2 dist you'll see folders for 2.9.4.

 I'll send an email to the user and dev lists once I get the website to
 update.

 


RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

2011-11-30 Thread Digy
OK, here is the code that can be compiled against .NET 2.0
http://pastebin.com/k2f7JfPd

DIGY


-Original Message-
From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com] 
Sent: Wednesday, November 30, 2011 9:26 PM
To: lucene-net-dev@lucene.apache.org
Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

DIGY,

Thanks for the tip, but could you be a little more specific?
Where and how are extension-method calls replaced?

For example, how would I change the CloseableThreadLocalExtensions method

public static void Set<T>(this ThreadLocal<T> t, T val)
{
    t.Value = val;
}


to eliminate the compile error

Error   2   Cannot define a new extension method because the compiler
required type 'System.Runtime.CompilerServices.ExtensionAttribute' cannot be
found. Are you missing a reference to System.Core.dll?

due to the Set<T> parameter "this ThreadLocal<T> t"?

 
- Neal



RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

2011-11-30 Thread Digy
If I recall correctly, the last memory-leak problem for 2.9.2 was reported in
~August by RavenDB, and it was fixed in 2.9.4(g).

DIGY

-Original Message-
From: Christopher Currens [mailto:currens.ch...@gmail.com] 
Sent: Wednesday, November 30, 2011 11:33 PM
To: lucene-net-dev@lucene.apache.org
Subject: Re: [Lucene.Net] Re: Memory Leak in 2.9.2.2

Trevor,

Unfortunately I was unable to reproduce the memory leak you're experiencing
in 2.9.2.  In particular, with byte[], of the 18,277 that were created, only
13 were not garbage collected, and it's likely that they are not related to
Lucene (it's possible they are static, and therefore would only be destroyed
with the AppDomain, outside of what the profiler can trace).  I tried to
emulate the code you showed us and there were no signs of any allocated
arrays that weren't cleaned up.  That doesn't mean there isn't one in your
code, but I just can't reproduce it with what you've shown us.  If it's
possible for you to write a small program that has the same behavior, that
could help us track it down.

As a side note, what was a little disconcerting, though, was that in 2.9.4,
with the same code, it created 28,565 byte[], and there were quite a few more
left uncollected (2,805 arrays).  The allocations are happening in
DocumentsWriter.ByteBlockAllocator; I'll have to look at it later, though,
to see if it's even a problem.


Thanks,
Christopher


On Wed, Nov 30, 2011 at 12:41 PM, Granroth, Neal V. 
neal.granr...@thermofisher.com wrote:

 Or maybe put the changes within a conditional compile code block?

 Thanks DIGY, works great.

 - Neal
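
(A sketch of the conditional-compile idea; "NET20" is a hypothetical symbol
defined only in the .NET 2.0 build configuration:)

    public static void SetValue<T>(ThreadLocal<T> t, T val)
    {
    #if NET20
        CloseableThreadLocalExtensions.Set(t, val); // plain static call
    #else
        t.Set(val); // extension-method call (requires System.Core)
    #endif
    }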

 -Original Message-
 From: Prescott Nasser [mailto:geobmx...@hotmail.com]
 Sent: Wednesday, November 30, 2011 2:35 PM
 To: lucene-net-dev@lucene.apache.org
 Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

 Probably makes for a good wiki entry

 Sent from my Windows Phone
 

RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

2011-11-30 Thread Digy
... and it was related to CloseableThreadLocal (fixed in 2.9.4(g)), which now
creates a compilation problem against .NET 2.0 :)

DIGY

-Original Message-
From: Digy [mailto:digyd...@gmail.com] 
Sent: Thursday, December 01, 2011 12:09 AM
To: 'lucene-net-dev@lucene.apache.org'
Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

If I recall correctly, the last memory-leak problem for 2.9.2 was reported in
~August by RavenDB, and it was fixed in 2.9.4(g).

DIGY


[Lucene.Net] December Board Report

2011-11-30 Thread Prescott Nasser

The December board report has been updated: 
http://wiki.apache.org/incubator/December2011

Please review and adjust as needed,

~P

Re: buildbots for PyLucene?

2011-11-30 Thread Andi Vajda

On Nov 29, 2011, at 18:04, Bill Janssen jans...@parc.com wrote:

 Andi Vajda va...@apache.org wrote:
 
 
 On Nov 29, 2011, at 15:18, Bill Janssen jans...@parc.com wrote:
 
  I've once again spent an hour building PyLucene, which gives me some
  sympathy for issue 10:
 
  https://issues.apache.org/jira/browse/PYLUCENE-10
 
  I was thinking about how to address this...
 
  One thing I've found useful at PARC is to set up buildbot tests for
  hard-to-package systems.  Basically, the test just waits for changes to
  the SCM repository, checks out the code, and tries to build.  A nice
  side-effect is that, when successful, it produces a binary for the build
  slave's platform.
 
  I'm unsure whether this would work for PyLucene.  The ASF build slaves
  seem pretty coarse-grained.  I see that there is an osx-slave, but
  there's no information about it (10.5? 10.6? 10.7?), no contact, and it's
  down.
 
  I know nothing about the Apache buildbots. Why not contribute buildbots for
  PyLucene at PARC?
 
 Because this is something the ASF should really address.  I'm happy to
 volunteer to set up a PyLucene build test on an ASF buildbot -- maybe
 more than one if it's easy to clone.

Given the bizarre netbsd(?) jail the Lucene Java bot is set up in, I can't 
imagine a multi-OS x multi-Java x multi-Python buildbot materializing anytime 
soon.
That being said, I really don't know what is and isn't available as 
infrastructure from the ASF for these kinds of things, so I might be completely 
wrong here.

Andi..

 
 Just looked at snakebite.org -- no OS X buildbots there, either.
 
 Bill
 
 
 Andi..
 
  A possibility would be to use the Python buildbots, but of course there's
  no assurance that Java is installed on any of them.
 
  Bill



[jira] [Commented] (LUCENE-3586) Choose a specific Directory implementation running the CheckIndex main

2011-11-30 Thread Luca Cavanna (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159894#comment-13159894
 ] 

Luca Cavanna commented on LUCENE-3586:
--

{quote}
Hmm, I don't think we should add an enum to FSDir here? Can we simply accept 
the class name and then just load that class (maybe prefixing oal.store so user 
doesn't have to type that all the time)?

Also, can we make it a hard error if the specified name isn't recognized? 
(Instead of silently falling back to FSDir.open).
{quote}

That's fine as well; just a little bit longer than writing NIOFS, MMAP or 
SIMPLE, but I guess it doesn't matter. Mike, do you mean to load the class 
using reflection, or to compare the input string to those three class names?

Any other opinion?

 Choose a specific Directory implementation running the CheckIndex main
 --

 Key: LUCENE-3586
 URL: https://issues.apache.org/jira/browse/LUCENE-3586
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Luca Cavanna
Assignee: Luca Cavanna
Priority: Minor
 Attachments: LUCENE-3586.patch


 It should be possible to choose a specific Directory implementation to use 
 during the CheckIndex process when we run it from its main.
 What about an additional main parameter?
 In fact, I'm experiencing some problems with MMapDirectory working with a big 
 segment, and after some failed attempts playing with maxChunkSize, I decided 
 to switch to another FSDirectory implementation but I needed to do that on my 
 own main.
 Should we also consider using a FileSwitchDirectory?
 I'm willing to contribute, could you please let me know your thoughts about 
 it?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-2930) Allow controlling an important PDF processing parameter in Tika that splits the words in text and is now supported in version 1.0 of Tika.

2011-11-30 Thread Ravish Bhagdev (Created) (JIRA)
Allow controlling an important PDF processing parameter in Tika that splits the 
words in text and is now supported in version 1.0 of Tika.
-

 Key: SOLR-2930
 URL: https://issues.apache.org/jira/browse/SOLR-2930
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Solr Cell (Tika extraction)
Affects Versions: 3.5
Reporter: Ravish Bhagdev


Tika 1.0 has fixed a major issue with processing and parsing of PDF files that 
was splitting the words incorrectly: 
https://issues.apache.org/jira/browse/TIKA-724

This causes text to be indexed incorrectly in Solr, and it becomes especially 
visible when using spellcheck features etc.

They have added a special parameter, set using setEnableAutoSpace, that fixes 
the problem, but there is currently no way of setting this when using Solr.  As 
discussed in the thread on the above issue, it would be nice if we could 
control this (and in future other) parameters via Solr configuration.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-3.x - Build # 11608 - Failure

2011-11-30 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/11608/

1 tests failed.
REGRESSION:  org.apache.lucene.search.TestSearcherManager.testIntermediateClose

Error Message:
java.lang.NullPointerException

Stack Trace:
junit.framework.AssertionFailedError: java.lang.NullPointerException
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
at 
org.apache.lucene.search.TestSearcherManager.testIntermediateClose(TestSearcherManager.java:248)
at 
org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:432)




Build Log (for compile errors):
[...truncated 7958 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2472) StatsComponent should support hierarchical facets

2011-11-30 Thread Dmitry Drozdov (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159947#comment-13159947
 ] 

Dmitry Drozdov commented on SOLR-2472:
--

Any chance for this to be merged into trunk?

 StatsComponent should support hierarchical facets
 -

 Key: SOLR-2472
 URL: https://issues.apache.org/jira/browse/SOLR-2472
 Project: Solr
  Issue Type: New Feature
Affects Versions: 3.1, 4.0
Reporter: Dmitry Drozdov
 Attachments: SOLR-2472.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 It is currently possible to get only a single layer of faceting in 
 StatsComponent.
 The proposal is to make it possible to specify the stats.facet parameter like 
 this:
 stats=true&stats.field=sField&stats.facet=fField1,fField2
 and get the response like this:
 <lst name="stats">
   <lst name="stats_fields">
     <lst name="sField">
       <double name="min">1.0</double>
       <double name="max">1.0</double>
       <double name="sum">4.0</double>
       <long name="count">4</long>
       <long name="missing">0</long>
       <double name="sumOfSquares"></double>
       <double name="mean"></double>
       <double name="stddev"></double>
       <lst name="facets">
         <lst name="fField1">
           <lst name="fField1Value1">
             <double name="min">1.0</double>
             <double name="max">1.0</double>
             <double name="sum">2.0</double>
             <long name="count">2</long>
             <long name="missing">0</long>
             <double name="sumOfSquares"></double>
             <double name="mean"></double>
             <double name="stddev"></double>
             <lst name="facets">
               <lst name="fField2">
                 <lst name="fField2Value1">
                   <double name="min">1.0</double>
                   <double name="max">1.0</double>
                   <double name="sum">1.0</double>
                   <long name="count">1</long>
                   <long name="missing">0</long>
                   <double name="sumOfSquares"></double>
                   <double name="mean"></double>
                   <double name="stddev"></double>
                 </lst>
                 <lst name="fField2Value2">
                   <double name="min">1.0</double>
                   <double name="max">1.0</double>
                   <double name="sum">1.0</double>
                   <long name="count">1</long>
                   <long name="missing">0</long>
                   <double name="sumOfSquares"></double>
                   <double name="mean"></double>
                   <double name="stddev"></double>
                 </lst>
               </lst>
             </lst>
           </lst>
           <lst name="fField1Value2">
             <double name="min">1.0</double>
             <double name="max">1.0</double>
             <double name="sum">2.0</double>
             <long name="count">2</long>
             <long name="missing">0</long>
             <double name="sumOfSquares"></double>
             <double name="mean"></double>
             <double name="stddev"></double>
             <lst name="facets">
               <lst name="fField2">
                 <lst name="fField2Value1">
                   <double name="min">1.0</double>
                   <double name="max">1.0</double>
                   <double name="sum">1.0</double>
                   <long name="count">1</long>
                   <long name="missing">0</long>
                   <double name="sumOfSquares"></double>
                   <double name="mean"></double>
                   <double name="stddev"></double>
                 </lst>
                 <lst name="fField2Value2">
                   <double name="min">1.0</double>
                   <double name="max">1.0</double>
                   <double name="sum">1.0</double>
                   <long name="count">1</long>
                   <long name="missing">0</long>
                   <double name="sumOfSquares"></double>
                   <double name="mean"></double>
                   <double name="stddev"></double>
                 </lst>
               </lst>
             </lst>
           </lst>
         </lst>
       </lst>
     </lst>
   </lst>
 </lst>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 1112 - Still Failing

2011-11-30 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/1112/

No tests ran.

Build Log (for compile errors):
[...truncated 12340 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3609) BooleanFilter changed behavior in 3.5, no longer acts as if minimum should match set to 1

2011-11-30 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159970#comment-13159970
 ] 

Uwe Schindler commented on LUCENE-3609:
---

Committed 3.x revision: 1208375

Now forward-porting

 BooleanFilter changed behavior in 3.5, no longer acts as if minimum should 
 match set to 1
 ---

 Key: LUCENE-3609
 URL: https://issues.apache.org/jira/browse/LUCENE-3609
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.5
Reporter: Shay Banon
Assignee: Uwe Schindler
 Attachments: LUCENE-3609.patch, LUCENE-3609.patch, LUCENE-3609.patch


 The change LUCENE-3446 causes a change in behavior in BooleanFilter. It used 
 to work as if minimum-should-match were set to 1 (in BQ lingo), but now, if 
 no should clauses match, the should clauses are ignored, and, for example, if 
 there is a must clause, only that one will be used and returned.
 For example, with a single must clause and a should clause where the should 
 clause matches nothing, the filter should match nothing, but it will match 
 whatever the must clause matches.
 The fix is simple: after iterating over the should clauses, if the aggregated 
 bitset is null, return null.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3609) BooleanFilter changed behavior in 3.5, no longer acts as if minimum should match set to 1

2011-11-30 Thread Uwe Schindler (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-3609.
---

   Resolution: Fixed
Fix Version/s: 4.0
   3.6

Committed trunk revision: 1208381

 BooleanFilter changed behavior in 3.5, no longer acts as if minimum should 
 match set to 1
 ---

 Key: LUCENE-3609
 URL: https://issues.apache.org/jira/browse/LUCENE-3609
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.5
Reporter: Shay Banon
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3609.patch, LUCENE-3609.patch, LUCENE-3609.patch


 The change LUCENE-3446 causes a change in behavior in BooleanFilter. It used 
 to work as if minimum-should-match were set to 1 (in BQ lingo), but now, if 
 no should clauses match, the should clauses are ignored, and, for example, if 
 there is a must clause, only that one will be used and returned.
 For example, with a single must clause and a should clause where the should 
 clause matches nothing, the filter should match nothing, but it will match 
 whatever the must clause matches.
 The fix is simple: after iterating over the should clauses, if the aggregated 
 bitset is null, return null.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3607) Lucene Index files can not be reproduced faithfully (due to timestamps embedded)

2011-11-30 Thread Martin Oberhuber (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159983#comment-13159983
 ] 

Martin Oberhuber commented on LUCENE-3607:
--

Hi all,

thanks for the many comments. I understand that there's no desire to change 
behavior that's been working (and documented!) for years.

What about a different approach: would it be possible to write a small Java 
main that normalizes an index, very much like stripping an EXE?  That way I 
could postprocess my indexes (which are meant for distribution with our 
product), but at its core Lucene could continue working as it does today.

Regarding some other comments,

- Our main reason for shipping a pre-built index is initial search 
performance. In a large Eclipse-based product, generating the docs index on 
initial search can take approximately 4 minutes on a decent computer. With 
everything pre-indexed, initial search can proceed after 10 seconds. That's an 
important usability issue for our help system. Another reason is the desire to 
find any index-building errors at build time (where we can investigate them) 
rather than at runtime.

- We do have both the build environment and the deployment environment under 
full control (same Lucene version, same JVM version, same ICU version; all our 
content is en_US).

- Regarding heuristics: sure, the search is heuristic at runtime, but that's a 
very different thing from having the build environment be heuristic. Having 
identical input produce identical output is still desirable.

- The issue of different analyzers used at index-generation time vs. runtime 
has indeed bitten us in the past (see 
[[https://bugs.eclipse.org/bugs/show_bug.cgi?id=219928#c16]]). In my personal 
opinion, the choice of analyzer should be bound to the content, and not to the 
search environment, since in many cases the language of the search string will 
not be known, but the language of the documents / index is known. Right now, 
the best workaround for this at Eclipse is launching Eclipse with a "-nl 
en_US" argument to force the US locale when I know all the docs are US... but 
that won't work at all in an environment where some docs are English and 
others are German, a very common scenario with software products on Eclipse 
(the main product may be localized but some plugins are not).

Is that analyzer binding-to-content vs. binding-to-search issue known and 
discussed at Lucene already?  I.e., is it possible to have parts of the index 
(the US one) searched with a US analyzer but other parts (the German one) with 
a German analyzer?  And why does the German analyzer truncate words at "." 
while the US one does not (see 
[[https://bugs.eclipse.org/bugs/show_bug.cgi?id=219928#c18]])?

 Lucene Index files can not be reproduced faithfully (due to timestamps 
 embedded)
 

 Key: LUCENE-3607
 URL: https://issues.apache.org/jira/browse/LUCENE-3607
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 2.9.1
 Environment: Eclipse 3.7
Reporter: Martin Oberhuber
Assignee: Michael McCandless

 Eclipse 3.7 uses Lucene 2.9.1 for indexing online help content. A 
 pre-generated help index can be shipped together with online content. As per
[[https://bugs.eclipse.org/bugs/show_bug.cgi?id=364979 ]]
 it turns out that the help index can not be faithfully reproduced during a 
 build, because there are timestamps embedded in the index files, and the 
 NameCounter field in segments_2 contains different contents on every build.
 Not being able to faithfully reproduce the index from identical source bits 
 undermines trust in the index (and software delivery) being correct.
 I'm wondering whether this is a known issue and/or has been addressed in a 
 newer Lucene version already ?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2280) commitWithin ignored for a delete query

2011-11-30 Thread Jan Høydahl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159994#comment-13159994
 ] 

Jan Høydahl commented on SOLR-2280:
---

I also plan to add support for the convenience methods deleteById(String id, 
int commitWithinMs) etc. in SolrJ, the same way as for adds.

 commitWithin ignored for a delete query
 ---

 Key: SOLR-2280
 URL: https://issues.apache.org/jira/browse/SOLR-2280
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Reporter: David Smiley
Assignee: Jan Høydahl
Priority: Minor
 Fix For: 3.6, 4.0

 Attachments: SOLR-2280-3x.patch, SOLR-2280.patch, SOLR-2280.patch, 
 SOLR-2280.patch


 The commitWithin option on an UpdateRequest is only honored for requests 
 containing new documents.  It does not, for example, work with a delete 
 query.  The following doesn't work as expected:
 {code:java}
 UpdateRequest request = new UpdateRequest();
 request.deleteById("id123");
 request.setCommitWithin(1000);
 solrServer.request(request);
 {code}
 In my opinion, the commitWithin attribute should be permitted on the 
 <delete/> xml tag as well as <add/>.  Such a change would go in 
 XMLLoader.java and it would have some ramifications elsewhere too.  Once 
 this is done, UpdateRequest.getXml() can be updated to generate the 
 right XML.
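 (A hypothetical sketch of the XML form this proposal implies; the attribute 
 placement is an assumption, not committed syntax:)
 {code:xml}
 <delete commitWithin="1000">
   <id>id123</id>
 </delete>
 {code}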

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3607) Lucene Index files can not be reproduced faithfully (due to timestamps embedded)

2011-11-30 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159997#comment-13159997
 ] 

Robert Muir commented on LUCENE-3607:
-

{quote}
Is that analyzer binding to content vs. binding to search issue known and 
discussed at Lucene already ? 
{quote}

No, because it's an Eclipse bug.  You can set analyzers however you want in 
Lucene; we don't enforce anything.

{quote}
And, why does the German analyzer truncate words at . while the US one does 
not (See [https://bugs.eclipse.org/bugs/show_bug.cgi?id=219928#c18]) ?
{quote}

Because you are using an ancient version of lucene.

 Lucene Index files can not be reproduced faithfully (due to timestamps 
 embedded)
 

 Key: LUCENE-3607
 URL: https://issues.apache.org/jira/browse/LUCENE-3607
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 2.9.1
 Environment: Eclipse 3.7
Reporter: Martin Oberhuber
Assignee: Michael McCandless

 Eclipse 3.7 uses Lucene 2.9.1 for indexing online help content. A 
 pre-generated help index can be shipped together with online content. As per
[[https://bugs.eclipse.org/bugs/show_bug.cgi?id=364979 ]]
 it turns out that the help index can not be faithfully reproduced during a 
 build, because there are timestamps embedded in the index files, and the 
 NameCounter field in segments_2 contains different contents on every build.
 Not being able to faithfully reproduce the index from identical source bits 
 undermines trust in the index (and software delivery) being correct.
 I'm wondering whether this is a known issue and/or has been addressed in a 
 newer Lucene version already ?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3607) Lucene Index files can not be reproduced faithfully (due to timestamps embedded)

2011-11-30 Thread Robert Muir (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3607.
-

Resolution: Won't Fix

 Lucene Index files can not be reproduced faithfully (due to timestamps 
 embedded)
 

 Key: LUCENE-3607
 URL: https://issues.apache.org/jira/browse/LUCENE-3607
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 2.9.1
 Environment: Eclipse 3.7
Reporter: Martin Oberhuber
Assignee: Michael McCandless

 Eclipse 3.7 uses Lucene 2.9.1 for indexing online help content. A 
 pre-generated help index can be shipped together with online content. As per
[[https://bugs.eclipse.org/bugs/show_bug.cgi?id=364979 ]]
 it turns out that the help index can not be faithfully reproduced during a 
 build, because there are timestamps embedded in the index files, and the 
 NameCounter field in segments_2 contains different contents on every build.
 Not being able to faithfully reproduce the index from identical source bits 
 undermines trust in the index (and software delivery) being correct.
 I'm wondering whether this is a known issue and/or has been addressed in a 
 newer Lucene version already ?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2011-11-30 Thread Eric Pugh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160006#comment-13160006
 ] 

Eric Pugh commented on SOLR-1972:
-

Has anyone had thoughts on how to do this via a component that is less 
intrusive than modifying RequestHandlerBase?  I'd love to do this via a 
component that I could compile as a standalone project and then drop into my 
existing Solr.

Also, I am only interested in a certain subset of queries, so I added a 
collection of regex patterns that are used to test against the query string 
to see if it should be included in the rolling statistics.  I will upload the 
patch.  Also fixed the patch to work against the latest trunk.

 Need additional query stats in admin interface - median, 95th and 99th 
 percentile
 -

 Key: SOLR-1972
 URL: https://issues.apache.org/jira/browse/SOLR-1972
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Shawn Heisey
Priority: Minor
 Attachments: SOLR-1972-url_pattern.patch, SOLR-1972.patch, 
 SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, elyograg-1972-3.2.patch, 
 elyograg-1972-3.2.patch, elyograg-1972-trunk.patch, elyograg-1972-trunk.patch


 I would like to see more detailed query statistics from the admin GUI.  This 
 is what you can get now:
 requests : 809
 errors : 0
 timeouts : 0
 totalTime : 70053
 avgTimePerRequest : 86.59209
 avgRequestsPerSecond : 0.8148785 
 I'd like to see more data on the time per request - median, 95th percentile, 
 99th percentile, and any other statistical function that makes sense to 
 include.  In my environment, the first bunch of queries after startup tend to 
 take several seconds each.  I find that the average value tends to be useless 
 until it has several thousand queries under its belt and the caches are 
 thoroughly warmed.  The statistical functions I have mentioned would quickly 
 eliminate the influence of those initial slow queries.
 The system will have to store individual data about each query.  I don't know 
 if this is something Solr does already.  It would be nice to have a 
 configurable count of how many of the most recent data points are kept, to 
 control the amount of memory the feature uses.  The default value could be 
 something like 1024 or 4096.
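 (The bookkeeping described in the last paragraph is essentially a bounded 
 ring buffer of recent request times; a language-neutral illustration, 
 sketched here in C# rather than Solr's own code:)
 {code}
 class RollingStats
 {
     private readonly double[] buffer;   // the N most recent request times
     private int count, next;

     public RollingStats(int capacity) { buffer = new double[capacity]; }

     public void Record(double elapsedMs)
     {
         buffer[next] = elapsedMs;            // overwrite the oldest entry
         next = (next + 1) % buffer.Length;
         if (count < buffer.Length) count++;
     }

     // Nearest-rank percentile over the current window, e.g. Percentile(95).
     public double Percentile(double p)
     {
         if (count == 0) return 0.0;          // no data recorded yet
         double[] window = new double[count];
         System.Array.Copy(buffer, window, count);
         System.Array.Sort(window);           // arrival order is irrelevant
         int rank = (int)System.Math.Ceiling(p / 100.0 * count) - 1;
         return window[rank < 0 ? 0 : rank];
     }
 }
 {code}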

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2011-11-30 Thread Eric Pugh (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Pugh updated SOLR-1972:


Attachment: SOLR-1972-url_pattern.patch

Updated to latest trunk, added regex patterns.

 Need additional query stats in admin interface - median, 95th and 99th 
 percentile
 -

 Key: SOLR-1972
 URL: https://issues.apache.org/jira/browse/SOLR-1972
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Shawn Heisey
Priority: Minor
 Attachments: SOLR-1972-url_pattern.patch, SOLR-1972.patch, 
 SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, elyograg-1972-3.2.patch, 
 elyograg-1972-3.2.patch, elyograg-1972-trunk.patch, elyograg-1972-trunk.patch


 I would like to see more detailed query statistics from the admin GUI.  This 
 is what you can get now:
 requests : 809
 errors : 0
 timeouts : 0
 totalTime : 70053
 avgTimePerRequest : 86.59209
 avgRequestsPerSecond : 0.8148785 
 I'd like to see more data on the time per request - median, 95th percentile, 
 99th percentile, and any other statistical function that makes sense to 
 include.  In my environment, the first bunch of queries after startup tend to 
 take several seconds each.  I find that the average value tends to be useless 
 until it has several thousand queries under its belt and the caches are 
 thoroughly warmed.  The statistical functions I have mentioned would quickly 
 eliminate the influence of those initial slow queries.
 The system will have to store individual data about each query.  I don't know 
 if this is something Solr does already.  It would be nice to have a 
 configurable count of how many of the most recent data points are kept, to 
 control the amount of memory the feature uses.  The default value could be 
 something like 1024 or 4096.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-3.x #316: POMs out of sync

2011-11-30 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-3.x/316/

2 tests failed.
FAILED:  
org.apache.lucene.queryParser.standard.TestNumericQueryParser.org.apache.lucene.queryParser.standard.TestNumericQueryParser

Error Message:
Unparseable date: 1867.06.20 6時34分09秒 CET  9 +0100 1867

Stack Trace:
java.text.ParseException: Unparseable date: 1867.06.20 6時34分09秒 CET  9 +0100 
1867
at java.text.DateFormat.parse(DateFormat.java:354)
at 
org.apache.lucene.queryParser.standard.TestNumericQueryParser.checkDateFormatSanity(TestNumericQueryParser.java:88)
at 
org.apache.lucene.queryParser.standard.TestNumericQueryParser.beforeClass(TestNumericQueryParser.java:145)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at 
org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at 
org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164)
at 
org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110)
at 
org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:175)
at 
org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:107)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68)


FAILED:  
org.apache.lucene.queryParser.standard.TestNumericQueryParser.org.apache.lucene.queryParser.standard.TestNumericQueryParser

Error Message:
null

Stack Trace:
java.lang.NullPointerException
at 
org.apache.lucene.queryParser.standard.TestNumericQueryParser.afterClass(TestNumericQueryParser.java:495)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at 
org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at 
org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164)
at 
org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110)
at 
org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:175)
at 
org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:107)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68)




Build Log (for compile errors):
[...truncated 26196 lines...]




[jira] [Commented] (SOLR-2930) Allow controlling an important PDF processing parameter in Tika that splits the words in text and is now supported in version 1.0 of Tika.

2011-11-30 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160034#comment-13160034
 ] 

Robert Muir commented on SOLR-2930:
---

I think the most important piece is that this parameter is *off* by default.

For a search engine, if some bold content gets duplicated... there could really 
be worse things.

But if spaces get incorrectly added to words, that's going to mess up 
tokenization.

 Allow controlling an important PDF processing parameter in Tika that splits 
 the words in text and is now supported in version 1.0 of Tika.
 -

 Key: SOLR-2930
 URL: https://issues.apache.org/jira/browse/SOLR-2930
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Solr Cell (Tika extraction)
Affects Versions: 3.5
Reporter: Ravish Bhagdev
  Labels: pdf, text-splitting, tika,

 Tika 1.0 has fixed a major issue with processing and parsing of PDF files 
 that was splitting the words incorrectly: 
 https://issues.apache.org/jira/browse/TIKA-724
 This causes text to be indexed incorrectly in Solr, and it becomes especially 
 visible when using spellcheck features etc.  
 They have added a special parameter set using setEnableAutoSpace that fixes 
 the problem but there is currently no way of setting this when using Solr.  
 As discussed in thread on above issue, it would be nice if we could control 
 this (and in future other) parameter via Solr configuration.
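 A hedged sketch of toggling this through Tika directly, using the 
 setEnableAutoSpace name given above (illustrative; Solr would need to pass 
 this from configuration, which is what this issue requests):
{code:java}
// Illustrative only: call Tika's PDF parser directly so the auto-space flag
// can be toggled; in Solr this would have to come from configuration instead.
import java.io.InputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;

public class PdfAutoSpaceSketch {
    public static String extract(InputStream pdf) throws Exception {
        PDFParser parser = new PDFParser();
        parser.setEnableAutoSpace(false);   // the parameter this issue wants exposed
        BodyContentHandler text = new BodyContentHandler(-1); // no write limit
        parser.parse(pdf, text, new Metadata(), new ParseContext());
        return text.toString();
    }
}
{code}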

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Stats per group with StatsComponent?

2011-11-30 Thread Morten Lied Johansen


Hi

I posted the below mail to the solr-user list a little over a week ago. 
Since there has been no response, we assume this means that what we need 
is not currently possible.


We need this functionality, and are willing to put in time and effort to 
implement it, but could use some pointers to where it would be natural 
to add this, and ideas for how to best solve it.


I'm also wondering if I should create an issue in JIRA right away, or if 
I should wait until we have a first patch ready?



 Original Message 
Subject: Stats per group with StatsComponent?
Date: Tue, 22 Nov 2011 14:40:45 +0100
From: Morten Lied Johansen morte...@ifi.uio.no
Reply-To: solr-u...@lucene.apache.org
To: solr-u...@lucene.apache.org


Hi

We need to get minimum and maximum values for a field, within a group in
a grouped search-result. Is this possible today, perhaps by using
StatsComponent some way?

I'll flesh out the example a little, to make the question clearer.

We have a number of documents, indexed with a price, date and a hotel.
For each hotel, there are a number of documents, each representing a
price/date combination. We then group our search result on hotel.

We want to show the minimum and maximum price for each hotel.

A little googling leads us to look at StatsComponent, as what it does
would be what we need, if it could be done for each group. There was a
thread on this list in August, "Grouping and performing statistics per
group", that seemed to go into this a bit, but didn't find a solution.

Is this possible in Solr 3.4, either with StatsComponent, or some other way?

--
Morten
We all live in a yellow subroutine.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2930) Allow controlling an important PDF processing parameter in Tika that splits the words in text and is now supported in version 1.0 of Tika.

2011-11-30 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160036#comment-13160036
 ] 

Robert Muir commented on SOLR-2930:
---

My bad, I confused this bug with the pdfbox 'character deletion' 
one (TIKA-767); that's still unfortunately not in Tika 1.0, it seems.


 Allow controlling an important PDF processing parameter in Tika that splits 
 the words in text and is now supported in version 1.0 of Tika.
 -

 Key: SOLR-2930
 URL: https://issues.apache.org/jira/browse/SOLR-2930
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Solr Cell (Tika extraction)
Affects Versions: 3.5
Reporter: Ravish Bhagdev
  Labels: pdf, text-splitting, tika,

 Tika 1.0 has fixed a major issue with processing and parsing of PDF files 
 that was splitting the words incorrectly: 
 https://issues.apache.org/jira/browse/TIKA-724
 This causes text to be indexed incorrectly in Solr, and it becomes especially 
 visible when using spellcheck features etc.  
 They have added a special parameter set using setEnableAutoSpace that fixes 
 the problem but there is currently no way of setting this when using Solr.  
 As discussed in thread on above issue, it would be nice if we could control 
 this (and in future other) parameter via Solr configuration.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2472) StatsComponent should support hierarchical facets

2011-11-30 Thread Erick Erickson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160046#comment-13160046
 ] 

Erick Erickson commented on SOLR-2472:
--

This patch no longer applies cleanly.

I'll volunteer to shepherd this through the commit process if:

1) we can get some consensus that this is a good thing to do.
2) you update it to apply cleanly, and provide some unit tests; 
StatsComponentTest might be the place to start.

It's probably worthwhile to get consensus before spending time working on the 
patch; could you outline the use-case for this functionality?

 StatsComponent should support hierarchical facets
 -

 Key: SOLR-2472
 URL: https://issues.apache.org/jira/browse/SOLR-2472
 Project: Solr
  Issue Type: New Feature
Affects Versions: 3.1, 4.0
Reporter: Dmitry Drozdov
 Attachments: SOLR-2472.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 It is currently possible to get only a single layer of faceting in 
 StatsComponent.
 The proposal is to make it possible to specify the stats.facet parameter like 
 this:
 stats=true&stats.field=sField&stats.facet=fField1,fField2
 and get the response like this:
 <lst name="stats">
   <lst name="stats_fields">
     <lst name="sField">
       <double name="min">1.0</double>
       <double name="max">1.0</double>
       <double name="sum">4.0</double>
       <long name="count">4</long>
       <long name="missing">0</long>
       <double name="sumOfSquares"></double>
       <double name="mean"></double>
       <double name="stddev"></double>
       <lst name="facets">
         <lst name="fField1">
           <lst name="fField1Value1">
             <double name="min">1.0</double>
             <double name="max">1.0</double>
             <double name="sum">2.0</double>
             <long name="count">2</long>
             <long name="missing">0</long>
             <double name="sumOfSquares"></double>
             <double name="mean"></double>
             <double name="stddev"></double>
             <lst name="facets">
               <lst name="fField2">
                 <lst name="fField2Value1">
                   <double name="min">1.0</double>
                   <double name="max">1.0</double>
                   <double name="sum">1.0</double>
                   <long name="count">1</long>
                   <long name="missing">0</long>
                   <double name="sumOfSquares"></double>
                   <double name="mean"></double>
                   <double name="stddev"></double>
                 </lst>
                 <lst name="fField2Value2">
                   <double name="min">1.0</double>
                   <double name="max">1.0</double>
                   <double name="sum">1.0</double>
                   <long name="count">1</long>
                   <long name="missing">0</long>
                   <double name="sumOfSquares"></double>
                   <double name="mean"></double>
                   <double name="stddev"></double>
                 </lst>
               </lst>
             </lst>
           </lst>
           <lst name="fField1Value2">
             <double name="min">1.0</double>
             <double name="max">1.0</double>
             <double name="sum">2.0</double>
             <long name="count">2</long>
             <long name="missing">0</long>
             <double name="sumOfSquares"></double>
             <double name="mean"></double>
             <double name="stddev"></double>
             <lst name="facets">
               <lst name="fField2">
                 <lst name="fField2Value1">
                   <double name="min">1.0</double>
                   <double name="max">1.0</double>
                   <double name="sum">1.0</double>
                   <long name="count">1</long>
                   <long name="missing">0</long>
                   <double name="sumOfSquares"></double>
                   <double name="mean"></double>
                   <double name="stddev"></double>
                 </lst>
                 <lst name="fField2Value2">
                   <double name="min">1.0</double>
                   <double name="max">1.0</double>
                   <double name="sum">1.0</double>
                   <long name="count">1</long>
                   <long name="missing">0</long>
                   <double name="sumOfSquares"></double>
                   <double name="mean"></double>
                   <double name="stddev"></double>
                 </lst>
               </lst>
             </lst>
           </lst>
         </lst>
       </lst>
     </lst>
   </lst>
 </lst>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-3.x-java7 - Build # 1121 - Failure

2011-11-30 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x-java7/1121/

1 tests failed.
REGRESSION:  
org.apache.solr.client.solrj.embedded.LargeVolumeBinaryJettyTest.testMultiThreaded

Error Message:
java.lang.AssertionError: Some threads threw uncaught exceptions!

Stack Trace:
java.lang.RuntimeException: java.lang.AssertionError: Some threads threw 
uncaught exceptions!
at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:571)
at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:96)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
at 
org.apache.lucene.util.LuceneTestCase.checkUncaughtExceptionsAfter(LuceneTestCase.java:599)
at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:543)




Build Log (for compile errors):
[...truncated 15111 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2929) TermsComponent Adding entries

2011-11-30 Thread maillard (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160056#comment-13160056
 ] 

maillard commented on SOLR-2929:


Thank you for the response.
I understand your response.
I have tried to flush and commit after my update.
I have played around with the mergeFactor set to 2.
I have played with the maxPendingDeletes, all without success.
How can I be sure of, and/or force, the deletion of these marked docs? In other 
words, how do I make sure that my TermsComponent is a correct view of the 
index (without any docs marked for deletion) at a given time? 


 TermsComponent Adding entries
 -

 Key: SOLR-2929
 URL: https://issues.apache.org/jira/browse/SOLR-2929
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 3.3, 3.4
 Environment: solr 3.x
Reporter: maillard
Priority: Minor

 When indexing multiple documents in one go and then updating one of the 
 documents in a later process, the TermsComponent count gets wrongly incremented.
 For example, indexing two documents with a country field as such:
 <add>
   <doc>
     <field name="COUNTRY">US</field>
     <field name="ID">L20110121151204207</field>
   </doc>
   <doc>
     <field name="COUNTRY">Canada</field>
     <field name="ID">L20110121151204208</field>
   </doc>
 </add>
 TermsComponent returns:
  US(1)
  Canada(1)
 Update the first document:
 <add>
   <doc>
     <field name="COUNTRY">US</field>
     <field name="ID">L20110121151204207</field>
   </doc>
 </add>
 TermsComponent returns:
  US(2)
  Canada(1)
 There still are only two documents in the index.
 This does not happen when only dealing with a single doc, or when you update 
 the same set of documents you initially indexed.
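 For reference, a hedged Lucene 3.x sketch of forcing marked deletions out of 
 the segments, after which term counts reflect only live documents (the index 
 path is illustrative; a Solr optimize triggers the same merging):
{code:java}
// Hedged sketch (Lucene 3.x): term statistics such as docFreq still count
// documents that are merely marked deleted. Merging those deletes away makes
// the counts reflect live documents again.
import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class ExpungeDeletesSketch {
    public static void main(String[] args) throws Exception {
        FSDirectory dir = FSDirectory.open(new File("/path/to/index")); // illustrative path
        IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_33,
                new StandardAnalyzer(Version.LUCENE_33));
        IndexWriter writer = new IndexWriter(dir, cfg);
        writer.expungeDeletes(); // merge away the segments' deleted docs
        writer.close();
    }
}
{code}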

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Stats per group with StatsComponent?

2011-11-30 Thread Martijn v Groningen
Hi Morten,

I missed your question on the user mailing list. Here is my answer:

With the StatsComponent this isn't possible at the moment. The
StatsComponent will give you the min / max of field for the whole
query result.
If you want the min / max value per group you'll need to do some
coding. The grouping logic is executed inside Lucene collectors
located in the
grouping module. You'll need to create a new second pass collector
that computes the min / max for the top N groups. This collector then
needs to
be wired up in Solr. The AbstractSecondPassGroupingCollector is
something you can take a look at. It collects the top documents for
the top N groups.

You don't need to have a patch to open an issue. Just open an issue
with a good description and maybe some implementation details.

Martijn

On 30 November 2011 14:25, Morten Lied Johansen morte...@ifi.uio.no wrote:

 Hi

 I posted the below mail to the solr-user list a little over a week ago.
 Since there has been no response, we assume this means that what we need is
 not currently possible.

 We need this functionality, and are willing to put in time and effort to
 implement it, but could use some pointers to where it would be natural to
 add this, and ideas for how to best solve it.

 I'm also wondering if I should create an issue in JIRA right away, or if I
 should wait until we have a first patch ready?


  Original Message 
 Subject: Stats per group with StatsComponent?
 Date: Tue, 22 Nov 2011 14:40:45 +0100
 From: Morten Lied Johansen morte...@ifi.uio.no
 Reply-To: solr-u...@lucene.apache.org
 To: solr-u...@lucene.apache.org


 Hi

 We need to get minimum and maximum values for a field, within a group in
 a grouped search-result. Is this possible today, perhaps by using
 StatsComponent some way?

 I'll flesh out the example a little, to make the question clearer.

 We have a number of documents, indexed with a price, date and a hotel.
 For each hotel, there are a number of documents, each representing a
 price/date combination. We then group our search result on hotel.

 We want to show the minimum and maximum price for each hotel.

 A little googling leads us to look at StatsComponent, as what it does
 would be what we need, if it could be done for each group. There was a
 thread on this list in August, "Grouping and performing statistics per
 group", that seemed to go into this a bit, but didn't find a solution.

 Is this possible in Solr 3.4, either with StatsComponent, or some other way?

 --
 Morten
 We all live in a yellow subroutine.

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




-- 
Met vriendelijke groet,

Martijn van Groningen

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Stats per group with StatsComponent?

2011-11-30 Thread Morten Lied Johansen

On 30. nov. 2011 14:58, Martijn v Groningen wrote:





With the StatsComponent this isn't possible at the moment. The
StatsComponent will give you the min / max of field for the whole
query result.
If you want the min / max value per group you'll need to do some
coding. The grouping logic is executed inside Lucene collectors
located in the grouping module. You'll need to create a new second
pass collector that computes the min / max for the top N groups. This
collector then needs to be wired up in Solr. The
AbstractSecondPassGroupingCollector is something you can take a look
at. It collects the top documents for the top N groups.


Thank you for your reply. We'll have a look at this and see if we can 
get something going this week.



You don't need to have a patch to open an issue. Just open an issue
with a good description and maybe some implementation details.


I have created an issue, SOLR-2931. Let me know if I should add some 
more details to it. We will update it and follow any discussions as we work.


--
Morten
We all live in a yellow subroutine.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-2931) Statistics/aggregated values per group in a grouped response

2011-11-30 Thread Morten Lied Johansen (Created) (JIRA)
Statistics/aggregated values per group in a grouped response


 Key: SOLR-2931
 URL: https://issues.apache.org/jira/browse/SOLR-2931
 Project: Solr
  Issue Type: New Feature
Reporter: Morten Lied Johansen


We need to get minimum and maximum values for a field, within a group in a 
grouped search-result.

I'll flesh out our use-case a little to make our needs clearer:

We have a number of documents, indexed with a price, date and a hotel. For each 
hotel, there are a number of documents, each representing a price/date 
combination. We then group our search result on hotel. We want to show the 
minimum and maximum price for each hotel.

Other use-cases could be to calculate an average or a sum within a group.


We plan to work on this in the coming weeks, and will be supplying patches.
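A Solr-independent sketch of the requested semantics: one pass over (hotel, 
price) pairs keeping min and max per group (plain Java, purely illustrative, 
not existing Solr or Lucene code):
{code:java}
import java.util.HashMap;
import java.util.Map;

public class GroupMinMax {
    // Returns hotel -> {min, max} over parallel arrays of group keys and values.
    public static Map<String, double[]> aggregate(String[] hotels, double[] prices) {
        Map<String, double[]> stats = new HashMap<String, double[]>();
        for (int i = 0; i < hotels.length; i++) {
            double[] mm = stats.get(hotels[i]);
            if (mm == null) {
                stats.put(hotels[i], new double[] { prices[i], prices[i] });
            } else {
                mm[0] = Math.min(mm[0], prices[i]);   // group minimum
                mm[1] = Math.max(mm[1], prices[i]);   // group maximum
            }
        }
        return stats;
    }
}
{code}
In Solr itself this aggregation would live inside the grouping collectors, as 
discussed on the dev list thread above.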


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Stats per group with StatsComponent?

2011-11-30 Thread Martijn v Groningen
Looks fine!

Martijn

On 30 November 2011 15:25, Morten Lied Johansen morte...@ifi.uio.no wrote:
 On 30. nov. 2011 14:58, Martijn v Groningen wrote:



 With the StatsComponent this isn't possible at the moment. The
 StatsComponent will give you the min / max of field for the whole
 query result.
 If you want the min / max value per group you'll need to do some
 coding. The grouping logic is executed inside Lucene collectors
 located in the grouping module. You'll need to create a new second
 pass collector that computes the min / max for the top N groups. This
 collector then needs to be wired up in Solr. The
 AbstractSecondPassGroupingCollector is something you can take a look
 at. It collects the top documents for the top N groups.


 Thank you for your reply. We'll have a look at this and see if we can get
 something going this week.


 You don't need to have a patch to open an issue. Just open an issue
 with a good description and maybe some implementation details.


 I have created an issue, SOLR-2931. Let me know if I should add some more
 details to it. We will update it and follow any discussions as we work.


 --
 Morten
 We all live in a yellow subroutine.

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




-- 
Met vriendelijke groet,

Martijn van Groningen

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #316: POMs out of sync

2011-11-30 Thread Robert Muir
This looks like a localization bug. Is it possible to get the seed or
more information on this test failure?

Did maven truncate the test output, or is there a bug in LuceneTestCase
where it's not providing the reproduce-with line (hopefully) that it
should if beforeClass() throws an exception?

I looked at the console log and there wasn't any failure information there...

On Wed, Nov 30, 2011 at 7:59 AM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 Build: https://builds.apache.org/job/Lucene-Solr-Maven-3.x/316/

 2 tests failed.
 FAILED:  
 org.apache.lucene.queryParser.standard.TestNumericQueryParser.org.apache.lucene.queryParser.standard.TestNumericQueryParser

 Error Message:
 Unparseable date: 1867.06.20 6時34分09秒 CET  9 +0100 1867

 Stack Trace:
 java.text.ParseException: Unparseable date: 1867.06.20 6時34分09秒 CET  9 +0100 
 1867
        at java.text.DateFormat.parse(DateFormat.java:354)
        at 
 org.apache.lucene.queryParser.standard.TestNumericQueryParser.checkDateFormatSanity(TestNumericQueryParser.java:88)
        at 
 org.apache.lucene.queryParser.standard.TestNumericQueryParser.beforeClass(TestNumericQueryParser.java:145)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
        at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
        at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
        at 
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27)
        at 
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
        at 
 org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
        at 
 org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)
        at 
 org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at 
 org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164)
        at 
 org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110)
        at 
 org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:175)
        at 
 org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:107)
        at 
 org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68)


 FAILED:  
 org.apache.lucene.queryParser.standard.TestNumericQueryParser.org.apache.lucene.queryParser.standard.TestNumericQueryParser

 Error Message:
 null

 Stack Trace:
 java.lang.NullPointerException
        at 
 org.apache.lucene.queryParser.standard.TestNumericQueryParser.afterClass(TestNumericQueryParser.java:495)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
        at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
        at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
        at 
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
        at 
 org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
        at 
 org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)
        at 
 org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at 
 org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164)
        at 
 

RE: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #316: POMs out of sync

2011-11-30 Thread Steven A Rowe
I looked at the Surefire report 
https://builds.apache.org/job/Lucene-Solr-Maven-3.x/ws/checkout/lucene/build/contrib/queryparser/surefire-reports/TEST-org.apache.lucene.queryParser.standard.TestNumericQueryParser.xml/*view*/
 and I don't see any more information.

 -Original Message-
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Wednesday, November 30, 2011 9:55 AM
 To: dev@lucene.apache.org
 Subject: Re: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #316: POMs out of sync
 
 This looks like a localization bug. is it possible to get the seed or
 more information on this test fail?
 
 Did maven truncate the test output or is there a bug in LuceneTestCase
 where its not providing the reproduce-with (hopefully) that it
 should if beforeClass() throws an exception?
 
 I looked at the console log and there wasn't any failure information
 there...
 
 On Wed, Nov 30, 2011 at 7:59 AM, Apache Jenkins Server
 jenk...@builds.apache.org wrote:
  Build: https://builds.apache.org/job/Lucene-Solr-Maven-3.x/316/
 
  2 tests failed.
 
 FAILED:  org.apache.lucene.queryParser.standard.TestNumericQueryParser.org
 .apache.lucene.queryParser.standard.TestNumericQueryParser
 
  Error Message:
  Unparseable date: 1867.06.20 6時34分09秒 CET  9 +0100 1867
 
  Stack Trace:
  java.text.ParseException: Unparseable date: 1867.06.20 6時34分09秒 CET  9
 +0100 1867
         at java.text.DateFormat.parse(DateFormat.java:354)
         at
 org.apache.lucene.queryParser.standard.TestNumericQueryParser.checkDateFor
 matSanity(TestNumericQueryParser.java:88)
         at
 org.apache.lucene.queryParser.standard.TestNumericQueryParser.beforeClass(
 TestNumericQueryParser.java:145)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:
 57)
         at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm
 pl.java:43)
         at java.lang.reflect.Method.invoke(Method.java:616)
         at
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMetho
 d.java:44)
         at
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable
 .java:15)
         at
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.
 java:41)
         at
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:
 27)
         at
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31
 )
         at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
         at
 org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:
 53)
         at
 org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provi
 der.java:123)
         at
 org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java
 :104)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:
 57)
         at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm
 pl.java:43)
         at java.lang.reflect.Method.invoke(Method.java:616)
         at
 org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(Refle
 ctionUtils.java:164)
         at
 org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(Prov
 iderFactory.java:110)
         at
 org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireSt
 arter.java:175)
         at
 org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenFor
 ked(SurefireStarter.java:107)
         at
 org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68)
 
 
 
 FAILED:  org.apache.lucene.queryParser.standard.TestNumericQueryParser.org
 .apache.lucene.queryParser.standard.TestNumericQueryParser
 
  Error Message:
  null
 
  Stack Trace:
  java.lang.NullPointerException
         at
 org.apache.lucene.queryParser.standard.TestNumericQueryParser.afterClass(T
 estNumericQueryParser.java:495)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:
 57)
         at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm
 pl.java:43)
         at java.lang.reflect.Method.invoke(Method.java:616)
         at
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMetho
 d.java:44)
         at
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable
 .java:15)
         at
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.
 java:41)
         at
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37
 )
         at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
         at
 org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:
 53)
         at
 

Re: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #316: POMs out of sync

2011-11-30 Thread Robert Muir
Thanks Steven.

I think this is a bug in LuceneTestCase (I'll open an issue), because
if I add the following to TestDemo, I get no seed or anything at all:

  @BeforeClass
  public static void beforeClass() throws Exception {
    throw new NullPointerException();
  }

junit-sequential:
[junit] Testsuite: org.apache.lucene.TestDemo
[junit] Tests run: 0, Failures: 0, Errors: 1, Time elapsed: 0.143 sec
[junit]
[junit] Testcase: org.apache.lucene.TestDemo:   Caused an ERROR
[junit] null
[junit] java.lang.NullPointerException
[junit] at org.apache.lucene.TestDemo.beforeClass(TestDemo.java:44)
[junit]
[junit]
[junit] Test org.apache.lucene.TestDemo FAILED


On Wed, Nov 30, 2011 at 9:59 AM, Steven A Rowe sar...@syr.edu wrote:
 I looked at the Surefire report 
 https://builds.apache.org/job/Lucene-Solr-Maven-3.x/ws/checkout/lucene/build/contrib/queryparser/surefire-reports/TEST-org.apache.lucene.queryParser.standard.TestNumericQueryParser.xml/*view*/
  and I don't see any more information.

 -Original Message-
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Wednesday, November 30, 2011 9:55 AM
 To: dev@lucene.apache.org
 Subject: Re: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #316: POMs out of sync

 This looks like a localization bug. is it possible to get the seed or
 more information on this test fail?

 Did maven truncate the test output or is there a bug in LuceneTestCase
 where its not providing the reproduce-with (hopefully) that it
 should if beforeClass() throws an exception?

 I looked at the console log and there wasn't any failure information
 there...

 On Wed, Nov 30, 2011 at 7:59 AM, Apache Jenkins Server
 jenk...@builds.apache.org wrote:
  Build: https://builds.apache.org/job/Lucene-Solr-Maven-3.x/316/
 
  2 tests failed.
 
 FAILED:  org.apache.lucene.queryParser.standard.TestNumericQueryParser.org
 .apache.lucene.queryParser.standard.TestNumericQueryParser
 
  Error Message:
  Unparseable date: 1867.06.20 6時34分09秒 CET  9 +0100 1867
 
  Stack Trace:
  java.text.ParseException: Unparseable date: 1867.06.20 6時34分09秒 CET  9
 +0100 1867
         at java.text.DateFormat.parse(DateFormat.java:354)
         at
 org.apache.lucene.queryParser.standard.TestNumericQueryParser.checkDateFor
 matSanity(TestNumericQueryParser.java:88)
         at
 org.apache.lucene.queryParser.standard.TestNumericQueryParser.beforeClass(
 TestNumericQueryParser.java:145)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:
 57)
         at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm
 pl.java:43)
         at java.lang.reflect.Method.invoke(Method.java:616)
         at
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMetho
 d.java:44)
         at
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable
 .java:15)
         at
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.
 java:41)
         at
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:
 27)
         at
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31
 )
         at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
         at
 org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:
 53)
         at
 org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provi
 der.java:123)
         at
 org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java
 :104)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:
 57)
         at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm
 pl.java:43)
         at java.lang.reflect.Method.invoke(Method.java:616)
         at
 org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(Refle
 ctionUtils.java:164)
         at
 org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(Prov
 iderFactory.java:110)
         at
 org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireSt
 arter.java:175)
         at
 org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenFor
 ked(SurefireStarter.java:107)
         at
 org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68)
 
 
 
 FAILED:  org.apache.lucene.queryParser.standard.TestNumericQueryParser.org
 .apache.lucene.queryParser.standard.TestNumericQueryParser
 
  Error Message:
  null
 
  Stack Trace:
  java.lang.NullPointerException
         at
 org.apache.lucene.queryParser.standard.TestNumericQueryParser.afterClass(T
 estNumericQueryParser.java:495)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:
 57)
         at
 

[jira] [Created] (LUCENE-3611) If a test fails in beforeClass(), we don't get any debugging information

2011-11-30 Thread Robert Muir (Created) (JIRA)
If a test fails in beforeClass(), we don't get any debugging information


 Key: LUCENE-3611
 URL: https://issues.apache.org/jira/browse/LUCENE-3611
 Project: Lucene - Java
  Issue Type: Test
  Components: general/test
Reporter: Robert Muir


At the minimum we at least need reportPartialFailureInfo()

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2280) commitWithin ignored for a delete query

2011-11-30 Thread Updated

 [ 
https://issues.apache.org/jira/browse/SOLR-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2280:
--

Attachment: SOLR-2280.patch
SOLR-2280-3x.patch

New patches which add new commitWithin-capable SolrJ methods for deleteBy*()

 commitWithin ignored for a delete query
 ---

 Key: SOLR-2280
 URL: https://issues.apache.org/jira/browse/SOLR-2280
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Reporter: David Smiley
Assignee: Jan Høydahl
Priority: Minor
 Fix For: 3.6, 4.0

 Attachments: SOLR-2280-3x.patch, SOLR-2280-3x.patch, SOLR-2280.patch, 
 SOLR-2280.patch, SOLR-2280.patch, SOLR-2280.patch


 The commitWithin option on an UpdateRequest is only honored for requests 
 containing new documents.  It does not, for example, work with a delete 
 query.  The following doesn't work as expected:
 {code:java}
 UpdateRequest request = new UpdateRequest();
 request.deleteById("id123");
 request.setCommitWithin(1000);
 solrServer.request(request);
 {code}
 In my opinion, the commitWithin attribute should be permitted on the 
 <delete/> xml tag as well as <add/>.  Such a change would go in 
 XMLLoader.java and it would have some ramifications elsewhere too.  Once 
 this is done, then UpdateRequest.getXml() can be updated to generate the 
 right XML.
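 For reference, the XML this would enable presumably carries the attribute on 
 the delete tag itself, mirroring add (illustrative syntax per the proposal 
 above, not currently supported):
{code:xml}
<delete commitWithin="1000">
  <id>id123</id>
</delete>
{code}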

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2805) Add a main method to ZkController so that it's easier to script config upload with SolrCloud

2011-11-30 Thread Eric Pugh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160083#comment-13160083
 ] 

Eric Pugh commented on SOLR-2805:
-

I started working on something like this, and noticed that ZkController is 
marked final; why is that? I ended up cutting and pasting it into my own class.

 Add a main method to ZkController so that it's easier to script config upload 
 with SolrCloud
 

 Key: SOLR-2805
 URL: https://issues.apache.org/jira/browse/SOLR-2805
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.0


 when scripting a cluster setup, it would be nice if it was easy to upload a 
 set of configs - otherwise you have to wait to start secondary servers until 
 the first server has uploaded the config - kind of a pain
 You should be able to do something like:
 java -classpath .:* org.apache.solr.cloud.ZkController 127.0.0.1:9983 
 127.0.0.1 8983 solr /home/mark/workspace/SolrCloud/solr/example/solr/conf 
 conf1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-3.x - Build # 11612 - Failure

2011-11-30 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/11612/

1 tests failed.
REGRESSION:  
org.apache.lucene.search.TestBlockJoin.testAdvanceSingleParentNoChild

Error Message:
null

Stack Trace:
java.lang.NullPointerException
at 
org.apache.lucene.search.TestBlockJoin.testAdvanceSingleParentNoChild(TestBlockJoin.java:659)
at 
org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:432)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)




Build Log (for compile errors):
[...truncated 11289 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3586) Choose a specific Directory implementation running the CheckIndex main

2011-11-30 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160102#comment-13160102
 ] 

Michael McCandless commented on LUCENE-3586:


I think just load the classes by name via reflection?  This way if I have my 
own external Dir impl somewhere I can also have CheckIndex use that...

 Choose a specific Directory implementation running the CheckIndex main
 --

 Key: LUCENE-3586
 URL: https://issues.apache.org/jira/browse/LUCENE-3586
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Luca Cavanna
Assignee: Luca Cavanna
Priority: Minor
 Attachments: LUCENE-3586.patch


 It should be possible to choose a specific Directory implementation to use 
 during the CheckIndex process when we run it from its main.
 What about an additional main parameter?
 In fact, I'm experiencing some problems with MMapDirectory working with a big 
 segment, and after some failed attempts playing with maxChunkSize, I decided 
 to switch to another FSDirectory implementation but I needed to do that on my 
 own main.
 Should we also consider to use a FileSwitchDirectory?
 I'm willing to contribute, could you please let me know your thoughts about 
 it?
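 A hedged sketch of the reflection approach suggested here: resolve the 
 Directory class by name and instantiate it through the (File) constructor 
 that the stock FSDirectory implementations expose (argument handling is 
 illustrative only):
{code:java}
// Illustrative only: load a Directory implementation by class name, as a
// CheckIndex main could. Works for stock FSDirectory subclasses such as
// org.apache.lucene.store.MMapDirectory, which have a public (File) constructor.
import java.io.File;
import org.apache.lucene.store.Directory;

public class DirectoryByName {
    static Directory open(String className, File indexPath) throws Exception {
        Class<?> clazz = Class.forName(className); // external impls work too
        return (Directory) clazz.getConstructor(File.class).newInstance(indexPath);
    }
}
{code}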

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3608) MultiFields.getUniqueFieldCount is broken

2011-11-30 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160107#comment-13160107
 ] 

Michael McCandless commented on LUCENE-3608:


+1 for -1 ;)

 MultiFields.getUniqueFieldCount is broken
 -

 Key: LUCENE-3608
 URL: https://issues.apache.org/jira/browse/LUCENE-3608
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir
 Fix For: 4.0


 this returns terms.size(), but terms is lazy-initted. So it wrongly returns 0.
 Simplest fix would be to return -1.
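 A hedged illustration of the lazy-init trap and the proposed -1 convention 
 (class and field names are illustrative, not the actual MultiFields code):
{code:java}
import java.util.Map;

class LazyFieldsExample {
    private Map<String, Object> terms;   // lazily initialized elsewhere, on first use

    // Before the fix: returning terms.size() here yields 0 before the lazy init
    // has run. Proposed convention: -1 means "not computed".
    int getUniqueFieldCount() {
        return terms == null ? -1 : terms.size();
    }
}
{code}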

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-2922) Upgrade commons io and lang in Solr

2011-11-30 Thread Koji Sekiguchi (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-2922.
--

Resolution: Fixed
  Assignee: Koji Sekiguchi

trunk: Committed revision 1208509.
3x: Committed revision 1208516.

 Upgrade commons io and lang in Solr
 ---

 Key: SOLR-2922
 URL: https://issues.apache.org/jira/browse/SOLR-2922
 Project: Solr
  Issue Type: Improvement
Affects Versions: 3.5, 4.0
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Trivial
 Fix For: 3.6, 4.0

 Attachments: SOLR-2922.patch


 Upgrade commons-io and commons-lang in Solr.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 11608 - Failure

2011-11-30 Thread Michael McCandless
I committed a fix...

Mike McCandless

http://blog.mikemccandless.com

On Wed, Nov 30, 2011 at 4:54 AM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/11608/

 1 tests failed.
 REGRESSION:  
 org.apache.lucene.search.TestSearcherManager.testIntermediateClose

 Error Message:
 java.lang.NullPointerException

 Stack Trace:
 junit.framework.AssertionFailedError: java.lang.NullPointerException
        at 
 org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147)
        at 
 org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
        at 
 org.apache.lucene.search.TestSearcherManager.testIntermediateClose(TestSearcherManager.java:248)
        at 
 org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:432)




 Build Log (for compile errors):
 [...truncated 7958 lines...]



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [Lucene.Net] Re: Memory Leak in 2.9.2.2

2011-11-30 Thread Christopher Currens
Trevor,

I'm not sure if you can use 2.9.4, though, it looks like you're using
VS2005 and .NET 2.0.  2.9.4 targets .NET 4.0, and I'm fairly certain we use
classes only available in 4.0 (or 3.5?).  However, if you can, I would
suggest updating, as 2.9.4 should be a fairly stable release.

The leak I'm talking about is addressed here:
https://issues.apache.org/jira/browse/LUCENE-2467, and the ported code
isn't available in 2.9.2, but I've confirmed the patch is in 2.9.4.  It may
or may not be what your issue is.  You say that it was at one time working
fine, I assume you mean no memory leak.  I would take some time to see what
else in your code has changed.  Make sure you're calling Close on whatever
needs to be closed (IndexWriter/IndexReader/Analyzers, etc).

Unfortunately for us, memory leaks are hard to debug over email, and it's
difficult for us to tell if it's any change to your code or an issue with
Lucene .NET.  As far as I can tell, this is the only memory leak I can find
that affects 2.9.2.


Thanks,
Christopher

On Wed, Nov 30, 2011 at 8:25 AM, Prescott Nasser geobmx...@hotmail.com wrote:

 We just released 2.9.4 - the website didn't update last night, so ill have
 to try and update it later today. But if you follow the link to download
 2.9.2 dist you'll see folders for 2.9.4.

 I'll send an email to the user and dev lists once i get the website to
 update
 
 From: Trevor Watson
 Sent: 11/30/2011 8:14 AM
 To: lucene-net-...@lucene.apache.org
 Subject: [Lucene.Net] Re: Memory Leak in 2.9.2.2

 You said pre 2.9.3  I checked the apache lucene.net page to try to see
 if
 I could get a copy of 2.9.3, but it was never on the site, just 2.9.2.2 and
 2.9.4(g).  Was this an un-released version?  Or am I looking in the wrong
 spot for updates to lucene.net?

 Thanks for all your help

 On Tue, Nov 29, 2011 at 2:59 PM, Trevor Watson 
 powersearchsoftw...@gmail.com wrote:

  I can send you the dll that I am using if you would like.  The documents
  are _mostly_ small documents.  Emails and office docs size of plain text
 
 
  On Tuesday, November 29, 2011, Christopher Currens 
  currens.ch...@gmail.com wrote:
   Do you know how big the documents are that you are trying to
  delete/update?
I'm trying to find a copy of 2.9.2 to see if I can reproduce it.
  
  
   Thanks,
   Christopher
  
   On Tue, Nov 29, 2011 at 9:11 AM, Trevor Watson 
   powersearchsoftw...@gmail.com wrote:
  
   Sorry for the duplicate post. I was on the road and posted both via my
  web
   mail and office mail by mistake
  
   The increase is a very gradual,  the program starts at about 160,000k
   according to task manager (I know that's not entirely accurate, but it
  was
   the best I had at the time) and would, after adding 25,000-40,000
  result in
   an out of memory exception (800,000k according to taskmanager). I
 tried
   building a copy of 2.9.4 to test, but could not find one that worked
 in
   visual studio 2005
  
   I did notice using Ants memory profiler that there were a number of
   byte[32789] arrays that I didn't know where they came from in memory.
  
   On Monday, November 28, 2011, Christopher Currens 
  currens.ch...@gmail.com
   
   wrote:
Hi Trevor,
   
What kind of memory increase are we talking about?  Also, how big
 are
  the
documents that you are indexing, the ones returned from
  getFileInfoDoc()?
 Is it putting an entire file into the index?  Pre 2.9.3 versions
 had
issues with holding onto allocated byte arrays far beyond when they
  were
used.  The memory could only be freed via closing the IndexWriter.
   
I'm a little unclear on exactly what's happening.  Are you noticing
   memory
spike and stay constant at that level or is it a gradual increase?
   Is it
causing your application to error, (ie OutOfMemory exception, etc)?
   
   
Thanks,
Christopher
   
On Mon, Nov 28, 2011 at 5:59 PM, Trevor Watson 
powersearchsoftw...@gmail.com wrote:
   
I'm attempting to use Lucene.Net v2.9.2.2 in a Visual Studio 2005
  (.NET
2.0) environment.  We had a piece of software that WAS working.
  I'm
  not
sure what has changed however, the following code results in a
 memory
   leak
in the Lucene.Net component (or a failure to clean up used memory).
   
The code in issue is here:
   
 private void SaveFileToFileInfo(Lucene.Net.Index.IndexWriter iw, bool delayCommit, string sDataPath)
 {
     Document doc = getFileInfoDoc(sDataPath);
     Analyzer analyzer = clsLuceneFunctions.getAnalyzer();
     if (this.FileID == 0)
     {
         string s = "";
     }
     iw.UpdateDocument(new Lucene.Net.Index.Term("FileId",
         this.fileID.ToString("0")), doc, analyzer);

     analyzer = null;
     doc = null;
     if (!delayCommit)
         iw.Commit();
 }
   
Commenting out the line iw.UpdateDocument resulted in no memory
   increase.
I also tried replacing it with a 

Re: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 11612 - Failure

2011-11-30 Thread Michael McCandless
I committed a fix...

Mike McCandless

http://blog.mikemccandless.com

On Wed, Nov 30, 2011 at 10:32 AM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/11612/

 1 tests failed.
 REGRESSION:  
 org.apache.lucene.search.TestBlockJoin.testAdvanceSingleParentNoChild

 Error Message:
 null

 Stack Trace:
 java.lang.NullPointerException
        at 
 org.apache.lucene.search.TestBlockJoin.testAdvanceSingleParentNoChild(TestBlockJoin.java:659)
        at 
 org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:432)
        at 
 org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147)
        at 
 org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)




 Build Log (for compile errors):
 [...truncated 11289 lines...]



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: buildbots for PyLucene?

2011-11-30 Thread Bill Janssen
I sent a note off to Trent Nelson to see if we could use Snakebite for
this purpose.

I'd be happy to set up a buildbot on our internal PARC Jenkins
infrastructure for this, but the results wouldn't be visible outside.

Is there a lucene-infrastructure or apache-infrastructure mailing list
this might be appropriate for?

Bill


[jira] [Created] (SOLR-2932) Replication filelist failures

2011-11-30 Thread Kyle Maxwell (Created) (JIRA)
Replication filelist failures
-

 Key: SOLR-2932
 URL: https://issues.apache.org/jira/browse/SOLR-2932
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Affects Versions: 3.5
Reporter: Kyle Maxwell


Replicating the bug manually:
http://../replication?command=indexversion 
- 1234561234
http://../replication?command=filelistindexversion=1234561234
- invalid index version

In the logs, I tend to see lines like:
SEVERE: No files to download for indexversion: 1321658703961

This bug only appears on certain indexes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

2011-11-30 Thread Granroth, Neal V.
DIGY,

Thanks for the tip, but could you be a little more specific?
Where and how are extension-method calls replaced?

For example, how would I change the CloseableThreadLocalExtensions method
 
public static void Set<T>(this ThreadLocal<T> t, T val)
{
    t.Value = val;
}


to eliminate the compile error

Error   2   Cannot define a new extension method because the compiler 
required type 'System.Runtime.CompilerServices.ExtensionAttribute' cannot be 
found. Are you missing a reference to System.Core.dll?

due to the Set<T> parameter this ThreadLocal<T> t ?

 
- Neal


-Original Message-
From: Digy [mailto:digyd...@gmail.com] 
Sent: Wednesday, November 30, 2011 12:27 PM
To: lucene-net-...@lucene.apache.org
Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

FYI, 2.9.4 can be compiled against .Net 2.0 with a few minor changes in
CloseableThreadLocal

(like uncommenting the ThreadLocal<T> class and replacing extension-method
calls with static calls to CloseableThreadLocalExtensions)
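
Concretely, that change looks roughly like this (a minimal sketch; the
ThreadLocal<T> here is the helper class uncommented inside
CloseableThreadLocal, not a BCL type, and the exact call sites may differ):

    // A .NET 2.0-friendly static helper in place of the extension method
    // (no ExtensionAttribute / System.Core.dll needed).
    public static class CloseableThreadLocalExtensions
    {
        public static void Set<T>(ThreadLocal<T> t, T val)
        {
            t.Value = val;
        }
    }

    // Call sites change from the C# 3.0 extension form:
    //     current.Set(value);
    // to a plain static call:
    //     CloseableThreadLocalExtensions.Set(current, value);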

 

 

DIGY

 

 

-Original Message-
From: Christopher Currens [mailto:currens.ch...@gmail.com] 
Sent: Wednesday, November 30, 2011 7:26 PM
To: lucene-net-...@lucene.apache.org
Subject: Re: [Lucene.Net] Re: Memory Leak in 2.9.2.2

 

Trevor,

I'm not sure if you can use 2.9.4, though; it looks like you're using
VS2005 and .NET 2.0.  2.9.4 targets .NET 4.0, and I'm fairly certain we use
classes only available in 4.0 (or 3.5?).  However, if you can, I would
suggest updating, as 2.9.4 should be a fairly stable release.

The leak I'm talking about is addressed here:
https://issues.apache.org/jira/browse/LUCENE-2467, and the ported code
isn't available in 2.9.2, but I've confirmed the patch is in 2.9.4.  It may
or may not be what your issue is.  You say that it was at one time working
fine; I assume you mean no memory leak.  I would take some time to see what
else in your code has changed.  Make sure you're calling Close on whatever
needs to be closed (IndexWriter/IndexReader/Analyzers, etc).
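
A minimal sketch of that pattern against the Lucene.Net 2.9 API (indexPath
is a placeholder, and the analyzer choice is arbitrary):

    // Hedged sketch: open, use, and deterministically Close() everything,
    // even if indexing throws.
    static void IndexWithCleanup(string indexPath)
    {
        Lucene.Net.Store.Directory dir = Lucene.Net.Store.FSDirectory.Open(
            new System.IO.DirectoryInfo(indexPath));
        Lucene.Net.Analysis.Analyzer analyzer =
            new Lucene.Net.Analysis.SimpleAnalyzer();
        Lucene.Net.Index.IndexWriter writer = new Lucene.Net.Index.IndexWriter(
            dir, analyzer, Lucene.Net.Index.IndexWriter.MaxFieldLength.UNLIMITED);
        try
        {
            // ... AddDocument/UpdateDocument calls go here ...
            writer.Commit();
        }
        finally
        {
            writer.Close();    // flushes pending changes, releases the write lock
            analyzer.Close();  // releases cached token streams
            dir.Close();       // closes the underlying index files
        }
    }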

 

Unfortunately for us, memory leaks are hard to debug over email, and it's
difficult for us to tell if it's any change to your code or an issue with
Lucene .NET.  As far as I can tell, this is the only memory leak I can find
that affects 2.9.2.

Thanks,
Christopher

 

On Wed, Nov 30, 2011 at 8:25 AM, Prescott Nasser
geobmx...@hotmail.com wrote:

 

 We just released 2.9.4 - the website didn't update last night, so I'll have

 to try and update it later today. But if you follow the link to download

 2.9.2 dist you'll see folders for 2.9.4.

 

 I'll send an email to the user and dev lists once i get the website to

 update

 

 From: Trevor Watson

 Sent: 11/30/2011 8:14 AM

 To: lucene-net-...@lucene.apache.org

 Subject: [Lucene.Net] Re: Memory Leak in 2.9.2.2

 

 You said "pre 2.9.3".  I checked the apache lucene.net page to try to see

 if

 I could get a copy of 2.9.3, but it was never on the site, just 2.9.2.2
and

 2.9.4(g).  Was this an un-released version?  Or am I looking in the wrong

 spot for updates to lucene.net?

 

 Thanks for all your help

 

 On Tue, Nov 29, 2011 at 2:59 PM, Trevor Watson 

 powersearchsoftw...@gmail.com wrote:

 

  I can send you the dll that I am using if you would like.  The documents

  are _mostly_ small documents.  Emails and office docs size of plain text

 

 

  On Tuesday, November 29, 2011, Christopher Currens 

  currens.ch...@gmail.com wrote:

   Do you know how big the documents are that you are trying to

  delete/update?

I'm trying to find a copy of 2.9.2 to see if I can reproduce it.

  

  

   Thanks,

   Christopher

  

   On Tue, Nov 29, 2011 at 9:11 AM, Trevor Watson 

   powersearchsoftw...@gmail.com wrote:

  

   Sorry for the duplicate post. I was on the road and posted both via
my

  web

   mail and office mail by mistake

  

   The increase is a very gradual,  the program starts at about 160,000k

   according to task manager (I know that's not entirely accurate, but
it

  was

   the best I had at the time) and would, after adding 25,000-40,000

  result in

   an out of memory exception (800,000k according to taskmanager). I

 tried

   building a copy of 2.9.4 to test, but could not find one that worked

 in

   visual studio 2005

  

   I did notice using Ants memory profiler that there were a number of

   byte[32789] arrays that I didn't know where they came from in memory.

  

   On Monday, November 28, 2011, Christopher Currens 

  currens.ch...@gmail.com

   

   wrote:

Hi Trevor,

   

What kind of memory increase are we talking about?  Also, how big

 are

  the

documents that you are indexing, the ones returned from

  getFileInfoDoc()?

 Is it putting an entire file into the index?  Pre 2.9.3 versions

 had

issues with holding onto allocated byte arrays far beyond when they

  were

used.  The memory could only be freed via 

[jira] [Commented] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should

2011-11-30 Thread Mike Sokolov (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160278#comment-13160278
 ] 

Mike Sokolov commented on SOLR-2921:


No not stemmers.  Not synonyms, not shinglers or anything that might produce 
multiple tokens.


 Make any Filters, Tokenizers and CharFilters implement 
 MultiTermAwareComponent if they should
 -

 Key: SOLR-2921
 URL: https://issues.apache.org/jira/browse/SOLR-2921
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.6, 4.0
 Environment: All
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor

 SOLR-2438 creates a new MultiTermAwareComponent interface. This allows Solr 
 to automatically assemble a multiterm analyzer that does the right thing 
 vis-a-vis transforming the individual terms of a multi-term query at query 
 time. Examples are: lower casing, folding accents, etc. Currently 
 (27-Nov-2011), the following classes implement MultiTermAwareComponent:
  * ASCIIFoldingFilterFactory
  * LowerCaseFilterFactory
  * LowerCaseTokenizerFactory
  * MappingCharFilterFactory
  * PersianCharFilterFactory
 When users put any of the above in their query analyzer, Solr will do the 
 right thing at query time, and the perennial question users have, "why didn't 
 my wildcard query automatically lower-case (or accent fold or ...) my terms?", 
 will be gone. Die question die!
 But taking a quick look, for instance, at the various FilterFactories that 
 exist, there are a number of possibilities that *might* be good candidates 
 for implementing MultiTermAwareComponent. But I really don't understand the 
 correct behavior here well enough to know whether these should implement the 
 interface or not. And this doesn't include other CharFilters or Tokenizers.
 Actually implementing the interface is often trivial, see the classes above 
 for examples. Note that LowerCaseTokenizerFactory returns a *Filter*, which 
 is the right thing in this case.
 Here is a quick cull of the Filters that, just from their names, might be 
 candidates. If anyone wants to take any of them on, that would be great. If 
 all you can do is provide test cases, I could probably do the code part, just 
 let me know.
 ArabicNormalizationFilterFactory
 GreekLowerCaseFilterFactory
 HindiNormalizationFilterFactory
 ICUFoldingFilterFactory
 ICUNormalizer2FilterFactory
 ICUTransformFilterFactory
 IndicNormalizationFilterFactory
 ISOLatin1AccentFilterFactory
 PersianNormalizationFilterFactory
 RussianLowerCaseFilterFactory
 TurkishLowerCaseFilterFactory

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-2933) DIHCacheSupport ignores left side of where="xid=x.id" attribute

2011-11-30 Thread Mikhail Khludnev (Created) (JIRA)
DIHCacheSupport ignores left side of where="xid=x.id" attribute
---

 Key: SOLR-2933
 URL: https://issues.apache.org/jira/browse/SOLR-2933
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 4.0
Reporter: Mikhail Khludnev
Priority: Minor


DIHCacheSupport, introduced in SOLR-2382, uses the new config attributes cachePk and 
cacheLookup. But support for the old where="xid=x.id" attribute is broken by 
[DIHCacheSupport.init|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DIHCacheSupport.java?view=markup]
 - it never puts the where= sides into the context, but this is masked by 
[SortedMapBackedCache.init|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/SortedMapBackedCache.java?view=markup],
 which just takes the first column as the primary key. That's why all tests are green.

To reproduce the issue I just need to reorder the entry at [line 
219|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestCachedSqlEntityProcessor.java?revision=1201659&view=markup]
 so that desc comes first and gets picked up as the primary key. 

To do that I propose choosing a concrete map class randomly for all DIH test 
cases in 
[createMap()|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/AbstractDataImportHandlerTestCase.java?revision=1149600&view=markup].

I'm attaching a test-breaking patch and seed.



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2933) DIHCacheSupport ignores left side of where="xid=x.id" attribute

2011-11-30 Thread Mikhail Khludnev (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-2933:
---

Attachment: AbstractDataImportHandlerTestCase.java-choose-map-randomly.patch

AbstractDataImportHandlerTestCase.java-choose-map-randomly.patch picks a map 
class randomly in all DIH test cases.
It breaks 
[TestCachedSqlEntityProcessor.withKeyAndLookup|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestCachedSqlEntityProcessor.java?revision=1201659&view=markup].
 More explanations are at 
[1|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13158689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13158689]
 
[2|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159418]
 
[3|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159431&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159431]


 DIHCacheSupport ignores left side of where="xid=x.id" attribute
 ---

 Key: SOLR-2933
 URL: https://issues.apache.org/jira/browse/SOLR-2933
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 4.0
Reporter: Mikhail Khludnev
Priority: Minor
  Labels: noob, random
 Attachments: 
 AbstractDataImportHandlerTestCase.java-choose-map-randomly.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 DIHCacheSupport introduced at SOLR-2382 uses new config attributes cachePk 
 and cacheLookup. But support old one where="xid=x.id" is broken by 
 [DIHCacheSupport.init|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DIHCacheSupport.java?view=markup]
  - it never put where= sides into the context, but it revealed by 
 [SortedMapBackedCache.init|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/SortedMapBackedCache.java?view=markup],
  which takes just first column as a primary key. That's why all tests are 
 green.
 To reproduce the issue I need just reorder entry at [line 
 219|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestCachedSqlEntityProcessor.java?revision=1201659&view=markup]
  and make desc first and picked up as a primary key. 
 To do that I propose to chose concrete map class randomly for all DIH test 
 cases at 
 [createMap()|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/AbstractDataImportHandlerTestCase.java?revision=1149600&view=markup].
  
 I'm attaching test breaking patch and seed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (SOLR-2933) DIHCacheSupport ignores left side of where="xid=x.id" attribute

2011-11-30 Thread Mikhail Khludnev (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160304#comment-13160304
 ] 

Mikhail Khludnev edited comment on SOLR-2933 at 11/30/11 8:28 PM:
--

AbstractDataImportHandlerTestCase.java-choose-map-randomly.patch picks a map 
class randomly in all DIH test cases.
It breaks 
[TestCachedSqlEntityProcessor.withKeyAndLookup|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestCachedSqlEntityProcessor.java?revision=1201659&view=markup].
 More explanations are at 
[1|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13158689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13158689]
 
[2|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159418]
 
[3|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159431&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159431]
 

Let me attach the fix tomorrow.   

  was (Author: mkhludnev):
AbstractDataImportHandlerTestCase.java-choose-map-randomly.patch picks a map 
class randomly in all DIH test cases.
It breaks 
[TestCachedSqlEntityProcessor.withKeyAndLookup|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestCachedSqlEntityProcessor.java?revision=1201659&view=markup].
 More explanations are at 
[1|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13158689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13158689]
 
[2|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159418]
 
[3|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159431&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159431]

  
 DIHCacheSupport ignores left side of where="xid=x.id" attribute
 ---

 Key: SOLR-2933
 URL: https://issues.apache.org/jira/browse/SOLR-2933
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 4.0
Reporter: Mikhail Khludnev
Priority: Minor
  Labels: noob, random
 Attachments: 
 AbstractDataImportHandlerTestCase.java-choose-map-randomly.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 DIHCacheSupport introduced at SOLR-2382 uses new config attributes cachePk 
 and cacheLookup. But support old one where="xid=x.id" is broken by 
 [DIHCacheSupport.init|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DIHCacheSupport.java?view=markup]
  - it never put where= sides into the context, but it revealed by 
 [SortedMapBackedCache.init|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/SortedMapBackedCache.java?view=markup],
  which takes just first column as a primary key. That's why all tests are 
 green.
 To reproduce the issue I need just reorder entry at [line 
 219|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestCachedSqlEntityProcessor.java?revision=1201659&view=markup]
  and make desc first and picked up as a primary key. 
 To do that I propose to chose concrete map class randomly for all DIH test 
 cases at 
 [createMap()|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/AbstractDataImportHandlerTestCase.java?revision=1149600&view=markup].
  
 I'm attaching test breaking patch and seed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (SOLR-2933) DIHCacheSupport ignores left side of where="xid=x.id" attribute

2011-11-30 Thread Mikhail Khludnev (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160304#comment-13160304
 ] 

Mikhail Khludnev edited comment on SOLR-2933 at 11/30/11 8:31 PM:
--

AbstractDataImportHandlerTestCase.java-choose-map-randomly.patch picks a map 
class randomly in all DIH test cases.
It breaks 
[TestCachedSqlEntityProcessor.withKeyAndLookup|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestCachedSqlEntityProcessor.java?revision=1201659&view=markup].
 More explanations are at 
[1|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13158689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13158689]
 
[2|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159418]
 
[3|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159431&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159431]
 

A seed which reproduces the failure:
{code}
ant test -Dtestcase=TestCachedSqlEntityProcessor -Dtestmethod=withKeyAndLookup 
-Dtests.seed=7735f677498f3558:-29c15941cc37921e:-32c8bd2280b92536 
-Dargs="-Dfile.encoding=UTF-8"
{code}

Let me attach the fix tomorrow. It's not a big deal anyway.   

  was (Author: mkhludnev):
AbstractDataImportHandlerTestCase.java-choose-map-randomly.patch picks a map 
class randomly in all DIH test cases.
It breaks 
[TestCachedSqlEntityProcessor.withKeyAndLookup|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestCachedSqlEntityProcessor.java?revision=1201659&view=markup].
 More explanations are at 
[1|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13158689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13158689]
 
[2|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159418]
 
[3|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159431&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159431]
 

Let me attach the fix tomorrow.   
  
 DIHCacheSupport ignores left side of where="xid=x.id" attribute
 ---

 Key: SOLR-2933
 URL: https://issues.apache.org/jira/browse/SOLR-2933
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 4.0
Reporter: Mikhail Khludnev
Priority: Minor
  Labels: noob, random
 Attachments: 
 AbstractDataImportHandlerTestCase.java-choose-map-randomly.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 DIHCacheSupport introduced at SOLR-2382 uses new config attributes cachePk 
 and cacheLookup. But support old one where="xid=x.id" is broken by 
 [DIHCacheSupport.init|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DIHCacheSupport.java?view=markup]
  - it never put where= sides into the context, but it revealed by 
 [SortedMapBackedCache.init|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/SortedMapBackedCache.java?view=markup],
  which takes just first column as a primary key. That's why all tests are 
 green.
 To reproduce the issue I need just reorder entry at [line 
 219|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestCachedSqlEntityProcessor.java?revision=1201659&view=markup]
  and make desc first and picked up as a primary key. 
 To do that I propose to chose concrete map class randomly for all DIH test 
 cases at 
 [createMap()|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/AbstractDataImportHandlerTestCase.java?revision=1149600&view=markup].
  
 I'm attaching test breaking patch and seed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

2011-11-30 Thread Prescott Nasser
Probably makes for a good wiki entry

Sent from my Windows Phone

From: Digy
Sent: 11/30/2011 12:04 PM
To: lucene-net-...@lucene.apache.org
Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

OK, here is the code that can be compiled against .NET 2.0
http://pastebin.com/k2f7JfPd

DIGY


-Original Message-
From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com]
Sent: Wednesday, November 30, 2011 9:26 PM
To: lucene-net-...@lucene.apache.org
Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

DIGY,

Thanks for the tip, but could you be a little more specific?
Where and how are extension-methods calls replaced?

For example, how would I change the CloseableThreadLocalExtensions method

public static void Set<T>(this ThreadLocal<T> t, T val)
{
    t.Value = val;
}


to eliminate the compile error

Error   2   Cannot define a new extension method because the compiler
required type 'System.Runtime.CompilerServices.ExtensionAttribute' cannot be
found. Are you missing a reference to System.Core.dll?

due to the Set<T> parameter "this ThreadLocal<T> t"?


- Neal


-Original Message-
From: Digy [mailto:digyd...@gmail.com]
Sent: Wednesday, November 30, 2011 12:27 PM
To: lucene-net-...@lucene.apache.org
Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

FYI, 2.9.4 can be compiled against .Net 2.0 with a few minor changes in
CloseableThreadLocal

(like uncommenting the ThreadLocal<T> class and replacing extension-method
calls with static calls to CloseableThreadLocalExtensions)





DIGY





-Original Message-
From: Christopher Currens [mailto:currens.ch...@gmail.com]
Sent: Wednesday, November 30, 2011 7:26 PM
To: lucene-net-...@lucene.apache.org
Subject: Re: [Lucene.Net] Re: Memory Leak in 2.9.2.2



Trevor,



I'm not sure if you can use 2.9.4, though, it looks like you're using

VS2005 and .NET 2.0.  2.9.4 targets .NET 4.0, and I'm fairly certain we use

classes only available in 4.0 (or 3.5?).  However, if you can, I would

suggest updating, as 2.9.4 should be a fairly stable release.



The leak I'm talking about is addressed here:

https://issues.apache.org/jira/browse/LUCENE-2467, and the ported code

isn't available in 2.9.2, but I've confirmed the patch is in 2.9.4.  It may

or may not be what your issue is.  You say that it was at one time working

fine, I assume you mean no memory leak.  I would take some time to see what

else in your code has changed.  Make sure you're calling Close on whatever

needs to be closed (IndexWriter/IndexReader/Analyzers, etc).



Unfortunately for us, memory leaks are hard to debug over email, and it's

difficult for us to tell if it's any change to your code or an issue with

Lucene .NET.  As far as I can tell, this is the only memory leak I can find

that affects 2.9.2.





Thanks,

Christopher



On Wed, Nov 30, 2011 at 8:25 AM, Prescott Nasser
geobmx...@hotmail.com wrote:



 We just released 2.9.4 - the website didn't update last night, so I'll have

 to try and update it later today. But if you follow the link to download

 2.9.2 dist you'll see folders for 2.9.4.



 I'll send an email to the user and dev lists once i get the website to

 update

 

 From: Trevor Watson

 Sent: 11/30/2011 8:14 AM

 To: lucene-net-...@lucene.apache.org

 Subject: [Lucene.Net] Re: Memory Leak in 2.9.2.2



 You said "pre 2.9.3".  I checked the apache lucene.net page to try to see

 if

 I could get a copy of 2.9.3, but it was never on the site, just 2.9.2.2
and

 2.9.4(g).  Was this an un-released version?  Or am I looking in the wrong

 spot for updates to lucene.net?



 Thanks for all your help



 On Tue, Nov 29, 2011 at 2:59 PM, Trevor Watson 

 powersearchsoftw...@gmail.com wrote:



  I can send you the dll that I am using if you would like.  The documents

  are _mostly_ small documents.  Emails and office docs size of plain text

 

 

  On Tuesday, November 29, 2011, Christopher Currens 

  currens.ch...@gmail.com wrote:

   Do you know how big the documents are that you are trying to

  delete/update?

I'm trying to find a copy of 2.9.2 to see if I can reproduce it.

  

  

   Thanks,

   Christopher

  

   On Tue, Nov 29, 2011 at 9:11 AM, Trevor Watson 

   powersearchsoftw...@gmail.com wrote:

  

   Sorry for the duplicate post. I was on the road and posted both via
my

  web

   mail and office mail by mistake

  

   The increase is a very gradual,  the program starts at about 160,000k

   according to task manager (I know that's not entirely accurate, but
it

  was

   the best I had at the time) and would, after adding 25,000-40,000

  result in

   an out of memory exception (800,000k according to taskmanager). I

 tried

   building a copy of 2.9.4 to test, but could not find one that worked

 in

   visual studio 2005

  

   I did notice using Ants memory profiler that there were a number of

   byte[32789] arrays that I didn't know where 

[jira] [Created] (LUCENE-3612) remove _X.fnx

2011-11-30 Thread Robert Muir (Created) (JIRA)
remove _X.fnx
-

 Key: LUCENE-3612
 URL: https://issues.apache.org/jira/browse/LUCENE-3612
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 4.0
Reporter: Robert Muir
 Attachments: LUCENE-3612.patch

Currently we store a global (not per-segment) field number-name mapping in 
_X.fnx

However, it doesn't actually save us any performance, e.g. on IndexWriter's init, 
because
since LUCENE-2984 we are loading the fieldinfos anyway to compute files() 
for IFD, etc., 
as that's where hasProx/hasVectors is.

Additionally in the past global files like shared doc stores have caused us 
problems,
(recently we just fixed a bug related to this file in LUCENE-3601).

Finally, this is trouble for backwards compatibility, as it's difficult to handle 
a global
file with the codecs mechanism.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3612) remove _X.fnx

2011-11-30 Thread Robert Muir (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3612:


Attachment: LUCENE-3612.patch

Patch: all tests pass.

Before committing, I think we should clean up some APIs/javadocs, remove the 
various versioning stuff (now unused), and not read/write it in segments files.

 remove _X.fnx
 -

 Key: LUCENE-3612
 URL: https://issues.apache.org/jira/browse/LUCENE-3612
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 4.0
Reporter: Robert Muir
 Attachments: LUCENE-3612.patch


 Currently we store a global (not per-segment) field number-name mapping in 
 _X.fnx
 However, it doesn't actually save us any performance e.g on IndexWriter's 
 init because
 since LUCENE-2984 we are to loading the fieldinfos anyway to compute files() 
 for IFD, etc, 
 as thats where hasProx/hasVectors is.
 Additionally in the past global files like shared doc stores have caused us 
 problems,
 (recently we just fixed a bug related to this file in LUCENE-3601).
 Finally this is trouble for backwards compatibility as its difficult to 
 handle a global
 file with the codecs mechanism.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [Lucene.Net] Re: Memory Leak in 2.9.2.2

2011-11-30 Thread Christopher Currens
Trevor,

Unfortunately I was unable to reproduce the memory leak you're experiencing
in 2.9.2.  Particularly with byte[], of the 18,277 that were created, only
13 were not garbage collected, and it's likely that they are not related to
Lucene (it's possible they are static, therefore would only be destroyed
with the AppDomain, outside of what the profiler can trace).  I tried to
emulate the code you showed us and there were no signs of any allocated
arrays that weren't cleaned up.  That doesn't mean there isn't one in your
code, but I just can't reproduce it with what you've shown us.  If it's
possible you can write a small program that has the same behavior, that
could help us track it down.
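
For reference, a small repro of the kind being asked for might look like the
hedged sketch below; it just loops UpdateDocument against a RAMDirectory, and
the field names and document counts are invented:

    using System;
    using Lucene.Net.Analysis;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.Store;

    class UpdateDocumentRepro
    {
        static void Main()
        {
            Directory dir = new RAMDirectory();
            Analyzer analyzer = new SimpleAnalyzer();
            IndexWriter writer = new IndexWriter(
                dir, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
            try
            {
                for (int i = 0; i < 100000; i++)
                {
                    Document doc = new Document();
                    doc.Add(new Field("FileId", i.ToString(),
                        Field.Store.YES, Field.Index.NOT_ANALYZED));
                    doc.Add(new Field("Body", "small document body " + i,
                        Field.Store.NO, Field.Index.ANALYZED));
                    // The call under suspicion in this thread: update-by-term.
                    writer.UpdateDocument(new Term("FileId", i.ToString()), doc);
                    if (i % 10000 == 0)
                        Console.WriteLine("{0}: {1} KB", i,
                            GC.GetTotalMemory(true) / 1024);
                }
                writer.Commit();
            }
            finally
            {
                writer.Close();
                analyzer.Close();
            }
        }
    }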

As a side note, what was a little disconcerting, though, was that in 2.9.4 with
the same code, it created 28,565 byte[], and there were quite a few more
left uncollected (2,805 arrays).  The allocations are happening in
DocumentsWriter.ByteBlockAllocator; I'll have to look at it later, though,
to see if it's even a problem.


Thanks,
Christopher


On Wed, Nov 30, 2011 at 12:41 PM, Granroth, Neal V. 
neal.granr...@thermofisher.com wrote:

 Or maybe put the changes within a conditional compile code block?

 Thanks DIGY, works great.

 - Neal

 -Original Message-
 From: Prescott Nasser [mailto:geobmx...@hotmail.com]
 Sent: Wednesday, November 30, 2011 2:35 PM
 To: lucene-net-...@lucene.apache.org
 Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

 Probably makes for a good wiki entry

 Sent from my Windows Phone
 
 From: Digy
 Sent: 11/30/2011 12:04 PM
 To: lucene-net-...@lucene.apache.org
 Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

 OK, here is the code that can be compiled against .NET 2.0
 http://pastebin.com/k2f7JfPd

 DIGY


 -Original Message-
 From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com]
 Sent: Wednesday, November 30, 2011 9:26 PM
 To: lucene-net-...@lucene.apache.org
 Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

 DIGY,

 Thanks for the tip, but could you be a little more specific?
 Where and how are extension-methods calls replaced?

 For example, how would I change the CloseableThreadLocalExtensions method

public static void Set<T>(this ThreadLocal<T> t, T val)
{
    t.Value = val;
}


 to eliminate the compile error

 Error   2   Cannot define a new extension method because the compiler
 required type 'System.Runtime.CompilerServices.ExtensionAttribute' cannot
 be
 found. Are you missing a reference to System.Core.dll?

 due to the Set<T> parameter "this ThreadLocal<T> t"?


 - Neal


 -Original Message-
 From: Digy [mailto:digyd...@gmail.com]
 Sent: Wednesday, November 30, 2011 12:27 PM
 To: lucene-net-...@lucene.apache.org
 Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

 FYI, 2.9.4 can be compiled against .Net 2.0 with a few minor changes in
 CloseableThreadLocal

 (like uncommenting the ThreadLocal<T> class and replacing extension-method
 calls with static calls to CloseableThreadLocalExtensions)





 DIGY





 -Original Message-
 From: Christopher Currens [mailto:currens.ch...@gmail.com]
 Sent: Wednesday, November 30, 2011 7:26 PM
 To: lucene-net-...@lucene.apache.org
 Subject: Re: [Lucene.Net] Re: Memory Leak in 2.9.2.2



 Trevor,



 I'm not sure if you can use 2.9.4, though, it looks like you're using

 VS2005 and .NET 2.0.  2.9.4 targets .NET 4.0, and I'm fairly certain we use

 classes only available in 4.0 (or 3.5?).  However, if you can, I would

 suggest updating, as 2.9.4 should be a fairly stable release.



 The leak I'm talking about is addressed here:

 https://issues.apache.org/jira/browse/LUCENE-2467, and the ported code

 isn't available in 2.9.2, but I've confirmed the patch is in 2.9.4.  It may

 or may not be what your issue is.  You say that it was at one time working

 fine, I assume you mean no memory leak.  I would take some time to see what

 else in your code has changed.  Make sure you're calling Close on whatever

 needs to be closed (IndexWriter/IndexReader/Analyzers, etc).



 Unfortunately for us, memory leaks are hard to debug over email, and it's

 difficult for us to tell if it's any change to your code or an issue with

 Lucene .NET.  As far as I can tell, this is the only memory leak I can find

 that affects 2.9.2.





 Thanks,

 Christopher



 On Wed, Nov 30, 2011 at 8:25 AM, Prescott Nasser
geobmx...@hotmail.com wrote:



  We just released 2.9.4 - the website didn't update last night, so I'll
 have

  to try and update it later today. But if you follow the link to download

  2.9.2 dist you'll see folders for 2.9.4.

 

  I'll send an email to the user and dev lists once i get the website to

  update

  

  From: Trevor Watson

  Sent: 11/30/2011 8:14 AM

  To: lucene-net-...@lucene.apache.org

  Subject: [Lucene.Net] Re: Memory Leak in 2.9.2.2

 

  You said "pre 2.9.3".  I checked the apache lucene.net 

[jira] [Commented] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should

2011-11-30 Thread Erick Erickson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160406#comment-13160406
 ] 

Erick Erickson commented on SOLR-2921:
--

Not synonyms - agreed.
Not shinglers - agreed.
Not anything that might produce multiple tokens - agreed.

Stemmers... When do stemmers produce multiple tokens? My ignorance of all the 
possibilities knows no bounds. I was wondering if, in this case, stemmers 
really reduced to prefix queries. Maybe it's just a bad idea altogether; I 
guess it begs the question of what use adding stemmers to the mix would be. If you 
want to match the root, just specify the root with an asterisk and be done with 
it. No need to introduce stemming into the MultiTermAwareComponent mix.

But this kind of question is exactly why I have this JIRA in place: we can 
collect reasons I wouldn't think of and record them, before I mess something up 
with well-intentioned-but-wrong help.

 Make any Filters, Tokenizers and CharFilters implement 
 MultiTermAwareComponent if they should
 -

 Key: SOLR-2921
 URL: https://issues.apache.org/jira/browse/SOLR-2921
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.6, 4.0
 Environment: All
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor

 SOLR-2438 creates a new MultiTermAwareComponent interface. This allows Solr 
 to automatically assemble a multiterm analyzer that does the right thing 
 vis-a-vis transforming the individual terms of a multi-term query at query 
 time. Examples are: lower casing, folding accents, etc. Currently 
 (27-Nov-2011), the following classes implement MultiTermAwareComponent:
  * ASCIIFoldingFilterFactory
  * LowerCaseFilterFactory
  * LowerCaseTokenizerFactory
  * MappingCharFilterFactory
  * PersianCharFilterFactory
 When users put any of the above in their query analyzer, Solr will do the 
 right thing at query time and the perennial question users have, why didn't 
 my wildcard query automatically lower-case (or accent fold or) my terms? 
 will be gone. Die question die!
 But taking a quick look, for instance, at the various FilterFactories that 
 exist, there are a number of possibilities that *might* be good candidates 
 for implementing MultiTermAwareComponent. But I really don't understand the 
 correct behavior here well enough to know whether these should implement the 
 interface or not. And this doesn't include other CharFilters or Tokenizers.
 Actually implementing the interface is often trivial, see the classes above 
 for examples. Note that LowerCaseTokenizerFactory returns a *Filter*, which 
 is the right thing in this case.
 Here is a quick cull of the Filters that, just from their names, might be 
 candidates. If anyone wants to take any of them on, that would be great. If 
 all you can do is provide test cases, I could probably do the code part, just 
 let me know.
 ArabicNormalizationFilterFactory
 GreekLowerCaseFilterFactory
 HindiNormalizationFilterFactory
 ICUFoldingFilterFactory
 ICUNormalizer2FilterFactory
 ICUTransformFilterFactory
 IndicNormalizationFilterFactory
 ISOLatin1AccentFilterFactory
 PersianNormalizationFilterFactory
 RussianLowerCaseFilterFactory
 TurkishLowerCaseFilterFactory

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should

2011-11-30 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160416#comment-13160416
 ] 

Robert Muir commented on SOLR-2921:
---

{quote}
I guess it begs the question of what use adding stemmers to the mix would be.
{quote}

I agree with Mike.

Most stemmers are basically suffix-strippers and use heuristics like term 
length. They are not going to work with the syntax of various multitermqueries. 
No stemmer is going to stem dogs* to dog*. Some might remove any non-alpha 
characters completely, and it's not a bug that they do this. They are 
heuristic in nature and designed to work on natural language text... not 
syntax.


 Make any Filters, Tokenizers and CharFilters implement 
 MultiTermAwareComponent if they should
 -

 Key: SOLR-2921
 URL: https://issues.apache.org/jira/browse/SOLR-2921
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.6, 4.0
 Environment: All
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor

 SOLR-2438 creates a new MultiTermAwareComponent interface. This allows Solr 
 to automatically assemble a multiterm analyzer that does the right thing 
 vis-a-vis transforming the individual terms of a multi-term query at query 
 time. Examples are: lower casing, folding accents, etc. Currently 
 (27-Nov-2011), the following classes implement MultiTermAwareComponent:
  * ASCIIFoldingFilterFactory
  * LowerCaseFilterFactory
  * LowerCaseTokenizerFactory
  * MappingCharFilterFactory
  * PersianCharFilterFactory
 When users put any of the above in their query analyzer, Solr will do the 
 right thing at query time and the perennial question users have, why didn't 
 my wildcard query automatically lower-case (or accent fold or) my terms? 
 will be gone. Die question die!
 But taking a quick look, for instance, at the various FilterFactories that 
 exist, there are a number of possibilities that *might* be good candidates 
 for implementing MultiTermAwareComponent. But I really don't understand the 
 correct behavior here well enough to know whether these should implement the 
 interface or not. And this doesn't include other CharFilters or Tokenizers.
 Actually implementing the interface is often trivial, see the classes above 
 for examples. Note that LowerCaseTokenizerFactory returns a *Filter*, which 
 is the right thing in this case.
 Here is a quick cull of the Filters that, just from their names, might be 
 candidates. If anyone wants to take any of them on, that would be great. If 
 all you can do is provide test cases, I could probably do the code part, just 
 let me know.
 ArabicNormalizationFilterFactory
 GreekLowerCaseFilterFactory
 HindiNormalizationFilterFactory
 ICUFoldingFilterFactory
 ICUNormalizer2FilterFactory
 ICUTransformFilterFactory
 IndicNormalizationFilterFactory
 ISOLatin1AccentFilterFactory
 PersianNormalizationFilterFactory
 RussianLowerCaseFilterFactory
 TurkishLowerCaseFilterFactory

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2805) Add a main method to ZkController so that it's easier to script config upload with SolrCloud

2011-11-30 Thread Mark Miller (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160433#comment-13160433
 ] 

Mark Miller commented on SOLR-2805:
---

I tend to limit by default and open when needed I guess.

FWIW, I've got code for this in the solrcloud branch now.

It lets you temporarily launch a zk server, connect to it, and upload a set of 
conf files by calling (from the /solr folder in a checkout):

java -classpath lib/*:dist/* org.apache.solr.cloud.ZkController 127.0.0.1:9983 
127.0.0.1 8983 solr ../example/solr/conf conf1 example/solr/zoo_data

 Add a main method to ZkController so that it's easier to script config upload 
 with SolrCloud
 

 Key: SOLR-2805
 URL: https://issues.apache.org/jira/browse/SOLR-2805
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.0


 when scripting a cluster setup, it would be nice if it was easy to upload a 
 set of configs - otherwise you have to wait to start secondary servers until 
 the first server has uploaded the config - kind of a pain
 You should be able to do something like:
 java -classpath .:* org.apache.solr.cloud.ZkController 127.0.0.1:9983 
 127.0.0.1 8983 solr /home/mark/workspace/SolrCloud/solr/example/solr/conf 
 conf1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (SOLR-2805) Add a main method to ZkController so that it's easier to script config upload with SolrCloud

2011-11-30 Thread Mark Miller (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160433#comment-13160433
 ] 

Mark Miller edited comment on SOLR-2805 at 11/30/11 11:05 PM:
--

I tend to limit by default and open when needed I guess.

FWIW, I've got code for this in the solrcloud branch now.

It lets you temporarily launch a zk server, connect to it, and upload a set of 
conf files by calling (from the /solr folder in a checkout):

{noformat}java -classpath lib/*:dist/* org.apache.solr.cloud.ZkController 
127.0.0.1:9983 127.0.0.1 8983 solr ../example/solr/conf conf1 
example/solr/zoo_data{noformat}

  was (Author: markrmil...@gmail.com):
I tend to limit by default and open when needed I guess.

FWIW, I've got code for this in the solrcloud branch now.

It lets you temporarily launch a zk server, connect to it, and upload a set of 
conf files by calling (from the /solr folder in a checkout):

java -classpath lib/*:dist/* org.apache.solr.cloud.ZkController 127.0.0.1:9983 
127.0.0.1 8983 solr ../example/solr/conf conf1 example/solr/zoo_data
  
 Add a main method to ZkController so that it's easier to script config upload 
 with SolrCloud
 

 Key: SOLR-2805
 URL: https://issues.apache.org/jira/browse/SOLR-2805
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.0


 when scripting a cluster setup, it would be nice if it was easy to upload a 
 set of configs - otherwise you have to wait to start secondary servers until 
 the first server has uploaded the config - kind of a pain
 You should be able to do something like:
 java -classpath .:* org.apache.solr.cloud.ZkController 127.0.0.1:9983 
 127.0.0.1 8983 solr /home/mark/workspace/SolrCloud/solr/example/solr/conf 
 conf1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3612) remove _X.fnx

2011-11-30 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160467#comment-13160467
 ] 

Uwe Schindler commented on LUCENE-3612:
---

+1

 remove _X.fnx
 -

 Key: LUCENE-3612
 URL: https://issues.apache.org/jira/browse/LUCENE-3612
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 4.0
Reporter: Robert Muir
 Attachments: LUCENE-3612.patch


 Currently we store a global (not per-segment) field number-name mapping in 
 _X.fnx
 However, it doesn't actually save us any performance e.g on IndexWriter's 
 init because
 since LUCENE-2984 we are to loading the fieldinfos anyway to compute files() 
 for IFD, etc, 
 as thats where hasProx/hasVectors is.
 Additionally in the past global files like shared doc stores have caused us 
 problems,
 (recently we just fixed a bug related to this file in LUCENE-3601).
 Finally this is trouble for backwards compatibility as its difficult to 
 handle a global
 file with the codecs mechanism.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3606) Make IndexReader really read-only in Lucene 4.0

2011-11-30 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160515#comment-13160515
 ] 

Uwe Schindler commented on LUCENE-3606:
---

OK, I will work on this as soon as I can (next weekend). I will be glad to 
remove the copy-on-write setNorm stuff in Lucene40 codec and make Lucene3x 
codec completely read-only (only reading the newest norm file). I hope Robert 
will possibly help me :-)

 Make IndexReader really read-only in Lucene 4.0
 ---

 Key: LUCENE-3606
 URL: https://issues.apache.org/jira/browse/LUCENE-3606
 Project: Lucene - Java
  Issue Type: Task
  Components: core/index
Affects Versions: 4.0
Reporter: Uwe Schindler

 As we change API completely in Lucene 4.0 we are also free to remove 
 read-write access and commits from IndexReader. This code is so hairy and 
 buggy (as investigated by Robert and Mike today) when you work on 
 SegmentReader level but forget to flush in the DirectoryReader, so its better 
 to really make IndexReaders readonly.
 Currently with IndexReader you can do things like:
 - delete/undelete Documents - Can be done by with IndexWriter, too (using 
 deleteByQuery)
 - change norms - this is a bad idea in general, but when we remove norms at 
 all and replace by DocValues this is obsolete already. Changing DocValues 
 should also be done using IndexWriter in trunk (once it is ready)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-3606) Make IndexReader really read-only in Lucene 4.0

2011-11-30 Thread Uwe Schindler (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-3606:
-

Assignee: Uwe Schindler

 Make IndexReader really read-only in Lucene 4.0
 ---

 Key: LUCENE-3606
 URL: https://issues.apache.org/jira/browse/LUCENE-3606
 Project: Lucene - Java
  Issue Type: Task
  Components: core/index
Affects Versions: 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler

 As we change API completely in Lucene 4.0 we are also free to remove 
 read-write access and commits from IndexReader. This code is so hairy and 
 buggy (as investigated by Robert and Mike today) when you work on 
 SegmentReader level but forget to flush in the DirectoryReader, so its better 
 to really make IndexReaders readonly.
 Currently with IndexReader you can do things like:
 - delete/undelete Documents - Can be done by with IndexWriter, too (using 
 deleteByQuery)
 - change norms - this is a bad idea in general, but when we remove norms at 
 all and replace by DocValues this is obsolete already. Changing DocValues 
 should also be done using IndexWriter in trunk (once it is ready)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3606) Make IndexReader really read-only in Lucene 4.0

2011-11-30 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160559#comment-13160559
 ] 

Robert Muir commented on LUCENE-3606:
-

{quote}
finally, holy grail where similarities can declare the normalization 
factor(s) they need, using byte/float/int whatever, and its all unified with 
the docvalues api. IndexReader.norms() maybe goes away here, and maybe 
NormsFormat too.
{quote}

Thinking about this: a clean way to do it would be for Similarity to get a new 
method:
{code}
ValueType getValueType();
{code}

and we would change:
{code}
byte computeNorm(FieldInvertState state);
{code}
to:
{code}
void computeNorm(FieldInvertState state, PerDocFieldValues norm);
{code}

Sims that want to encode multiple index-time scoring factors separately 
could just use BYTES_FIXED_STRAIGHT. This should be only for some rare
sims anyway, because a Sim can pull named 'application' specific scoring
factors from IR.perDocValues() today already.

Its not too crazy either since sims are already doing their own encoding,
so e.g. default sim would just use FIXED_INTS_8.

People that don't want to mess with bytes or smallfloat could use things
like FLOAT_32 if they want and need this.

we would just change FieldInfo.omitNorms to instead be FieldInfo.normValueType,
which is the value type of the norm (null if it's omitted, just like 
docValueType).

Preflex FieldInfosReader would just set FIXED_INTS_8 or null, based on
whether the fieldinfos had omitNorms or not. It doesn't support
any other types... 

Finally then, sims would own their scoring factors, and we could
even remove omitNorms from Field/FieldType etc (just use the correct 
scoring algorithm for the field; if you don't want norms, use a sim
that doesn't need them for scoring).

This would remove the awkward/messy situation where every similarity 
implementation we have has to 'downgrade' itself to handle things like
if the user decided to omit parts of their formula!

 Make IndexReader really read-only in Lucene 4.0
 ---

 Key: LUCENE-3606
 URL: https://issues.apache.org/jira/browse/LUCENE-3606
 Project: Lucene - Java
  Issue Type: Task
  Components: core/index
Affects Versions: 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler

 As we change API completely in Lucene 4.0 we are also free to remove 
 read-write access and commits from IndexReader. This code is so hairy and 
 buggy (as investigated by Robert and Mike today) when you work on 
 SegmentReader level but forget to flush in the DirectoryReader, so its better 
 to really make IndexReaders readonly.
 Currently with IndexReader you can do things like:
 - delete/undelete Documents - Can be done by with IndexWriter, too (using 
 deleteByQuery)
 - change norms - this is a bad idea in general, but when we remove norms at 
 all and replace by DocValues this is obsolete already. Changing DocValues 
 should also be done using IndexWriter in trunk (once it is ready)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should

2011-11-30 Thread Mike Sokolov (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160576#comment-13160576
 ] 

Mike Sokolov commented on SOLR-2921:


I spoke hastily, and it's true that stemmers are different from those other 
multi-token things.  It would be kind of nice if it were possible to have a 
query for "do?s" actually match a document containing "dogs", even when 
matching against a stemmed field, but I don't see how to do it without breaking 
all kinds of other things.  Consider how messed up range queries would get: 
[dogs TO *] would match doge, doggone, and other words in [dog TO dogs], which 
would be totally counterintuitive.

 Make any Filters, Tokenizers and CharFilters implement 
 MultiTermAwareComponent if they should
 -

 Key: SOLR-2921
 URL: https://issues.apache.org/jira/browse/SOLR-2921
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.6, 4.0
 Environment: All
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor

 SOLR-2438 creates a new MultiTermAwareComponent interface. This allows Solr 
 to automatically assemble a multiterm analyzer that does the right thing 
 vis-a-vis transforming the individual terms of a multi-term query at query 
 time. Examples are: lower casing, folding accents, etc. Currently 
 (27-Nov-2011), the following classes implement MultiTermAwareComponent:
  * ASCIIFoldingFilterFactory
  * LowerCaseFilterFactory
  * LowerCaseTokenizerFactory
  * MappingCharFilterFactory
  * PersianCharFilterFactory
 When users put any of the above in their query analyzer, Solr will do the 
 right thing at query time, and the perennial question users have, "why didn't 
 my wildcard query automatically lower-case (or accent-fold, or ...) my 
 terms?", will be gone. Die question die!
 But taking a quick look, for instance, at the various FilterFactories that 
 exist, there are a number of possibilities that *might* be good candidates 
 for implementing MultiTermAwareComponent. But I really don't understand the 
 correct behavior here well enough to know whether these should implement the 
 interface or not. And this doesn't include other CharFilters or Tokenizers.
 Actually implementing the interface is often trivial; see the classes above 
 for examples (and the sketch after the list below). Note that 
 LowerCaseTokenizerFactory returns a *Filter*, which is the right thing in 
 this case.
 Here is a quick cull of the Filters that, just from their names, might be 
 candidates. If anyone wants to take any of them on, that would be great. If 
 all you can do is provide test cases, I could probably do the code part; just 
 let me know.
 ArabicNormalizationFilterFactory
 GreekLowerCaseFilterFactory
 HindiNormalizationFilterFactory
 ICUFoldingFilterFactory
 ICUNormalizer2FilterFactory
 ICUTransformFilterFactory
 IndicNormalizationFilterFactory
 ISOLatin1AccentFilterFactory
 PersianNormalizationFilterFactory
 RussianLowerCaseFilterFactory
 TurkishLowerCaseFilterFactory
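
A minimal sketch of such an implementation, assuming the SOLR-2438 interface shape from the 3.x-era code (the Foo filter and factory names are hypothetical):

{code}
import org.apache.lucene.analysis.TokenStream;
import org.apache.solr.analysis.BaseTokenFilterFactory;
import org.apache.solr.analysis.MultiTermAwareComponent;

// Hypothetical factory; the pattern mirrors LowerCaseFilterFactory.
public class FooNormalizationFilterFactory extends BaseTokenFilterFactory
    implements MultiTermAwareComponent {

  @Override
  public TokenStream create(TokenStream input) {
    return new FooNormalizationFilter(input); // hypothetical filter
  }

  // The component Solr should run over each term of a wildcard/prefix/
  // range query; returning 'this' applies the same filter there too.
  @Override
  public Object getMultiTermComponent() {
    return this;
  }
}
{code}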




[jira] [Commented] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should

2011-11-30 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160593#comment-13160593
 ] 

Robert Muir commented on SOLR-2921:
---

Well Erick, I think the ones you listed here are ok.

There are cases where they won't work correctly, but trying to do
multiterm queries with MappingCharFilter and ASCIIFoldingFilter
is already problematic (e.g. ? won't match œ because 
it's now 'oe').

Personally I think this is fine, but we should document
that things don't work correctly all the time, and we 
should not make changes to analysis components to try 
to make them cope with multiterm query syntax or 
anything (that would be bad design; it turns them into 
queryparsers).

If the user cares about the corner cases, then they just
specify the chain.
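
Specifying the chain means declaring the multiterm analyzer on the field type explicitly, along these lines (a schema.xml sketch; the field type name and the exact filters chosen are illustrative):

{code}
<fieldType name="text_general" class="solr.TextField">
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Applied only to wildcard/prefix/range terms; the user picks
       exactly which normalizations run in the corner cases. -->
  <analyzer type="multiterm">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
{code}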

 Make any Filters, Tokenizers and CharFilters implement 
 MultiTermAwareComponent if they should
 -

 Key: SOLR-2921
 URL: https://issues.apache.org/jira/browse/SOLR-2921
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.6, 4.0
 Environment: All
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor





[jira] [Commented] (SOLR-1726) Deep Paging and Large Results Improvements

2011-11-30 Thread Grant Ingersoll (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160607#comment-13160607
 ] 

Grant Ingersoll commented on SOLR-1726:
---

Hi Manoj,

This looks OK as a start.  Would be nice to have tests to go with it.  

Why the overriding of getTotalHits on the TopScoreDocCollector?  I don't think 
returning collectedHits is the right thing to do there.

Also, you should be able to avoid an extra Collector create call at:
{code}
topCollector = TopScoreDocCollector.create(len, true);
// Issue 1726 start
if (cmd.getScoreDoc() != null) {
  // create the Collector with InOrderPagingCollector
  topCollector = TopScoreDocCollector.create(len, cmd.getScoreDoc(), true);
}
{code}

But that is easy enough to fix.
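
A sketch of that fix (assuming, as in the patch, that cmd.getScoreDoc() carries the last hit of the previous page):

{code}
// Create the collector once, choosing the paging-aware variant only
// when there is a previous page's ScoreDoc to resume from.
if (cmd.getScoreDoc() != null) {
  topCollector = TopScoreDocCollector.create(len, cmd.getScoreDoc(), true);
} else {
  topCollector = TopScoreDocCollector.create(len, true);
}
{code}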



 Deep Paging and Large Results Improvements
 --

 Key: SOLR-1726
 URL: https://issues.apache.org/jira/browse/SOLR-1726
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 3.6, 4.0

 Attachments: CommonParams.java, QParser.java, QueryComponent.java, 
 ResponseBuilder.java, SOLR-1726.patch, SolrIndexSearcher.java, 
 TopDocsCollector.java, TopScoreDocCollector.java


 There may be ways to improve collection for deep paging by passing 
 Solr/Lucene more information about the last page of results seen, thereby 
 saving priority queue operations. See LUCENE-2215.
 There may also be better options for retrieving large numbers of rows at a 
 time that are worth exploring. See LUCENE-2127.




[jira] [Commented] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should

2011-11-30 Thread Erick Erickson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160612#comment-13160612
 ] 

Erick Erickson commented on SOLR-2921:
--

Mike:

stemmers - not going to make them MultiTermAware. No way. No how. Not on my 
watch; one succinct example and I'm convinced.

The beauty of the way Yonik and Robert directed this is that we can take care 
of the 80% case, not provide things that are *that* surprising, and still have 
all the flexibility available to those who really need it. As Robert says, if 
they really want some interesting behavior, they can specify the complete 
chain.

Robert:

I guess I'm at a loss as to how to write tests for the various filters and 
tokenizers I listed, which is why I'm reluctant to just make them 
MultiTermAwareComponents. Do you have any suggestions as to how I could get 
tests? I had enough surprises when I ran the tests in English that I'm 
reluctant to just plow ahead. As far as I understand, Arabic is caseless, for 
instance.

I totally agree with your point that making the analysis components cope with 
syntax is evil. Not going there either.

Maybe the right action is to wait for someone to volunteer to be the guinea pig 
for the various filters; I suppose we could advertise for volunteers...

 Make any Filters, Tokenizers and CharFilters implement 
 MultiTermAwareComponent if they should
 -

 Key: SOLR-2921
 URL: https://issues.apache.org/jira/browse/SOLR-2921
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.6, 4.0
 Environment: All
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor





[jira] [Commented] (SOLR-1726) Deep Paging and Large Results Improvements

2011-11-30 Thread Manojkumar Rangasamy Kannadasan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160620#comment-13160620
 ] 

Manojkumar Rangasamy Kannadasan commented on SOLR-1726:
---

Hi Grant, thanks for your comments. Regarding collectedHits: if there are 4 
docs as results and we want to return only the bottom 2 by giving an 
appropriate pageScore and pageDoc, the expected result is to return only those 
2 docs. But totalHits returns all 4 docs. That's the reason I used 
collectedHits. Kindly correct me if my understanding is wrong.

 Deep Paging and Large Results Improvements
 --

 Key: SOLR-1726
 URL: https://issues.apache.org/jira/browse/SOLR-1726
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 3.6, 4.0





[jira] [Commented] (SOLR-1726) Deep Paging and Large Results Improvements

2011-11-30 Thread Grant Ingersoll (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160627#comment-13160627
 ] 

Grant Ingersoll commented on SOLR-1726:
---

totalHits should return the count of all the hits regardless of the number that 
are actually being collected.  In other words, totalHits could be a million, 
but we only return the top 10.  collectedHits only returns the count of how 
many are being returned.
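
In stock Lucene terms (a small illustration, not code from the patch):

{code}
import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

class HitCounts {
  // totalHits counts every matching doc; scoreDocs holds only the
  // collected page, which is what a collectedHits-style count reports.
  static void show(IndexSearcher searcher, Query q) throws IOException {
    TopDocs td = searcher.search(q, 10);
    System.out.println("totalHits = " + td.totalHits);        // could be 1,000,000
    System.out.println("collected = " + td.scoreDocs.length); // at most 10
  }
}
{code}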

 Deep Paging and Large Results Improvements
 --

 Key: SOLR-1726
 URL: https://issues.apache.org/jira/browse/SOLR-1726
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 3.6, 4.0





[jira] [Commented] (SOLR-2382) DIH Cache Improvements

2011-11-30 Thread Mikhail Khludnev (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160683#comment-13160683
 ] 

Mikhail Khludnev commented on SOLR-2382:


I spawned a subtask: SOLR-2933.

 DIH Cache Improvements
 --

 Key: SOLR-2382
 URL: https://issues.apache.org/jira/browse/SOLR-2382
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Reporter: James Dyer
Priority: Minor
 Attachments: SOLR-2382-dihwriter.patch, SOLR-2382-dihwriter.patch, 
 SOLR-2382-dihwriter.patch, SOLR-2382-dihwriter.patch, 
 SOLR-2382-dihwriter.patch, SOLR-2382-dihwriter_standalone.patch, 
 SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, 
 SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, 
 SOLR-2382-entities.patch, SOLR-2382-entities.patch, 
 SOLR-2382-properties.patch, SOLR-2382-properties.patch, 
 SOLR-2382-solrwriter-verbose-fix.patch, SOLR-2382-solrwriter.patch, 
 SOLR-2382-solrwriter.patch, SOLR-2382-solrwriter.patch, SOLR-2382.patch, 
 SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, 
 SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, 
 TestCachedSqlEntityProcessor.java-break-where-clause.patch, 
 TestCachedSqlEntityProcessor.java-fix-where-clause-by-adding-cachePk-and-lookup.patch,
  
 TestCachedSqlEntityProcessor.java-wrong-pk-detected-due-to-lack-of-where-support.patch,
  TestThreaded.java.patch


 Functionality:
  1. Provide a pluggable caching framework for DIH so that users can choose a 
 cache implementation that best suits their data and application.
  
  2. Provide a means to temporarily cache a child Entity's data without 
 needing to create a special cached implementation of the Entity Processor 
 (such as CachedSqlEntityProcessor).
  
  3. Provide a means to write the final (root entity) DIH output to a cache 
 rather than to Solr.  Then provide a way for a subsequent DIH call to use the 
 cache as an Entity input.  Also provide the ability to do delta updates on 
 such persistent caches.
  
  4. Provide the ability to partition data across multiple caches that can 
 then be fed back into DIH and indexed either to varying Solr Shards, or to 
 the same Core in parallel.
 Use Cases:
  1. We needed a flexible & scalable way to temporarily cache child-entity 
 data prior to joining to parent entities.
   - Using SqlEntityProcessor with Child Entities can cause an n+1 select 
 problem.
   - CachedSqlEntityProcessor only supports an in-memory HashMap as a Caching 
 mechanism and does not scale.
   - There is no way to cache non-SQL inputs (ex: flat files, xml, etc).
  
  2. We needed the ability to gather data from long-running entities by a 
 process that runs separate from our main indexing process.
   
  3. We wanted the ability to do a delta import of only the entities that 
 changed.
   - Lucene/Solr requires entire documents to be re-indexed, even if only a 
 few fields changed.
   - Our data comes from 50+ complex sql queries and/or flat files.
   - We do not want to incur overhead re-gathering all of this data if only 1 
 entity's data changed.
   - Persistent DIH caches solve this problem.
   
  4. We want the ability to index several documents in parallel (using 1.4.1, 
 which did not have the threads parameter).
  
  5. In the future, we may need to use Shards, creating a need to easily 
 partition our source data into Shards.
 Implementation Details:
  1. De-couple EntityProcessorBase from caching.  
   - Created a new interface, DIHCache & two implementations:  
 - SortedMapBackedCache - An in-memory cache, used as default with 
 CachedSqlEntityProcessor (now deprecated).
 - BerkleyBackedCache - A disk-backed cache, dependent on bdb-je, tested 
 with je-4.1.6.jar
- NOTE: the existing Lucene Contrib db project uses je-3.3.93.jar.  
 I believe this may be incompatible due to Generic Usage.
- NOTE: I did not modify the ant script to automatically get this jar, 
 so to use or evaluate this patch, download bdb-je from 
 http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html 
  
  2. Allow Entity Processors to take a cacheImpl parameter to cause the 
 entity data to be cached (see EntityProcessorBase & DIHCacheProperties; a 
 configuration sketch follows this description).
  
  3. Partially De-couple SolrWriter from DocBuilder
   - Created a new interface, DIHWriter, & two implementations:
- SolrWriter (refactored)
- DIHCacheWriter (allows DIH to write ultimately to a Cache).

  4. Create a new Entity Processor, DIHCacheProcessor, which reads a 
 persistent Cache as DIH Entity Input.
  
  5. Support a partition parameter with both DIHCacheWriter and 
 DIHCacheProcessor to allow for easy partitioning of source entity data.
  
  6. Change the semantics of entity.destroy()
   - Previously, 
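
As a configuration sketch (table, column, and entity names are made up; the cacheImpl/cacheKey/cacheLookup parameters are from this feature), caching a child entity looks roughly like:

{code}
<!-- data-config.xml sketch with made-up names: the child entity's rows
     are cached once and joined to each parent row by key, avoiding the
     n+1 select problem described above. -->
<entity name="parent" query="select id, name from parent_table">
  <entity name="child"
          query="select parent_id, detail from child_table"
          cacheImpl="SortedMapBackedCache"
          cacheKey="parent_id"
          cacheLookup="parent.id"/>
</entity>
{code}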

[jira] [Commented] (LUCENE-3612) remove _X.fnx

2011-11-30 Thread Simon Willnauer (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160689#comment-13160689
 ] 

Simon Willnauer commented on LUCENE-3612:
-

+1

 remove _X.fnx
 -

 Key: LUCENE-3612
 URL: https://issues.apache.org/jira/browse/LUCENE-3612
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 4.0
Reporter: Robert Muir
 Attachments: LUCENE-3612.patch


 Currently we store a global (not per-segment) field number-name mapping in 
 _X.fnx.
 However, it doesn't actually save us any performance, e.g. on IndexWriter's 
 init, because since LUCENE-2984 we are loading the fieldinfos anyway to 
 compute files() for IFD, etc., as that's where hasProx/hasVectors is.
 Additionally, in the past, global files like shared doc stores have caused us 
 problems (recently we just fixed a bug related to this file in LUCENE-3601).
 Finally, this is trouble for backwards compatibility, as it's difficult to 
 handle a global file with the codecs mechanism.




[jira] [Commented] (SOLR-2382) DIH Cache Improvements

2011-11-30 Thread Noble Paul (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160688#comment-13160688
 ] 

Noble Paul commented on SOLR-2382:
--

@James 
Yes, create a new issue for the remaining functionality and let's close this 
one.

 DIH Cache Improvements
 --

 Key: SOLR-2382
 URL: https://issues.apache.org/jira/browse/SOLR-2382
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Reporter: James Dyer
Priority: Minor