[Lucene.Net] Reminder we need to get a board report in

2011-11-30 Thread Prescott Nasser
FYI, the process seems to have changed: board reports are now due on the first 
of the month (if you have to report that month) to give people time to review.

I can handle the report if nobody else does, but I won't be able to get to it 
for a day or so.

[Lucene.Net] Re: Memory Leak in code (mine or Lucene?) 2.9.2.2

2011-11-30 Thread Trevor Watson
I was wrong; the analyzer does have a Close function.  I closed my analyzer,
but the steady climb in memory is still there.

I wonder if I should create a global analyzer variable, guard it with a lock
to make sure there aren't any threading issues, and use that instead.
Could it be a leak in the analyzer itself?
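
(A minimal sketch of that idea, assuming the usual Lucene.Net.Analysis using
directive and reusing this thread's clsLuceneFunctions helper; the shape is an
illustration only:)

    private static readonly object analyzerLock = new object();
    private static Analyzer sharedAnalyzer;

    private static Analyzer GetSharedAnalyzer()
    {
        lock (analyzerLock)
        {
            // Create the analyzer once and hand the same instance to every
            // caller, instead of allocating one per document.
            if (sharedAnalyzer == null)
                sharedAnalyzer = clsLuceneFunctions.getAnalyzer();
            return sharedAnalyzer;
        }
    }

(Note the lock only guards creation; if the analyzer itself is not
thread-safe, callers would also need to lock around its use.)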




On Tue, Nov 29, 2011 at 12:16 PM, Trevor Watson 
powersearchsoftw...@gmail.com wrote:

 I don't recall seeing a close function on the analyzer.  But I will
 definitely take a look. Thanks!


 On Tuesday, November 29, 2011, Oren Eini (Ayende Rahien) 
 aye...@ayende.com wrote:
  You need to close the analyzer.
 
  On Tue, Nov 29, 2011 at 12:32 AM, Trevor Watson 
  powersearchsoftw...@gmail.com wrote:
 
  I'm using the following block of code.  The document is created in another
  function and written to the Lucene index via an IndexWriter:
 
  private void SaveFileToFileInfo(Lucene.Net.Index.IndexWriter iw,
      bool delayCommit, string sDataPath)
  {
      Document doc = getFileInfoDoc(sDataPath);
      Analyzer analyzer = clsLuceneFunctions.getAnalyzer();
      if (this.FileID == 0)
      {
          string s = "";  // presumably a breakpoint anchor for debugging
      }
      iw.UpdateDocument(new Lucene.Net.Index.Term("FileId",
          this.fileID.ToString("0")), doc, analyzer);
      analyzer = null;
      doc = null;
      if (!delayCommit)
          iw.Commit();
  }
 
  When the UpdateDocument line is commented out, everything seems to run
  fine.  When that line of code is run, memory slowly creeps up.  It used to
  work on some computers; now it works on only one or two, and fails on our
  clients' computers.
 
  Is there an issue with UpdateDocument that I am not aware of in 2.9.2.2?
 
  Thanks in advance.
 
 



[Lucene.Net] Re: Memory Leak in 2.9.2.2

2011-11-30 Thread Trevor Watson
You said "pre 2.9.3".  I checked the Apache Lucene.Net page to try to see if
I could get a copy of 2.9.3, but it was never on the site, just 2.9.2.2 and
2.9.4(g).  Was this an unreleased version?  Or am I looking in the wrong
spot for updates to Lucene.Net?

Thanks for all your help

On Tue, Nov 29, 2011 at 2:59 PM, Trevor Watson 
powersearchsoftw...@gmail.com wrote:

 I can send you the dll that I am using if you would like.  The documents
 are _mostly_ small documents: emails and office docs, the size of plain text.


 On Tuesday, November 29, 2011, Christopher Currens 
 currens.ch...@gmail.com wrote:
  Do you know how big the documents are that you are trying to
 delete/update?
   I'm trying to find a copy of 2.9.2 to see if I can reproduce it.
 
 
  Thanks,
  Christopher
 
  On Tue, Nov 29, 2011 at 9:11 AM, Trevor Watson 
  powersearchsoftw...@gmail.com wrote:
 
   Sorry for the duplicate post. I was on the road and posted both via my web
   mail and office mail by mistake.
 
   The increase is very gradual: the program starts at about 160,000K
   according to Task Manager (I know that's not entirely accurate, but it was
   the best I had at the time) and would, after adding 25,000-40,000
   documents, result in an out-of-memory exception (800,000K according to
   Task Manager). I tried building a copy of 2.9.4 to test, but could not
   find one that worked in Visual Studio 2005.

   I did notice, using ANTS memory profiler, that there were a number of
   byte[32789] arrays in memory that I didn't know the origin of.
 
   On Monday, November 28, 2011, Christopher Currens 
   currens.ch...@gmail.com wrote:
   Hi Trevor,
  
    What kind of memory increase are we talking about?  Also, how big are the
    documents that you are indexing, the ones returned from getFileInfoDoc()?
    Is it putting an entire file into the index?  Pre-2.9.3 versions had
    issues with holding onto allocated byte arrays far beyond when they were
    used.  The memory could only be freed by closing the IndexWriter.
   
    I'm a little unclear on exactly what's happening.  Are you noticing memory
    spike and stay constant at that level, or is it a gradual increase?  Is it
    causing your application to error (i.e. OutOfMemory exception, etc.)?
  
  
   Thanks,
   Christopher
  
   On Mon, Nov 28, 2011 at 5:59 PM, Trevor Watson 
   powersearchsoftw...@gmail.com wrote:
  
    I'm attempting to use Lucene.Net v2.9.2.2 in a Visual Studio 2005 (.NET
    2.0) environment.  We had a piece of software that WAS working.  I'm not
    sure what has changed; however, the following code results in a memory
    leak in the Lucene.Net component (or a failure to clean up used memory).
   
    The code in issue is here:
   
    private void SaveFileToFileInfo(Lucene.Net.Index.IndexWriter iw,
        bool delayCommit, string sDataPath)
    {
        Document doc = getFileInfoDoc(sDataPath);
        Analyzer analyzer = clsLuceneFunctions.getAnalyzer();
        if (this.FileID == 0)
        {
            string s = "";  // presumably a breakpoint anchor for debugging
        }
        iw.UpdateDocument(new Lucene.Net.Index.Term("FileId",
            this.fileID.ToString("0")), doc, analyzer);

        analyzer = null;
        doc = null;
        if (!delayCommit)
            iw.Commit();
    }
   
    Commenting out the iw.UpdateDocument line resulted in no memory increase.
    I also tried replacing it with a deleteDocument and AddDocument, and the
    memory increased the same as when using the UpdateDocument function.
   
    The getAnalyzer() function returns an ExtendedStandardAnalyzer, but it's
    the UpdateDocument line specifically that gives me the issue.
  
   Any assistance would be greatly appreciated.
  
   Trevor Watson
  
  
 
 



RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

2011-11-30 Thread Digy
FYI, 2.9.4 can be compiled against .NET 2.0 with a few minor changes in
CloseableThreadLocal (like uncommenting the ThreadLocal<T> class and replacing
extension-method calls with static calls to CloseableThreadLocalExtensions).
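
(A sketch of what that change looks like; the helper body follows the Set<T>
method quoted later in this thread, and the class layout is an assumption:)

    // .NET 2.0 build: the same helper without the 'this' modifier, so no
    // System.Core / ExtensionAttribute is required.
    public static class CloseableThreadLocalExtensions
    {
        public static void Set<T>(ThreadLocal<T> t, T val)
        {
            t.Value = val;  // ThreadLocal<T> is Lucene.Net's own backport
        }
    }

    // Call sites change from the extension form
    //     t.Set(val);
    // to the plain static call
    //     CloseableThreadLocalExtensions.Set(t, val);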

 

 

DIGY

 

 

-Original Message-
From: Christopher Currens [mailto:currens.ch...@gmail.com] 
Sent: Wednesday, November 30, 2011 7:26 PM
To: lucene-net-dev@lucene.apache.org
Subject: Re: [Lucene.Net] Re: Memory Leak in 2.9.2.2

 

Trevor,

I'm not sure if you can use 2.9.4, though; it looks like you're using
VS2005 and .NET 2.0.  2.9.4 targets .NET 4.0, and I'm fairly certain we use
classes only available in 4.0 (or 3.5?).  However, if you can, I would
suggest updating, as 2.9.4 should be a fairly stable release.

The leak I'm talking about is addressed here:
https://issues.apache.org/jira/browse/LUCENE-2467, and the ported code
isn't available in 2.9.2, but I've confirmed the patch is in 2.9.4.  It may
or may not be what your issue is.  You say that it was at one time working
fine; I assume you mean no memory leak.  I would take some time to see what
else in your code has changed.  Make sure you're calling Close on whatever
needs to be closed (IndexWriter/IndexReader/Analyzers, etc).
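
(A minimal sketch of the cleanup being described, assuming a Directory and
Analyzer created elsewhere and the usual Lucene.Net.Index / Lucene.Net.Analysis
using directives; Lucene.Net 2.9-era API:)

    static void IndexWithCleanup(Lucene.Net.Store.Directory dir, Analyzer analyzer)
    {
        IndexWriter writer = new IndexWriter(dir, analyzer,
            IndexWriter.MaxFieldLength.UNLIMITED);
        try
        {
            // ... AddDocument / UpdateDocument calls go here ...
            writer.Commit();
        }
        finally
        {
            writer.Close();   // releases the writer's buffered byte[] blocks
            analyzer.Close(); // this thread confirms Analyzer has Close()
        }
    }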

 

Unfortunately for us, memory leaks are hard to debug over email, and it's
difficult for us to tell if it's a change to your code or an issue with
Lucene.NET.  As far as I can tell, this is the only memory leak I can find
that affects 2.9.2.

Thanks,
Christopher

 

On Wed, Nov 30, 2011 at 8:25 AM, Prescott Nasser geobmx...@hotmail.com wrote:

 We just released 2.9.4 - the website didn't update last night, so I'll have
 to try and update it later today. But if you follow the link to download the
 2.9.2 dist you'll see folders for 2.9.4.

 I'll send an email to the user and dev lists once I get the website to
 update.

 


RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

2011-11-30 Thread Digy
OK, here is the code that can be compiled against .NET 2.0
http://pastebin.com/k2f7JfPd

DIGY


-Original Message-
From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com] 
Sent: Wednesday, November 30, 2011 9:26 PM
To: lucene-net-dev@lucene.apache.org
Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

DIGY,

Thanks for the tip, but could you be a little more specific?
Where and how are extension-method calls replaced?

For example, how would I change the CloseableThreadLocalExtensions method

public static void Set<T>(this ThreadLocal<T> t, T val)
{
    t.Value = val;
}


to eliminate the compile error

Error   2   Cannot define a new extension method because the compiler
required type 'System.Runtime.CompilerServices.ExtensionAttribute' cannot be
found. Are you missing a reference to System.Core.dll?

due to the Set<T> parameter "this ThreadLocal<T> t"?

 
- Neal



RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

2011-11-30 Thread Digy
If I recall correctly, the last memory-leak problem for 2.9.2 was reported in
~August by RavenDB, and it was fixed in 2.9.4(g).

DIGY

-Original Message-
From: Christopher Currens [mailto:currens.ch...@gmail.com] 
Sent: Wednesday, November 30, 2011 11:33 PM
To: lucene-net-dev@lucene.apache.org
Subject: Re: [Lucene.Net] Re: Memory Leak in 2.9.2.2

Trevor,

Unfortunately I was unable to reproduce the memory leak you're experiencing
in 2.9.2.  In particular, with byte[], of the 18,277 that were created, only
13 were not garbage collected, and it's likely that they are not related to
Lucene (it's possible they are static, and therefore would only be destroyed
with the AppDomain, outside of what the profiler can trace).  I tried to
emulate the code you showed us and there were no signs of any allocated
arrays that weren't cleaned up.  That doesn't mean there isn't one in your
code, but I just can't reproduce it with what you've shown us.  If it's
possible for you to write a small program that has the same behavior, that
could help us track it down.

As a side note, what was a little disconcerting, though, was that in 2.9.4,
with the same code, it created 28,565 byte[], and there were quite a few more
left uncollected (2,805 arrays).  The allocations are happening in
DocumentsWriter.ByteBlockAllocator; I'll have to look at it later, though,
to see if it's even a problem.


Thanks,
Christopher


On Wed, Nov 30, 2011 at 12:41 PM, Granroth, Neal V. 
neal.granr...@thermofisher.com wrote:

 Or maybe put the changes within a conditional compile code block?

 Thanks DIGY, works great.

 - Neal
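
(A sketch of the conditional-compile idea; "NET20" is a hypothetical symbol
defined only in the .NET 2.0 build configuration:)

    public static void SetValue<T>(ThreadLocal<T> t, T val)
    {
    #if NET20
        CloseableThreadLocalExtensions.Set(t, val); // plain static call
    #else
        t.Set(val); // extension-method call (requires System.Core)
    #endif
    }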

 -Original Message-
 From: Prescott Nasser [mailto:geobmx...@hotmail.com]
 Sent: Wednesday, November 30, 2011 2:35 PM
 To: lucene-net-dev@lucene.apache.org
 Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

 Probably makes for a good wiki entry

 Sent from my Windows Phone
 

RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

2011-11-30 Thread Digy
... and it was related to CloseableThreadLocal (fixed in 2.9.4(g)), which now
creates a compilation problem against .NET 2.0 :)

DIGY

-Original Message-
From: Digy [mailto:digyd...@gmail.com] 
Sent: Thursday, December 01, 2011 12:09 AM
To: 'lucene-net-dev@lucene.apache.org'
Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

If I recall correctly, the last memory-leak problem for 2.9.2 was reported in
~August by RavenDB, and it was fixed in 2.9.4(g).

DIGY


[Lucene.Net] December Board Report

2011-11-30 Thread Prescott Nasser

The December board report has been updated: 
http://wiki.apache.org/incubator/December2011

Please review and adjust as needed,

~P

Re: buildbots for PyLucene?

2011-11-30 Thread Andi Vajda

On Nov 29, 2011, at 18:04, Bill Janssen jans...@parc.com wrote:

 Andi Vajda va...@apache.org wrote:
 
 
 On Nov 29, 2011, at 15:18, Bill Janssen jans...@parc.com wrote:
 
  I've once again spent an hour building PyLucene, which gives me some
  sympathy for issue 10:
 
  https://issues.apache.org/jira/browse/PYLUCENE-10
 
  I was thinking about how to address this...
 
  One thing I've found useful at PARC is to set up buildbot tests for
  hard-to-package systems.  Basically, the test just waits for changes to
  the SCM repository, checks out the code, and tries to build.  A nice
  side-effect is that, when successful, it produces a binary for the build
  slave's platform.
 
  I'm unsure whether this would work for PyLucene.  The ASF build slaves
  seem pretty coarse-grained.  I see that there is an osx-slave, but
  there's no information about it (10.5? 10.6? 10.7?), no contact, and it's
  down.
 
  I know nothing about the Apache buildbots. Why not contribute buildbots for
  PyLucene at PARC?
 
 Because this is something the ASF should really address.  I'm happy to
 volunteer to set up a PyLucene build test on an ASF buildbot -- maybe
 more than one if it's easy to clone.

Given the bizarre netbsd(?) jail the Lucene Java bot is set up in, I can't 
imagine a multi-OS x multi-Java x multi-Python buildbot materializing anytime 
soon.
That being said, I really don't know what is and isn't available as 
infrastructure from the ASF for these kinds of things, so I might be completely 
wrong here.

Andi..

 
 Just looked at snakebite.org -- no OS X buildbots there, either.
 
 Bill
 
 
 Andi..
 
  A possibility would be to use the Python buildbots, but of course there's
  no assurance that Java is installed on any of them.
 
  Bill



[jira] [Commented] (LUCENE-3586) Choose a specific Directory implementation running the CheckIndex main

2011-11-30 Thread Luca Cavanna (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159894#comment-13159894
 ] 

Luca Cavanna commented on LUCENE-3586:
--

{quote}
Hmm, I don't think we should add an enum to FSDir here? Can we simply accept 
the class name and then just load that class (maybe prefixing oal.store so user 
doesn't have to type that all the time)?

Also, can we make it a hard error if the specified name isn't recognized? 
(Instead of silently falling back to FSDir.open).
{quote}

That's fine as well; just a little bit longer than writing NIOFS, MMAP or 
SIMPLE, but I guess it doesn't matter. Mike, do you mean to load the class 
using reflection, or to compare the input string to those three class names?

Any other opinion?

 Choose a specific Directory implementation running the CheckIndex main
 --

 Key: LUCENE-3586
 URL: https://issues.apache.org/jira/browse/LUCENE-3586
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Luca Cavanna
Assignee: Luca Cavanna
Priority: Minor
 Attachments: LUCENE-3586.patch


 It should be possible to choose a specific Directory implementation to use 
 during the CheckIndex process when we run it from its main.
 What about an additional main parameter?
 In fact, I'm experiencing some problems with MMapDirectory working with a big 
 segment, and after some failed attempts playing with maxChunkSize, I decided 
 to switch to another FSDirectory implementation but I needed to do that on my 
 own main.
 Should we also consider using a FileSwitchDirectory?
 I'm willing to contribute, could you please let me know your thoughts about 
 it?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-2930) Allow controlling an important PDF processing parameter in Tika that splits the words in text and is now supported in version 1.0 of Tika.

2011-11-30 Thread Ravish Bhagdev (Created) (JIRA)
Allow controlling an important PDF processing parameter in Tika that splits the 
words in text and is now supported in version 1.0 of Tika.
-

 Key: SOLR-2930
 URL: https://issues.apache.org/jira/browse/SOLR-2930
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Solr Cell (Tika extraction)
Affects Versions: 3.5
Reporter: Ravish Bhagdev


Tika 1.0 has fixed a major issue with processing and parsing of PDF files that 
was splitting the words incorrectly: 
https://issues.apache.org/jira/browse/TIKA-724

This causes text to be indexed incorrectly in Solr, and it becomes especially 
visible when using spellcheck features etc.

They have added a special parameter, set using setEnableAutoSpace, that fixes 
the problem, but there is currently no way of setting this when using Solr.  As 
discussed in the thread on the above issue, it would be nice if we could 
control this (and in future other) parameters via Solr configuration.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-3.x - Build # 11608 - Failure

2011-11-30 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/11608/

1 tests failed.
REGRESSION:  org.apache.lucene.search.TestSearcherManager.testIntermediateClose

Error Message:
java.lang.NullPointerException

Stack Trace:
junit.framework.AssertionFailedError: java.lang.NullPointerException
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
at 
org.apache.lucene.search.TestSearcherManager.testIntermediateClose(TestSearcherManager.java:248)
at 
org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:432)




Build Log (for compile errors):
[...truncated 7958 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2472) StatsComponent should support hierarchical facets

2011-11-30 Thread Dmitry Drozdov (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159947#comment-13159947
 ] 

Dmitry Drozdov commented on SOLR-2472:
--

Any chance for this to be merged into trunk?

 StatsComponent should support hierarchical facets
 -

 Key: SOLR-2472
 URL: https://issues.apache.org/jira/browse/SOLR-2472
 Project: Solr
  Issue Type: New Feature
Affects Versions: 3.1, 4.0
Reporter: Dmitry Drozdov
 Attachments: SOLR-2472.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 It is currently possible to get only a single layer of faceting in 
 StatsComponent.
 The proposal is to make it possible to specify the stats.facet parameter like 
 this:
 stats=true&stats.field=sField&stats.facet=fField1,fField2
 and get the response like this:
 <lst name="stats">
   <lst name="stats_fields">
     <lst name="sField">
       <double name="min">1.0</double>
       <double name="max">1.0</double>
       <double name="sum">4.0</double>
       <long name="count">4</long>
       <long name="missing">0</long>
       <double name="sumOfSquares"></double>
       <double name="mean"></double>
       <double name="stddev"></double>
       <lst name="facets">
         <lst name="fField1">
           <lst name="fField1Value1">
             <double name="min">1.0</double>
             <double name="max">1.0</double>
             <double name="sum">2.0</double>
             <long name="count">2</long>
             <long name="missing">0</long>
             <double name="sumOfSquares"></double>
             <double name="mean"></double>
             <double name="stddev"></double>
             <lst name="facets">
               <lst name="fField2">
                 <lst name="fField2Value1">
                   <double name="min">1.0</double>
                   <double name="max">1.0</double>
                   <double name="sum">1.0</double>
                   <long name="count">1</long>
                   <long name="missing">0</long>
                   <double name="sumOfSquares"></double>
                   <double name="mean"></double>
                   <double name="stddev"></double>
                 </lst>
                 <lst name="fField2Value2">
                   <double name="min">1.0</double>
                   <double name="max">1.0</double>
                   <double name="sum">1.0</double>
                   <long name="count">1</long>
                   <long name="missing">0</long>
                   <double name="sumOfSquares"></double>
                   <double name="mean"></double>
                   <double name="stddev"></double>
                 </lst>
               </lst>
             </lst>
           </lst>
           <lst name="fField1Value2">
             <double name="min">1.0</double>
             <double name="max">1.0</double>
             <double name="sum">2.0</double>
             <long name="count">2</long>
             <long name="missing">0</long>
             <double name="sumOfSquares"></double>
             <double name="mean"></double>
             <double name="stddev"></double>
             <lst name="facets">
               <lst name="fField2">
                 <lst name="fField2Value1">
                   <double name="min">1.0</double>
                   <double name="max">1.0</double>
                   <double name="sum">1.0</double>
                   <long name="count">1</long>
                   <long name="missing">0</long>
                   <double name="sumOfSquares"></double>
                   <double name="mean"></double>
                   <double name="stddev"></double>
                 </lst>
                 <lst name="fField2Value2">
                   <double name="min">1.0</double>
                   <double name="max">1.0</double>
                   <double name="sum">1.0</double>
                   <long name="count">1</long>
                   <long name="missing">0</long>
                   <double name="sumOfSquares"></double>
                   <double name="mean"></double>
                   <double name="stddev"></double>
                 </lst>
               </lst>
             </lst>
           </lst>
         </lst>
       </lst>
     </lst>
   </lst>
 </lst>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 1112 - Still Failing

2011-11-30 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/1112/

No tests ran.

Build Log (for compile errors):
[...truncated 12340 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3609) BooleanFilter changed behavior in 3.5, no longer acts as if minimum should match set to 1

2011-11-30 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159970#comment-13159970
 ] 

Uwe Schindler commented on LUCENE-3609:
---

Committed 3.x revision: 1208375

Now forward-porting

 BooleanFilter changed behavior in 3.5, no longer acts as if minimum should 
 match set to 1
 ---

 Key: LUCENE-3609
 URL: https://issues.apache.org/jira/browse/LUCENE-3609
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.5
Reporter: Shay Banon
Assignee: Uwe Schindler
 Attachments: LUCENE-3609.patch, LUCENE-3609.patch, LUCENE-3609.patch


 The change LUCENE-3446 causes a change in behavior in BooleanFilter. It used 
 to work as if minimum-should-match were set to 1 (in BQ lingo), but now, if 
 no should clauses match, the should clauses are ignored, and, for example, if 
 there is a must clause, only that one will be used and returned.
 For example, with a single must clause and a should clause where the should 
 clause matches nothing, the filter should match nothing, but it will match 
 whatever the must clause matches.
 The fix is simple: after iterating over the should clauses, if the aggregated 
 bitset is null, return null.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3609) BooleanFilter changed behavior in 3.5, no longer acts as if minimum should match set to 1

2011-11-30 Thread Uwe Schindler (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-3609.
---

   Resolution: Fixed
Fix Version/s: 4.0
   3.6

Committed trunk revision: 1208381

 BooleanFilter changed behavior in 3.5, no longer acts as if minimum should 
 match set to 1
 ---

 Key: LUCENE-3609
 URL: https://issues.apache.org/jira/browse/LUCENE-3609
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.5
Reporter: Shay Banon
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3609.patch, LUCENE-3609.patch, LUCENE-3609.patch


 The change LUCENE-3446 causes a change in behavior in BooleanFilter. It used 
 to work as if minimum-should-match were set to 1 (in BQ lingo), but now, if 
 no should clauses match, the should clauses are ignored, and, for example, if 
 there is a must clause, only that one will be used and returned.
 For example, with a single must clause and a should clause where the should 
 clause matches nothing, the filter should match nothing, but it will match 
 whatever the must clause matches.
 The fix is simple: after iterating over the should clauses, if the aggregated 
 bitset is null, return null.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3607) Lucene Index files can not be reproduced faithfully (due to timestamps embedded)

2011-11-30 Thread Martin Oberhuber (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159983#comment-13159983
 ] 

Martin Oberhuber commented on LUCENE-3607:
--

Hi all,

thanks for the many comments. I understand that there's no desire to change 
behavior that's been working (and documented!) for years.

What about a different approach: would it be possible to write a small Java 
main that normalizes an index, very much like stripping an EXE?  That way I 
could postprocess my indexes (which are meant for distribution with our 
product), but at its core Lucene could continue working as it does today.

Regarding some other comments,

- Our main reason for shipping a pre-built index is initial search 
performance. In a large Eclipse-based product, generating the docs index on 
initial search can take approximately 4 minutes on a decent computer. With 
everything pre-indexed, initial search can proceed after 10 seconds. That's an 
important usability issue for our help system. Another reason is the desire to 
find any index-building errors at build time (where we can investigate them) 
rather than at runtime.

- We do have both the build environment and the deployment environment under 
full control (same Lucene version, same JVM version, same ICU version; all our 
content is en_US).

- Regarding heuristics: sure, the search is heuristic at runtime, but that's a 
very different thing from having the build environment be heuristic. Having 
identical input produce identical output is still desirable.

- The issue of different analyzers used at index-generation time vs. runtime 
has indeed bitten us in the past (see 
[[https://bugs.eclipse.org/bugs/show_bug.cgi?id=219928#c16]]). In my personal 
opinion, the choice of analyzer should be bound to the content, and not to the 
search environment, since in many cases the language of the search string will 
not be known, but the language of the documents / index is known. Right now, 
the best workaround for this at Eclipse is launching Eclipse with a "-nl 
en_US" argument to force the US locale when I know all the docs are US... but 
that won't work at all in an environment where some docs are English and 
others are German, a very common scenario with software products on Eclipse 
(the main product may be localized but some plugins are not).

Is that analyzer binding-to-content vs. binding-to-search issue known and 
discussed at Lucene already?  I.e., is it possible to have parts of the index 
(the US one) searched with a US analyzer but other parts (the German one) with 
a German analyzer?  And why does the German analyzer truncate words at "." 
while the US one does not (see 
[[https://bugs.eclipse.org/bugs/show_bug.cgi?id=219928#c18]])?

 Lucene Index files can not be reproduced faithfully (due to timestamps 
 embedded)
 

 Key: LUCENE-3607
 URL: https://issues.apache.org/jira/browse/LUCENE-3607
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 2.9.1
 Environment: Eclipse 3.7
Reporter: Martin Oberhuber
Assignee: Michael McCandless

 Eclipse 3.7 uses Lucene 2.9.1 for indexing online help content. A 
 pre-generated help index can be shipped together with online content. As per
[[https://bugs.eclipse.org/bugs/show_bug.cgi?id=364979 ]]
 it turns out that the help index can not be faithfully reproduced during a 
 build, because there are timestamps embedded in the index files, and the 
 NameCounter field in segments_2 contains different contents on every build.
 Not being able to faithfully reproduce the index from identical source bits 
 undermines trust in the index (and software delivery) being correct.
 I'm wondering whether this is a known issue and/or has been addressed in a 
 newer Lucene version already ?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2280) commitWithin ignored for a delete query

2011-11-30 Thread Jan Høydahl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159994#comment-13159994
 ] 

Jan Høydahl commented on SOLR-2280:
---

I also plan to add support for the convenience methods deleteById(String id, 
int commitWithinMs) etc. in SolrJ, the same way as for adds.

 commitWithin ignored for a delete query
 ---

 Key: SOLR-2280
 URL: https://issues.apache.org/jira/browse/SOLR-2280
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Reporter: David Smiley
Assignee: Jan Høydahl
Priority: Minor
 Fix For: 3.6, 4.0

 Attachments: SOLR-2280-3x.patch, SOLR-2280.patch, SOLR-2280.patch, 
 SOLR-2280.patch


 The commitWithin option on an UpdateRequest is only honored for requests 
 containing new documents.  It does not, for example, work with a delete 
 query.  The following doesn't work as expected:
 {code:java}
 UpdateRequest request = new UpdateRequest();
 request.deleteById("id123");
 request.setCommitWithin(1000);
 solrServer.request(request);
 {code}
 In my opinion, the commitWithin attribute should be permitted on the 
 <delete/> xml tag as well as <add/>.  Such a change would go in 
 XMLLoader.java and it would have some ramifications elsewhere too.  Once 
 this is done, UpdateRequest.getXml() can be updated to generate the 
 right XML.
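 (A hypothetical sketch of the XML form this proposal implies; the attribute 
 placement is an assumption, not committed syntax:)
 {code:xml}
 <delete commitWithin="1000">
   <id>id123</id>
 </delete>
 {code}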

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3607) Lucene Index files can not be reproduced faithfully (due to timestamps embedded)

2011-11-30 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159997#comment-13159997
 ] 

Robert Muir commented on LUCENE-3607:
-

{quote}
Is that analyzer binding to content vs. binding to search issue known and 
discussed at Lucene already ? 
{quote}

No, because it's an Eclipse bug.  You can set analyzers however you want in 
Lucene; we don't enforce anything.

{quote}
And, why does the German analyzer truncate words at . while the US one does 
not (See [https://bugs.eclipse.org/bugs/show_bug.cgi?id=219928#c18]) ?
{quote}

Because you are using an ancient version of lucene.

 Lucene Index files can not be reproduced faithfully (due to timestamps 
 embedded)
 

 Key: LUCENE-3607
 URL: https://issues.apache.org/jira/browse/LUCENE-3607
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 2.9.1
 Environment: Eclipse 3.7
Reporter: Martin Oberhuber
Assignee: Michael McCandless

 Eclipse 3.7 uses Lucene 2.9.1 for indexing online help content. A 
 pre-generated help index can be shipped together with online content. As per
[[https://bugs.eclipse.org/bugs/show_bug.cgi?id=364979 ]]
 it turns out that the help index can not be faithfully reproduced during a 
 build, because there are timestamps embedded in the index files, and the 
 NameCounter field in segments_2 contains different contents on every build.
 Not being able to faithfully reproduce the index from identical source bits 
 undermines trust in the index (and software delivery) being correct.
 I'm wondering whether this is a known issue and/or has been addressed in a 
 newer Lucene version already ?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3607) Lucene Index files can not be reproduced faithfully (due to timestamps embedded)

2011-11-30 Thread Robert Muir (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3607.
-

Resolution: Won't Fix

 Lucene Index files can not be reproduced faithfully (due to timestamps 
 embedded)
 

 Key: LUCENE-3607
 URL: https://issues.apache.org/jira/browse/LUCENE-3607
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 2.9.1
 Environment: Eclipse 3.7
Reporter: Martin Oberhuber
Assignee: Michael McCandless

 Eclipse 3.7 uses Lucene 2.9.1 for indexing online help content. A 
 pre-generated help index can be shipped together with online content. As per
[[https://bugs.eclipse.org/bugs/show_bug.cgi?id=364979 ]]
 it turns out that the help index can not be faithfully reproduced during a 
 build, because there are timestamps embedded in the index files, and the 
 NameCounter field in segments_2 contains different contents on every build.
 Not being able to faithfully reproduce the index from identical source bits 
 undermines trust in the index (and software delivery) being correct.
 I'm wondering whether this is a known issue and/or has been addressed in a 
 newer Lucene version already ?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2011-11-30 Thread Eric Pugh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160006#comment-13160006
 ] 

Eric Pugh commented on SOLR-1972:
-

Has anyone had thoughts on how to do this via a component that is less 
intrusive than modifying RequestHandlerBase?  I'd love to do this via a 
component that I could compile as a standalone project and then drop into my 
existing Solr.

Also, I am only interested in a certain subset of queries, so I added a 
collection of regex patterns that are used to test against the query string 
to see if it should be included in the rolling statistics.  I will upload the 
patch.  Also fixed the patch to work against the latest trunk.

 Need additional query stats in admin interface - median, 95th and 99th 
 percentile
 -

 Key: SOLR-1972
 URL: https://issues.apache.org/jira/browse/SOLR-1972
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Shawn Heisey
Priority: Minor
 Attachments: SOLR-1972-url_pattern.patch, SOLR-1972.patch, 
 SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, elyograg-1972-3.2.patch, 
 elyograg-1972-3.2.patch, elyograg-1972-trunk.patch, elyograg-1972-trunk.patch


 I would like to see more detailed query statistics from the admin GUI.  This 
 is what you can get now:
 requests : 809
 errors : 0
 timeouts : 0
 totalTime : 70053
 avgTimePerRequest : 86.59209
 avgRequestsPerSecond : 0.8148785 
 I'd like to see more data on the time per request - median, 95th percentile, 
 99th percentile, and any other statistical function that makes sense to 
 include.  In my environment, the first bunch of queries after startup tend to 
 take several seconds each.  I find that the average value tends to be useless 
 until it has several thousand queries under its belt and the caches are 
 thoroughly warmed.  The statistical functions I have mentioned would quickly 
 eliminate the influence of those initial slow queries.
 The system will have to store individual data about each query.  I don't know 
 if this is something Solr does already.  It would be nice to have a 
 configurable count of how many of the most recent data points are kept, to 
 control the amount of memory the feature uses.  The default value could be 
 something like 1024 or 4096.
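 (The bookkeeping described in the last paragraph is essentially a bounded 
 ring buffer of recent request times; a language-neutral illustration, 
 sketched here in C# rather than Solr's own code:)
 {code}
 class RollingStats
 {
     private readonly double[] buffer;   // the N most recent request times
     private int count, next;

     public RollingStats(int capacity) { buffer = new double[capacity]; }

     public void Record(double elapsedMs)
     {
         buffer[next] = elapsedMs;            // overwrite the oldest entry
         next = (next + 1) % buffer.Length;
         if (count < buffer.Length) count++;
     }

     // Nearest-rank percentile over the current window, e.g. Percentile(95).
     public double Percentile(double p)
     {
         if (count == 0) return 0.0;          // no data recorded yet
         double[] window = new double[count];
         System.Array.Copy(buffer, window, count);
         System.Array.Sort(window);           // arrival order is irrelevant
         int rank = (int)System.Math.Ceiling(p / 100.0 * count) - 1;
         return window[rank < 0 ? 0 : rank];
     }
 }
 {code}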

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile

2011-11-30 Thread Eric Pugh (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Pugh updated SOLR-1972:


Attachment: SOLR-1972-url_pattern.patch

Updated to latest trunk, added regex patterns.

 Need additional query stats in admin interface - median, 95th and 99th 
 percentile
 -

 Key: SOLR-1972
 URL: https://issues.apache.org/jira/browse/SOLR-1972
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Shawn Heisey
Priority: Minor
 Attachments: SOLR-1972-url_pattern.patch, SOLR-1972.patch, 
 SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, elyograg-1972-3.2.patch, 
 elyograg-1972-3.2.patch, elyograg-1972-trunk.patch, elyograg-1972-trunk.patch


 I would like to see more detailed query statistics from the admin GUI.  This 
 is what you can get now:
 requests : 809
 errors : 0
 timeouts : 0
 totalTime : 70053
 avgTimePerRequest : 86.59209
 avgRequestsPerSecond : 0.8148785 
 I'd like to see more data on the time per request - median, 95th percentile, 
 99th percentile, and any other statistical function that makes sense to 
 include.  In my environment, the first bunch of queries after startup tend to 
 take several seconds each.  I find that the average value tends to be useless 
 until it has several thousand queries under its belt and the caches are 
 thoroughly warmed.  The statistical functions I have mentioned would quickly 
 eliminate the influence of those initial slow queries.
 The system will have to store individual data about each query.  I don't know 
 if this is something Solr does already.  It would be nice to have a 
 configurable count of how many of the most recent data points are kept, to 
 control the amount of memory the feature uses.  The default value could be 
 something like 1024 or 4096.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-3.x #316: POMs out of sync

2011-11-30 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-3.x/316/

2 tests failed.
FAILED:  
org.apache.lucene.queryParser.standard.TestNumericQueryParser.org.apache.lucene.queryParser.standard.TestNumericQueryParser

Error Message:
Unparseable date: 1867.06.20 6時34分09秒 CET  9 +0100 1867

Stack Trace:
java.text.ParseException: Unparseable date: 1867.06.20 6時34分09秒 CET  9 +0100 
1867
at java.text.DateFormat.parse(DateFormat.java:354)
at 
org.apache.lucene.queryParser.standard.TestNumericQueryParser.checkDateFormatSanity(TestNumericQueryParser.java:88)
at 
org.apache.lucene.queryParser.standard.TestNumericQueryParser.beforeClass(TestNumericQueryParser.java:145)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at 
org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at 
org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164)
at 
org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110)
at 
org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:175)
at 
org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:107)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68)


FAILED:  
org.apache.lucene.queryParser.standard.TestNumericQueryParser.org.apache.lucene.queryParser.standard.TestNumericQueryParser

Error Message:
null

Stack Trace:
java.lang.NullPointerException
at 
org.apache.lucene.queryParser.standard.TestNumericQueryParser.afterClass(TestNumericQueryParser.java:495)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at 
org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at 
org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164)
at 
org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110)
at 
org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:175)
at 
org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:107)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68)




Build Log (for compile errors):
[...truncated 26196 lines...]




[jira] [Commented] (SOLR-2930) Allow controlling an important PDF processing parameter in Tika that splits the words in text and is now supported in version 1.0 of Tika.

2011-11-30 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160034#comment-13160034
 ] 

Robert Muir commented on SOLR-2930:
---

I think the most important piece is that this parameter is *off* by default.

For a search engine, if some bold content gets duplicated... there could really 
be worse things.

But if spaces get incorrectly added to words, that's going to mess up 
tokenization.

 Allow controlling an important PDF processing parameter in Tika that splits 
 the words in text and is now supported in version 1.0 of Tika.
 -

 Key: SOLR-2930
 URL: https://issues.apache.org/jira/browse/SOLR-2930
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Solr Cell (Tika extraction)
Affects Versions: 3.5
Reporter: Ravish Bhagdev
  Labels: pdf, text-splitting, tika,

 Tika 1.0 has fixed a major issue with processing and parsing of PDF files 
 that was splitting the words incorrectly: 
 https://issues.apache.org/jira/browse/TIKA-724
 This causes text to be indexed incorrectly in Solr, and it becomes especially 
 visible when using spellcheck features etc.  
 They have added a special parameter set using setEnableAutoSpace that fixes 
 the problem but there is currently no way of setting this when using Solr.  
 As discussed in thread on above issue, it would be nice if we could control 
 this (and in future other) parameter via Solr configuration.
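 A hedged sketch of toggling this through Tika directly, using the 
 setEnableAutoSpace name given above (illustrative; Solr would need to pass 
 this from configuration, which is what this issue requests):
{code:java}
// Illustrative only: call Tika's PDF parser directly so the auto-space flag
// can be toggled; in Solr this would have to come from configuration instead.
import java.io.InputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;

public class PdfAutoSpaceSketch {
    public static String extract(InputStream pdf) throws Exception {
        PDFParser parser = new PDFParser();
        parser.setEnableAutoSpace(false);   // the parameter this issue wants exposed
        BodyContentHandler text = new BodyContentHandler(-1); // no write limit
        parser.parse(pdf, text, new Metadata(), new ParseContext());
        return text.toString();
    }
}
{code}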

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Stats per group with StatsComponent?

2011-11-30 Thread Morten Lied Johansen


Hi

I posted the below mail to the solr-user list a little over a week ago. 
Since there has been no response, we assume this means that what we need 
is not currently possible.


We need this functionality, and are willing to put in time and effort to 
implement it, but could use some pointers to where it would be natural 
to add this, and ideas for how to best solve it.


I'm also wondering if I should create an issue in JIRA right away, or if 
I should wait until we have a first patch ready?



 Original Message 
Subject: Stats per group with StatsComponent?
Date: Tue, 22 Nov 2011 14:40:45 +0100
From: Morten Lied Johansen morte...@ifi.uio.no
Reply-To: solr-u...@lucene.apache.org
To: solr-u...@lucene.apache.org


Hi

We need to get minimum and maximum values for a field, within a group in
a grouped search-result. Is this possible today, perhaps by using
StatsComponent some way?

I'll flesh out the example a little, to make the question clearer.

We have a number of documents, indexed with a price, date and a hotel.
For each hotel, there are a number of documents, each representing a
price/date combination. We then group our search result on hotel.

We want to show the minimum and maximum price for each hotel.

A little googling leads us to look at StatsComponent, as what it does
would be what we need, if it could be done for each group. There was a
thread on this list in August, "Grouping and performing statistics per
group", that seemed to go into this a bit, but didn't find a solution.

Is this possible in Solr 3.4, either with StatsComponent, or some other way?

--
Morten
We all live in a yellow subroutine.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2930) Allow controlling an important PDF processing parameter in Tika that splits the words in text and is now supported in version 1.0 of Tika.

2011-11-30 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160036#comment-13160036
 ] 

Robert Muir commented on SOLR-2930:
---

My bad, I confused this bug with the pdfbox 'character deletion' 
one (TIKA-767); that's still unfortunately not in Tika 1.0, it seems.


 Allow controlling an important PDF processing parameter in Tika that splits 
 the words in text and is now supported in version 1.0 of Tika.
 -

 Key: SOLR-2930
 URL: https://issues.apache.org/jira/browse/SOLR-2930
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Solr Cell (Tika extraction)
Affects Versions: 3.5
Reporter: Ravish Bhagdev
  Labels: pdf, text-splitting, tika,

 Tika 1.0 has fixed a major issue with processing and parsing of PDF files 
 that was splitting the words incorrectly: 
 https://issues.apache.org/jira/browse/TIKA-724
 This causes text to be indexed incorrectly in Solr, and it becomes especially 
 visible when using spellcheck features etc.  
 They have added a special parameter set using setEnableAutoSpace that fixes 
 the problem but there is currently no way of setting this when using Solr.  
 As discussed in thread on above issue, it would be nice if we could control 
 this (and in future other) parameter via Solr configuration.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2472) StatsComponent should support hierarchical facets

2011-11-30 Thread Erick Erickson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160046#comment-13160046
 ] 

Erick Erickson commented on SOLR-2472:
--

This patch no longer applies cleanly.

I'll volunteer to shepherd this through the commit process if:

1) we can get some consensus that this is a good thing to do.
2) you update it to apply cleanly, and provide some unit tests; 
StatsComponentTest might be the place to start.

It's probably worthwhile to get consensus before spending time working on the 
patch; could you outline the use-case for this functionality?

 StatsComponent should support hierarchical facets
 -

 Key: SOLR-2472
 URL: https://issues.apache.org/jira/browse/SOLR-2472
 Project: Solr
  Issue Type: New Feature
Affects Versions: 3.1, 4.0
Reporter: Dmitry Drozdov
 Attachments: SOLR-2472.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 It is currently possible to get only a single layer of faceting in 
 StatsComponent.
 The proposal is to make it possible to specify the stats.facet parameter like 
 this:
 stats=true&stats.field=sField&stats.facet=fField1,fField2
 and get the response like this:
 <lst name="stats">
   <lst name="stats_fields">
     <lst name="sField">
       <double name="min">1.0</double>
       <double name="max">1.0</double>
       <double name="sum">4.0</double>
       <long name="count">4</long>
       <long name="missing">0</long>
       <double name="sumOfSquares"></double>
       <double name="mean"></double>
       <double name="stddev"></double>
       <lst name="facets">
         <lst name="fField1">
           <lst name="fField1Value1">
             <double name="min">1.0</double>
             <double name="max">1.0</double>
             <double name="sum">2.0</double>
             <long name="count">2</long>
             <long name="missing">0</long>
             <double name="sumOfSquares"></double>
             <double name="mean"></double>
             <double name="stddev"></double>
             <lst name="facets">
               <lst name="fField2">
                 <lst name="fField2Value1">
                   <double name="min">1.0</double>
                   <double name="max">1.0</double>
                   <double name="sum">1.0</double>
                   <long name="count">1</long>
                   <long name="missing">0</long>
                   <double name="sumOfSquares"></double>
                   <double name="mean"></double>
                   <double name="stddev"></double>
                 </lst>
                 <lst name="fField2Value2">
                   <double name="min">1.0</double>
                   <double name="max">1.0</double>
                   <double name="sum">1.0</double>
                   <long name="count">1</long>
                   <long name="missing">0</long>
                   <double name="sumOfSquares"></double>
                   <double name="mean"></double>
                   <double name="stddev"></double>
                 </lst>
               </lst>
             </lst>
           </lst>
           <lst name="fField1Value2">
             <double name="min">1.0</double>
             <double name="max">1.0</double>
             <double name="sum">2.0</double>
             <long name="count">2</long>
             <long name="missing">0</long>
             <double name="sumOfSquares"></double>
             <double name="mean"></double>
             <double name="stddev"></double>
             <lst name="facets">
               <lst name="fField2">
                 <lst name="fField2Value1">
                   <double name="min">1.0</double>
                   <double name="max">1.0</double>
                   <double name="sum">1.0</double>
                   <long name="count">1</long>
                   <long name="missing">0</long>
                   <double name="sumOfSquares"></double>
                   <double name="mean"></double>
                   <double name="stddev"></double>
                 </lst>
                 <lst name="fField2Value2">
                   <double name="min">1.0</double>
                   <double name="max">1.0</double>
                   <double name="sum">1.0</double>
                   <long name="count">1</long>
                   <long name="missing">0</long>
                   <double name="sumOfSquares"></double>
                   <double name="mean"></double>
                   <double name="stddev"></double>
                 </lst>
               </lst>
             </lst>
           </lst>
         </lst>
       </lst>
     </lst>
   </lst>
 </lst>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-3.x-java7 - Build # 1121 - Failure

2011-11-30 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x-java7/1121/

1 tests failed.
REGRESSION:  
org.apache.solr.client.solrj.embedded.LargeVolumeBinaryJettyTest.testMultiThreaded

Error Message:
java.lang.AssertionError: Some threads threw uncaught exceptions!

Stack Trace:
java.lang.RuntimeException: java.lang.AssertionError: Some threads threw 
uncaught exceptions!
at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:571)
at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:96)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
at 
org.apache.lucene.util.LuceneTestCase.checkUncaughtExceptionsAfter(LuceneTestCase.java:599)
at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:543)




Build Log (for compile errors):
[...truncated 15111 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2929) TermsComponent Adding entries

2011-11-30 Thread maillard (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160056#comment-13160056
 ] 

maillard commented on SOLR-2929:


Thank you for the response.
I understand your response.
I have tried to flush and commit after my update.
I have played around with the mergeFactor set to 2.
I have played with the maxPendingDeletes, all without success.
How can I be sure of, and/or force, the deletion of these marked docs? In other 
words, how do I make sure that my TermsComponent is a correct view of the 
index (without any docs marked for deletion) at a given time? 


 TermsComponent Adding entries
 -

 Key: SOLR-2929
 URL: https://issues.apache.org/jira/browse/SOLR-2929
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 3.3, 3.4
 Environment: solr 3.x
Reporter: maillard
Priority: Minor

 When indexing multiple documents in one go and then updating one of the 
 documents in a later process, the TermsComponent count gets wrongly incremented.
 For example, indexing two documents with a country field as such:
 <add>
   <doc>
     <field name="COUNTRY">US</field>
     <field name="ID">L20110121151204207</field>
   </doc>
   <doc>
     <field name="COUNTRY">Canada</field>
     <field name="ID">L20110121151204208</field>
   </doc>
 </add>
 TermsComponent returns:
  US(1)
  Canada(1)
 Update the first document:
 <add>
   <doc>
     <field name="COUNTRY">US</field>
     <field name="ID">L20110121151204207</field>
   </doc>
 </add>
 TermsComponent returns:
  US(2)
  Canada(1)
 There still are only two documents in the index.
 This does not happen when only dealing with a single doc, or when you update 
 the same set of documents you initially indexed.
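 For reference, a hedged Lucene 3.x sketch of forcing marked deletions out of 
 the segments, after which term counts reflect only live documents (the index 
 path is illustrative; a Solr optimize triggers the same merging):
{code:java}
// Hedged sketch (Lucene 3.x): term statistics such as docFreq still count
// documents that are merely marked deleted. Merging those deletes away makes
// the counts reflect live documents again.
import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class ExpungeDeletesSketch {
    public static void main(String[] args) throws Exception {
        FSDirectory dir = FSDirectory.open(new File("/path/to/index")); // illustrative path
        IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_33,
                new StandardAnalyzer(Version.LUCENE_33));
        IndexWriter writer = new IndexWriter(dir, cfg);
        writer.expungeDeletes(); // merge away the segments' deleted docs
        writer.close();
    }
}
{code}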

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Stats per group with StatsComponent?

2011-11-30 Thread Martijn v Groningen
Hi Morten,

I missed your question on the user mailing list. Here is my answer:

With the StatsComponent this isn't possible at the moment. The
StatsComponent will give you the min / max of field for the whole
query result.
If you want the min / max value per group you'll need to do some
coding. The grouping logic is executed inside Lucene collectors
located in the
grouping module. You'll need to create a new second pass collector
that computes the min / max for the top N groups. This collector then
needs to
be wired up in Solr. The AbstractSecondPassGroupingCollector is
something you can take a look at. It collects the top documents for
the top N groups.

You don't need to have a patch to open an issue. Just open an issue
with a good description and maybe some implementation details.

Martijn

On 30 November 2011 14:25, Morten Lied Johansen morte...@ifi.uio.no wrote:

 Hi

 I posted the below mail to the solr-user list a little over a week ago.
 Since there has been no response, we assume this means that what we need is
 not currently possible.

 We need this functionality, and are willing to put in time and effort to
 implement it, but could use some pointers to where it would be natural to
 add this, and ideas for how to best solve it.

 I'm also wondering if I should create an issue in JIRA right away, or if I
 should wait until we have a first patch ready?


  Original Message 
 Subject: Stats per group with StatsComponent?
 Date: Tue, 22 Nov 2011 14:40:45 +0100
 From: Morten Lied Johansen morte...@ifi.uio.no
 Reply-To: solr-u...@lucene.apache.org
 To: solr-u...@lucene.apache.org


 Hi

 We need to get minimum and maximum values for a field, within a group in
 a grouped search-result. Is this possible today, perhaps by using
 StatsComponent some way?

 I'll flesh out the example a little, to make the question clearer.

 We have a number of documents, indexed with a price, date and a hotel.
 For each hotel, there are a number of documents, each representing a
 price/date combination. We then group our search result on hotel.

 We want to show the minimum and maximum price for each hotel.

 A little googling leads us to look at StatsComponent, as what it does
 would be what we need, if it could be done for each group. There was a
 thread on this list in August, "Grouping and performing statistics per
 group", that seemed to go into this a bit, but didn't find a solution.

 Is this possible in Solr 3.4, either with StatsComponent, or some other way?

 --
 Morten
 We all live in a yellow subroutine.

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




-- 
Met vriendelijke groet,

Martijn van Groningen

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Stats per group with StatsComponent?

2011-11-30 Thread Morten Lied Johansen

On 30. nov. 2011 14:58, Martijn v Groningen wrote:





With the StatsComponent this isn't possible at the moment. The
StatsComponent will give you the min / max of field for the whole
query result.
If you want the min / max value per group you'll need to do some
coding. The grouping logic is executed inside Lucene collectors
located in the grouping module. You'll need to create a new second
pass collector that computes the min / max for the top N groups. This
collector then needs to be wired up in Solr. The
AbstractSecondPassGroupingCollector is something you can take a look
at. It collects the top documents for the top N groups.


Thank you for your reply. We'll have a look at this and see if we can 
get something going this week.



You don't need to have a patch to open an issue. Just open an issue
with a good description and maybe some implementation details.


I have created an issue, SOLR-2931. Let me know if I should add some 
more details to it. We will update it and follow any discussions as we work.


--
Morten
We all live in a yellow subroutine.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-2931) Statistics/aggregated values per group in a grouped response

2011-11-30 Thread Morten Lied Johansen (Created) (JIRA)
Statistics/aggregated values per group in a grouped response


 Key: SOLR-2931
 URL: https://issues.apache.org/jira/browse/SOLR-2931
 Project: Solr
  Issue Type: New Feature
Reporter: Morten Lied Johansen


We need to get minimum and maximum values for a field, within a group in a 
grouped search-result.

I'll flesh out our use-case a little to make our needs clearer:

We have a number of documents, indexed with a price, date and a hotel. For each 
hotel, there are a number of documents, each representing a price/date 
combination. We then group our search result on hotel. We want to show the 
minimum and maximum price for each hotel.

Other use-cases could be to calculate an average or a sum within a group.


We plan to work on this in the coming weeks, and will be supplying patches.
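A Solr-independent sketch of the requested semantics: one pass over (hotel, 
price) pairs keeping min and max per group (plain Java, purely illustrative, 
not existing Solr or Lucene code):
{code:java}
import java.util.HashMap;
import java.util.Map;

public class GroupMinMax {
    // Returns hotel -> {min, max} over parallel arrays of group keys and values.
    public static Map<String, double[]> aggregate(String[] hotels, double[] prices) {
        Map<String, double[]> stats = new HashMap<String, double[]>();
        for (int i = 0; i < hotels.length; i++) {
            double[] mm = stats.get(hotels[i]);
            if (mm == null) {
                stats.put(hotels[i], new double[] { prices[i], prices[i] });
            } else {
                mm[0] = Math.min(mm[0], prices[i]);   // group minimum
                mm[1] = Math.max(mm[1], prices[i]);   // group maximum
            }
        }
        return stats;
    }
}
{code}
In Solr itself this aggregation would live inside the grouping collectors, as 
discussed on the dev list thread above.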


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Stats per group with StatsComponent?

2011-11-30 Thread Martijn v Groningen
Looks fine!

Martijn

On 30 November 2011 15:25, Morten Lied Johansen morte...@ifi.uio.no wrote:
 On 30. nov. 2011 14:58, Martijn v Groningen wrote:



 With the StatsComponent this isn't possible at the moment. The
 StatsComponent will give you the min / max of field for the whole
 query result.
 If you want the min / max value per group you'll need to do some
 coding. The grouping logic is executed inside Lucene collectors
 located in the grouping module. You'll need to create a new second
 pass collector that computes the min / max for the top N groups. This
 collector then needs to be wired up in Solr. The
 AbstractSecondPassGroupingCollector is something you can take a look
 at. It collects the top documents for the top N groups.


 Thank you for your reply. We'll have a look at this and see if we can get
 something going this week.


 You don't need to have a patch to open an issue. Just open an issue
 with a good description and maybe some implementation details.


 I have created an issue, SOLR-2931. Let me know if I should add some more
 details to it. We will update it and follow any discussions as we work.


 --
 Morten
 We all live in a yellow subroutine.

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




-- 
Met vriendelijke groet,

Martijn van Groningen

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #316: POMs out of sync

2011-11-30 Thread Robert Muir
This looks like a localization bug. Is it possible to get the seed or
more information on this test failure?

Did maven truncate the test output, or is there a bug in LuceneTestCase
where it's not providing the reproduce-with line (hopefully) that it
should if beforeClass() throws an exception?

I looked at the console log and there wasn't any failure information there...

On Wed, Nov 30, 2011 at 7:59 AM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 Build: https://builds.apache.org/job/Lucene-Solr-Maven-3.x/316/

 2 tests failed.
 FAILED:  
 org.apache.lucene.queryParser.standard.TestNumericQueryParser.org.apache.lucene.queryParser.standard.TestNumericQueryParser

 Error Message:
 Unparseable date: 1867.06.20 6時34分09秒 CET  9 +0100 1867

 Stack Trace:
 java.text.ParseException: Unparseable date: 1867.06.20 6時34分09秒 CET  9 +0100 
 1867
        at java.text.DateFormat.parse(DateFormat.java:354)
        at 
 org.apache.lucene.queryParser.standard.TestNumericQueryParser.checkDateFormatSanity(TestNumericQueryParser.java:88)
        at 
 org.apache.lucene.queryParser.standard.TestNumericQueryParser.beforeClass(TestNumericQueryParser.java:145)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
        at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
        at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
        at 
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27)
        at 
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
        at 
 org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
        at 
 org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)
        at 
 org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at 
 org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164)
        at 
 org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110)
        at 
 org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:175)
        at 
 org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:107)
        at 
 org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68)


 FAILED:  
 org.apache.lucene.queryParser.standard.TestNumericQueryParser.org.apache.lucene.queryParser.standard.TestNumericQueryParser

 Error Message:
 null

 Stack Trace:
 java.lang.NullPointerException
        at 
 org.apache.lucene.queryParser.standard.TestNumericQueryParser.afterClass(TestNumericQueryParser.java:495)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
        at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
        at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
        at 
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
        at 
 org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
        at 
 org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)
        at 
 org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at 
 org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164)
        at 
 

RE: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #316: POMs out of sync

2011-11-30 Thread Steven A Rowe
I looked at the Surefire report 
https://builds.apache.org/job/Lucene-Solr-Maven-3.x/ws/checkout/lucene/build/contrib/queryparser/surefire-reports/TEST-org.apache.lucene.queryParser.standard.TestNumericQueryParser.xml/*view*/
 and I don't see any more information.

 -Original Message-
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Wednesday, November 30, 2011 9:55 AM
 To: dev@lucene.apache.org
 Subject: Re: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #316: POMs out of sync
 
 This looks like a localization bug. is it possible to get the seed or
 more information on this test fail?
 
 Did maven truncate the test output or is there a bug in LuceneTestCase
 where its not providing the reproduce-with (hopefully) that it
 should if beforeClass() throws an exception?
 
 I looked at the console log and there wasn't any failure information
 there...
 
 On Wed, Nov 30, 2011 at 7:59 AM, Apache Jenkins Server
 jenk...@builds.apache.org wrote:
  Build: https://builds.apache.org/job/Lucene-Solr-Maven-3.x/316/
 
  2 tests failed.
 
 FAILED:  org.apache.lucene.queryParser.standard.TestNumericQueryParser.org
 .apache.lucene.queryParser.standard.TestNumericQueryParser
 
  Error Message:
  Unparseable date: 1867.06.20 6時34分09秒 CET  9 +0100 1867
 
  Stack Trace:
  java.text.ParseException: Unparseable date: 1867.06.20 6時34分09秒 CET  9
 +0100 1867
         at java.text.DateFormat.parse(DateFormat.java:354)
         at
 org.apache.lucene.queryParser.standard.TestNumericQueryParser.checkDateFor
 matSanity(TestNumericQueryParser.java:88)
         at
 org.apache.lucene.queryParser.standard.TestNumericQueryParser.beforeClass(
 TestNumericQueryParser.java:145)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:
 57)
         at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm
 pl.java:43)
         at java.lang.reflect.Method.invoke(Method.java:616)
         at
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMetho
 d.java:44)
         at
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable
 .java:15)
         at
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.
 java:41)
         at
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:
 27)
         at
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31
 )
         at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
         at
 org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:
 53)
         at
 org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provi
 der.java:123)
         at
 org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java
 :104)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:
 57)
         at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm
 pl.java:43)
         at java.lang.reflect.Method.invoke(Method.java:616)
         at
 org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(Refle
 ctionUtils.java:164)
         at
 org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(Prov
 iderFactory.java:110)
         at
 org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireSt
 arter.java:175)
         at
 org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenFor
 ked(SurefireStarter.java:107)
         at
 org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68)
 
 
 
 FAILED:  org.apache.lucene.queryParser.standard.TestNumericQueryParser.org
 .apache.lucene.queryParser.standard.TestNumericQueryParser
 
  Error Message:
  null
 
  Stack Trace:
  java.lang.NullPointerException
         at
 org.apache.lucene.queryParser.standard.TestNumericQueryParser.afterClass(T
 estNumericQueryParser.java:495)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:
 57)
         at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm
 pl.java:43)
         at java.lang.reflect.Method.invoke(Method.java:616)
         at
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMetho
 d.java:44)
         at
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable
 .java:15)
         at
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.
 java:41)
         at
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37
 )
         at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
         at
 org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:
 53)
         at
 

Re: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #316: POMs out of sync

2011-11-30 Thread Robert Muir
Thanks Steven.

I think this is a bug in LuceneTestCase (I'll open an issue), because
if I add the following to TestDemo, I get no seed or anything at all:

  @BeforeClass
  public static void beforeClass() throws Exception {
    throw new NullPointerException();
  }

junit-sequential:
[junit] Testsuite: org.apache.lucene.TestDemo
[junit] Tests run: 0, Failures: 0, Errors: 1, Time elapsed: 0.143 sec
[junit]
[junit] Testcase: org.apache.lucene.TestDemo:   Caused an ERROR
[junit] null
[junit] java.lang.NullPointerException
[junit] at org.apache.lucene.TestDemo.beforeClass(TestDemo.java:44)
[junit]
[junit]
[junit] Test org.apache.lucene.TestDemo FAILED


On Wed, Nov 30, 2011 at 9:59 AM, Steven A Rowe sar...@syr.edu wrote:
 I looked at the Surefire report 
 https://builds.apache.org/job/Lucene-Solr-Maven-3.x/ws/checkout/lucene/build/contrib/queryparser/surefire-reports/TEST-org.apache.lucene.queryParser.standard.TestNumericQueryParser.xml/*view*/
  and I don't see any more information.

 -Original Message-
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Wednesday, November 30, 2011 9:55 AM
 To: dev@lucene.apache.org
 Subject: Re: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #316: POMs out of sync

 This looks like a localization bug. is it possible to get the seed or
 more information on this test fail?

 Did maven truncate the test output or is there a bug in LuceneTestCase
 where its not providing the reproduce-with (hopefully) that it
 should if beforeClass() throws an exception?

 I looked at the console log and there wasn't any failure information
 there...

 On Wed, Nov 30, 2011 at 7:59 AM, Apache Jenkins Server
 jenk...@builds.apache.org wrote:
  Build: https://builds.apache.org/job/Lucene-Solr-Maven-3.x/316/
 
  2 tests failed.
 
 FAILED:  org.apache.lucene.queryParser.standard.TestNumericQueryParser.org
 .apache.lucene.queryParser.standard.TestNumericQueryParser
 
  Error Message:
  Unparseable date: 1867.06.20 6時34分09秒 CET  9 +0100 1867
 
  Stack Trace:
  java.text.ParseException: Unparseable date: 1867.06.20 6時34分09秒 CET  9
 +0100 1867
         at java.text.DateFormat.parse(DateFormat.java:354)
         at
 org.apache.lucene.queryParser.standard.TestNumericQueryParser.checkDateFor
 matSanity(TestNumericQueryParser.java:88)
         at
 org.apache.lucene.queryParser.standard.TestNumericQueryParser.beforeClass(
 TestNumericQueryParser.java:145)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:
 57)
         at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm
 pl.java:43)
         at java.lang.reflect.Method.invoke(Method.java:616)
         at
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMetho
 d.java:44)
         at
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable
 .java:15)
         at
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.
 java:41)
         at
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:
 27)
         at
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31
 )
         at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
         at
 org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:
 53)
         at
 org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provi
 der.java:123)
         at
 org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java
 :104)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:
 57)
         at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm
 pl.java:43)
         at java.lang.reflect.Method.invoke(Method.java:616)
         at
 org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(Refle
 ctionUtils.java:164)
         at
 org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(Prov
 iderFactory.java:110)
         at
 org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireSt
 arter.java:175)
         at
 org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenFor
 ked(SurefireStarter.java:107)
         at
 org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68)
 
 
 
 FAILED:  org.apache.lucene.queryParser.standard.TestNumericQueryParser.org
 .apache.lucene.queryParser.standard.TestNumericQueryParser
 
  Error Message:
  null
 
  Stack Trace:
  java.lang.NullPointerException
         at
 org.apache.lucene.queryParser.standard.TestNumericQueryParser.afterClass(T
 estNumericQueryParser.java:495)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:
 57)
         at
 

[jira] [Created] (LUCENE-3611) If a test fails in beforeClass(), we don't get any debugging information

2011-11-30 Thread Robert Muir (Created) (JIRA)
If a test fails in beforeClass(), we don't get any debugging information


 Key: LUCENE-3611
 URL: https://issues.apache.org/jira/browse/LUCENE-3611
 Project: Lucene - Java
  Issue Type: Test
  Components: general/test
Reporter: Robert Muir


At the minimum we at least need reportPartialFailureInfo()

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2280) commitWithin ignored for a delete query

2011-11-30 Thread Updated

 [ 
https://issues.apache.org/jira/browse/SOLR-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2280:
--

Attachment: SOLR-2280.patch
SOLR-2280-3x.patch

New patches which add new commitWithin-capable SolrJ methods for deleteBy*()

 commitWithin ignored for a delete query
 ---

 Key: SOLR-2280
 URL: https://issues.apache.org/jira/browse/SOLR-2280
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Reporter: David Smiley
Assignee: Jan Høydahl
Priority: Minor
 Fix For: 3.6, 4.0

 Attachments: SOLR-2280-3x.patch, SOLR-2280-3x.patch, SOLR-2280.patch, 
 SOLR-2280.patch, SOLR-2280.patch, SOLR-2280.patch


 The commitWithin option on an UpdateRequest is only honored for requests 
 containing new documents.  It does not, for example, work with a delete 
 query.  The following doesn't work as expected:
 {code:java}
 UpdateRequest request = new UpdateRequest();
 request.deleteById("id123");
 request.setCommitWithin(1000);
 solrServer.request(request);
 {code}
 In my opinion, the commitWithin attribute should be permitted on the 
 <delete/> xml tag as well as <add/>.  Such a change would go in 
 XMLLoader.java and it would have some ramifications elsewhere too.  Once 
 this is done, then UpdateRequest.getXml() can be updated to generate the 
 right XML.
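 For reference, the XML this would enable presumably carries the attribute on 
 the delete tag itself, mirroring add (illustrative syntax per the proposal 
 above, not currently supported):
{code:xml}
<delete commitWithin="1000">
  <id>id123</id>
</delete>
{code}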

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2805) Add a main method to ZkController so that it's easier to script config upload with SolrCloud

2011-11-30 Thread Eric Pugh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160083#comment-13160083
 ] 

Eric Pugh commented on SOLR-2805:
-

I started working on something like this, and noticed that ZkController is 
marked final; why is that? I ended up cutting and pasting it into my own class.

 Add a main method to ZkController so that it's easier to script config upload 
 with SolrCloud
 

 Key: SOLR-2805
 URL: https://issues.apache.org/jira/browse/SOLR-2805
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.0


 when scripting a cluster setup, it would be nice if it was easy to upload a 
 set of configs - otherwise you have to wait to start secondary servers until 
 the first server has uploaded the config - kind of a pain
 You should be able to do something like:
 java -classpath .:* org.apache.solr.cloud.ZkController 127.0.0.1:9983 
 127.0.0.1 8983 solr /home/mark/workspace/SolrCloud/solr/example/solr/conf 
 conf1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-3.x - Build # 11612 - Failure

2011-11-30 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/11612/

1 tests failed.
REGRESSION:  
org.apache.lucene.search.TestBlockJoin.testAdvanceSingleParentNoChild

Error Message:
null

Stack Trace:
java.lang.NullPointerException
at 
org.apache.lucene.search.TestBlockJoin.testAdvanceSingleParentNoChild(TestBlockJoin.java:659)
at 
org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:432)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)




Build Log (for compile errors):
[...truncated 11289 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3586) Choose a specific Directory implementation running the CheckIndex main

2011-11-30 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160102#comment-13160102
 ] 

Michael McCandless commented on LUCENE-3586:


I think just load the classes by name via reflection?  This way if I have my 
own external Dir impl somewhere I can also have CheckIndex use that...

 Choose a specific Directory implementation running the CheckIndex main
 --

 Key: LUCENE-3586
 URL: https://issues.apache.org/jira/browse/LUCENE-3586
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Luca Cavanna
Assignee: Luca Cavanna
Priority: Minor
 Attachments: LUCENE-3586.patch


 It should be possible to choose a specific Directory implementation to use 
 during the CheckIndex process when we run it from its main.
 What about an additional main parameter?
 In fact, I'm experiencing some problems with MMapDirectory working with a big 
 segment, and after some failed attempts playing with maxChunkSize, I decided 
 to switch to another FSDirectory implementation but I needed to do that on my 
 own main.
 Should we also consider to use a FileSwitchDirectory?
 I'm willing to contribute, could you please let me know your thoughts about 
 it?
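 A hedged sketch of the reflection approach suggested here: resolve the 
 Directory class by name and instantiate it through the (File) constructor 
 that the stock FSDirectory implementations expose (argument handling is 
 illustrative only):
{code:java}
// Illustrative only: load a Directory implementation by class name, as a
// CheckIndex main could. Works for stock FSDirectory subclasses such as
// org.apache.lucene.store.MMapDirectory, which have a public (File) constructor.
import java.io.File;
import org.apache.lucene.store.Directory;

public class DirectoryByName {
    static Directory open(String className, File indexPath) throws Exception {
        Class<?> clazz = Class.forName(className); // external impls work too
        return (Directory) clazz.getConstructor(File.class).newInstance(indexPath);
    }
}
{code}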

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3608) MultiFields.getUniqueFieldCount is broken

2011-11-30 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160107#comment-13160107
 ] 

Michael McCandless commented on LUCENE-3608:


+1 for -1 ;)

 MultiFields.getUniqueFieldCount is broken
 -

 Key: LUCENE-3608
 URL: https://issues.apache.org/jira/browse/LUCENE-3608
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir
 Fix For: 4.0


 this returns terms.size(), but terms is lazy-initted. So it wrongly returns 0.
 Simplest fix would be to return -1.
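 A hedged illustration of the lazy-init trap and the proposed -1 convention 
 (class and field names are illustrative, not the actual MultiFields code):
{code:java}
import java.util.Map;

class LazyFieldsExample {
    private Map<String, Object> terms;   // lazily initialized elsewhere, on first use

    // Before the fix: returning terms.size() here yields 0 before the lazy init
    // has run. Proposed convention: -1 means "not computed".
    int getUniqueFieldCount() {
        return terms == null ? -1 : terms.size();
    }
}
{code}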

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-2922) Upgrade commons io and lang in Solr

2011-11-30 Thread Koji Sekiguchi (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-2922.
--

Resolution: Fixed
  Assignee: Koji Sekiguchi

trunk: Committed revision 1208509.
3x: Committed revision 1208516.

 Upgrade commons io and lang in Solr
 ---

 Key: SOLR-2922
 URL: https://issues.apache.org/jira/browse/SOLR-2922
 Project: Solr
  Issue Type: Improvement
Affects Versions: 3.5, 4.0
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Trivial
 Fix For: 3.6, 4.0

 Attachments: SOLR-2922.patch


 Upgrade commons-io and commons-lang in Solr.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 11608 - Failure

2011-11-30 Thread Michael McCandless
I committed a fix...

Mike McCandless

http://blog.mikemccandless.com

On Wed, Nov 30, 2011 at 4:54 AM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/11608/

 1 tests failed.
 REGRESSION:  
 org.apache.lucene.search.TestSearcherManager.testIntermediateClose

 Error Message:
 java.lang.NullPointerException

 Stack Trace:
 junit.framework.AssertionFailedError: java.lang.NullPointerException
        at 
 org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147)
        at 
 org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
        at 
 org.apache.lucene.search.TestSearcherManager.testIntermediateClose(TestSearcherManager.java:248)
        at 
 org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:432)




 Build Log (for compile errors):
 [...truncated 7958 lines...]



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [Lucene.Net] Re: Memory Leak in 2.9.2.2

2011-11-30 Thread Christopher Currens
Trevor,

I'm not sure if you can use 2.9.4, though, it looks like you're using
VS2005 and .NET 2.0.  2.9.4 targets .NET 4.0, and I'm fairly certain we use
classes only available in 4.0 (or 3.5?).  However, if you can, I would
suggest updating, as 2.9.4 should be a fairly stable release.

The leak I'm talking about is addressed here:
https://issues.apache.org/jira/browse/LUCENE-2467, and the ported code
isn't available in 2.9.2, but I've confirmed the patch is in 2.9.4.  It may
or may not be what your issue is.  You say that it was at one time working
fine, I assume you mean no memory leak.  I would take some time to see what
else in your code has changed.  Make sure you're calling Close on whatever
needs to be closed (IndexWriter/IndexReader/Analyzers, etc).

Unfortunately for us, memory leaks are hard to debug over email, and it's
difficult for us to tell if it's any change to your code or an issue with
Lucene .NET.  As far as I can tell, this is the only memory leak I can find
that affects 2.9.2.


Thanks,
Christopher

On Wed, Nov 30, 2011 at 8:25 AM, Prescott Nasser geobmx...@hotmail.com wrote:

 We just released 2.9.4 - the website didn't update last night, so ill have
 to try and update it later today. But if you follow the link to download
 2.9.2 dist you'll see folders for 2.9.4.

 I'll send an email to the user and dev lists once i get the website to
 update
 
 From: Trevor Watson
 Sent: 11/30/2011 8:14 AM
 To: lucene-net-...@lucene.apache.org
 Subject: [Lucene.Net] Re: Memory Leak in 2.9.2.2

 You said pre 2.9.3  I checked the apache lucene.net page to try to see
 if
 I could get a copy of 2.9.3, but it was never on the site, just 2.9.2.2 and
 2.9.4(g).  Was this an un-released version?  Or am I looking in the wrong
 spot for updates to lucene.net?

 Thanks for all your help

 On Tue, Nov 29, 2011 at 2:59 PM, Trevor Watson 
 powersearchsoftw...@gmail.com wrote:

  I can send you the dll that I am using if you would like.  The documents
  are _mostly_ small documents.  Emails and office docs size of plain text
 
 
  On Tuesday, November 29, 2011, Christopher Currens 
  currens.ch...@gmail.com wrote:
   Do you know how big the documents are that you are trying to
  delete/update?
I'm trying to find a copy of 2.9.2 to see if I can reproduce it.
  
  
   Thanks,
   Christopher
  
   On Tue, Nov 29, 2011 at 9:11 AM, Trevor Watson 
   powersearchsoftw...@gmail.com wrote:
  
   Sorry for the duplicate post. I was on the road and posted both via my
  web
   mail and office mail by mistake
  
   The increase is a very gradual,  the program starts at about 160,000k
   according to task manager (I know that's not entirely accurate, but it
  was
   the best I had at the time) and would, after adding 25,000-40,000
  result in
   an out of memory exception (800,000k according to taskmanager). I
 tried
   building a copy of 2.9.4 to test, but could not find one that worked
 in
   visual studio 2005
  
   I did notice using Ants memory profiler that there were a number of
   byte[32789] arrays that I didn't know where they came from in memory.
  
   On Monday, November 28, 2011, Christopher Currens 
  currens.ch...@gmail.com
   
   wrote:
Hi Trevor,
   
What kind of memory increase are we talking about?  Also, how big
 are
  the
documents that you are indexing, the ones returned from
  getFileInfoDoc()?
 Is it putting an entire file into the index?  Pre 2.9.3 versions
 had
issues with holding onto allocated byte arrays far beyond when they
  were
used.  The memory could only be freed via closing the IndexWriter.
   
I'm a little unclear on exactly what's happening.  Are you noticing
   memory
spike and stay constant at that level or is it a gradual increase?
   Is it
causing your application to error, (ie OutOfMemory exception, etc)?
   
   
Thanks,
Christopher
   
On Mon, Nov 28, 2011 at 5:59 PM, Trevor Watson 
powersearchsoftw...@gmail.com wrote:
   
I'm attempting to use Lucene.Net v2.9.2.2 in a Visual Studio 2005
  (.NET
2.0) environment.  We had a piece of software that WAS working.
  I'm
  not
sure what has changed however, the following code results in a
 memory
   leak
in the Lucene.Net component (or a failure to clean up used memory).
   
The code in issue is here:
   
 private void SaveFileToFileInfo(Lucene.Net.Index.IndexWriter iw, bool delayCommit, string sDataPath)
 {
     Document doc = getFileInfoDoc(sDataPath);
     Analyzer analyzer = clsLuceneFunctions.getAnalyzer();
     if (this.FileID == 0)
     {
         string s = "";
     }
     iw.UpdateDocument(new Lucene.Net.Index.Term("FileId",
         this.fileID.ToString("0")), doc, analyzer);

     analyzer = null;
     doc = null;
     if (!delayCommit)
         iw.Commit();
 }
   
Commenting out the line iw.UpdateDocument resulted in no memory
   increase.
I also tried replacing it with a 

Re: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 11612 - Failure

2011-11-30 Thread Michael McCandless
I committed a fix...

Mike McCandless

http://blog.mikemccandless.com

On Wed, Nov 30, 2011 at 10:32 AM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/11612/

 1 tests failed.
 REGRESSION:  
 org.apache.lucene.search.TestBlockJoin.testAdvanceSingleParentNoChild

 Error Message:
 null

 Stack Trace:
 java.lang.NullPointerException
        at 
 org.apache.lucene.search.TestBlockJoin.testAdvanceSingleParentNoChild(TestBlockJoin.java:659)
        at 
 org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:432)
        at 
 org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147)
        at 
 org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)




 Build Log (for compile errors):
 [...truncated 11289 lines...]



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: buildbots for PyLucene?

2011-11-30 Thread Bill Janssen
I sent a note off to Trent Nelson to see if we could use Snakebite for
this purpose.

I'd be happy to set up a buildbot on our internal PARC Jenkins
infrastructure for this, but the results wouldn't be visible outside.

Is there a lucene-infrastructure or apache-infrastructure mailing list
this might be appropriate for?

Bill


[jira] [Created] (SOLR-2932) Replication filelist failures

2011-11-30 Thread Kyle Maxwell (Created) (JIRA)
Replication filelist failures
-

 Key: SOLR-2932
 URL: https://issues.apache.org/jira/browse/SOLR-2932
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Affects Versions: 3.5
Reporter: Kyle Maxwell


Replicating the bug manually:
http://../replication?command=indexversion 
- 1234561234
http://../replication?command=filelistindexversion=1234561234
- invalid index version

In the logs, I tend to see lines like:
SEVERE: No files to download for indexversion: 1321658703961

This bug only appears on certain indexes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

2011-11-30 Thread Granroth, Neal V.
DIGY,

Thanks for the tip, but could you be a little more specific?
Where and how are extension-method calls replaced?

For example, how would I change the CloseableThreadLocalExtensions method
 
public static void Set<T>(this ThreadLocal<T> t, T val)
{
    t.Value = val;
}


to eliminate the compile error

Error   2   Cannot define a new extension method because the compiler 
required type 'System.Runtime.CompilerServices.ExtensionAttribute' cannot be 
found. Are you missing a reference to System.Core.dll?

due to the Set<T> parameter this ThreadLocal<T> t ?

 
- Neal


-Original Message-
From: Digy [mailto:digyd...@gmail.com] 
Sent: Wednesday, November 30, 2011 12:27 PM
To: lucene-net-...@lucene.apache.org
Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

FYI, 2.9.4 can be compiled against .Net 2.0 with a few minor changes in
CloseableThreadLocal

(like uncommenting the ThreadLocal<T> class and replacing extension-method
calls with static calls to CloseableThreadLocalExtensions)
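
Concretely, that change looks roughly like this (a minimal sketch; the
ThreadLocal<T> here is the helper class uncommented inside
CloseableThreadLocal, not a BCL type, and the exact call sites may differ):

    // A .NET 2.0-friendly static helper in place of the extension method
    // (no ExtensionAttribute / System.Core.dll needed).
    public static class CloseableThreadLocalExtensions
    {
        public static void Set<T>(ThreadLocal<T> t, T val)
        {
            t.Value = val;
        }
    }

    // Call sites change from the C# 3.0 extension form:
    //     current.Set(value);
    // to a plain static call:
    //     CloseableThreadLocalExtensions.Set(current, value);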

 

 

DIGY

 

 

-Original Message-
From: Christopher Currens [mailto:currens.ch...@gmail.com] 
Sent: Wednesday, November 30, 2011 7:26 PM
To: lucene-net-...@lucene.apache.org
Subject: Re: [Lucene.Net] Re: Memory Leak in 2.9.2.2

 

Trevor,

I'm not sure if you can use 2.9.4, though; it looks like you're using
VS2005 and .NET 2.0.  2.9.4 targets .NET 4.0, and I'm fairly certain we use
classes only available in 4.0 (or 3.5?).  However, if you can, I would
suggest updating, as 2.9.4 should be a fairly stable release.

The leak I'm talking about is addressed here:
https://issues.apache.org/jira/browse/LUCENE-2467, and the ported code
isn't available in 2.9.2, but I've confirmed the patch is in 2.9.4.  It may
or may not be what your issue is.  You say that it was at one time working
fine; I assume you mean no memory leak.  I would take some time to see what
else in your code has changed.  Make sure you're calling Close on whatever
needs to be closed (IndexWriter/IndexReader/Analyzers, etc).
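
A minimal sketch of that pattern against the Lucene.Net 2.9 API (indexPath
is a placeholder, and the analyzer choice is arbitrary):

    // Hedged sketch: open, use, and deterministically Close() everything,
    // even if indexing throws.
    static void IndexWithCleanup(string indexPath)
    {
        Lucene.Net.Store.Directory dir = Lucene.Net.Store.FSDirectory.Open(
            new System.IO.DirectoryInfo(indexPath));
        Lucene.Net.Analysis.Analyzer analyzer =
            new Lucene.Net.Analysis.SimpleAnalyzer();
        Lucene.Net.Index.IndexWriter writer = new Lucene.Net.Index.IndexWriter(
            dir, analyzer, Lucene.Net.Index.IndexWriter.MaxFieldLength.UNLIMITED);
        try
        {
            // ... AddDocument/UpdateDocument calls go here ...
            writer.Commit();
        }
        finally
        {
            writer.Close();    // flushes pending changes, releases the write lock
            analyzer.Close();  // releases cached token streams
            dir.Close();       // closes the underlying index files
        }
    }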

 

Unfortunately for us, memory leaks are hard to debug over email, and it's
difficult for us to tell if it's any change to your code or an issue with
Lucene .NET.  As far as I can tell, this is the only memory leak I can find
that affects 2.9.2.

Thanks,
Christopher

 

On Wed, Nov 30, 2011 at 8:25 AM, Prescott Nasser
geobmx...@hotmail.com wrote:

 

 We just released 2.9.4 - the website didn't update last night, so I'll have

 to try and update it later today. But if you follow the link to download

 2.9.2 dist you'll see folders for 2.9.4.

 

 I'll send an email to the user and dev lists once i get the website to

 update

 

 From: Trevor Watson

 Sent: 11/30/2011 8:14 AM

 To: lucene-net-...@lucene.apache.org

 Subject: [Lucene.Net] Re: Memory Leak in 2.9.2.2

 

 You said "pre 2.9.3".  I checked the apache lucene.net page to try to see

 if

 I could get a copy of 2.9.3, but it was never on the site, just 2.9.2.2
and

 2.9.4(g).  Was this an un-released version?  Or am I looking in the wrong

 spot for updates to lucene.net?

 

 Thanks for all your help

 

 On Tue, Nov 29, 2011 at 2:59 PM, Trevor Watson 

 powersearchsoftw...@gmail.com wrote:

 

  I can send you the dll that I am using if you would like.  The documents

  are _mostly_ small documents.  Emails and office docs size of plain text

 

 

  On Tuesday, November 29, 2011, Christopher Currens 

  currens.ch...@gmail.com wrote:

   Do you know how big the documents are that you are trying to

  delete/update?

I'm trying to find a copy of 2.9.2 to see if I can reproduce it.

  

  

   Thanks,

   Christopher

  

   On Tue, Nov 29, 2011 at 9:11 AM, Trevor Watson 

   powersearchsoftw...@gmail.com wrote:

  

   Sorry for the duplicate post. I was on the road and posted both via
my

  web

   mail and office mail by mistake

  

   The increase is a very gradual,  the program starts at about 160,000k

   according to task manager (I know that's not entirely accurate, but
it

  was

   the best I had at the time) and would, after adding 25,000-40,000

  result in

   an out of memory exception (800,000k according to taskmanager). I

 tried

   building a copy of 2.9.4 to test, but could not find one that worked

 in

   visual studio 2005

  

   I did notice using Ants memory profiler that there were a number of

   byte[32789] arrays that I didn't know where they came from in memory.

  

   On Monday, November 28, 2011, Christopher Currens 

  currens.ch...@gmail.com

   

   wrote:

Hi Trevor,

   

What kind of memory increase are we talking about?  Also, how big

 are

  the

documents that you are indexing, the ones returned from

  getFileInfoDoc()?

 Is it putting an entire file into the index?  Pre 2.9.3 versions

 had

issues with holding onto allocated byte arrays far beyond when they

  were

used.  The memory could only be freed via 

[jira] [Commented] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should

2011-11-30 Thread Mike Sokolov (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160278#comment-13160278
 ] 

Mike Sokolov commented on SOLR-2921:


No not stemmers.  Not synonyms, not shinglers or anything that might produce 
multiple tokens.


 Make any Filters, Tokenizers and CharFilters implement 
 MultiTermAwareComponent if they should
 -

 Key: SOLR-2921
 URL: https://issues.apache.org/jira/browse/SOLR-2921
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.6, 4.0
 Environment: All
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor

 SOLR-2438 creates a new MultiTermAwareComponent interface. This allows Solr 
 to automatically assemble a multiterm analyzer that does the right thing 
 vis-a-vis transforming the individual terms of a multi-term query at query 
 time. Examples are: lower casing, folding accents, etc. Currently 
 (27-Nov-2011), the following classes implement MultiTermAwareComponent:
  * ASCIIFoldingFilterFactory
  * LowerCaseFilterFactory
  * LowerCaseTokenizerFactory
  * MappingCharFilterFactory
  * PersianCharFilterFactory
 When users put any of the above in their query analyzer, Solr will do the 
 right thing at query time, and the perennial question users have, "why didn't 
 my wildcard query automatically lower-case (or accent fold or ...) my terms?", 
 will be gone. Die question die!
 But taking a quick look, for instance, at the various FilterFactories that 
 exist, there are a number of possibilities that *might* be good candidates 
 for implementing MultiTermAwareComponent. But I really don't understand the 
 correct behavior here well enough to know whether these should implement the 
 interface or not. And this doesn't include other CharFilters or Tokenizers.
 Actually implementing the interface is often trivial, see the classes above 
 for examples. Note that LowerCaseTokenizerFactory returns a *Filter*, which 
 is the right thing in this case.
 Here is a quick cull of the Filters that, just from their names, might be 
 candidates. If anyone wants to take any of them on, that would be great. If 
 all you can do is provide test cases, I could probably do the code part, just 
 let me know.
 ArabicNormalizationFilterFactory
 GreekLowerCaseFilterFactory
 HindiNormalizationFilterFactory
 ICUFoldingFilterFactory
 ICUNormalizer2FilterFactory
 ICUTransformFilterFactory
 IndicNormalizationFilterFactory
 ISOLatin1AccentFilterFactory
 PersianNormalizationFilterFactory
 RussianLowerCaseFilterFactory
 TurkishLowerCaseFilterFactory

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-2933) DIHCacheSupport ignores left side of where="xid=x.id" attribute

2011-11-30 Thread Mikhail Khludnev (Created) (JIRA)
DIHCacheSupport ignores left side of where="xid=x.id" attribute
---

 Key: SOLR-2933
 URL: https://issues.apache.org/jira/browse/SOLR-2933
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 4.0
Reporter: Mikhail Khludnev
Priority: Minor


DIHCacheSupport, introduced in SOLR-2382, uses the new config attributes cachePk and 
cacheLookup. But support for the old where="xid=x.id" attribute is broken by 
[DIHCacheSupport.init|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DIHCacheSupport.java?view=markup]
 - it never puts the where= sides into the context, but this is masked by 
[SortedMapBackedCache.init|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/SortedMapBackedCache.java?view=markup],
 which just takes the first column as the primary key. That's why all tests are green.

To reproduce the issue I just need to reorder the entry at [line 
219|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestCachedSqlEntityProcessor.java?revision=1201659&view=markup]
 so that desc comes first and gets picked up as the primary key. 

To do that I propose choosing a concrete map class randomly for all DIH test 
cases in 
[createMap()|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/AbstractDataImportHandlerTestCase.java?revision=1149600&view=markup].

I'm attaching a test-breaking patch and seed.



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2933) DIHCacheSupport ignores left side of where="xid=x.id" attribute

2011-11-30 Thread Mikhail Khludnev (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-2933:
---

Attachment: AbstractDataImportHandlerTestCase.java-choose-map-randomly.patch

AbstractDataImportHandlerTestCase.java-choose-map-randomly.patch picks a map 
class randomly in all DIH test cases.
It breaks 
[TestCachedSqlEntityProcessor.withKeyAndLookup|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestCachedSqlEntityProcessor.java?revision=1201659&view=markup].
 More explanations are at 
[1|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13158689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13158689]
 
[2|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159418]
 
[3|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159431&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159431]


 DIHCacheSupport ignores left side of where="xid=x.id" attribute
 ---

 Key: SOLR-2933
 URL: https://issues.apache.org/jira/browse/SOLR-2933
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 4.0
Reporter: Mikhail Khludnev
Priority: Minor
  Labels: noob, random
 Attachments: 
 AbstractDataImportHandlerTestCase.java-choose-map-randomly.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 DIHCacheSupport introduced at SOLR-2382 uses new config attributes cachePk 
 and cacheLookup. But support old one where="xid=x.id" is broken by 
 [DIHCacheSupport.init|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DIHCacheSupport.java?view=markup]
  - it never put where= sides into the context, but it revealed by 
 [SortedMapBackedCache.init|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/SortedMapBackedCache.java?view=markup],
  which takes just first column as a primary key. That's why all tests are 
 green.
 To reproduce the issue I need just reorder entry at [line 
 219|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestCachedSqlEntityProcessor.java?revision=1201659&view=markup]
  and make desc first and picked up as a primary key. 
 To do that I propose to chose concrete map class randomly for all DIH test 
 cases at 
 [createMap()|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/AbstractDataImportHandlerTestCase.java?revision=1149600&view=markup].
  
 I'm attaching test breaking patch and seed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (SOLR-2933) DIHCacheSupport ignores left side of where="xid=x.id" attribute

2011-11-30 Thread Mikhail Khludnev (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160304#comment-13160304
 ] 

Mikhail Khludnev edited comment on SOLR-2933 at 11/30/11 8:28 PM:
--

AbstractDataImportHandlerTestCase.java-choose-map-randomly.patch picks a map 
class randomly in all DIH test cases.
It breaks 
[TestCachedSqlEntityProcessor.withKeyAndLookup|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestCachedSqlEntityProcessor.java?revision=1201659&view=markup].
 More explanations are at 
[1|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13158689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13158689]
 
[2|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159418]
 
[3|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159431&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159431]
 

Let me attach the fix tomorrow.   

  was (Author: mkhludnev):
AbstractDataImportHandlerTestCase.java-choose-map-randomly.patch picks a map 
class randomly in all DIH test cases.
It breaks 
[TestCachedSqlEntityProcessor.withKeyAndLookup|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestCachedSqlEntityProcessor.java?revision=1201659&view=markup].
 More explanations are at 
[1|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13158689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13158689]
 
[2|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159418]
 
[3|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159431&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159431]

  
 DIHCacheSupport ignores left side of where="xid=x.id" attribute
 ---

 Key: SOLR-2933
 URL: https://issues.apache.org/jira/browse/SOLR-2933
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 4.0
Reporter: Mikhail Khludnev
Priority: Minor
  Labels: noob, random
 Attachments: 
 AbstractDataImportHandlerTestCase.java-choose-map-randomly.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 DIHCacheSupport introduced at SOLR-2382 uses new config attributes cachePk 
 and cacheLookup. But support old one where="xid=x.id" is broken by 
 [DIHCacheSupport.init|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DIHCacheSupport.java?view=markup]
  - it never put where= sides into the context, but it revealed by 
 [SortedMapBackedCache.init|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/SortedMapBackedCache.java?view=markup],
  which takes just first column as a primary key. That's why all tests are 
 green.
 To reproduce the issue I need just reorder entry at [line 
 219|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestCachedSqlEntityProcessor.java?revision=1201659&view=markup]
  and make desc first and picked up as a primary key. 
 To do that I propose to chose concrete map class randomly for all DIH test 
 cases at 
 [createMap()|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/AbstractDataImportHandlerTestCase.java?revision=1149600&view=markup].
  
 I'm attaching test breaking patch and seed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (SOLR-2933) DIHCacheSupport ignores left side of where="xid=x.id" attribute

2011-11-30 Thread Mikhail Khludnev (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160304#comment-13160304
 ] 

Mikhail Khludnev edited comment on SOLR-2933 at 11/30/11 8:31 PM:
--

AbstractDataImportHandlerTestCase.java-choose-map-randomly.patch picks a map 
class randomly in all DIH test cases.
It breaks 
[TestCachedSqlEntityProcessor.withKeyAndLookup|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestCachedSqlEntityProcessor.java?revision=1201659&view=markup].
 More explanations are at 
[1|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13158689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13158689]
 
[2|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159418]
 
[3|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159431&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159431]
 

A seed which reproduces the failure:
{code}
ant test -Dtestcase=TestCachedSqlEntityProcessor -Dtestmethod=withKeyAndLookup 
-Dtests.seed=7735f677498f3558:-29c15941cc37921e:-32c8bd2280b92536 
-Dargs="-Dfile.encoding=UTF-8"
{code}

Let me attach the fix tomorrow. It's not a big deal anyway.   

  was (Author: mkhludnev):
AbstractDataImportHandlerTestCase.java-choose-map-randomly.patch picks a map 
class randomly in all DIH test cases.
It breaks 
[TestCachedSqlEntityProcessor.withKeyAndLookup|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestCachedSqlEntityProcessor.java?revision=1201659&view=markup].
 More explanations are at 
[1|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13158689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13158689]
 
[2|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159418]
 
[3|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159431&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159431]
 

Let me attach the fix tomorrow.   
  
 DIHCacheSupport ignores left side of where="xid=x.id" attribute
 ---

 Key: SOLR-2933
 URL: https://issues.apache.org/jira/browse/SOLR-2933
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 4.0
Reporter: Mikhail Khludnev
Priority: Minor
  Labels: noob, random
 Attachments: 
 AbstractDataImportHandlerTestCase.java-choose-map-randomly.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 DIHCacheSupport introduced at SOLR-2382 uses new config attributes cachePk 
 and cacheLookup. But support old one where="xid=x.id" is broken by 
 [DIHCacheSupport.init|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DIHCacheSupport.java?view=markup]
  - it never put where= sides into the context, but it revealed by 
 [SortedMapBackedCache.init|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/SortedMapBackedCache.java?view=markup],
  which takes just first column as a primary key. That's why all tests are 
 green.
 To reproduce the issue I need just reorder entry at [line 
 219|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestCachedSqlEntityProcessor.java?revision=1201659&view=markup]
  and make desc first and picked up as a primary key. 
 To do that I propose to chose concrete map class randomly for all DIH test 
 cases at 
 [createMap()|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/AbstractDataImportHandlerTestCase.java?revision=1149600&view=markup].
  
 I'm attaching test breaking patch and seed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

2011-11-30 Thread Prescott Nasser
Probably makes for a good wiki entry

Sent from my Windows Phone

From: Digy
Sent: 11/30/2011 12:04 PM
To: lucene-net-...@lucene.apache.org
Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

OK, here is the code that can be compiled against .NET 2.0
http://pastebin.com/k2f7JfPd

DIGY


-Original Message-
From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com]
Sent: Wednesday, November 30, 2011 9:26 PM
To: lucene-net-...@lucene.apache.org
Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

DIGY,

Thanks for the tip, but could you be a little more specific?
Where and how are extension-methods calls replaced?

For example, how would I change the CloseableThreadLocalExtensions method

public static void Set<T>(this ThreadLocal<T> t, T val)
{
    t.Value = val;
}


to eliminate the compile error

Error   2   Cannot define a new extension method because the compiler
required type 'System.Runtime.CompilerServices.ExtensionAttribute' cannot be
found. Are you missing a reference to System.Core.dll?

due to the Set<T> parameter "this ThreadLocal<T> t"?


- Neal


-Original Message-
From: Digy [mailto:digyd...@gmail.com]
Sent: Wednesday, November 30, 2011 12:27 PM
To: lucene-net-...@lucene.apache.org
Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

FYI, 2.9.4 can be compiled against .Net 2.0 with a few minor changes in
CloseableThreadLocal

(like uncommenting the ThreadLocal<T> class and replacing extension-method
calls with static calls to CloseableThreadLocalExtensions)





DIGY





-Original Message-
From: Christopher Currens [mailto:currens.ch...@gmail.com]
Sent: Wednesday, November 30, 2011 7:26 PM
To: lucene-net-...@lucene.apache.org
Subject: Re: [Lucene.Net] Re: Memory Leak in 2.9.2.2



Trevor,



I'm not sure if you can use 2.9.4, though, it looks like you're using

VS2005 and .NET 2.0.  2.9.4 targets .NET 4.0, and I'm fairly certain we use

classes only available in 4.0 (or 3.5?).  However, if you can, I would

suggest updating, as 2.9.4 should be a fairly stable release.



The leak I'm talking about is addressed here:

https://issues.apache.org/jira/browse/LUCENE-2467, and the ported code

isn't available in 2.9.2, but I've confirmed the patch is in 2.9.4.  It may

or may not be what your issue is.  You say that it was at one time working

fine, I assume you mean no memory leak.  I would take some time to see what

else in your code has changed.  Make sure you're calling Close on whatever

needs to be closed (IndexWriter/IndexReader/Analyzers, etc).



Unfortunately for us, memory leaks are hard to debug over email, and it's

difficult for us to tell if it's any change to your code or an issue with

Lucene .NET.  As far as I can tell, this is the only memory leak I can find

that affects 2.9.2.





Thanks,

Christopher



On Wed, Nov 30, 2011 at 8:25 AM, Prescott Nasser
geobmx...@hotmail.com wrote:



 We just released 2.9.4 - the website didn't update last night, so I'll have

 to try and update it later today. But if you follow the link to download

 2.9.2 dist you'll see folders for 2.9.4.



 I'll send an email to the user and dev lists once i get the website to

 update

 

 From: Trevor Watson

 Sent: 11/30/2011 8:14 AM

 To: lucene-net-...@lucene.apache.org

 Subject: [Lucene.Net] Re: Memory Leak in 2.9.2.2



 You said "pre 2.9.3".  I checked the apache lucene.net page to try to see

 if

 I could get a copy of 2.9.3, but it was never on the site, just 2.9.2.2
and

 2.9.4(g).  Was this an un-released version?  Or am I looking in the wrong

 spot for updates to lucene.net?



 Thanks for all your help



 On Tue, Nov 29, 2011 at 2:59 PM, Trevor Watson 

 powersearchsoftw...@gmail.com wrote:



  I can send you the dll that I am using if you would like.  The documents

  are _mostly_ small documents.  Emails and office docs size of plain text

 

 

  On Tuesday, November 29, 2011, Christopher Currens 

  currens.ch...@gmail.com wrote:

   Do you know how big the documents are that you are trying to

  delete/update?

I'm trying to find a copy of 2.9.2 to see if I can reproduce it.

  

  

   Thanks,

   Christopher

  

   On Tue, Nov 29, 2011 at 9:11 AM, Trevor Watson 

   powersearchsoftw...@gmail.com wrote:

  

   Sorry for the duplicate post. I was on the road and posted both via
my

  web

   mail and office mail by mistake

  

   The increase is a very gradual,  the program starts at about 160,000k

   according to task manager (I know that's not entirely accurate, but
it

  was

   the best I had at the time) and would, after adding 25,000-40,000

  result in

   an out of memory exception (800,000k according to taskmanager). I

 tried

   building a copy of 2.9.4 to test, but could not find one that worked

 in

   visual studio 2005

  

   I did notice using Ants memory profiler that there were a number of

   byte[32789] arrays that I didn't know where 

[jira] [Created] (LUCENE-3612) remove _X.fnx

2011-11-30 Thread Robert Muir (Created) (JIRA)
remove _X.fnx
-

 Key: LUCENE-3612
 URL: https://issues.apache.org/jira/browse/LUCENE-3612
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 4.0
Reporter: Robert Muir
 Attachments: LUCENE-3612.patch

Currently we store a global (not per-segment) field number-name mapping in 
_X.fnx

However, it doesn't actually save us any performance, e.g. on IndexWriter's init, 
because
since LUCENE-2984 we are loading the fieldinfos anyway to compute files() 
for IFD, etc., 
as that's where hasProx/hasVectors is.

Additionally in the past global files like shared doc stores have caused us 
problems,
(recently we just fixed a bug related to this file in LUCENE-3601).

Finally, this is trouble for backwards compatibility, as it's difficult to handle 
a global
file with the codecs mechanism.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3612) remove _X.fnx

2011-11-30 Thread Robert Muir (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3612:


Attachment: LUCENE-3612.patch

Patch: all tests pass.

Before committing, I think we should clean up some APIs/javadocs, remove the 
various versioning stuff (now unused), and not read/write it in segments files.

 remove _X.fnx
 -

 Key: LUCENE-3612
 URL: https://issues.apache.org/jira/browse/LUCENE-3612
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 4.0
Reporter: Robert Muir
 Attachments: LUCENE-3612.patch


 Currently we store a global (not per-segment) field number-name mapping in 
 _X.fnx
 However, it doesn't actually save us any performance e.g on IndexWriter's 
 init because
 since LUCENE-2984 we are to loading the fieldinfos anyway to compute files() 
 for IFD, etc, 
 as thats where hasProx/hasVectors is.
 Additionally in the past global files like shared doc stores have caused us 
 problems,
 (recently we just fixed a bug related to this file in LUCENE-3601).
 Finally this is trouble for backwards compatibility as its difficult to 
 handle a global
 file with the codecs mechanism.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [Lucene.Net] Re: Memory Leak in 2.9.2.2

2011-11-30 Thread Christopher Currens
Trevor,

Unfortunately I was unable to reproduce the memory leak you're experiencing
in 2.9.2.  Particularly with byte[], of the 18,277 that were created, only
13 were not garbage collected, and it's likely that they are not related to
Lucene (it's possible they are static, therefore would only be destroyed
with the AppDomain, outside of what the profiler can trace).  I tried to
emulate the code you showed us and there were no signs of any allocated
arrays that weren't cleaned up.  That doesn't mean there isn't one in your
code, but I just can't reproduce it with what you've shown us.  If it's
possible you can write a small program that has the same behavior, that
could help us track it down.
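
For reference, a small repro of the kind being asked for might look like the
hedged sketch below; it just loops UpdateDocument against a RAMDirectory, and
the field names and document counts are invented:

    using System;
    using Lucene.Net.Analysis;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.Store;

    class UpdateDocumentRepro
    {
        static void Main()
        {
            Directory dir = new RAMDirectory();
            Analyzer analyzer = new SimpleAnalyzer();
            IndexWriter writer = new IndexWriter(
                dir, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
            try
            {
                for (int i = 0; i < 100000; i++)
                {
                    Document doc = new Document();
                    doc.Add(new Field("FileId", i.ToString(),
                        Field.Store.YES, Field.Index.NOT_ANALYZED));
                    doc.Add(new Field("Body", "small document body " + i,
                        Field.Store.NO, Field.Index.ANALYZED));
                    // The call under suspicion in this thread: update-by-term.
                    writer.UpdateDocument(new Term("FileId", i.ToString()), doc);
                    if (i % 10000 == 0)
                        Console.WriteLine("{0}: {1} KB", i,
                            GC.GetTotalMemory(true) / 1024);
                }
                writer.Commit();
            }
            finally
            {
                writer.Close();
                analyzer.Close();
            }
        }
    }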

As a side note, what was a little disconcerting, though, was that in 2.9.4 with
the same code, it created 28,565 byte[], and there were quite a few more
left uncollected (2,805 arrays).  The allocations are happening in
DocumentsWriter.ByteBlockAllocator; I'll have to look at it later, though,
to see if it's even a problem.


Thanks,
Christopher


On Wed, Nov 30, 2011 at 12:41 PM, Granroth, Neal V. 
neal.granr...@thermofisher.com wrote:

 Or maybe put the changes within a conditional compile code block?

 Thanks DIGY, works great.

 - Neal

 -Original Message-
 From: Prescott Nasser [mailto:geobmx...@hotmail.com]
 Sent: Wednesday, November 30, 2011 2:35 PM
 To: lucene-net-...@lucene.apache.org
 Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

 Probably makes for a good wiki entry

 Sent from my Windows Phone
 
 From: Digy
 Sent: 11/30/2011 12:04 PM
 To: lucene-net-...@lucene.apache.org
 Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

 OK, here is the code that can be compiled against .NET 2.0
 http://pastebin.com/k2f7JfPd

 DIGY


 -Original Message-
 From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com]
 Sent: Wednesday, November 30, 2011 9:26 PM
 To: lucene-net-...@lucene.apache.org
 Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

 DIGY,

 Thanks for the tip, but could you be a little more specific?
 Where and how are extension-methods calls replaced?

 For example, how would I change the CloseableThreadLocalExtensions method

public static void Set<T>(this ThreadLocal<T> t, T val)
{
    t.Value = val;
}


 to eliminate the compile error

 Error   2   Cannot define a new extension method because the compiler
 required type 'System.Runtime.CompilerServices.ExtensionAttribute' cannot
 be
 found. Are you missing a reference to System.Core.dll?

 due to the Set<T> parameter "this ThreadLocal<T> t"?


 - Neal


 -Original Message-
 From: Digy [mailto:digyd...@gmail.com]
 Sent: Wednesday, November 30, 2011 12:27 PM
 To: lucene-net-...@lucene.apache.org
 Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

 FYI, 2.9.4 can be compiled against .Net 2.0 with a few minor changes in
 CloseableThreadLocal

 (like uncommenting the ThreadLocal<T> class and replacing extension-method
 calls with static calls to CloseableThreadLocalExtensions)





 DIGY





 -Original Message-
 From: Christopher Currens [mailto:currens.ch...@gmail.com]
 Sent: Wednesday, November 30, 2011 7:26 PM
 To: lucene-net-...@lucene.apache.org
 Subject: Re: [Lucene.Net] Re: Memory Leak in 2.9.2.2



 Trevor,



 I'm not sure if you can use 2.9.4, though, it looks like you're using

 VS2005 and .NET 2.0.  2.9.4 targets .NET 4.0, and I'm fairly certain we use

 classes only available in 4.0 (or 3.5?).  However, if you can, I would

 suggest updating, as 2.9.4 should be a fairly stable release.



 The leak I'm talking about is addressed here:

 https://issues.apache.org/jira/browse/LUCENE-2467, and the ported code

 isn't available in 2.9.2, but I've confirmed the patch is in 2.9.4.  It may

 or may not be what your issue is.  You say that it was at one time working

 fine, I assume you mean no memory leak.  I would take some time to see what

 else in your code has changed.  Make sure you're calling Close on whatever

 needs to be closed (IndexWriter/IndexReader/Analyzers, etc).



 Unfortunately for us, memory leaks are hard to debug over email, and it's

 difficult for us to tell if it's any change to your code or an issue with

 Lucene .NET.  As far as I can tell, this is the only memory leak I can find

 that affects 2.9.2.





 Thanks,

 Christopher



 On Wed, Nov 30, 2011 at 8:25 AM, Prescott Nasser
geobmx...@hotmail.com wrote:



  We just released 2.9.4 - the website didn't update last night, so I'll
 have

  to try and update it later today. But if you follow the link to download

  2.9.2 dist you'll see folders for 2.9.4.

 

  I'll send an email to the user and dev lists once i get the website to

  update

  

  From: Trevor Watson

  Sent: 11/30/2011 8:14 AM

  To: lucene-net-...@lucene.apache.org

  Subject: [Lucene.Net] Re: Memory Leak in 2.9.2.2

 

  You said "pre 2.9.3".  I checked the apache lucene.net 

[jira] [Commented] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should

2011-11-30 Thread Erick Erickson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160406#comment-13160406
 ] 

Erick Erickson commented on SOLR-2921:
--

Not synonyms - agreed.
Not shinglers - agreed.
Not anything that might produce multiple tokens - agreed.

Stemmers... When do stemmers produce multiple tokens? My ignorance of all the 
possibilities knows no bounds. I was wondering if, in this case, stemmers 
really reduced to prefix queries. Maybe it's just a bad idea altogether; I 
guess it begs the question of what use adding stemmers to the mix would be. If you 
want to match the root, just specify the root with an asterisk and be done with 
it. No need to introduce stemming into the MultiTermAwareComponent mix.

But this kind of question is exactly why I have this JIRA in place: we can 
collect reasons I wouldn't think of and record them, before I mess something up 
with well-intentioned-but-wrong help.

 Make any Filters, Tokenizers and CharFilters implement 
 MultiTermAwareComponent if they should
 -

 Key: SOLR-2921
 URL: https://issues.apache.org/jira/browse/SOLR-2921
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.6, 4.0
 Environment: All
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor

 SOLR-2438 creates a new MultiTermAwareComponent interface. This allows Solr 
 to automatically assemble a multiterm analyzer that does the right thing 
 vis-a-vis transforming the individual terms of a multi-term query at query 
 time. Examples are: lower casing, folding accents, etc. Currently 
 (27-Nov-2011), the following classes implement MultiTermAwareComponent:
  * ASCIIFoldingFilterFactory
  * LowerCaseFilterFactory
  * LowerCaseTokenizerFactory
  * MappingCharFilterFactory
  * PersianCharFilterFactory
 When users put any of the above in their query analyzer, Solr will do the 
 right thing at query time and the perennial question users have, why didn't 
 my wildcard query automatically lower-case (or accent fold or) my terms? 
 will be gone. Die question die!
 But taking a quick look, for instance, at the various FilterFactories that 
 exist, there are a number of possibilities that *might* be good candidates 
 for implementing MultiTermAwareComponent. But I really don't understand the 
 correct behavior here well enough to know whether these should implement the 
 interface or not. And this doesn't include other CharFilters or Tokenizers.
 Actually implementing the interface is often trivial, see the classes above 
 for examples. Note that LowerCaseTokenizerFactory returns a *Filter*, which 
 is the right thing in this case.
 Here is a quick cull of the Filters that, just from their names, might be 
 candidates. If anyone wants to take any of them on, that would be great. If 
 all you can do is provide test cases, I could probably do the code part, just 
 let me know.
 ArabicNormalizationFilterFactory
 GreekLowerCaseFilterFactory
 HindiNormalizationFilterFactory
 ICUFoldingFilterFactory
 ICUNormalizer2FilterFactory
 ICUTransformFilterFactory
 IndicNormalizationFilterFactory
 ISOLatin1AccentFilterFactory
 PersianNormalizationFilterFactory
 RussianLowerCaseFilterFactory
 TurkishLowerCaseFilterFactory

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should

2011-11-30 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160416#comment-13160416
 ] 

Robert Muir commented on SOLR-2921:
---

{quote}
I guess it begs the question of what use adding stemmers to the mix would be.
{quote}

I agree with Mike.

Most stemmers are basically suffix-strippers and use heuristics like term 
length. They are not going to work with the syntax of various multitermqueries. 
No stemmer is going to stem dogs* to dog*. Some might remove any non-alpha 
characters completely, and it's not a bug that they do this. They are 
heuristic in nature and designed to work on natural language text... not 
syntax.


 Make any Filters, Tokenizers and CharFilters implement 
 MultiTermAwareComponent if they should
 -

 Key: SOLR-2921
 URL: https://issues.apache.org/jira/browse/SOLR-2921
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.6, 4.0
 Environment: All
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor

 SOLR-2438 creates a new MultiTermAwareComponent interface. This allows Solr 
 to automatically assemble a multiterm analyzer that does the right thing 
 vis-a-vis transforming the individual terms of a multi-term query at query 
 time. Examples are: lower casing, folding accents, etc. Currently 
 (27-Nov-2011), the following classes implement MultiTermAwareComponent:
  * ASCIIFoldingFilterFactory
  * LowerCaseFilterFactory
  * LowerCaseTokenizerFactory
  * MappingCharFilterFactory
  * PersianCharFilterFactory
 When users put any of the above in their query analyzer, Solr will do the 
 right thing at query time and the perennial question users have, why didn't 
 my wildcard query automatically lower-case (or accent fold or) my terms? 
 will be gone. Die question die!
 But taking a quick look, for instance, at the various FilterFactories that 
 exist, there are a number of possibilities that *might* be good candidates 
 for implementing MultiTermAwareComponent. But I really don't understand the 
 correct behavior here well enough to know whether these should implement the 
 interface or not. And this doesn't include other CharFilters or Tokenizers.
 Actually implementing the interface is often trivial, see the classes above 
 for examples. Note that LowerCaseTokenizerFactory returns a *Filter*, which 
 is the right thing in this case.
 Here is a quick cull of the Filters that, just from their names, might be 
 candidates. If anyone wants to take any of them on, that would be great. If 
 all you can do is provide test cases, I could probably do the code part, just 
 let me know.
 ArabicNormalizationFilterFactory
 GreekLowerCaseFilterFactory
 HindiNormalizationFilterFactory
 ICUFoldingFilterFactory
 ICUNormalizer2FilterFactory
 ICUTransformFilterFactory
 IndicNormalizationFilterFactory
 ISOLatin1AccentFilterFactory
 PersianNormalizationFilterFactory
 RussianLowerCaseFilterFactory
 TurkishLowerCaseFilterFactory

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2805) Add a main method to ZkController so that it's easier to script config upload with SolrCloud

2011-11-30 Thread Mark Miller (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160433#comment-13160433
 ] 

Mark Miller commented on SOLR-2805:
---

I tend to limit by default and open when needed I guess.

FWIW, I've got code for this in the solrcloud branch now.

It lets you temporarily launch a zk server, connect to it, and upload a set of 
conf files by calling (from the /solr folder in a checkout):

java -classpath lib/*:dist/* org.apache.solr.cloud.ZkController 127.0.0.1:9983 
127.0.0.1 8983 solr ../example/solr/conf conf1 example/solr/zoo_data

 Add a main method to ZkController so that it's easier to script config upload 
 with SolrCloud
 

 Key: SOLR-2805
 URL: https://issues.apache.org/jira/browse/SOLR-2805
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.0


 when scripting a cluster setup, it would be nice if it was easy to upload a 
 set of configs - otherwise you have to wait to start secondary servers until 
 the first server has uploaded the config - kind of a pain
 You should be able to do something like:
 java -classpath .:* org.apache.solr.cloud.ZkController 127.0.0.1:9983 
 127.0.0.1 8983 solr /home/mark/workspace/SolrCloud/solr/example/solr/conf 
 conf1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (SOLR-2805) Add a main method to ZkController so that it's easier to script config upload with SolrCloud

2011-11-30 Thread Mark Miller (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160433#comment-13160433
 ] 

Mark Miller edited comment on SOLR-2805 at 11/30/11 11:05 PM:
--

I tend to limit by default and open when needed I guess.

FWIW, I've got code for this in the solrcloud branch now.

It lets you temporarily launch a zk server, connect to it, and upload a set of 
conf files by calling (from the /solr folder in a checkout):

{noformat}java -classpath lib/*:dist/* org.apache.solr.cloud.ZkController 
127.0.0.1:9983 127.0.0.1 8983 solr ../example/solr/conf conf1 
example/solr/zoo_data{noformat}

  was (Author: markrmil...@gmail.com):
I tend to limit by default and open when needed I guess.

FWIW, I've got code for this in the solrcloud branch now.

It lets you temporarily launch a zk server, connect to it, and upload a set of 
conf files by calling (from the /solr folder in a checkout):

java -classpath lib/*:dist/* org.apache.solr.cloud.ZkController 127.0.0.1:9983 
127.0.0.1 8983 solr ../example/solr/conf conf1 example/solr/zoo_data
  
 Add a main method to ZkController so that it's easier to script config upload 
 with SolrCloud
 

 Key: SOLR-2805
 URL: https://issues.apache.org/jira/browse/SOLR-2805
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.0


 when scripting a cluster setup, it would be nice if it was easy to upload a 
 set of configs - otherwise you have to wait to start secondary servers until 
 the first server has uploaded the config - kind of a pain
 You should be able to do something like:
 java -classpath .:* org.apache.solr.cloud.ZkController 127.0.0.1:9983 
 127.0.0.1 8983 solr /home/mark/workspace/SolrCloud/solr/example/solr/conf 
 conf1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3612) remove _X.fnx

2011-11-30 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160467#comment-13160467
 ] 

Uwe Schindler commented on LUCENE-3612:
---

+1

 remove _X.fnx
 -

 Key: LUCENE-3612
 URL: https://issues.apache.org/jira/browse/LUCENE-3612
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 4.0
Reporter: Robert Muir
 Attachments: LUCENE-3612.patch


 Currently we store a global (not per-segment) field number-name mapping in 
 _X.fnx
 However, it doesn't actually save us any performance e.g on IndexWriter's 
 init because
 since LUCENE-2984 we are to loading the fieldinfos anyway to compute files() 
 for IFD, etc, 
 as thats where hasProx/hasVectors is.
 Additionally in the past global files like shared doc stores have caused us 
 problems,
 (recently we just fixed a bug related to this file in LUCENE-3601).
 Finally this is trouble for backwards compatibility as its difficult to 
 handle a global
 file with the codecs mechanism.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3606) Make IndexReader really read-only in Lucene 4.0

2011-11-30 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160515#comment-13160515
 ] 

Uwe Schindler commented on LUCENE-3606:
---

OK, I will work on this as soon as I can (next weekend). I will be glad to 
remove the copy-on-write setNorm stuff in Lucene40 codec and make Lucene3x 
codec completely read-only (only reading the newest norm file). I hope Robert 
will possibly help me :-)

 Make IndexReader really read-only in Lucene 4.0
 ---

 Key: LUCENE-3606
 URL: https://issues.apache.org/jira/browse/LUCENE-3606
 Project: Lucene - Java
  Issue Type: Task
  Components: core/index
Affects Versions: 4.0
Reporter: Uwe Schindler

 As we change API completely in Lucene 4.0 we are also free to remove 
 read-write access and commits from IndexReader. This code is so hairy and 
 buggy (as investigated by Robert and Mike today) when you work on 
 SegmentReader level but forget to flush in the DirectoryReader, so its better 
 to really make IndexReaders readonly.
 Currently with IndexReader you can do things like:
 - delete/undelete Documents - Can be done by with IndexWriter, too (using 
 deleteByQuery)
 - change norms - this is a bad idea in general, but when we remove norms at 
 all and replace by DocValues this is obsolete already. Changing DocValues 
 should also be done using IndexWriter in trunk (once it is ready)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-3606) Make IndexReader really read-only in Lucene 4.0

2011-11-30 Thread Uwe Schindler (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-3606:
-

Assignee: Uwe Schindler

 Make IndexReader really read-only in Lucene 4.0
 ---

 Key: LUCENE-3606
 URL: https://issues.apache.org/jira/browse/LUCENE-3606
 Project: Lucene - Java
  Issue Type: Task
  Components: core/index
Affects Versions: 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler

 As we change API completely in Lucene 4.0 we are also free to remove 
 read-write access and commits from IndexReader. This code is so hairy and 
 buggy (as investigated by Robert and Mike today) when you work on 
 SegmentReader level but forget to flush in the DirectoryReader, so its better 
 to really make IndexReaders readonly.
 Currently with IndexReader you can do things like:
 - delete/undelete Documents - Can be done by with IndexWriter, too (using 
 deleteByQuery)
 - change norms - this is a bad idea in general, but when we remove norms at 
 all and replace by DocValues this is obsolete already. Changing DocValues 
 should also be done using IndexWriter in trunk (once it is ready)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3606) Make IndexReader really read-only in Lucene 4.0

2011-11-30 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160559#comment-13160559
 ] 

Robert Muir commented on LUCENE-3606:
-

{quote}
finally, holy grail where similarities can declare the normalization 
factor(s) they need, using byte/float/int whatever, and its all unified with 
the docvalues api. IndexReader.norms() maybe goes away here, and maybe 
NormsFormat too.
{quote}

Thinking about this: a clean way to do it would be for Similarity to get a new 
method:
{code}
ValueType getValueType();
{code}

and we would change:
{code}
byte computeNorm(FieldInvertState state);
{code}
to:
{code}
void computeNorm(FieldInvertState state, PerDocFieldValues norm);
{code}

Sims that want to encode multiple index-time scoring factors separately 
could just use BYTES_FIXED_STRAIGHT. This should be only for some rare
sims anyway, because a Sim can pull named 'application' specific scoring
factors from IR.perDocValues() today already.

Its not too crazy either since sims are already doing their own encoding,
so e.g. default sim would just use FIXED_INTS_8.

People that don't want to mess with bytes or smallfloat could use things
like FLOAT_32 if they want and need this.

we would just change FieldInfo.omitNorms to instead be FieldInfo.normValueType,
which is the value type of the norm (null if it's omitted, just like 
docValueType).

Preflex FieldInfosReader would just set FIXED_INTS_8 or null, based on
whether the fieldinfos had omitNorms or not. It doesn't support
any other types... 

Finally then, sims would own their scoring factors, and we could
even remove omitNorms from Field/FieldType etc (just use the correct 
scoring algorithm for the field; if you don't want norms, use a sim
that doesn't need them for scoring).

This would remove the awkward/messy situation where every similarity 
implementation we have has to 'downgrade' itself to handle things like
if the user decided to omit parts of their formula!

 Make IndexReader really read-only in Lucene 4.0
 ---

 Key: LUCENE-3606
 URL: https://issues.apache.org/jira/browse/LUCENE-3606
 Project: Lucene - Java
  Issue Type: Task
  Components: core/index
Affects Versions: 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler

 As we change API completely in Lucene 4.0 we are also free to remove 
 read-write access and commits from IndexReader. This code is so hairy and 
 buggy (as investigated by Robert and Mike today) when you work on 
 SegmentReader level but forget to flush in the DirectoryReader, so its better 
 to really make IndexReaders readonly.
 Currently with IndexReader you can do things like:
 - delete/undelete Documents - Can be done by with IndexWriter, too (using 
 deleteByQuery)
 - change norms - this is a bad idea in general, but when we remove norms at 
 all and replace by DocValues this is obsolete already. Changing DocValues 
 should also be done using IndexWriter in trunk (once it is ready)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should

2011-11-30 Thread Mike Sokolov (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160576#comment-13160576
 ] 

Mike Sokolov commented on SOLR-2921:


I spoke hastily, and it's true that stemmers are different from those other 
multi-token things.  It would be kind of nice if it were possible to have a 
query for "do?s" actually match a document containing "dogs", even when 
matching against a stemmed field, but I don't see how to do it without breaking 
all kinds of other things.  Consider how messed up range queries would get: 
[dogs TO *] would match doge, doggone, and other words in [dog TO dogs], which 
would be totally counterintuitive.

 Make any Filters, Tokenizers and CharFilters implement 
 MultiTermAwareComponent if they should
 -

 Key: SOLR-2921
 URL: https://issues.apache.org/jira/browse/SOLR-2921
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.6, 4.0
 Environment: All
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor

 SOLR-2438 creates a new MultiTermAwareComponent interface. This allows Solr 
 to automatically assemble a multiterm analyzer that does the right thing 
 vis-a-vis transforming the individual terms of a multi-term query at query 
 time. Examples are: lower casing, folding accents, etc. Currently 
 (27-Nov-2011), the following classes implement MultiTermAwareComponent:
  * ASCIIFoldingFilterFactory
  * LowerCaseFilterFactory
  * LowerCaseTokenizerFactory
  * MappingCharFilterFactory
  * PersianCharFilterFactory
 When users put any of the above in their query analyzer, Solr will do the 
 right thing at query time, and the perennial question users have, "why didn't 
 my wildcard query automatically lower-case (or accent-fold, or ...) my 
 terms?", will be gone. Die question die!
 But taking a quick look, for instance, at the various FilterFactories that 
 exist, there are a number of possibilities that *might* be good candidates 
 for implementing MultiTermAwareComponent. But I really don't understand the 
 correct behavior here well enough to know whether these should implement the 
 interface or not. And this doesn't include other CharFilters or Tokenizers.
 Actually implementing the interface is often trivial; see the classes above 
 for examples (and the sketch after the list below). Note that 
 LowerCaseTokenizerFactory returns a *Filter*, which is the right thing in 
 this case.
 Here is a quick cull of the Filters that, just from their names, might be 
 candidates. If anyone wants to take any of them on, that would be great. If 
 all you can do is provide test cases, I could probably do the code part; just 
 let me know.
 ArabicNormalizationFilterFactory
 GreekLowerCaseFilterFactory
 HindiNormalizationFilterFactory
 ICUFoldingFilterFactory
 ICUNormalizer2FilterFactory
 ICUTransformFilterFactory
 IndicNormalizationFilterFactory
 ISOLatin1AccentFilterFactory
 PersianNormalizationFilterFactory
 RussianLowerCaseFilterFactory
 TurkishLowerCaseFilterFactory
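
A minimal sketch of such an implementation, assuming the SOLR-2438 interface shape from the 3.x-era code (the Foo filter and factory names are hypothetical):

{code}
import org.apache.lucene.analysis.TokenStream;
import org.apache.solr.analysis.BaseTokenFilterFactory;
import org.apache.solr.analysis.MultiTermAwareComponent;

// Hypothetical factory; the pattern mirrors LowerCaseFilterFactory.
public class FooNormalizationFilterFactory extends BaseTokenFilterFactory
    implements MultiTermAwareComponent {

  @Override
  public TokenStream create(TokenStream input) {
    return new FooNormalizationFilter(input); // hypothetical filter
  }

  // The component Solr should run over each term of a wildcard/prefix/
  // range query; returning 'this' applies the same filter there too.
  @Override
  public Object getMultiTermComponent() {
    return this;
  }
}
{code}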




[jira] [Commented] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should

2011-11-30 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160593#comment-13160593
 ] 

Robert Muir commented on SOLR-2921:
---

Well Erick, I think the ones you listed here are ok.

There are cases where they won't work correctly, but trying to do
multiterm queries with MappingCharFilter and ASCIIFoldingFilter
is already problematic (e.g. ? won't match œ because 
it's now 'oe').

Personally I think this is fine, but we should document
that things don't work correctly all the time, and we 
should not make changes to analysis components to try 
to make them cope with multiterm query syntax or 
anything (that would be bad design; it turns them into 
queryparsers).

If the user cares about the corner cases, then they just
specify the chain.
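
Specifying the chain means declaring the multiterm analyzer on the field type explicitly, along these lines (a schema.xml sketch; the field type name and the exact filters chosen are illustrative):

{code}
<fieldType name="text_general" class="solr.TextField">
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Applied only to wildcard/prefix/range terms; the user picks
       exactly which normalizations run in the corner cases. -->
  <analyzer type="multiterm">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
{code}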

 Make any Filters, Tokenizers and CharFilters implement 
 MultiTermAwareComponent if they should
 -

 Key: SOLR-2921
 URL: https://issues.apache.org/jira/browse/SOLR-2921
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.6, 4.0
 Environment: All
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor





[jira] [Commented] (SOLR-1726) Deep Paging and Large Results Improvements

2011-11-30 Thread Grant Ingersoll (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160607#comment-13160607
 ] 

Grant Ingersoll commented on SOLR-1726:
---

Hi Manoj,

This looks OK as a start.  Would be nice to have tests to go with it.  

Why the overriding of getTotalHits on the TopScoreDocCollector?  I don't think 
returning collectedHits is the right thing to do there.

Also, you should be able to avoid an extra Collector create call at:
{code}
topCollector = TopScoreDocCollector.create(len, true);
// Issue 1726 start
if (cmd.getScoreDoc() != null) {
  // create the Collector with InOrderPagingCollector
  topCollector = TopScoreDocCollector.create(len, cmd.getScoreDoc(), true);
}
{code}

But that is easy enough to fix.
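
A sketch of that fix (assuming, as in the patch, that cmd.getScoreDoc() carries the last hit of the previous page):

{code}
// Create the collector once, choosing the paging-aware variant only
// when there is a previous page's ScoreDoc to resume from.
if (cmd.getScoreDoc() != null) {
  topCollector = TopScoreDocCollector.create(len, cmd.getScoreDoc(), true);
} else {
  topCollector = TopScoreDocCollector.create(len, true);
}
{code}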



 Deep Paging and Large Results Improvements
 --

 Key: SOLR-1726
 URL: https://issues.apache.org/jira/browse/SOLR-1726
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 3.6, 4.0

 Attachments: CommonParams.java, QParser.java, QueryComponent.java, 
 ResponseBuilder.java, SOLR-1726.patch, SolrIndexSearcher.java, 
 TopDocsCollector.java, TopScoreDocCollector.java


 There may be ways to improve collection for deep paging by passing 
 Solr/Lucene more information about the last page of results seen, thereby 
 saving priority queue operations. See LUCENE-2215.
 There may also be better options for retrieving large numbers of rows at a 
 time that are worth exploring. See LUCENE-2127.




[jira] [Commented] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should

2011-11-30 Thread Erick Erickson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160612#comment-13160612
 ] 

Erick Erickson commented on SOLR-2921:
--

Mike:

stemmers - not going to make them MultiTermAware. No way. No how. Not on my 
watch; one succinct example and I'm convinced.

The beauty of the way Yonik and Robert directed this is that we can take care 
of the 80% case, not provide things that are *that* surprising, and still have 
all the flexibility available to those who really need it. As Robert says, if 
they really want some interesting behavior, they can specify the complete 
chain.

Robert:

I guess I'm at a loss as to how to write tests for the various filters and 
tokenizers I listed, which is why I'm reluctant to just make them 
MultiTermAwareComponents. Do you have any suggestions as to how I could get 
tests? I had enough surprises when I ran the tests in English that I'm 
reluctant to just plow ahead. As far as I understand, Arabic is caseless, for 
instance.

I totally agree with your point that making the analysis components cope with 
syntax is evil. Not going there either.

Maybe the right action is to wait for someone to volunteer to be the guinea pig 
for the various filters; I suppose we could advertise for volunteers...

 Make any Filters, Tokenizers and CharFilters implement 
 MultiTermAwareComponent if they should
 -

 Key: SOLR-2921
 URL: https://issues.apache.org/jira/browse/SOLR-2921
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.6, 4.0
 Environment: All
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor





[jira] [Commented] (SOLR-1726) Deep Paging and Large Results Improvements

2011-11-30 Thread Manojkumar Rangasamy Kannadasan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160620#comment-13160620
 ] 

Manojkumar Rangasamy Kannadasan commented on SOLR-1726:
---

Hi Grant, thanks for your comments. Regarding collectedHits: if there are 4 
docs as results and we want to return only the bottom 2 by giving an 
appropriate pageScore and pageDoc, the expected result is to return only those 
2 docs. But totalHits returns all 4 docs. That's the reason I used 
collectedHits. Kindly correct me if my understanding is wrong.

 Deep Paging and Large Results Improvements
 --

 Key: SOLR-1726
 URL: https://issues.apache.org/jira/browse/SOLR-1726
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 3.6, 4.0





[jira] [Commented] (SOLR-1726) Deep Paging and Large Results Improvements

2011-11-30 Thread Grant Ingersoll (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160627#comment-13160627
 ] 

Grant Ingersoll commented on SOLR-1726:
---

totalHits should return the count of all the hits regardless of the number that 
are actually being collected.  In other words, totalHits could be a million, 
but we only return the top 10.  collectedHits only returns the count of how 
many are being returned.
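
In stock Lucene terms (a small illustration, not code from the patch):

{code}
import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

class HitCounts {
  // totalHits counts every matching doc; scoreDocs holds only the
  // collected page, which is what a collectedHits-style count reports.
  static void show(IndexSearcher searcher, Query q) throws IOException {
    TopDocs td = searcher.search(q, 10);
    System.out.println("totalHits = " + td.totalHits);        // could be 1,000,000
    System.out.println("collected = " + td.scoreDocs.length); // at most 10
  }
}
{code}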

 Deep Paging and Large Results Improvements
 --

 Key: SOLR-1726
 URL: https://issues.apache.org/jira/browse/SOLR-1726
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 3.6, 4.0





[jira] [Commented] (SOLR-2382) DIH Cache Improvements

2011-11-30 Thread Mikhail Khludnev (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160683#comment-13160683
 ] 

Mikhail Khludnev commented on SOLR-2382:


I spawned a subtask: SOLR-2933.

 DIH Cache Improvements
 --

 Key: SOLR-2382
 URL: https://issues.apache.org/jira/browse/SOLR-2382
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Reporter: James Dyer
Priority: Minor
 Attachments: SOLR-2382-dihwriter.patch, SOLR-2382-dihwriter.patch, 
 SOLR-2382-dihwriter.patch, SOLR-2382-dihwriter.patch, 
 SOLR-2382-dihwriter.patch, SOLR-2382-dihwriter_standalone.patch, 
 SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, 
 SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, 
 SOLR-2382-entities.patch, SOLR-2382-entities.patch, 
 SOLR-2382-properties.patch, SOLR-2382-properties.patch, 
 SOLR-2382-solrwriter-verbose-fix.patch, SOLR-2382-solrwriter.patch, 
 SOLR-2382-solrwriter.patch, SOLR-2382-solrwriter.patch, SOLR-2382.patch, 
 SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, 
 SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, 
 TestCachedSqlEntityProcessor.java-break-where-clause.patch, 
 TestCachedSqlEntityProcessor.java-fix-where-clause-by-adding-cachePk-and-lookup.patch,
  
 TestCachedSqlEntityProcessor.java-wrong-pk-detected-due-to-lack-of-where-support.patch,
  TestThreaded.java.patch


 Functionality:
  1. Provide a pluggable caching framework for DIH so that users can choose a 
 cache implementation that best suits their data and application.
  
  2. Provide a means to temporarily cache a child Entity's data without 
 needing to create a special cached implementation of the Entity Processor 
 (such as CachedSqlEntityProcessor).
  
  3. Provide a means to write the final (root entity) DIH output to a cache 
 rather than to Solr.  Then provide a way for a subsequent DIH call to use the 
 cache as an Entity input.  Also provide the ability to do delta updates on 
 such persistent caches.
  
  4. Provide the ability to partition data across multiple caches that can 
 then be fed back into DIH and indexed either to varying Solr Shards, or to 
 the same Core in parallel.
 Use Cases:
  1. We needed a flexible & scalable way to temporarily cache child-entity 
 data prior to joining to parent entities.
   - Using SqlEntityProcessor with Child Entities can cause an n+1 select 
 problem.
   - CachedSqlEntityProcessor only supports an in-memory HashMap as a Caching 
 mechanism and does not scale.
   - There is no way to cache non-SQL inputs (ex: flat files, xml, etc).
  
  2. We needed the ability to gather data from long-running entities by a 
 process that runs separate from our main indexing process.
   
  3. We wanted the ability to do a delta import of only the entities that 
 changed.
   - Lucene/Solr requires entire documents to be re-indexed, even if only a 
 few fields changed.
   - Our data comes from 50+ complex sql queries and/or flat files.
   - We do not want to incur overhead re-gathering all of this data if only 1 
 entity's data changed.
   - Persistent DIH caches solve this problem.
   
  4. We want the ability to index several documents in parallel (using 1.4.1, 
 which did not have the threads parameter).
  
  5. In the future, we may need to use Shards, creating a need to easily 
 partition our source data into Shards.
 Implementation Details:
  1. De-couple EntityProcessorBase from caching.  
   - Created a new interface, DIHCache & two implementations:  
 - SortedMapBackedCache - An in-memory cache, used as default with 
 CachedSqlEntityProcessor (now deprecated).
 - BerkleyBackedCache - A disk-backed cache, dependent on bdb-je, tested 
 with je-4.1.6.jar
- NOTE: the existing Lucene Contrib db project uses je-3.3.93.jar.  
 I believe this may be incompatible due to Generic Usage.
- NOTE: I did not modify the ant script to automatically get this jar, 
 so to use or evaluate this patch, download bdb-je from 
 http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html 
  
  2. Allow Entity Processors to take a cacheImpl parameter to cause the 
 entity data to be cached (see EntityProcessorBase & DIHCacheProperties; a 
 configuration sketch follows this description).
  
  3. Partially De-couple SolrWriter from DocBuilder
   - Created a new interface, DIHWriter, & two implementations:
- SolrWriter (refactored)
- DIHCacheWriter (allows DIH to write ultimately to a Cache).

  4. Create a new Entity Processor, DIHCacheProcessor, which reads a 
 persistent Cache as DIH Entity Input.
  
  5. Support a partition parameter with both DIHCacheWriter and 
 DIHCacheProcessor to allow for easy partitioning of source entity data.
  
  6. Change the semantics of entity.destroy()
   - Previously, 
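
As a configuration sketch (table, column, and entity names are made up; the cacheImpl/cacheKey/cacheLookup parameters are from this feature), caching a child entity looks roughly like:

{code}
<!-- data-config.xml sketch with made-up names: the child entity's rows
     are cached once and joined to each parent row by key, avoiding the
     n+1 select problem described above. -->
<entity name="parent" query="select id, name from parent_table">
  <entity name="child"
          query="select parent_id, detail from child_table"
          cacheImpl="SortedMapBackedCache"
          cacheKey="parent_id"
          cacheLookup="parent.id"/>
</entity>
{code}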

[jira] [Commented] (LUCENE-3612) remove _X.fnx

2011-11-30 Thread Simon Willnauer (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160689#comment-13160689
 ] 

Simon Willnauer commented on LUCENE-3612:
-

+1

 remove _X.fnx
 -

 Key: LUCENE-3612
 URL: https://issues.apache.org/jira/browse/LUCENE-3612
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 4.0
Reporter: Robert Muir
 Attachments: LUCENE-3612.patch


 Currently we store a global (not per-segment) field number-name mapping in 
 _X.fnx.
 However, it doesn't actually save us any performance, e.g. on IndexWriter's 
 init, because since LUCENE-2984 we are loading the fieldinfos anyway to 
 compute files() for IFD, etc., as that's where hasProx/hasVectors is.
 Additionally, in the past, global files like shared doc stores have caused us 
 problems (recently we just fixed a bug related to this file in LUCENE-3601).
 Finally, this is trouble for backwards compatibility, as it's difficult to 
 handle a global file with the codecs mechanism.




[jira] [Commented] (SOLR-2382) DIH Cache Improvements

2011-11-30 Thread Noble Paul (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160688#comment-13160688
 ] 

Noble Paul commented on SOLR-2382:
--

@James 
Yes, create a new issue for the remaining functionality and let's close this 
one.

 DIH Cache Improvements
 --

 Key: SOLR-2382
 URL: https://issues.apache.org/jira/browse/SOLR-2382
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Reporter: James Dyer
Priority: Minor