[Lucene.Net] Reminder we need to get a board report in
FYI, the process seems to have changed: board reports are now due on the first of the month (if you have to report that month) to give people time to review. I can handle the report if nobody else wants to, but I won't be able to get to it for a day or so.
[Lucene.Net] Re: Memory Leak in code (mine or Lucene?) 2.9.2.2
I was wrong -- the analyzer does have the Close function. I closed my analyzer, but the steady climb in memory is still there. I wonder if I should create a global analyzer variable, guard it with a lock to make sure there aren't any threading issues, and use that instead. Could it be a leak in the analyzer itself?

On Tue, Nov 29, 2011 at 12:16 PM, Trevor Watson powersearchsoftw...@gmail.com wrote: I don't recall seeing a close function on the analyzer, but I will definitely take a look. Thanks!

On Tuesday, November 29, 2011, Oren Eini (Ayende Rahien) aye...@ayende.com wrote: You need to close the analyzer.

On Tue, Nov 29, 2011 at 12:32 AM, Trevor Watson powersearchsoftw...@gmail.com wrote: I'm using the following block of code. The document is created in another function and written to the Lucene index via an IndexWriter:

private void SaveFileToFileInfo(Lucene.Net.Index.IndexWriter iw, bool delayCommit, string sDataPath)
{
    Document doc = getFileInfoDoc(sDataPath);
    Analyzer analyzer = clsLuceneFunctions.getAnalyzer();
    if (this.FileID == 0) { string s = ""; }
    iw.UpdateDocument(new Lucene.Net.Index.Term("FileId", this.fileID.ToString("0")), doc, analyzer);
    analyzer = null;
    doc = null;
    if (!delayCommit) iw.Commit();
}

When the UpdateDocument line is commented out, everything seems to run fine. When that line of code is run, memory slowly creeps up. However, it used to work on some computers; now it works on 1 or 2 but fails on our clients' computers. Is there an issue with UpdateDocument that I am not aware of in 2.9.2.2? Thanks in advance.
[Lucene.Net] Re: Memory Leak in 2.9.2.2
You said "pre 2.9.3". I checked the Apache Lucene.Net page to try to get a copy of 2.9.3, but it was never on the site -- just 2.9.2.2 and 2.9.4(g). Was this an unreleased version? Or am I looking in the wrong spot for updates to Lucene.Net? Thanks for all your help

On Tue, Nov 29, 2011 at 2:59 PM, Trevor Watson powersearchsoftw...@gmail.com wrote: I can send you the dll that I am using if you would like. The documents are _mostly_ small documents -- emails and office docs, the size of plain text.

On Tuesday, November 29, 2011, Christopher Currens currens.ch...@gmail.com wrote: Do you know how big the documents are that you are trying to delete/update? I'm trying to find a copy of 2.9.2 to see if I can reproduce it. Thanks, Christopher

On Tue, Nov 29, 2011 at 9:11 AM, Trevor Watson powersearchsoftw...@gmail.com wrote: Sorry for the duplicate post; I was on the road and posted via both my web mail and office mail by mistake. The increase is very gradual: the program starts at about 160,000k according to Task Manager (I know that's not entirely accurate, but it was the best I had at the time) and would, after adding 25,000-40,000 documents, end in an out-of-memory exception (800,000k according to Task Manager). I tried building a copy of 2.9.4 to test, but could not find one that worked in Visual Studio 2005. I did notice using ANTS Memory Profiler that there were a number of byte[32789] arrays in memory that I didn't know where they came from.

On Monday, November 28, 2011, Christopher Currens currens.ch...@gmail.com wrote: Hi Trevor, What kind of memory increase are we talking about? Also, how big are the documents that you are indexing, the ones returned from getFileInfoDoc()? Is it putting an entire file into the index? Pre-2.9.3 versions had issues with holding onto allocated byte arrays far beyond when they were used; the memory could only be freed by closing the IndexWriter. I'm a little unclear on exactly what's happening.
Are you noticing the memory spike and stay constant at that level, or is it a gradual increase? Is it causing your application to error (i.e., an OutOfMemory exception, etc.)? Thanks, Christopher

On Mon, Nov 28, 2011 at 5:59 PM, Trevor Watson powersearchsoftw...@gmail.com wrote: I'm attempting to use Lucene.Net v2.9.2.2 in a Visual Studio 2005 (.NET 2.0) environment. We had a piece of software that WAS working. I'm not sure what has changed; however, the following code results in a memory leak in the Lucene.Net component (or a failure to clean up used memory). The code at issue is here:

private void SaveFileToFileInfo(Lucene.Net.Index.IndexWriter iw, bool delayCommit, string sDataPath)
{
    Document doc = getFileInfoDoc(sDataPath);
    Analyzer analyzer = clsLuceneFunctions.getAnalyzer();
    if (this.FileID == 0) { string s = ""; }
    iw.UpdateDocument(new Lucene.Net.Index.Term("FileId", this.fileID.ToString("0")), doc, analyzer);
    analyzer = null;
    doc = null;
    if (!delayCommit) iw.Commit();
}

Commenting out the iw.UpdateDocument line resulted in no memory increase. I also tried replacing it with a DeleteDocument and AddDocument, and the memory increased the same as when using the UpdateDocument function. The getAnalyzer() function returns an ExtendedStandardAnalyzer, but it's the UpdateDocument line specifically that gives me the issue. Any assistance would be greatly appreciated. Trevor Watson
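The advice that emerges from this thread -- close the analyzer, and remember that pre-2.9.3 builds only release their allocated byte buffers when the IndexWriter is closed -- boils down to a deterministic-cleanup pattern. A minimal Java sketch of that pattern, using stub classes (the StubAnalyzer/StubIndexWriter here are hypothetical stand-ins, not the Lucene API; the buffer counter only models the pre-2.9.3 behavior described above):

```java
import java.io.Closeable;

// Hypothetical stand-ins for the Lucene types discussed in the thread.
class StubAnalyzer implements Closeable {
    boolean closed = false;
    public void close() { closed = true; }
}

class StubIndexWriter implements Closeable {
    int buffered = 0;                         // models byte[] blocks held until close (pre-2.9.3)
    void updateDocument(String id, StubAnalyzer a) { buffered++; }
    public void close() { buffered = 0; }     // closing releases the held buffers
}

public class CloseDemo {
    public static void main(String[] args) {
        // Reuse ONE analyzer for the whole batch instead of creating one per
        // document, and close everything deterministically in finally blocks.
        StubAnalyzer analyzer = new StubAnalyzer();
        StubIndexWriter writer = new StubIndexWriter();
        try {
            for (int i = 0; i < 1000; i++) {
                writer.updateDocument("doc" + i, analyzer);
            }
        } finally {
            try { writer.close(); } finally { analyzer.close(); }
        }
        System.out.println(writer.buffered);  // 0: buffers released on close
        System.out.println(analyzer.closed);  // true
    }
}
```

The nested finally blocks guarantee the analyzer is closed even if closing the writer throws, which matches Christopher's "make sure you're calling Close on whatever needs to be closed" advice.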
RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2
FYI, 2.9.4 can be compiled against .NET 2.0 with a few minor changes in CloseableThreadLocal (like uncommenting the ThreadLocal<T> class and replacing extension-method calls with static calls to CloseableThreadLocalExtensions). DIGY

-Original Message- From: Christopher Currens [mailto:currens.ch...@gmail.com] Sent: Wednesday, November 30, 2011 7:26 PM To: lucene-net-dev@lucene.apache.org Subject: Re: [Lucene.Net] Re: Memory Leak in 2.9.2.2

Trevor, I'm not sure if you can use 2.9.4, though; it looks like you're using VS2005 and .NET 2.0. 2.9.4 targets .NET 4.0, and I'm fairly certain we use classes only available in 4.0 (or 3.5?). However, if you can, I would suggest updating, as 2.9.4 should be a fairly stable release. The leak I'm talking about is addressed here: https://issues.apache.org/jira/browse/LUCENE-2467 -- the ported code isn't available in 2.9.2, but I've confirmed the patch is in 2.9.4. It may or may not be what your issue is. You say that it was at one time working fine; I assume you mean no memory leak. I would take some time to see what else in your code has changed. Make sure you're calling Close on whatever needs to be closed (IndexWriter/IndexReader/Analyzers, etc.). Unfortunately for us, memory leaks are hard to debug over email, and it's difficult for us to tell if it's a change to your code or an issue with Lucene.NET. As far as I can tell, this is the only memory leak I can find that affects 2.9.2. Thanks, Christopher

On Wed, Nov 30, 2011 at 8:25 AM, Prescott Nasser geobmx...@hotmail.com wrote: We just released 2.9.4 -- the website didn't update last night, so I'll have to try and update it later today. But if you follow the link to download the 2.9.2 dist, you'll see folders for 2.9.4.
I'll send an email to the user and dev lists once I get the website to update.
RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2
OK, here is the code that can be compiled against .NET 2.0: http://pastebin.com/k2f7JfPd DIGY

-Original Message- From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com] Sent: Wednesday, November 30, 2011 9:26 PM To: lucene-net-dev@lucene.apache.org Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

DIGY, Thanks for the tip, but could you be a little more specific? Where and how are extension-method calls replaced? For example, how would I change the CloseableThreadLocalExtensions method

public static void Set<T>(this ThreadLocal<T> t, T val) { t.Value = val; }

to eliminate the compile error

Error 2: Cannot define a new extension method because the compiler required type 'System.Runtime.CompilerServices.ExtensionAttribute' cannot be found. Are you missing a reference to System.Core.dll?

due to the `this ThreadLocal<T> t` parameter of Set<T>? - Neal
RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2
If I recall correctly, the last memory-leak problem for 2.9.2 was reported around August, from RavenDB, and it was fixed in 2.9.4(g). DIGY

-Original Message- From: Christopher Currens [mailto:currens.ch...@gmail.com] Sent: Wednesday, November 30, 2011 11:33 PM To: lucene-net-dev@lucene.apache.org Subject: Re: [Lucene.Net] Re: Memory Leak in 2.9.2.2

Trevor, Unfortunately I was unable to reproduce the memory leak you're experiencing in 2.9.2. Particularly with byte[]: of the 18,277 that were created, only 13 were not garbage collected, and it's likely that they are not related to Lucene (it's possible they are static and would therefore only be destroyed with the AppDomain, outside of what the profiler can trace). I tried to emulate the code you showed us, and there were no signs of any allocated arrays that weren't cleaned up. That doesn't mean there isn't one in your code, but I just can't reproduce it with what you've shown us. If it's possible for you to write a small program that has the same behavior, that could help us track it down. As a side note, what was a little disconcerting was that in 2.9.4 the same code created 28,565 byte[], and there were quite a few more left uncollected (2,805 arrays). The allocations are happening in DocumentsWriter.ByteBlockAllocator; I'll have to look at it later to see if it's even a problem. Thanks, Christopher

On Wed, Nov 30, 2011 at 12:41 PM, Granroth, Neal V. neal.granr...@thermofisher.com wrote: Or maybe put the changes within a conditional-compile code block? Thanks DIGY, works great.
- Neal

-Original Message- From: Prescott Nasser [mailto:geobmx...@hotmail.com] Sent: Wednesday, November 30, 2011 2:35 PM To: lucene-net-dev@lucene.apache.org Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2

Probably makes for a good wiki entry. Sent from my Windows Phone
RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2
... and it was related to CloseableThreadLocal (fixed in 2.9.4(g)), which now creates a compilation problem against .NET 2.0 :) DIGY
[Lucene.Net] December Board Report
The December board report has been updated: http://wiki.apache.org/incubator/December2011 Please review and adjust as needed, ~P
Re: buildbots for PyLucene?
On Nov 29, 2011, at 18:04, Bill Janssen jans...@parc.com wrote:

Andi Vajda va...@apache.org wrote:

On Nov 29, 2011, at 15:18, Bill Janssen jans...@parc.com wrote: I've once again spent an hour building PyLucene, which gives me some sympathy for issue 10: https://issues.apache.org/jira/browse/PYLUCENE-10 I was thinking about how to address this... One thing I've found useful at PARC is to set up buildbot tests for hard-to-package systems. Basically, the test just waits for changes to the SCM repository, checks out the code, and tries to build. A nice side effect is that, when successful, it produces a binary for the build slave's platform. I'm unsure whether this would work for PyLucene. The ASF build slaves seem pretty coarse-grained. I see that there is an osx-slave, but there's no information about it (10.5? 10.6? 10.7?), no contact, and it's down.

I know nothing about the Apache buildbots. Why not contribute buildbots for PyLucene at PARC?

Because this is something the ASF should really address. I'm happy to volunteer to set up a PyLucene build test on an ASF buildbot -- maybe more than one if it's easy to clone.

Given the bizarre netbsd(?) jail the Lucene Java bot is set up in, I can't imagine a multi-OS x multi-Java x multi-Python buildbot materializing anytime soon. That being said, I really don't know what is and isn't available as infrastructure from the ASF for these kinds of things, so I might be completely wrong here. Andi..

Just looked at snakebite.org -- no OS X buildbots there, either. Bill

A possibility would be to use the Python buildbots, but of course there's no assurance that Java is installed on any of them. Bill
[jira] [Commented] (LUCENE-3586) Choose a specific Directory implementation running the CheckIndex main
[ https://issues.apache.org/jira/browse/LUCENE-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159894#comment-13159894 ] Luca Cavanna commented on LUCENE-3586: -- {quote} Hmm, I don't think we should add an enum to FSDir here? Can we simply accept the class name and then just load that class (maybe prefixing oal.store so the user doesn't have to type that all the time)? Also, can we make it a hard error if the specified name isn't recognized? (Instead of silently falling back to FSDir.open.) {quote} That's fine as well. Just a little bit longer than writing NIOFS, MMAP or SIMPLE, but I guess it doesn't matter. Mike, do you mean to load the class using reflection, or to compare the input string to those three class names? Any other opinions?

Choose a specific Directory implementation running the CheckIndex main -- Key: LUCENE-3586 URL: https://issues.apache.org/jira/browse/LUCENE-3586 Project: Lucene - Java Issue Type: Improvement Reporter: Luca Cavanna Assignee: Luca Cavanna Priority: Minor Attachments: LUCENE-3586.patch

It should be possible to choose a specific Directory implementation to use during the CheckIndex process when we run it from its main. What about an additional main parameter? In fact, I'm experiencing some problems with MMapDirectory working with a big segment, and after some failed attempts playing with maxChunkSize, I decided to switch to another FSDirectory implementation, but I needed to do that in my own main. Should we also consider using a FileSwitchDirectory? I'm willing to contribute; could you please let me know your thoughts about it? -- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
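Mike's suggestion in the comment above -- accept a class name, optionally prefix the store package, and fail hard on unknown names instead of silently falling back to FSDir.open -- can be sketched with plain reflection. Lucene isn't on the classpath here, so the demo resolves a JDK class as a stand-in; the DEFAULT_PACKAGE constant and helper name are illustrative only, not the actual CheckIndex code:

```java
public class DirectoryResolver {
    static final String DEFAULT_PACKAGE = "org.apache.lucene.store.";

    // Resolve a user-supplied class name: try the bare name first, then the
    // default package prefix; unknown names are a hard error rather than a
    // silent fallback to a default Directory implementation.
    static Class<?> resolve(String name) {
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            try {
                return Class.forName(DEFAULT_PACKAGE + name);
            } catch (ClassNotFoundException e2) {
                throw new IllegalArgumentException("unknown Directory class: " + name);
            }
        }
    }

    public static void main(String[] args) {
        // A fully-qualified name resolves directly (JDK class as a stand-in).
        System.out.println(resolve("java.lang.String").getName());
        try {
            resolve("NoSuchDirectory");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());  // hard error, no silent fallback
        }
    }
}
```

In the real patch the resolved class would then be instantiated reflectively (e.g. via a constructor taking the index path), which is the "load the class using reflection" option Luca asks about.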
[jira] [Created] (SOLR-2930) Allow controlling an important PDF processing parameter in Tika that splits the words in text and is now suppored in version 1.0 of Tika.
Allow controlling an important PDF processing parameter in Tika that splits the words in text and is now supported in version 1.0 of Tika. - Key: SOLR-2930 URL: https://issues.apache.org/jira/browse/SOLR-2930 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Affects Versions: 3.5 Reporter: Ravish Bhagdev

Tika 1.0 has fixed a major issue with processing and parsing of PDF files that was splitting words incorrectly: https://issues.apache.org/jira/browse/TIKA-724 This causes text to be indexed incorrectly in Solr, and it becomes especially visible when using spellcheck features etc. They have added a special parameter, set using setEnableAutoSpace, that fixes the problem, but there is currently no way of setting this when using Solr. As discussed in the thread on the above issue, it would be nice if we could control this (and, in future, other) parameters via Solr configuration.
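What the issue requests could be pictured as a default on the Solr Cell request handler in solrconfig.xml. Note this is purely a hypothetical sketch: the pdf.enableAutoSpace parameter name is invented here to illustrate the request and does not exist in Solr 3.5; only the handler class name is real.

```xml
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- hypothetical parameter: would be forwarded to Tika's
         PDFParser.setEnableAutoSpace(...) during extraction -->
    <str name="pdf.enableAutoSpace">false</str>
  </lst>
</requestHandler>
```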
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 11608 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/11608/ 1 tests failed. REGRESSION: org.apache.lucene.search.TestSearcherManager.testIntermediateClose Error Message: java.lang.NullPointerException Stack Trace: junit.framework.AssertionFailedError: java.lang.NullPointerException at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50) at org.apache.lucene.search.TestSearcherManager.testIntermediateClose(TestSearcherManager.java:248) at org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:432) Build Log (for compile errors): [...truncated 7958 lines...]
[jira] [Commented] (SOLR-2472) StatsComponent should support hierarchical facets
[ https://issues.apache.org/jira/browse/SOLR-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159947#comment-13159947 ] Dmitry Drozdov commented on SOLR-2472: -- Any chance for this to be merged into trunk?

StatsComponent should support hierarchical facets - Key: SOLR-2472 URL: https://issues.apache.org/jira/browse/SOLR-2472 Project: Solr Issue Type: New Feature Affects Versions: 3.1, 4.0 Reporter: Dmitry Drozdov Attachments: SOLR-2472.patch Original Estimate: 24h Remaining Estimate: 24h

It is currently possible to get only a single layer of faceting in StatsComponent. The proposal is to make it possible to specify the stats.facet parameter like this:

stats=true&stats.field=sField&stats.facet=fField1,fField2

and get a response like this:

<lst name="stats">
  <lst name="stats_fields">
    <lst name="sField">
      <double name="min">1.0</double>
      <double name="max">1.0</double>
      <double name="sum">4.0</double>
      <long name="count">4</long>
      <long name="missing">0</long>
      <double name="sumOfSquares"/>
      <double name="mean"/>
      <double name="stddev"/>
      <lst name="facets">
        <lst name="fField1">
          <lst name="fField1Value1">
            <double name="min">1.0</double>
            <double name="max">1.0</double>
            <double name="sum">2.0</double>
            <long name="count">2</long>
            <long name="missing">0</long>
            <double name="sumOfSquares"/>
            <double name="mean"/>
            <double name="stddev"/>
            <lst name="facets">
              <lst name="fField2">
                <lst name="fField2Value1">
                  <double name="min">1.0</double>
                  <double name="max">1.0</double>
                  <double name="sum">1.0</double>
                  <long name="count">1</long>
                  <long name="missing">0</long>
                  <double name="sumOfSquares"/>
                  <double name="mean"/>
                  <double name="stddev"/>
                </lst>
                <lst name="fField2Value2">
                  <double name="min">1.0</double>
                  <double name="max">1.0</double>
                  <double name="sum">1.0</double>
                  <long name="count">1</long>
                  <long name="missing">0</long>
                  <double name="sumOfSquares"/>
                  <double name="mean"/>
                  <double name="stddev"/>
                </lst>
              </lst>
            </lst>
          </lst>
          <lst name="fField1Value2">
            <double name="min">1.0</double>
            <double name="max">1.0</double>
            <double name="sum">2.0</double>
            <long name="count">2</long>
            <long name="missing">0</long>
            <double name="sumOfSquares"/>
            <double name="mean"/>
            <double name="stddev"/>
            <lst name="facets">
              <lst name="fField2">
                <lst name="fField2Value1">
                  <double name="min">1.0</double>
                  <double name="max">1.0</double>
                  <double name="sum">1.0</double>
                  <long name="count">1</long>
                  <long name="missing">0</long>
                  <double name="sumOfSquares"/>
                  <double name="mean"/>
                  <double name="stddev"/>
                </lst>
                <lst name="fField2Value2">
                  <double name="min">1.0</double>
                  <double name="max">1.0</double>
                  <double name="sum">1.0</double>
                  <long name="count">1</long>
                  <long name="missing">0</long>
                  <double name="sumOfSquares"/>
                  <double name="mean"/>
                  <double name="stddev"/>
                </lst>
              </lst>
            </lst>
          </lst>
        </lst>
      </lst>
    </lst>
  </lst>
</lst>
[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 1112 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/1112/ No tests ran. Build Log (for compile errors): [...truncated 12340 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3609) BooleanFilter changed behavior in 3.5, no longer acts as if minimum should match set to 1
[ https://issues.apache.org/jira/browse/LUCENE-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13159970#comment-13159970 ] Uwe Schindler commented on LUCENE-3609: --- Committed 3.x revision: 1208375 Now forward-porting BooleanFilter changed behavior in 3.5, no longer acts as if minimum should match set to 1 --- Key: LUCENE-3609 URL: https://issues.apache.org/jira/browse/LUCENE-3609 Project: Lucene - Java Issue Type: Bug Components: core/search Affects Versions: 3.5 Reporter: Shay Banon Assignee: Uwe Schindler Attachments: LUCENE-3609.patch, LUCENE-3609.patch, LUCENE-3609.patch The change LUCENE-3446 causes a change in behavior in BooleanFilter. It used to work as if minimum should match clauses is 1 (compared to BQ lingo), but now, if no should clauses match, then the should clauses are ignored, and for example, if there is a must clause, only that one will be used and returned. For example, a single must clause and should clause, with the should clause not matching anything, should not match anything, but, it will match whatever the must clause matches. The fix is simple, after iterating over the should clauses, if the aggregated bitset is null, return null. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3609) BooleanFilter changed behavior in 3.5, no longer acts as if minimum should match set to 1
[ https://issues.apache.org/jira/browse/LUCENE-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-3609. --- Resolution: Fixed Fix Version/s: 4.0 3.6 Committed trunk revision: 1208381 BooleanFilter changed behavior in 3.5, no longer acts as if minimum should match set to 1 --- Key: LUCENE-3609 URL: https://issues.apache.org/jira/browse/LUCENE-3609 Project: Lucene - Java Issue Type: Bug Components: core/search Affects Versions: 3.5 Reporter: Shay Banon Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: LUCENE-3609.patch, LUCENE-3609.patch, LUCENE-3609.patch The change LUCENE-3446 causes a change in behavior in BooleanFilter. It used to work as if minimum should match clauses is 1 (compared to BQ lingo), but now, if no should clauses match, then the should clauses are ignored, and for example, if there is a must clause, only that one will be used and returned. For example, a single must clause and should clause, with the should clause not matching anything, should not match anything, but, it will match whatever the must clause matches. The fix is simple, after iterating over the should clauses, if the aggregated bitset is null, return null. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
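The fix described in the issue ("after iterating over the should clauses, if the aggregated bitset is null, return null") can be sketched outside Lucene. This is an illustrative stand-in, not the actual patch: `SimpleBooleanFilter` is a made-up class, and plain `java.util.BitSet`s (with `null` meaning "no documents matched") stand in for Lucene's DocIdSets.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Illustrative stand-in (NOT the real BooleanFilter): a null BitSet
// plays the role of a DocIdSet that matched no documents.
class SimpleBooleanFilter {
    final List<BitSet> shouldClauses = new ArrayList<BitSet>();
    final List<BitSet> mustClauses = new ArrayList<BitSet>();

    BitSet getDocIdSet(int maxDoc) {
        BitSet res = null;
        for (BitSet should : shouldClauses) {
            if (should == null) continue;          // this clause matched nothing
            if (res == null) res = new BitSet(maxDoc);
            res.or(should);
        }
        // The LUCENE-3609 fix: if there were should clauses but none of
        // them matched anything, the whole filter matches nothing, instead
        // of silently degrading to "must clauses only".
        if (!shouldClauses.isEmpty() && res == null) {
            return null;
        }
        for (BitSet must : mustClauses) {
            if (must == null) return null;         // a must clause matched nothing
            if (res == null) res = (BitSet) must.clone();
            else res.and(must);
        }
        return res;
    }
}
```

With the highlighted check removed, a filter with one matching must clause and one non-matching should clause would wrongly return whatever the must clause matched, which is exactly the 3.5 regression Shay reported.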
[jira] [Commented] (LUCENE-3607) Lucene Index files can not be reproduced faithfully (due to timestamps embedded)
[ https://issues.apache.org/jira/browse/LUCENE-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159983#comment-13159983 ]

Martin Oberhuber commented on LUCENE-3607:
------------------------------------------

Hi all, thanks for the many comments. I understand that there's no desire to change behavior that has been working (and documented!) for years. What about a different approach: would it be possible to write a small Java main that normalizes an index, much like stripping an EXE? That way I could postprocess my indexes (which are meant for distribution with our product), but at its core Lucene could continue working as it does today.

Regarding some other comments:

- Our main reason for shipping a pre-built index is initial search performance. In a large Eclipse-based product, generating the docs index on initial search can take approximately 4 minutes on a decent computer. With everything pre-indexed, initial search can proceed after 10 seconds. That's an important usability issue for our help system. Another reason is the desire to find any index-building errors at build time (where we can investigate them) rather than at runtime.

- We do have both the build environment and the deployment environment under full control (same Lucene version, same JVM version, same ICU version, and all our content is en_US).

- Regarding heuristics: sure, the search is heuristic at runtime, but that's a very different thing from the build environment being heuristic. Having identical input produce identical output is still desirable.

- The issue of different analyzers used at index-generation time vs. runtime has indeed bitten us in the past (see [https://bugs.eclipse.org/bugs/show_bug.cgi?id=219928#c16]). In my personal opinion, the choice of analyzer should be bound to the content, not to the search environment, since in many cases the language of the search string will not be known, but the language of the documents / index is known.
Right now, the best workaround for this at Eclipse is launching Eclipse with a -nl en_US argument to force the US locale when I know all the docs are US, but that won't work at all in an environment where some docs are English and others are German, a very common scenario with software products on Eclipse (the main product may be localized but some plugins are not). Is that analyzer-bound-to-content vs. bound-to-search issue known and discussed at Lucene already? I.e., is it possible to have parts of the index (the US one) searched with a US analyzer but other parts (the German one) with a German analyzer? And why does the German analyzer truncate words at "." while the US one does not (see [https://bugs.eclipse.org/bugs/show_bug.cgi?id=219928#c18])?

Lucene Index files can not be reproduced faithfully (due to timestamps embedded)
--------------------------------------------------------------------------------
Key: LUCENE-3607
URL: https://issues.apache.org/jira/browse/LUCENE-3607
Project: Lucene - Java
Issue Type: Bug
Components: core/index
Affects Versions: 2.9.1
Environment: Eclipse 3.7
Reporter: Martin Oberhuber
Assignee: Michael McCandless

Eclipse 3.7 uses Lucene 2.9.1 for indexing online help content. A pre-generated help index can be shipped together with online content. As per [https://bugs.eclipse.org/bugs/show_bug.cgi?id=364979] it turns out that the help index can not be faithfully reproduced during a build, because there are timestamps embedded in the index files, and the NameCounter field in segments_2 contains different contents on every build. Not being able to faithfully reproduce the index from identical source bits undermines trust in the index (and software delivery) being correct. I'm wondering whether this is a known issue and/or has been addressed in a newer Lucene version already?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2280) commitWithin ignored for a delete query
[ https://issues.apache.org/jira/browse/SOLR-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159994#comment-13159994 ]

Jan Høydahl commented on SOLR-2280:
-----------------------------------

I also plan to add support for the convenience methods deleteById(String id, int commitWithinMs) etc. in SolrJ, the same way as for adds.

commitWithin ignored for a delete query
---------------------------------------
Key: SOLR-2280
URL: https://issues.apache.org/jira/browse/SOLR-2280
Project: Solr
Issue Type: Bug
Components: clients - java
Reporter: David Smiley
Assignee: Jan Høydahl
Priority: Minor
Fix For: 3.6, 4.0
Attachments: SOLR-2280-3x.patch, SOLR-2280.patch, SOLR-2280.patch, SOLR-2280.patch

The commitWithin option on an UpdateRequest is only honored for requests containing new documents. It does not, for example, work with a delete query. The following doesn't work as expected:

{code:java}
UpdateRequest request = new UpdateRequest();
request.deleteById("id123");
request.setCommitWithin(1000);
solrServer.request(request);
{code}

In my opinion, the commitWithin attribute should be permitted on the <delete/> xml tag as well as <add/>. Such a change would go in XMLLoader.java and it would have some ramifications elsewhere too. Once this is done, UpdateRequest.getXml() can be updated to generate the right XML.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
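The issue text proposes carrying commitWithin on the delete tag itself, so the update XML that UpdateRequest.getXml() generates might look like the following sketch. The attribute placement follows the proposal in this issue, not a released wire format:

```xml
<delete commitWithin="1000">
  <id>id123</id>
</delete>
```

Under that shape, XMLLoader would read the attribute off <delete/> exactly as it already does for <add/>.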
[jira] [Commented] (LUCENE-3607) Lucene Index files can not be reproduced faithfully (due to timestamps embedded)
[ https://issues.apache.org/jira/browse/LUCENE-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159997#comment-13159997 ]

Robert Muir commented on LUCENE-3607:
-------------------------------------

{quote}
Is that analyzer binding to content vs. binding to search issue known and discussed at Lucene already?
{quote}

No, because it's an Eclipse bug. You can set analyzers however you want in Lucene; we don't enforce anything.

{quote}
And, why does the German analyzer truncate words at . while the US one does not (See [https://bugs.eclipse.org/bugs/show_bug.cgi?id=219928#c18])?
{quote}

Because you are using an ancient version of Lucene.

Lucene Index files can not be reproduced faithfully (due to timestamps embedded)
--------------------------------------------------------------------------------
Key: LUCENE-3607
URL: https://issues.apache.org/jira/browse/LUCENE-3607
Project: Lucene - Java
Issue Type: Bug
Components: core/index
Affects Versions: 2.9.1
Environment: Eclipse 3.7
Reporter: Martin Oberhuber
Assignee: Michael McCandless

Eclipse 3.7 uses Lucene 2.9.1 for indexing online help content. A pre-generated help index can be shipped together with online content. As per [https://bugs.eclipse.org/bugs/show_bug.cgi?id=364979] it turns out that the help index can not be faithfully reproduced during a build, because there are timestamps embedded in the index files, and the NameCounter field in segments_2 contains different contents on every build. Not being able to faithfully reproduce the index from identical source bits undermines trust in the index (and software delivery) being correct. I'm wondering whether this is a known issue and/or has been addressed in a newer Lucene version already?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3607) Lucene Index files can not be reproduced faithfully (due to timestamps embedded)
[ https://issues.apache.org/jira/browse/LUCENE-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3607. - Resolution: Won't Fix Lucene Index files can not be reproduced faithfully (due to timestamps embedded) Key: LUCENE-3607 URL: https://issues.apache.org/jira/browse/LUCENE-3607 Project: Lucene - Java Issue Type: Bug Components: core/index Affects Versions: 2.9.1 Environment: Eclipse 3.7 Reporter: Martin Oberhuber Assignee: Michael McCandless Eclipse 3.7 uses Lucene 2.9.1 for indexing online help content. A pre-generated help index can be shipped together with online content. As per [[https://bugs.eclipse.org/bugs/show_bug.cgi?id=364979 ]] it turns out that the help index can not be faithfully reproduced during a build, because there are timestamps embedded in the index files, and the NameCounter field in segments_2 contains different contents on every build. Not being able to faithfully reproduce the index from identical source bits undermines trust in the index (and software delivery) being correct. I'm wondering whether this is a known issue and/or has been addressed in a newer Lucene version already ? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160006#comment-13160006 ]

Eric Pugh commented on SOLR-1972:
---------------------------------

Has anyone had thoughts on how to do this via a component that is less intrusive than modifying RequestHandlerBase? I'd love to do this via a component that I could compile as a standalone project and then drop into my existing Solr. Also, I am only interested in a certain subset of queries, so I added a collection of regex patterns that are used to test against the query string to see if it should be included in the rolling statistics. I will upload the patch. Also fixed the patch to work against the latest trunk.

Need additional query stats in admin interface - median, 95th and 99th percentile
---------------------------------------------------------------------------------
Key: SOLR-1972
URL: https://issues.apache.org/jira/browse/SOLR-1972
Project: Solr
Issue Type: Improvement
Affects Versions: 1.4
Reporter: Shawn Heisey
Priority: Minor
Attachments: SOLR-1972-url_pattern.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, elyograg-1972-trunk.patch, elyograg-1972-trunk.patch

I would like to see more detailed query statistics from the admin GUI. This is what you can get now:

requests : 809
errors : 0
timeouts : 0
totalTime : 70053
avgTimePerRequest : 86.59209
avgRequestsPerSecond : 0.8148785

I'd like to see more data on the time per request - median, 95th percentile, 99th percentile, and any other statistical function that makes sense to include. In my environment, the first bunch of queries after startup tend to take several seconds each. I find that the average value tends to be useless until it has several thousand queries under its belt and the caches are thoroughly warmed. The statistical functions I have mentioned would quickly eliminate the influence of those initial slow queries. The system will have to store individual data about each query.
I don't know if this is something Solr does already. It would be nice to have a configurable count of how many of the most recent data points are kept, to control the amount of memory the feature uses. The default value could be something like 1024 or 4096. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
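Shawn's proposal above (keep a configurable count of the most recent data points, then compute median/95th/99th on demand) can be sketched as a fixed-size ring buffer with nearest-rank percentiles. `RollingQueryStats` and its capacity are illustrative names, not Solr code:

```java
import java.util.Arrays;

// Illustrative sketch: retain only the N most recent query times in a
// ring buffer, so a few slow warm-up queries stop dominating long-run
// statistics once they fall out of the window.
class RollingQueryStats {
    private final long[] samples;
    private int count = 0;                 // total samples ever recorded

    RollingQueryStats(int capacity) {
        samples = new long[capacity];
    }

    void record(long elapsedMs) {
        samples[count % samples.length] = elapsedMs;  // overwrite oldest
        count++;
    }

    // Nearest-rank percentile (p in [0,100]) over the retained window.
    long percentile(double p) {
        int n = Math.min(count, samples.length);
        if (n == 0) throw new IllegalStateException("no samples recorded");
        long[] sorted = Arrays.copyOf(samples, n);
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * n);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

With a capacity of 1024 or 4096 as suggested, memory cost is a few kilobytes per handler, and sorting a copy only on demand keeps the per-request record() path to a single array write.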
[jira] [Updated] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Pugh updated SOLR-1972: Attachment: SOLR-1972-url_pattern.patch Updated to latest trunk, added regex patterns. Need additional query stats in admin interface - median, 95th and 99th percentile - Key: SOLR-1972 URL: https://issues.apache.org/jira/browse/SOLR-1972 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Shawn Heisey Priority: Minor Attachments: SOLR-1972-url_pattern.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, elyograg-1972-trunk.patch, elyograg-1972-trunk.patch I would like to see more detailed query statistics from the admin GUI. This is what you can get now: requests : 809 errors : 0 timeouts : 0 totalTime : 70053 avgTimePerRequest : 86.59209 avgRequestsPerSecond : 0.8148785 I'd like to see more data on the time per request - median, 95th percentile, 99th percentile, and any other statistical function that makes sense to include. In my environment, the first bunch of queries after startup tend to take several seconds each. I find that the average value tends to be useless until it has several thousand queries under its belt and the caches are thoroughly warmed. The statistical functions I have mentioned would quickly eliminate the influence of those initial slow queries. The system will have to store individual data about each query. I don't know if this is something Solr does already. It would be nice to have a configurable count of how many of the most recent data points are kept, to control the amount of memory the feature uses. The default value could be something like 1024 or 4096. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS-MAVEN] Lucene-Solr-Maven-3.x #316: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-3.x/316/ 2 tests failed. FAILED: org.apache.lucene.queryParser.standard.TestNumericQueryParser.org.apache.lucene.queryParser.standard.TestNumericQueryParser Error Message: Unparseable date: 1867.06.20 6時34分09秒 CET 9 +0100 1867 Stack Trace: java.text.ParseException: Unparseable date: 1867.06.20 6時34分09秒 CET 9 +0100 1867 at java.text.DateFormat.parse(DateFormat.java:354) at org.apache.lucene.queryParser.standard.TestNumericQueryParser.checkDateFormatSanity(TestNumericQueryParser.java:88) at org.apache.lucene.queryParser.standard.TestNumericQueryParser.beforeClass(TestNumericQueryParser.java:145) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) at org.junit.runners.ParentRunner.run(ParentRunner.java:236) at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at 
org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110) at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:175) at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:107) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68) FAILED: org.apache.lucene.queryParser.standard.TestNumericQueryParser.org.apache.lucene.queryParser.standard.TestNumericQueryParser Error Message: null Stack Trace: java.lang.NullPointerException at org.apache.lucene.queryParser.standard.TestNumericQueryParser.afterClass(TestNumericQueryParser.java:495) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37) at org.junit.runners.ParentRunner.run(ParentRunner.java:236) at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at 
java.lang.reflect.Method.invoke(Method.java:616) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110) at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:175) at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:107) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68) Build Log (for compile errors): [...truncated 26196 lines...]
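The first failure above is a classic locale round-trip problem: a timestamp formatted with one locale's FULL date/time pattern generally cannot be parsed back by a DateFormat built for another locale, since the text fields (weekday, month, era names) no longer match. A minimal stdlib demonstration of the same failure mode (this is not the test's actual code, and the exact formatted string depends on the JDK's locale data):

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.util.Date;
import java.util.Locale;

public class LocaleRoundTrip {
    public static void main(String[] args) {
        // Format with Japanese FULL style, e.g. "2011年11月30日..."
        DateFormat japanese = DateFormat.getDateTimeInstance(
                DateFormat.FULL, DateFormat.FULL, Locale.JAPAN);
        // ...then try to parse with a US FULL-style parser, which
        // expects English weekday/month names.
        DateFormat english = DateFormat.getDateTimeInstance(
                DateFormat.FULL, DateFormat.FULL, Locale.US);
        String stamp = japanese.format(new Date());
        try {
            english.parse(stamp);
            System.out.println("parsed (unexpectedly)");
        } catch (ParseException e) {
            // Same failure mode as the Jenkins report:
            // "Unparseable date: 1867.06.20 6時34分09秒 CET ..."
            System.out.println("cross-locale parse failed: " + e.getMessage());
        }
    }
}
```

Tests that format and re-parse dates therefore need to pin both the formatter and the parser to the same locale (or the randomized test locale), which is what the sanity check in TestNumericQueryParser appears to be probing.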
[jira] [Commented] (SOLR-2930) Allow controlling an important PDF processing parameter in Tika that splits the words in text and is now supported in version 1.0 of Tika.
[ https://issues.apache.org/jira/browse/SOLR-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160034#comment-13160034 ]

Robert Muir commented on SOLR-2930:
-----------------------------------

I think the most important piece is that this parameter is *off* by default. For a search engine, if some bold content gets duplicated... there could really be worse things. But if spaces get incorrectly added to words, that's going to mess up tokenization.

Allow controlling an important PDF processing parameter in Tika that splits the words in text and is now supported in version 1.0 of Tika.
------------------------------------------------------------------------------------------------------------------------------------------
Key: SOLR-2930
URL: https://issues.apache.org/jira/browse/SOLR-2930
Project: Solr
Issue Type: Improvement
Components: contrib - Solr Cell (Tika extraction)
Affects Versions: 3.5
Reporter: Ravish Bhagdev
Labels: pdf, text-splitting, tika

Tika 1.0 has fixed a major issue with processing and parsing of PDF files that was splitting the words incorrectly: https://issues.apache.org/jira/browse/TIKA-724 This causes text to be indexed incorrectly in Solr, and it becomes especially visible when using spellcheck features etc. They have added a special parameter, set using setEnableAutoSpace, that fixes the problem, but there is currently no way of setting this when using Solr. As discussed in the thread on the above issue, it would be nice if we could control this (and in future other) parameters via Solr configuration.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Stats per group with StatsComponent?
Hi I posted the below mail to the solr-user list a little over a week ago. Since there has been no response, we assume this means that what we need is not currently possible. We need this functionality, and are willing to put in time and effort to implement it, but could use some pointers to where it would be natural to add this, and ideas for how to best solve it. I'm also wondering if I should create an issue in JIRA right away, or if I should wait until we have a first patch ready? Original Message Subject: Stats per group with StatsComponent? Date: Tue, 22 Nov 2011 14:40:45 +0100 From: Morten Lied Johansen morte...@ifi.uio.no Reply-To: solr-u...@lucene.apache.org To: solr-u...@lucene.apache.org Hi We need to get minimum and maximum values for a field, within a group in a grouped search-result. Is this possible today, perhaps by using StatsComponent some way? I'll flesh out the example a little, to make the question clearer. We have a number of documents, indexed with a price, date and a hotel. For each hotel, there are a number of documents, each representing a price/date combination. We then group our search result on hotel. We want to show the minimum and maximum price for each hotel. A little googling leads us to look at StatsComponent, as what it does would be what we need, if it could be done for each group. There was a thread on this list in August, Grouping and performing statistics per group that seemed to go into this a bit, but didn't find a solution. Is this possible in Solr 3.4, either with StatsComponent, or some other way? -- Morten We all live in a yellow subroutine. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
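For the single-level case described in the mail (min/max price per hotel), it may be worth checking whether StatsComponent's existing stats.facet parameter already suffices before writing new code: it computes stats per facet value, so faceting the stats on the hotel field could yield per-hotel price bounds without the grouping component at all. A hypothetical, untested request for the example above (the field names price and hotel are taken from the mail):

```
http://localhost:8983/solr/select?q=*:*&rows=0&stats=true&stats.field=price&stats.facet=hotel
```

Whether stats can be computed within a group=true result set is exactly the open question, but for plain per-hotel min/max this existing path may be enough.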
[jira] [Commented] (SOLR-2930) Allow controlling an important PDF processing parameter in Tika that splits the words in text and is now supported in version 1.0 of Tika.
[ https://issues.apache.org/jira/browse/SOLR-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160036#comment-13160036 ]

Robert Muir commented on SOLR-2930:
-----------------------------------

My bad, I confused this bug with the PDFBox 'character deletion' one (TIKA-767); unfortunately, that one still seems not to be fixed in Tika 1.0.

Allow controlling an important PDF processing parameter in Tika that splits the words in text and is now supported in version 1.0 of Tika.
------------------------------------------------------------------------------------------------------------------------------------------
Key: SOLR-2930
URL: https://issues.apache.org/jira/browse/SOLR-2930
Project: Solr
Issue Type: Improvement
Components: contrib - Solr Cell (Tika extraction)
Affects Versions: 3.5
Reporter: Ravish Bhagdev
Labels: pdf, text-splitting, tika

Tika 1.0 has fixed a major issue with processing and parsing of PDF files that was splitting the words incorrectly: https://issues.apache.org/jira/browse/TIKA-724 This causes text to be indexed incorrectly in Solr, and it becomes especially visible when using spellcheck features etc. They have added a special parameter, set using setEnableAutoSpace, that fixes the problem, but there is currently no way of setting this when using Solr. As discussed in the thread on the above issue, it would be nice if we could control this (and in future other) parameters via Solr configuration.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2472) StatsComponent should support hierarchical facets
[ https://issues.apache.org/jira/browse/SOLR-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160046#comment-13160046 ]

Erick Erickson commented on SOLR-2472:
--------------------------------------

This patch no longer applies cleanly. I'll volunteer to shepherd this through the commit process if:

1. we can get some consensus that this is a good thing to do.
2. you update it to apply cleanly, and provide some unit tests; StatsComponentTest might be the place to start.

It's probably worthwhile to get consensus before spending time working on the patch. Could you outline the use-case for this functionality?

StatsComponent should support hierarchical facets
-------------------------------------------------
Key: SOLR-2472
URL: https://issues.apache.org/jira/browse/SOLR-2472
Project: Solr
Issue Type: New Feature
Affects Versions: 3.1, 4.0
Reporter: Dmitry Drozdov
Attachments: SOLR-2472.patch
Original Estimate: 24h
Remaining Estimate: 24h

It is currently possible to get only a single layer of faceting in StatsComponent. The proposal is to make it possible to specify the stats.facet parameter like this:

stats=true&stats.field=sField&stats.facet=fField1,fField2

and get a response like this:

<lst name="stats">
  <lst name="stats_fields">
    <lst name="sField">
      <double name="min">1.0</double>
      <double name="max">1.0</double>
      <double name="sum">4.0</double>
      <long name="count">4</long>
      <long name="missing">0</long>
      <double name="sumOfSquares"/>
      <double name="mean"/>
      <double name="stddev"/>
      <lst name="facets">
        <lst name="fField1">
          <lst name="fField1Value1">
            <double name="min">1.0</double>
            <double name="max">1.0</double>
            <double name="sum">2.0</double>
            <long name="count">2</long>
            <long name="missing">0</long>
            <double name="sumOfSquares"/>
            <double name="mean"/>
            <double name="stddev"/>
            <lst name="facets">
              <lst name="fField2">
                <lst name="fField2Value1">
                  <double name="min">1.0</double>
                  <double name="max">1.0</double>
                  <double name="sum">1.0</double>
                  <long name="count">1</long>
                  <long name="missing">0</long>
                  <double name="sumOfSquares"/>
                  <double name="mean"/>
                  <double name="stddev"/>
                </lst>
                <lst name="fField2Value2">
                  <double name="min">1.0</double>
                  <double name="max">1.0</double>
                  <double name="sum">1.0</double>
                  <long name="count">1</long>
                  <long name="missing">0</long>
                  <double name="sumOfSquares"/>
                  <double name="mean"/>
                  <double name="stddev"/>
                </lst>
              </lst>
            </lst>
          </lst>
          <lst name="fField1Value2">
            <double name="min">1.0</double>
            <double name="max">1.0</double>
            <double name="sum">2.0</double>
            <long name="count">2</long>
            <long name="missing">0</long>
            <double name="sumOfSquares"/>
            <double name="mean"/>
            <double name="stddev"/>
            <lst name="facets">
              <lst name="fField2">
                <lst name="fField2Value1">
                  <double name="min">1.0</double>
                  <double name="max">1.0</double>
                  <double name="sum">1.0</double>
                  <long name="count">1</long>
                  <long name="missing">0</long>
                  <double name="sumOfSquares"/>
                  <double name="mean"/>
                  <double name="stddev"/>
                </lst>
                <lst name="fField2Value2">
                  <double name="min">1.0</double>
                  <double name="max">1.0</double>
                  <double name="sum">1.0</double>
                  <long name="count">1</long>
                  <long name="missing">0</long>
                  <double name="sumOfSquares"/>
                  <double name="mean"/>
                  <double name="stddev"/>
                </lst>
              </lst>
            </lst>
          </lst>
        </lst>
      </lst>
    </lst>
  </lst>
</lst>

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-3.x-java7 - Build # 1121 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x-java7/1121/

1 tests failed.

REGRESSION: org.apache.solr.client.solrj.embedded.LargeVolumeBinaryJettyTest.testMultiThreaded
Error Message: java.lang.AssertionError: Some threads threw uncaught exceptions!
Stack Trace:
java.lang.RuntimeException: java.lang.AssertionError: Some threads threw uncaught exceptions!
	at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:571)
	at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:96)
	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147)
	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
	at org.apache.lucene.util.LuceneTestCase.checkUncaughtExceptionsAfter(LuceneTestCase.java:599)
	at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:543)

Build Log (for compile errors): [...truncated 15111 lines...]
[jira] [Commented] (SOLR-2929) TermsComponent Adding entries
[ https://issues.apache.org/jira/browse/SOLR-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160056#comment-13160056 ] maillard commented on SOLR-2929:
Thank you for the response; I understand it. I have tried to flush and commit after my update. I have played around with the mergeFactor set to 2, and I have played with maxPendingDeletes, all without success. How can I be sure of, or force, the deletion of these marked documents? In other words, how do I make sure that my TermsComponent is a correct view of the index (without any documents marked for deletion) at a given time?

TermsComponent Adding entries
-
Key: SOLR-2929
URL: https://issues.apache.org/jira/browse/SOLR-2929
Project: Solr
Issue Type: Bug
Components: SearchComponents - other
Affects Versions: 3.3, 3.4
Environment: solr 3.x
Reporter: maillard
Priority: Minor

When indexing multiple documents in one go and then updating one of the documents in a later process, the TermsComponent count gets wrongly incremented. Example: indexing two documents with a COUNTRY field as such:

<add>
  <doc>
    <field name="COUNTRY">US</field>
    <field name="ID">L20110121151204207</field>
  </doc>
  <doc>
    <field name="COUNTRY">Canada</field>
    <field name="ID">L20110121151204208</field>
  </doc>
</add>

TermsComponent returns: US (1), Canada (1). Update the first document:

<add>
  <doc>
    <field name="COUNTRY">US</field>
    <field name="ID">L20110121151204207</field>
  </doc>
</add>

TermsComponent returns: US (2), Canada (1). There are still only two documents in the index. This does not happen when only dealing with a single doc, or when you update the same set of documents you initially indexed.
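[Editorial aside, not from the thread: TermsComponent reads raw index terms, so terms belonging to documents that are merely marked for deletion keep being counted until their segments are merged away. In the standard Solr XML update syntax, a commit that expunges deletes forces such a merge:]

```xml
<commit expungeDeletes="true"/>
```

A full <optimize/> also rewrites the index and drops all deleted documents, at higher I/O cost.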
Re: Stats per group with StatsComponent?
Hi Morten, I missed your question on the user mailing list. Here is my answer: With the StatsComponent this isn't possible at the moment. The StatsComponent will give you the min / max of field for the whole query result. If you want the min / max value per group you'll need to do some coding. The grouping logic is executed inside Lucene collectors located in the grouping module. You'll need to create a new second pass collector that computes the min / max for the top N groups. This collector then needs to be wired up in Solr. The AbstractSecondPassGroupingCollector is something you can take a look at. It collects the top documents for the top N groups. You don't need to have a patch to open an issue. Just open an issue with a good description and maybe some implementation details. Martijn On 30 November 2011 14:25, Morten Lied Johansen morte...@ifi.uio.no wrote: Hi I posted the below mail to the solr-user list a little over a week ago. Since there has been no response, we assume this means that what we need is not currently possible. We need this functionality, and are willing to put in time and effort to implement it, but could use some pointers to where it would be natural to add this, and ideas for how to best solve it. I'm also wondering if I should create an issue in JIRA right away, or if I should wait until we have a first patch ready? Original Message Subject: Stats per group with StatsComponent? Date: Tue, 22 Nov 2011 14:40:45 +0100 From: Morten Lied Johansen morte...@ifi.uio.no Reply-To: solr-u...@lucene.apache.org To: solr-u...@lucene.apache.org Hi We need to get minimum and maximum values for a field, within a group in a grouped search-result. Is this possible today, perhaps by using StatsComponent some way? I'll flesh out the example a little, to make the question clearer. We have a number of documents, indexed with a price, date and a hotel. For each hotel, there are a number of documents, each representing a price/date combination. 
We then group our search result on hotel. We want to show the minimum and maximum price for each hotel. A little googling leads us to look at StatsComponent, as what it does would be what we need, if it could be done for each group. There was a thread on this list in August, Grouping and performing statistics per group that seemed to go into this a bit, but didn't find a solution. Is this possible in Solr 3.4, either with StatsComponent, or some other way? -- Morten We all live in a yellow subroutine. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Met vriendelijke groet, Martijn van Groningen - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
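Stripped of Lucene's collector plumbing, the per-group min/max bookkeeping Martijn describes boils down to the aggregation below. This is a self-contained Java sketch of only that step; the class and method names are illustrative, not part of the grouping module, and a real implementation would do this inside a second-pass collector over the top-N groups.

```java
import java.util.*;

public class GroupMinMax {
    // Accumulate the min and max of a numeric field per group key,
    // as a second-pass grouping collector would for the top-N groups.
    // Each "doc" is modeled as {groupKey, fieldValue}.
    static Map<String, double[]> minMaxPerGroup(List<String[]> docs) {
        Map<String, double[]> stats = new HashMap<>();
        for (String[] doc : docs) {
            String group = doc[0];
            double v = Double.parseDouble(doc[1]);
            double[] mm = stats.get(group);
            if (mm == null) {
                stats.put(group, new double[] { v, v }); // {min, max}
            } else {
                mm[0] = Math.min(mm[0], v);
                mm[1] = Math.max(mm[1], v);
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        // Hotel/price example from the thread: several price docs per hotel.
        List<String[]> docs = Arrays.asList(
            new String[] { "hotelA", "120.0" },
            new String[] { "hotelA", "80.0" },
            new String[] { "hotelB", "200.0" });
        Map<String, double[]> stats = minMaxPerGroup(docs);
        System.out.println("hotelA min=" + stats.get("hotelA")[0]
            + " max=" + stats.get("hotelA")[1]);
        System.out.println("hotelB min=" + stats.get("hotelB")[0]
            + " max=" + stats.get("hotelB")[1]);
    }
}
```

The collector variant would receive doc IDs and read the group key and field value from the index per segment, but the per-group state it maintains is the same.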
Re: Stats per group with StatsComponent?
On 30. nov. 2011 14:58, Martijn v Groningen wrote: With the StatsComponent this isn't possible at the moment. The StatsComponent will give you the min / max of field for the whole query result. If you want the min / max value per group you'll need to do some coding. The grouping logic is executed inside Lucene collectors located in the grouping module. You'll need to create a new second pass collector that computes the min / max for the top N groups. This collector then needs to be wired up in Solr. The AbstractSecondPassGroupingCollector is something you can take a look at. It collects the top documents for the top N groups. Thank you for your reply. We'll have a look at this and see if we can get something going this week. You don't need to have a patch to open an issue. Just open an issue with a good description and maybe some implementation details. I have created an issue, SOLR-2931. Let me know if I should add some more details to it. We will update it and follow any discussions as we work. -- Morten We all live in a yellow subroutine.
[jira] [Created] (SOLR-2931) Statistics/aggregated values per group in a grouped response
Statistics/aggregated values per group in a grouped response Key: SOLR-2931 URL: https://issues.apache.org/jira/browse/SOLR-2931 Project: Solr Issue Type: New Feature Reporter: Morten Lied Johansen We need to get minimum and maximum values for a field, within a group in a grouped search-result. I'll flesh out our use-case a little to make our needs clearer: We have a number of documents, indexed with a price, date and a hotel. For each hotel, there are a number of documents, each representing a price/date combination. We then group our search result on hotel. We want to show the minimum and maximum price for each hotel. Other use-cases could be to calculate an average or a sum within a group. We plan to work on this in the coming weeks, and will be supplying patches.
Re: Stats per group with StatsComponent?
Looks fine! Martijn -- Met vriendelijke groet, Martijn van Groningen
Re: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #316: POMs out of sync
This looks like a localization bug. Is it possible to get the seed or more information on this test failure? Did Maven truncate the test output, or is there a bug in LuceneTestCase where it's not providing the reproduce-with line (hopefully) that it should if beforeClass() throws an exception? I looked at the console log and there wasn't any failure information there...

On Wed, Nov 30, 2011 at 7:59 AM, Apache Jenkins Server jenk...@builds.apache.org wrote:

Build: https://builds.apache.org/job/Lucene-Solr-Maven-3.x/316/

2 tests failed.

FAILED: org.apache.lucene.queryParser.standard.TestNumericQueryParser.org.apache.lucene.queryParser.standard.TestNumericQueryParser
Error Message: Unparseable date: 1867.06.20 6時34分09秒 CET 9 +0100 1867
Stack Trace:
java.text.ParseException: Unparseable date: 1867.06.20 6時34分09秒 CET 9 +0100 1867
	at java.text.DateFormat.parse(DateFormat.java:354)
	at org.apache.lucene.queryParser.standard.TestNumericQueryParser.checkDateFormatSanity(TestNumericQueryParser.java:88)
	at org.apache.lucene.queryParser.standard.TestNumericQueryParser.beforeClass(TestNumericQueryParser.java:145)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
	at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164)
	at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110)
	at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:175)
	at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:107)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68)

FAILED: org.apache.lucene.queryParser.standard.TestNumericQueryParser.org.apache.lucene.queryParser.standard.TestNumericQueryParser
Error Message: null
Stack Trace:
java.lang.NullPointerException
	at org.apache.lucene.queryParser.standard.TestNumericQueryParser.afterClass(TestNumericQueryParser.java:495)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
	at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164)
	at
RE: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #316: POMs out of sync
I looked at the Surefire report https://builds.apache.org/job/Lucene-Solr-Maven-3.x/ws/checkout/lucene/build/contrib/queryparser/surefire-reports/TEST-org.apache.lucene.queryParser.standard.TestNumericQueryParser.xml/*view*/ and I don't see any more information.

-Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Wednesday, November 30, 2011 9:55 AM To: dev@lucene.apache.org Subject: Re: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #316: POMs out of sync

This looks like a localization bug. is it possible to get the seed or more information on this test fail? Did maven truncate the test output or is there a bug in LuceneTestCase where its not providing the reproduce-with (hopefully) that it should if beforeClass() throws an exception? I looked at the console log and there wasn't any failure information there...

On Wed, Nov 30, 2011 at 7:59 AM, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Lucene-Solr-Maven-3.x/316/ 2 tests failed.
Re: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #316: POMs out of sync
Thanks Steven. I think this is a bug in LuceneTestCase (I'll open an issue), because if I add the following to TestDemo, I get no seed or anything at all:

@BeforeClass
public static void beforeClass() throws Exception {
  throw new NullPointerException();
}

junit-sequential:
    [junit] Testsuite: org.apache.lucene.TestDemo
    [junit] Tests run: 0, Failures: 0, Errors: 1, Time elapsed: 0.143 sec
    [junit]
    [junit] Testcase: org.apache.lucene.TestDemo: Caused an ERROR
    [junit] null
    [junit] java.lang.NullPointerException
    [junit]     at org.apache.lucene.TestDemo.beforeClass(TestDemo.java:44)
    [junit]
    [junit]
    [junit] Test org.apache.lucene.TestDemo FAILED

On Wed, Nov 30, 2011 at 9:59 AM, Steven A Rowe sar...@syr.edu wrote: I looked at the Surefire report https://builds.apache.org/job/Lucene-Solr-Maven-3.x/ws/checkout/lucene/build/contrib/queryparser/surefire-reports/TEST-org.apache.lucene.queryParser.standard.TestNumericQueryParser.xml/*view*/ and I don't see any more information.

-Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Wednesday, November 30, 2011 9:55 AM To: dev@lucene.apache.org Subject: Re: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #316: POMs out of sync

This looks like a localization bug. is it possible to get the seed or more information on this test fail? Did maven truncate the test output or is there a bug in LuceneTestCase where its not providing the reproduce-with (hopefully) that it should if beforeClass() throws an exception? I looked at the console log and there wasn't any failure information there...

On Wed, Nov 30, 2011 at 7:59 AM, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Lucene-Solr-Maven-3.x/316/ 2 tests failed.
[jira] [Created] (LUCENE-3611) If a test fails in beforeClass(), we don't get any debugging information
If a test fails in beforeClass(), we don't get any debugging information Key: LUCENE-3611 URL: https://issues.apache.org/jira/browse/LUCENE-3611 Project: Lucene - Java Issue Type: Test Components: general/test Reporter: Robert Muir At a minimum we need reportPartialFailureInfo()
[jira] [Updated] (SOLR-2280) commitWithin ignored for a delete query
[ https://issues.apache.org/jira/browse/SOLR-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-2280:
--
Attachment: SOLR-2280.patch SOLR-2280-3x.patch

New patches that add commitWithin-capable SolrJ methods for deleteBy*().

commitWithin ignored for a delete query
---
Key: SOLR-2280
URL: https://issues.apache.org/jira/browse/SOLR-2280
Project: Solr
Issue Type: Bug
Components: clients - java
Reporter: David Smiley
Assignee: Jan Høydahl
Priority: Minor
Fix For: 3.6, 4.0
Attachments: SOLR-2280-3x.patch, SOLR-2280-3x.patch, SOLR-2280.patch, SOLR-2280.patch, SOLR-2280.patch, SOLR-2280.patch

The commitWithin option on an UpdateRequest is only honored for requests containing new documents. It does not, for example, work with a delete query. The following doesn't work as expected:

{code:java}
UpdateRequest request = new UpdateRequest();
request.deleteById("id123");
request.setCommitWithin(1000);
solrServer.request(request);
{code}

In my opinion, the commitWithin attribute should be permitted on the <delete/> XML tag as well as <add/>. Such a change would go in XMLLoader.java and would have some ramifications elsewhere too. Once this is done, UpdateRequest.getXml() can be updated to generate the right XML.
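[Editorial note: the XML a commitWithin-aware delete request would produce is presumably along these lines; the attribute placement is inferred from how commitWithin already works on adds, and the exact output of the patched UpdateRequest.getXml() may differ.]

```xml
<delete commitWithin="1000">
  <id>id123</id>
</delete>
```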
[jira] [Commented] (SOLR-2805) Add a main method to ZkController so that it's easier to script config upload with SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160083#comment-13160083 ] Eric Pugh commented on SOLR-2805: - I started working on something like this, and noticed that ZkController is marked final. Why is that? I ended up cut'n'pasting it into my own class. Add a main method to ZkController so that it's easier to script config upload with SolrCloud Key: SOLR-2805 URL: https://issues.apache.org/jira/browse/SOLR-2805 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Priority: Minor Fix For: 4.0 When scripting a cluster setup, it would be nice if it was easy to upload a set of configs - otherwise you have to wait to start secondary servers until the first server has uploaded the config - kind of a pain. You should be able to do something like: java -classpath .:* org.apache.solr.cloud.ZkController 127.0.0.1:9983 127.0.0.1 8983 solr /home/mark/workspace/SolrCloud/solr/example/solr/conf conf1
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 11612 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/11612/

1 tests failed.

REGRESSION: org.apache.lucene.search.TestBlockJoin.testAdvanceSingleParentNoChild
Error Message: null
Stack Trace:
java.lang.NullPointerException
	at org.apache.lucene.search.TestBlockJoin.testAdvanceSingleParentNoChild(TestBlockJoin.java:659)
	at org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:432)
	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147)
	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)

Build Log (for compile errors): [...truncated 11289 lines...]
[jira] [Commented] (LUCENE-3586) Choose a specific Directory implementation running the CheckIndex main
[ https://issues.apache.org/jira/browse/LUCENE-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160102#comment-13160102 ] Michael McCandless commented on LUCENE-3586: I think just load the classes by name via reflection? This way if I have my own external Dir impl somewhere I can also have CheckIndex use that... Choose a specific Directory implementation running the CheckIndex main -- Key: LUCENE-3586 URL: https://issues.apache.org/jira/browse/LUCENE-3586 Project: Lucene - Java Issue Type: Improvement Reporter: Luca Cavanna Assignee: Luca Cavanna Priority: Minor Attachments: LUCENE-3586.patch It should be possible to choose a specific Directory implementation to use during the CheckIndex process when we run it from its main. What about an additional main parameter? In fact, I'm experiencing some problems with MMapDirectory working with a big segment, and after some failed attempts playing with maxChunkSize, I decided to switch to another FSDirectory implementation but I needed to do that on my own main. Should we also consider to use a FileSwitchDirectory? I'm willing to contribute, could you please let me know your thoughts about it?
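The reflection approach Michael suggests would look roughly like the sketch below. To keep the example self-contained and runnable, it instantiates a stdlib class via a no-arg constructor; a real CheckIndex change would load a Directory subclass and would likely need to pick a constructor taking the index path, which this sketch does not attempt.

```java
public class LoadByName {
    // Load a class by its fully-qualified name and instantiate it via its
    // no-arg constructor -- the basic pattern CheckIndex could use to accept
    // a user-supplied Directory implementation as a command-line parameter.
    static Object newInstanceByName(String className) throws Exception {
        Class<?> clazz = Class.forName(className);
        return clazz.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // Demonstrated with a JDK class so the example runs standalone.
        Object o = newInstanceByName("java.util.ArrayList");
        System.out.println(o.getClass().getName());
    }
}
```

Because the lookup happens at runtime, this also covers Luca's external-Directory case: any implementation on the classpath can be named, without CheckIndex compiling against it.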
[jira] [Commented] (LUCENE-3608) MultiFields.getUniqueFieldCount is broken
[ https://issues.apache.org/jira/browse/LUCENE-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160107#comment-13160107 ] Michael McCandless commented on LUCENE-3608: +1 for -1 ;) MultiFields.getUniqueFieldCount is broken - Key: LUCENE-3608 URL: https://issues.apache.org/jira/browse/LUCENE-3608 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: Robert Muir Fix For: 4.0 this returns terms.size(), but terms is lazy-initted. So it wrongly returns 0. Simplest fix would be to return -1.
[jira] [Resolved] (SOLR-2922) Upgrade commons io and lang in Solr
[ https://issues.apache.org/jira/browse/SOLR-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-2922. -- Resolution: Fixed Assignee: Koji Sekiguchi trunk: Committed revision 1208509. 3x: Committed revision 1208516. Upgrade commons io and lang in Solr --- Key: SOLR-2922 URL: https://issues.apache.org/jira/browse/SOLR-2922 Project: Solr Issue Type: Improvement Affects Versions: 3.5, 4.0 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Trivial Fix For: 3.6, 4.0 Attachments: SOLR-2922.patch Upgrade commons-io and commons-lang in Solr.
Re: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 11608 - Failure
I committed a fix... Mike McCandless http://blog.mikemccandless.com On Wed, Nov 30, 2011 at 4:54 AM, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/11608/ 1 tests failed. REGRESSION: org.apache.lucene.search.TestSearcherManager.testIntermediateClose Error Message: java.lang.NullPointerException Stack Trace: junit.framework.AssertionFailedError: java.lang.NullPointerException at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50) at org.apache.lucene.search.TestSearcherManager.testIntermediateClose(TestSearcherManager.java:248) at org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:432) Build Log (for compile errors): [...truncated 7958 lines...]
Re: [Lucene.Net] Re: Memory Leak in 2.9.2.2
Trevor, I'm not sure if you can use 2.9.4, though, it looks like you're using VS2005 and .NET 2.0. 2.9.4 targets .NET 4.0, and I'm fairly certain we use classes only available in 4.0 (or 3.5?). However, if you can, I would suggest updating, as 2.9.4 should be a fairly stable release. The leak I'm talking about is addressed here: https://issues.apache.org/jira/browse/LUCENE-2467, and the ported code isn't available in 2.9.2, but I've confirmed the patch is in 2.9.4. It may or may not be what your issue is. You say that it was at one time working fine, I assume you mean no memory leak. I would take some time to see what else in your code has changed. Make sure you're calling Close on whatever needs to be closed (IndexWriter/IndexReader/Analyzers, etc). Unfortunately for us, memory leaks are hard to debug over email, and it's difficult for us to tell if it's any change to your code or an issue with Lucene .NET. As far as I can tell, this is the only memory leak I can find that affects 2.9.2. Thanks, Christopher On Wed, Nov 30, 2011 at 8:25 AM, Prescott Nasser geobmx...@hotmail.comwrote: We just released 2.9.4 - the website didn't update last night, so ill have to try and update it later today. But if you follow the link to download 2.9.2 dist you'll see folders for 2.9.4. I'll send an email to the user and dev lists once i get the website to update From: Trevor Watson Sent: 11/30/2011 8:14 AM To: lucene-net-...@lucene.apache.org Subject: [Lucene.Net] Re: Memory Leak in 2.9.2.2 You said pre 2.9.3 I checked the apache lucene.net page to try to see if I could get a copy of 2.9.3, but it was never on the site, just 2.9.2.2 and 2.9.4(g). Was this an un-released version? Or am I looking in the wrong spot for updates to lucene.net? Thanks for all your help On Tue, Nov 29, 2011 at 2:59 PM, Trevor Watson powersearchsoftw...@gmail.com wrote: I can send you the dll that I am using if you would like. The documents are _mostly_ small documents. 
Emails and office docs size of plain text On Tuesday, November 29, 2011, Christopher Currens currens.ch...@gmail.com wrote: Do you know how big the documents are that you are trying to delete/update? I'm trying to find a copy of 2.9.2 to see if I can reproduce it. Thanks, Christopher On Tue, Nov 29, 2011 at 9:11 AM, Trevor Watson powersearchsoftw...@gmail.com wrote: Sorry for the duplicate post. I was on the road and posted both via my web mail and office mail by mistake The increase is a very gradual, the program starts at about 160,000k according to task manager (I know that's not entirely accurate, but it was the best I had at the time) and would, after adding 25,000-40,000 result in an out of memory exception (800,000k according to taskmanager). I tried building a copy of 2.9.4 to test, but could not find one that worked in visual studio 2005 I did notice using Ants memory profiler that there were a number of byte[32789] arrays that I didn't know where they came from in memory. On Monday, November 28, 2011, Christopher Currens currens.ch...@gmail.com wrote: Hi Trevor, What kind of memory increase are we talking about? Also, how big are the documents that you are indexing, the ones returned from getFileInfoDoc()? Is it putting an entire file into the index? Pre 2.9.3 versions had issues with holding onto allocated byte arrays far beyond when they were used. The memory could only be freed via closing the IndexWriter. I'm a little unclear on exactly what's happening. Are you noticing memory spike and stay constant at that level or is it a gradual increase? Is it causing your application to error, (ie OutOfMemory exception, etc)? Thanks, Christopher On Mon, Nov 28, 2011 at 5:59 PM, Trevor Watson powersearchsoftw...@gmail.com wrote: I'm attempting to use Lucene.Net v2.9.2.2 in a Visual Studio 2005 (.NET 2.0) environment. We had a piece of software that WAS working. 
I'm not sure what has changed; however, the following code results in a memory leak in the Lucene.Net component (or a failure to clean up used memory). The code at issue is here: private void SaveFileToFileInfo(Lucene.Net.Index.IndexWriter iw, bool delayCommit, string sDataPath) { Document doc = getFileInfoDoc(sDataPath); Analyzer analyzer = clsLuceneFunctions.getAnalyzer(); if (this.FileID == 0) { string s = ""; } iw.UpdateDocument(new Lucene.Net.Index.Term("FileId", this.fileID.ToString("0")), doc, analyzer); analyzer = null; doc = null; if (!delayCommit) iw.Commit(); } Commenting out the line iw.UpdateDocument resulted in no memory increase. I also tried replacing it with a
Re: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 11612 - Failure
I committed a fix... Mike McCandless http://blog.mikemccandless.com On Wed, Nov 30, 2011 at 10:32 AM, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/11612/ 1 tests failed. REGRESSION: org.apache.lucene.search.TestBlockJoin.testAdvanceSingleParentNoChild Error Message: null Stack Trace: java.lang.NullPointerException at org.apache.lucene.search.TestBlockJoin.testAdvanceSingleParentNoChild(TestBlockJoin.java:659) at org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:432) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50) Build Log (for compile errors): [...truncated 11289 lines...]
Re: buildbots for PyLucene?
I sent a note off to Trent Nelson to see if we could use Snakebite for this purpose. I'd be happy to set up a buildbot on our internal PARC Jenkins infrastructure for this, but the results wouldn't be visible outside. Is there a lucene-infrastructure or apache-infrastructure mailing list this might be appropriate for? Bill
[jira] [Created] (SOLR-2932) Replication filelist failures
Replication filelist failures - Key: SOLR-2932 URL: https://issues.apache.org/jira/browse/SOLR-2932 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 3.5 Reporter: Kyle Maxwell Replicating the bug manually: http://../replication?command=indexversion - 1234561234 http://../replication?command=filelist&indexversion=1234561234 - invalid index version In the logs, I tend to see lines like: SEVERE: No files to download for indexversion: 1321658703961 This bug only appears on certain indexes.
RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2
DIGY, Thanks for the tip, but could you be a little more specific? Where and how are extension-method calls replaced? For example, how would I change the CloseableThreadLocalExtensions method public static void Set<T>(this ThreadLocal<T> t, T val) { t.Value = val; } to eliminate the compile error Error 2 Cannot define a new extension method because the compiler required type 'System.Runtime.CompilerServices.ExtensionAttribute' cannot be found. Are you missing a reference to System.Core.dll? due to the Set<T> parameter this ThreadLocal<T> t ? - Neal -Original Message- From: Digy [mailto:digyd...@gmail.com] Sent: Wednesday, November 30, 2011 12:27 PM To: lucene-net-...@lucene.apache.org Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2 FYI, 2.9.4 can be compiled against .Net 2.0 with a few minor changes in CloseableThreadLocal (like uncommenting the ThreadLocal<T> class and replacing extension-method calls with static calls to CloseableThreadLocalExtensions) DIGY -Original Message- From: Christopher Currens [mailto:currens.ch...@gmail.com] Sent: Wednesday, November 30, 2011 7:26 PM To: lucene-net-...@lucene.apache.org Subject: Re: [Lucene.Net] Re: Memory Leak in 2.9.2.2 Trevor, I'm not sure if you can use 2.9.4, though, it looks like you're using VS2005 and .NET 2.0. 2.9.4 targets .NET 4.0, and I'm fairly certain we use classes only available in 4.0 (or 3.5?). However, if you can, I would suggest updating, as 2.9.4 should be a fairly stable release. The leak I'm talking about is addressed here: https://issues.apache.org/jira/browse/LUCENE-2467, and the ported code isn't available in 2.9.2, but I've confirmed the patch is in 2.9.4. It may or may not be what your issue is. You say that it was at one time working fine, I assume you mean no memory leak. I would take some time to see what else in your code has changed. Make sure you're calling Close on whatever needs to be closed (IndexWriter/IndexReader/Analyzers, etc).
Unfortunately for us, memory leaks are hard to debug over email, and it's difficult for us to tell if it's any change to your code or an issue with Lucene .NET. As far as I can tell, this is the only memory leak I can find that affects 2.9.2. Thanks, Christopher On Wed, Nov 30, 2011 at 8:25 AM, Prescott Nasser geobmx...@hotmail.comwrote: We just released 2.9.4 - the website didn't update last night, so ill have to try and update it later today. But if you follow the link to download 2.9.2 dist you'll see folders for 2.9.4. I'll send an email to the user and dev lists once i get the website to update From: Trevor Watson Sent: 11/30/2011 8:14 AM To: lucene-net-...@lucene.apache.org Subject: [Lucene.Net] Re: Memory Leak in 2.9.2.2 You said pre 2.9.3 I checked the apache lucene.net page to try to see if I could get a copy of 2.9.3, but it was never on the site, just 2.9.2.2 and 2.9.4(g). Was this an un-released version? Or am I looking in the wrong spot for updates to lucene.net? Thanks for all your help On Tue, Nov 29, 2011 at 2:59 PM, Trevor Watson powersearchsoftw...@gmail.com wrote: I can send you the dll that I am using if you would like. The documents are _mostly_ small documents. Emails and office docs size of plain text On Tuesday, November 29, 2011, Christopher Currens currens.ch...@gmail.com wrote: Do you know how big the documents are that you are trying to delete/update? I'm trying to find a copy of 2.9.2 to see if I can reproduce it. Thanks, Christopher On Tue, Nov 29, 2011 at 9:11 AM, Trevor Watson powersearchsoftw...@gmail.com wrote: Sorry for the duplicate post. I was on the road and posted both via my web mail and office mail by mistake The increase is a very gradual, the program starts at about 160,000k according to task manager (I know that's not entirely accurate, but it was the best I had at the time) and would, after adding 25,000-40,000 result in an out of memory exception (800,000k according to taskmanager). 
I tried building a copy of 2.9.4 to test, but could not find one that worked in visual studio 2005 I did notice using Ants memory profiler that there were a number of byte[32789] arrays that I didn't know where they came from in memory. On Monday, November 28, 2011, Christopher Currens currens.ch...@gmail.com wrote: Hi Trevor, What kind of memory increase are we talking about? Also, how big are the documents that you are indexing, the ones returned from getFileInfoDoc()? Is it putting an entire file into the index? Pre 2.9.3 versions had issues with holding onto allocated byte arrays far beyond when they were used. The memory could only be freed via
[jira] [Commented] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should
[ https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160278#comment-13160278 ] Mike Sokolov commented on SOLR-2921: No not stemmers. Not synonyms, not shinglers or anything that might produce multiple tokens. Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should - Key: SOLR-2921 URL: https://issues.apache.org/jira/browse/SOLR-2921 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 3.6, 4.0 Environment: All Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor SOLR-2438 creates a new MultiTermAwareComponent interface. This allows Solr to automatically assemble a multiterm analyzer that does the right thing vis-a-vis transforming the individual terms of a multi-term query at query time. Examples are: lower casing, folding accents, etc. Currently (27-Nov-2011), the following classes implement MultiTermAwareComponent: * ASCIIFoldingFilterFactory * LowerCaseFilterFactory * LowerCaseTokenizerFactory * MappingCharFilterFactory * PersianCharFilterFactory When users put any of the above in their query analyzer, Solr will do the right thing at query time and the perennial question users have, why didn't my wildcard query automatically lower-case (or accent fold or) my terms? will be gone. Die question die! But taking a quick look, for instance, at the various FilterFactories that exist, there are a number of possibilities that *might* be good candidates for implementing MultiTermAwareComponent. But I really don't understand the correct behavior here well enough to know whether these should implement the interface or not. And this doesn't include other CharFilters or Tokenizers. Actually implementing the interface is often trivial, see the classes above for examples. Note that LowerCaseTokenizerFactory returns a *Filter*, which is the right thing in this case. 
Here is a quick cull of the Filters that, just from their names, might be candidates. If anyone wants to take any of them on, that would be great. If all you can do is provide test cases, I could probably do the code part, just let me know. ArabicNormalizationFilterFactory GreekLowerCaseFilterFactory HindiNormalizationFilterFactory ICUFoldingFilterFactory ICUNormalizer2FilterFactory ICUTransformFilterFactory IndicNormalizationFilterFactory ISOLatin1AccentFilterFactory PersianNormalizationFilterFactory RussianLowerCaseFilterFactory TurkishLowerCaseFilterFactory
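For anyone picking up one of the factories above: the pattern itself is small. Below is a self-contained Java sketch of the idea, with invented names (MultiTermAware, SimpleFilterFactory, LowerCaseLikeFactory) standing in for Solr's real MultiTermAwareComponent and analysis factory classes; this example does not depend on Solr and is only meant to show the shape of the interface.

```java
// Sketch of the MultiTermAwareComponent pattern: a factory whose transform
// is safe to apply to the terms of a multi-term query (wildcard, prefix,
// range ...) advertises itself via a marker-style accessor. All names here
// are invented; the real interface lives in org.apache.solr.analysis.
interface MultiTermAware {
    // Return the component to apply to multi-term query terms; for simple
    // per-term transforms this is often the factory itself.
    Object getMultiTermComponent();
}

abstract class SimpleFilterFactory {
    public abstract String filter(String term);
}

class LowerCaseLikeFactory extends SimpleFilterFactory implements MultiTermAware {
    @Override
    public String filter(String term) {
        return term.toLowerCase(java.util.Locale.ROOT);
    }

    // Lower-casing one term never splits it into multiple tokens, so the
    // factory can simply offer itself for multi-term analysis. Stemmers,
    // synonyms, and shinglers must NOT do this, per the comment above.
    @Override
    public Object getMultiTermComponent() {
        return this;
    }
}

public class MultiTermSketch {
    public static void main(String[] args) {
        LowerCaseLikeFactory f = new LowerCaseLikeFactory();
        // A wildcard query like "Foo*" can now be normalized at query time:
        System.out.println(f.filter("Foo") + "*"); // foo*
    }
}
```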
[jira] [Created] (SOLR-2933) DIHCacheSupport ignores left side of where=xid=x.id attribute
DIHCacheSupport ignores left side of where=xid=x.id attribute --- Key: SOLR-2933 URL: https://issues.apache.org/jira/browse/SOLR-2933 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.0 Reporter: Mikhail Khludnev Priority: Minor DIHCacheSupport, introduced in SOLR-2382, uses the new config attributes cachePk and cacheLookup. But support for the old where="xid=x.id" attribute is broken by [DIHCacheSupport.init|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DIHCacheSupport.java?view=markup] - it never puts the where= sides into the context. This goes unnoticed because [SortedMapBackedCache.init|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/SortedMapBackedCache.java?view=markup] just takes the first column as the primary key; that's why all tests are green. To reproduce the issue I just need to reorder the entry at [line 219|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestCachedSqlEntityProcessor.java?revision=1201659&view=markup] so that desc comes first and is picked up as the primary key. To do that I propose to choose the concrete map class randomly for all DIH test cases in [createMap()|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/AbstractDataImportHandlerTestCase.java?revision=1149600&view=markup]. I'm attaching a test-breaking patch and seed.
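As a rough illustration of what the report says should happen: the legacy where attribute carries two pieces of information, and dropping the left side loses the cache key. The sketch below is hypothetical (WhereAttribute is not a DIH class, and this is not the real parsing code) and only shows the intended split into cachePk and cacheLookup:

```java
// Hypothetical sketch of parsing the legacy where="xid=x.id" attribute:
// the left side names the cache's primary-key column (cachePk) and the
// right side the lookup expression (cacheLookup). Ignoring the left side
// leaves the cache keyed by whichever column happens to come first --
// which is the bug described above.
public class WhereAttribute {
    public final String cachePk;
    public final String cacheLookup;

    public WhereAttribute(String where) {
        // Split on the first '=' only; the right side may contain dots etc.
        String[] sides = where.split("=", 2);
        if (sides.length != 2) {
            throw new IllegalArgumentException("expected key=lookup, got: " + where);
        }
        cachePk = sides[0].trim();     // e.g. "xid" -- must not be dropped
        cacheLookup = sides[1].trim(); // e.g. "x.id"
    }

    public static void main(String[] args) {
        WhereAttribute w = new WhereAttribute("xid=x.id");
        System.out.println(w.cachePk + " <- " + w.cacheLookup); // xid <- x.id
    }
}
```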
[jira] [Updated] (SOLR-2933) DIHCacheSupport ignores left side of where=xid=x.id attribute
[ https://issues.apache.org/jira/browse/SOLR-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Khludnev updated SOLR-2933: --- Attachment: AbstractDataImportHandlerTestCase.java-choose-map-randomly.patch AbstractDataImportHandlerTestCase.java-choose-map-randomly.patch picks up a map class randomly in all DIH testcases. It breaks [TestCachedSqlEntityProcessor.withKeyAndLookup|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestCachedSqlEntityProcessor.java?revision=1201659&view=markup]. More explanations are [1|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13158689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13158689] [2|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159418] [3|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159431&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159431]
[jira] [Issue Comment Edited] (SOLR-2933) DIHCacheSupport ignores left side of where=xid=x.id attribute
[ https://issues.apache.org/jira/browse/SOLR-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160304#comment-13160304 ] Mikhail Khludnev edited comment on SOLR-2933 at 11/30/11 8:28 PM: -- AbstractDataImportHandlerTestCase.java-choose-map-randomly.patch pick up map class randomly in all DIH testcases. It breaks [TestCachedSqlEntityProcessor.withKeyAndLookup|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestCachedSqlEntityProcessor.java?revision=1201659view=markup]. More explanations are [1|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13158689page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13158689] [2|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159418page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159418] [3|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159431page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159431] Let me attach the fix tomorrow. was (Author: mkhludnev): AbstractDataImportHandlerTestCase.java-choose-map-randomly.patch pick up map class randomly in all DIH testcases. It breaks [TestCachedSqlEntityProcessor.withKeyAndLookup|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestCachedSqlEntityProcessor.java?revision=1201659view=markup]. 
More explanations are [1|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13158689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13158689] [2|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159418] [3|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159431&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159431]
[jira] [Issue Comment Edited] (SOLR-2933) DIHCacheSupport ignores left side of where=xid=x.id attribute
[ https://issues.apache.org/jira/browse/SOLR-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160304#comment-13160304 ] Mikhail Khludnev edited comment on SOLR-2933 at 11/30/11 8:31 PM: -- AbstractDataImportHandlerTestCase.java-choose-map-randomly.patch pick up map class randomly in all DIH testcases. It breaks [TestCachedSqlEntityProcessor.withKeyAndLookup|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestCachedSqlEntityProcessor.java?revision=1201659view=markup]. More explanations are [1|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13158689page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13158689] [2|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159418page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159418] [3|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159431page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159431] seed which reproduces the fail {code} ant test -Dtestcase=TestCachedSqlEntityProcessor -Dtestmethod=withKeyAndLookup -Dtests.seed=7735f677498f3558:-29c15941cc37921e:-32c8bd2280b92536 -Dargs=-Dfile.encoding=UTF-8 {code} Let me attach the fix tomorrow. It's not a big deal anyway. was (Author: mkhludnev): AbstractDataImportHandlerTestCase.java-choose-map-randomly.patch pick up map class randomly in all DIH testcases. It breaks [TestCachedSqlEntityProcessor.withKeyAndLookup|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestCachedSqlEntityProcessor.java?revision=1201659view=markup]. 
More explanations are [1|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13158689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13158689] [2|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159418] [3|https://issues.apache.org/jira/browse/SOLR-2382?focusedCommentId=13159431&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13159431] Let me attach the fix tomorrow.
RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2
Probably makes for a good wiki entry Sent from my Windows Phone From: Digy Sent: 11/30/2011 12:04 PM To: lucene-net-...@lucene.apache.org Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2 OK, here is the code that can be compiled against .NET 2.0 http://pastebin.com/k2f7JfPd DIGY -Original Message- From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com] Sent: Wednesday, November 30, 2011 9:26 PM To: lucene-net-...@lucene.apache.org Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2 DIGY, Thanks for the tip, but could you be a little more specific? Where and how are extension-method calls replaced? For example, how would I change the CloseableThreadLocalExtensions method public static void Set<T>(this ThreadLocal<T> t, T val) { t.Value = val; } to eliminate the compile error Error 2 Cannot define a new extension method because the compiler required type 'System.Runtime.CompilerServices.ExtensionAttribute' cannot be found. Are you missing a reference to System.Core.dll? due to the Set<T> parameter this ThreadLocal<T> t ? - Neal -Original Message- From: Digy [mailto:digyd...@gmail.com] Sent: Wednesday, November 30, 2011 12:27 PM To: lucene-net-...@lucene.apache.org Subject: RE: [Lucene.Net] Re: Memory Leak in 2.9.2.2 FYI, 2.9.4 can be compiled against .Net 2.0 with a few minor changes in CloseableThreadLocal (like uncommenting the ThreadLocal<T> class and replacing extension-method calls with static calls to CloseableThreadLocalExtensions) DIGY -Original Message- From: Christopher Currens [mailto:currens.ch...@gmail.com] Sent: Wednesday, November 30, 2011 7:26 PM To: lucene-net-...@lucene.apache.org Subject: Re: [Lucene.Net] Re: Memory Leak in 2.9.2.2 Trevor, I'm not sure if you can use 2.9.4, though, it looks like you're using VS2005 and .NET 2.0. 2.9.4 targets .NET 4.0, and I'm fairly certain we use classes only available in 4.0 (or 3.5?). However, if you can, I would suggest updating, as 2.9.4 should be a fairly stable release.
The leak I'm talking about is addressed here: https://issues.apache.org/jira/browse/LUCENE-2467; the ported code isn't available in 2.9.2, but I've confirmed the patch is in 2.9.4. It may or may not be what your issue is. You say that it was at one time working fine; I assume you mean no memory leak. I would take some time to see what else in your code has changed. Make sure you're calling Close on whatever needs to be closed (IndexWriter/IndexReader/Analyzers, etc.). Unfortunately for us, memory leaks are hard to debug over email, and it's difficult for us to tell whether it's a change to your code or an issue with Lucene.NET. As far as I can tell, this is the only memory leak I can find that affects 2.9.2. Thanks, Christopher On Wed, Nov 30, 2011 at 8:25 AM, Prescott Nasser geobmx...@hotmail.com wrote: We just released 2.9.4 - the website didn't update last night, so I'll have to try and update it later today. But if you follow the link to download the 2.9.2 dist you'll see folders for 2.9.4. I'll send an email to the user and dev lists once I get the website to update
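Christopher's "close everything" advice boils down to deterministic cleanup: in pre-2.9.3 versions the buffered byte arrays are only released when the writer is closed. A minimal sketch against the 2.9-era Java Lucene API (Lucene.NET mirrors these calls with Close()); the field name and values here are made up for illustration:

```java
// Sketch: 2.9-era update-then-close pattern that releases writer resources
// deterministically. (Lucene.NET 2.9 mirrors this API; Close() there
// corresponds to close() here.)
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class UpdateExample {
    public static void main(String[] args) throws Exception {
        Directory dir = new RAMDirectory();
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
        IndexWriter writer =
            new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);
        try {
            Document doc = new Document();
            // "FileId" is an illustrative field name, not from the thread's code.
            doc.add(new Field("FileId", "42",
                              Field.Store.YES, Field.Index.NOT_ANALYZED));
            // updateDocument = delete-by-term + add. Buffered byte blocks are
            // only fully freed on close in pre-2.9.3 builds (see LUCENE-2467).
            writer.updateDocument(new Term("FileId", "42"), doc);
            writer.commit();
        } finally {
            writer.close();   // frees buffered byte blocks
            analyzer.close(); // releases per-thread token streams
        }
    }
}
```

Setting `doc = null` or `analyzer = null`, as in the code earlier in the thread, does not release anything the writer still holds; only closing the writer (and the analyzer) does.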
[jira] [Created] (LUCENE-3612) remove _X.fnx
remove _X.fnx - Key: LUCENE-3612 URL: https://issues.apache.org/jira/browse/LUCENE-3612 Project: Lucene - Java Issue Type: Task Affects Versions: 4.0 Reporter: Robert Muir Attachments: LUCENE-3612.patch Currently we store a global (not per-segment) field number-name mapping in _X.fnx. However, it doesn't actually save us any performance, e.g. on IndexWriter's init, because since LUCENE-2984 we are loading the fieldinfos anyway to compute files() for IFD, etc., as that's where hasProx/hasVectors is. Additionally, in the past, global files like shared doc stores have caused us problems (recently we just fixed a bug related to this file in LUCENE-3601). Finally, this is trouble for backwards compatibility, as it's difficult to handle a global file with the codecs mechanism. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3612) remove _X.fnx
[ https://issues.apache.org/jira/browse/LUCENE-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3612: Attachment: LUCENE-3612.patch Patch: all tests pass. Before committing I think we should clean up some APIs/javadocs, remove the various versioning stuff (now unused), and not read/write it in the segments files.
Re: [Lucene.Net] Re: Memory Leak in 2.9.2.2
Trevor, Unfortunately I was unable to reproduce the memory leak you're experiencing in 2.9.2. In particular, with byte[], of the 18,277 that were created, only 13 were not garbage collected, and it's likely that they are not related to Lucene (it's possible they are static and therefore would only be destroyed with the AppDomain, outside of what the profiler can trace). I tried to emulate the code you showed us and there were no signs of any allocated arrays that weren't cleaned up. That doesn't mean there isn't one in your code, but I just can't reproduce it with what you've shown us. If it's possible you can write a small program that has the same behavior, that could help us track it down. As a side note, what was a little disconcerting, though, was that in 2.9.4 with the same code, it created 28,565 byte[], and there were quite a few more left uncollected (2,805 arrays). The allocations are happening in DocumentsWriter.ByteBlockAllocator; I'll have to look at it later, though, to see if it's even a problem. Thanks, Christopher On Wed, Nov 30, 2011 at 12:41 PM, Granroth, Neal V. neal.granr...@thermofisher.com wrote: Or maybe put the changes within a conditional compile code block? Thanks DIGY, works great. - Neal
[jira] [Commented] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should
[ https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160406#comment-13160406 ] Erick Erickson commented on SOLR-2921: -- Not synonyms - agreed. Not shinglers - agreed. Not anything that might produce multiple tokens - agreed. Stemmers... When do stemmers produce multiple tokens? My ignorance of all the possibilities knows no bounds. I was wondering if, in this case, stemmers really reduced to prefix queries. Maybe it's just a bad idea altogether, I guess it begs the question of what use adding stemmers to the mix would be. You want to match the root, just specify the root with an asterisk and be done with it. No need to introduce stemming into the MultiTermAwareComponent mix. But this kind of question is exactly why I have this JIRA in place, we can collect reasons I wouldn't think of and record them. Before I mess something up with well-intentioned-but-wrong help. Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should - Key: SOLR-2921 URL: https://issues.apache.org/jira/browse/SOLR-2921 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 3.6, 4.0 Environment: All Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor SOLR-2438 creates a new MultiTermAwareComponent interface. This allows Solr to automatically assemble a multiterm analyzer that does the right thing vis-a-vis transforming the individual terms of a multi-term query at query time. Examples are: lower casing, folding accents, etc. 
Currently (27-Nov-2011), the following classes implement MultiTermAwareComponent:
* ASCIIFoldingFilterFactory
* LowerCaseFilterFactory
* LowerCaseTokenizerFactory
* MappingCharFilterFactory
* PersianCharFilterFactory
When users put any of the above in their query analyzer, Solr will do the right thing at query time, and the perennial question users have ("why didn't my wildcard query automatically lower-case (or accent fold or) my terms?") will be gone. Die question die! But taking a quick look, for instance, at the various FilterFactories that exist, there are a number of possibilities that *might* be good candidates for implementing MultiTermAwareComponent. But I really don't understand the correct behavior here well enough to know whether these should implement the interface or not. And this doesn't include other CharFilters or Tokenizers. Actually implementing the interface is often trivial, see the classes above for examples. Note that LowerCaseTokenizerFactory returns a *Filter*, which is the right thing in this case. Here is a quick cull of the Filters that, just from their names, might be candidates. If anyone wants to take any of them on, that would be great. If all you can do is provide test cases, I could probably do the code part, just let me know.
ArabicNormalizationFilterFactory
GreekLowerCaseFilterFactory
HindiNormalizationFilterFactory
ICUFoldingFilterFactory
ICUNormalizer2FilterFactory
ICUTransformFilterFactory
IndicNormalizationFilterFactory
ISOLatin1AccentFilterFactory
PersianNormalizationFilterFactory
RussianLowerCaseFilterFactory
TurkishLowerCaseFilterFactory
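The "often trivial" implementation Erick mentions might look roughly like this. This is a hypothetical sketch against the Solr 3.x-era analysis API: the factory name, the base class, the luceneMatchVersion field, and the exact return type of getMultiTermComponent() are assumptions and vary between versions, so treat the names as illustrative rather than exact.

```java
// Hypothetical sketch: a token filter factory opting in to multi-term
// analysis. Solr assembles a multi-term analyzer from components whose
// factories implement MultiTermAwareComponent.
public class MyLowerCaseFilterFactory extends BaseTokenFilterFactory
        implements MultiTermAwareComponent {

    @Override
    public TokenStream create(TokenStream input) {
        // Lower-casing transforms each term independently, so it is safe
        // to apply to the single term of a wildcard/prefix/range query.
        return new LowerCaseFilter(luceneMatchVersion, input);
    }

    // Returning 'this' tells Solr the same factory can be reused in the
    // automatically assembled multi-term analyzer.
    public Object getMultiTermComponent() {
        return this;
    }
}
```

The key design constraint, echoed later in the thread, is that only components that transform a single term in isolation (case folding, accent folding, normalization) qualify; anything that can emit multiple tokens (synonyms, shingles, most stemmers) does not.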
[jira] [Commented] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should
[ https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160416#comment-13160416 ] Robert Muir commented on SOLR-2921: --- {quote} I guess it begs the question of what use adding stemmers to the mix would be. {quote} I agree with Mike. Most stemmers are basically suffix-strippers and use heuristics like term length. They are not going to work with the syntax of various MultiTermQueries. No stemmer is going to stem dogs* to dog*. Some might remove any non-alpha characters completely, and it's not a bug that they do this. They are heuristic in nature and designed to work on natural language text, not syntax.
[jira] [Commented] (SOLR-2805) Add a main method to ZkController so that it's easier to script config upload with SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160433#comment-13160433 ] Mark Miller commented on SOLR-2805: --- I tend to limit by default and open when needed, I guess. FWIW, I've got code for this in the solrcloud branch now. It lets you temporarily launch a zk server, connect to it, and upload a set of conf files by calling (from the /solr folder in a checkout): java -classpath lib/*:dist/* org.apache.solr.cloud.ZkController 127.0.0.1:9983 127.0.0.1 8983 solr ../example/solr/conf conf1 example/solr/zoo_data Add a main method to ZkController so that it's easier to script config upload with SolrCloud Key: SOLR-2805 URL: https://issues.apache.org/jira/browse/SOLR-2805 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Priority: Minor Fix For: 4.0 When scripting a cluster setup, it would be nice if it was easy to upload a set of configs - otherwise you have to wait to start secondary servers until the first server has uploaded the config - kind of a pain. You should be able to do something like: java -classpath .:* org.apache.solr.cloud.ZkController 127.0.0.1:9983 127.0.0.1 8983 solr /home/mark/workspace/SolrCloud/solr/example/solr/conf conf1
[jira] [Issue Comment Edited] (SOLR-2805) Add a main method to ZkController so that it's easier to script config upload with SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160433#comment-13160433 ] Mark Miller edited comment on SOLR-2805 at 11/30/11 11:05 PM: -- I tend to limit by default and open when needed, I guess. FWIW, I've got code for this in the solrcloud branch now. It lets you temporarily launch a zk server, connect to it, and upload a set of conf files by calling (from the /solr folder in a checkout): {noformat}java -classpath lib/*:dist/* org.apache.solr.cloud.ZkController 127.0.0.1:9983 127.0.0.1 8983 solr ../example/solr/conf conf1 example/solr/zoo_data{noformat}
[jira] [Commented] (LUCENE-3612) remove _X.fnx
[ https://issues.apache.org/jira/browse/LUCENE-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160467#comment-13160467 ] Uwe Schindler commented on LUCENE-3612: --- +1
[jira] [Commented] (LUCENE-3606) Make IndexReader really read-only in Lucene 4.0
[ https://issues.apache.org/jira/browse/LUCENE-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160515#comment-13160515 ] Uwe Schindler commented on LUCENE-3606: --- OK, I will work on this as soon as I can (next weekend). I will be glad to remove the copy-on-write setNorm stuff in the Lucene40 codec and make the Lucene3x codec completely read-only (only reading the newest norm file). I hope Robert will possibly help me :-) Make IndexReader really read-only in Lucene 4.0 --- Key: LUCENE-3606 URL: https://issues.apache.org/jira/browse/LUCENE-3606 Project: Lucene - Java Issue Type: Task Components: core/index Affects Versions: 4.0 Reporter: Uwe Schindler As we change the API completely in Lucene 4.0, we are also free to remove read-write access and commits from IndexReader. This code is so hairy and buggy (as investigated by Robert and Mike today) when you work at the SegmentReader level but forget to flush in the DirectoryReader, so it's better to really make IndexReaders read-only. Currently with IndexReader you can do things like:
- delete/undelete Documents - can be done with IndexWriter, too (using deleteByQuery)
- change norms - this is a bad idea in general, but once we remove norms entirely and replace them with DocValues this is obsolete already. Changing DocValues should also be done using IndexWriter in trunk (once it is ready)
[jira] [Assigned] (LUCENE-3606) Make IndexReader really read-only in Lucene 4.0
[ https://issues.apache.org/jira/browse/LUCENE-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reassigned LUCENE-3606: - Assignee: Uwe Schindler
[jira] [Commented] (LUCENE-3606) Make IndexReader really read-only in Lucene 4.0
[ https://issues.apache.org/jira/browse/LUCENE-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160559#comment-13160559 ] Robert Muir commented on LUCENE-3606: - {quote} finally, holy grail where similarities can declare the normalization factor(s) they need, using byte/float/int whatever, and its all unified with the docvalues api. IndexReader.norms() maybe goes away here, and maybe NormsFormat too. {quote} Thinking about this: a clean way to do it would be for Similarity to get a new method: {code} ValueType getValueType(); {code} and we would change: {code} byte computeNorm(FieldInvertState state); {code} to: {code} void computeNorm(FieldInvertState state, PerDocFieldValues norm); {code} Sims that want to encode multiple index-time scoring factors separately could just use BYTES_FIXED_STRAIGHT. This should be only for some rare sims anyway, because a Sim can pull named 'application' specific scoring factors from IR.perDocValues() today already. Its not too crazy either since sims are already doing their own encoding, so e.g. default sim would just use FIXED_INTS_8. People that don't want to mess with bytes or smallfloat could use things like FLOAT_32 if they want and need this. we would just change FieldInfo.omitNorms to instead be FieldInfo.normValueType, which is the value type of the norm (null if its omitted, just like docValueType). Preflex FieldInfosReader would just set FIXED_INTS_8 or null, based on whether the fieldinfos had omitNorms or not. it doesnt support any other types... Finally then, sims would be own their scoring factors, and we could even remove omitNorms from Field/FieldType etc (just use the correct scoring algorithm for the field, if you don't want norms, use a sim that doesn't need them for scoring) This would remove the awkward/messy situation where every similarity implementation we have has to 'downgrade' itself to handle things like if the user decided to omit parts of their formula! 
[jira] [Commented] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should
[ https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160576#comment-13160576 ] Mike Sokolov commented on SOLR-2921: I spoke hastily, and it's true that stemmers are different from those other multi-token things. It would be kind of nice if it were possible to have a query for do?s actually match a document containing dogs, even when matching against a stemmed field, but I don't see how to do it without breaking all kinds of other things. Consider how messed up range queries would get: [dogs TO *] would match doge, doggone, and other words in [dog TO dogs], which would be totally counterintuitive.
[jira] [Commented] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should
[ https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160593#comment-13160593 ] Robert Muir commented on SOLR-2921: --- well Erick i think the ones you listed here are ok. There are cases where they won't work correctly, but trying to do multitermqueries with mappingcharfilter and asciifolding filter are already problematic (eg ? won't match œ because its now 'oe'). Personally i think this is fine, but we should document that things don't work correctly all the time, and we should not make changes to analysis components to try to make them cope with multiterm queries syntax or anything (this would be bad design, it turns them into queryparsers). If the user cares about the corner cases, then they just specify the chain. Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should - Key: SOLR-2921 URL: https://issues.apache.org/jira/browse/SOLR-2921 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 3.6, 4.0 Environment: All Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor SOLR-2438 creates a new MultiTermAwareComponent interface. This allows Solr to automatically assemble a multiterm analyzer that does the right thing vis-a-vis transforming the individual terms of a multi-term query at query time. Examples are: lower casing, folding accents, etc. Currently (27-Nov-2011), the following classes implement MultiTermAwareComponent: * ASCIIFoldingFilterFactory * LowerCaseFilterFactory * LowerCaseTokenizerFactory * MappingCharFilterFactory * PersianCharFilterFactory When users put any of the above in their query analyzer, Solr will do the right thing at query time and the perennial question users have, why didn't my wildcard query automatically lower-case (or accent fold or) my terms? will be gone. Die question die! 
But taking a quick look, for instance, at the various FilterFactories that exist, there are a number of possibilities that *might* be good candidates for implementing MultiTermAwareComponent. But I really don't understand the correct behavior here well enough to know whether these should implement the interface or not. And this doesn't include other CharFilters or Tokenizers. Actually implementing the interface is often trivial; see the classes above for examples. Note that LowerCaseTokenizerFactory returns a *Filter*, which is the right thing in this case.

Here is a quick cull of the Filters that, just from their names, might be candidates. If anyone wants to take any of them on, that would be great. If all you can do is provide test cases, I could probably do the code part, just let me know.

* ArabicNormalizationFilterFactory
* GreekLowerCaseFilterFactory
* HindiNormalizationFilterFactory
* ICUFoldingFilterFactory
* ICUNormalizer2FilterFactory
* ICUTransformFilterFactory
* IndicNormalizationFilterFactory
* ISOLatin1AccentFilterFactory
* PersianNormalizationFilterFactory
* RussianLowerCaseFilterFactory
* TurkishLowerCaseFilterFactory

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
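Robert's œ/oe corner case can be illustrated with a toy fold step. This is a self-contained sketch, not actual Lucene code: the fold and wildcard helpers below are crude stand-ins for ASCIIFoldingFilter and wildcard-query matching, used only to show why a single-character wildcard stops lining up once a ligature expands to two characters:

```java
// Toy illustration (NOT Lucene code) of the multi-term corner case:
// folding rewrites the single character 'œ' into the two characters "oe",
// so a single-character wildcard '?' in the query no longer lines up with
// the folded term in the index.
public class FoldingWildcardDemo {
    // Minimal stand-in for a lowercasing + ligature-folding filter.
    static String fold(String s) {
        return s.toLowerCase().replace("œ", "oe");
    }

    // Interpret a simple wildcard pattern ('?' = exactly one char) as a regex.
    static boolean wildcardMatch(String pattern, String term) {
        return term.matches(pattern.replace("?", "."));
    }

    public static void main(String[] args) {
        String indexedTerm = fold("Œuvre"); // -> "oeuvre" (6 chars)
        // The user typed "?uvre" expecting to match "œuvre" (5 chars),
        // but the folded term has an extra letter, so it no longer matches.
        System.out.println(wildcardMatch("?uvre", indexedTerm));  // false
        // Matching again requires knowing the expansion, which is exactly
        // why documenting the limitation beats teaching filters query syntax.
        System.out.println(wildcardMatch("?euvre", indexedTerm)); // true
    }
}
```

As the comment thread argues, cases like this are better documented than "fixed" inside the analysis components.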
[jira] [Commented] (SOLR-1726) Deep Paging and Large Results Improvements
[ https://issues.apache.org/jira/browse/SOLR-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160607#comment-13160607 ]

Grant Ingersoll commented on SOLR-1726:
---------------------------------------

Hi Manoj,

This looks OK as a start. Would be nice to have tests to go with it. Why the overriding of getTotalHits on the TopScoreDocCollector? I don't think returning collectedHits is the right thing to do there. Also, you should be able to avoid an extra Collector create call at:

{code}
topCollector = TopScoreDocCollector.create(len, true);
//Issue 1726 Start
if (cmd.getScoreDoc() != null) {
  topCollector = TopScoreDocCollector.create(len, cmd.getScoreDoc(), true); //create the Collector with InOrderPagingCollector
}
{code}

But that is easy enough to fix.

Deep Paging and Large Results Improvements
------------------------------------------

                Key: SOLR-1726
                URL: https://issues.apache.org/jira/browse/SOLR-1726
            Project: Solr
         Issue Type: Improvement
           Reporter: Grant Ingersoll
           Assignee: Grant Ingersoll
           Priority: Minor
            Fix For: 3.6, 4.0
        Attachments: CommonParams.java, QParser.java, QueryComponent.java, ResponseBuilder.java, SOLR-1726.patch, SolrIndexSearcher.java, TopDocsCollector.java, TopScoreDocCollector.java

There are possibly ways to improve collection for deep paging by passing Solr/Lucene more information about the last page of results seen, thereby saving priority queue operations. See LUCENE-2215. There may also be better options for retrieving large numbers of rows at a time that are worth exploring. See LUCENE-2127.
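The core idea behind the paging collector being discussed can be sketched without Lucene. This is an illustrative, self-contained version (the class and method names are invented, not the actual TopScoreDocCollector API): given the last (score, docId) of the previous page, only hits that sort strictly after it are collected, so earlier pages never re-enter the priority queue.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the deep-paging idea in SOLR-1726 (NOT the real
// Lucene collector): skip every hit at or before the last entry of the
// previous page, then collect up to one page of the remaining hits.
public class PagingCollectorSketch {
    static class ScoreDoc {
        final int doc;
        final float score;
        ScoreDoc(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Default relevance order: higher score first, ties broken by lower docId.
    // A candidate sorts "after" the last seen hit if it has a lower score,
    // or the same score and a higher docId.
    static boolean isAfter(ScoreDoc candidate, ScoreDoc last) {
        if (candidate.score != last.score) return candidate.score < last.score;
        return candidate.doc > last.doc;
    }

    // Assumes `hits` is already in relevance order (score desc, docId asc).
    static List<ScoreDoc> collectPage(List<ScoreDoc> hits, ScoreDoc lastOfPrevPage, int pageSize) {
        List<ScoreDoc> page = new ArrayList<>();
        for (ScoreDoc hit : hits) {
            if (lastOfPrevPage != null && !isAfter(hit, lastOfPrevPage)) continue;
            page.add(hit);
            if (page.size() == pageSize) break;
        }
        return page;
    }
}
```

The point of the real patch is that skipped hits never touch the priority queue at all, which is where the savings over plain start+rows paging come from.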
[jira] [Commented] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should
[ https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160612#comment-13160612 ]

Erick Erickson commented on SOLR-2921:
--------------------------------------

Mike: stemmers - not going to make them MultiTermAware. No way. No how. Not on my watch. One succinct example and I'm convinced. The beauty of the way Yonik and Robert directed this is that we can take care of the 80% case, not provide things that are *that* surprising, and still have all the flexibility available to those who really need it. As Robert says, if they really want some interesting behavior, they can specify the complete chain.

Robert: I guess I'm at a loss as to how to write tests for the various filters and tokenizers I listed, which is why I'm reluctant to just make them MultiTermAwareComponents. Do you have any suggestions as to how I could get tests? I had enough surprises when I ran the tests in English that I'm reluctant to just plow ahead. As far as I understand, Arabic is caseless, for instance. I totally agree with your point that making the analysis components cope with syntax is evil. Not going there either. Maybe the right action is to wait for someone to volunteer to be the guinea pig for the various filters; I suppose we could advertise for volunteers...
[jira] [Commented] (SOLR-1726) Deep Paging and Large Results Improvements
[ https://issues.apache.org/jira/browse/SOLR-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160620#comment-13160620 ]

Manojkumar Rangasamy Kannadasan commented on SOLR-1726:
-------------------------------------------------------

Hi Grant, thanks for your comments. Regarding collectedHits: if there are 4 docs as results and we want to return only the bottom 2 by giving an appropriate pageScore and pageDoc, the expected result is to return only 2 docs. But totalHits returns all 4 docs. That's the reason I used collectedHits. Kindly correct me if my understanding is wrong.
[jira] [Commented] (SOLR-1726) Deep Paging and Large Results Improvements
[ https://issues.apache.org/jira/browse/SOLR-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160627#comment-13160627 ]

Grant Ingersoll commented on SOLR-1726:
---------------------------------------

totalHits should return the count of all the hits, regardless of the number that are actually being collected. In other words, totalHits could be a million, but we only return the top 10. collectedHits only returns the count of how many are being returned.
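Grant's distinction can be stated concretely. The sketch below uses invented field names (it is not the patch's actual collector) to show why the two counters must diverge: a collector sees every matching document, but keeps at most a page of them.

```java
// Illustrative sketch (not the actual SOLR-1726 code) of the difference
// between totalHits and collectedHits: every matching doc bumps totalHits,
// but only up to `len` docs are actually kept for the page being returned.
// Returning collectedHits from getTotalHits() would under-report matches.
public class HitCountingSketch {
    int totalHits;      // count of every doc that matched the query
    int collectedHits;  // count of docs actually kept for this page
    final int len;      // requested page size

    HitCountingSketch(int len) { this.len = len; }

    void collect(int doc) {
        totalHits++;
        if (collectedHits < len) {
            collectedHits++; // pretend we added the doc to the priority queue
        }
    }
}
```

So with a million matches and rows=10, totalHits stays one million while collectedHits stops at 10, which is exactly the behavior Grant describes.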
[jira] [Commented] (SOLR-2382) DIH Cache Improvements
[ https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160683#comment-13160683 ]

Mikhail Khludnev commented on SOLR-2382:
----------------------------------------

I spawned subtask SOLR-2933.

DIH Cache Improvements
----------------------

                Key: SOLR-2382
                URL: https://issues.apache.org/jira/browse/SOLR-2382
            Project: Solr
         Issue Type: New Feature
         Components: contrib - DataImportHandler
           Reporter: James Dyer
           Priority: Minor
        Attachments: SOLR-2382-dihwriter.patch, SOLR-2382-dihwriter.patch, SOLR-2382-dihwriter.patch, SOLR-2382-dihwriter.patch, SOLR-2382-dihwriter.patch, SOLR-2382-dihwriter_standalone.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-properties.patch, SOLR-2382-properties.patch, SOLR-2382-solrwriter-verbose-fix.patch, SOLR-2382-solrwriter.patch, SOLR-2382-solrwriter.patch, SOLR-2382-solrwriter.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, TestCachedSqlEntityProcessor.java-break-where-clause.patch, TestCachedSqlEntityProcessor.java-fix-where-clause-by-adding-cachePk-and-lookup.patch, TestCachedSqlEntityProcessor.java-wrong-pk-detected-due-to-lack-of-where-support.patch, TestThreaded.java.patch

Functionality:
1. Provide a pluggable caching framework for DIH so that users can choose a cache implementation that best suits their data and application.
2. Provide a means to temporarily cache a child Entity's data without needing to create a special cached implementation of the Entity Processor (such as CachedSqlEntityProcessor).
3. Provide a means to write the final (root entity) DIH output to a cache rather than to Solr. Then provide a way for a subsequent DIH call to use the cache as an Entity input. Also provide the ability to do delta updates on such persistent caches.
4. Provide the ability to partition data across multiple caches that can then be fed back into DIH and indexed either to varying Solr Shards, or to the same Core in parallel.

Use Cases:
1. We needed a flexible, scalable way to temporarily cache child-entity data prior to joining to parent entities.
   - Using SqlEntityProcessor with Child Entities can cause an n+1 select problem.
   - CachedSqlEntityProcessor only supports an in-memory HashMap as a caching mechanism and does not scale.
   - There is no way to cache non-SQL inputs (ex: flat files, xml, etc).
2. We needed the ability to gather data from long-running entities by a process that runs separate from our main indexing process.
3. We wanted the ability to do a delta import of only the entities that changed.
   - Lucene/Solr requires entire documents to be re-indexed, even if only a few fields changed.
   - Our data comes from 50+ complex sql queries and/or flat files.
   - We do not want to incur the overhead of re-gathering all of this data if only 1 entity's data changed.
   - Persistent DIH caches solve this problem.
4. We want the ability to index several documents in parallel (using 1.4.1, which did not have the threads parameter).
5. In the future, we may need to use Shards, creating a need to easily partition our source data into Shards.

Implementation Details:
1. De-couple EntityProcessorBase from caching.
   - Created a new interface, DIHCache, with two implementations:
     - SortedMapBackedCache - an in-memory cache, used as default with CachedSqlEntityProcessor (now deprecated).
     - BerkleyBackedCache - a disk-backed cache, dependent on bdb-je, tested with je-4.1.6.jar.
   - NOTE: the existing Lucene Contrib db project uses je-3.3.93.jar. I believe this may be incompatible due to Generic Usage.
   - NOTE: I did not modify the ant script to automatically get this jar, so to use or evaluate this patch, download bdb-je from http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html
2. Allow Entity Processors to take a cacheImpl parameter to cause the entity data to be cached (see EntityProcessorBase and DIHCacheProperties).
3. Partially de-couple SolrWriter from DocBuilder.
   - Created a new interface, DIHWriter, with two implementations:
     - SolrWriter (refactored)
     - DIHCacheWriter (allows DIH to write ultimately to a Cache).
4. Create a new Entity Processor, DIHCacheProcessor, which reads a persistent Cache as DIH Entity Input.
5. Support a partition parameter with both DIHCacheWriter and DIHCacheProcessor to allow for easy partitioning of source entity data.
6. Change the semantics of entity.destroy() - Previously,
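The pluggable-cache idea described above can be sketched in a few lines. This is a simplified, hypothetical version (the real DIHCache interface takes a DIH Context and richer row types): keys map to lists of rows so a child entity can cache several rows per parent key and be looked up instead of issuing the n+1 per-parent SQL queries, roughly as SortedMapBackedCache does in memory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Simplified sketch of an in-memory DIH-style entity cache (illustrative,
// not the actual SOLR-2382 DIHCache API). Each cache key (e.g. the parent
// entity's primary key) maps to the list of child rows gathered for it.
public class SortedMapCacheSketch {
    private final SortedMap<Object, List<Map<String, Object>>> cache = new TreeMap<>();

    // Called once per source row while gathering child-entity data.
    public void add(Object key, Map<String, Object> row) {
        cache.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
    }

    // Called at join time: replaces a per-parent SQL query with a map lookup.
    public List<Map<String, Object>> lookup(Object key) {
        return cache.getOrDefault(key, Collections.emptyList());
    }
}
```

A disk-backed implementation of the same two operations (as the BerkleyBackedCache described above provides) is what lets the cache outlive a single import run and scale past available heap.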
[jira] [Commented] (LUCENE-3612) remove _X.fnx
[ https://issues.apache.org/jira/browse/LUCENE-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160689#comment-13160689 ]

Simon Willnauer commented on LUCENE-3612:
-----------------------------------------

+1

remove _X.fnx
-------------

                Key: LUCENE-3612
                URL: https://issues.apache.org/jira/browse/LUCENE-3612
            Project: Lucene - Java
         Issue Type: Task
   Affects Versions: 4.0
           Reporter: Robert Muir
        Attachments: LUCENE-3612.patch

Currently we store a global (not per-segment) field number-to-name mapping in _X.fnx. However, it doesn't actually save us any performance, e.g. on IndexWriter's init, because since LUCENE-2984 we are loading the fieldinfos anyway to compute files() for IFD, etc., as that's where hasProx/hasVectors is. Additionally, in the past, global files like shared doc stores have caused us problems (recently we just fixed a bug related to this file in LUCENE-3601). Finally, this is trouble for backwards compatibility, as it's difficult to handle a global file with the codecs mechanism.
[jira] [Commented] (SOLR-2382) DIH Cache Improvements
[ https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160688#comment-13160688 ]

Noble Paul commented on SOLR-2382:
----------------------------------

@James Yes, create a new issue for all the further functionalities and let's close this one.