Re: [Lucene.Net] 2.9.4
I thought it was:

2.9.2 and before are 2.0 compatible
2.9.4 and before are 3.5 compatible
After 2.9.4 are 4.0 compatible

Thanks,
Troy

On Wed, Sep 21, 2011 at 10:15 AM, Michael Herndon mhern...@wickedsoftware.net wrote:

If that's the case, then we'll need conditional statements for including ThreadLocal<T>.

On Wed, Sep 21, 2011 at 12:47 PM, Prescott Nasser geobmx...@hotmail.com wrote:

I thought this was after 2.9.4.

Sent from my Windows Phone

-----Original Message-----
From: Michael Herndon
Sent: Wednesday, September 21, 2011 8:30 AM
To: lucene-net-dev@lucene.apache.org
Cc: lucene-net-...@incubator.apache.org
Subject: Re: [Lucene.Net] 2.9.4

@Robert, I believe the overwhelming consensus on the mailing list vote was to move to .NET 4.0 and drop support for previous versions. I'll take care of the build scripts issue while they are being refactored into smaller chunks this week.

@Troy, Agreed.

On Wed, Sep 21, 2011 at 8:08 AM, Robert Jordan robe...@gmx.net wrote:

On 20.09.2011 23:48, Prescott Nasser wrote:

> Hey all, seems like we are set with 2.9.4? Feedback has been positive and it's been quiet. Do we feel ready to vote for a new release?

I don't know if the build infrastructure is part of the release. If yes, then there is an open issue: Contrib doesn't build right now because there are some assembly name mismatches between certain *.csproj files and build/scripts/contrib.targets.

The following patches should fix the issue:

https://github.com/robert-j/lucene.net/commit/c5218bca56c19b3407648224781eec7316994a39
https://github.com/robert-j/lucene.net/commit/50bad187655d59968d51d472b57c2a40e201d663

Also, the fix for [LUCENENET-358] is basically making Lucene.Net.dll a .NET 4.0-only assembly:

https://github.com/apache/lucene.net/commit/23ea6f52362fc7dbce48fd012cea129a7350c73c

Did we agree about abandoning .NET <= 3.5?

Robert
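Editor's sketch: Michael's remark about needing conditional statements for ThreadLocal<T> could look roughly like the following. This is a minimal illustration, not the actual Lucene.Net code. The class name PerThreadValue and the NET40 compilation symbol are assumptions made for the example; the build would have to define whatever symbol is really used.

```csharp
using System;
using System.Threading;

// Illustrative wrapper only; not the actual Lucene.Net CloseableThreadLocal.
// NET40 is an assumed compilation symbol that a .NET 4.0 build would define.
public sealed class PerThreadValue<T> where T : class
{
#if NET40
    // .NET 4.0 and later: use the framework type directly.
    private readonly ThreadLocal<T> slot = new ThreadLocal<T>();

    public T Value
    {
        get { return slot.Value; }
        set { slot.Value = value; }
    }
#else
    // Earlier frameworks: fall back to an unnamed per-thread data slot.
    private readonly LocalDataStoreSlot slot = Thread.AllocateDataSlot();

    public T Value
    {
        // GetData returns null on a thread that never stored a value.
        get { return (T)Thread.GetData(slot); }
        set { Thread.SetData(slot, value); }
    }
#endif
}
```

Either branch exposes the same Value property, so callers would not need their own #if blocks; only this one file varies by target framework.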
Re: [Lucene.Net] 2.9.4
@all, I updated the build scripts to increase their granularity: https://cwiki.apache.org/LUCENENET/build-system-scripts.html

Similarity was included, though are there any tests for this project? Some of the contrib tests are failing; I saw a few in Contrib.Highlighter just glancing at the output.

I received some feedback from Eric Woodruff. It looks like SHFB (Sandcastle) generates plain HTML files; it's been staring me in the face this whole time. I'll need to build in some targets that extract what's needed to push to the site branch. Then I'll start working on nuget.

@Prescott, can the volatile fields be wrapped in a lock statement, with the code that accesses those fields replaced by calls to a property or method that wraps access to that field?
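Editor's sketch: the question to Prescott about replacing volatile fields with lock-guarded accessors might look something like this. The class and member names are hypothetical and are not taken from the Lucene.Net source.

```csharp
using System;

// Hypothetical class; the field and member names are illustrative only.
public sealed class ReaderState
{
    private readonly object sync = new object();
    private bool closed;   // previously this might have been: private volatile bool closed;

    // All reads and writes go through the same lock, which gives the
    // visibility guarantees the volatile modifier was being used for.
    public bool Closed
    {
        get { lock (sync) { return closed; } }
        set { lock (sync) { closed = value; } }
    }

    // A check-then-set must hold the lock across both steps; two separate
    // property calls would leave a window between the read and the write.
    public bool TryClose()
    {
        lock (sync)
        {
            if (closed) return false;
            closed = true;
            return true;
        }
    }
}
```

The trade-off is a small locking cost on every access in exchange for code that no longer depends on the volatile keyword.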
Re: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts
Use a Lucene.Net core package for the core, and separate packages for each contrib. That makes the most sense, and that is how most projects work. This is also how Java Lucene does it.

Don't create a nightly nuget package; nuget should only be used for distribution packages.

On Wed, Sep 21, 2011 at 6:56 AM, Michael Herndon mhern...@wickedsoftware.net wrote:

We're taking a quick poll over the next few days on the developers mailing list** to see how people would like to use Lucene.Net through Nuget.

Currently version 2.9.2 is hosted on nuget.org, but that package was not created by the project maintainers, thus nuget is not currently set up in source.

Going forward, we would like to continue what someone else started by creating nuget packages for Lucene.Net.

Right now there are two packages: Lucene and Lucene.Contrib. My question to the community is: do you wish to have finer-grained packages, i.e. a package for each contrib project, or continue to keep it simple?

The granular approach will let you use only what you need. We can also create additional higher-level packages which have dependencies on the other ones, possibly a Lucene.Net-Essentials and a Lucene.Net-Full.

Or we can keep it simple and continue with only two packages.

My concerns are that the granular approach might overwhelm people with choice. The simple choice might be considered bloat, since you import and install assemblies that you might never use.

Another topic to converse about: would you like to see an out-of-band project nuget feed for nightly builds, branches with new or experimental features, or stable code snapshots for a projected release?

** When you post, please respond to lucene-net-dev@lucene.apache.org. This was posted to both lists to make sure everyone subscribed to both lists has a chance to voice their use cases or concerns.
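Editor's sketch: the higher-level package idea in the poll (one package that only depends on the granular ones) can be expressed in a .nuspec file with no assemblies of its own. Every package id, the version, and the author string below are placeholders invented for illustration; none of them is a published package name confirmed by this thread.

```xml
<?xml version="1.0"?>
<package>
  <metadata>
    <!-- Hypothetical aggregate package id; it ships no assemblies itself. -->
    <id>Lucene.Net.Contrib.All</id>
    <version>2.9.4</version>
    <authors>Lucene.Net maintainers</authors>
    <description>Pulls in the core package and each contrib package.</description>
    <dependencies>
      <!-- Placeholder ids: one entry per granular package. -->
      <dependency id="Lucene.Net" version="2.9.4" />
      <dependency id="Lucene.Net.Contrib.Highlighter" version="2.9.4" />
      <dependency id="Lucene.Net.Contrib.Queries" version="2.9.4" />
    </dependencies>
  </metadata>
</package>
```

Installing the aggregate package would then pull in everything, while users who want a single contrib can install that package alone.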
RE: [Lucene.Net] 2.9.4
> Similarity was included, though are there any tests for this project?

Similarity is obsolete (Queries.Net replaces it and has test cases). It has already been removed in 2.9.4g.

DIGY
RE: [Lucene.Net] 2.9.4
@Robert

> Also, the fix for [LUCENENET-358] is basically making Lucene.Net.dll a .NET 4.0-only assembly:

There is a commented-out part at the end of CloseableThreadLocal which may seem familiar to you :)

No harm in uncommenting it, and no conditional compilation is needed. It also passes all test cases.

DIGY
RE: [Lucene.Net] 2.9.4
You are right about the race condition and the NullReferenceException, but

static SupportClass.WeakHashTable slots = new SupportClass.WeakHashTable();

wouldn't work, since it is intended to be created in every thread, not just once.

Would you patch it or leave it to me?

Thanks,
DIGY

-----Original Message-----
From: Robert Jordan [mailto:robe...@gmx.net]
Sent: Thursday, September 22, 2011 1:16 AM
To: lucene-net-...@incubator.apache.org
Subject: Re: [Lucene.Net] 2.9.4

Hi Digy,

On 21.09.2011 23:38, Digy wrote:

> @Robert
> Also, the fix for [LUCENENET-358] is basically making Lucene.Net.dll a .NET 4.0-only assembly:
> There is a commented part at the end of the CloseableThreadLocal which may seem familiar to you :)

Indeed :) I've missed this comment.

> No harm in uncommenting it and no conditional compilation is needed. It also pass all test cases.

BTW, there is an issue with this commented-out code. If Value is not accessed at least once, Dispose() will fail with a NullReferenceException. There is also a little chance for a race condition. I'd rather get rid of Init() for this code:

static SupportClass.WeakHashTable slots = new SupportClass.WeakHashTable();

Robert
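Editor's sketch: the two points in this exchange (Dispose() failing when Value was never read, and a static initializer not being enough for per-thread storage) can be illustrated as below. This assumes the commented-out code keeps its table in a [ThreadStatic] field, which the thread does not show; a plain Dictionary stands in for SupportClass.WeakHashTable, so unlike the real table it holds strong references. The class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Illustration only; not the CloseableThreadLocal source.
public sealed class CloseableSlot<T> : IDisposable where T : class
{
    // A field initializer on a [ThreadStatic] field runs once, on one thread,
    // so the table is created lazily on each thread instead.
    [ThreadStatic]
    private static Dictionary<object, object> slots;

    private static Dictionary<object, object> Slots
    {
        get { return slots ?? (slots = new Dictionary<object, object>()); }
    }

    public T Value
    {
        get
        {
            object v;
            return Slots.TryGetValue(this, out v) ? (T)v : null;
        }
        set { Slots[this] = value; }
    }

    public void Dispose()
    {
        // If Value was never touched on this thread the table is still null;
        // without this guard Dispose would throw a NullReferenceException.
        // Note that this only clears the calling thread's entry.
        if (slots != null)
        {
            slots.Remove(this);
        }
    }
}
```

Because each thread only ever touches its own table, lazy creation needs no lock, which matches the later conclusion in the thread that there is no race on the table itself.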
Re: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts
@Digy, that could be done post-build with ILMerge, or by building an additional uber assembly that stores the other assemblies as resources.

http://blogs.msdn.com/b/microsoft_press/archive/2010/02/03/jeffrey-richter-excerpt-2-from-clr-via-c-third-edition.aspx

We can add the above to the build process if that would interest people.

To some, nuget is just another disruption, and to others it's a godsend. Some might say only hipsters would use nuget; others might say the cool kids with iphones use nuget (or android or wp7).

At the end of the day, nuget or combining assemblies are just channels/ways we can make it easier for various developers to consume or get their hands on Lucene.Net.

If anyone else has ideas along those lines and it can be automated, post it in this thread.

On Wed, Sep 21, 2011 at 6:00 PM, Digy digyd...@gmail.com wrote:

Even all contribs could be a single project/assembly. That way, users could reference all contribs with a single assembly. I see no harm in putting a few KB of pressure on RAM :)

DIGY

-----Original Message-----
From: Troy Howard [mailto:thowar...@gmail.com]
Sent: Wednesday, September 21, 2011 7:32 AM
To: lucene-net-dev@lucene.apache.org
Subject: Re: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

While it may be a bit redundant, why couldn't there be an individual package for each piece of contrib and a Lucene.Net Contrib (All) package that drags them all down? That way users can grab just the bit they need, or if they just want to get the whole thing, grab the All package.

Thanks,
Troy

On Tue, Sep 20, 2011 at 9:11 PM, Aaron Powell m...@aaron-powell.com wrote:

I'm going to vote +1 for granular.

With the RC you could look at myget and have a Lucene.Net repository on there so people can go for unstable on myget, stables on nuget. Also, I came across this article which explains how to set up a build server to automatically push to nuget/myget, which could be useful to the maintainers: http://brendanforster.com/doing-the-build-server-dance-with-nuget.html

Aaron Powell
MVP - Internet Explorer (Development) | FunnelWeb Team Member
http://apowell.me | http://twitter.com/slace | Skype: aaron.l.powell | Github | BitBucket

-----Original Message-----
From: Prescott Nasser [mailto:geobmx...@hotmail.com]
Sent: Wednesday, 21 September 2011 2:05 PM
To: lucene-net-dev@lucene.apache.org
Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

> Right now there are two packages: Lucene and Lucene.Contrib. My question to the community is do you wish to finer grain packages, i.e. a package for each contrib project or continue to keep it simple.

+1 Granular, we just need to be good about descriptions.

> Another topic to converse about is would you like to see an out-of-band project nuget feed for nightly builds, branches with new or experimental features, or stable code snapshots for a projected release?

Having a package for the latest RC would probably be a good idea.
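Editor's sketch: the "uber assembly that stores other assemblies as a resource" idea from the linked Jeffrey Richter excerpt boils down to an AssemblyResolve handler that loads the missing assembly from an embedded resource. The version below is a minimal outline under assumptions: the class name and the "Lucene.Net.Bundle." resource prefix are hypothetical, and the contrib DLLs are assumed to have been added to the host assembly as embedded resources under that prefix.

```csharp
using System;
using System.IO;
using System.Reflection;

// Sketch of the embedded-assembly approach; the resource prefix is hypothetical.
public static class EmbeddedAssemblyLoader
{
    private const string Prefix = "Lucene.Net.Bundle.";

    // Maps a full assembly name to the embedded resource expected to hold it.
    public static string ResourceNameFor(string assemblyFullName)
    {
        return Prefix + new AssemblyName(assemblyFullName).Name + ".dll";
    }

    // Call once at startup, before any type from an embedded assembly is used.
    public static void Register()
    {
        AppDomain.CurrentDomain.AssemblyResolve += Resolve;
    }

    private static Assembly Resolve(object sender, ResolveEventArgs args)
    {
        string resourceName = ResourceNameFor(args.Name);
        using (Stream stream = Assembly.GetExecutingAssembly().GetManifestResourceStream(resourceName))
        {
            if (stream == null)
            {
                return null;   // not embedded here; let normal resolution fail
            }

            // Stream.Read may return fewer bytes than requested, so loop.
            byte[] data = new byte[stream.Length];
            int read = 0;
            while (read < data.Length)
            {
                int n = stream.Read(data, read, data.Length - read);
                if (n == 0) break;
                read += n;
            }
            return Assembly.Load(data);
        }
    }
}
```

ILMerge, the other option mentioned, instead rewrites several assemblies into one file after the build, so no resolve handler is needed at run time.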
RE: [Lucene.Net] 2.9.4
I reconsidered it and there is no race condition. A new slot will be created for each thread. But the NullReferenceException bug is still there.

DIGY
RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts
> http://blogs.msdn.com/b/microsoft_press/archive/2010/02/03/jeffrey-richter-excerpt-2-from-clr-via-c-third-edition.aspx

Yes, this is the trick some obfuscators use (they also use some scrambling functions to hide the code in the resource).

DIGY
RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts
Any particular reason you guys are not interested in NuGet?

Aaron Powell
MVP - Internet Explorer (Development) | FunnelWeb Team Member
http://apowell.me | http://twitter.com/slace | Skype: aaron.l.powell | Github | BitBucket

-----Original Message-----
From: Digy [mailto:digyd...@gmail.com]
Sent: Thursday, 22 September 2011 7:42 AM
To: lucene-net-dev@lucene.apache.org
Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

Sorry, but I feel the same as Neal.

DIGY

-----Original Message-----
From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com]
Sent: Wednesday, September 21, 2011 6:08 PM
To: lucene-net-dev@lucene.apache.org
Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

No interest in Nuget whatsoever.

- Neal
RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts
I am not against it, but personally I think of it as a toy. I am from the generation where people used vi to write code.

DIGY
RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts
Not that old :) DIGY -Original Message- From: Prescott Nasser [mailto:geobmx...@hotmail.com] Sent: Thursday, September 22, 2011 2:14 AM To: lucene-net-dev@lucene.apache.org Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts Punch cards or bust! Sent from my Windows Phone -Original Message- From: Digy Sent: Wednesday, September 21, 2011 4:06 PM To: lucene-net-dev@lucene.apache.org Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts I am not against it, but personally think it as a toy. I am from the generation where people used vi to write codes. DIGY -Original Message- From: Aaron Powell [mailto:m...@aaron-powell.com] Sent: Thursday, September 22, 2011 1:56 AM To: lucene-net-dev@lucene.apache.org Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts Any particular reason you guys are not interested in NuGet? Aaron Powell MVP - Internet Explorer (Development) |�FunnelWeb Team Member http://apowell.me�|�http://twitter.com/slace�| Skype: aaron.l.powell | Github | BitBucket -Original Message- From: Digy [mailto:digyd...@gmail.com] Sent: Thursday, 22 September 2011 7:42 AM To: lucene-net-dev@lucene.apache.org Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts Sorry, but I feel the same as Neal. DIGY -Original Message- From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com] Sent: Wednesday, September 21, 2011 6:08 PM To: lucene-net-dev@lucene.apache.org Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts No interest in Nuget whatsoever. 
- Neal -Original Message- From: Michael Herndon [mailto:mhern...@wickedsoftware.net] Sent: Tuesday, September 20, 2011 10:57 PM To: lucene-net-dev@lucene.apache.org; lucene-net-u...@lucene.apache.org Subject: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts We're taking a quick poll over the next few days to see how people would like use Lucene.Net through Nuget on the developers mailing list** Currently version 2.9.2 is hosted on nuget.org, but that package was not create by the project maintainers, thus nuget is not currently set up in source. Going forward, we would like to continue what someone else started by creating nuget packages for Lucene.Net. Right now there are two packages: Lucene Lucene.Contrib. My question to the community is do you wish to finer grain packages, i.e. a package for each contrib project or continue to keep it simple. The granular approach will let you use only what you need. We can also create additional higher level packages which have dependencies on the other ones. Possibly a Lucene.Net-Essentials and Lucene.Net-Full. Or we can keep it simple and continue with only two packages. My concerns are that the granular approach might overwhelm people with choice. The simple choice might be considered bloat for importing and then installing assemblies that you might never use. Another topic to converse about is would you like to see an out-of-band project nuget feed for nightly builds, branches with new or experimental features, or stable code snapshots for a projected release? ** when you post, please respond to lucene-net-dev@lucene.apache.org. This was posted to both lists to make sure everyone subscribed to both lists has a chance to voice their use cases or concerns. 
Re: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts
Nick, The last e-mail was out of line and out of context. If anything, emails like that can push people into emotional or motivational apathy towards working on a project. 1) Lucene.Net will be getting nuget packages. People can hate on it, grumble, or not use it, but it's a viable distribution vehicle. It's going in. This thread was to gather feedback on how the people who would use it see themselves using it. 2) Others might want alternatives to nuget that have not been provided yet. We should be open to providing distribution alternatives if enough people ask for them. It's not apathetic or impassive to think that there might be more than one way to distribute releases. 3) Attack problems. Not people. If you believe a person is the problem, take the issue up with them offline. Those kinds of things are better face to face or through a phone call, or an exceptionally clear e-mail. It's way too easy for people to read into things too much or take things out of context in an e-mail. Attacking people also distracts people from focusing on the actual issue and prevents any actual logic or reason or sound argument from being heard. It's a good way to alienate people that you should actually be trying to persuade. 4) If I was actually apathetic and severely short sighted, I would not be spending my own vacation time this weekend automating nuget packages with the build scripts for Lucene.Net or experimenting with Portable Library Tools for Lucene.Net 4.x to see if we can get it working on mobile. Nor would I have spent my last 4 day weekend setting up jenkins and local builds of Lucene.Net. Or put in the hours today to make sure the build scripts are granular enough to implement the smaller packages. 5) If you feel so passionately about all this, why not work towards being a contributor or committer and lead by example?
- Michael Since I'm the one implementing Nuget into the build process and I have not played with the nuget server or creating a package, it just seem wise to gather feedback on how people saw themselves using the contrib packages. On Wed, Sep 21, 2011 at 9:00 PM, Nicholas Paldino [.NET/C# MVP] casper...@caspershouse.com wrote: With all due respect, it's myopic opinions like yours and Michael's (his leans more towards apathy) which will harm the ability to get the project into the hands of people. I think (hope?) it can be agreed upon that the more that people are aware of Lucene.NET, the better it is for the project in general, and most importantly, the more potential that you have that someone will *contribute back* to it (and given what Lucene.NET has gone through in the past year, it desperately needs that participation). The fact of the matter is that Nuget puts packages in the hands of .NET developers, that leads to exposure and regardless of personal opinions on whether or not they *like* Nuget, it can't be denied that it's an *extremely* popular way to get libraries into people's projects. If you want to quibble over the actual numbers (and the definition of extremely popular) then that's fine, but here are the numbers you want: http://stats.nuget.org/ If you want to just tell that audience to take a leap, that's fine, but I think it would be foolish to do so otherwise. Additionally, given that Lucene.NET is already on Nuget, isn't there *any* concern that there isn't an official distro? Aren't you concerned about the integrity of the brand that so many of you fought to keep alive over the past year? There's no guarantee that what's on Nuget will be the official releases/builds that come out of this project, and I'm a little surprised there isn't more concern over that aspect either. 
Just my $0.02 - Nick -Original Message- From: Digy [mailto:digyd...@gmail.com] Sent: Wednesday, September 21, 2011 7:06 PM To: lucene-net-dev@lucene.apache.org Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts I am not against it, but personally think it as a toy. I am from the generation where people used vi to write codes. DIGY -Original Message- From: Aaron Powell [mailto:m...@aaron-powell.com] Sent: Thursday, September 22, 2011 1:56 AM To: lucene-net-dev@lucene.apache.org Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts Any particular reason you guys are not interested in NuGet? Aaron Powell MVP - Internet Explorer (Development) | FunnelWeb Team Member http://apowell.me | http://twitter.com/slace | Skype: aaron.l.powell | Github | BitBucket -Original Message- From: Digy [mailto:digyd...@gmail.com] Sent: Thursday, 22 September 2011 7:42 AM To: lucene-net-dev@lucene.apache.org Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts Sorry, but I feel the same as Neal. DIGY -Original Message- From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com] Sent: Wednesday, September 21, 2011 6:08 PM To: lucene-net-dev@lucene.apache.org Subject:
Re: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts
Michael - Could be wrong, but I think Nick might have gotten you confused with Neal. Regardless, I completely agree with everything you just said. And, Yay for NuGet! Package management is the bomb. -T On Wed, Sep 21, 2011 at 7:43 PM, Michael Herndon mhern...@wickedsoftware.net wrote: Nick, The last e-mail was out of line and out of context. If anything, emails like that can push people into emotional or motivational apathy towards working on a project. 1) Lucene.Net will be getting nuget packages. People can hate on it, grumble, or not use it, but its a viable distribution vehicle. Its going in. This thread was to gather feedback on how people that would use it, see themselves using it. 2) Others might want alternatives to nuget that have not been provided yet. We should be open to providing distribution alternatives if enough people warrant it. Its not apathetic or impassive to think to that there might be more than one way to distribute releases. 3) Attack problems. Not people. If you believe a person is the problem, take the issue up with them offline. Those kinds of things are better face to face or through a phone call, or an exceptionally clear e-mail. Its way too easy for people to read into things too much or take things out of context in an e-mail. Attacking people also distracts people from focusing on the actual issue and prevents any actually logic or reason or sound argument from being heard. Its a good way to alienate people that you should actually be trying to persuade. 4) If I was actually apathetic and severely short sighted, I would not be spending my own vacation time this weekend automating nuget packages with the build scripts for Lucene.Net or experimenting Portable Library Tools for Lucene.Net 4.x to see if we can get it working on mobile. Nor would I have spent my last 4 day weekend setting up jenkins and local builds of Lucene.Net. 
Or put in the hours today to make sure the build scripts are granular enough to implement the smaller packages. 5) If you feel so passionately about all this, why not work towards being a contributor or committer and lead by example ? - Michael Since I'm the one implementing Nuget into the build process and I have not played with the nuget server or creating a package, it just seem wise to gather feedback on how people saw themselves using the contrib packages. On Wed, Sep 21, 2011 at 9:00 PM, Nicholas Paldino [.NET/C# MVP] casper...@caspershouse.com wrote: With all due respect, it's myopic opinions like yours and Michael's (his leans more towards apathy) which will harm the ability to get the project into the hands of people. I think (hope?) it can be agreed upon that the more that people are aware of Lucene.NET, the better it is for the project in general, and most importantly, the more potential that you have that someone will *contribute back* to it (and given what Lucene.NET has gone through in the past year, it desperately needs that participation). The fact of the matter is that Nuget puts packages in the hands of .NET developers, that leads to exposure and regardless of personal opinions on whether or not they *like* Nuget, it can't be denied that it's an *extremely* popular way to get libraries into people's projects. If you want to quibble over the actual numbers (and the definition of extremely popular) then that's fine, but here are the numbers you want: http://stats.nuget.org/ If you want to just tell that audience to take a leap, that's fine, but I think it would be foolish to do so otherwise. Additionally, given that Lucene.NET is already on Nuget, isn't there *any* concern that there isn't an official distro? Aren't you concerned about the integrity of the brand that so many of you fought to keep alive over the past year? 
There's no guarantee that what's on Nuget will be the official releases/builds that come out of this project, and I'm a little surprised there isn't more concern over that aspect either. Just my $0.02 - Nick -Original Message- From: Digy [mailto:digyd...@gmail.com] Sent: Wednesday, September 21, 2011 7:06 PM To: lucene-net-dev@lucene.apache.org Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts I am not against it, but personally think it as a toy. I am from the generation where people used vi to write codes. DIGY -Original Message- From: Aaron Powell [mailto:m...@aaron-powell.com] Sent: Thursday, September 22, 2011 1:56 AM To: lucene-net-dev@lucene.apache.org Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts Any particular reason you guys are not interested in NuGet? Aaron Powell MVP - Internet Explorer (Development) | FunnelWeb Team Member http://apowell.me | http://twitter.com/slace | Skype: aaron.l.powell | Github | BitBucket -Original Message- From: Digy [mailto:digyd...@gmail.com] Sent: Thursday, 22 September 2011 7:42 AM To:
[jira] [Commented] (SOLR-2787) add external http: include file reference for .htaccess processing
[ https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109291#comment-13109291 ] Hoss Man commented on SOLR-2787: i honestly have no idea what this request is for. an external link directive to an external http: file that supplies a (.htaccess compatible) list of known bad bot sites that solr should do what with exactly? when/how/why should solr use this (user maintained?) list of bad sites? what is the goal? add external http: include file reference for .htaccess processing -- Key: SOLR-2787 URL: https://issues.apache.org/jira/browse/SOLR-2787 Project: Solr Issue Type: Improvement Components: update Affects Versions: 3.4 Environment: All operating systems Reporter: Mark Dickensob Labels: Spam, killer Original Estimate: 504h Remaining Estimate: 504h Include an external link directive to an external http: file that supplies a (.htaccess compatible) list of known bad bot sites. ie common resource for spam kill list site(s) Personally, I run a portal and I think that this feature is important to kill spam! I will supply the files for testing if you need them. Mark goan.com -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1895) ManifoldCF SearchComponent plugin for enforcing ManifoldCF security at search time
[ https://issues.apache.org/jira/browse/SOLR-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109292#comment-13109292 ] Karl Wright commented on SOLR-1895: --- bq. The core of this path is an allow/deny matrix to lucene Query; this is applicable to many security strategies not just manifold. My hope with introducing the AccessTokenService is to separate the user-to-token mapping I agree - there should be a unified framework to the degree feasible. This would allow common testing and reasonable maintenance across Lucene and Solr versions for the future. For ManifoldCF, there's also an unrelated release-engineering question, specifically for the ManifoldCF-specific portion of the proposal, which is why we'd think introducing a code dependency on something like Solr/Lucene would be a good idea, especially since we'd be building a jar specifically for deployment within Solr. We do this reluctantly for a couple of other connectors but it's a complete one-of each time and requires a great deal of work by end users. This inconvenience greatly impacts the level of deployment of the affected connectors. Since Solr is Apache licensed we could make this easier in Solr's case, but probably not without redistributing a specific version of Solr and Lucene, and providing build targets which fire up an already configured Solr/Lucene instance. We would need this also for testing, if the plugin code lived in ManifoldCF. It is also the case that the current ManifoldCF search component needed significant rework even to build between version 3.x and version 4.x, because many of the classes that were necessary changed their packages. Thus we'd need to redistribute more than one Solr/Lucene instance, and release perhaps twice as frequently to keep up. Given all that, does everyone still think it is desirable for ManifoldCF to build Solr components itself? The alternative would be a Solr contrib module, which I'd be very happy with. 
To me, it is the obvious choice if you want a straightforward overall user experience. The underlying http-based protocol that the component will need to use is well-defined, quite complete, and is unlikely to change.

ManifoldCF SearchComponent plugin for enforcing ManifoldCF security at search time -- Key: SOLR-1895 URL: https://issues.apache.org/jira/browse/SOLR-1895 Project: Solr Issue Type: New Feature Components: SearchComponents - other Reporter: Karl Wright Labels: document, security, solr Fix For: 3.5, 4.0 Attachments: LCFSecurityFilter.java, LCFSecurityFilter.java, LCFSecurityFilter.java, LCFSecurityFilter.java, SOLR-1895-service-plugin.patch, SOLR-1895-service-plugin.patch, SOLR-1895.patch, SOLR-1895.patch, SOLR-1895.patch, SOLR-1895.patch, SOLR-1895.patch, SOLR-1895.patch

I've written an LCF SearchComponent which filters returned results based on access tokens provided by LCF's authority service. The component requires you to configure the appropriate authority service URL base, e.g.:

<!-- LCF document security enforcement component -->
<searchComponent name="lcfSecurity" class="LCFSecurityFilter">
  <str name="AuthorityServiceBaseURL">http://localhost:8080/lcf-authority-service</str>
</searchComponent>

Also required are the following schema.xml additions:

<!-- Security fields -->
<field name="allow_token_document" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="deny_token_document" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="allow_token_share" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="deny_token_share" type="string" indexed="true" stored="false" multiValued="true"/>

Finally, to tie it into the standard request handler, it seems to need to run last:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <arr name="last-components">
    <str>lcfSecurity</str>
  </arr>
  ...

I have not set a package for this code. Nor have I been able to get it reviewed by someone as conversant with Solr as I would prefer.
It is my hope, however, that this module will become part of the standard Solr 1.5 suite of search components, since that would tie it in with LCF nicely. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
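The four security fields named in the issue description above (allow_token_document, deny_token_document, allow_token_share, deny_token_share) suggest how an allow/deny filter can be expressed as a query. The following is only a minimal sketch under stated assumptions, not the attached LCFSecurityFilter: the class name AccessTokenFilterSketch and its methods are hypothetical, the user's access tokens are assumed to have been fetched already from the authority service, and a document is assumed to be visible only when one of the user's tokens appears in both allow fields and in neither deny field. Documents indexed with no tokens at all would be hidden under this simplified rule; the real component may treat them differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; this is not the attached LCFSecurityFilter. */
final class AccessTokenFilterSketch {

    /** Wraps a token in double quotes so the query parser reads it as one literal term. */
    static String quote(String token) {
        return "\"" + token.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds (field:"t1" OR field:"t2" ...) for the given tokens. */
    static String anyOf(String field, List<String> tokens) {
        List<String> clauses = new ArrayList<>();
        for (String token : tokens) {
            clauses.add(field + ":" + quote(token));
        }
        return "(" + String.join(" OR ", clauses) + ")";
    }

    /**
     * Assumed rule: require a matching allow token at document and share level,
     * and exclude any document carrying a matching deny token at either level.
     */
    static String buildFilter(List<String> userTokens) {
        if (userTokens.isEmpty()) {
            // Hypothetical policy choice: force the caller to handle token-less users.
            throw new IllegalArgumentException("caller must decide what a user with no tokens may see");
        }
        return "+" + anyOf("allow_token_document", userTokens)
            + " +" + anyOf("allow_token_share", userTokens)
            + " -" + anyOf("deny_token_document", userTokens)
            + " -" + anyOf("deny_token_share", userTokens);
    }

    public static void main(String[] args) {
        // "alice" and "grp:eng" are made-up tokens for illustration.
        System.out.println(buildFilter(Arrays.asList("alice", "grp:eng")));
    }
}
```

The resulting string could be supplied as a filter query alongside the user's own query, which is roughly what a search component running in last-components would arrange internally.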
[jira] [Issue Comment Edited] (SOLR-1895) ManifoldCF SearchComponent plugin for enforcing ManifoldCF security at search time
[ https://issues.apache.org/jira/browse/SOLR-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109292#comment-13109292 ] Karl Wright edited comment on SOLR-1895 at 9/21/11 6:06 AM: bq. The core of this path is an allow/deny matrix to lucene Query; this is applicable to many security strategies not just manifold. My hope with introducing the AccessTokenService is to separate the user-to-token mapping I agree - there should be a unified framework to the degree feasible. This would allow common testing and reasonable maintenance across Lucene and Solr versions for the future. For ManifoldCF, there's also an unrelated release-engineering question, specifically for the ManifoldCF-specific portion of the proposal. I don't understand why we'd believe that introducing a code dependency on something like Solr/Lucene would be a good idea, especially since we'd be building a jar specifically for deployment within Solr. We do this reluctantly for a couple of other connectors but it's a complete one-of each time and always requires a great deal of work by end users. This inconvenience greatly impacts the level of deployment of the affected connectors. Since Solr is Apache licensed we could make this easier in Solr's case, but probably not without redistributing a specific version of Solr and Lucene, and providing build targets which fire up an already configured Solr/Lucene instance. We would need this also for testing, if the plugin code lived in ManifoldCF. It is also the case that the current ManifoldCF search component needed significant rework even to build between version Lucene/Solr 3.x and version 4.x, because many of the classes that were used changed their packages. Thus we'd likely need to redistribute more than one Solr/Lucene instance at a time, and release perhaps twice as frequently as we currently do just to keep up with the Solr/Lucene release schedule. 
Given all that, does everyone still think it is desirable for ManifoldCF to build Solr components itself? The alternative would be a Solr contrib module, which I'd be very happy with. To me, it is the obvious choice if you want a straightforward overall user experience. The underlying http-based protocol that the component will need to use is well-defined, quite complete, and is unlikely to change. The required dependencies (commons-httpclient) are already redistributed by Solr, so that shouldn't be a problem either. was (Author: kwri...@metacarta.com): bq. The core of this path is an allow/deny matrix to lucene Query; this is applicable to many security strategies not just manifold. My hope with introducing the AccessTokenService is to separate the user-to-token mapping I agree - there should be a unified framework to the degree feasible. This would allow common testing and reasonable maintenance across Lucene and Solr versions for the future. For ManifoldCF, there's also an unrelated release-engineering question, specifically for the ManifoldCF-specific portion of the proposal, which is why we'd think introducing a code dependency on something like Solr/Lucene would be a good idea, especially since we'd be building a jar specifically for deployment within Solr. We do this reluctantly for a couple of other connectors but it's a complete one-of each time and requires a great deal of work by end users. This inconvenience greatly impacts the level of deployment of the affected connectors. Since Solr is Apache licensed we could make this easier in Solr's case, but probably not without redistributing a specific version of Solr and Lucene, and providing build targets which fire up an already configured Solr/Lucene instance. We would need this also for testing, if the plugin code lived in ManifoldCF. 
It is also the case that the current ManifoldCF search component needed significant rework even to build between version 3.x and version 4.x, because many of the classes that were necessary changed their packages. Thus we'd need to redistribute more than one Solr/Lucene instance, and release perhaps twice as frequently to keep up. Given all that, does everyone still think it is desirable for ManifoldCF to build Solr components itself? The alternative would be a Solr contrib module, which I'd be very happy with. To me, it is the obvious choice if you want a straightforward overall user experience. The underlying http-based protocol that the component will need to use is well-defined, quite complete, and is unlikely to change. ManifoldCF SearchComponent plugin for enforcing ManifoldCF security at search time -- Key: SOLR-1895 URL: https://issues.apache.org/jira/browse/SOLR-1895 Project: Solr Issue Type: New Feature
[jira] [Commented] (SOLR-2787) add external http: include file reference for .htaccess processing
[ https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109297#comment-13109297 ] Mark Dickensob commented on SOLR-2787: --

Yes it is a goal!!! Obviously you don't run a big Apache site (no offence). Here is the list of bad bots I have so far in .htaccess; I can make this file available for Apache server users. If I am in the wrong group, let me know where I can lodge this request PLEASE!

# Kill bad bots
#
RewriteCond %{HTTP_USER_AGENT} ^Web-sniffer/1 [OR]
RewriteCond %{HTTP_REFERER} ^AEE- [OR]
RewriteCond %{HTTP_USER_AGENT} ^Apache-HttpClient [OR]
RewriteCond %{HTTP_USER_AGENT} ^Atomic_Email_Hunter [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craft...@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^CakePHP [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^BDFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DomainWatcher [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EMail\ Exractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Fetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Gigabot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^Huawei [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^IlTrovatore [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Infoseek\ SideWinder [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^Jakarta [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^jikespider [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla-Firefox-Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^MyApp [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Nimo\ Software [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^Python-urllib [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^swish-e [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE
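Each RewriteCond in the list above anchors its pattern with ^, so it matches only when the User-Agent header starts with the given name, and without an [NC] flag the match is case-sensitive. For readers more at home in Java than in mod_rewrite, the same check can be sketched as a plain prefix test. This is an illustration only: the class BadBotCheckSketch is hypothetical, it copies just five names from the list, and nothing in the thread indicates that Solr performs or should perform such a check.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch of the prefix test that the RewriteCond lines express. */
final class BadBotCheckSketch {

    // A handful of names copied from the posted list; a real deployment
    // would presumably load the full list from a shared file.
    static final List<String> BLOCKED_PREFIXES =
        Arrays.asList("BlackWidow", "EmailSiphon", "Teleport Pro", "WebZIP", "Wget");

    /** Returns true when the User-Agent starts with a blocked name (case-sensitive). */
    static boolean isBlocked(String userAgent) {
        if (userAgent == null) {
            return false; // no header sent, so no pattern can match
        }
        for (String prefix : BLOCKED_PREFIXES) {
            if (userAgent.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isBlocked("Wget/1.12 (linux-gnu)"));
        System.out.println(isBlocked("Mozilla/5.0 (X11; Linux x86_64)"));
    }
}
```

A prefix test of this kind is easy to evade, since the User-Agent header is entirely under the client's control.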
[jira] [Commented] (SOLR-1895) ManifoldCF SearchComponent plugin for enforcing ManifoldCF security at search time
[ https://issues.apache.org/jira/browse/SOLR-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109301#comment-13109301 ] Erik Hatcher commented on SOLR-1895: bq. The purpose of a QueryParser is to parse the query... but this does not require any parsing. Ryan - how about the term query parser? While not strictly taking a free form query string and parsing it into a Query, the general QParserPlugin is about being a Query factory taking whatever inputs it needs to construct that; parser is a bit of a misnomer with what the abstraction really defines. [I didn't understand the comment about MatchAllDocsQuery earlier either, as that doesn't seem necessary here] bq. I think the bigger question is do we want any security scaffolding in solr, or is this something that should always be delegated elsewhere In this case, it really boils down to generating a handful of wildcard queries, it looks like, but in an MCF-specific way. I'm not sure this is, yet, a pressing need to generalize a security framework within Solr, as it's _just_ a Query generator. Regarding the location of this capability - a Solr contrib works for me. It's tricky business deciding where to put glue code between two projects (e.g. MCF contains a Solr indexer, using this same logic, though, why shouldn't it also be in a Solr contrib/mcf too?). Perhaps the real deciding factor is a practical choice of where the maintainers of this best can work on it - and in this case it'd be MCF so that that community can maintain it directly rather than through JIRA patches and committers that aren't using MCF. But again though, in this case I'm fine with it living in Solr contrib/mcf. 
ManifoldCF SearchComponent plugin for enforcing ManifoldCF security at search time -- Key: SOLR-1895 URL: https://issues.apache.org/jira/browse/SOLR-1895 Project: Solr Issue Type: New Feature Components: SearchComponents - other Reporter: Karl Wright Labels: document, security, solr Fix For: 3.5, 4.0 Attachments: LCFSecurityFilter.java, LCFSecurityFilter.java, LCFSecurityFilter.java, LCFSecurityFilter.java, SOLR-1895-service-plugin.patch, SOLR-1895-service-plugin.patch, SOLR-1895.patch, SOLR-1895.patch, SOLR-1895.patch, SOLR-1895.patch, SOLR-1895.patch, SOLR-1895.patch

I've written an LCF SearchComponent which filters returned results based on access tokens provided by LCF's authority service. The component requires you to configure the appropriate authority service URL base, e.g.:

<!-- LCF document security enforcement component -->
<searchComponent name="lcfSecurity" class="LCFSecurityFilter">
  <str name="AuthorityServiceBaseURL">http://localhost:8080/lcf-authority-service</str>
</searchComponent>

Also required are the following schema.xml additions:

<!-- Security fields -->
<field name="allow_token_document" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="deny_token_document" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="allow_token_share" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="deny_token_share" type="string" indexed="true" stored="false" multiValued="true"/>

Finally, to tie it into the standard request handler, it seems to need to run last:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <arr name="last-components">
    <str>lcfSecurity</str>
  </arr>
  ...

I have not set a package for this code. Nor have I been able to get it reviewed by someone as conversant with Solr as I would prefer. It is my hope, however, that this module will become part of the standard Solr 1.5 suite of search components, since that would tie it in with LCF nicely. -- This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2787) add external http: include file reference for .htaccess processing
[ https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109308#comment-13109308 ] Mark Dickensob commented on SOLR-2787: --

Also bad IP addresses

# Harvester Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)- Russia
deny from 31.184.238.
# Discobot
deny from 38.101.148.126
# Harvester Washington, United States
deny from 38.127.197.104
# Harvester Ukraine
deny from 46.211.205.71
# Harvester Seattle, United States
deny from 50.17.81.237
# Harvester Xiamen, China
deny from 58.23.252.136
# Harvester Great Britain
deny from 62.128.150.15
# Hacker New York, United States
deny from 66.114.72.9
# Google!!!
# deny from 66.249.71
# Harvester Massapequa, United States
deny from 68.194.246.194
# Harvester Lake Orion, United States
deny from 71.238.32.52
# Harvester San Marcos, United States
deny from 72.199.108.105
# Hacker Russia
deny from 77.221.130.4
# Harvester Germany
deny from 79.143.182.232
# Harvester Germany
deny from 79.143.182.232
# Sheffield, Great Britain
deny from 81.105.137.203
# Harvester Israel
deny from 82.166.235.
# Hacker Höst, Germany
deny from 83.169.6.156
# Harvester Netherlands
deny from 85.17.147.193
# Harvester Netherlands
deny from 85.201.16.158
# Harvester France
deny from 87.98.187.40
# Harvester Spain
deny from 87.98.228.22
# Hacker Bulgaria
deny from 87.120.106.5
# Harvester Zdar Nad Sazavou, Czech Republic
deny from 90.180.139.29
# Harvester London, Great Britain
deny from 90.194.19.
# Harvester London, Great Britain
deny from 90.214.146.214
# Hacker Russian Federation
deny from 91.195.124.8
# Harvester Netherlands
deny from 93.190.136.5
# Harvester Italy
deny from 94.23.65.72
# Hacker Bulgaria
deny from 94.26.53.6
# Harvester Valencia, Spain
deny from 95.19.216.61
# Harvester Germany
deny from 95.169.160.
# Amsterdam, Netherlands
deny from 95.211.73.195
deny from trygoclio.com
# Hacker El Segundo, United States
deny from 96.46.227.5
# Harvester United States
deny from 98.174.196.217
# Harvester United States
deny from 108.27.42.190
# Fake Googlebot - Russia
deny from 109.86.225.205
# Harvester Tel Aviv, Israel
deny from 109.64.34.186
# Harvester Great Britain
deny from 109.104.92.118
# Harvester China
deny from 111.162.201.111
# Harvester China
deny from 113.104.242.61
# Hacker Chinanet
deny from 122.225.0.170
# Hacker Chinanet
deny from 124.115.1.
# Hacker Englewood, United States
deny from 130.94.69.217
# Harvester Scranton, United States
deny from 173.212.244.106
# Spectrum Adaptive Spider
deny from 174.127.132
# Harvester China
deny from 175.44.8.36
# Harvester Netherlands
deny from 178.239.58.144
# Harvester São Paulo, Brazil
deny from 201.95.81.134
# Atlanta, United States
deny from 205.251.153.164
# Hacker USA
deny from 208.79.212.174
# Ezooms
deny from 208.115.111.67
# Harvester USA
deny from 209.18.124.32
# Harvester Columbus, United States
deny from 209.190.28.178
# Sitebot
deny from 212.113.35.162
# Harvester United States, Kill subdomain
deny from 212.124.113
# Hacker Great Britain
deny from 213.40.79.217
# Harvester Spain
deny from 213.149.247.102
# Beijing Harvester
deny from 222.187.199.37

add external http: include file reference for .htaccess processing -- Key: SOLR-2787 URL: https://issues.apache.org/jira/browse/SOLR-2787 Project: Solr Issue Type: Improvement Components: update Affects Versions: 3.4 Environment: All operating systems Reporter: Mark Dickensob Labels: Spam, killer Original Estimate: 504h Remaining Estimate: 504h Include an external link directive to an external http: file that supplies a (.htaccess compatible) list of known bad bot sites. ie common resource for spam kill list site(s) Personally, I run a portal and I think that this feature is important to kill spam! I will supply the files for testing if you need them.
Mark goan.com -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1895) ManifoldCF SearchComponent plugin for enforcing ManifoldCF security at search time
[ https://issues.apache.org/jira/browse/SOLR-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109315#comment-13109315 ] Jan Høydahl commented on SOLR-1895: --- {quote} bq. I think the bigger question is do we want any security scaffolding in solr, or is this something that should always be delegated elsewhere In this case, it really boils down to generating a handful of wildcard queries, it looks like, but in an MCF-specific way. I'm not sure this is, yet, a pressing need to generalize a security framework within Solr, as it's just a Query generator. {quote} Both fq and SearchComponent would work for early binding, but when we want to extend the model with an (optional) late binding, i.e. filtering search results, fq won't cut it. A SearchComponent however can be extended not only to handle early+late binding but also any other strange requirements there may be regarding security, such as authentication by IP address, peeking at other parameters, modifying the request (or response) in some way etc. These would fit as plugins to the Security SearchComponent just as AccessTokenServices (for early-binding) are in current design. I'm +1 for starting to include some built-in framework support for security, else I think we'll start seeing a multitude of different ways to integrate security which is not a competitive advantage for Solr. A SC is itself only a plugin anyway so we don't enforce anything on people, but I think it makes a huge difference that it's a plugin which ships with Solr rather than each connector having its own not-up-to-date security mechanism floating around. In Real Life™ a deployment may include a mix of MCF and non-MCF connectors; in fact we have two customers in that situation already. The ideal would be to move everything to MCF but that might not be possible due to a custom or more fine-grained security model. 
Such a special case is also easier to handle with SC - I don't see how to add code to merge/unify two (possibly 3rd party) QParsers, except from creating a new umbrella one. We'll keep the core layer generic and thin. AccessTokenSecurityComponent and AccessTokenService (which should perhaps be an Interface instead) go in core, while ManifoldCFAccessTokenService and others may live wherever most convenient. I, for one, would be interested in maintaining some of these classes, and also adding a Velocity demo of it all. That was my +1 for SearchComponent :) @Ryan, that's true, we only need to be concerned with authenticated user, the Velocity demo tab could simulate the rest. ManifoldCF SearchComponent plugin for enforcing ManifoldCF security at search time -- Key: SOLR-1895 URL: https://issues.apache.org/jira/browse/SOLR-1895 Project: Solr Issue Type: New Feature Components: SearchComponents - other Reporter: Karl Wright Labels: document, security, solr Fix For: 3.5, 4.0 Attachments: LCFSecurityFilter.java, LCFSecurityFilter.java, LCFSecurityFilter.java, LCFSecurityFilter.java, SOLR-1895-service-plugin.patch, SOLR-1895-service-plugin.patch, SOLR-1895.patch, SOLR-1895.patch, SOLR-1895.patch, SOLR-1895.patch, SOLR-1895.patch, SOLR-1895.patch I've written an LCF SearchComponent which filters returned results based on access tokens provided by LCF's authority service. 
The component requires you to configure the appropriate authority service URL base, e.g.:

<!-- LCF document security enforcement component -->
<searchComponent name="lcfSecurity" class="LCFSecurityFilter">
  <str name="AuthorityServiceBaseURL">http://localhost:8080/lcf-authority-service</str>
</searchComponent>

Also required are the following schema.xml additions:

<!-- Security fields -->
<field name="allow_token_document" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="deny_token_document" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="allow_token_share" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="deny_token_share" type="string" indexed="true" stored="false" multiValued="true"/>

Finally, to tie it into the standard request handler, it seems to need to run last:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <arr name="last-components">
    <str>lcfSecurity</str>
  </arr>
  ...

I have not set a package for this code. Nor have I been able to get it reviewed by someone as conversant with Solr as I would prefer. It is my hope, however, that this module will become part of the standard Solr 1.5 suite of search components, since that would tie it in with LCF nicely. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
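To make the "early binding" idea from the comment above concrete, here is a minimal sketch of how a list of access tokens could be turned into a filter query string over the allow_token_document and deny_token_document fields. The class and method names are hypothetical, the share-level fields are left out, and the actual logic in LCFSecurityFilter is not shown in this thread and may well differ.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the LCFSecurityFilter implementation.
class AccessTokenFilterSketch {

    // Wraps a token in double quotes, escaping backslashes and quotes.
    static String quote(String token) {
        return "\"" + token.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds "(field:"t1" OR field:"t2" ...)" for the given tokens.
    static String anyOf(String field, List<String> tokens) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) {
                sb.append(" OR ");
            }
            sb.append(field).append(':').append(quote(tokens.get(i)));
        }
        return sb.append(')').toString();
    }

    // Requires at least one allow token to match and no deny token to match.
    // Assumption: only document-level tokens are considered here.
    static String documentFilter(List<String> tokens) {
        if (tokens.isEmpty()) {
            throw new IllegalArgumentException("at least one access token is required");
        }
        return "+" + anyOf("allow_token_document", tokens)
             + " -" + anyOf("deny_token_document", tokens);
    }

    public static void main(String[] args) {
        System.out.println(documentFilter(Arrays.asList("user:alice", "group:staff")));
    }
}
```

A string like this could be passed as an fq parameter or parsed inside a SearchComponent; late binding (filtering the returned results) would need extra logic that this sketch does not cover.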
[jira] [Updated] (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-1979: -- Attachment: SOLR-1979.patch Fixed java.lang.IndexOutOfBoundsException bug in resolveLanguage() when no languages detected. Added more corner case tests. Create LanguageIdentifierUpdateProcessor Key: SOLR-1979 URL: https://issues.apache.org/jira/browse/SOLR-1979 Project: Solr Issue Type: New Feature Components: contrib - LangId, update Reporter: Jan Høydahl Assignee: Jan Høydahl Priority: Minor Labels: UpdateProcessor Fix For: 3.5, 4.0 Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch Language identification from document fields, and mapping of field names to language-specific fields based on detected language. Wrap the Tika LanguageIdentifier in an UpdateProcessor. See user documentation at http://wiki.apache.org/solr/LanguageDetection -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109332#comment-13109332 ] Uwe Schindler commented on LUCENE-3390: --- Doron: That's exactly the problem. This easy use case is problematic: You allow sorting by Price. The user can switch between forward and backward sorting. In all cases, you want all articles without a price at the beginning. To achieve this, you have to set the price value e.g. to negative_infinity for the forward sorting, but positive_infinity for backwards sorting. If two users are now using your user interface in parallel, they collide. The fix used here is identical to Lucene trunk and we should keep the code similar. FieldComparator is now almost identical between trunk and 3.x (except the new BytesRef/Docvalues stuff in trunk). Thinking more about it: Another approach (also possible for trunk) is to supply the missing value to FieldCache.getXxx(). The FieldCache would then first use Arrays.fill() to populate the FieldCache array with the default value and after that populate the index values. The drawback is that you get a separate FieldCache entry for each distinct missing value. For the above use case, you would have two float/double price caches. Incorrect sort by Numeric values for documents missing the sorting field Key: LUCENE-3390 URL: https://issues.apache.org/jira/browse/LUCENE-3390 Project: Lucene - Java Issue Type: Bug Components: core/search Affects Versions: 3.3 Reporter: Gilad Barkai Assignee: Doron Cohen Priority: Minor Labels: double, float, int, long, numeric, sort Fix For: 3.4 Attachments: LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390.patch, SortByDouble.java While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort.
(Tested against Double, Float, Int, and Long numeric fields, in ascending and descending order.) This behavior is unexpected, as zero is comparable to the rest of the values. A better solution would be either to allow the user to define such a non-value default, or to always bring those documents as the last results. Example scenario: Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value. Searching with MatchAllDocsQuery, with sort over that field in descending order, yields the docid results of 0, 2, 1. Asking for the top 2 documents brings the document without any value as the 2nd result, which seems like a bug. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
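The alternative mentioned in the comment (pre-filling the cache array with the missing value via Arrays.fill() and then writing the indexed values over it) can be sketched in plain Java as below. This is only an illustration under stated assumptions: the class and method names are hypothetical, a Map stands in for the indexed values, and no Lucene API is used.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "fill with missing value" idea; not Lucene code.
class MissingValueFillSketch {

    // Pre-fills the per-document array with missingValue, then overwrites
    // the slots of documents that actually have an indexed value.
    static double[] buildCache(int maxDoc, Map<Integer, Double> indexed, double missingValue) {
        double[] values = new double[maxDoc];
        Arrays.fill(values, missingValue);
        for (Map.Entry<Integer, Double> e : indexed.entrySet()) {
            values[e.getKey()] = e.getValue();
        }
        return values;
    }

    // Returns doc ids ordered by value (ties broken by doc id).
    static Integer[] sortDocs(double[] values, boolean reverse) {
        Integer[] docs = new Integer[values.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        Arrays.sort(docs, (a, b) -> {
            int c = Double.compare(values[a], values[b]);
            if (reverse) {
                c = -c;
            }
            return c != 0 ? c : Integer.compare(a, b);
        });
        return docs;
    }

    public static void main(String[] args) {
        // The scenario from the issue: doc 0 = 3.5, doc 1 = -10, doc 2 has no value.
        Map<Integer, Double> indexed = new HashMap<>();
        indexed.put(0, 3.5);
        indexed.put(1, -10.0);

        // Missing treated as 0, descending: reproduces the reported order 0, 2, 1.
        System.out.println(Arrays.toString(sortDocs(buildCache(3, indexed, 0.0), true)));
        // Missing docs first in a forward sort needs negative infinity ...
        System.out.println(Arrays.toString(
            sortDocs(buildCache(3, indexed, Double.NEGATIVE_INFINITY), false)));
        // ... and positive infinity in a backward sort, hence two separate arrays.
        System.out.println(Arrays.toString(
            sortDocs(buildCache(3, indexed, Double.POSITIVE_INFINITY), true)));
    }
}
```

The last two calls build two different arrays for the same field, which is the drawback described above: one cache entry per distinct missing value.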
[jira] [Issue Comment Edited] (SOLR-2787) add external http: include file reference for .htaccess processing
[ https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109297#comment-13109297 ] Mark Dickensob edited comment on SOLR-2787 at 9/21/11 8:28 AM: --- Message Yes it is a goal!!! Obviously you dont run a big Apache site (no offence) here is the list of bad bots i have so far in .htaccess I can make this a file available for apache server users. If I am in the wrong group let me know where I can lodge this request PLEASE! # Kill bad bots # RewriteCond %{HTTP_USER_AGENT} ^Web-sniffer/1 [OR] RewriteCond %{HTTP_REFERER} ^AEE- [OR] RewriteCond %{HTTP_USER_AGENT} ^Apache-HttpClient [OR] RewriteCond %{HTTP_USER_AGENT} ^Atomic_Email_Hunter [OR] RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR] RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craft...@yahoo.com [OR] RewriteCond %{HTTP_USER_AGENT} ^CakePHP [OR] RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR] RewriteCond %{HTTP_USER_AGENT} ^Custo [OR] RewriteCond %{HTTP_USER_AGENT} ^BDFetch [OR] RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR] RewriteCond %{HTTP_USER_AGENT} ^DomainWatcher [OR] RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR] RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR] RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR] RewriteCond %{HTTP_USER_AGENT} ^EMail\ Exractor [OR] RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR] RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR] RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR] RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR] RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR] RewriteCond %{HTTP_USER_AGENT} ^Fetch [OR] RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR] RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR] RewriteCond %{HTTP_USER_AGENT} ^GetWeb! 
[OR] RewriteCond %{HTTP_USER_AGENT} ^Gigabot [OR] RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR] RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR] RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR] RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR] RewriteCond %{HTTP_USER_AGENT} ^Huawei [OR] RewriteCond %{HTTP_USER_AGENT} ^HMView [OR] RewriteCond %{HTTP_USER_AGENT} ^IlTrovatore [OR] RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR] RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR] RewriteCond %{HTTP_USER_AGENT} ^Infoseek\ SideWinder [OR] RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR] RewriteCond %{HTTP_USER_AGENT} ^Jakarta [OR] RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR] RewriteCond %{HTTP_USER_AGENT} ^jikespider [OR] RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR] RewriteCond %{HTTP_USER_AGENT} ^larbin [OR] RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR] RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR] RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ URL [OR] RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR] RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR] RewriteCond %{HTTP_USER_AGENT} ^Mozilla-Firefox-Spider [OR] RewriteCond %{HTTP_USER_AGENT} ^MyApp [OR] RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR] RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR] RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR] RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR] RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR] RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR] RewriteCond %{HTTP_USER_AGENT} ^Nimo\ Software [OR] RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR] RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR] RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR] RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR] RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR] RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR] RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR] RewriteCond %{HTTP_USER_AGENT} ^Python-urllib [OR] RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR] RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR] 
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR] RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR] RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR] RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR] RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR] RewriteCond %{HTTP_USER_AGENT} ^swish-e [OR] RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR] RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR] RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR] RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR] RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR] RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR] RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR] RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR] RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR] RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR] RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR] RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR] RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR] RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR] RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR] RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR] RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR] RewriteCond %{HTTP_USER_AGENT} ^Wget [OR] RewriteCond %{HTTP_USER_AGENT}
[jira] [Issue Comment Edited] (SOLR-2787) add external http: include file reference for .htaccess processing
[ https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109297#comment-13109297 ] Mark Dickensob edited comment on SOLR-2787 at 9/21/11 8:29 AM: --- Message Yes it is a goal!!! Obviously you dont run a big Apache site (no offence) here is the list of bad bots i have so far in .htaccess I can make this file available for apache server users via a .htaccess directive. If I am in the wrong group let me know where I can lodge this request PLEASE! # Kill bad bots # RewriteCond %{HTTP_USER_AGENT} ^Web-sniffer/1 [OR] RewriteCond %{HTTP_REFERER} ^AEE- [OR] RewriteCond %{HTTP_USER_AGENT} ^Apache-HttpClient [OR] RewriteCond %{HTTP_USER_AGENT} ^Atomic_Email_Hunter [OR] RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR] RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craft...@yahoo.com [OR] RewriteCond %{HTTP_USER_AGENT} ^CakePHP [OR] RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR] RewriteCond %{HTTP_USER_AGENT} ^Custo [OR] RewriteCond %{HTTP_USER_AGENT} ^BDFetch [OR] RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR] RewriteCond %{HTTP_USER_AGENT} ^DomainWatcher [OR] RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR] RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR] RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR] RewriteCond %{HTTP_USER_AGENT} ^EMail\ Exractor [OR] RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR] RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR] RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR] RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR] RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR] RewriteCond %{HTTP_USER_AGENT} ^Fetch [OR] RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR] RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR] RewriteCond %{HTTP_USER_AGENT} ^GetWeb! 
[OR] RewriteCond %{HTTP_USER_AGENT} ^Gigabot [OR] RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR] RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR] RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR] RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR] RewriteCond %{HTTP_USER_AGENT} ^Huawei [OR] RewriteCond %{HTTP_USER_AGENT} ^HMView [OR] RewriteCond %{HTTP_USER_AGENT} ^IlTrovatore [OR] RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR] RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR] RewriteCond %{HTTP_USER_AGENT} ^Infoseek\ SideWinder [OR] RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR] RewriteCond %{HTTP_USER_AGENT} ^Jakarta [OR] RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR] RewriteCond %{HTTP_USER_AGENT} ^jikespider [OR] RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR] RewriteCond %{HTTP_USER_AGENT} ^larbin [OR] RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR] RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR] RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ URL [OR] RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR] RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR] RewriteCond %{HTTP_USER_AGENT} ^Mozilla-Firefox-Spider [OR] RewriteCond %{HTTP_USER_AGENT} ^MyApp [OR] RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR] RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR] RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR] RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR] RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR] RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR] RewriteCond %{HTTP_USER_AGENT} ^Nimo\ Software [OR] RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR] RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR] RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR] RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR] RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR] RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR] RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR] RewriteCond %{HTTP_USER_AGENT} ^Python-urllib [OR] RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR] RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR] 
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR] RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR] RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR] RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR] RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR] RewriteCond %{HTTP_USER_AGENT} ^swish-e [OR] RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR] RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR] RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR] RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR] RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR] RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR] RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR] RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR] RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR] RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR] RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR] RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR] RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR] RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR] RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR] RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR] RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR] RewriteCond %{HTTP_USER_AGENT} ^Wget [OR] RewriteCond
[jira] [Issue Comment Edited] (SOLR-2787) add external http: include file reference for .htaccess processing
[ https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109308#comment-13109308 ] Mark Dickensob edited comment on SOLR-2787 at 9/21/11 8:32 AM: --- Also bad IP addresses # Harvester Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)- Russia deny from 31.184.238. # Harvester Washington, United States deny from 38.127.197.104 # Harvester Ukraine deny from 46.211.205.71 # Harvester Seattle, United States deny from 50.17.81.237 # Harvester Xiamen, China deny from 58.23.252.136 # Harvester Great Britain deny from 62.128.150.15 # Hacker New York, United States deny from 66.114.72.9 # deny from 66.249.71 # Harvester Massapequa, United States deny from 68.194.246.194 # Harvester Lake Orion, United States deny from 71.238.32.52 # Harvester San Marcos, United States deny from 72.199.108.105 # Hacker Russia deny from 77.221.130.4 # Harvester Germany deny from 79.143.182.232 # Harvester Germany deny from 79.143.182.232 # Sheffield, Great Britain deny from 81.105.137.203 # Harvester Israel deny from 82.166.235. # Hacker Höst, Germany deny from 83.169.6.156] # Harvester Netherlands deny from 85.17.147.193 # Harvester Netherlands deny from 85.201.16.158 # Harvester France deny from 87.98.187.40 # Harvester Spain deny from 87.98.228.22 # Hacker Bulgaria deny from 87.120.106.5 # Harvester Zdar Nad Sazavou, Czech Republic deny from 90.180.139.29 # Harvester London, Great Britain deny from 90.194.19. # Harvester London, Great Britain deny from 90.214.146.214 # Hacker Russian Federation deny from 91.195.124.8 # Harvester Netherlands deny from 93.190.136.5 # Harvester Italy deny from 94.23.65.72 # Hacker Bulgaria deny from 94.26.53.6 # Harvester Valencia, Spain deny from 95.19.216.61 # Harvester Germany deny from 95.169.160. 
# Amsterdam, Netherlands deny from 95.211.73.195 deny from trygoclio.com # Hacker El Segundo, United States deny from 96.46.227.5 # Harvester United States deny from 98.174.196.217 # Harvester United States deny from 108.27.42.190 # Fake Googlebot - Russia deny from 109.86.225.205 # Harvester Tel Aviv, Israel deny from 109.64.34.186 # Harvester Great Britain deny from 109.104.92.118 # Harvester China deny from 111.162.201.111 # Harvester China deny from 113.104.242.61 # Hacker Chinanet deny from 122.225.0.170 # Hacker Chinanet deny from 124.115.1. # Hacker Englewood, United States deny from 130.94.69.217 # Harvester Scranton, United States deny from 173.212.244.106 # Spectrum Adaptive Spider deny from 174.127.132 # Harvester China deny from 175.44.8.36 # Harvester Netherlands deny from 178.239.58.144 # Harvester São Paulo, Brazil deny from 201.95.81.134 # Atlanta, United States deny from 205.251.153.164 # Hacker USA deny from 208.79.212.174 # Ezooms deny from 208.115.111.67 # Harvester USA deny from 209.18.124.32 # Harvester Columbus, United States deny from 209.190.28.178 # Sitebot deny from 212.113.35.162 # Harvester United States, Kill subdomain deny from 212.124.113 # Hacker Great Britain deny from 213.40.79.217 # Harvester Spain deny from 213.149.247.102 # Beijing Harvester deny from 222.187.199.37 was (Author: goan69): Also bad IP addresses # Harvester Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)- Russia deny from 31.184.238. # Discobot deny from 38.101.148.126 # Harvester Washington, United States deny from 38.127.197.104 # Harvester Ukraine deny from 46.211.205.71 # Harvester Seattle, United States deny from 50.17.81.237 # Harvester Xiamen, China deny from 58.23.252.136 # Harvester Great Britain deny from 62.128.150.15 # Hacker New York, United States deny from 66.114.72.9 # Google!!! 
# deny from 66.249.71 # Harvester Massapequa, United States deny from 68.194.246.194 # Harvester Lake Orion, United States deny from 71.238.32.52 # Harvester San Marcos, United States deny from 72.199.108.105 # Hacker Russia deny from 77.221.130.4 # Harvester Germany deny from 79.143.182.232 # Harvester Germany deny from 79.143.182.232 # Sheffield, Great Britain deny from 81.105.137.203 # Harvester Israel deny from 82.166.235. # Hacker Höst, Germany deny from 83.169.6.156] # Harvester Netherlands deny from 85.17.147.193 # Harvester Netherlands deny from 85.201.16.158 # Harvester France deny from 87.98.187.40 # Harvester Spain deny from 87.98.228.22 # Hacker Bulgaria deny from 87.120.106.5 # Harvester Zdar Nad Sazavou, Czech Republic deny from 90.180.139.29 # Harvester London, Great Britain deny from 90.194.19. # Harvester London, Great Britain deny from 90.214.146.214 # Hacker Russian Federation deny from 91.195.124.8 # Harvester Netherlands deny from 93.190.136.5 # Harvester Italy deny from 94.23.65.72 # Hacker Bulgaria deny from 94.26.53.6 # Harvester Valencia, Spain deny from 95.19.216.61 # Harvester Germany deny from 95.169.160. # Amsterdam, Netherlands deny from 95.211.73.195 deny from trygoclio.com # Hacker El Segundo, United
[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3390: -- Attachment: LUCENE-3390-fix-like-trunk.patch Final patch: - Added changes for backwards breaks - Removed the bogus docFreq check - Optimized the case of empty unvalued docs bit set (like in trunk) This patch is now 100% in line with trunk. The code was already tested in trunk and does not affect sort speed for the common case without missing value, as the compiler will ignore the additional null check. Will commit later today. Incorrect sort by Numeric values for documents missing the sorting field Key: LUCENE-3390 URL: https://issues.apache.org/jira/browse/LUCENE-3390 Project: Lucene - Java Issue Type: Bug Components: core/search Affects Versions: 3.3 Reporter: Gilad Barkai Assignee: Doron Cohen Priority: Minor Labels: double, float, int, long, numeric, sort Fix For: 3.4 Attachments: LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390.patch, SortByDouble.java While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int, and Long numeric fields, in ascending and descending order.) This behavior is unexpected, as zero is comparable to the rest of the values. A better solution would be either to allow the user to define such a non-value default, or to always bring those documents as the last results. Example scenario: Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value. Searching with MatchAllDocsQuery, with sort over that field in descending order, yields the docid results of 0, 2, 1. Asking for the top 2 documents brings the document without any value as the 2nd result, which seems like a bug. -- This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
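As a rough illustration of the null check mentioned in the patch notes, the sketch below keeps one shared value array plus an optional bit set of documents that have a value, and substitutes a per-comparator missing value at lookup time. The names are hypothetical and java.util.BitSet stands in for whatever the patch actually uses; it is a sketch of the idea, not the committed FieldComparator code.

```java
import java.util.BitSet;

// Hypothetical sketch of a comparator with a per-sort missing value; not Lucene code.
class MissingValueComparatorSketch {
    private final double[] values;      // shared per-document values, 0 where the field is absent
    private final BitSet docsWithField; // null when every document has a value
    private final double missingValue;  // chosen per sort, not stored in the shared array

    MissingValueComparatorSketch(double[] values, BitSet docsWithField, double missingValue) {
        this.values = values;
        this.docsWithField = docsWithField;
        this.missingValue = missingValue;
    }

    // Looks up a document's value, substituting missingValue only when the
    // bit set exists and says the document has no value.
    double valueOf(int doc) {
        double v = values[doc];
        if (docsWithField != null && v == 0 && !docsWithField.get(doc)) {
            v = missingValue;
        }
        return v;
    }

    int compare(int docA, int docB) {
        return Double.compare(valueOf(docA), valueOf(docB));
    }

    public static void main(String[] args) {
        double[] values = {3.5, -10.0, 0.0}; // doc 2 has no value
        BitSet withField = new BitSet(3);
        withField.set(0);
        withField.set(1);

        MissingValueComparatorSketch low =
            new MissingValueComparatorSketch(values, withField, Double.NEGATIVE_INFINITY);
        MissingValueComparatorSketch high =
            new MissingValueComparatorSketch(values, withField, Double.POSITIVE_INFINITY);
        System.out.println(low.compare(2, 1));  // negative: doc 2 sorts below doc 1
        System.out.println(high.compare(2, 0)); // positive: doc 2 sorts above doc 0
    }
}
```

Because the missing value lives in the comparator rather than in the shared array, two users sorting in opposite directions can use different missing values without colliding, and when docsWithField is null the lookup reduces to a plain array read.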
[jira] [Updated] (SOLR-2787) add external http: include file reference for .htaccess processing
[ https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Dickensob updated SOLR-2787: - Description: Include an .htaccess external link directive to an external http:file that supplies a (.htaccess compatible) list of known bad bot sites. ie common resource for spam kill list site(s) Personally, I run a portal and I think that this feature is important to kill spam! I will supply the files for testing if you need them. Mark goan.com was: Include an external link directive to an external http: file that supplies a (.htaccess compatible) list of known bad bot sites. ie common resource for spam kill list site(s) Personally, I run a portal and I think that this feature is important to kill spam! I will supply the files for testing if you need them. Mark goan.com add external http: include file reference for .htaccess processing -- Key: SOLR-2787 URL: https://issues.apache.org/jira/browse/SOLR-2787 Project: Solr Issue Type: Improvement Components: update Affects Versions: 3.4 Environment: All operating systems Reporter: Mark Dickensob Labels: Spam, killer Original Estimate: 504h Remaining Estimate: 504h Include an .htaccess external link directive to an external http:file that supplies a (.htaccess compatible) list of known bad bot sites. ie common resource for spam kill list site(s) Personally, I run a portal and I think that this feature is important to kill spam! I will supply the files for testing if you need them. Mark goan.com -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2787) add external http: include file reference for .htaccess processing
[ https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109342#comment-13109342 ] Mark Dickensob commented on SOLR-2787: -- Do you get it yet ? add external http: include file reference for .htaccess processing -- Key: SOLR-2787 URL: https://issues.apache.org/jira/browse/SOLR-2787 Project: Solr Issue Type: Improvement Components: update Affects Versions: 3.4 Environment: All operating systems Reporter: Mark Dickensob Labels: Spam, killer Original Estimate: 504h Remaining Estimate: 504h Include an .htaccess external link directive to include an external http:file that supplies a (.htaccess compatible) list of known bad bot sites. ie common resource for spam kill list site(s) Personally, I run a portal and I think that this feature is important to kill spam! I will supply the files for testing if you need them. Mark goan.com -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2787) add external http: include file reference for .htaccess processing
[ https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Dickensob updated SOLR-2787: - Description: Include an .htaccess external link directive to include an external http:file that supplies a (.htaccess compatible) list of known bad bot sites. ie common resource for spam kill list site(s) Personally, I run a portal and I think that this feature is important to kill spam! I will supply the files for testing if you need them. Mark goan.com was: Include an .htaccess external link directive to an external http:file that supplies a (.htaccess compatible) list of known bad bot sites. ie common resource for spam kill list site(s) Personally, I run a portal and I think that this feature is important to kill spam! I will supply the files for testing if you need them. Mark goan.com add external http: include file reference for .htaccess processing -- Key: SOLR-2787 URL: https://issues.apache.org/jira/browse/SOLR-2787 Project: Solr Issue Type: Improvement Components: update Affects Versions: 3.4 Environment: All operating systems Reporter: Mark Dickensob Labels: Spam, killer Original Estimate: 504h Remaining Estimate: 504h Include an .htaccess external link directive to include an external http:file that supplies a (.htaccess compatible) list of known bad bot sites. ie common resource for spam kill list site(s) Personally, I run a portal and I think that this feature is important to kill spam! I will supply the files for testing if you need them. Mark goan.com -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2787) add external http: include file reference for .htaccess processing
[ https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109349#comment-13109349 ] Uwe Schindler commented on SOLR-2787: - What does this have to do with Apache Solr? add external http: include file reference for .htaccess processing -- Key: SOLR-2787 URL: https://issues.apache.org/jira/browse/SOLR-2787 Project: Solr Issue Type: Improvement Components: update Affects Versions: 3.4 Environment: All operating systems Reporter: Mark Dickensob Labels: Spam, killer Original Estimate: 504h Remaining Estimate: 504h Include an .htaccess external link directive to include an external http:file that supplies a (.htaccess compatible) list of known bad bot sites. ie common resource for spam kill list site(s) Personally, I run a portal and I think that this feature is important to kill spam! I will supply the files for testing if you need them. Mark goan.com -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (SOLR-2787) add external http: include file reference for .htaccess processing
[ https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer closed SOLR-2787. Resolution: Invalid This issue is totally unrelated to Apache Solr. If anything, it might be something for httpd (http://httpd.apache.org/). Mark, this is the issue tracker for Apache Solr, a full-text search server that usually runs behind a firewall and only serves read requests to the outside. I think you used the wrong issue tracker to create your issue. In this context, your issue doesn't make sense to me either.
[jira] [Commented] (LUCENE-3305) Kuromoji code donation - a new Japanese morphological analyzer
[ https://issues.apache.org/jira/browse/LUCENE-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109356#comment-13109356 ] Simon Willnauer commented on LUCENE-3305: According to LEGAL-97 we can include the dict files. That means we can finish this code donation and get everything in shape for a commit. I will finish the paperwork once I am back from traveling. Kuromoji code donation - a new Japanese morphological analyzer Key: LUCENE-3305 URL: https://issues.apache.org/jira/browse/LUCENE-3305 Project: Lucene - Java Issue Type: New Feature Components: modules/analysis Reporter: Christian Moen Assignee: Simon Willnauer Attachments: Kuromoji short overview .pdf, ip-clearance-Kuromoji.xml, kuromoji-0.7.6-asf.tar.gz, kuromoji-0.7.6.tar.gz, kuromoji-solr-0.5.3-asf.tar.gz, kuromoji-solr-0.5.3.tar.gz Atilika Inc. (アティリカ株式会社) would like to donate the Kuromoji Japanese morphological analyzer to the Apache Software Foundation in the hope that it will be useful to Lucene and Solr users in Japan and elsewhere. The project was started in 2010 since we couldn't find any high-quality, actively maintained and easy-to-use Java-based Japanese morphological analyzers, and these became many of our design goals for Kuromoji. Kuromoji also has a segmentation mode that is particularly useful for search, which we hope will interest Lucene and Solr users. Compound nouns, such as 関西国際空港 (Kansai International Airport) and 日本経済新聞 (Nikkei Newspaper), are segmented as one token with most analyzers. As a result, a search for 空港 (airport) or 新聞 (newspaper) will not give you a hit for these words. Kuromoji can segment these words into 関西 国際 空港 and 日本 経済 新聞, which is generally what you would want for search, and you'll get a hit. We also wanted to make sure the technology has a license that makes it compatible with other Apache Software Foundation software to maximize its usefulness.
Kuromoji has an Apache License 2.0 and all code is currently owned by Atilika Inc. The software has been developed by my good friend and ex-colleague Masaru Hasegawa and myself. Kuromoji uses the so-called IPADIC for its dictionary/statistical model and its license terms are described in NOTICE.txt. I'll upload code distributions and their corresponding hashes and I'd very much like to start the code grant process. I'm also happy to provide patches to integrate Kuromoji into the codebase, if you prefer that. Please advise on how you'd like me to proceed with this. Thank you.
[jira] [Commented] (SOLR-2787) add external http: include file reference for .htaccess processing
[ https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109376#comment-13109376 ] Mark Dickensob commented on SOLR-2787: You guys must be thick a bricks.
[jira] [Resolved] (SOLR-2785) DateField timezone handling
[ https://issues.apache.org/jira/browse/SOLR-2785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Howard Cox resolved SOLR-2785. Resolution: Invalid DateField timezone handling Key: SOLR-2785 URL: https://issues.apache.org/jira/browse/SOLR-2785 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 3.3 Environment: Debian GNU/Linux, OpenJDK Runtime Environment 14.0-b16 Reporter: Howard Cox Priority: Minor Labels: datetime, datetimes, schema The Solr DateField appears to be only partially ISO 8601 compliant. The DateMathParser requires timezone modifications to be in the format +nMINUTES, +xHOURS, +yDAYS etc. [http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html] ISO 8601 states that timezone modifications should be in the format +00:01, +01:00 [http://en.wikipedia.org/wiki/ISO_8601#Time_offsets_from_UTC] It would be useful if Solr DateField could parse both (I presume there's a reason for +nMINUTE etc somewhere in Java.)
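For anyone landing on this issue with the same question: as far as I know, DateField expects timestamps in UTC with a trailing Z, and the +nMINUTES / +xHOURS syntax is date math relative to a point in time rather than a timezone offset. One way around it is to normalize on the client before indexing or querying. A minimal sketch, assuming a Java 8+ client (java.time) and a hypothetical helper name toSolrUtc that is not part of Solr:

```java
import java.time.OffsetDateTime;

final class IsoOffsetToUtc {
    // Hypothetical helper (not a Solr API): parse an ISO 8601 timestamp that
    // carries a UTC offset such as +01:00 and render the same instant in UTC.
    static String toSolrUtc(String isoWithOffset) {
        return OffsetDateTime.parse(isoWithOffset).toInstant().toString();
    }

    public static void main(String[] args) {
        // 10:15 at +01:00 is 09:15 in UTC.
        System.out.println(toSolrUtc("2011-09-21T10:15:00+01:00"));
    }
}
```

This keeps the offset handling entirely outside Solr, so it does not depend on any change to DateField.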
[jira] [Commented] (SOLR-2787) add external http: include file reference for .htaccess processing
[ https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109387#comment-13109387 ] Uwe Schindler commented on SOLR-2787: bq. You guys must be thick a bricks. You should maybe *read* what we have written before. Simon explained: Your request seems to be related to Apache HTTP Server and you should open issues at their issue tracker. Apache Solr is a different software that has nothing to do with the Apache HTTP Server. Please open a bug report at the Apache HTTP Server website: http://httpd.apache.org/. I would recommend to use a more appropriate tone when opening the issue there.
[jira] [Commented] (SOLR-2787) add external http: include file reference for .htaccess processing
[ https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109391#comment-13109391 ] Mark Dickensob commented on SOLR-2787: Nice one Uwe!
[jira] [Commented] (SOLR-2787) add external http: include file reference for .htaccess processing
[ https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109398#comment-13109398 ] Mark Dickensob commented on SOLR-2787: By the way, I was a convert to Apache from Microsoft servers. Now I have changed my mind. How is that for tone!
[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109402#comment-13109402 ] Michael McCandless commented on LUCENE-3390: I would love to take this even further, and have trunk's FC implement missing values the same way 3.x does (ie, separate FC method to getUnvaluedDocs, rather than bundling this bitset w/ the computation of the values array). But we should do that separately. This is actually a serious bug; maybe we should release 3.4.1 soon (this would also fix the Maven packaging problem in 3.4.0). Why did we need to narrow the return value from FC.getUnvaluedDocs to FixedBitSet? Incorrect sort by Numeric values for documents missing the sorting field Key: LUCENE-3390 URL: https://issues.apache.org/jira/browse/LUCENE-3390 Project: Lucene - Java Issue Type: Bug Components: core/search Affects Versions: 3.3 Reporter: Gilad Barkai Assignee: Doron Cohen Priority: Minor Labels: double, float, int, long, numeric, sort Fix For: 3.4 Attachments: LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390.patch, SortByDouble.java While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int Long numeric fields ascending and descending order). This behavior is unexpected, as zero is comparable to the rest of the values. A better solution would either be allowing the user to define such a non-value default, or always bring those document results as the last ones. Example scenario: Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value. Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1. 
Asking for the top 2 documents brings the document without any value as the 2nd result, which seems like a bug?
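The example scenario from the issue description can be reproduced without Lucene at all. The following is only a sketch in plain Java (the class and method names are made up for illustration and are not Lucene APIs): it sorts doc ids in descending order, once treating a missing value as 0 and once always placing documents without a value last.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class MissingValueSort {
    // Returns doc ids sorted by value in descending order. A null entry means
    // the document has no value for the sort field.
    static List<Integer> sortDescending(Double[] values, boolean missingLast) {
        List<Integer> docs = new ArrayList<>();
        for (int doc = 0; doc < values.length; doc++) {
            docs.add(doc);
        }
        Comparator<Integer> byValueDesc = (a, b) -> {
            Double va = values[a];
            Double vb = values[b];
            if (missingLast && (va == null || vb == null)) {
                if (va == null && vb == null) return 0;
                return va == null ? 1 : -1; // docs without a value go to the end
            }
            // Behaviour reported in the issue: a missing value is treated as 0.
            double da = va == null ? 0.0 : va;
            double db = vb == null ? 0.0 : vb;
            return Double.compare(db, da);
        };
        docs.sort(byValueDesc);
        return docs;
    }

    public static void main(String[] args) {
        Double[] values = {3.5, -10.0, null};
        System.out.println(sortDescending(values, false)); // [0, 2, 1]
        System.out.println(sortDescending(values, true));  // [0, 1, 2]
    }
}
```

With the missing value treated as 0, doc 2 lands between 3.5 and -10, which is the 0, 2, 1 order reported above.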
[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109404#comment-13109404 ] Uwe Schindler commented on LUCENE-3390: bq. Why did we need to narrow the return value from FC.getUnvaluedDocs to FixedBitSet? We have no Bits interface in 3.x. And DocIdSet is not random access. Maybe we should backport the Bits interface?
[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109411#comment-13109411 ] Uwe Schindler commented on LUCENE-3390: In my opinion a much cleaner and simpler approach for FieldComparator and all the other stuff would be the following, as it removes all additional branches from FieldComparator and makes the code as simple as it was before missingValues existed at all (also in trunk): {quote} Thinking more about it: Another approach (also possible for trunk) is to supply the missing value to FieldCache.getXxx(). The FieldCache would first use Arrays.fill() to populate the FieldCache array with the default value and after that populate the index values. The drawback is that you get a separate FieldCache entry for each distinct missing value. For the above use case, you would have two float/double price caches. {quote} We just have to think about additional memory requirements (which would affect only users actually using different missingValues for several searches). From my perspective this is much cleaner, as you can pass in a missingValue directly when populating the FieldCache. FieldComparator would simply call FieldCache.DEFAULT.getInts(reader, parser, defaultValue). The cache would use the triplet including defaultValue as key. The sorting code would not need to be changed at all (this is similar to Doron's idea, but moved to FieldCache and not FC.setNextReader). We should think about this in an additional issue and for now only fix the broken implementation in 3.x.
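To make the proposal concrete, here is a rough, Lucene-free sketch of the idea of handing the missing value to the cache. Everything in it is hypothetical: MissingValueCache, its getInts signature, and the use of a plain map of docId to value as a stand-in for reading the index are assumptions for illustration, and the key is simplified to (field, missingValue) instead of the (reader, parser, defaultValue) triplet mentioned in the comment.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MissingValueCache {
    // One entry per (field, missingValue) pair. This is the memory drawback
    // noted in the comment: each distinct missing value gets its own array.
    private final Map<List<Object>, int[]> cache = new HashMap<>();

    // 'indexed' maps docId -> value for documents that have a value; it stands
    // in for whatever the real cache would read from the index.
    int[] getInts(String field, int maxDoc, Map<Integer, Integer> indexed, int missingValue) {
        List<Object> key = Arrays.<Object>asList(field, missingValue);
        return cache.computeIfAbsent(key, k -> {
            int[] values = new int[maxDoc];
            Arrays.fill(values, missingValue); // default first...
            for (Map.Entry<Integer, Integer> e : indexed.entrySet()) {
                values[e.getKey()] = e.getValue(); // ...then the real values
            }
            return values;
        });
    }

    int entries() {
        return cache.size();
    }
}
```

With this shape a comparator only ever sees a fully populated array, so the sorting code needs no extra branch for missing values; the cost is one cached array per distinct missing value.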
[jira] [Commented] (LUCENE-2205) Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.
[ https://issues.apache.org/jira/browse/LUCENE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109413#comment-13109413 ] Aaron McCurry commented on LUCENE-2205: I have reimplemented the patch using the UTF8SortedAsUTF16Comparator as well as ByteArrayDataInput. The patch also contains a unit test and I have run all the current tests of the core plus the contribs and everything passes. As a plus the code has gotten much simpler. During my functional testing I created a test index with small but very diverse terms. Roughly 50 terms per document with 50 million documents. So there are approximately 2.5 billion terms in this index. The current 3x branch produces: 5000 documents at a heap size of 598902872. The patched version produces: 5000 documents at a heap size of 282526224. The random access performance of this index goes to the patch. Running 200 passes of a collection of randomly sampled queries (queries change each time) produces the following: The current 3x branch produces: 4186.0225 avg response time in ms The patched version produces: 2930.1371 avg response time in ms NOTE: The hard drive I was using is a very slow drive. While using smaller indexes the patch and the current branch are very close to the same performance. Depending on the pass, either one was faster. Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.
Key: LUCENE-2205 URL: https://issues.apache.org/jira/browse/LUCENE-2205 Project: Lucene - Java Issue Type: Improvement Components: core/index Environment: Java5 Reporter: Aaron McCurry Assignee: Michael McCandless Fix For: 3.5 Attachments: RandomAccessTest.java, TermInfosReader.java, TermInfosReaderIndex.java, TermInfosReaderIndexDefault.java, TermInfosReaderIndexSmall.java, patch-final.txt, rawoutput.txt Basically packing those three arrays into a byte array with an int array as an index offset. The performance benefits are staggering on my test index (of size 6.2 GB, with ~1,000,000 documents and ~175,000,000 terms): the memory needed to load the terminfos into memory was reduced to 17% of its original size, from 291.5 MB to 49.7 MB. The random access speed has been made better by 1-2%, load time of the segments is ~40% faster as well, and full GCs on my JVM were made 7 times faster. I have already performed the work and am offering this code as a patch. Currently all tests in the trunk pass with this new code enabled. I did write a system property switch to allow for the original implementation to be used as well. -Dorg.apache.lucene.index.TermInfosReader=default or small I have also written a blog about this patch; here is the link. http://www.nearinfinity.com/blogs/aaron_mccurry/my_first_lucene_patch.html
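The general shape of "a byte array with an int array as an index offset" can be shown with a small stand-alone sketch. This is not the code from the patch: PackedTermIndex and its methods are invented for illustration, only the term text is packed (the real change also covers the TermInfos and index pointers), and terms are ordered by unsigned UTF-8 bytes here, whereas the patch is described as using UTF8SortedAsUTF16Comparator.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PackedTermIndex {
    private final byte[] data;   // all terms, UTF-8 encoded, stored back to back
    private final int[] offsets; // offsets[i] = start of term i; last entry = data.length

    PackedTermIndex(String[] terms) {
        byte[][] encoded = new byte[terms.length][];
        int total = 0;
        for (int i = 0; i < terms.length; i++) {
            encoded[i] = terms[i].getBytes(StandardCharsets.UTF_8);
            total += encoded[i].length;
        }
        Arrays.sort(encoded, PackedTermIndex::compareUnsigned);
        data = new byte[total];
        offsets = new int[terms.length + 1];
        int pos = 0;
        for (int i = 0; i < encoded.length; i++) {
            offsets[i] = pos;
            System.arraycopy(encoded[i], 0, data, pos, encoded[i].length);
            pos += encoded[i].length;
        }
        offsets[terms.length] = pos;
    }

    private static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // Compares the packed term at position i with key, without allocating.
    private int compareAt(int i, byte[] key) {
        int start = offsets[i];
        int len = offsets[i + 1] - start;
        int n = Math.min(len, key.length);
        for (int j = 0; j < n; j++) {
            int d = (data[start + j] & 0xff) - (key[j] & 0xff);
            if (d != 0) return d;
        }
        return len - key.length;
    }

    // Binary search over the packed terms; returns the position or -1.
    int indexOf(String term) {
        byte[] key = term.getBytes(StandardCharsets.UTF_8);
        int lo = 0;
        int hi = offsets.length - 2;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int c = compareAt(mid, key);
            if (c == 0) return mid;
            if (c < 0) lo = mid + 1; else hi = mid - 1;
        }
        return -1;
    }

    String termAt(int i) {
        return new String(data, offsets[i], offsets[i + 1] - offsets[i], StandardCharsets.UTF_8);
    }
}
```

The memory saving comes from replacing one object per term with two flat arrays, which also gives the garbage collector far fewer objects to trace.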
[jira] [Updated] (LUCENE-2205) Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.
[ https://issues.apache.org/jira/browse/LUCENE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron McCurry updated LUCENE-2205: Attachment: lowmemory_w_utf8_encoding.patch
[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3390: Attachment: LUCENE-3390-BitsInterface.patch Here is a patch with a cleaner API (as noted by Mike McCandless): - backported the Bits interface from Lucene trunk (do a svn cp http://svn.apache.org//trunk//Bits.java before applying the patch) - Added the interface to the well-known impls in the util package - FieldCache.getUnValuesDocs returns Bits now, which makes the API very clean This breaks backwards compatibility a bit more, as Bits does not extend DocIdSet, so code using the new FieldCache method will break; before, recompilation was enough (as FixedBitSet extends DocIdSet). Mike: How about this?
Re: [Lucene.Net] 2.9.4
On 20.09.2011 23:48, Prescott Nasser wrote: Hey all seems like we are set with 2.9.4? Feedback has been positive and its been quiet. Do we feel ready to vote for a new release? I don't know if the build infrastructure is part of the release. If yes, then there is an open issue: Contrib doesn't build right now because there are some assembly name mismatches between certain *.csproj files and build/scripts/contrib.targets. The following patches should fix the issue: https://github.com/robert-j/lucene.net/commit/c5218bca56c19b3407648224781eec7316994a39 https://github.com/robert-j/lucene.net/commit/50bad187655d59968d51d472b57c2a40e201d663 Also, the fix for [LUCENENET-358] is basically making Lucene.Net.dll a .NET 4.0-only assembly: https://github.com/apache/lucene.net/commit/23ea6f52362fc7dbce48fd012cea129a7350c73c Did we agree about abandoning .NET <= 3.5? Robert
Prettify JS and CSS excluded from Javadocs
Hi I noticed that our build does not include the prettify JS and CSS with Javadocs, unless the javadocs are created for the release. For example, if you open any of the *javadocs.jar files (core or contrib), you'll see that the prettify files are missing. Therefore, documentation which relies on it is not displayed nicely (such as contrib-highlight). The invoke-javadoc macro copies the prettify files and adds references to them, but when the javadocs are jar-ed, the files are omitted. At first I thought that this is a bug, but then I noticed how the files are referenced, and the directory structure that is assumed to be created for the javadocs, and thought that this may be intentional? When the release binaries are created, a folder docs/api is created, under which there are sub-folders for 'core' and 'contrib-*'. Also, a sub-folder for prettify. So prettify is assumed to be 'sibling' of any of the javadocs folders, and the reference in the HTML is created as such. However, if we add prettify to any of the .jar, then it won't be a sibling anymore, but a 'child', and the reference should change from ../prettify/* to prettify/*. I think this can be solved easily by referencing two scripts (and perhaps same trick for stylesheet as well) -- only one of them will be found depending on the distribution. I wanted to ask first if the prettify files were omitted from the .jar intentionally or not. Shai
[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109434#comment-13109434 ] Michael McCandless commented on LUCENE-3390: Looks great Uwe! I think we can assert that the cardinality is <= numDocs, and then short-circuit the common == numDocs (all docs have values) case like you are. I love how 3.x handles the unvalued bits... I think we should port this forward to trunk, but maybe make it possible to set the bits as we build up the values (single pass) if you specify up front you want the bit set. I'll open a new issue for this...
RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts
I would love to use it. Unfortunately, my project is well underway and under tight deadlines, so we can't afford the disruption of switching to NuGet for Lucene, or any of the other libraries we use. However, once we release, I definitely want to embrace NuGet and would love for Lucene.NET to be available through NuGet. -Original Message- From: Michael Herndon [mailto:mhern...@wickedsoftware.net] Sent: Tuesday, September 20, 2011 11:57 PM To: lucene-net-...@lucene.apache.org; lucene-net-u...@lucene.apache.org Subject: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts We're taking a quick poll over the next few days to see how people would like use Lucene.Net through Nuget on the developers mailing list** Currently version 2.9.2 is hosted on nuget.org, but that package was not create by the project maintainers, thus nuget is not currently set up in source. Going forward, we would like to continue what someone else started by creating nuget packages for Lucene.Net. Right now there are two packages: Lucene Lucene.Contrib. My question to the community is do you wish to finer grain packages, i.e. a package for each contrib project or continue to keep it simple. The granular approach will let you use only what you need. We can also create additional higher level packages which have dependencies on the other ones. Possibly a Lucene.Net-Essentials and Lucene.Net-Full. Or we can keep it simple and continue with only two packages. My concerns are that the granular approach might overwhelm people with choice. The simple choice might be considered bloat for importing and then installing assemblies that you might never use. Another topic to converse about is would you like to see an out-of-band project nuget feed for nightly builds, branches with new or experimental features, or stable code snapshots for a projected release? ** when you post, please respond to lucene-net-...@lucene.apache.org. 
This was posted to both lists to make sure everyone subscribed to both lists has a chance to voice their use cases or concerns.
Re: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts
I think I'd like to stick with 2 packages:

Lucene.net Core
Lucene.net Contrib

Just because I think it's nice and simple. I would say that any contrib parts that get really big or popular could either be split out into their own package or maybe be added to the core package?

I'm also in favour of a nightly package and experimental packages.

thanks
Dan Swain
[jira] [Created] (LUCENE-3443) Port 3.x getUnvaluedDocs to trunk
Port 3.x getUnvaluedDocs to trunk
---------------------------------

Key: LUCENE-3443
URL: https://issues.apache.org/jira/browse/LUCENE-3443
Project: Lucene - Java
Issue Type: Improvement
Components: core/search
Reporter: Michael McCandless
Fix For: 3.5, 4.0

[Spinoff from LUCENE-3390]

I think the approach in 3.x for handling un-valued docs, and making it possible to specify how such docs are sorted, is better than the solution we have in trunk.

I like that FC has a dedicated method to get the Bits for un-valued docs -- easy for apps to directly use. And I like that the un-valued bits have their own entry in the FC.

One downside is that it's 2 passes to get values and missing bits, but I think we can fix this by passing an optional bool to the FC.getXXX methods indicating you want the bits, and then populating the FC entry for the missing bits as well. (We can do that for 3.x and trunk.) Then it's a single pass.
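To make the single-pass idea concrete, here is a minimal sketch, assuming the field values are already available as an in-memory array. ValuesWithMissing, load and the wantBits flag are hypothetical names for illustration only; this is not the FieldCache API and not the patch for this issue.

```java
import java.util.BitSet;

// Hypothetical stand-in for a cache entry; NOT Lucene's FieldCache.
final class ValuesWithMissing {
    final double[] values;   // docs without a value keep the default 0.0
    final BitSet unvalued;   // bit set for every doc without a value; null if not requested

    ValuesWithMissing(double[] values, BitSet unvalued) {
        this.values = values;
        this.unvalued = unvalued;
    }

    // fieldValues[doc] == null means the doc has no value for the field.
    // When wantBits is true, the missing bits are filled in the same loop
    // that fills the values, so no second pass over the docs is needed.
    static ValuesWithMissing load(Double[] fieldValues, boolean wantBits) {
        double[] values = new double[fieldValues.length];
        BitSet unvalued = wantBits ? new BitSet(fieldValues.length) : null;
        for (int doc = 0; doc < fieldValues.length; doc++) {
            Double v = fieldValues[doc];
            if (v != null) {
                values[doc] = v;
            } else if (unvalued != null) {
                unvalued.set(doc);
            }
        }
        return new ValuesWithMissing(values, unvalued);
    }
}
```

A caller that only needs the values would pass false and pay nothing extra; a caller that sorts with a custom missing value would pass true and consult unvalued.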
FW: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts
---BeginMessage---
The granular approach can cause dependency issues as well. FubuMVC is running into this with their granularity and had to invent their own build chain for ripples of changes.

I would say do two packages, Lucene and Contrib, and split a piece of Contrib out when it gets awesome enough to warrant its own package.

I look forward to official Lucene.Net packages.
---End Message---
[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109454#comment-13109454 ]

Doron Cohen commented on LUCENE-3390:
-------------------------------------

I wrote a small test that should fail with the bug Uwe fixed here and pass with the fix. For some reason it is still failing even with that fix. I tried this with the previous patch and will now try with the latest one, though I think it should also pass with the previous one. I'll give it another try.

Incorrect sort by Numeric values for documents missing the sorting field
------------------------------------------------------------------------

Key: LUCENE-3390
URL: https://issues.apache.org/jira/browse/LUCENE-3390
Project: Lucene - Java
Issue Type: Bug
Components: core/search
Affects Versions: 3.3
Reporter: Gilad Barkai
Assignee: Doron Cohen
Priority: Minor
Labels: double, float, int, long, numeric, sort
Fix For: 3.4
Attachments: LUCENE-3390-BitsInterface.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390.patch, SortByDouble.java

While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get a 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields, in ascending and descending order.)

This behavior is unexpected, as zero is comparable to the rest of the values. A better solution would be either to allow the user to define such a non-value default, or to always bring those document results as the last ones.

Example scenario: Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value. Searching with MatchAllDocsQuery, with sort over that field in descending order, yields the docid results of 0, 2, 1. Asking for the top 2 documents brings the document without any value as the 2nd result, which seems like a bug?
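The example scenario in the issue can be reproduced outside Lucene with a small sketch. It assumes nothing about the actual fix: MissingValueSortDemo and sortDescending are hypothetical names, and a null array entry stands in for a document without the field.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only; does not use or mirror Lucene's sorting classes.
final class MissingValueSortDemo {
    // Returns doc ids ordered by field value, descending. Docs whose value is
    // null (missing) are sorted as if they had 'missingValue'. Ties are broken
    // by doc id, ascending.
    static List<Integer> sortDescending(Double[] fieldValues, double missingValue) {
        List<Integer> docs = new ArrayList<>();
        for (int doc = 0; doc < fieldValues.length; doc++) {
            docs.add(doc);
        }
        docs.sort((a, b) -> {
            double va = fieldValues[a] == null ? missingValue : fieldValues[a];
            double vb = fieldValues[b] == null ? missingValue : fieldValues[b];
            int c = Double.compare(vb, va);           // descending by value
            return c != 0 ? c : Integer.compare(a, b); // then ascending by doc id
        });
        return docs;
    }

    public static void main(String[] args) {
        Double[] values = {3.5, -10.0, null};
        // Missing treated as 0: the empty doc lands between 3.5 and -10.
        System.out.println(sortDescending(values, 0.0));                      // [0, 2, 1]
        // Missing treated as the smallest possible value: the empty doc sorts last.
        System.out.println(sortDescending(values, Double.NEGATIVE_INFINITY)); // [0, 1, 2]
    }
}
```

The first call shows the reported order 0, 2, 1; the second shows the "missing docs last" alternative the reporter asks for when sorting in descending order.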
[jira] [Commented] (SOLR-1895) ManifoldCF SearchComponent plugin for enforcing ManifoldCF security at search time
[ https://issues.apache.org/jira/browse/SOLR-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109507#comment-13109507 ]

Erik Hatcher commented on SOLR-1895:
------------------------------------

bq. Both fq and SearchComponent would work for early binding, but when we want to extend the model with an (optional) late binding, i.e. filtering search results, fq won't cut it.

Not true. There's now PostFilter to enable late binding. This might even be advantageous for this MCF filtering, as the WildcardQuery's could be expensive filters to generate and work best on the most constrained subset matching the rest of the traditional query and filters.

bq. A SearchComponent however can be extended not only to handle early+late binding but also any other strange requirements there may be regarding security, such as authentication by IP address, peeking at other parameters

A QParserPlugin can see all the parameters a SearchComponent can see [createParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req)]

bq. ...else I think we'll start seeing a multitude of different ways to integrate security which is not a competitive advantage for Solr

If we cannot elaborate those different ways at this point, then building a framework is only asking for it to be changed later. In what scenarios would a security filter want to modify the response?

bq. I don't see how to add code to merge/unify two (possibly 3rd party) QParsers, except from creating a new umbrella one.

nested queries.

bq. We'll keep the core layer generic and thin. AccessTokenSecurityComponent and AccessTokenService (which should perhaps be an Interface instead)

I'm not sure that those abstractions are general enough. I still think a qparser is the simplest/cleanest thing that will work here and doesn't preclude or make harder any future needs. All of these other abstractions mentioned here are overkill, IMO, to what MCF needs - all it needs is a handful of aggregated WildcardQuery's.

ManifoldCF SearchComponent plugin for enforcing ManifoldCF security at search time
----------------------------------------------------------------------------------

Key: SOLR-1895
URL: https://issues.apache.org/jira/browse/SOLR-1895
Project: Solr
Issue Type: New Feature
Components: SearchComponents - other
Reporter: Karl Wright
Labels: document, security, solr
Fix For: 3.5, 4.0
Attachments: LCFSecurityFilter.java, LCFSecurityFilter.java, LCFSecurityFilter.java, LCFSecurityFilter.java, SOLR-1895-service-plugin.patch, SOLR-1895-service-plugin.patch, SOLR-1895.patch, SOLR-1895.patch, SOLR-1895.patch, SOLR-1895.patch, SOLR-1895.patch, SOLR-1895.patch

I've written an LCF SearchComponent which filters returned results based on access tokens provided by LCF's authority service. The component requires you to configure the appropriate authority service URL base, e.g.:

  <!-- LCF document security enforcement component -->
  <searchComponent name="lcfSecurity" class="LCFSecurityFilter">
    <str name="AuthorityServiceBaseURL">http://localhost:8080/lcf-authority-service</str>
  </searchComponent>

Also required are the following schema.xml additions:

  <!-- Security fields -->
  <field name="allow_token_document" type="string" indexed="true" stored="false" multiValued="true"/>
  <field name="deny_token_document" type="string" indexed="true" stored="false" multiValued="true"/>
  <field name="allow_token_share" type="string" indexed="true" stored="false" multiValued="true"/>
  <field name="deny_token_share" type="string" indexed="true" stored="false" multiValued="true"/>

Finally, to tie it into the standard request handler, it seems to need to run last:

  <requestHandler name="standard" class="solr.SearchHandler" default="true">
    <arr name="last-components">
      <str>lcfSecurity</str>
    </arr>
    ...

I have not set a package for this code. Nor have I been able to get it reviewed by someone as conversant with Solr as I would prefer. It is my hope, however, that this module will become part of the standard Solr 1.5 suite of search components, since that would tie it in with LCF nicely.
RE: Prettify JS and CSS exceluded from Javadocs
Hi Shai,

I think the prettify stuff should be included in the .jar. It’s possible that I messed this up in the packaging work I’ve done recently, but if so, it was not intentional.

Steve

From: Shai Erera [mailto:ser...@gmail.com]
Sent: Wednesday, September 21, 2011 8:10 AM
To: dev@lucene.apache.org
Subject: Prettify JS and CSS exceluded from Javadocs

Hi

I noticed that our build does not include the prettify JS and CSS with Javadocs, unless the javadocs are created for the release. For example, if you open any of the *javadocs.jar files (core or contrib), you'll see that the prettify files are missing. Therefore, documentation which relies on it is not displayed nicely (such as contrib-highlight).

The invoke-javadoc macro copies the prettify files and adds references to them, but when the javadocs are jar-ed, the files are omitted.

At first I thought that this was a bug, but then I noticed how the files are referenced, and the directory structure that is assumed to be created for the javadocs, and thought that this may be intentional?

When the release binaries are created, a folder docs/api is created, under which there are sub-folders for 'core' and 'contrib-*', and also a sub-folder for prettify. So prettify is assumed to be a 'sibling' of any of the javadocs folders, and the reference in the HTML is created as such. However, if we add prettify to any of the .jar files, then it won't be a sibling anymore, but a 'child', and the reference should change from ../prettify/* to prettify/*.

I think this can be solved easily by referencing two scripts (and perhaps using the same trick for the stylesheet as well) -- only one of them will be found, depending on the distribution.

I wanted to ask first whether the prettify files were omitted from the .jar intentionally or not.

Shai
[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-3390:
-------------------------------

Attachment: LUCENE-3390-BitsInterface.patch

Attached a patch with a test that fails before this fix (otherwise the patch is the same as the previous one). The test uses 4 collectors simultaneously, each with different missing values.
RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts
+1 For this

-----Original Message-----
From: Dan Swain [mailto:dan.sw...@gmail.com]
Sent: 21 September 2011 13:22
To: lucene-net-...@lucene.apache.org
Subject: Re: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

I think I'd like to stick with 2 packages:

Lucene.net Core
Lucene.net Contrib

Just because I think it's nice and simple. I would say that any contrib parts that get really big or popular could either be split out into their own package or maybe be added to the core package?

I'm also in favour of a nightly package and experimental packages.

thanks
Dan Swain
[jira] [Created] (LUCENE-3444) Distinct field value count per group
Distinct field value count per group
------------------------------------

Key: LUCENE-3444
URL: https://issues.apache.org/jira/browse/LUCENE-3444
Project: Lucene - Java
Issue Type: New Feature
Components: modules/grouping
Reporter: Martijn van Groningen

Support a second pass collector that counts unique field values of a field per group. This is just one example of group statistics that one might want.
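As an illustration of what "distinct field value count per group" means, here is a minimal sketch over in-memory rows. It assumes the second pass only looks at groups already chosen by a first pass; DistinctCountPerGroup, count and the {group, value} row layout are hypothetical and unrelated to the attached patch or the grouping module's collector API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustration only; not the collector from the grouping module.
final class DistinctCountPerGroup {
    // Each row is {groupValue, countedFieldValue}. Rows whose group is not in
    // topGroups are skipped, on the assumption that a first pass picked the groups.
    static Map<String, Integer> count(Set<String> topGroups, String[][] rows) {
        Map<String, Set<String>> seen = new HashMap<>();
        for (String[] row : rows) {
            if (topGroups.contains(row[0])) {
                // A set per group keeps each distinct value exactly once.
                seen.computeIfAbsent(row[0], g -> new HashSet<>()).add(row[1]);
            }
        }
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : seen.entrySet()) {
            counts.put(e.getKey(), e.getValue().size());
        }
        return counts;
    }
}
```

For example, with rows {a, x}, {a, y}, {a, x}, {b, x} and top groups a and b, group a has two distinct values and group b has one.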
[jira] [Updated] (LUCENE-3444) Distinct field value count per group
[ https://issues.apache.org/jira/browse/LUCENE-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated LUCENE-3444:
-----------------------------------------

Attachment: LUCENE-3444.patch

Attached initial version of a second pass collector that counts the unique field values per group for a specific field.
[jira] [Issue Comment Edited] (LUCENE-3444) Distinct field value count per group
[ https://issues.apache.org/jira/browse/LUCENE-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109540#comment-13109540 ]

Martijn van Groningen edited comment on LUCENE-3444 at 9/21/11 2:45 PM:
------------------------------------------------------------------------

Attached initial version of a second pass collector that counts the unique field values per group for a specific field.

was (Author: martijn.v.groningen):
Attached initial version of a second pass collector that count the unique field values per group for a specific field.
[jira] [Commented] (SOLR-2780) Facet count problem : Multi-Select Faceting After grouping results
[ https://issues.apache.org/jira/browse/SOLR-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109545#comment-13109545 ]

Ramzi Alqrainy commented on SOLR-2780:
--------------------------------------

Hi Groningen,

I have used your patch and made FunctionAllGroupHeadsCollector public. When I execute the command "ant dist" to build, the errors below are displayed:

[javac] 77 errors
[javac] 100 warnings

Please advise. Kindly note that I am using Fedora 15 and the Solr 4.0 that was released 13-09.

Facet count problem : Multi-Select Faceting After grouping results
------------------------------------------------------------------

Key: SOLR-2780
URL: https://issues.apache.org/jira/browse/SOLR-2780
Project: Solr
Issue Type: Bug
Components: search
Affects Versions: 3.3, 3.4, 4.0
Reporter: Ramzi Alqrainy
Priority: Critical
Fix For: 3.5, 4.0
Attachments: SOLR-2780.patch

Dear All,

Kindly note that I am using Solr 4.0, and that group.truncate=true calculates facet counts based on the most relevant document of each group matching the query. But when I use Multi-Select Faceting [Tagging and excluding Filters], Solr can't calculate the facet counts after grouping the results and selecting multiple facets.

http://127.0.0.1:8983/solr/select/?facet=true&sort=score+desc,+rate+desc,total_of_reviews+desc&facet.limit=-1&bf=sum%28product%28atan%28total_of_reviews%29,50%29,product%28rate,10%29%29^4&group.field=place_id&facet.field={!ex%3Dce}cat_en&facet.field={!ex%3Dce}cat_ar&facet.field={!ex%3Dir}iregion&facet.field={!ex%3Dir}region_en&facet.field={!ex%3Dir}region_ar&facet.field={!ex%3Drr}rrate&facet.field=place_status&facet.field=theme_en&facet.field=icity&facet.field={!ex%3Dce}icat&facet.field={!ex%3Dsce}isubcat&facet.field={!ex%3Dsce}subcat_en&facet.field={!ex%3Dsce}subcat_ar&qt=/spell&fq=place_status:1&fq=icity:1&fq=cat_en:%28%22Restaurants%22%29&group.format=simple&group.ngroups=true&facet.mincount=1&qf=title_ar^24+title_en^24+cat_ar^10+cat_en^10++review^20&hl.fl=review&json.nl=map&wt=json&defType=edismax&rows=10&spellcheck.accuracy=0.6&start=0&q=smart&group.truncate=true&group=true&indent=on
[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3390:
---------------------------------

Attachment: LUCENE-3390-BitsInterface.patch

I added a further test in TestFieldCache to check the Bits returned. I think that's ready to commit.
[jira] [Assigned] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler reassigned LUCENE-3390:
------------------------------------

Assignee: Uwe Schindler (was: Doron Cohen)
[jira] [Commented] (LUCENE-2215) paging collector
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109553#comment-13109553 ]

Michael McCandless commented on LUCENE-2215:
--------------------------------------------

For 3.x can we just add these methods to IndexSearcher (not Searcher/Searchable)?

This would require the app to use IndexSearcher if they are not already, which is great because that's what they'll need to do in 4.0 anyway (since Searcher/Searchable are deprecated).

Or is there some other back compat issue?

paging collector
----------------

Key: LUCENE-2215
URL: https://issues.apache.org/jira/browse/LUCENE-2215
Project: Lucene - Java
Issue Type: New Feature
Components: core/search
Affects Versions: 2.4, 3.0
Reporter: Adam Heinz
Assignee: Grant Ingersoll
Priority: Minor
Attachments: IterablePaging.java, LUCENE-2215.patch, LUCENE-2215.patch, LUCENE-2215.patch, PagingCollector.java, TestingPagingCollector.java

http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898

Somebody assign this to Aaron McCurry and we'll see if we can get enough votes on this issue to convince him to upload his patch. :)
[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109556#comment-13109556 ]

Michael McCandless commented on LUCENE-3390:
--------------------------------------------

bq. I think that's ready to commit.

+1, looks great! Thanks Uwe.
[jira] [Resolved] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved LUCENE-3390.
----------------------------------

Resolution: Fixed
Fix Version/s: 3.5

Committed 3.x branch revision: 1173701
RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts
No interest in Nuget whatsoever. - Neal -Original Message- From: Michael Herndon [mailto:mhern...@wickedsoftware.net] Sent: Tuesday, September 20, 2011 10:57 PM To: lucene-net-...@lucene.apache.org; lucene-net-u...@lucene.apache.org Subject: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts We're taking a quick poll on the developers mailing list over the next few days to see how people would like to use Lucene.Net through Nuget.** Currently version 2.9.2 is hosted on nuget.org, but that package was not created by the project maintainers, so nuget is not currently set up in source. Going forward, we would like to continue what someone else started by creating nuget packages for Lucene.Net. Right now there are two packages: Lucene and Lucene.Contrib. My question to the community is: do you want finer-grained packages, i.e. a package for each contrib project, or should we continue to keep it simple? The granular approach will let you use only what you need. We can also create additional higher-level packages which have dependencies on the other ones, possibly a Lucene.Net-Essentials and a Lucene.Net-Full. Or we can keep it simple and continue with only two packages. My concern is that the granular approach might overwhelm people with choice, while the simple approach might be considered bloat because it imports and installs assemblies that you might never use. Another topic for discussion: would you like to see an out-of-band project nuget feed for nightly builds, branches with new or experimental features, or stable code snapshots for a projected release? ** When you post, please respond to lucene-net-...@lucene.apache.org. This was posted to both lists to make sure everyone subscribed to either list has a chance to voice their use cases or concerns.
[jira] [Commented] (LUCENE-2215) paging collector
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109571#comment-13109571 ] Robert Muir commented on LUCENE-2215: - bq. Or is there some other back compat issue? We add this param to a protected method signature, so it would affect subclasses of IndexSearcher. paging collector Key: LUCENE-2215 URL: https://issues.apache.org/jira/browse/LUCENE-2215 Project: Lucene - Java Issue Type: New Feature Components: core/search Affects Versions: 2.4, 3.0 Reporter: Adam Heinz Assignee: Grant Ingersoll Priority: Minor Attachments: IterablePaging.java, LUCENE-2215.patch, LUCENE-2215.patch, LUCENE-2215.patch, PagingCollector.java, TestingPagingCollector.java http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898 Somebody assign this to Aaron McCurry and we'll see if we can get enough votes on this issue to convince him to upload his patch. :) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2215) paging collector
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109575#comment-13109575 ] Michael McCandless commented on LUCENE-2215: bq. We add this param to a protected method signature, so it would affect subclasses of IndexSearcher. Ahh, right. Well, I think we can make an exception here -- subclassing IS is very expert. paging collector Key: LUCENE-2215 URL: https://issues.apache.org/jira/browse/LUCENE-2215 Project: Lucene - Java Issue Type: New Feature Components: core/search Affects Versions: 2.4, 3.0 Reporter: Adam Heinz Assignee: Grant Ingersoll Priority: Minor Attachments: IterablePaging.java, LUCENE-2215.patch, LUCENE-2215.patch, LUCENE-2215.patch, PagingCollector.java, TestingPagingCollector.java http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898 Somebody assign this to Aaron McCurry and we'll see if we can get enough votes on this issue to convince him to upload his patch. :) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
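The back-compat concern raised in the two comments above can be shown with a minimal sketch. The classes below are hypothetical stand-ins, not Lucene's IndexSearcher: a base class gains an extra parameter on a protected method, and a subclass written against the old two-argument signature still compiles (it has no @Override annotation) but is silently bypassed.

```java
// Hypothetical classes for illustration; these are not Lucene's IndexSearcher APIs.
class PagingAwareSearcher {

    // Public entry point; after the change it calls the three-argument hook.
    public final String topDocs(String query, int n) {
        return search(query, n, null);
    }

    // The protected hook after the change: an extra "after" parameter was added.
    protected String search(String query, int n, Object after) {
        return "base:" + query + ":" + n;
    }
}

// A subclass written against the old two-argument protected signature.
class LegacyCustomSearcher extends PagingAwareSearcher {

    // Without @Override this compiles, but it is now an unrelated overload
    // that the base class never calls. With @Override it would not compile.
    protected String search(String query, int n) {
        return "custom:" + query + ":" + n;
    }
}
```

Calling `new LegacyCustomSearcher().topDocs("q", 2)` runs the base implementation rather than the subclass method, which is the kind of breakage the thread accepts on the grounds that subclassing the searcher is expert usage.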
Re: [Lucene.Net] 2.9.4
@Robert, I believe the overwhelming consensus on the mailing list vote was to move to .NET 4.0 and drop support for previous versions. I'll take care of the build scripts issue while they are being refactored into smaller chunks this week. @Troy, Agreed. On Wed, Sep 21, 2011 at 8:08 AM, Robert Jordan robe...@gmx.net wrote: On 20.09.2011 23:48, Prescott Nasser wrote: Hey all, seems like we are set with 2.9.4? Feedback has been positive and it's been quiet. Do we feel ready to vote for a new release? I don't know if the build infrastructure is part of the release. If yes, then there is an open issue: Contrib doesn't build right now because there are some assembly name mismatches between certain *.csproj files and build/scripts/contrib.targets. The following patches should fix the issue: https://github.com/robert-j/lucene.net/commit/c5218bca56c19b3407648224781eec7316994a39 https://github.com/robert-j/lucene.net/commit/50bad187655d59968d51d472b57c2a40e201d663 Also, the fix for [LUCENENET-358] is basically making Lucene.Net.dll a .NET 4.0-only assembly: https://github.com/apache/lucene.net/commit/23ea6f52362fc7dbce48fd012cea129a7350c73c Did we agree about abandoning .NET <= 3.5? Robert
[jira] [Commented] (LUCENE-3441) Add NRT support to LuceneTaxonomyReader
[ https://issues.apache.org/jira/browse/LUCENE-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109584#comment-13109584 ] Mihai Caraman commented on LUCENE-3441: --- Newb question: Shouldn't you also commit in the constructor, so you can create a reader right after? For exmaple, to work with the taxReader with refresh(), I have to initialize: taxWriter,commit,taxReader, else it throws no segment exception(which you'd expect to be there because of the taxWriter ctor, or is that just me:P ?). Add NRT support to LuceneTaxonomyReader --- Key: LUCENE-3441 URL: https://issues.apache.org/jira/browse/LUCENE-3441 Project: Lucene - Java Issue Type: New Feature Components: modules/facet Reporter: Shai Erera Priority: Minor Currently LuceneTaxonomyReader does not support NRT - i.e., on changes to LuceneTaxonomyWriter, you cannot have the reader updated, like IndexReader/Writer. In order to do that we need to do the following: # Add ctor to LuceneTaxonomyReader to allow you to instantiate it with LuceneTaxonomyWriter. # Add API to LuceneTaxonomyWriter to expose its internal IndexReader # Change LTR.refresh() to return an LTR, rather than void. This is actually not strictly related to that issue, but since we'll need to modify refresh() impl, I think it'll be good to change its API as well. Since all of facet API is @lucene.experimental, no backwards issues here (and the sooner we do it, the better). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (LUCENE-3441) Add NRT support to LuceneTaxonomyReader
[ https://issues.apache.org/jira/browse/LUCENE-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109584#comment-13109584 ] Mihai Caraman edited comment on LUCENE-3441 at 9/21/11 3:45 PM: Newb question: Shouldn't you also commit in the constructor, so you can create a reader right after? For example, to work with the taxReader with refresh(), I have to initialize: w= LuceneTaxonomyWriter(x), w.commit(), new LuceneTaxonomyReader(x), else it throws no segment exception(segments which you'd expect to be there because of the taxWriter ctor, or is that just me:P ?). was (Author: mihai caraman): Newb question: Shouldn't you also commit in the constructor, so you can create a reader right after? For exmaple, to work with the taxReader with refresh(), I have to initialize: taxWriter,commit,taxReader, else it throws no segment exception(which you'd expect to be there because of the taxWriter ctor, or is that just me:P ?). Add NRT support to LuceneTaxonomyReader --- Key: LUCENE-3441 URL: https://issues.apache.org/jira/browse/LUCENE-3441 Project: Lucene - Java Issue Type: New Feature Components: modules/facet Reporter: Shai Erera Priority: Minor Currently LuceneTaxonomyReader does not support NRT - i.e., on changes to LuceneTaxonomyWriter, you cannot have the reader updated, like IndexReader/Writer. In order to do that we need to do the following: # Add ctor to LuceneTaxonomyReader to allow you to instantiate it with LuceneTaxonomyWriter. # Add API to LuceneTaxonomyWriter to expose its internal IndexReader # Change LTR.refresh() to return an LTR, rather than void. This is actually not strictly related to that issue, but since we'll need to modify refresh() impl, I think it'll be good to change its API as well. Since all of facet API is @lucene.experimental, no backwards issues here (and the sooner we do it, the better). -- This message is automatically generated by JIRA. 
[jira] [Issue Comment Edited] (LUCENE-3441) Add NRT support to LuceneTaxonomyReader
[ https://issues.apache.org/jira/browse/LUCENE-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109584#comment-13109584 ] Mihai Caraman edited comment on LUCENE-3441 at 9/21/11 3:46 PM: Newb question: Shouldn't you also commit in the constructor, so you can create a reader right after? For example, to later work with the taxReader through refresh(), when i start clean, I have to initialize: w= LuceneTaxonomyWriter(...), w.commit(), new LuceneTaxonomyReader(...), else it throws no segment exception(segments which you'd expect to be there because of the taxWriter ctor, or is that just me:P ?). was (Author: mihai caraman): Newb question: Shouldn't you also commit in the constructor, so you can create a reader right after? For example, to work with the taxReader with refresh(), I have to initialize: w= LuceneTaxonomyWriter(...), w.commit(), new LuceneTaxonomyReader(...), else it throws no segment exception(segments which you'd expect to be there because of the taxWriter ctor, or is that just me:P ?). Add NRT support to LuceneTaxonomyReader --- Key: LUCENE-3441 URL: https://issues.apache.org/jira/browse/LUCENE-3441 Project: Lucene - Java Issue Type: New Feature Components: modules/facet Reporter: Shai Erera Priority: Minor Currently LuceneTaxonomyReader does not support NRT - i.e., on changes to LuceneTaxonomyWriter, you cannot have the reader updated, like IndexReader/Writer. In order to do that we need to do the following: # Add ctor to LuceneTaxonomyReader to allow you to instantiate it with LuceneTaxonomyWriter. # Add API to LuceneTaxonomyWriter to expose its internal IndexReader # Change LTR.refresh() to return an LTR, rather than void. This is actually not strictly related to that issue, but since we'll need to modify refresh() impl, I think it'll be good to change its API as well. Since all of facet API is @lucene.experimental, no backwards issues here (and the sooner we do it, the better). 
[jira] [Issue Comment Edited] (LUCENE-3441) Add NRT support to LuceneTaxonomyReader
[ https://issues.apache.org/jira/browse/LUCENE-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109584#comment-13109584 ] Mihai Caraman edited comment on LUCENE-3441 at 9/21/11 3:45 PM: Newb question: Shouldn't you also commit in the constructor, so you can create a reader right after? For example, to work with the taxReader with refresh(), I have to initialize: w= LuceneTaxonomyWriter(...), w.commit(), new LuceneTaxonomyReader(...), else it throws no segment exception(segments which you'd expect to be there because of the taxWriter ctor, or is that just me:P ?). was (Author: mihai caraman): Newb question: Shouldn't you also commit in the constructor, so you can create a reader right after? For example, to work with the taxReader with refresh(), I have to initialize: w= LuceneTaxonomyWriter(x), w.commit(), new LuceneTaxonomyReader(x), else it throws no segment exception(segments which you'd expect to be there because of the taxWriter ctor, or is that just me:P ?). Add NRT support to LuceneTaxonomyReader --- Key: LUCENE-3441 URL: https://issues.apache.org/jira/browse/LUCENE-3441 Project: Lucene - Java Issue Type: New Feature Components: modules/facet Reporter: Shai Erera Priority: Minor Currently LuceneTaxonomyReader does not support NRT - i.e., on changes to LuceneTaxonomyWriter, you cannot have the reader updated, like IndexReader/Writer. In order to do that we need to do the following: # Add ctor to LuceneTaxonomyReader to allow you to instantiate it with LuceneTaxonomyWriter. # Add API to LuceneTaxonomyWriter to expose its internal IndexReader # Change LTR.refresh() to return an LTR, rather than void. This is actually not strictly related to that issue, but since we'll need to modify refresh() impl, I think it'll be good to change its API as well. Since all of facet API is @lucene.experimental, no backwards issues here (and the sooner we do it, the better). -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
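The ordering problem described in the comment (create the writer, commit, and only then open the reader) can be modelled with toy classes. This is a sketch under stated assumptions: `ToyDirectory`, `ToyTaxonomyWriter` and `ToyTaxonomyReader` are hypothetical and do not reproduce the Lucene facet API, and the assumption that the writer constructor buffers work without committing is taken from the comment, not from the Lucene source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; not the Lucene facet API.
final class ToyDirectory {
    final List<String> committedSegments = new ArrayList<>();
}

final class ToyTaxonomyWriter {
    private final ToyDirectory dir;
    private final List<String> pending = new ArrayList<>();

    ToyTaxonomyWriter(ToyDirectory dir) {
        this.dir = dir;
        pending.add("root"); // assumed: the constructor buffers data but does not commit
    }

    void addCategory(String path) {
        pending.add(path);
    }

    // Makes buffered categories visible to readers as a new segment.
    void commit() {
        if (!pending.isEmpty()) {
            dir.committedSegments.add(String.join(",", pending));
            pending.clear();
        }
    }
}

final class ToyTaxonomyReader {
    private final ToyDirectory dir;
    private int segmentsSeen;

    ToyTaxonomyReader(ToyDirectory dir) {
        if (dir.committedSegments.isEmpty()) {
            throw new IllegalStateException("no segments found");
        }
        this.dir = dir;
        this.segmentsSeen = dir.committedSegments.size();
    }

    // Returns true if new commits became visible since the last call.
    boolean refresh() {
        int now = dir.committedSegments.size();
        boolean changed = now != segmentsSeen;
        segmentsSeen = now;
        return changed;
    }
}
```

In this model, constructing the reader right after the writer fails, while writer, commit, reader succeeds and later commits are picked up by refresh(). Committing inside the writer constructor, as the comment proposes, would remove the first failure.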
[jira] [Commented] (LUCENE-3441) Add NRT support to LuceneTaxonomyReader
[ https://issues.apache.org/jira/browse/LUCENE-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109599#comment-13109599 ] Jason Rutherglen commented on LUCENE-3441: -- It would be great if the cost of (re)opening a new LTR is. Also an explanation of what it's doing underneath. Add NRT support to LuceneTaxonomyReader --- Key: LUCENE-3441 URL: https://issues.apache.org/jira/browse/LUCENE-3441 Project: Lucene - Java Issue Type: New Feature Components: modules/facet Reporter: Shai Erera Priority: Minor Currently LuceneTaxonomyReader does not support NRT - i.e., on changes to LuceneTaxonomyWriter, you cannot have the reader updated, like IndexReader/Writer. In order to do that we need to do the following: # Add ctor to LuceneTaxonomyReader to allow you to instantiate it with LuceneTaxonomyWriter. # Add API to LuceneTaxonomyWriter to expose its internal IndexReader # Change LTR.refresh() to return an LTR, rather than void. This is actually not strictly related to that issue, but since we'll need to modify refresh() impl, I think it'll be good to change its API as well. Since all of facet API is @lucene.experimental, no backwards issues here (and the sooner we do it, the better). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2739) TestSqlEntityProcessorDelta.testNonWritablePersistFile failures on some systems
[ https://issues.apache.org/jira/browse/SOLR-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109602#comment-13109602 ] Shawn Heisey commented on SOLR-2739: If I do have the right idea, then the rest of this paragraph applies, otherwise not: I have to wonder why the current test is passing for everyone but me. It seems as though it should be failing for everyone. I added a couple more lines, so now it tries a delta import, checks for numFound=0, then runs a full import and checks for numFound=1. Contrary to what I expected, the second part failed. TestSqlEntityProcessorDelta.testNonWritablePersistFile failures on some systems --- Key: SOLR-2739 URL: https://issues.apache.org/jira/browse/SOLR-2739 Project: Solr Issue Type: Bug Affects Versions: 3.3 Reporter: Shawn Heisey Assignee: Hoss Man Fix For: 3.5, 4.0 Shawn Heisey noted on the mailing list that he was getting consistent failures from TestSqlEntityProcessorDelta.testNonWritablePersistFile on his machine. I can't reproduce his exact failures, but the test is hinky enough that i want to try and clean it up. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Reopened] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reopened LUCENE-3390: --- When discussing about the forward port with Mike McCandless on IRC, we thought the double inversion is useless (it was in Doron's patch, because he wanted to use DocIdSetIterator effectively). We changed the name to FieldCache.getDocsWithField(). Patch is easy. Incorrect sort by Numeric values for documents missing the sorting field Key: LUCENE-3390 URL: https://issues.apache.org/jira/browse/LUCENE-3390 Project: Lucene - Java Issue Type: Bug Components: core/search Affects Versions: 3.3 Reporter: Gilad Barkai Assignee: Uwe Schindler Priority: Minor Labels: double, float, int, long, numeric, sort Fix For: 3.4, 3.5 Attachments: LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-inverted.patch, LUCENE-3390.patch, SortByDouble.java While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int Long numeric fields ascending and descending order). This behavior is unexpected, as zero is comparable to the rest of the values. A better solution would either be allowing the user to define such a non-value default, or always bring those document results as the last ones. Example scenario: Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value. Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1. Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug? -- This message is automatically generated by JIRA. 
[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3390: -- Attachment: LUCENE-3390-inverted.patch Patch with the BitSet inverted. We break backwards compatibility so this is not an issue at all. Incorrect sort by Numeric values for documents missing the sorting field Key: LUCENE-3390 URL: https://issues.apache.org/jira/browse/LUCENE-3390 Project: Lucene - Java Issue Type: Bug Components: core/search Affects Versions: 3.3 Reporter: Gilad Barkai Assignee: Uwe Schindler Priority: Minor Labels: double, float, int, long, numeric, sort Fix For: 3.4, 3.5 Attachments: LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-inverted.patch, LUCENE-3390.patch, SortByDouble.java While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int Long numeric fields ascending and descending order). This behavior is unexpected, as zero is comparable to the rest of the values. A better solution would either be allowing the user to define such a non-value default, or always bring those document results as the last ones. Example scenario: Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value. Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1. Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3390: -- Attachment: LUCENE-3390-inverted.patch Incorrect sort by Numeric values for documents missing the sorting field Key: LUCENE-3390 URL: https://issues.apache.org/jira/browse/LUCENE-3390 Project: Lucene - Java Issue Type: Bug Components: core/search Affects Versions: 3.3 Reporter: Gilad Barkai Assignee: Uwe Schindler Priority: Minor Labels: double, float, int, long, numeric, sort Fix For: 3.4, 3.5 Attachments: LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-inverted.patch, LUCENE-3390.patch, SortByDouble.java While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int Long numeric fields ascending and descending order). This behavior is unexpected, as zero is comparable to the rest of the values. A better solution would either be allowing the user to define such a non-value default, or always bring those document results as the last ones. Example scenario: Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value. Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1. Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3390: -- Attachment: (was: LUCENE-3390-inverted.patch) Incorrect sort by Numeric values for documents missing the sorting field Key: LUCENE-3390 URL: https://issues.apache.org/jira/browse/LUCENE-3390 Project: Lucene - Java Issue Type: Bug Components: core/search Affects Versions: 3.3 Reporter: Gilad Barkai Assignee: Uwe Schindler Priority: Minor Labels: double, float, int, long, numeric, sort Fix For: 3.4, 3.5 Attachments: LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-inverted.patch, LUCENE-3390.patch, SortByDouble.java While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int Long numeric fields ascending and descending order). This behavior is unexpected, as zero is comparable to the rest of the values. A better solution would either be allowing the user to define such a non-value default, or always bring those document results as the last ones. Example scenario: Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value. Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1. Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109611#comment-13109611 ] Michael McCandless commented on LUCENE-3390: Looks great! Incorrect sort by Numeric values for documents missing the sorting field Key: LUCENE-3390 URL: https://issues.apache.org/jira/browse/LUCENE-3390 Project: Lucene - Java Issue Type: Bug Components: core/search Affects Versions: 3.3 Reporter: Gilad Barkai Assignee: Uwe Schindler Priority: Minor Labels: double, float, int, long, numeric, sort Fix For: 3.4, 3.5 Attachments: LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-inverted.patch, LUCENE-3390.patch, SortByDouble.java While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int Long numeric fields ascending and descending order). This behavior is unexpected, as zero is comparable to the rest of the values. A better solution would either be allowing the user to define such a non-value default, or always bring those document results as the last ones. Example scenario: Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value. Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1. Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3439) add checks/asserts if you search across a closed reader
[ https://issues.apache.org/jira/browse/LUCENE-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3439. Resolution: Fixed Fix Version/s: 4.0 3.5 add checks/asserts if you search across a closed reader --- Key: LUCENE-3439 URL: https://issues.apache.org/jira/browse/LUCENE-3439 Project: Lucene - Java Issue Type: Bug Reporter: Robert Muir Assignee: Michael McCandless Fix For: 3.5, 4.0 Attachments: LUCENE-3439.patch, LUCENE-3439_test.patch if you try to search across a closed reader (and/or searcher too), there are no checks, not even assertions statements. this results in crazy scary stacktraces deep inside places like FSTs/various term dictionary implementations etc. In some situations, depending on codec, you wont even get an error (i'm sure its fun when you try to retrieve the stored fields!) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
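The kind of check the issue asks for can be sketched as a guard at the top of every read method. `ToyReader` below is a hypothetical class, not Lucene's IndexReader, and the exception type is an assumption chosen for the sketch; the point is only that a closed reader fails fast with a clear message instead of failing deep inside a data structure.

```java
// Hypothetical reader for illustration; not Lucene's IndexReader.
final class ToyReader implements AutoCloseable {
    private final String[] storedFields;
    private volatile boolean closed;

    ToyReader(String[] storedFields) {
        this.storedFields = storedFields;
    }

    // Every read path starts with the guard.
    String document(int docId) {
        ensureOpen();
        return storedFields[docId];
    }

    // Fails fast with a descriptive message once the reader has been closed.
    private void ensureOpen() {
        if (closed) {
            throw new IllegalStateException("this reader is closed");
        }
    }

    @Override
    public void close() {
        closed = true;
    }
}
```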
[jira] [Updated] (LUCENE-3443) Port 3.x FieldCache.getDocsWithField() to trunk
[ https://issues.apache.org/jira/browse/LUCENE-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3443: -- Description: [Spinoff from LUCENE-3390] I think the approach in 3.x for handling un-valued docs, and making it possible to specify how such docs are sorted, is better than the solution we have in trunk. I like that FC has a dedicated method to get the Bits for docs with field -- easy for apps to directly use. And I like that the bits have their own entry in the FC. One downside is that it's 2 passes to get values and valid bits, but I think we can fix this by passing optional bool to FC.getXXX methods indicating you want the bits, and then populate the FC entry for the missing bits as well. (We can do that for 3.x and trunk). Then it's single pass. was: [Spinoff from LUCENE-3390] I think the approach in 3.x for handling un-valued docs, and making it possible to specify how such docs are sorted, is better than the solution we have in trunk. I like that FC has a dedicated method to get the Bits for un-valued docs -- easy for apps to directly use. And I like that the un-valued bits have their own entry in the FC. One downside is that it's 2 passes to get values and missing bits, but I think we can fix this by passing optional bool to FC.getXXX methods indicating you want the bits, and then populate the FC entry for the missing bits as well. (We can do that for 3.x and trunk). Then it's single pass. 
Summary: Port 3.x FieldCache.getDocsWithField() to trunk (was: Port 3.x getUnvaluedDocs to trunk) Port 3.x FieldCache.getDocsWithField() to trunk --- Key: LUCENE-3443 URL: https://issues.apache.org/jira/browse/LUCENE-3443 Project: Lucene - Java Issue Type: Improvement Components: core/search Reporter: Michael McCandless Fix For: 3.5, 4.0 [Spinoff from LUCENE-3390] I think the approach in 3.x for handling un-valued docs, and making it possible to specify how such docs are sorted, is better than the solution we have in trunk. I like that FC has a dedicated method to get the Bits for docs with field -- easy for apps to directly use. And I like that the bits have their own entry in the FC. One downside is that it's 2 passes to get values and valid bits, but I think we can fix this by passing optional bool to FC.getXXX methods indicating you want the bits, and then populate the FC entry for the missing bits as well. (We can do that for 3.x and trunk). Then it's single pass. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
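The single-pass idea in the description (an optional flag on the value getter that also fills the "docs with field" bits) can be sketched as follows. The class and method names are hypothetical and the postings are modelled as two parallel arrays; this is not the FieldCache implementation.

```java
import java.util.BitSet;

// Hypothetical sketch of filling values and "docs with field" bits in one pass.
final class SinglePassFieldCacheSketch {
    final double[] values;
    final BitSet docsWithField; // null when the caller did not ask for the bits

    private SinglePassFieldCacheSketch(double[] values, BitSet docsWithField) {
        this.values = values;
        this.docsWithField = docsWithField;
    }

    // docIds[i] holds the value docValues[i]; docs not listed have no value.
    static SinglePassFieldCacheSketch uninvert(int maxDoc, int[] docIds,
                                               double[] docValues, boolean wantBits) {
        double[] values = new double[maxDoc];
        BitSet bits = wantBits ? new BitSet(maxDoc) : null;
        for (int i = 0; i < docIds.length; i++) {
            values[docIds[i]] = docValues[i];
            if (bits != null) {
                bits.set(docIds[i]); // recorded in the same loop, so no second pass
            }
        }
        return new SinglePassFieldCacheSketch(values, bits);
    }
}
```

A caller that needs to distinguish a real 0.0 from a missing value passes `wantBits = true` and checks `docsWithField.get(docId)`; a caller that does not care skips the extra bit set.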
[jira] [Resolved] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Uwe Schindler resolved LUCENE-3390.
Resolution: Fixed
Committed 3.x branch revision: 1173745
Incorrect sort by Numeric values for documents missing the sorting field
Key: LUCENE-3390 URL: https://issues.apache.org/jira/browse/LUCENE-3390 Project: Lucene - Java Issue Type: Bug Components: core/search Affects Versions: 3.3 Reporter: Gilad Barkai Assignee: Uwe Schindler Priority: Minor Labels: double, float, int, long, numeric, sort Fix For: 3.5, 3.4 Attachments: LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-inverted.patch, LUCENE-3390.patch, SortByDouble.java
While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get a 0 (ZERO) value in the sort. (Tested against Double, Float, Int and Long numeric fields, in ascending and descending order.) This behavior is unexpected, as zero is comparable to the rest of the values. A better solution would be either to allow the user to define such a non-value default, or to always return those documents as the last results. Example scenario: Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value. Searching with MatchAllDocsQuery, with sort over that field in descending order, yields the docid results of 0, 2, 1. Asking for the top 2 documents brings the document without any value as the 2nd result, which seems like a bug?
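The scenario in the LUCENE-3390 report can be reproduced without Lucene. The following is only a minimal sketch: the class and method names are hypothetical, the `double[]` stands in for cached field values, and the `BitSet` stands in for the "docs with field" bits discussed in LUCENE-3443. It is not the committed fix.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration class; not part of Lucene.
final class MissingValueSortSketch {

    // Reported behaviour: a missing value is indistinguishable from 0.
    static List<Integer> sortDescendingNaive(double[] values) {
        List<Integer> docs = allDocs(values.length);
        docs.sort(Comparator.comparingDouble((Integer doc) -> values[doc]).reversed());
        return docs;
    }

    // Alternative: consult the "has a value" bits first, so un-valued docs always sort last.
    static List<Integer> sortDescendingMissingLast(double[] values, BitSet docsWithField) {
        List<Integer> docs = allDocs(values.length);
        // Key is false for docs that have a value; false orders before true.
        Comparator<Integer> missingLast =
            Comparator.comparing((Integer doc) -> !docsWithField.get(doc));
        Comparator<Integer> byValueDesc =
            Comparator.comparingDouble((Integer doc) -> values[doc]).reversed();
        docs.sort(missingLast.thenComparing(byValueDesc));
        return docs;
    }

    private static List<Integer> allDocs(int maxDoc) {
        List<Integer> docs = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            docs.add(doc);
        }
        return docs;
    }
}
```

With values 3.5, -10 and a third doc that has no value (stored as 0), the naive sort returns docs 0, 2, 1 as in the report, while the bits-aware sort returns 0, 1, 2.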
[jira] [Commented] (LUCENE-2205) Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.
[ https://issues.apache.org/jira/browse/LUCENE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109630#comment-13109630 ]
Aaron McCurry commented on LUCENE-2205:
I would agree on the heap size; I will do more analysis on that tonight. As for the speed, it took a bit of time to get the performance basically the same. I had to change a few methods inside TermInfosReader to reuse resources. The random access test sampled 100,000 terms from the index and stored them in a file. Then, when I run the test, it pulls all of the terms into memory and randomly selects terms to use in TermQueries. The test times each search in nanotime and averages the results. I will attach my test programs tonight if you want. While running an MMapDirectory on a small index (~1,000,000 documents), the performance is basically the same between the patch and no patch; if there is a difference, the current implementation (no patch) is slightly faster, as you would expect.
Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.
Key: LUCENE-2205 URL: https://issues.apache.org/jira/browse/LUCENE-2205 Project: Lucene - Java Issue Type: Improvement Components: core/index Environment: Java5 Reporter: Aaron McCurry Assignee: Michael McCandless Fix For: 3.5 Attachments: RandomAccessTest.java, TermInfosReader.java, TermInfosReaderIndex.java, TermInfosReaderIndexDefault.java, TermInfosReaderIndexSmall.java, lowmemory_w_utf8_encoding.patch, patch-final.txt, rawoutput.txt
Basically packing those three arrays into a byte array with an int array as an index offset. The performance benefits are staggering on my test index (of size 6.2 GB, with ~1,000,000 documents and ~175,000,000 terms): the memory needed to load the terminfos into memory was reduced to 17% of its original size, from 291.5 MB to 49.7 MB. The random access speed has been made better by 1-2%, load time of the segments is ~40% faster as well, and full GCs on my JVM were made 7 times faster. I have already performed the work and am offering this code as a patch. Currently all tests in the trunk pass with this new code enabled. I did write a system property switch to allow for the original implementation to be used as well: -Dorg.apache.lucene.index.TermInfosReader=default or small. I have also written a blog post about this patch; here is the link: http://www.nearinfinity.com/blogs/aaron_mccurry/my_first_lucene_patch.html
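The idea of "packing those three arrays into a byte array with an int array as an index offset" can be sketched in plain Java. This is not the attached patch: the class name, the record layout (length-prefixed UTF-8 term followed by one long pointer) and the `floorIndex` helper are assumptions chosen only to show the layout.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; the layout is an assumption, not the patch's format.
final class PackedTermIndexSketch {
    private final byte[] data;    // all records stored back to back
    private final int[] offsets;  // offsets[i] = start of record i inside data

    PackedTermIndexSketch(String[] sortedTerms, long[] pointers) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        offsets = new int[sortedTerms.length];
        try {
            for (int i = 0; i < sortedTerms.length; i++) {
                offsets[i] = out.size();
                byte[] utf8 = sortedTerms[i].getBytes(StandardCharsets.UTF_8);
                out.writeInt(utf8.length);   // 4-byte length prefix
                out.write(utf8);             // term text
                out.writeLong(pointers[i]);  // e.g. a file pointer for the term
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        data = bytes.toByteArray();
    }

    String term(int i) {
        int length = ByteBuffer.wrap(data).getInt(offsets[i]);
        return new String(data, offsets[i] + 4, length, StandardCharsets.UTF_8);
    }

    long pointer(int i) {
        ByteBuffer buffer = ByteBuffer.wrap(data);
        int length = buffer.getInt(offsets[i]);
        return buffer.getLong(offsets[i] + 4 + length);
    }

    // Binary search: index of the greatest term <= target, or -1 if none.
    int floorIndex(String target) {
        int low = 0;
        int high = offsets.length - 1;
        int result = -1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            if (term(mid).compareTo(target) <= 0) {
                result = mid;
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return result;
    }
}
```

The saving comes from replacing one object per term (plus its header and references) with a few bytes in a shared array; the sketch decodes a String on every comparison, which a real implementation would likely avoid.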
[jira] [Updated] (SOLR-2754) create Solr similarity factories for new ranking algorithms
[ https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated SOLR-2754:
Attachment: SOLR-2754.patch
I added tests for the new factories; I think it's ready to commit.
create Solr similarity factories for new ranking algorithms
Key: SOLR-2754 URL: https://issues.apache.org/jira/browse/SOLR-2754 Project: Solr Issue Type: New Feature Affects Versions: 4.0 Reporter: Robert Muir Assignee: Robert Muir Attachments: SOLR-2754.patch, SOLR-2754.patch, SOLR-2754.patch
To make it easy to use some of the new ranking algorithms, we should add factories to Solr:
* for parametric models like LM and BM25, so that parameters can be set from schema.xml
* for framework models like DFR and IB, so that different basic models/normalizations/lambdas can be chosen
RE: [Lucene.Net] 2.9.4
I thought this was after 2.9.4. Sent from my Windows Phone
-----Original Message----- From: Michael Herndon Sent: Wednesday, September 21, 2011 8:30 AM To: lucene-net-...@lucene.apache.org Cc: lucene-net-...@incubator.apache.org Subject: Re: [Lucene.Net] 2.9.4
@Robert, I believe the overwhelming consensus on the mailing list vote was to move to .NET 4.0 and drop support for previous versions. I'll take care of the build scripts issue while they are being refactored into smaller chunks this week. @Troy, Agreed.
On Wed, Sep 21, 2011 at 8:08 AM, Robert Jordan robe...@gmx.net wrote: On 20.09.2011 23:48, Prescott Nasser wrote: Hey all, seems like we are set with 2.9.4? Feedback has been positive and it's been quiet. Do we feel ready to vote for a new release?
I don't know if the build infrastructure is part of the release. If yes, then there is an open issue: Contrib doesn't build right now because there are some assembly name mismatches between certain *.csproj files and build/scripts/contrib.targets. The following patches should fix the issue:
https://github.com/robert-j/lucene.net/commit/c5218bca56c19b3407648224781eec7316994a39
https://github.com/robert-j/lucene.net/commit/50bad187655d59968d51d472b57c2a40e201d663
Also, the fix for [LUCENENET-358] is basically making Lucene.Net.dll a .NET 4.0-only assembly:
https://github.com/apache/lucene.net/commit/23ea6f52362fc7dbce48fd012cea129a7350c73c
Did we agree about abandoning .NET <= 3.5? Robert
Re: [Lucene.Net] 2.9.4
If that's the case, then we'll need conditional statements for including ThreadLocal<T>.
On Wed, Sep 21, 2011 at 12:47 PM, Prescott Nasser geobmx...@hotmail.com wrote: I thought this was after 2.9.4. Sent from my Windows Phone
-----Original Message----- From: Michael Herndon Sent: Wednesday, September 21, 2011 8:30 AM To: lucene-net-...@lucene.apache.org Cc: lucene-net-...@incubator.apache.org Subject: Re: [Lucene.Net] 2.9.4
@Robert, I believe the overwhelming consensus on the mailing list vote was to move to .NET 4.0 and drop support for previous versions. I'll take care of the build scripts issue while they are being refactored into smaller chunks this week. @Troy, Agreed.
On Wed, Sep 21, 2011 at 8:08 AM, Robert Jordan robe...@gmx.net wrote: On 20.09.2011 23:48, Prescott Nasser wrote: Hey all, seems like we are set with 2.9.4? Feedback has been positive and it's been quiet. Do we feel ready to vote for a new release?
I don't know if the build infrastructure is part of the release. If yes, then there is an open issue: Contrib doesn't build right now because there are some assembly name mismatches between certain *.csproj files and build/scripts/contrib.targets. The following patches should fix the issue:
https://github.com/robert-j/lucene.net/commit/c5218bca56c19b3407648224781eec7316994a39
https://github.com/robert-j/lucene.net/commit/50bad187655d59968d51d472b57c2a40e201d663
Also, the fix for [LUCENENET-358] is basically making Lucene.Net.dll a .NET 4.0-only assembly:
https://github.com/apache/lucene.net/commit/23ea6f52362fc7dbce48fd012cea129a7350c73c
Did we agree about abandoning .NET <= 3.5? Robert
[jira] [Created] (LUCENE-3445) Add SearcherManager, to manage IndexSearcher usage across threads and reopens
Add SearcherManager, to manage IndexSearcher usage across threads and reopens
Key: LUCENE-3445 URL: https://issues.apache.org/jira/browse/LUCENE-3445 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.5, 4.0
This is a simple helper class I wrote for Lucene in Action 2nd ed. I'd like to commit it under Lucene (contrib/misc). It simplifies reopening an IndexSearcher across multiple threads, by using IndexReader's ref counts to know when it's safe to close the reader. In the process I also factored out a test base class for tests that want to make lots of simultaneous indexing and searching threads, and fixed TestNRTThreads (core), TestNRTManager (contrib/misc) and the new TestSearcherManager (contrib/misc) to use this base class.
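The ref-counting idea described above (close a reader only once no thread is still using it) can be illustrated without Lucene. The sketch below is an assumption-laden stand-in, not the SearcherManager from the attached patch: `Handle` plays the role of a searcher/reader pair, and `acquire`, `release` and `swap` are hypothetical method names.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration class; not the API of the actual patch.
final class RefCountedSearcherSketch {

    // Stand-in for an IndexSearcher whose reader is reference counted.
    static final class Handle {
        final String name;
        private final AtomicInteger refCount = new AtomicInteger(1); // the manager's own reference
        private volatile boolean closed;

        Handle(String name) {
            this.name = name;
        }

        boolean isClosed() {
            return closed;
        }

        // Fails (returns false) if the count already dropped to zero.
        boolean tryIncRef() {
            int count;
            while ((count = refCount.get()) > 0) {
                if (refCount.compareAndSet(count, count + 1)) {
                    return true;
                }
            }
            return false;
        }

        void decRef() {
            if (refCount.decrementAndGet() == 0) {
                closed = true; // real code would close the underlying reader here
            }
        }
    }

    private volatile Handle current;

    RefCountedSearcherSketch(Handle initial) {
        current = initial;
    }

    // Each search thread acquires before searching and releases in a finally block.
    Handle acquire() {
        Handle handle;
        do {
            handle = current;
        } while (!handle.tryIncRef()); // retry if it was swapped out and closed meanwhile
        return handle;
    }

    void release(Handle handle) {
        handle.decRef();
    }

    // Called after a reopen: publish the new handle and drop the manager's reference to the old one.
    synchronized void swap(Handle fresh) {
        Handle old = current;
        current = fresh;
        old.decRef();
    }
}
```

A search that acquired the old handle before `swap` keeps a usable handle; the old handle is only marked closed when that search calls `release`.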
[jira] [Updated] (LUCENE-3445) Add SearcherManager, to manage IndexSearcher usage across threads and reopens
[ https://issues.apache.org/jira/browse/LUCENE-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-3445:
Attachment: LUCENE-3445.patch
Add SearcherManager, to manage IndexSearcher usage across threads and reopens
Key: LUCENE-3445 URL: https://issues.apache.org/jira/browse/LUCENE-3445 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.5, 4.0 Attachments: LUCENE-3445.patch
[jira] [Updated] (LUCENE-3440) FastVectorHighlighter: IDF-weighted terms for ordered fragments
[ https://issues.apache.org/jira/browse/LUCENE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
S.L. updated LUCENE-3440:
Attachment: (was: LUCENE-3440.patch)
FastVectorHighlighter: IDF-weighted terms for ordered fragments
Key: LUCENE-3440 URL: https://issues.apache.org/jira/browse/LUCENE-3440 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Affects Versions: 3.5 Reporter: S.L. Priority: Minor Labels: patch Fix For: 3.5 Attachments: LUCENE-3440-1.patch
The FastVectorHighlighter gives every term found in a fragment an equal weight, which ranks fragments with a high number of words or, in the worst case, a high number of very common words above fragments that contain *all* of the terms used in the original query. This patch provides ordered fragments with IDF-weighted terms: total weight = total weight + IDF for unique term per fragment * boost of query; The ranking formula should be the same as, or at least similar to, the one used in org.apache.lucene.search.highlight.QueryTermScorer. The patch is simple, but it works for us. Some ideas:
- A better approach would be moving the whole fragment scoring into a separate class.
- Switch scoring via parameter
- Exact phrases should be given an even better score, regardless of whether a phrase query was executed or not
- edismax/dismax parameters pf, ps and pf^boost should be observed and corresponding fragments should be ranked higher
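The weighting rule quoted in LUCENE-3440 (add IDF times query boost once per unique term in a fragment) can be sketched as follows. The class and method names are hypothetical, and the `idf` function uses the classic `1 + ln(numDocs / (docFreq + 1))` form as an assumption; the issue text does not say which IDF variant the patch uses.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; not the code from the attached patch.
final class FragmentWeightSketch {

    // Assumed IDF variant: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Behaviour described as the problem: every matched term occurrence counts the same.
    static double equalWeight(List<String> matchedTerms) {
        return matchedTerms.size();
    }

    // total weight = total weight + IDF for unique term per fragment * boost of query
    static double idfWeight(List<String> matchedTerms, Map<String, Integer> docFreqs,
                            int numDocs, double queryBoost) {
        Set<String> seen = new HashSet<>();
        double total = 0.0;
        for (String term : matchedTerms) {
            if (seen.add(term)) { // count each unique term only once per fragment
                total += idf(docFreqs.getOrDefault(term, 0), numDocs) * queryBoost;
            }
        }
        return total;
    }
}
```

Under equal weighting, a fragment that matches one very common term four times outranks a fragment that matches two rare query terms once each; with the IDF-weighted rule the second fragment wins.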
[jira] [Updated] (LUCENE-3440) FastVectorHighlighter: IDF-weighted terms for ordered fragments
[ https://issues.apache.org/jira/browse/LUCENE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
S.L. updated LUCENE-3440:
Description: The FastVectorHighlighter gives every term found in a fragment an equal weight, which ranks fragments with a high number of words or, in the worst case, a high number of very common words above fragments that contain *all* of the terms used in the original query. This patch provides ordered fragments with IDF-weighted terms: total weight = total weight + IDF for unique term per fragment * boost of query; The ranking-formula should be the same as, or at least similar to, the one used in org.apache.lucene.search.highlight.QueryTermScorer. The patch is simple, but it works for us. Some ideas:
- A better approach would be moving the whole fragment scoring into a separate class.
- Switch scoring via parameter
- Exact phrases should be given an even better score, regardless of whether a phrase query was executed or not
- edismax/dismax parameters pf, ps and pf^boost should be observed and corresponding fragments should be ranked higher
was: The same description, except that it read "The ranking-formular should be the same" where it now reads "The ranking-formula should be the same".
FastVectorHighlighter: IDF-weighted terms for ordered fragments
Key: LUCENE-3440 URL: https://issues.apache.org/jira/browse/LUCENE-3440 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Affects Versions: 3.5 Reporter: S.L. Priority: Minor Labels: patch Fix For: 3.5 Attachments: LUCENE-3440-1.patch
[jira] [Commented] (LUCENE-3441) Add NRT support to LuceneTaxonomyReader
[ https://issues.apache.org/jira/browse/LUCENE-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109735#comment-13109735 ]
Shai Erera commented on LUCENE-3441:
bq. Shouldn't you also commit in the constructor
LuceneTaxonomyWriter behaves just like IndexWriter. Today (I think since 3.1), opening an IndexWriter is just another transaction that you should commit if you want IndexReaders to see it. So if you try:
{code}
IndexWriter w = new IndexWriter(emptyDir, new IWC());
IndexReader r = IndexReader.open(emptyDir);
{code}
you'll get an exception as well. If you want that to work, you must insert a commit() call after line #1, and LTW follows this logic.
bq. Also an explanation of what it's doing underneath
Refreshing LTR means reopening its internal IndexReader instance. If it has changed, then LTR updates its parents array with the newly added categories. Usually, assuming the taxonomy does not grow a lot (i.e., usually after some point your taxonomy is relatively fixed, and new categories are not added often -- much like an index lexicon), this additional update of the parents array is quick.
Add NRT support to LuceneTaxonomyReader
Key: LUCENE-3441 URL: https://issues.apache.org/jira/browse/LUCENE-3441 Project: Lucene - Java Issue Type: New Feature Components: modules/facet Reporter: Shai Erera Priority: Minor
Currently LuceneTaxonomyReader does not support NRT - i.e., on changes to LuceneTaxonomyWriter, you cannot have the reader updated, like IndexReader/Writer. In order to do that we need to do the following:
# Add ctor to LuceneTaxonomyReader to allow you to instantiate it with LuceneTaxonomyWriter.
# Add API to LuceneTaxonomyWriter to expose its internal IndexReader
# Change LTR.refresh() to return an LTR, rather than void. This is actually not strictly related to that issue, but since we'll need to modify refresh() impl, I think it'll be good to change its API as well. Since all of facet API is @lucene.experimental, no backwards issues here (and the sooner we do it, the better).
[jira] [Resolved] (SOLR-2754) create Solr similarity factories for new ranking algorithms
[ https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir resolved SOLR-2754.
Resolution: Fixed
Thanks David!
create Solr similarity factories for new ranking algorithms
Key: SOLR-2754 URL: https://issues.apache.org/jira/browse/SOLR-2754 Project: Solr Issue Type: New Feature Affects Versions: 4.0 Reporter: Robert Muir Assignee: Robert Muir Attachments: SOLR-2754.patch, SOLR-2754.patch, SOLR-2754.patch
[jira] [Commented] (LUCENE-3445) Add SearcherManager, to manage IndexSearcher usage across threads and reopens
[ https://issues.apache.org/jira/browse/LUCENE-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109768#comment-13109768 ]
Shai Erera commented on LUCENE-3445:
This is great, Mike! I reviewed SearcherManager and have a comment about the TODO on whether or not to call warm() in the ctor. If an extending class relies on some internal members being initialized before warm() can safely be called, then calling it from the ctor will lead to exceptions. I think that warm() should not be called in the ctor, or we should at least add a ctor which accepts a boolean doWarm, while the other ctors call it with 'true'. Calling warm() in the ctor is useful if one wants to warm the IndexSearcher instance before SearcherManager is ready for use. So perhaps an additional ctor with the boolean gives the most flexibility. Also, I remember there was a ctor which took an IndexWriter, to allow for an NRT-SearcherManager. What happened to it? :)
Add SearcherManager, to manage IndexSearcher usage across threads and reopens
Key: LUCENE-3445 URL: https://issues.apache.org/jira/browse/LUCENE-3445 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.5, 4.0 Attachments: LUCENE-3445.patch
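The concern about calling warm() from the constructor follows from Java's initialization order: a superclass constructor runs before the subclass's field initializers. The sketch below uses hypothetical class names (it is not SearcherManager) to show the pitfall and the suggested `doWarm` constructor.

```java
// Hypothetical illustration classes; not the SearcherManager from the patch.
final class WarmInCtorSketch {

    static class Manager {
        Manager() {
            this(true); // default: warm during construction
        }

        Manager(boolean doWarm) {
            if (doWarm) {
                warm(); // runs before any subclass field has been initialized
            }
        }

        public void warm() {
            // no-op by default; subclasses override
        }
    }

    static class CachingManager extends Manager {
        // Assigned only after the superclass constructor has returned.
        private final StringBuilder cache = new StringBuilder("ready");

        // Records what warm() observed; no initializer, so the value set during super() survives.
        boolean sawUninitializedState;

        CachingManager(boolean doWarm) {
            super(doWarm);
        }

        @Override
        public void warm() {
            // A real override would dereference 'cache' here and hit a NullPointerException.
            sawUninitializedState = (cache == null);
        }
    }
}
```

Constructing `new CachingManager(true)` leaves `sawUninitializedState` set to true, while constructing with `false` and calling `warm()` afterwards sees a fully initialized object, which is the flexibility the extra boolean constructor would give.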
[jira] [Commented] (LUCENE-2205) Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.
[ https://issues.apache.org/jira/browse/LUCENE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109780#comment-13109780 ]
Aaron McCurry commented on LUCENE-2205:
I found a major bug in my test. I was using the keyword analyzer instead of whitespace or standard, so it was turning every one of my sentences that contained 100 randomly generated words into 1 huge token. This helps to explain why the heap space results are not that stellar, because the fewer terms there are (and the larger they are), the less the patch helps reduce space. I'm retesting now.
Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.
Key: LUCENE-2205 URL: https://issues.apache.org/jira/browse/LUCENE-2205 Project: Lucene - Java Issue Type: Improvement Components: core/index Environment: Java5 Reporter: Aaron McCurry Assignee: Michael McCandless Fix For: 3.5 Attachments: RandomAccessTest.java, TermInfosReader.java, TermInfosReaderIndex.java, TermInfosReaderIndexDefault.java, TermInfosReaderIndexSmall.java, lowmemory_w_utf8_encoding.patch, patch-final.txt, rawoutput.txt
[jira] [Commented] (LUCENE-2205) Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.
[ https://issues.apache.org/jira/browse/LUCENE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109787#comment-13109787 ]
Michael McCandless commented on LUCENE-2205:
Patch looks great Aaron! Very much simplified... some comments:
* Instead of a separate build method, could we have TermInfosReaderIndex's ctor take all the args? Then we can make its private fields final?
* I think the index and indexLength can be final, in TermInfosReader?
* Can you put the GrowableByteArrayDataOutput as a separate source file in oal.store? Seems useful!
* Hmm should indexToTermsArray be a long[]...? I wonder how large your index would have to be to overflow 2.1GB of the byte[] format...
* We could further reduce the RAM usage by using packed ints (oal.util.packed) for the indexToTerms array; this way each indexed term would only use as many bits as are actually required to address the byte[] (and, this would solve the int[]/long[] problem since packed ints are logically a long[]).
* I think we should just always trim? (Ie we don't need the {{private boolean trim}})
* Could you add the comment "Just for testing" to TermInfosReaderIndex.getTerm?
* For the compareTo methods, can you add to the jdocs that this compares term to index term, ie it returns negative N when term is less than index term?
* Hmm... I wonder if memory fragmentation will cause problems for allocating/growing the single byte[]. Also, a single byte[] can only address 2.1B bytes (the same overflow problem as above). Maybe we should port back PagedBytes (from trunk oal.util) and use that instead? If we did that, then we could create a simple DataInput impl that reads from that.
* Could you please remove the @author tags? Thanks. Apache's policy is to not commit author tags (or at least it's discouraged)...
Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.
Key: LUCENE-2205 URL: https://issues.apache.org/jira/browse/LUCENE-2205 Project: Lucene - Java Issue Type: Improvement Components: core/index Environment: Java5 Reporter: Aaron McCurry Assignee: Michael McCandless Fix For: 3.5 Attachments: RandomAccessTest.java, TermInfosReader.java, TermInfosReaderIndex.java, TermInfosReaderIndexDefault.java, TermInfosReaderIndexSmall.java, lowmemory_w_utf8_encoding.patch, patch-final.txt, rawoutput.txt
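The packed-ints suggestion in the review above (use only as many bits per offset as are needed to address the byte[]) can be sketched in plain Java. This is a simplified stand-in written for illustration, not Lucene's oal.util.packed implementation; the class and method names are hypothetical.

```java
// Hypothetical illustration class; not Lucene's packed ints implementation.
final class PackedOffsetsSketch {
    private final long[] blocks;
    private final int bitsPerValue;

    // Number of bits needed to represent any value in [0, maxValue].
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    PackedOffsetsSketch(int size, long maxValue) {
        this.bitsPerValue = bitsRequired(maxValue);
        this.blocks = new long[(int) (((long) size * bitsPerValue + 63) / 64)];
    }

    private long mask() {
        return bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
    }

    void set(int index, long value) {
        long bitPosition = (long) index * bitsPerValue;
        int block = (int) (bitPosition >>> 6);
        int shift = (int) (bitPosition & 63);
        long masked = value & mask();
        // Low part goes into the current block; bits shifted past bit 63 are dropped here.
        blocks[block] = (blocks[block] & ~(mask() << shift)) | (masked << shift);
        int spill = shift + bitsPerValue - 64; // bits that do not fit into this block
        if (spill > 0) {
            long highMask = (1L << spill) - 1;
            blocks[block + 1] = (blocks[block + 1] & ~highMask) | (masked >>> (bitsPerValue - spill));
        }
    }

    long get(int index) {
        long bitPosition = (long) index * bitsPerValue;
        int block = (int) (bitPosition >>> 6);
        int shift = (int) (bitPosition & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            value |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return value & mask();
    }
}
```

Values are logically longs, so offsets beyond the int range (the overflow concern raised in the review) fit without changing the API; an offset space of about 50 million bytes needs 26 bits per entry instead of 32.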
[jira] [Issue Comment Edited] (LUCENE-2205) Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.
[ https://issues.apache.org/jira/browse/LUCENE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109780#comment-13109780 ]
Aaron McCurry edited comment on LUCENE-2205 at 9/21/11 7:09 PM:
I found a major bug in my test. I was using the keyword analyzer instead of whitespace or standard, so it was turning every one of my sentences that contained 50 randomly generated words into 1 huge token. This helps to explain why the heap space results are not that stellar, because the fewer terms there are (and the larger they are), the less the patch helps reduce space. I'm retesting now.
was (Author: amccurry): The same comment, except that it read "100 randomly generated words" where it now reads "50 randomly generated words".
Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.
Key: LUCENE-2205 URL: https://issues.apache.org/jira/browse/LUCENE-2205 Project: Lucene - Java Issue Type: Improvement Components: core/index Environment: Java5 Reporter: Aaron McCurry Assignee: Michael McCandless Fix For: 3.5 Attachments: RandomAccessTest.java, TermInfosReader.java, TermInfosReaderIndex.java, TermInfosReaderIndexDefault.java, TermInfosReaderIndexSmall.java, lowmemory_w_utf8_encoding.patch, patch-final.txt, rawoutput.txt
[jira] [Commented] (LUCENE-2205) Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.
[ https://issues.apache.org/jira/browse/LUCENE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109788#comment-13109788 ]

Michael McCandless commented on LUCENE-2205:

bq. I was using keyword analyzer instead of whitespace or standard

Aha! Good catch :) I'm also building up a 2B terms index (using Test2BTerms), and then I'll compare patch/3.x on that index.
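The effect of the analyzer mix-up can be illustrated without Lucene. The sketch below uses plain Java stand-ins, not Lucene's analyzer classes; the method names keywordTokens and whitespaceTokens are hypothetical. It shows why a keyword-style analysis produces one large term per sentence while a whitespace-style analysis produces one term per word.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-ins for keyword-style and whitespace-style tokenization. */
class AnalyzerComparison {

    /** Keyword-style: the whole input becomes a single token. */
    static List<String> keywordTokens(String text) {
        return Collections.singletonList(text);
    }

    /** Whitespace-style: the input is split on runs of whitespace. */
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        String sentence = "alpha beta gamma delta";
        System.out.println(keywordTokens(sentence).size());    // prints 1
        System.out.println(whitespaceTokens(sentence).size()); // prints 4
    }
}
```

With 50 random words per sentence, the keyword-style path would index one 50-word term per sentence, which matches the "1 huge token" described in the quoted comment.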
[jira] [Commented] (SOLR-2780) Facet count problem : Multi-Select Faceting After grouping results
[ https://issues.apache.org/jira/browse/SOLR-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109826#comment-13109826 ]

Martijn van Groningen commented on SOLR-2780:
-
Hi Ramzi,

So you have 77 errors :) Can you send me the errors you are getting? BTW, if you just want to use the patch, you can apply it and build Solr (ant clean dist). The patch should work when using the group.field parameter.

Facet count problem : Multi-Select Faceting After grouping results
---
Key: SOLR-2780
URL: https://issues.apache.org/jira/browse/SOLR-2780
Project: Solr
Issue Type: Bug
Components: search
Affects Versions: 3.3, 3.4, 4.0
Reporter: Ramzi Alqrainy
Priority: Critical
Fix For: 3.5, 4.0
Attachments: SOLR-2780.patch

Dear all,

Kindly note that I am using Solr 4.0, and that group.truncate=true calculates facet counts based on the most relevant document of each group matching the query. But when I use multi-select faceting [tagging and excluding filters], Solr can't calculate the facet counts after grouping the results and selecting multiple facets.

http://127.0.0.1:8983/solr/select/?facet=true&sort=score+desc,+rate+desc,total_of_reviews+desc&facet.limit=-1&bf=sum%28product%28atan%28total_of_reviews%29,50%29,product%28rate,10%29%29^4&group.field=place_id&facet.field={!ex%3Dce}cat_en&facet.field={!ex%3Dce}cat_ar&facet.field={!ex%3Dir}iregion&facet.field={!ex%3Dir}region_en&facet.field={!ex%3Dir}region_ar&facet.field={!ex%3Drr}rrate&facet.field=place_status&facet.field=theme_en&facet.field=icity&facet.field={!ex%3Dce}icat&facet.field={!ex%3Dsce}isubcat&facet.field={!ex%3Dsce}subcat_en&facet.field={!ex%3Dsce}subcat_ar&qt=/spell&fq=place_status:1&fq=icity:1&fq=cat_en:%28%22Restaurants%22%29&group.format=simple&group.ngroups=true&facet.mincount=1&qf=title_ar^24+title_en^24+cat_ar^10+cat_en^10++review^20&hl.fl=review&json.nl=map&wt=json&defType=edismax&rows=10&spellcheck.accuracy=0.6&start=0&q=smart&group.truncate=true&group=true&indent=on
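For readers unfamiliar with tagging and excluding filters, here is a reduced Java sketch that assembles a grouped, multi-select facet request. The field names (place_id, cat_en) and the host come from the report above; the {!tag=ce} prefix on the fq parameter is an assumption (it is not visible in the reported URL), and the MultiSelectFacetQuery helper is hypothetical. It only builds the request string and does not contact Solr.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that builds a Solr request URL with tagged and excluded filters. */
class MultiSelectFacetQuery {
    private final List<String> pairs = new ArrayList<>();

    /** Appends one name=value pair, URL-encoding the value. */
    MultiSelectFacetQuery add(String name, String value) {
        pairs.add(name + "=" + encode(value));
        return this;
    }

    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    String toUrl(String base) {
        return base + "?" + String.join("&", pairs);
    }

    /** Reduced version of the reported request: one tagged filter, one excluding facet. */
    static String example() {
        return new MultiSelectFacetQuery()
                .add("q", "smart")
                .add("defType", "edismax")
                .add("group", "true")
                .add("group.field", "place_id")
                .add("group.truncate", "true")
                .add("facet", "true")
                // Assumption: the filter carries the tag "ce" so the facet below can exclude it.
                .add("fq", "{!tag=ce}cat_en:\"Restaurants\"")
                // The facet on cat_en ignores the filter tagged "ce" when counting.
                .add("facet.field", "{!ex=ce}cat_en")
                .toUrl("http://127.0.0.1:8983/solr/select/");
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

A reduced request like this may be easier to use when reproducing the facet counts with and without the patch than the full URL from the report.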
[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 483 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/483/

1 tests failed.
REGRESSION: org.apache.solr.TestDistributedSearch.testDistribSearch

Error Message:
java.lang.AssertionError: Some threads threw uncaught exceptions!

Stack Trace:
java.lang.RuntimeException: java.lang.AssertionError: Some threads threw uncaught exceptions!
    at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:729)
    at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:89)
    at org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:174)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
    at org.apache.lucene.util.LuceneTestCase.checkUncaughtExceptionsAfter(LuceneTestCase.java:757)
    at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:701)

Build Log (for compile errors):
[...truncated 11020 lines...]