Re: [Lucene.Net] 2.9.4

2011-09-21 Thread Troy Howard
I thought it was:

2.9.2 and before are 2.0 compatible
2.9.4 and before are 3.5 compatible
After 2.9.4 are 4.0 compatible

Thanks,
Troy

On Wed, Sep 21, 2011 at 10:15 AM, Michael Herndon
mhern...@wickedsoftware.net wrote:
 If that's the case, then we'll need conditional statements for including
 ThreadLocal<T>

 On Wed, Sep 21, 2011 at 12:47 PM, Prescott Nasser
 geobmx...@hotmail.com wrote:

 I thought this was after 2.9.4

 Sent from my Windows Phone

 -Original Message-
 From: Michael Herndon
 Sent: Wednesday, September 21, 2011 8:30 AM
 To: lucene-net-dev@lucene.apache.org
 Cc: lucene-net-...@incubator.apache.org
 Subject: Re: [Lucene.Net] 2.9.4

 @Robert,

 I believe the overwhelming consensus on the mailing list vote was to move
 to .NET 4.0 and drop support for previous versions.

 I'll take care of the build scripts issue while they are being refactored
 into smaller chunks this week.

 @Troy, Agreed.

 On Wed, Sep 21, 2011 at 8:08 AM, Robert Jordan robe...@gmx.net wrote:

  On 20.09.2011 23:48, Prescott Nasser wrote:
 
  Hey all, seems like we are set with 2.9.4? Feedback has been positive and
  it's been quiet. Do we feel ready to vote for a new release?
 
 
  I don't know if the build infrastructure is part of the
  release. If yes, then there is an open issue:
 
  Contrib doesn't build right now because there
  are some assembly name mismatches between certain *.csproj
  files and  build/scripts/contrib.targets.
 
  The following patches should fix the issue:
 
  https://github.com/robert-j/lucene.net/commit/c5218bca56c19b3407648224781eec7316994a39
 
 
  https://github.com/robert-j/lucene.net/commit/50bad187655d59968d51d472b57c2a40e201d663
 
 
 
  Also, the fix for [LUCENENET-358] is basically making
  Lucene.Net.dll a .NET 4.0-only assembly:
 
  https://github.com/apache/lucene.net/commit/23ea6f52362fc7dbce48fd012cea129a7350c73c
 
 
  Did we agree about abandoning .NET <= 3.5?
 
  Robert
 
 




Re: [Lucene.Net] 2.9.4

2011-09-21 Thread Michael Herndon
@all,

I updated the build scripts to increase their granularity.
https://cwiki.apache.org/LUCENENET/build-system-scripts.html

Similarity was included, though are there any tests for this project?

Some of the contrib tests are failing; I saw a few in Contrib.Highlighter
just glancing at the output.

I received some feedback from Eric Woodruff. It looks like SHFB & Sandcastle
generate plain HTML files; it's been staring me in the face this whole time.
I'll need to build in some targets that extract what's needed to push to the
site branch. Then I'll start working on nuget.

@Prescott,
Can the volatile fields be wrapped in a lock statement, and the code that
accesses those fields be replaced with a call to a property/method that wraps
access to that field?






Re: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

2011-09-21 Thread Itamar Syn-Hershko
Use a Lucene.Net core package for the core, and separate packages for each
contrib. That makes the most sense, and that is how most projects work. This
is also how Java Lucene does it.

Don't create a nightly nuget package; nuget should only be used for
distribution packages.

On Wed, Sep 21, 2011 at 6:56 AM, Michael Herndon 
mhern...@wickedsoftware.net wrote:

 We're taking a quick poll over the next few days to see how people would
 like to use Lucene.Net through Nuget on the developers mailing list**

 Currently version 2.9.2 is hosted on nuget.org, but that package was not
 created by the project maintainers, thus nuget is not currently set up in
 source.  Going forward, we would like to continue what someone else started
 by creating nuget packages for Lucene.Net.

 Right now there are two packages: Lucene & Lucene.Contrib.  My question to
 the community is do you wish to have finer-grained packages, i.e. a package
 for each contrib project, or continue to keep it simple.

 The granular approach will let you use only what you need. We can also
 create additional higher level packages which have dependencies on the
 other ones.  Possibly a Lucene.Net-Essentials and Lucene.Net-Full.

 Or we can keep it simple and continue with only two packages.

 My concerns are that the granular approach might overwhelm people with
 choice. The simple choice might be considered bloat for importing and then
 installing assemblies that you might never use.


 Another topic to converse about is would you like to see an out-of-band
 project nuget feed for  nightly builds, branches with new or experimental
 features, or stable code snapshots for a projected release?


 ** when you post, please respond to lucene-net-dev@lucene.apache.org.
 This was posted to both lists to make sure everyone subscribed to both lists
 has a chance to voice their use cases or concerns.



RE: [Lucene.Net] 2.9.4

2011-09-21 Thread Digy
 Similarity was included, though are there any tests for this project?
Similarity is obsolete (Queries.Net replaces it & has test cases). It has
already been removed in 2.9.4g.

DIGY


-

Checked by AVG - www.avg.com
Version: 2012.0.1809 / Virus Database: 2085/4510 - Release Date: 09/21/11



RE: [Lucene.Net] 2.9.4

2011-09-21 Thread Digy
@Robert 

 Also, the fix for [LUCENENET-358] is basically making Lucene.Net.dll a
.NET 4.0-only assembly:

There is a commented-out part at the end of CloseableThreadLocal which may
seem familiar to you :)
There is no harm in uncommenting it, and no conditional compilation is needed.
It also passes all test cases.

DIGY






RE: [Lucene.Net] 2.9.4

2011-09-21 Thread Digy
You are right about the race condition & NullReferenceException,
but
static SupportClass.WeakHashTable slots = new SupportClass.WeakHashTable();
wouldn't work, since it is intended to be created in every thread, not just once.

Would you patch it or leave it to me?

Thanks,
DIGY

-Original Message-
From: Robert Jordan [mailto:robe...@gmx.net] 
Sent: Thursday, September 22, 2011 1:16 AM
To: lucene-net-...@incubator.apache.org
Subject: Re: [Lucene.Net] 2.9.4

Hi Digy,

On 21.09.2011 23:38, Digy wrote:
 @Robert

 Also, the fix for [LUCENENET-358] is basically making Lucene.Net.dll a
 .NET 4.0-only assembly:

 There is a commented part at the end of the CloseableThreadLocal which may
 seem familiar to you :)

Indeed :) I've missed this comment.

 No harm in uncommenting it and no conditional compilation is needed.
 It also pass all test cases.

BTW, there is an issue with this commented-out code. If Value
is not accessed at least once, Dispose() will fail with a
NullReferenceException. There is also a little chance for
a race condition.

I'd rather get rid of Init() for this code:

static SupportClass.WeakHashTable slots = new SupportClass.WeakHashTable();

Robert





Re: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

2011-09-21 Thread Michael Herndon
@Digy, that could be done post-build with ILMerge, or by building an
additional uber assembly that stores the other assemblies as resources.
http://blogs.msdn.com/b/microsoft_press/archive/2010/02/03/jeffrey-richter-excerpt-2-from-clr-via-c-third-edition.aspx

We can add the above to the build process if that would interest people.

To some, nuget is just another disruption, and to others it's a godsend.  Some
might say only hipsters would use nuget; others might say the cool kids
with iphones use nuget (or android or wp7).

At the end of the day, nuget or combining assemblies are just channels/ways
we can make it easier for various developers to consume & get their hands on
Lucene.Net. If anyone else has ideas along those lines and they can be
automated, post them in this thread.





On Wed, Sep 21, 2011 at 6:00 PM, Digy digyd...@gmail.com wrote:

 Even all contribs could be a single project/assembly. That way, users could
 reference all contribs with a single assembly.
 I see no harm in putting a few KB pressure on RAM :)

 DIGY


 -Original Message-
 From: Troy Howard [mailto:thowar...@gmail.com]
 Sent: Wednesday, September 21, 2011 7:32 AM
 To: lucene-net-dev@lucene.apache.org
 Subject: Re: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

 While it may be a bit redundant, why couldn't there be an individual
 package for each piece of contrib and a Lucene.Net Contrib (All)
 package that drags them all down.

 That way users can grab just the bit they need, or if they just want
 to get the whole thing, grab the All package.

 Thanks,
 Troy


 On Tue, Sep 20, 2011 at 9:11 PM, Aaron Powell m...@aaron-powell.com wrote:
  I'm going to vote +1 for granular.
 
  With the RC you could look at myget and have a Lucene.Net repository on
 there so people can go for unstable on myget, stables on nuget.
 
  Also, I came across this article which explains how to set up a build
 server to automatically push to nuget/myget, which could be useful to the
 maintainers:
 http://brendanforster.com/doing-the-build-server-dance-with-nuget.html
 
  Aaron Powell
  MVP - Internet Explorer (Development) | FunnelWeb Team Member
 
  http://apowell.me | http://twitter.com/slace | Skype: aaron.l.powell |
 Github | BitBucket
 
  -Original Message-
  From: Prescott Nasser [mailto:geobmx...@hotmail.com]
  Sent: Wednesday, 21 September 2011 2:05 PM
  To: lucene-net-dev@lucene.apache.org
  Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts
 
 
  Right now there are two packages: Lucene & Lucene.Contrib. My question
  to the community is do you wish to have finer-grained packages, i.e. a
  package for each contrib project, or continue to keep it simple.
 
 
 
  +1 Granular, we just need to be good about descriptions.
 
 
 
  Another topic to converse about is would you like to see an
  out-of-band project nuget feed for nightly builds, branches with new
  or experimental features, or stable code snapshots for a projected
 release?
 
 
  Having a package for the latest RC would probably be a good idea
 




RE: [Lucene.Net] 2.9.4

2011-09-21 Thread Digy
I reconsidered it and there is no race condition.  A new slot will be
created for each thread.
But the NullReferenceException bug is still there.

DIGY




RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

2011-09-21 Thread Digy


http://blogs.msdn.com/b/microsoft_press/archive/2010/02/03/jeffrey-richter-excerpt-2-from-clr-via-c-third-edition.aspx

Yes, this is the trick some obfuscators use (they also use some scrambling
functions to hide the code in the resource).

DIGY





RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

2011-09-21 Thread Aaron Powell
Any particular reason you guys are not interested in NuGet?

Aaron Powell
MVP - Internet Explorer (Development) | FunnelWeb Team Member

http://apowell.me | http://twitter.com/slace | Skype: aaron.l.powell | Github | 
BitBucket 


-Original Message-
From: Digy [mailto:digyd...@gmail.com] 
Sent: Thursday, 22 September 2011 7:42 AM
To: lucene-net-dev@lucene.apache.org
Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

Sorry, but I feel the same as Neal.

DIGY

-Original Message-
From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com]
Sent: Wednesday, September 21, 2011 6:08 PM
To: lucene-net-dev@lucene.apache.org
Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

No interest in Nuget whatsoever.

- Neal




RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

2011-09-21 Thread Digy
I am not against it, but personally I think of it as a toy.
I am from the generation where people used vi to write code.

DIGY




RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

2011-09-21 Thread Digy
Not that old :)
DIGY

-Original Message-
From: Prescott Nasser [mailto:geobmx...@hotmail.com] 
Sent: Thursday, September 22, 2011 2:14 AM
To: lucene-net-dev@lucene.apache.org
Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

Punch cards or bust!

Sent from my Windows Phone

-Original Message-
From: Digy
Sent: Wednesday, September 21, 2011 4:06 PM
To: lucene-net-dev@lucene.apache.org
Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

I am not against it, but personally think it as a toy.
I am from the generation where people used vi to write codes.

DIGY

-Original Message-
From: Aaron Powell [mailto:m...@aaron-powell.com]
Sent: Thursday, September 22, 2011 1:56 AM
To: lucene-net-dev@lucene.apache.org
Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

Any particular reason you guys are not interested in NuGet?

Aaron Powell
MVP - Internet Explorer (Development) |�FunnelWeb Team Member

http://apowell.me�|�http://twitter.com/slace�| Skype: aaron.l.powell |
Github | BitBucket


-Original Message-
From: Digy [mailto:digyd...@gmail.com]
Sent: Thursday, 22 September 2011 7:42 AM
To: lucene-net-dev@lucene.apache.org
Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

Sorry, but I feel the same as Neal.

DIGY

-Original Message-
From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com]
Sent: Wednesday, September 21, 2011 6:08 PM
To: lucene-net-dev@lucene.apache.org
Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

No interest in Nuget whatsoever.

- Neal

-Original Message-
From: Michael Herndon [mailto:mhern...@wickedsoftware.net]
Sent: Tuesday, September 20, 2011 10:57 PM
To: lucene-net-dev@lucene.apache.org; lucene-net-u...@lucene.apache.org
Subject: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

We're taking a quick poll over the next few days to see how people would
like to use Lucene.Net through Nuget on the developers mailing list**

Currently version 2.9.2 is hosted on nuget.org, but that package was not
created by the project maintainers, thus nuget is not currently set up in
source.  Going forward, we would like to continue what someone else started
by creating nuget packages for Lucene.Net.

Right now there are two packages: Lucene and Lucene.Contrib.  My question to
the community is: do you wish to have finer-grained packages, i.e. a package for
each contrib project, or continue to keep it simple?

The granular approach will let you use only what you need. We can also
create additional higher level packages which have dependencies on the other
ones.   Possibly a Lucene.Net-Essentials and Lucene.Net-Full.

Or we can keep it simple and continue with only two packages.

My concerns are that the granular approach might overwhelm people with
choice. The simple choice might be considered bloat for importing and then
installing assemblies that you might never use.
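
To make the trade-off concrete, here is a minimal Python sketch of the granular layout with two higher-level meta-packages. Only the Essentials and Full names come from the poll; the contrib package names and the split between the two bundles are hypothetical, chosen purely for illustration, and nothing here reflects an actual packaging plan.

```python
# Hypothetical package graph: each package maps to the packages it depends on.
# Only "Lucene.Net-Essentials" and "Lucene.Net-Full" are names from the poll;
# the contrib package names and groupings below are invented for illustration.
PACKAGES = {
    "Lucene.Net": [],
    "Lucene.Net.Contrib.Analyzers": ["Lucene.Net"],
    "Lucene.Net.Contrib.Highlighter": ["Lucene.Net"],
    "Lucene.Net.Contrib.Spatial": ["Lucene.Net"],
    "Lucene.Net-Essentials": ["Lucene.Net", "Lucene.Net.Contrib.Analyzers"],
    "Lucene.Net-Full": [
        "Lucene.Net-Essentials",
        "Lucene.Net.Contrib.Highlighter",
        "Lucene.Net.Contrib.Spatial",
    ],
}


def resolve(package, graph=PACKAGES):
    """Return every package that installing `package` would pull in."""
    installed = set()
    pending = [package]
    while pending:
        name = pending.pop()
        if name in installed:
            continue
        installed.add(name)
        pending.extend(graph[name])
    return installed


if __name__ == "__main__":
    for name in ("Lucene.Net.Contrib.Highlighter",
                 "Lucene.Net-Essentials",
                 "Lucene.Net-Full"):
        print(name, "->", sorted(resolve(name)))
```

With granular packages, someone who only needs one contrib project pulls in that package plus the core; the meta-packages give everyone else a single name to install.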


Another topic to converse about is would you like to see an out-of-band
project nuget feed for  nightly builds, branches with new or experimental
features, or stable code snapshots for a projected release?


** when you post, please respond to lucene-net-dev@lucene.apache.org.  This
was posted to both lists to make sure everyone subscribed to both lists has
a chance to voice their use cases or concerns.



Re: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

2011-09-21 Thread Michael Herndon
Nick,

The last e-mail was out of line and out of context. If anything, emails like
that can push people into emotional or motivational apathy towards working
on a project.

1) Lucene.Net will be getting nuget packages.  People can hate on it,
grumble, or not use it, but it's a viable distribution vehicle. It's going in.
This thread was to gather feedback on how the people who would use it see
themselves using it.

2) Others might want alternatives to nuget that have not been provided yet.
We should be open to providing distribution alternatives if enough people
warrant it.  It's not apathetic or impassive to think that there might be
more than one way to distribute releases.

3) Attack problems. Not people. If you believe a person is the problem, take
the issue up with them offline. Those kinds of things are better face to
face or through a phone call, or an exceptionally clear e-mail. It's way too
easy for people to read into things too much or take things out of context
in an e-mail.

Attacking people also distracts people from focusing on the actual issue and
prevents any actual logic, reason, or sound argument from being heard.
It's a good way to alienate people that you should actually be trying to
persuade.

4) If I was actually apathetic and severely short sighted, I would not be
spending my own vacation time this weekend automating nuget packages with
the build scripts for Lucene.Net or experimenting with Portable Library Tools for
Lucene.Net 4.x to see if we can get it working on mobile.  Nor would I have
spent my last 4 day weekend setting up jenkins and local builds of
Lucene.Net.  Or put in the hours today to make sure the build scripts
are granular enough to implement the smaller packages.

5) If you feel so passionately about all this, why not work towards being a
contributor or committer and lead by example ?


- Michael



Since I'm the one implementing Nuget into the build process and I have not
played with the nuget server or creating a package, it just seemed wise to
gather feedback on how people saw themselves using the contrib packages.





On Wed, Sep 21, 2011 at 9:00 PM, Nicholas Paldino [.NET/C# MVP] 
casper...@caspershouse.com wrote:

 With all due respect, it's myopic opinions like yours and Michael's (his
 leans more towards apathy) which will harm the ability to get the project
 into the hands of people.

 I think (hope?) it can be agreed upon that the more that people are aware of
 Lucene.NET, the better it is for the project in general, and most
 importantly, the more potential that you have that someone will *contribute
 back* to it (and given what Lucene.NET has gone through in the past year, it
 desperately needs that participation).

 The fact of the matter is that Nuget puts packages in the hands of .NET
 developers, that leads to exposure and regardless of personal opinions on
 whether or not they *like* Nuget, it can't be denied that it's an
 *extremely* popular way to get libraries into people's projects.

 If you want to quibble over the actual numbers (and the definition of
 extremely popular) then that's fine, but here are the numbers you want:

 http://stats.nuget.org/

 If you want to just tell that audience to take a leap, that's fine, but I
 think it would be foolish to do so otherwise.

 Additionally, given that Lucene.NET is already on Nuget, isn't there *any*
 concern that there isn't an official distro?  Aren't you concerned about the
 integrity of the brand that so many of you fought to keep alive over the
 past year?  There's no guarantee that what's on Nuget will be the official
 releases/builds that come out of this project, and I'm a little surprised
 there isn't more concern over that aspect either.

 Just my $0.02

 - Nick

 -Original Message-
 From: Digy [mailto:digyd...@gmail.com]
 Sent: Wednesday, September 21, 2011 7:06 PM
 To: lucene-net-dev@lucene.apache.org
 Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

 I am not against it, but personally think it as a toy.
 I am from the generation where people used vi to write codes.

 DIGY

 -Original Message-
 From: Aaron Powell [mailto:m...@aaron-powell.com]
 Sent: Thursday, September 22, 2011 1:56 AM
 To: lucene-net-dev@lucene.apache.org
 Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

 Any particular reason you guys are not interested in NuGet?

 Aaron Powell
 MVP - Internet Explorer (Development) | FunnelWeb Team Member

 http://apowell.me | http://twitter.com/slace | Skype: aaron.l.powell |
 Github | BitBucket


 -Original Message-
 From: Digy [mailto:digyd...@gmail.com]
 Sent: Thursday, 22 September 2011 7:42 AM
 To: lucene-net-dev@lucene.apache.org
 Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

 Sorry, but I feel the same as Neal.

 DIGY

 -Original Message-
 From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com]
 Sent: Wednesday, September 21, 2011 6:08 PM
 To: lucene-net-dev@lucene.apache.org
 Subject: 

Re: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

2011-09-21 Thread Troy Howard
Michael - Could be wrong, but I think Nick might have gotten you
confused with Neal.

Regardless, I completely agree with everything you just said.

And, Yay for NuGet! Package management is the bomb.

-T


On Wed, Sep 21, 2011 at 7:43 PM, Michael Herndon
mhern...@wickedsoftware.net wrote:
 Nick,

 The last e-mail was out of line and out of context. If anything, emails like
 that can push people into emotional or motivational apathy towards working
 on a project.

 1) Lucene.Net will be getting nuget packages.  People can hate on it,
 grumble, or not use it, but it's a viable distribution vehicle. It's going in.
 This thread was to gather feedback on how the people who would use it see
 themselves using it.

 2) Others might want alternatives to nuget that have not been provided yet.
 We should be open to providing distribution alternatives if enough people
 warrant it.  It's not apathetic or impassive to think that there might be
 more than one way to distribute releases.

 3) Attack problems. Not people. If you believe a person is the problem, take
 the issue up with them offline. Those kinds of things are better face to
 face or through a phone call, or an exceptionally clear e-mail. It's way too
 easy for people to read into things too much or take things out of context
 in an e-mail.

 Attacking people also distracts people from focusing on the actual issue and
 prevents any actual logic, reason, or sound argument from being heard.
 It's a good way to alienate people that you should actually be trying to
 persuade.

 4) If I was actually apathetic and severely short sighted, I would not be
 spending my own vacation time this weekend automating nuget packages with
 the build scripts for Lucene.Net or experimenting with Portable Library Tools for
 Lucene.Net 4.x to see if we can get it working on mobile.  Nor would I have
 spent my last 4 day weekend setting up jenkins and local builds of
 Lucene.Net.  Or put in the hours today to make sure the build scripts
 are granular enough to implement the smaller packages.

 5) If you feel so passionately about all this, why not work towards being a
 contributor or committer and lead by example ?


 - Michael



 Since I'm the one implementing Nuget into the build process and I have not
 played with the nuget server or creating a package, it just seemed wise to
 gather feedback on how people saw themselves using the contrib packages.





 On Wed, Sep 21, 2011 at 9:00 PM, Nicholas Paldino [.NET/C# MVP] 
 casper...@caspershouse.com wrote:

 With all due respect, it's myopic opinions like yours and Michael's (his
 leans more towards apathy) which will harm the ability to get the project
 into the hands of people.

 I think (hope?) it can be agreed upon that the more that people are aware of
 Lucene.NET, the better it is for the project in general, and most
 importantly, the more potential that you have that someone will *contribute
 back* to it (and given what Lucene.NET has gone through in the past year, it
 desperately needs that participation).

 The fact of the matter is that Nuget puts packages in the hands of .NET
 developers, that leads to exposure and regardless of personal opinions on
 whether or not they *like* Nuget, it can't be denied that it's an
 *extremely* popular way to get libraries into people's projects.

 If you want to quibble over the actual numbers (and the definition of
 extremely popular) then that's fine, but here are the numbers you want:

 http://stats.nuget.org/

 If you want to just tell that audience to take a leap, that's fine, but I
 think it would be foolish to do so otherwise.

 Additionally, given that Lucene.NET is already on Nuget, isn't there *any*
 concern that there isn't an official distro?  Aren't you concerned about the
 integrity of the brand that so many of you fought to keep alive over the
 past year?  There's no guarantee that what's on Nuget will be the official
 releases/builds that come out of this project, and I'm a little surprised
 there isn't more concern over that aspect either.

 Just my $0.02

 - Nick

 -Original Message-
 From: Digy [mailto:digyd...@gmail.com]
 Sent: Wednesday, September 21, 2011 7:06 PM
 To: lucene-net-dev@lucene.apache.org
 Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

 I am not against it, but personally think it as a toy.
 I am from the generation where people used vi to write codes.

 DIGY

 -Original Message-
 From: Aaron Powell [mailto:m...@aaron-powell.com]
 Sent: Thursday, September 22, 2011 1:56 AM
 To: lucene-net-dev@lucene.apache.org
 Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

 Any particular reason you guys are not interested in NuGet?

 Aaron Powell
 MVP - Internet Explorer (Development) | FunnelWeb Team Member

 http://apowell.me | http://twitter.com/slace | Skype: aaron.l.powell |
 Github | BitBucket


 -Original Message-
 From: Digy [mailto:digyd...@gmail.com]
 Sent: Thursday, 22 September 2011 7:42 AM
 To: 

[jira] [Commented] (SOLR-2787) add external http: include file reference for .htaccess processing

2011-09-21 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109291#comment-13109291
 ] 

Hoss Man commented on SOLR-2787:


i honestly have no idea what this request is for.

an external link directive to an external http: file that supplies a 
(.htaccess compatible) list of known bad bot sites  that solr should do 
what with exactly?

when/how/why should solr use this (user maintained?) list of bad sites?

what is the goal?

 add external http: include file reference for .htaccess processing
 --

 Key: SOLR-2787
 URL: https://issues.apache.org/jira/browse/SOLR-2787
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 3.4
 Environment: All operating systems
Reporter: Mark Dickensob
  Labels: Spam, killer
   Original Estimate: 504h
  Remaining Estimate: 504h

 Include an external link directive to an external http: file that supplies a 
 (.htaccess compatible) list of known bad bot sites.
 ie common resource for spam kill list site(s)
 Personally, I run a portal and I think that this feature is important to kill 
 spam!
 I will supply the files for testing if you need them.
 Mark goan.com

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1895) ManifoldCF SearchComponent plugin for enforcing ManifoldCF security at search time

2011-09-21 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109292#comment-13109292
 ] 

Karl Wright commented on SOLR-1895:
---

bq. The core of this path is an allow/deny matrix to lucene Query; this is 
applicable to many security strategies not just manifold.  My hope with 
introducing the AccessTokenService is to separate the user-to-token mapping

I agree - there should be a unified framework to the degree feasible.  This 
would allow common testing and reasonable maintenance across Lucene and Solr 
versions for the future.

For ManifoldCF, there's also an unrelated release-engineering question, 
specifically for the ManifoldCF-specific portion of the proposal, which is why 
we'd think introducing a code dependency on something like Solr/Lucene would be 
a good idea, especially since we'd be building a jar specifically for 
deployment within Solr.  We do this reluctantly for a couple of other 
connectors but it's a complete one-off each time and requires a great deal of 
work by end users.  This inconvenience greatly impacts the level of deployment 
of the affected connectors.  Since Solr is Apache licensed we could make this 
easier in Solr's case, but probably not without redistributing a specific 
version of Solr and Lucene, and providing build targets which fire up an 
already configured Solr/Lucene instance.  We would need this also for testing, 
if the plugin code lived in ManifoldCF.  It is also the case that the current 
ManifoldCF search component needed significant rework even to build between 
version 3.x and version 4.x, because many of the classes that were necessary 
changed their packages.  Thus we'd need to redistribute more than one 
Solr/Lucene instance, and release perhaps twice as frequently to keep up.

Given all that, does everyone still think it is desirable for ManifoldCF to 
build Solr components itself?  The alternative would be a Solr contrib module, 
which I'd be very happy with.  To me, it is the obvious choice if you want a 
straightforward overall user experience.  The underlying http-based protocol 
that the component will need to use is well-defined, quite complete, and is 
unlikely to change.



 ManifoldCF SearchComponent plugin for enforcing ManifoldCF security at search 
 time
 --

 Key: SOLR-1895
 URL: https://issues.apache.org/jira/browse/SOLR-1895
 Project: Solr
  Issue Type: New Feature
  Components: SearchComponents - other
Reporter: Karl Wright
  Labels: document, security, solr
 Fix For: 3.5, 4.0

 Attachments: LCFSecurityFilter.java, LCFSecurityFilter.java, 
 LCFSecurityFilter.java, LCFSecurityFilter.java, 
 SOLR-1895-service-plugin.patch, SOLR-1895-service-plugin.patch, 
 SOLR-1895.patch, SOLR-1895.patch, SOLR-1895.patch, SOLR-1895.patch, 
 SOLR-1895.patch, SOLR-1895.patch


 I've written an LCF SearchComponent which filters returned results based on 
 access tokens provided by LCF's authority service.  The component requires 
 you to configure the appropriate authority service URL base, e.g.:
   <!-- LCF document security enforcement component -->
   <searchComponent name="lcfSecurity" class="LCFSecurityFilter">
     <str name="AuthorityServiceBaseURL">http://localhost:8080/lcf-authority-service</str>
   </searchComponent>
 Also required are the following schema.xml additions:
   <!-- Security fields -->
   <field name="allow_token_document" type="string" indexed="true" stored="false" multiValued="true"/>
   <field name="deny_token_document" type="string" indexed="true" stored="false" multiValued="true"/>
   <field name="allow_token_share" type="string" indexed="true" stored="false" multiValued="true"/>
   <field name="deny_token_share" type="string" indexed="true" stored="false" multiValued="true"/>
 Finally, to tie it into the standard request handler, it seems to need to run 
 last:
   <requestHandler name="standard" class="solr.SearchHandler" default="true">
     <arr name="last-components">
       <str>lcfSecurity</str>
     </arr>
     ...
 I have not set a package for this code.  Nor have I been able to get it 
 reviewed by someone as conversant with Solr as I would prefer.  It is my 
 hope, however, that this module will become part of the standard Solr 1.5 
 suite of search components, since that would tie it in with LCF nicely.
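
The filtering described above can be sketched in a few lines of Python. This is only an illustration of one plausible allow/deny rule over the four schema fields, not the logic of the attached LCFSecurityFilter.java. It assumes a document is visible when, at both the share and the document level, the user holds none of the deny tokens and holds at least one allow token (or the level carries no allow tokens at all). The function names and the sample tokens are hypothetical.

```python
def level_permits(user_tokens, allow_tokens, deny_tokens):
    """Check one security level (share or document) for one user."""
    if user_tokens & set(deny_tokens):
        return False  # any matching deny token blocks access
    if not allow_tokens:
        return True   # assumption: no allow tokens means the level is unsecured
    return bool(user_tokens & set(allow_tokens))


def is_visible(doc, user_tokens):
    """Apply the share-level and document-level checks to an indexed document."""
    user_tokens = set(user_tokens)
    return all(
        level_permits(
            user_tokens,
            doc.get("allow_token_" + level, []),
            doc.get("deny_token_" + level, []),
        )
        for level in ("share", "document")
    )


if __name__ == "__main__":
    doc = {
        "allow_token_share": ["group:staff"],
        "allow_token_document": ["group:staff", "user:alice"],
        "deny_token_document": ["user:bob"],
    }
    # alice holds an allow token at both levels
    print(is_visible(doc, ["group:staff", "user:alice"]))
    # bob is named in the document-level deny list
    print(is_visible(doc, ["group:staff", "user:bob"]))
    # carol holds no share-level allow token
    print(is_visible(doc, ["user:carol"]))
```

In the search component itself the same rule would presumably be expressed as a Lucene filter query over the indexed token fields rather than evaluated per document in application code.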




[jira] [Issue Comment Edited] (SOLR-1895) ManifoldCF SearchComponent plugin for enforcing ManifoldCF security at search time

2011-09-21 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109292#comment-13109292
 ] 

Karl Wright edited comment on SOLR-1895 at 9/21/11 6:06 AM:


bq. The core of this path is an allow/deny matrix to lucene Query; this is 
applicable to many security strategies not just manifold.  My hope with 
introducing the AccessTokenService is to separate the user-to-token mapping

I agree - there should be a unified framework to the degree feasible.  This 
would allow common testing and reasonable maintenance across Lucene and Solr 
versions for the future.

For ManifoldCF, there's also an unrelated release-engineering question, 
specifically for the ManifoldCF-specific portion of the proposal.  I don't 
understand why we'd believe that introducing a code dependency on something 
like Solr/Lucene would be a good idea, especially since we'd be building a jar 
specifically for deployment within Solr.  We do this reluctantly for a couple 
of other connectors but it's a complete one-off each time and always requires a 
great deal of work by end users.  This inconvenience greatly impacts the level 
of deployment of the affected connectors.  Since Solr is Apache licensed we 
could make this easier in Solr's case, but probably not without redistributing 
a specific version of Solr and Lucene, and providing build targets which fire 
up an already configured Solr/Lucene instance.  We would need this also for 
testing, if the plugin code lived in ManifoldCF.  It is also the case that the 
current ManifoldCF search component needed significant rework even to build 
between version Lucene/Solr 3.x and version 4.x, because many of the classes 
that were used changed their packages.  Thus we'd likely need to redistribute 
more than one Solr/Lucene instance at a time, and release perhaps twice as 
frequently as we currently do just to keep up with the Solr/Lucene release 
schedule.

Given all that, does everyone still think it is desirable for ManifoldCF to 
build Solr components itself?  The alternative would be a Solr contrib module, 
which I'd be very happy with.  To me, it is the obvious choice if you want a 
straightforward overall user experience.  The underlying http-based protocol 
that the component will need to use is well-defined, quite complete, and is 
unlikely to change.  The required dependencies (commons-httpclient) are already 
redistributed by Solr, so that shouldn't be a problem either.



  was (Author: kwri...@metacarta.com):
bq. The core of this path is an allow/deny matrix to lucene Query; this is 
applicable to many security strategies not just manifold.  My hope with 
introducing the AccessTokenService is to separate the user-to-token mapping

I agree - there should be a unified framework to the degree feasible.  This 
would allow common testing and reasonable maintenance across Lucene and Solr 
versions for the future.

For ManifoldCF, there's also an unrelated release-engineering question, 
specifically for the ManifoldCF-specific portion of the proposal, which is why 
we'd think introducing a code dependency on something like Solr/Lucene would be 
a good idea, especially since we'd be building a jar specifically for 
deployment within Solr.  We do this reluctantly for a couple of other 
connectors but it's a complete one-off each time and requires a great deal of 
work by end users.  This inconvenience greatly impacts the level of deployment 
of the affected connectors.  Since Solr is Apache licensed we could make this 
easier in Solr's case, but probably not without redistributing a specific 
version of Solr and Lucene, and providing build targets which fire up an 
already configured Solr/Lucene instance.  We would need this also for testing, 
if the plugin code lived in ManifoldCF.  It is also the case that the current 
ManifoldCF search component needed significant rework even to build between 
version 3.x and version 4.x, because many of the classes that were necessary 
changed their packages.  Thus we'd need to redistribute more than one 
Solr/Lucene instance, and release perhaps twice as frequently to keep up.

Given all that, does everyone still think it is desirable for ManifoldCF to 
build Solr components itself?  The alternative would be a Solr contrib module, 
which I'd be very happy with.  To me, it is the obvious choice if you want a 
straightforward overall user experience.  The underlying http-based protocol 
that the component will need to use is well-defined, quite complete, and is 
unlikely to change.


  

[jira] [Commented] (SOLR-2787) add external http: include file reference for .htaccess processing

2011-09-21 Thread Mark Dickensob (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109297#comment-13109297
 ] 

Mark Dickensob commented on SOLR-2787:
--

Yes, it is a goal!!!

Obviously you don't run a big Apache site (no offence).
Here is the list of bad bots I have so far in .htaccess.
I can make this file available for Apache server users.
If I am in the wrong group, let me know where I can lodge this request, PLEASE!
 

# Kill bad bots
# RewriteCond %{HTTP_USER_AGENT} ^Web-sniffer/1 [OR]
RewriteCond %{HTTP_REFERER} ^AEE- [OR]
RewriteCond %{HTTP_USER_AGENT} ^Apache-HttpClient [OR]
RewriteCond %{HTTP_USER_AGENT} ^Atomic_Email_Hunter [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craft...@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^CakePHP [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^BDFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DomainWatcher [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EMail\ Exractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Fetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Gigabot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^Huawei [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^IlTrovatore [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Infoseek\ SideWinder [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^Jakarta [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^jikespider [OR] 
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla-Firefox-Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^MyApp [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Nimo\ Software [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^Python-urllib [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^swish-e [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE 
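
A shared list like this could be kept as plain user-agent prefixes and turned into rewrite conditions mechanically. The Python sketch below is an assumption about how that might be done, not something proposed in the issue. The function name is hypothetical, the closing `RewriteRule .* - [F,L]` (answer 403 Forbidden) is added because the list above stops before any rule, and only spaces are escaped, so prefixes containing other regex metacharacters would need extra care.

```python
def build_bot_rules(agent_prefixes):
    """Render user-agent prefixes as a mod_rewrite block that returns 403."""
    prefixes = [p for p in agent_prefixes if p.strip()]
    if not prefixes:
        return ""
    lines = ["# Kill bad bots", "RewriteEngine On"]
    for index, prefix in enumerate(prefixes):
        # Spaces must be escaped inside a RewriteCond pattern.
        pattern = "^" + prefix.replace(" ", "\\ ")
        # Every condition except the last is chained with [OR].
        flag = " [OR]" if index < len(prefixes) - 1 else ""
        lines.append("RewriteCond %{HTTP_USER_AGENT} " + pattern + flag)
    lines.append("RewriteRule .* - [F,L]")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_bot_rules(["BlackWidow", "Download Demon", "WWWOFFLE"]))
```

Generating the block this way keeps the `[OR]` chaining correct when entries are added or removed, which is easy to get wrong when editing a long list by hand.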

[jira] [Commented] (SOLR-1895) ManifoldCF SearchComponent plugin for enforcing ManifoldCF security at search time

2011-09-21 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109301#comment-13109301
 ] 

Erik Hatcher commented on SOLR-1895:


bq. The purpose of a QueryParser is to parse the query... but this does not 
require any parsing.

Ryan - how about the term query parser?  While not strictly taking a free form 
query string and parsing it into a Query, the general QParserPlugin is about 
being a Query factory taking whatever inputs it needs to construct that; 
parser is a bit of a misnomer with what the abstraction really defines.  [I 
didn't understand the comment about MatchAllDocsQuery earlier either, as that 
doesn't seem necessary here]

bq. I think the bigger question is do we want any security scaffolding in solr, 
or is this something that should always be delegated elsewhere

In this case, it really boils down to generating a handful of wildcard queries, 
it looks like, but in an MCF-specific way.   I'm not sure this is, yet, a 
pressing need to generalize a security framework within Solr, as it's _just_ a 
Query generator.

Regarding the location of this capability - a Solr contrib works for me.  It's 
tricky business deciding where to put glue code between two projects (e.g. MCF 
contains a Solr indexer, using this same logic, though, why shouldn't it also 
be in a Solr contrib/mcf too?).  Perhaps the real deciding factor is a 
practical choice of where the maintainers of this best can work on it - and in 
this case it'd be MCF so that that community can maintain it directly rather 
than through JIRA patches and committers that aren't using MCF.  But again 
though, in this case I'm fine with it living in Solr contrib/mcf.



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2787) add external http: include file reference for .htaccess processing

2011-09-21 Thread Mark Dickensob (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109308#comment-13109308
 ] 

Mark Dickensob commented on SOLR-2787:
--

Also bad IP addresses

# Harvester Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)- Russia
deny from 31.184.238.
# Discobot
deny from 38.101.148.126
# Harvester Washington, United States
deny from 38.127.197.104
# Harvester Ukraine
deny from 46.211.205.71
# Harvester Seattle, United States
deny from 50.17.81.237
# Harvester Xiamen, China
deny from 58.23.252.136
# Harvester Great Britain
deny from 62.128.150.15
# Hacker New York, United States
deny from 66.114.72.9
# Google!!!
# deny from 66.249.71
# Harvester Massapequa, United States
deny from 68.194.246.194
# Harvester Lake Orion, United States
deny from 71.238.32.52
# Harvester San Marcos, United States
deny from 72.199.108.105
# Hacker Russia
deny from 77.221.130.4
# Harvester Germany
deny from 79.143.182.232
# Harvester Germany
deny from 79.143.182.232
# Sheffield, Great Britain
deny from 81.105.137.203
# Harvester Israel
deny from 82.166.235.
# Hacker Höst, Germany
deny from 83.169.6.156
# Harvester Netherlands
deny from 85.17.147.193
# Harvester Netherlands
deny from 85.201.16.158
# Harvester France
deny from 87.98.187.40
# Harvester Spain
deny from 87.98.228.22
# Hacker Bulgaria
deny from 87.120.106.5
# Harvester Zdar Nad Sazavou, Czech Republic
deny from 90.180.139.29
# Harvester London, Great Britain
deny from 90.194.19.
# Harvester London, Great Britain
deny from 90.214.146.214
# Hacker Russian Federation
deny from 91.195.124.8
# Harvester Netherlands
deny from 93.190.136.5
# Harvester Italy
deny from 94.23.65.72
# Hacker Bulgaria
deny from 94.26.53.6
# Harvester Valencia, Spain
deny from 95.19.216.61
# Harvester Germany
deny from 95.169.160.
# Amsterdam, Netherlands
deny from 95.211.73.195
deny from trygoclio.com
# Hacker El Segundo, United States
deny from 96.46.227.5
# Harvester United States
deny from 98.174.196.217
# Harvester United States
deny from 108.27.42.190
# Fake Googlebot - Russia
deny from 109.86.225.205
# Harvester Tel Aviv, Israel
deny from 109.64.34.186
# Harvester Great Britain
deny from 109.104.92.118
# Harvester China
deny from 111.162.201.111
# Harvester China
deny from 113.104.242.61
# Hacker Chinanet
deny from 122.225.0.170
# Hacker Chinanet
deny from 124.115.1.
# Hacker Englewood, United States
deny from 130.94.69.217
# Harvester Scranton, United States
deny from 173.212.244.106
# Spectrum Adaptive Spider
deny from 174.127.132
# Harvester China
deny from 175.44.8.36
# Harvester Netherlands
deny from 178.239.58.144
# Harvester São Paulo, Brazil
deny from 201.95.81.134
# Atlanta, United States
deny from 205.251.153.164
# Hacker USA
deny from 208.79.212.174
# Ezooms
deny from 208.115.111.67
# Harvester USA
deny from 209.18.124.32
# Harvester Columbus, United States
deny from 209.190.28.178
# Sitebot
deny from 212.113.35.162
# Harvester United States, Kill subdomain
deny from 212.124.113
# Hacker Great Britain
deny from 213.40.79.217
# Harvester Spain
deny from 213.149.247.102
# Beijing Harvester
deny from 222.187.199.37



 add external http: include file reference for .htaccess processing
 --

 Key: SOLR-2787
 URL: https://issues.apache.org/jira/browse/SOLR-2787
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 3.4
 Environment: All operating systems
Reporter: Mark Dickensob
  Labels: Spam, killer
   Original Estimate: 504h
  Remaining Estimate: 504h

 Include an external link directive to an external http: file that supplies a 
 (.htaccess compatible) list of known bad bot sites.
 ie common resource for spam kill list site(s)
 Personally, I run a portal and I think that this feature is important to kill 
 spam!
 I will supply the files for testing if you need them.
 Mark goan.com




[jira] [Commented] (SOLR-1895) ManifoldCF SearchComponent plugin for enforcing ManifoldCF security at search time

2011-09-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109315#comment-13109315
 ] 

Jan Høydahl commented on SOLR-1895:
---

{quote}
bq. I think the bigger question is do we want any security scaffolding in solr, 
or is this something that should always be delegated elsewhere
In this case, it really boils down to generating a handful of wildcard queries, 
it looks like, but in an MCF-specific way. I'm not sure this is, yet, a 
pressing need to generalize a security framework within Solr, as it's just a 
Query generator.
{quote}

Both fq and SearchComponent would work for early binding, but when we want to 
extend the model with an (optional) late binding, i.e. filtering search 
results, fq won't cut it. A SearchComponent however can be extended not only to 
handle early+late binding but also any other strange requirements there may be 
regarding security, such as authentication by IP address, peeking at other 
parameters, modifying the request (or response) in some way etc. These would 
fit as plugins to the Security SearchComponent just as AccessTokenServices (for 
early-binding) are in current design.

I'm +1 for starting to include some built-in framework support for security, 
else I think we'll start seeing a multitude of different ways to integrate 
security which is not a competitive advantage for Solr. A SC is itself only a 
plugin anyway so we don't enforce anything on people, but I think it makes a 
huge difference that it's a plugin which ships with Solr rather than each 
connector having its own not-up-to-date security mechanism floating around.

In Real Life™ a deployment may include a mix of MCF and non-MCF connectors; in 
fact we have two customers in that situation already. The ideal would be to 
move everything to MCF but that might not be possible due to a custom or more 
fine-grained security model. Such a special case is also easier to handle with 
an SC - I don't see how to add code to merge/unify two (possibly 3rd party) 
QParsers, except by creating a new umbrella one.

We'll keep the core layer generic and thin. AccessTokenSecurityComponent and 
AccessTokenService (which should perhaps be an Interface instead) go in core, 
while ManifoldCFAccessTokenService and others may live wherever most 
convenient. I, for one, would be interested in maintaining some of these 
classes, and also adding a Velocity demo of it all.
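
A rough sketch of how the thin core layer described above might look for the
early-binding case. Only the names AccessTokenService, allow_token_document and
deny_token_document come from this thread. The tokensFor method, the
EarlyBindingFilterSketch class, and the rule "match any allow token and no deny
token" are assumptions made for illustration, not the actual ManifoldCF or Solr
logic. Share-level tokens are left out to keep it short.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical interface: the thread names the type but not its methods. */
interface AccessTokenService {
    List<String> tokensFor(String authenticatedUser);
}

class EarlyBindingFilterSketch {

    /** Quotes a token so it can be used as a phrase in Lucene query syntax. */
    static String quote(String token) {
        return "\"" + token.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds a filter query string (assumed rule: any allow token, no deny token). */
    static String buildFilter(List<String> tokens) {
        if (tokens.isEmpty()) {
            throw new IllegalArgumentException("no access tokens; caller should deny access");
        }
        String anyToken = tokens.stream()
                .map(EarlyBindingFilterSketch::quote)
                .collect(Collectors.joining(" OR ", "(", ")"));
        return "+allow_token_document:" + anyToken + " -deny_token_document:" + anyToken;
    }

    public static void main(String[] args) {
        // Stand-in service; a ManifoldCF-backed one would call the authority service.
        AccessTokenService service = user -> Arrays.asList("u:" + user, "g:staff");
        System.out.println(buildFilter(service.tokensFor("alice")));
    }
}
```

A late-binding step would instead inspect the returned documents after the
search, which is why a filter string alone cannot cover that case.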

That was my +1 for SearchComponent :)

@Ryan, that's true, we only need to be concerned with authenticated user, the 
Velocity demo tab could simulate the rest.

 ManifoldCF SearchComponent plugin for enforcing ManifoldCF security at search 
 time
 --

 Key: SOLR-1895
 URL: https://issues.apache.org/jira/browse/SOLR-1895
 Project: Solr
  Issue Type: New Feature
  Components: SearchComponents - other
Reporter: Karl Wright
  Labels: document, security, solr
 Fix For: 3.5, 4.0

 Attachments: LCFSecurityFilter.java, LCFSecurityFilter.java, 
 LCFSecurityFilter.java, LCFSecurityFilter.java, 
 SOLR-1895-service-plugin.patch, SOLR-1895-service-plugin.patch, 
 SOLR-1895.patch, SOLR-1895.patch, SOLR-1895.patch, SOLR-1895.patch, 
 SOLR-1895.patch, SOLR-1895.patch


 I've written an LCF SearchComponent which filters returned results based on 
 access tokens provided by LCF's authority service.  The component requires 
 you to configure the appropriate authority service URL base, e.g.:
   <!-- LCF document security enforcement component -->
   <searchComponent name="lcfSecurity" class="LCFSecurityFilter">
     <str name="AuthorityServiceBaseURL">http://localhost:8080/lcf-authority-service</str>
   </searchComponent>
 Also required are the following schema.xml additions:
   <!-- Security fields -->
   <field name="allow_token_document" type="string" indexed="true" stored="false" multiValued="true"/>
   <field name="deny_token_document" type="string" indexed="true" stored="false" multiValued="true"/>
   <field name="allow_token_share" type="string" indexed="true" stored="false" multiValued="true"/>
   <field name="deny_token_share" type="string" indexed="true" stored="false" multiValued="true"/>
 Finally, to tie it into the standard request handler, it seems to need to run 
 last:
   <requestHandler name="standard" class="solr.SearchHandler" default="true">
     <arr name="last-components">
       <str>lcfSecurity</str>
     </arr>
     ...
 I have not set a package for this code.  Nor have I been able to get it 
 reviewed by someone as conversant with Solr as I would prefer.  It is my 
 hope, however, that this module will become part of the standard Solr 1.5 
 suite of search components, since that would tie it in with LCF nicely.



[jira] [Updated] (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2011-09-21 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-1979:
--

Attachment: SOLR-1979.patch

Fixed java.lang.IndexOutOfBoundsException bug in resolveLanguage() when no 
languages detected. Added more corner case tests.
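
The patch itself is not quoted in this message, so the following is only a guess
at the kind of guard involved. The signature of resolveLanguage, the fallback
parameter, and the class name are hypothetical; the point is simply that an
empty detection result must be handled before the first element is read.

```java
import java.util.List;

class ResolveLanguageSketch {

    /**
     * Returns the first detected language, or the fallback when nothing was
     * detected. Calling detected.get(0) on an empty list is what would raise
     * java.lang.IndexOutOfBoundsException.
     */
    static String resolveLanguage(List<String> detected, String fallback) {
        if (detected == null || detected.isEmpty()) {
            return fallback;
        }
        return detected.get(0);
    }
}
```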

 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: contrib - LangId, update
Reporter: Jan Høydahl
Assignee: Jan Høydahl
Priority: Minor
  Labels: UpdateProcessor
 Fix For: 3.5, 4.0

 Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, 
 SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, 
 SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, 
 SOLR-1979.patch


 Language identification from document fields, and mapping of field names to 
 language-specific fields based on detected language.
 Wrap the Tika LanguageIdentifier in an UpdateProcessor.
 See user documentation at http://wiki.apache.org/solr/LanguageDetection




[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-21 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109332#comment-13109332
 ] 

Uwe Schindler commented on LUCENE-3390:
---

Doron: That's exactly the problem. This easy use case is problematic:

You allow sorting by Price. The user can switch between forward and backward 
sorting. In all cases, you want all articles without a price at the beginning. 
To achieve this, you have to set the price value e.g. to negative_infinity for 
the forward sorting, but positive_infinity for backwards sorting. If now two 
users are using your user interface in parallel, they collide.

The fix used here is identical to Lucene trunk and we should keep the code 
similar. FieldComparator is now almost identical between trunk and 3.x (except 
the new BytesRef/Docvalues stuff in trunk).

Thinking more about it: Another approach (also possible for trunk) is to 
supply the missing value to FieldCache.getXxx(). The FieldCache would then first 
use Arrays.fill() to populate the FieldCache array with the default value and 
after that populate the index values. The drawback is that you get a separate 
FieldCache entry for each distinct missing value. For the above use case, you 
would have two float/double price caches.
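
To make the trade-off concrete, here is a minimal plain-Java sketch of that
alternative. MissingValueCacheSketch and its methods are hypothetical stand-ins
and are not Lucene's FieldCache API; the map of indexed values replaces the real
index. It fills each array with the requested missing value first, then
overwrites the documents that actually have a value, and keeps one array per
distinct missing value.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a per-field value cache; not Lucene's FieldCache. */
class MissingValueCacheSketch {
    private final Map<Double, double[]> cacheByMissingValue = new HashMap<>();
    private final int maxDoc;
    private final Map<Integer, Double> indexedValues; // docId -> indexed value

    MissingValueCacheSketch(int maxDoc, Map<Integer, Double> indexedValues) {
        this.maxDoc = maxDoc;
        this.indexedValues = indexedValues;
    }

    /** Returns the value array for the given missing value, building it once. */
    double[] getDoubles(double missingValue) {
        return cacheByMissingValue.computeIfAbsent(missingValue, mv -> {
            double[] values = new double[maxDoc];
            Arrays.fill(values, mv);                            // default for every doc
            indexedValues.forEach((doc, v) -> values[doc] = v); // then the real values
            return values;
        });
    }

    /** Number of arrays held: one per distinct missing value requested. */
    int cachedArrays() {
        return cacheByMissingValue.size();
    }
}
```

Requesting the prices once with negative infinity and once with positive
infinity leaves two arrays in the cache, which is the drawback described above.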

 Incorrect sort by Numeric values for documents missing the sorting field
 

 Key: LUCENE-3390
 URL: https://issues.apache.org/jira/browse/LUCENE-3390
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.3
Reporter: Gilad Barkai
Assignee: Doron Cohen
Priority: Minor
  Labels: double, float, int, long, numeric, sort
 Fix For: 3.4

 Attachments: LUCENE-3390-fix-like-trunk.patch, 
 LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, 
 LUCENE-3390.patch, SortByDouble.java


 While sorting results over a numeric field, documents which do not contain a 
 value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested 
 against Double, Float, Int & Long numeric fields ascending and descending 
 order).
 This behavior is unexpected, as zero is comparable to the rest of the 
 values. A better solution would either be allowing the user to define such a 
 non-value default, or always bring those document results as the last ones.
 Example scenario:
 Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any 
 value.
 Searching with MatchAllDocsQuery, with sort over that field in descending 
 order yields the docid results of 0, 2, 1.
 Asking for the top 2 documents brings the document without any value as the 
 2nd result - which seems as a bug?
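
The example scenario can be reproduced without Lucene. The sketch below is plain
Java and uses none of Lucene's classes; MissingValueSortDemo and sortDescending
are names invented for this illustration. It sorts document ids by value in
descending order, substituting a configurable value for documents that have
none.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Plain-Java illustration (no Lucene classes) of the reported ordering. */
class MissingValueSortDemo {

    /**
     * Returns doc ids sorted by value in descending order.
     * A null entry means "document has no value"; missingValue is used instead.
     */
    static Integer[] sortDescending(Double[] values, double missingValue) {
        Integer[] docIds = new Integer[values.length];
        for (int i = 0; i < docIds.length; i++) {
            docIds[i] = i;
        }
        Comparator<Integer> byValue = Comparator.comparingDouble(
                (Integer doc) -> values[doc] == null ? missingValue : values[doc]);
        Arrays.sort(docIds, byValue.reversed());
        return docIds;
    }

    public static void main(String[] args) {
        Double[] values = {3.5, -10.0, null}; // doc 0, doc 1, doc 2 (no value)
        // Missing treated as 0: the valueless doc lands between the other two.
        System.out.println(Arrays.toString(sortDescending(values, 0.0))); // [0, 2, 1]
        // Missing treated as negative infinity: the valueless doc comes last.
        System.out.println(Arrays.toString(
                sortDescending(values, Double.NEGATIVE_INFINITY))); // [0, 1, 2]
    }
}
```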




[jira] [Issue Comment Edited] (SOLR-2787) add external http: include file reference for .htaccess processing

2011-09-21 Thread Mark Dickensob (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109297#comment-13109297
 ] 

Mark Dickensob edited comment on SOLR-2787 at 9/21/11 8:28 AM:
---

Message Yes it is a goal!!!
 
Obviously you don't run a big Apache site (no offence).
Here is the list of bad bots I have so far in .htaccess.
I can make this file available for Apache server users.
If I am in the wrong group, let me know where I can lodge this request PLEASE!
 

# Kill bad bots
# RewriteCond %{HTTP_USER_AGENT} ^Web-sniffer/1 [OR]
RewriteCond %{HTTP_REFERER} ^AEE- [OR]
RewriteCond %{HTTP_USER_AGENT} ^Apache-HttpClient [OR]
RewriteCond %{HTTP_USER_AGENT} ^Atomic_Email_Hunter [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craft...@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^CakePHP [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^BDFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DomainWatcher [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EMail\ Exractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Fetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Gigabot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^Huawei [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^IlTrovatore [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Infoseek\ SideWinder [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^Jakarta [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^jikespider [OR] 
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla-Firefox-Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^MyApp [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Nimo\ Software [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^Python-urllib [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^swish-e [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} 

[jira] [Issue Comment Edited] (SOLR-2787) add external http: include file reference for .htaccess processing

2011-09-21 Thread Mark Dickensob (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109297#comment-13109297
 ] 

Mark Dickensob edited comment on SOLR-2787 at 9/21/11 8:29 AM:
---

Message Yes it is a goal!!!
 
Obviously you don't run a big Apache site (no offence).
Here is the list of bad bots I have so far in .htaccess.
I can make this file available for Apache server users via a .htaccess 
directive.
If I am in the wrong group, let me know where I can lodge this request PLEASE!
 

# Kill bad bots
# RewriteCond %{HTTP_USER_AGENT} ^Web-sniffer/1 [OR]
RewriteCond %{HTTP_REFERER} ^AEE- [OR]
RewriteCond %{HTTP_USER_AGENT} ^Apache-HttpClient [OR]
RewriteCond %{HTTP_USER_AGENT} ^Atomic_Email_Hunter [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craft...@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^CakePHP [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^BDFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DomainWatcher [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EMail\ Exractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Fetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Gigabot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^Huawei [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^IlTrovatore [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Infoseek\ SideWinder [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^Jakarta [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^jikespider [OR] 
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla-Firefox-Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^MyApp [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Nimo\ Software [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^Python-urllib [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^swish-e [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond 

[jira] [Issue Comment Edited] (SOLR-2787) add external http: include file reference for .htaccess processing

2011-09-21 Thread Mark Dickensob (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109308#comment-13109308
 ] 

Mark Dickensob edited comment on SOLR-2787 at 9/21/11 8:32 AM:
---

Also bad IP addresses

# Harvester Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)- Russia
deny from 31.184.238.
# Harvester Washington, United States
deny from 38.127.197.104
# Harvester Ukraine
deny from 46.211.205.71
# Harvester Seattle, United States
deny from 50.17.81.237
# Harvester Xiamen, China
deny from 58.23.252.136
# Harvester Great Britain
deny from 62.128.150.15
# Hacker New York, United States
deny from 66.114.72.9
# deny from 66.249.71
# Harvester Massapequa, United States
deny from 68.194.246.194
# Harvester Lake Orion, United States
deny from 71.238.32.52
# Harvester San Marcos, United States
deny from 72.199.108.105
# Hacker Russia
deny from 77.221.130.4
# Harvester Germany
deny from 79.143.182.232
# Harvester Germany
deny from 79.143.182.232
# Sheffield, Great Britain
deny from 81.105.137.203
# Harvester Israel
deny from 82.166.235.
# Hacker Höst, Germany
deny from 83.169.6.156
# Harvester Netherlands
deny from 85.17.147.193
# Harvester Netherlands
deny from 85.201.16.158
# Harvester France
deny from 87.98.187.40
# Harvester Spain
deny from 87.98.228.22
# Hacker Bulgaria
deny from 87.120.106.5
# Harvester Zdar Nad Sazavou, Czech Republic
deny from 90.180.139.29
# Harvester London, Great Britain
deny from 90.194.19.
# Harvester London, Great Britain
deny from 90.214.146.214
# Hacker Russian Federation
deny from 91.195.124.8
# Harvester Netherlands
deny from 93.190.136.5
# Harvester Italy
deny from 94.23.65.72
# Hacker Bulgaria
deny from 94.26.53.6
# Harvester Valencia, Spain
deny from 95.19.216.61
# Harvester Germany
deny from 95.169.160.
# Amsterdam, Netherlands
deny from 95.211.73.195
deny from trygoclio.com
# Hacker El Segundo, United States
deny from 96.46.227.5
# Harvester United States
deny from 98.174.196.217
# Harvester United States
deny from 108.27.42.190
# Fake Googlebot - Russia
deny from 109.86.225.205
# Harvester Tel Aviv, Israel
deny from 109.64.34.186
# Harvester Great Britain
deny from 109.104.92.118
# Harvester China
deny from 111.162.201.111
# Harvester China
deny from 113.104.242.61
# Hacker Chinanet
deny from 122.225.0.170
# Hacker Chinanet
deny from 124.115.1.
# Hacker Englewood, United States
deny from 130.94.69.217
# Harvester Scranton, United States
deny from 173.212.244.106
# Spectrum Adaptive Spider
deny from 174.127.132
# Harvester China
deny from 175.44.8.36
# Harvester Netherlands
deny from 178.239.58.144
# Harvester São Paulo, Brazil
deny from 201.95.81.134
# Atlanta, United States
deny from 205.251.153.164
# Hacker USA
deny from 208.79.212.174
# Ezooms
deny from 208.115.111.67
# Harvester USA
deny from 209.18.124.32
# Harvester Columbus, United States
deny from 209.190.28.178
# Sitebot
deny from 212.113.35.162
# Harvester United States, Kill subdomain
deny from 212.124.113
# Hacker Great Britain
deny from 213.40.79.217
# Harvester Spain
deny from 213.149.247.102
# Beijing Harvester
deny from 222.187.199.37



  was (Author: goan69):
Also bad IP addresses

# Harvester Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)- Russia
deny from 31.184.238.
# Discobot
deny from 38.101.148.126
# Harvester Washington, United States
deny from 38.127.197.104
# Harvester Ukraine
deny from 46.211.205.71
# Harvester Seattle, United States
deny from 50.17.81.237
# Harvester Xiamen, China
deny from 58.23.252.136
# Harvester Great Britain
deny from 62.128.150.15
# Hacker New York, United States
deny from 66.114.72.9
# Google!!!
# deny from 66.249.71
# Harvester Massapequa, United States
deny from 68.194.246.194
# Harvester Lake Orion, United States
deny from 71.238.32.52
# Harvester San Marcos, United States
deny from 72.199.108.105
# Hacker Russia
deny from 77.221.130.4
# Harvester Germany
deny from 79.143.182.232
# Harvester Germany
deny from 79.143.182.232
# Sheffield, Great Britain
deny from 81.105.137.203
# Harvester Israel
deny from 82.166.235.
# Hacker Höst, Germany
deny from 83.169.6.156
# Harvester Netherlands
deny from 85.17.147.193
# Harvester Netherlands
deny from 85.201.16.158
# Harvester France
deny from 87.98.187.40
# Harvester Spain
deny from 87.98.228.22
# Hacker Bulgaria
deny from 87.120.106.5
# Harvester Zdar Nad Sazavou, Czech Republic
deny from 90.180.139.29
# Harvester London, Great Britain
deny from 90.194.19.
# Harvester London, Great Britain
deny from 90.214.146.214
# Hacker Russian Federation
deny from 91.195.124.8
# Harvester Netherlands
deny from 93.190.136.5
# Harvester Italy
deny from 94.23.65.72
# Hacker Bulgaria
deny from 94.26.53.6
# Harvester Valencia, Spain
deny from 95.19.216.61
# Harvester Germany
deny from 95.169.160.
# Amsterdam, Netherlands
deny from 95.211.73.195
deny from trygoclio.com
# Hacker El Segundo, United 

[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-21 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3390:
--

Attachment: LUCENE-3390-fix-like-trunk.patch

Final patch:
- Added changes for backwards breaks
- Removed the bogus docFreq check
- Optimized the case of empty unvalued docs bit set (like in trunk)

This patch is now 100% in line with trunk. The code was already tested in trunk 
and does not affect sort speed for the common case without missing value, as 
the compiler will ignore the additional null check.

Will commit later today.

 Incorrect sort by Numeric values for documents missing the sorting field
 

 Key: LUCENE-3390
 URL: https://issues.apache.org/jira/browse/LUCENE-3390
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.3
Reporter: Gilad Barkai
Assignee: Doron Cohen
Priority: Minor
  Labels: double, float, int, long, numeric, sort
 Fix For: 3.4

 Attachments: LUCENE-3390-fix-like-trunk.patch, 
 LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, 
 LUCENE-3390-fix-like-trunk.patch, LUCENE-3390.patch, SortByDouble.java


 While sorting results over a numeric field, documents which do not contain a 
 value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested 
 against Double, Float, Int & Long numeric fields ascending and descending 
 order).
 This behavior is unexpected, as zero is comparable to the rest of the 
 values. A better solution would either be allowing the user to define such a 
 non-value default, or always bring those document results as the last ones.
 Example scenario:
 Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any 
 value.
 Searching with MatchAllDocsQuery, with sort over that field in descending 
 order yields the docid results of 0, 2, 1.
 Asking for the top 2 documents brings the document without any value as the 
 2nd result - which seems as a bug?




[jira] [Updated] (SOLR-2787) add external http: include file reference for .htaccess processing

2011-09-21 Thread Mark Dickensob (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Dickensob updated SOLR-2787:
-

Description: 
Include an .htaccess external link directive to an external http:file that 
supplies a (.htaccess compatible) list of known bad bot sites.

ie common resource for spam kill list site(s)

Personally, I run a portal and I think that this feature is important to kill 
spam!

I will supply the files for testing if you need them.

Mark goan.com

  was:
Include an external link directive to an external http: file that supplies a 
(.htaccess compatible) list of known bad bot sites.

ie common resource for spam kill list site(s)

Personally, I run a portal and I think that this feature is important to kill 
spam!

I will supply the files for testing if you need them.

Mark goan.com


 add external http: include file reference for .htaccess processing
 --

 Key: SOLR-2787
 URL: https://issues.apache.org/jira/browse/SOLR-2787
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 3.4
 Environment: All operating systems
Reporter: Mark Dickensob
  Labels: Spam, killer
   Original Estimate: 504h
  Remaining Estimate: 504h

 Include an .htaccess external link directive to an external http:file that 
 supplies a (.htaccess compatible) list of known bad bot sites.
 ie common resource for spam kill list site(s)
 Personally, I run a portal and I think that this feature is important to kill 
 spam!
 I will supply the files for testing if you need them.
 Mark goan.com




[jira] [Commented] (SOLR-2787) add external http: include file reference for .htaccess processing

2011-09-21 Thread Mark Dickensob (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109342#comment-13109342
 ] 

Mark Dickensob commented on SOLR-2787:
--

Do you get it yet?




[jira] [Updated] (SOLR-2787) add external http: include file reference for .htaccess processing

2011-09-21 Thread Mark Dickensob (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Dickensob updated SOLR-2787:
-

Description: 
Include an .htaccess external link directive to include an external http:file 
that supplies a (.htaccess compatible) list of known bad bot sites.

ie common resource for spam kill list site(s)

Personally, I run a portal and I think that this feature is important to kill 
spam!

I will supply the files for testing if you need them.

Mark goan.com

  was:
Include an .htaccess external link directive to an external http:file that 
supplies a (.htaccess compatible) list of known bad bot sites.

ie common resource for spam kill list site(s)

Personally, I run a portal and I think that this feature is important to kill 
spam!

I will supply the files for testing if you need them.

Mark goan.com





[jira] [Commented] (SOLR-2787) add external http: include file reference for .htaccess processing

2011-09-21 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109349#comment-13109349
 ] 

Uwe Schindler commented on SOLR-2787:
-

What does this have to do with Apache Solr?




[jira] [Closed] (SOLR-2787) add external http: include file reference for .htaccess processing

2011-09-21 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer closed SOLR-2787.
-

Resolution: Invalid

This issue is totally unrelated to Apache Solr. If anything, this might be 
something for httpd (http://httpd.apache.org/).

Mark, this is the issue tracker for Apache Solr, a full-text search server that 
usually runs behind a firewall and only serves read requests to the outside. I 
think you used the wrong issue tracker to create your issue. In this context 
your issue doesn't make sense to me either.






[jira] [Commented] (LUCENE-3305) Kuromoji code donation - a new Japanese morphological analyzer

2011-09-21 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109356#comment-13109356
 ] 

Simon Willnauer commented on LUCENE-3305:
-

According to LEGAL-97 we can include the dict files. That means we can finish 
this code donation and get everything in shape for a commit. I will finish the 
paperwork once I am back from traveling.



 Kuromoji code donation - a new Japanese morphological analyzer
 --

 Key: LUCENE-3305
 URL: https://issues.apache.org/jira/browse/LUCENE-3305
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Christian Moen
Assignee: Simon Willnauer
 Attachments: Kuromoji short overview .pdf, ip-clearance-Kuromoji.xml, 
 kuromoji-0.7.6-asf.tar.gz, kuromoji-0.7.6.tar.gz, 
 kuromoji-solr-0.5.3-asf.tar.gz, kuromoji-solr-0.5.3.tar.gz


 Atilika Inc. (アティリカ株式会社) would like to donate the Kuromoji Japanese 
 morphological analyzer to the Apache Software Foundation in the hope that it 
 will be useful to Lucene and Solr users in Japan and elsewhere.
 The project was started in 2010 since we couldn't find any high-quality, 
 actively maintained and easy-to-use Java-based Japanese morphological 
 analyzers, and these became many of our design goals for Kuromoji.
 Kuromoji also has a segmentation mode that is particularly useful for search, 
 which we hope will interest Lucene and Solr users.  Compound nouns, such as 
 関西国際空港 (Kansai International Airport) and 日本経済新聞 (Nikkei Newspaper), are 
 segmented as one token with most analyzers.  As a result, a search for 空港 
 (airport) or 新聞 (newspaper) will not give you a hit for these words.  Kuromoji 
 can segment these words into 関西 国際 空港 and 日本 経済 新聞, which is generally what 
 you would want for search, and you'll get a hit.
 We also wanted to make sure the technology has a license that makes it 
 compatible with other Apache Software Foundation software to maximize its 
 usefulness.  Kuromoji has an Apache License 2.0 and all code is currently 
 owned by Atilika Inc.  The software has been developed by my good friend and 
 ex-colleague Masaru Hasegawa and myself.
 Kuromoji uses the so-called IPADIC for its dictionary/statistical model and 
 its license terms are described in NOTICE.txt.
 I'll upload code distributions and their corresponding hashes and I'd very 
 much like to start the code grant process.  I'm also happy to provide patches 
 to integrate Kuromoji into the codebase, if you prefer that.
 Please advise on how you'd like me to proceed with this.  Thank you.




[jira] [Commented] (SOLR-2787) add external http: include file reference for .htaccess processing

2011-09-21 Thread Mark Dickensob (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109376#comment-13109376
 ] 

Mark Dickensob commented on SOLR-2787:
--

You guys must be thick a bricks.





[jira] [Resolved] (SOLR-2785) DateField timezone handling

2011-09-21 Thread Howard Cox (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Howard Cox resolved SOLR-2785.
--

Resolution: Invalid

 DateField timezone handling
 ---

 Key: SOLR-2785
 URL: https://issues.apache.org/jira/browse/SOLR-2785
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 3.3
 Environment: Debian Gnu/Linux, OpenJDK Runtime Environment 14.0-b16
Reporter: Howard Cox
Priority: Minor
  Labels: datetime, datetimes, schema

 The Solr DateField appears to only be partially ISO 8601 compliant.
 The DateMathParser requires Timezone modifications to be in the format 
 +nMINUTES, +xHOURS, +yDAYS etc.
 [http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html]
 ISO 8601 states that timezone modifications should be in the format +00:01, 
 +01:00
 [http://en.wikipedia.org/wiki/ISO_8601#Time_offsets_from_UTC]
 It would be useful if Solr DateField could parse both (I presume there's a 
 reason for +nMINUTE etc somewhere in Java.)
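For reference, ISO 8601 offsets of the +01:00 form can be normalized to UTC on the client side before a date is sent anywhere. The following is a minimal sketch using the standard java.time API (Java 8 or later); the class and method names are hypothetical, and it is not part of Solr or DateField.

```java
import java.time.OffsetDateTime;

// Illustrative only: normalizes an ISO 8601 timestamp with a UTC offset to a "Z" timestamp.
final class IsoOffsetSketch {

    // Parses e.g. "2011-09-21T10:15:00+01:00" and returns the same instant expressed in UTC.
    static String toUtc(String isoWithOffset) {
        return OffsetDateTime.parse(isoWithOffset).toInstant().toString();
    }

    public static void main(String[] args) {
        System.out.println(toUtc("2011-09-21T10:15:00+01:00")); // prints 2011-09-21T09:15:00Z
    }
}
```

Note that an offset such as +01:00 describes the timezone of a timestamp, whereas +1HOUR-style expressions shift a date by an amount of time; the two are different operations.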




[jira] [Commented] (SOLR-2787) add external http: include file reference for .htaccess processing

2011-09-21 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109387#comment-13109387
 ] 

Uwe Schindler commented on SOLR-2787:
-

bq. You guys must be thick a bricks.

You should maybe *read* what we have written before. Simon explained: Your 
request seems to be related to Apache HTTP Server and you should open issues 
at their issue tracker. Apache Solr is a different software that has nothing 
to do with the Apache HTTP Server. Please open a bug report at the Apache 
HTTP Server website: http://httpd.apache.org/.

I would recommend using a more appropriate tone when opening the issue there.




[jira] [Commented] (SOLR-2787) add external http: include file reference for .htaccess processing

2011-09-21 Thread Mark Dickensob (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109391#comment-13109391
 ] 

Mark Dickensob commented on SOLR-2787:
--

Nice one Uwe!






[jira] [Commented] (SOLR-2787) add external http: include file reference for .htaccess processing

2011-09-21 Thread Mark Dickensob (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109398#comment-13109398
 ] 

Mark Dickensob commented on SOLR-2787:
--

By the way, I was a convert to Apache from Microsoft servers.
Now I have changed my mind.

How is that for tone!




[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109402#comment-13109402
 ] 

Michael McCandless commented on LUCENE-3390:


I would love to take this even further, and have trunk's FC implement missing 
values the same way 3.x does (ie, separate FC method to getUnvaluedDocs, rather 
than bundling this bitset w/ the computation of the values array).  But we 
should do that separately.

This is actually a serious bug; maybe we should release 3.4.1 soon (this would 
also fix the Maven packaging problem in 3.4.0).

Why did we need to narrow the return value from FC.getUnvaluedDocs to 
FixedBitSet?




[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-21 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109404#comment-13109404
 ] 

Uwe Schindler commented on LUCENE-3390:
---

bq. Why did we need to narrow the return value from FC.getUnvaluedDocs to 
FixedBitSet?

We have no Bits interface in 3.x. And DocIdSet is not random access. Maybe we 
should backport the Bits interface?




[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-21 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109411#comment-13109411
 ] 

Uwe Schindler commented on LUCENE-3390:
---

In my opinion, a much cleaner and simpler approach for FieldComparator and 
everything else would be the following, as it removes all additional branches 
from FieldComparator and makes the code as simple as it was before 
missingValues existed (also in trunk):

{quote}
Thinking more about it: another approach (also possible for trunk) is to 
supply the missing value to FieldCache.getXxx(). The FieldCache would then first 
use Arrays.fill() to populate the FieldCache array with the default value and 
after that populate the index values. The drawback is that you get a separate 
FieldCache entry for each distinct missing value. For the above use case, you 
would have two float/double price caches.
{quote}

We just have to think about additional memory requirements (which would affect 
only users actually using different missingValues for several searches). From 
my perspective this is much cleaner, as you can pass in a missingValue directly 
when populating the FieldCache. FieldComparator would simply call 
FieldCache.DEFAULT.getInts(reader, parser, defaultValue). The cache would use 
the triplet including defaultValue as key. The sorting code would not need to 
be changed at all (this is similar to Doron's idea, but moved to FieldCache and 
not FC.setNextReader).

We should think about this in an additional issue and for now only fix the 
broken implementation in 3.x.
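The fill-then-populate idea can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: the class, the populate method and its parameters are hypothetical and do not correspond to any existing FieldCache signature.

```java
import java.util.Arrays;

// Illustrative only: models a per-document value array pre-filled with a missing value.
final class DefaultFilledCacheSketch {

    // Every slot starts as missingValue; documents that have a value overwrite their slot.
    // docIds[i] is the document that carries values[i] (hypothetical input layout).
    static double[] populate(int maxDoc, int[] docIds, double[] values, double missingValue) {
        double[] cache = new double[maxDoc];
        Arrays.fill(cache, missingValue);
        for (int i = 0; i < docIds.length; i++) {
            cache[docIds[i]] = values[i];
        }
        return cache;
    }

    public static void main(String[] args) {
        double[] cache = populate(3, new int[] {0, 1}, new double[] {3.5, -10.0},
                Double.NEGATIVE_INFINITY);
        System.out.println(Arrays.toString(cache)); // prints [3.5, -10.0, -Infinity]
    }
}
```

Because the missing value is baked into the array, a comparator can read cache[doc] with no extra branch. The cost is the one noted above: each distinct missing value needs its own array.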




[jira] [Commented] (LUCENE-2205) Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.

2011-09-21 Thread Aaron McCurry (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109413#comment-13109413
 ] 

Aaron McCurry commented on LUCENE-2205:
---

I have reimplemented the patch using the UTF8SortedAsUTF16Comparator as well as 
ByteArrayDataInput.  The patch also contains a unit test and I have run all the 
current tests of the core plus the contribs and everything passes.  As a plus 
the code has gotten much simpler.

During my functional testing I created a test index with small but very diverse 
terms.  Roughly 50 terms per document with 50 million documents.  So there are 
approximately 2.5 billion terms in this index.

The current 3x branch produces:
5000 documents at a heap size of 598902872.

The patched version produces:
5000 documents at a heap size of 282526224.

The patch also wins on random-access performance for this index.  Running 200 
passes of a collection of randomly sampled queries (the queries change each time) 
produces the following:

The current 3x branch produces:
4186.0225 avg response time in ms

The patched version produces:
2930.1371 avg response time in ms

NOTE: The hard drive I was using is a very slow drive.  With smaller 
indexes, the patch and the current branch perform nearly the same; 
depending on the pass, either one was faster.


 Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and 
 the index pointer long[] and create a more memory efficient data structure.
 ---

 Key: LUCENE-2205
 URL: https://issues.apache.org/jira/browse/LUCENE-2205
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
 Environment: Java5
Reporter: Aaron McCurry
Assignee: Michael McCandless
 Fix For: 3.5

 Attachments: RandomAccessTest.java, TermInfosReader.java, 
 TermInfosReaderIndex.java, TermInfosReaderIndexDefault.java, 
 TermInfosReaderIndexSmall.java, patch-final.txt, rawoutput.txt


 Basically packing those three arrays into a byte array with an int array as 
 an index offset.  
 The performance benefits are staggering on my test index (of size 6.2 GB, with 
 ~1,000,000 documents and ~175,000,000 terms): the memory needed to load the 
 terminfos into memory was reduced to 17% of its original size, from 291.5 
 MB to 49.7 MB.  The random access speed has been made better by 1-2%, load 
 time of the segments is ~40% faster as well, and full GCs on my JVM were 
 made 7 times faster.
 I have already performed the work and am offering this code as a patch.  
 Currently all tests in the trunk pass with this new code enabled.  I did write 
 a system property switch to allow for the original implementation to be used 
 as well.
 -Dorg.apache.lucene.index.TermInfosReader=default or small
 I have also written a blog post about this patch; here is the link.
 http://www.nearinfinity.com/blogs/aaron_mccurry/my_first_lucene_patch.html
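The packing idea described above (one byte array holding all entries, plus an int array of start offsets) can be illustrated in plain Java. This is a simplified sketch, not the code from the patch: the class name, the entry layout (an 8-byte pointer followed by the UTF-8 term bytes) and the accessors are all assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative only: stores (term, pointer) pairs in one byte[] indexed by an int[] of offsets.
final class PackedTermsSketch {
    private final byte[] data;
    private final int[] offsets; // offsets[i] = start of entry i; offsets[n] = data.length

    PackedTermsSketch(String[] terms, long[] pointers) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        offsets = new int[terms.length + 1];
        for (int i = 0; i < terms.length; i++) {
            offsets[i] = out.size();
            // Assumed layout: 8-byte big-endian pointer, then the term as UTF-8 bytes.
            for (int shift = 56; shift >= 0; shift -= 8) {
                out.write((int) (pointers[i] >>> shift));
            }
            byte[] utf8 = terms[i].getBytes(StandardCharsets.UTF_8);
            out.write(utf8, 0, utf8.length);
        }
        offsets[terms.length] = out.size();
        data = out.toByteArray();
    }

    // Decodes the term of entry i; its length is implied by the next entry's offset.
    String term(int i) {
        int start = offsets[i] + 8;
        return new String(data, start, offsets[i + 1] - start, StandardCharsets.UTF_8);
    }

    // Decodes the 8-byte pointer of entry i.
    long pointer(int i) {
        long p = 0;
        for (int k = 0; k < 8; k++) {
            p = (p << 8) | (data[offsets[i] + k] & 0xFF);
        }
        return p;
    }

    public static void main(String[] args) {
        PackedTermsSketch packed = new PackedTermsSketch(
                new String[] {"apple", "banana"}, new long[] {42L, 1L << 40});
        System.out.println(packed.term(1) + " -> " + packed.pointer(1)); // prints banana -> 1099511627776
    }
}
```

The memory saving in such a layout comes from replacing one object (with its header and references) per entry by two primitive arrays shared across all entries, which also gives the garbage collector far fewer objects to trace.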




[jira] [Updated] (LUCENE-2205) Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.

2011-09-21 Thread Aaron McCurry (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron McCurry updated LUCENE-2205:
--

Attachment: lowmemory_w_utf8_encoding.patch

 Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and 
 the index pointer long[] and create a more memory efficient data structure.
 ---

 Key: LUCENE-2205
 URL: https://issues.apache.org/jira/browse/LUCENE-2205
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
 Environment: Java5
Reporter: Aaron McCurry
Assignee: Michael McCandless
 Fix For: 3.5

 Attachments: RandomAccessTest.java, TermInfosReader.java, 
 TermInfosReaderIndex.java, TermInfosReaderIndexDefault.java, 
 TermInfosReaderIndexSmall.java, lowmemory_w_utf8_encoding.patch, 
 patch-final.txt, rawoutput.txt





[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-21 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3390:
--

Attachment: LUCENE-3390-BitsInterface.patch

Here is a patch with a cleaner API (as noted by Mike McCandless):
- Backported the Bits interface from Lucene trunk (do an svn cp of 
http://svn.apache.org//trunk//Bits.java before applying the patch)
- Added the interface to the well-known impls in the util package
- FieldCache.getUnValuesDocs now returns Bits, which makes the API very clean

This breaks backwards compatibility a bit more: Bits does not extend DocIdSet, 
so code using the new FieldCache method will break. Before, recompilation was 
enough (as FixedBitSet extends DocIdSet).

Mike: How about this?

 Incorrect sort by Numeric values for documents missing the sorting field
 

 Key: LUCENE-3390
 URL: https://issues.apache.org/jira/browse/LUCENE-3390
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.3
Reporter: Gilad Barkai
Assignee: Doron Cohen
Priority: Minor
  Labels: double, float, int, long, numeric, sort
 Fix For: 3.4

 Attachments: LUCENE-3390-BitsInterface.patch, 
 LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, 
 LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, 
 LUCENE-3390.patch, SortByDouble.java


 While sorting results over a numeric field, documents which do not contain a 
 value for the sorting field seem to get a value of 0 (ZERO) in the sort. (Tested 
 against Double, Float, Int & Long numeric fields, in ascending and descending 
 order.)
 This behavior is unexpected, as zero is comparable to the rest of the 
 values. A better solution would be either to allow the user to define a 
 default for such missing values, or to always put those documents last.
 Example scenario:
 Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any 
 value.
 Searching with MatchAllDocsQuery, with sort over that field in descending 
 order, yields the docid results of 0, 2, 1.
 Asking for the top 2 documents brings the document without any value as the 
 2nd result, which seems like a bug.
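
The scenario above can be reproduced outside Lucene with plain arrays. The sketch below is only an illustration and makes no claim about how the attached patches work: MissingLastSort and sortDocs are hypothetical names, and a java.util.BitSet stands in for "which docs have a value". Treating every doc as valued (so the missing one silently reads as 0.0) gives the reported order 0, 2, 1; consulting the bits puts the unvalued doc last.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch: sort doc ids by a numeric field, unvalued docs last. */
final class MissingLastSort {
    static List<Integer> sortDocs(double[] values, BitSet hasValue, boolean descending) {
        List<Integer> docs = new ArrayList<>();
        for (int d = 0; d < values.length; d++) {
            docs.add(d);
        }
        docs.sort((a, b) -> {
            boolean av = hasValue.get(a);
            boolean bv = hasValue.get(b);
            if (av != bv) {
                return av ? -1 : 1;           // a doc with a value sorts before one without
            }
            if (!av) {
                return Integer.compare(a, b); // both unvalued: fall back to docid order
            }
            int c = Double.compare(values[a], values[b]);
            if (c == 0) {
                return Integer.compare(a, b); // ties broken by docid
            }
            return descending ? -c : c;
        });
        return docs;
    }
}
```

With values {3.5, -10.0, 0.0} and only docs 0 and 1 marked as valued, a descending sort yields 0, 1, 2; marking all three as valued reproduces the 0, 2, 1 order from the report.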




Re: [Lucene.Net] 2.9.4

2011-09-21 Thread Robert Jordan

On 20.09.2011 23:48, Prescott Nasser wrote:

Hey all seems like we are set with 2.9.4? Feedback has been positive and its 
been quiet. Do we feel ready to vote for a new release?


I don't know if the build infrastructure is part of the
release. If yes, then there is an open issue:

Contrib doesn't build right now because there
are some assembly name mismatches between certain *.csproj
files and  build/scripts/contrib.targets.

The following patches should fix the issue:

https://github.com/robert-j/lucene.net/commit/c5218bca56c19b3407648224781eec7316994a39

https://github.com/robert-j/lucene.net/commit/50bad187655d59968d51d472b57c2a40e201d663


Also, the fix for [LUCENENET-358] is basically making
Lucene.Net.dll a .NET 4.0-only assembly:

https://github.com/apache/lucene.net/commit/23ea6f52362fc7dbce48fd012cea129a7350c73c

Did we agree about abandoning .NET <= 3.5?

Robert



Prettify JS and CSS exceluded from Javadocs

2011-09-21 Thread Shai Erera
Hi

I noticed that our build does not include the prettify JS and CSS with
Javadocs, unless the javadocs are created for the release. For example, if
you open any of the *javadocs.jar files (core or contrib), you'll see that
the prettify files are missing. Therefore, documentation which relies on it
is not displayed nicely (such as contrib-highlight).

The invoke-javadoc macro copies the prettify files and adds references to
them, but when the javadocs are jar-ed, the files are omitted.

At first I thought that this is a bug, but then I noticed how the files are
referenced, and the directory structure that is assumed to be created for
the javadocs, and thought that this may be intentional? When the release
binaries are created, a folder docs/api is created, under which there are
sub-folders for 'core' and 'contrib-*'. Also, a sub-folder for prettify. So
prettify is assumed to be 'sibling' of any of the javadocs folders, and the
reference in the HTML is created as such.

However, if we add prettify to any of the .jar, then it won't be a sibling
anymore, but a 'child', and the reference should change from ../prettify/*
to prettify/*.

I think this can be solved easily by referencing two scripts (and perhaps
same trick for stylesheet as well) -- only one of them will be found
depending on the distribution. I wanted to ask first if the prettify files
were omitted from the .jar intentionally or not.

Shai


[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109434#comment-13109434
 ] 

Michael McCandless commented on LUCENE-3390:


Looks great Uwe!  I think we can assert that the cardinality is <= numDocs, and 
then short-circuit the common == numDocs (all docs have values) case like you 
are.

I love how 3.x handles the unvalued bits... I think we should port this forward 
to trunk, but maybe make it possible to set the bits as we build up the values 
(single pass) if you specify up front you want the bit set.  I'll open a new 
issue for this...




RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

2011-09-21 Thread Brian Sayatovic
I would love to use it.  Unfortunately, my project is well underway and under 
tight deadlines, so we can't afford the disruption of switching to NuGet for 
Lucene, or any of the other libraries we use.  However, once we release, I 
definitely want to embrace NuGet and would love for Lucene.NET to be available 
through NuGet.

-Original Message-
From: Michael Herndon [mailto:mhern...@wickedsoftware.net]
Sent: Tuesday, September 20, 2011 11:57 PM
To: lucene-net-...@lucene.apache.org; lucene-net-u...@lucene.apache.org
Subject: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

We're taking a quick poll on the developers mailing list over the next few days 
to see how people would like to use Lucene.Net through Nuget.**

Currently version 2.9.2 is hosted on nuget.org, but that package was not created 
by the project maintainers, so nuget is not currently set up in source.  
Going forward, we would like to continue what someone else started by creating 
nuget packages for Lucene.Net.

Right now there are two packages: Lucene & Lucene.Contrib.  My question to the 
community is whether you want finer-grained packages, i.e. a package for each 
contrib project, or would rather keep it simple.

The granular approach will let you use only what you need. We can also create 
additional higher-level packages which have dependencies on the other 
ones, possibly a Lucene.Net-Essentials and a Lucene.Net-Full.

Or we can keep it simple and continue with only two packages.

My concern is that the granular approach might overwhelm people with choice, 
while the simple approach might be considered bloated, since you import and 
install assemblies that you might never use.


Another topic to discuss: would you like to see an out-of-band project 
nuget feed for nightly builds, branches with new or experimental features, or 
stable code snapshots for a projected release?


** when you post, please respond to lucene-net-...@lucene.apache.org.  This was 
posted to both lists to make sure everyone subscribed to both lists has a 
chance to voice their use cases or concerns.




Re: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

2011-09-21 Thread Dan Swain
I think I'd like to stick with 2 packages

Lucene.net Core
Lucene.net Contrib

Just because I think it's nice and simple. I would say that any contrib
parts that get really big or popular could either be split out into their own
package or maybe added to the core package.

I'm also in favour of a nightly package and experimental packages.

thanks
Dan Swain





[jira] [Created] (LUCENE-3443) Port 3.x getUnvaluedDocs to trunk

2011-09-21 Thread Michael McCandless (JIRA)
Port 3.x getUnvaluedDocs to trunk
-

 Key: LUCENE-3443
 URL: https://issues.apache.org/jira/browse/LUCENE-3443
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Reporter: Michael McCandless
 Fix For: 3.5, 4.0



[Spinoff from LUCENE-3390]

I think the approach in 3.x for handling un-valued docs, and making it
possible to specify how such docs are sorted, is better than the
solution we have in trunk.

I like that FC has a dedicated method to get the Bits for un-valued
docs -- easy for apps to directly use.  And I like that the un-valued
bits have their own entry in the FC.

One downside is that it takes 2 passes to get the values and the missing
bits, but I think we can fix this by passing an optional bool to the
FC.getXXX methods indicating you want the bits, and then populating the FC
entry for the missing bits as well.  (We can do that for 3.x and trunk.)
Then it's a single pass.
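
A rough sketch of that single-pass idea, using plain Java rather than the real FieldCache API: SinglePassFieldLoader, Entry and load are hypothetical names, and a Double[] with nulls stands in for "the field value of each doc, if any". The boolean flag plays the role of the optional bool mentioned above.

```java
import java.util.BitSet;

/** Hypothetical sketch: fill values and "has a value" bits in one pass. */
final class SinglePassFieldLoader {
    static final class Entry {
        final double[] values;
        final BitSet docsWithValue; // null when the caller did not ask for the bits

        Entry(double[] values, BitSet docsWithValue) {
            this.values = values;
            this.docsWithValue = docsWithValue;
        }

        /** True when the bits were requested and every doc turned out to have a value. */
        boolean allDocsHaveValue() {
            return docsWithValue != null && docsWithValue.cardinality() == values.length;
        }
    }

    /** perDocField[doc] is null when the doc has no value for the field. */
    static Entry load(Double[] perDocField, boolean wantBits) {
        double[] values = new double[perDocField.length];
        BitSet bits = wantBits ? new BitSet(perDocField.length) : null;
        for (int doc = 0; doc < perDocField.length; doc++) {
            Double v = perDocField[doc];
            if (v != null) {
                values[doc] = v;
                if (bits != null) {
                    bits.set(doc); // recorded in the same loop, so no second pass
                }
            }
        }
        return new Entry(values, bits);
    }
}
```

Callers that never ask for the bits pay nothing extra, and a check such as allDocsHaveValue() allows the common case where every doc has a value to be short-circuited.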





FW: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

2011-09-21 Thread Prescott Nasser


---BeginMessage---
The granular approach can cause dependency issues as well. FubuMVC is
running into this with their granularity; they had to invent their own build
chain to handle ripples of changes.

I would say do two packages, Lucene and Contrib, and split a piece out when
one of the pieces of Contrib gets awesome enough to warrant its own package.

I look forward to official Lucene.Net packages.


---End Message---


[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-21 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109454#comment-13109454
 ] 

Doron Cohen commented on LUCENE-3390:
-

I wrote a small test that should fail with the bug Uwe fixed here and pass with 
the fix. For some reason it is still failing even with that fix. I tried this 
with the previous patch and will now try with the latest one, though I think it 
should pass with the previous one as well. I'll give it another try.




[jira] [Commented] (SOLR-1895) ManifoldCF SearchComponent plugin for enforcing ManifoldCF security at search time

2011-09-21 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109507#comment-13109507
 ] 

Erik Hatcher commented on SOLR-1895:


bq. Both fq and SearchComponent would work for early binding, but when we want 
to extend the model with an (optional) late binding, i.e. filtering search 
results, fq won't cut it.

Not true.  There's now PostFilter to enable late binding.  This might even be 
advantageous for this MCF filtering, as the WildcardQuery's could be expensive 
filters to generate and work best on the most constrained subset matching the 
rest of the traditional query and filters.

bq. A SearchComponent however can be extended not only to handle early+late 
binding but also any other strange requirements there may be regarding 
security, such as authentication by IP address, peeking at other parameters

A QParserPlugin can see all the parameters a SearchComponent can see 
[createParser(String qstr, SolrParams localParams, SolrParams params, 
SolrQueryRequest req)]

bq. ...else I think we'll start seeing a multitude of different ways to 
integrate security which is not a competitive advantage for Solr

If we cannot elaborate those different ways at this point, then building a 
framework is only asking for it to be changed later.  In what scenarios would 
a security filter want to modify the response?   

bq. I don't see how to add code to merge/unify two (possibly 3rd party) 
QParsers, except from creating a new umbrella one.

nested queries.

bq. We'll keep the core layer generic and thin. AccessTokenSecurityComponent 
and AccessTokenService (which should perhaps be an Interface instead)

I'm not sure that those abstractions are general enough.  I still think a 
qparser is the simplest/cleanest thing that will work here and doesn't preclude 
or make harder any future needs.  All of these other abstractions mentioned 
here are overkill, IMO, to what MCF needs - all it needs is a handful of 
aggregated WildcardQuery's.


 ManifoldCF SearchComponent plugin for enforcing ManifoldCF security at search 
 time
 --

 Key: SOLR-1895
 URL: https://issues.apache.org/jira/browse/SOLR-1895
 Project: Solr
  Issue Type: New Feature
  Components: SearchComponents - other
Reporter: Karl Wright
  Labels: document, security, solr
 Fix For: 3.5, 4.0

 Attachments: LCFSecurityFilter.java, LCFSecurityFilter.java, 
 LCFSecurityFilter.java, LCFSecurityFilter.java, 
 SOLR-1895-service-plugin.patch, SOLR-1895-service-plugin.patch, 
 SOLR-1895.patch, SOLR-1895.patch, SOLR-1895.patch, SOLR-1895.patch, 
 SOLR-1895.patch, SOLR-1895.patch


 I've written an LCF SearchComponent which filters returned results based on 
 access tokens provided by LCF's authority service.  The component requires 
 you to configure the appropriate authority service URL base, e.g.:
   <!-- LCF document security enforcement component -->
   <searchComponent name="lcfSecurity" class="LCFSecurityFilter">
     <str name="AuthorityServiceBaseURL">http://localhost:8080/lcf-authority-service</str>
   </searchComponent>
 Also required are the following schema.xml additions:
   <!-- Security fields -->
   <field name="allow_token_document" type="string" indexed="true" stored="false" multiValued="true"/>
   <field name="deny_token_document" type="string" indexed="true" stored="false" multiValued="true"/>
   <field name="allow_token_share" type="string" indexed="true" stored="false" multiValued="true"/>
   <field name="deny_token_share" type="string" indexed="true" stored="false" multiValued="true"/>
 Finally, to tie it into the standard request handler, it seems to need to run 
 last:
   <requestHandler name="standard" class="solr.SearchHandler" default="true">
     <arr name="last-components">
       <str>lcfSecurity</str>
     </arr>
 ...
 I have not set a package for this code.  Nor have I been able to get it 
 reviewed by someone as conversant with Solr as I would prefer.  It is my 
 hope, however, that this module will become part of the standard Solr 1.5 
 suite of search components, since that would tie it in with LCF nicely.




RE: Prettify JS and CSS exceluded from Javadocs

2011-09-21 Thread Steven A Rowe
Hi Shai,

I think the prettify stuff should be included in the .jar

It’s possible that I messed this up in the packaging work I’ve done recently, 
but if so, it was not intentional.

Steve



[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-21 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3390:


Attachment: LUCENE-3390-BitsInterface.patch

Attached patch with a test that fails before this fix (otherwise patch same as 
previous).

The test uses 4 collectors simultaneously, each with different missing values.

 Incorrect sort by Numeric values for documents missing the sorting field
 

 Key: LUCENE-3390
 URL: https://issues.apache.org/jira/browse/LUCENE-3390
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.3
Reporter: Gilad Barkai
Assignee: Doron Cohen
Priority: Minor
  Labels: double, float, int, long, numeric, sort
 Fix For: 3.4

 Attachments: LUCENE-3390-BitsInterface.patch, 
 LUCENE-3390-BitsInterface.patch, LUCENE-3390-fix-like-trunk.patch, 
 LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, 
 LUCENE-3390-fix-like-trunk.patch, LUCENE-3390.patch, SortByDouble.java





RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

2011-09-21 Thread Richard Wilde
+1 For this


-Original Message-
From: Dan Swain [mailto:dan.sw...@gmail.com] 
Sent: 21 September 2011 13:22
To: lucene-net-...@lucene.apache.org
Subject: Re: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

I think I'd like to stick with 2 packages

Lucene.net Core
Lucene.net Contrib

Just because I think it's nice and simple. I would say that any contrib
parts that get really big or popular either to split out into their own
package or  maybe added to the core package?

I'm also in favour of a nightly package and experimental packages.

thanks
Dan Swain






[jira] [Created] (LUCENE-3444) Distinct field value count per group

2011-09-21 Thread Martijn van Groningen (JIRA)
Distinct field value count per group


 Key: LUCENE-3444
 URL: https://issues.apache.org/jira/browse/LUCENE-3444
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/grouping
Reporter: Martijn van Groningen


Support a second pass collector that counts unique field values of a field per 
group.
This is just one example of group statistics that one might want.
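
To make the request concrete, here is a minimal sketch of what such a second pass computes. It uses plain collections instead of the Lucene Collector API, and the names DistinctValuesPerGroup, collect and distinctCount are hypothetical, not taken from any patch. The first pass is assumed to have already picked the top groups; the second pass then sees (group, field value) pairs for matching docs.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a second pass counting distinct field values per group. */
final class DistinctValuesPerGroup {
    private final Map<String, Set<String>> seen = new LinkedHashMap<>();

    /** topGroups: the groups selected by the first pass. */
    DistinctValuesPerGroup(Iterable<String> topGroups) {
        for (String group : topGroups) {
            seen.put(group, new HashSet<>());
        }
    }

    /** Called once per matching doc with its group and its value for the counted field. */
    void collect(String group, String value) {
        Set<String> values = seen.get(group);
        if (values != null && value != null) { // skip docs outside the top groups or without a value
            values.add(value);
        }
    }

    int distinctCount(String group) {
        Set<String> values = seen.get(group);
        return values == null ? 0 : values.size();
    }
}
```

Keeping a set per top group bounds memory by the number of distinct values in those groups only, which is the usual reason for doing this in a second pass.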




[jira] [Updated] (LUCENE-3444) Distinct field value count per group

2011-09-21 Thread Martijn van Groningen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn van Groningen updated LUCENE-3444:
--

Attachment: LUCENE-3444.patch

Attached initial version of a second pass collector that count the unique field 
values per group for a specific field.




[jira] [Issue Comment Edited] (LUCENE-3444) Distinct field value count per group

2011-09-21 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109540#comment-13109540
 ] 

Martijn van Groningen edited comment on LUCENE-3444 at 9/21/11 2:45 PM:


Attached initial version of a second pass collector that counts the unique 
field values per group for a specific field.

  was (Author: martijn.v.groningen):
Attached initial version of a second pass collector that count the unique 
field values per group for a specific field.
  



[jira] [Commented] (SOLR-2780) Facet count problem : Multi-Select Faceting After grouping results

2011-09-21 Thread Ramzi Alqrainy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109545#comment-13109545
 ] 

Ramzi Alqrainy commented on SOLR-2780:
--

Hi Groningen,
I have used your patch and made FunctionAllGroupHeadsCollector public, but 
when I run ant dist to build, the errors below are displayed:

  [javac] 77 errors
  [javac] 100 warnings

Please advise.
Kindly note that I am using Fedora 15 and the Solr 4.0 that was released on 13-09.

 Facet count problem : Multi-Select Faceting After grouping results 
 ---

 Key: SOLR-2780
 URL: https://issues.apache.org/jira/browse/SOLR-2780
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.3, 3.4, 4.0
Reporter: Ramzi Alqrainy
Priority: Critical
 Fix For: 3.5, 4.0

 Attachments: SOLR-2780.patch


 Dear All ,
 Kindly note that I am using Solr 4.0, where group.truncate=true calculates 
 facet counts based on the most relevant document of each group matching the 
 query.
 But when I use Multi-Select Faceting [Tagging and excluding Filters], Solr 
 cannot calculate the facet counts after grouping the results when multiple 
 facets are selected.
 http://127.0.0.1:8983/solr/select/?facet=true&sort=score+desc,+rate+desc,total_of_reviews+desc&facet.limit=-1&bf=sum%28product%28atan%28total_of_reviews%29,50%29,product%28rate,10%29%29^4&group.field=place_id&facet.field={!ex%3Dce}cat_en&facet.field={!ex%3Dce}cat_ar&facet.field={!ex%3Dir}iregion&facet.field={!ex%3Dir}region_en&facet.field={!ex%3Dir}region_ar&facet.field={!ex%3Drr}rrate&facet.field=place_status&facet.field=theme_en&facet.field=icity&facet.field={!ex%3Dce}icat&facet.field={!ex%3Dsce}isubcat&facet.field={!ex%3Dsce}subcat_en&facet.field={!ex%3Dsce}subcat_ar&qt=/spell&fq=place_status:1&fq=icity:1&fq=cat_en:%28%22Restaurants%22%29&group.format=simple&group.ngroups=true&facet.mincount=1&qf=title_ar^24+title_en^24+cat_ar^10+cat_en^10++review^20&hl.fl=review&json.nl=map&wt=json&defType=edismax&rows=10&spellcheck.accuracy=0.6&start=0&q=smart&group.truncate=true&group=true&indent=on
  




[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-21 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3390:
--

Attachment: LUCENE-3390-BitsInterface.patch

I added a further test in TestFieldCache to check the Bits returned.

I think that's ready to commit.

 Incorrect sort by Numeric values for documents missing the sorting field
 

 Key: LUCENE-3390
 URL: https://issues.apache.org/jira/browse/LUCENE-3390
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.3
Reporter: Gilad Barkai
Assignee: Doron Cohen
Priority: Minor
  Labels: double, float, int, long, numeric, sort
 Fix For: 3.4

 Attachments: LUCENE-3390-BitsInterface.patch, 
 LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, 
 LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, 
 LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, 
 LUCENE-3390.patch, SortByDouble.java


 While sorting results over a numeric field, documents which do not contain a 
 value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested 
 against Double, Float, Int & Long numeric fields ascending and descending 
 order).
 This behavior is unexpected, as zero is comparable to the rest of the 
 values. A better solution would either be allowing the user to define such a 
 non-value default, or always bring those document results as the last ones.
 Example scenario:
 Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any 
 value.
 Searching with MatchAllDocsQuery, with sort over that field in descending 
 order yields the docid results of 0, 2, 1.
 Asking for the top 2 documents brings the document without any value as the 
 2nd result - which seems like a bug?
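
 The example scenario can be reproduced outside Lucene with a small plain-Java
 sketch. The class and method names are made up for illustration; a null entry
 stands for a document without a value, and the "missing last" branch is only
 one possible fix, not necessarily what the attached patches do.

```java
import java.util.Arrays;

// Hypothetical model of the scenario; not Lucene's FieldComparator.
final class MissingValueSortDemo {

    /** Sorts doc ids by value, descending; a null entry means the doc has no value. */
    static Integer[] sortDescending(Double[] values, boolean missingLast) {
        Integer[] docs = new Integer[values.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        Arrays.sort(docs, (a, b) -> {
            boolean hasA = values[a] != null;
            boolean hasB = values[b] != null;
            if (missingLast && hasA != hasB) {
                return hasA ? -1 : 1; // docs that have a value come first
            }
            double va = hasA ? values[a] : 0.0; // reported behaviour: missing reads as 0
            double vb = hasB ? values[b] : 0.0;
            int c = Double.compare(vb, va); // descending order
            return c != 0 ? c : Integer.compare(a, b); // tie-break on doc id
        });
        return docs;
    }
}
```

 With the values {3.5, -10.0, null}, treating the missing value as 0 gives the
 reported order 0, 2, 1, while sorting missing values last gives 0, 1, 2.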




[jira] [Assigned] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-21 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-3390:
-

Assignee: Uwe Schindler  (was: Doron Cohen)

 Incorrect sort by Numeric values for documents missing the sorting field
 

 Key: LUCENE-3390
 URL: https://issues.apache.org/jira/browse/LUCENE-3390
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.3
Reporter: Gilad Barkai
Assignee: Uwe Schindler
Priority: Minor
  Labels: double, float, int, long, numeric, sort
 Fix For: 3.4

 Attachments: LUCENE-3390-BitsInterface.patch, 
 LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, 
 LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, 
 LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, 
 LUCENE-3390.patch, SortByDouble.java


 While sorting results over a numeric field, documents which do not contain a 
 value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested 
 against Double, Float, Int & Long numeric fields ascending and descending 
 order).
 This behavior is unexpected, as zero is comparable to the rest of the 
 values. A better solution would either be allowing the user to define such a 
 non-value default, or always bring those document results as the last ones.
 Example scenario:
 Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any 
 value.
 Searching with MatchAllDocsQuery, with sort over that field in descending 
 order yields the docid results of 0, 2, 1.
 Asking for the top 2 documents brings the document without any value as the 
 2nd result - which seems like a bug?




[jira] [Commented] (LUCENE-2215) paging collector

2011-09-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109553#comment-13109553
 ] 

Michael McCandless commented on LUCENE-2215:


For 3.x can we just add these methods to IndexSearcher (not 
Searcher/Searchable)?   This would require the app to use IndexSearcher if they 
are not already, which is great because that's what they'll need to do in 4.0 
anyway (since Searcher/Searchable are deprecated).

Or is there some other back compat issue?

 paging collector
 

 Key: LUCENE-2215
 URL: https://issues.apache.org/jira/browse/LUCENE-2215
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: 2.4, 3.0
Reporter: Adam Heinz
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: IterablePaging.java, LUCENE-2215.patch, 
 LUCENE-2215.patch, LUCENE-2215.patch, PagingCollector.java, 
 TestingPagingCollector.java


 http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898
 Somebody assign this to Aaron McCurry and we'll see if we can get enough 
 votes on this issue to convince him to upload his patch.  :)




[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109556#comment-13109556
 ] 

Michael McCandless commented on LUCENE-3390:


bq. I think that's ready to commit.

+1, looks great!  Thanks Uwe.

 Incorrect sort by Numeric values for documents missing the sorting field
 

 Key: LUCENE-3390
 URL: https://issues.apache.org/jira/browse/LUCENE-3390
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.3
Reporter: Gilad Barkai
Assignee: Uwe Schindler
Priority: Minor
  Labels: double, float, int, long, numeric, sort
 Fix For: 3.4

 Attachments: LUCENE-3390-BitsInterface.patch, 
 LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, 
 LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, 
 LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, 
 LUCENE-3390.patch, SortByDouble.java


 While sorting results over a numeric field, documents which do not contain a 
 value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested 
 against Double, Float, Int & Long numeric fields ascending and descending 
 order).
 This behavior is unexpected, as zero is comparable to the rest of the 
 values. A better solution would either be allowing the user to define such a 
 non-value default, or always bring those document results as the last ones.
 Example scenario:
 Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any 
 value.
 Searching with MatchAllDocsQuery, with sort over that field in descending 
 order yields the docid results of 0, 2, 1.
 Asking for the top 2 documents brings the document without any value as the 
 2nd result - which seems like a bug?




[jira] [Resolved] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-21 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-3390.
---

   Resolution: Fixed
Fix Version/s: 3.5

Committed 3.x branch revision: 1173701

 Incorrect sort by Numeric values for documents missing the sorting field
 

 Key: LUCENE-3390
 URL: https://issues.apache.org/jira/browse/LUCENE-3390
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.3
Reporter: Gilad Barkai
Assignee: Uwe Schindler
Priority: Minor
  Labels: double, float, int, long, numeric, sort
 Fix For: 3.5, 3.4

 Attachments: LUCENE-3390-BitsInterface.patch, 
 LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, 
 LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, 
 LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, 
 LUCENE-3390.patch, SortByDouble.java


 While sorting results over a numeric field, documents which do not contain a 
 value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested 
 against Double, Float, Int & Long numeric fields ascending and descending 
 order).
 This behavior is unexpected, as zero is comparable to the rest of the 
 values. A better solution would either be allowing the user to define such a 
 non-value default, or always bring those document results as the last ones.
 Example scenario:
 Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any 
 value.
 Searching with MatchAllDocsQuery, with sort over that field in descending 
 order yields the docid results of 0, 2, 1.
 Asking for the top 2 documents brings the document without any value as the 
 2nd result - which seems like a bug?




RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

2011-09-21 Thread Granroth, Neal V.
No interest in Nuget whatsoever.

- Neal

-Original Message-
From: Michael Herndon [mailto:mhern...@wickedsoftware.net] 
Sent: Tuesday, September 20, 2011 10:57 PM
To: lucene-net-...@lucene.apache.org; lucene-net-u...@lucene.apache.org
Subject: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

We're taking a quick poll over the next few days to see how people would
like to use Lucene.Net through Nuget on the developers mailing list**

Currently version 2.9.2 is hosted on nuget.org, but that package was not
created by the project maintainers, thus nuget is not currently set up in
source.  Going forward, we would like to continue what someone else started
by creating nuget packages for Lucene.Net.

Right now there are two packages: Lucene & Lucene.Contrib.  My question to
the community is whether you want finer-grained packages, i.e. a package for
each contrib project, or would rather continue to keep it simple.

The granular approach will let you use only what you need. We can also
create additional higher level packages which have dependencies on the other
ones.   Possibly a Lucene.Net-Essentials and Lucene.Net-Full.

Or we can keep it simple and continue with only two packages.

My concerns are that the granular approach might overwhelm people with
choice. The simple choice might be considered bloat for importing and then
installing assemblies that you might never use.


Another topic to discuss: would you like to see an out-of-band
project nuget feed for nightly builds, branches with new or experimental
features, or stable code snapshots for a projected release?


** when you post, please respond to lucene-net-...@lucene.apache.org.  This
was posted to both lists to make sure everyone subscribed to both lists has
a chance to voice their use cases or concerns.


[jira] [Commented] (LUCENE-2215) paging collector

2011-09-21 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109571#comment-13109571
 ] 

Robert Muir commented on LUCENE-2215:
-

bq. Or is there some other back compat issue?

We add this param to a protected method signature, so it would affect 
subclasses of IndexSearcher.
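
A tiny illustration of the concern, using made-up class and method names 
rather than the real IndexSearcher signatures: once a protected hook gains a 
parameter, a subclass written against the old signature no longer overrides it.

```java
// Hypothetical names throughout; this is not Lucene's IndexSearcher.
class BaseSearcher {
    public String search(String query, int n) {
        // The public entry point now calls the widened hook.
        return doSearch(query, null, n);
    }

    // Previously: protected String doSearch(String query, int n)
    protected String doSearch(String query, Object after, int n) {
        return "base";
    }
}

class CustomSearcher extends BaseSearcher {
    // Written against the old signature. Without @Override this still compiles,
    // but it is now an unrelated overload that search() never calls.
    protected String doSearch(String query, int n) {
        return "custom";
    }
}
```

Here new CustomSearcher().search("q", 10) ends up in the base implementation. 
If the subclass had used @Override, the change would show up as a compile 
error instead of a silent behaviour change.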

 paging collector
 

 Key: LUCENE-2215
 URL: https://issues.apache.org/jira/browse/LUCENE-2215
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: 2.4, 3.0
Reporter: Adam Heinz
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: IterablePaging.java, LUCENE-2215.patch, 
 LUCENE-2215.patch, LUCENE-2215.patch, PagingCollector.java, 
 TestingPagingCollector.java


 http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898
 Somebody assign this to Aaron McCurry and we'll see if we can get enough 
 votes on this issue to convince him to upload his patch.  :)




[jira] [Commented] (LUCENE-2215) paging collector

2011-09-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109575#comment-13109575
 ] 

Michael McCandless commented on LUCENE-2215:


bq. We add this param to a protected method signature, so it would affect 
subclasses of IndexSearcher.

Ahh, right.  Well, I think we can make an exception here -- subclassing IS is 
very expert.

 paging collector
 

 Key: LUCENE-2215
 URL: https://issues.apache.org/jira/browse/LUCENE-2215
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: 2.4, 3.0
Reporter: Adam Heinz
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: IterablePaging.java, LUCENE-2215.patch, 
 LUCENE-2215.patch, LUCENE-2215.patch, PagingCollector.java, 
 TestingPagingCollector.java


 http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898
 Somebody assign this to Aaron McCurry and we'll see if we can get enough 
 votes on this issue to convince him to upload his patch.  :)




Re: [Lucene.Net] 2.9.4

2011-09-21 Thread Michael Herndon
@Robert,

I believe the overwhelming consensus on the mailing list vote was to move to
.NET 4.0 and drop support for previous versions.

I'll take care of the build scripts issue while they are being refactored into
smaller chunks this week.

@Troy, Agreed.

On Wed, Sep 21, 2011 at 8:08 AM, Robert Jordan robe...@gmx.net wrote:

 On 20.09.2011 23:48, Prescott Nasser wrote:

 Hey all, seems like we are set with 2.9.4? Feedback has been positive and
 it's been quiet. Do we feel ready to vote for a new release?


 I don't know if the build infrastructure is part of the
 release. If yes, then there is an open issue:

 Contrib doesn't build right now because there
 are some assembly name mismatches between certain *.csproj
 files and  build/scripts/contrib.targets.

 The following patches should fix the issue:

 https://github.com/robert-j/lucene.net/commit/c5218bca56c19b3407648224781eec7316994a39

 https://github.com/robert-j/lucene.net/commit/50bad187655d59968d51d472b57c2a40e201d663


 Also, the fix for [LUCENENET-358] is basically making
 Lucene.Net.dll a .NET 4.0-only assembly:

 https://github.com/apache/lucene.net/commit/23ea6f52362fc7dbce48fd012cea129a7350c73c

 Did we agree about abandoning .NET <= 3.5?

 Robert




[jira] [Commented] (LUCENE-3441) Add NRT support to LuceneTaxonomyReader

2011-09-21 Thread Mihai Caraman (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109584#comment-13109584
 ] 

Mihai Caraman commented on LUCENE-3441:
---

Newb question: Shouldn't you also commit in the constructor, so you can create 
a reader right after? For example, to work with the taxReader with refresh(), I 
have to initialize: taxWriter, commit, taxReader, else it throws a no-segments 
exception (segments which you'd expect to be there because of the taxWriter 
ctor, or is that just me :P ?).

 Add NRT support to LuceneTaxonomyReader
 ---

 Key: LUCENE-3441
 URL: https://issues.apache.org/jira/browse/LUCENE-3441
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/facet
Reporter: Shai Erera
Priority: Minor

 Currently LuceneTaxonomyReader does not support NRT - i.e., on changes to 
 LuceneTaxonomyWriter, you cannot have the reader updated, like 
 IndexReader/Writer. In order to do that we need to do the following:
 # Add ctor to LuceneTaxonomyReader to allow you to instantiate it with 
 LuceneTaxonomyWriter.
 # Add API to LuceneTaxonomyWriter to expose its internal IndexReader
 # Change LTR.refresh() to return an LTR, rather than void. This is actually 
 not strictly related to that issue, but since we'll need to modify refresh() 
 impl, I think it'll be good to change its API as well. Since all of facet API 
 is @lucene.experimental, no backwards issues here (and the sooner we do it, 
 the better).




[jira] [Issue Comment Edited] (LUCENE-3441) Add NRT support to LuceneTaxonomyReader

2011-09-21 Thread Mihai Caraman (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109584#comment-13109584
 ] 

Mihai Caraman edited comment on LUCENE-3441 at 9/21/11 3:45 PM:


Newb question: Shouldn't you also commit in the constructor, so you can create 
a reader right after? For example, to work with the taxReader with refresh(), I 
have to initialize: 
w= LuceneTaxonomyWriter(x),
w.commit(),
new LuceneTaxonomyReader(x), 
else it throws no segment exception(segments which you'd expect to be there 
because of the taxWriter ctor, or is that just me:P ?).

  was (Author: mihai caraman):
Newb question: Shouldn't you also commit in the constructor, so you can 
create a reader right after? For exmaple, to work with the taxReader with 
refresh(), I have to initialize: taxWriter,commit,taxReader, else it throws no 
segment exception(which you'd expect to be there because of the taxWriter ctor, 
or is that just me:P ?).
  
 Add NRT support to LuceneTaxonomyReader
 ---

 Key: LUCENE-3441
 URL: https://issues.apache.org/jira/browse/LUCENE-3441
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/facet
Reporter: Shai Erera
Priority: Minor

 Currently LuceneTaxonomyReader does not support NRT - i.e., on changes to 
 LuceneTaxonomyWriter, you cannot have the reader updated, like 
 IndexReader/Writer. In order to do that we need to do the following:
 # Add ctor to LuceneTaxonomyReader to allow you to instantiate it with 
 LuceneTaxonomyWriter.
 # Add API to LuceneTaxonomyWriter to expose its internal IndexReader
 # Change LTR.refresh() to return an LTR, rather than void. This is actually 
 not strictly related to that issue, but since we'll need to modify refresh() 
 impl, I think it'll be good to change its API as well. Since all of facet API 
 is @lucene.experimental, no backwards issues here (and the sooner we do it, 
 the better).




[jira] [Issue Comment Edited] (LUCENE-3441) Add NRT support to LuceneTaxonomyReader

2011-09-21 Thread Mihai Caraman (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109584#comment-13109584
 ] 

Mihai Caraman edited comment on LUCENE-3441 at 9/21/11 3:46 PM:


Newb question: Shouldn't you also commit in the constructor, so you can create 
a reader right after? For example, to later work with the taxReader through 
refresh(), when i start clean, I have to initialize: 
w= LuceneTaxonomyWriter(...),
w.commit(),
new LuceneTaxonomyReader(...), 
else it throws no segment exception(segments which you'd expect to be there 
because of the taxWriter ctor, or is that just me:P ?).

  was (Author: mihai caraman):
Newb question: Shouldn't you also commit in the constructor, so you can 
create a reader right after? For example, to work with the taxReader with 
refresh(), I have to initialize: 
w= LuceneTaxonomyWriter(...),
w.commit(),
new LuceneTaxonomyReader(...), 
else it throws no segment exception(segments which you'd expect to be there 
because of the taxWriter ctor, or is that just me:P ?).
  
 Add NRT support to LuceneTaxonomyReader
 ---

 Key: LUCENE-3441
 URL: https://issues.apache.org/jira/browse/LUCENE-3441
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/facet
Reporter: Shai Erera
Priority: Minor

 Currently LuceneTaxonomyReader does not support NRT - i.e., on changes to 
 LuceneTaxonomyWriter, you cannot have the reader updated, like 
 IndexReader/Writer. In order to do that we need to do the following:
 # Add ctor to LuceneTaxonomyReader to allow you to instantiate it with 
 LuceneTaxonomyWriter.
 # Add API to LuceneTaxonomyWriter to expose its internal IndexReader
 # Change LTR.refresh() to return an LTR, rather than void. This is actually 
 not strictly related to that issue, but since we'll need to modify refresh() 
 impl, I think it'll be good to change its API as well. Since all of facet API 
 is @lucene.experimental, no backwards issues here (and the sooner we do it, 
 the better).




[jira] [Issue Comment Edited] (LUCENE-3441) Add NRT support to LuceneTaxonomyReader

2011-09-21 Thread Mihai Caraman (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109584#comment-13109584
 ] 

Mihai Caraman edited comment on LUCENE-3441 at 9/21/11 3:45 PM:


Newb question: Shouldn't you also commit in the constructor, so you can create 
a reader right after? For example, to work with the taxReader with refresh(), I 
have to initialize: 
w= LuceneTaxonomyWriter(...),
w.commit(),
new LuceneTaxonomyReader(...), 
else it throws no segment exception(segments which you'd expect to be there 
because of the taxWriter ctor, or is that just me:P ?).

  was (Author: mihai caraman):
Newb question: Shouldn't you also commit in the constructor, so you can 
create a reader right after? For example, to work with the taxReader with 
refresh(), I have to initialize: 
w= LuceneTaxonomyWriter(x),
w.commit(),
new LuceneTaxonomyReader(x), 
else it throws no segment exception(segments which you'd expect to be there 
because of the taxWriter ctor, or is that just me:P ?).
  
 Add NRT support to LuceneTaxonomyReader
 ---

 Key: LUCENE-3441
 URL: https://issues.apache.org/jira/browse/LUCENE-3441
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/facet
Reporter: Shai Erera
Priority: Minor

 Currently LuceneTaxonomyReader does not support NRT - i.e., on changes to 
 LuceneTaxonomyWriter, you cannot have the reader updated, like 
 IndexReader/Writer. In order to do that we need to do the following:
 # Add ctor to LuceneTaxonomyReader to allow you to instantiate it with 
 LuceneTaxonomyWriter.
 # Add API to LuceneTaxonomyWriter to expose its internal IndexReader
 # Change LTR.refresh() to return an LTR, rather than void. This is actually 
 not strictly related to that issue, but since we'll need to modify refresh() 
 impl, I think it'll be good to change its API as well. Since all of facet API 
 is @lucene.experimental, no backwards issues here (and the sooner we do it, 
 the better).




[jira] [Commented] (LUCENE-3441) Add NRT support to LuceneTaxonomyReader

2011-09-21 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109599#comment-13109599
 ] 

Jason Rutherglen commented on LUCENE-3441:
--

It would be great to know what the cost of (re)opening a new LTR is, along 
with an explanation of what it's doing underneath.

 Add NRT support to LuceneTaxonomyReader
 ---

 Key: LUCENE-3441
 URL: https://issues.apache.org/jira/browse/LUCENE-3441
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/facet
Reporter: Shai Erera
Priority: Minor

 Currently LuceneTaxonomyReader does not support NRT - i.e., on changes to 
 LuceneTaxonomyWriter, you cannot have the reader updated, like 
 IndexReader/Writer. In order to do that we need to do the following:
 # Add ctor to LuceneTaxonomyReader to allow you to instantiate it with 
 LuceneTaxonomyWriter.
 # Add API to LuceneTaxonomyWriter to expose its internal IndexReader
 # Change LTR.refresh() to return an LTR, rather than void. This is actually 
 not strictly related to that issue, but since we'll need to modify refresh() 
 impl, I think it'll be good to change its API as well. Since all of facet API 
 is @lucene.experimental, no backwards issues here (and the sooner we do it, 
 the better).




[jira] [Commented] (SOLR-2739) TestSqlEntityProcessorDelta.testNonWritablePersistFile failures on some systems

2011-09-21 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109602#comment-13109602
 ] 

Shawn Heisey commented on SOLR-2739:


If I do have the right idea, then the rest of this paragraph applies, otherwise 
not:  I have to wonder why the current test is passing for everyone but me.  It 
seems as though it should be failing for everyone.

I added a couple more lines, so now it tries a delta import, checks for 
numFound=0, then runs a full import and checks for numFound=1.  Contrary to 
what I expected, the second part failed.

 TestSqlEntityProcessorDelta.testNonWritablePersistFile failures on some 
 systems
 ---

 Key: SOLR-2739
 URL: https://issues.apache.org/jira/browse/SOLR-2739
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.3
Reporter: Shawn Heisey
Assignee: Hoss Man
 Fix For: 3.5, 4.0


 Shawn Heisey noted on the mailing list that he was getting consistent 
 failures from TestSqlEntityProcessorDelta.testNonWritablePersistFile on his 
 machine.
 I can't reproduce his exact failures, but the test is hinky enough that i 
 want to try and clean it up.




[jira] [Reopened] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-21 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reopened LUCENE-3390:
---


While discussing the forward port with Mike McCandless on IRC, we thought 
the double inversion is useless (it was in Doron's patch because he wanted to 
use DocIdSetIterator effectively).

We changed the name to FieldCache.getDocsWithField().

Patch is easy.

 Incorrect sort by Numeric values for documents missing the sorting field
 

 Key: LUCENE-3390
 URL: https://issues.apache.org/jira/browse/LUCENE-3390
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.3
Reporter: Gilad Barkai
Assignee: Uwe Schindler
Priority: Minor
  Labels: double, float, int, long, numeric, sort
 Fix For: 3.4, 3.5

 Attachments: LUCENE-3390-BitsInterface.patch, 
 LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, 
 LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, 
 LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, 
 LUCENE-3390-inverted.patch, LUCENE-3390.patch, SortByDouble.java


 While sorting results over a numeric field, documents which do not contain a 
 value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested 
 against Double, Float, Int & Long numeric fields in ascending and descending 
 order).
 This behavior is unexpected, as zero is comparable to the rest of the 
 values. A better solution would be either to allow the user to define such a 
 non-value default, or to always bring those document results as the last ones.
 Example scenario:
 Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any 
 value.
 Searching with MatchAllDocsQuery, with sort over that field in descending 
 order yields the docid results of 0, 2, 1.
 Asking for the top 2 documents brings the document without any value as the 
 2nd result - which seems like a bug?
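
The example scenario can be modeled outside Lucene with a minimal sketch. The class and method names below are hypothetical and no Lucene API is used: a missing value is represented as null and is either treated as 0 (the reported behavior) or always sorted last (one of the proposed fixes).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-alone model of the scenario; it does not use Lucene.
class MissingValueSortDemo {

    // Returns doc ids sorted by value in descending order.
    // values[i] == null models a document without the sort field.
    // missingLast == false: a missing value is treated as 0.0 (the reported behavior).
    // missingLast == true: documents without a value always sort after the others.
    static List<Integer> sortDescending(Double[] values, boolean missingLast) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            docs.add(i);
        }
        Comparator<Integer> byValueDesc = (a, b) -> {
            Double va = values[a];
            Double vb = values[b];
            if (missingLast) {
                if (va == null && vb == null) return 0;
                if (va == null) return 1;   // a has no value: goes after b
                if (vb == null) return -1;  // b has no value: goes after a
            }
            double da = (va == null) ? 0.0 : va;
            double db = (vb == null) ? 0.0 : vb;
            return Double.compare(db, da);  // descending order
        };
        docs.sort(byValueDesc);
        return docs;
    }

    public static void main(String[] args) {
        Double[] values = {3.5, -10.0, null};
        System.out.println(sortDescending(values, false)); // [0, 2, 1]
        System.out.println(sortDescending(values, true));  // [0, 1, 2]
    }
}
```

With missing values treated as 0 the descending order is 0, 2, 1, which matches the report; sorting them last gives 0, 1, 2.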




[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-21 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3390:
--

Attachment: LUCENE-3390-inverted.patch

Patch with the BitSet inverted. We break backwards compatibility so this is not 
an issue at all.




[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-21 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3390:
--

Attachment: LUCENE-3390-inverted.patch




[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-21 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3390:
--

Attachment: (was: LUCENE-3390-inverted.patch)




[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109611#comment-13109611
 ] 

Michael McCandless commented on LUCENE-3390:


Looks great!




[jira] [Resolved] (LUCENE-3439) add checks/asserts if you search across a closed reader

2011-09-21 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3439.


   Resolution: Fixed
Fix Version/s: 4.0
   3.5

 add checks/asserts if you search across a closed reader
 ---

 Key: LUCENE-3439
 URL: https://issues.apache.org/jira/browse/LUCENE-3439
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Michael McCandless
 Fix For: 3.5, 4.0

 Attachments: LUCENE-3439.patch, LUCENE-3439_test.patch


 If you try to search across a closed reader (and/or searcher too),
 there are no checks, not even assertion statements.
 This results in crazy scary stacktraces deep inside places like FSTs/various 
 term dictionary implementations etc.
 In some situations, depending on codec, you won't even get an error (I'm sure 
 it's fun when you try to retrieve the stored fields!)
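
The kind of check being asked for can be sketched as follows. This is only an illustration under assumptions: the class, its methods and the exception type are hypothetical stand-ins, not the code in the attached patch or Lucene's actual API.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reader model; names are illustrative, not Lucene's API.
class RefCountedReader {
    private final AtomicInteger refCount = new AtomicInteger(1);
    private final String[] storedFields;

    RefCountedReader(String[] storedFields) {
        this.storedFields = storedFields;
    }

    // Fails fast with a clear message instead of letting a closed reader be used.
    private void ensureOpen() {
        if (refCount.get() <= 0) {
            throw new IllegalStateException("this reader is closed");
        }
    }

    String document(int docId) {
        ensureOpen();
        return storedFields[docId];
    }

    void close() {
        refCount.decrementAndGet();
    }

    public static void main(String[] args) {
        RefCountedReader reader = new RefCountedReader(new String[] {"doc zero"});
        System.out.println(reader.document(0));
        reader.close();
        try {
            reader.document(0);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Calling ensureOpen() at the top of every public entry point turns a confusing failure deep in the term dictionary into an immediate, descriptive exception.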




[jira] [Updated] (LUCENE-3443) Port 3.x FieldCache.getDocsWithField() to trunk

2011-09-21 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3443:
--

Description: 
[Spinoff from LUCENE-3390]

I think the approach in 3.x for handling un-valued docs, and making it
possible to specify how such docs are sorted, is better than the
solution we have in trunk.

I like that FC has a dedicated method to get the Bits for docs with field
-- easy for apps to directly use.  And I like that the
bits have their own entry in the FC.

One downside is that it's 2 passes to get values and valid bits, but
I think we can fix this by passing optional bool to FC.getXXX methods
indicating you want the bits, and then populate the FC entry for the
missing bits as well.  (We can do that for 3.x and trunk). Then it's
single pass.


  was:

[Spinoff from LUCENE-3390]

I think the approach in 3.x for handling un-valued docs, and making it
possible to specify how such docs are sorted, is better than the
solution we have in trunk.

I like that FC has a dedicated method to get the Bits for un-valued
docs -- easy for apps to directly use.  And I like that the un-valued
bits have their own entry in the FC.

One downside is that it's 2 passes to get values and missing bits, but
I think we can fix this by passing optional bool to FC.getXXX methods
indicating you want the bits, and then populate the FC entry for the
missing bits as well.  (We can do that for 3.x and trunk). Then it's
single pass.


Summary: Port 3.x FieldCache.getDocsWithField() to trunk  (was: Port 
3.x getUnvaluedDocs to trunk)

 Port 3.x FieldCache.getDocsWithField() to trunk
 ---

 Key: LUCENE-3443
 URL: https://issues.apache.org/jira/browse/LUCENE-3443
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Reporter: Michael McCandless
 Fix For: 3.5, 4.0


 [Spinoff from LUCENE-3390]
 I think the approach in 3.x for handling un-valued docs, and making it
 possible to specify how such docs are sorted, is better than the
 solution we have in trunk.
 I like that FC has a dedicated method to get the Bits for docs with field
 -- easy for apps to directly use.  And I like that the
 bits have their own entry in the FC.
 One downside is that it's 2 passes to get values and valid bits, but
 I think we can fix this by passing optional bool to FC.getXXX methods
 indicating you want the bits, and then populate the FC entry for the
 missing bits as well.  (We can do that for 3.x and trunk). Then it's
 single pass.
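
A rough sketch of the single-pass idea, assuming a plain array of raw field values in place of the real term enumeration. The class and method names are hypothetical and this is not the FieldCache API; it only shows values and valid bits being filled in the same loop.

```java
import java.util.BitSet;

// Hypothetical model of a cache entry that carries values and valid bits together.
class SinglePassFieldCacheDemo {

    static final class Entry {
        final double[] values;      // 0.0 where the document has no value
        final BitSet docsWithField; // bit set only where a value was present

        Entry(double[] values, BitSet docsWithField) {
            this.values = values;
            this.docsWithField = docsWithField;
        }
    }

    // rawValues[doc] == null models a document without the field.
    static Entry load(String[] rawValues) {
        double[] values = new double[rawValues.length];
        BitSet docsWithField = new BitSet(rawValues.length);
        for (int doc = 0; doc < rawValues.length; doc++) {
            if (rawValues[doc] != null) {
                values[doc] = Double.parseDouble(rawValues[doc]);
                docsWithField.set(doc); // recorded in the same pass as the value
            }
        }
        return new Entry(values, docsWithField);
    }

    public static void main(String[] args) {
        Entry entry = load(new String[] {"3.5", "-10", null});
        System.out.println(entry.docsWithField); // {0, 1}
        System.out.println(entry.values[2]);     // 0.0
    }
}
```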




[jira] [Resolved] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-21 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-3390.
---

Resolution: Fixed

Committed 3.x branch revision: 1173745




[jira] [Commented] (LUCENE-2205) Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.

2011-09-21 Thread Aaron McCurry (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109630#comment-13109630
 ] 

Aaron McCurry commented on LUCENE-2205:
---

I agree on the heap size; I will do more analysis on that tonight.  As for 
the speed, it took a bit of time to get the performance basically the same. 
 I had to change a few methods inside TermInfosReader to reuse resources.

The random access test sampled 100,000 terms from the index and stored them in 
a file.  When I run the test, it pulls all of the terms into memory and 
randomly selects terms to use in TermQueries.  The test then times each search 
in nanotime and averages the results.  I will attach my test programs tonight 
if you want.  Running against an MMapDirectory on a small index of ~1,000,000 
documents, the performance is basically the same with and without the patch; 
if there is a difference, the current implementation (no patch) is slightly 
faster, as you would expect.

 Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and 
 the index pointer long[] and create a more memory efficient data structure.
 ---

 Key: LUCENE-2205
 URL: https://issues.apache.org/jira/browse/LUCENE-2205
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
 Environment: Java5
Reporter: Aaron McCurry
Assignee: Michael McCandless
 Fix For: 3.5

 Attachments: RandomAccessTest.java, TermInfosReader.java, 
 TermInfosReaderIndex.java, TermInfosReaderIndexDefault.java, 
 TermInfosReaderIndexSmall.java, lowmemory_w_utf8_encoding.patch, 
 patch-final.txt, rawoutput.txt


 Basically packing those three arrays into a byte array with an int array as 
 an index offset.  
 The performance benefits are staggering on my test index (of size 6.2 GB, with 
 ~1,000,000 documents and ~175,000,000 terms): the memory needed to load the 
 terminfos into memory was reduced to 17% of its original size, from 291.5 
 MB to 49.7 MB.  The random access speed has been made better by 1-2%, load 
 time of the segments is ~40% faster as well, and full GCs on my JVM were 
 made 7 times faster.
 I have already performed the work and am offering this code as a patch.  
 Currently all tests in the trunk pass with this new code enabled.  I did write 
 a system property switch to allow for the original implementation to be used 
 as well.
 -Dorg.apache.lucene.index.TermInfosReader=default or small
 I have also written a blog post about this patch; here is the link.
 http://www.nearinfinity.com/blogs/aaron_mccurry/my_first_lucene_patch.html
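
To illustrate the packing idea in general terms, here is a minimal sketch that stores term text and a long pointer back to back in one byte array, with an int array of record offsets. The record layout, class name and methods are assumptions made for the example; the layout in the attached patch may well differ.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical packed index: one byte[] for all records plus int[] offsets,
// instead of separate object arrays for terms and pointers.
class PackedTermIndexDemo {
    private final byte[] data;   // all records stored back to back
    private final int[] offsets; // offsets[i] = start of record i inside data

    PackedTermIndexDemo(String[] terms, long[] pointers) {
        byte[][] utf8 = new byte[terms.length][];
        int total = 0;
        for (int i = 0; i < terms.length; i++) {
            utf8[i] = terms[i].getBytes(StandardCharsets.UTF_8);
            total += Integer.BYTES + utf8[i].length + Long.BYTES;
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        offsets = new int[terms.length];
        for (int i = 0; i < terms.length; i++) {
            offsets[i] = buf.position();
            buf.putInt(utf8[i].length); // assumed layout: length, term bytes, pointer
            buf.put(utf8[i]);
            buf.putLong(pointers[i]);
        }
        data = buf.array();
    }

    String term(int i) {
        int length = ByteBuffer.wrap(data).getInt(offsets[i]);
        return new String(data, offsets[i] + Integer.BYTES, length, StandardCharsets.UTF_8);
    }

    long pointer(int i) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int length = buf.getInt(offsets[i]);
        return buf.getLong(offsets[i] + Integer.BYTES + length);
    }

    public static void main(String[] args) {
        PackedTermIndexDemo index = new PackedTermIndexDemo(
                new String[] {"apple", "zebra"}, new long[] {42L, 1234567890123L});
        System.out.println(index.term(1) + " -> " + index.pointer(1));
    }
}
```

The saving comes from replacing many small objects, each with its own header and reference, by two primitive arrays that the garbage collector can treat as single blocks.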




[jira] [Updated] (SOLR-2754) create Solr similarity factories for new ranking algorithms

2011-09-21 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-2754:
--

Attachment: SOLR-2754.patch

I added tests for the new factories; I think it's ready to commit.

 create Solr similarity factories for new ranking algorithms
 ---

 Key: SOLR-2754
 URL: https://issues.apache.org/jira/browse/SOLR-2754
 Project: Solr
  Issue Type: New Feature
Affects Versions: 4.0
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: SOLR-2754.patch, SOLR-2754.patch, SOLR-2754.patch


 To make it easy to use some of the new ranking algorithms, we should add 
 factories to solr:
 * for parametric models like LM and BM25 so that parameters can be set from 
 schema.xml
 * for framework models like DFR and IB, so that different basic 
 models/normalizations/lambdas can be chosen




RE: [Lucene.Net] 2.9.4

2011-09-21 Thread Prescott Nasser
I thought this was after 2.9.4

Sent from my Windows Phone

-Original Message-
From: Michael Herndon
Sent: Wednesday, September 21, 2011 8:30 AM
To: lucene-net-...@lucene.apache.org
Cc: lucene-net-...@incubator.apache.org
Subject: Re: [Lucene.Net] 2.9.4

@Robert,

I believe the overwhelming consensus on the mailing list vote was to move to
.NET 4.0 and drop support for previous versions.

I'll take care of the build scripts issue while they are being refactored into
smaller chunks this week.

@Troy, Agreed.

On Wed, Sep 21, 2011 at 8:08 AM, Robert Jordan robe...@gmx.net wrote:

 On 20.09.2011 23:48, Prescott Nasser wrote:

 Hey all, seems like we are set with 2.9.4? Feedback has been positive and
 it's been quiet. Do we feel ready to vote for a new release?


 I don't know if the build infrastructure is part of the
 release. If yes, then there is an open issue:

 Contrib doesn't build right now because there
 are some assembly name mismatches between certain *.csproj
 files and  build/scripts/contrib.targets.

 The following patches should fix the issue:

 https://github.com/robert-j/lucene.net/commit/c5218bca56c19b3407648224781eec7316994a39

 https://github.com/robert-j/lucene.net/commit/50bad187655d59968d51d472b57c2a40e201d663


 Also, the fix for [LUCENENET-358] is basically making
 Lucene.Net.dll a .NET 4.0-only assembly:

 https://github.com/apache/lucene.net/commit/23ea6f52362fc7dbce48fd012cea129a7350c73c

 Did we agree about abandoning .NET <= 3.5?

 Robert




Re: [Lucene.Net] 2.9.4

2011-09-21 Thread Michael Herndon
If that's the case, then we'll need conditional statements for including
ThreadLocal<T>




[jira] [Created] (LUCENE-3445) Add SearcherManager, to manage IndexSearcher usage across threads and reopens

2011-09-21 Thread Michael McCandless (JIRA)
Add SearcherManager, to manage IndexSearcher usage across threads and reopens
-

 Key: LUCENE-3445
 URL: https://issues.apache.org/jira/browse/LUCENE-3445
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.5, 4.0


This is a simple helper class I wrote for Lucene in Action 2nd ed.
I'd like to commit under Lucene (contrib/misc).

It simplifies using & reopening an IndexSearcher across multiple
threads, by using IndexReader's ref counts to know when it's safe
to close the reader.

In the process I also factored out a test base class for tests that
want to make lots of simultaneous indexing and searching threads, and
fixed TestNRTThreads (core), TestNRTManager (contrib/misc) and the new
TestSearcherManager (contrib/misc) to use this base class.
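
The ref-counting idea can be sketched in isolation as follows. This is a simplified model under assumptions, not the SearcherManager class in the patch: the Searcher type is a stand-in that only tracks a generation number and a reference count, and all names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model: search threads acquire/release, a reopen swaps in a new
// searcher, and the old one is closed only when its last reference is dropped.
class RefCountedSearcherDemo {

    static final class Searcher {
        final int generation;
        private final AtomicInteger refCount = new AtomicInteger(1); // the manager's own reference
        private volatile boolean closed;

        Searcher(int generation) {
            this.generation = generation;
        }

        boolean tryIncRef() {
            while (true) {
                int count = refCount.get();
                if (count <= 0) return false; // already closed, caller must retry
                if (refCount.compareAndSet(count, count + 1)) return true;
            }
        }

        void decRef() {
            if (refCount.decrementAndGet() == 0) {
                closed = true; // last reference gone: now it is safe to close
            }
        }

        boolean isClosed() {
            return closed;
        }
    }

    private volatile Searcher current = new Searcher(0);

    Searcher acquire() {
        while (true) {
            Searcher s = current;
            if (s.tryIncRef()) return s;
            // Lost a race with reopen(); loop and pick up the new searcher.
        }
    }

    void release(Searcher s) {
        s.decRef();
    }

    synchronized void reopen() {
        Searcher old = current;
        current = new Searcher(old.generation + 1);
        old.decRef(); // drop the manager's reference to the old searcher
    }

    public static void main(String[] args) {
        RefCountedSearcherDemo manager = new RefCountedSearcherDemo();
        Searcher inFlight = manager.acquire();   // a search thread holds generation 0
        manager.reopen();                        // a new searcher is installed
        System.out.println(inFlight.isClosed()); // false: still referenced by the search
        manager.release(inFlight);
        System.out.println(inFlight.isClosed()); // true: last reference dropped
    }
}
```

In real use the release call would sit in a finally block so that a failed search cannot leak a reference.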





[jira] [Updated] (LUCENE-3445) Add SearcherManager, to manage IndexSearcher usage across threads and reopens

2011-09-21 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3445:
---

Attachment: LUCENE-3445.patch




[jira] [Updated] (LUCENE-3440) FastVectorHighlighter: IDF-weighted terms for ordered fragments

2011-09-21 Thread S.L. (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

S.L. updated LUCENE-3440:
-

Attachment: (was: LUCENE-3440.patch)

 FastVectorHighlighter: IDF-weighted terms for ordered fragments 
 

 Key: LUCENE-3440
 URL: https://issues.apache.org/jira/browse/LUCENE-3440
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/highlighter
Affects Versions: 3.5
Reporter: S.L.
Priority: Minor
  Labels: patch
 Fix For: 3.5

 Attachments: LUCENE-3440-1.patch


 The FastVectorHighlighter gives every term found in a fragment an equal 
 weight, which ranks fragments with a high number of words or, in the worst 
 case, a high number of very common words above fragments that contain *all* 
 of the terms used in the original query. 
 This patch provides ordered fragments with IDF-weighted terms: 
 total weight = total weight + IDF for unique term per fragment * boost of 
 query; 
 The ranking-formula should be the same as, or at least similar to, the one 
 used in org.apache.lucene.search.highlight.QueryTermScorer.
 The patch is simple, but it works for us. 
 Some ideas:
 - A better approach would be moving the whole fragments-scoring into a 
 separate class.
 - Switch scoring via parameter 
 - Exact phrases should be given an even better score, regardless of whether 
 a phrase-query was executed or not
 - edismax/dismax-parameters pf, ps and pf^boost should be observed and 
 corresponding fragments should be ranked higher 
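
The weighting formula above can be sketched roughly as follows. This is not the code from the patch: the class and method names are hypothetical, the document frequencies are made up for the example, and the idf function (1 + ln(numDocs / (docFreq + 1))) is an assumed classic Lucene-style definition.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical fragment scorer: each unique query term in a fragment
// contributes idf * boost once, however often it occurs.
class IdfFragmentScoreDemo {

    // Assumed idf definition for the sketch.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // docFreqs maps each query term to its document frequency.
    static double score(List<String> fragmentTerms, Map<String, Integer> docFreqs,
                        int numDocs, double queryBoost) {
        Set<String> seen = new HashSet<>();
        double total = 0.0;
        for (String term : fragmentTerms) {
            Integer docFreq = docFreqs.get(term);
            if (docFreq == null || !seen.add(term)) {
                continue; // not a query term, or already counted in this fragment
            }
            total += idf(docFreq, numDocs) * queryBoost;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> docFreqs = new HashMap<>();
        docFreqs.put("the", 999);       // very common term
        docFreqs.put("lucene", 9);
        docFreqs.put("highlighter", 4);

        List<String> manyCommonWords = Arrays.asList("the", "the", "the", "lucene");
        List<String> allRareTerms = Arrays.asList("lucene", "highlighter");

        System.out.println(score(manyCommonWords, docFreqs, 1000, 1.0));
        System.out.println(score(allRareTerms, docFreqs, 1000, 1.0));
    }
}
```

With equal weights per occurrence the first fragment would win with four hits against two; with IDF weighting per unique term the second fragment scores higher.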




[jira] [Updated] (LUCENE-3440) FastVectorHighlighter: IDF-weighted terms for ordered fragments

2011-09-21 Thread S.L. (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

S.L. updated LUCENE-3440:
-

Description: 
The FastVectorHighlighter gives every term found in a fragment an equal 
weight, which ranks fragments with a high number of words or, in the worst 
case, a high number of very common words above fragments that contain *all* 
of the terms used in the original query. 

This patch provides ordered fragments with IDF-weighted terms: 

total weight = total weight + IDF for unique term per fragment * boost of 
query; 

The ranking-formula should be the same as, or at least similar to, the one 
used in org.apache.lucene.search.highlight.QueryTermScorer.

The patch is simple, but it works for us. 

Some ideas:
- A better approach would be moving the whole fragments-scoring into a separate 
class.
- Switch scoring via parameter 
- Exact phrases should be given an even better score, regardless of whether 
a phrase-query was executed or not
- edismax/dismax-parameters pf, ps and pf^boost should be observed and 
corresponding fragments should be ranked higher 

  was:
The FastVectorHighlighter gives every term found in a fragment an equal 
weight, which ranks fragments with a high number of words or, in the worst 
case, a high number of very common words above fragments that contain *all* 
of the terms used in the original query. 

This patch provides ordered fragments with IDF-weighted terms: 

total weight = total weight + IDF for unique term per fragment * boost of 
query; 

The ranking-formular should be the same as, or at least similar to, the one 
used in org.apache.lucene.search.highlight.QueryTermScorer.

The patch is simple, but it works for us. 

Some ideas:
- A better approach would be moving the whole fragments-scoring into a separate 
class.
- Switch scoring via parameter 
- Exact phrases should be given an even better score, regardless of whether 
a phrase-query was executed or not
- edismax/dismax-parameters pf, ps and pf^boost should be observed and 
corresponding fragments should be ranked higher 

 FastVectorHighlighter: IDF-weighted terms for ordered fragments 
 

 Key: LUCENE-3440
 URL: https://issues.apache.org/jira/browse/LUCENE-3440
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/highlighter
Affects Versions: 3.5
Reporter: S.L.
Priority: Minor
  Labels: patch
 Fix For: 3.5

 Attachments: LUCENE-3440-1.patch


 The FastVectorHighlighter gives every term found in a fragment an equal
 weight. As a result, fragments with a high number of words or, in the worst
 case, a high number of very common words rank higher than fragments that
 contain *all* of the terms used in the original query.
 This patch provides ordered fragments with IDF-weighted terms:
 total weight = total weight + IDF for unique term per fragment * boost of
 query;
 The ranking formula should be the same as, or at least similar to, the one
 used in org.apache.lucene.search.highlight.QueryTermScorer.
 The patch is simple, but it works for us.
 Some ideas:
 - A better approach would be to move the whole fragment scoring into a
 separate class.
 - Switch scoring via a parameter
 - Exact phrases should be given an even better score, regardless of whether
 a phrase query was executed
 - The edismax/dismax parameters pf, ps and pf^boost should be observed and
 corresponding fragments should be ranked higher
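
 The weighting rule in the description can be sketched in plain Java. This is
 only an illustration, not the code from LUCENE-3440-1.patch: the class and
 method names are hypothetical, and the IDF formula 1 + ln(numDocs / (docFreq + 1))
 is assumed here because it is the one Lucene's classic DefaultSimilarity uses.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of the attached patch.
final class FragmentWeightSketch {

    // Assumed IDF: 1 + ln(numDocs / (docFreq + 1)), as in classic Lucene scoring.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // Adds IDF * boost once per unique query term in the fragment, so that
    // repeating a very common word does not inflate the fragment's weight.
    static double weigh(List<String> fragmentTerms, Map<String, Integer> docFreqs,
                        int numDocs, double queryBoost) {
        Set<String> seen = new HashSet<>();
        double total = 0.0;
        for (String term : fragmentTerms) {
            Integer df = docFreqs.get(term);
            if (df == null || !seen.add(term)) {
                continue; // not a query term, or already counted for this fragment
            }
            total += idf(df, numDocs) * queryBoost;
        }
        return total;
    }
}
```

 With this rule a fragment containing one common and one rare query term
 outweighs a fragment that merely repeats the common term several times.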

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3441) Add NRT support to LuceneTaxonomyReader

2011-09-21 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109735#comment-13109735
 ] 

Shai Erera commented on LUCENE-3441:


bq. Shouldn't you also commit in the constructor

LuceneTaxonomyWriter behaves just like IndexWriter. Today (I think since 3.1), 
opening an IndexWriter is just another transaction that you should commit if 
you want IndexReaders to see it. So if you try:
{code}
IndexWriter w = new IndexWriter(emptyDir, new IWC()); // IWC is shorthand for IndexWriterConfig
IndexReader r = IndexReader.open(emptyDir); // fails: nothing has been committed yet
{code}
you'll get an exception as well.

If you want that to work, you must insert a commit() call after line #1, and 
LTW follows this logic.
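
The visibility rule can be modelled without Lucene at all. The sketch below is a toy: ToyDirectory, ToyWriter and ToyReader are hypothetical classes, not Lucene or facet-module APIs, and they only mimic the "readers see committed state only" behaviour described above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these are Lucene classes.
final class ToyDirectory {
    // Stays null until the first commit; readers can only open a committed snapshot.
    List<String> committed = null;
}

final class ToyWriter {
    private final ToyDirectory dir;
    private final List<String> pending = new ArrayList<>();

    // Opening the writer starts a transaction but publishes nothing.
    ToyWriter(ToyDirectory dir) { this.dir = dir; }

    void add(String doc) { pending.add(doc); }

    // Publishes a snapshot of everything added so far.
    void commit() { dir.committed = new ArrayList<>(pending); }
}

final class ToyReader {
    final List<String> docs;

    private ToyReader(List<String> docs) { this.docs = docs; }

    static ToyReader open(ToyDirectory dir) {
        if (dir.committed == null) {
            throw new IllegalStateException("no commit found in directory");
        }
        return new ToyReader(dir.committed);
    }
}
```

Opening a ToyReader right after constructing a ToyWriter throws; calling commit() first, even with no documents added, makes the open succeed with an empty snapshot.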

bq. Also an explanation of what it's doing underneath

Refreshing LTR means reopening its internal IndexReader instance. If it has 
changed, then LTR updates its parents array with the newly added categories. 
Usually, assuming the taxonomy does not grow a lot (i.e., usually after some 
point your taxonomy is relatively fixed, and new categories are not added often 
-- much like an index lexicon), this additional update of the parents array is 
quick.

 Add NRT support to LuceneTaxonomyReader
 ---

 Key: LUCENE-3441
 URL: https://issues.apache.org/jira/browse/LUCENE-3441
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/facet
Reporter: Shai Erera
Priority: Minor

 Currently LuceneTaxonomyReader does not support NRT - i.e., on changes to 
 LuceneTaxonomyWriter, you cannot have the reader updated, like 
 IndexReader/Writer. In order to do that we need to do the following:
 # Add ctor to LuceneTaxonomyReader to allow you to instantiate it with 
 LuceneTaxonomyWriter.
 # Add API to LuceneTaxonomyWriter to expose its internal IndexReader
 # Change LTR.refresh() to return an LTR, rather than void. This is actually 
 not strictly related to that issue, but since we'll need to modify refresh() 
 impl, I think it'll be good to change its API as well. Since all of facet API 
 is @lucene.experimental, no backwards issues here (and the sooner we do it, 
 the better).




[jira] [Resolved] (SOLR-2754) create Solr similarity factories for new ranking algorithms

2011-09-21 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-2754.
---

Resolution: Fixed

Thanks David!

 create Solr similarity factories for new ranking algorithms
 ---

 Key: SOLR-2754
 URL: https://issues.apache.org/jira/browse/SOLR-2754
 Project: Solr
  Issue Type: New Feature
Affects Versions: 4.0
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: SOLR-2754.patch, SOLR-2754.patch, SOLR-2754.patch


 To make it easy to use some of the new ranking algorithms, we should add 
 factories to solr:
 * for parametric models like LM and BM25 so that parameters can be set from 
 schema.xml
 * for framework models like DFR and IB, so that different basic 
 models/normalizations/lambdas can be chosen




[jira] [Commented] (LUCENE-3445) Add SearcherManager, to manage IndexSearcher usage across threads and reopens

2011-09-21 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109768#comment-13109768
 ] 

Shai Erera commented on LUCENE-3445:


This is great Mike !

I reviewed SearcherManager and have a comment about the TODO on whether or not
to call warm() in the ctor. If an extending class relies on some internal
members being initialized before warm() can safely be called, then this will
lead to exceptions. I think that warm() should not be called in the ctor, or at
least that we should add a ctor which accepts a boolean doWarm, while the other
ctors call it with 'true'.

Calling warm() in the ctor is useful if one wants to warm the IndexSearcher 
instance before SearcherManager is ready for use. So perhaps an additional ctor 
with the boolean gives the most flexibility.
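
To make the concern concrete, here is a small stand-alone sketch. ManagerSketch and WarmingManagerSketch are hypothetical names and not the SearcherManager from the patch; the point is only that Java runs a subclass's field initializers after the superclass constructor returns, so an overridden warm() called from the super ctor sees null fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes illustrating the pitfall; not taken from LUCENE-3445.patch.
class ManagerSketch {
    ManagerSketch() { this(true); }      // default ctor keeps warming enabled

    ManagerSketch(boolean doWarm) {
        if (doWarm) {
            warm();                      // runs before any subclass field is initialized
        }
    }

    protected void warm() { /* nothing to warm by default */ }
}

class WarmingManagerSketch extends ManagerSketch {
    // Assigned only after the superclass constructor has returned.
    private final List<String> warmQueries = new ArrayList<>();
    boolean warmed;

    WarmingManagerSketch(boolean doWarm) { super(doWarm); }

    @Override
    protected void warm() {
        warmQueries.add("warm-up query"); // NullPointerException when called from the super ctor
        warmed = true;
    }
}
```

Constructing WarmingManagerSketch with doWarm=true fails with a NullPointerException, while passing false and calling warm() afterwards works, which is the flexibility the extra ctor is meant to give.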

Also, I remember there was a ctor which took IndexWriter, to allow for an 
NRT-SearcherManager. What happened to it? :)

 Add SearcherManager, to manage IndexSearcher usage across threads and reopens
 -

 Key: LUCENE-3445
 URL: https://issues.apache.org/jira/browse/LUCENE-3445
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.5, 4.0

 Attachments: LUCENE-3445.patch


 This is a simple helper class I wrote for Lucene in Action 2nd ed.
 I'd like to commit under Lucene (contrib/misc).
 It simplifies using and reopening an IndexSearcher across multiple
 threads, by using IndexReader's ref counts to know when it's safe
 to close the reader.
 In the process I also factored out a test base class for tests that
 want to make lots of simultaneous indexing and searching threads, and
 fixed TestNRTThreads (core), TestNRTManager (contrib/misc) and the new
 TestSearcherManager (contrib/misc) to use this base class.
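
 The ref-counting idea in the description can be sketched roughly as follows.
 This is an assumption-laden illustration, not the class from
 LUCENE-3445.patch: RefCountedResource stands in for a reader with reference
 counts, and ManagerRefSketch and its method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a ref-counted reader; not Lucene's IndexReader.
final class RefCountedResource {
    private final AtomicInteger refCount = new AtomicInteger(1); // creator holds one ref
    volatile boolean closed = false;

    // Takes a reference unless the count has already dropped to zero.
    boolean tryIncRef() {
        int count;
        while ((count = refCount.get()) > 0) {
            if (refCount.compareAndSet(count, count + 1)) {
                return true;
            }
        }
        return false;
    }

    void decRef() {
        if (refCount.decrementAndGet() == 0) {
            closed = true; // last reference released: now it is safe to close
        }
    }
}

final class ManagerRefSketch {
    private volatile RefCountedResource current;

    ManagerRefSketch(RefCountedResource initial) { current = initial; }

    // Search threads call acquire() before use and release() when done.
    RefCountedResource acquire() {
        RefCountedResource r;
        do {
            r = current;
        } while (!r.tryIncRef()); // retry if a concurrent swap already closed it
        return r;
    }

    void release(RefCountedResource r) { r.decRef(); }

    // Installs a reopened resource and drops the manager's own reference to the old one.
    synchronized void swap(RefCountedResource fresh) {
        RefCountedResource old = current;
        current = fresh;
        old.decRef();
    }
}
```

 A thread that acquired the old resource before a swap keeps it open until it
 calls release(); only then does the count reach zero and the resource close.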




[jira] [Commented] (LUCENE-2205) Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.

2011-09-21 Thread Aaron McCurry (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109780#comment-13109780
 ] 

Aaron McCurry commented on LUCENE-2205:
---

I found a major bug in my test.  I was using the keyword analyzer instead of
whitespace or standard, so it was turning every one of my sentences that
contained 100 randomly generated words into 1 huge token.  This helps to
explain why the heap space results are not that stellar, because the fewer
terms there are (as well as the larger they are), the less the patch helps
reduce space.  I'm retesting now.
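
The effect on the term count can be imitated in plain Java. The sketch below does not use Lucene's analyzers; TokenCountSketch and its two methods are hypothetical and only mimic "whole value as one token" versus "one token per whitespace-separated word".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Plain-Java imitation of the two tokenization behaviours; no Lucene classes involved.
final class TokenCountSketch {
    // Keyword-style: the entire field value becomes a single (large) term.
    static Set<String> keywordTerms(String text) {
        return new HashSet<>(Arrays.asList(text));
    }

    // Whitespace-style: each whitespace-separated word becomes its own term.
    static Set<String> whitespaceTerms(String text) {
        return new HashSet<>(Arrays.asList(text.trim().split("\\s+")));
    }
}
```

A sentence therefore contributes one long unique term in the first case and one short term per distinct word in the second, which changes both the number and the size of the terms the index has to hold.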

 Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and 
 the index pointer long[] and create a more memory efficient data structure.
 ---

 Key: LUCENE-2205
 URL: https://issues.apache.org/jira/browse/LUCENE-2205
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
 Environment: Java5
Reporter: Aaron McCurry
Assignee: Michael McCandless
 Fix For: 3.5

 Attachments: RandomAccessTest.java, TermInfosReader.java, 
 TermInfosReaderIndex.java, TermInfosReaderIndexDefault.java, 
 TermInfosReaderIndexSmall.java, lowmemory_w_utf8_encoding.patch, 
 patch-final.txt, rawoutput.txt


 Basically, this packs those three arrays into a byte array, with an int array
 as an index offset.
 The performance benefits are staggering on my test index (of size 6.2 GB, with
 ~1,000,000 documents and ~175,000,000 terms): the memory needed to load the
 terminfos into memory was reduced to 17% of its original size, from 291.5
 MB to 49.7 MB.  The random access speed has improved by 1-2%, load
 time of the segments is ~40% faster as well, and full GCs on my JVM were
 made 7 times faster.
 I have already performed the work and am offering this code as a patch.
 Currently all tests in the trunk pass with this new code enabled.  I did write
 a system property switch to allow for the original implementation to be used
 as well.
 -Dorg.apache.lucene.index.TermInfosReader=default or small
 I have also written a blog post about this patch; here is the link.
 http://www.nearinfinity.com/blogs/aaron_mccurry/my_first_lucene_patch.html
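
 As a rough illustration of the packing idea, the sketch below stores each
 term and one file pointer back to back in a single byte[] and keeps an int[]
 of start offsets. The layout and the name PackedTermIndexSketch are
 assumptions made for this example; the patch's actual encoding (and the
 TermInfo fields it stores) may well differ.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical layout for illustration; not the encoding used by the patch.
final class PackedTermIndexSketch {
    private final byte[] data;    // all entries stored back to back
    private final int[] offsets;  // offsets[i] = start of entry i within data

    PackedTermIndexSketch(String[] terms, long[] pointers) {
        byte[][] utf8 = new byte[terms.length][];
        int total = 0;
        for (int i = 0; i < terms.length; i++) {
            utf8[i] = terms[i].getBytes(StandardCharsets.UTF_8);
            total += 4 + utf8[i].length + 8; // length prefix + term bytes + pointer
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        offsets = new int[terms.length];
        for (int i = 0; i < terms.length; i++) {
            offsets[i] = buf.position();
            buf.putInt(utf8[i].length);
            buf.put(utf8[i]);
            buf.putLong(pointers[i]);
        }
        data = buf.array();
    }

    // Decodes the term text of entry i on demand.
    String term(int i) {
        int len = ByteBuffer.wrap(data).getInt(offsets[i]);
        return new String(data, offsets[i] + 4, len, StandardCharsets.UTF_8);
    }

    // Decodes the file pointer stored after the term bytes of entry i.
    long pointer(int i) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int len = buf.getInt(offsets[i]);
        return buf.getLong(offsets[i] + 4 + len);
    }
}
```

 The saving comes from replacing one object per term (plus its header and
 references) with two flat arrays; entries are decoded only when looked up.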




[jira] [Commented] (LUCENE-2205) Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.

2011-09-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109787#comment-13109787
 ] 

Michael McCandless commented on LUCENE-2205:



Patch looks great Aaron!  Very much simplified... some comments:

  * Instead of separate build method, could we have
TermInfosReaderIndex's ctor take all the args?  Then we can make
its private fields final?

  * I think the index and indexLength can be final, in
TermInfosReader?

  * Can you put the GrowableByteArrayDataOutput as a separate source
file in oal.store?  Seems useful!

  * Hmm should indexToTermsArray be a long[]...?  I wonder how large
your index would have to be to overflow 2.1GB of the byte[]
format...

  * We could further reduce the RAM usage by using packed ints
(oal.util.packed) for the indexToTerms array; this way each
indexed term would only use as many bits as are actually required to
address the byte[] (and, this would solve the int[]/long[] problem
since packed ints are logically a long[]).

  * I think we should just always trim?  (Ie we don't need the
{{private boolean trim}})

  * Could you add the comment "Just for testing" to
TermInfosReaderIndex.getTerm?

  * For the compareTo methods, can you add to the jdocs that this
compares term to index term, ie it returns negative N when term
is less than index term?

  * Hmm... I wonder if memory fragmentation will cause problems when
allocating/growing the single byte[].  Also, a single byte[]
can only address 2.1B bytes (the same overflow problem as
above).  Maybe we should port back PagedBytes (from trunk
oal.util) and use that instead?  If we did that, then we could
create a simple DataInput impl that reads from that.

  * Could you please remove the @author tags?  Thanks. Committing author
tags is against Apache's policy (or at least discouraged)...
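
The packed-ints suggestion above can be sketched as follows. This is a minimal hand-rolled bit packer written for illustration only; PackedOffsetsSketch is a hypothetical class and is not the oal.util.packed API. It assumes all values are non-negative and no larger than maxValue.

```java
// Hypothetical minimal bit-packing for illustration; not the oal.util.packed API.
final class PackedOffsetsSketch {
    private final long[] blocks;
    private final int bitsPerValue;

    PackedOffsetsSketch(long[] values, long maxValue) {
        // Number of bits needed to represent maxValue (at least 1).
        bitsPerValue = Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
        blocks = new long[(int) (((long) values.length * bitsPerValue + 63) / 64)];
        for (int i = 0; i < values.length; i++) {
            long bitPos = (long) i * bitsPerValue;
            int block = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            blocks[block] |= values[i] << shift;
            if (shift + bitsPerValue > 64) {          // value straddles two longs
                blocks[block + 1] |= values[i] >>> (64 - shift);
            }
        }
    }

    long get(int i) {
        long bitPos = (long) i * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        if (shift + bitsPerValue > 64) {              // pick up the high bits from the next long
            value |= blocks[block + 1] << (64 - shift);
        }
        long mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
        return value & mask;
    }

    int bitsPerValue() { return bitsPerValue; }
}
```

For scale: offsets into a byte[] of roughly 50 MB fit in 26 bits each instead of the 32 bits of an int[], and because values are read and written as longs the structure is not tied to the int range.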





[jira] [Issue Comment Edited] (LUCENE-2205) Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.

2011-09-21 Thread Aaron McCurry (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109780#comment-13109780
 ] 

Aaron McCurry edited comment on LUCENE-2205 at 9/21/11 7:09 PM:


I found a major bug in my test.  I was using the keyword analyzer instead of
whitespace or standard, so it was turning every one of my sentences that
contained 50 randomly generated words into 1 huge token.  This helps to explain
why the heap space results are not that stellar, because the fewer terms there
are (as well as the larger they are), the less the patch helps reduce space.
I'm retesting now.

  was (Author: amccurry):
I found a major bug in my test.  I was using keyword analyzer instead of 
whitespace or standard, thus it was turning everyone of my sentences that 
contained 100 randomly generated words into 1 huge token.  This helps to 
explain why the heap space results are not that stellar, because the fewer 
terms there are (as well as the larger they are), the less the patch helps 
reduce space.  I'm retesting now.
  



[jira] [Commented] (LUCENE-2205) Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.

2011-09-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109788#comment-13109788
 ] 

Michael McCandless commented on LUCENE-2205:


bq. I was using keyword analyzer instead of whitespace or standard

Aha! Good catch :)

I'm also building up a 2B terms index (using Test2BTerms), and then I'll 
compare patch/3.x on that index.




[jira] [Commented] (SOLR-2780) Facet count problem : Multi-Select Faceting After grouping results

2011-09-21 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109826#comment-13109826
 ] 

Martijn van Groningen commented on SOLR-2780:
-

Hi Ramzi, So you have 77 errors :) Can you send me what errors you have? 

BTW if you just want to use the patch you can just apply it and build Solr (ant 
clean dist). The patch should work when using group.field parameter.

 Facet count problem : Multi-Select Faceting After grouping results 
 ---

 Key: SOLR-2780
 URL: https://issues.apache.org/jira/browse/SOLR-2780
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.3, 3.4, 4.0
Reporter: Ramzi Alqrainy
Priority: Critical
 Fix For: 3.5, 4.0

 Attachments: SOLR-2780.patch


 Dear All,
 Kindly note that I am using Solr 4.0, and that group.truncate=true
 calculates facet counts based on the most relevant document of each
 group matching the query.
 But when I use Multi-Select Faceting [Tagging and excluding Filters],
 Solr can't calculate the facet counts after grouping the results and
 selecting multiple facets.
 http://127.0.0.1:8983/solr/select/?facet=true&sort=score+desc,+rate+desc,total_of_reviews+desc&facet.limit=-1&bf=sum%28product%28atan%28total_of_reviews%29,50%29,product%28rate,10%29%29^4&group.field=place_id&facet.field={!ex%3Dce}cat_en&facet.field={!ex%3Dce}cat_ar&facet.field={!ex%3Dir}iregion&facet.field={!ex%3Dir}region_en&facet.field={!ex%3Dir}region_ar&facet.field={!ex%3Drr}rrate&facet.field=place_status&facet.field=theme_en&facet.field=icity&facet.field={!ex%3Dce}icat&facet.field={!ex%3Dsce}isubcat&facet.field={!ex%3Dsce}subcat_en&facet.field={!ex%3Dsce}subcat_ar&qt=/spell&fq=place_status:1&fq=icity:1&fq=cat_en:%28%22Restaurants%22%29&group.format=simple&group.ngroups=true&facet.mincount=1&qf=title_ar^24+title_en^24+cat_ar^10+cat_en^10++review^20&hl.fl=review&json.nl=map&wt=json&defType=edismax&rows=10&spellcheck.accuracy=0.6&start=0&q=smart&group.truncate=true&group=true&indent=on
  




[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 483 - Failure

2011-09-21 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/483/

1 tests failed.
REGRESSION:  org.apache.solr.TestDistributedSearch.testDistribSearch

Error Message:
java.lang.AssertionError: Some threads threw uncaught exceptions!

Stack Trace:
java.lang.RuntimeException: java.lang.AssertionError: Some threads threw uncaught exceptions!
at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:729)
at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:89)
at org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:174)
at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148)
at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
at org.apache.lucene.util.LuceneTestCase.checkUncaughtExceptionsAfter(LuceneTestCase.java:757)
at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:701)




Build Log (for compile errors):
[...truncated 11020 lines...]





