RE: RE: Vote thread started on gene...@lucene.apache.org

2010-12-30 Thread Lombard, Scott
Marco,

My feeling would be to create strong automated conversion tools that allow Java 
Lucene to be ported into .NET in as few steps as possible.  The .NET-style 
goal is a noble one, but it will require significantly more commitment to the 
project in the future, as each new version of Java Lucene will have to be 
integrated by hand into the .NET version.

As the conversion tools get more advanced and robust, .NET-style code may be 
produced as part of the automated conversion process.


Scott



-Original Message-
From: Marco Dissel [mailto:marco.dis...@gmail.com]
Sent: Thursday, December 30, 2010 1:16 PM
To: lucene-net-u...@lucene.apache.org
Cc: lucene-net-dev@lucene.apache.org
Subject: Re: RE: Vote thread started on gene...@lucene.apache.org

What will be the goal of the new committers? To convert the source into .NET-style
code? If so, we should try to stop all the spin-offs and concentrate
all development in one project.
On 30 Dec. 2010 at 19:02, Lombard, Scott slomb...@kingindustries.com
wrote the following:
 Grant,

 Thanks for taking the time to explain all the details. I am willing to work on
a proposal to put Lucene.Net back into incubation. I will need other people
to step up and be committers as well. Heath has volunteered, and as Grant has
stated, 4 committers are needed for incubation. Who else is willing to be
a committer?

 Grant, I will definitely take you up on your offer to help bring
Lucene.Net into incubation.

 Scott


 -Original Message-
 From: Grant Ingersoll [mailto:gsing...@apache.org]
 Sent: Thursday, December 30, 2010 12:32 PM
 To: lucene-net-u...@lucene.apache.org
 Subject: Re: Vote thread started on gene...@lucene.apache.org


 On Dec 30, 2010, at 9:51 AM, Heath Aldrich wrote:

 Hi Grant,

 Thanks for taking the time to respond.

 While I have developed extensively against Lucene.net, I do not possess
the Java skills needed to do a port of the code... So, while I wouldn't mind
being a committer, I do not think I am qualified. (I guess if I was, I could
just use Lucene proper and that would be that)

 As to other duties of a committer, I think the ASF is perceived as a
black box of questions for most of us.

 For one, I don't think anyone outside the 4 committers even understands
*why* it is a good thing to be on the ASF vs. CodePlex, Sourceforge, etc.
Maybe if there was an understanding of the why, the requirements of the ASF
would make more sense. I think a lot of us right now just perceive the ASF
as the group that is wanting to kill Lucene.net.

 I don't think we have a desire to kill it, I just think we are faced with
the unfortunate reality that the project is already dead and now we on the
PMC have the unfortunate job of cleaning up the mess as best we can. Again,
it is not even that we want to see it go away, we on the PMC just don't want
to be responsible for its upkeep. You give me the names of 4 people who are
willing to be committers (i.e. people willing to volunteer their time) and I
will do my best to get the project into the Incubator. However, I have to
tell you, my willingness to help is diminishing with every trip we take
around this same circle of discussion.

 Simply put, given the way the vote has gone so far, the Lucene PMC is no
longer interested in sustaining this project. If the community wishes to see
it live at the ASF then one of you had better step up and spend 20-30
minutes of your time writing up the draft proposal (most of it can be copied
and pasted) and circulating it. In fact, given the amount of time some of
you have no doubt spent writing on this and other related threads you could
have put together the large majority of the proposal, circulated the draft,
gotten other volunteers to help, and already be moving forward in a positive
direction. Truth be told, I would do it, but I am explicitly not going to
because I think that if the community can't take that one step to move
forward, then it truly doesn't deserve to.


 I get your comments about the slower than slow development, but that is
also somewhat of a sign that it works. While 2.9.2 may be behind, it seems
very stable with very few issues. If we send the project to the attic, how
will anyone be able to submit bugfixes ever? Frankly, I use 2.9.2 every day
and have not found bugs in the areas that I use... but I'm sure they are in
there somewhere.

 As for the name, I thought Lucene.net was the name of the project back in
the SourceForge days...
 So my question is based on the premise that if the lucene.net name was
brought *to* ASF, why can the community not leave with it?

 Again, IANAL, but just b/c it was improperly used beforehand does not mean
it is legally owned by some other entity. The Lucene name has been at the
ASF since 2001 and Lucene.NET is also now a part of the ASF. (If you're
interested, go look at the discussions around iBatis and the movement of
that community to MyBatis)

 -Grant


 This message (and any associated files) is intended only for 

Re: Vote thread started on gene...@lucene.apache.org

2010-12-30 Thread Troy Howard
Scott,

I agree with everything you said. My opinion is that one of the
largest failings of the current Lucene.Net development effort is that
there's too much magic in the conversion process. This is assuming
we continue with Lucene.Net as a line-by-line automated port.

As Heath said, the details of how we run the project are up to the
next group of committers to decide once that group has been
established. I'm sure this issue (as well as numerous other issues)
will be discussed in great detail and length by the community at that
time.

Thanks,
Troy


On Thu, Dec 30, 2010 at 10:57 AM, Lombard, Scott
slomb...@kingindustries.com wrote:
 Troy,

 My feeling is that a combination of Java and .NET experience is needed.  Some 
 people will focus on bug fixes in the .NET code while others focus on the 
 translation of the code as their experience allows.

 One of the things I would like to see done differently with Lucene.Net is 
 that the method of conversion is kept in SVN or on the wiki. I feel that the 
 pre- and post-processing, as well as possible extensions to whatever tool is 
 used for the conversion, are more important to this project than the actual 
 executed code.  Keeping a community focus on building strong conversion 
 tools should help reduce the lag between a Java release and the corresponding 
 .NET release.  We then won't be waiting for one person to do the conversion.

 Scott

 -Original Message-
 From: Troy Howard [mailto:thowar...@gmail.com]
 Sent: Thursday, December 30, 2010 1:38 PM
 To: lucene-net-u...@lucene.apache.org
 Cc: lucene-net-dev@lucene.apache.org
 Subject: Re: Vote thread started on gene...@lucene.apache.org

 Scott,

 I will gladly help put this proposal together and would like to
 volunteer as a committer. I am communicating with others to find some
 additional candidates to be committers.

 Regarding Heath, a quote from his last message in this thread:

 While I have developed extensively against Lucene.net, I do not
 possess the java skills needed to do a port of the code... So, while I
 wouldn't mind being a committer, I do not think I am qualified.

 Thanks,
 Troy



Re: RE: Vote thread started on gene...@lucene.apache.org

2010-12-30 Thread Troy Howard
Marco,

I agree with you on this front. I feel that the first tasks that a new
Lucene.Net team should focus on, in terms of development are:

- Fully automating a line-by-line port using a tool such as Sharpen.
This needs to become a commodity function requiring very little
development effort.
- Bringing the existing forks back in as branches within the ASF project.
I am very interested in pursuing continued development on a more .NET-style
port (e.g., the Lucere project I started, or Aimee.Net).

The Lucene.Net project should be able to continue with both
development paths in the same project.

Thanks,
Troy





RE: RE: Vote thread started on gene...@lucene.apache.org

2010-12-30 Thread Karell Ste-Marie
Folks,

I will freely admit that I'm seizing the opportunity to raise an old
point - but that problem would be non-existent if this was a project
that implemented a methodology as opposed to being a continuous port
effort. I will even go as far as suggesting that this would broaden (and
ease) the recruitment of committers. It almost feels like the goal is
not simply to port Lucene.java to Lucene.net but to also develop a
technology that ports things automatically. I would almost suggest that
this in itself could be an ASF TLP. It still feels to me that everyone
is trying to cut the head off a two-headed dragon with a single sword
and a single motion.

Once search algorithms have been discovered and implemented, it should be up
to the language-specific programmers to implement and optimize them
as they see fit. Both languages have their strengths and their own
frameworks - at the moment the Java side has great benefits, which in
turn greatly hinder the success of the .NET side.

In a nutshell, while some cultures seem to be better at courtship, the
fact that I don't speak some of these languages shouldn't make me any
worse at it.

I think that a project for Java-to-.NET and .NET-to-Java conversion would be a
great idea. Again, it would allow a lot of people who are doing the same for
hundreds of other projects to simply pool their efforts.

Just my Canadian 2 cents (which is almost on par with the American cents
these days)


Karell Ste-Marie
C.I.O. - BrainBank Inc


RE: RE: Vote thread started on gene...@lucene.apache.org

2010-12-30 Thread Lombard, Scott

From everything that has been said, it seems apparent to me that the only way 
for Lucene.Net to stay alive is to move back to incubation.  So where do we go 
from here?  More than 4 people have said they are willing to be committers.  
Is this email list the best place to start working on a proposal, should it be 
done by a small group offline, or is there a way that the community can 
work on it together?

Thoughts?
Scott



Re: RE: Vote thread started on gene...@lucene.apache.org

2010-12-30 Thread Michael Herndon
Does the conversion tool actually help or hinder?

My feeling is that the more dependency you have on a tool, the less likely
this project will ever stand on its own.

There should probably be parallel branches: one that continues using the
tool to close the current gaps between .NET and Lucene, while the other
branch, which focuses on a more .NET-styled API, is moved forward.

It also seemed like other volunteers wanted to use Visual Studio 2010, move
Lucene.Net to a more .NET-friendly API (hopefully adhering a bit better to the
MS coding guidelines, http://blogs.msdn.com/b/brada/archive/2005/01/26/361363.aspx,
so that finding one's way around the code base is less involved), and let
it evolve.

As Grant points out, the biggest problem is getting people to not
just discuss the future of lucene.net but actually to step up and get
involved working on it.

No one should be turned away for their lack of Java or programming knowledge
if they have a sincere wish to learn and hours to give to the project.
There are more things to be done than just coding or porting Java code.
They can learn as they go.  Does one really need to know Java to write C#
test cases?

This project seriously lacks visibility, documentation, a decent website,
blogging on lucene.net, or any kind of decent PR/Marketing pathway that will
help build up the community and move it forward.  Any future PMC should be
cognizant of that as well as the landscape of .Net opensource and how that
is changing of late.


The Java version has Solr (which any language can talk to) built on top of
it and can use other projects like Tika / POI for indexing.  What's the
business value of Lucene.Net if it's a line-by-line port of the Lucene version
that doesn't have anything extra beyond what its father project already has?

Something to think on.




- Michael



Re: Vote thread started on gene...@lucene.apache.org

2010-12-30 Thread Troy Howard
That is exactly what I would suggest. Sharpen looks like a great tool,
since you can customize its behaviour. In fact, the only downside is
that you have to customize its behaviour, which requires a lot of
upfront work.

Thanks,
Troy


On Thu, Dec 30, 2010 at 11:42 AM, Prescott Nasser geobmx...@hotmail.com wrote:

 Maybe I'm misunderstanding you, but I think the technology is there - no 
 generic porting tool will be 100%, it will always require pre/post 
 processing. Sharpen is a pretty good generic conversion tool.

 I agree; I think we need to focus on a process that uses a tool such 
 as Sharpen and on developing the pre/post-processing cleanup scripts that are 
 specific to Lucene.

 ~Prescott






RE: Vote thread started on gene...@lucene.apache.org

2010-12-30 Thread Karell Ste-Marie
I think it took me 5 deletes and complete rewrites of this e-mail to try to 
say this in the best way possible:

First off, Sharpen is a Java tool (which I found in the db4o SVN). Using Sharpen 
to port Lucene to .NET means that people now have to install a JVM on their 
computers in order to contribute. While this may seem to make perfect sense, it 
is in fact this type of requirement that scares pure .NET developers away. You 
cannot ask someone to install a bunch of tools outside of their comfort zone in 
order to create a tool that works in their world. Furthermore, it's also saying 
that now, not only do contributors need to know Java and have a JVM, but they 
also need to know Sharpen in order to make a C# product.

Gentlemen, I would gladly contribute. I can assure you that I wouldn't be the 
best, but I would be happy to lend a hand. Speaking strictly for myself, though, 
I don't see myself learning 2-3 new technologies when I feel that I should only 
need to be a good C# programmer to help out.

Would it not make more sense, given the fact that we want to reduce work and 
make a quality product that we become more selective about *what* goes through 
Sharpen and what can be hand-crafted? IE: Do we really need to port the Java 
methods of writing to files and handling Threading? What about WCF?



Karell Ste-Marie
C.I.O. - BrainBank Inc


-Original Message-
From: Troy Howard [mailto:thowar...@gmail.com] 
Sent: Thursday, December 30, 2010 2:46 PM
To: lucene-net-dev@lucene.apache.org
Cc: lucene-net-u...@lucene.apache.org
Subject: Re: Vote thread started on gene...@lucene.apache.org

That is exactly what I would suggest. Sharpen looks like a great tool, since
you can customize its behaviour. In fact, the only downside is that you have
to customize its behaviour, which requires a lot of upfront work.

Thanks,
Troy


On Thu, Dec 30, 2010 at 11:42 AM, Prescott Nasser geobmx...@hotmail.com wrote:

 Maybe I'm misunderstanding you, but I think the technology is there - no 
 generic porting tool will be 100%, it will always require pre/post 
 processing. Sharpen is a pretty good generic conversion tool.

 I agree in that I think we need to focus on a process utilizing a tool such 
 as sharpen and developing the pre/post processing clean up scripts that are 
 specific to Lucene.

 ~Prescott


Re: Vote thread started on gene...@lucene.apache.org

2010-12-30 Thread Ben Martz

Troy, et al,

Given the recent positive shift in attitude regarding the Lucene.Net
project, I would like to consider ways that I could help contribute
as well. As with other people in the community, while my company is
very small (I am both Chief Software Architect and Chief Bottle
Washer), we do have a vested interest in seeing this project
succeed.

One thing to consider while developing the incubator proposal is
that the reason I stopped attempting to contribute was that very
early on it was made very clear to me that this project was a
one-man show and that any efforts I offered towards working on the
port were not welcome. I think that in order to succeed the new
proposal needs to embrace transparency in the entire port, testing
and fix process so that more people (and potential committers) can
have the opportunity to get their hands dirty and expect that their
ideas will not be rejected out of hand. I'm not saying that everyone
should be a committer but rather I would hope that the committers
would at least consider input and help from the community.

It's important to remember that Lucene.Net is "just" a (very good)
line-by-line port*. This means that the skill set we need from
committers is very different than what the Lucene Java project would
be looking for. I agree with various people who have raised the good
point that automation is the way to go for the initial pass. There
are now multiple OSS Java-to-.NET conversion tools out there that,
while not perfect, could offer a good starting point. The strength of
working to customize scripts or even the tools themselves would be a
repeatable and documented porting process that could be executed in
parallel by multiple people with the expectation of deterministic
results.

    Sharpen (db4o):
http://developer.db4o.com/Blogs/Product/tabid/167/entryid/94/Default.aspx
    Java 2 CSharp (ILOG/IBM):
http://sourceforge.net/apps/mediawiki/j2cstranslator/index.php?title=Main_Page

* Various spin-offs are embracing a functional port model but this
is not what I am looking for and I get the feeling that some
developers would prefer to stick with a "true" port as well.

Also remember that we would need not only people to work on the
porting mechanism and port but also people willing to develop and
run the unit tests and such.

In summary, I believe that if we can agree as a community to get
away from this magic one-man black-box porting model then more
people such as myself would come out of the woodwork and help out.

My way is not the only way but it does represent my personal
thoughts in any case.

Thanks for your consideration,
Ben Martz


Troy Howard
December 30, 2010 11:51 AM

Scott,

We should communicate on the public list as much as possible. I'll put
together the draft proposal today, post it here, and ask for feedback
from both the Lucene PMC and the community. We will wait over the
weekend and Monday to allow people who might have additional input the
opportunity to either see this at home or at work.

On Tuesday (Jan 4th) we will move forward with whatever our best
effort has produced and go from there.

Thanks,
Troy



Re: Vote thread started on gene...@lucene.apache.org

2010-12-30 Thread Troy Howard
It's my opinion that we can basically commoditize an automated port
which will fulfill the needs of the community, and allow the project
to, at minimum, continue to release, in a timely fashion, direct ports
of the Java Lucene releases...

Meanwhile we can continue the efforts represented in Lucere, Lucille,
and Aimee.Net to create an alternative API for Lucene.Net which may or
may not include completely re-written code, depending on the
specifics.

I think both concepts can co-exist in a single project and that this
will be the best way to move forward. If you followed the Lucere
project, you'll see that my approach with TDD and Contract Driven
Design was intended to facilitate just such an arrangement.

Thanks,
Troy


On Thu, Dec 30, 2010 at 12:32 PM, Prescott Nasser geobmx...@hotmail.com wrote:

 In incubator we can probably rewrite the description of the project - but in
 the past we were pushed away from doing anything but a straight port because the
 description of the project was a line-by-line port - where a tool makes
 sense, and .NET-specific constructs are basically avoided because that
 wouldn't be a line-by-line port. We talked about using things like Enums but
 we were shot down from this idea by someone...

 I agree with you wholeheartedly about utilizing sharpen and a jvm just to port
 the code. The Lucere project was the idea of rewriting the java code to .Net,
 using standard constructs. I think the goal for the ASF project was to
 minimize the work needed to upgrade to new java releases as they come out.
 If we pursue this direction, then every change needs to be manually ported.
 I've already said I think that is do-able once we are on par with the latest
 java.


 ~Prescott Nasser
 prescott.nas...@hotmail.com
 650.208.4205




 Subject: RE: Vote thread started on gene...@lucene.apache.org
 Date: Thu, 30 Dec 2010 15:24:32 -0500
 From: stema...@brain-bank.com
 To: lucene-net-dev@lucene.apache.org
 CC: lucene-net-u...@lucene.apache.org

 I think it took me 5 deletes of this e-mail and complete rewrites to try
 to say this in the best way possible:

 First off, Sharpen is a java tool (from the db4o SVN I found) - using
 sharpen to port lucene to .net means that people now have to install a jvm
 on their computers in order to contribute. While this may seem to make
 perfect sense, in fact it is this type of requirement that scares pure .net
 developers away. You cannot ask someone to install a bunch of tools
 outside of their comfort zone in order to create a tool that works in
 their world. Furthermore, it's also saying that now - not only do
 contributors need to know java and have a jvm, but then they also need to
 know sharpen in order to make a c# product.

 Gentlemen, I would gladly contribute - I can assure you that I wouldn't be
 the best but I would be happy to lend a hand - but speaking strictly for
 myself I don't see myself learning 2-3 new technologies when I
 feel that I should only need to be a good c# programmer to help out.

 Would it not make more sense, given that we want to reduce work and
 make a quality product, to become more selective about *what* goes
 through Sharpen and what can be hand-crafted? I.e., do we really need to port
 the Java methods of writing to files and handling threading? What about WCF?



 Karell Ste-Marie
 C.I.O. - BrainBank Inc


 -Original Message-
 From: Troy Howard [mailto:thowar...@gmail.com]
 Sent: Thursday, December 30, 2010 2:46 PM
 To: lucene-net-dev@lucene.apache.org
 Cc: lucene-net-u...@lucene.apache.org
 Subject: Re: Vote thread started on gene...@lucene.apache.org

 That is exactly what I would suggest. Sharpen looks like a great tool, since
 you can customize its behaviour. In fact, the only downside is that you
 have to customize its behaviour, which requires a lot of upfront work.

 Thanks,
 Troy


 On Thu, Dec 30, 2010 at 11:42 AM, Prescott Nasser geobmx...@hotmail.com 
 wrote:
 
  Maybe I'm misunderstanding you, but I think the technology is there - no 
  generic porting tool will be 100%, it will always require pre/post 
  processing. Sharpen is a pretty good generic conversion tool.
 
  I agree in that I think we need to focus on a process utilizing a tool 
  such as sharpen and developing the pre/post processing clean up scripts 
  that are specific to Lucene.
 
  ~Prescott



Re: Vote thread started on gene...@lucene.apache.org

2010-12-30 Thread Ben Martz

So perhaps the proposal should allow for a combination of a mostly
automated baseline line-by-line port and the explicit provision that
embraces drop-in (API compliant) .NET-specific replacements for
specific classes?

- Ben


Troy Howard
December 30, 2010 12:39 PM

It's my opinion that we can basically commoditize an automated port
which will fulfill the needs of the community, and allow the project
to, at minimum, continue to release, in a timely fashion, direct ports
of the Java Lucene releases...

Meanwhile we can continue the efforts represented in Lucere, Lucille,
and Aimee.Net to create an alternative API for Lucene.Net which may or
may not include completely re-written code, depending on the
specifics.

I think both concepts can co-exist in a single project and that this
will be the best way to move forward. If you followed the Lucere
project, you'll see that my approach with TDD and Contract Driven
Design was intended to facilitate just such an arrangement.

Thanks,
Troy


Re: Vote thread started on gene...@lucene.apache.org

2010-12-30 Thread Troy Howard
Yes. I'm in the process of writing that proposal at this time. It will
include language in the project description that expresses our intent to
develop a C#/.NET idiomatic version of the library.

Please find the in-progress draft version at:

http://wiki.apache.org/incubator/Lucene.Net%20Proposal

Thanks,
Troy


On Thu, Dec 30, 2010 at 12:43 PM, Ben Martz benma...@gmail.com wrote:

  So perhaps the proposal should allow for a combination of a mostly
 automated baseline line-by-line port and the explicit provision that
 embraces drop-in (API compliant) .NET-specific replacements for specific
 classes?

 - Ben

  --

Troy Howard thowar...@gmail.com
 December 30, 2010 12:39 PM

 It's my opinion that we can basically commoditize an automated port
 which will fulfill the needs of the community, and allow the project
 to, at minimum, continue to release, in a timely fashion, direct ports
 of the Java Lucene releases...

 Meanwhile we can continue the efforts represented in Lucere, Lucille,
 and Aimee.Net to create an alternative API for Lucene.Net which may or
 may not include completely re-written code, depending on the
 specifics.

 I think both concepts can co-exist in a single project and that this
 will be the best way to move forward. If you followed the Lucere
 project, you'll see that my approach with TDD and Contract Driven
 Design was intended to facilitate just such an arrangement.

 Thanks,
 Troy




Champion and Mentor

2010-12-30 Thread Troy Howard
Grant,

I'm working on the proposal and have come to the final section where I
must list a Champion and list of Mentors.

Can I put your name for Champion and possibly as a Mentor as well? Are
there any other folk out there willing to Mentor our project during
incubation? Should I instead wait for the Incubator PMC to assign
Mentors to us?

Thanks,
Troy


Re: Incubator Proposal Draft

2010-12-30 Thread Troy Howard
Sorry... I was in outer space with those dates.

To clarify, I'll submit the application on Tuesday, January 11th, 2011,
which gives us exactly 12 days as a community to determine our opinions
and plans and to develop our proposal and committer list.

Thanks,
Troy


On Thu, Dec 30, 2010 at 4:13 PM, Troy Howard thowar...@gmail.com wrote:
 All,

 Please review the draft proposal located at:

 http://wiki.apache.org/incubator/Lucene.Net%20Proposal

 If you'd like to make an edit, feel free to create an account and edit the
 page as you see fit. I'd especially appreciate help with spelling and
 grammar proofreading in that regard.

 Regarding content I would appreciate direct comments on the text of
 the proposal presented in the mailing lists here for open discussion.

 Some points to note: I have only filled out information in the
 proposal about myself and Chris Currens. I work with Chris in real
 life and was able to discuss this with him in person. I am not going
 to take the liberty to include information about anyone else for fear
 of misrepresentation.

 If you'd like to include information about yourself in the proposal,
 please edit it and include that information.

 Since this is only a draft of the proposal, anything can change. What
 is there is mostly just to get the ball rolling on the application and
 have a concrete document to discuss.

 It's my intention to officially submit the proposal on Tuesday,
 January 11th, 2010. Please ensure that your contributions or
 commentary are provided before that time if you wish them to be
 considered for this proposal. This gives us, as a community, 2.5 weeks
 to prepare. Hopefully this will be more than enough time to discuss
 and settle on our official positions as a community.

 Thanks,
 Troy



RE: Initial committers list for Incubator Proposal

2010-12-30 Thread Lombard, Scott
Troy,

Thank you for all your work on the Incubator Proposal you have done an 
excellent job.

I volunteered to be a committer and here is my brief list of qualifications. I
have a BS in Electrical Engineering and currently work in the Automation field.
I do extensive programming in MS SQL, ASP.NET and C#, primarily to provide useful
and pertinent information to my users from data that is stored in many places,
usually in legacy products. I am currently using Lucene.Net in a
web application I developed to collate data stored in multiple Access databases
and give users a simplified interface to our data. I am personally interested
in the challenge of developing and documenting an automated process to convert
Java Lucene to C#. The work I will be doing for the Lucene.NET project will be
done for the most part outside of my job. As a committer I would have adequate
time to devote to the project.

I look forward to being an active member of the Lucene.Net project.

Scott


From: Troy Howard [thowar...@gmail.com]
Sent: Thursday, December 30, 2010 7:01 PM
To: lucene-net-dev@lucene.apache.org; lucene-net-u...@lucene.apache.org
Subject: Initial committers list for Incubator Proposal

All,

I'm working on the Incubator Proposal now, and need to establish a
list of initial committers.

So far, the following people have come forward and offered to be
committers (in alphabetical order):

Alex Thompson
Ben Martz
Chris Currens
Heath Aldrich
Michael Herndon
Prescott Nasser
Scott Lombard
Simone Chiaretta
Troy Howard

I would like to place an open request for any interested parties to
respond to this message with their request to be a Committer. For
people who are either on that list or for people who would like to be
added, please send a message explaining (briefly) why you think you
will be qualified to be involved in the project and specifically what
ways you hope to be able to contribute.

One thing I would like to point out is that in the Apache world there
is a distinction between Committers and Contributors (aka developers).
See this link for details:

http://incubator.apache.org/guides/participation.html#committer


Please consider whether or not you wish to be a Committer or a Contributor.

Some quick rules of thumb:

Committers:

- Committers must be willing to submit a Contributor License Agreement
(CLA). See: http://www.apache.org/licenses/#clas

- Committers must have enough *consistent* free time to fulfill the
expectations of the ASF in terms of reporting, process and documentation,
and to remain responsive to the community in terms of communication and
listening to, considering, and discussing community opinion. These
kinds of tasks can consume a lot of time and are some of the first
things people drop when they start running out of time.

- A Committer may not even write code, but may simply accept, review
and commit code written by others. This is the primary responsibility
of a Committer -- to commit code, whether they wrote it themselves or
not.

- Committers may have to perform the unpleasant task of rejecting
contributions from Contributors and explaining why in a fair and objective
manner. This can be frustrating and time consuming. You may need to
play the part of a mentor or engage in debates. You may even be proved
wrong and have to swallow your pride.

- Committers have direct access to the source control and other
resources and so must be personally accountable for the quality of the
same, and will need to operate under the processes and restrictions the ASF
expects.


Contributors:

- Contributors might have a lot of free time this month, but get
really busy next month and have no time at all. They can develop code
in short bursts but then drop off the face of the planet indefinitely
after that.

- Contributors could focus on code only or work from a task list
without any need to interact with and be accountable to the community
(as this is the responsibility of the Committers).

- Contributors can do one-time or infrequently needed tasks like
updating the website, documentation, wikis, etc.

- Contributors will need to have anything they create reviewed by a
Committer and ultimately included by a Committer. Some people find
this frustrating, if the Committers are slow to respond or critical of
their work.


So in your responses, please be clear about whether you would like to
offer your help as a Committer or as a Contributor.

Thanks,
Troy



[jira] Commented: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher

2010-12-30 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12976011#action_12976011
 ] 

Michael McCandless commented on LUCENE-2837:


{quote}
bq. but before committing I think we should add a newSearcher to 
LuceneTestCase, which randomly chooses whether the searcher uses threads, and 
cutover tests to use this instead of making their own IndexSearcher.

I did this on LUCENE-2751, but the tests won't all pass until we fix the 
FieldCache autodetect
synchronization bug (the Numerics tests will fail with multiple threads)...
{quote}

Duh, I knew newSearcher() sounded familiar :)  OK so we have to fix the 
multi-threaded bug in FC first and then I think commit the newSearcher cutover 
from LUCENE-2751, then commit this issue.

Then, I think, separately create a new higher level MultiSearcher w/ a 
limited search API.  I'll open a new issue for that.
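The randomized-searcher idea discussed above can be illustrated outside Lucene. The sketch below is hypothetical and is not `LuceneTestCase.newSearcher` or Lucene's `IndexSearcher`: `SegmentSearcher` stands in for a searcher over per-segment data and takes an optional `ExecutorService`, and `newSearcher` flips a coin to decide whether the searcher uses threads. The value of randomizing is that tests only keep passing if the threaded and sequential paths return identical results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class NewSearcherSketch {

    /** Hypothetical searcher over "segments"; each segment is an array of doc scores. */
    public static final class SegmentSearcher {
        private final int[][] segments;
        private final ExecutorService executor; // null = search segments sequentially

        public SegmentSearcher(int[][] segments, ExecutorService executor) {
            this.segments = segments;
            this.executor = executor;
        }

        /** Counts docs whose score is >= minScore across all segments. */
        public long count(int minScore) {
            if (executor == null) {
                long total = 0;
                for (int[] seg : segments) {
                    total += countSegment(seg, minScore);
                }
                return total;
            }
            // Threaded path: one task per segment, results summed afterwards.
            List<Future<Long>> futures = new ArrayList<>();
            for (int[] seg : segments) {
                futures.add(executor.submit(() -> countSegment(seg, minScore)));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                try {
                    total += f.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return total;
        }

        private static long countSegment(int[] seg, int minScore) {
            long n = 0;
            for (int score : seg) {
                if (score >= minScore) n++;
            }
            return n;
        }
    }

    /** Test helper: randomly decides whether the returned searcher uses threads. */
    public static SegmentSearcher newSearcher(int[][] segments, Random random, ExecutorService pool) {
        return new SegmentSearcher(segments, random.nextBoolean() ? pool : null);
    }

    public static void main(String[] args) {
        int[][] segments = { {1, 5, 9}, {2, 7}, {8, 3, 6, 4} };
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Random random = new Random();
            for (int i = 0; i < 5; i++) {
                // Whichever path is chosen, the hit count must be the same.
                System.out.println(newSearcher(segments, random, pool).count(5));
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Running `main` repeatedly exercises both paths; any difference in the printed counts would point at a threading bug.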

 Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS 
 into IndexSearcher
 ---

 Key: LUCENE-2837
 URL: https://issues.apache.org/jira/browse/LUCENE-2837
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2837.patch


 We've discussed cleaning up our *Searcher stack for some time... I
 think we should try to do this before releasing 4.0.
 So I'm attaching an initial patch which:
   * Removes Searcher, Searchable, absorbing all their methods into 
 IndexSearcher
   * Removes contrib/remote
   * Removes MultiSearcher
   * Absorbs ParallelMultiSearcher into IndexSearcher (ie you can now
 pass useThreads=true, or a custom ES to the ctor)
 The patch is rough -- I just ripped stuff out, did search/replace to
 IndexSearcher, etc.  EG nothing is directly testing using threads with
 IndexSearcher, but before committing I think we should add a
 newSearcher to LuceneTestCase, which randomly chooses whether the
 searcher uses threads, and cutover tests to use this instead of making
 their own IndexSearcher.
 I think MultiSearcher has a useful purpose, but as it is today it's
 too low-level, eg it shouldn't be involved in rewriting queries: the
 Query.combine method is scary.  Maybe in its place we make a higher
 level class, with limited API, that's able to federate search across
 multiple IndexSearchers?  It'd also be able to optionally use thread
 per IndexSearcher.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher

2010-12-30 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12976012#action_12976012
 ] 

Michael McCandless commented on LUCENE-2837:


bq. We should discuss how many threads should be spawned. If you have an
index with many segments, even small ones, I think only the larger segments
should be separate threads, all others should be handled sequentially. So maybe
add a maxThreads count, then sort the IndexReaders by maxDoc and then only
spawn maxThreads-1 threads for the bigger readers and then one additional
thread for the rest?

That sounds like a great improvement -- Uwe can you open a new issue for that?  
Let's try to leave this issue as a rote refactoring...




Re: strange problem of PForDelta decoder

2010-12-30 Thread Michael McCandless
On Mon, Dec 27, 2010 at 5:08 AM, Li Li fancye...@gmail.com wrote:
 I integrated pfor codec into lucene 2.9.3 and the search time
 comparison is as follows:
                           single term   and query   or query
 VINT in lucene 2.9.3             11.2        36.5       38.6
 PFor in lucene 2.9.3              8.7        27.6       33.4
 VINT in lucene 4 branch          10.6        26.5       35.4
 PFor in lucene 4 branch           8.1        22.5       30.7

 My test terms are high-frequency terms because we are interested in the bad cases

I agree it's the bad cases we should focus on in general.  If a super
fast query gets somewhat slower it's relatively harmless (just a
capacity question for high volume sites) but if the bad queries get
slower it's awful (requires faster cutover to sharded architecture),
until we fix Lucene to run a single search concurrently (which we
badly need to do).

 It seems the lucene 4 branch's implementation of AND query (conjunction
 query) is so well optimized that even with the VINT codec it's faster than
 PFor in lucene 2.9.3. Could anyone tell me what optimization is done?
 Is storing docIDs and freqs separately making it faster, or is it something
 else?

Actually vInt on the bulkpostings branch stores freq/doc together.  Ie
the format is the same as 2.9.x's format.  I think it could be the
fact that AND query does block reads (64 doc/freqs at once) instead of
doc-at-once?  Ie, because of this, the query is effectively scanning
the next block of 64 docs instead of skipping to them?  Our skipping
impl is unfortunately rather costly so if skip will not skip that many
docs it's better to scan.
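To make the "scan instead of skip" point concrete, here is a minimal sketch under stated assumptions: doc IDs are stored as vInt-encoded deltas (7 bits per byte, high bit set when more bytes follow), a hypothetical `BlockCursor` bulk-decodes up to 64 of them at a time into a buffer, and `advance(target)` simply walks forward through decoded docs. There is no skip list here and this is not the bulkpostings branch's code; it only shows why, when the target is a few docs ahead, walking an already-decoded block can be cheaper than consulting skip data.

```java
import java.io.ByteArrayOutputStream;

public class BlockScanSketch {

    public static final int BLOCK_SIZE = 64;
    public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Encodes ascending docIDs as vInt deltas: 7 bits per byte, high bit = more bytes follow. */
    public static byte[] encodeDeltas(int[] docIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int doc : docIds) {
            int delta = doc - prev;
            prev = doc;
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
        }
        return out.toByteArray();
    }

    /** Decodes postings BLOCK_SIZE docs at a time and scans within the buffer. */
    public static final class BlockCursor {
        private final byte[] data;
        private final int docCount;
        private final int[] buffer = new int[BLOCK_SIZE];
        private int pos;        // next byte to read in data
        private int decoded;    // number of docIDs decoded so far
        private int lastDoc;    // last absolute docID decoded
        private int bufferLen;  // valid entries in buffer
        private int bufferUpto; // next entry to return from buffer
        private int doc = -1;   // current docID

        public BlockCursor(byte[] data, int docCount) {
            this.data = data;
            this.docCount = docCount;
        }

        /** Bulk-decodes the next block of up to BLOCK_SIZE docIDs. */
        private boolean refill() {
            bufferLen = 0;
            bufferUpto = 0;
            while (bufferLen < BLOCK_SIZE && decoded < docCount) {
                int delta = 0;
                int shift = 0;
                byte b;
                do {
                    b = data[pos++];
                    delta |= (b & 0x7F) << shift;
                    shift += 7;
                } while ((b & 0x80) != 0);
                lastDoc += delta;
                buffer[bufferLen++] = lastDoc;
                decoded++;
            }
            return bufferLen > 0;
        }

        public int nextDoc() {
            if (bufferUpto == bufferLen && !refill()) {
                return doc = NO_MORE_DOCS;
            }
            return doc = buffer[bufferUpto++];
        }

        /** Moves to the first doc >= target by linear scanning (no skip list). */
        public int advance(int target) {
            while (doc < target) {
                nextDoc();
            }
            return doc;
        }
    }
}
```

A real codec would combine this with skip data and decide per call whether skipping is worth it; the sketch deliberately shows only the scanning side.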

 Another question: is anyone else interested in integrating the pfor
 codec into lucene 2.9.3, like me (we have to use lucene 2.9 and solr
 1.4)? And how do I contribute this patch?

Realistically I don't think we can commit this to 2.9.x -- that branch
is purely bug fixes at this point.

Still it's possible others could make use of such a patch so if it's
not too much work you may as well post it?  It can lead to
improvements on the bulk postings branch too :)  The more patches the
merrier!

You only use PFor for the very high freq terms in 2.9.x right?  I've
wondered if we should do the same on bulkpostings... problem is for eg
range queries, that visit all docs for all terms b/w X and Y, you want
the bulk decode even for low freq terms...

Mike




Re: any issues about the *perthread classes

2010-12-30 Thread Michael McCandless
Basically, we are moving the thread state upwards in Lucene's indexing chain.

Ie, very early on when indexing a doc you pick a thread-private state.
 Then, the thread does all indexing into this private state,
unfettered by any sync blocks.

This is akin to moving to a process-based concurrency model, ie, we
are most strongly separating threads to limit the number of locks that
must be acquired when indexing a doc, or when flushing.

This is an important change because it means flushing of a single
thread private state can take place concurrently with ongoing indexing
into other thread states.  Lucene cannot do this today since flushing
flushes all thread states, and it results in a serious bottleneck on
indexing throughput for machines w/ a lot of available concurrency.  I
wrote about this problem here:

http://chbits.blogspot.com/2010/09/lucenes-indexing-is-fast.html

The takeaway is that using 6 indexing threads means we are blocked 50%
of the time waiting for flush, which is quite awful.  This was on a
machine w/ an SSD and 24 cores, so, Lucene was nowhere near able to
take advantage of this machine's concurrency.  Once flushing is
concurrent we should be able to fully saturate both IO and CPU
concurrency on such a machine...

Mike
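The thread-private indexing state Mike describes can be sketched in a few lines. This is a simplified, hypothetical model (`ThreadState` and `indexConcurrently` are invented names, not Lucene's `DocumentsWriter` or realtime_search classes): each indexing thread owns an unsynchronized buffer, and when that buffer fills, the thread flushes only its own state into a shared concurrent queue of "segments" while the other threads keep indexing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

public class PerThreadStateSketch {

    /** Private indexing state; only its owning thread touches it, so no locks are needed. */
    static final class ThreadState {
        private final int maxBufferedDocs;
        private final ConcurrentLinkedQueue<List<String>> flushedSegments; // shared, thread-safe
        private List<String> buffer = new ArrayList<>();

        ThreadState(int maxBufferedDocs, ConcurrentLinkedQueue<List<String>> flushedSegments) {
            this.maxBufferedDocs = maxBufferedDocs;
            this.flushedSegments = flushedSegments;
        }

        void addDocument(String doc) {
            buffer.add(doc); // unsynchronized: buffer is thread-private
            if (buffer.size() >= maxBufferedDocs) {
                flush();
            }
        }

        /** Flushes just this state; other threads' states are unaffected. */
        void flush() {
            if (!buffer.isEmpty()) {
                flushedSegments.add(buffer); // hand off the full buffer as a "segment"
                buffer = new ArrayList<>();  // keep indexing into a fresh one
            }
        }
    }

    /** Runs several indexing threads, each with its own ThreadState, and returns the flushed segments. */
    public static ConcurrentLinkedQueue<List<String>> indexConcurrently(
            int threads, int docsPerThread, int maxBufferedDocs) {
        ConcurrentLinkedQueue<List<String>> segments = new ConcurrentLinkedQueue<>();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                ThreadState state = new ThreadState(maxBufferedDocs, segments);
                for (int i = 0; i < docsPerThread; i++) {
                    state.addDocument("thread" + id + "-doc" + i);
                }
                state.flush(); // flush whatever is left in this thread's buffer
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        ConcurrentLinkedQueue<List<String>> segments = indexConcurrently(4, 1000, 100);
        System.out.println(segments.size() + " segments flushed"); // prints "40 segments flushed"
    }
}
```

The only shared structure is the concurrent queue, which mirrors the goal of limiting the locks taken per document or per flush; real flushing would of course write files rather than hand off a list.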

On Wed, Dec 29, 2010 at 12:49 AM, xu cheng xcheng@gmail.com wrote:
 hi all
 I noticed that there are plenty of *PerThread classes in the
 trunk http://svn.apache.org/repos/asf/lucene/dev/trunk/
 while in the realtime_search
 version http://svn.apache.org/repos/asf/lucene/dev/branches/realtime_search/
 the *PerThread classes are gone!
 this just confused me, because I'm new here.
 what's the purpose of such a design? what's the advantage? are there any issues
 that refer to this?
 any suggestion or references are appreciated!
 regards.
 xu




[jira] Created: (LUCENE-2840) Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)

2010-12-30 Thread Uwe Schindler (JIRA)
Multi-Threading in IndexSearcher (after removal of MultiSearcher and 
ParallelMultiSearcher)
---

 Key: LUCENE-2840
 URL: https://issues.apache.org/jira/browse/LUCENE-2840
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: Search
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 4.0


Spin-off from parent issue:

{quote}
We should discuss how many threads should be spawned. If you have an
index with many segments, even small ones, I think only the larger segments
should be separate threads, all others should be handled sequentially. So maybe
add a maxThreads count, then sort the IndexReaders by maxDoc and then only
spawn maxThreads-1 threads for the bigger readers and then one additional
thread for the rest?
{quote}
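The proposal in the quote can be sketched as a simple partitioning step. In this minimal illustration (plain Java, not Lucene code; `partition` and its inputs are hypothetical), each segment is represented only by its `maxDoc`. The `maxThreads - 1` largest segments each get a dedicated task, and every remaining segment goes into one final task that would be searched sequentially.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SegmentPartitionSketch {

    /**
     * Returns at most maxThreads tasks, each a list of segment indices. The
     * (maxThreads - 1) largest segments by maxDoc get one task each; all
     * remaining segments share a final task that is searched sequentially.
     */
    public static List<List<Integer>> partition(int[] maxDocs, int maxThreads) {
        if (maxThreads < 1) {
            throw new IllegalArgumentException("maxThreads must be >= 1");
        }
        Integer[] order = new Integer[maxDocs.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort segment indices by maxDoc, largest first (stable, so ties keep index order).
        Arrays.sort(order, Comparator.comparingInt((Integer i) -> maxDocs[i]).reversed());

        List<List<Integer>> tasks = new ArrayList<>();
        int dedicated = Math.min(maxThreads - 1, order.length);
        for (int i = 0; i < dedicated; i++) {
            tasks.add(Collections.singletonList(order[i])); // one big segment per thread
        }
        List<Integer> rest = new ArrayList<>();
        for (int i = dedicated; i < order.length; i++) {
            rest.add(order[i]); // small segments, handled by one extra thread
        }
        if (!rest.isEmpty()) {
            tasks.add(rest);
        }
        return tasks;
    }

    public static void main(String[] args) {
        int[] maxDocs = {100, 5000, 20, 800, 3};
        System.out.println(partition(maxDocs, 3)); // prints [[1], [3], [0, 2, 4]]
    }
}
```

Each task could then be submitted to an `ExecutorService`; how to pick `maxThreads` is the open question the issue raises.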




[jira] Commented: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher

2010-12-30 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12976020#action_12976020
 ] 

Uwe Schindler commented on LUCENE-2837:
---

bq. That sounds like a great improvement - Uwe can you open a new issue for 
that? Let's try to leave this issue as a rote refactoring...

Done: LUCENE-2840

 Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS 
 into IndexSearcher
 ---

 Key: LUCENE-2837
 URL: https://issues.apache.org/jira/browse/LUCENE-2837
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2837.patch


 We've discussed cleaning up our *Searcher stack for some time... I
 think we should try to do this before releasing 4.0.
 So I'm attaching an initial patch which:
   * Removes Searcher, Searchable, absorbing all their methods into 
 IndexSearcher
   * Removes contrib/remote
   * Removes MultiSearcher
   * Absorbs ParallelMultiSearcher into IndexSearcher (i.e. you can now
 pass useThreads=true, or a custom ExecutorService, to the ctor)
 The patch is rough -- I just ripped stuff out, did search/replace to
 IndexSearcher, etc.  E.g. nothing directly tests using threads with
 IndexSearcher, but before committing I think we should add a
 newSearcher to LuceneTestCase, which randomly chooses whether the
 searcher uses threads, and cut over tests to use this instead of making
 their own IndexSearcher.
 I think MultiSearcher has a useful purpose, but as it is today it's
 too low-level, e.g. it shouldn't be involved in rewriting queries: the
 Query.combine method is scary.  Maybe in its place we make a higher
 level class, with limited API, that's able to federate search across
 multiple IndexSearchers?  It'd also be able to optionally use a thread
 per IndexSearcher.




[jira] Commented: (LUCENE-2840) Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)

2010-12-30 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12976027#action_12976027
 ] 

Earwin Burrfoot commented on LUCENE-2840:
-

I use the following scheme:
* There is a fixed pool of threads, shared by all searches, that limits total 
concurrency.
* Each new search grabs at most a fixed number of threads from this pool 
(say, 2-3 of 8 in my setup),
* and these threads churn through segments as through a queue (in maxDoc order, 
but I think even that is unnecessary).

No special smart binding between threads and segments (e.g. 1 thread for each 
biggie, 1 thread for all of the small ones) -
this means simpler code, and zero possibility of stalling when there are threads 
free to run and segments left to search but the binding policy does not connect them.
Using fewer threads per search than the total available is a precaution against 
biggie searches blocking fast ones.
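A minimal Java sketch of the scheme described above, assuming segments are represented only by their maxDoc and that "searching" a segment is a placeholder that counts every 10th doc. `BoundedParallelSearch` and `searchSegment` are hypothetical names, not Lucene APIs; the point is only that each search takes a bounded number of tasks from a shared pool and those tasks pull segments from a queue with no fixed thread-to-segment binding.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

final class BoundedParallelSearch {
    // Runs one search over the given segments using at most
    // maxThreadsPerSearch tasks from a pool shared by all searches.
    static long search(ExecutorService sharedPool, int[] segmentMaxDocs, int maxThreadsPerSearch) {
        if (maxThreadsPerSearch < 1) throw new IllegalArgumentException("need at least one thread");
        // Queue of segment indices, biggest first (the ordering is optional).
        Integer[] order = new Integer[segmentMaxDocs.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Integer.compare(segmentMaxDocs[b], segmentMaxDocs[a]));
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>(Arrays.asList(order));

        AtomicLong hits = new AtomicLong();
        int workers = Math.min(maxThreadsPerSearch, segmentMaxDocs.length);
        List<Future<?>> futures = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            // No binding between threads and segments: each worker polls the
            // next segment, so no worker idles while segments remain.
            futures.add(sharedPool.submit(() -> {
                Integer seg;
                while ((seg = queue.poll()) != null) {
                    hits.addAndGet(searchSegment(segmentMaxDocs[seg]));
                }
            }));
        }
        for (Future<?> f : futures) {
            try {
                f.get(); // wait for this search's workers only
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            } catch (ExecutionException e) {
                throw new RuntimeException(e.getCause());
            }
        }
        return hits.get();
    }

    // Placeholder for real per-segment search: pretend every 10th doc matches.
    static long searchSegment(int maxDoc) {
        long count = 0;
        for (int doc = 0; doc < maxDoc; doc += 10) count++;
        return count;
    }
}
```

With a pool of 8 threads and `maxThreadsPerSearch = 3`, several concurrent searches can share the pool without any single large search occupying all of it.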

 Multi-Threading in IndexSearcher (after removal of MultiSearcher and 
 ParallelMultiSearcher)
 ---

 Key: LUCENE-2840
 URL: https://issues.apache.org/jira/browse/LUCENE-2840
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: Search
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 4.0


 Spin-off from parent issue:
 {quote}
 We should discuss how many threads should be spawned. If you have an 
 index with many segments, even small ones, I think only the larger segments 
 should be separate threads; all others should be handled sequentially. So 
 maybe add a maxThreads count, then sort the IndexReaders by maxDoc and then 
 only spawn maxThreads-1 threads for the bigger readers and then one 
 additional thread for the rest?
 {quote}
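One way the grouping in this proposal might look, sketched under assumptions: readers are represented only by their maxDoc values, and `SegmentGrouping.group` is a hypothetical helper (not a Lucene API) that returns groups of reader indices, one group per thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SegmentGrouping {
    // Returns groups of segment indices: the (maxThreads - 1) biggest segments
    // each get their own group (thread); all remaining, smaller segments share
    // one final group that a single thread searches sequentially.
    static List<List<Integer>> group(int[] maxDocs, int maxThreads) {
        if (maxThreads < 1) throw new IllegalArgumentException("maxThreads must be >= 1");
        Integer[] order = new Integer[maxDocs.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Sort segment indices by maxDoc, largest first.
        Arrays.sort(order, (a, b) -> Integer.compare(maxDocs[b], maxDocs[a]));

        List<List<Integer>> groups = new ArrayList<>();
        int singles = Math.min(maxThreads - 1, order.length);
        for (int i = 0; i < singles; i++) {
            groups.add(Collections.singletonList(order[i]));
        }
        if (singles < order.length) {
            groups.add(new ArrayList<>(Arrays.asList(order).subList(singles, order.length)));
        }
        return groups;
    }
}
```

For segment sizes `{5, 100, 3, 50, 7}` and `maxThreads = 3`, the 100- and 50-doc segments each get a thread, and the remaining three small segments share the third.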




[jira] Commented: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher

2010-12-30 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12976031#action_12976031
 ] 

Robert Muir commented on LUCENE-2837:
-

I noticed the comment about shutting down the ExecutorService... can we just 
make the ExecutorService arg mandatory for the parallel case?

In my opinion, whoever creates it should be responsible for shutting it down, 
no one else. 

So I don't like the dual mode where we sometimes make our own but you can set a 
different one.
We don't clean up correctly at all wrt this in ParallelMultiSearcher today, in 
my opinion.
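A sketch of the ownership rule suggested here, assuming a hypothetical `ParallelSegmentSearcher` (not a Lucene class) whose ExecutorService is a mandatory constructor argument and which never shuts it down. Summing per-segment hit counts stands in for real searching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical searcher: the executor is mandatory and is never shut down here.
final class ParallelSegmentSearcher {
    private final ExecutorService executor;

    ParallelSegmentSearcher(ExecutorService executor) {
        if (executor == null) {
            throw new NullPointerException("executor is required for parallel search");
        }
        this.executor = executor;
    }

    // Sums per-segment hit counts in parallel (a stand-in for real searching).
    // Deliberately no close()/shutdown(): the executor's lifecycle belongs to
    // whoever created it.
    int countAll(int[] segmentHits) {
        List<Future<Integer>> futures = new ArrayList<>();
        for (int hits : segmentHits) {
            Callable<Integer> task = () -> hits;
            futures.add(executor.submit(task));
        }
        int total = 0;
        for (Future<Integer> f : futures) {
            try {
                total += f.get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            } catch (ExecutionException e) {
                throw new RuntimeException(e.getCause());
            }
        }
        return total;
    }
}
```

The caller creates the pool, uses the searcher inside try/finally, and calls `pool.shutdown()` itself, so there is no dual mode where the searcher has to guess whether it owns the pool.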




[jira] Commented: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher

2010-12-30 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12976032#action_12976032
 ] 

Robert Muir commented on LUCENE-2837:
-

{quote}
OK so we have to fix the multi-threaded bug in FC first and then I think commit 
the newSearcher cutover from LUCENE-2751, then commit this issue.
{quote}

Well, you don't have to do all of that (you could commit this one, then chase 
down all the bugs). I was just warning you
so you don't get surprised.




[jira] Updated: (LUCENE-2838) ConstantScoreQuery should directly support wrapping Query and simply strip off scores

2010-12-30 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2838:
--

Attachment: LUCENE-2838.patch

Cleaned up patch:
- removed a useless testcase that is no longer needed
- added a test for CSQ that checks equals and hashCode
- code cleanup
- javadocs

If nobody objects, I will commit this to 3.x and trunk. Deprecating QWF should 
be discussed in separate issues; maybe we can merge Filter and Query before 
4.0!

 ConstantScoreQuery should directly support wrapping Query and simply strip 
 off scores
 -

 Key: LUCENE-2838
 URL: https://issues.apache.org/jira/browse/LUCENE-2838
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2838.patch, LUCENE-2838.patch


 Especially in MultiTermQuery rewrite modes we often simply need to strip off 
 scores from Queries and make them constant score. Currently the code to do 
 this looks quite ugly: new ConstantScoreQuery(new QueryWrapperFilter(query))
 As the name says, ConstantScoreQuery should make any other Query constant 
 score, so why does it not take a Query as ctor param? This question was also 
 asked quite often by my customers, and it is a fair one if you think about 
 it.
 Looking closer into the code, it is clear that this would also speed up MTQs:
 - One level of wrapping and its extra method calls can be removed
 - Maybe we can even deprecate QueryWrapperFilter in 3.1 now (it's now only 
 used in tests and the use-case for this class is not really available) and 
 LUCENE-2831 would not need the stupid hack to make Simon's assertions pass
 - CSQ now supports out-of-order scoring and topLevel scoring, so a CSQ on 
 top-level now directly feeds the Collector. For that a small trick is used: 
 The score(Collector) calls are directly delegated and the scores are stripped 
 by wrapping the setScorer() method in Collector
 During that I found a visibility bug in Scorer (LUCENE-2839): The method 
 boolean score(Collector collector, int max, int firstDocID) should be 
 public, not protected, as it's not solely intended to be overridden by 
 subclasses and is called from other classes, too! This causes no compiler 
 errors, as the other class that calls it is mainly BooleanScorer(2), which is 
 in the same package, but the visibility is wrong. I will open an issue for that and 
 fix it at least in trunk, where we have no backwards-compatibility requirement.
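The collector-wrapping trick described above can be sketched as follows. `SimpleScorer`, `SimpleCollector`, `ConstantScoreCollector`, and `RecordingCollector` are hypothetical, simplified stand-ins (Lucene's real `Scorer` and `Collector` have more methods); the sketch only shows how wrapping `setScorer()` strips scores while documents flow through unchanged. It is an illustration of the idea, not the patch itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for Lucene's Scorer and Collector.
interface SimpleScorer {
    int docID();
    float score();
}

interface SimpleCollector {
    void setScorer(SimpleScorer scorer);
    void collect(int doc);
}

final class ConstantScoreCollector implements SimpleCollector {
    private final SimpleCollector in;
    private final float constant;

    ConstantScoreCollector(SimpleCollector in, float constant) {
        this.in = in;
        this.constant = constant;
    }

    @Override
    public void setScorer(SimpleScorer scorer) {
        // Hand the wrapped collector a view of the real scorer whose
        // score() is constant; docID() still comes from the real scorer.
        in.setScorer(new SimpleScorer() {
            @Override public int docID() { return scorer.docID(); }
            @Override public float score() { return constant; }
        });
    }

    @Override
    public void collect(int doc) {
        in.collect(doc); // documents pass through unchanged
    }
}

// Example collector that records the doc ID and score it sees for each hit.
final class RecordingCollector implements SimpleCollector {
    final List<Integer> docs = new ArrayList<>();
    final List<Float> scores = new ArrayList<>();
    private SimpleScorer scorer;

    @Override public void setScorer(SimpleScorer scorer) { this.scorer = scorer; }

    @Override public void collect(int doc) {
        docs.add(scorer.docID());
        scores.add(scorer.score());
    }
}
```

Whatever the inner scorer reports, the wrapped collector only ever sees the constant. Uwe revisits whether this inverted decorator is safe when wrappers are nested.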




[jira] Commented: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher

2010-12-30 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12976035#action_12976035
 ] 

Robert Muir commented on LUCENE-2837:
-

Mike, also if you apply LUCENE-2751, tests randomly fails because of the 
LUCENE-2756 bug.

For example TestBoolean2.testRandomQueries will fail because sometimes it uses 
a wildcard query,
and if it then incorporates MUST_NOT, this will fail against the 
multisearcher/parallelmultisearcher 
because the combine() is wrong.

So I'm thinking we should add the newSearcher tests after you committed this 
one 
(as long as this one has some reasonable standalone tests to show it works)





[jira] Updated: (SOLR-2303) remove unnecessary (and problematic) log4j jars in contribs

2010-12-30 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-2303:
--

Attachment: SOLR-2303.patch

 remove unnecessary (and problematic) log4j jars in contribs
 ---

 Key: SOLR-2303
 URL: https://issues.apache.org/jira/browse/SOLR-2303
 Project: Solr
  Issue Type: Improvement
  Components: Build
Reporter: Robert Muir
 Fix For: 4.0

 Attachments: SOLR-2303.patch


 In Solr 4.0 there is log4j-over-slf4j.
 But if you also have log4j jars in the classpath (e.g. contrib/extraction, 
 contrib/clustering) you can get strange errors such as:
 java.lang.NoSuchMethodError: org.apache.log4j.Logger.setAdditivity(Z)V
 So I think we should remove the log4j jars from these contribs; all tests pass 
 with them removed.




[jira] Created: (SOLR-2303) remove unnecessary (and problematic) log4j jars in contribs

2010-12-30 Thread Robert Muir (JIRA)
remove unnecessary (and problematic) log4j jars in contribs
---

 Key: SOLR-2303
 URL: https://issues.apache.org/jira/browse/SOLR-2303
 Project: Solr
  Issue Type: Improvement
  Components: Build
Reporter: Robert Muir
 Fix For: 4.0
 Attachments: SOLR-2303.patch

In Solr 4.0 there is log4j-over-slf4j.

But if you also have log4j jars in the classpath (e.g. contrib/extraction, 
contrib/clustering) you can get strange errors such as:
java.lang.NoSuchMethodError: org.apache.log4j.Logger.setAdditivity(Z)V

So I think we should remove the log4j jars from these contribs; all tests pass 
with them removed.




Re: strange problem of PForDelta decoder

2010-12-30 Thread Li Li
I did another test using Lucene 4 trunk with the default codecs. Its file
format is the same as Lucene 2.9's, and the speed is almost the same as
Lucene 2.9.

 I think it could be the
 fact that AND query does block reads (64 doc/freqs at once) instead of
 doc-at-once?  Ie, because of this, the query is effectively scanning
 the next block of 64 docs instead of skipping to them?  Our skipping
 impl is unfortunately rather costly so if skip will not skip that many
 docs it's better to scan.
I agree with this explanation. For high-frequency terms, the skip list
cannot skip over many docs, so it seems something here needs optimizing,
e.g. using scanning for high-frequency terms and the skip list for
low-frequency terms. But if we only care about the bad case, we only need
to handle the high-frequency terms.
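A rough sketch of the scan-versus-skip idea discussed above: `advance()` scans linearly when the target lies within the next `SCAN_WINDOW` postings and otherwise jumps. The 64-entry window only echoes the block size mentioned in the thread, `Arrays.binarySearch` stands in for a real skip list, and `ScanOrSkip` is a hypothetical class, not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ScanOrSkip {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    // Hypothetical threshold: if the target lies within the next 64 postings,
    // a linear scan is assumed cheaper than a (costly) skip.
    static final int SCAN_WINDOW = 64;

    static final class Postings {
        private final int[] docs;
        private int pos = -1;

        Postings(int[] docs) { this.docs = docs; }

        int doc() {
            if (pos < 0) return -1;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        // Moves to the first doc >= target, past the current position.
        int advance(int target) {
            int from = Math.min(pos + 1, docs.length);
            int limit = Math.min(from + SCAN_WINDOW, docs.length);
            if (limit > from && docs[limit - 1] >= target) {
                // Target is close (typical for high-df terms): just scan.
                int i = from;
                while (docs[i] < target) i++;
                pos = i;
            } else {
                // Target is far away: jump directly. A binary search stands
                // in here for a real skip list.
                int idx = Arrays.binarySearch(docs, from, docs.length, target);
                pos = idx >= 0 ? idx : -idx - 1;
            }
            return doc();
        }
    }

    // Leapfrog conjunction (AND) of two sorted posting lists.
    static List<Integer> intersect(int[] a, int[] b) {
        Postings pa = new Postings(a);
        Postings pb = new Postings(b);
        List<Integer> out = new ArrayList<>();
        int da = pa.advance(0);
        while (da != NO_MORE_DOCS) {
            // Only move b forward if it is behind a.
            int db = pb.doc() >= da ? pb.doc() : pb.advance(da);
            if (db == NO_MORE_DOCS) break;
            if (db == da) {
                out.add(da);
                da = pa.advance(da + 1);
            } else {
                da = pa.advance(db); // db > da here
            }
        }
        return out;
    }
}
```

For two dense lists the scan branch handles almost every step; a sparse list intersected with a dense one mostly takes the jump branch.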

 You only use PFor for the very high freq terms in 2.9.x right?
I use PFor if df is greater than 128; otherwise, I use VINT.

 until we fix Lucene to run a single search concurrently (which we
 badly need to do).
I am interested in this idea (I have posted about it before). Do you have
any resources, such as papers or technical articles, about it?
I tried it, but it required modifying the index format dramatically, and
we use Solr distributed search to relieve the response-time problem, so
in the end I gave it up.
Lucene 4's index format is more flexible, in that it supports custom
codecs, and it is still under development, so I think now is a good time
to consider letting it support multithreaded searching for a single query.
I have a naive solution: divide the docList into several groups,
e.g. group docIds by whether they are even or odd:
term1 df1=4  docList = 0  4  8  10
term1 df2=4  docList = 1  3  9  11

term2 df1=4  docList = 0  6  8  12
term2 df2=4  docList = 3  9  11 15
Then we can use 2 threads to search for the top N docs in the even group
and the odd group, and finally merge their results into a single list,
just like Solr distributed search.
But it's better than Solr distributed search.
First, it runs in a single process, and data communication between
threads is much faster than over the network.
Second, each thread processes the same number of documents. With Solr
distributed search, one shard may process 7 documents and another shard
only 1 document. Even if we make each shard hold the same number of
documents, we cannot make it uniform for every term.
e.g. shard1 has doc1 doc2
   shard2 has doc3 doc4
but term1 may only occur in doc1 and doc2,
while term2 may only occur in doc3 and doc4.
We may change it to
   shard1 doc1 doc3
   shard2 doc2 doc4
which is good for term1 and term2,
but term3 may occur in doc1 and doc3...
So I think this is fine-grained distribution within the index, while Solr
distributed search is coarse-grained.
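The even/odd idea can be sketched as follows, under assumptions: a single sorted posting list, a made-up deterministic `score()` function in place of real relevance scoring, and hypothetical names throughout (`PartitionedTopN` is not a Lucene or Solr class). Each partition (doc % parts) computes a local top N in its own task, and the local lists are merged the way shard results are merged in distributed search.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

final class PartitionedTopN {
    // Hypothetical stand-in for relevance scoring (deterministic, in [0, 1)).
    static float score(int doc) {
        return (doc * 37 % 101) / 101f;
    }

    // "Worst first": lower score first; on ties the larger doc ID is worse.
    static final Comparator<Integer> WORST_FIRST = (a, b) -> {
        int c = Float.compare(score(a), score(b));
        return c != 0 ? c : Integer.compare(b, a);
    };

    // Keeps the n best docs of one partition (doc % parts == part), best first.
    static List<Integer> localTopN(int[] postings, int parts, int part, int n) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(WORST_FIRST);
        for (int doc : postings) {
            if (doc % parts != part) continue;
            heap.add(doc);
            if (heap.size() > n) heap.poll(); // evict the current worst
        }
        List<Integer> out = new ArrayList<>(heap);
        out.sort(WORST_FIRST.reversed());
        return out;
    }

    // Searches each partition in its own task, then merges the local lists.
    static List<Integer> topN(ExecutorService pool, int[] postings, int parts, int n) {
        List<Future<List<Integer>>> futures = new ArrayList<>();
        for (int p = 0; p < parts; p++) {
            final int part = p;
            Callable<List<Integer>> task = () -> localTopN(postings, parts, part, n);
            futures.add(pool.submit(task));
        }
        PriorityQueue<Integer> heap = new PriorityQueue<>(WORST_FIRST);
        for (Future<List<Integer>> f : futures) {
            for (int doc : get(f)) {
                heap.add(doc);
                if (heap.size() > n) heap.poll();
            }
        }
        List<Integer> out = new ArrayList<>(heap);
        out.sort(WORST_FIRST.reversed());
        return out;
    }

    private static <T> T get(Future<T> f) {
        try {
            return f.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        }
    }
}
```

Merging the per-partition top-N lists yields the same result as a single sequential top-N, because every globally top-ranked doc is also in the top N of its own partition.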







2010/12/30 Michael McCandless luc...@mikemccandless.com:
 On Mon, Dec 27, 2010 at 5:08 AM, Li Li fancye...@gmail.com wrote:
 I integrated a PFor codec into Lucene 2.9.3, and the search time
 comparison is as follows:
                            single term  and query  or query
 VINT in Lucene 2.9.3          11.2         36.5       38.6
 PFor in Lucene 2.9.3           8.7         27.6       33.4
 VINT in Lucene 4 branch       10.6         26.5       35.4
 PFor in Lucene 4 branch        8.1         22.5       30.7

 My test terms are high-frequency terms because we are interested in the bad case.

 I agree it's the bad cases we should focus on in general.  If a super
 fast query gets somewhat slower it's relatively harmless (just a
 capacity question for high volume sites) but if the bad queries get
 slower it's awful (requires faster cutover to sharded architecture),
 until we fix Lucene to run a single search concurrently (which we
 badly need to do).

 It seems the Lucene 4 branch's implementation of the AND query (conjunction
 query) is so well optimized that even with the VINT codec it's faster than
 PFor in Lucene 2.9.3. Could anyone tell me what optimization was done?
 Is storing docIDs and freqs separately making it faster, or is it
 something else?

 Actually vInt on the bulkpostings branch stores freq/doc together.  Ie
 the format is the same as 2.9.x's format.  I think it could be the
 fact that AND query does block reads (64 doc/freqs at once) instead of
 doc-at-once?  Ie, because of this, the query is effectively scanning
 the next block of 64 docs instead of skipping to them?  Our skipping
 impl is unfortunately rather costly so if skip will not skip that many
 docs it's better to scan.

 Another question: is anyone else interested, like me, in integrating a PFor
 codec into Lucene 2.9.3 (we have to use Lucene 2.9 and Solr 1.4)?
 And how do I contribute this patch?

 Realistically I don't think we can commit this to 2.9.x -- that branch
 is purely bug fixes at this point.

 Still it's possible others could make use of such a patch so if it's
 not too much work you may as well post it?  It can lead to
 improvements on the bulk postings branch too :)  The more patches the
 merrier!

 You only use PFor for the very high freq terms 

Re: strange problem of PForDelta decoder

2010-12-30 Thread Earwin Burrfoot
until we fix Lucene to run a single search concurrently (which we
badly need to do).
 I am interested in this idea (I have posted about it before). Do you have
 any resources, such as papers or technical articles, about it?
 I tried it, but it required modifying the index format dramatically, and
 we use Solr distributed search to relieve the response-time problem, so
 in the end I gave it up.
 Lucene 4's index format is more flexible, in that it supports custom
 codecs, and it is still under development, so I think now is a good time
 to consider letting it support multithreaded searching for a single query.
 I have a naive solution: divide the docList into several groups,
 e.g. group docIds by whether they are even or odd:
 term1 df1=4  docList = 0  4  8  10
 term1 df2=4  docList = 1  3  9  11

 term2 df1=4  docList = 0  6  8  12
 term2 df2=4  docList = 3  9  11 15
 Then we can use 2 threads to search for the top N docs in the even group
 and the odd group, and finally merge their results into a single list,
 just like Solr distributed search.
 But it's better than Solr distributed search.
 First, it runs in a single process, and data communication between
 threads is much faster than over the network.
 Second, each thread processes the same number of documents. With Solr
 distributed search, one shard may process 7 documents and another shard
 only 1 document. Even if we make each shard hold the same number of
 documents, we cannot make it uniform for every term.
 e.g. shard1 has doc1 doc2
    shard2 has doc3 doc4
 but term1 may only occur in doc1 and doc2,
 while term2 may only occur in doc3 and doc4.
 We may change it to
    shard1 doc1 doc3
    shard2 doc2 doc4
 which is good for term1 and term2,
 but term3 may occur in doc1 and doc3...
 So I think this is fine-grained distribution within the index, while Solr
 distributed search is coarse-grained.
This is just crazy :)

The simple way is just to search different segments in parallel.
BalancedSegmentMergePolicy makes sure you have roughly even-sized
large segments (and small ones don't count, they're small!).
If you're bent on squeezing out that extra millisecond (and making
your life miserable along the way), you can search a single segment
with multiple threads (by dividing it into even chunks, and then using
skipTo to position your iterators at the beginning of each chunk).

The first approach is really easy to implement. The second one is harder,
but still doesn't require you to cook the number of available CPU cores
into your index!
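A sketch of the second approach (one segment, several threads, each skipping to the start of its own chunk). The sorted `int[]` postings, the `advance()` method standing in for `skipTo`, and the class name `ChunkedSegmentSearch` are all hypothetical; counting matches stands in for real scoring.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

final class ChunkedSegmentSearch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Minimal postings iterator over a sorted int[]; advance() plays the
    // role of skipTo(): move to the first doc >= target.
    static final class Postings {
        private final int[] docs;
        private int pos = -1;

        Postings(int[] docs) { this.docs = docs; }

        int advance(int target) {
            int from = Math.min(pos + 1, docs.length);
            int idx = Arrays.binarySearch(docs, from, docs.length, target);
            pos = idx >= 0 ? idx : -idx - 1; // insertion point = first doc >= target
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        int nextDoc() {
            if (pos < docs.length) pos++;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }
    }

    // One chunk [start, end): each task gets its own iterator, skips to the
    // start of its chunk, and stops at the chunk's end.
    static int countChunk(int[] docs, int start, int end) {
        Postings p = new Postings(docs);
        int count = 0;
        for (int d = p.advance(start); d < end; d = p.nextDoc()) count++;
        return count;
    }

    // Splits [0, maxDoc) into `chunks` near-equal ranges searched in parallel.
    static int countParallel(ExecutorService pool, int[] docs, int maxDoc, int chunks) {
        List<Future<Integer>> futures = new ArrayList<>();
        for (int c = 0; c < chunks; c++) {
            int start = (int) ((long) maxDoc * c / chunks);
            int end = (int) ((long) maxDoc * (c + 1) / chunks);
            Callable<Integer> task = () -> countChunk(docs, start, end);
            futures.add(pool.submit(task));
        }
        int total = 0;
        for (Future<Integer> f : futures) {
            try {
                total += f.get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            } catch (ExecutionException e) {
                throw new RuntimeException(e.getCause());
            }
        }
        return total;
    }
}
```

The chunk count is chosen at query time, so nothing about the number of CPU cores is baked into the index.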

It's the law of diminishing returns at play here. You're most likely
searching in parallel over a mostly memory-resident index
(RAMDir/mmap/filesys cache - doesn't matter), as most IO subsystems
tend to slow down considerably on parallel sequential reads, so you
already have pretty decent speed.
Searching different segments in parallel (with BSMP) makes you several
times faster.
Searching in parallel within a segment requires some weird hacks, but
has maybe a few percent advantage over the previous solution.
Sharding posting lists requires a great deal of weird hacks, makes the
index machine-bound, and boosts speed by another couple of percent.
Sounds worthless.

-- 
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Phone: +7 (495) 683-567-4
ICQ: 104465785




Re: Vote thread started on gene...@lucene.apache.org

2010-12-30 Thread Troy Howard
Scott,

I will gladly help put this proposal together and would like to
volunteer as a committer. I am communicating with others to find some
additional candidates to be committers.

Regarding Heath, a quote from his last message in this thread:

While I have developed extensively against Lucene.net, I do not
possess the java skills needed to do a port of the code... So, while I
wouldn't mind being a committer, I do not think I am qualified.

Thanks,
Troy


On Thu, Dec 30, 2010 at 10:01 AM, Lombard, Scott
slomb...@kingindustries.com wrote:
 Grant,

 Thanks for your time explaining all the details.  I am willing to work on a 
 proposal to put Lucene.Net back into incubation.  I will need other people 
 to step up and be committers as well.  Heath has volunteered, and as Grant has 
 stated, 4 committers are needed for incubation.  Who else is willing to be 
 a committer?

 Grant, I will definitely be taking you up on your offer to help in bringing 
 Lucene.Net into incubation.

 Scott


 -Original Message-
 From: Grant Ingersoll [mailto:gsing...@apache.org]
 Sent: Thursday, December 30, 2010 12:32 PM
 To: lucene-net-u...@lucene.apache.org
 Subject: Re: Vote thread started on gene...@lucene.apache.org


 On Dec 30, 2010, at 9:51 AM, Heath Aldrich wrote:

 Hi Grant,

 Thanks for taking the time to respond.

 While I have developed extensively against Lucene.net, I do not possess the 
 java skills needed to do a port of the code... So, while I wouldn't mind 
 being a committer, I do not think I am qualified. (I guess if I were, I could 
 just use Lucene proper and that would be that.)

 As to other duties of a committer, I think the ASF is perceived as a black 
 box of questions for most of us.

 For one, I don't think anyone outside the 4 committers even understands *why* 
 it is a good thing to be on the ASF vs. CodePlex, Sourceforge, etc.  Maybe 
 if there was an understanding of the why, the requirements of the ASF would 
 make more sense.  I think a lot of us right now just perceive the ASF as the 
 group that is wanting to kill Lucene.net.

 I don't think we have a desire to kill it; I just think we are faced with the 
 unfortunate reality that the project is already dead and now we on the PMC 
 have the unfortunate job of cleaning up the mess as best we can.  Again, it 
 is not even that we want to see it go away; we on the PMC just don't want to 
 be responsible for its upkeep.  You give me the names of 4 people who are 
 willing to be committers (i.e. people willing to volunteer their time) and I 
 will do my best to get the project into the Incubator.  However, I have to 
 tell you, my willingness to help is diminishing with every trip we take 
 around this same circle of discussion.

 Simply put, given the way the vote has gone so far, the Lucene PMC is no 
 longer interested in sustaining this project.  If the community wishes to see 
 it live at the ASF then one of you had better step up and spend 20-30 minutes 
 of your time writing up the draft proposal (most of it can be copied and 
 pasted) and circulating it.  In fact, given the amount of time some of you 
 have no doubt spent writing on this and other related threads you could have 
 put together the large majority of the proposal, circulated the draft, gotten 
 other volunteers to help, and already been moving forward in a positive 
 direction.  Truth be told, I would do it, but I am explicitly not going to 
 because I think that if the community can't take that one step to move 
 forward, then it truly doesn't deserve to.


 I get your comments about the slower than slow development, but that is also 
 somewhat of a sign that it works.  While 2.9.2 may be behind, it seems very 
 stable with very few issues.  If we send the project to the attic, how will 
 anyone be able to submit bugfixes ever?  Frankly, I use 2.9.2 every day and 
 have not found bugs in the areas that I use... but I'm sure they are in 
 there somewhere.

 As for the name, I thought Lucene.net was the name of the project back in 
 the SourceForge days...
 So my question is based on the premise that if the lucene.net name was 
 brought *to* ASF, why can the community not leave with it?

 Again, IANAL, but just b/c it was improperly used beforehand does not mean it 
 is legally owned by some other entity.  The Lucene name has been at the ASF 
 since 2001 and Lucene.NET is also now a part of the ASF.  (If you're 
 interested, go look at the discussions around iBatis and the movement of that 
 community to MyBatis)

 -Grant


 This message (and any associated files) is intended only for the
 use of the individual or entity to which it is addressed and may
 contain information that is confidential, subject to copyright or
 constitutes a trade secret. If you are not the intended recipient
 you are hereby notified that any dissemination, copying or
 distribution of this message, or files associated with this message,
 is strictly prohibited. If you have received this message in 

[jira] Updated: (LUCENE-2838) ConstantScoreQuery should directly support wrapping Query and simply strip off scores

2010-12-30 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2838:
--

Attachment: LUCENE-2838-no-topscorer-opt.patch

After thinking about it for a day, I found some problems with the collector 
hack and this style of decorator pattern:
- If you wrap multiple times, the setScorer() method in the wrapped collector 
may set the wrong scorer (you see this if you wrap multiple 
ConstantScoreQueries on top of each other: then the boost of the inner one is 
returned). The problem is that the score(Collector) method inverts the decorator 
pattern.
- The inner scorer (like BooleanScorer with its buckets) may set a different 
scorer in the collector than itself, one that implements doc() differently, so 
always setting the ConstantScorer as the collector's scorer can lead to wrong 
results (we don't see this in the test, as no collector uses Scorer.doc(), only 
Scorer.score(), which returns a constant).

I changed the code so CSQ now always passes topScorer=false to Weight.scorer() 
of the wrapped query and does not override the score(Collector,...) methods. It 
still allows out-of-order collection. Now BooleanScorer2 is always used with 
MTQs.

The question is whether the previous (but broken) optimization would make sense 
for speed. Mike/Mark?
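A minimal sketch of the plain-decorator alternative described above, in which the constant scorer only wraps the inner scorer's iteration and does not override score(Collector). `DocIterator`, `ConstantScorerSketch`, and `ArrayDocIterator` are hypothetical stand-ins, not Lucene classes; the real `Scorer`/`Weight` APIs have more methods. Because nothing is rerouted through the collector, nesting two wrappers behaves like ordinary composition and the outermost boost wins.

```java
// Hypothetical, minimal stand-in for Lucene's DocIdSetIterator.
interface DocIterator {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int nextDoc();
    int docID();
}

// Plain decorator: iteration is delegated to the wrapped query's matches;
// only the score is replaced by a constant.
final class ConstantScorerSketch implements DocIterator {
    private final DocIterator inner;
    private final float boost;

    ConstantScorerSketch(DocIterator inner, float boost) {
        this.inner = inner;
        this.boost = boost;
    }

    @Override public int nextDoc() { return inner.nextDoc(); }
    @Override public int docID() { return inner.docID(); }
    float score() { return boost; }
}

// Test helper: iterates over a fixed, sorted array of doc IDs.
final class ArrayDocIterator implements DocIterator {
    private final int[] docs;
    private int i = -1;

    ArrayDocIterator(int... docs) { this.docs = docs; }

    @Override public int nextDoc() {
        if (i < docs.length) i++;
        return docID();
    }

    @Override public int docID() {
        if (i < 0) return -1;
        return i < docs.length ? docs[i] : NO_MORE_DOCS;
    }
}
```

The cost of this shape is the one raised in the comment: the top-level collector no longer receives hits directly from the inner scorer's own bulk score(Collector) path.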

 ConstantScoreQuery should directly support wrapping Query and simply strip 
 off scores
 -

 Key: LUCENE-2838
 URL: https://issues.apache.org/jira/browse/LUCENE-2838
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2838-no-topscorer-opt.patch, LUCENE-2838.patch, 
 LUCENE-2838.patch


 Especially in MultiTermQuery rewrite modes we often simply need to strip off 
 scores from Queries and make them constant score. Currently the code to do 
 this looks quite ugly: new ConstantScoreQuery(new QueryWrapperFilter(query))
 As the name says, QueryWrapperFilter should make any other Query constant 
 score, so why does it not take a Query as ctor param? This question was aldso 
 asked quite often by my customers and is simply correct, if you think about 
 it.
 Looking closer into the code, it is clear that this would also speed up MTQs:
 - One additional wrapping and method calls can be removed
 - Maybe we can even deprecate QueryWrapperFilter in 3.1 now (it's now only 
 used in tests and the use-case for this class is not really available) and 
 LUCENE-2831 does not need the stupid hack to make Simon's assertions pass
 - CSQ now supports out-of-order scoring and topLevel scoring, so a CSQ on 
 top-level now directly feeds the Collector. For that a small trick is used: 
 The score(Collector) calls are directly delegated and the scores are stripped 
 by wrapping the setScorer() method in Collector
 During that I found a visibility bug in Scorer (LUCENE-2839): the method 
 boolean score(Collector collector, int max, int firstDocID) should be 
 public, not protected, as it's not solely intended to be overridden by 
 subclasses and is called from other classes, too! This causes no compiler 
 errors, as the other class that calls it is mainly BooleanScorer(2), which is 
 in the same package, but the visibility is wrong. I will open an issue for that and 
 fix it at least in trunk, where we have no backwards-compatibility requirement.
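The score-stripping trick described in the issue (wrap setScorer() so the downstream Collector never sees the wrapped query's real scores) can be illustrated in isolation. This is a minimal sketch, not the patch: SimpleScorer and SimpleCollector are hypothetical stand-ins for Lucene's Scorer and Collector (the real 3.x Collector also has setNextReader() and acceptsDocsOutOfOrder()), and the constant value is an assumption. Note that the follow-up comment above says the committed code no longer overrides score(Collector,...), so this only shows the idea.

```java
import java.util.ArrayList;
import java.util.List;

public class ConstantScoreSketch {

    /** Hypothetical stand-in for a scorer: returns the score of the current doc. */
    interface SimpleScorer {
        float score();
    }

    /** Hypothetical stand-in for a collector (not Lucene's Collector class). */
    interface SimpleCollector {
        void setScorer(SimpleScorer scorer);
        void collect(int doc);
    }

    /** Wraps another collector and replaces whatever scorer it is given with a constant one. */
    static final class ConstantScoreCollector implements SimpleCollector {
        private final SimpleCollector delegate;
        private final float constant;

        ConstantScoreCollector(SimpleCollector delegate, float constant) {
            this.delegate = delegate;
            this.constant = constant;
        }

        @Override
        public void setScorer(SimpleScorer scorer) {
            // Strip the scores: the delegate only ever sees the constant.
            delegate.setScorer(() -> constant);
        }

        @Override
        public void collect(int doc) {
            delegate.collect(doc); // doc IDs pass through unchanged
        }
    }

    /** Records "doc:score" strings so the effect of the wrapper is visible. */
    static final class RecordingCollector implements SimpleCollector {
        final List<String> hits = new ArrayList<>();
        private SimpleScorer scorer;

        @Override public void setScorer(SimpleScorer scorer) { this.scorer = scorer; }
        @Override public void collect(int doc) { hits.add(doc + ":" + scorer.score()); }
    }

    public static void main(String[] args) {
        RecordingCollector recorder = new RecordingCollector();
        SimpleCollector collector = new ConstantScoreCollector(recorder, 1.0f);

        collector.setScorer(() -> 0.42f); // the wrapped query's "real" score
        for (int doc : new int[] {4, 7, 12}) {
            collector.collect(doc);
        }
        System.out.println(recorder.hits); // [4:1.0, 7:1.0, 12:1.0]
    }
}
```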

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2611) IntelliJ IDEA and Eclipse setup

2010-12-30 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated LUCENE-2611:


Attachment: LUCENE-2611.patch

Added IntelliJ codestyle definition and instructions for putting it in the 
correct location. Committing shortly.

 IntelliJ IDEA and Eclipse setup
 ---

 Key: LUCENE-2611
 URL: https://issues.apache.org/jira/browse/LUCENE-2611
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Build
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2611-branch-3x.patch, 
 LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, 
 LUCENE-2611-branch-3x.patch, LUCENE-2611.patch, LUCENE-2611.patch, 
 LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, 
 LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611_eclipse.patch, 
 LUCENE-2611_mkdir.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, 
 LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test_2.patch


 Setting up Lucene/Solr in IntelliJ IDEA or Eclipse can be time-consuming.
 The attached patches add a new top level directory {{dev-tools/}} with 
 sub-dirs {{idea/}} and {{eclipse/}} containing basic setup files for trunk, 
 as well as top-level ant targets named idea and eclipse that copy these 
 files into the proper locations.  This arrangement avoids the messiness 
 attendant to in-place project configuration files directly checked into 
 source control.
 The IDEA configuration includes modules for Lucene and Solr, each Lucene and 
 Solr contrib, and each analysis module.  A JUnit run configuration per module 
 is included.
 The Eclipse configuration includes a source entry for each 
 source/test/resource location and classpath setup: a library entry for each 
 jar.
 For IDEA, once {{ant idea}} has been run, the only configuration that must be 
 performed manually is configuring the project-level JDK.  For Eclipse, once 
 {{ant eclipse}} has been run, the user has to refresh the project 
 (right-click on the project and choose Refresh).
 If these patches are committed, Subversion svn:ignore properties should be 
 added/modified to ignore the destination IDEA and Eclipse configuration 
 locations.
 Iam Jambour has written up on the Lucene wiki a detailed set of instructions 
 for applying the 3.X branch patch for IDEA: 
 http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ




Re: Vote thread started on gene...@lucene.apache.org

2010-12-30 Thread Grant Ingersoll
I would take an existing Incubator Proposal and copy and paste it into a new 
one and then send the link here and get people to start editing on it.

-Grant

On Dec 30, 2010, at 2:45 PM, Lombard, Scott wrote:

 
 From everything that was said it seems apparent to me that the only way for 
 Lucene.Net to stay alive is to move back to incubation.  So where do we go 
 from here?  More than 4 people have said they are willing to be committers.  
 Is this email list the best place to start working on a proposal? Should it 
 be done by a small group offline, or is there a way for the community 
 to work on it together?
 
 Thoughts?
 Scott
 
 
 -Original Message-
 From: Troy Howard [mailto:thowar...@gmail.com]
 Sent: Thursday, December 30, 2010 2:22 PM
 To: lucene-net-...@lucene.apache.org
 Cc: lucene-net-u...@lucene.apache.org
 Subject: Re: RE: Vote thread started on gene...@lucene.apache.org
 
 Marco,
 
 I agree with you on this front. I feel that the first tasks that a new
 Lucene.Net team should focus on, in terms of development are:
 
 - Fully automating a line-by-line port using a tool such as Sharpen.
 This needs to become a commodity function requiring very little
 development effort
 - Bring the existing forks back in as branches within the ASF project.
 I am very interested in pursuing continued development on a more .NET
 style port (i.e. the Lucere project I started or Aimee.Net).
 
 The Lucene.Net project should be able to continue with both
 development paths in the same project.
 
 Thanks,
 Troy
 
 
 
 
 On Thu, Dec 30, 2010 at 10:15 AM, Marco Dissel marco.dis...@gmail.com wrote:
 What will be the goal of new committers? Convert the source into .NET-style
 code? If yes, we should try to stop with all the spin-offs and concentrate
 all the development in one project.
 On 30 Dec 2010 at 19:02, Lombard, Scott slomb...@kingindustries.com
 wrote:
 Grant,
 
 Thanks for your time explaining all the details. I am willing to work on
 a proposal to put Lucene.Net back into incubation. I will need other people
 to step up and be committers as well. Heath has volunteered, and as Grant has
 stated, 4 committers are needed for incubation. Who else is willing to be
 a committer?
 
 Grant, I will definitely be taking you up on your offer to help with bringing
 Lucene.Net into incubation.
 
 Scott
 
 
 -Original Message-
 From: Grant Ingersoll [mailto:gsing...@apache.org]
 Sent: Thursday, December 30, 2010 12:32 PM
 To: lucene-net-u...@lucene.apache.org
 Subject: Re: Vote thread started on gene...@lucene.apache.org
 
 
 On Dec 30, 2010, at 9:51 AM, Heath Aldrich wrote:
 
 Hi Grant,
 
 Thanks for taking the time to respond.
 
 While I have developed extensively against Lucene.net, I do not possess
 the java skills needed to do a port of the code... So, while I wouldn't mind
 being a committer, I do not think I am qualified. (I guess if I was, I could
 just use Lucene proper and that would be that)
 
 As to other duties of a committer, I think the ASF is perceived as a
 black box of questions for most of us.
 
 For one, I don't think anyone outside the 4 committers even understands
 *why* it is a good thing to be on the ASF vs. CodePlex, Sourceforge, etc.
 Maybe if there was an understanding of the why, the requirements of the ASF
 would make more sense. I think a lot of us right now just perceive the ASF
 as the group that is wanting to kill Lucene.net.
 
 I don't think we have a desire to kill it, I just think we are faced with
 the unfortunate reality that the project is already dead and now we on the
 PMC have the unfortunate job of cleaning up the mess as best we can. Again,
 it is not even that we want to see it go away, we on the PMC just don't want
 to be responsible for its upkeep. You give me the names of 4 people who are
 willing to be committers (i.e. people willing to volunteer their time) and I
 will do my best to get the project into the Incubator. However, I have to
 tell you, my willingness to help is diminishing with every trip we take
 around this same circle of discussion.
 
 Simply put, given the way the vote has gone so far, the Lucene PMC is no
 longer interested in sustaining this project. If the community wishes to see
 it live at the ASF then one of you had better step up and spend 20-30
 minutes of your time writing up the draft proposal (most of it can be copied
 and pasted) and circulating it. In fact, given the amount of time some of
 you have no doubt spent writing on this and other related threads you could
 have put together the large majority of the proposal, circulated the draft
 and got other volunteers to help and already be moving forward in a positive
 direction. Truth be told, I would do it, but I am explicitly not going to
 because I think that if the community can't take that one step to move
 forward, then it truly doesn't deserve to.
 
 
 I get your comments about the slower than slow development, but that is
 also somewhat of a sign that it 

Re: RE: Vote thread started on gene...@lucene.apache.org

2010-12-30 Thread Troy Howard
Scott,

We should communicate on the public list as much as possible. I'll put
together the draft proposal today, post it here, and ask for feedback
from both the Lucene PMC and the community. We will wait over the
weekend and Monday to allow people who might have additional input the
opportunity to either see this at home or at work.

On Tuesday (Jan 4th) we will move forward with whatever our best
effort has produced and go from there.

Thanks,
Troy



[jira] Updated: (LUCENE-2611) IntelliJ IDEA and Eclipse setup

2010-12-30 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated LUCENE-2611:


Attachment: LUCENE-2611-branch-3x.patch

branch_3x version of IntelliJ config files, including codestyle addition. 
Committing shortly.

 IntelliJ IDEA and Eclipse setup
 ---

 Key: LUCENE-2611
 URL: https://issues.apache.org/jira/browse/LUCENE-2611
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Build
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2611-branch-3x.patch, 
 LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, 
 LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611.patch, 
 LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, 
 LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, 
 LUCENE-2611_eclipse.patch, LUCENE-2611_mkdir.patch, LUCENE-2611_test.patch, 
 LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, 
 LUCENE-2611_test_2.patch


 Setting up Lucene/Solr in IntelliJ IDEA or Eclipse can be time-consuming.
 The attached patches add a new top level directory {{dev-tools/}} with 
 sub-dirs {{idea/}} and {{eclipse/}} containing basic setup files for trunk, 
 as well as top-level ant targets named idea and eclipse that copy these 
 files into the proper locations.  This arrangement avoids the messiness 
 attendant to in-place project configuration files directly checked into 
 source control.
 The IDEA configuration includes modules for Lucene and Solr, each Lucene and 
 Solr contrib, and each analysis module.  A JUnit run configuration per module 
 is included.
 The Eclipse configuration includes a source entry for each 
 source/test/resource location and classpath setup: a library entry for each 
 jar.
 For IDEA, once {{ant idea}} has been run, the only configuration that must be 
 performed manually is configuring the project-level JDK.  For Eclipse, once 
 {{ant eclipse}} has been run, the user has to refresh the project 
 (right-click on the project and choose Refresh).
 If these patches are committed, Subversion svn:ignore properties should be 
 added/modified to ignore the destination IDEA and Eclipse configuration 
 locations.
 Iam Jambour has written up on the Lucene wiki a detailed set of instructions 
 for applying the 3.X branch patch for IDEA: 
 http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ




[jira] Resolved: (SOLR-2301) RSS Feed URL Breaking

2010-12-30 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-2301.


Resolution: Not A Problem

Based on the info you have provided, it seems that your problem has nothing to 
do with DIH, and everything to do with having an invalid XML file for your data 
config...

bq. [Fatal Error] :18:63: The reference to entity c must end with the ';' 
delimiter.

...&c=19... is not valid in an XML file; you need to properly XML-escape the 
& in the URL (write it as &amp;).
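A minimal sketch of the fix using only the JDK's DOM parser, assuming the URL sits in an attribute of the data config (the `<entity url="..."/>` element below is illustrative, and escapeXmlAttr is a hypothetical helper, not part of Solr). A raw `&` starts an entity reference, so the parser rejects `&c=19` just as in the log above, while the escaped form parses back to the original URL. A custom error handler makes the parser throw instead of printing to stderr.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class XmlEscapeDemo {

    /** Escapes the five XML special characters so the string is safe inside an attribute value. */
    static String escapeXmlAttr(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char ch : s.toCharArray()) {
            switch (ch) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(ch);
            }
        }
        return sb.toString();
    }

    /** Parses the XML and returns the root element's url attribute; throws on malformed input. */
    static String parseUrlAttr(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) throws SAXException { throw e; }
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getAttribute("url");
    }

    public static void main(String[] args) throws Exception {
        String url = "http://www2a.cdc.gov/podcasts/createrss.asp?t=r&c=19";

        try {
            parseUrlAttr("<entity url=\"" + url + "\"/>");
            System.out.println("unexpected: raw & was accepted");
        } catch (SAXParseException e) {
            System.out.println("raw & rejected: " + e.getMessage());
        }

        String fixed = "<entity url=\"" + escapeXmlAttr(url) + "\"/>";
        System.out.println(parseUrlAttr(fixed)); // prints the original URL
    }
}
```

In the data config itself, the attribute would then read url="http://www2a.cdc.gov/podcasts/createrss.asp?t=r&amp;c=19".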

 RSS Feed URL Breaking
 -

 Key: SOLR-2301
 URL: https://issues.apache.org/jira/browse/SOLR-2301
 Project: Solr
  Issue Type: Bug
  Components: clients - C#
Affects Versions: 1.4.1, 4.0
 Environment: Windows 7
Reporter: Adam Estrada

 This is an odd one... I am trying to index RSS feeds and have come across 
 several issues. Some are more pressing than others (referring to SOLR-2286 ;-) ).
 Anyway, the CDC has a list of RSS feeds that the Solr dataimporter can't work 
 with.
 Home page:
 http://emergency.cdc.gov/rss/
 Page to Index:
 http://www2a.cdc.gov/podcasts/createrss.asp?t=r&c=19
 The console reports the following and as you can see it's because it does not 
 like the param c. Any ideas on how to fix this?
 INFO: Processing configuration from solrconfig.xml: {config=./solr/conf/dataimporthandler/rss.xml}
 [Fatal Error] :18:63: The reference to entity c must end with the ';' delimiter.
 Dec 28, 2010 2:39:46 PM org.apache.solr.handler.dataimport.DataImportHandler inform
 SEVERE: Exception while loading DataImporter
 org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context
     at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:193)
     at org.apache.solr.handler.dataimport.DataImporter.init(DataImporter.java:100)
     at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:112)
     at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:539)
     at org.apache.solr.core.SolrCore.init(SolrCore.java:596)
     at org.apache.solr.core.CoreContainer.create(CoreContainer.java:660)
     at org.apache.solr.core.CoreContainer.load(CoreContainer.java:412)
     at org.apache.solr.core.CoreContainer.load(CoreContainer.java:294)
     at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:243)




[jira] Commented: (SOLR-2303) remove unnecessary (and problematic) log4j jars in contribs

2010-12-30 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12976143#action_12976143
 ] 

Hoss Man commented on SOLR-2303:


I think the purpose of the log4j-over-slf4j jars was so that the third party 
jars included in solr (and in contribs) which use log4j logging will have all 
of their messages funneled through slf4j so all logging for basic solr users 
will be consistent (JUL) -- if you remove it, some solr logging will use 
slf4j-JUL and some will go direct to log4j.

I *think* the other log4j jars you mentioned (contrib/extraction, 
contrib/clustering) are the ones that should be removed. (I haven't tested that 
this doesn't break anything.)



 remove unnecessary (and problematic) log4j jars in contribs
 ---

 Key: SOLR-2303
 URL: https://issues.apache.org/jira/browse/SOLR-2303
 Project: Solr
  Issue Type: Improvement
  Components: Build
Reporter: Robert Muir
 Fix For: 4.0

 Attachments: SOLR-2303.patch


 In solr 4.0 there is log4j-over-slf4j.
 But if you have log4j jars also in the classpath (e.g. contrib/extraction, 
 contrib/clustering) you can get strange errors such as:
 java.lang.NoSuchMethodError: org.apache.log4j.Logger.setAdditivity(Z)V
 So I think we should remove the log4j jars in these contribs; all tests pass 
 with them removed.




[jira] Commented: (SOLR-975) admin-extra.html not currectly display when using multicore configuration

2010-12-30 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12976145#action_12976145
 ] 

Hoss Man commented on SOLR-975:
---

FYI: i'm pretty sure yonik fixed this as part of SOLR-1930, but i haven't 
tested...

http://svn.apache.org/viewvc?view=revision&revision=1054008

 admin-extra.html not currectly display when using multicore configuration
 -

 Key: SOLR-975
 URL: https://issues.apache.org/jira/browse/SOLR-975
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 1.4
 Environment: Jetty openjdk 1.6.0 1.0.b12 (EPEL package for EL5)
Reporter: Edward Rudd

 I'm having cross-talk issues with using the Solr nightlies (and probably w/ 
 1.3.0 release but have not tested as I needed newer features of the 
 DataImportHandler in the nightlies) 
 The basic scenario for this bug is as follows:
 I have two cores configured and BOTH have a customized admin-extra.html; 
 however, going to the admin pages uses the SAME admin-extra.html for all 
 cores. The one used is whichever core is browsed first. This looks like 
 a caching bug where the cache does not take the core into account.
 Basically my admin-extra.html has a link to the data importer script and a 
 link to reload the core (which has to have the core name explicitly in the 
 per-core admin-extra.html).
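The suspected bug (a cache that ignores which core a file belongs to) can be illustrated with a hypothetical sketch. It does not reflect Solr's actual admin code; the class names and the loader function are made up for illustration. The only difference between the two caches is whether the core name is part of the key.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

public class PerCoreCacheSketch {

    /** Buggy variant: keyed by file name only, so the first core to load a file wins for every core. */
    static final class FileNameOnlyCache {
        private final Map<String, String> cache = new HashMap<>();
        private final BiFunction<String, String, String> loader; // (core, file) -> contents

        FileNameOnlyCache(BiFunction<String, String, String> loader) { this.loader = loader; }

        String get(String core, String file) {
            return cache.computeIfAbsent(file, f -> loader.apply(core, f));
        }
    }

    /** Fixed variant: the (core, file) pair is the cache key. */
    static final class PerCoreCache {
        private final Map<List<String>, String> cache = new HashMap<>();
        private final BiFunction<String, String, String> loader;

        PerCoreCache(BiFunction<String, String, String> loader) { this.loader = loader; }

        String get(String core, String file) {
            return cache.computeIfAbsent(Arrays.asList(core, file), key -> loader.apply(core, file));
        }
    }

    public static void main(String[] args) {
        // Hypothetical loader: each core has its own admin-extra.html.
        BiFunction<String, String, String> load = (core, file) -> file + " of " + core;

        FileNameOnlyCache buggy = new FileNameOnlyCache(load);
        buggy.get("core0", "admin-extra.html");
        System.out.println(buggy.get("core1", "admin-extra.html")); // admin-extra.html of core0 (wrong core)

        PerCoreCache fixed = new PerCoreCache(load);
        fixed.get("core0", "admin-extra.html");
        System.out.println(fixed.get("core1", "admin-extra.html")); // admin-extra.html of core1
    }
}
```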




[jira] Commented: (SOLR-2303) remove unnecessary (and problematic) log4j jars in contribs

2010-12-30 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12976146#action_12976146
 ] 

Robert Muir commented on SOLR-2303:
---

hoss, exactly what I tested... I think it doesn't show in the patch, but I 
want to remove the log4j jars in the contribs. 

If these are in the classpath, they cause problems for velocity etc. (its tests 
will fail), so I think they should be removed from the contribs, as they can break 
functionality in core if you use these contribs (besides just being unnecessary 
bloat).
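One way to check for the conflict described here is to ask the class loader how many classpath entries provide org.apache.log4j.Logger. log4j-over-slf4j ships its own classes in the org.apache.log4j package, so a real log4j jar next to it means two copies of the same class, and which one is actually loaded depends on classpath order. This is a hypothetical diagnostic sketch, not part of Solr or of SOLR-2303.patch.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class ClasspathDuplicates {

    /**
     * Returns every classpath location that provides the given class file.
     * More than one entry means several jars ship the same class.
     */
    static List<URL> providersOf(String className) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = ClasspathDuplicates.class.getClassLoader();
        return Collections.list(loader.getResources(resource));
    }

    public static void main(String[] args) throws IOException {
        List<URL> providers = providersOf("org.apache.log4j.Logger");
        if (providers.size() > 1) {
            System.out.println("WARNING: org.apache.log4j.Logger found in " + providers.size() + " places:");
            providers.forEach(u -> System.out.println("  " + u));
        } else {
            System.out.println("org.apache.log4j.Logger providers: " + providers);
        }
    }
}
```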

 remove unnecessary (and problematic) log4j jars in contribs
 ---

 Key: SOLR-2303
 URL: https://issues.apache.org/jira/browse/SOLR-2303
 Project: Solr
  Issue Type: Improvement
  Components: Build
Reporter: Robert Muir
 Fix For: 4.0

 Attachments: SOLR-2303.patch


 In solr 4.0 there is log4j-over-slf4j.
 But if you have log4j jars also in the classpath (e.g. contrib/extraction, 
 contrib/clustering) you can get strange errors such as:
 java.lang.NoSuchMethodError: org.apache.log4j.Logger.setAdditivity(Z)V
 So I think we should remove the log4j jars in these contribs; all tests pass 
 with them removed.




[jira] Commented: (SOLR-2303) remove unnecessary (and problematic) log4j jars in contribs

2010-12-30 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12976149#action_12976149
 ] 

Hoss Man commented on SOLR-2303:


I'm an idiot .. trying to catch up on mail i completely misread almost 
everything about this issue.

yes, yes .. agree with you 100% .. remove the log4j jars in the contribs

 remove unnecessary (and problematic) log4j jars in contribs
 ---

 Key: SOLR-2303
 URL: https://issues.apache.org/jira/browse/SOLR-2303
 Project: Solr
  Issue Type: Improvement
  Components: Build
Reporter: Robert Muir
 Fix For: 4.0

 Attachments: SOLR-2303.patch


 In solr 4.0 there is log4j-over-slf4j.
 But if you have log4j jars also in the classpath (e.g. contrib/extraction, 
 contrib/clustering) you can get strange errors such as:
 java.lang.NoSuchMethodError: org.apache.log4j.Logger.setAdditivity(Z)V
 So I think we should remove the log4j jars in these contribs; all tests pass 
 with them removed.




Lucene-3.x - Build # 227 - Failure

2010-12-30 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-3.x/227/

All tests passed

Build Log (for compile errors):
[...truncated 20926 lines...]






[jira] Commented: (SOLR-975) admin-extra.html not currectly display when using multicore configuration

2010-12-30 Thread Edward Rudd (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12976154#action_12976154
 ] 

Edward Rudd commented on SOLR-975:
--

From looking at the diff, it looks like it could fix it, but it needs to be 
verified that it is indeed fixed. I'll add it to my TODO list to pull down a 
nightly build next week and test.

 admin-extra.html not currectly display when using multicore configuration
 -

 Key: SOLR-975
 URL: https://issues.apache.org/jira/browse/SOLR-975
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 1.4
 Environment: Jetty openjdk 1.6.0 1.0.b12 (EPEL package for EL5)
Reporter: Edward Rudd

 I'm having cross-talk issues with using the Solr nightlies (and probably w/ 
 1.3.0 release but have not tested as I needed newer features of the 
 DataImportHandler in the nightlies) 
 The basic scenario for this bug is as follows:
 I have two cores configured and BOTH have a customized admin-extra.html; 
 however, going to the admin pages uses the SAME admin-extra.html for all 
 cores. The one used is whichever core is browsed first. This looks like 
 a caching bug where the cache does not take the core into account.
 Basically my admin-extra.html has a link to the data importer script and a 
 link to reload the core (which has to have the core name explicitly in the 
 per-core admin-extra.html).




Re: strange problem of PForDelta decoder

2010-12-30 Thread Li Li
Searching multiple segments is an alternative solution, but it has some
disadvantages:
1. idf is not global? (I am not familiar with its implementation.) Maybe
it's easy to solve by sharing a global idf.
2. Each segment will have its own tii and tis files, which may make
search slower (that's why optimizing the index is necessary).
3. One term's docList is distributed across many files rather than one.
More than one frq file means the hard disk must seek different tracks,
which is time-consuming. If there is only one segment, they are likely
stored in a single track.
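For point 1, if segments are searched independently, a global idf can be obtained by summing the per-segment statistics before applying the idf formula, instead of computing idf inside each segment. A minimal sketch, assuming the form used by Lucene's DefaultSimilarity, idf = 1 + ln(numDocs / (docFreq + 1)); the statistics in main are made up.

```java
public class GlobalIdfSketch {

    /** idf in the form used by Lucene's DefaultSimilarity: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    /** Global idf: sum the per-segment statistics first, then apply the formula once. */
    static double globalIdf(long[] docFreqPerSegment, long[] numDocsPerSegment) {
        long df = 0, n = 0;
        for (int i = 0; i < docFreqPerSegment.length; i++) {
            df += docFreqPerSegment[i];
            n += numDocsPerSegment[i];
        }
        return idf(df, n);
    }

    public static void main(String[] args) {
        // Hypothetical statistics for one term in two equally sized segments.
        long[] df = {1, 99};
        long[] n = {1000, 1000};
        System.out.printf("segment 0 idf: %.3f%n", idf(df[0], n[0]));
        System.out.printf("segment 1 idf: %.3f%n", idf(df[1], n[1]));
        System.out.printf("global idf:    %.3f%n", globalIdf(df, n));
    }
}
```

The per-segment values disagree widely for the same term, which is why scores from independently searched segments are not directly comparable unless the statistics are shared.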


2010/12/31 Earwin Burrfoot ear...@gmail.com:
until we fix Lucene to run a single search concurrently (which we
badly need to do).
 I am interested in this idea (I have posted it before). Do you have any
 resources, such as papers or tech articles, about it?
 I have tried it, but it required changing the index format dramatically, and we
 use Solr distributed search to relieve the response-time problem, so we finally
 gave it up.
 Lucene 4's index format is more flexible in that it supports custom codecs,
 and it's now under development, so I think it's a good time to consider
 letting it support multithreaded searching for a single query.
 I have a naive solution: divide the docList into several groups,
 e.g. group docIds by whether they are even or odd:
 term1 df1=4  docList =  0  4  8  10
 term1 df2=4  docList = 1  3  9  11

 term2 df1=4  docList = 0  6  8  12
 term2 df2=4  docList = 3  9  11 15
   Then we can use 2 threads to search the topN docs in the even group and the odd group
 and finally merge their results into a single list, just like Solr
 distributed search.
 But it's better than Solr distributed search.
   First, it's in a single process, and data communication between
 threads is much faster than over the network.
   Second, each thread processes the same number of documents. For Solr
 distributed search, one shard may process 7 documents and another shard only
 1 document.
 Even if we can give each shard the same number of documents, we cannot
 make the distribution uniform for each term.
    e.g. shard1 has doc1 doc2
           shard2 has doc3 doc4
    but term1 may only occur in doc1 and doc2
    while term2 may only occur in doc3 and doc4
    we may modify it
           shard1 doc1 doc3
           shard2 doc2 doc4
    it's good for term1 and term2
    but term3 may occur in doc1 and doc3...
    So I think it's fine-grained distribution within the index, while Solr
 distributed search is coarse-grained.
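The even/odd idea above can be sketched with a thread pool: each thread computes a top-N over its own partition of the posting list, and the partial lists are merged and cut back to N, much as a distributed search merges shard results. This is a minimal sketch with hypothetical per-document scores (a plain float array indexed by doc ID), not Lucene code; the two partitions are the union of the term1/term2 doc lists from the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionedTopN {

    /** A scored hit. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
        @Override public String toString() { return doc + ":" + score; }
    }

    /** Higher score first; ties broken by smaller doc ID. */
    static final Comparator<Hit> BEST_FIRST =
        Comparator.comparingDouble((Hit h) -> h.score).reversed().thenComparingInt(h -> h.doc);

    /** Top-n hits among the given docs, kept in a bounded heap whose head is the worst hit. */
    static List<Hit> topN(int[] docs, float[] scores, int n) {
        PriorityQueue<Hit> heap = new PriorityQueue<>(BEST_FIRST.reversed());
        for (int doc : docs) {
            heap.add(new Hit(doc, scores[doc]));
            if (heap.size() > n) heap.poll(); // drop the current worst hit
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(BEST_FIRST);
        return result;
    }

    /** Searches each partition on its own thread, then merges the partial top-n lists. */
    static List<Hit> parallelTopN(int[][] partitions, float[] scores, int n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(partitions.length);
        try {
            List<Future<List<Hit>>> futures = new ArrayList<>();
            for (int[] part : partitions) {
                futures.add(pool.submit(() -> topN(part, scores, n)));
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> f : futures) merged.addAll(f.get());
            merged.sort(BEST_FIRST);
            return new ArrayList<>(merged.subList(0, Math.min(n, merged.size())));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical per-document scores for one query (doc ID = array index).
        float[] scores = new float[16];
        for (int d = 0; d < scores.length; d++) scores[d] = (d * 7 % 16) / 4f;

        int[] even = {0, 4, 6, 8, 10, 12};
        int[] odd = {1, 3, 9, 11, 15};
        System.out.println(parallelTopN(new int[][] {even, odd}, scores, 3)); // [9:3.75, 11:3.25, 4:3.0]
    }
}
```

Merging only needs each partition's top N, because any hit in the global top N is also in the top N of its own partition.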
 This is just crazy :)

 The simple way is just to search different segments in parallel.
 BalancedSegmentMergePolicy makes sure you have roughly even-sized
 large segments (and small ones don't count, they're small!).
 If you're bent on squeezing out that extra millisecond (and making
 your life miserable along the way), you can search a single segment
 with multiple threads (by dividing it in even chunks, and then doing
 skipTo to position your iterators to the beginning of each chunk).

 First approach is really easy to implement. Second one is harder, but
 still doesn't require you to cook the number of CPU cores available
 into your index!

 It's the law of diminishing returns at play here. You're most likely
 to search in parallel over a mostly memory-resident index
 (RAMDir/mmap/filesys cache - doesn't matter), as most IO subsystems
 tend to slow down considerably on parallel sequential reads, so you
 already have pretty decent speed.
 Searching different segments in parallel (with BSMP) makes you several
 times faster.
 Searching in parallel within a segment requires some weird hacks, but
 has maybe a few percent advantage over previous solution.
 Sharding posting lists requires a great deal of weird hacks, makes
 index machine-bound, and boosts speed by another couple of percent.
 Sounds worthless.
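The second approach (several threads on one segment, each positioned at the start of its chunk with skipTo) can be sketched as follows. PostingsIterator is a hypothetical stand-in over a sorted int[]; in Lucene 3.x, DocIdSetIterator offers advance() for this (skipTo() is the older name). Each chunk here only counts matches, where a real searcher would score and collect.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedSegmentSearch {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical postings iterator over sorted doc IDs, with skipTo/advance semantics. */
    static final class PostingsIterator {
        private final int[] docs; // sorted doc IDs
        private int pos = -1;

        PostingsIterator(int[] docs) { this.docs = docs; }

        /** Moves to the first doc >= target beyond the current position, or NO_MORE_DOCS. */
        int advance(int target) {
            if (pos >= docs.length) return NO_MORE_DOCS;
            int idx = Arrays.binarySearch(docs, pos + 1, docs.length, target);
            pos = idx >= 0 ? idx : -idx - 1; // insertion point if target is absent
            return current();
        }

        int nextDoc() {
            if (pos < docs.length) pos++;
            return current();
        }

        private int current() { return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }
    }

    /** Counts matching docs in [start, end) with a fresh iterator positioned by advance(). */
    static int countChunk(int[] postings, int start, int end) {
        PostingsIterator it = new PostingsIterator(postings);
        int count = 0;
        for (int doc = it.advance(start); doc < end; doc = it.nextDoc()) {
            count++; // a real searcher would score and collect here
        }
        return count;
    }

    /** Splits [0, maxDoc) into even chunks and searches each chunk on its own thread. */
    static int parallelCount(int[] postings, int maxDoc, int chunks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        try {
            List<Future<Integer>> parts = new ArrayList<>();
            int chunkSize = (maxDoc + chunks - 1) / chunks; // ceiling division
            for (int start = 0; start < maxDoc; start += chunkSize) {
                final int s = start, e = Math.min(maxDoc, start + chunkSize);
                parts.add(pool.submit(() -> countChunk(postings, s, e)));
            }
            int total = 0;
            for (Future<Integer> f : parts) total += f.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] postings = {1, 3, 9, 11, 15, 20, 21, 40, 63};
        System.out.println(parallelCount(postings, 64, 4)); // 9
    }
}
```

Unlike the even/odd split, this leaves the index format unchanged; only each thread's starting position in the posting list differs.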

 --
 Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
 Phone: +7 (495) 683-567-4
 ICQ: 104465785




Re: strange problem of PForDelta decoder

2010-12-30 Thread Li Li
Plus: point 2 means that searching for a term needs many seeks in the tis files
(if the term is not cached in tii).

2010/12/31 Li Li fancye...@gmail.com:
 Searching multiple segments is an alternative solution, but it has some
 disadvantages:
 1. idf is not global? (I am not familiar with its implementation.) Maybe
 it's easy to solve by sharing a global idf.
 2. Each segment has its own tii and tis files, which may make
 search slower (that's why optimizing the index is necessary).
 3. One term's docList is distributed over many files rather than one.
 More than one frq file means the hard disk must seek across different
 tracks, which is time-consuming. If there is only one segment, the
 postings are likely stored in a single track.
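For point 1, a minimal sketch of what "sharing a global idf" means: sum each segment's docFreq and numDocs before computing idf, so every segment scores a term with the same weight. It assumes the classic DefaultSimilarity-style formula idf = 1 + ln(numDocs / (docFreq + 1)); `GlobalIdf` and `SegmentStats` are hypothetical names used only for illustration. Records need Java 16+.

```java
import java.util.List;

final class GlobalIdf {
    // Per-segment statistics for one term (hypothetical holder for illustration).
    record SegmentStats(int docFreq, int numDocs) {}

    // Classic-Lucene-style idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(long docFreq, long numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Idf computed from one segment's statistics only.
    static double localIdf(SegmentStats s) {
        return idf(s.docFreq(), s.numDocs());
    }

    // Sum docFreq and numDocs over all segments so every segment uses the same idf.
    static double globalIdf(List<SegmentStats> segments) {
        long docFreq = 0, numDocs = 0;
        for (SegmentStats s : segments) {
            docFreq += s.docFreq();
            numDocs += s.numDocs();
        }
        return idf(docFreq, numDocs);
    }
}
```

If each segment used `localIdf`, a term that happens to be rare in one segment would get an inflated weight there, so scores from different segments would not be comparable when merged.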


 2010/12/31 Earwin Burrfoot ear...@gmail.com:
until we fix Lucene to run a single search concurrently (which we
badly need to do).
 I am interested in this idea (I have posted about it before). Do you have
 any resources, such as papers or tech articles, about it?
 I have tried it, but it requires changing the index format dramatically,
 and we use Solr distributed search to relieve the response-time problem,
 so I finally gave it up.
 Lucene 4's index format is more flexible because it supports custom codecs,
 and it's still under development, so I think it's a good time to consider
 letting it support multithreaded searching for a single query.
 I have a naive solution: divide the docList into many groups,
 e.g. group docIds by whether they are even or odd:
 term1 df1=4  docList = 0  4  8  10
 term1 df2=4  docList = 1  3  9  11

 term2 df1=4  docList = 0  6  8  12
 term2 df2=4  docList = 3  9  11 15
   Then we can use 2 threads to search for the topN docs in the even group
 and the odd group, and finally merge their results into a single list,
 just like Solr distributed search.
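A minimal sketch of the even/odd grouping above, applied to a term1 AND term2 query: each posting list is split by docId % groups, one thread intersects each group's lists, and the partial results are merged. In the proposal the grouping would live in the index (e.g. via a custom codec); here it is done in memory, and `ParityPartitionedSearch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParityPartitionedSearch {
    // Split a sorted posting list into `groups` lists by docId % groups (each stays sorted).
    static List<List<Integer>> partition(int[] postings, int groups) {
        List<List<Integer>> out = new ArrayList<>();
        for (int g = 0; g < groups; g++) out.add(new ArrayList<>());
        for (int doc : postings) out.get(doc % groups).add(doc);
        return out;
    }

    // Conjunction (term1 AND term2) of two sorted lists by a linear merge.
    static List<Integer> intersect(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int x = a.get(i), y = b.get(j);
            if (x == y) { out.add(x); i++; j++; }
            else if (x < y) i++;
            else j++;
        }
        return out;
    }

    // One thread per group intersects that group's lists; partial results are then merged.
    static List<Integer> andQuery(int[] term1, int[] term2, int groups) throws Exception {
        List<List<Integer>> p1 = partition(term1, groups), p2 = partition(term2, groups);
        ExecutorService pool = Executors.newFixedThreadPool(groups);
        try {
            List<Future<List<Integer>>> parts = new ArrayList<>();
            for (int g = 0; g < groups; g++) {
                List<Integer> a = p1.get(g), b = p2.get(g);
                parts.add(pool.submit(() -> intersect(a, b)));
            }
            List<Integer> merged = new ArrayList<>();
            for (Future<List<Integer>> f : parts) merged.addAll(f.get());
            merged.sort(null); // restore docId order; a top-N merge would sort by score instead
            return merged;
        } finally {
            pool.shutdown();
        }
    }
}
```

With the docLists from the message, the even group yields {0, 8}, the odd group yields {3, 9, 11}, and the merged result is {0, 3, 8, 9, 11}.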
 But it's better than Solr distributed search.
   First, it runs in a single process, and data communication between
 threads is much faster than over the network.
   Second, each thread processes the same number of documents. With Solr
 distributed search, one shard may process 7 documents and another only 1.
 Even if we make each shard hold the same number of documents, we cannot
 make the load uniform for every term.
    e.g. shard1 has doc1 doc2
           shard2 has doc3 doc4
    but term1 may only occur in doc1 and doc2,
    while term2 may only occur in doc3 and doc4.
    We may change it to
           shard1 doc1 doc3
           shard2 doc2 doc4
    which is good for term1 and term2,
    but term3 may occur in doc1 and doc3...
    So I think this is fine-grained distribution inside the index, while Solr
 distributed search is coarse-grained.
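To make the imbalance argument concrete, a small sketch that counts how many of a term's documents land on each shard for a given document-to-shard layout. `ShardLoad` and `postingsPerShard` are hypothetical names; shard1 and shard2 from the example are numbered 0 and 1.

```java
import java.util.Map;
import java.util.Set;

final class ShardLoad {
    // For one term, count how many of its documents each shard must process.
    static int[] postingsPerShard(Set<Integer> termDocs, Map<Integer, Integer> docToShard, int shards) {
        int[] load = new int[shards];
        for (int doc : termDocs) load[docToShard.get(doc)]++;
        return load;
    }
}
```

With the first layout (doc1, doc2 on shard 0; doc3, doc4 on shard 1), term1 gives a load of [2, 0]. The second layout (doc1, doc3 on shard 0; doc2, doc4 on shard 1) balances term1 and term2 at [1, 1] each, but term3 (doc1, doc3) goes back to [2, 0]: any fixed layout can be unbalanced for some term.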
 This is just crazy :)

 The simple way is just to search different segments in parallel.
 BalancedSegmentMergePolicy makes sure you have roughly even-sized
 large segments (and small ones don't count, they're small!).
 If you're bent on squeezing out that extra millisecond (and making
 your life miserable along the way), you can search a single segment
 with multiple threads (by dividing it into even chunks, and then using
 skipTo to position your iterators at the beginning of each chunk).

 The first approach is really easy to implement. The second one is harder,
 but still doesn't require you to bake the number of available CPU cores
 into your index!

 It's the law of diminishing returns at play here. You're most likely
 to search in parallel over a mostly memory-resident index
 (RAMDir/mmap/filesystem cache, doesn't matter which), as most IO subsystems
 tend to slow down considerably on parallel sequential reads, so you
 already have pretty decent speed.
 Searching different segments in parallel (with BSMP) makes you several
 times faster.
 Searching in parallel within a segment requires some weird hacks, but
 gives maybe a few percent advantage over the previous solution.
 Sharding posting lists requires a great deal of weird hacks, makes the
 index machine-bound, and boosts speed by another couple of percent.
 Sounds worthless.

 --
 Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
 Phone: +7 (495) 683-567-4
 ICQ: 104465785




Lucene-trunk - Build # 1411 - Failure

2010-12-30 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1411/

All tests passed

Build Log (for compile errors):
[...truncated 17900 lines...]






Re: strange problem of PForDelta decoder

2010-12-30 Thread Li Li
Is anyone familiar with MG4J (http://mg4j.dsi.unimi.it/)?
It says: "Multithreading. Indices can be queried and scored concurrently."
Maybe we can learn something from it.




Lucene-Solr-tests-only-trunk - Build # 3205 - Failure

2010-12-30 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/3205/

1 tests failed.
REGRESSION:  org.apache.solr.client.solrj.embedded.SolrExampleStreamingTest.testCommitWithin

Error Message:
expected:<1> but was:<0>

Stack Trace:
junit.framework.AssertionFailedError: expected:<1> but was:<0>
	at org.apache.solr.client.solrj.SolrExampleTests.testCommitWithin(SolrExampleTests.java:256)
	at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1109)
	at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1047)




Build Log (for compile errors):
[...truncated 7671 lines...]






Solr-3.x - Build # 213 - Failure

2010-12-30 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Solr-3.x/213/

All tests passed

Build Log (for compile errors):
[...truncated 20564 lines...]


