RE: RE: Vote thread started on gene...@lucene.apache.org
Marco,
My feeling would be to create strong automated conversion tools that allow Java Lucene to be ported to .NET in as few steps as possible. The .NET-style goal is a noble one, but it will require significantly more commitment to the project in the future, as each new version of Java Lucene will have to be integrated by hand into the .NET version. As the conversion tools get more advanced and robust, .NET-style code may be implemented as part of the automated conversion process.
Scott
-Original Message- From: Marco Dissel [mailto:marco.dis...@gmail.com] Sent: Thursday, December 30, 2010 1:16 PM To: lucene-net-u...@lucene.apache.org Cc: lucene-net-dev@lucene.apache.org Subject: Re: RE: Vote thread started on gene...@lucene.apache.org
What will be the goal of the new committers? To convert the source into .NET-style code? If yes, we should try to stop all the spin-offs and concentrate all the development in one project.
On 30 Dec 2010 at 19:02, Lombard, Scott slomb...@kingindustries.com wrote:
Grant,
Thanks for your time explaining all the details. I will be willing to work on a proposal to put Lucene.Net back into incubation. I will need other people to step up and be committers as well. Heath has volunteered and, as Grant has stated, 4 committers are needed for incubation. Who else is willing to be a committer? Grant, I will definitely be taking you up on your offer to help bring Lucene.Net into incubation.
Scott
-Original Message- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Thursday, December 30, 2010 12:32 PM To: lucene-net-u...@lucene.apache.org Subject: Re: Vote thread started on gene...@lucene.apache.org
On Dec 30, 2010, at 9:51 AM, Heath Aldrich wrote:
Hi Grant,
Thanks for taking the time to respond. While I have developed extensively against Lucene.net, I do not possess the Java skills needed to do a port of the code... So, while I wouldn't mind being a committer, I do not think I am qualified.
(I guess if I was, I could just use Lucene proper and that would be that.) As to other duties of a committer, I think the ASF is perceived as a black box of questions for most of us. For one, I don't think anyone outside the 4 committers even understands *why* it is a good thing to be on the ASF vs. CodePlex, SourceForge, etc. Maybe if there was an understanding of the why, the requirements of the ASF would make more sense. I think a lot of us right now just perceive the ASF as the group that is wanting to kill Lucene.net.
I don't think we have a desire to kill it; I just think we are faced with the unfortunate reality that the project is already dead, and now we on the PMC have the unfortunate job of cleaning up the mess as best we can. Again, it is not even that we want to see it go away; we on the PMC just don't want to be responsible for its upkeep. You give me the names of 4 people who are willing to be committers (i.e. people willing to volunteer their time) and I will do my best to get the project into the Incubator. However, I have to tell you, my willingness to help is diminishing with every trip we take around this same circle of discussion. Simply put, given the way the vote has gone so far, the Lucene PMC is no longer interested in sustaining this project. If the community wishes to see it live at the ASF, then one of you had better step up and spend 20-30 minutes of your time writing up the draft proposal (most of it can be copied and pasted) and circulating it. In fact, given the amount of time some of you have no doubt spent writing on this and other related threads, you could have put together the large majority of the proposal, circulated the draft, gotten other volunteers to help, and already be moving forward in a positive direction. Truth be told, I would do it, but I am explicitly not going to because I think that if the community can't take that one step to move forward, then it truly doesn't deserve to.
I get your comments about the slower-than-slow development, but that is also somewhat of a sign that it works. While 2.9.2 may be behind, it seems very stable with very few issues. If we send the project to the attic, how will anyone be able to submit bugfixes ever? Frankly, I use 2.9.2 every day and have not found bugs in the areas that I use... but I'm sure they are in there somewhere. As for the name, I thought Lucene.net was the name of the project back in the SourceForge days... So my question is based on the premise that if the lucene.net name was brought *to* ASF, why can the community not leave with it? Again, IANAL, but just b/c it was improperly used beforehand does not mean it is legally owned by some other entity.
The Lucene name has been at the ASF since 2001 and Lucene.NET is also now a part of the ASF. (If you're interested, go look at the discussions around iBatis and the movement of that community to MyBatis.)
-Grant
Re: Vote thread started on gene...@lucene.apache.org
Scott,
I agree with everything you said. My opinion is that one of the largest failings of the current Lucene.Net development effort is that there's too much magic in the conversion process. This is assuming we continue with Lucene.Net as a line-by-line automated port. As Heath said, the details of how we run the project are up to the next group of committers to decide once that group has been established. I'm sure this issue (as well as numerous other issues) will be discussed in great detail and at length by the community at that time.
Thanks, Troy
On Thu, Dec 30, 2010 at 10:57 AM, Lombard, Scott slomb...@kingindustries.com wrote:
Troy,
My feeling is that a combination of Java and .NET experience is needed. Some people will focus on bug fixes in the .NET code while others focus on the translation of the code, as their experience allows. One of the things I would like to see done differently with Lucene.Net is that the conversion method is kept in SVN or the wiki. I feel that the pre- and post-processing, as well as any extensions to whatever tool is used for the conversion, are more important to this project than the actual executed code. Keeping a focus on making strong conversion tools as a community should help reduce the lag between a Java release and a .NET release. We then won't be waiting for one person to make the conversion.
Scott
-Original Message- From: Troy Howard [mailto:thowar...@gmail.com] Sent: Thursday, December 30, 2010 1:38 PM To: lucene-net-u...@lucene.apache.org Cc: lucene-net-dev@lucene.apache.org Subject: Re: Vote thread started on gene...@lucene.apache.org
Scott,
I will gladly help put this proposal together and would like to volunteer as a committer. I am communicating with others to find some additional candidates to be committers. Regarding Heath, a quote from his last message in this thread: While I have developed extensively against Lucene.net, I do not possess the Java skills needed to do a port of the code...
So, while I wouldn't mind being a committer, I do not think I am qualified.
Thanks, Troy
Re: RE: Vote thread started on gene...@lucene.apache.org
Marco,
I agree with you on this front. I feel that the first tasks a new Lucene.Net team should focus on, in terms of development, are:
- Fully automating a line-by-line port using a tool such as Sharpen. This needs to become a commodity function requiring very little development effort.
- Bringing the existing forks back in as branches within the ASF project.
I am very interested in pursuing continued development on a more .NET-style port (i.e. the Lucere project I started, or Aimee.Net). The Lucene.Net project should be able to continue with both development paths in the same project.
Thanks, Troy
On Thu, Dec 30, 2010 at 10:15 AM, Marco Dissel marco.dis...@gmail.com wrote:
What will be the goal of the new committers? To convert the source into .NET-style code? If yes, we should try to stop all the spin-offs and concentrate all the development in one project.
On 30 Dec 2010 at 19:02, Lombard, Scott slomb...@kingindustries.com wrote:
Grant,
Thanks for your time explaining all the details. I will be willing to work on a proposal to put Lucene.Net back into incubation. I will need other people to step up and be committers as well. Heath has volunteered and, as Grant has stated, 4 committers are needed for incubation. Who else is willing to be a committer? Grant, I will definitely be taking you up on your offer to help bring Lucene.Net into incubation.
Scott
-Original Message- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Thursday, December 30, 2010 12:32 PM To: lucene-net-u...@lucene.apache.org Subject: Re: Vote thread started on gene...@lucene.apache.org
On Dec 30, 2010, at 9:51 AM, Heath Aldrich wrote:
Hi Grant,
Thanks for taking the time to respond. While I have developed extensively against Lucene.net, I do not possess the Java skills needed to do a port of the code... So, while I wouldn't mind being a committer, I do not think I am qualified.
RE: RE: Vote thread started on gene...@lucene.apache.org
Folks,
I will freely admit that I'm seizing the opportunity to raise an old point - but that problem would be non-existent if this were a project that implemented a methodology as opposed to being a continuous port effort. I will even go as far as suggesting that this would broaden (and ease) the recruitment of committers. It almost feels like the goal is not simply to port Lucene.java to Lucene.net but also to develop a technology that ports things automatically. I would almost suggest that this in itself could be an ASF TLP. It still feels to me that everyone is trying to cut the head off a two-headed dragon with a single sword and a single motion.
Once search algorithms were discovered and implemented, it should be up to the language-specific programmers to implement and optimize them as they see fit. Both languages have their strengths and their own frameworks - at the moment the Java side has great benefits which in turn greatly hinder the success of the .NET side. In a nutshell, while some cultures seem to be better at courtship - the fact that I don't speak some of these languages shouldn't make me less good at it.
I think that a project for Java-to-.NET and .NET-to-Java porting would be a great idea. Again, it would allow a lot of people that are doing the same for hundreds of other projects to simply pool their efforts.
Just my Canadian 2 cents (which is almost at par with the American cents these days)
Karell Ste-Marie C.I.O. - BrainBank Inc
-Original Message- From: Lombard, Scott [mailto:slomb...@kingindustries.com] Sent: Thursday, December 30, 2010 2:17 PM To: lucene-net-dev@lucene.apache.org; lucene-net-u...@lucene.apache.org Subject: RE: RE: Vote thread started on gene...@lucene.apache.org
Marco,
My feeling would be to create strong automated conversion tools that allow Java Lucene to be ported to .NET in as few steps as possible. The .NET-style goal is a noble one, but it will require significantly more commitment to the project in the future, as each new version of Java Lucene will have to be integrated by hand into the .NET version. As the conversion tools get more advanced and robust, .NET-style code may be implemented as part of the automated conversion process.
Scott
RE: RE: Vote thread started on gene...@lucene.apache.org
From everything that was said, it seems apparent to me that the only way for Lucene.Net to stay alive is to move back to incubation. So where do we go from here? More than 4 people have said they are willing to be committers. Is this email list the best place to start working on a proposal, should it be done among a small group offline, or is there a way that the community can work on it together? Thoughts?
Scott
-Original Message- From: Troy Howard [mailto:thowar...@gmail.com] Sent: Thursday, December 30, 2010 2:22 PM To: lucene-net-dev@lucene.apache.org Cc: lucene-net-u...@lucene.apache.org Subject: Re: RE: Vote thread started on gene...@lucene.apache.org
Marco,
I agree with you on this front. I feel that the first tasks a new Lucene.Net team should focus on, in terms of development, are:
- Fully automating a line-by-line port using a tool such as Sharpen. This needs to become a commodity function requiring very little development effort.
- Bringing the existing forks back in as branches within the ASF project.
I am very interested in pursuing continued development on a more .NET-style port (i.e. the Lucere project I started, or Aimee.Net). The Lucene.Net project should be able to continue with both development paths in the same project.
Thanks, Troy
On Thu, Dec 30, 2010 at 10:15 AM, Marco Dissel marco.dis...@gmail.com wrote:
What will be the goal of the new committers? To convert the source into .NET-style code? If yes, we should try to stop all the spin-offs and concentrate all the development in one project.
On 30 Dec 2010 at 19:02, Lombard, Scott slomb...@kingindustries.com wrote:
Grant,
Thanks for your time explaining all the details. I will be willing to work on a proposal to put Lucene.Net back into incubation. I will need other people to step up and be committers as well. Heath has volunteered and, as Grant has stated, 4 committers are needed for incubation. Who else is willing to be a committer?
Re: RE: Vote thread started on gene...@lucene.apache.org
Does the conversion tool actually help or hinder? My feeling is that the more dependency you have on a tool, the less likely this project will ever stand on its own. There should probably be parallel branches: one that continues using the tool to close the current gaps between Lucene.Net and Java Lucene, while the other branch, focused on a more .NET-styled API, is moved forward. It also seemed like other volunteers wanted to use Visual Studio 2010, move Lucene.Net to a more .NET-friendly API (hopefully adhering a bit better to the MS coding guidelines, http://blogs.msdn.com/b/brada/archive/2005/01/26/361363.aspx, so that finding one's way around the code base is less involved), and let it evolve.
As Grant points out, the biggest problem is getting people not just to discuss the future of Lucene.Net but actually to step up and get involved in working on it. No one should be discarded for their lack of Java or programming knowledge if they have a sincere wish to learn and hours to give to the project. There are more things to be done than just coding or porting Java code. They can learn as they go. Does one really need to know Java to write C# test cases?
This project seriously lacks visibility, documentation, a decent website, blogging on Lucene.Net, or any kind of decent PR/marketing pathway that will help build up the community and move it forward. Any future PMC should be cognizant of that as well as the landscape of .NET open source and how that is changing of late.
The Java version has Solr (which any language can talk to) built on top of it and can use other projects like Tika / POI for indexing. What's the business value of Lucene.Net if it's a line-by-line port of Lucene that offers nothing beyond what its parent project already has? Something to think on.
- Michael
Re: Vote thread started on gene...@lucene.apache.org
That is exactly what I would suggest. Sharpen looks like a great tool, since you can customize its behaviour. In fact, the only downside is that you have to customize its behaviour, which requires a lot of upfront work.
Thanks, Troy
On Thu, Dec 30, 2010 at 11:42 AM, Prescott Nasser geobmx...@hotmail.com wrote:
Maybe I'm misunderstanding you, but I think the technology is there - no generic porting tool will be 100%; it will always require pre/post processing. Sharpen is a pretty good generic conversion tool. I agree in that I think we need to focus on a process utilizing a tool such as Sharpen and developing the pre/post-processing cleanup scripts that are specific to Lucene.
~Prescott
Subject: RE: RE: Vote thread started on gene...@lucene.apache.org Date: Thu, 30 Dec 2010 14:29:21 -0500 From: stema...@brain-bank.com To: lucene-net-dev@lucene.apache.org; lucene-net-u...@lucene.apache.org
Folks,
I will freely admit that I'm seizing the opportunity to raise an old point - but that problem would be non-existent if this were a project that implemented a methodology as opposed to being a continuous port effort. I will even go as far as suggesting that this would broaden (and ease) the recruitment of committers. It almost feels like the goal is not simply to port Lucene.java to Lucene.net but also to develop a technology that ports things automatically. I would almost suggest that this in itself could be an ASF TLP. It still feels to me that everyone is trying to cut the head off a two-headed dragon with a single sword and a single motion.
Once search algorithms were discovered and implemented, it should be up to the language-specific programmers to implement and optimize them as they see fit. Both languages have their strengths and their own frameworks - at the moment the Java side has great benefits which in turn greatly hinder the success of the .NET side.
RE: Vote thread started on gene...@lucene.apache.org
I think it took me 5 deletes of this e-mail and complete rewrites to try to say this in the best way possible:
First off, Sharpen is a Java tool (from what I found in the db4o SVN) - using Sharpen to port Lucene to .NET means that people now have to install a JVM on their computers in order to contribute. While this may seem like it makes perfect sense, in fact it is this type of requirement that scares pure .NET developers away. You cannot ask someone to install a bunch of tools outside of their comfort zone in order to create a tool that works in their world. Furthermore, it's also saying that now not only do contributors need to know Java and have a JVM, but they also need to know Sharpen in order to make a C# product.
Gentlemen, I would gladly contribute - I can assure you that I wouldn't be the best but I would be happy to lend a hand - but speaking strictly for myself, I don't see myself learning 2-3 new pieces of technology when I feel that I should just need to be a good C# programmer to help out.
Would it not make more sense, given that we want to reduce work and make a quality product, to become more selective about *what* goes through Sharpen and what can be hand-crafted? I.e., do we really need to port the Java methods of writing to files and handling threading? What about WCF?
Karell Ste-Marie C.I.O. - BrainBank Inc
-Original Message- From: Troy Howard [mailto:thowar...@gmail.com] Sent: Thursday, December 30, 2010 2:46 PM To: lucene-net-dev@lucene.apache.org Cc: lucene-net-u...@lucene.apache.org Subject: Re: Vote thread started on gene...@lucene.apache.org
That is exactly what I would suggest. Sharpen looks like a great tool, since you can customize its behaviour. In fact, the only downside is that you have to customize its behaviour, which requires a lot of upfront work.
Re: Vote thread started on gene...@lucene.apache.org
Troy, et al, Given the recent positive shift in attitude regarding the Lucene.Net project, I would like to consider ways that I could help contribute as well. As with other people in the community, while my company is very small (I am both Chief Software Architect and Chief Bottle Washer), we do have a vested interest in seeing this project succeed. One thing to consider while developing the incubator proposal is that the reason I stopped attempting to contribute was that very early on it was made very clear to me that this project was a one-man show and that any efforts I offered towards working on the port were not welcome. I think that in order to succeed the new proposal needs to embrace transparency in the entire port, testing and fix process so that more people (and potential committers) can have the opportunity to get their hands dirty and expect that their ideas will not be rejected out of hand. I'm not saying that everyone should be a committer but rather I would hope that the committers would at least consider input and help from the community. It's important to remember that Lucene.Net is "just" a (very good) line-by-line port*. This means that the skill set we need from committers is very different than what the Lucene Java project would be looking for. I agree with various people who have raised the good point that automation is the way to go for the initial pass. There are now multiple OSS Java-.NET conversion tools out there that while not perfect could offer a good starting point. The strength of working to customize scripts or even the tools themselves would be a repeatable and documented porting process that could be executed in parallel by multiple people with the expectation of deterministic results. 
Sharpen (db4o): http://developer.db4o.com/Blogs/Product/tabid/167/entryid/94/Default.aspx Java 2 CSharp (ILOG/IBM): http://sourceforge.net/apps/mediawiki/j2cstranslator/index.php?title=Main_Page * Various spin-offs are embracing a functional port model but this is not what I am looking for and I get the feeling that some developers would prefer to stick with a "true" port as well. Also remember that we would need not only people to work on the porting mechanism and port but also people willing to develop and run the unit tests and such. In summary, I believe that if we can agree as a community to get away from this magic one-man black-box porting model then more people such as myself would come out of the woodwork and help out. My way is not the only way but it does represent my personal thoughts in any case. Thanks for your consideration, Ben Martz Troy Howard December 30, 2010 11:51 AM Scott, We should communicate on the public list as much as possible. I'll put together the draft proposal today, post it here, and ask for feedback from both the Lucene PMC and the community. We will wait over the weekend and Monday to allow people who might have additional input the opportunity to either see this at home or at work. On Tuesday (Jan 4th) we will move forward with whatever our best effort has produced and go from there. Thanks, Troy
Re: Vote thread started on gene...@lucene.apache.org
It's my opinion that we can basically commoditize an automated port which will fulfill the needs of the community, and allow the project to, at minimum, continue to release, in a timely fashion, direct ports of the Java Lucene releases... Meanwhile we can continue the efforts represented in Lucere, Lucille, and Aimee.Net to create an alternative API for Lucene.Net which may or may not include completely re-written code, depending on the specifics. I think both concepts can co-exist in a single project and that this will be the best way to move forward. If you followed the Lucere project, you'll see that my approach with TDD and Contract Driven Design was intended to facilitate just such an arrangement. Thanks, Troy On Thu, Dec 30, 2010 at 12:32 PM, Prescott Nasser geobmx...@hotmail.com wrote: In incubator we can probably rewrite the description of the project - but in the past we were pushed away from doing anything but a straight port because the description of the project was a line by line port - where a tool makes sense, and .NET specific constructs are basically avoided because that wouldn't be a line by line port. We talked about using things like Enums but we were shot down from this idea by someone... I agree with you wholeheartedly about utilizing sharpen and jvm just to port the code. The Lucere project was the idea of rewriting the java code to .Net, using standard constructs. I think the goal for the ASF project was to minimize work needed to be done to upgrade to new java things that come out. If we pursue this direction, then every change needs to be manually ported. I've already said I think that is do-able once we are on par with the latest java. 
~Prescott Nasser prescott.nas...@hotmail.com 650.208.4205 Subject: RE: Vote thread started on gene...@lucene.apache.org Date: Thu, 30 Dec 2010 15:24:32 -0500 From: stema...@brain-bank.com To: lucene-net-dev@lucene.apache.org CC: lucene-net-u...@lucene.apache.org I think it took me 5 deletes of this e-mail and complete rewrites to try to say this in the best way possible: First off, Sharpen is a java tool (from the db4o SVN I found) - using sharpen to port lucene to .net means that people now have to install a jvm on their computers in order to contribute. While this may seem like it makes perfect sense, in fact it is this type of requirement that scares pure .net developers away. You cannot ask someone to install a bunch of tools outside of their comfort zone in order to create a tool that works in their world. Furthermore, it's also saying that now - not only do contributors need to know java and have a jvm, but then they also need to know sharpen in order to make a c# product. Gentlemen, I would gladly contribute - I can assure you that I wouldn't be the best but I would be happy to lend a hand - but speaking strictly for myself I don't see myself learning 2-3 new pieces of technology when I feel that I should just be a good c# programmer to help out. Would it not make more sense, given the fact that we want to reduce work and make a quality product, that we become more selective about *what* goes through Sharpen and what can be hand-crafted? IE: Do we really need to port the Java methods of writing to files and handling Threading? What about WCF? Karell Ste-Marie C.I.O. - BrainBank Inc -Original Message- From: Troy Howard [mailto:thowar...@gmail.com] Sent: Thursday, December 30, 2010 2:46 PM To: lucene-net-dev@lucene.apache.org Cc: lucene-net-u...@lucene.apache.org Subject: Re: Vote thread started on gene...@lucene.apache.org That is exactly what I would suggest. Sharpen looks like a great tool, since you can customize its behaviour. 
In fact, the only downside is that you have to customize its behaviour which requires a lot of upfront work. Thanks, Troy On Thu, Dec 30, 2010 at 11:42 AM, Prescott Nasser geobmx...@hotmail.com wrote: Maybe I'm misunderstanding you, but I think the technology is there - no generic porting tool will be 100%, it will always require pre/post processing. Sharpen is a pretty good generic conversion tool. I agree in that I think we need to focus on a process utilizing a tool such as sharpen and developing the pre/post processing clean up scripts that are specific to Lucene. ~Prescott
Re: Vote thread started on gene...@lucene.apache.org
So perhaps the proposal should allow for a combination of a mostly automated baseline line-by-line port and the explicit provision that embraces drop-in (API compliant) .NET-specific replacements for specific classes? - Ben Troy Howard December 30, 2010 12:39 PM It's my opinion that we can basically commoditize an automated port which will fulfill the needs of the community, and allow the project to, at minimum, continue to release, in a timely fashion, direct ports of the Java Lucene releases... Meanwhile we can continue the efforts represented in Lucere, Lucille, and Aimee.Net to create an alternative API for Lucene.Net which may or may not include completely re-written code, depending on the specifics. I think both concepts can co-exist in a single project and that this will be the best way to move forward. If you followed the Lucere project, you'll see that my approach with TDD and Contract Driven Design was intended to facilitate just such an arrangement. Thanks, Troy
Re: Vote thread started on gene...@lucene.apache.org
Yes. I'm in the process of writing that proposal at this time. It will include language in the project description that expresses our intent to develop a C#/.NET idiomatic version of the library. Please find the in-progress draft version at: http://wiki.apache.org/incubator/Lucene.Net%20Proposal Thanks, Troy On Thu, Dec 30, 2010 at 12:43 PM, Ben Martz benma...@gmail.com wrote: So perhaps the proposal should allow for a combination of a mostly automated baseline line-by-line port and the explicit provision that embraces drop-in (API compliant) .NET-specific replacements for specific classes? - Ben -- Troy Howard thowar...@gmail.com December 30, 2010 12:39 PM It's my opinion that we can basically commoditize an automated port which will fulfill the needs of the community, and allow the project to, at minimum, continue to release, in a timely fashion, direct ports of the Java Lucene releases... Meanwhile we can continue the efforts represented in Lucere, Lucille, and Aimee.Net to create an alternative API for Lucene.Net which may or may not include completely re-written code, depending on the specifics. I think both concepts can co-exist in a single project and that this will be the best way to move forward. If you followed the Lucere project, you'll see that my approach with TDD and Contract Driven Design was intended to facilitate just such an arrangement. Thanks, Troy
Champion and Mentor
Grant, I'm working on the proposal and have come to the final section where I must list a Champion and list of Mentors. Can I put your name for Champion and possibly as a Mentor as well? Are there any other folk out there willing to Mentor our project during incubation? Should I instead wait for the Incubator PMC to assign Mentors to us? Thanks, Troy
Re: Incubator Proposal Draft
Sorry... I was in outer space with those dates. To clarify, I'll submit the application on Tuesday, January 11th, 2011 which gives us exactly 12 days as a community to determine our opinions and plans, and to develop our proposal and committer list. Thanks, Troy On Thu, Dec 30, 2010 at 4:13 PM, Troy Howard thowar...@gmail.com wrote: All, Please review the draft proposal located at: http://wiki.apache.org/incubator/Lucene.Net%20Proposal If you'd like to make an edit feel free to create an account and edit the page as you see fit. I'd especially appreciate help with spelling and grammar proofreading in that regard. Regarding content I would appreciate direct comments on the text of the proposal presented in the mailing lists here for open discussion. Some points to note: I have only filled out information in the proposal about myself and Chris Currens. I work with Chris in real life and was able to discuss this with him in person. I am not going to take the liberty to include information about anyone else for fear of misrepresentation. If you'd like to include information about yourself in the proposal, please edit it and include that information. Since this is only a draft of the proposal, anything can change. What is there is mostly just to get the ball rolling on the application and have a concrete document to discuss. It's my intention to officially submit the proposal on Tuesday, January 11th, 2010. Please ensure that your contributions or commentary are provided before that time if you wish them to be considered for this proposal. This gives us, as a community, 2.5 weeks to prepare. Hopefully this will be more than enough time to discuss and settle on our official positions as a community. Thanks, Troy
RE: Initial committers list for Incubator Proposal
Troy, Thank you for all your work on the Incubator Proposal you have done an excellent job. I volunteered to be a committer and here is my brief qualification list. I have a BS in Electrical Engineering and currently work in the Automation field. I do extensive programming in MS SQL, ASP.NET, C# primarily to provide useful and pertinent information to my users, from data that is stored in many places and usually from legacy products. Currently I have been using Lucene.Net in a web application I developed to collate data stored in multiple Access databases to give users a simplified interface to our data. I am personally interested in the challenge of developing and documenting an automated process to convert Java Lucene to C#. The work I will be doing for the Lucene.NET project will be done for the most part outside of my job. As a committer I would have adequate time to devote to the project. I look forward to being an active member of the Lucene.Net project. Scott From: Troy Howard [thowar...@gmail.com] Sent: Thursday, December 30, 2010 7:01 PM To: lucene-net-dev@lucene.apache.org; lucene-net-u...@lucene.apache.org Subject: Initial committers list for Incubator Proposal All, I'm working on the Incubator Proposal now, and need to establish a list of initial committers. So far, the following people have come forward and offered to be committers (in alphabetical order): Alex Thompson Ben Martz Chris Currens Heath Aldrich Michael Herndon Prescott Nasser Scott Lombard Simone Chiaretta Troy Howard I would like to place an open request for any interested parties to respond to this message with their request to be a Committer. For people who are either on that list or for people who would like to be added, please send a message explaining (briefly) why you think you will be qualified to be involved in the project and specifically what ways you hope to be able to contribute. 
One thing I would like to point out is that in the Apache world there is a distinction between Committers and Contributors (aka developers). See this link for details: http://incubator.apache.org/guides/participation.html#committer Please consider whether or not you wish to be a Committer or a Contributor. Some quick rules of thumb:

Committers:
- Committers must be willing to submit a Contributor License Agreement (CLA). See: http://www.apache.org/licenses/#clas
- Committers must have enough *consistent* free time to fulfill the expectations of the ASF in terms of reporting, process, and documentation, and to remain responsive to the community in terms of communication and listening to, considering, and discussing community opinion. These kinds of tasks can consume a lot of time and are some of the first things people stop doing when they start running out of time.
- A Committer may not even write code, but may simply accept, review and commit code written by others. This is the primary responsibility of a Committer -- to commit code, whether they wrote it themselves or not.
- Committers may have to perform the unpleasant task of rejecting contributions from Contributors and explaining why in a fair and objective manner. This can be frustrating and time consuming. You may need to play the part of a mentor or engage in debates. You may even be proved wrong and have to swallow your pride.
- Committers have direct access to the source control and other resources and so must be personally accountable for the quality of the same and will need to operate under the process and restrictions the ASF expects.

Contributors:
- Contributors might have a lot of free time this month, but get really busy next month and have no time at all. They can develop code in short bursts but then drop off the face of the planet indefinitely after that.
- Contributors could focus on code only or work from a task list without any need to interact with and be accountable to the community (as this is the responsibility of the Committers).
- Contributors can do one-time or infrequently needed tasks like updating the website, documentation, wikis, etc.
- Contributors will need to have anything they create reviewed by a Committer and ultimately included by a Committer. Some people find this frustrating, if the Committers are slow to respond or critical of their work.

So in your responses, please be clear about whether you would like to offer your help as a Committer or as a Contributor. Thanks, Troy
[jira] Commented: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher
[ https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12976011#action_12976011 ] Michael McCandless commented on LUCENE-2837: {quote} bq. but before committing I think we should add a newSearcher to LuceneTestCase, which randomly chooses whether the searcher uses threads, and cutover tests to use this instead of making their own IndexSearcher. I did this on LUCENE-2751, but the tests won't all pass until we fix the FieldCache autodetect synchronization bug (the Numerics tests will fail with multiple threads)... {quote} Duh, I knew newSearcher() sounded familiar :) OK so we have to fix the multi-threaded bug in FC first and then I think commit the newSearcher cutover from LUCENE-2751, then commit this issue. Then, I think, separately create a new higher level MultiSearcher w/ a limited search API. I'll open a new issue for that. Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher --- Key: LUCENE-2837 URL: https://issues.apache.org/jira/browse/LUCENE-2837 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Michael McCandless Fix For: 4.0 Attachments: LUCENE-2837.patch We've discussed cleaning up our *Searcher stack for some time... I think we should try to do this before releasing 4.0. So I'm attaching an initial patch which: * Removes Searcher, Searchable, absorbing all their methods into IndexSearcher * Removes contrib/remote * Removes MultiSearcher * Absorbs ParallelMultiSearcher into IndexSearcher (ie you can now pass useThreads=true, or a custom ES to the ctor) The patch is rough -- I just ripped stuff out, did search/replace to IndexSearcher, etc. 
EG nothing is directly testing using threads with IndexSearcher, but before committing I think we should add a newSearcher to LuceneTestCase, which randomly chooses whether the searcher uses threads, and cutover tests to use this instead of making their own IndexSearcher. I think MultiSearcher has a useful purpose, but as it is today it's too low-level, eg it shouldn't be involved in rewriting queries: the Query.combine method is scary. Maybe in its place we make a higher level class, with limited API, that's able to federate search across multiple IndexSearchers? It'd also be able to optionally use thread per IndexSearcher. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher
[ https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12976012#action_12976012 ] Michael McCandless commented on LUCENE-2837: bq. We should discuss how many threads should be spawned. If you have an index with many segments, even small ones, I think only the larger segments should be separate threads, all others should be handled sequentially. So maybe add a maxThreads count, then sort the IndexReaders by maxDoc and then only spawn maxThreads-1 threads for the bigger readers and then one additional thread for the rest? That sounds like a great improvement -- Uwe can you open a new issue for that? Let's try to leave this issue as a rote refactoring...
Re: strange problem of PForDelta decoder
On Mon, Dec 27, 2010 at 5:08 AM, Li Li fancye...@gmail.com wrote: I integrated pfor codec into lucene 2.9.3 and the search time comparison is as follows:

                          single term   and query   or query
VINT in lucene 2.9.3      11.2          36.5        38.6
PFor in lucene 2.9.3      8.7           27.6        33.4
VINT in lucene 4 branch   10.6          26.5        35.4
PFor in lucene 4 branch   8.1           22.5        30.7

My test terms are high frequency terms because we are interested in the bad case.

I agree it's the bad cases we should focus on in general. If a super fast query gets somewhat slower it's relatively harmless (just a capacity question for high volume sites) but if the bad queries get slower it's awful (requires faster cutover to sharded architecture), until we fix Lucene to run a single search concurrently (which we badly need to do).

It seems lucene 4 branch's implementation of and query (conjunction query) is so well optimized that even for VINT codec, it's faster than PFor in lucene 2.9.3. Could anyone tell me what optimization is done? Is storing docIDs and freqs separately making it faster? Or anything else?

Actually vInt on the bulkpostings branch stores freq/doc together. Ie the format is the same as 2.9.x's format. I think it could be the fact that AND query does block reads (64 doc/freqs at once) instead of doc-at-once? Ie, because of this, the query is effectively scanning the next block of 64 docs instead of skipping to them? Our skipping impl is unfortunately rather costly so if skip will not skip that many docs it's better to scan.

Another question: is there anyone interested in integrating pfor codec into lucene 2.9.3 as I am (we have to use lucene 2.9 and solr 1.4)? And how do I contribute this patch?

Realistically I don't think we can commit this to 2.9.x -- that branch is purely bug fixes at this point. Still it's possible others could make use of such a patch so if it's not too much work you may as well post it? It can lead to improvements on the bulk postings branch too :) The more patches the merrier! You only use PFor for the very high freq terms in 2.9.x right? I've wondered if we should do the same on bulkpostings... problem is for eg range queries, that visit all docs for all terms b/w X and Y, you want the bulk decode even for low freq terms... Mike
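For readers who have not looked at the vInt baseline in the numbers above, here is a minimal sketch of delta-encoding sorted doc IDs as variable-length ints (7 payload bits per byte, high bit set when another byte follows). The class and method names are made up for illustration and this is not the actual Lucene postings code; it only shows why vInt has to be decoded one value at a time, which is the cost a block codec like PFor tries to avoid.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical sketch, not Lucene source: vInt + delta encoding of sorted doc IDs.
final class VIntDeltaSketch {

    // Write each gap between consecutive doc IDs as a vInt, low-order 7 bits first.
    public static byte[] encode(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int docId : sortedDocIds) {
            int delta = docId - previous;
            previous = docId;
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80); // high bit: more bytes follow
                delta >>>= 7;
            }
            out.write(delta); // last byte of this value, high bit clear
        }
        return out.toByteArray();
    }

    // Decoding is inherently sequential: each value's length is only known byte by byte.
    public static int[] decode(byte[] bytes, int count) {
        int[] docIds = new int[count];
        int pos = 0;
        int previous = 0;
        for (int i = 0; i < count; i++) {
            int b = bytes[pos++];
            int delta = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = bytes[pos++];
                delta |= (b & 0x7F) << shift;
            }
            previous += delta; // undo the delta encoding
            docIds[i] = previous;
        }
        return docIds;
    }

    public static void main(String[] args) {
        int[] docIds = {3, 10, 300, 70000};
        byte[] encoded = encode(docIds);
        System.out.println(encoded.length + " bytes, decoded: "
                + Arrays.toString(decode(encoded, docIds.length)));
    }
}
```

Small gaps (the high-frequency-term case discussed above) cost one byte each, but every value still needs its own branchy decode loop, so a fixed-width block format can be faster even when it is not smaller.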
Re: any issues about the *perthread classes
Basically, we are moving the thread state upwards in Lucene's indexing chain. Ie, very early on when indexing a doc you pick a thread-private state. Then, the thread does all indexing into this private state, unfettered by any sync blocks. This is akin to moving to a process-based concurrency model, ie, we are more strongly separating threads to limit the number of locks that must be acquired when indexing a doc, or when flushing. This is an important change because it means flushing of a single thread private state can take place concurrently with ongoing indexing into other thread states. Lucene cannot do this today since flushing flushes all thread states, and it results in a serious bottleneck on indexing throughput for machines w/ a lot of available concurrency. I wrote about this problem here: http://chbits.blogspot.com/2010/09/lucenes-indexing-is-fast.html The takeaway is that using 6 indexing threads means we are blocked 50% of the time waiting for flush, which is quite awful. This was on a machine w/ an SSD and 24 cores, so, Lucene was nowhere near able to take advantage of this machine's concurrency. Once flushing is concurrent we should be able to fully saturate both IO and CPU concurrency on such a machine... Mike On Wed, Dec 29, 2010 at 12:49 AM, xu cheng xcheng@gmail.com wrote: hi all I noticed that there are plenty of *PerThread classes in the trunk http://svn.apache.org/repos/asf/lucene/dev/trunk/ while in the realtime_search version http://svn.apache.org/repos/asf/lucene/dev/branches/realtime_search/ the *PerThread classes are gone! this just confused me, cos I'm new here. what's the purpose of such a design? what's the advantage? do any issues refer to this? any suggestion or references are appreciated! regards. xu
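To make the idea concrete, here is a rough sketch of "pick a thread-private state, index into it without locks, flush it on its own". Everything in it (class name, the String documents, the flush threshold) is invented for illustration and it is not how the realtime_search branch is actually written; the only point is that one thread's flush never blocks another thread's indexing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch, not Lucene source: thread-private indexing state with independent flushes.
final class PerThreadStateSketch {

    // Only the owning thread ever touches this buffer, so no synchronization is needed.
    private static final class ThreadState {
        final List<String> buffered = new ArrayList<>();
    }

    private final int flushThreshold;
    private final ThreadLocal<ThreadState> states = ThreadLocal.withInitial(ThreadState::new);
    // The only shared structure: finished "segments" handed over by individual threads.
    private final ConcurrentLinkedQueue<List<String>> flushedSegments = new ConcurrentLinkedQueue<>();

    PerThreadStateSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void addDocument(String doc) {
        ThreadState state = states.get();   // pick the thread-private state up front
        state.buffered.add(doc);            // lock-free: private to this thread
        if (state.buffered.size() >= flushThreshold) {
            flush(state);                   // flushes only this thread's state
        }
    }

    void flushCurrentThread() {
        flush(states.get());
    }

    private void flush(ThreadState state) {
        if (state.buffered.isEmpty()) {
            return;
        }
        flushedSegments.add(new ArrayList<>(state.buffered));
        state.buffered.clear();
    }

    int segmentCount() {
        return flushedSegments.size();
    }

    int documentCount() {
        int total = 0;
        for (List<String> segment : flushedSegments) {
            total += segment.size();
        }
        return total;
    }

    // Runs several indexing threads against one index and waits for them to finish.
    static PerThreadStateSketch runDemo(int threads, int docsPerThread, int flushThreshold) {
        PerThreadStateSketch index = new PerThreadStateSketch(flushThreshold);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int d = 0; d < docsPerThread; d++) {
                    index.addDocument("doc-" + d);
                }
                index.flushCurrentThread();
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        PerThreadStateSketch index = runDemo(4, 10, 4);
        System.out.println(index.segmentCount() + " segments, " + index.documentCount() + " documents");
    }
}
```

The trade-off in this sketch is more, smaller segments (each thread flushes its own), in exchange for never stopping all threads at once.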
[jira] Created: (LUCENE-2840) Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)
Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher) --- Key: LUCENE-2840 URL: https://issues.apache.org/jira/browse/LUCENE-2840 Project: Lucene - Java Issue Type: Sub-task Components: Search Reporter: Uwe Schindler Priority: Minor Fix For: 4.0 Spin-off from parent issue: {quote} We should discuss how many threads should be spawned. If you have an index with many segments, even small ones, I think only the larger segments should be separate threads, all others should be handled sequentially. So maybe add a maxThreads count, then sort the IndexReaders by maxDoc and then only spawn maxThreads-1 threads for the bigger readers and then one additional thread for the rest? {quote}
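The grouping proposed in the quote can be sketched roughly as follows. Segments are represented only by their maxDoc values, and the class and method names are hypothetical; nothing here is taken from a patch on the issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the proposal: dedicated threads for the biggest segments, one for the rest.
final class SegmentGroupingSketch {

    // Returns one work list per thread; at most maxThreads lists are produced.
    public static List<List<Integer>> group(List<Integer> segmentMaxDocs, int maxThreads) {
        if (maxThreads < 1) {
            throw new IllegalArgumentException("maxThreads must be >= 1");
        }
        List<Integer> sorted = new ArrayList<>(segmentMaxDocs);
        sorted.sort(Comparator.reverseOrder()); // biggest segments first

        List<List<Integer>> groups = new ArrayList<>();
        int dedicated = Math.min(maxThreads - 1, sorted.size());
        for (int i = 0; i < dedicated; i++) {
            groups.add(Collections.singletonList(sorted.get(i))); // one thread per big segment
        }
        if (dedicated < sorted.size()) {
            // everything left over is searched sequentially by one additional thread
            groups.add(new ArrayList<>(sorted.subList(dedicated, sorted.size())));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Integer> maxDocs = Arrays.asList(50, 1000000, 20, 400000, 700);
        System.out.println(group(maxDocs, 3));
    }
}
```

With maxThreads = 1 the sketch degenerates to a single sequential list, which matches the non-threaded behaviour.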
[jira] Commented: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher
[ https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12976020#action_12976020 ] Uwe Schindler commented on LUCENE-2837: --- bq. That sounds like a great improvement - Uwe can you open a new issue for that? Let's try to leave this issue as a rote refactoring... Done: LUCENE-2840
[jira] Commented: (LUCENE-2840) Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)
[ https://issues.apache.org/jira/browse/LUCENE-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12976027#action_12976027 ] Earwin Burrfoot commented on LUCENE-2840: - I use the following scheme: * There is a fixed pool of threads shared by all searches, that limits total concurrency. * Each new search takes at most a fixed number of threads from this pool (say, 2-3 of 8 in my setup), * and these threads churn through segments as through a queue (in maxDoc order, but I think even that is unnecessary). No special smart binding between threads and segments (eg. 1 thread for each biggie, 1 thread for all of the small ones) - means simpler code, and zero possibility of stalling, when there are threads to run, segments to search, but binding policy does not connect them. Using fewer threads per-search than total available is a precaution against biggie searches blocking fast ones.
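A rough sketch of the scheme described in this comment, under the assumption that "searching a segment" can be reduced to a function returning a hit count. The names are invented and this is not Earwin's code: a shared pool bounds total concurrency, each search submits at most maxWorkers tasks, and those tasks drain a common segment queue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToIntFunction;

// Hypothetical sketch: per-search workers pull segments from a queue instead of being bound to them.
final class SegmentQueueSearchSketch {

    public static <S> int search(ExecutorService sharedPool, List<S> segments,
                                 int maxWorkers, ToIntFunction<S> searchSegment) {
        ConcurrentLinkedQueue<S> queue = new ConcurrentLinkedQueue<>(segments);
        int workers = Math.max(1, Math.min(maxWorkers, segments.size()));

        // Each worker keeps polling until the queue is empty, so no worker sits idle
        // while there are still segments left to search.
        Callable<Integer> worker = () -> {
            int hits = 0;
            S segment;
            while ((segment = queue.poll()) != null) {
                hits += searchSegment.applyAsInt(segment);
            }
            return hits;
        };

        List<Future<Integer>> partials = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            partials.add(sharedPool.submit(worker));
        }

        int total = 0;
        try {
            for (Future<Integer> partial : partials) {
                total += partial.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return total;
    }

    public static void main(String[] args) {
        ExecutorService sharedPool = Executors.newFixedThreadPool(8); // shared by all searches
        try {
            // Fake "segments": each Integer stands for the number of hits that segment yields.
            List<Integer> segments = Arrays.asList(5, 3, 2, 1);
            System.out.println("total hits: " + search(sharedPool, segments, 3, s -> s));
        } finally {
            sharedPool.shutdown();
        }
    }
}
```

Capping maxWorkers below the pool size is the precaution mentioned above: a big search cannot occupy every pool thread, so short searches still get scheduled.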
[jira] Commented: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher
[ https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976031#action_12976031 ]

Robert Muir commented on LUCENE-2837:
-------------------------------------

i noticed the comment about the shutting down of executorservice... can we just make the executorservice arg mandatory for parallel?
in my opinion, whoever creates it should be responsible for shutting it down, no one else. so i don't like the dual mode where we sometimes make our own but you can set a different one.
we don't clean up correctly at all wrt this in ParallelMultiSearcher today in my opinion.

Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher
---
Key: LUCENE-2837
URL: https://issues.apache.org/jira/browse/LUCENE-2837
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Reporter: Michael McCandless
Fix For: 4.0
Attachments: LUCENE-2837.patch

We've discussed cleaning up our *Searcher stack for some time... I think we should try to do this before releasing 4.0.
So I'm attaching an initial patch which:
* Removes Searcher, Searchable, absorbing all their methods into IndexSearcher
* Removes contrib/remote
* Removes MultiSearcher
* Absorbs ParallelMultiSearcher into IndexSearcher (ie you can now pass useThreads=true, or a custom ES to the ctor)

The patch is rough -- I just ripped stuff out, did search/replace to IndexSearcher, etc. EG nothing is directly testing using threads with IndexSearcher, but before committing I think we should add a newSearcher to LuceneTestCase, which randomly chooses whether the searcher uses threads, and cutover tests to use this instead of making their own IndexSearcher.
I think MultiSearcher has a useful purpose, but as it is today it's too low-level, eg it shouldn't be involved in rewriting queries: the Query.combine method is scary. Maybe in its place we make a higher level class, with limited API, that's able to federate search across multiple IndexSearchers? It'd also be able to optionally use a thread per IndexSearcher.
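The ownership rule argued for in the comment above (whoever creates the ExecutorService shuts it down, and the searcher only borrows it) might look like the sketch below. BorrowingSearcher and countHits are hypothetical names used for illustration; this is not the IndexSearcher API from the patch.

```java
import java.util.Objects;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutorOwnership {

    /** Hypothetical stand-in for a searcher that borrows an executor it does not own. */
    static final class BorrowingSearcher {
        private final ExecutorService executor;

        BorrowingSearcher(ExecutorService executor) {
            // The executor is mandatory: there is no "create our own" fallback mode.
            this.executor = Objects.requireNonNull(executor, "executor is mandatory");
        }

        /** Sums two per-segment hit counts computed on the borrowed executor. */
        int countHits(int segmentA, int segmentB) throws Exception {
            Callable<Integer> first = () -> segmentA;
            Callable<Integer> second = () -> segmentB;
            Future<Integer> a = executor.submit(first);
            Future<Integer> b = executor.submit(second);
            // No shutdown() here: the searcher never closes what it did not create.
            return a.get() + b.get();
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2); // created by the caller
        try {
            BorrowingSearcher searcher = new BorrowingSearcher(pool);
            System.out.println(searcher.countHits(2, 3)); // prints 5
        } finally {
            pool.shutdown(); // and shut down by the caller, nobody else
        }
    }
}
```

With a single mode there is no question of which code path has to clean up the threads, which is the problem the comment raises about the dual mode.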
[jira] Commented: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher
[ https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976032#action_12976032 ]

Robert Muir commented on LUCENE-2837:
-------------------------------------

{quote}
OK so we have to fix the multi-threaded bug in FC first and then I think commit the newSearcher cutover from LUCENE-2751, then commit this issue.
{quote}

Well, you don't have to do all of that (you could commit this one, then chase down all the bugs). I was just warning you so you don't get surprised.

Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher
---
Key: LUCENE-2837
URL: https://issues.apache.org/jira/browse/LUCENE-2837
[jira] Updated: (LUCENE-2838) ConstantScoreQuery should directly support wrapping Query and simply strip off scores
[ https://issues.apache.org/jira/browse/LUCENE-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2838:
----------------------------------

Attachment: LUCENE-2838.patch

Cleaned up patch:
- removed a useless testcase, no longer needed
- added test for CSQ, that checks equals and hashCode
- code cleanup
- javadocs

If nobody objects, I will commit this to 3.x and trunk.
Deprecating QWF should be discussed in a separate issue; maybe we can merge Filter and Query before 4.0!

ConstantScoreQuery should directly support wrapping Query and simply strip off scores
---
Key: LUCENE-2838
URL: https://issues.apache.org/jira/browse/LUCENE-2838
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Fix For: 3.1, 4.0
Attachments: LUCENE-2838.patch, LUCENE-2838.patch

Especially in MultiTermQuery rewrite modes we often simply need to strip off scores from Queries and make them constant score. Currently the code to do this looks quite ugly:

    new ConstantScoreQuery(new QueryWrapperFilter(query))

As the name says, ConstantScoreQuery should make any other Query constant score, so why does it not take a Query as ctor param? This question was also asked quite often by my customers and is simply correct, if you think about it.
Looking closer into the code, it is clear that this would also speed up MTQs:
- One additional wrapping and method calls can be removed
- Maybe we can even deprecate QueryWrapperFilter in 3.1 now (it's now only used in tests and the use-case for this class is not really available) and LUCENE-2831 does not need the stupid hack to make Simon's assertions pass
- CSQ now supports out-of-order scoring and topLevel scoring, so a CSQ on top-level now directly feeds the Collector. For that a small trick is used: The score(Collector) calls are directly delegated and the scores are stripped by wrapping the setScorer() method in Collector

During that I found a visibility bug in Scorer (LUCENE-2839): The method "boolean score(Collector collector, int max, int firstDocID)" should be public not protected, as it's not solely intended to be overridden by subclasses and is called from other classes, too! This leads to no compiler errors, as the other classes that call it are mainly BooleanScorer(2), which is in the same package, but the visibility is wrong. I will open an issue for that and fix it at least in trunk where we have no backwards-requirement.
[jira] Commented: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher
[ https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976035#action_12976035 ]

Robert Muir commented on LUCENE-2837:
-------------------------------------

Mike, also if you apply LUCENE-2751, tests randomly fail because of the LUCENE-2756 bug.
For example TestBoolean2.testRandomQueries will fail because sometimes it uses a wildcard query, and if it then incorporates MUST_NOT, this will fail against the multisearcher/parallelmultisearcher because the combine() is wrong.
So I'm thinking we should add the newSearcher tests after you have committed this one (as long as this one has some reasonable standalone tests to show it works).

Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher
---
Key: LUCENE-2837
URL: https://issues.apache.org/jira/browse/LUCENE-2837
[jira] Updated: (SOLR-2303) remove unnecessary (and problematic) log4j jars in contribs
[ https://issues.apache.org/jira/browse/SOLR-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-2303: -- Attachment: SOLR-2303.patch remove unnecessary (and problematic) log4j jars in contribs --- Key: SOLR-2303 URL: https://issues.apache.org/jira/browse/SOLR-2303 Project: Solr Issue Type: Improvement Components: Build Reporter: Robert Muir Fix For: 4.0 Attachments: SOLR-2303.patch In solr 4.0 there is log4j-over-slf4j. But if you have log4j jars also in the classpath (e.g. contrib/extraction, contrib/clustering) you can get strange errors such as: java.lang.NoSuchMethodError: org.apache.log4j.Logger.setAdditivity(Z)V So I think we should remove the log4j jars in these contribs, all tests pass with them removed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Created: (SOLR-2303) remove unnecessary (and problematic) log4j jars in contribs
remove unnecessary (and problematic) log4j jars in contribs --- Key: SOLR-2303 URL: https://issues.apache.org/jira/browse/SOLR-2303 Project: Solr Issue Type: Improvement Components: Build Reporter: Robert Muir Fix For: 4.0 Attachments: SOLR-2303.patch In solr 4.0 there is log4j-over-slf4j. But if you have log4j jars also in the classpath (e.g. contrib/extraction, contrib/clustering) you can get strange errors such as: java.lang.NoSuchMethodError: org.apache.log4j.Logger.setAdditivity(Z)V So I think we should remove the log4j jars in these contribs, all tests pass with them removed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
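When chasing a NoSuchMethodError like the one quoted above, it can help to see which jar a class was actually loaded from. The following is a small diagnostic sketch using only standard JDK calls; the class name WhichJar and the helper locate are made up for this example and are not part of Solr or log4j.

```java
import java.security.CodeSource;

public class WhichJar {

    /** Returns the location a class is loaded from, or a short note if it cannot be determined. */
    static String locate(String className) {
        try {
            // initialize=false: we only want to know where the class comes from,
            // not to run its static initializers.
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "(bootstrap or unknown)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // If this prints a real log4j jar rather than the log4j-over-slf4j bridge,
        // two competing copies of the class are on the classpath.
        System.out.println(locate("org.apache.log4j.Logger"));
    }
}
```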
Re: strange problem of PForDelta decoder
I did another test using lucene 4 trunk with the default codecs. Its file format is the same as in lucene 2.9, and the speed is almost the same as lucene 2.9.

I think it could be the fact that AND query does block reads (64 doc/freqs at once) instead of doc-at-once? Ie, because of this, the query is effectively scanning the next block of 64 docs instead of skipping to them? Our skipping impl is unfortunately rather costly so if skip will not skip that many docs it's better to scan.

I agree with this explanation. For high frequency terms, the skip list cannot skip over many docs. It seems something here needs optimization, e.g. for high frequency terms we use scanning and for low frequency terms we use the skip list. But if we only care about the bad case, we can just care about high frequency terms.

You only use PFor for the very high freq terms in 2.9.x right?

I use PFor if df is greater than 128. If not, I use VINT.

until we fix Lucene to run a single search concurrently (which we badly need to do).

I am interested in this idea (I have posted about it before). Do you have some resources such as papers or tech articles about it? I have tried, but it needs to modify the index format dramatically, and we use solr distributed search to relieve the response time problem, so I finally gave it up. lucene4's index format is more flexible in that it supports custom codecs, and it is under development now, so I think it's a good time to consider letting it support multithreaded searching for a single query.

I have a naive solution: dividing the docList into groups, e.g. grouping docIds by whether they are even or odd:

term1 df1=4 docList = 0 4 8 10
term1 df2=4 docList = 1 3 9 11
term2 df1=4 docList = 0 6 8 12
term2 df2=4 docList = 3 9 11 15

Then we can use 2 threads to search the topN docs on the even group and the odd group, and finally merge their results into a single one, just like solr distributed search. But it's better than solr distributed search.
First, it's in a single process, and data communication between threads is much faster than over the network. Second, each thread processes the same number of documents. For solr distributed search, one shard may process 7 documents and another shard may process 1 document. Even if we can make each shard have the same number of documents, we cannot make it uniform for each term. E.g. shard1 has doc1 doc2 and shard2 has doc3 doc4, but term1 may only occur in doc1 and doc2 while term2 may only occur in doc3 and doc4. We may change it to shard1 doc1 doc3, shard2 doc2 doc4. That's good for term1 and term2, but term3 may occur in doc1 and doc3... So I think this is fine-grained distribution in the index, while solr distributed search is coarse-grained.

2010/12/30 Michael McCandless luc...@mikemccandless.com:
On Mon, Dec 27, 2010 at 5:08 AM, Li Li fancye...@gmail.com wrote:

I integrated pfor codec into lucene 2.9.3 and the search time comparison is as follows:

                           single term   and query   or query
VINT in lucene 2.9.3           11.2         36.5        38.6
PFor in lucene 2.9.3            8.7         27.6        33.4
VINT in lucene 4 branch        10.6         26.5        35.4
PFor in lucene 4 branch         8.1         22.5        30.7

My test terms are high frequency terms because we are interested in the bad case.

I agree it's the bad cases we should focus on in general. If a super fast query gets somewhat slower it's relatively harmless (just a capacity question for high volume sites) but if the bad queries get slower it's awful (requires faster cutover to sharded architecture), until we fix Lucene to run a single search concurrently (which we badly need to do).

It seems the lucene 4 branch's implementation of the and query (conjunction query) is well optimized, so that even with the VINT codec it's faster than PFor in lucene 2.9.3. Could anyone tell me what optimization was done? Is storing docIDs and freqs separately what makes it faster? Or anything else?

Actually vInt on the bulkpostings branch stores freq/doc together. Ie the format is the same as 2.9.x's format.
I think it could be the fact that AND query does block reads (64 doc/freqs at once) instead of doc-at-once? Ie, because of this, the query is effectively scanning the next block of 64 docs instead of skipping to them? Our skipping impl is unfortunately rather costly so if skip will not skip that many docs it's better to scan.

Another question: is there anyone interested in integrating the pfor codec into lucene 2.9.3 as I am (we have to use lucene 2.9 and solr 1.4)? And how do I contribute this patch?

Realistically I don't think we can commit this to 2.9.x -- that branch is purely bug fixes at this point. Still it's possible others could make use of such a patch so if it's not too much work you may as well post it? It can lead to improvements on the bulk postings branch too :) The more patches the merrier!

You only use PFor for the very high freq terms
Re: strange problem of PForDelta decoder
until we fix Lucene to run a single search concurrently (which we badly need to do). I am interested in this idea.(I have posted it before) do you have some resources such as papers or tech articles about it? I have tried but it need to modify index format dramatically and we use solr distributed search to relieve the problem of response time. so finally give it up. lucene4's index format is more flexible that it supports customed codecs and it's now on development, I think it's good time to take it into consideration that let it support multithread searching for a single query. I have a naive solution. dividing docList into many groups e.g grouping docIds by it's even or odd term1 df1=4 docList = 0 4 8 10 term1 df2=4 docList = 1 3 9 11 term2 df1=4 docList = 0 6 8 12 term2 df2=4 docList = 3 9 11 15 then we can use 2 threads to search topN docs on even group and odd group and finally merge their results into a single on just like solr distributed search. But it's better than solr distributed search. First, it's in a single process and data communication between threads is much faster than network. Second, each threads process the same number of documents.For solr distributed search, one shard may process 7 documents and another shard may 1 document Even if we can make each shard have the same document number. we can not make it uniformly for each term. e.g. shard1 has doc1 doc2 shard2 has doc3 doc4 but term1 may only occur in doc1 and doc2 while term2 may only occur in doc3 and doc4 we may modify it shard1 doc1 doc3 shard2 doc2 doc4 it's good for term1 and term2 but term3 may occur in doc1 and doc3... So I think it's fine-grained distributed in index while solr distributed search is coarse- grained. This is just crazy :) The simple way is just to search different segments in parallel. BalancedSegmentMergePolicy makes sure you have roughly even-sized large segments (and small ones don't count, they're small!). 
If you're bound on squeezing out that extra millisecond (and making your life miserable along the way), you can search a single segment with multiple threads (by dividing it in even chunks, and then doing skipTo to position your iterators to the beginning of each chunk). First approach is really easy to implement. Second one is harder, but still doesn't require you to cook the number of CPU cores available into your index! It's the law of diminishing returns at play here. You're most likely to search in parallel over mostly memory-resident index (RAMDir/mmap/filesys cache - doesn't matter), as most of IO subsystems tend to slow down considerably on parallel sequential reads, so you already have pretty decent speed. Searching different segments in parallel (with BSMP) makes you several times faster. Searching in parallel within a segment requires some weird hacks, but has maybe a few percent advantage over previous solution. Sharding posting lists requires a great deal of weird hacks, makes index machine-bound, and boosts speed by another couple of percent. Sounds worthless. -- Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com) Phone: +7 (495) 683-567-4 ICQ: 104465785 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
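The second approach mentioned above (dividing one segment's doc ID range into even chunks and skipping each iterator to the start of its chunk) can be illustrated with a toy AND query over two sorted doc ID lists. This is a sketch under simplifying assumptions, not Lucene code: posting lists are plain sorted int arrays, skipTo is a binary search, and the names ChunkedIntersection, skipTo, intersect and parallelAnd are hypothetical. The sample data is the union of the even and odd lists from the quoted example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedIntersection {

    /** Index of the first doc ID >= target; stands in for skipTo on a sorted posting list. */
    static int skipTo(int[] docs, int target) {
        int pos = Arrays.binarySearch(docs, target);
        return pos >= 0 ? pos : -pos - 1;
    }

    /** Intersects two sorted doc ID lists, restricted to doc IDs in [lo, hi). */
    static List<Integer> intersect(int[] a, int[] b, int lo, int hi) {
        List<Integer> out = new ArrayList<>();
        int i = skipTo(a, lo);
        int j = skipTo(b, lo);
        while (i < a.length && j < b.length && a[i] < hi && b[j] < hi) {
            if (a[i] == b[j]) {
                out.add(a[i]);
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return out;
    }

    /** Splits [0, maxDoc) into equal chunks, intersects each chunk in its own task, then concatenates. */
    static List<Integer> parallelAnd(ExecutorService pool, int[] a, int[] b,
                                     int maxDoc, int chunks) throws Exception {
        int size = (maxDoc + chunks - 1) / chunks;
        List<Future<List<Integer>>> parts = new ArrayList<>();
        for (int c = 0; c < chunks; c++) {
            final int lo = c * size;
            final int hi = Math.min(maxDoc, lo + size);
            Callable<List<Integer>> task = () -> intersect(a, b, lo, hi);
            parts.add(pool.submit(task));
        }
        List<Integer> merged = new ArrayList<>();
        for (Future<List<Integer>> part : parts) {
            merged.addAll(part.get()); // chunks are in doc ID order, so the result stays sorted
        }
        return merged;
    }

    public static void main(String[] args) throws Exception {
        int[] term1 = {0, 1, 3, 4, 8, 9, 10, 11};
        int[] term2 = {0, 3, 6, 8, 9, 11, 12, 15};
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println(parallelAnd(pool, term1, term2, 16, 2)); // prints [0, 3, 8, 9, 11]
        } finally {
            pool.shutdown();
        }
    }
}
```

Note that the chunk count is chosen at search time, so nothing about the number of threads is baked into the index, which is the point made above against sharding the posting lists themselves.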
Re: Vote thread started on gene...@lucene.apache.org
Scott, I will gladly help put this proposal together and would like to volunteer as a committer. I am communicating with others to find some additional candidates to be committers. Regarding Heath, a quote from his last message in this thread: While I have developed extensively against Lucene.net, I do not possess the java skills needed to do a port of the code... So, while I wouldn't mind being a committer, I do not think I am qualified. Thanks, Troy On Thu, Dec 30, 2010 at 10:01 AM, Lombard, Scott slomb...@kingindustries.com wrote: Grant, Thanks for your time explaining all the details. I will be willing work on a proposal to put Lucene.Net back in to incubation. I will need other people to step up and be committers as well. Heath has volunteered and as Grant has stated 4 committers are needed to for incubation. Who else is willing to be a committer? Grant I will definitely be taking you up on your offer to help on bring Lucene.Net into incubation. Scott -Original Message- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Thursday, December 30, 2010 12:32 PM To: lucene-net-u...@lucene.apache.org Subject: Re: Vote thread started on gene...@lucene.apache.org On Dec 30, 2010, at 9:51 AM, Heath Aldrich wrote: Hi Grant, Thanks for taking the time to respond. While I have developed extensively against Lucene.net, I do not possess the java skills needed to do a port of the code... So, while I wouldn't mind being a committer, I do not think I am qualified. (I guess if I was, I could just use Lucene proper and that would be that) As to other duties of a committer, I think the ASF is perceived as a black box of questions for most of us. For one, I don't think anyone outside the 4 committers even understand *why* it is a good thing to be on the ASF vs. CodePlex, Sourceforge, etc. Maybe if there was an understanding of the why, the requirements of the ASF would make more sense. 
I think a lot of us right now just perceive the ASF as the group that is wanting to kill Lucene.net. I don't think we have a desire to kill it, I just think we are faced with the unfortunate reality that the project is already dead and now us on the PMC have the unfortunate job of cleaning up the mess as best we can. Again, it is not even that we want to see it go away, we on the PMC just don't want to be responsible for it's upkeep. You give me the names of 4 people who are willing to be committers (i.e. people willing to volunteer their time) and I will do my best to get the project into the Incubator. However, I have to tell you, my willingness to help is diminishing with every trip we take around this same circle of discussion. Simply put, given the way the vote has gone so far, the Lucene PMC is no longer interested in sustaining this project. If the community wishes to see it live at the ASF then one of you had better step up and spend 20-30 minutes of your time writing up the draft proposal (most of it can be copied and pasted) and circulating it. In fact, given the amount of time some of you have no doubt spent writing on this and other related threads you could have put together the large majority of the proposal, circulated the draft and got other volunteers to help and already be moving forward in a positive direction. Truth be told, I would do it, but I am explicitly not going to because I think that if the community can't take that one step to move forward, then it truly doesn't deserve to. I get your comments about the slower than slow development, but that is also somewhat of a sign that it works. While 2.9.2 may be behind, it seems very stable with very few issues. If we send the project to the attic, how will anyone be able to submit bugfixes ever? Frankly, I use 2.9.2 every day and have not found bugs in the areas that I use... but I'm sure they are in there somewhere. 
As for the name, I thought Lucene.net was the name of the project back in the SourceForge days... So my question is based on the premise that if the lucene.net name was brought *to* ASF, why can the community not leave with it?

Again, IANAL, but just b/c it was improperly used beforehand does not mean it is legally owned by some other entity. The Lucene name has been at the ASF since 2001 and Lucene.NET is also now a part of the ASF. (If you're interested, go look at the discussions around iBatis and the movement of that community to MyBatis.)

-Grant
[jira] Updated: (LUCENE-2838) ConstantScoreQuery should directly support wrapping Query and simply strip off scores
[ https://issues.apache.org/jira/browse/LUCENE-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2838:
----------------------------------

Attachment: LUCENE-2838-no-topscorer-opt.patch

After thinking about it for a day, I found some problems with the collector hack and this style of decorator pattern:
- If you wrap multiple times, the setScorer() method in the wrapped collector may set the wrong scorer (you see this if you wrap multiple ConstantScoreQueries on top of each other: then the boost of the inner one is returned). The problem is that the score(Collector) method inverts the decorator pattern.
- The inner scorer (like BooleanScorer with its buckets) may set a different scorer in the collector than itself, one that implements doc() differently, so always setting the ConstantScorer as the collector's scorer can lead to wrong results (we don't see this in the test, as no collector uses Scorer.doc(), only Scorer.score(), which returns a constant).

I changed the code so CSQ now always passes topScorer=false to Weight.scorer() of the wrapped query and does not override the score(Collector,...) methods. It still allows out-of-order collection. Now BooleanScorer2 is always used with MTQs.

The question is, would the previous but broken optimization make sense for speed? Mike/Mark?

ConstantScoreQuery should directly support wrapping Query and simply strip off scores
---
Key: LUCENE-2838
URL: https://issues.apache.org/jira/browse/LUCENE-2838
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Fix For: 3.1, 4.0
Attachments: LUCENE-2838-no-topscorer-opt.patch, LUCENE-2838.patch, LUCENE-2838.patch
[jira] Updated: (LUCENE-2611) IntelliJ IDEA and Eclipse setup
[ https://issues.apache.org/jira/browse/LUCENE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated LUCENE-2611: Attachment: LUCENE-2611.patch Added IntelliJ codestyle definition and instructions for putting it in the correct location. Committing shortly. IntelliJ IDEA and Eclipse setup --- Key: LUCENE-2611 URL: https://issues.apache.org/jira/browse/LUCENE-2611 Project: Lucene - Java Issue Type: New Feature Components: Build Affects Versions: 3.1, 4.0 Reporter: Steven Rowe Priority: Minor Fix For: 3.1, 4.0 Attachments: LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611_eclipse.patch, LUCENE-2611_mkdir.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test_2.patch Setting up Lucene/Solr in IntelliJ IDEA or Eclipse can be time-consuming. The attached patches add a new top level directory {{dev-tools/}} with sub-dirs {{idea/}} and {{eclipse/}} containing basic setup files for trunk, as well as top-level ant targets named idea and eclipse that copy these files into the proper locations. This arrangement avoids the messiness attendant to in-place project configuration files directly checked into source control. The IDEA configuration includes modules for Lucene and Solr, each Lucene and Solr contrib, and each analysis module. A JUnit run configuration per module is included. The Eclipse configuration includes a source entry for each source/test/resource location and classpath setup: a library entry for each jar. For IDEA, once {{ant idea}} has been run, the only configuration that must be performed manually is configuring the project-level JDK. 
For Eclipse, once {{ant eclipse}} has been run, the user has to refresh the project (right-click on the project and choose Refresh).

If these patches are committed, Subversion svn:ignore properties should be added/modified to ignore the destination IDEA and Eclipse configuration locations.

Iam Jambour has written up on the Lucene wiki a detailed set of instructions for applying the 3.X branch patch for IDEA: http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ
Re: Vote thread started on gene...@lucene.apache.org
I would take an existing Incubator Proposal and copy and paste it into a new one and then send the link here and get people to start editing on it.

-Grant

On Dec 30, 2010, at 2:45 PM, Lombard, Scott wrote:

From everything that was said it seems apparent to me that the only way for Lucene.Net to stay alive is to move back to incubation. So where do we go from here? More than 4 people have said they are willing to be committers. Is this email list the best place to start working on a proposal, should it be done between a small group offline, or is there a way that the community can work on it together? Thoughts?

Scott

-Original Message-
From: Troy Howard [mailto:thowar...@gmail.com]
Sent: Thursday, December 30, 2010 2:22 PM
To: lucene-net-...@lucene.apache.org
Cc: lucene-net-u...@lucene.apache.org
Subject: Re: RE: Vote thread started on gene...@lucene.apache.org

Marco,

I agree with you on this front. I feel that the first tasks that a new Lucene.Net team should focus on, in terms of development, are:
- Fully automating a line-by-line port using a tool such as Sharpen. This needs to become a commodity function requiring very little development effort.
- Bringing the existing forks back in as branches within the ASF project.

I am very interested in pursuing continued development on a more .NET style port (i.e. the Lucere project I started or Aimee.Net). The Lucene.Net project should be able to continue with both development paths in the same project.

Thanks,
Troy

On Thu, Dec 30, 2010 at 10:15 AM, Marco Dissel marco.dis...@gmail.com wrote:

What will be the goal of new committers? Convert the source into .net style code? If yes, we should try to stop with all the spin-offs and concentrate all the development in one project.

On 30 Dec 2010 at 19:02, Lombard, Scott slomb...@kingindustries.com wrote the following:

Grant, Thanks for your time explaining all the details. I will be willing to work on a proposal to put Lucene.Net back in to incubation.
I will need other people to step up and be committers as well. Heath has volunteered and as Grant has stated 4 committers are needed to for incubation. Who else is willing to be a committer? Grant I will definitely be taking you up on your offer to help on bring Lucene.Net into incubation. Scott -Original Message- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Thursday, December 30, 2010 12:32 PM To: lucene-net-u...@lucene.apache.org Subject: Re: Vote thread started on gene...@lucene.apache.org On Dec 30, 2010, at 9:51 AM, Heath Aldrich wrote: Hi Grant, Thanks for taking the time to respond. While I have developed extensively against Lucene.net, I do not possess the java skills needed to do a port of the code... So, while I wouldn't mind being a committer, I do not think I am qualified. (I guess if I was, I could just use Lucene proper and that would be that) As to other duties of a committer, I think the ASF is perceived as a black box of questions for most of us. For one, I don't think anyone outside the 4 committers even understand *why* it is a good thing to be on the ASF vs. CodePlex, Sourceforge, etc. Maybe if there was an understanding of the why, the requirements of the ASF would make more sense. I think a lot of us right now just perceive the ASF as the group that is wanting to kill Lucene.net. I don't think we have a desire to kill it, I just think we are faced with the unfortunate reality that the project is already dead and now us on the PMC have the unfortunate job of cleaning up the mess as best we can. Again, it is not even that we want to see it go away, we on the PMC just don't want to be responsible for it's upkeep. You give me the names of 4 people who are willing to be committers (i.e. people willing to volunteer their time) and I will do my best to get the project into the Incubator. However, I have to tell you, my willingness to help is diminishing with every trip we take around this same circle of discussion. 
Simply put, given the way the vote has gone so far, the Lucene PMC is no longer interested in sustaining this project. If the community wishes to see it live at the ASF then one of you had better step up and spend 20-30 minutes of your time writing up the draft proposal (most of it can be copied and pasted) and circulating it. In fact, given the amount of time some of you have no doubt spent writing on this and other related threads you could have put together the large majority of the proposal, circulated the draft and got other volunteers to help and already be moving forward in a positive direction. Truth be told, I would do it, but I am explicitly not going to because I think that if the community can't take that one step to move forward, then it truly doesn't deserve to. I get your comments about the slower than slow development, but that is also somewhat of a sign that it
Re: RE: Vote thread started on gene...@lucene.apache.org
Scott,

We should communicate on the public list as much as possible. I'll put together the draft proposal today, post it here, and ask for feedback from both the Lucene PMC and the community. We will wait over the weekend and Monday to allow people who might have additional input the opportunity to either see this at home or at work. On Tuesday (Jan 4th) we will move forward with whatever our best effort has produced and go from there.

Thanks,
Troy
[jira] Updated: (LUCENE-2611) IntelliJ IDEA and Eclipse setup
[ https://issues.apache.org/jira/browse/LUCENE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Rowe updated LUCENE-2611:
    Attachment: LUCENE-2611-branch-3x.patch

branch_3x version of IntelliJ config files, including codestyle addition. Committing shortly.

IntelliJ IDEA and Eclipse setup
-------------------------------
    Key: LUCENE-2611
    URL: https://issues.apache.org/jira/browse/LUCENE-2611
    Project: Lucene - Java
    Issue Type: New Feature
    Components: Build
    Affects Versions: 3.1, 4.0
    Reporter: Steven Rowe
    Priority: Minor
    Fix For: 3.1, 4.0
    Attachments: LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611_eclipse.patch, LUCENE-2611_mkdir.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test_2.patch

Setting up Lucene/Solr in IntelliJ IDEA or Eclipse can be time-consuming.

The attached patches add a new top-level directory {{dev-tools/}} with sub-dirs {{idea/}} and {{eclipse/}} containing basic setup files for trunk, as well as top-level ant targets named idea and eclipse that copy these files into the proper locations. This arrangement avoids the messiness attendant to in-place project configuration files directly checked into source control.

The IDEA configuration includes modules for Lucene and Solr, each Lucene and Solr contrib, and each analysis module. A JUnit run configuration per module is included.

The Eclipse configuration includes a source entry for each source/test/resource location and classpath setup: a library entry for each jar.

For IDEA, once {{ant idea}} has been run, the only configuration that must be performed manually is configuring the project-level JDK.

For Eclipse, once {{ant eclipse}} has been run, the user has to refresh the project (right-click on the project and choose Refresh).

If these patches are committed, Subversion svn:ignore properties should be added/modified to ignore the destination IDEA and Eclipse configuration locations.

Iam Jambour has written up on the Lucene wiki a detailed set of instructions for applying the 3.X branch patch for IDEA: http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Resolved: (SOLR-2301) RSS Feed URL Breaking
[ https://issues.apache.org/jira/browse/SOLR-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man resolved SOLR-2301.
    Resolution: Not A Problem

Based on the info you have provided, it seems that your problem has nothing to do with DIH, and everything to do with having an invalid XML file for your data config...

bq. [Fatal Error] :18:63: The reference to entity c must end with the ';' delimiter.

"...&c=19..." is not valid in an XML file; you need to properly XML-escape the "&" in the URL.

RSS Feed URL Breaking
---------------------
    Key: SOLR-2301
    URL: https://issues.apache.org/jira/browse/SOLR-2301
    Project: Solr
    Issue Type: Bug
    Components: clients - C#
    Affects Versions: 1.4.1, 4.0
    Environment: Windows 7
    Reporter: Adam Estrada

This is an odd one. I am trying to index RSS feeds and have come across several issues. Some are more pressing than others. Referring to SOLR-2286 ;-)

Anyway, the CDC has a list of RSS feeds that the Solr dataimporter can't work with.

Home page: http://emergency.cdc.gov/rss/
Page to index: http://www2a.cdc.gov/podcasts/createrss.asp?t=r&c=19

The console reports the following and, as you can see, it's because it does not like the param c. Any ideas on how to fix this?

INFO: Processing configuration from solrconfig.xml: {config=./solr/conf/dataimporthandler/rss.xml}
[Fatal Error] :18:63: The reference to entity c must end with the ';' delimiter.
Dec 28, 2010 2:39:46 PM org.apache.solr.handler.dataimport.DataImportHandler inform
SEVERE: Exception while loading DataImporter
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context
    at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:193)
    at org.apache.solr.handler.dataimport.DataImporter.init(DataImporter.java:100)
    at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:112)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:539)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:596)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:660)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:412)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:294)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:243)
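The resolution above comes down to escaping the URL before it goes into the XML data config. The following is a minimal sketch in plain Java, not DataImportHandler code. It assumes the feed URL's query string was t=r&c=19 (the ampersand is missing from the archived text), and it uses a one-element <entity url="..."/> document as a simplified stand-in for the real config file. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class XmlUrlEscapeSketch {

    /** Escapes the five XML special characters so a URL can sit inside an attribute value. */
    static String escapeForXmlAttribute(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char ch = raw.charAt(i);
            switch (ch) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(ch);
            }
        }
        return sb.toString();
    }

    /** Returns true if the given string parses as well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to the console.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, e.g. a bare '&' followed by a name without ';'
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed reconstruction of the feed URL from the issue.
        String rawUrl = "http://www2a.cdc.gov/podcasts/createrss.asp?t=r&c=19";
        String escaped = escapeForXmlAttribute(rawUrl);
        System.out.println(escaped);
        System.out.println(isWellFormed("<entity url=\"" + rawUrl + "\"/>"));
        System.out.println(isWellFormed("<entity url=\"" + escaped + "\"/>"));
    }
}
```

When the file is edited by hand, the same result is reached by typing &amp; in place of & inside the url attribute.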
[jira] Commented: (SOLR-2303) remove unnecessary (and problematic) log4j jars in contribs
[ https://issues.apache.org/jira/browse/SOLR-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976143#action_12976143 ]

Hoss Man commented on SOLR-2303:

I think the purpose of the log4j-over-slf4j jars was so that the third-party jars included in solr (and in contribs) which use log4j logging will have all of their messages funneled through slf4j, so all logging for basic solr users will be consistent (JUL) -- if you remove it, some solr logging will use slf4j-JUL and some will go direct to log4j.

I *think* the other log4j jars you mentioned (contrib/extraction, contrib/clustering) are the ones that should be removed. (untested that this doesn't break anything)

remove unnecessary (and problematic) log4j jars in contribs
-----------------------------------------------------------
    Key: SOLR-2303
    URL: https://issues.apache.org/jira/browse/SOLR-2303
    Project: Solr
    Issue Type: Improvement
    Components: Build
    Reporter: Robert Muir
    Fix For: 4.0
    Attachments: SOLR-2303.patch

In solr 4.0 there is log4j-over-slf4j. But if you have log4j jars also in the classpath (e.g. contrib/extraction, contrib/clustering) you can get strange errors such as:

java.lang.NoSuchMethodError: org.apache.log4j.Logger.setAdditivity(Z)V

So I think we should remove the log4j jars in these contribs; all tests pass with them removed.
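A NoSuchMethodError like the one in this issue typically means that two jars supply the same class name and the one that was loaded is not the one the caller was compiled against. A small diagnostic along the following lines can show which location a class was actually loaded from. This is a generic sketch using only JDK calls. The class and method names are hypothetical and nothing here is taken from the Solr build.

```java
import java.security.CodeSource;

class ClassOriginSketch {

    /** Reports where the named class was loaded from, without running its static initializers. */
    static String originOf(String className) {
        try {
            Class<?> cls = Class.forName(className, false, ClassOriginSketch.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // Typical for classes that ship with the JDK itself.
                return "(bootstrap or unknown)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // With both a real log4j jar and log4j-over-slf4j on the classpath,
        // this shows which of the two supplies the Logger class.
        System.out.println(originOf("org.apache.log4j.Logger"));
    }
}
```

Running it with the same classpath as the failing test should make it clear whether org.apache.log4j.Logger comes from the bridge jar or from a full log4j jar.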
[jira] Commented: (SOLR-975) admin-extra.html not currectly display when using multicore configuration
[ https://issues.apache.org/jira/browse/SOLR-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976145#action_12976145 ]

Hoss Man commented on SOLR-975:

FYI: I'm pretty sure yonik fixed this as part of SOLR-1930, but I haven't tested...

http://svn.apache.org/viewvc?view=revision&revision=1054008

admin-extra.html not currectly display when using multicore configuration
-------------------------------------------------------------------------
    Key: SOLR-975
    URL: https://issues.apache.org/jira/browse/SOLR-975
    Project: Solr
    Issue Type: Bug
    Components: web gui
    Affects Versions: 1.4
    Environment: Jetty openjdk 1.6.0 1.0.b12 (EPEL package for EL5)
    Reporter: Edward Rudd

I'm having cross-talk issues when using the Solr nightlies (and probably with the 1.3.0 release, but I have not tested that, as I needed newer features of the DataImportHandler in the nightlies).

The basic scenario for this bug is as follows: I have two cores configured and BOTH have a customized admin-extra.html. However, going to the admin pages uses the SAME admin-extra.html for all cores. The one used is whichever core is browsed first. This looks like a caching bug where the cache is not taking the core into account.

Basically, my admin-extra.html has a link to the data importer script and a link to reload the core (which has to have the core name explicitly in the per-core admin-extra.html).
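If the cause is what the reporter suspects, the usual remedy is to key the cached content by core name instead of holding a single cached copy. The sketch below illustrates that idea only. It is not the Solr admin code or the SOLR-1930 change, and the class name, the loader function and the example paths are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class PerCoreCacheSketch {

    // One cached admin-extra.html body per core name, rather than one shared copy.
    private final Map<String, String> htmlByCore = new ConcurrentHashMap<>();
    private final Function<String, String> loader;

    /** The loader reads the admin-extra.html for a given core (hypothetical callback). */
    PerCoreCacheSketch(Function<String, String> loader) {
        this.loader = loader;
    }

    /** Returns the cached content for this core, loading it on first access only. */
    String adminExtraFor(String coreName) {
        return htmlByCore.computeIfAbsent(coreName, loader);
    }

    public static void main(String[] args) {
        PerCoreCacheSketch cache = new PerCoreCacheSketch(
                core -> "<a href=\"/solr/" + core + "/dataimport\">import</a>");
        System.out.println(cache.adminExtraFor("core0"));
        System.out.println(cache.adminExtraFor("core1"));
    }
}
```

With a single shared field in place of the map, whichever core is browsed first would populate the cache for every other core, which matches the behaviour described in the report.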
[jira] Commented: (SOLR-2303) remove unnecessary (and problematic) log4j jars in contribs
[ https://issues.apache.org/jira/browse/SOLR-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976146#action_12976146 ]

Robert Muir commented on SOLR-2303:

Hoss, exactly what I tested... I think it doesn't show in the patch, but I want to remove the log4j jars in the contribs.

If these are in the classpath, it causes problems for velocity etc. (its test will fail). So I think they should be removed from the contribs, as it can break functionality in core if you use these contribs (besides just being unnecessary bloat).
[jira] Commented: (SOLR-2303) remove unnecessary (and problematic) log4j jars in contribs
[ https://issues.apache.org/jira/browse/SOLR-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976149#action_12976149 ]

Hoss Man commented on SOLR-2303:

I'm an idiot... trying to catch up on mail, I completely misread almost everything about this issue.

Yes, yes... agree with you 100%... remove the log4j jars in the contribs.
Lucene-3.x - Build # 227 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-3.x/227/

All tests passed

Build Log (for compile errors):
[...truncated 20926 lines...]
[jira] Commented: (SOLR-975) admin-extra.html not currectly display when using multicore configuration
[ https://issues.apache.org/jira/browse/SOLR-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976154#action_12976154 ]

Edward Rudd commented on SOLR-975:

From looking at the diff, it looks like it could fix it, but it needs to be verified that it is indeed fixed. I'll add it to my TODO list to pull down a nightly build next week and test.
Re: strange problem of PForDelta decoder
Searching multiple segments is an alternative solution, but it has some disadvantages:

1. idf is not global? (I am not familiar with its implementation.) Maybe it's easy to solve by sharing a global idf.
2. Each segment will have its own tii and tis files, which may make search slower (that's why optimizing the index is necessary).
3. One term's docList is distributed across many files rather than one. More than one frq file means the hard disk must seek to different tracks, which is time-consuming. If there is only one segment, they are likely stored in a single track.

2010/12/31 Earwin Burrfoot ear...@gmail.com:

>>> until we fix Lucene to run a single search concurrently (which we badly need to do).
>>
>> I am interested in this idea. (I have posted it before.) Do you have some resources, such as papers or tech articles, about it? I have tried, but it requires modifying the index format dramatically, and we use Solr distributed search to relieve the problem of response time, so I finally gave it up.
>>
>> Lucene 4's index format is more flexible in that it supports custom codecs, and it's now in development. I think it's a good time to consider letting it support multithreaded searching for a single query.
>>
>> I have a naive solution: dividing the docList into many groups, e.g. grouping docIds by whether they are even or odd.
>>
>>     term1 df1=4 docList = 0 4 8 10
>>     term1 df2=4 docList = 1 3 9 11
>>     term2 df1=4 docList = 0 6 8 12
>>     term2 df2=4 docList = 3 9 11 15
>>
>> Then we can use 2 threads to search the topN docs on the even group and the odd group, and finally merge their results into a single one, just like Solr distributed search. But it's better than Solr distributed search. First, it's in a single process, and data communication between threads is much faster than over the network. Second, each thread processes the same number of documents. With Solr distributed search, one shard may process 7 documents and another shard may process 1. Even if we can make each shard hold the same number of documents, we cannot make the distribution uniform for each term. E.g.
>>
>>     shard1 has doc1 doc2
>>     shard2 has doc3 doc4
>>
>> but term1 may only occur in doc1 and doc2, while term2 may only occur in doc3 and doc4. We may modify it to
>>
>>     shard1 doc1 doc3
>>     shard2 doc2 doc4
>>
>> That's good for term1 and term2, but term3 may occur in doc1 and doc3... So I think it's fine-grained distribution in the index, while Solr distributed search is coarse-grained.
>
> This is just crazy :)
>
> The simple way is just to search different segments in parallel. BalancedSegmentMergePolicy makes sure you have roughly even-sized large segments (and small ones don't count, they're small!). If you're bent on squeezing out that extra millisecond (and making your life miserable along the way), you can search a single segment with multiple threads (by dividing it in even chunks, and then doing skipTo to position your iterators to the beginning of each chunk).
>
> First approach is really easy to implement. Second one is harder, but still doesn't require you to cook the number of CPU cores available into your index!
>
> It's the law of diminishing returns at play here. You're most likely to search in parallel over a mostly memory-resident index (RAMDir/mmap/filesys cache - doesn't matter), as most IO subsystems tend to slow down considerably on parallel sequential reads, so you already have pretty decent speed. Searching different segments in parallel (with BSMP) makes you several times faster. Searching in parallel within a segment requires some weird hacks, but has maybe a few percent advantage over the previous solution. Sharding posting lists requires a great deal of weird hacks, makes the index machine-bound, and boosts speed by another couple of percent. Sounds worthless.
>
> --
> Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
> Phone: +7 (495) 683-567-4
> ICQ: 104465785
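Both proposals in this thread share one shape: search each partition on its own thread, keep a top-N per partition, then merge the partial results. That shape can be sketched in plain Java as below. This is an illustration, not Lucene code. ScoredDoc, topN and searchInParallel are hypothetical names, each partition stands in for a segment (or for an even/odd docId group), and the scores are invented. Sorting is used instead of a bounded priority queue to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelTopNSketch {

    /** A hit: document id plus its (made-up) relevance score. */
    static final class ScoredDoc {
        final int docId;
        final float score;

        ScoredDoc(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // Higher score first; ties broken by lower docId so results are deterministic.
    static final Comparator<ScoredDoc> BEST_FIRST = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        return byScore != 0 ? byScore : Integer.compare(a.docId, b.docId);
    };

    /** Returns the n best hits of one list without modifying the input. */
    static List<ScoredDoc> topN(List<ScoredDoc> hits, int n) {
        List<ScoredDoc> sorted = new ArrayList<>(hits);
        sorted.sort(BEST_FIRST);
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    /** Computes a top-n per partition on separate threads, then merges the partial results. */
    static List<ScoredDoc> searchInParallel(List<List<ScoredDoc>> partitions, int n) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<List<ScoredDoc>>> pending = new ArrayList<>();
            for (List<ScoredDoc> partition : partitions) {
                pending.add(pool.submit(() -> topN(partition, n)));
            }
            List<ScoredDoc> merged = new ArrayList<>();
            for (Future<List<ScoredDoc>> partial : pending) {
                merged.addAll(partial.get());
            }
            // The global top-n is always contained in the union of the per-partition top-n lists.
            return topN(merged, n);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<ScoredDoc> even = Arrays.asList(
                new ScoredDoc(0, 0.5f), new ScoredDoc(4, 2.0f), new ScoredDoc(8, 1.0f));
        List<ScoredDoc> odd = Arrays.asList(
                new ScoredDoc(1, 3.0f), new ScoredDoc(3, 0.1f), new ScoredDoc(9, 1.5f));
        for (ScoredDoc hit : searchInParallel(Arrays.asList(even, odd), 3)) {
            System.out.println(hit.docId + " " + hit.score);
        }
    }
}
```

The merge step is the same whether the partitions are whole segments or sharded posting lists. What differs between the proposals is how the partitions are produced, which is where the index-format cost discussed above comes in.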
Re: strange problem of PForDelta decoder
Also, point 2 means that searching for a term needs many seeks in the tis files (if the term is not cached in tii).
Lucene-trunk - Build # 1411 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1411/

All tests passed

Build Log (for compile errors):
[...truncated 17900 lines...]
Re: strange problem of PForDelta decoder
Is anyone familiar with MG4J (http://mg4j.dsi.unimi.it/)? It says: "Multithreading. Indices can be queried and scored concurrently." Maybe we can learn something from it.
Lucene-Solr-tests-only-trunk - Build # 3205 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/3205/

1 tests failed.
REGRESSION: org.apache.solr.client.solrj.embedded.SolrExampleStreamingTest.testCommitWithin

Error Message:
expected:<1> but was:<0>

Stack Trace:
junit.framework.AssertionFailedError: expected:<1> but was:<0>
    at org.apache.solr.client.solrj.SolrExampleTests.testCommitWithin(SolrExampleTests.java:256)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1109)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1047)

Build Log (for compile errors):
[...truncated 7671 lines...]
Solr-3.x - Build # 213 - Failure
Build: https://hudson.apache.org/hudson/job/Solr-3.x/213/

All tests passed

Build Log (for compile errors):
[...truncated 20564 lines...]