[jira] Commented: (SOLR-777) backword match search, for domain search etc.
[ https://issues.apache.org/jira/browse/SOLR-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633160#action_12633160 ]

Koji Sekiguchi commented on SOLR-777:

bq. Koji, I'd stick it in contrib.

Oops. I didn't notice your reply and opened LUCENE-1398, which adds it to core.

backword match search, for domain search etc.

Key: SOLR-777
URL: https://issues.apache.org/jira/browse/SOLR-777
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 1.3
Reporter: Koji Sekiguchi
Priority: Minor
Attachments: SOLR-777-reverseStringFilter.patch

There is a requirement for searching domains with backward match, that is, matching on the trailing part of the domain. For example, a query for apache.org should return www.apache.org and lucene.apache.org.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
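The attachment name suggests the approach is to reverse strings so that a backward (suffix) match becomes an ordinary prefix match. The following is only a minimal Python sketch of that idea, not the contents of the patch: `reverse_token`, `build_index`, and `backward_match` are hypothetical names, and a sorted in-memory list stands in for a real term index.

```python
import bisect


def reverse_token(token):
    """Reverse a token so that a suffix match becomes a prefix match."""
    return token[::-1]


def build_index(domains):
    """Return (reversed, original) pairs sorted by the reversed form."""
    return sorted((reverse_token(d), d) for d in domains)


def backward_match(index, query):
    """Return the original domains that end with `query`."""
    prefix = reverse_token(query)
    keys = [key for key, _ in index]
    # All keys sharing the prefix are contiguous in sorted order,
    # starting at the first key that is >= the prefix.
    start = bisect.bisect_left(keys, prefix)
    matches = []
    for key, original in index[start:]:
        if not key.startswith(prefix):
            break
        matches.append(original)
    return sorted(matches)


index = build_index([
    "www.apache.org",
    "lucene.apache.org",
    "example.com",
    "apache.org.example.net",
])
print(backward_match(index, "apache.org"))
```

Note that this is a plain string suffix match, so a hypothetical domain such as notapache.org would also match the query apache.org; matching on the query with a leading dot would be one way to restrict results to subdomains.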
Solr 1.3.0 Release Lessons Learned
Hey Solr Devs,

So, 1.3.0 is out. Whew! I think I survived. I hope y'all did too. At any rate, I promised Lars I would follow up on his comment: http://lucene.markmail.org/message/ynsnkigymbv7kfqn?q=%5BVOTE%5D+Solr+1%2E3 , so here goes.

So, what are the lessons learned? What can we do to improve Solr's process, if any? I saw a few pain points that I think are easily addressed:

1. IMO, way too long between the releases of 1.2 and 1.3 (1 year, 3 months). Yes, releases cause everyone to pause and take stock, but they are worthwhile, and not just technically. Many users only use releases. Many people don't notice a project except when they see it via some PR. Releasing more often can help attract more contributors/users, which should lead to a better Solr. Additionally, I imagine some people upgrading from 1.2 to 1.3 are trying to swallow a pretty big pill of features. Granted, things should be back-compat, but even that is hard to track when something is 1+ years old. I'd suggest we shoot for every 6 mos. or so, and maybe even some bug fix releases more often.

2. Last minute changes. The multicore changes 1 week before release were pretty tough to swallow. Great job to those involved who took it on, but still, let's not do that again, eh? One _suggestion_ is that we try to front-load big features. Hard to do, but maybe the other approach is that if we are about to take on a big new feature, we consider what other big new features are already in Solr, and then maybe consider publishing them first and holding the new feature off for the next version. Another possibility on this is a slight relaxation of the back-compatibility policy, in that for big features we reserve the right to alter them in a build version release. The main thing that this addresses is that a lot of people feel uncomfortable on trunk, so maybe it's a way of getting more eyeballs. Of course, we do this already to some extent when we mark things as experimental, so maybe nothing to change here. Just thinking out loud.

3. We need to keep better track of NOTICEs, headers and library stuff. Yonik and others did a lot to get these up to date again. I know I'm especially guilty of forgetting to put headers on. You can now run ant rat-sources for help in identifying offending files.

Thoughts? Anything else to consider?

-Grant
Re: Solr 1.3.0 Release Lessons Learned
> I'd suggest we shoot for every 6 mos. or so, and maybe even some bug fix releases more often.

Maybe we should even do something where we have a stable and an experimental release. Especially for big things like the distributed search stuff, this will potentially attract more people to test everything. It'd also be easier to do something that's not backwards compatible, because people can always revert to the stable release.

Lars
Re: Solr 1.3.0 Release Lessons Learned
On Mon, Sep 22, 2008 at 9:42 AM, Lars Kotthoff [EMAIL PROTECTED] wrote:

> where we have a stable and an experimental release.

Good idea. It may also be good to nominate a release manager. It seemed like features were constantly being thrown in that were perhaps beyond the intended scope (was there one?) of Solr 1.3. Next time, maybe 2-3 large features and some bug fixes, and then do a release.

The biggest new feature right now that affects customers and users is distributed search with failover. It is hard to tell a customer that the search servers could fail at any time (i.e. the master can replicate bad data) and that there is not much that can be done about it.
Re: Solr 1.3.0 Release Lessons Learned
On Sep 22, 2008, at 11:37 AM, Jason Rutherglen wrote:

>> where we have a stable and an experimental release.
>
> Good idea. Also may be good to nominate a release manager. It seemed like features were being thrown in constantly that were perhaps beyond the intended scope (was there one?) of SOLR 1.3. Probably next time, maybe 2-3 large features and some bug fixes and then do a release.

I only have two data points on this that I can add. On the Lucene Java side, it's always been a bit like herding cats. It really is up to the community, and usually a few committers, to push for a release. We have a tacit agreement that we would like to release every 4-6 months. If you look at the Solr archives, we would often see people asking when 1.3 was going to be out, and our response was usually something like "we're working on X", and X kept changing. This isn't a bad thing, necessarily. Open source is hard to plan; you never know where some nice new idea is coming from, so it could be that there is momentum towards a release and then bam, someone comes in w/ a big new bug or a big feature. Still, we could be better about saying, "OK, this is great, let's do a release in 2 weeks and then add this nice new feature."

On the flip side of my experience is Hadoop. Y! has assigned a number of resources to it, a number of which are committers, and also others as support. They have people providing management in JIRA, driving release planning, etc. Subscribing to the dev list is darn near overwhelming. They also have fairly well timed releases w/ a well enumerated list of features, fixes, etc. In other words, it's more commercially driven.

Personally, I think we just need to have some sort of verbal agreement, as committers/contributors, to work towards more timely, smaller releases. I don't want to release just for the sake of releasing, but I also want to see incremental advancements available to more people more often.

-Grant
[jira] Created: (SOLR-784) Support loading queries from external files in QuerySenderListener
Support loading queries from external files in QuerySenderListener

Key: SOLR-784
URL: https://issues.apache.org/jira/browse/SOLR-784
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 1.4
Reporter: Shalin Shekhar Mangar
Priority: Minor
Fix For: 1.4

QuerySenderListener currently uses the NamedList format for loading queries. It is very cumbersome to write queries in such a verbose format. QuerySenderListener should support loading queries in the URL format (as parameters) from an external file (one per line) to make it easier to write and manage warming queries.
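To make the proposed file format concrete, here is a rough Python sketch of reading warming queries written as URL-style parameters, one query per line. It is an illustration only and not Solr code: the function name `load_warming_queries` is hypothetical, and skipping blank lines and `#` comment lines is an assumption that the issue does not mention.

```python
from urllib.parse import parse_qs


def load_warming_queries(text):
    """Parse one URL-style parameter string per line into a dict of lists."""
    queries = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' comments are ignored.
        if not line or line.startswith("#"):
            continue
        # parse_qs keeps repeated parameters (e.g. several fq) as a list.
        queries.append(parse_qs(line, keep_blank_values=True))
    return queries


sample = """\
# warming queries, one per line
q=solr&sort=price+asc

q=lucene&fq=inStock:true&fq=cat:book
"""
for params in load_warming_queries(sample):
    print(params)
```

Each parsed entry maps a parameter name to a list of values, which is roughly the information a NamedList entry would carry, while the file itself stays as short as the URLs people already type.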
Re: Solr 1.3.0 Release Lessons Learned
On Mon, Sep 22, 2008 at 7:04 PM, Grant Ingersoll [EMAIL PROTECTED] wrote:

> Hey Solr Devs,
>
> So, 1.3.0 is out. Whew! I think I survived. I hope y'all did too. At any rate, I promised Lars I would follow up on his comment: http://lucene.markmail.org/message/ynsnkigymbv7kfqn?q=%5BVOTE%5D+Solr+1%2E3 , so here goes.

Thanks for taking the initiative here, Grant.

> So, what are the lessons learned? What can we do to improve Solr's process, if any? I saw a few pain points that I think are easily addressed:
>
> 1. IMO, way too long between release of 1.2 and 1.3 (1 year, 3 months). Yes, releases cause everyone to pause and take stock, but they are worthwhile, and not just technically. Many users only use releases. Many people don't notice a project except when they see it via some PR. Releasing more often can help attract more contributors/users which should lead to a better Solr. Additionally, I imagine some people upgrading from 1.2 to 1.3 are trying to swallow a pretty big pill of features. Granted, things should be back-compat, but even that is hard to track when something is a 1+ year ago. I'd suggest we shoot for every 6 mos. or so, and maybe even some bug fix releases more often.

I'd love to see more frequent releases and I'm more than happy to work towards that. I'd prefer that we put an upper bound on the time between releases rather than a strict interval. As Jason suggested, we should plan releases according to features. Consider replication (SOLR-561): it is getting near to a fully baked patch, and it'd be nice to make it available to users well before six months are up.

I'd like to propose a more pro-active approach to release planning by the community. At any given time, let's have two versions in JIRA. Only those issues which a committer has assigned to himself should be in the first un-released version. All unassigned issues must be kept in the second un-released version. If a committer assigns and promotes an issue to the first un-released version, he should feel confident enough to resolve the issue one way or another within 3 months of the last release; otherwise he should mark it for the second version. At any given time, anybody can call a vote on releasing with the trunk features. If we feel confident enough and the list of resolved issues is substantial enough, we can work according to our current way of release planning (deferring open issues, creating a branch, prioritizing bugs, putting up an RC and then releasing).

The above strategy will ensure that we stay nimble and do frequent releases without putting a lot of pressure on committers or the release manager. Stable trunk features will not starve and large features will not delay releases indefinitely. We will have an upper bound on release dates as well as the flexibility to release when the community feels confident.

Let us mark large features and core changes appropriately, creating additional versions in JIRA as and when applicable. We always have the flexibility to promote features to earlier release versions if we feel they are mature enough.

Thoughts?

> 2. Last minute changes. The multicore changes 1 week before release were pretty tough to swallow. Great job to those involved who took it on, but still, let's not do that again, eh? One _suggestion_ is that we try to front-load big features. Hard to do, but maybe the other approach is that if we are about to take on a big new feature, we consider what other big new features are already in Solr and then maybe consider publishing them first and holding off for the next version the new feature. Another possibility on this is a slight relaxation in back-compatibility policy in that for big features, we reserve the right to alter them in a build version release. The main thing that this addresses is a lot of people feel uncomfortable on trunk, so maybe it's a way of getting more eyeballs. Of course, we do this already to some extent when we mark things as experimental, so maybe nothing to change here. Just thinking out loud.

+1 to avoiding last minute changes. At one point, it seemed like we would never release 1.3.

> 3. We need to keep better track of NOTICEs, headers and library stuff. Yonik and others did a lot to get these up to date again. I know I'm especially guilty of forgetting to put headers on. You can now run ant rat-sources for help in identifying offending files.

I too am guilty on this front, going as far as suggesting that we release with the stax libs unchanged. I promise to pay closer attention to these aspects.

--
Regards,
Shalin Shekhar Mangar.
Re: Solr 1.3.0 Release Lessons Learned
On 22-Sep-08, at 10:34 AM, Shalin Shekhar Mangar wrote:

> I'd like to propose a more pro-active approach to release planning by the community. At any given time, let's have two versions in JIRA. Only those issues which a committer has assigned to himself should be in the first un-released version. All unassigned issues must be kept in the second un-released version. If a committer assigns and promotes an issue to the first un-released version, he should feel confident enough to resolve the issue one way or another within 3 months of the last release else he should mark it for the second version. At any given time, anybody can call a vote on releasing with the trunk features. If we feel confident enough and the list of resolved issues substantial enough, we can work according to our current way of release planning (deferring open issues, creating a branch, prioritizing bugs, putting up an RC and then release).

I think that this is the right approach, but I don't think that it needs to be that complicated. For issues without the expectation of completion that you mention, it is fine to just not assign a version to the issue.

It _would_ be useful, OTOH, to have a 2.0 version in JIRA for issues we know won't be resolved back-compatibly.

-Mike
Re: Solr 1.3.0 Release Lessons Learned
I agree with Mike. The simpler you make it, the higher the chances of the plan being followed. I had to re-read the part about un-released versions. Moreover, rigid rules work great when the people doing the work can really spend quality time on the project. This means that people working on Solr through/for their work are really the people who could stick to the plan, and everyone else will do whatever is possible for that individual at the time. Hadoop is a good example, and with a couple of people working on Solr full-time now, a more structured approach might be possible. That said, I'd be careful not to make things too work-like (deadlines, commitments, etc.), or else you risk losing people who have enough deadlines and other pressures already.

I'm not sure about that experimental vs. stable release suggestion - I think it can be as simple as treating the trunk as experimental/in development (which it is!), with only releases being stable. In other words, no need to change anything there IMHO.

Otis
[jira] Commented: (SOLR-765) ant example fails if example/work directory doesn't exist
[ https://issues.apache.org/jira/browse/SOLR-765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633436#action_12633436 ]

Hoss Man commented on SOLR-765:

1) my previous comment was in regards to your "possible to check for the existence and create them/ignore errors in the build script" statement

2) regarding the patch: it can do harm. I could have sworn i posted a comment about this before resolving, but clearly i forgot to hit submit or something...

The purpose of cleaning out the work directory is to provide a safety check to ensure that an older copy of the webapp (with old jars) isn't left lying around where Jetty might pick it up and use it by mistake -- it's not supposed to happen but it can. If we use failonerror=false it won't just silently ignore a missing work directory, it will also silently ignore any situation where files in the work directory can't be deleted by ant (because of the way the perms are set, or because windows prevents open files from being deleted, etc..) which means the whole point (the safety check) is gone -- people will unwittingly be using old jars and banging their heads against their keyboards not knowing why.

ant example fails if example/work directory doesn't exist

Key: SOLR-765
URL: https://issues.apache.org/jira/browse/SOLR-765
Project: Solr
Issue Type: Bug
Affects Versions: 1.3
Reporter: Lars Kotthoff
Priority: Minor
Attachments: SOLR-765.patch

Running ant example when there's no example/work directory causes the build to fail because the task tries to delete the (non-existent) directory.
Re: Solr 1.3.0 Release Lessons Learned
I guess I went a bit overboard with the plan :-)

Yes, I agree with the point about hard deadlines. However, I do feel that marking an issue for the next immediate release should signal that we are giving the issue priority. The core of my suggestion is continuous release planning, to ensure releasing early and releasing often. It is also an incentive to scope issues appropriately.

--
Regards,
Shalin Shekhar Mangar.
[jira] Updated: (SOLR-765) ant example fails if example/work directory doesn't exist
[ https://issues.apache.org/jira/browse/SOLR-765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Kotthoff updated SOLR-765:

Attachment: SOLR-765.patch

Right, fair enough. I'm attaching a new patch which calls mkdir on example/work and example/log. If the directories exist, nothing happens. If for some reason they've been lost, they are created.

ant example fails if example/work directory doesn't exist

Key: SOLR-765
URL: https://issues.apache.org/jira/browse/SOLR-765
Project: Solr
Issue Type: Bug
Affects Versions: 1.3
Reporter: Lars Kotthoff
Priority: Minor
Attachments: SOLR-765.patch, SOLR-765.patch

Running ant example when there's no example/work directory causes the build to fail because the task tries to delete the (non-existent) directory.
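The trade-off discussed on this issue (create the directory if it is missing, but never hide a failed delete) can be illustrated outside of Ant. The following Python sketch is an analogy under stated assumptions, not the contents of the attached patch: `reset_work_dir` is a hypothetical helper, and the order of operations (create first, then clean) is one plausible reading of the comments above.

```python
import os
import shutil
import tempfile


def reset_work_dir(path):
    """Make sure `path` exists and is empty, failing loudly on errors."""
    # Creating the directory is a no-op when it already exists, so a
    # missing directory no longer breaks the clean-up step.
    os.makedirs(path, exist_ok=True)
    for name in os.listdir(path):
        entry = os.path.join(path, name)
        # Deletion errors (permissions, open files) are deliberately NOT
        # swallowed: a stale copy left behind should stop the build.
        if os.path.isdir(entry) and not os.path.islink(entry):
            shutil.rmtree(entry)
        else:
            os.remove(entry)


base = tempfile.mkdtemp()
work = os.path.join(base, "example", "work")
reset_work_dir(work)          # directory was missing: it is created
open(os.path.join(work, "old.jar"), "w").close()
reset_work_dir(work)          # stale file is removed
print(os.listdir(work))
```

The point of the design is that the "directory is missing" case is handled explicitly, while every other failure still surfaces, which keeps the safety check Hoss describes intact.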
[jira] Created: (SOLR-785) Distributed SpellCheckComponent
Distributed SpellCheckComponent

Key: SOLR-785
URL: https://issues.apache.org/jira/browse/SOLR-785
Project: Solr
Issue Type: Improvement
Components: spellchecker
Reporter: Shalin Shekhar Mangar

Enhance the SpellCheckComponent to run in a distributed (sharded) environment.
Re: Solr 1.3.0 Release Lessons Learned
Yes, I think more consistent use of the "Fix For" version will already be a very good step forward. We started using it consistently only right before the release, I'd say. That is one easy thing we can do, and it will allow us to quickly tell where we are, whether we have enough meat for the release, what else is in the queue, etc. Personally, I think sticking with just better use of that + the JIRA vote functionality would get us 90% of the way there.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Re: facet.sort parameter
: The redesign I propose is changing the facet.sort parameter from a boolean to a
: string and explicitly specifying the sort method (with a default method if the
: parameter isn't specified). You'd use facet.sort=count to sort by facet count
: and something like facet.sort=lex to sort lexicographically. This would be using
: term order internally, but not expose it. This redesign would also increase the
: flexibility as more sort methods can be added easily.
:
: What does everybody think?

if i'm understanding you: you're really just suggesting a syntactic change, correct? "true" becomes "count" and "false" becomes "lex"?

I don't really see anything wrong with adding "count" and "lex" as aliases for true/false and deprecating true/false -- i agree that would probably make them easier to remember -- but we need to continue to support true/false for back-compat.

(the slightly tricky thing is making it a string param but ensuring stuff continues to work for people who use <bool name="facet.sort">true</bool> to set the params using default configs without getting into ClassCast/instanceOf nightmares)

-Hoss
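The back-compat concern can be sketched as a small normalization step. This is a Python illustration of the idea, not Solr's implementation: `parse_facet_sort` is a hypothetical name, and the choice of "count" as the default when the parameter is absent is an assumption, since the thread only says there would be a default.

```python
def parse_facet_sort(value, default="count"):
    """Normalize a facet.sort value to 'count' or 'lex'.

    Accepts the new string names, the legacy true/false strings, and a
    real boolean (as a typed <bool> config default would supply).
    """
    if value is None:
        return default  # assumption: 'count' when the param is not given
    if isinstance(value, bool):
        # Legacy typed defaults: true meant sort by count.
        return "count" if value else "lex"
    aliases = {
        "true": "count",   # deprecated alias
        "false": "lex",    # deprecated alias
        "count": "count",
        "lex": "lex",
    }
    text = str(value).strip().lower()
    if text not in aliases:
        raise ValueError("unsupported facet.sort value: %r" % (value,))
    return aliases[text]


for raw in ("count", "lex", "true", "false", True, False, None):
    print(repr(raw), "->", parse_facet_sort(raw))
```

Handling the boolean case before the string lookup is what avoids the type-cast trouble mentioned above: callers never need to know which form the configuration used.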