[Lucene.Net] Minor problem with using code from the trunk with vb.net project.
Morning,

I checked out and compiled https://svn.apache.org/repos/asf/incubator/lucene.net/trunk yesterday, looking to update from 2.0.0.4. To get the library to work with VB.Net I found I had to edit TopDocs.cs (src/core/Search/TopDocs.cs). Being case-insensitive, VB.Net can't differentiate between the three public variables (totalHits, scoreDocs, maxScore) and the three public properties (TotalHits, ScoreDocs, MaxScore).

David
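A minimal sketch of the clash described above, written in Python rather than .NET purely for illustration. The helper name `case_insensitive_clashes` is hypothetical, and lowercasing is only an approximation of how a case-insensitive compiler compares identifiers; the member names are the six listed in the message.

```python
from collections import defaultdict


def case_insensitive_clashes(member_names):
    """Group names that differ only by letter case."""
    groups = defaultdict(list)
    for name in member_names:
        # Assumption: lowercasing approximates case-insensitive identifier comparison.
        groups[name.lower()].append(name)
    # Keep only groups in which more than one distinct spelling appears.
    return [names for names in groups.values() if len(set(names)) > 1]


# Public members of TopDocs as listed in the message.
top_docs_members = [
    "totalHits", "scoreDocs", "maxScore",   # public variables
    "TotalHits", "ScoreDocs", "MaxScore",   # public properties
]

for clash in case_insensitive_clashes(top_docs_members):
    print(" / ".join(clash))
```

Each printed pair is one name that a case-sensitive language sees as two members and a case-insensitive one sees as an ambiguous single member.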
RE: [Lucene.Net] Minor problem with using code from the trunk with vb.net project.
I'm not concerned one way or another, and I realise trunk is still a work in progress. I just wanted to point out the clash between the same-named variables and properties. I know C# is case-sensitive and isn't troubled by this, but VB.Net is case-insensitive, and as it currently stands Lucene.Net will be unusable by any VB.Net project where TopDocs is used.

David

-----Original Message-----
From: Nicholas Paldino [.NET/C# MVP] [mailto:casper...@caspershouse.com]
Sent: Friday, 6 May 2011 11:41 a.m.
To: lucene-net-dev@lucene.apache.org
Subject: RE: [Lucene.Net] Minor problem with using code from the trunk with vb.net project.

David,

Apologies if this is pedantic, but that should be one of the goals, to move toward .NET naming conventions (which Lucene.NET does not abide by, and it makes for an odd fit).

- Nick
[jira] [Created] (PYLUCENE-9) QueryParser replacing stop words with wildcards
QueryParser replacing stop words with wildcards
-----------------------------------------------

Key: PYLUCENE-9
URL: https://issues.apache.org/jira/browse/PYLUCENE-9
Project: PyLucene
Issue Type: Bug
Environment: Windows XP 32-bit SP3, Ubuntu 10.04.2 LTS i686 GNU/Linux, jdk1.6.0_23
Reporter: Christopher Currens

I was using the query parser to build a query. In Java Lucene and in Lucene.Net, the query "Calendar Item as Msg" (quotes included) is parsed properly as FullText:"calendar item msg". In pylucene, it is parsed as FullText:"calendar item ? msg". This causes obvious problems when comparing search results from Python, Java and .NET.

Initially, I thought it was the Analyzer I was using, but I've tried the StandardAnalyzer and StopAnalyzer, which work properly in Java and .NET, but not in pylucene.

Here is the code I've used to reproduce the issue:

    import lucene
    from lucene import StandardAnalyzer, StopAnalyzer, QueryParser, Version

    # PyLucene requires the JVM to be started before any Lucene class is used.
    lucene.initVM()

    analyzer = StandardAnalyzer(Version.LUCENE_30)
    query = QueryParser(Version.LUCENE_30, "FullText", analyzer)
    parsedQuery = query.parse("\"Calendar Item as Msg\"")
    parsedQuery
    # <Query: FullText:"calendar item ? msg">

    analyzer = StopAnalyzer(Version.LUCENE_30)
    query = QueryParser(Version.LUCENE_30, "FullText", analyzer)
    parsedQuery = query.parse("\"Calendar Item as Msg\"")
    parsedQuery
    # <Query: FullText:"calendar item ? msg">

I've noticed this in pylucene 2.9.4, 2.9.3, and 3.0.3.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PYLUCENE-9) QueryParser replacing stop words with wildcards
[ https://issues.apache.org/jira/browse/PYLUCENE-9?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029666#comment-13029666 ]

Andi Vajda commented on PYLUCENE-9:
-----------------------------------

Are you sure you're comparing the right versions? Lucene.Net is quite behind Java Lucene, and lots of things changed in more recent versions. For instance, trying different Version instances gives different results; notably, LUCENE_24 works as you seem to expect:

    qp = QueryParser(Version.LUCENE_29, 'ft', StandardAnalyzer(Version.LUCENE_29))
    qp.parse('"Calendar Item as Msg"')
    # <Query: ft:"calendar item ? msg">

-- the 'as' stop word gets replaced by a hole, as expected in that version

    qp = QueryParser(Version.LUCENE_24, 'ft', StandardAnalyzer(Version.LUCENE_24))
    qp.parse('"Calendar Item as Msg"')
    # <Query: ft:"calendar item msg">

-- works as Lucene.Net does (probably, as I've never run it)

I'm inclined to resolve this bug as INVALID unless I'm missing something here. Please let me know.
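The "hole" mentioned in the comment above can be illustrated with a small pure-Python simulation. This is only a sketch under stated assumptions: it does not call PyLucene, the stop-word set is a tiny illustrative subset, and the names `analyze`, `render_phrase` and `keep_position_gaps` are hypothetical. It assumes the difference between the two outputs comes down to whether a removed stop word still consumes a token position.

```python
# Illustrative subset only; not Lucene's actual stop-word list.
STOP_WORDS = {"a", "an", "and", "as", "of", "the", "to"}


def analyze(text, keep_position_gaps):
    """Lowercase, drop stop words, and return (term, position) pairs.

    When keep_position_gaps is True, a dropped stop word still consumes a
    position, leaving a gap (a "hole") in the sequence of positions.
    """
    tokens = []
    position = -1
    for word in text.lower().split():
        if word in STOP_WORDS:
            if keep_position_gaps:
                position += 1  # the stop word's slot stays empty
            continue
        position += 1
        tokens.append((word, position))
    return tokens


def render_phrase(field, tokens):
    """Render a phrase in the style shown above, with '?' marking empty positions."""
    if not tokens:
        return '%s:""' % field
    slots = ["?"] * (tokens[-1][1] + 1)
    for term, position in tokens:
        slots[position] = term
    return '%s:"%s"' % (field, " ".join(slots))


print(render_phrase("ft", analyze("Calendar Item as Msg", keep_position_gaps=True)))
print(render_phrase("ft", analyze("Calendar Item as Msg", keep_position_gaps=False)))
```

With gaps kept, "msg" stays at position 3 and the empty slot at position 2 is rendered as "?"; with gaps discarded, "msg" moves up to position 2 and the phrase looks contiguous.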
[jira] [Commented] (PYLUCENE-9) QueryParser replacing stop words with wildcards
[ https://issues.apache.org/jira/browse/PYLUCENE-9?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029674#comment-13029674 ]

Christopher Currens commented on PYLUCENE-9:
--------------------------------------------

I was very hesitant to report this as a bug, since pylucene isn't a port, just a recompilation. I am positive I am comparing the correct versions (I'm a committer on Lucene.Net). Here are all the configurations I've tested:

Lucene.Net 2.9.2 - Valid
Lucene.Net 2.9.4 - Valid
Java Lucene (via Luke 1.0.1 (uses Lucene 2.9.4)) - Valid
Java Lucene (via Luke 3.1.0 (uses Lucene 3.0)) - Valid
pyLucene (Lucene 2.9.2) - Invalid, replaced by single wildcard ('?')
pyLucene (Lucene 2.9.4) - Invalid, replaced by single wildcard ('?')
pyLucene (Lucene 3.0.3) - Invalid, replaced by single wildcard ('?')

Those tests were all run on the 32-bit Windows XP machine. The Ubuntu box I used was running pyLucene with Lucene 2.9.2.

One thing I hadn't considered, though, was whether it can be replicated outside of the machines I've used myself, specifically whether there's an issue with our building of it via JCC, or something in our environment. But considering I've tried it at work and at home, I have no other place to test it.
[jira] [Commented] (PYLUCENE-9) QueryParser replacing stop words with wildcards
[ https://issues.apache.org/jira/browse/PYLUCENE-9?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029691#comment-13029691 ]

Andi Vajda commented on PYLUCENE-9:
-----------------------------------

Could you please ask on the java-u...@lucene.apache.org list what the expected behavior actually is, from Java Lucene's point of view, with Version.LUCENE_24, LUCENE_29 and LUCENE_30 passed to both the QueryParser and StandardAnalyzer constructors? I remember this changing at some point, but I'm not sure when. Nor do I see, without further investigation, how PyLucene could be different there, as it just invokes the embedded Java Lucene jar.

Thanks!
[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position
[ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029150#comment-13029150 ]

Doron Cohen commented on LUCENE-3068:
-------------------------------------

This is more complex than I originally thought.
# QueryParser creates a MultiPhraseQuery (MPQ) when one of the (phrase) query positions is a multi-term.
# MPQ has an implicit OR behavior; it is used, for example, for wildcarding a phrase query.
# The PhraseQuery (PQ) sloppy scorer assumes each query position has a single term.
# A PQ with several terms in the same position cannot be created by parsing with a QP, only manually. Manually created, it has AND semantics: only docs with ALL the terms in position N should match.

In other words, assume doc D's terms and positions are:

    a:0 b:1 c:1 d:2

MPQ for (a,b):0 d:1 should match D, finding the phrase b:1 d:2 (OR semantics).
PQ for (a,b):0 d:1 should not match D, because D does not contain 'a' and 'b' in the same position (AND semantics).

Therefore, rewriting PQ into MPQ is not a valid fix, because it would replace the AND logic assumed when the PQ was created this way with the OR logic assumed in MPQ.

{code:title=TestPositionIncrement.testSetPosition has a test for this case exactly}
// phrase query should fail for non existing searched term
// even if there exist another searched terms in the same searched position.
q = new PhraseQuery();
q.add(new Term("field", "3"),0);
q.add(new Term("field", "9"),0);
hits = searcher.search(q, null, 1000).scoreDocs;
assertEquals(0, hits.length);
{code}

Although QP by default will not create this PQ, I think we need to support it, for applications that need to be strict with the search results, with slop.

So fixing this would need to take place inside SloppyPhraseScorer; digging further...

The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position
------------------------------------------------------------------------------------------

Key: LUCENE-3068
URL: https://issues.apache.org/jira/browse/LUCENE-3068
Project: Lucene - Java
Issue Type: Bug
Components: Search
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Michael McCandless
Assignee: Doron Cohen
Priority: Minor
Fix For: 3.2, 4.0
Attachments: LUCENE-3068.patch, LUCENE-3068.patch

In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was matching docs that it shouldn't; but I think those changes caused it to fail to match docs that it should, specifically when the doc itself has tokens at the same position.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
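The OR versus AND distinction in the comment above can be checked against its own example with a few lines of Python. This is a simplified sketch, not Lucene's scorer: it only handles exact phrases (slop 0), and the function name `phrase_matches` and the `require_all` flag are hypothetical. A document is modelled as a mapping from position to the set of terms at that position; a query is a list of term sets, one per consecutive query position.

```python
def phrase_matches(doc_positions, query_positions, require_all):
    """Exact (slop 0) phrase match over positional term sets.

    require_all=False: any one term at a query position may match (OR).
    require_all=True: every term at a query position must be present (AND).
    """
    if not doc_positions:
        return False
    combine = all if require_all else any
    for start in range(max(doc_positions) + 1):
        if all(
            combine(term in doc_positions.get(start + offset, set()) for term in terms)
            for offset, terms in enumerate(query_positions)
        ):
            return True
    return False


# Doc D from the comment: a:0 b:1 c:1 d:2
doc_d = {0: {"a"}, 1: {"b", "c"}, 2: {"d"}}

# Query (a,b):0 d:1
query = [{"a", "b"}, {"d"}]

print(phrase_matches(doc_d, query, require_all=False))  # OR semantics
print(phrase_matches(doc_d, query, require_all=True))   # AND semantics
```

Under OR semantics the query lines up with b:1 d:2 and matches; under AND semantics no position in D holds both 'a' and 'b', so nothing matches, which is the outcome the comment says a manually built PhraseQuery should have.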
[jira] [Commented] (LUCENE-3073) make compoundfilewriter public
[ https://issues.apache.org/jira/browse/LUCENE-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029205#comment-13029205 ]

Simon Willnauer commented on LUCENE-3073:
-----------------------------------------

+1 - patch looks good

make compoundfilewriter public
------------------------------

Key: LUCENE-3073
URL: https://issues.apache.org/jira/browse/LUCENE-3073
Project: Lucene - Java
Issue Type: Improvement
Reporter: Robert Muir
Priority: Minor
Attachments: LUCENE-3073.patch

CompoundFileReader is public, but CompoundFileWriter is not. I propose we make it public + @lucene.internal instead (just in case someone else finds themselves wanting to manipulate cfs files).
Re: I was accepted in GSoC!!!
By the way, guys: the LuSolr SVN repository is mirrored at git://git.apache.org/lucene-solr.git, which is in turn mirrored at https://github.com/apache/lucene-solr. Working with git (maybe with stgit) is easier than juggling patches by hand.

On Wed, May 4, 2011 at 15:00, David Nemeskey nemeskey.da...@sztaki.hu wrote:
> Hi Uwe,
>
> do you mean one issue per GSoC proposal, or one for every logical unit in the project? If the second: Robert told me to use the flexscoring branch as a base for my project, since preliminary work has already been done in that branch. Should I open JIRA issues nevertheless?
>
> Thanks,
> David
>
> On 2011 May 04, Wednesday 09:56:02 Uwe Schindler wrote:
>> Hi Vinicius,
>>
>> Submitting patches via JIRA is fine! We were just thinking about possibly providing some SVN to work with (as additional training), but came to the conclusion that all students should go the standard Apache Lucene way of submitting patches to JIRA issues. You can of course still use SVN / GIT locally to organize your code. At the end we just need a patch to be committed by one of the core committers.
>>
>> Uwe

--
Kirill Zakharenko/Кирилл Захаренко
E-Mail/Jabber: ear...@gmail.com
Phone: +7 (495) 683-567-4
ICQ: 104465785
[jira] [Updated] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position
[ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-3068:
-------------------------------

Attachment: LUCENE-3068.patch

The attached patch fixes this bug by excluding from the repeats check those PPs that originate from the same offset in the query. This allows stricter phrase queries: strict on terms in the same position (AND logic) but still sloppy.

All tests pass; this is ready to go in (unless there are reservations).
Re: modularization discussion
Hey folks,

On Tue, May 3, 2011 at 6:49 PM, Michael McCandless luc...@mikemccandless.com wrote:
> Isn't our end goal here a bunch of well factored search modules? Ie, fast forward a year or two and I think we should have modules like these:

I think we have two camps here (10k feet view):

1. One wants to move towards modularization and might support all the modules Mike has listed below.
2. One wants to stick with Solr's current architecture and remain monolithic (not meant negatively here) as much as possible.

I think we can meet somewhere in between and agree on certain modules that should be available to Lucene users as well. The ones I have in mind are primary search features like:

- Faceting
- Highlighting
- Suggest
- Function Query (consolidation is needed here!)
- Analyzer factories

Things like distribution and replication should remain in Solr IMO, but might be moved to a more extensible API so that people can add their own implementation. I am thinking about things like the ZooKeeper support, which might not be a good solution for everybody, for instance where folks already have JGroups infrastructure.

So I think we can work towards two distinct goals:

1. Extract common search features into modules.
2. Refactor Solr to be more elastic / distributed and extensible with respect to those goals.

Maybe we can get agreement on such a basis. Let me know what you think.

simon

> * Faceting
> * Highlighting
> * Suggest (good patch is on LUCENE-2995)
> * Schema
> * Query impls
> * Query parsers
> * Analyzers (good progress here already, thanks Robert!), incl. factories/XML configuration (still need this)
> * Database import (DIH)
> * Web app
> * Distribution/replication
> * Doc set representations
> * Collapse/grouping
> * Caches
> * Similarity/scoring impls (BM25, etc.)
> * Codecs
> * Joins
> * Lucene core
>
> In this future, much of this code came from what is now Solr and Lucene, but we should freely and aggressively poach from other projects when appropriate (and license/provenance is OK).
>
> I keep seeing all these cool compressed int set projects popping up... surely these are useful for us. Solr poached a doc set impl from Nutch; probably there's other stuff to poach from Nutch, Mahout, etc. Katta's doing something sweet with distribution/replication; let's poach & merge w/ Solr's approach. There are various facet impls out there (Bobo browse/Zoie; Toke's; Elastic Search); let's poach & merge with Solr's. Elastic Search has lots of cool stuff, too, under ASL2.
>
> All these external open-source projects are fair game for poaching and refactoring into shared modules, along with what is now Solr and Lucene sources.
>
> In this ideal future, Solr becomes the bundling and default/example configuration of the Web App and other modules, much like how the various Linux distros bundle different stuff together around the Linux kernel. And if you are an advanced app and don't need the webapp part, you can cherry-pick the super-duper modules you do need and directly embed them into your app.
>
> Isn't this the future we are working towards?
>
> Mike
>
> http://blog.mikemccandless.com
[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position
[ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029229#comment-13029229 ]

Shai Erera commented on LUCENE-3068:
------------------------------------

Patch looks good to me. One comment about the test: perhaps use the LTC methods that do random tests, like newDirectory(), newIndexWriterConfig() etc.? If you don't think it's appropriate for this test, that's OK with me.
[jira] [Commented] (LUCENE-3070) Enable DocValues by default for every Codec
[ https://issues.apache.org/jira/browse/LUCENE-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029233#comment-13029233 ]

Simon Willnauer commented on LUCENE-3070:
-----------------------------------------

Robert, the patch looks great! Some comments:

* The simpletext nocommit should be a TODO IMO.
* For the preflex problem, I think we need to add some infrastructure for adding tests for 4.0 features somehow. I will think about this.
* One problem we have here is that our current implementation is somewhat wasteful. Currently, on flush, we pull a FieldsConsumer for every codec used in the indexing session (per DWPT) regardless of whether this field is indexed, so we create some unneeded files if you use a field for docvalues only. The other thing is that we need to somehow reset the FieldInfo#hasDocValues flag on an error, when we hit non-aborting exceptions during indexing before we can actually create the corresponding consumer. That is something we should address in a spin-off issue, I think.

Overall I think you should commit the current state and we work from here!

Enable DocValues by default for every Codec
-------------------------------------------

Key: LUCENE-3070
URL: https://issues.apache.org/jira/browse/LUCENE-3070
Project: Lucene - Java
Issue Type: Task
Components: Index
Affects Versions: CSF branch
Reporter: Simon Willnauer
Fix For: CSF branch
Attachments: LUCENE-3070.patch

Currently DocValues are enabled with a wrapper Codec, so each codec which needs DocValues must be wrapped by DocValuesCodec. The DocValues writer and reader should be moved to Codec to be enabled by default.
[jira] [Commented] (LUCENE-3070) Enable DocValues by default for every Codec
[ https://issues.apache.org/jira/browse/LUCENE-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029237#comment-13029237 ]

Simon Willnauer commented on LUCENE-3070:
-----------------------------------------

One more thing: I think preflex should throw UOE instead of returning null...

At some point we should also think about a better name for Source, something like InMemoryDocValues or RamResidentDocValues, something more self-explanatory.
[JENKINS] Lucene-Solr-tests-only-docvalues-branch - Build # 1064 - Failure
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-docvalues-branch/1064/

No tests ran.

Build Log (for compile errors):
[...truncated 63 lines...]
+ cd /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout
+ JAVA_HOME=/home/hudson/tools/java/latest1.5 /home/hudson/tools/ant/latest1.7/bin/ant clean
Buildfile: build.xml

clean:

clean:
   [delete] Deleting directory /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/lucene/build

clean:

clean:
     [echo] Building analyzers-common...

clean:
   [delete] Deleting directory /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/modules/analysis/build/common
     [echo] Building analyzers-icu...

clean:
   [delete] Deleting directory /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/modules/analysis/build/icu
     [echo] Building analyzers-phonetic...

clean:
   [delete] Deleting directory /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/modules/analysis/build/phonetic
     [echo] Building analyzers-smartcn...

clean:
   [delete] Deleting directory /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/modules/analysis/build/smartcn
     [echo] Building analyzers-stempel...

clean:
   [delete] Deleting directory /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/modules/analysis/build/stempel
     [echo] Building benchmark...

clean:
   [delete] Deleting directory /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/modules/benchmark/build

clean-contrib:

clean:
   [delete] Deleting directory /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/solr/contrib/analysis-extras/build
   [delete] Deleting directory /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/solr/contrib/analysis-extras/lucene-libs

clean:
   [delete] Deleting directory /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/solr/contrib/clustering/build

clean:
   [delete] Deleting directory /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/solr/contrib/dataimporthandler/target

clean:
   [delete] Deleting directory /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/solr/contrib/extraction/build

clean:
   [delete] Deleting directory /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/solr/contrib/uima/build

clean:
   [delete] Deleting directory /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/solr/build

BUILD SUCCESSFUL
Total time: 7 seconds
+ cd /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/lucene
+ JAVA_HOME=/home/hudson/tools/java/latest1.5 /home/hudson/tools/ant/latest1.7/bin/ant compile compile-test build-contrib
Buildfile: build.xml

jflex-uptodate-check:

jflex-notice:

javacc-uptodate-check:

javacc-notice:

init:

clover.setup:

clover.info:
     [echo]
     [echo] Clover not found. Code coverage reports disabled.
     [echo]

clover:

common.compile-core:
    [mkdir] Created dir: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/lucene/build/classes/java
    [javac] Compiling 536 source files to /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/lucene/build/classes/java
    [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/lucene/src/java/org/apache/lucene/util/Version.java:80: warning: [dep-ann] deprecated name isnt annotated with @Deprecated
    [javac] public boolean onOrAfter(Version other) {
    [javac] ^
    [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/lucene/src/java/org/apache/lucene/index/codecs/DefaultDocValuesConsumer.java:49: method does not override a method from its superclass
    [javac] @Override
    [javac] ^
    [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/lucene/src/java/org/apache/lucene/queryParser/CharStream.java:34: warning: [dep-ann] deprecated name isnt annotated with @Deprecated
    [javac] int getColumn();
    [javac] ^
    [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/lucene/src/java/org/apache/lucene/queryParser/CharStream.java:41: warning: [dep-ann] deprecated name isnt annotated with @Deprecated
    [javac] int getLine();
    [javac] ^
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] 1 error
Re: [JENKINS] Lucene-Solr-tests-only-docvalues-branch - Build # 1064 - Failure
I removed the @Override annotation on that file!

simon
[jira] [Commented] (LUCENE-3073) make compoundfilewriter public
[ https://issues.apache.org/jira/browse/LUCENE-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029251#comment-13029251 ] Michael McCandless commented on LUCENE-3073: +1 make compoundfilewriter public -- Key: LUCENE-3073 URL: https://issues.apache.org/jira/browse/LUCENE-3073 Project: Lucene - Java Issue Type: Improvement Reporter: Robert Muir Priority: Minor Attachments: LUCENE-3073.patch CompoundFileReader is public, but CompoundFileWriter is not. I propose we make it public + @lucene.internal instead (just in case someone else finds themselves wanting to manipulate cfs files) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3074) SimpleTextCodec needs SimpleText DocValues impl
SimpleTextCodec needs SimpleText DocValues impl --- Key: LUCENE-3074 URL: https://issues.apache.org/jira/browse/LUCENE-3074 Project: Lucene - Java Issue Type: Task Components: Index, Search Affects Versions: CSF branch Reporter: Simon Willnauer Assignee: Michael McCandless Fix For: CSF branch Currently SimpleTextCodec uses binary docValues; we should move that to a simple text impl.
[jira] [Created] (LUCENE-3075) DocValues should be optionally be stored in a PerCodec CFS file to prevent too many files in the index
DocValues should be optionally be stored in a PerCodec CFS file to prevent too many files in the index -- Key: LUCENE-3075 URL: https://issues.apache.org/jira/browse/LUCENE-3075 Project: Lucene - Java Issue Type: Improvement Components: Index, Search Affects Versions: CSF branch Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: CSF branch Currently docvalues create one file per field to store the docvalues. Yet this could easily lead to too many open files, so we might need to enable CFS per codec to keep the number of files reasonable.
[jira] [Resolved] (LUCENE-3073) make compoundfilewriter public
[ https://issues.apache.org/jira/browse/LUCENE-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3073. - Resolution: Fixed Fix Version/s: 4.0 3.2 Committed revision 1099745, 1099747 (3x) make compoundfilewriter public -- Key: LUCENE-3073 URL: https://issues.apache.org/jira/browse/LUCENE-3073 Project: Lucene - Java Issue Type: Improvement Reporter: Robert Muir Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3073.patch CompoundFileReader is public, but CompoundFileWriter is not. I propose we make it public + @lucene.internal instead (just in case someone else finds themselves wanting to manipulate cfs files)
[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position
[ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029274#comment-13029274 ] Doron Cohen commented on LUCENE-3068: - Thanks for reviewing Shai! I'll updated the patch with random newDirectory and newICFG - not the focus here, but may improve coverage anyhow, I added tests for the combined case - some AND some OR - that is, using MPQ, some add() with a single term (AND), some with an array longer than 1 (OR). Also refactored the tests a bit so that now there's a small test method for each test case. The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position -- Key: LUCENE-3068 URL: https://issues.apache.org/jira/browse/LUCENE-3068 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 3.0.3, 3.1, 4.0 Reporter: Michael McCandless Assignee: Doron Cohen Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was matching docs that it shouldn't; but I think those changes caused it to fail to match docs that it should, specifically when the doc itself has tokens at the same position. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position
[ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-3068: Attachment: LUCENE-3068.patch Patch with more test cases - AND/OR logic for MPQ is combined, and test code made simpler. The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position -- Key: LUCENE-3068 URL: https://issues.apache.org/jira/browse/LUCENE-3068 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 3.0.3, 3.1, 4.0 Reporter: Michael McCandless Assignee: Doron Cohen Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was matching docs that it shouldn't; but I think those changes caused it to fail to match docs that it should, specifically when the doc itself has tokens at the same position. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2458) post.jar fails on non-XML updateHandlers
[ https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-2458: -- Attachment: SOLR-2458.patch The solution was simple: change the commit() method to do ?commit=true instead of posting <commit/>. Also cleaned up dead meat, added a -Doptimize=yes option, and it now accepts -h and --help in addition to -help. post.jar fails on non-XML updateHandlers Key: SOLR-2458 URL: https://issues.apache.org/jira/browse/SOLR-2458 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 3.1 Reporter: Jan Høydahl Labels: post.jar Attachments: SOLR-2458.patch SimplePostTool.java by default tries to issue a commit after posting. The problem is that it does this by appending <commit/> to the stream. This does not work when using a non-XML request handler, such as CSV.
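The change described above (committing through a request parameter rather than an XML body) can be sketched roughly as follows. This is not the code from SOLR-2458.patch; the class name, the method name and the example URLs are assumptions made for illustration only.

```java
import java.net.URI;

// Hypothetical helper, NOT the actual SimplePostTool code from the patch.
class CommitUrlSketch {
    /** Returns the update URL with commit=true appended as a query parameter. */
    static String withCommit(String updateUrl) {
        URI uri = URI.create(updateUrl);
        // Use '&' when the URL already carries a query string, '?' otherwise.
        String separator = (uri.getRawQuery() == null) ? "?" : "&";
        return updateUrl + separator + "commit=true";
    }

    public static void main(String[] args) {
        // Example URLs are assumed, not taken from the issue.
        System.out.println(withCommit("http://localhost:8983/solr/update/csv"));
        System.out.println(withCommit("http://localhost:8983/solr/update?wt=json"));
    }
}
```

Because the commit is requested in the URL, the request body can stay in whatever format the handler expects (CSV, JSON, ...), which is the point made in the issue.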
[jira] [Updated] (SOLR-2458) post.jar fails on non-XML updateHandlers
[ https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-2458: -- Attachment: SOLR-2458.patch This new patch bumps the version number to 1.4 and adds examples to the help of how to post csv, json and pdf. post.jar fails on non-XML updateHandlers Key: SOLR-2458 URL: https://issues.apache.org/jira/browse/SOLR-2458 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 3.1 Reporter: Jan Høydahl Labels: post.jar Attachments: SOLR-2458.patch, SOLR-2458.patch SimplePostTool.java by default tries to issue a commit after posting. The problem is that it does this by appending <commit/> to the stream. This does not work when using a non-XML request handler, such as CSV.
[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.
[ https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029315#comment-13029315 ] Michael McCandless commented on SOLR-2493: -- bq. Long term, I would love to see the custom config system we have replaced with something standard... like spring, or simply POJOs that are loaded (and saved!) via XStream. This is the bigger pile of work I was referring to. +1 I think XML is a poor configuration language. It's great for one computer to talk to another, but for files that humans may edit, it's bad -- too much stuff to type for the computer's benefit, too easy to make a silly mistake. I think something like Yaml is a better choice... this is what ElasticSearch uses, for example. And, while we're at it, I think we should make Solr's error checking brittle on startup: if anything is off about the configuration, the server refuses to start (see http://markmail.org/thread/ywkfmxjboyixkrjc). SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit. - Key: SOLR-2493 URL: https://issues.apache.org/jira/browse/SOLR-2493 Project: Solr Issue Type: Bug Components: search Affects Versions: 3.1 Reporter: Stephane Bailliez Assignee: Uwe Schindler Priority: Blocker Labels: core, parser, performance, request, solr Fix For: 3.1.1, 3.2, 4.0 Attachments: SOLR-2493-3.x.patch, SOLR-2493.patch I'm putting this as blocker as I think this is a serious issue that should be addressed asap with a release. With the current code this is nowhere near suitable for production use. For each instance created SolrQueryParser calls getSchema().getSolrConfig().getLuceneVersion(luceneMatchVersion, Version.LUCENE_24) instead of using getSchema().getSolrConfig().luceneMatchVersion This creates a massive performance hit.
For each request, there are generally 3 query parsers created, and each of them will parse the xml node in config, which involves creating an instance of XPath; behind the scenes the usual factory finder pattern kicks in within the xml parser and does a loadClass. The stack is typically:
at org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:363)
at com.sun.org.apache.xml.internal.dtm.ObjectFactory.findProviderClass(ObjectFactory.java:506)
at com.sun.org.apache.xml.internal.dtm.ObjectFactory.lookUpFactoryClass(ObjectFactory.java:217)
at com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:131)
at com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:101)
at com.sun.org.apache.xml.internal.dtm.DTMManager.newInstance(DTMManager.java:135)
at com.sun.org.apache.xpath.internal.XPathContext.<init>(XPathContext.java:100)
at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:201)
at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
at org.apache.solr.core.Config.getNode(Config.java:230)
at org.apache.solr.core.Config.getVal(Config.java:256)
at org.apache.solr.core.Config.getLuceneVersion(Config.java:325)
at org.apache.solr.search.SolrQueryParser.<init>(SolrQueryParser.java:76)
at org.apache.solr.schema.IndexSchema.getSolrQueryParser(IndexSchema.java:277)
With the current 3.1 code, I do barely 250 qps with 16 concurrent users with a near empty index. Switching SolrQueryParser to use getSchema().getSolrConfig().luceneMatchVersion and doing a quick bench test, performance becomes reasonable, beyond 2000 qps.
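The difference between re-parsing the setting for every parser instance and reading a value parsed once can be illustrated with a small stand-alone sketch. Everything here is hypothetical (the `Config` and `Parser` classes only mimic the shape of the problem and are not Solr's classes); the counter stands in for the expensive XPath evaluation shown in the stack trace.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins; these are not Solr's Config or SolrQueryParser.
class CachedVersionSketch {
    /** Config object; parseCount records how often the costly lookup runs. */
    static final class Config {
        final AtomicInteger parseCount = new AtomicInteger();
        private final String rawVersion;
        /** Parsed once at construction and reused afterwards. */
        final String luceneMatchVersion;

        Config(String rawVersion) {
            this.rawVersion = rawVersion;
            this.luceneMatchVersion = parseVersion();
        }

        /** Simulates the expensive per-call lookup. */
        String parseVersion() {
            parseCount.incrementAndGet();
            return rawVersion.trim().toUpperCase(Locale.ROOT);
        }
    }

    /** Parser that needs the version every time it is constructed. */
    static final class Parser {
        final String version;

        Parser(Config config, boolean useCachedField) {
            this.version = useCachedField ? config.luceneMatchVersion : config.parseVersion();
        }
    }

    /** Creates the given number of parsers and reports how many lookups ran. */
    static int parsesFor(int parsers, boolean useCachedField) {
        Config config = new Config(" lucene_31 ");
        for (int i = 0; i < parsers; i++) {
            new Parser(config, useCachedField);
        }
        return config.parseCount.get();
    }

    public static void main(String[] args) {
        System.out.println("cached field: " + parsesFor(3, true) + " lookup(s)");
        System.out.println("re-parse each time: " + parsesFor(3, false) + " lookup(s)");
    }
}
```

With the cached field the lookup cost is paid once per config object rather than once per parser, which matches the direction of the fix proposed in the report.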
Re: modularization discussion
On May 5, 2011, at 4:15 AM, Simon Willnauer wrote: Hey folks On Tue, May 3, 2011 at 6:49 PM, Michael McCandless luc...@mikemccandless.com wrote: Isn't our end goal here a bunch of well factored search modules? Ie, fast forward a year or two and I think we should have modules like these: I think we have two camps here (10k feet view): I'd say 3 camps: 1. wants to move towards modularization might support all the modules mike has listed below 2. wants to stick with Solr's current architecture and remain monolithic (not negative in this case) as much as possible 3. Those who think most should be modularized, but realize it's a ton of work for an unproven gain (although most admit it is a highly likely gain) and should be handled on a case-by-case basis as people do the work. I don't have anything against modularization, I just know, given my schedule, I won't be able to block off weeks of time to do it. I'm happy to review where/when I can. I think we can meet somewhere in between and agree on certain module that should be available to lucene users as well. The ones I have in mind are primary search features like: - Faceting Yeah, for instance, Bobo seems to have some interesting faceting implementations that are ASL, perhaps we can combine into this new faceting module. - Highlighting - Suggest - Function Query (consolidation is needed here!) - Analyzer factories +1. things like distribution and replication should remain in solr IMO but might be moved to a more extensible API so that people can add their own implementation. And, of course, all the web tier stuff (response writers, inputs, etc.) I am thinking about things like the ZooKeeper support that might not be a good solution for everybody where folks have already JGroups infrastructure. Or other similar solutions. I wonder about using a ZeroConf implementation that can do self-discovery. So I think we can work towards 2 distinct goals. 1. extract common search features into modules 2. 
refactor solr to be more elastic / distributed and extensible with respect to those goals. 3. Make it easier for Solr to be programmatically configured by decoupling the reading of schema.xml and solrconfig.xml from the code that actually contains the structures for the properties (IndexSchema and SolrConfig) maybe we can get agreement on such a basis though. let me know what you think I think it's reasonable. At the end of the day, it broadens the appeal of both Lucene and Solr. Solr still exists and is not just a shell and at the end of the day, remains the primary choice for people who don't want to stitch everything together themselves. All of it is easier to contribute to b/c people can focus in on the core area they know w/o having to know everything else per se. Stuff should be better tested b/c of it as well since it will receive broader use. That being said, and not to be discouraging, but I see it as a ton of work. simon * Faceting * Highlighting * Suggest (good patch is on LUCENE-2995) * Schema * Query impls * Query parsers * Analyzers (good progress here already, thanks Robert!), incl. factories/XML configuration (still need this) * Database import (DIH) * Web app * Distribution/replication * Doc set representations * Collapse/grouping * Caches * Similarity/scoring impls (BM25, etc.) * Codecs * Joins * Lucene core In this future, much of this code came from what is now Solr and Lucene, but we should freely and aggressively poach from other projects when appropriate (and license/provenance is OK). I keep seeing all these cool compressed int set projects popping up... surely these are useful for us. Solr poached a doc set impl from Nutch; probably there's other stuff to poach from Nutch, Mahout, etc. Katta's doing something sweet with distribution/replication; let's poach merge w/ Solr's approach. There are various facet impls out there (Bobo browse/Zoie; Toke's; Elastic Search); let's poach merge with Solr's. 
Elastic Search has lots of cool stuff, too, under ASL2. All these external open-source projects are fair game for poaching and refactoring into shared modules, along with what is now Solr and Lucene sources. In this ideal future, Solr becomes the bundling and default/example configuration of the Web App and other modules, much like how the various Linux distros bundle different stuff together around the Linux kernel. And if you are an advanced app and don't need the webapp part, you can cherry pick the huper duper modules you do need and directly embedded into your app. Isn't this the future we are working towards? Mike http://blog.mikemccandless.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Improvements to the maven build
Hi Ryan, On 5/4/2011 at 7:14 PM, Ryan McKinley wrote: As a rule, everything should go through JIRA on its way to svn -- this is important so that we have somewhere to point for why we did things. Even small things. Your phrase As a rule provides wiggle room that we all use. Even small things. Um, I don't think so. E.g. no-one is going to go through JIRA for a small typo fix. This judgment about what's big enough to warrant a JIRA issue is one each committer has to make. As a result, this argument (David's patch should have gone through JIRA because everything should go through JIRA) doesn't work for me. With patches from contributors it is especially important they are added to JIRA because they need to grant the license to ASF. Also attachments are often stripped from mailing list archives, so down the road its really hard to know what happened. These are both excellent points. Non-trivial non-committer patches should definitely go through JIRA for these reasons. I understand the desire to keep maven support low key -- but we should do that with a good README in dev-tools. I agree that the Maven build should be documented - I plan on putting something together soon, as suggested by David. This seems completely orthogonal to me, though, to the question of using JIRA issues for Maven build changes. Even as officially non-official tools, it still gets into svn so we need a trail of where it came from and hopefully a log of why we thought it was important. I agree in principle, but again, I'll continue to use my own judgment about whether to use JIRA for small changes, especially to stuff under dev-tools/. Steve
Re: modularization discussion
On May 5, 2011, at 10:25 AM, Grant Ingersoll wrote: 3. Those who think most should be modularized, but realize it's a ton of work for an unproven gain (although most admit it is a highly likely gain) and should be handled on a case-by-case basis as people do the work. I don't have anything against modularization, I just know, given my schedule, I won't be able to block off weeks of time to do it. I'm happy to review where/when I can. +1. From what I have gathered, Grant and I come down pretty much on the same page on most of this stuff. Yeah, that means I'm reevaluating my position :) but seems to be the case. Except I'm more open to IRC discussion :) - Mark Miller lucidimagination.com Lucene/Solr User Conference May 25-26, San Francisco www.lucenerevolution.org
[jira] [Resolved] (LUCENE-2897) apply delete-by-Term and docID immediately to newly flushed segments
[ https://issues.apache.org/jira/browse/LUCENE-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2897. Resolution: Fixed apply delete-by-Term and docID immediately to newly flushed segments Key: LUCENE-2897 URL: https://issues.apache.org/jira/browse/LUCENE-2897 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.2, 4.0 Attachments: LUCENE-2897.patch, LUCENE-2897.patch Spinoff from LUCENE-2324. When we flush deletes today, we keep them as buffered Term/Query/docIDs that need to be deleted. But, for a newly flushed segment (ie fresh out of the DWPT), this is silly, because during flush we visit all terms and we know their docIDs. So it's more efficient to apply the deletes (for this one segment) at that time. We still must buffer deletes for all prior segments, but these deletes don't need to map to a docIDUpto anymore; ie we just need a Set. This issue should wait until LUCENE-1076 is in since that issue cuts over buffered deletes to a transactional stream.
[jira] [Created] (LUCENE-3076) add -Dtests.codecprovider
add -Dtests.codecprovider - Key: LUCENE-3076 URL: https://issues.apache.org/jira/browse/LUCENE-3076 Project: Lucene - Java Issue Type: Improvement Reporter: Robert Muir Fix For: 4.0 Currently to test a codec (or set of codecs) you have to add them to lucene's core and edit a couple of arrays here and there... It would be nice if when using the test-framework you could instead specify a codecprovider by classname (possibly containing your own set of huper-duper codecs). For example I made the following little codecprovider in contrib:
{noformat}
public class AppendingCodecProvider extends CodecProvider {
  public AppendingCodecProvider() {
    register(new AppendingCodec());
    register(new SimpleTextCodec());
  }
}
{noformat}
Then, I'm able to run tests with 'ant -lib build/contrib/misc/lucene-misc-4.0-SNAPSHOT.jar test-core -Dtests.codecprovider=org.apache.lucene.index.codecs.appending.AppendingCodecProvider', and it always picks from my set of codecs (in this case Appending and SimpleText), and I can set -Dtests.codec=Appending if I want to set just one.
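One way a test framework could pick up a provider named on the command line is to read a system property and instantiate the class reflectively. The sketch below only illustrates that general technique; it is not the code in LUCENE-3076.patch, and the `Provider` interface, the two implementations and the property name `sketch.provider` are all made up for the example.

```java
// Hypothetical sketch of "load a provider by class name"; not Lucene's test-framework code.
class ProviderLoaderSketch {
    /** Stand-in for a provider base type such as CodecProvider. */
    interface Provider {
        String name();
    }

    /** Used when no class name is configured. */
    static class DefaultProvider implements Provider {
        public String name() { return "default"; }
    }

    /** Example of a user-supplied provider with a no-arg constructor. */
    public static class AppendingProvider implements Provider {
        public AppendingProvider() { }
        public String name() { return "appending"; }
    }

    /** Reads a class name from the given system property and instantiates it. */
    static Provider load(String propertyName) {
        String className = System.getProperty(propertyName);
        if (className == null) {
            return new DefaultProvider();
        }
        try {
            // asSubclass fails fast if the named class is not a Provider.
            Class<? extends Provider> cls = Class.forName(className).asSubclass(Provider.class);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load provider " + className, e);
        }
    }

    public static void main(String[] args) {
        // e.g. java -Dsketch.provider='ProviderLoaderSketch$AppendingProvider' ProviderLoaderSketch
        System.out.println(load("sketch.provider").name());
    }
}
```

The provider class needs to be on the classpath and to have a no-arg constructor, which is consistent with the `ant -lib ...` invocation and the constructor shown in the issue.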
[jira] [Updated] (LUCENE-3076) add -Dtests.codecprovider
[ https://issues.apache.org/jira/browse/LUCENE-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3076: Attachment: LUCENE-3076.patch add -Dtests.codecprovider - Key: LUCENE-3076 URL: https://issues.apache.org/jira/browse/LUCENE-3076 Project: Lucene - Java Issue Type: Improvement Reporter: Robert Muir Fix For: 4.0 Attachments: LUCENE-3076.patch Currently to test a codec (or set of codecs) you have to add them to lucene's core and edit a couple of arrays here and there... It would be nice if when using the test-framework you could instead specify a codecprovider by classname (possibly containing your own set of huper-duper codecs). For example I made the following little codecprovider in contrib:
{noformat}
public class AppendingCodecProvider extends CodecProvider {
  public AppendingCodecProvider() {
    register(new AppendingCodec());
    register(new SimpleTextCodec());
  }
}
{noformat}
Then, I'm able to run tests with 'ant -lib build/contrib/misc/lucene-misc-4.0-SNAPSHOT.jar test-core -Dtests.codecprovider=org.apache.lucene.index.codecs.appending.AppendingCodecProvider', and it always picks from my set of codecs (in this case Appending and SimpleText), and I can set -Dtests.codec=Appending if I want to set just one.
Re: modularization discussion
On Thu, May 5, 2011 at 4:41 PM, Mark Miller markrmil...@gmail.com wrote: On May 5, 2011, at 10:25 AM, Grant Ingersoll wrote: 3. Those who think most should be modularized, but realize it's a ton of work for an unproven gain (although most admit it is a highly likely gain) and should be handled on a case-by-case basis as people do the work. I don't have anything against modularization, I just know, given my schedule, I won't be able to block off weeks of time to do it. I'm happy to review where/when I can. +1. From what I have gathered, Grant and I come down pretty much on the same page on most of this stuff. Yeah, that means I'm reevaluating my position :) but seems to be the case.

So this is one thing I really don't understand. You say you are in the 3rd camp. Guys in that camp don't have much time to do the work but still are not willing to sign up for what we want to modularize. Nobody asks you to do the work; I only ask you to say "ok, I think this is good" and NOT sit in the way blocking others. This is really what the 3rd camp is about to me, but maybe I misunderstand something here. Again, you are saying you are not in camp 1, but you still want to fiddle around with long discussions before we get anything done (and eventually be against it - nothing personal) because you don't have enough time to fit stuff into your schedule. This makes no sense to me. That case-by-case stuff makes me sick. Let's put some goals out and say "ok, this makes sense in a module, this doesn't" and let folks work on it. We need some agreement here and I think we have written enough emails to make our points. I think we should agree on a set of things and once we are there we can talk again. Dreams vs. baby steps! Let's settle on something now, today or next week, and stop this waste of time. I am happy with an agreement that we don't factor anything out and all remains in solr, but we need to move here! After all this discussion I don't have any motivation to work on it anyway.
I think I need to step back for a while along those lines! simon Except I'm more open to IRC discussion :) - Mark Miller lucidimagination.com Lucene/Solr User Conference May 25-26, San Francisco www.lucenerevolution.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-1418) QueryParser can throw NullPointerException during parsing of some queries in case if default field passed to constructor is null
[ https://issues.apache.org/jira/browse/LUCENE-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029356#comment-13029356 ] David Smiley commented on LUCENE-1418: -- Ok, thanks for your attention Chris. I could have sworn I cornered the bug with my debugger the other day but at the moment I can't seem to reproduce it. It very well may be user error :-( -- a typo in AJAX-Solr which used *.* instead of *:*... probably it was that, in hindsight. QueryParser can throw NullPointerException during parsing of some queries in case if default field passed to constructor is null Key: LUCENE-1418 URL: https://issues.apache.org/jira/browse/LUCENE-1418 Project: Lucene - Java Issue Type: Bug Components: QueryParser Affects Versions: 2.4 Environment: CentOS 5.2 (probably any applies) Reporter: Alexei Dets Priority: Minor In case if QueryParser was constructed using QueryParser(String f, Analyzer a) constructor and f equals null then QueryParser can fail with NullPointerException during parsing of some queries that _does_ contain field name but have unbalanced parenthesis. 
Example 1: Query: field:(expr1) expr2) Result: java.lang.NullPointerException
at org.apache.lucene.index.Term.<init>(Term.java:50)
at org.apache.lucene.index.Term.<init>(Term.java:36)
at org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:543)
at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1324)
at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1211)
at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1168)
at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1128)
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:170)
Example 2: Query: field:(expr1) expr2) Result: java.lang.NullPointerException
at org.apache.lucene.index.Term.<init>(Term.java:50)
at org.apache.lucene.index.Term.<init>(Term.java:36)
at org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:543)
at org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:612)
at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1459)
at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1211)
at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1168)
at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1128)
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:170)
Workaround: pass an empty string to the constructor as the default field name - in this case the QueryParser.parse method will throw ParseException (the expected result, because the query string is wrong) instead of NullPointerException. It is not obvious to me how to fix this, so I'll describe my use case; maybe I'm doing something completely wrong. Basically I have a set of per-field queries entered by the user and need to programmatically construct (after some preprocessing) one real Lucene query combined from these user-entered per-field subqueries.
To achieve this I basically do the following (simplified a bit):

    QueryParser parser = new QueryParser(null, analyzer); // I'll always provide a field name in the query string, as it is different each time and I don't have any default
    BooleanQuery query = new BooleanQuery();
    Query subQuery1 = parser.parse(field1 + ":(" + queryString1 + ')');
    query.add(subQuery1, operator1); // operator = BooleanClause.Occur.MUST, BooleanClause.Occur.MUST_NOT or BooleanClause.Occur.SHOULD
    Query subQuery2 = parser.parse(field2 + ":(" + queryString2 + ')');
    query.add(subQuery2, operator2);
    Query subQuery3 = parser.parse(field3 + ":(" + queryString3 + ')');
    query.add(subQuery3, operator3);
    ...

IMHO either the QueryParser constructor should be changed to throw NullPointerException/IllegalArgumentException when a null field is passed (and the API documentation updated), or QueryParser.parse should be fixed to correctly throw ParseException instead of NullPointerException. Also, IMHO, _public_ setField/getField methods on QueryParser (that set/get the default field) would be a great help in use cases like mine:

    QueryParser parser = new QueryParser(null, analyzer); // or add a constructor taking _only_ an analyzer for such cases
    BooleanQuery query = new BooleanQuery();
    parser.setField(field1);
    Query subQuery1 = parser.parse(queryString1);
    query.add(subQuery1, operator1);
    parser.setField(field2);
    Query subQuery2 = parser.parse(queryString2);
    query.add(subQuery2, operator2);
    ...

-- This message is automatically
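The string concatenation in the use case above can be sketched without any Lucene dependency. This is only an illustration: the class and method names (FieldScopedQuerySketch, fieldScoped) are hypothetical and not part of Lucene. It shows that user input containing a stray closing parenthesis presumably yields exactly the kind of unbalanced query quoted in the report, and it fails fast on a null field name, in the spirit of the constructor check the reporter suggests.

```java
import java.util.Objects;

// Hypothetical helper, not a Lucene API: builds the per-field query string
// that is later handed to a query parser.
final class FieldScopedQuerySketch {

    // Wraps the user-entered sub-query as field:(...) and rejects null
    // arguments immediately instead of failing later during parsing.
    static String fieldScoped(String field, String userQuery) {
        Objects.requireNonNull(field, "field must not be null");
        Objects.requireNonNull(userQuery, "userQuery must not be null");
        return field + ":(" + userQuery + ')';
    }

    public static void main(String[] args) {
        // Balanced input gives a well-formed per-field query.
        System.out.println(fieldScoped("title", "foo bar"));
        // A stray ')' in the user input gives an unbalanced query string.
        System.out.println(fieldScoped("field", "expr1) expr2"));
    }
}
```

A parser given the second string is expected to reject it; the issue is only about which exception it uses to do so.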
[jira] [Resolved] (LUCENE-2904) non-contiguous LogMergePolicy should be careful to not select merges already running
[ https://issues.apache.org/jira/browse/LUCENE-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2904. Resolution: Fixed non-contiguous LogMergePolicy should be careful to not select merges already running Key: LUCENE-2904 URL: https://issues.apache.org/jira/browse/LUCENE-2904 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-2904.patch Now that LogMP can do non-contiguous merges, the fact that it disregards which segments are already being merged is more problematic since it could result in it returning conflicting merges and thus failing to run multiple merges concurrently. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed
[ https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olivier Favre updated LUCENE-3071: -- Attachment: LUCENE-3071.patch Proposed patch attached. Working against Lucene 3.1 (remove the last parameter, {{path.length()}}, from the assert call). But I am having difficulty getting the tests to work against trunk ({{ant}} and {{ant test}} fail, at global scope). PathHierarchyTokenizer adaptation for urls: splits reversed --- Key: LUCENE-3071 URL: https://issues.apache.org/jira/browse/LUCENE-3071 Project: Lucene - Java Issue Type: New Feature Components: contrib/analyzers Reporter: Olivier Favre Priority: Minor Attachments: LUCENE-3071.patch Original Estimate: 2h Remaining Estimate: 2h {{PathHierarchyTokenizer}} should be usable to split URLs in a reversed way (useful for faceted search against URLs): {{www.site.com}} -> {{www.site.com, site.com, com}} Moreover, it should be able to skip a given number of first (or last, if reversed) tokens: {{/usr/share/doc/somesoftware/INTERESTING/PART}} should give, with 4 tokens skipped: {{INTERESTING}} {{INTERESTING/PART}} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
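The requested behaviour can be sketched in plain Java. This is not the attached patch and not the PathHierarchyTokenizer API: the class and method names are hypothetical, the sketch works on whole strings rather than a token stream, and reading "skip" as dropping trailing parts in reversed mode is an assumption based on the issue text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of the splits described in LUCENE-3071.
final class HierarchySplitSketch {

    // Splits the input on the delimiter and drops empty parts
    // (for example the one produced by a leading '/').
    private static List<String> parts(String input, char delimiter) {
        List<String> out = new ArrayList<>();
        for (String p : input.split(Pattern.quote(String.valueOf(delimiter)))) {
            if (!p.isEmpty()) {
                out.add(p);
            }
        }
        return out;
    }

    // Forward mode: skip the first `skip` parts, then emit growing prefixes.
    static List<String> forward(String input, char delimiter, int skip) {
        List<String> p = parts(input, delimiter);
        List<String> tokens = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        for (int i = skip; i < p.size(); i++) {
            if (sb.length() > 0) {
                sb.append(delimiter);
            }
            sb.append(p.get(i));
            tokens.add(sb.toString());
        }
        return tokens;
    }

    // Reversed mode: skip the last `skip` parts, then emit shrinking suffixes.
    static List<String> reversed(String input, char delimiter, int skip) {
        List<String> p = parts(input, delimiter);
        int end = Math.max(0, p.size() - skip);
        List<String> tokens = new ArrayList<>();
        for (int start = 0; start < end; start++) {
            tokens.add(String.join(String.valueOf(delimiter), p.subList(start, end)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [www.site.com, site.com, com]
        System.out.println(reversed("www.site.com", '.', 0));
        // prints [INTERESTING, INTERESTING/PART]
        System.out.println(forward("/usr/share/doc/somesoftware/INTERESTING/PART", '/', 4));
    }
}
```

Both printed lists match the examples given in the issue description.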
[jira] [Commented] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed
[ https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029369#comment-13029369 ] Robert Muir commented on LUCENE-3071: - bq. But I am having difficulties making the tests work against trunk (ant and ant test fail, at global scope). Can you provide more details about this? If possible, details such as the ant version, whether you are using an svn checkout (and what the full path is), logs of the error messages, etc. would be great. Feel free to open a new jira issue for these problems! PathHierarchyTokenizer adaptation for urls: splits reversed --- Key: LUCENE-3071 URL: https://issues.apache.org/jira/browse/LUCENE-3071 Project: Lucene - Java Issue Type: New Feature Components: contrib/analyzers Reporter: Olivier Favre Priority: Minor Attachments: LUCENE-3071.patch Original Estimate: 2h Remaining Estimate: 2h {{PathHierarchyTokenizer}} should be usable to split URLs in a reversed way (useful for faceted search against URLs): {{www.site.com}} -> {{www.site.com, site.com, com}} Moreover, it should be able to skip a given number of first (or last, if reversed) tokens: {{/usr/share/doc/somesoftware/INTERESTING/PART}} should give, with 4 tokens skipped: {{INTERESTING}} {{INTERESTING/PART}} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2498) Stop consolidating boosts for values of multivalued fields
Stop consolidating boosts for values of multivalued fields -- Key: SOLR-2498 URL: https://issues.apache.org/jira/browse/SOLR-2498 Project: Solr Issue Type: Improvement Affects Versions: 3.1, 4.0, Next Reporter: Neil Hooey Currently, if you boost a value in a multivalued field during index time, the boosts are consolidated for every field, and the individual values are lost. So, for example, take a list of photos with a multivalued field keywords, where the boost for a keyword assigned to a photo corresponds to the number of times that photo was downloaded after searching for that particular keyword.
{code}
photo1: Photo of a cat by itself: keywords: [ cat:600 feline:100 ] = boost total = 700
photo2: Photo of a cat driving a truck: keywords: [ cat:100 feline:90 animal:80 truck:1000 ] = boost total = 1270
{code}
If you search for cat feline, photo2 will rank higher, since the boosts of its cat-like words were consolidated with the truck boost anomaly, for a total of 1270. Whereas photo1, which has more cat feline downloads, only gets a score of 700, and ranks lower. *Intuitively the boosts should be separate, so only the boosts for the terms searched will be counted.* Given the current behaviour, you are forced to do one of the following: 1. Assemble all of the multi-values into a string, and use payloads in place of boosts. 2. Use dynamic fields, such as keyword_*, and boost them independently. Neither of these solutions is ideal, as using payloads requires writing your own BoostingTermQuery, and defining a new dynamic field per multi-value makes searching more difficult than with multivalued fields. There's a blog link that describes the current behaviour: http://blog.kapilchhabra.com/2008/01/solr-index-time-boost-facts-2 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: modularization discussion
On May 5, 2011, at 11:03 AM, Simon Willnauer wrote: On Thu, May 5, 2011 at 4:41 PM, Mark Miller markrmil...@gmail.com wrote: On May 5, 2011, at 10:25 AM, Grant Ingersoll wrote: 3. Those who think most should be modularized, but realize it's a ton of work for an unproven gain (although most admit it is a highly likely gain) and should be handled on a case-by-case basis as people do the work. I don't have anything against modularization, I just know, given my schedule, I won't be able to block off weeks of time to do it. I'm happy to review where/when I can. +1. From what I have gathered, Grant and I come down pretty much on the same page on most of this stuff. Yeah, that means I'm reevaluating my position :) but seems to be the case. so this is one thing I really don't understand. you say you are in the 3rd camp. Guys in that camp have not much time to do the work but still are not willing to sign up for what we want to modularize. I don't follow this leap. (BTW, I'm actually mostly in camp #1 and a little in camp #3, I just want to make sure, based on what I've read that all sides are represented. I like Mike's approach, but I also know it is a ton of work and details matter.) Nobody asks you to do the work I only ask you to say ok I think this is good and NOT sitting in the way blocking others. This is really what the 3rd camp is about to me but maybe I misunderstand something here. Again you are saying you are not in camp 1 but you want to still fiddle around with long discussion before we get anything done (and eventually be against it - nothing personal) I don't think that is what Mark is saying nor is it what camp #3 is saying. And I don't think we are fiddling w/ long discussions (it's only been a couple of days.) This is hugely important. We need consensus to move forward. because you don't have enough time to fit stuff in your schedule. This makes no sense to me. That case by case stuff makes me sick. 
Let's put some goals out and say ok this makes sense in a module this doesn't and let folks work on it. To me, the third camp is just saying the proof is in the pudding. If you want to refactor, then go for it. Just make sure everything still works, which of course I know people will (but part of that means actually running Solr, IMO). Perhaps, more importantly, don't get mad if I have only one day a week to work on Lucene/Solr and I spend it putting a specific feature in a specific place. Just because something can/should be modularized, doesn't mean that a person working in that area must do it before they add whatever they were working on. For instance, if and when function queries are a module, I will add to them there and be happy to do so. In the meantime, I will likely add to them in Solr if that is something I happen to be interested in at that time b/c I can certainly add a new function in a day, but I can't refactor the whole module _and_ add my new function in a day. In the end, I think we are in agreement (at least you and me), actually. To me, the best place to start on this is: 1. Function queries 2. Spatial 3. Faceting (In that order) -Grant - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2499) Index-time boosts for multivalue fields are consolidated
[ https://issues.apache.org/jira/browse/SOLR-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Hooey updated SOLR-2499: - Description: Currently, if you boost a value in a multivalue field during index time, the boosts are consolidated for every field, and the individual values are lost. So, for example, take a list of photos with a multivalue field keywords, where the boost for a keyword assigned to a photo corresponds to the number of times that photo was downloaded after searching for that particular keyword. We have documents like this: {code} photo1: Photo of a cat by itself keywords: [ cat:600 feline:100 ] = boost total = 700 photo2: Photo of a cat driving a truck keywords: [ cat:100 feline:90 animal:80 truck:1000 ] = boost total = 1270 {code} If you search for cat feline, photo2 will rank higher, since the boost of cat-like words was consolidated with the truck boost anomaly. Whereas photo1, which has more downloads for cat and feline, ranks lower with a lower consolidated boost, even though its total boost for the relevant keywords is higher than for photo2. *Intuitively, the boosts should be separate, so only the boosts for the terms searched will be counted.* Given the current behaviour, you are forced to do one of the following: 1. Assemble all of the multi-values into a string, and use payloads in place of boosts. 2. Use dynamic fields, such as keyword_*, and boost them independently. Neither of these solutions is ideal, as using payloads requires writing your own BoostingTermQuery, and defining a new dynamic field per multi-value makes searching more difficult than with multivalue fields. There's a blog entry that describes the current behaviour: http://blog.kapilchhabra.com/2008/01/solr-index-time-boost-facts-2 was: Currently, if you boost a value in a multivalue field during index time, the boosts are consolidated for every field, and the individual values are lost. 
So, for example, given a list of photos with a multivalue field keywords, and a boost for a keyword assigned to a photo corresponds to the number of times that photo was downloaded after searching for that particular keyword, we have documents like this: {code} photo1: Photo of a cat by itself keywords: [ cat:600 feline:100 ] = boost total = 700 photo2: Photo of a cat driving a truck keywords: [ cat:100 feline:90 animal:80 truck:1000 ] = boost total = 1270 {code} If you search for cat feline, photo2 will rank higher, since the boost of cat-like words was consolidated with the truck boost anomaly. Whereas photo1, which has more downloads for cat and feline, ranks lower with a lower consolidated boost. *Intuitively, the boosts should be separate, so only the boosts for the terms searched will be counted.* Given the current behaviour, you are forced to do one of the following: 1. Assemble all of the multi-values into a string, and use payloads in place of boosts. 2. Use dynamic fields, such as keyword_*, and boost them independently. Neither of these solutions are ideal, as using payloads requires writing your own BoostingTermQuery, and defining a new dynamic field per multi-value makes searching more difficult than with multivalue fields. There's a blog entry that describes the current behaviour: http://blog.kapilchhabra.com/2008/01/solr-index-time-boost-facts-2 Index-time boosts for multivalue fields are consolidated Key: SOLR-2499 URL: https://issues.apache.org/jira/browse/SOLR-2499 Project: Solr Issue Type: Improvement Affects Versions: 3.1, 4.0, Next Reporter: Neil Hooey Labels: boost, multivalue, multivalued Currently, if you boost a value in a multivalue field during index time, the boosts are consolidated for every field, and the individual values are lost. 
So, for example, take a list of photos with a multivalue field keywords, where the boost for a keyword assigned to a photo corresponds to the number of times that photo was downloaded after searching for that particular keyword. We have documents like this: {code} photo1: Photo of a cat by itself keywords: [ cat:600 feline:100 ] = boost total = 700 photo2: Photo of a cat driving a truck keywords: [ cat:100 feline:90 animal:80 truck:1000 ] = boost total = 1270 {code} If you search for cat feline, photo2 will rank higher, since the boost of cat-like words was consolidated with the truck boost anomaly. Whereas photo1, which has more downloads for cat and feline, ranks lower with a lower consolidated boost, even though its total boost for the relevant keywords is higher than for photo2. *Intuitively, the boosts should be separate, so only the boosts for the terms searched will be counted.* Given the current behaviour, you are forced to do one of the following: 1. Assemble all of the
[jira] [Closed] (SOLR-2498) Stop consolidating boosts for values of multivalued fields
[ https://issues.apache.org/jira/browse/SOLR-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Hooey closed SOLR-2498. Resolution: Duplicate Stop consolidating boosts for values of multivalued fields -- Key: SOLR-2498 URL: https://issues.apache.org/jira/browse/SOLR-2498 Project: Solr Issue Type: Improvement Reporter: Neil Hooey I accidentally double-submitted this bug when my browser crashed. Here is the real one: https://issues.apache.org/jira/browse/SOLR-2499 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2498) Stop consolidating boosts for values of multivalued fields
[ https://issues.apache.org/jira/browse/SOLR-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Hooey updated SOLR-2498: - Labels: (was: boost multivalue multivalued) Description: I accidentally double-submitted this bug when my browser crashed. Here is the real one: https://issues.apache.org/jira/browse/SOLR-2499 was: Currently, if you boost a value in a multivalued field during index time, the boosts are consolidated for every field, and the individual values are lost. So, for example, given a list of photos with a multivalue field keywords, and a boost for a keyword assigned to a photo corresponds to the number of times that photo was downloaded after searching for that particular keyword. {code} photo1: Photo of a cat by itself: keywords: [ cat:600 feline:100 ] = boost total = 700 photo2: Photo of a cat driving a truck: keywords: [ cat:100 feline:90 animal:80 truck:1000 ] = boost total = 1270 {code} If you search for cat feline, photo2 will rank higher, since the boost of cat-like words was consolidated for the truck boost anomoly to score a total of 1270. Whereas photo1, which has more cat feline downloads, only gets a score of 700, and ranks lower. *Intuitively the boosts should be separate, so only the boosts for the terms searched will be counted.* Given the current behaviour, you are forced to do one of the following: 1. Assemble all of the multi-values into a string, and use payloads in place of boosts. 2. Use dynamic fields, such as keyword_*, and boost them independently. Neither of these solutions are ideal, as using payloads requires writing your own BoostingTermQuery, and defining a new dynamic field per multi-value makes searching more difficult than with mutlivalued fields. 
There's a blog link that describes the current behaviour: http://blog.kapilchhabra.com/2008/01/solr-index-time-boost-facts-2 Affects Version/s: (was: Next) (was: 4.0) (was: 3.1) Stop consolidating boosts for values of multivalued fields -- Key: SOLR-2498 URL: https://issues.apache.org/jira/browse/SOLR-2498 Project: Solr Issue Type: Improvement Reporter: Neil Hooey I accidentally double-submitted this bug when my browser crashed. Here is the real one: https://issues.apache.org/jira/browse/SOLR-2499 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2499) Index-time boosts for multivalue fields are consolidated
[ https://issues.apache.org/jira/browse/SOLR-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Hooey updated SOLR-2499: - Comment: was deleted (was: Double-submission, oops! This issue is the canonical one.) Index-time boosts for multivalue fields are consolidated Key: SOLR-2499 URL: https://issues.apache.org/jira/browse/SOLR-2499 Project: Solr Issue Type: Improvement Affects Versions: 3.1, 4.0, Next Reporter: Neil Hooey Labels: boost, multivalue, multivalued Currently, if you boost a value in a multivalue field during index time, the boosts are consolidated for every field, and the individual values are lost. So, for example, take a list of photos with a multivalue field keywords, where the boost for a keyword assigned to a photo corresponds to the number of times that photo was downloaded after searching for that particular keyword. We have documents like this: {code} photo1: Photo of a cat by itself keywords: [ cat:600 feline:100 ] = boost total = 700 photo2: Photo of a cat driving a truck keywords: [ cat:100 feline:90 animal:80 truck:1000 ] = boost total = 1270 {code} If you search for cat feline, photo2 will rank higher, since the boost of cat-like words was consolidated with the truck boost anomaly. Whereas photo1, which has more downloads for cat and feline, ranks lower with a lower consolidated boost, even though its total boost for the relevant keywords is higher than for photo2. *Intuitively, the boosts should be separate, so only the boosts for the terms searched will be counted.* Given the current behaviour, you are forced to do one of the following: 1. Assemble all of the multi-values into a string, and use payloads in place of boosts. 2. Use dynamic fields, such as keyword_*, and boost them independently. Neither of these solutions is ideal, as using payloads requires writing your own BoostingTermQuery, and defining a new dynamic field per multi-value makes searching more difficult than with multivalue fields. 
There's a blog entry that describes the current behaviour: http://blog.kapilchhabra.com/2008/01/solr-index-time-boost-facts-2 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
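The arithmetic behind the photo example in SOLR-2499 can be checked with a small sketch. It takes the issue's additive "boost total" at face value and does not model how Lucene or Solr actually combine or score boosts; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of consolidated versus per-term boosts,
// using the numbers from the issue description.
final class BoostSketch {

    // Consolidated: one total per document, regardless of the query.
    static int consolidated(Map<String, Integer> keywordBoosts) {
        int total = 0;
        for (int boost : keywordBoosts.values()) {
            total += boost;
        }
        return total;
    }

    // Separate: only the boosts of the searched terms are counted.
    static int perTerm(Map<String, Integer> keywordBoosts, String... queryTerms) {
        int total = 0;
        for (String term : queryTerms) {
            total += keywordBoosts.getOrDefault(term, 0);
        }
        return total;
    }

    static Map<String, Integer> photo1() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("cat", 600);
        m.put("feline", 100);
        return m;
    }

    static Map<String, Integer> photo2() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("cat", 100);
        m.put("feline", 90);
        m.put("animal", 80);
        m.put("truck", 1000);
        return m;
    }

    public static void main(String[] args) {
        // Consolidated totals: photo1 = 700, photo2 = 1270 (photo2 ranks first).
        System.out.println(consolidated(photo1()) + " vs " + consolidated(photo2()));
        // Per-term totals for "cat feline": photo1 = 700, photo2 = 190 (photo1 ranks first).
        System.out.println(perTerm(photo1(), "cat", "feline") + " vs " + perTerm(photo2(), "cat", "feline"));
    }
}
```

With separate boosts the ordering flips in favour of photo1, which is the behaviour the issue asks for.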
Consolidation of boosts on multivalued fields
Currently when you assign boosts to multivalue fields during index-time, they are consolidated, and the individual boosts are lost. There are some relevant cases where the individual boost values are important, so I'd like to fix this behaviour. I've created an issue here, which gives some examples: https://issues.apache.org/jira/browse/SOLR-2499 Do you have any ideas of where to get started with this fix, or have an idea of how difficult the fix might be? Thanks, - Neil - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed
[ https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olivier Favre updated LUCENE-3071: -- Attachment: ant.log.tar.bz2 I'm using Ubuntu 10.04.2 LTS. {{ant -version}} reports: Apache Ant version 1.7.1 compiled on September 8 2010. I followed the wiki: http://wiki.apache.org/lucene-java/HowToContribute I used {{svn checkout http://svn.eu.apache.org/repos/asf/lucene/dev/trunk/ lucene-trunk}}. I'm working under revision 1099843 (yours). See ant log attached. PathHierarchyTokenizer adaptation for urls: splits reversed --- Key: LUCENE-3071 URL: https://issues.apache.org/jira/browse/LUCENE-3071 Project: Lucene - Java Issue Type: New Feature Components: contrib/analyzers Reporter: Olivier Favre Priority: Minor Attachments: LUCENE-3071.patch, ant.log.tar.bz2 Original Estimate: 2h Remaining Estimate: 2h {{PathHierarchyTokenizer}} should be usable to split URLs in a reversed way (useful for faceted search against URLs): {{www.site.com}} -> {{www.site.com, site.com, com}} Moreover, it should be able to skip a given number of first (or last, if reversed) tokens: {{/usr/share/doc/somesoftware/INTERESTING/PART}} should give, with 4 tokens skipped: {{INTERESTING}} {{INTERESTING/PART}} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2500) TestSolrCoreProperties sometimes fails with no such core: core0
TestSolrCoreProperties sometimes fails with no such core: core0 - Key: SOLR-2500 URL: https://issues.apache.org/jira/browse/SOLR-2500 Project: Solr Issue Type: Bug Affects Versions: 4.0 Reporter: Robert Muir [junit] Testsuite: org.apache.solr.client.solrj.embedded.TestSolrProperties [junit] Testcase: testProperties(org.apache.solr.client.solrj.embedded.TestSolrProperties): Caused an ERROR [junit] No such core: core0 [junit] org.apache.solr.common.SolrException: No such core: core0 [junit] at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:118) [junit] at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105) [junit] at org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed
[ https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029393#comment-13029393 ] Robert Muir commented on LUCENE-3071: - Hi Olivier, thanks for uploading the log. This test fails for me sometimes too; somehow we should get to the bottom of it. I opened an issue: SOLR-2500 As a workaround, perhaps using 'ant clean test' will help... I fought with this test a little bit the other day and somehow 'clean' seemed to temporarily get the test passing... PathHierarchyTokenizer adaptation for urls: splits reversed --- Key: LUCENE-3071 URL: https://issues.apache.org/jira/browse/LUCENE-3071 Project: Lucene - Java Issue Type: New Feature Components: contrib/analyzers Reporter: Olivier Favre Priority: Minor Attachments: LUCENE-3071.patch, ant.log.tar.bz2 Original Estimate: 2h Remaining Estimate: 2h {{PathHierarchyTokenizer}} should be usable to split URLs in a reversed way (useful for faceted search against URLs): {{www.site.com}} -> {{www.site.com, site.com, com}} Moreover, it should be able to skip a given number of first (or last, if reversed) tokens: {{/usr/share/doc/somesoftware/INTERESTING/PART}} should give, with 4 tokens skipped: {{INTERESTING}} {{INTERESTING/PART}} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2904) non-contiguous LogMergePolicy should be careful to not select merges already running
[ https://issues.apache.org/jira/browse/LUCENE-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029403#comment-13029403 ] Earwin Burrfoot commented on LUCENE-2904: - I think we should simply change the API for MergePolicy. Instead of SegmentInfos it should accept a Set<SegmentInfo> with SIs eligible for merging (e.g., completely written and not elected for another merge). IW.getMergingSegments() is a damn cheat, and Expert notice is not an excuse! :) Why should each and every MP do the set subtraction when IW can do it for them once and for all? non-contiguous LogMergePolicy should be careful to not select merges already running Key: LUCENE-2904 URL: https://issues.apache.org/jira/browse/LUCENE-2904 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-2904.patch Now that LogMP can do non-contiguous merges, the fact that it disregards which segments are already being merged is more problematic since it could result in it returning conflicting merges and thus failing to run multiple merges concurrently. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
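The set subtraction mentioned in the comment above can be sketched as follows. Segments are represented by plain name strings, and the class and method names are hypothetical; this is not the MergePolicy or IndexWriter API, only an illustration of computing the eligible set once, on the caller's side, before handing it to a merge policy.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration: eligible = all segments minus segments
// that are already part of a running merge.
final class EligibleSegmentsSketch {

    static Set<String> eligible(Set<String> allSegments, Set<String> mergingSegments) {
        // Copy first so the caller's set is left untouched.
        Set<String> result = new LinkedHashSet<>(allSegments);
        result.removeAll(mergingSegments);
        return result;
    }

    public static void main(String[] args) {
        Set<String> all = new LinkedHashSet<>(Arrays.asList("_0", "_1", "_2", "_3"));
        Set<String> merging = new LinkedHashSet<>(Arrays.asList("_1", "_2"));
        // prints [_0, _3]
        System.out.println(eligible(all, merging));
    }
}
```

A policy that only ever sees the eligible set cannot select a segment that another merge already holds, which is the conflict the issue describes.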
[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3065: -- Attachment: SOLR-2497.patch MoreLikeThis problem solved, it was as I said. The test included a TrieInt field in the similarity fields, so it was used to calculate similarity. Because the TrieField was invisible to MLT in previous Solr versions, this had no effect there. By the way: there is a commented-out part that explicitly names the MLT field, but I don't understand it. It seems that it was never understood/supported. Now, all numeric fields should work with MLT. Now only the TestDistributedSearch is still failing with a strange date failure. I'll dig. NumericField should be stored in binary format in index (matching Solr's format) Key: LUCENE-3065 URL: https://issues.apache.org/jira/browse/LUCENE-3065 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, SOLR-2497.patch (Spinoff of LUCENE-3001) Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an ordinary Field and your number has turned into a string. See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972 We have spare bits already in stored fields, so we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format. A nice side-effect is that we fix the long-standing issue that you don't get a NumericField back when loading your document. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
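The idea in the LUCENE-3065 description, a spare flag bit that marks a stored field as numeric plus a fixed-width binary value instead of a string, can be sketched roughly as below. The flag value, class name and method names are hypothetical, and the 4-byte big-endian layout is an assumption made for illustration; it is not claimed to be the stored-fields format of Lucene or Solr.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of "flag bit + binary encoding" for a stored int.
final class StoredNumericSketch {

    // Assumed spare bit in a per-field flags byte (value chosen arbitrarily).
    static final int FLAG_NUMERIC_INT = 1 << 3;

    // Fixed-width binary form: always 4 bytes, big-endian.
    static byte[] encodeInt(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    static int decodeInt(byte[] stored) {
        return ByteBuffer.wrap(stored).getInt();
    }

    // The reader checks the flag to decide whether to hand back a number
    // or an ordinary string value.
    static boolean isNumeric(int flags) {
        return (flags & FLAG_NUMERIC_INT) != 0;
    }

    public static void main(String[] args) {
        int value = 1234567890;
        byte[] asText = Integer.toString(value).getBytes(StandardCharsets.UTF_8);
        byte[] asBinary = encodeInt(value);
        // prints 10 bytes as text, 4 bytes as binary
        System.out.println(asText.length + " bytes as text, " + asBinary.length + " bytes as binary");
        // The round trip returns the number itself rather than a string.
        System.out.println(decodeInt(asBinary) == value);
    }
}
```

Without the flag the reader has no way to know that the stored bytes are a number, which is why the description pairs the binary encoding with a marker bit.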
[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3065: -- Attachment: (was: SOLR-2497.patch) NumericField should be stored in binary format in index (matching Solr's format) Key: LUCENE-3065 URL: https://issues.apache.org/jira/browse/LUCENE-3065 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch (Spinoff of LUCENE-3001) Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an ordinary Field and your number has turned into a string. See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972 We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format. A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3065: -- Comment: was deleted (was: Ideally this could be done with the schema-like approach of one of the GSoC projects? We already discussed about that: We can use the FieldsReader/FieldsWriter type flag (which currently says, binary/text and compressed (unused now)) in the index file format to mark a field as NumericField. In that case, Document.getField() would return the NumericField instance. For Lucene backwards we should still support creating text-only fields. The new binary format would also be compatible with solr, as on getField, Solr would get a NumericField and can decide using instanceof what to do. Old Solr indexes without the NumericField marker flag would return as byte[], in which case, solr would do the decoding. For storing on index side, Solr could move to NumericField completely (I dont like the current approach using NumericTokenStream and to/fromInternal wrappers around conventional Field).) NumericField should be stored in binary format in index (matching Solr's format) Key: LUCENE-3065 URL: https://issues.apache.org/jira/browse/LUCENE-3065 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch (Spinoff of LUCENE-3001) Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an ordinary Field and your number has turned into a string. 
See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972 We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format. A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3065: -- Comment: was deleted (was: Patch against 3.x. I moved the to/from byte[] methods from Solr's TrieField into Lucene's NumericUtils, and fixed FieldsWriter/Reader to use free bits in the field's flags to know if the field is Numeric, and which type. I added a random test case to verify we now get the right NumericField back, when we stored NumericField during indexing. Old indices are handled fine (you'll get a String-ified Field back like you did before). Spookily, nothing failed in Solr... I assume there's somewhere in Solr that must now be fixed to handle the fact that a field can come back as NumericField? Anyone know where...?) NumericField should be stored in binary format in index (matching Solr's format) Key: LUCENE-3065 URL: https://issues.apache.org/jira/browse/LUCENE-3065 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch (Spinoff of LUCENE-3001) Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an ordinary Field and your number has turned into a string. See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972 We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format. A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document. -- This message is automatically generated by JIRA. 
[jira] [Updated] (SOLR-2497) Move Solr to new NumericField stored field impl of LUCENE-3065
[ https://issues.apache.org/jira/browse/SOLR-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-2497: Attachment: SOLR-2497.patch MoreLikeThis problem solved, it was as I said. The test included a TrieInt field in the similarity fields, so it was used to calculate similarity. Since the TrieField was invisible to MLT in previous Solr versions, this had no effect there. By the way: there is a commented-out part that explicitly lists the MLT field, but I don't understand it. It seems that it was never understood/supported. Now all numeric fields should work with MLT. Only TestDistributedSearch is still failing, with a strange date failure. I'll dig. Move Solr to new NumericField stored field impl of LUCENE-3065 -- Key: SOLR-2497 URL: https://issues.apache.org/jira/browse/SOLR-2497 Project: Solr Issue Type: Improvement Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.2, 4.0 Attachments: SOLR-2497.patch, SOLR-2497.patch This implements the changes to NumericField (LUCENE-3065) in Solr. TrieField & Co would use NumericField for indexing and reading stored fields. To enable this, some missing changes in Solr's internals (Field -> Fieldable) need to be done. Some backwards-compatible stored-field parsing is also needed to read pre-3.2 indexes without reindexing (as the format changed a little bit and Document.getFieldable returns NumericField instances now).
[jira] [Commented] (LUCENE-2904) non-contiguous LogMergePolicy should be careful to not select merges already running
[ https://issues.apache.org/jira/browse/LUCENE-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029408#comment-13029408 ] Earwin Burrfoot commented on LUCENE-2904: - Ok, I'm wrong. We need both a list of all SIs and eligible SIs for calculations. But that should be handled through API change, not a new public method on IW. non-contiguous LogMergePolicy should be careful to not select merges already running Key: LUCENE-2904 URL: https://issues.apache.org/jira/browse/LUCENE-2904 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-2904.patch Now that LogMP can do non-contiguous merges, the fact that it disregards which segments are already being merged is more problematic since it could result in it returning conflicting merges and thus failing to run multiple merges concurrently. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
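The point made in this comment, that a merge policy needs both the full segment list and the subset not already being merged, can be illustrated with a minimal sketch. Everything here is hypothetical: `Segment`, `EligibleSegments`, and the method names are stand-ins invented for illustration and are not Lucene's `SegmentInfo`, `LogMergePolicy`, or `IndexWriter` API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a segment; not Lucene's SegmentInfo. */
final class Segment {
    final String name;
    final long sizeBytes;

    Segment(String name, long sizeBytes) {
        this.name = name;
        this.sizeBytes = sizeBytes;
    }
}

final class EligibleSegments {
    /** Returns the segments not claimed by a running merge, preserving index order. */
    static List<Segment> eligible(List<Segment> all, Set<String> merging) {
        List<Segment> result = new ArrayList<>();
        for (Segment s : all) {
            if (!merging.contains(s.name)) {
                result.add(s);
            }
        }
        return result;
    }

    /** Size calculations still use ALL segments, including those being merged. */
    static long totalSize(List<Segment> all) {
        long total = 0;
        for (Segment s : all) {
            total += s.sizeBytes;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Segment> all = Arrays.asList(
                new Segment("_0", 100), new Segment("_1", 90), new Segment("_2", 10));
        Set<String> merging = new HashSet<>(Arrays.asList("_1"));
        System.out.println("total bytes over all segments: " + totalSize(all));
        for (Segment s : eligible(all, merging)) {
            System.out.println("candidate for a new merge: " + s.name);
        }
    }
}
```

Passing both collections into the policy, rather than having the policy call back into the writer, is one way to read the "API change, not a new public method on IW" suggestion; the source does not say which shape was eventually chosen.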
RE: [jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
Sorry, I did not want to delete this one; my super duper browser got totally confused and disturbed... - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Uwe Schindler (JIRA) [mailto:j...@apache.org] Sent: Thursday, May 05, 2011 6:13 PM To: dev@lucene.apache.org Subject: [jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format) [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3065: -- Comment: was deleted (was: Ideally this could be done with the schema-like approach of one of the GSoC projects? We already discussed that: We can use the FieldsReader/FieldsWriter type flag (which currently says binary/text and compressed (unused now)) in the index file format to mark a field as NumericField. In that case, Document.getField() would return the NumericField instance. For Lucene backwards compatibility we should still support creating text-only fields. The new binary format would also be compatible with Solr, as on getField, Solr would get a NumericField and can decide using instanceof what to do. Old Solr indexes without the NumericField marker flag would return as byte[], in which case Solr would do the decoding. For storing on the index side, Solr could move to NumericField completely (I don't like the current approach using NumericTokenStream and to/fromInternal wrappers around a conventional Field).) 
NumericField should be stored in binary format in index (matching Solr's format) Key: LUCENE-3065 URL: https://issues.apache.org/jira/browse/LUCENE-3065 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch (Spinoff of LUCENE-3001) Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an ordinary Field and your number has turned into a string. See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972 We have spare bits already in stored fields, so we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format. A nice side effect is that we fix the long-standing issue that you don't get a NumericField back when loading your document.
[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029410#comment-13029410 ] Uwe Schindler commented on LUCENE-3065: --- Sorry my browser or JIRA deleted wrong comments, so I removed one from me and one from Mike :( - Sorry. NumericField should be stored in binary format in index (matching Solr's format) Key: LUCENE-3065 URL: https://issues.apache.org/jira/browse/LUCENE-3065 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch (Spinoff of LUCENE-3001) Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an ordinary Field and your number has turned into a string. See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972 We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format. A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3065: -- Comment: was deleted (was: MoreLikeThis problem solved, it was as I said. The test included a TrieInt field into the similarity fields, so it was used to calculate similarity. As with previous Solr the TrieField was invisible to MLT this had no effect. By the way: There is a commented out part with explicitely the MLT field, but I dont understand it. It seems that it was never understood/supported. Now, all numeric fields should work with MLT. Now only the TestDistributedSearch is still failing with a strange date failure. I'll dig.) NumericField should be stored in binary format in index (matching Solr's format) Key: LUCENE-3065 URL: https://issues.apache.org/jira/browse/LUCENE-3065 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch (Spinoff of LUCENE-3001) Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an ordinary Field and your number has turned into a string. See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972 We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format. A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document. -- This message is automatically generated by JIRA. 
[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029412#comment-13029412 ] Uwe Schindler commented on LUCENE-3065: --- Patch against 3.x. I moved the to/from byte[] methods from Solr's TrieField into Lucene's NumericUtils, and fixed FieldsWriter/Reader to use free bits in the field's flags to know if the field is Numeric, and which type. I added a random test case to verify we now get the right NumericField back, when we stored NumericField during indexing. Old indices are handled fine (you'll get a String-ified Field back like you did before). Spookily, nothing failed in Solr... I assume there's somewhere in Solr that must now be fixed to handle the fact that a field can come back as NumericField? Anyone know where...?) NumericField should be stored in binary format in index (matching Solr's format) Key: LUCENE-3065 URL: https://issues.apache.org/jira/browse/LUCENE-3065 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch (Spinoff of LUCENE-3001) Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an ordinary Field and your number has turned into a string. See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972 We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format. A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document. -- This message is automatically generated by JIRA. 
[jira] [Issue Comment Edited] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029412#comment-13029412 ] Uwe Schindler edited comment on LUCENE-3065 at 5/5/11 4:22 PM: --- Revert of deletion of Mike's first comment (sorry) {quote} Patch against 3.x. I moved the to/from byte[] methods from Solr's TrieField into Lucene's NumericUtils, and fixed FieldsWriter/Reader to use free bits in the field's flags to know if the field is Numeric, and which type. I added a random test case to verify we now get the right NumericField back, when we stored NumericField during indexing. Old indices are handled fine (you'll get a String-ified Field back like you did before). Spookily, nothing failed in Solr... I assume there's somewhere in Solr that must now be fixed to handle the fact that a field can come back as NumericField? Anyone know where...?) {quote} was (Author: thetaphi): Patch against 3.x. I moved the to/from byte[] methods from Solr's TrieField into Lucene's NumericUtils, and fixed FieldsWriter/Reader to use free bits in the field's flags to know if the field is Numeric, and which type. I added a random test case to verify we now get the right NumericField back, when we stored NumericField during indexing. Old indices are handled fine (you'll get a String-ified Field back like you did before). Spookily, nothing failed in Solr... I assume there's somewhere in Solr that must now be fixed to handle the fact that a field can come back as NumericField? Anyone know where...?) 
NumericField should be stored in binary format in index (matching Solr's format) Key: LUCENE-3065 URL: https://issues.apache.org/jira/browse/LUCENE-3065 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch (Spinoff of LUCENE-3001) Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an ordinary Field and your number has turned into a string. See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972 We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format. A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
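The idea described in the patch comment (spare flag bits record that a stored field is numeric and of which type, the value itself is written as raw bytes, and old entries without the flag still come back as strings) can be sketched roughly as follows. This is only an illustration under stated assumptions: the flag values, class name, and method names are hypothetical and do not reproduce the actual FieldsWriter/FieldsReader flag layout or the NumericUtils methods from the patch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class NumericStoredFieldSketch {
    // Hypothetical flag bits; the real index format's bit assignments are not shown in the thread.
    static final int FLAG_NUMERIC_INT = 1 << 3;
    static final int FLAG_NUMERIC_LONG = 1 << 4;

    /** Encodes an int as 4 big-endian bytes. */
    static byte[] encodeInt(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    /** Encodes a long as 8 big-endian bytes. */
    static byte[] encodeLong(long value) {
        return ByteBuffer.allocate(8).putLong(value).array();
    }

    /**
     * Reader side: the flags decide what the caller gets back.
     * Entries written without a numeric flag (as in an old index) are returned as text.
     */
    static Object read(int flags, byte[] payload) {
        if ((flags & FLAG_NUMERIC_INT) != 0) {
            return ByteBuffer.wrap(payload).getInt();
        }
        if ((flags & FLAG_NUMERIC_LONG) != 0) {
            return ByteBuffer.wrap(payload).getLong();
        }
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(read(FLAG_NUMERIC_INT, encodeInt(42)));
        System.out.println(read(FLAG_NUMERIC_LONG, encodeLong(1304612520000L)));
        // No numeric flag: the value comes back string-ified, as before the change.
        System.out.println(read(0, "42".getBytes(StandardCharsets.UTF_8)));
    }
}
```

A caller that receives `Object` here would branch on the runtime type, which mirrors the "decide using instanceof" remark made elsewhere in the thread.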
[jira] [Commented] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed
[ https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029413#comment-13029413 ] Olivier Favre commented on LUCENE-3071: --- {{ant clean test}} did it for me, thanks! As for the failing tests, they fail because of the {{finalOffset}} that I set to {{path.length()}}. I'm not sure whether I should use {{path.length()}}, as my tokens don't go up to there when using the reverse mode. When I take a look at the {{end()}} function, I think that I should set it to the end of the string, but I can't see that in the javadoc. If the purpose of the {{finalOffset}} parameter in {{assertTokenStreamContents()}} is to check the {{endOffset}} of the last term, then I should not use {{path.length()}} blindly when using reverse and skip. Can you help me with the purpose of {{finalOffset}}? Or can I simply skip it in my tests (they pass if I skip it)? Thanks PathHierarchyTokenizer adaptation for urls: splits reversed --- Key: LUCENE-3071 URL: https://issues.apache.org/jira/browse/LUCENE-3071 Project: Lucene - Java Issue Type: New Feature Components: contrib/analyzers Reporter: Olivier Favre Priority: Minor Attachments: LUCENE-3071.patch, ant.log.tar.bz2 Original Estimate: 2h Remaining Estimate: 2h {{PathHierarchyTokenizer}} should be usable to split urls in a reversed way (useful for faceted search against urls): {{www.site.com}} -> {{www.site.com, site.com, com}} Moreover, it should be able to skip a given number of first (or last, if reversed) tokens: {{/usr/share/doc/somesoftware/INTERESTING/PART}} should give, with 4 tokens skipped: {{INTERESTING}} {{INTERESTING/PART}}
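The two behaviours requested in the issue description (reversed splitting and skipping leading or trailing components) can be sketched on plain strings. This is a hypothetical helper written for illustration, not the PathHierarchyTokenizer code from the attached patch; it works on component lists rather than a token stream, ignores offsets entirely, and, as a simplifying assumption, drops empty components such as the one before a leading delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class ReversePathSplit {
    /**
     * Forward mode emits growing prefixes of the components; reverse mode emits
     * shrinking suffixes. 'skip' drops that many components from the start
     * (forward) or from the end (reverse) before any token is emitted.
     */
    static List<String> split(String input, char delimiter, boolean reverse, int skip) {
        if (skip < 0) {
            throw new IllegalArgumentException("skip must be >= 0");
        }
        String d = String.valueOf(delimiter);
        List<String> parts = new ArrayList<>();
        for (String p : input.split(Pattern.quote(d))) {
            if (!p.isEmpty()) {
                parts.add(p); // assumption: empty components are ignored
            }
        }
        List<String> tokens = new ArrayList<>();
        int n = parts.size();
        if (skip >= n) {
            return tokens; // everything was skipped
        }
        List<String> kept = reverse ? parts.subList(0, n - skip) : parts.subList(skip, n);
        for (int i = 0; i < kept.size(); i++) {
            tokens.add(reverse
                    ? String.join(d, kept.subList(i, kept.size()))
                    : String.join(d, kept.subList(0, i + 1)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("www.site.com", '.', true, 0));
        System.out.println(split("/usr/share/doc/somesoftware/INTERESTING/PART", '/', false, 4));
    }
}
```

With the inputs from the issue description, the first call yields the three host suffixes and the second yields `INTERESTING` followed by `INTERESTING/PART`.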
[jira] [Commented] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed
[ https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029416#comment-13029416 ] Robert Muir commented on LUCENE-3071: - bq. Can you help me with the purpose of finalOffset? Or can I simply skip it in my tests (they are working if I skip it)? The finalOffset is supposed to be the offset of the entire document; this is useful so that offsets are correct on multivalued fields. Example: a multivalued field foo with two values, "bar " (this one ends with a space) and "baz". With a whitespace tokenizer, value 1 will have a single token bar with startOffset=0, endOffset=3. But finalOffset needs to be 4 (essentially however many chars you read in from the Reader). This way, the offsets will then accumulate correctly for baz. PathHierarchyTokenizer adaptation for urls: splits reversed --- Key: LUCENE-3071 URL: https://issues.apache.org/jira/browse/LUCENE-3071 Project: Lucene - Java Issue Type: New Feature Components: contrib/analyzers Reporter: Olivier Favre Priority: Minor Attachments: LUCENE-3071.patch, ant.log.tar.bz2 Original Estimate: 2h Remaining Estimate: 2h {{PathHierarchyTokenizer}} should be usable to split urls in a reversed way (useful for faceted search against urls): {{www.site.com}} -> {{www.site.com, site.com, com}} Moreover, it should be able to skip a given number of first (or last, if reversed) tokens: {{/usr/share/doc/somesoftware/INTERESTING/PART}} should give, with 4 tokens skipped: {{INTERESTING}} {{INTERESTING/PART}}
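The "bar " / "baz" example in this reply can be worked through in a small standalone sketch. It does not use Lucene's TokenStream or OffsetAttribute classes; `OffsetAccumulation`, `Token`, and the `base` parameter are hypothetical names, and the sketch assumes no additional offset gap is inserted between values (an analyzer may add one).

```java
import java.util.ArrayList;
import java.util.List;

final class OffsetAccumulation {
    /** Hypothetical token record: term text plus absolute start/end offsets. */
    static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Whitespace-tokenizes one field value, shifting all offsets by 'base'. */
    static List<Token> tokenize(String value, int base) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        int n = value.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(value.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < n && !Character.isWhitespace(value.charAt(i))) {
                i++;
            }
            if (i > start) {
                out.add(new Token(value.substring(start, i), base + start, base + i));
            }
        }
        return out;
    }

    /** The final offset counts every char consumed, not just the end of the last token. */
    static int finalOffset(String value, int base) {
        return base + value.length();
    }

    public static void main(String[] args) {
        String[] values = {"bar ", "baz"};
        int base = 0;
        for (String value : values) {
            for (Token t : tokenize(value, base)) {
                System.out.println(t.term + " start=" + t.start + " end=" + t.end);
            }
            base = finalOffset(value, base); // the next value continues from here
        }
    }
}
```

If the first value reported 3 (the end of its last token) instead of 4, the token from the second value would be placed one character too early, which is the failure mode the reply is warning about.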
[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029421#comment-13029421 ] Earwin Burrfoot commented on LUCENE-3065: - It's sad NumericFields are hardbaked into index format. Eg - I have some fields that are similar to Numeric in that they are 'stringified' binary structures, and they can't become first-class in the same manner as Numeric. NumericField should be stored in binary format in index (matching Solr's format) Key: LUCENE-3065 URL: https://issues.apache.org/jira/browse/LUCENE-3065 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch (Spinoff of LUCENE-3001) Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an ordinary Field and your number has turned into a string. See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972 We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format. A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029427#comment-13029427 ] Uwe Schindler commented on LUCENE-3065: --- Earwin: The long-term plan for flexible indexing is to make also stored fields flexible. For now its not possible, so NumericFields are handled separately. In the future, this might be a stored fields codec. NumericField should be stored in binary format in index (matching Solr's format) Key: LUCENE-3065 URL: https://issues.apache.org/jira/browse/LUCENE-3065 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch (Spinoff of LUCENE-3001) Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an ordinary Field and your number has turned into a string. See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972 We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format. A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.
[ https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029426#comment-13029426 ] Jan Høydahl commented on SOLR-2493: --- +1 @Michael, agree on this. But instead of relying on a monolithic solrconfig.xml file or .yml file, isn't it better to redesign configuration around a more fine-grained path/node concept (like ZK nodes)? It doesn't feel quite right to store solrconfig.xml and schema.xml as a huge string in the SolrCloud ZK schema. It would be better to have stuff like /solr/configs/configA/general/abortOnConfigurationError=false as a separate config node. Likewise /solr/configs/configA/schema/types/text_en to define fieldType text_en. The config concept won't need to be bound to ZK either. There could be pluggable backend implementations, where one could read/write the existing XML formats. SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit. - Key: SOLR-2493 URL: https://issues.apache.org/jira/browse/SOLR-2493 Project: Solr Issue Type: Bug Components: search Affects Versions: 3.1 Reporter: Stephane Bailliez Assignee: Uwe Schindler Priority: Blocker Labels: core, parser, performance, request, solr Fix For: 3.1.1, 3.2, 4.0 Attachments: SOLR-2493-3.x.patch, SOLR-2493.patch I'm putting this as blocker as I think this is a serious issue that should be addressed asap with a release. With the current code this is nowhere near suitable for production use. For each instance created, SolrQueryParser calls getSchema().getSolrConfig().getLuceneVersion(luceneMatchVersion, Version.LUCENE_24) instead of using getSchema().getSolrConfig().luceneMatchVersion. This creates a massive performance hit. For each request, there are generally 3 query parsers created, and each of them will parse the XML node in the config, which involves creating an instance of XPath; behind the scenes the usual factory finder pattern kicks in within the XML parser and does a loadClass.
The stack is typically:
at org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:363)
at com.sun.org.apache.xml.internal.dtm.ObjectFactory.findProviderClass(ObjectFactory.java:506)
at com.sun.org.apache.xml.internal.dtm.ObjectFactory.lookUpFactoryClass(ObjectFactory.java:217)
at com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:131)
at com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:101)
at com.sun.org.apache.xml.internal.dtm.DTMManager.newInstance(DTMManager.java:135)
at com.sun.org.apache.xpath.internal.XPathContext.<init>(XPathContext.java:100)
at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:201)
at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
at org.apache.solr.core.Config.getNode(Config.java:230)
at org.apache.solr.core.Config.getVal(Config.java:256)
at org.apache.solr.core.Config.getLuceneVersion(Config.java:325)
at org.apache.solr.search.SolrQueryParser.<init>(SolrQueryParser.java:76)
at org.apache.solr.schema.IndexSchema.getSolrQueryParser(IndexSchema.java:277)
With the current 3.1 code, I do barely 250 qps with 16 concurrent users with a near empty index. Switching SolrQueryParser to use getSchema().getSolrConfig().luceneMatchVersion and doing a quick bench test, performance becomes reasonable, beyond 2000 qps.
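The fix described in the issue amounts to reading a value that was parsed once at configuration load time instead of re-evaluating the configuration on every parser construction. A minimal sketch of that pattern follows; `ConfigSketch`, `ParserSketch`, and the counting `parseVersion()` method are hypothetical stand-ins invented for illustration, and the string manipulation merely simulates the expensive XPath lookup rather than reproducing Solr's `Config` or `SolrQueryParser` code.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical config holder; the expensive parse happens once, in the constructor. */
final class ConfigSketch {
    final AtomicInteger parseCount = new AtomicInteger();
    private final String rawVersion;
    /** Cached result, analogous to reading a pre-parsed field instead of calling a getter that re-parses. */
    final String luceneMatchVersion;

    ConfigSketch(String rawVersion) {
        this.rawVersion = rawVersion;
        this.luceneMatchVersion = parseVersion();
    }

    /** Stands in for the costly lookup; counts how often it runs. */
    String parseVersion() {
        parseCount.incrementAndGet();
        return "LUCENE_" + rawVersion.replace(".", "");
    }
}

/** Hypothetical parser created several times per request. */
final class ParserSketch {
    final String version;

    ParserSketch(ConfigSketch config) {
        // Field read only: constructing a parser no longer triggers a parse.
        this.version = config.luceneMatchVersion;
    }

    public static void main(String[] args) {
        ConfigSketch config = new ConfigSketch("3.1");
        for (int i = 0; i < 1000; i++) {
            new ParserSketch(config);
        }
        System.out.println("parses performed: " + config.parseCount.get());
    }
}
```

The counter makes the difference observable: however many parsers are constructed, the simulated parse runs only at configuration load.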
[jira] [Commented] (LUCENE-1877) Use NativeFSLockFactory as default for new API (direct ctors & FSDir.open)
[ https://issues.apache.org/jira/browse/LUCENE-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029430#comment-13029430 ] Michael McCandless commented on LUCENE-1877: Uggh, sorry about that Greg. Somehow this obviously very important note was lost in this issue. Can you describe how you use NFS and Lucene? Is there a single machine writing to the NFS dir, or more than one? Use NativeFSLockFactory as default for new API (direct ctors & FSDir.open) -- Key: LUCENE-1877 URL: https://issues.apache.org/jira/browse/LUCENE-1877 Project: Lucene - Java Issue Type: Improvement Components: Javadocs Reporter: Mark Miller Assignee: Uwe Schindler Fix For: 2.9 Attachments: LUCENE-1877.patch, LUCENE-1877.patch, LUCENE-1877.patch, LUCENE-1877.patch A user requested we add a note in IndexWriter alerting the availability of NativeFSLockFactory (allowing you to avoid retaining locks on abnormal JVM exit). Seems reasonable to me - we want users to be able to easily stumble upon this class. The below code looks like a good spot to add a note - could also improve what's there a bit - opening an IndexWriter does not necessarily create a lock file - that would depend on the LockFactory used. {code} <p>Opening an <code>IndexWriter</code> creates a lock file for the directory in use. Trying to open another <code>IndexWriter</code> on the same directory will lead to a {@link LockObtainFailedException}. The {@link LockObtainFailedException} is also thrown if an IndexReader on the same directory is used to delete documents from the index.</p>{code} Anyone remember why NativeFSLockFactory is not the default over SimpleFSLockFactory?
Re: modularization discussion
+1 to Mike's proposal here. Each of these could easily be patches/issues. The top ones would probably be the basics, e.g., faceting and schemas.

As the easiest short-term solution for allowing other systems to use Solr or its features, it would be great if a 'committer' responded to SOLR-1431. It's assigned to someone, and they should respond. The issue should probably be unassigned or assigned to someone else.

Lucene is a great project that many people rely on. Refactoring Solr will help the project by allowing more people to do more things with Lucene. That's an overall 'good' thing for everyone.

Also, have we lost the ability to execute distributed queries in Lucene?

Taking a step back, I'd ask some of the owners of the projects mentioned why they do not simply submit patches directly to the Apache Lucene project, as opposed to starting their own external projects.

On Tue, May 3, 2011 at 9:49 AM, Michael McCandless luc...@mikemccandless.com wrote:

Isn't our end goal here a bunch of well factored search modules? Ie, fast forward a year or two and I think we should have modules like these:

* Faceting
* Highlighting
* Suggest (good patch is on LUCENE-2995)
* Schema
* Query impls
* Query parsers
* Analyzers (good progress here already, thanks Robert!), incl. factories/XML configuration (still need this)
* Database import (DIH)
* Web app
* Distribution/replication
* Doc set representations
* Collapse/grouping
* Caches
* Similarity/scoring impls (BM25, etc.)
* Codecs
* Joins
* Lucene core

In this future, much of this code came from what is now Solr and Lucene, but we should freely and aggressively poach from other projects when appropriate (and license/provenance is OK).

I keep seeing all these cool compressed int set projects popping up... surely these are useful for us. Solr poached a doc set impl from Nutch; probably there's other stuff to poach from Nutch, Mahout, etc.

Katta's doing something sweet with distribution/replication; let's poach and merge w/ Solr's approach. There are various facet impls out there (Bobo browse/Zoie; Toke's; Elastic Search); let's poach and merge with Solr's. Elastic Search has lots of cool stuff, too, under ASL2.

All these external open-source projects are fair game for poaching and refactoring into shared modules, along with what is now Solr and Lucene sources.

In this ideal future, Solr becomes the bundling and default/example configuration of the Web App and other modules, much like how the various Linux distros bundle different stuff together around the Linux kernel. And if you are an advanced app and don't need the webapp part, you can cherry pick the super duper modules you do need and directly embed them into your app.

Isn't this the future we are working towards?

Mike
http://blog.mikemccandless.com
[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.
[ https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029433#comment-13029433 ]

Michael McCandless commented on SOLR-2493:

Jan, I don't have any experience with ZooKeeper, but that sounds neat :)

SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.

Key: SOLR-2493
URL: https://issues.apache.org/jira/browse/SOLR-2493
Project: Solr
Issue Type: Bug
Components: search
Affects Versions: 3.1
Reporter: Stephane Bailliez
Assignee: Uwe Schindler
Priority: Blocker
Labels: core, parser, performance, request, solr
Fix For: 3.1.1, 3.2, 4.0
Attachments: SOLR-2493-3.x.patch, SOLR-2493.patch

I'm putting this as blocker as I think this is a serious issue that should be addressed asap with a release. With the current code this is nowhere near suitable for production use.

For each instance created, SolrQueryParser calls
getSchema().getSolrConfig().getLuceneVersion(luceneMatchVersion, Version.LUCENE_24)
instead of using
getSchema().getSolrConfig().luceneMatchVersion

This creates a massive performance hit. For each request, there are generally 3 query parsers created, and each of them will parse the xml node in config, which involves creating an instance of XPath. Behind the scenes the usual factory finder pattern kicks in within the xml parser and does a loadClass.

The stack is typically:
at org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:363)
at com.sun.org.apache.xml.internal.dtm.ObjectFactory.findProviderClass(ObjectFactory.java:506)
at com.sun.org.apache.xml.internal.dtm.ObjectFactory.lookUpFactoryClass(ObjectFactory.java:217)
at com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:131)
at com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:101)
at com.sun.org.apache.xml.internal.dtm.DTMManager.newInstance(DTMManager.java:135)
at com.sun.org.apache.xpath.internal.XPathContext.<init>(XPathContext.java:100)
at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:201)
at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
at org.apache.solr.core.Config.getNode(Config.java:230)
at org.apache.solr.core.Config.getVal(Config.java:256)
at org.apache.solr.core.Config.getLuceneVersion(Config.java:325)
at org.apache.solr.search.SolrQueryParser.<init>(SolrQueryParser.java:76)
at org.apache.solr.schema.IndexSchema.getSolrQueryParser(IndexSchema.java:277)

With the current 3.1 code, I do barely 250 qps with 16 concurrent users with a near empty index. Switching SolrQueryParser to use getSchema().getSolrConfig().luceneMatchVersion and doing a quick bench test, performance becomes reasonable, beyond 2000 qps.
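For readers following this issue, the difference between re-evaluating an XPath on every call and reading a value cached at load time can be sketched in plain Java. This is only an illustration using the JDK's own XPath API, not Solr's code: the class `ConfigSketch`, its `getVal` method, the `xpathEvaluations` counter, and the `/config/luceneMatchVersion` path are all assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical stand-in for a config object; not Solr's SolrConfig.
final class ConfigSketch {
    private final String xml;

    // Parsed once when the config is loaded, then read as a plain field.
    final String luceneMatchVersion;

    // Counts how often the expensive XPath path is taken.
    int xpathEvaluations = 0;

    ConfigSketch(String xml) {
        this.xml = xml;
        this.luceneMatchVersion = getVal("/config/luceneMatchVersion");
    }

    // The slow path: builds a new XPath (and its factory lookups) on every call.
    String getVal(String path) {
        xpathEvaluations++;
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(path, new InputSource(new StringReader(xml)))
                    .trim();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A caller that reads `luceneMatchVersion` per request pays for one XPath evaluation in total, while a caller that invokes `getVal(...)` per request pays for one each time, which is the pattern the stack trace above points at.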
[jira] [Commented] (LUCENE-2904) non-contiguous LogMergePolicy should be careful to not select merges already running
[ https://issues.apache.org/jira/browse/LUCENE-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029435#comment-13029435 ]

Michael McCandless commented on LUCENE-2904:

Earwin, that sounds great (changing current API instead of new IW method), I think? Can you open a new issue? Thanks.

non-contiguous LogMergePolicy should be careful to not select merges already running

Key: LUCENE-2904
URL: https://issues.apache.org/jira/browse/LUCENE-2904
Project: Lucene - Java
Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
Fix For: 3.2, 4.0
Attachments: LUCENE-2904.patch

Now that LogMP can do non-contiguous merges, the fact that it disregards which segments are already being merged is more problematic since it could result in it returning conflicting merges and thus failing to run multiple merges concurrently.
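As a rough illustration of the problem described in this issue, a merge selector can skip segments that a running merge already owns, so that two concurrently selected merges never overlap. The sketch below is plain Java and is not LogMergePolicy; `MergeSelectionSketch`, `selectMerge`, and the use of segment names as strings are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical selector; real merge policies also weigh segment sizes and levels.
final class MergeSelectionSketch {

    /**
     * Picks up to maxSegments segments in index order, ignoring any segment
     * that is already part of a running merge. Returns an empty list when
     * fewer than two free segments are available, since such a merge is pointless.
     */
    static List<String> selectMerge(List<String> segments,
                                    Set<String> alreadyMerging,
                                    int maxSegments) {
        List<String> picked = new ArrayList<>();
        if (maxSegments < 2) {
            return picked;
        }
        for (String segment : segments) {
            if (alreadyMerging.contains(segment)) {
                continue; // a non-contiguous policy may simply step over it
            }
            picked.add(segment);
            if (picked.size() == maxSegments) {
                break;
            }
        }
        if (picked.size() < 2) {
            picked.clear();
        }
        return picked;
    }
}
```

With segments `_0` to `_4` and `_1`, `_2` already merging, a request for three segments yields `_0`, `_3`, `_4`, which does not conflict with the running merge.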
[jira] [Updated] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed
[ https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olivier Favre updated LUCENE-3071:

Attachment: LUCENE-3071.patch

I fixed my code accordingly. Tests run fine now. Ready to ship?

PathHierarchyTokenizer adaptation for urls: splits reversed

Key: LUCENE-3071
URL: https://issues.apache.org/jira/browse/LUCENE-3071
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/analyzers
Reporter: Olivier Favre
Priority: Minor
Attachments: LUCENE-3071.patch, LUCENE-3071.patch, ant.log.tar.bz2
Original Estimate: 2h
Remaining Estimate: 2h

{{PathHierarchyTokenizer}} should be usable to split urls in a reversed way (useful for faceted search against urls):
{{www.site.com}} -> {{www.site.com, site.com, com}}

Moreover, it should be able to skip a given number of first (or last, if reversed) tokens:
{{/usr/share/doc/somesoftware/INTERESTING/PART}}
Should give with 4 tokens skipped:
{{INTERESTING}}
{{INTERESTING/PART}}
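The two behaviours requested in this issue (reversed splitting and skipping leading or trailing components) can be illustrated with a small plain-Java sketch. It only reproduces the example outputs given in the issue description; it is not the attached patch and not a Lucene Tokenizer, and the names `PathHierarchySketch`, `forward`, and `reversed` are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that mirrors the examples in the issue text only.
final class PathHierarchySketch {

    // Splits on the delimiter and drops empty components (e.g. from a leading '/').
    static List<String> components(String input, char delimiter) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (c == delimiter) {
                if (current.length() > 0) {
                    parts.add(current.toString());
                }
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    // Forward mode: growing prefixes, after skipping the first `skip` components.
    static List<String> forward(String input, char delimiter, int skip) {
        List<String> parts = components(input, delimiter);
        List<String> tokens = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (int i = skip; i < parts.size(); i++) {
            if (prefix.length() > 0) {
                prefix.append(delimiter);
            }
            prefix.append(parts.get(i));
            tokens.add(prefix.toString());
        }
        return tokens;
    }

    // Reversed mode: shrinking suffixes, after skipping the last `skip` components.
    static List<String> reversed(String input, char delimiter, int skip) {
        List<String> parts = components(input, delimiter);
        int end = Math.max(0, parts.size() - skip);
        List<String> tokens = new ArrayList<>();
        for (int start = 0; start < end; start++) {
            tokens.add(String.join(String.valueOf(delimiter), parts.subList(start, end)));
        }
        return tokens;
    }
}
```

For instance, `PathHierarchySketch.reversed("www.site.com", '.', 0)` yields the three tokens listed in the description, and `PathHierarchySketch.forward("/usr/share/doc/somesoftware/INTERESTING/PART", '/', 4)` yields `INTERESTING` and `INTERESTING/PART`.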
[jira] [Issue Comment Edited] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed
[ https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029444#comment-13029444 ]

Olivier Favre edited comment on LUCENE-3071 at 5/5/11 5:19 PM:

Fixed patch attached. Tests run fine now. Ready to ship?

was (Author: ofavre):
I fixed my code accordingly. Tests run fine now. Ready to ship?

PathHierarchyTokenizer adaptation for urls: splits reversed

Key: LUCENE-3071
URL: https://issues.apache.org/jira/browse/LUCENE-3071
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/analyzers
Reporter: Olivier Favre
Priority: Minor
Attachments: LUCENE-3071.patch, LUCENE-3071.patch, ant.log.tar.bz2
Original Estimate: 2h
Remaining Estimate: 2h

{{PathHierarchyTokenizer}} should be usable to split urls in a reversed way (useful for faceted search against urls):
{{www.site.com}} -> {{www.site.com, site.com, com}}

Moreover, it should be able to skip a given number of first (or last, if reversed) tokens:
{{/usr/share/doc/somesoftware/INTERESTING/PART}}
Should give with 4 tokens skipped:
{{INTERESTING}}
{{INTERESTING/PART}}
Re: Bug in boilerpipe 1.1.0 referenced from solr-cell
Andrew, you can get to the boilerpipe author's email address on http://code.google.com/p/boilerpipe/

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

From: Andrew Bisson andrew.bis...@gossinteractive.com
To: dev@lucene.apache.org
Sent: Wed, May 4, 2011 7:48:07 AM
Subject: Bug in boilerpipe 1.1.0 referenced from solr-cell

Solr-cell references boilerpipe 1.1.0 which contains a modified version of nekohtml 1.9.9. It seems that this version of nekohtml is broken in that it references the class LostText without including it. The unmodified release of nekohtml 1.9.9 does not reference or include this class and the latest release, 1.9.14, both references and includes it. As a result, our application has been broken because it independently uses nekohtml and is now finding a broken version of the jar.

How should I report this issue as it is not directly a bug in solr?

Andrew Le Couteur Bisson
Senior Software Engineer
GOSS Interactive
[jira] [Commented] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed
[ https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029471#comment-13029471 ]

Robert Muir commented on LUCENE-3071:

Hi Olivier, at a glance the patch looks really great to me, thanks!

I took a quick look (not in enough detail) but had these thoughts, neither of which I think are really mandatory for this feature, just ideas:

* Do you think it would be cleaner if we made it a separate tokenizer (e.g. ReversePath)? The main logic of the tokenizer seems to be completely split, depending on whether you are reversing or not.
* I think it's possible in the future we could simplify the way finalOffset is being tracked, such that we just accumulate it on every read(), and then correctOffset a single time in end(). (I don't think this has much to do with your patch, just looking at the code in general.)

PathHierarchyTokenizer adaptation for urls: splits reversed

Key: LUCENE-3071
URL: https://issues.apache.org/jira/browse/LUCENE-3071
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/analyzers
Reporter: Olivier Favre
Priority: Minor
Attachments: LUCENE-3071.patch, LUCENE-3071.patch, ant.log.tar.bz2
Original Estimate: 2h
Remaining Estimate: 2h

{{PathHierarchyTokenizer}} should be usable to split urls in a reversed way (useful for faceted search against urls):
{{www.site.com}} -> {{www.site.com, site.com, com}}

Moreover, it should be able to skip a given number of first (or last, if reversed) tokens:
{{/usr/share/doc/somesoftware/INTERESTING/PART}}
Should give with 4 tokens skipped:
{{INTERESTING}}
{{INTERESTING/PART}}
[jira] [Resolved] (LUCENE-2918) IndexWriter should prune 100% deleted segs even in the NRT case
[ https://issues.apache.org/jira/browse/LUCENE-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2918.

Resolution: Fixed

IndexWriter should prune 100% deleted segs even in the NRT case

Key: LUCENE-2918
URL: https://issues.apache.org/jira/browse/LUCENE-2918
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
Fix For: 3.2, 4.0
Attachments: LUCENE-2918.patch

We now prune 100% deleted segs on commit from IW or IR (LUCENE-2010), but this isn't quite aggressive enough, because in the NRT case you rarely call commit.

Instead, the moment we delete the last doc of a segment, it should be pruned from the in-memory segmentInfos. This way, if you open an NRT reader, or a merge kicks off, or commit is called, the 100% deleted segment is already gone.
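The idea of pruning at delete time rather than at commit time can be sketched with a toy in-memory segment list. This is plain Java for illustration only and does not reflect IndexWriter's internals; `SegmentPruneSketch`, `deleteOneDoc`, and the per-segment counters are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory bookkeeping; not Lucene's SegmentInfos.
final class SegmentPruneSketch {

    static final class Segment {
        final String name;
        final int docCount;
        int delCount;

        Segment(String name, int docCount) {
            this.name = name;
            this.docCount = docCount;
        }
    }

    private final List<Segment> segmentInfos = new ArrayList<>();

    void addSegment(String name, int docCount) {
        segmentInfos.add(new Segment(name, docCount));
    }

    /**
     * Marks one more document of the named segment as deleted. The segment is
     * dropped from the list the moment its last live document is deleted.
     * Returns true if the segment was pruned by this call.
     */
    boolean deleteOneDoc(String name) {
        for (int i = 0; i < segmentInfos.size(); i++) {
            Segment segment = segmentInfos.get(i);
            if (segment.name.equals(name) && segment.delCount < segment.docCount) {
                segment.delCount++;
                if (segment.delCount == segment.docCount) {
                    segmentInfos.remove(i); // prune immediately, no commit needed
                    return true;
                }
                return false;
            }
        }
        return false;
    }

    // What a reader or merge opened right now would see.
    List<String> liveSegmentNames() {
        List<String> names = new ArrayList<>();
        for (Segment segment : segmentInfos) {
            names.add(segment.name);
        }
        return names;
    }
}
```

Anything that snapshots `liveSegmentNames()` after the last delete, whether a reader, a merge, or a commit, never sees the fully deleted segment.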
Re: jira issues falling off the radar -- Next JIRA version
: We should definitely kill off Next ... i would suggest just removing it,
: and not bulk applying a new version (there is no requirement that issues
: have a version) ...

: Based on that, I think it would be irresponsible to just delete Next
: because any issues assigned to this version on the basis of that
: description (like SOLR-2191) is going to be dropped on the floor.

Of course you're right ... i was thinking about it from a "what's the minimum that must be done in order to eliminate a version" angle, but that doesn't mean it would leave those issues in a good state.

Doing a little more reading about Jira version management, I realized that Jira allows Versions to be merged.

I suggest we marge Next into 3.2 ...

http://confluence.atlassian.com/display/JIRA/Managing+Versions#ManagingVersions-Mergingmultipleversions

...objections?

-Hoss
Re: jira issues falling off the radar -- Next JIRA version
On Thu, May 5, 2011 at 3:24 PM, Chris Hostetter hossman_luc...@fucit.org wrote: I suggest we marge Next into 3.2 ... +1 Mike http://blog.mikemccandless.com
Re: jira issues falling off the radar -- Next JIRA version
Marge away ;-) On May 5, 2011, at 3:24 PM, Chris Hostetter wrote: I suggest we marge Next into 3.2 ... http://confluence.atlassian.com/display/JIRA/Managing+Versions#ManagingVersions-Mergingmultipleversions ...objections?
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 7757 - Failure
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/7757/

1 tests failed.
REGRESSION: org.apache.lucene.index.TestFlushByRamOrCountsPolicy.testHealthyness

Error Message:
flushingQueue: DWDQ: [ generation: 9 ] currentqueue: DWDQ: [ generation: 10 ] perThread queue: DWDQ: [ generation: 0 ] numDocsInRam: 3

Stack Trace:
junit.framework.AssertionFailedError: flushingQueue: DWDQ: [ generation: 9 ] currentqueue: DWDQ: [ generation: 10 ] perThread queue: DWDQ: [ generation: 0 ] numDocsInRam: 3
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189)
at org.apache.lucene.index.DocumentsWriterFlushControl.markForFullFlush(DocumentsWriterFlushControl.java:326)
at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:500)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2622)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2599)
at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1051)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1015)
at org.apache.lucene.index.TestFlushByRamOrCountsPolicy.testHealthyness(TestFlushByRamOrCountsPolicy.java:276)

Build Log (for compile errors):
[...truncated 3370 lines...]
Re: jira issues falling off the radar -- Next JIRA version
+1 - next should be nuked; the issues should simply be plopped into the next likely release and dealt with (done, moved, pushed) before release.

On May 5, 2011, at 3:24 PM, Chris Hostetter wrote:

: We should definitely kill off Next ... i would suggest just removing it,
: and not bulk applying a new version (there is no requirement that issues
: have a version) ...

: Based on that, I think it would be irresponsible to just delete Next
: because any issues assigned to this version on the basis of that
: description (like SOLR-2191) is going to be dropped on the floor.

Of course you're right ... i was thinking about it from a "what's the minimum that must be done in order to eliminate a version" angle, but that doesn't mean it would leave those issues in a good state.

Doing a little more reading about Jira version management, I realized that Jira allows Versions to be merged.

I suggest we marge Next into 3.2 ...

http://confluence.atlassian.com/display/JIRA/Managing+Versions#ManagingVersions-Mergingmultipleversions

...objections?

-Hoss

- Mark Miller
lucidimagination.com

Lucene/Solr User Conference May 25-26, San Francisco www.lucenerevolution.org
[jira] [Commented] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed
[ https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029533#comment-13029533 ]

Ryan McKinley commented on LUCENE-3071:

bq. do you think it would be cleaner if we made it a separate tokenizer?

I think it's a tossup -- keeping them together makes one less factory in solr (not much of an argument) and the other three parameters (delimiter, replacement, skip) are nice to keep consistent.

PathHierarchyTokenizer adaptation for urls: splits reversed

Key: LUCENE-3071
URL: https://issues.apache.org/jira/browse/LUCENE-3071
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/analyzers
Reporter: Olivier Favre
Assignee: Ryan McKinley
Priority: Minor
Attachments: LUCENE-3071.patch, LUCENE-3071.patch, LUCENE-3071.patch, ant.log.tar.bz2
Original Estimate: 2h
Remaining Estimate: 2h

{{PathHierarchyTokenizer}} should be usable to split urls in a reversed way (useful for faceted search against urls):
{{www.site.com}} -> {{www.site.com, site.com, com}}

Moreover, it should be able to skip a given number of first (or last, if reversed) tokens:
{{/usr/share/doc/somesoftware/INTERESTING/PART}}
Should give with 4 tokens skipped:
{{INTERESTING}}
{{INTERESTING/PART}}
[jira] [Updated] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed
[ https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated LUCENE-3071:

Attachment: LUCENE-3071.patch

Updated patch that includes a Solr factory.

Robert, if this looks OK to you, I will go ahead and commit.

PathHierarchyTokenizer adaptation for urls: splits reversed

Key: LUCENE-3071
URL: https://issues.apache.org/jira/browse/LUCENE-3071
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/analyzers
Reporter: Olivier Favre
Assignee: Ryan McKinley
Priority: Minor
Attachments: LUCENE-3071.patch, LUCENE-3071.patch, LUCENE-3071.patch, ant.log.tar.bz2
Original Estimate: 2h
Remaining Estimate: 2h

{{PathHierarchyTokenizer}} should be usable to split urls in a reversed way (useful for faceted search against urls):
{{www.site.com}} -> {{www.site.com, site.com, com}}

Moreover, it should be able to skip a given number of first (or last, if reversed) tokens:
{{/usr/share/doc/somesoftware/INTERESTING/PART}}
Should give with 4 tokens skipped:
{{INTERESTING}}
{{INTERESTING/PART}}
[jira] [Commented] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed
[ https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029540#comment-13029540 ]

Robert Muir commented on LUCENE-3071:

bq. keeping them together makes one less factory in solr (not much of an argument)

I don't understand this? You can still have one solr factory; if reverse=true it creates ReverseXXX...

PathHierarchyTokenizer adaptation for urls: splits reversed

Key: LUCENE-3071
URL: https://issues.apache.org/jira/browse/LUCENE-3071
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/analyzers
Reporter: Olivier Favre
Assignee: Ryan McKinley
Priority: Minor
Attachments: LUCENE-3071.patch, LUCENE-3071.patch, LUCENE-3071.patch, ant.log.tar.bz2
Original Estimate: 2h
Remaining Estimate: 2h

{{PathHierarchyTokenizer}} should be usable to split urls in a reversed way (useful for faceted search against urls):
{{www.site.com}} -> {{www.site.com, site.com, com}}

Moreover, it should be able to skip a given number of first (or last, if reversed) tokens:
{{/usr/share/doc/somesoftware/INTERESTING/PART}}
Should give with 4 tokens skipped:
{{INTERESTING}}
{{INTERESTING/PART}}
Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 7757 - Failure
The actual exception we are tripping here is

java.lang.RuntimeException: java.lang.AssertionError
[junit] at org.apache.lucene.index.TestFlushByRamOrCountsPolicy$IndexThread.run(TestFlushByRamOrCountsPolicy.java:328)
[junit] Caused by: java.lang.AssertionError
[junit] at org.apache.lucene.index.DocumentsWriterFlushControl.setFlushPending(DocumentsWriterFlushControl.java:169)
[junit] at org.apache.lucene.index.DocumentsWriterFlushControl.internalTryCheckOutForFlush(DocumentsWriterFlushControl.java:202)
[junit] at org.apache.lucene.index.DocumentsWriterFlushControl.markForFullFlush(DocumentsWriterFlushControl.java:333)
[junit] at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:500)
[junit] at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2622)
[junit] at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2599)
[junit] at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2465)
[junit] at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2538)
[junit] at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2520)
[junit] at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2504)
[junit] at org.apache.lucene.index.TestFlushByRamOrCountsPolicy$IndexThread.run(TestFlushByRamOrCountsPolicy.java:326)
[junit] *** Thread: Thread-106 ***

I will take care of it tomorrow...

On Thu, May 5, 2011 at 9:45 PM, Apache Jenkins Server hud...@hudson.apache.org wrote:

Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/7757/

1 tests failed.
REGRESSION: org.apache.lucene.index.TestFlushByRamOrCountsPolicy.testHealthyness

Error Message:
flushingQueue: DWDQ: [ generation: 9 ] currentqueue: DWDQ: [ generation: 10 ] perThread queue: DWDQ: [ generation: 0 ] numDocsInRam: 3

Stack Trace:
junit.framework.AssertionFailedError: flushingQueue: DWDQ: [ generation: 9 ] currentqueue: DWDQ: [ generation: 10 ] perThread queue: DWDQ: [ generation: 0 ] numDocsInRam: 3
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189)
at org.apache.lucene.index.DocumentsWriterFlushControl.markForFullFlush(DocumentsWriterFlushControl.java:326)
at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:500)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2622)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2599)
at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1051)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1015)
at org.apache.lucene.index.TestFlushByRamOrCountsPolicy.testHealthyness(TestFlushByRamOrCountsPolicy.java:276)

Build Log (for compile errors):
[...truncated 3370 lines...]
[jira] [Commented] (LUCENE-1877) Use NativeFSLockFactory as default for new API (direct ctors FSDir.open)
[ https://issues.apache.org/jira/browse/LUCENE-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029599#comment-13029599 ]

Greg Tarr commented on LUCENE-1877:

Instances of lucene run on machines with the indexes hosted remotely on a SAN with access through a fileserver.

We've now changed our implementation to SimpleFSLockFactory in the hope this will lead to the write.lock files behaving properly.

Use NativeFSLockFactory as default for new API (direct ctors FSDir.open)

Key: LUCENE-1877
URL: https://issues.apache.org/jira/browse/LUCENE-1877
Project: Lucene - Java
Issue Type: Improvement
Components: Javadocs
Reporter: Mark Miller
Assignee: Uwe Schindler
Fix For: 2.9
Attachments: LUCENE-1877.patch, LUCENE-1877.patch, LUCENE-1877.patch, LUCENE-1877.patch

A user requested we add a note in IndexWriter alerting the availability of NativeFSLockFactory (allowing you to avoid retaining locks on abnormal jvm exit). Seems reasonable to me - we want users to be able to easily stumble upon this class. The below code looks like a good spot to add a note - could also improve what's there a bit - opening an IndexWriter does not necessarily create a lock file - that would depend on the LockFactory used.

{code}
<p>Opening an <code>IndexWriter</code> creates a lock file for the directory in use. Trying to open another <code>IndexWriter</code> on the same directory will lead to a {@link LockObtainFailedException}. The {@link LockObtainFailedException} is also thrown if an IndexReader on the same directory is used to delete documents from the index.</p>
{code}

Anyone remember why NativeFSLockFactory is not the default over SimpleFSLockFactory?
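To make the trade-off in this thread concrete, here is a minimal sketch of the two general locking styles using only the JDK. It is an illustration, not Lucene's SimpleFSLockFactory or NativeFSLockFactory: `LockSketch`, `obtainMarkerLock`, and `obtainNativeLock` are hypothetical names. A marker-file lock is just a file whose existence means "locked", so it stays behind if the JVM exits abnormally; an OS-level lock held through `java.nio` is released when the JVM terminates.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper contrasting two locking styles; not Lucene code.
final class LockSketch {

    /**
     * Marker-file style: the lock is held if we managed to create the file.
     * Returns false when the file already exists, which also happens when a
     * crashed process left it behind.
     */
    static boolean obtainMarkerLock(Path lockFile) {
        try {
            Files.createFile(lockFile); // atomic create-if-absent
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * OS-level style: the lock lives in the operating system, not in the
     * file's existence. Returns null when another process holds the lock.
     */
    static FileLock obtainNativeLock(Path lockFile) {
        try {
            FileChannel channel = FileChannel.open(
                    lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            FileLock lock = channel.tryLock();
            if (lock == null) {
                channel.close();
            }
            return lock;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

How either style behaves on a remote filesystem such as the setup Greg describes depends on the file server and client, which this sketch does not model.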
[jira] [Commented] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed
[ https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029607#comment-13029607 ]

Hoss Man commented on LUCENE-3071:

bq. You can still have one solr factory, if reverse=true it creates ReverseXXX...

right ... if it makes the code cleaner to have two distinct Tokenizer impls, they can still share one factory.

PathHierarchyTokenizer adaptation for urls: splits reversed

Key: LUCENE-3071
URL: https://issues.apache.org/jira/browse/LUCENE-3071
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/analyzers
Reporter: Olivier Favre
Assignee: Ryan McKinley
Priority: Minor
Attachments: LUCENE-3071.patch, LUCENE-3071.patch, LUCENE-3071.patch, ant.log.tar.bz2
Original Estimate: 2h
Remaining Estimate: 2h

{{PathHierarchyTokenizer}} should be usable to split urls in a reversed way (useful for faceted search against urls):
{{www.site.com}} -> {{www.site.com, site.com, com}}

Moreover, it should be able to skip a given number of first (or last, if reversed) tokens:
{{/usr/share/doc/somesoftware/INTERESTING/PART}}
Should give with 4 tokens skipped:
{{INTERESTING}}
{{INTERESTING/PART}}
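The design point discussed in the last few comments, two separate implementations behind a single factory that switches on a reverse flag, might look roughly like the following. This is a plain-Java sketch with made-up types (`PathSplitter`, `ForwardPathSplitter`, `ReversePathSplitter`, `PathSplitterFactory`); it is not the Solr factory from the patch and it leaves out the replacement and skip parameters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical common interface for both splitting directions.
interface PathSplitter {
    List<String> split(String input);
}

// Emits growing prefixes: a, a/b, a/b/c.
final class ForwardPathSplitter implements PathSplitter {
    private final char delimiter;

    ForwardPathSplitter(char delimiter) {
        this.delimiter = delimiter;
    }

    @Override
    public List<String> split(String input) {
        List<String> out = new ArrayList<>();
        for (int i = 1; i < input.length(); i++) {
            if (input.charAt(i) == delimiter) {
                out.add(input.substring(0, i));
            }
        }
        if (!input.isEmpty()) {
            out.add(input);
        }
        return out;
    }
}

// Emits shrinking suffixes: www.site.com, site.com, com.
final class ReversePathSplitter implements PathSplitter {
    private final char delimiter;

    ReversePathSplitter(char delimiter) {
        this.delimiter = delimiter;
    }

    @Override
    public List<String> split(String input) {
        List<String> out = new ArrayList<>();
        if (!input.isEmpty()) {
            out.add(input);
        }
        for (int i = 0; i < input.length() - 1; i++) {
            if (input.charAt(i) == delimiter) {
                out.add(input.substring(i + 1));
            }
        }
        return out;
    }
}

// One factory, two implementations, selected by a flag.
final class PathSplitterFactory {
    static PathSplitter create(char delimiter, boolean reverse) {
        return reverse
                ? new ReversePathSplitter(delimiter)
                : new ForwardPathSplitter(delimiter);
    }
}
```

Callers only ever see `PathSplitterFactory.create(delimiter, reverse)`, so each direction keeps its own simple loop while the configuration surface stays in one place.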
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 7762 - Failure
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7762/

1 tests failed.
FAILED: org.apache.lucene.util.TestStringIntern.Monitor file (/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/build/backwards/test/1/junitvmwatcher8452078351423411177.properties) missing, location not writable, testcase not started or mixing ant versions?

Error Message:
Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.

Stack Trace:
junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
at java.lang.Thread.run(Thread.java:636)

Build Log (for compile errors):
[...truncated 53 lines...]
Re: Improvements to the maven build
I agree in principle, but again, I'll continue to use my own judgment ... This is always good policy! I was mostly reacting to sending a patch to the mailing list. ryan
[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3065: -- Attachment: LUCENE-3065.patch Updated patch with some improvements: - NumericField now lazily initializes the NumericTokenStream only when tokenStreamValue() is called for the first time. This speeds up stored fields reading, as the TokenStream is generally not needed in that case. - I currently don't like the instanceof chains in FieldsWriter and this lazy init code. Maybe NumericField and NumericTokenStream should define an enum type for the value so you can call NumericField.getValueType() - does anybody have a better idea? - Improved JavaDocs for NumericField to reflect the new stored fields format NumericField should be stored in binary format in index (matching Solr's format) Key: LUCENE-3065 URL: https://issues.apache.org/jira/browse/LUCENE-3065 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch (Spinoff of LUCENE-3001) Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an ordinary Field and your number has turned into a string. See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972 We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format. A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.
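The enum idea floated in the comment above could look roughly like the sketch below. Everything in it is hypothetical: NumericValueSketch, ValueType and encode() are illustrative names, not the classes in the patch, and the byte layout is a plain big-endian ByteBuffer encoding chosen for the example, not the actual stored-fields format. The point is only that the instanceof chain runs once, at construction, and every later consumer switches on the enum.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; not the NumericField class from LUCENE-3065.
final class NumericValueSketch {

    enum ValueType { INT, LONG, FLOAT, DOUBLE }

    private final Number value;
    private final ValueType type;

    NumericValueSketch(Number value) {
        this.value = value;
        // The only instanceof chain: resolved once, when the value is set.
        if (value instanceof Integer) {
            type = ValueType.INT;
        } else if (value instanceof Long) {
            type = ValueType.LONG;
        } else if (value instanceof Float) {
            type = ValueType.FLOAT;
        } else if (value instanceof Double) {
            type = ValueType.DOUBLE;
        } else {
            throw new IllegalArgumentException("unsupported numeric type: " + value);
        }
    }

    ValueType getValueType() {
        return type;
    }

    /** Illustrative fixed-width encoding; an assumption, not the real index format. */
    byte[] encode() {
        switch (type) {
            case INT:
                return ByteBuffer.allocate(4).putInt(value.intValue()).array();
            case LONG:
                return ByteBuffer.allocate(8).putLong(value.longValue()).array();
            case FLOAT:
                return ByteBuffer.allocate(4).putInt(Float.floatToIntBits(value.floatValue())).array();
            case DOUBLE:
                return ByteBuffer.allocate(8).putLong(Double.doubleToLongBits(value.doubleValue())).array();
            default:
                throw new AssertionError(type);
        }
    }

    public static void main(String[] args) {
        NumericValueSketch v = new NumericValueSketch(42L);
        System.out.println(v.getValueType() + " -> " + v.encode().length + " bytes");
    }
}
```

A writer such as FieldsWriter would then switch on getValueType() instead of repeating the instanceof tests.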
[jira] [Updated] (SOLR-2497) Move Solr to new NumericField stored field impl of LUCENE-3065
[ https://issues.apache.org/jira/browse/SOLR-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-2497: Attachment: SOLR-2497.patch Updated patch (some improvements in TrieField converter methods). Distributed numeric faceting (TestDistributedSearch) still fails for trie dates - I have no idea why! *I need help!* Move Solr to new NumericField stored field impl of LUCENE-3065 -- Key: SOLR-2497 URL: https://issues.apache.org/jira/browse/SOLR-2497 Project: Solr Issue Type: Improvement Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.2, 4.0 Attachments: SOLR-2497.patch, SOLR-2497.patch, SOLR-2497.patch This implements the changes to NumericField (LUCENE-3065) in Solr. TrieField & Co would use NumericField for indexing and reading stored fields. To enable this some missing changes in Solr's internals (Field -> Fieldable) need to be done. Also some backwards compatible stored fields parsing is needed to read pre-3.2 indexes without reindexing (as the format changed a little bit and Document.getFieldable returns NumericField instances now).
[jira] [Commented] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed
[ https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029645#comment-13029645 ] Robert Muir commented on LUCENE-3071: - this looks great! PathHierarchyTokenizer adaptation for urls: splits reversed --- Key: LUCENE-3071 URL: https://issues.apache.org/jira/browse/LUCENE-3071 Project: Lucene - Java Issue Type: New Feature Components: contrib/analyzers Reporter: Olivier Favre Assignee: Ryan McKinley Priority: Minor Attachments: LUCENE-3071.patch, LUCENE-3071.patch, LUCENE-3071.patch, LUCENE-3071.patch, ant.log.tar.bz2 Original Estimate: 2h Remaining Estimate: 2h {{PathHierarchyTokenizer}} should be usable to split urls in a reversed way (useful for faceted search against urls): {{www.site.com}} -> {{www.site.com, site.com, com}} Moreover, it should be able to skip a given number of first (or last, if reversed) tokens: {{/usr/share/doc/somesoftware/INTERESTING/PART}} Should give with 4 tokens skipped: {{INTERESTING}} {{INTERESTING/PART}}
[jira] [Resolved] (LUCENE-1076) Allow MergePolicy to select non-contiguous merges
[ https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1076. Resolution: Fixed Thanks Shai! Allow MergePolicy to select non-contiguous merges - Key: LUCENE-1076 URL: https://issues.apache.org/jira/browse/LUCENE-1076 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 2.3 Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-1076-3x.patch, LUCENE-1076.patch, LUCENE-1076.patch, LUCENE-1076.patch I started work on this but with LUCENE-1044 I won't make much progress on it for a while, so I want to checkpoint my current state/patch. For backwards compatibility we must leave the default MergePolicy as selecting contiguous merges. This is necessary because some applications rely on temporal monotonicity of doc IDs, which means even though merges can re-number documents, the renumbering will always reflect the order in which the documents were added to the index. Still, for those apps that do not rely on this, we should offer a MergePolicy that is free to select the best merges regardless of whether they are contiguous. This requires fixing IndexWriter to accept such a merge, and fixing LogMergePolicy to optionally allow it the freedom to do so.
[jira] [Resolved] (LUCENE-2966) SegmentReader.doCommit should be sync'd; norms methods need not be sync'd
[ https://issues.apache.org/jira/browse/LUCENE-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2966. Resolution: Fixed SegmentReader.doCommit should be sync'd; norms methods need not be sync'd - Key: LUCENE-2966 URL: https://issues.apache.org/jira/browse/LUCENE-2966 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.2, 4.0 Attachments: LUCENE-2966.patch I fixed the failure in TestNRTThreads, but in the process tripped an assert because SegmentReader.doCommit isn't sync'd. So I sync'd it, but I don't think the norms APIs need to be sync'd -- we populate norms up front and then never change them. Un-sync'ing them is important so that in the NRT case calling IW.commit doesn't block searches trying to pull norms. Also some small code refactoring.
[jira] [Resolved] (LUCENE-854) Create merge policy that doesn't periodically inadvertently optimize
[ https://issues.apache.org/jira/browse/LUCENE-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-854. --- Resolution: Fixed Create merge policy that doesn't periodically inadvertently optimize Key: LUCENE-854 URL: https://issues.apache.org/jira/browse/LUCENE-854 Project: Lucene - Java Issue Type: New Feature Components: Index Affects Versions: 2.2 Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-854.patch The current merge policy, at every maxBufferedDocs * power-of-mergeFactor docs added, will do a fully cascaded merge, which is the same as an optimize. I think this is not good because at that optimization point, the particular addDocument call is [surprisingly] very expensive. While, amortized over all addDocument calls, the cost is low, the cost is paid up front and in a very bunched up manner. I think of this as pay it forward: you are paying the full cost of an optimize right now on the expectation / hope that you will be adding a great many more docs. But, if you don't add that many more docs, then, the amortized cost for your index is in fact far higher than it should have been. Better to pay as you go instead. So we could make a small change to the policy by only merging the first mergeFactor segments once we hit 2X the merge factor. With mergeFactor=10, when we have created the 20th level 0 (just flushed) segment, we merge the first 10 into a level 1 segment. Then on creating another 10 level 0 segments, we merge the second set of 10 level 0 segments into a level 1 segment, etc. With this new merge policy, an index that's a bit bigger than a current optimization point would then have a lower amortized cost per document. Plus the merge cost is less bunched up and less pay it forward: instead you pay for what you are actually using. We can start by creating this merge policy (probably, combined with the by size not by doc count segment level computation from LUCENE-845) and then later decide whether we should make it the default merge policy.
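The 2X rule described above can be followed with a small counting simulation. This is only a reading of the description, not the LUCENE-854 patch: PayAsYouGoMergeSketch and simulate(...) are hypothetical names, segments are reduced to per-level counts, and applying the same 2X trigger at every level (so merges cascade upward) is an assumption that goes slightly beyond the level 0 example given in the issue.

```java
import java.util.Arrays;

// Hypothetical simulation of the "merge the first mergeFactor segments once
// a level holds 2 * mergeFactor" idea; not code from the LUCENE-854 patch.
final class PayAsYouGoMergeSketch {

    /**
     * Returns the number of segments at each level after `flushes` level-0
     * segments have been created. The top level never merges further.
     */
    static int[] simulate(int flushes, int mergeFactor, int maxLevels) {
        int[] perLevel = new int[maxLevels];
        for (int f = 0; f < flushes; f++) {
            perLevel[0]++; // a newly flushed level-0 segment
            for (int level = 0;
                 level + 1 < maxLevels && perLevel[level] >= 2 * mergeFactor;
                 level++) {
                // Merge the oldest mergeFactor segments into one segment of the next level.
                perLevel[level] -= mergeFactor;
                perLevel[level + 1]++;
            }
        }
        return perLevel;
    }

    public static void main(String[] args) {
        // After the 20th flush the first 10 level-0 segments are merged:
        // prints [10, 1, 0, 0]
        System.out.println(Arrays.toString(simulate(20, 10, 4)));
        // Ten flushes later the second set of 10 is merged:
        // prints [10, 2, 0, 0]
        System.out.println(Arrays.toString(simulate(30, 10, 4)));
    }
}
```

Each flush triggers at most one merge per level, which is the "less bunched up" cost profile the description argues for; the old policy would instead cascade through all levels at an optimization point.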
[jira] [Resolved] (LUCENE-3051) don't call SegmentInfo.sizeInBytes for the merging segments
[ https://issues.apache.org/jira/browse/LUCENE-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3051. Resolution: Fixed don't call SegmentInfo.sizeInBytes for the merging segments --- Key: LUCENE-3051 URL: https://issues.apache.org/jira/browse/LUCENE-3051 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3051.patch Selckin has been running Lucene's tests on the RT branch, and hit this: {noformat} [junit] Testsuite: org.apache.lucene.index.TestIndexWriter [junit] Testcase: testDeleteAllSlowly(org.apache.lucene.index.TestIndexWriter): FAILED [junit] Some threads threw uncaught exceptions! [junit] junit.framework.AssertionFailedError: Some threads threw uncaught exceptions! [junit] at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:535) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1246) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175) [junit] [junit] [junit] Tests run: 67, Failures: 1, Errors: 0, Time elapsed: 38.357 sec [junit] [junit] - Standard Error - [junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter -Dtestmethod=testDeleteAllSlowly -Dtests.seed=-4291771462012978364:4550117847390778918 [junit] The following exceptions were thrown by threads: [junit] *** Thread: Lucene Merge Thread #1 *** [junit] org.apache.lucene.index.MergePolicy$MergeException: java.io.FileNotFoundException: _4_1.del [junit] at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:507) [junit] at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:472) [junit] Caused by: java.io.FileNotFoundException: _4_1.del [junit] at 
org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:290) [junit] at org.apache.lucene.store.MockDirectoryWrapper.fileLength(MockDirectoryWrapper.java:549) [junit] at org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:287) [junit] at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3280) [junit] at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2956) [junit] at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:379) [junit] at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:447) [junit] NOTE: test params are: codec=RandomCodecProvider: {=SimpleText, f6=Pulsing(freqCutoff=15), f7=MockFixedIntBlock(blockSize=1606), f8=SimpleText, f9=MockSep, f1=MockVariableIntBlock(baseBlockSize=99), f0=MockFixedIntBlock(blockSize=1606), f3=Pulsing(freqCutoff=15), f2=MockSep, f5=SimpleText, f4=Standard, f=MockFixedIntBlock(blockSize=1606), c=MockSep, termVector=MockRandom, d9=MockFixedIntBlock(blockSize=1606), d8=Pulsing(freqCutoff=15), d5=SimpleText, d4=Standard, d7=MockRandom, d6=MockVariableIntBlock(baseBlockSize=99), d25=MockRandom, d0=MockRandom, c29=MockFixedIntBlock(blockSize=1606), d24=MockVariableIntBlock(baseBlockSize=99), d1=Standard, c28=Standard, d23=SimpleText, d2=MockFixedIntBlock(blockSize=1606), c27=MockRandom, d22=Standard, d3=MockVariableIntBlock(baseBlockSize=99), d21=Pulsing(freqCutoff=15), d20=MockSep, c22=MockFixedIntBlock(blockSize=1606), c21=Pulsing(freqCutoff=15), c20=MockRandom, d29=MockFixedIntBlock(blockSize=1606), c26=Standard, d28=Pulsing(freqCutoff=15), c25=MockRandom, d27=MockRandom, c24=MockSep, d26=MockVariableIntBlock(baseBlockSize=99), c23=SimpleText, e9=MockRandom, e8=MockSep, e7=SimpleText, e6=MockFixedIntBlock(blockSize=1606), e5=Pulsing(freqCutoff=15), c17=MockFixedIntBlock(blockSize=1606), e3=Standard, d12=MockVariableIntBlock(baseBlockSize=99), c16=Pulsing(freqCutoff=15), e4=SimpleText, 
d11=MockFixedIntBlock(blockSize=1606), c19=MockSep, e1=MockSep, d14=Pulsing(freqCutoff=15), c18=SimpleText, e2=Pulsing(freqCutoff=15), d13=MockSep, e0=MockVariableIntBlock(baseBlockSize=99), d10=Standard, d19=MockVariableIntBlock(baseBlockSize=99), c11=SimpleText, c10=Standard, d16=Pulsing(freqCutoff=15), c13=MockRandom, c12=MockVariableIntBlock(baseBlockSize=99), d15=MockSep, d18=SimpleText, c15=MockFixedIntBlock(blockSize=1606), d17=Standard, c14=Pulsing(freqCutoff=15), b3=MockSep, b2=SimpleText, b5=Standard, b4=MockRandom,
[jira] [Commented] (LUCENE-1877) Use NativeFSLockFactory as default for new API (direct ctors FSDir.open)
[ https://issues.apache.org/jira/browse/LUCENE-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029654#comment-13029654 ] Michael McCandless commented on LUCENE-1877: But multiple machines are able to write to the same index on the SAN? (And must therefore rely on write.lock to protect the index from two writers at once). What corruption are you seeing...? Use NativeFSLockFactory as default for new API (direct ctors FSDir.open) -- Key: LUCENE-1877 URL: https://issues.apache.org/jira/browse/LUCENE-1877 Project: Lucene - Java Issue Type: Improvement Components: Javadocs Reporter: Mark Miller Assignee: Uwe Schindler Fix For: 2.9 Attachments: LUCENE-1877.patch, LUCENE-1877.patch, LUCENE-1877.patch, LUCENE-1877.patch A user requested we add a note in IndexWriter alerting the availability of NativeFSLockFactory (allowing you to avoid retaining locks on abnormal jvm exit). Seems reasonable to me - we want users to be able to easily stumble upon this class. The below code looks like a good spot to add a note - could also improve what's there a bit - opening an IndexWriter does not necessarily create a lock file - that would depend on the LockFactory used. {code} <p>Opening an <code>IndexWriter</code> creates a lock file for the directory in use. Trying to open another <code>IndexWriter</code> on the same directory will lead to a {@link LockObtainFailedException}. The {@link LockObtainFailedException} is also thrown if an IndexReader on the same directory is used to delete documents from the index.</p> {code} Anyone remember why NativeFSLockFactory is not the default over SimpleFSLockFactory?
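For readers unfamiliar with what an OS-level ("native") lock buys you, here is a small sketch using only the JDK's java.nio file locking. It does not use or reproduce NativeFSLockFactory; NativeLockSketch and secondWriterIsRefused() are hypothetical names, and the temporary file merely stands in for write.lock. The relevant property is that the operating system owns the lock, so it goes away when the process dies, and a second attempt on the same file is refused while the first lock is held.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration with plain java.nio; not Lucene's LockFactory API.
final class NativeLockSketch {

    /** Returns true if a second lock attempt is refused while the first lock is held. */
    static boolean secondWriterIsRefused() {
        try {
            Path lockFile = Files.createTempFile("write", ".lock"); // stand-in for write.lock
            try (FileChannel first = FileChannel.open(lockFile, StandardOpenOption.WRITE);
                 FileLock held = first.tryLock()) {
                if (held == null) {
                    return false; // another process already holds the lock
                }
                try (FileChannel second = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
                    FileLock other = second.tryLock();
                    if (other != null) {
                        other.release();
                    }
                    return false; // the second writer was not stopped
                } catch (OverlappingFileLockException e) {
                    // Within one JVM an overlapping request is rejected with this exception.
                    return true;
                }
            } finally {
                Files.deleteIfExists(lockFile);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second writer refused: " + secondWriterIsRefused());
    }
}
```

Note that this says nothing about the SAN scenario raised in the comment: whether such locks are honoured across machines depends on the filesystem, which the sketch does not test.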
[jira] [Commented] (LUCENE-3076) add -Dtests.codecprovider
[ https://issues.apache.org/jira/browse/LUCENE-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029655#comment-13029655 ] Michael McCandless commented on LUCENE-3076: +1 this is great! This means a codec writer can easily run all of Lucene/Solr's tests against his/her codec(s)... add -Dtests.codecprovider - Key: LUCENE-3076 URL: https://issues.apache.org/jira/browse/LUCENE-3076 Project: Lucene - Java Issue Type: Improvement Reporter: Robert Muir Fix For: 4.0 Attachments: LUCENE-3076.patch Currently to test a codec (or set of codecs) you have to add them to lucene's core and edit a couple of arrays here and there... It would be nice if when using the test-framework you could instead specify a codecprovider by classname (possibly containing your own set of huper-duper codecs). For example I made the following little codecprovider in contrib: {noformat} public class AppendingCodecProvider extends CodecProvider { public AppendingCodecProvider() { register(new AppendingCodec()); register(new SimpleTextCodec()); } } {noformat} Then, I'm able to run tests with 'ant -lib build/contrib/misc/lucene-misc-4.0-SNAPSHOT.jar test-core -Dtests.codecprovider=org.apache.lucene.index.codecs.appending.AppendingCodecProvider', and it always picks from my set of codecs (in this case Appending and SimpleText), and I can set -Dtests.codec=Appending if I want to set just one.
RE: [Lucene.Net] Minor problem with using code from the trunk with vb.net project.
David, Apologies if this is pedantic, but that should be one of the goals, to move toward .NET naming conventions (which Lucene.NET does not abide by, and it makes for an odd fit). - Nick -----Original Message----- From: David Smith [mailto:dav...@nzcity.co.nz] Sent: Thursday, May 05, 2011 6:18 PM To: lucene-net-...@lucene.apache.org Subject: [Lucene.Net] Minor problem with using code from the trunk with vb.net project. Morning, I checked out and compiled https://svn.apache.org/repos/asf/incubator/lucene.net/trunk yesterday, looking to update from 2.0.0.4. To get the library to work with VB.Net I found I had to edit TopDocs.cs (src/core/Search/TopDocs.cs). Being case-insensitive, VB.Net can't differentiate between the three public variables (totalHits, scoreDocs, maxScore) and the three public properties (TotalHits, ScoreDocs, MaxScore). David
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 7768 - Failure
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7768/ 1 tests failed. REGRESSION: org.apache.solr.client.solrj.embedded.SolrExampleStreamingTest.testCommitWithin Error Message: expected:<0> but was:<1> Stack Trace: junit.framework.AssertionFailedError: expected:<0> but was:<1> at org.apache.solr.client.solrj.SolrExampleTests.testCommitWithin(SolrExampleTests.java:327) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1156) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1084) Build Log (for compile errors): [...truncated 10752 lines...]
[jira] [Commented] (SOLR-2497) Move Solr to new NumericField stored field impl of LUCENE-3065
[ https://issues.apache.org/jira/browse/SOLR-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029722#comment-13029722 ] Chris Male commented on SOLR-2497: -- Hey Uwe, I spent quite some time tracking this down. The problem is that the dates cannot be parsed because they are lacking the compulsory 'Z' on the end (it's required by the date parser). You need to change TrieDateField#indexedToReadable to: return wrappedField.indexedToReadable(indexedForm) + 'Z'; With that change, the test now passes for me. You can see in DateField#indexedToReadable it does the same thing. Move Solr to new NumericField stored field impl of LUCENE-3065 -- Key: SOLR-2497 URL: https://issues.apache.org/jira/browse/SOLR-2497 Project: Solr Issue Type: Improvement Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.2, 4.0 Attachments: SOLR-2497.patch, SOLR-2497.patch, SOLR-2497.patch This implements the changes to NumericField (LUCENE-3065) in Solr. TrieField & Co would use NumericField for indexing and reading stored fields. To enable this some missing changes in Solr's internals (Field -> Fieldable) need to be done. Also some backwards compatible stored fields parsing is needed to read pre-3.2 indexes without reindexing (as the format changed a little bit and Document.getFieldable returns NumericField instances now).
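The failure mode Chris describes, a date string that is rejected only because the trailing 'Z' is missing, is easy to reproduce with a strict parser. The sketch below uses java.time.Instant.parse purely as a stand-in for Solr's date parser, which is an assumption made for illustration; TrailingZSketch, toReadable and parses are hypothetical names and not the TrieDateField code.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;

// Hypothetical illustration; java.time stands in for Solr's own date parser.
final class TrailingZSketch {

    /** Mirrors the shape of the suggested fix: append the UTC designator. */
    static String toReadable(String indexedWithoutZone) {
        return indexedWithoutZone + 'Z';
    }

    /** True if the text is accepted as an ISO-8601 instant. */
    static boolean parses(String text) {
        try {
            Instant.parse(text);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String raw = "2011-05-06T10:00:00";
        System.out.println(parses(raw));             // prints false: no zone designator
        System.out.println(parses(toReadable(raw))); // prints true
    }
}
```

In a distributed test the readable form is presumably what gets parsed again on the merging side, which would explain why only the distributed faceting test tripped over it; that explanation is an inference from the thread, not something the comment states.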
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 7770 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7770/ No tests ran. Build Log (for compile errors): [...truncated 472 lines...] [javac] location: class org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer [javac] ReversePathHierarchyTokenizer t = new ReversePathHierarchyTokenizer( new StringReader(path) ); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/path/TestReversePathHierarchyTokenizer.java:50: cannot find symbol [javac] symbol : class ReversePathHierarchyTokenizer [javac] location: class org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer [javac] ReversePathHierarchyTokenizer t = new ReversePathHierarchyTokenizer( new StringReader(path) ); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/path/TestReversePathHierarchyTokenizer.java:61: cannot find symbol [javac] symbol : class ReversePathHierarchyTokenizer [javac] location: class org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer [javac] ReversePathHierarchyTokenizer t = new ReversePathHierarchyTokenizer( new StringReader(path) ); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/path/TestReversePathHierarchyTokenizer.java:61: cannot find symbol [javac] symbol : class ReversePathHierarchyTokenizer [javac] location: class org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer [javac] ReversePathHierarchyTokenizer t = new ReversePathHierarchyTokenizer( new StringReader(path) ); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/path/TestReversePathHierarchyTokenizer.java:72: cannot find symbol 
[javac] symbol : class ReversePathHierarchyTokenizer [javac] location: class org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer [javac] ReversePathHierarchyTokenizer t = new ReversePathHierarchyTokenizer( new StringReader(path) ); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/path/TestReversePathHierarchyTokenizer.java:72: cannot find symbol [javac] symbol : class ReversePathHierarchyTokenizer [javac] location: class org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer [javac] ReversePathHierarchyTokenizer t = new ReversePathHierarchyTokenizer( new StringReader(path) ); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/path/TestReversePathHierarchyTokenizer.java:83: cannot find symbol [javac] symbol : class ReversePathHierarchyTokenizer [javac] location: class org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer [javac] ReversePathHierarchyTokenizer t = new ReversePathHierarchyTokenizer( new StringReader(path) ); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/path/TestReversePathHierarchyTokenizer.java:83: cannot find symbol [javac] symbol : class ReversePathHierarchyTokenizer [javac] location: class org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer [javac] ReversePathHierarchyTokenizer t = new ReversePathHierarchyTokenizer( new StringReader(path) ); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/path/TestReversePathHierarchyTokenizer.java:94: cannot find symbol [javac] symbol : class ReversePathHierarchyTokenizer [javac] location: class 
org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer [javac] ReversePathHierarchyTokenizer t = new ReversePathHierarchyTokenizer( new StringReader(path), 1 ); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/path/TestReversePathHierarchyTokenizer.java:94: cannot find symbol [javac] symbol : class ReversePathHierarchyTokenizer [javac] location: class org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer [javac]
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 7771 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7771/ No tests ran. Build Log (for compile errors): [...truncated 472 lines...] [javac] location: class org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer [javac] ReversePathHierarchyTokenizer t = new ReversePathHierarchyTokenizer( new StringReader(path) ); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/path/TestReversePathHierarchyTokenizer.java:50: cannot find symbol [javac] symbol : class ReversePathHierarchyTokenizer [javac] location: class org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer [javac] ReversePathHierarchyTokenizer t = new ReversePathHierarchyTokenizer( new StringReader(path) ); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/path/TestReversePathHierarchyTokenizer.java:61: cannot find symbol [javac] symbol : class ReversePathHierarchyTokenizer [javac] location: class org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer [javac] ReversePathHierarchyTokenizer t = new ReversePathHierarchyTokenizer( new StringReader(path) ); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/path/TestReversePathHierarchyTokenizer.java:61: cannot find symbol [javac] symbol : class ReversePathHierarchyTokenizer [javac] location: class org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer [javac] ReversePathHierarchyTokenizer t = new ReversePathHierarchyTokenizer( new StringReader(path) ); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/path/TestReversePathHierarchyTokenizer.java:72: cannot find symbol 
[javac] symbol : class ReversePathHierarchyTokenizer [javac] location: class org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer [javac] ReversePathHierarchyTokenizer t = new ReversePathHierarchyTokenizer( new StringReader(path) ); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/path/TestReversePathHierarchyTokenizer.java:72: cannot find symbol [javac] symbol : class ReversePathHierarchyTokenizer [javac] location: class org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer [javac] ReversePathHierarchyTokenizer t = new ReversePathHierarchyTokenizer( new StringReader(path) ); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/path/TestReversePathHierarchyTokenizer.java:83: cannot find symbol [javac] symbol : class ReversePathHierarchyTokenizer [javac] location: class org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer [javac] ReversePathHierarchyTokenizer t = new ReversePathHierarchyTokenizer( new StringReader(path) ); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/path/TestReversePathHierarchyTokenizer.java:83: cannot find symbol [javac] symbol : class ReversePathHierarchyTokenizer [javac] location: class org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer [javac] ReversePathHierarchyTokenizer t = new ReversePathHierarchyTokenizer( new StringReader(path) ); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/path/TestReversePathHierarchyTokenizer.java:94: cannot find symbol [javac] symbol : class ReversePathHierarchyTokenizer [javac] location: class 
org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer [javac] ReversePathHierarchyTokenizer t = new ReversePathHierarchyTokenizer( new StringReader(path), 1 ); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/path/TestReversePathHierarchyTokenizer.java:94: cannot find symbol [javac] symbol : class ReversePathHierarchyTokenizer [javac] location: class org.apache.lucene.analysis.path.TestReversePathHierarchyTokenizer [javac]
[JENKINS] Solr-3.x - Build # 346 - Failure
Build: https://builds.apache.org/hudson/job/Solr-3.x/346/

No tests ran.

Build Log (for compile errors):
[...truncated 13314 lines...]
[javac] ^
[javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/solr/src/java/org/apache/solr/core/IndexDeletionPolicyWrapper.java:157: warning: getFileNames() in org.apache.solr.core.IndexDeletionPolicyWrapper.IndexCommitWrapper overrides getFileNames() in org.apache.lucene.index.IndexCommit; return type requires unchecked conversion
[javac] found   : java.util.Collection
[javac] required: java.util.Collection<java.lang.String>
[javac] public Collection getFileNames() throws IOException {
[javac] ^
[javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/solr/src/java/org/apache/solr/core/IndexDeletionPolicyWrapper.java:211: warning: getUserData() in org.apache.solr.core.IndexDeletionPolicyWrapper.IndexCommitWrapper overrides getUserData() in org.apache.lucene.index.IndexCommit; return type requires unchecked conversion
[javac] found   : java.util.Map
[javac] required: java.util.Map<java.lang.String,java.lang.String>
[javac] public Map getUserData() throws IOException {
[javac] ^
[javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/solr/src/java/org/apache/solr/handler/RequestHandlerBase.java:173: warning: [unchecked] unchecked call to add(java.lang.String,T) as a member of the raw type org.apache.solr.common.util.NamedList
[javac] lst.add(handlerStart,handlerStart);
[javac] ^
[javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/solr/src/java/org/apache/solr/handler/RequestHandlerBase.java:174: warning: [unchecked] unchecked call to add(java.lang.String,T) as a member of the raw type org.apache.solr.common.util.NamedList
[javac] lst.add(requests, numRequests);
[javac] ^
[javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/solr/src/java/org/apache/solr/handler/RequestHandlerBase.java:175: warning: [unchecked] unchecked call to add(java.lang.String,T) as a member of the raw type org.apache.solr.common.util.NamedList
[javac] lst.add(errors, numErrors);
[javac] ^
[javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/solr/src/java/org/apache/solr/handler/RequestHandlerBase.java:176: warning: [unchecked] unchecked call to add(java.lang.String,T) as a member of the raw type org.apache.solr.common.util.NamedList
[javac] lst.add(timeouts, numTimeouts);
[javac] ^
[javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/solr/src/java/org/apache/solr/handler/RequestHandlerBase.java:177: warning: [unchecked] unchecked call to add(java.lang.String,T) as a member of the raw type org.apache.solr.common.util.NamedList
[javac] lst.add(totalTime,totalTime);
[javac] ^
[javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/solr/src/java/org/apache/solr/handler/RequestHandlerBase.java:178: warning: [unchecked] unchecked call to add(java.lang.String,T) as a member of the raw type org.apache.solr.common.util.NamedList
[javac] lst.add(avgTimePerRequest, (float) totalTime / (float) this.numRequests);
[javac] ^
[javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/solr/src/java/org/apache/solr/handler/RequestHandlerBase.java:179: warning: [unchecked] unchecked call to add(java.lang.String,T) as a member of the raw type org.apache.solr.common.util.NamedList
[javac] lst.add(avgRequestsPerSecond, (float) numRequests*1000 / (float)(System.currentTimeMillis()-handlerStart));
[javac] ^
[javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/solr/src/java/org/apache/solr/handler/component/ResponseBuilder.java:291: warning: [unchecked] unchecked call to add(java.lang.String,T) as a member of the raw type org.apache.solr.common.util.NamedList
[javac] rsp.getResponseHeader().add( partialResults, Boolean.TRUE );
[javac] ^
[javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/solr/src/java/org/apache/solr/search/FunctionQParser.java:254: warning: [unchecked] unchecked conversion
[javac] found   : java.util.HashMap
[javac] required: java.util.Map<java.lang.String,java.lang.String>
[javac] int end = QueryParsing.parseLocalParams(qs, start, nestedLocalParams, getParams());
[javac] ^
[javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/solr/src/java/org/apache/solr/handler/component/FacetComponent.java:405: warning: [unchecked] unchecked call to add(java.lang.String,T) as a member of the raw type org.apache.solr.common.util.NamedList
[javac] facet_counts.add(exception,fi.exceptionList);
[javac]