Re: Proposal about Version API relaxation
Coming in late to the discussion, and without really understanding the underlying Lucene issues, but...

The size of the problem of reindexing is under-appreciated, I think. Somewhere in my company is the original data I indexed. But the effort it would take to resurrect it is O(unknown). An unfortunate reality of commercial products is that they often receive very little love for extended periods of time until all of a sudden more work is required. There ensues an extended period of re-orientation, even if the people who originally worked on the project are still around.

*Assuming* the data is available to reindex (and there are many reasons besides poor practice on the part of the company that it may not be), remembering/finding out exactly which of the various backups you made of the original data is the one that's actually in your product can be highly non-trivial. This is compounded by the fact that the product manager will be adamant about "Do NOT surprise our customers".

So I can be in a spot of saying "I *think* I have the original data set, and I *think* I have the original code used to index it, and if I get a new version of Lucene I *think* I can recreate the index and I *think* that the user will see the expected change." After all that effort is completed, saying "I *think* we'll see the expected changes, but we won't know until we try it" puts me in a very precarious position.

This assumes that I have a reasonable chance of getting the original data. But say I've been indexing data from a live feed. I sure as hell hope I stored the data somewhere, because going back to the source and saying "please resend me 10 years' worth of data that I have in my index" is...er...hard. Or say that the original provider has gone out of business, or the licensing arrangement specifies a one-time transmission of data that may not be retained in its original form, or...

The point of this long diatribe is that there are many reasons why reindexing is impossible and/or impractical. Making any decision that requires reindexing for a new version is locking a user into a version, potentially forever. We should not underestimate how painful that can be and should never think that "just reindex" is acceptable in all situations. It's not. Period. Be very clear that some number of Lucene users will absolutely not be able to reindex. We may still make a decision that requires this, but let's make it without deluding ourselves that it's a possible solution for everyone.

So an upgrade tool seems like a reasonable compromise. I agree that being hampered in what we can develop in Lucene by having to accommodate reading old indexes slows new features etc. It's always nice to be able to work without dealing with pesky legacy issues <G>. Perhaps splitting out the indexing upgrades into a separate program lets us accommodate both concerns.

FWIW
Erick

On Thu, Apr 15, 2010 at 9:42 AM, Danil ŢORIN torin...@gmail.com wrote:
True. Just need the tool.

On Thu, Apr 15, 2010 at 16:39, Earwin Burrfoot ear...@gmail.com wrote:
On Thu, Apr 15, 2010 at 17:17, Yonik Seeley yo...@lucidimagination.com wrote:
> Seamless online upgrades have their place too... say you are upgrading one server at a time in a cluster.

Nothing here that can't be solved with an upgrade tool. Down one server, upgrade index, upgrade software, up.

--
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Proposal about Version API relaxation
'Cause some exec finally noticed the product was losing market share. Or got a wild hair strategically placed.

My point is only that we should be clear that some number of Lucene users *will* be in such a position. I'm actually fine with a decision that we're not going to support such a scenario, but let's be clear that that's the decision we're making.

And corporate competence aside, there's still licensing that may prevent me from archiving the raw data.

Erick

On Thu, Apr 15, 2010 at 10:20 AM, Earwin Burrfoot ear...@gmail.com wrote:
I think the need to upgrade to latest and greatest lucene for poor corporate users that lost all their data is somewhat overblown. Why the heck do you need to upgrade if your app rotted in neglect for years??

On Thu, Apr 15, 2010 at 18:14, Erick Erickson erickerick...@gmail.com wrote:
> Coming in late to the discussion, and without really understanding the underlying Lucene issues, but... [...]
Re: [jira] Account password
Ah, good. That means the very long e-mail that came to my regular account about someone hacking the JIRA server is bogus too, I assume..

Erick

On Tue, Apr 13, 2010 at 5:58 PM, Uwe Schindler u...@thetaphi.de wrote:
LOL! This user is assigned to very old bugzilla issues :-)

Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: j...@apache.org [mailto:j...@apache.org]
Sent: Tuesday, April 13, 2010 10:54 PM
To: java-dev@lucene.apache.org
Subject: [jira] Account password

You (or someone else) has reset your password.

Your password has been changed to: MCwqNr

You can change your password here:
https://issues.apache.org/jira/secure/ViewProfile.jspa

Here are the details of your account:

Username: java-dev@lucene.apache.org
Email: java-dev@lucene.apache.org
Full Name: Lucene Developers
Password: MCwqNr

(You can always retrieve these via the Forgot Password link on the signup page)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [jira] Account password
Oops, that'll teach me to just skim things, won't it?

Erick

On Tue, Apr 13, 2010 at 6:14 PM, Andi Vajda va...@osafoundation.org wrote:
On Tue, 13 Apr 2010, Erick Erickson wrote:
> Ah, good. That means the very long e-mail that came to my regular account about someone hacking the JIRA server is bogus too, I assume..

Err, no, it's real. You should change your password.

Andi..
Re: [jira] Created: (LUCENE-2376) java.lang.OutOfMemoryError:Java heap space
What kind of JVM settings are you using? Lots of people index lots of documents without running into this. Can you provide more specifics about your indexing settings?

On Tue, Apr 6, 2010 at 10:51 PM, Shivender Devarakonda (JIRA) j...@apache.org wrote:

java.lang.OutOfMemoryError:Java heap space

Key: LUCENE-2376
URL: https://issues.apache.org/jira/browse/LUCENE-2376
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: 2.9.1
Environment: Windows
Reporter: Shivender Devarakonda

I see an OutOfMemory error in our product and it is happening when we have some data objects on which we built the index. I see the following OutOfMemory error; this is happening after we call IndexWriter.optimize():

4/06/10 02:03:42.160 PM PDT [ERROR] [Lucene Merge Thread #12] In thread Lucene Merge Thread #12 and the message is org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space
4/06/10 02:03:42.207 PM PDT [VERBOSE] [Lucene Merge Thread #12] [Manager] Uncaught Exception in thread Lucene Merge Thread #12
org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:351)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:315)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.util.HashMap.resize(HashMap.java:462)
    at java.util.HashMap.addEntry(HashMap.java:755)
    at java.util.HashMap.put(HashMap.java:385)
    at org.apache.lucene.index.FieldInfos.addInternal(FieldInfos.java:256)
    at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:366)
    at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71)
    at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:116)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:638)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:608)
    at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:686)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4979)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4614)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)
4/06/10 02:03:42.895 PM PDT [ERROR] this writer hit an OutOfMemoryError; cannot complete optimize

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
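One quick way to answer the JVM-settings question is to print the heap limits from inside the application itself. What follows is only a minimal, JDK-only sketch: the class name HeapReport and its helper methods are made up for illustration and are not part of Lucene. It reports what the running JVM sees; it does not diagnose the merge-time OutOfMemoryError from the issue.

```java
public class HeapReport {
    /** Converts a byte count to whole mebibytes (rounds down). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** One-line summary of the heap limits the running JVM actually sees. */
    static String summary() {
        Runtime rt = Runtime.getRuntime();
        return "max=" + toMiB(rt.maxMemory()) + "MiB"
             + " total=" + toMiB(rt.totalMemory()) + "MiB"
             + " free=" + toMiB(rt.freeMemory()) + "MiB";
    }

    public static void main(String[] args) {
        // Hypothetical usage: call this once at startup of the indexing process.
        System.out.println(summary());
    }
}
```

Starting the indexing process with a heap flag such as -Xmx1g and comparing it with the reported max value shows whether the setting actually reached the JVM that runs the IndexWriter.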
Re: lucene and solr trunk
My snap impression is that moving Lucene to a sub-tree under SOLR would introduce some confusion in the minds of new folks looking at the code. *We* all know that Lucene stands by itself, but putting it under SOLR makes that less obvious. I claim that there would be questions like "so can I just use Lucene without SOLR?".

That said, the questions about release management, branching, tagging, etc. take complete precedence over minor confusion when the answer is "just go to directory X and checkout if you want Lucene only".

FWIW
Erick

On Tue, Mar 16, 2010 at 8:30 AM, Robert Muir rcm...@gmail.com wrote:
On Tue, Mar 16, 2010 at 3:43 AM, Simon Willnauer simon.willna...@googlemail.com wrote:
> One more thing which I wonder about even more is that this whole merging happens so quickly for reasons I don't see right now. I don't want to keep anybody from making progress but it appears like a rush to me.

By the way, the serious changes we applied to the branch, most of them have been sitting in JIRA over 3 months not doing much: SOLR-1659

If you follow the linked issues, you can see all the stuff that got put in the branch... the branch was helpful for me, as I could help Mark with the ton of little things, like TokenStreams embedded inside JSP files :)

As it's just a branch, if you want to go look at those patches (especially anything I did) and provide technical feedback, that would be great!

But I think it's a mistake to say things are rushed when the work has been done for months.

--
Robert Muir
rcm...@gmail.com
Re: How can I use QueryScorer() to find only perfect matches??
Try +contents:term +contents:query. By misplacing the '+' you're getting the default OR operator, and the '+' is probably being thrown away by the analyzer. Luke will help here a lot.

HTH
Erick

On Mon, Mar 15, 2010 at 9:46 AM, christian stadler stadler.christ...@web.de wrote:
Hi there,

I have an issue with the QueryScorer(query) method at the moment and I need some assistance. I was indexing my e-book "Lucene in Action" and, based on this index, I started to play around with some boolean queries like:

    (contents:+term contents:+query)

As a result I'm expecting four hits, as perfect matches for the phrase "term query". But when I run my sample to highlight this phrase in context, I get a lot more results. It also finds all the matches for "term" and "query" independently. I think the problem is the QueryScorer(), which softens the former exact boolean query.

Then I tried the following:

    private static Highlighter GetHits(Query query, Formatter formatter)
    {
        string field = "contents";
        BooleanQuery termsQuery = new BooleanQuery();
        WeightedTerm[] terms = QueryTermExtractor.GetTerms(query, true, field);
        foreach (WeightedTerm term in terms)
        {
            TermQuery termQuery = new TermQuery(new Term(field, term.GetTerm()));
            termsQuery.Add(termQuery, BooleanClause.Occur.MUST);
        }

        // create query scorer based on term queries (field specific)
        QueryScorer scorer = new QueryScorer(termsQuery);
        Highlighter highlighter = new Highlighter(formatter, scorer);
        highlighter.SetTextFragmenter(new SimpleFragmenter(20));
        return highlighter;
    }

to rewrite the query and change the term clauses from SHOULD to MUST. But the result was the same.

Do you have any example of how I can use the QueryScorer() so that it mimics a boolean search?

thanks in advance
Christian
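To make the placement of the '+' concrete, here is a small JDK-only sketch that builds the query string suggested in the reply, with the '+' in front of the field name rather than in front of the term. The class RequiredTerms and the method requireAll are hypothetical helpers, not part of Lucene or Lucene.Net, and the terms are assumed to need no query-syntax escaping.

```java
import java.util.Arrays;
import java.util.List;

public class RequiredTerms {
    /**
     * Builds a query string in which every term is a required clause
     * on the given field, e.g. "+contents:term +contents:query".
     * Terms are used verbatim; no escaping is attempted in this sketch.
     */
    static String requireAll(String field, List<String> terms) {
        StringBuilder sb = new StringBuilder();
        for (String term : terms) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // The '+' goes before the field name, not between field and term.
            sb.append('+').append(field).append(':').append(term);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: +contents:term +contents:query
        System.out.println(requireAll("contents", Arrays.asList("term", "query")));
    }
}
```

The resulting string is what would be handed to the query parser; printing the parsed query, or inspecting it in Luke, is still the way to confirm how it was actually interpreted.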
Re: [jira] Commented: (LUCENE-2308) Separately specify a field's type
Congrats Chris! I vote for thinkAboutNotIncludingNormsMaybe(true|false) <G>. Seriously, double negatives are ugly IMO, +1 for changing.

Erick

On Fri, Mar 12, 2010 at 12:56 PM, Chris Male (JIRA) j...@apache.org wrote:

[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844587#action_12844587 ]

Chris Male commented on LUCENE-2308:

I agree entirely. This is definitely the moment to remove any ambiguity or confusion in this API. I'll make sure to incorporate this idea.

Separately specify a field's type

Key: LUCENE-2308
URL: https://issues.apache.org/jira/browse/LUCENE-2308
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Michael McCandless

This came up from discussions on IRC. I'm summarizing here...

Today when you make a Field to add to a document you can set things index or not, stored or not, analyzed or not, details like omitTfAP, omitNorms, index term vectors (separately controlling offsets/positions), etc.

I think we should factor these out into a new class (FieldType?). Then you could re-use this FieldType instance across multiple fields. The Field instance would still hold the actual value.

We could then do per-field analyzers by adding a setAnalyzer on the FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise for per-field codecs (with flex), where we now have PerFieldCodecWrapper).

This would NOT be a schema! It's just refactoring what we already specify today. EG it's not serialized into the index.

This has been discussed before, and I know Michael Busch opened a more ambitious (I think?) issue. I think this is a good first baby step. We could consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold off on that for starters...

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
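The refactoring proposed in the issue can be pictured with a small JDK-only sketch. Everything here is hypothetical: FieldType and Field are toy classes written for this illustration, not the Lucene API, and the flag names are assumptions. The norms flag is named positively (norms, rather than omitNorms) in the spirit of the double-negative comment above.

```java
public class FieldTypeSketch {
    /** Hypothetical holder for per-field options; not a Lucene class. */
    static final class FieldType {
        final boolean indexed;
        final boolean stored;
        final boolean analyzed;
        final boolean norms; // positive name: true means norms are kept

        FieldType(boolean indexed, boolean stored, boolean analyzed, boolean norms) {
            this.indexed = indexed;
            this.stored = stored;
            this.analyzed = analyzed;
            this.norms = norms;
        }
    }

    /** Hypothetical field: holds only its name, its value and a shared type. */
    static final class Field {
        final String name;
        final String value;
        final FieldType type;

        Field(String name, String value, FieldType type) {
            this.name = name;
            this.value = value;
            this.type = type;
        }
    }

    public static void main(String[] args) {
        // One type instance describes "indexed, stored, not analyzed, no norms"...
        FieldType keyword = new FieldType(true, true, false, false);
        // ...and is re-used across several fields, each holding its own value.
        Field sentiment = new Field("sentiment", "positive", keyword);
        Field source = new Field("source", "feed", keyword);
        System.out.println(sentiment.type == source.type); // prints: true
    }
}
```

The point of the sketch is only the split of responsibilities: options live in one reusable object, values live in the fields. Nothing here is serialized, which matches the "this would NOT be a schema" remark in the issue.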
Re: [jira] Commented: (LUCENE-2280) IndexWriter.optimize() throws NullPointerException
Quick side note: The recommended upgrade path is to upgrade to 2.9.latest, fix all of the deprecation warnings, *then* upgrade to 3.0. The 2.9.X -> 3.0 upgrade just removed all the deprecated stuff.

FWIW
Erick

On Mon, Mar 8, 2010 at 8:51 AM, Ritesh Nigam (JIRA) j...@apache.org wrote:

[ https://issues.apache.org/jira/browse/LUCENE-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842657#action_12842657 ]

Ritesh Nigam commented on LUCENE-2280:

I checked the documentation of IndexWriter in 2.3.2; the commit() API is not available in this version (I think it was introduced in 2.4). I am not explicitly setting autoCommit, so it should take the default value, which I believe is true. One more thing: I am catching any exception hit during indexing or optimizing, and then in the finally block I am closing the IndexWriter by calling close(), which should take care of the commit internally? Please suggest any equivalent method which I can use in place of commit().

I have not upgraded to a newer version of Lucene, but I will probably try version 3.0.0 in future.

IndexWriter.optimize() throws NullPointerException

Key: LUCENE-2280
URL: https://issues.apache.org/jira/browse/LUCENE-2280
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: 2.3.2
Environment: Win 2003, lucene version 2.3.2, IBM JRE 1.6
Reporter: Ritesh Nigam
Attachments: lucene.jar

I am using the lucene 2.3.2 search APIs for my application. I am indexing a 45GB database, which creates an approx. 200MB index file. After finishing the indexing, and while running optimize(), I can see a NullPointerException thrown in my log and the index file is getting corrupted. The log says:

Caused by: java.lang.NullPointerException
    at org.apache.lucene.store.BufferedIndexOutput.writeBytes(BufferedIndexOutput.java:49)
    at org.apache.lucene.store.IndexOutput.writeBytes(IndexOutput.java:40)
    at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:566)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:135)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3273)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2968)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:240)

and this is happening quite frequently, although I am not able to reproduce it on demand. I saw an issue logged which is somewhat related to my issue ( http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200809.mbox/%3c6e4a40db-5efc-42da-a857-d59f4ec34...@mikemccandless.com%3e ), but the only difference here is that I am not using Store.Compress for my fields; I am using Store.NO instead. Please note that I am using the IBM JRE for my application.

Is this an issue with Lucene? If yes, in which version is it fixed?

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
Re: Lucene Filter
The very first thing I'd recommend is to get a copy of Luke (google Lucene, Luke) and examine your index to see if what you *think* is in there is *actually* in there.

One popular learning experience is to do something like

    Document doc = new Document();
    while (more docs to add) {
        add field
        add field
        add doc
    }

The problem is that the document simply accumulates. The first "add doc" puts your first document in the index. The second puts the contents of both the first and second doc in the second doc of the index. The third puts the contents of 3 documents in for the third doc, etc. Cure this by moving the new Document inside the while loop.

If this doesn't help, please show your indexing and searching code.

HTH
Erick

On Tue, Mar 2, 2010 at 9:35 AM, Dyutiman dyutiman.chaudh...@gmail.com wrote:
Hi,

I am new to this forum and new to Lucene also. I'm getting some issue while trying to filter my Lucene results. While creating the index I am creating a field called sentiment, whose possible values are 'positive', 'negative' and 'neutral'. I am indexing this field like

    doc.add(new Field("sentiment", sentiment, Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));

Now I want to search within my index but get only positive sentiment results for the searched string. For this I am doing something like this:

    QueryParser qp = new QueryParser(Version.LUCENE_CURRENT, "contents", analyzer);
    Query query = qp.parse(searchString);
    Filter filter = new TermRangeFilter("sentiment", "positive", "positive", true, true);
    topDocs = searcher.search(query, filter, 20);

But I am getting results mixed with all 3 sentiments. I tried other filters also but the result is the same. If anybody has a solution for me, please help.

thanks
Dyutiman
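The accumulating-document pitfall described in the reply above can be reproduced without Lucene. The sketch below is a JDK-only stand-in: Doc and Index are hypothetical classes playing the roles of Lucene's Document and IndexWriter, on the assumption that adding a document captures whatever fields it holds at that moment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DocumentReuse {
    /** Stand-in for a document: just an ordered list of field values. */
    static final class Doc {
        final List<String> fields = new ArrayList<String>();
        void add(String value) { fields.add(value); }
    }

    /** Stand-in for an index: snapshots the document's fields when it is added. */
    static final class Index {
        final List<List<String>> docs = new ArrayList<List<String>>();
        void addDocument(Doc d) { docs.add(new ArrayList<String>(d.fields)); }
    }

    /** Buggy variant: one Doc created outside the loop keeps accumulating fields. */
    static Index indexReusingOneDoc(List<String> values) {
        Index index = new Index();
        Doc doc = new Doc();
        for (String v : values) {
            doc.add(v);
            index.addDocument(doc);
        }
        return index;
    }

    /** Fixed variant: a fresh Doc is created inside the loop. */
    static Index indexWithFreshDocs(List<String> values) {
        Index index = new Index();
        for (String v : values) {
            Doc doc = new Doc();
            doc.add(v);
            index.addDocument(doc);
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("positive", "negative", "neutral");
        // Third indexed document holds all three values in the buggy variant.
        System.out.println(indexReusingOneDoc(values).docs.get(2));
        // Third indexed document holds only its own value in the fixed variant.
        System.out.println(indexWithFreshDocs(values).docs.get(2));
    }
}
```

With the buggy variant, a search for any one value would match every later document as well, which is the kind of "mixed" result worth ruling out with Luke before looking at the filter.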
Re: Lucene Filter
Taking a quick glance at the code, I don't see anything obviously wrong as far as the problem you describe goes. What happens if you just add a required clause to your query string rather than use a Filter? Something like +sentiment:positive? If you do that, query.toString is your friend <G>...

Erick

On Tue, Mar 2, 2010 at 10:04 AM, Dyutiman dyutiman.chaudh...@gmail.com wrote:
Thanks Erick for your quick reply. I am going to try Luke and examine my index. In the meantime, let me tell you that I am creating a new document every time I index one. Let me attach the code I am using here.

thanks
Dyutiman

http://old.nabble.com/file/p27756896/IndexUtil.java IndexUtil.java
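For the required-clause suggestion in the reply above, one detail is worth spelling out: if the user's own query has several optional terms, simply appending +sentiment:positive makes those terms optional for matching, so the user's part should be grouped and required as well. The following JDK-only sketch builds such a string; RequiredClause and restrict are hypothetical names, and the user query is assumed to be valid query-parser syntax already.

```java
public class RequiredClause {
    /**
     * Wraps the user's query in a required group and adds a required
     * field:value clause, e.g. "+(battery life) +sentiment:positive".
     * No escaping or validation is attempted in this sketch.
     */
    static String restrict(String userQuery, String field, String value) {
        return "+(" + userQuery + ") +" + field + ":" + value;
    }

    public static void main(String[] args) {
        // prints: +(battery life) +sentiment:positive
        System.out.println(restrict("battery life", "sentiment", "positive"));
    }
}
```

After parsing the combined string, printing query.toString(), as recommended in the reply, shows whether both parts really ended up as required clauses.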
Re: Adding .classpath.tmpl
Tangentially related, but the link on the "how to contribute" page to the IntelliJ code style file is broken; it reached over into the SOLR Wiki... I stole the one from SOLR and added it as an attachment, and the "how to contribute" page now links to it.

Erick

On Sun, Feb 28, 2010 at 5:14 AM, Shai Erera ser...@gmail.com wrote:
I've read BUILD.txt and it doesn't look like it'll fit there. That file discusses how to build Lucene using Ant and JDK. The word IDE is not mentioned, nor Eclipse.

BTW, there is a typo in the file: "before returning to this README" - not sure if the word README is intended to be like that, or a leftover from when this was once in README?

Shai

On Sun, Feb 28, 2010 at 12:11 PM, Uwe Schindler u...@thetaphi.de wrote:
Maybe this change is better in BUILD.txt? I am not sure.

Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

*From:* Shai Erera [mailto:ser...@gmail.com]
*Sent:* Sunday, February 28, 2010 10:55 AM
*To:* java-dev@lucene.apache.org
*Subject:* Re: Adding .classpath.tmpl

Index: README.txt
===================================================================
--- README.txt (revision 917047)
+++ README.txt (working copy)
@@ -28,8 +28,6 @@
 part of the core library. Of special note are the JAR files in the analyzers directory which
 contain various analyzers that people may find useful in place of the StandardAnalyzer.
-
-
 docs/index.html
 The contents of the Lucene website.
@@ -42,3 +40,10 @@
 src/demo
 Some example code.
+
+SET UP THE ENVIRONMENT
+
+Checkout the HowToContribute wiki page
+(http://wiki.apache.org/lucene-java/HowToContribute) which includes useful
+information on how to contribute code to Lucene, as well as how to set up your
+environment quickly (code formatting rules and setting the classpath quickly).
\ No newline at end of file

Is this ok?

Shai

On Sun, Feb 28, 2010 at 11:07 AM, Uwe Schindler u...@thetaphi.de wrote:
I think we can add this to the README.txt! Do you have a patch?

Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

*From:* Shai Erera [mailto:ser...@gmail.com]
*Sent:* Sunday, February 28, 2010 6:30 AM
*To:* java-dev@lucene.apache.org
*Subject:* Re: Adding .classpath.tmpl

I uploaded the file to http://wiki.apache.org/lucene-java/HowToContribute (bottom of the page). But I don't see any good spot to stuff it in the README. There is no pointer to the HowToContribute page at all, nor to the code formatting styles ... what do you think - create such a section at the bottom of README, or leave it out?

On Fri, Feb 26, 2010 at 2:58 PM, Shai Erera ser...@gmail.com wrote:
Thanks for your response. I will update the Wiki with the file. After I do that, I'll add some text to the README file. I'll need one of you to help me commit it though.

Thanks again,
Shai

On Thu, Feb 25, 2010 at 6:21 PM, Mark Miller markrmil...@gmail.com wrote:
+1 - I'd prefer this stay out of svn as well - I'd rather it go on the wiki too - perhaps in the same place that you can find the formatting file for eclipse and intellij.

--
- Mark
http://www.lucidimagination.com

On 02/25/2010 11:10 AM, Grant Ingersoll wrote:
To me, this is stuff that can go on the wiki or somewhere else, otherwise over time, there will be others to add in, etc. We could simply add a pointer to the wiki page in the README.

On Feb 24, 2010, at 11:55 PM, Shai Erera wrote:
Hi

I always find it annoying when I check out the code to a new project in eclipse, that I need to put everything that I care about in the classpath and add the dependent libraries.

On another project I'm involved with, we did that process once, adding all the source code to the classpath and the libraries, and created a .classpath.tmpl. Now when people check out the code, they can copy the content of that file to their .classpath file, and setting up the project is reduced from a couple of minutes to a few seconds.

I don't want to check in .classpath because not everyone wants all the code in their classpath.

I attached such a file to the mail. Note that the only dependency which will break on other machines is the ant.jar dependency, which on my Windows is located under c:\ant. That jar is required to compile contrib/ant from eclipse. Not sure how to resolve that, besides removing that line from the file and documenting separately that that's what you need to do if you want to add contrib/ant ...

The file is sorted by name, putting the core stuff at the top - so it's easy for people to selectively add the interesting packages.

I don't know if an issue is required; if so I can create it and move the discussion there.

Shai

lucene.classpath.tmpl
Re: [jira] Commented: (LUCENE-2037) Allow Junit4 tests in our environment.
I won't be able to look at this till tonight, I'll see what I can see.

On Fri, Feb 26, 2010 at 9:02 AM, Uwe Schindler (JIRA) j...@apache.org wrote:

[ https://issues.apache.org/jira/browse/LUCENE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838872#action_12838872 ]

Uwe Schindler commented on LUCENE-2037:

Committed revision: 916685

Allow Junit4 tests in our environment.

Key: LUCENE-2037
URL: https://issues.apache.org/jira/browse/LUCENE-2037
Project: Lucene - Java
Issue Type: Improvement
Components: Other
Affects Versions: 3.1
Environment: Development
Reporter: Erick Erickson
Assignee: Michael McCandless
Priority: Minor
Fix For: 3.1
Attachments: junit-4.7.jar, LUCENE-2037-getName.patch, LUCENE-2037.patch, LUCENE-2037.patch, LUCENE-2037.patch, LUCENE-2037_remove_testwatchman.patch, LUCENE-2037_revised_2.patch
Original Estimate: 8h
Remaining Estimate: 8h

Now that we're dropping Java 1.4 compatibility for 3.0, we can incorporate Junit4 in testing. Junit3 and junit4 tests can coexist, so no tests should have to be rewritten. We should start this for the 3.1 release so we can get a clean 3.0 out smoothly. It's probably worthwhile to convert a small set of tests as an exemplar.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: svn commit: r916685 - in /lucene/java/trunk/src/test/org/apache/lucene/util: InterceptTestCaseEvents.java LuceneTestCaseJ4.java
Nice simplification!

On Fri, Feb 26, 2010 at 9:02 AM, uschind...@apache.org wrote:

Author: uschindler
Date: Fri Feb 26 14:02:08 2010
New Revision: 916685

URL: http://svn.apache.org/viewvc?rev=916685&view=rev
Log:
LUCENE-2037: Add support for LuceneTestCase.getName() for backwards compatibility when reporting failed tests. Also removed the InterceptTestCaseEvents class and added it as an anonymous class (simplified, no reflection)

Removed:
    lucene/java/trunk/src/test/org/apache/lucene/util/InterceptTestCaseEvents.java
Modified:
    lucene/java/trunk/src/test/org/apache/lucene/util/LuceneTestCaseJ4.java

Modified: lucene/java/trunk/src/test/org/apache/lucene/util/LuceneTestCaseJ4.java
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/test/org/apache/lucene/util/LuceneTestCaseJ4.java?rev=916685&r1=916684&r2=916685&view=diff
==============================================================================
--- lucene/java/trunk/src/test/org/apache/lucene/util/LuceneTestCaseJ4.java (original)
+++ lucene/java/trunk/src/test/org/apache/lucene/util/LuceneTestCaseJ4.java Fri Feb 26 14:02:08 2010
@@ -25,6 +25,8 @@
 import org.junit.After;
 import org.junit.Before;
 import org.junit.Rule;
+import org.junit.rules.TestWatchman;
+import org.junit.runners.model.FrameworkMethod;
 
 import java.io.PrintStream;
 import java.util.Arrays;
@@ -98,14 +100,21 @@
   // Think of this as start/end/success/failed
   // events.
   @Rule
-  public InterceptTestCaseEvents intercept = new InterceptTestCaseEvents(this);
+  public final TestWatchman intercept = new TestWatchman() {
 
-  public LuceneTestCaseJ4() {
-  }
+    @Override
+    public void failed(Throwable e, FrameworkMethod method) {
+      reportAdditionalFailureInfo();
+      super.failed(e, method);
+    }
 
-  public LuceneTestCaseJ4(String name) {
-    this.name = name;
-  }
+    @Override
+    public void starting(FrameworkMethod method) {
+      LuceneTestCaseJ4.this.name = method.getName();
+      super.starting(method);
+    }
+
+  };
 
   @Before
   public void setUp() throws Exception {
@@ -291,6 +300,6 @@
   // static members
   private static final Random seedRnd = new Random();
 
-  private String name = "";
+  private String name = "unknown";
 
 }
Re: Uwe's question
You can use Junit4 whenever you want right now. Just derive from LuceneTestCaseJ4 rather than LuceneTestCase, annotate each test with @Test, and you should be fine. Junit4 does allow you to mix-n-match 3/4 tests *on a whole class basis*. That is, all of the tests in a class must be either 3-style (deriving from TestCase and named appropriately) or 4-style (annotated, with whatever Junit4 features you'd like). The consensus seems to be that converting old tests to Junit4 just to get them all using Junit4 isn't a good use of time, and at least introduces the possibility that it would mess things up. Upgrading old tests to Junit4 to improve them, especially to speed them up (@BeforeClass and @AfterClass can help), *is* a good use of time. I might convert an old-style test case if I was working in it, but that's probably a personal preference. I've never tried to learn a command-line invocation of a test case for a single test method, I've always just used the IDE to run individual methods. Erick

On Fri, Feb 26, 2010 at 11:31 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Let's go to JUnit 4 if possible... Does it provide method level testing? (i.e. one doesn't need to execute every test method just to check the results of one method)

On Thu, Feb 25, 2010 at 8:15 PM, Shai Erera ser...@gmail.com wrote: Ok this seems a discussion related to JUnit 4, so I'll port what I've said about it from the other thread (doing the code cleanup): {quote} Erick, I'm totally with you on JUnit 4. I think the @Test annotation is really not a big deal (it's actually very easy to migrate all the current tests to JUnit 4 with the added import using some script). Even manually it shouldn't be such a big deal. @Ignore is another clear advantage of JUnit4. I've found some tests which were prefixed with _, i.e. _testXYZ, just to disable them. Nobody knows about them until he looks at the code (and pays attention). @Ignore would have been better.
And there are lots of other advantages, like the @Before and @After (not only class). Another problem I've found in the tests is that not all extended LuceneTestCase, and usually their setUp and tearDown implementations were wrong - not calling super first/last. When I moved them to extend LuceneTestCase they broke (I fixed them, don't worry). However, that could never happen if the super's methods were tagged w/ @Before/After, because JUnit would take care running them before/after their sub-classes' @Before/After. So that's another win for JUnit4. And of course the @Before/AfterClass are really great ! {quote} I think the @Before/After annotations can be a real win for our tests. My two cents, Shai On Fri, Feb 26, 2010 at 4:57 AM, Erick Erickson erickerick...@gmail.com wrote: Well, Things got busy (tm). Uwe's point if valid; unless there's demonstrable gain, moving things to Junit4 just for fun is wasted motion, indeed dangerous. I was focusing on LocalizedTestCase to understand the place of runBare etc. in the scheme of things since when I created LuceneTestCaseJ4 that was something I wanted to figure out to make it a replacement for LuceneTestCase. I can't point to a compelling reason to shake up the code, the only improvement it would have is having a demonstration of using the Junit4 @RunWith annotation for future reference. So, I've no compelling reason to push that patch forward. If y'all think it's worth it I'll be happy to crank that patch back up again, it'll take a few days though. It does affect a several files, and if the main value here is an exemplar of the @RunWith annotation, perhaps there's a better place to put that in. Erick On Thu, Feb 25, 2010 at 9:06 PM, Robert Muir rcm...@gmail.com wrote: LocalizedTestCase called runBare in LuceneTestCase which reported the seed value if an exception was thrown. I couldn't find a good way to access runBare or analogs in Junit4, but the interceptor pattern worked as well. 
The interceptor is called by the Junit framework on test events, so there aren't references to it in the Lucene test code. There are other places that call runBare, so I assumed that if anyone wanted to use Junit4 with those classes it would be a good thing to allow. I didn't forget about your patch Erick, in my opinion there is nothing wrong with it. I hope its not discouraging you, the problem is a few of us have spent countless hours trying to debug this hard-to-reproduce Thai test failure problem. It failed in the existing tests, too, with Junit 3 on hudson (one time!). At this point, i start to wonder if it could be related to stuff like this: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6683975 I don't think we should let this stop progress with the tests, if you think we should move LocalizedTestCase to junit 4 lets do it. -- Robert Muir rcm
Re: Uwe's question
I poked around a little and didn't find any joy. But the *really clumsy* way of doing this would be to add the @Ignore annotation to any test in the class that you didn't want to run, then just run the class. Or, equivalently, comment out the @Test annotation. I'd prefer adding the @Ignore though so there'd be some chance of noticing if it was inadvertently checked in. FWIW Erick

On Fri, Feb 26, 2010 at 3:31 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: I've never tried to learn a command-line invocation of a test case for a single test method, I've always just used the IDE to run individual methods Right, I've been doing bunches of Solr dev which for me only works from the command line... I'm open to suggestions though!
[jira] Commented: (LUCENE-2037) Allow Junit4 tests in our environment.
[ https://issues.apache.org/jira/browse/LUCENE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839126#action_12839126 ]

Erick Erickson commented on LUCENE-2037:

Uwe: You were asking about getName in LuceneTestCaseJ4. It appears that you've taken care of this, is there still anything to do? There's no longer a c'tor that takes the test name. But I did some poking around and came up with the following from someplace on the web. The only two places I could find that used getName were TestFieldScoreQuery and TestOrdValues. This bit of code works if you put it in these classes.

  private String testName() {
    return getClass().getName() + "." + name.getMethodName(); // was getName() from LuceneTestCaseJ4...
  }

  @Rule
  public final TestName name = new TestName();

See: http://kentbeck.github.com/junit/javadoc/4.7/org/junit/rules/TestName.html

Note that this site is better than anything I could find at junit.org.

Once I found that, I thought gee, if I put that in the base class, it would be available to everyone. Which is exactly what you made LuceneTestCaseJ4.getName() do G. But at least I found Kent Beck's version of the docs, which is a plus... So I guess there's nothing to do as far as getName is concerned. If there is, let me know. Erick

Allow Junit4 tests in our environment.

Key: LUCENE-2037
URL: https://issues.apache.org/jira/browse/LUCENE-2037
Project: Lucene - Java
Issue Type: Improvement
Components: Other
Affects Versions: 3.1
Environment: Development
Reporter: Erick Erickson
Assignee: Michael McCandless
Priority: Minor
Fix For: 3.1
Attachments: junit-4.7.jar, LUCENE-2037-getName.patch, LUCENE-2037.patch, LUCENE-2037.patch, LUCENE-2037.patch, LUCENE-2037_remove_testwatchman.patch, LUCENE-2037_revised_2.patch
Original Estimate: 8h
Remaining Estimate: 8h

Now that we're dropping Java 1.4 compatibility for 3.0, we can incorporate Junit4 in testing. Junit3 and junit4 tests can coexist, so no tests should have to be rewritten.
We should start this for the 3.1 release so we can get a clean 3.0 out smoothly. It's probably worthwhile to convert a small set of tests as an exemplar. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Uwe's question
Here are some tantalizing hints. I'll look at this tomorrow if someone hasn't beaten me to it, but there are *better things I can be doing late Friday night than messing around with stupid tests* G.

From: http://junit.sourceforge.net/doc/cookbook/cookbook.htm

Once you have tests, you'll want to run them. JUnit provides tools to define the suite to be run and to display its results. To run tests and see the results on the console, run this from a Java program:

  org.junit.runner.JUnitCore.runClasses(TestClass1.class, ...);

or this from the command line, with both your test class and junit on the classpath:

  java org.junit.runner.JUnitCore TestClass1.class [...other test classes...]

From: http://kentbeck.github.com/junit/javadoc/4.7/index.html

See JUnitCore and Request, especially Request.method.

From: http://old.nabble.com/How-to-run-individual-test-case-within-a-test-class-from-command-line%28JUnit-4.x%29-td20003338.html

  new JUnitCore().run(Request.method(testClass, methodName));

I think the still-remaining clumsy part of this is specifying the test class file in the classpath. I can imagine that this could be part of a shell script, but is it worth the effort if things run from the IDE? Alternatively, a small Java program taking two arguments might do the trick. But as I said, it's late and even *sleeping* would be better than this G.

Sggghhh. Manning has a MEAP for JUnit In Action (hereinafter JUIA) that covers up through Junit 4.5. Anybody dare me to spring for the $30 and see what wisdom is in there? I'm frustrated enough with the sparse documentation that it sure seems worth it. Erick

P.S. no Double-Dog-Dares allowed.
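For what it's worth, here is a rough sketch of the "small Java program taking two arguments" idea. With JUnit 4 on the classpath the body would presumably be the JUnitCore/Request.method call quoted above; the version below deliberately avoids the JUnit dependency and uses plain reflection, just to show the shape of such a tool. SingleMethodRunner and SampleTests are names invented for this example, and the sketch ignores setUp/tearDown and annotations entirely.

```java
import java.lang.reflect.InvocationTargetException;

// Hypothetical helper, not part of JUnit or Lucene.
class SingleMethodRunner {

  // Runs one public no-arg method on a fresh instance of the named class.
  // Returns null on success, or whatever the constructor or method threw.
  static Throwable runOne(String className, String methodName) {
    try {
      Class<?> clazz = Class.forName(className);
      Object instance = clazz.getDeclaredConstructor().newInstance();
      clazz.getMethod(methodName).invoke(instance);
      return null;
    } catch (InvocationTargetException e) {
      // The test body (or constructor) failed; hand back the real cause.
      return e.getCause();
    } catch (ReflectiveOperationException e) {
      // Class or method not found, not accessible, and so on.
      throw new IllegalArgumentException("Cannot run " + className + "#" + methodName, e);
    }
  }

  // Command-line entry point: <test class name> <test method name>
  public static void main(String[] args) {
    if (args.length != 2) {
      System.err.println("usage: SingleMethodRunner <class> <method>");
      return;
    }
    Throwable failure = runOne(args[0], args[1]);
    System.out.println(failure == null ? "OK" : "FAILED: " + failure);
  }
}

// Stand-in test class, invented for the example.
class SampleTests {
  public void testPasses() {
  }

  public void testFails() {
    throw new IllegalStateException("boom");
  }
}
```

Something like `java SingleMethodRunner SampleTests testFails` would then run just that one method, still assuming the test class is on the classpath, which is the clumsy part mentioned above.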
On Fri, Feb 26, 2010 at 6:14 PM, Yonik Seeley yo...@lucidimagination.comwrote: On Fri, Feb 26, 2010 at 3:31 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: I've never tried to learn a command-line invocation of a test case for a single test method, I've always just used the IDE to run individual methods Right, I've been doing bunches of Solr dev which for me only works from the command line... I'm open to suggestions though! Should work from the IDE provided you've set the working directory to src/test/test-files But I'd love a way to run a single method from the command line too. -Yonik http://www.lucidimagination.com - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Stored fields access
Does LazyLoading address this? I'm assuming your issue is that the default behavior loads the entire document regardless of whether you actually want all the fields. Erick On Thu, Feb 25, 2010 at 7:52 AM, Earwin Burrfoot ear...@gmail.com wrote: I'm thinking, should Lucene introduce new interface to read stored document fields? Current 'Document document(int n)' mechanism is barely usable due to overhead involved. While I believe underlying index structure works pretty fast (if it fits in memory, as is the case for most performance-concerned installations), there's no adequate access to it and people are forced to introduce contraptions like LinkedIn's payload-assisted luceneId-appId mapping or similar caches we employ. What I am thinking about is something along the lines of existing iterators like TermDocs/TermPositions. Iterate over docs, then iterate over fields stored for each, extract data, ???, profit. Comments? -- Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com) Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423 ICQ: 104465785 - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
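To make the proposal above a bit more concrete, here is a minimal sketch of what a TermDocs-style cursor over stored fields might look like. This is purely an assumption for discussion: StoredFieldsCursor, InMemoryStoredFields and StoredFieldsDemo are invented names, no such interface exists in Lucene as far as this thread goes, and the in-memory array only stands in for the real index files so the example runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cursor over stored fields, modeled loosely on TermDocs-style iteration.
interface StoredFieldsCursor {
  boolean nextDoc();     // advance to the next document; false when exhausted
  int doc();             // current document number
  boolean nextField();   // advance to the next stored field of the current document
  String fieldName();    // name of the current field
  String stringValue();  // value of the current field
}

// Toy in-memory backing; a real implementation would read the stored-fields files.
class InMemoryStoredFields implements StoredFieldsCursor {
  private final String[][][] docs; // docs[doc][field] = {name, value}
  private int doc = -1;
  private int field = -1;

  InMemoryStoredFields(String[][][] docs) {
    this.docs = docs;
  }

  public boolean nextDoc() {
    field = -1; // restart field iteration for the new document
    return ++doc < docs.length;
  }

  public int doc() {
    return doc;
  }

  public boolean nextField() {
    return ++field < docs[doc].length;
  }

  public String fieldName() {
    return docs[doc][field][0];
  }

  public String stringValue() {
    return docs[doc][field][1];
  }
}

class StoredFieldsDemo {
  // Collects the values of one field across all documents
  // without building a Document object per hit.
  static List<String> valuesOf(StoredFieldsCursor cursor, String name) {
    List<String> out = new ArrayList<String>();
    while (cursor.nextDoc()) {
      while (cursor.nextField()) {
        if (name.equals(cursor.fieldName())) {
          out.add(cursor.stringValue());
        }
      }
    }
    return out;
  }
}
```

The point of the shape is that the caller pulls only what it needs from the cursor, so no intermediate Document or Field objects have to be created along the way; whether that is achievable against the actual index format is not something this sketch shows.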
Re: [jira] Updated: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings
I'm so glad somebody else gets bugged by all the trivial warnings, all along I thought it was a personal problem G.. As I remember, I deprecated LuceneTestCase entirely to encourage people to migrate to the Junit4 variant (LuceneTestCaseJ4). So removing those deprecations should be approached with some caution. Of course this may have changed in the interim Erick On Thu, Feb 25, 2010 at 10:01 AM, Shai Erera (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel] Shai Erera updated LUCENE-2285: --- Attachment: LUCENE-2285.patch Quite a large patch. I've started off with 3832 compiler warnings based on my eclipse settings and we're now down to 510. All tests pass, including core, contrib and tag. I've also fixed a bunch of javadocs warnings, and ant javadocs now passes cleanly. I did not do any formatting to the code, in order to preserve the patch as clear and focused as possible, even though it's a very large one ... It touches a lot of files. So the sooner someone can help me commit it the better (before these files change). Code cleanup from all sorts of (trivial) warnings - Key: LUCENE-2285 URL: https://issues.apache.org/jira/browse/LUCENE-2285 Project: Lucene - Java Issue Type: Improvement Reporter: Shai Erera Priority: Minor Fix For: 3.1 Attachments: LUCENE-2285.patch I would like to do some code cleanup and remove all sorts of trivial warnings, like unnecessary casts, problems w/ javadocs, unused variables, redundant null checks, unnecessary semicolon etc. These are all very trivial and should not pose any problem. I'll create another issue for getting rid of deprecated code usage, like LuceneTestCase and all sorts of deprecated constructors. That's also trivial because it only affects Lucene code, but it's a different type of change. Another issue I'd like to create is about introducing more generics in the code, where it's missing today - not changing existing API. 
There are many places in the code like that. So, with your permission, I'll start with the trivial ones first, and then move on to the others. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings
Junit4: Well, simply disliking the @Test annotation seems like a poor reason to stay with Junit3, although I admit it's a pain in the neck to change. Which is why I didn't try to change all of them. The current system lends itself to the practice of mangling the test name as a way of not running it, which far too easily allows the test case to be forever ignored. One concrete advantage of annotations in Junit4 is the ability to add another stupid annotation, @Ignore, which then gets reported and thus doesn't get lost.

As I remember, the last place we left localization was that Mike (?) saw some intermittent problem that I couldn't reproduce. I could dust off that code and see what the current state of affairs is since this has come up again. The other problem was that the implementation I used led to *increased* test run times. The localization tests basically spun through all the Locales available and ran all the tests in the class against them. The current system only runs *some* of the tests in a test class through the localization process. This can be addressed by, at worst, splitting the test class up, but in my proof-of-concept that seemed like too much detail...

My purpose in deprecating LuceneTestCase was to explicitly encourage migration to Junit4, the deprecation warnings being the goad. I vote against removing it. FWIW Erick

On Thu, Feb 25, 2010 at 10:54 AM, Uwe Schindler (JIRA) j...@apache.org wrote:

[ https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838384#action_12838384 ]

Uwe Schindler commented on LUCENE-2285:

Hi Shai, I applied the patch to my checkout, so it will not get out of date. As mentioned before, I have to review each change, as on my first diagonal look-around I found a removed cast in TestCharArraySet/Map that is important to call the right method; without the cast the test would pass, but the affected method is never called.
I also do not want to remove some casts in NumericRange and other parts, where the casts were added for more clarity in the code. Especially at some places, without the cast it is not clear what javac will do, so the cast is there for more safety even if it is not needed. So please excuse my complaints, but two people looking over such a large patch is really needed. Thanks for the work! Uwe

Code cleanup from all sorts of (trivial) warnings

Key: LUCENE-2285
URL: https://issues.apache.org/jira/browse/LUCENE-2285
Project: Lucene - Java
Issue Type: Improvement
Reporter: Shai Erera
Assignee: Uwe Schindler
Priority: Minor
Fix For: 3.1
Attachments: LUCENE-2285.patch

I would like to do some code cleanup and remove all sorts of trivial warnings, like unnecessary casts, problems w/ javadocs, unused variables, redundant null checks, unnecessary semicolon etc. These are all very trivial and should not pose any problem. I'll create another issue for getting rid of deprecated code usage, like LuceneTestCase and all sorts of deprecated constructors. That's also trivial because it only affects Lucene code, but it's a different type of change. Another issue I'd like to create is about introducing more generics in the code, where it's missing today - not changing existing API. There are many places in the code like that. So, with your permission, I'll start with the trivial ones first, and then move on to the others.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
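Uwe's point about the cast that "is important to call the right method" can be illustrated with a small sketch of Java overload resolution. OverloadedSet below is a made-up class whose two contains overloads are only assumed to resemble the situation in TestCharArraySet/Map; it is not the real CharArraySet.

```java
// Hypothetical set with two overloads; both report "found" so a plain assertTrue cannot tell them apart.
class OverloadedSet {
  String lastCalled = "none";

  boolean contains(CharSequence text) {
    lastCalled = "contains(CharSequence)";
    return true;
  }

  boolean contains(Object o) {
    lastCalled = "contains(Object)";
    return true;
  }
}

class CastDemo {
  // With the cast, the compiler only sees an Object argument and picks contains(Object).
  static String withCast() {
    OverloadedSet set = new OverloadedSet();
    set.contains((Object) "foo");
    return set.lastCalled;
  }

  // Without the cast, the most specific applicable overload wins: contains(CharSequence).
  static String withoutCast() {
    OverloadedSet set = new OverloadedSet();
    set.contains("foo");
    return set.lastCalled;
  }
}
```

Both calls return true, so a test asserting on the result keeps passing after the cast is removed, yet contains(Object) is silently no longer exercised. That is the kind of change a warning cleanup can make without anyone noticing.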
Re: Stored fields access
OK, never mind G Erick On Thu, Feb 25, 2010 at 1:48 PM, Earwin Burrfoot ear...@gmail.com wrote: My issue is with extra objects created in the process. Field selection can be handled with, well, FieldSelector. 2010/2/25 Erick Erickson erickerick...@gmail.com: Does LazyLoading address this? I'm assuming your issue is that the default behavior loads the entire document regardless of whether you actually want all the fields. Erick On Thu, Feb 25, 2010 at 7:52 AM, Earwin Burrfoot ear...@gmail.com wrote: I'm thinking, should Lucene introduce new interface to read stored document fields? Current 'Document document(int n)' mechanism is barely usable due to overhead involved. While I believe underlying index structure works pretty fast (if it fits in memory, as is the case for most performance-concerned installations), there's no adequate access to it and people are forced to introduce contraptions like LinkedIn's payload-assisted luceneId-appId mapping or similar caches we employ. What I am thinking about is something along the lines of existing iterators like TermDocs/TermPositions. Iterate over docs, then iterate over fields stored for each, extract data, ???, profit. Comments? -- Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com) Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423 ICQ: 104465785 - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org -- Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com) Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423 ICQ: 104465785 - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings
I don't have my heart set on keeping the deprecation, so taking it off works for me. I'd also agree that we need a concerted effort to either completely convert or we should leave it un-deprecated so feel free. Let's move the junit4 stuff off to another discussion. Erick On Thu, Feb 25, 2010 at 1:27 PM, Shai Erera ser...@gmail.com wrote: Erik, I'm totally with you on JUnit 4. I think the @Test annotation is really not a big deal (it's actually very easy to migrate all the current tests to JUnit 4 with the added import using some script. Even manually it should be such a big deal. @Ignore is a perfect other advantage of JUnit4. I've found some tests which were prefixed with _, i.e. _testXYZ just to disable them. Nobody knows about it until he looks at the code (and pays attention). @Ignore would have been better. And there are lots of other advantages, like the @Before and @After (not only class). Another problem I've found in the tests is that not all extended LuceneTestCase, and usually their setUp and tearDown implementations were wrong - not calling super first/last. When I moved them to extend LuceneTestCase they broke (I fixed them, don't worry). However, that could never happen if the super's methods were tagged w/ @Before/After, because JUnit would take care running them before/after their sub-classes' @Before/After. So that's another win for JUnit4. And of course the @Before/AfterClass are really great ! So all in all, I'm a big fan of JUnit4, and if the discussion will start again, I'll pay more attention to it and participate (I admit I didn't follow it before). As long as it happens on the list and not on some IRC channel (!?!?). But like Uwe said, that's slightly unrelated to that issue. Because that deprecation alone produced 500 warnings (probably even much more), I un-deprecated it, and when we make a decision one way or the other, we should simply remove it (in case that's the decision). Until then, let's get rid of the unnecessary noise, agree? 
Shai

On Thu, Feb 25, 2010 at 7:15 PM, Uwe Schindler u...@thetaphi.de wrote: This discussion is out of the scope of this issue. We can start the flamewar again. In IRC we came to the conclusion that our primary intent is to make the test runs faster, which we achieved by patching lots of tests to not change static defaults and so be able to run all tests in the same JVM without forking. More speed improvements can be done by moving read-only index creation for search tests into static @BeforeClass and setting IndexReaders/-Searchers to NULL in @AfterClass to allow GC of static fields holding RAMDirectory and so on.

The @Test annotation led to more confusion and errors among our developers. E.g. we had a test merged back from 3.0 (without Junit4) to trunk, or even new tests were added, but nobody added @Test to them, leading to the fact that the tests were never run. So the most important change to LuceneTestCaseJ4 would be to emulate the old test* method names as if they have @Test. By that you could still disable them as mentioned, but it would reduce the burden of these dumb import statements and useless annotations.

By the way, why does LuceneTestCaseJ4 extend TestWatchman and also have an instance field that extends that class? I do not understand the whole magic behind it, this is totally confusing to me - annotating a field that is never used in code with an annotation is stupid and looks totally incorrect (I mean the field holding the TestWatchman subclass). This is another reason why I am against the migration of our already proven tests. Because of that we don't want to deprecate LuceneTestCase and instead only transform new tests, and those needing @BeforeClass/@AfterClass for more speed, to the new API.
- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de *From:* Erick Erickson [mailto:erickerick...@gmail.com] *Sent:* Thursday, February 25, 2010 5:27 PM *To:* java-dev@lucene.apache.org *Subject:* Re: [jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings Junit4: Well, simply disliking the @Test annotation seems like a poor reason to stay with Junit3, although I admit it's a pain in the neck to change. Which is why I didn't try to change all of them. The current system lends itself to the practice of mangling the test name as a way of not running it, which far too easily allows the test case to be forever ignored. One concrete advantage of annotations in Junit4 is the ability to add another stupid annotation @Ignore, which then gets reported and thus doesn't get lost. As I remember, that last place we left localization what that Mike (?) saw some intermittent problem that I couldn't reproduce. I could dust off that code and see what the current state of affairs is since this has come up again. The other problem was that the implementation I used lead to *increased* test
Uwe's question
By the way, why does LuceneTestCaseJ4 extend TestWatchman and also a instance field extends that class? No good reason, I plead confusion when figuring out how to use it. I've attached a patch to Lucene 2037 that removes the LuceneTestCaseJ4 extending TestWatchman. I do not understand the whole magic behind, this is totally confusing to me – annotating a field that is never used in code by an annotation is stupid and looks totally incorrect (I mean the field holding the TestWatchman-subclass). Well, this is to provide the same functionality as LuceneTestCase. I'm reaching a bit here since I haven't been in that code lately, but... LocalizedTestCase called runBare in LuceneTestCase which reported the seed value if an exception was thrown. I couldn't find a good way to access runBare or analogs in Junit4, but the interceptor pattern worked as well. The interceptor is called by the Junit framework on test events, so there aren't references to it in the Lucene test code. There are other places that call runBare, so I assumed that if anyone wanted to use Junit4 with those classes it would be a good thing to allow. I think the interceptor pattern is an elegant way to do something at discrete points in the test run, although it is a bit opaque. Most of this was put in when I was trying to move LocalizedTestCase to the Junit4 world. We didn't do that, but this still needs to be kept if we want LuceneTestCaseJ4 to be a drop-in replacement for LuceneTestCase. - This is another thing why I am against the migration of our already proven tests. If you'll recall the discussion at the time, neither am I. I do believe, though, that if anyone wants to change a test class to use Junit4 it's a good thing to have something that'll drop in without surprises, which is what I was trying for. Erick
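Editorial sketch of the interceptor idea described above: a runner calls a watcher at discrete points of the test run, and the watcher reports the random seed when a test fails, so the test code itself never references the watcher. This is a framework-free illustration, not the LuceneTestCaseJ4 code; it does not use JUnit at all, and every name in it (SeedReportingRunner, Watcher, TestBody, SeedReporter, runWatched) is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Framework-free sketch of the interceptor idea; every name here is hypothetical. */
final class SeedReportingRunner {

    /** Callbacks the runner invokes at discrete points of a test run. */
    interface Watcher {
        void succeeded(String testName);
        void failed(Throwable cause, String testName);
    }

    /** A test body that receives a seeded Random and may throw. */
    interface TestBody {
        void run(Random random) throws Exception;
    }

    /** Watcher that records the seed whenever a test fails. */
    static final class SeedReporter implements Watcher {
        final List<String> messages = new ArrayList<>();
        private final long seed;

        SeedReporter(long seed) {
            this.seed = seed;
        }

        @Override
        public void succeeded(String testName) {
            // Nothing to report for a passing test.
        }

        @Override
        public void failed(Throwable cause, String testName) {
            messages.add(testName + " failed with seed=" + seed + ": " + cause);
        }
    }

    /** Runs the body, notifies the watcher, and returns true if the body did not throw. */
    static boolean runWatched(String name, long seed, TestBody body, Watcher watcher) {
        try {
            body.run(new Random(seed));
            watcher.succeeded(name);
            return true;
        } catch (Throwable t) {
            watcher.failed(t, name);
            return false;
        }
    }

    public static void main(String[] args) {
        long seed = 42L;
        SeedReporter reporter = new SeedReporter(seed);
        runWatched("testPasses", seed, random -> random.nextInt(), reporter);
        runWatched("testFails", seed, random -> {
            throw new IllegalStateException("boom");
        }, reporter);
        for (String message : reporter.messages) {
            System.out.println(message);
        }
    }
}
```

The point of the design is the one made in the message: the reporting logic lives in one place and is triggered by the runner, at the cost of being somewhat opaque to someone reading only the test class.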
[jira] Updated: (LUCENE-2037) Allow Junit4 tests in our environment.
[ https://issues.apache.org/jira/browse/LUCENE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Erick Erickson updated LUCENE-2037:
Attachment: LUCENE-2037_remove_testwatchman.patch
Removed unnecessary derivation from TestWatchman. Corrected minor typo in comment.
Allow Junit4 tests in our environment.
Key: LUCENE-2037
URL: https://issues.apache.org/jira/browse/LUCENE-2037
Project: Lucene - Java
Issue Type: Improvement
Components: Other
Affects Versions: 3.1
Environment: Development
Reporter: Erick Erickson
Assignee: Michael McCandless
Priority: Minor
Fix For: 3.1
Attachments: junit-4.7.jar, LUCENE-2037.patch, LUCENE-2037.patch, LUCENE-2037.patch, LUCENE-2037_remove_testwatchman.patch, LUCENE-2037_revised_2.patch
Original Estimate: 8h
Remaining Estimate: 8h
Now that we're dropping Java 1.4 compatibility for 3.0, we can incorporate Junit4 in testing. Junit3 and junit4 tests can coexist, so no tests should have to be rewritten. We should start this for the 3.1 release so we can get a clean 3.0 out smoothly. It's probably worthwhile to convert a small set of tests as an exemplar.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Uwe's question
Hmmm, didn't reopen the JIRA, should I? Or will it just magically get into Michael's queue? On Thu, Feb 25, 2010 at 8:52 PM, Erick Erickson erickerick...@gmail.comwrote: By the way, why does LuceneTestCaseJ4 extend TestWatchman and also a instance field extends that class? No good reason, I plead confusion when figuring out how to use it. I've attached a patch to Lucene 2037 that removes the LuceneTestCaseJ4 extending TestWatchman. I do not understand the whole magic behind, this is totally confusing to me – annotating a field that is never used in code by an annotation is stupid and looks totally incorrect (I mean the field holding the TestWatchman-subclass). Well, this is to provide the same functionality as LuceneTestCase. I'm reaching a bit here since I haven't been in that code lately, but... LocalizedTestCase called runBare in LuceneTestCase which reported the seed value if an exception was thrown. I couldn't find a good way to access runBare or analogs in Junit4, but the interceptor pattern worked as well. The interceptor is called by the Junit framework on test events, so there aren't references to it in the Lucene test code. There are other places that call runBare, so I assumed that if anyone wanted to use Junit4 with those classes it would be a good thing to allow. I think the interceptor pattern is an elegant way to do something at discrete points in the test run, although it is a bit opaque. Most of this was put in when I was trying to move LocalizedTestCase to the Junit4 world. We didn't do that, but this still needs to be kept if we want LuceneTestCaseJ4 to be a drop-in replacement for LuceneTestCase. - This is another thing why I am against the migration of our already proven tests. If you'll recall the discussion at the time, neither am I. I do believe, though, that if anyone wants to change a test class to use Junit4 it's a good thing to have something that'll drop in without surprises, which is what I was trying for. Erick
Re: Uwe's question
Well, Things got busy (tm). Uwe's point is valid; unless there's demonstrable gain, moving things to Junit4 just for fun is wasted motion, indeed dangerous. I was focusing on LocalizedTestCase to understand the place of runBare etc. in the scheme of things since when I created LuceneTestCaseJ4 that was something I wanted to figure out to make it a replacement for LuceneTestCase. I can't point to a compelling reason to shake up the code; the only improvement would be having a demonstration of the Junit4 @RunWith annotation for future reference. So, I've no compelling reason to push that patch forward. If y'all think it's worth it I'll be happy to crank that patch back up again, it'll take a few days though. It does affect several files, and if the main value here is an exemplar of the @RunWith annotation, perhaps there's a better place to put that in. Erick On Thu, Feb 25, 2010 at 9:06 PM, Robert Muir rcm...@gmail.com wrote: LocalizedTestCase called runBare in LuceneTestCase which reported the seed value if an exception was thrown. I couldn't find a good way to access runBare or analogs in Junit4, but the interceptor pattern worked as well. The interceptor is called by the Junit framework on test events, so there aren't references to it in the Lucene test code. There are other places that call runBare, so I assumed that if anyone wanted to use Junit4 with those classes it would be a good thing to allow. I didn't forget about your patch Erick, in my opinion there is nothing wrong with it. I hope it's not discouraging you, the problem is a few of us have spent countless hours trying to debug this hard-to-reproduce Thai test failure problem. It failed in the existing tests, too, with Junit 3 on hudson (one time!). 
At this point, i start to wonder if it could be related to stuff like this: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6683975 I don't think we should let this stop progress with the tests, if you think we should move LocalizedTestCase to junit 4 lets do it. -- Robert Muir rcm...@gmail.com
Re: FileNotFoundException for write.lock
Please repost this over on the users list. This list is for internal development discussions. Thanks Erick On Sat, Jan 23, 2010 at 9:56 PM, jchang jchangkihat...@gmail.com wrote: By the way: this happens with a brand new directory with no files at all in it. jchang wrote: When I try to start my service and construct an IndexWriter, I get this: java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.NIOFSDirectory@ /home/jchang/IdeaProjects/index-service_trunk/target/testindexA/index/indexablemaildata: files: [write.lock] It is odd. The problem is not that it is complaining about a lock file. There is none there. It seems to be complaining that there is NOT a lock file. Why?
Re: Finding frequency of regex query match in a field
Could I ask you to re-post this on the java user's list? This list is for *internal* Lucene development discussion. Thanks Erick On Fri, Jan 15, 2010 at 8:28 AM, Altimatic chris.stuckl...@gmail.com wrote: Hi All, I have an application that has to count how many times a specific regular expression matches a particular field in each document of an indexed directory. For example, let's say I have 10 documents in the directory and each document has 3 fields: table, column and data. Example docs:
    Document doc1 = new Document();
    doc1.add(new Field("table", "EMPLOYEE_US", Field.Store.NO, Field.Index.ANALYZED));
    doc1.add(new Field("column", "F_NAME", Field.Store.NO, Field.Index.ANALYZED));
    doc1.add(new Field("data", "Chris Hank Tony Cody Tom Tina Crystal", Field.Store.NO, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
    Document doc2 = new Document();
    doc2.add(new Field("table", "EMPLOYEE_CA", Field.Store.NO, Field.Index.ANALYZED));
    doc2.add(new Field("column", "F_NAME", Field.Store.NO, Field.Index.ANALYZED));
    doc2.add(new Field("data", "Bob Billy Tom Toby Charles Krista Madonna", Field.Store.NO, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
I know I can create a query to search for a regular expression, and that will return each document that contains a match:
    IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
    writer.addDocument(doc1);
    writer.addDocument(doc2);
    writer.optimize();
    writer.close();
    IndexSearcher searcher = new IndexSearcher(directory);
    RegexQuery query = new RegexQuery(new Term("data", "^T.*"));
    // grab the score docs and go through them to find the documents that contain a match
    ScoreDoc[] hits = searcher.search(query, null, maxNumOfHits).scoreDocs;
The code above will tell me that both doc1 and doc2 contain a match for the constructed query. However, I need to know how many times the regular expression was matched in each document, i.e. doc1 = 3, doc2 = 2. I hope I am being clear...and thanks in advance.
Cheers
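Editorial sketch of the counting the poster asks for, done on the raw field text rather than through the index. It assumes the field is split on whitespace (mirroring the WhitespaceAnalyzer in the question) and that the regular expression must match a whole token; it deliberately does not use any Lucene API, and the class and method names (RegexMatchCounter, countMatchingTokens) are hypothetical.

```java
import java.util.regex.Pattern;

/** Sketch: count whitespace-separated tokens matching a regex. Names are hypothetical. */
final class RegexMatchCounter {

    /** Returns how many whitespace-separated tokens the pattern matches in full. */
    static int countMatchingTokens(String fieldText, Pattern pattern) {
        int count = 0;
        for (String token : fieldText.trim().split("\\s+")) {
            // An empty input yields one empty token; skip it.
            if (!token.isEmpty() && pattern.matcher(token).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Pattern startsWithT = Pattern.compile("^T.*");
        String doc1 = "Chris Hank Tony Cody Tom Tina Crystal";
        String doc2 = "Bob Billy Tom Toby Charles Krista Madonna";
        System.out.println("doc1 = " + countMatchingTokens(doc1, startsWithT));
        System.out.println("doc2 = " + countMatchingTokens(doc2, startsWithT));
    }
}
```

On the two example strings this yields the counts the poster expects (3 and 2). It only works when the original text is still available, which is not the case for fields indexed with Field.Store.NO unless the source data is kept elsewhere.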
Re: svn commit: r894224 - in /lucene/java/trunk/contrib: benchmark/src/java/org/apache/lucene/benchmark/byTask/tasks/ highlighter/src/java/org/apache/lucene/search/highlight/ instantiated/src/java/o
I once knew of a math prof in the early days of electronic book submissions who had a helpful person change all the "iff"s into "if", thinking they were all typos... in all the proofs in a math text... As his fellow faculty member who was relaying the story added, "putting them back was non-trivial". Erick On Mon, Dec 28, 2009 at 3:02 PM, Robert Muir rcm...@gmail.com wrote: Simon, are we sure these are spelling issues? I think this iff stands for 'if and only if' in these cases. http://en.wikipedia.org/wiki/If_and_only_if On Mon, Dec 28, 2009 at 1:52 PM, sim...@apache.org wrote: Author: simonw Date: Mon Dec 28 18:52:19 2009 New Revision: 894224 URL: http://svn.apache.org/viewvc?rev=894224&view=rev Log: fixed trivial spelling issues in javadoc Modified: lucene/java/trunk/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/tasks/PerfTask.java lucene/java/trunk/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java lucene/java/trunk/contrib/instantiated/src/java/org/apache/lucene/store/instantiated/InstantiatedTermPositions.java Modified: lucene/java/trunk/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/tasks/PerfTask.java URL: http://svn.apache.org/viewvc/lucene/java/trunk/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/tasks/PerfTask.java?rev=894224&r1=894223&r2=894224&view=diff == --- lucene/java/trunk/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/tasks/PerfTask.java (original) +++ lucene/java/trunk/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/tasks/PerfTask.java Mon Dec 28 18:52:19 2009 @@ -287,7 +287,7 @@ /** * Sub classes that supports parameters must override this method to return true. - * @return true iff this task supports command line params. + * @return true if this task supports command line params. 
*/ public boolean supportsParams () { return false; Modified: lucene/java/trunk/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java URL: http://svn.apache.org/viewvc/lucene/java/trunk/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java?rev=894224r1=894223r2=894224view=diff == --- lucene/java/trunk/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java (original) +++ lucene/java/trunk/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java Mon Dec 28 18:52:19 2009 @@ -53,8 +53,8 @@ * Checks to see if this term is valid at codeposition/code. * * @param position - *to check against valid term postions - * @return true iff this term is a hit at this position + *to check against valid term positions + * @return true if this term is a hit at this position */ public boolean checkPosition(int position) { // There would probably be a slight speed improvement if PositionSpans Modified: lucene/java/trunk/contrib/instantiated/src/java/org/apache/lucene/store/instantiated/InstantiatedTermPositions.java URL: http://svn.apache.org/viewvc/lucene/java/trunk/contrib/instantiated/src/java/org/apache/lucene/store/instantiated/InstantiatedTermPositions.java?rev=894224r1=894223r2=894224view=diff == --- lucene/java/trunk/contrib/instantiated/src/java/org/apache/lucene/store/instantiated/InstantiatedTermPositions.java (original) +++ lucene/java/trunk/contrib/instantiated/src/java/org/apache/lucene/store/instantiated/InstantiatedTermPositions.java Mon Dec 28 18:52:19 2009 @@ -80,7 +80,7 @@ /** * Skips entries to the first beyond the current whose document number is - * greater than or equal to currentTermPositionIndextarget/currentTermPositionIndex. pReturns true iff there is such + * greater than or equal to currentTermPositionIndextarget/currentTermPositionIndex. pReturns true if there is such * an entry. 
pBehaves as if written: pre * boolean skipTo(int target) { * do { -- Robert Muir rcm...@gmail.com - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: svn commit: r890427 - /lucene/java/trunk/src/test/org/apache/lucene/util/LocalizedTestCase.java
Oooh, nice... On Mon, Dec 14, 2009 at 1:26 PM, rm...@apache.org wrote:
Author: rmuir
Date: Mon Dec 14 18:26:26 2009
New Revision: 890427
URL: http://svn.apache.org/viewvc?rev=890427&view=rev
Log: LUCENE-2155: add assertion to check if something changes default locale behind our back when using LocalizedTestCase
Modified: lucene/java/trunk/src/test/org/apache/lucene/util/LocalizedTestCase.java
Modified: lucene/java/trunk/src/test/org/apache/lucene/util/LocalizedTestCase.java
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/test/org/apache/lucene/util/LocalizedTestCase.java?rev=890427&r1=890426&r2=890427&view=diff
==
--- lucene/java/trunk/src/test/org/apache/lucene/util/LocalizedTestCase.java (original)
+++ lucene/java/trunk/src/test/org/apache/lucene/util/LocalizedTestCase.java Mon Dec 14 18:26:26 2009
@@ -73,6 +73,8 @@
   @Override
   protected void tearDown() throws Exception {
+    assertEquals("default locale unexpectedly changed:", locale, Locale
+        .getDefault());
     Locale.setDefault(defaultLocale);
     super.tearDown();
   }
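Editorial sketch of what the committed assertion guards against: a test sets a default locale, and on teardown checks that nothing changed it in the meantime before restoring the original. This is a standalone illustration using only java.util.Locale, not the LocalizedTestCase code; the class name LocaleGuard and its methods are hypothetical.

```java
import java.util.Locale;

/** Sketch of a save/check/restore guard around the JVM default locale. Names are hypothetical. */
final class LocaleGuard {
    private final Locale savedDefault;
    private final Locale testLocale;

    /** Remembers the current default and switches to the locale under test. */
    LocaleGuard(Locale testLocale) {
        this.savedDefault = Locale.getDefault();
        this.testLocale = testLocale;
        Locale.setDefault(testLocale);
    }

    /**
     * Restores the saved default and reports whether the default was still
     * the locale under test, i.e. whether nobody changed it behind our back.
     */
    boolean tearDown() {
        boolean unchanged = testLocale.equals(Locale.getDefault());
        Locale.setDefault(savedDefault);
        return unchanged;
    }

    public static void main(String[] args) {
        LocaleGuard clean = new LocaleGuard(Locale.FRANCE);
        System.out.println("unchanged after clean run: " + clean.tearDown());

        LocaleGuard dirty = new LocaleGuard(Locale.JAPAN);
        Locale.setDefault(Locale.GERMANY); // simulates code under test changing the default
        System.out.println("unchanged after dirty run: " + dirty.tearDown());
    }
}
```

Because the default locale is JVM-wide state, a check like this only pins down a culprit when tests run one at a time in the JVM.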
Re: svn commit: r890427 - /lucene/java/trunk/src/test/org/apache/lucene/util/LocalizedTestCase.java
Thanks for letting me know, it's quite a relief to be able to trust the @Parameterized stuff. Just let me know if you need me to regenerate the patch whenever you want to apply it. Between now and then I'll find something else to do G... Erick On Mon, Dec 14, 2009 at 1:59 PM, Robert Muir rcm...@gmail.com wrote: yeah i am convinced this is not a problem with your junit 4 patch Erick... as Uwe ran into the same trouble I ran into with the existing LocalizedTestCase however, if you don't mind, I'd like to let it set with the junit 3 impl a little bit longer and see if we get more random-hard-to-reproduce failures. On Mon, Dec 14, 2009 at 1:46 PM, Erick Erickson erickerick...@gmail.comwrote: Oooh, nice... On Mon, Dec 14, 2009 at 1:26 PM, rm...@apache.org wrote: Author: rmuir Date: Mon Dec 14 18:26:26 2009 New Revision: 890427 URL: http://svn.apache.org/viewvc?rev=890427view=rev Log: LUCENE-2155: add assertion to check if something changes default locale behind our back when using LocalizedTestCase Modified: lucene/java/trunk/src/test/org/apache/lucene/util/LocalizedTestCase.java Modified: lucene/java/trunk/src/test/org/apache/lucene/util/LocalizedTestCase.java URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/test/org/apache/lucene/util/LocalizedTestCase.java?rev=890427r1=890426r2=890427view=diff == --- lucene/java/trunk/src/test/org/apache/lucene/util/LocalizedTestCase.java (original) +++ lucene/java/trunk/src/test/org/apache/lucene/util/LocalizedTestCase.java Mon Dec 14 18:26:26 2009 @@ -73,6 +73,8 @@ @Override protected void tearDown() throws Exception { +assertEquals(default locale unexpectedly changed:, locale, Locale +.getDefault()); Locale.setDefault(defaultLocale); super.tearDown(); } -- Robert Muir rcm...@gmail.com
Re: [jira] Updated: (LUCENE-2122) Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase
its possible the problems are not reproduceable because they are a crazy problem with these tests. Agree absolutely. I was just making sure we considered the *possibility* that the paramaterized version was showing an underlying Lucene problem rather than assuming the fault was with Junit4. Having spent a way too much of my programming life being absolutely sure I knew what part of the system was causing the failure only to find that the problem was waaay over *there* instead, I'm kinda sensitive that way G... I'll eagerly await your results... Erick On Sun, Dec 13, 2009 at 5:24 PM, Robert Muir rcm...@gmail.com wrote: Erick it might be a gremlin on my computer or my brain... also i think i was inadvertently using different JVM's for running ant test (sometimes java 5/64bit sometimes 6/32bit). this is because i was doing something with forrest and changed my JAVA_HOME in one shell window. so i'm going to run 100 ant clean tests with each JVM, logging to a file. if these work reliably then I think I will conclude I was doing something stupid before... (like forgot to ant clean or something like that) this computer is windows, so you are right it might have different locales than your mac. however, i think we should consider your last comment: its possible the problems are not reproduceable because they are a crazy problem with these tests. for example, i think we should be extra cautious and call Calendar.clear() on all our calendars before changing time values and then asserting expected results. I don't see any obvious problem though, just thinking if something based on the 'current time' was affecting the tests, then this might make it hard to reproduce. On Sat, Dec 12, 2009 at 9:26 PM, Erick Erickson erickerick...@gmail.comwrote: H, you can't get either patch to work reliably. On the other hand, I can't get either patch to fail. I ran the whole ant clean test thing half a dozen times. I'll make a script to loop all night tonight and we'll see. 
I also ran just the TestQueryParser around 700 times from Ant via a shell script. No problems. No problems in IntelliJ. Siiggghhh. Anybody else want to try applying either patch and see what happens? I'd hate to lose the capabilities of the Parameterized tests because of a gremlin that only exists on Robert's machine. I'd also hate to introduce cool new capabilities that started training us to ignore test failures. That's bad. Very bad. Robert: What kind of machine are you running on? I'm running on a Macbook Pro... As it stands, I'm not sure whether parameterized tests are the issue or whether the issue is Locale testing. Or whether Robert has some peculiar setup. Or, for that matter, whether I have some peculiar setup that makes it work by hiding an instability. It sure would be nice to figure out where the fragility is before relying on Parameterized tests... Robert: If you have the patience, could you try your patch out and capture the failure? I'm especially curious if your patch fails on the same language every time. Who knows? On your machine, this *could* be hitting an edge case, that's actually a flaw in the code somewhere rather than an artifact of the test framework. I don't even know if my machine is using all of the same Locale's as yours I'd have at figuring out what was going on, but I can't make it fail. It works on my machine doesn't leave me very many directions forward But I'm so glad that Robert is finding this nonsense *before* we get too much farther down this road rather than after I'll poke around on the internet and see if there's anything there that I can see. 
Erick On Sat, Dec 12, 2009 at 8:55 AM, Robert Muir (JIRA) j...@apache.orgwrote: [ https://issues.apache.org/jira/browse/LUCENE-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel] Robert Muir updated LUCENE-2122: Assignee: (was: Robert Muir) i am unassigning in case someone else can figure this one out, at my wits end here :) perhaps its just something wierd about my environment or something Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase --- Key: LUCENE-2122 URL: https://issues.apache.org/jira/browse/LUCENE-2122 Project: Lucene - Java Issue Type: Improvement Components: Other Affects Versions: 3.1 Reporter: Erick Erickson Priority: Minor Fix For: 3.1 Attachments: LUCENE-2122-r2.patch, LUCENE-2122-r3.patch, LUCENE-2122-r4.patch, LUCENE-2122.patch, LUCENE-2122.patch Use the @Parameterized capabilities of Junit4 to allow more extensive testing of Locales. -- This message is automatically generated by JIRA. - You can
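Editorial sketch for Robert's suggestion earlier in this message to call Calendar.clear() before setting time values. A freshly created Calendar carries the current time of day, so setting only year, month and day leaves hour, minute, second and millisecond dependent on when the test runs. The example pins the time zone to UTC and the locale to Locale.ROOT, which is an assumption made here for determinism and not something the thread prescribes; the class and method names are hypothetical.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

/** Sketch: why Calendar.clear() matters before setting date fields. Names are hypothetical. */
final class CalendarClearDemo {

    /** Builds a UTC calendar for the given date, optionally clearing all fields first. */
    static long millisFor(int year, int month, int day, boolean clearFirst) {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"), Locale.ROOT);
        if (clearFirst) {
            // Unsets every field, so time-of-day no longer comes from "now".
            cal.clear();
        }
        cal.set(year, month, day);
        return cal.getTimeInMillis();
    }

    public static void main(String[] args) {
        long cleared = millisFor(2009, Calendar.DECEMBER, 13, true);
        long notCleared = millisFor(2009, Calendar.DECEMBER, 13, false);
        System.out.println("cleared is exactly midnight UTC: " + (cleared % 86_400_000L == 0));
        System.out.println("offset left over from 'now' (ms): " + (notCleared - cleared));
    }
}
```

The second printed value changes from run to run, which is the kind of time-dependent behaviour that could make an expected-versus-actual date comparison hard to reproduce.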
Re: [jira] Commented: (LUCENE-2122) Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase
Uwe: Thanks, I'll remember that in the future On Sun, Dec 13, 2009 at 5:31 AM, Uwe Schindler u...@thetaphi.de wrote: Hi Erick, sadly, the eMail reply to JIRA issues does not work for mails sent to this mailing list (because the list overrides the reply-to header, so JIRA does not get the answer). If you answer only on the ML, we lose those comments in the issue. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -- *From:* Erick Erickson [mailto:erickerick...@gmail.com] *Sent:* Sunday, December 13, 2009 4:02 AM *To:* java-dev@lucene.apache.org *Subject:* Re: [jira] Commented: (LUCENE-2122) Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase Robert: The -r4 patch runs for you and you want me to look at your patch compared to r4? Sure, I'll do that, but not til tomorrow, I do much better work when I'm not tired G. I confess I haven't looked at your patch beyond installing it to see if I could reproduce the failure (looks like our emails crossed). But it's *still* peculiar that it behaves differently between our two machines. OTOH, maybe your patch will fail on my machine sometime tonight, my 4 successes aren't very statistically significant after all.. Erick On Sat, Dec 12, 2009 at 9:14 PM, Robert Muir (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/LUCENE-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789837#action_12789837] Robert Muir commented on LUCENE-2122: - btw, I left 'ant clean test' running in a loop and just checked it with this patch, no problems. so perhaps its my own incompetence. Erick can you take a look? Do you see some obvious problem? 
Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase --- Key: LUCENE-2122 URL: https://issues.apache.org/jira/browse/LUCENE-2122 Project: Lucene - Java Issue Type: Improvement Components: Other Affects Versions: 3.1 Reporter: Erick Erickson Priority: Minor Fix For: 3.1 Attachments: LUCENE-2122-r2.patch, LUCENE-2122-r3.patch, LUCENE-2122-r4.patch, LUCENE-2122.patch, LUCENE-2122.patch Use the @Parameterized capabilities of Junit4 to allow more extensive testing of Locales. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
IndexWriter failure
I was running the whole ant-clean-test in a loop last night for LUCENE-2122 and had this error in IndexWriter occur once in 30+ runs. I know there has been some work on spurious failures here lately and thought I'd add this on the chance it'd help anyone tracking this issue. Didn't see a JIRA... I updated the trunk yesterday (12-Dec) afternoon sometime.
[junit] Testcase: testMaxBufferedDocsChange(org.apache.lucene.index.TestIndexWriterMergePolicy): FAILED
[junit] maxMergeDocs=2147483647; numSegments=11; upperBound=10; mergeFactor=10
[junit] junit.framework.AssertionFailedError: maxMergeDocs=2147483647; numSegments=11; upperBound=10; mergeFactor=10
[junit] at org.apache.lucene.index.TestIndexWriterMergePolicy.checkInvariants(TestIndexWriterMergePolicy.java:234)
[junit] at org.apache.lucene.index.TestIndexWriterMergePolicy.testMaxBufferedDocsChange(TestIndexWriterMergePolicy.java:164)
[junit] at org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:212)
[junit]
FWIW Erick
Re: [jira] Updated: (LUCENE-2122) Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase
H, you can't get either patch to work reliably. On the other hand, I can't get either patch to fail. I ran the whole ant clean test thing half a dozen times. I'll make a script to loop all night tonight and we'll see. I also ran just the TestQueryParser around 700 times from Ant via a shell script. No problems. No problems in IntelliJ. Siiggghhh. Anybody else want to try applying either patch and see what happens? I'd hate to lose the capabilities of the Parameterized tests because of a gremlin that only exists on Robert's machine. I'd also hate to introduce cool new capabilities that started training us to ignore test failures. That's bad. Very bad. Robert: What kind of machine are you running on? I'm running on a Macbook Pro... As it stands, I'm not sure whether parameterized tests are the issue or whether the issue is Locale testing. Or whether Robert has some peculiar setup. Or, for that matter, whether I have some peculiar setup that makes it work by hiding an instability. It sure would be nice to figure out where the fragility is before relying on Parameterized tests... Robert: If you have the patience, could you try your patch out and capture the failure? I'm especially curious if your patch fails on the same language every time. Who knows? On your machine, this *could* be hitting an edge case, that's actually a flaw in the code somewhere rather than an artifact of the test framework. I don't even know if my machine is using all of the same Locale's as yours I'd have at figuring out what was going on, but I can't make it fail. It works on my machine doesn't leave me very many directions forward But I'm so glad that Robert is finding this nonsense *before* we get too much farther down this road rather than after I'll poke around on the internet and see if there's anything there that I can see. 
Erick On Sat, Dec 12, 2009 at 8:55 AM, Robert Muir (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/LUCENE-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel] Robert Muir updated LUCENE-2122: Assignee: (was: Robert Muir) i am unassigning in case someone else can figure this one out, at my wits end here :) perhaps its just something wierd about my environment or something Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase --- Key: LUCENE-2122 URL: https://issues.apache.org/jira/browse/LUCENE-2122 Project: Lucene - Java Issue Type: Improvement Components: Other Affects Versions: 3.1 Reporter: Erick Erickson Priority: Minor Fix For: 3.1 Attachments: LUCENE-2122-r2.patch, LUCENE-2122-r3.patch, LUCENE-2122-r4.patch, LUCENE-2122.patch, LUCENE-2122.patch Use the @Parameterized capabilities of Junit4 to allow more extensive testing of Locales. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [jira] Commented: (LUCENE-2122) Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase
Robert: The -r4 patch runs for you and you want me to look at your patch compared to r4? Sure, I'll do that, but not til tomorrow, I do much better work when I'm not tired G. I confess I haven't looked at your patch beyond installing it to see if I could reproduce the failure (looks like our emails crossed). But it's *still* peculiar that it behaves differently between our two machines. OTOH, maybe your patch will fail on my machine sometime tonight, my 4 successes aren't very statistically significant after all.. Erick On Sat, Dec 12, 2009 at 9:14 PM, Robert Muir (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/LUCENE-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789837#action_12789837] Robert Muir commented on LUCENE-2122: - btw, I left 'ant clean test' running in a loop and just checked it with this patch, no problems. so perhaps its my own incompetence. Erick can you take a look? Do you see some obvious problem? Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase --- Key: LUCENE-2122 URL: https://issues.apache.org/jira/browse/LUCENE-2122 Project: Lucene - Java Issue Type: Improvement Components: Other Affects Versions: 3.1 Reporter: Erick Erickson Priority: Minor Fix For: 3.1 Attachments: LUCENE-2122-r2.patch, LUCENE-2122-r3.patch, LUCENE-2122-r4.patch, LUCENE-2122.patch, LUCENE-2122.patch Use the @Parameterized capabilities of Junit4 to allow more extensive testing of Locales. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [jira] Commented: (LUCENE-2122) Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase
So I ran this test suite from Idea a dozen times or so and no problem. Then I ran it a couple of thousand times through Ant via a shell script. No problem. So I'm tending toward thinking it's an Eclipse issue, what do you think? Erick On Thu, Dec 10, 2009 at 4:23 PM, Erick Erickson erickerick...@gmail.comwrote: I'll give this a whirl tonight. The reason I was wondering what language is to insure that my machine *also* tests the offending locale. A bit of a nit, the flaw in the approach with LocalizedTestCase is that *every* test in the class is run against *all* locales.. To change this, as I understand it, we'd need to break the tests out into a separate class... Intermittent errors often smell like a race condition, so I'll be on the lookout for one. But I also wonder if you'd ever get this error running outside of Eclipse. I really, really, really hate ones like this. Let's say you have a script that runs 1,000 times flawlessly from the shell. What does that prove? nasty grin. But maybe if I relentlessly press the test button on that class it'll happen to me too FWIW Erick On Thu, Dec 10, 2009 at 3:30 PM, Robert Muir rcm...@gmail.com wrote: i just right clicked TestQueryParser and said 'run as junit test' i could not tell which locales failed, (just testing your original patch, no modifications) the way they are shown instead is like an array of 135 elements... [0]: testCJK[0] (0.000s) testSimple[0] (0.001s) ... [1]: testCJK[1] (0.000s) ... [135] testCJK[135] the only tests that failed were the localized methods like the date stuff, where its going to create an 'expected' localized string and then compare against that. it makes me suspect that somehow there is some race, and the default locale is actually changing as the test is running, or something crazy like this?! On Thu, Dec 10, 2009 at 3:23 PM, Erick Erickson erickerick...@gmail.comwrote: Yep, that sure makes me nervous too. I've never seen a failure in IntelliJ or from a shell window. 
How often do you need to run it to see an error? And what language is it using? And what test? I can try this in my IntelliJ setup and see if I can reproduce it. Note I'm running on a Macbook Pro... I wonder if a repeating script would show an intermittent error Erick On Thu, Dec 10, 2009 at 3:10 PM, Robert Muir (JIRA) j...@apache.orgwrote: [ https://issues.apache.org/jira/browse/LUCENE-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1274#action_1274] Robert Muir commented on LUCENE-2122: - Hi Erick, I played with this patch some and (not intentionally trying) I would get random test failures for TestQueryParser under eclipse... its not really something I am able to repeat though. maybe some race condition (I do not know how eclipse executes parameterized tests) ? if it is a problem with my IDE that is one thing, just makes me a little nervous right now. trying to think what could cause this Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase --- Key: LUCENE-2122 URL: https://issues.apache.org/jira/browse/LUCENE-2122 Project: Lucene - Java Issue Type: Improvement Components: Other Affects Versions: 3.1 Reporter: Erick Erickson Assignee: Robert Muir Priority: Minor Fix For: 3.1 Attachments: LUCENE-2122-r2.patch, LUCENE-2122-r3.patch, LUCENE-2122-r4.patch, LUCENE-2122.patch Use the @Parameterized capabilities of Junit4 to allow more extensive testing of Locales. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org -- Robert Muir rcm...@gmail.com
Re: Lucene Analyzer that can handle C++ vs C#
This type of question is not appropriate on the developers' list; this list is devoted to development. Please post this kind of question on the users' list.

As it happens, this very topic is being discussed under the thread "Recover special terms from StandardTokenizer", which should give you some ideas.

Erick

On Fri, Dec 11, 2009 at 11:19 AM, maxSchlein m_schl...@hotmail.com wrote:

Can someone please point me in the right direction. We are creating an application that needs to be able to search on C++ and get back docs that have C++ in them.

The StandardAnalyzer does not seem to index the +, so a search for C++ will bring back docs that contain C++, C, C#, etc.

The WhiteSpaceAnalyzer will index the +, but if we have the term "C++." (that is, if C++ is at the end of a sentence), it will index "C++." so a search for C++ will not return the doc.

I have heard of maybe a CustomAnalyzer; however, it seems like there would actually need to be a CustomFilter/CustomTokenizer. I looked at:

- StandardAnalyzer.java
- StandardFilter.java
- StandardTokenizer.java
- StandardTokenizerImpl.java
- StandardTokenizerImpl.jflex

I would guess that the StandardTokenizer is where the changes would need to be made to allow the + character, but I am unclear as to how. Any and all help is greatly appreciated.
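To make the trailing-period problem in the question concrete, here is a minimal plain-Java sketch. It does not use Lucene at all: `PlusAwareTokens`, `whitespaceTokens`, and `stripTrailingPunctuation` are made-up names, and the whitespace split only imitates the behavior the poster describes for WhiteSpaceAnalyzer. The idea it illustrates is one possible direction, namely keeping `+` and `#` while trimming sentence punctuation from the end of a token.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; this is NOT Lucene's analyzer API.
final class PlusAwareTokens {

    // Split on whitespace only. Punctuation stays attached to the token,
    // so "C++." and "C++" end up as two different terms.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Hypothetical clean-up step: drop trailing sentence punctuation but
    // leave '+' and '#' alone, so "C++." becomes "C++" and "C#," becomes "C#".
    static String stripTrailingPunctuation(String token) {
        int end = token.length();
        while (end > 0 && ".,;:!?".indexOf(token.charAt(end - 1)) >= 0) {
            end--;
        }
        return token.substring(0, end);
    }

    // Whitespace split followed by the clean-up step above.
    static List<String> normalizedTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : whitespaceTokens(text)) {
            String cleaned = stripTrailingPunctuation(t);
            if (!cleaned.isEmpty()) {
                out.add(cleaned);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String doc = "We write C# and C++.";
        System.out.println(whitespaceTokens(doc));
        System.out.println(normalizedTokens(doc));
    }
}
```

In a real index the same normalization would have to run at both index time and query time, which is why the question drifts toward a custom tokenizer or filter rather than a one-off string fix.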
Re: [jira] Commented: (LUCENE-2133) [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
Mike:

Which of these do you think this patch *should* address before committing? Just the last two? As many as Christian has energy for <G>?

On Thu, Dec 10, 2009 at 12:24 PM, Michael McCandless (JIRA) j...@apache.org wrote:

[ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788798#action_12788798 ]

Michael McCandless commented on LUCENE-2133:

This patch is a good step forward -- it associates the cache directly with IndexReader, where it belongs; it cleanly decouples cache from reader (vs the hack we have today with IndexReader.getFieldCacheKey), so that eg cloned readers can share the same cache; it also preserves back compat, which is quite a stunning accomplishment :)

But... there are many more things I don't like about FieldCache, that I'm not sure (?) the patch addresses:

* Uninversion to derive eg an int[] is horribly slow, compared to say loading the pre-encoded binary ints from disk, created during indexing. Ie, I think, if we are going to overhaul FieldCache API, we should somehow make LUCENE-1231 feasible.
* There's no pluggability to customize where the int[] comes from for a given field -- most you can do is provide your own IntParser that the uninverter uses. EG the fact that the patch had to move FieldCacheRange/TermsFilter down, is strange -- these filters (and in general any future cache consumers) should live in oal.search, but simply pull from a different int[] source, somehow.
* Error checking of single-value-per-field (for StringIndex) is dangerous, today -- it's intermittent, and, it's an unchecked exception. We should probably just remove the exception, or, maybe make it checked. Actually I'll go open a new issue for this. We should simply fix this.
* Single-value-per-field limitation (though, that's a nice to have, future improvement)
* Even accepting the single-value-per-field limitation, we should allow multiple values per field during uninversion, w/ customizable logic about which value is kept as the single one (there is an issue open for this I think). This really should be some sort of added extensibility to whatever class drives uninversion...
* The terror of accidentally asking for the array at the top-level of Multi/DirReader. I think this shouldn't even be allowed, at least not easily, ie Dir/MultiReader.getIndexCache should throw UOE. If we really wanted to, we could provide sugar methods in maybe ReaderUtil to glom N int[]'s into a new int[]. But it should be named something scary :) Then we wouldn't need any insanity checking.
* No control over caching policy (cannot evict things)
* If we make field cache flexible enough, we could maybe fold norms & deleted docs into it (would be a separate future issue to actually do so...).

Some other questions about the patch:

* Consumers of the cache API (the sort comparator, FieldCacheTerms/RangeFilter, and any other future users of the field cache) shouldn't have to move down into fields sub-package?
* It's a little strange that the term vectors fields reader also got pulled into the cache?

[PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField

Key: LUCENE-2133
URL: https://issues.apache.org/jira/browse/LUCENE-2133
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Affects Versions: 2.9.1, 3.0
Reporter: Christian Kohlschütter
Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch, LUCENE-2133.patch

Hi all, up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open.

The FieldCache is flawed because it is incorrect to assume that:

1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.

Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level. There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the
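The idea praised in the comment above, a cache that belongs to the reader so that clones can share it, can be sketched in a few lines of plain Java. Everything here is hypothetical: `ReaderCacheSketch`, `FieldValueCache`, `Reader`, and `cloneReader` are stand-ins invented for illustration and are neither Lucene classes nor the API proposed in the LUCENE-2133 patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical types for illustration; none of these are Lucene classes.
final class ReaderCacheSketch {

    // A cache owned by the underlying data instead of a global singleton.
    static final class FieldValueCache {
        private final Map<String, int[]> intsByField = new HashMap<>();
        int loads = 0; // how many times values actually had to be built

        synchronized int[] getInts(String field, Function<String, int[]> loader) {
            int[] cached = intsByField.get(field);
            if (cached == null) {
                cached = loader.apply(field); // stand-in for "uninversion"
                intsByField.put(field, cached);
                loads++;
            }
            return cached;
        }
    }

    // Stand-in for a reader. A clone reuses the cache of its source,
    // while a freshly opened reader starts with an empty cache.
    static final class Reader {
        private final FieldValueCache cache;

        Reader() {
            this.cache = new FieldValueCache();
        }

        private Reader(FieldValueCache shared) {
            this.cache = shared;
        }

        Reader cloneReader() {
            return new Reader(cache);
        }

        FieldValueCache cache() {
            return cache;
        }
    }

    public static void main(String[] args) {
        Reader original = new Reader();
        Reader clone = original.cloneReader();
        Function<String, int[]> loader = field -> new int[] {1, 2, 3};

        original.cache().getInts("price", loader);
        clone.cache().getInts("price", loader); // served from the shared cache

        System.out.println("loads = " + original.cache().loads);
    }
}
```

The sketch deliberately ignores eviction and multi-valued fields, both of which the comment lists as open concerns.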
Re: [jira] Commented: (LUCENE-2122) Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase
Yep, that sure makes me nervous too. I've never seen a failure in IntelliJ or from a shell window. How often do you need to run it to see an error? And what language is it using? And what test? I can try this in my IntelliJ setup and see if I can reproduce it. Note I'm running on a Macbook Pro... I wonder if a repeating script would show an intermittent error Erick On Thu, Dec 10, 2009 at 3:10 PM, Robert Muir (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/LUCENE-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1274#action_1274] Robert Muir commented on LUCENE-2122: - Hi Erick, I played with this patch some and (not intentionally trying) I would get random test failures for TestQueryParser under eclipse... its not really something I am able to repeat though. maybe some race condition (I do not know how eclipse executes parameterized tests) ? if it is a problem with my IDE that is one thing, just makes me a little nervous right now. trying to think what could cause this Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase --- Key: LUCENE-2122 URL: https://issues.apache.org/jira/browse/LUCENE-2122 Project: Lucene - Java Issue Type: Improvement Components: Other Affects Versions: 3.1 Reporter: Erick Erickson Assignee: Robert Muir Priority: Minor Fix For: 3.1 Attachments: LUCENE-2122-r2.patch, LUCENE-2122-r3.patch, LUCENE-2122-r4.patch, LUCENE-2122.patch Use the @Parameterized capabilities of Junit4 to allow more extensive testing of Locales. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [jira] Commented: (LUCENE-2122) Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase
I'll give this a whirl tonight. The reason I was wondering what language is to ensure that my machine *also* tests the offending locale.

A bit of a nit: the flaw in the approach with LocalizedTestCase is that *every* test in the class is run against *all* locales. To change this, as I understand it, we'd need to break the tests out into a separate class...

Intermittent errors often smell like a race condition, so I'll be on the lookout for one. But I also wonder if you'd ever get this error running outside of Eclipse.

I really, really, really hate ones like this. Let's say you have a script that runs 1,000 times flawlessly from the shell. What does that prove? <nasty grin>. But maybe if I relentlessly press the test button on that class it'll happen to me too.

FWIW
Erick

On Thu, Dec 10, 2009 at 3:30 PM, Robert Muir rcm...@gmail.com wrote:

i just right clicked TestQueryParser and said 'run as junit test'. i could not tell which locales failed (just testing your original patch, no modifications). the way they are shown instead is like an array of 135 elements...

[0]: testCJK[0] (0.000s) testSimple[0] (0.001s) ...
[1]: testCJK[1] (0.000s) ...
[135] testCJK[135]

the only tests that failed were the localized methods like the date stuff, where it's going to create an 'expected' localized string and then compare against that. it makes me suspect that somehow there is some race, and the default locale is actually changing as the test is running, or something crazy like this?!

On Thu, Dec 10, 2009 at 3:23 PM, Erick Erickson erickerick...@gmail.com wrote:

Yep, that sure makes me nervous too. I've never seen a failure in IntelliJ or from a shell window. How often do you need to run it to see an error? And what language is it using? And what test? I can try this in my IntelliJ setup and see if I can reproduce it. Note I'm running on a Macbook Pro...
I wonder if a repeating script would show an intermittent error Erick On Thu, Dec 10, 2009 at 3:10 PM, Robert Muir (JIRA) j...@apache.orgwrote: [ https://issues.apache.org/jira/browse/LUCENE-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1274#action_1274] Robert Muir commented on LUCENE-2122: - Hi Erick, I played with this patch some and (not intentionally trying) I would get random test failures for TestQueryParser under eclipse... its not really something I am able to repeat though. maybe some race condition (I do not know how eclipse executes parameterized tests) ? if it is a problem with my IDE that is one thing, just makes me a little nervous right now. trying to think what could cause this Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase --- Key: LUCENE-2122 URL: https://issues.apache.org/jira/browse/LUCENE-2122 Project: Lucene - Java Issue Type: Improvement Components: Other Affects Versions: 3.1 Reporter: Erick Erickson Assignee: Robert Muir Priority: Minor Fix For: 3.1 Attachments: LUCENE-2122-r2.patch, LUCENE-2122-r3.patch, LUCENE-2122-r4.patch, LUCENE-2122.patch Use the @Parameterized capabilities of Junit4 to allow more extensive testing of Locales. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org -- Robert Muir rcm...@gmail.com
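The suspicion voiced in this thread, that the default locale changes while a test is running, would explain why only the "expected localized string" tests fail. The following plain-Java sketch shows the mechanism under that assumption; it is not taken from the patch and says nothing about how Eclipse actually schedules parameterized tests. `SharedDefaultLocale` and its methods are invented names, and the second thread is joined only to make the demonstration deterministic.

```java
import java.util.Locale;

// Plain-Java illustration; not taken from the patch or from Eclipse.
final class SharedDefaultLocale {

    // Formats a number using the JVM-wide default locale, the way a test
    // might build an "expected" or "actual" localized string.
    static String formatWithDefault(int value) {
        return String.format("%,d", value);
    }

    // Builds an expected string, lets another thread change the default
    // locale, then builds the actual string. Returns {expected, actual}.
    static String[] expectedThenActual(Locale first, Locale second) {
        Locale original = Locale.getDefault();
        try {
            Locale.setDefault(first);
            String expected = formatWithDefault(1234567);

            // The default locale is global state, so a change made on any
            // thread is visible to every other thread in the JVM.
            Thread other = new Thread(() -> Locale.setDefault(second));
            other.start();
            try {
                other.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }

            String actual = formatWithDefault(1234567);
            return new String[] {expected, actual};
        } finally {
            Locale.setDefault(original); // leave the JVM as we found it
        }
    }

    public static void main(String[] args) {
        String[] result = expectedThenActual(Locale.US, Locale.GERMANY);
        System.out.println("expected=" + result[0] + " actual=" + result[1]);
    }
}
```

In a real race there is no join, so whether the two strings differ depends on timing, which would match the "can't repeat it" symptom described above.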
Patch for LUCENE-2122 ready to go
Does someone with commit rights want to pick this up? I've incorporated the changes suggested by Robert (Thanks!) and think it's ready to go. Erick
Re: [jira] Commented: (LUCENE-2122) Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase
Sh. I'll look at it again tonight

On Wed, Dec 9, 2009 at 9:13 AM, Robert Muir (JIRA) j...@apache.org wrote:

[ https://issues.apache.org/jira/browse/LUCENE-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788100#action_12788100 ]

Robert Muir commented on LUCENE-2122:

Hi Erick, in the Date tools test I think you can delete the public static Collection<Locale[]> data(). I think you might have accidentally included it?

Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase

Key: LUCENE-2122
URL: https://issues.apache.org/jira/browse/LUCENE-2122
Project: Lucene - Java
Issue Type: Improvement
Components: Other
Affects Versions: 3.1
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
Fix For: 3.1
Attachments: LUCENE-2122-r2.patch, LUCENE-2122-r3.patch, LUCENE-2122.patch

Use the @Parameterized capabilities of Junit4 to allow more extensive testing of Locales.
[jira] Updated: (LUCENE-2122) Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase
[ https://issues.apache.org/jira/browse/LUCENE-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated LUCENE-2122: --- Attachment: LUCENE-2122-r4.patch OK, I plead advanced senility or some other excuse for the last patch. Robert: Thanks so much for looking this over, I have no clue what I was thinking with the TestDateTools. Or the other classes that derive from LocalizedTestCase. The @Parameterized and @RunWith only needed to be in LocalizedTestCase and all the inheriting classes just rely on the base class to collect the different locales. Anyway, this one should be much better Erick Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase --- Key: LUCENE-2122 URL: https://issues.apache.org/jira/browse/LUCENE-2122 Project: Lucene - Java Issue Type: Improvement Components: Other Affects Versions: 3.1 Reporter: Erick Erickson Assignee: Robert Muir Priority: Minor Fix For: 3.1 Attachments: LUCENE-2122-r2.patch, LUCENE-2122-r3.patch, LUCENE-2122-r4.patch, LUCENE-2122.patch Use the @Parameterized capabilities of Junit4 to allow more extensive testing of Locales.
Re: [jira] Commented: (LUCENE-2122) Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase
It's embarrassing that I had to poke around for 1/2 hour to find *code that I had written recently*. siiiggghhh. Maybe this time it'll stick.

In LuceneTestCaseJ4, we added an @Rule-annotated class InterceptTestCaseEvents whose methods get called whenever an event happens, things like succeeded, failed, started, etc. The failed method looks for a method in the failing class called reportAdditionalFailureInfo. So by adding something like the below to LocalizedTestCase you can print any information you have available whenever things fail. It gets printed in addition to the usual information JUnit prints. Warning: I tested this *very* lightly, at least it worked in the one case I tried.

    @Override
    public void reportAdditionalFailureInfo() {
      System.out.println("Failing locale is " + _currentLocale.getDisplayName(_origDefault));
      // call to super.report. UNTESTED! and probably not necessary in this
      // context. Left as an exercise for the reader <G>.
      super.reportAdditionalFailureInfo();
    }

Currently this only does extra stuff for failed cases, but it would be trivial to extend for start, end, succeeded whenever there's a need.

Your second question seems quite doable, just by putting the default locale in the list before getting into the loop as the first entry. I'm not sure removing the default language is worth the effort, so it gets run twice. But if you're writing the code, do whatever you want.

Gotta get some sleep <G>...

Erick

On Wed, Dec 9, 2009 at 9:45 PM, Robert Muir (JIRA) j...@apache.org wrote:

[ https://issues.apache.org/jira/browse/LUCENE-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788455#action_12788455 ]

Robert Muir commented on LUCENE-2122:

thanks Erick, i will play around with the patch some, generally just double-check the locale stuff is doing what we want, looks like it will. i haven't tested yet, but looking at the code i have a few questions (i can try to add these to the patch, just curious what you think):

1. if a test fails under some locale, say th_TH, will junit 4 attempt to print this parameter out in some way so I know that it failed? If not do you know of a hack?
2. i am thinking about reordering the locale array so that it tests the default one first. if you are trying to do some test-driven dev it might be strange if the test fails under a different locale first. I think this one is obvious, I will play with it to see how it behaves now.

Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase

Key: LUCENE-2122
URL: https://issues.apache.org/jira/browse/LUCENE-2122
Project: Lucene - Java
Issue Type: Improvement
Components: Other
Affects Versions: 3.1
Reporter: Erick Erickson
Assignee: Robert Muir
Priority: Minor
Fix For: 3.1
Attachments: LUCENE-2122-r2.patch, LUCENE-2122-r3.patch, LUCENE-2122-r4.patch, LUCENE-2122.patch

Use the @Parameterized capabilities of Junit4 to allow more extensive testing of Locales.
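The reordering discussed in this message (default locale first, without bothering to remove its later duplicate) is simple enough to sketch in plain Java. `LocaleOrdering` and `defaultFirst` are hypothetical names, not code from any of the LUCENE-2122 patches, and how the resulting list would be fed to the parameterized runner is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Plain-Java sketch; the helper name is made up and is not part of the patch.
final class LocaleOrdering {

    // Returns the given locales with the default one added at the front.
    // As suggested in the message, the default is not removed from its
    // original position, so it is simply exercised twice.
    static List<Locale> defaultFirst(Locale defaultLocale, Locale[] available) {
        List<Locale> ordered = new ArrayList<>(available.length + 1);
        ordered.add(defaultLocale);
        ordered.addAll(Arrays.asList(available));
        return ordered;
    }

    public static void main(String[] args) {
        List<Locale> locales =
            defaultFirst(Locale.getDefault(), Locale.getAvailableLocales());
        System.out.println("First locale under test: " + locales.get(0));
        System.out.println("Runs per test method: " + locales.size());
    }
}
```

Running the default locale first means a failure seen during everyday development is reported under the developer's own locale before any of the others.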
[jira] Updated: (LUCENE-2122) Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase
[ https://issues.apache.org/jira/browse/LUCENE-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated LUCENE-2122: --- Attachment: LUCENE-2122-r3.patch Made LocalizedTestCase abstract... Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase --- Key: LUCENE-2122 URL: https://issues.apache.org/jira/browse/LUCENE-2122 Project: Lucene - Java Issue Type: Improvement Components: Other Affects Versions: 3.1 Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Fix For: 3.1 Attachments: LUCENE-2122-r2.patch, LUCENE-2122-r3.patch, LUCENE-2122.patch Use the @Parameterized capabilities of Junit4 to allow more extensive testing of Locales.
[jira] Created: (LUCENE-2122) Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase
Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase --- Key: LUCENE-2122 URL: https://issues.apache.org/jira/browse/LUCENE-2122 Project: Lucene - Java Issue Type: Improvement Components: Other Affects Versions: 3.1 Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Fix For: 3.1 Use the @Parameterized capabilities of Junit4 to allow more extensive testing of Locales.
[jira] Updated: (LUCENE-2122) Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase
[ https://issues.apache.org/jira/browse/LUCENE-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated LUCENE-2122: --- Attachment: LUCENE-2122.patch All tests pass. This modifies all test classes (core and contrib) that derive from LocalizedTestCase. LocalizedTestCase now tests all test methods in all derived classes against all available Locales. If we want some of the tests to NOT run against all locales, we'd need to refactor them into their own test class Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase --- Key: LUCENE-2122 URL: https://issues.apache.org/jira/browse/LUCENE-2122 Project: Lucene - Java Issue Type: Improvement Components: Other Affects Versions: 3.1 Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Fix For: 3.1 Attachments: LUCENE-2122.patch Use the @Parameterized capabilities of Junit4 to allow more extensive testing of Locales.
Re: [jira] Resolved: (LUCENE-2119) If you pass Integer.MAX_VALUE as 2nd param to search(Query, int) you hit unexpected NegativeArraySizeException
This may be a silly question, and I admit that I haven't looked at the code, but was there a good reason to +1 it in the first place or was that just paranoia to prevent off-by-one errors? If there *was* a valid reason, might it make sense to +1 min(nDocs, maxDoc())?

Erick

On Sun, Dec 6, 2009 at 6:43 AM, Michael McCandless (JIRA) j...@apache.org wrote:

[ https://issues.apache.org/jira/browse/LUCENE-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2119.
Resolution: Fixed

Thanks Paul!

If you pass Integer.MAX_VALUE as 2nd param to search(Query, int) you hit unexpected NegativeArraySizeException

Key: LUCENE-2119
URL: https://issues.apache.org/jira/browse/LUCENE-2119
Project: Lucene - Java
Issue Type: Bug
Components: Search
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
Fix For: 3.1
Attachments: LUCENE-2119.patch

Note that this is a nonsense value to pass in, since our PQ impl allocates the array up front. It's because PQ takes 1+ this value (which wraps to a negative number, Integer.MIN_VALUE), and attempts to allocate that. We should bounds check it, and drop PQ size by one in this case. Better, maybe: in IndexSearcher, if that n is ever > maxDoc(), set it to maxDoc(). This trips users up fairly often because they assume our PQ doesn't statically pre-allocate (a reasonable assumption...).
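The arithmetic behind this issue is easy to reproduce outside Lucene. The sketch below uses invented names (`QueueSizing`, `naiveHeapLength`, `clampedHeapLength`) and only models the sizing decision, not the actual PriorityQueue or IndexSearcher code. In Java, `Integer.MAX_VALUE + 1` silently overflows to `Integer.MIN_VALUE`, and asking for an array of negative length is what raises NegativeArraySizeException. Clamping the request to the document count first, in the spirit of the min(nDocs, maxDoc()) suggestion, avoids the overflow as long as maxDoc itself is below Integer.MAX_VALUE, which is assumed here.

```java
// Illustration of the sizing arithmetic only; not Lucene's actual code.
final class QueueSizing {

    // Mirrors the "1 + requested size" allocation described in the issue.
    // For Integer.MAX_VALUE the int addition overflows to a negative value.
    static int naiveHeapLength(int requested) {
        return requested + 1;
    }

    // Clamp the request to the number of documents before adding the extra
    // slot. Assumes maxDoc < Integer.MAX_VALUE so the +1 cannot overflow.
    static int clampedHeapLength(int requested, int maxDoc) {
        return Math.min(requested, maxDoc) + 1;
    }

    public static void main(String[] args) {
        int naive = naiveHeapLength(Integer.MAX_VALUE);
        System.out.println("naive length is negative: " + (naive < 0));

        try {
            int[] heap = new int[naive];
            System.out.println("allocated " + heap.length);
        } catch (NegativeArraySizeException e) {
            System.out.println("allocation failed: NegativeArraySizeException");
        }

        System.out.println("clamped length: " + clampedHeapLength(Integer.MAX_VALUE, 1000));
    }
}
```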
Re: (LUCENE-2119) If you pass Integer.MAX_VALUE as 2nd param to search(Query, int) you hit unexpected NegativeArraySizeException
Should have mentioned in my first message that all I was really after was prompting folks who actually know something about the code in question to avoid the mistake I've made, oh, several thousand times: "There's no reason for that to be there, I'll just take it out" <G>

Erick

On Sun, Dec 6, 2009 at 6:45 PM, Michael McCandless luc...@mikemccandless.com wrote:

On Sun, Dec 6, 2009 at 5:51 PM, Uwe Schindler u...@thetaphi.de wrote:

On Sun, Dec 06, 2009 at 05:31:53PM -0500, Erick Erickson wrote:

This may be a silly question, and I admit that I haven't looked at the code, but was there a good reason to +1 it in the first place or was that just paranoia to prevent off-by-one errors?

IIRC, this implementation of the priority queue algo leaves open slot 0 to simplify internal calculations. It was that way when I ported 1.4.3, and I doubt that's changed.

That's still the same. Calculations in heaps are simpler when 1-based. Because of that, heap[0] is unused.

Thanks for raising this Erick... it's a good question.

Technically, removing the +1 would be a bug if anyone ever inserted 2B items into the PQ, but I think this is exceptionally unlikely to occur in practice.

If there *was* a valid reason, might it make sense to +1 min(nDocs, maxDoc())?

I think the patch is fine. It's really only needed to provide a more accurate error message in the event somebody specifies that they want Integer.MAX_VALUE elements, not realizing that they will be allocated up front rather than lazily -- they'll get an OOME rather than a NegativeArraySizeException.

The new patch is more intelligent; it will not allocate such a big queue as far as I have seen. It takes the numDocs() of index reader/searcher into account.

Hmm actually it takes maxDoc() into account, but it should in fact use numDocs(). I'll fix.

Mike
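The "slot 0 is left open" remark explains where the +1 comes from. A minimal 1-based binary min-heap, written from scratch for illustration (it is not Lucene's PriorityQueue, and `OneBasedHeap` is an invented name), shows why the index math is tidy with the root at index 1: the parent of `i` is `i / 2` and the children are `2 * i` and `2 * i + 1`, at the cost of one unused array slot.

```java
// Minimal 1-based binary min-heap of ints, for illustration only.
final class OneBasedHeap {
    private final int[] heap; // heap[0] is intentionally unused
    private int size = 0;

    OneBasedHeap(int maxSize) {
        // One extra slot because index 0 is left open.
        heap = new int[maxSize + 1];
    }

    // Adds a value; throws ArrayIndexOutOfBoundsException when full.
    void add(int value) {
        heap[++size] = value;
        int i = size;
        // With the root at index 1, the parent of i is simply i / 2.
        while (i > 1 && heap[i / 2] > heap[i]) {
            swap(i, i / 2);
            i /= 2;
        }
    }

    // Removes and returns the smallest value.
    int pop() {
        if (size == 0) {
            throw new IllegalStateException("heap is empty");
        }
        int top = heap[1];
        heap[1] = heap[size--];
        int i = 1;
        // Children of i are 2 * i and 2 * i + 1.
        while (2 * i <= size) {
            int child = 2 * i;
            if (child < size && heap[child + 1] < heap[child]) {
                child++;
            }
            if (heap[i] <= heap[child]) {
                break;
            }
            swap(i, child);
            i = child;
        }
        return top;
    }

    int size() {
        return size;
    }

    private void swap(int a, int b) {
        int tmp = heap[a];
        heap[a] = heap[b];
        heap[b] = tmp;
    }

    public static void main(String[] args) {
        OneBasedHeap h = new OneBasedHeap(4);
        for (int v : new int[] {5, 3, 8, 1}) {
            h.add(v);
        }
        StringBuilder sb = new StringBuilder();
        while (h.size() > 0) {
            sb.append(h.pop()).append(' ');
        }
        System.out.println(sb.toString().trim());
    }
}
```

Because the backing array is allocated in the constructor, a caller who passes a huge maximum size pays for it immediately, which is the pre-allocation surprise discussed in the thread.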
Re: [jira] Commented: (LUCENE-2122) Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase
I just made a comment on how many times I've made the "that looks unnecessary, I'll take it out" mistake. Now I get to add one to that total. I'll attach a revised patch momentarily with this change. Thanks for pointing this out!

Erick

On Sun, Dec 6, 2009 at 8:00 PM, Robert Muir (JIRA) j...@apache.org wrote:

[ https://issues.apache.org/jira/browse/LUCENE-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786749#action_12786749 ]

Robert Muir commented on LUCENE-2122:

Hi Erick, I am a little nervous about the change to LocalizedTestCase.tearDown() here. I think we must restore the user's default Locale, since it's a JRE-system wide global thing and we are changing it on the fly here. this was stashed away here before:

{code}
/**
 * Before changing the default Locale, save the default Locale here so that it
 * can be restored.
 */
private final Locale defaultLocale = Locale.getDefault();
{code}

and restored in tearDown()... otherwise strange things could happen, such as your IDE could go bonkers after running the tests! (but maybe I am missing something)

Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase

Key: LUCENE-2122
URL: https://issues.apache.org/jira/browse/LUCENE-2122
Project: Lucene - Java
Issue Type: Improvement
Components: Other
Affects Versions: 3.1
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
Fix For: 3.1
Attachments: LUCENE-2122.patch

Use the @Parameterized capabilities of Junit4 to allow more extensive testing of Locales.
[jira] Updated: (LUCENE-2122) Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase
[ https://issues.apache.org/jira/browse/LUCENE-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated LUCENE-2122: --- Attachment: LUCENE-2122-r2.patch Restoring original default Locale after test class has been run. Thanks Robert! Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase --- Key: LUCENE-2122 URL: https://issues.apache.org/jira/browse/LUCENE-2122 Project: Lucene - Java Issue Type: Improvement Components: Other Affects Versions: 3.1 Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Fix For: 3.1 Attachments: LUCENE-2122-r2.patch, LUCENE-2122.patch Use the @Parameterized capabilities of Junit4 to allow more extensive testing of Locales.
Re: [jira] Commented: (LUCENE-2122) Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase
Hmmm, you're probably right. There's no earthly reason for a test writer to create an instance of LocalizedTestCase; by its nature it has no use except as a superclass, even though it has no abstract methods. So making it abstract will serve to flag that fact to anyone who tries to instantiate it.

I'll change this too. Hold off on applying this patch; I'll wait for a bit to gather more comments and put them all together in an r3 version.

Erick

On Sun, Dec 6, 2009 at 9:02 PM, Robert Muir (JIRA) j...@apache.org wrote:

[ https://issues.apache.org/jira/browse/LUCENE-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786757#action_12786757 ]

Robert Muir commented on LUCENE-2122:

Erick do you think LocalizedTestCase should be abstract?

Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase

Key: LUCENE-2122
URL: https://issues.apache.org/jira/browse/LUCENE-2122
Project: Lucene - Java
Issue Type: Improvement
Components: Other
Affects Versions: 3.1
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
Fix For: 3.1
Attachments: LUCENE-2122-r2.patch, LUCENE-2122.patch

Use the @Parameterized capabilities of Junit4 to allow more extensive testing of Locales.
Re: [jira] Commented: (LUCENE-2122) Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase
Well, under any circumstances, the line Locale.setDefault(Locale.getDefault()); was just plain silly. At the cost of setting the default locale exactly once per test *class* (I used @BeforeClass/@AfterClass), I'd far rather err on the side of paranoia than cause someone to spend *hours* figuring it out... On Sun, Dec 6, 2009 at 8:41 PM, Robert Muir rcm...@gmail.com wrote: Erick, btw I may not be right about this... certainly if you are invoking each test in its own JVM it should be no problem... it's just some paranoia. Also, this same changing of a JRE-system-wide variable would prevent these tests from being parallelized in the same JVM, in case that matters... (they should run in their own JVM sequentially). LocalizedTestCase is nasty, I admit, but it works and prevents hours of changing variables and running ant test under different locales... just one of those things. Thanks for tackling this one. On Sun, Dec 6, 2009 at 8:30 PM, Erick Erickson erickerick...@gmail.com wrote: I just made a comment on how many times I've made the "that looks unnecessary, I'll take it out" mistake. Now I get to add one to that total. I'll attach a revised patch momentarily with this change. Thanks for pointing this out! Erick On Sun, Dec 6, 2009 at 8:00 PM, Robert Muir (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/LUCENE-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12786749#action_12786749] Robert Muir commented on LUCENE-2122: - Hi Erick, I am a little nervous about the change to LocalizedTestCase.tearDown() here. I think we must restore the user's default Locale, since it's a JRE-system-wide global thing and we are changing it on the fly here. This was stashed away here before:
{code}
/**
 * Before changing the default Locale, save the default Locale here so that it
 * can be restored.
 */
private final Locale defaultLocale = Locale.getDefault();
{code}
and restored in tearDown()... otherwise strange things could happen, such as your IDE could go bonkers after running the tests! (but maybe I am missing something) Use JUnit4 capabilites for more thorough Locale testing for classes deriving from LocalizedTestCase --- Key: LUCENE-2122 URL: https://issues.apache.org/jira/browse/LUCENE-2122 Project: Lucene - Java Issue Type: Improvement Components: Other Affects Versions: 3.1 Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Fix For: 3.1 Attachments: LUCENE-2122.patch Use the @Parameterized capabilities of Junit4 to allow more extensive testing of Locales. -- Robert Muir rcm...@gmail.com
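For anyone skimming the thread, the save-and-restore pattern under discussion can be sketched in plain Java roughly as follows. This is only an illustration, not the code in the LUCENE-2122 patch: the LocaleGuard class and its runUnderEachLocale method are hypothetical names, and in a JUnit4 test the save and restore halves would presumably live in @BeforeClass/@AfterClass methods instead of a try/finally.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of Lucene or the patch.
final class LocaleGuard {
    private LocaleGuard() {}

    /** Runs the task once under each locale, then restores the original default. */
    static void runUnderEachLocale(Locale[] locales, Runnable task) {
        // Stash the default *before* changing it: the default Locale is JVM-wide state.
        final Locale saved = Locale.getDefault();
        try {
            for (Locale locale : locales) {
                Locale.setDefault(locale);
                task.run();
            }
        } finally {
            // Restore even if a task throws, so nothing else sharing this JVM
            // (later tests, an IDE test runner) sees a leaked locale.
            Locale.setDefault(saved);
        }
    }
}
```

Because the default Locale is global to the JVM, a helper like this is still not safe to call from several threads at once, which is the parallelization concern Robert raises above.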
[jira] Assigned: (LUCENE-2096) Investigate parallelizing Ant junit tests
[ https://issues.apache.org/jira/browse/LUCENE-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson reassigned LUCENE-2096: -- Assignee: (was: Erick Erickson) Maybe for later Investigate parallelizing Ant junit tests - Key: LUCENE-2096 URL: https://issues.apache.org/jira/browse/LUCENE-2096 Project: Lucene - Java Issue Type: Improvement Components: Build Reporter: Erick Erickson Priority: Minor Ant Contrib has a ForEach construct that may speed up running all of the Junit tests by parallelizing them with a configurable number of threads. I envision this in several stages. First, see if ForEach works for us with hard-coded lists, distribute this for testing then make the changes for real. I intend to hard-code the list for the first pass, ordered by the time they take. This won't do for check-in, but will give us a fast proof-of-concept. This approach will be most useful for multi-core machines. In particular, we need to see whether the parallel tasks are isolated enough from each other to prevent mutual interference. All this assumes the fragmentary reference I found is still available... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2037) Allow Junit4 tests in our environment.
[ https://issues.apache.org/jira/browse/LUCENE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated LUCENE-2037: --- Attachment: LUCENE-2037.patch Had enough time this morning to reconcile this with Kay Kay's changes, All tests pass. Junit 3.X no longer necessary, running with Junit 4.7 jar runs junit 3 style tests as well as annotated Junit4 style tests. It's preferable (but not necessary) to import from org.junit rather than junit.framework. Allow Junit4 tests in our environment. -- Key: LUCENE-2037 URL: https://issues.apache.org/jira/browse/LUCENE-2037 Project: Lucene - Java Issue Type: Improvement Components: Other Affects Versions: 3.1 Environment: Development Reporter: Erick Erickson Assignee: Michael McCandless Priority: Minor Fix For: 3.1 Attachments: junit-4.7.jar, LUCENE-2037.patch, LUCENE-2037.patch, LUCENE-2037.patch, LUCENE-2037_revised_2.patch Original Estimate: 8h Remaining Estimate: 8h Now that we're dropping Java 1.4 compatibility for 3.0, we can incorporate Junit4 in testing. Junit3 and junit4 tests can coexist, so no tests should have to be rewritten. We should start this for the 3.1 release so we can get a clean 3.0 out smoothly. It's probably worthwhile to convert a small set of tests as an exemplar. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: (LUCENE-2037) Allow Junit4 tests in our environment.
Sure, but it won't be until late Saturday at the earliest, more likely Sunday. Got a busy Fri/Sat. Erick On Fri, Dec 4, 2009 at 3:34 PM, Michael McCandless luc...@mikemccandless.com wrote: Thanks Kay Kay! Erick can you have a look / iterate? Thanks. Mike On Fri, Dec 4, 2009 at 3:30 PM, Kay Kay kaykay.uni...@gmail.com wrote: Erick / Mike - With 2065 committed onto trunk now, I created another patch for 2037 and attached it to the ticket. 3 classes remain pending, though, due to conflicts that I had listed with the patch. But we can probably revisit them subsequently. Please review them to serve as a starting point for the same. Erick Erickson wrote: Mike: I should be able to create a new 2037 patch pretty easily if you want to apply 2065 first. Let me know. Erick On Thu, Dec 3, 2009 at 9:05 PM, Kay Kay kaykay.uni...@gmail.com wrote: Mike - I have attached another patch to LUCENE-2065, in sync with the trunk now. Erick Erickson wrote: That's up to Mike, whichever way he finds easiest, I'll deal. Erick On Thu, Dec 3, 2009 at 8:43 PM, Kay Kay kaykay.uni...@gmail.com wrote: I created LUCENE-2065 while working on 1257, the original generics-related ticket, and since we were running out of time for 3.0, I guess we could not get src/test converted in. In any case, if you were committing this one (2037) to trunk, maybe I can wait before creating the patch again. Erick Erickson wrote: I didn't realize 2065 had already been down this path, thought you were volunteering to change all the code starting from scratch. Your approach sounds like a fine plan. Note that I'm not entirely sure that I cleaned up *everything*, but we need to get to a known state before tackling the rest, so I'll wait for these two patches to be applied before looking back at it... Not to mention the Localized test thing. Erick On Thu, Dec 3, 2009 at 5:57 PM, Michael McCandless luc...@mikemccandless.com wrote: On Thu, Dec 3, 2009 at 5:48 PM, Erick Erickson erickerick...@gmail.com wrote: I generified the searches/function files in patch 2037. I don't really think there's a conflict, just commit my patch and have at generifying the rest. OK so then we'll start with 2037, then take 2065's patch, hopefully updated to current trunk, but minus search/function sources. I know, I know. I did two things at once. So sue me. Honest, I'll try not to do this very often <G>... In fact I prefer this. I used to think we shouldn't do that but I flip-flopped and now think in practice you just have to clean code while you're there, otherwise it won't get cleaned. Mike: You really want to generify the whole shootin' match or do you want to partition them? I'll be happy to take a set of them. Or would that make things too complicated to apply? 2065 already has done a lot here (adding generics to the tests)... I think we start from that and take it from there? Mike
Re: [jira] Commented: (LUCENE-2037) Allow Junit4 tests in our environment.
I generified the searches/function files in patch 2037. I don't really think there's a conflict, just commit my patch and have at generifying the rest. I know, I know. I did two things at once. So sue me. Honest, I'll try not to do this very often <G>... Mike: You really want to generify the whole shootin' match or do you want to partition them? I'll be happy to take a set of them. Or would that make things too complicated to apply? Erick On Thu, Dec 3, 2009 at 3:15 PM, Michael McCandless (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/LUCENE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12785479#action_12785479] Michael McCandless commented on LUCENE-2037: bq. but there is another patch - LUCENE-2065 to port the existing tests to Java 5 generics Ahh thanks for the reminder -- I can take this one as well, but, there will be conflicts b/w the two patches, I think. Should we do the generics first (simpler change, but touches many files), and then the junit4 upgrade? Allow Junit4 tests in our environment. -- Key: LUCENE-2037 URL: https://issues.apache.org/jira/browse/LUCENE-2037 Project: Lucene - Java Issue Type: Improvement Components: Other Affects Versions: 3.1 Environment: Development Reporter: Erick Erickson Assignee: Michael McCandless Priority: Minor Fix For: 3.1 Attachments: junit-4.7.jar, LUCENE-2037.patch, LUCENE-2037.patch Original Estimate: 8h Remaining Estimate: 8h Now that we're dropping Java 1.4 compatibility for 3.0, we can incorporate Junit4 in testing. Junit3 and junit4 tests can coexist, so no tests should have to be rewritten. We should start this for the 3.1 release so we can get a clean 3.0 out smoothly. It's probably worthwhile to convert a small set of tests as an exemplar.
Re: (LUCENE-2037) Allow Junit4 tests in our environment.
I didn't realize 2065 had already been down this path, thought you were volunteering to change all the code starting from scratch. Your approach sounds like a fine plan. Note that I'm not entirely sure that I cleaned up *everything*, but we need to get to a known state before tackling the rest, so I'll wait for these two patches to be applied before looking back at it... Not to mention the Localized test thing. Erick On Thu, Dec 3, 2009 at 5:57 PM, Michael McCandless luc...@mikemccandless.com wrote: On Thu, Dec 3, 2009 at 5:48 PM, Erick Erickson erickerick...@gmail.com wrote: I generified the searches/function files in patch 2037. I don't really think there's a conflict, just commit my patch and have at generifying the rest. OK so then we'll start with 2037, then take 2065's patch, hopefully updated to current trunk, but minus search/function sources. I know, I know. I did two things at once. So sue me. Honest, I'll try not to do this very often <G>... In fact I prefer this. I used to think we shouldn't do that but I flip-flopped and now think in practice you just have to clean code while you're there, otherwise it won't get cleaned. Mike: You really want to generify the whole shootin' match or do you want to partition them? I'll be happy to take a set of them. Or would that make things too complicated to apply? 2065 already has done a lot here (adding generics to the tests)... I think we start from that and take it from there? Mike
Re: (LUCENE-2037) Allow Junit4 tests in our environment.
That's up to Mike, whichever way he finds easiest, I'll deal. Erick On Thu, Dec 3, 2009 at 8:43 PM, Kay Kay kaykay.uni...@gmail.com wrote: I created LUCENE-2065 while working on 1257, the original generics-related ticket, and since we were running out of time for 3.0, I guess we could not get src/test converted in. In any case, if you were committing this one (2037) to trunk, maybe I can wait before creating the patch again. Erick Erickson wrote: I didn't realize 2065 had already been down this path, thought you were volunteering to change all the code starting from scratch. Your approach sounds like a fine plan. Note that I'm not entirely sure that I cleaned up *everything*, but we need to get to a known state before tackling the rest, so I'll wait for these two patches to be applied before looking back at it... Not to mention the Localized test thing. Erick On Thu, Dec 3, 2009 at 5:57 PM, Michael McCandless luc...@mikemccandless.com wrote: On Thu, Dec 3, 2009 at 5:48 PM, Erick Erickson erickerick...@gmail.com wrote: I generified the searches/function files in patch 2037. I don't really think there's a conflict, just commit my patch and have at generifying the rest. OK so then we'll start with 2037, then take 2065's patch, hopefully updated to current trunk, but minus search/function sources. I know, I know. I did two things at once. So sue me. Honest, I'll try not to do this very often <G>... In fact I prefer this. I used to think we shouldn't do that but I flip-flopped and now think in practice you just have to clean code while you're there, otherwise it won't get cleaned. Mike: You really want to generify the whole shootin' match or do you want to partition them? I'll be happy to take a set of them. Or would that make things too complicated to apply? 2065 already has done a lot here (adding generics to the tests)... I think we start from that and take it from there? Mike
Re: (LUCENE-2037) Allow Junit4 tests in our environment.
Mike: I should be able to create a new 2037 patch pretty easily if you want to apply 2065 first. Let me know. Erick On Thu, Dec 3, 2009 at 9:05 PM, Kay Kay kaykay.uni...@gmail.com wrote: Mike - I have attached another patch to LUCENE-2065, in sync with the trunk now. Erick Erickson wrote: That's up to Mike, whichever way he finds easiest, I'll deal. Erick On Thu, Dec 3, 2009 at 8:43 PM, Kay Kay kaykay.uni...@gmail.com wrote: I created LUCENE-2065 while working on 1257, the original generics-related ticket, and since we were running out of time for 3.0, I guess we could not get src/test converted in. In any case, if you were committing this one (2037) to trunk, maybe I can wait before creating the patch again. Erick Erickson wrote: I didn't realize 2065 had already been down this path, thought you were volunteering to change all the code starting from scratch. Your approach sounds like a fine plan. Note that I'm not entirely sure that I cleaned up *everything*, but we need to get to a known state before tackling the rest, so I'll wait for these two patches to be applied before looking back at it... Not to mention the Localized test thing. Erick On Thu, Dec 3, 2009 at 5:57 PM, Michael McCandless luc...@mikemccandless.com wrote: On Thu, Dec 3, 2009 at 5:48 PM, Erick Erickson erickerick...@gmail.com wrote: I generified the searches/function files in patch 2037. I don't really think there's a conflict, just commit my patch and have at generifying the rest. OK so then we'll start with 2037, then take 2065's patch, hopefully updated to current trunk, but minus search/function sources. I know, I know. I did two things at once. So sue me. Honest, I'll try not to do this very often <G>... In fact I prefer this. I used to think we shouldn't do that but I flip-flopped and now think in practice you just have to clean code while you're there, otherwise it won't get cleaned. Mike: You really want to generify the whole shootin' match or do you want to partition them? I'll be happy to take a set of them. Or would that make things too complicated to apply? 2065 already has done a lot here (adding generics to the tests)... I think we start from that and take it from there? Mike
LUCENE-2037 (Junit4 capabilities)
Is anyone thinking about committing this patch? And/or what do I need to do/should have done to indicate it's ready for review? Poor lonely patch, sitting out there all alone and neglected <G>... Erick
[jira] Commented: (LUCENE-2096) Investigate parallelizing Ant junit tests
[ https://issues.apache.org/jira/browse/LUCENE-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12783436#action_12783436 ] Erick Erickson commented on LUCENE-2096: Parallelizing tests is proving trickier than I'd hoped. Part of the problem is my not-wonderful ant skills... But what I've found so far with trying to use ForEach is that stuff gets in the way. In particular, I'm pretty sure the <sequential> tag in the test-macro body defeats any parallelizing attempts by ForEach. Taking it out isn't straightforward. In some of my experiments, I got tests to fire off in parallel, but then started running into wonky errors so strange that I can't remember them now, though some had to do with what looked like file contention for some temporary test files. Googling around, I think I remember posts by Jason Ruthgren trying to do something similar in SOLR (?). Jason: if I'm remembering right, did you find any joy? Then we'd have to rework how success and failure are handled because there's contention for that file as well. Now I'm wondering if the scary python script gets us more bang for the buck. I wrote a Groovy script for experiments that is probably a near-cousin, and I'm wondering what would happen if we wrote a special testcase-type target that did NOT depend upon compile-test or, really, much of anything else, and counted on the user to make sure to build the system first before using whatever script we came up with. We don't really lose functionality by recursively looking for Test*.java files because that's what's done internally in the build files anyway. So doing that outside or inside the ant files doesn't seem like a loss. I'm putting this in the JIRA issue to preserve it for posterity. Meanwhile, I'll appeal to Ant gurus if they want to try whacking the Ant build files, and see what the script notion brings...
Investigate parallelizing Ant junit tests - Key: LUCENE-2096 URL: https://issues.apache.org/jira/browse/LUCENE-2096 Project: Lucene - Java Issue Type: Improvement Components: Build Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Ant Contrib has a ForEach construct that may speed up running all of the Junit tests by parallelizing them with a configurable number of threads. I envision this in several stages. First, see if ForEach works for us with hard-coded lists, distribute this for testing then make the changes for real. I intend to hard-code the list for the first pass, ordered by the time they take. This won't do for check-in, but will give us a fast proof-of-concept. This approach will be most useful for multi-core machines. In particular, we need to see whether the parallel tasks are isolated enough from each other to prevent mutual interference. All this assumes the fragmentary reference I found is still available... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2037) Allow Junit4 tests in our environment.
[ https://issues.apache.org/jira/browse/LUCENE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated LUCENE-2037: --- Attachment: LUCENE-2037.patch See JIRA comments Allow Junit4 tests in our environment. -- Key: LUCENE-2037 URL: https://issues.apache.org/jira/browse/LUCENE-2037 Project: Lucene - Java Issue Type: Improvement Components: Other Affects Versions: 3.1 Environment: Development Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Fix For: 3.1 Attachments: junit-4.7.jar, LUCENE-2037.patch, LUCENE-2037.patch Original Estimate: 8h Remaining Estimate: 8h Now that we're dropping Java 1.4 compatibility for 3.0, we can incorporate Junit4 in testing. Junit3 and junit4 tests can coexist, so no tests should have to be rewritten. We should start this for the 3.1 release so we can get a clean 3.0 out smoothly. It's probably worthwhile to convert a small set of tests as an exemplar. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2037) Allow Junit4 tests in our environment.
[ https://issues.apache.org/jira/browse/LUCENE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12783442#action_12783442 ] Erick Erickson commented on LUCENE-2037: Darn it! I'll get the comments right sometime and not have to retype them after making an attachment. Anyway, this patch allows us to use Junit4 constructs as well as Junit3 constructs. It includes a sibling class to LuceneTestCase called LuceneTestCaseJ4 that provides the functionality we used to get from LuceneTestCase. When creating Junit4-style tests, preferentially import from org.junit rather than from junit.framework. Junit-3.8.2.jar may (should?) be removed from the distro; all tests run just fine under Junit-4.7.jar, which is attached to this issue. I wrote a little script that compares the results of running the tests, and we run exactly the same number of TestSuites and each runs exactly the same number of tests, so I'm pretty confident about this one. I may be wrong, but I'm not uncertain. Single data-points aren't worth much, but on my Macbook Pro, running under Junit4 took about a minute longer than Junit3 (about 23 1/2 minutes). Which could have been the result of my Time Machine running for all I know. All the tests in test...search.function have been converted to use LuceneTestCaseJ4 as an exemplar. I've deprecated LuceneTestCase to prompt people. When you derive from LuceneTestCaseJ4, you *must* use the @Before, @After and @Test annotations to get the functionality you expect, as must *all* subclasses. So one gotcha people will surely run across is deriving from J4 and failing to put @Test on a test method. Converting all the tests was my way of working through the derivation issues. I don't particularly see the value in doing a massive conversion just for the heck of it. Unless someone has a real urge. More along the lines of "I'm in this test anyway, let's upgrade it and add new ones". What about new tests? Should we encourage new patches to use Junit4 rather than Junit3? If so, how? I've noticed the convention of putting underscores in front of some tests to keep them from running. The Junit4 convention is the @Ignore annotation, which will cause the @Ignored tests to be reported (something like 1300 successful, 0 failures, 23 ignored), which is a nice way to keep these from getting lost in the shuffle. When this gets applied, I can put up the patch for LocalizedTestCase and we can give that a whirl. Allow Junit4 tests in our environment. -- Key: LUCENE-2037 URL: https://issues.apache.org/jira/browse/LUCENE-2037 Project: Lucene - Java Issue Type: Improvement Components: Other Affects Versions: 3.1 Environment: Development Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Fix For: 3.1 Attachments: junit-4.7.jar, LUCENE-2037.patch, LUCENE-2037.patch Original Estimate: 8h Remaining Estimate: 8h Now that we're dropping Java 1.4 compatibility for 3.0, we can incorporate Junit4 in testing. Junit3 and junit4 tests can coexist, so no tests should have to be rewritten. We should start this for the 3.1 release so we can get a clean 3.0 out smoothly. It's probably worthwhile to convert a small set of tests as an exemplar.
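To make the two annotation points in the comment above concrete (a method without @Test is silently skipped, and an ignored test is counted rather than lost), here is a toy, dependency-free sketch. It deliberately does NOT use JUnit: ToyTest, ToyIgnore, SampleTests and ToyRunner are all made-up stand-ins, and the real JUnit4 runner is far more involved. The only assumption is that annotation-driven discovery behaves along these lines.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;

// Stand-ins for JUnit4's @Test and @Ignore (hypothetical, illustration only).
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface ToyTest {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface ToyIgnore {}

class SampleTests {
    @ToyTest public void testAddition() {
        if (1 + 1 != 2) throw new AssertionError("arithmetic is broken");
    }

    // Disabled, but still visible in the summary as "ignored".
    @ToyIgnore @ToyTest public void testNotReadyYet() {
        throw new AssertionError("never runs");
    }

    // The gotcha: a JUnit3-style name with no annotation is never discovered.
    public void testForgotAnnotation() {
        throw new AssertionError("silently skipped");
    }
}

final class ToyRunner {
    private ToyRunner() {}

    /** Runs annotated methods only and returns a one-line summary. */
    static String run(Class<?> testClass) {
        int passed = 0, failed = 0, ignored = 0;
        try {
            Constructor<?> ctor = testClass.getDeclaredConstructor();
            ctor.setAccessible(true);
            Object instance = ctor.newInstance();
            for (Method m : testClass.getDeclaredMethods()) {
                if (!m.isAnnotationPresent(ToyTest.class)) continue; // not a test at all
                if (m.isAnnotationPresent(ToyIgnore.class)) { ignored++; continue; }
                m.setAccessible(true);
                try {
                    m.invoke(instance);
                    passed++;
                } catch (ReflectiveOperationException e) {
                    failed++; // the test body threw, or could not be invoked
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not instantiate " + testClass, e);
        }
        return passed + " successful, " + failed + " failures, " + ignored + " ignored";
    }
}
```

Calling ToyRunner.run(SampleTests.class) reports one success and one ignored test; testForgotAnnotation does not appear in the counts at all, which is exactly why the missing-annotation mistake is easy to overlook.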
Re: (LUCENE-1844) Speed up junit tests
But then I got to thinking. I admit I've only scratched the surface of the JUnit4 parallelization stuff. That said, it seems like the real benefit comes from making use of multiple cores; we don't get huge speedups just from running multiple threads at once on a single core. Which makes sense if you're not doing much in the way of I/O. This notion was inspired by the scary Python script comment. So what if we use Ant's ForEach construct instead? Yet again this is a fuzzy idea I'm throwing out without much to back it up. Mostly I'm wondering if anyone's thought about it before or can shoot it down before it takes wing. Or if it is worth exploring. Assuming we structure our test directories so there are only directories at the root of the test area, could we persuade Ant to fire off the tests N directories at a time in parallel? N would default to 1 but could be passed in to the task, something like -DmaxThreads=4. ForEach actually has a maxThreads parameter. In fact, we wouldn't even need to have only directories at the test root, but the individual test files at the root would probably be inefficiently run. I suspect that keeping the test directories in balance would be much less work than trying to parallelize using JUnit4, and be much less fraught with gremlins. This assumes we get sufficient isolation by Ant running separate threads, about which I have absolutely NO information. Like I said, mostly I'm wondering if anybody's gone down this path before and has wisdom to offer. Which *still* doesn't mean we shouldn't do whatever we can to speed up individual tests, but looking at the timings there's no obvious low-hanging fruit. I wonder if we could somehow run the various directories in time order, longest-to-shortest, in the hope that all the threads would finish up close enough to the same time. I haven't thought about *how* to make this happen yet though. Anyway, I'll be happy to pursue this if y'all think it has merit, let me know and I'll open a JIRA and take it on. For the benefit of those aforementioned *real* people with *real* machines, who I'll rely upon to help test this notion. Is the poor man's version of this on a dual-core machine just running test-core and test-contrib in two separate windows? Best Erick On Thu, Nov 26, 2009 at 10:38 AM, Erick Erickson erickerick...@gmail.com wrote: Despite my long rambling, I agree that speeding things up is worthwhile. Just not a huge deal for some of us poor peons who are on dinky little 2-core machines and feel inadequate even *talking* to people who have *real* machines <G>... Time to go get ready to eat Turkey Erick On Thu, Nov 26, 2009 at 9:02 AM, Mark Miller markrmil...@gmail.com wrote: right - as soon as you have to start running the tests often enough, any decent savings turns into less waiting and more work. Waiting for tests to run is time that could be better spent elsewhere. And many of us run the tests *a lot* considering how long they take. And we will only keep adding more and will continue to do so. Also, many of us *are* on multicore and should be able to benefit from it. I don't dev on anything less than 4 cores these days. It's a life changer :) and cheap currently. I'd like 8. - Mark http://www.lucidimagination.com (mobile) On Nov 26, 2009, at 5:24 AM, Michael McCandless luc...@mikemccandless.com wrote: I still think there's value to faster tests, even if they don't become so fast as to enable fully interactive testing. Plus, this is an ongoing goal with time, not a one-time event. As we create tests we should generally try to maximize coverage and minimize CPU cost, as long as the effort is smallish. Mike On Wed, Nov 25, 2009 at 9:32 PM, Erick Erickson erickerick...@gmail.com wrote: I posted a rather long diatribe outlining why I think speed-ups are a false goal for Lucene. Briefly, I'm convinced that as long as the tests are run when Hudson builds Lucene, 99% of the value of unit tests is realized. I suppose this implies that the hard-core committers agree that as long as failed tests are caught/corrected within a day things are fine. Although coming from a background where unit tests are not always required, my viewpoint may be suspect <G>. er...@nottobeconfusedwithhatcher.com On Wed, Nov 25, 2009 at 8:43 PM, Michael McCandless (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/LUCENE-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782716#action_12782716 ] Michael McCandless commented on LUCENE-1844: Will we also speed up back-compat tests? Speed up junit tests Key: LUCENE-1844 URL: https://issues.apache.org/jira/browse/LUCENE-1844 Project: Lucene - Java Issue Type: Improvement Reporter: Mark Miller Attachments
Re: (LUCENE-1844) Speed up junit tests
> Also, will ant's ForEach take a set of say 30 things to work on, and take the # threads to use, and just pull from that queue of 30, in order?

That's the implication I took from here: http://ant-contrib.sourceforge.net/tasks/tasks/index.html

Ignorance is bliss, I didn't find the ForEach by looking at Ant documentation, but by googling "ant parallel". Turns out this is in Contrib. I don't even know if it's current.

Tell ya' what. I'll take a quick whack at it. I'm a believer in prototyping if at all possible. So I'll create a really stupid implementation of this with a hard-coded list of tests to run and see what happens. If it works for me, I'll pass it along to whoever wants to give it a spin and we'll get a clue whether it provides enough of an improvement to pursue seriously.

I'll open a JIRA since at least Mike and I seem to be interested.

Erick

On Fri, Nov 27, 2009 at 1:27 PM, Michael McCandless luc...@mikemccandless.com wrote:

> On Fri, Nov 27, 2009 at 10:52 AM, Erick Erickson erickerick...@gmail.com wrote:
>
>> But then I got to thinking. I admit I've only scratched the surface of the JUnit4 parallelization stuff. That said, it seems like the real benefit comes from making use of multiple cores; we don't get huge speedups just from running multiple threads at once on a single core. Which makes sense if you're not doing much in the way of I/O.
>
> Right, it's the multi-core machines that gain the most from this.
>
>> This notion was inspired by the scary Python script comment. So what if we use the Ant ForEach construct instead? Yet again this is a fuzzy idea I'm throwing out without much to back it up. Mostly I'm wondering if anyone's thought about it before or can shoot it down before it takes wing. Or if it is worth exploring.
>>
>> Assuming we structure our test directories so there are only directories at the root of the test area, could we persuade Ant to fire off the tests N directories at a time in parallel? N would default to 1 but could be passed in to the task, something like -DmaxThreads=4. ForEach actually has a maxThreads parameter. In fact, we wouldn't even need to have only directories at the test root, but the individual test files at the root would probably be inefficiently run.
>>
>> I suspect that keeping the test directories in balance would be much less work than trying to parallelize using JUnit4, and be much less fraught with gremlins. This assumes we get sufficient isolation by Ant running separate threads, about which I have absolutely NO information. Like I said, mostly I'm wondering if anybody's gone down this path before and has wisdom to offer.
>
> I think this rough idea is a good approach, though I don't know much about ant's ForEach. One thing the scary Python script does is divide up the index and search packages into 2 parts (a and b), by breaking up the tests according to 1st letter. We might be able to take a similar approach, so that we're not forced to unnaturally separate tests into subdirs? The entire index or search package was too slow to run otherwise (ie, I needed to throw concurrency at it).
>
>> Which *still* doesn't mean we shouldn't do whatever we can to speed up individual tests, but looking at the timings there's no obvious low-hanging fruit.
>
> Yup. It's definitely an ongoing thing too...
>
>> I wonder if we could somehow run the various directories in time order, longest-to-shortest, in the hope that all the threads would finish up close enough to the same time. I haven't thought about *how* to make this happen yet though.
>
> This is very important -- I do the same thing in the python script. Also, will ant's ForEach take a set of say 30 things to work on, and take the # threads to use, and just pull from that queue of 30, in order?
>
>> Anyway, I'll be happy to pursue this if y'all think it has merit, let me know and I'll open a JIRA and take it on. For the benefit of those aforementioned *real* people with *real* machines, who I'll rely upon to help test this notion.
>>
>> Is the poor man's version of this on a dual-core machine just running test-core and test-contrib in two separate windows?
>
> I think you could, except, I think they share sub-tasks (eg, compile-core) so the two will sometimes stomp on each other. The scary python script first uses a single thread to compile everything, then runs N threads pulling from the queue. BUT: I apply a temporary patch to the ant build files, so that the N threads do not try to, eg, compile-core or jar-core, separately.
>
> Also one thing I'd love to try is NOT forking the JVM for each test (fork=no in the junit task). I wonder how much time that'd buy...
>
> Mike

To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
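The scheduling idea discussed in this thread (a fixed number of worker threads pulling test groups from a queue, with the slowest groups queued first) can be sketched in plain Java. This is only an illustration of the idea, not the Ant ForEach mechanism and not Mike's Python script: the class name `LongestFirstRunner`, the group names, and the timings are made up, and the `maxThreads` system property simply mirrors the `-DmaxThreads=4` suggestion above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch: run test groups longest-first on a fixed number of worker threads. */
final class LongestFirstRunner {

    /** Returns the group names ordered by expected runtime, longest first. */
    static List<String> longestFirst(Map<String, Integer> expectedSeconds) {
        List<String> names = new ArrayList<>(expectedSeconds.keySet());
        names.sort((a, b) -> Integer.compare(expectedSeconds.get(b), expectedSeconds.get(a)));
        return names;
    }

    /**
     * Queues one task per group in longest-first order. Each idle worker takes the
     * next queued group, so at most maxThreads groups run at the same time.
     * Returns true if every group finished before the timeout.
     */
    static boolean runAll(Map<String, Integer> expectedSeconds, int maxThreads,
                          Consumer<String> runGroup) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, maxThreads));
        for (String name : longestFirst(expectedSeconds)) {
            pool.submit(() -> runGroup.accept(name));
        }
        pool.shutdown();
        try {
            return pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Made-up timings; a real list would come from measured test runtimes.
        Map<String, Integer> expectedSeconds = new LinkedHashMap<>();
        expectedSeconds.put("analysis", 40);
        expectedSeconds.put("index", 300);
        expectedSeconds.put("search", 240);
        expectedSeconds.put("store", 25);

        // Defaults to 1 thread, overridable with -DmaxThreads=4.
        int maxThreads = Integer.getInteger("maxThreads", 1);
        runAll(expectedSeconds, maxThreads, name -> System.out.println("running " + name));
    }
}
```

Queuing the longest groups first keeps a slow group from starting last and leaving the other workers idle while it finishes. It does nothing about isolation between groups, which is the open question raised in the thread.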
[jira] Created: (LUCENE-2096) Investigate parallelizing Ant junit tests
Investigate parallelizing Ant junit tests
Key: LUCENE-2096
URL: https://issues.apache.org/jira/browse/LUCENE-2096
Project: Lucene - Java
Issue Type: Improvement
Components: Build
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor

Ant Contrib has a ForEach construct that may speed up running all of the Junit tests by parallelizing them with a configurable number of threads. I envision this in several stages: first, see whether ForEach works for us with hard-coded lists; then distribute this for testing; then make the changes for real. I intend to hard-code the list for the first pass, ordered by the time the tests take. This won't do for check-in, but will give us a fast proof-of-concept.

This approach will be most useful for multi-core machines. In particular, we need to see whether the parallel tasks are isolated enough from each other to prevent mutual interference.

All this assumes the fragmentary reference I found is still available...

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.

To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1844) Speed up junit tests
[ https://issues.apache.org/jira/browse/LUCENE-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated LUCENE-1844: --- Attachment: LUCENE-1844-Junit3.patch Speeds up TestBooleanMinShouldMatch and TestCustomScoreQuery without using JUnit4 Speed up junit tests Key: LUCENE-1844 URL: https://issues.apache.org/jira/browse/LUCENE-1844 Project: Lucene - Java Issue Type: Improvement Reporter: Mark Miller Attachments: FastCnstScoreQTest.patch, hi_junit_test_runtimes.png, LUCENE-1844-Junit3.patch, LUCENE-1844.patch As Lucene grows, so does the number of JUnit tests. This is obviously a good thing, but it comes with longer and longer test times. Now that we also run back compat tests in a standard test run, this problem is essentially doubled. There are some ways this may get better, including running parallel tests. You will need the hardware to fully take advantage, but it should be a nice gain. There is already an issue for this, and Junit 4.6, 4.7 have the beginnings of something we might be able to count on soon. 4.6 was buggy, and 4.7 still doesn't come with nice ant integration. Parallel tests will come though. Beyond parallel testing, I think we also need to concentrate on keeping our tests lean. We don't want to sacrifice coverage or quality, but I'm sure there is plenty of fat to skim. I've started making a list of some of the longer tests - I think with some work we can make our tests much faster - and then with parallelization, I think we could see some really great gains. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1844) Speed up junit tests
[ https://issues.apache.org/jira/browse/LUCENE-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782915#action_12782915 ] Erick Erickson commented on LUCENE-1844: OK, fire when ready Gridley. Pretty soon I'll understand when to comment and how to keep from multiple comments This patch does NOT use the Java5 features like generics etc. I've done that work and it'll be included in the TestCustomScoreQuery changes for JUnit4.
[jira] Commented: (LUCENE-2037) Allow Junit4 tests in our environment.
[ https://issues.apache.org/jira/browse/LUCENE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782916#action_12782916 ] Erick Erickson commented on LUCENE-2037: Hold off on this patch until I get a chance to submit a new one, we're straightening out LUCENE-1844 interdependencies between patches. Allow Junit4 tests in our environment. -- Key: LUCENE-2037 URL: https://issues.apache.org/jira/browse/LUCENE-2037 Project: Lucene - Java Issue Type: Improvement Components: Other Affects Versions: 3.1 Environment: Development Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Fix For: 3.1 Attachments: junit-4.7.jar, LUCENE-2037.patch Original Estimate: 8h Remaining Estimate: 8h Now that we're dropping Java 1.4 compatibility for 3.0, we can incorporate Junit4 in testing. Junit3 and junit4 tests can coexist, so no tests should have to be rewritten. We should start this for the 3.1 release so we can get a clean 3.0 out smoothly. It's probably worthwhile to convert a small set of tests as an exemplar. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: (LUCENE-1844) Speed up junit tests
Despite my long rambling, I agree that speeding things up is worthwhile. Just not a huge deal for some of us poor peons who are on dinky little 2-core machines and feel inadequate even *talking* to people who have *real* machines <G>...

Time to go get ready to eat Turkey.

Erick

On Thu, Nov 26, 2009 at 9:02 AM, Mark Miller markrmil...@gmail.com wrote:

> right - as soon as you have to start running the tests often enough, any decent savings turns into less waiting and more work. Waiting for tests to run is time that could be better spent elsewhere. And many of us run the tests *a lot* considering how long they take. And we will only keep adding more and will continue to do so.
>
> Also, many of us *are* on multicore and should be able to benefit from it. I don't dev on anything less than 4 cores these days. It's a life changer :) and cheap currently. I'd like 8.
>
> - Mark
> http://www.lucidimagination.com (mobile)
>
> On Nov 26, 2009, at 5:24 AM, Michael McCandless luc...@mikemccandless.com wrote:
>
>> I still think there's value to faster tests, even if they don't become so fast as to enable fully interactive testing. Plus, this is an ongoing goal with time, not a one-time event. As we create tests we should generally try to maximize coverage and minimize CPU cost, as long as the effort is smallish.
>>
>> Mike
>>
>> On Wed, Nov 25, 2009 at 9:32 PM, Erick Erickson erickerick...@gmail.com wrote:
>>
>>> I posted a rather long diatribe outlining why I think speed-ups are a false goal for Lucene. Briefly, I'm convinced that as long as the tests are run when Hudson builds Lucene, 99% of the value of unit tests is realized. I suppose this implies that the hard-core committers agree that as long as failed tests are caught/corrected within a day things are fine. Although coming from a background where unit tests are not always required, my viewpoint may be suspect <G>.
>>>
>>> er...@nottobeconfusedwithhatcher.com
>>>
>>> On Wed, Nov 25, 2009 at 8:43 PM, Michael McCandless (JIRA) j...@apache.org wrote:
>>>
>>>> [ https://issues.apache.org/jira/browse/LUCENE-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782716#action_12782716 ]
>>>>
>>>> Michael McCandless commented on LUCENE-1844:
>>>>
>>>> Will we also speed up back-compat tests?
Re: Jira emails via Gmail
Agreed, annoying. Haven't found any solution either.

Erick

On Wed, Nov 25, 2009 at 7:51 AM, Uwe Schindler u...@thetaphi.de wrote:

> I would like to have a link to the patch/file/... in the mail about an update to the attached files. This is also annoying.
>
> - Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> -Original Message-
> From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
> Sent: Wednesday, November 25, 2009 1:38 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Jira emails via Gmail
>
>> I would be very interested in a solution too. kind of annoying...
>>
>> simon
>>
>> On Wed, Nov 25, 2009 at 11:34 AM, Michael McCandless luc...@mikemccandless.com wrote:
>>
>>> Here's a Jira issue (on Jira!) about the problem: http://jira.atlassian.com/browse/JRA-12640
>>>
>>> But it doesn't point to a workaround...
>>>
>>> Mike
>>>
>>> On Wed, Nov 25, 2009 at 5:20 AM, Michael McCandless luc...@mikemccandless.com wrote:
>>>
>>>> Sort of off topic, but I wanted to see if anyone else is using Gmail's web UI and has solved this...
>>>>
>>>> When an issue is updated, Jira sends out an email... but Gmail doesn't group all such emails together... it groups them into separate groups (for updated, file attached, edited, etc.), which I'm now getting very tired of...
>>>>
>>>> Has anyone found a solution for this? I could swear I've seen a Python script in the past that logs in via IMAP and does something to solve this, but I can't find it right now.
>>>>
>>>> Mike
Re: [jira] Commented: (LUCENE-1844) Speed up junit tests
They're ready to go, but at Uwe's suggestion, I've been waiting for 3.0 to get settled before prompting someone to apply this patch. I was going to generate a new patch for this and for 2037 (junit4 tests) just to make sure they were easy to apply. But if you're willing, the patches are already attached to the JIRA issues. Do note that the decision in MinBooleanShouldMatch to stop checking the query after 100 rather than checking all 1,000 is included in the patch.

Do you want to apply the patches or should I regenerate? It's no big deal to regenerate them and I'll have a better feel for reconciling any conflicts. I don't know whether there even *are* any conflicts, but just in case.

For my info, though, if I have a more recent patch that *replaces* an earlier patch, especially one that hasn't yet been applied, is it preferred to delete the earlier patch when providing a new one?

I'm not pleased with the Junit4 documentation; most of what I've been able to glean has come from brave souls blogging. Does anyone have a gold mine or is it as hit-or-miss as I think? There are hints of parallelization capabilities in Junit4, but I'm having a hard time finding anything in much depth. The Junit website is pathetic, I can't even find 4.7 javadocs, it keeps giving me 4.5, as evidenced by no @Rule docs or @Intercept. And no version information in the javadocs. Or I'm completely missing the boat.

I was thinking about getting the entire project over the weekend and generating my own if I have the time.

Erick

On Wed, Nov 25, 2009 at 11:49 AM, Michael McCandless (JIRA) j...@apache.org wrote:

> [ https://issues.apache.org/jira/browse/LUCENE-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782497#action_12782497 ]
>
> Michael McCandless commented on LUCENE-1844:
>
> Is this ready to go in? I'd really love to see unit tests run faster :)
Re: [jira] Commented: (LUCENE-1844) Speed up junit tests
Hmmm, the patches that I supplied for Junit4 *require* 4.7 anyway, which I included in the patch... Is this a problem? Or just a document problem?

Erick

On Wed, Nov 25, 2009 at 1:14 PM, Mark Miller markrmil...@gmail.com wrote:

> junit 4 parallelization is still in its infancy. I think the docs for it are just in the changes file that it was first released with. That version had severe bugs that made it almost unusable - I think that's mostly fixed in a newer release. There is also a much better impl of one of the key classes (I think they call it computer) written by someone else that will eventually go into the code base I think (written by the guy(s) that I think found/fixed the initial buggy-ness) - essentially, I think it's still unbaked.
>
> Here are some docs from the release notes of 4.6: http://sourceforge.net/project/shownotes.php?release_id=675664group_id=15278
>
> That's also an issue - it arrived only in 4.6 - so this would need to be optional unless we bumped up our req from 4 - and it really requires at least 4.7 for the fixes (if everything is even fixed).
>
> You also have to set up which tests run in parallel by hand essentially. No ant task to help with this last I looked. So it will probably end up being an alternate way to run the tests initially (at best).
>
> - Mark
Re: (LUCENE-1844) Speed up junit tests
IMHO there are other reasons to upgrade to junit4 besides parallelization, there are some nice new capabilities. I suppose the analogous question is why upgrade to Lucene 2.9?

Especially since it's not a matter of upgrading. Junit3 tests run just fine under junit4. I've tested after removing the junit3 jar from lib, no problem. It even seems to run slightly faster, which makes me wonder... So really, we have the best of both worlds. No work involved in using Junit4 with the current tests, but the ability to use the new features of Junit4. Although I'm sure there'll be *something* that bites us, I have great faith in Murphy. Kinda reminds me of the Lucene drop-in replacement policy <G>...

But on the topic of parallelization: I'm not at all sure it's worth the effort. As far as I can tell, it really only gets significant gains when you have more cores to run with. It's not at all clear to me how much time we spend doing I/O in the tests... very little I suspect (although I confess I don't know for sure). And if we're CPU bound anyway, parallelization doesn't help. Anybody know for sure?

And say we did all the work to parallelize all the tests. And say that instead of taking 25 minutes on my 3 year old MacBook Pro, we got it down to 10 minutes. Who cares? 10 minutes is still too long according to the eXtreme Programming (XP) folks, and I sympathize with their point of view. Even though I did spend some time trying to trim some time. The XP approach to unit testing is to run it almost every time you change a line of code. OK, I'm exaggerating, but not by too much with the die-hard XP folks. Even at 10 minutes, we can't do that.

So, I think the value for Lucene/SOLR comes *not* from running the tests 15 times an hour. I think the real value comes from not letting errors hide for days/weeks/months/releases. So I'm quite willing to let the automated builds catch the unit test failures in unexpected places in those instances where I don't run all of the tests before a patch is committed. As long as we fix them as soon as they're found.

OK, I'm rambling. I'm off for Thanksgiving, and my daughter is at her in-laws until tomorrow (they're visiting from CA). So sue me <G>.

Best
Erick

On Wed, Nov 25, 2009 at 5:07 PM, Michael McCandless luc...@mikemccandless.com wrote:

> Is the only reason to upgrade to junit 4, to get the parallelization possibility (which isn't sounding very compelling!)? Ie, making our unit tests lean is fully independent of junit 4?
>
> Mike
>
> On Wed, Nov 25, 2009 at 4:17 PM, Uwe Schindler u...@thetaphi.de wrote:
>
>>> junit 4 parallelization is still in its infancy. I think the docs for it are just in the changes file that it was first released with. That version had severe bugs that made it almost unusable - I think that's mostly fixed in a newer release. There is also a much better impl of one of the key classes (I think they call it computer) written by someone else that will eventually go into the code base I think (written by the guy(s) that I think found/fixed the initial buggy-ness) - essentially, I think it's still unbaked.
>>
>> There is another problem. Parallelization would only work with tests that do not change global defaults. E.g. LocalizedTestCase changes the default locale. If another test ran in parallel, it would break. So only isolated tests can run in parallel. This LocalizedTestCase problem cannot be solved in another way. The same would have been true in 2.9 with the TokenStream.useOnlyNewAPI switch, but this is no longer the case for 3.1.
>>
>> Uwe
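Uwe's point about global defaults can be made concrete with a small Java sketch. It is not Lucene's LocalizedTestCase; the class `DefaultLocaleDemo` and its helper methods are made up for illustration. It only shows that `Locale.setDefault` changes JVM-wide state, so any code running at the same time in the same JVM would observe the switched locale.

```java
import java.util.Locale;

/** Hypothetical sketch: the default locale is JVM-wide state shared by all threads. */
final class DefaultLocaleDemo {

    /** Runs body with the JVM-wide default locale switched, then restores the old one. */
    static void withDefaultLocale(Locale locale, Runnable body) {
        Locale saved = Locale.getDefault();
        Locale.setDefault(locale);
        try {
            body.run();
        } finally {
            // Restoring helps sequential tests. It does not protect a test that
            // runs concurrently in the same JVM while the locale is switched.
            Locale.setDefault(saved);
        }
    }

    /** Formats 0.5 using whatever the default locale happens to be right now. */
    static String formatHalf() {
        return String.format("%.1f", 0.5);
    }

    public static void main(String[] args) {
        withDefaultLocale(Locale.GERMANY, () -> System.out.println(formatHalf())); // prints 0,5
        withDefaultLocale(Locale.US, () -> System.out.println(formatHalf()));      // prints 0.5
    }
}
```

A test that asserts on `formatHalf()` passes or fails depending on which other test currently holds the default locale. That is why only tests that leave global defaults alone are safe to run in parallel threads of one JVM.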
Re: [jira] Commented: (LUCENE-1844) Speed up junit tests
I posted a rather long diatribe outlining why I think speed-ups are a false goal for Lucene. Briefly, I'm convinced that as long as the tests are run when Hudson builds Lucene, 99% of the value of unit tests is realized. I suppose this implies that the hard-core committers agree that as long as failed tests are caught/corrected within a day things are fine. Although coming from a background where unit tests are not always required, my viewpoint may be suspect <G>.

er...@nottobeconfusedwithhatcher.com

On Wed, Nov 25, 2009 at 8:43 PM, Michael McCandless (JIRA) j...@apache.org wrote:

> [ https://issues.apache.org/jira/browse/LUCENE-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782716#action_12782716 ]
>
> Michael McCandless commented on LUCENE-1844:
>
> Will we also speed up back-compat tests?
[jira] Commented: (LUCENE-2092) BooleanQuery.hashCode and equals ignore isCoordDisabled
[ https://issues.apache.org/jira/browse/LUCENE-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781716#action_12781716 ] Erick Erickson commented on LUCENE-2092: Well, if it's been there since 1.9 and this is the first time it's been reported, it hasn't caused the world to stop yet. So I don't think it's worth the work unless we have to spin another 3.0 for additional reasons. BooleanQuery.hashCode and equals ignore isCoordDisabled --- Key: LUCENE-2092 URL: https://issues.apache.org/jira/browse/LUCENE-2092 Project: Lucene - Java Issue Type: Bug Components: Query/Scoring Affects Versions: 1.9, 2.0.0, 2.1, 2.2, 2.3, 2.3.1, 2.3.2, 2.4, 2.4.1, 2.9, 2.9.1 Reporter: Hoss Man Assignee: Michael McCandless Attachments: LUCENE-2092.patch BooleanQuery.isCoordDisabled() is not considered by BooleanQuery's hashCode() or equals() methods ... this can cause serious badness to happen when caching BooleanQueries. bug traces back to at least 1.9 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
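Why the LUCENE-2092 report matters for caching can be shown with a toy class. This is not Lucene's BooleanQuery; `ToyQuery` and its fields are invented for the sketch. The class below includes the flag in both `equals()` and `hashCode()`, so two queries that differ only in the flag get separate cache entries. If the flag were left out of both methods, the second lookup would return the entry cached for the first query.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for a query whose scoring depends on a boolean flag. */
final class ToyQuery {
    private final String clauses;
    private final boolean coordDisabled;

    ToyQuery(String clauses, boolean coordDisabled) {
        this.clauses = clauses;
        this.coordDisabled = coordDisabled;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ToyQuery)) return false;
        ToyQuery other = (ToyQuery) o;
        // The flag takes part in equality; dropping it would make the two
        // queries in main() interchangeable as cache keys.
        return clauses.equals(other.clauses) && coordDisabled == other.coordDisabled;
    }

    @Override
    public int hashCode() {
        // Must be consistent with equals(), so the flag is hashed as well.
        return Objects.hash(clauses, coordDisabled);
    }

    public static void main(String[] args) {
        Map<ToyQuery, String> cache = new HashMap<>();
        cache.put(new ToyQuery("a b", false), "results scored with coord");

        // Same clauses, different flag: a separate key, so nothing is cached yet.
        System.out.println(cache.get(new ToyQuery("a b", true))); // prints null
    }
}
```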
[jira] Commented: (LUCENE-2037) Allow Junit4 tests in our environment.
[ https://issues.apache.org/jira/browse/LUCENE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780368#action_12780368 ]

Erick Erickson commented on LUCENE-2037:

Well, last night I changed LocalizedTestCase to do the @RunWith(Parameterized.class) and @Parameters thing, and it works just fine with a minimal change to subclasses, mainly adding @Test and a c'tor with a Locale parameter. In total, it adds probably a minute to the test run.

About the cross product of versions and locales: the @Parameters method returns a list of Object[], and each element of the list is matched against a c'tor. So if each Object[] in your list holds, say, an (int, float, int), then as long as you have a c'tor whose signature takes an (int, float, int), you're good to go. To handle the m x n case you mentioned, your @Parameters method would return one Object[] for each (Locale, Version) pair, and you'd get all your Locales run against all your Versions.

Whether we *want* this to happen is another question. It's worth asking whether we really *need* to run all the possible locales or whether a subset of locales would serve.

It's kind of ironic that I have one patch waiting to be applied that cuts down on the time it takes to run the unit tests and another patch that adds to it. Two steps forward, one step back and a jink sideways just for fun.

Best
Erick

Allow Junit4 tests in our environment.
--
Key: LUCENE-2037
URL: https://issues.apache.org/jira/browse/LUCENE-2037
Project: Lucene - Java
Issue Type: Improvement
Components: Other
Affects Versions: 3.1
Environment: Development
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
Fix For: 3.1
Attachments: junit-4.7.jar, LUCENE-2037.patch
Original Estimate: 8h
Remaining Estimate: 8h

Now that we're dropping Java 1.4 compatibility for 3.0, we can incorporate Junit4 in testing. Junit3 and junit4 tests can coexist, so no tests should have to be rewritten. We should start this for the 3.1 release so we can get a clean 3.0 out smoothly. It's probably worthwhile to convert a small set of tests as an exemplar.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
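The m x n case described in the comment above could be sketched roughly as follows. This is only an illustration in plain Java, not code from the patch: the `Version` enum here is a placeholder with made-up constants, and `localeVersionPairs` is a hypothetical helper. In a real JUnit 4 test, a public static no-argument method annotated with @Parameters would return a list like this one, and the test class would need a matching (Locale, Version) constructor.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;

class LocaleVersionParameters {

    // Hypothetical stand-in for Lucene's Version enum; the constants are placeholders.
    enum Version { V_A, V_B }

    // Builds one Object[] per (Locale, Version) pair, i.e. the full cross product.
    static Collection<Object[]> localeVersionPairs(Locale[] locales, Version[] versions) {
        List<Object[]> pairs = new ArrayList<Object[]>();
        for (Locale locale : locales) {
            for (Version version : versions) {
                pairs.add(new Object[] { locale, version });
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        Locale[] locales = { Locale.US, Locale.FRANCE, Locale.JAPAN };
        Collection<Object[]> pairs = localeVersionPairs(locales, Version.values());
        System.out.println(pairs.size() + " parameter sets");
    }
}
```

The list grows as the product of the two inputs, which is why the question of whether a subset of locales would serve matters for test run time.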
[jira] Commented: (LUCENE-2037) Allow Junit4 tests in our environment.
[ https://issues.apache.org/jira/browse/LUCENE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779449#action_12779449 ]

Erick Erickson commented on LUCENE-2037:

I was thinking more about TestQueryParser. One of the features of the current setup is that you specify which tests in a class you want to have run under all locales. Tests not in that list are run only under the default locale. Always assuming I'm reading things right...

I don't see a clean way to emulate that part of the behavior without either refactoring or introducing a check in the tests we don't want to run under all locales and aborting early.

But I think we're finding different ways to agree here. I'm interpreting your comments to mean that running all the tests in the class is OK, at least for now...

But I did notice last night that a number of tests in contrib reference LocalizedTestCase (I have two separate projects, core and contrib, so it wasn't obvious until I ran the ant task). I'll look into those tonight or tomorrow night.

Erick
[jira] Commented: (LUCENE-2037) Allow Junit4 tests in our environment.
[ https://issues.apache.org/jira/browse/LUCENE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779491#action_12779491 ]

Erick Erickson commented on LUCENE-2037:

I think you're mis-reading this. This is the annotation for the static method that returns a list of parameters, not for a method that is an actual test. The thing that causes the framework to gather the list and run the tests for each element on the list is the @RunWith annotation on the class, AFAIK.

Or I'm misreading it.

Erick
[jira] Commented: (LUCENE-2037) Allow Junit4 tests in our environment.
[ https://issues.apache.org/jira/browse/LUCENE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779492#action_12779492 ]

Erick Erickson commented on LUCENE-2037:

Frankly, I don't see how that would work without getting into the guts of the @RunWith(value = Parameterized.class) Junit4 annotation. As I understand it, that annotation *on the class* causes the framework to make a call to the static method that provides a list of parameters (annotated with @Parameters). The framework then takes the returned list and, *for each element in the list*, calls a constructor with that element and runs all the tests in the class.

So annotating a test with @AllLocales would somehow have to get in there and change what the framework does. No doubt it's do-able, but until I see more than 10 seconds difference in running the tests I'm not sure it's worth it. Nor would I advocate altering the behavior of the framework for back-compat; I'd far rather refactor the tests into those that run for all locales and those that don't.

I suppose one could do the inverse, that is, create an annotation @DefaultLocaleOnly that aborts early if the locale isn't the default, but again I think the first approach I'd advocate would be to work within the framework until it was too painful.

FWIW
Erick
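The mechanism described in the comment above (fetch the parameter list from a static method, then construct the test class with each element and run every test) can be emulated in a few lines of reflection. This is a simplified sketch for illustration only, not JUnit's actual implementation: `MiniParameterizedRunner`, `SampleLocaleTest`, and the `parameters()` method name are all made up, and tests are discovered by a "test" name prefix rather than by the @Test annotation so the example needs nothing beyond the JDK.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class MiniParameterizedRunner {

    // Hypothetical test class: the constructor takes one parameter set.
    static class SampleLocaleTest {
        private final Locale locale;

        public SampleLocaleTest(Locale locale) { this.locale = locale; }

        // Stand-in for the static method JUnit 4 would find via @Parameters.
        public static List<Object[]> parameters() {
            return Arrays.asList(new Object[][] { { Locale.US }, { Locale.GERMANY } });
        }

        public void testOne() { if (locale == null) throw new IllegalStateException(); }
        public void testTwo() { if (locale == null) throw new IllegalStateException(); }
    }

    // For every parameter set, run every public no-argument "test*" method on a
    // fresh instance built from that set. Returns the number of executions.
    @SuppressWarnings("unchecked")
    static int runAll(Class<?> testClass) {
        try {
            List<Object[]> parameterSets =
                (List<Object[]>) testClass.getMethod("parameters").invoke(null);
            Constructor<?> constructor = testClass.getConstructors()[0];
            int executed = 0;
            for (Object[] parameterSet : parameterSets) {
                for (Method method : testClass.getMethods()) {
                    boolean isTest = method.getName().startsWith("test")
                        && method.getParameterTypes().length == 0;
                    if (isTest) {
                        Object instance = constructor.newInstance(parameterSet);
                        method.invoke(instance);
                        executed++;
                    }
                }
            }
            return executed;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runAll(SampleLocaleTest.class) + " test executions");
    }
}
```

Because the loop over parameter sets wraps the loop over test methods, there is no natural place for a per-method opt-in such as @AllLocales without changing the runner itself, which is the point the comment makes.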
[jira] Commented: (LUCENE-2037) Allow Junit4 tests in our environment.
[ https://issues.apache.org/jira/browse/LUCENE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779530#action_12779530 ]

Erick Erickson commented on LUCENE-2037:

{quote}
Yes, I do feel we should keep LocalizedTestCase. It is handy, we might use it in more places to prevent test failures in other locales for new code.
{quote}

A light bulb went off while I was walking around. I think I can just change the LocalizedTestCase class and put the @RunWith() and @Parameters *there*. Which makes waay more sense than what I was doing, which was putting those in every subclass of the current LocalizedTestCase. Doh! I'll take a peek tonight. Although last night I was thinking "Gee, this is repetitive"...

There are only two classes in core that use LocalizedTestCase, but there are several in contrib too. They'll all require the @Test annotation if I munge LocalizedTestCase, but that should be the only change necessary then, assuming we're content to run all the locales past all the test cases in all derived classes.

Hmm, why was subclassing invented again? Something about putting common behavior in one place or some nonsense like that <G>.

Erick
[jira] Commented: (LUCENE-2037) Allow Junit4 tests in our environment.
[ https://issues.apache.org/jira/browse/LUCENE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779251#action_12779251 ]

Erick Erickson commented on LUCENE-2037:

Well, it all depends on how you feel about 10 seconds as far as LocalizedTestCase is concerned. JUnit4 is really not built to run some tests in a class with the Parameterized runner and some not; it runs all the tests in the class with all the parameters.

In the case of TestQueryParser, which is the only test class I saw that made use of the "include some tests but not others" feature in LocalizedTestCase, I hacked in running all the tests with all the locales available (152 in my case). Which pushes the number of tests in that one class up over 4,000, FWIW. Running that test case went from around 5 seconds to around 15 seconds on my 2 year old Macbook Pro, from inside IntelliJ.

I don't think it's worth trying to refactor that class into two classes, one that has all the tests run with all the locales and one that has the rest of the tests run only with the default locale (which is how I read the code in LocalizedTestCase), for 10 seconds worth of time savings.

One could emulate the old process of excluding some tests by returning immediately from those tests that *weren't* intended to be run with all locales if the current locale wasn't the default, but I don't see that as worth the effort, although I could be convinced otherwise if people feel strongly.

I'll provide a patch for this later this week if there are no objections; perhaps I'll get a chance to look at BaseTokenStreamTestCase before then. This will make LocalizedTestCase obsolete and I'll remove it in the patch.
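The "return immediately unless the current locale is the default" idea from the comment above might look something like the sketch below. The class and method names are hypothetical and the example is plain Java rather than actual test code; in a real JUnit 4 test, org.junit.Assume.assumeTrue may be a tidier way to express the same guard.

```java
import java.util.Locale;

class DefaultLocaleGuard {

    // Hypothetical helper: true when the locale under test is the JVM default
    // that was captured before the parameterized run started.
    static boolean isDefaultLocaleRun(Locale originalDefault, Locale localeUnderTest) {
        return originalDefault.equals(localeUnderTest);
    }

    // Stand-in for a test body that should only execute under the default locale.
    // Returns true if the body ran, false if it bailed out early.
    static boolean defaultLocaleOnlyTest(Locale originalDefault, Locale localeUnderTest) {
        if (!isDefaultLocaleRun(originalDefault, localeUnderTest)) {
            return false; // early return: nothing is checked for this locale
        }
        // ... locale-independent assertions would go here ...
        return true;
    }

    public static void main(String[] args) {
        Locale original = Locale.getDefault();
        System.out.println(defaultLocaleOnlyTest(original, original));
    }
}
```

A plain early return still counts as a passing test for every non-default locale, so the reported test count stays inflated even though the extra runs do no work.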
Why release 3.0?
One of my specialties is asking obvious questions just to see if everyone's assumptions are aligned. So with the discussion about branching 3.0 I have to ask: "Is there going to be any 3.0 release intended for *production*?" And if not, would we save a lot of work by just not worrying about retrofitting fixes to a 3.0 branch and carrying on with 3.1 as the first *supported* 3.x release?

Since 3.0 is "upgrade to Java 5 and remove deprecations", I'm not sure *as a user* I see a good reason to upgrade to 3.0. Getting a beta/snapshot release to get a head start on cleaning up my code does seem worthwhile, if I have the spare time. And having a base 3.0 version that's not changing all over the place would be useful for that.

That said, I'm also not terribly comfortable with a release that's out there and unsupported.

Apologies if this has already been discussed, but I don't remember it. Although my memory isn't what it used to be (but some would claim it never was <G>)...

Erick
Re: Why release 3.0?
On Mon, Nov 16, 2009 at 2:03 PM, Uwe Schindler u...@thetaphi.de wrote:

Hi Erick,

3.0 is **not** an unsupported or beta release; it is the cleaned-up 2.9.1 release. You are right, 2.9.1 users do not need to upgrade (but they can), but for new users starting with Lucene, the recommendation is to use 3.0 and not 2.9.

3.0 also contains some cleanups needed for 3.1: compressed fields are no longer supported, so they must be uncompressed, which is done during optimizing/merging in 3.0. Later versions will remove support for older index types, so you should really update your indexes, especially because flex indexing will possibly remove more support for older indexes (as it gets more complex to maintain all the different file formats).

So 3.0 is recommended for users who are starting new Java 5 projects and want a clean API. People needing backwards compatibility can use 2.9.1, but support for that version will be cancelled in future and bugfixes will only go into 3.x.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
Re: Why release 3.0?
Oops, stupid mouse made me send a blank message.

OK, I withdraw the question since there *are* good reasons to put 3.0 in a prod environment <G>.

It's also an easier thing to say "new Lucene users should start with 3.0" rather than "new Lucene users should start with 3.1. Use 3.0 until we release 3.1, but be aware we're not going to support 3.0". Yccc

Erick
Re: [jira] Commented: (LUCENE-2037) Allow Junit4 tests in our environment.
Good suggestions, it's really helpful to have someone intimately familiar with the code suggest the next direction. I didn't want to go too far afield for the proof-of-concept; I mostly wanted to have a place to start. LuceneTestCaseJ4 should be useful both as a template and as a base to build on.

If you wanted to put in a JIRA or two and assign them to me I'd be happy to take a look. I'm pushing this off on you since you have a better sense of what's important here.

About reformatting. I'm torn, for all the reasons I'm certain you can quote. Of course I'll abide by the sense of the community, but the community doesn't speak with one voice. Michael McCandless and I had an exchange on this very topic and he is in the opposite camp. I guess I was heavily influenced by Martin Fowler's Refactoring book and the eXtreme Programming folks.

What I'd personally like would be for someone with heavy commit privileges to reformat the whole thing at once and just get it *done*, as was apparently discussed at ApacheCon. Eclipse makes this easy. I'd also like to be wealthy.

Look at the bright side: I'm not trying to convince anyone that my way of formatting is obviously superior because I put braces on their own line <G>.

Best
Erick

On Sun, Nov 15, 2009 at 6:16 AM, Uwe Schindler (JIRA) j...@apache.org wrote:

[ https://issues.apache.org/jira/browse/LUCENE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778086#action_12778086 ]

Uwe Schindler commented on LUCENE-2037:
---
One thing that would also be good: We have LocalizedTestCase, which has the possibility to run each test for all available Locales (it currently overrides runBare() and iterates while setting Locale.setDefault()). As this should only be run for specific methods, how about adding an annotation in addition to @Test (with Retention(method)), like @TestLocalized?

What to do with BaseTokenStreamTestCase? In 2.9 it had also overridden runBare(), but not anymore (because we now only have the new TS API), but this is also a typical example of when we want to rerun tests multiple times. One item on our plan is that this test now runs all analyzer tests for different default versions (iterating over the Version enum constants). We then need something like @TestAllVersions.

If we jump to JUnit4, we should use the new features for a more elegant solution to these multiple-run tests.

One note: It would be good to *not* reformat the whole tests with an Eclipse cleanup; just change the lines you modified, and do not reformat everything or organize imports and so on. It's hard to find out what has changed.
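The runBare()-style iteration that Uwe describes (run the same test body once per locale by calling Locale.setDefault(), then put the original default back) can be sketched in plain Java as below. This is an assumption-laden illustration rather than the LocalizedTestCase source: `runUnderEachLocale` is a hypothetical helper, and a `Runnable` stands in for the test body.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class LocaleIteration {

    // Runs the body once under each locale and records the default locale that
    // was in effect for each run. The original default is always restored.
    static List<Locale> runUnderEachLocale(Locale[] locales, Runnable body) {
        Locale saved = Locale.getDefault();
        List<Locale> seen = new ArrayList<Locale>();
        try {
            for (Locale locale : locales) {
                Locale.setDefault(locale);
                body.run();
                seen.add(Locale.getDefault());
            }
        } finally {
            Locale.setDefault(saved); // restore even if the body throws
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Locale> seen = runUnderEachLocale(
            new Locale[] { Locale.FRANCE, Locale.JAPAN },
            new Runnable() {
                public void run() {
                    System.out.println("running under " + Locale.getDefault());
                }
            });
        System.out.println(seen.size() + " runs");
    }
}
```

The default locale is JVM-wide state, so restoring it in a finally block keeps one failing test from leaking its locale into the tests that follow.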
Re: [jira] Issue Comment Edited: (LUCENE-2037) Allow Junit4 tests in our environment.
That thought occurred to me earlier, but I don't know enough specifics yet. I intend to find out, though.

Erick

On Sun, Nov 15, 2009 at 8:46 AM, Robert Muir (JIRA) j...@apache.org wrote:

[ https://issues.apache.org/jira/browse/LUCENE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778101#action_12778101 ]

Robert Muir edited comment on LUCENE-2037 at 11/15/09 1:45 PM:
---
Is there some way to use Junit4 parameterized tests to do this LocalizedTestCase-type thing, so we don't have to override runBare()?

was (Author: rcmuir):
Is there some way to use Junit4 parameterized tests to do this LocalizedTestCase-type thing, so we don't have to override runBase()?
Re: Junit4
Well, the patch is in shape to submit, but looking at the various comments on the 3.0 release, I guess I should wait until 3.0 is actually out the door before submitting, unless someone just can't wait.

How do we include a new jar file in a patch?

Best
Erick
[jira] Updated: (LUCENE-2037) Allow Junit4 tests in our environment.
[ https://issues.apache.org/jira/browse/LUCENE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated LUCENE-2037:
---
Attachment: junit-4.7.jar
            LUCENE-2037.patch

LuceneTestCaseJ4 should replace LuceneTestCase. There's a bit of overkill here to emulate the override of runBare in LuceneTestCase, but I thought it was worth it to work out the mechanisms. We'll need to put the junit 4.7 jar in the right place.
[jira] Updated: (LUCENE-1844) Speed up junit tests
[ https://issues.apache.org/jira/browse/LUCENE-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated LUCENE-1844:
---
Attachment: (was: LUCENE-1844.patch)

Speed up junit tests
--
Key: LUCENE-1844
URL: https://issues.apache.org/jira/browse/LUCENE-1844
Project: Lucene - Java
Issue Type: Improvement
Reporter: Mark Miller
Attachments: FastCnstScoreQTest.patch, hi_junit_test_runtimes.png

As Lucene grows, so does the number of JUnit tests. This is obviously a good thing, but it comes with longer and longer test times. Now that we also run back compat tests in a standard test run, this problem is essentially doubled.

There are some ways this may get better, including running parallel tests. You will need the hardware to fully take advantage, but it should be a nice gain. There is already an issue for this, and Junit 4.6, 4.7 have the beginnings of something we might be able to count on soon. 4.6 was buggy, and 4.7 still doesn't come with nice ant integration. Parallel tests will come though.

Beyond parallel testing, I think we also need to concentrate on keeping our tests lean. We don't want to sacrifice coverage or quality, but I'm sure there is plenty of fat to skim.

I've started making a list of some of the longer tests - I think with some work we can make our tests much faster - and then with parallelization, I think we could see some really great gains.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1844) Speed up junit tests
[ https://issues.apache.org/jira/browse/LUCENE-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated LUCENE-1844:
---
Attachment: LUCENE-1844.patch

This supersedes the first patch I submitted. Apply after LUCENE-2037. Please render judgment on whether it's really OK for TestBooleanMinShouldMatch to cut off checking the queries after 100.
[jira] Updated: (LUCENE-1844) Speed up junit tests
[ https://issues.apache.org/jira/browse/LUCENE-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated LUCENE-1844:
---
Attachment: (was: LUCENE-1844.patch)
[jira] Updated: (LUCENE-1844) Speed up junit tests
[ https://issues.apache.org/jira/browse/LUCENE-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated LUCENE-1844:
----------------------------------
Attachment: LUCENE-1844.patch

Saves 3-4 minutes overall.

Arbitrarily limited the TestBooleanMinShouldMatch to stop checking queries after 100.

I don't see much point in checking the *same* queries again and again in TestCustomScoreQuery, so I just moved the check outside the loop.

Apply this patch *after* LUCENE-2037 since TestCustomScoreQuery happens to be common to both patches.

Sorry about the noise with the license grant...

Speed up junit tests
Key: LUCENE-1844
URL: https://issues.apache.org/jira/browse/LUCENE-1844
Project: Lucene - Java
Issue Type: Improvement
Reporter: Mark Miller
Attachments: FastCnstScoreQTest.patch, hi_junit_test_runtimes.png, LUCENE-1844.patch
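The two speed-ups described in Erick's comment on LUCENE-1844 (capping a check at 100 queries, and moving a repeated check out of a loop) can be sketched in isolation. This is not the attached patch: the class name, the counter, `MAX_CHECKED`, and `expensiveCheck` are hypothetical stand-ins, and the sketch assumes the hoisted check really is identical on every iteration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names come from the Lucene test suite. */
class TestSpeedupSketch {
    /** Cap on how many queries get the expensive verification (100 in the comment). */
    static final int MAX_CHECKED = 100;

    /** Counts calls so the effect of each change can be observed. */
    static int checksRun = 0;

    /** Stand-in for an expensive per-query verification. */
    static void expensiveCheck(String query) {
        checksRun++;
    }

    /** Change 1: stop verifying once MAX_CHECKED queries have been checked. */
    static void checkCapped(List<String> queries) {
        int limit = Math.min(queries.size(), MAX_CHECKED);
        for (int i = 0; i < limit; i++) {
            expensiveCheck(queries.get(i));
        }
    }

    /** Before change 2: the same queries are re-verified on every iteration. */
    static void checkInsideLoop(List<String> queries, int iterations) {
        for (int i = 0; i < iterations; i++) {
            for (String q : queries) {
                expensiveCheck(q);
            }
        }
    }

    /** After change 2: the loop-invariant verification runs once, outside the loop. */
    static void checkOutsideLoop(List<String> queries, int iterations) {
        for (String q : queries) {
            expensiveCheck(q);
        }
        for (int i = 0; i < iterations; i++) {
            // Work that really differs per iteration would stay here.
        }
    }

    public static void main(String[] args) {
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            queries.add("query" + i);
        }
        checksRun = 0;
        checkCapped(queries);
        System.out.println("capped checks: " + checksRun);
        checksRun = 0;
        checkInsideLoop(queries, 10);
        System.out.println("checks inside loop: " + checksRun);
        checksRun = 0;
        checkOutsideLoop(queries, 10);
        System.out.println("checks outside loop: " + checksRun);
    }
}
```

Both changes trade some redundancy for speed, so they only make sense where the skipped checks add no coverage, which is the judgment call the comment describes.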
Re: Junit4
OK, thanks for adding me to the ACL. I'll have it tomorrow sometime.

Does anyone object to deprecating LuceneTestCase, with a note to use LuceneTestCaseJ4 instead?

I tried two approaches, and both work. Both allow you to use LuceneTestCaseJ4 rather than LuceneTestCase as a superclass, with the caveat that you have to use the proper annotations with the J4 variant. The difference is that for one approach, I copied LuceneTestCase to LuceneTestCaseJ4 and hacked. The other approach was extracting the meat of LuceneTestCase into a common class and using that class as a member of both variants, delegating to avoid code duplication.

Personally, I think it'll be cleanest to just clone LuceneTestCase and NOT extract to a common class. Eventually LuceneTestCase will fade away, and enhancements should be made to the J4 variant as needed. But if folks have strong opinions, let me know.

Best
Erick

On Fri, Nov 13, 2009 at 5:02 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

> : putting too many irons in the fire, especially non-critical ones. I don't
> : see a way to assign it to myself, either I'm missing something or I'm just
> : underprivileged <G>, so if someone would go ahead and assign it to me I'll
> : work on it post 3.0.
>
> Jira's ACLs prevent issues from being assigned to people who aren't listed in the Contributors group. The policy has been to add people to that list (for issue assignment) on request, so i hooked you up.
>
> (NOTE: if anyone else has issues they're actively working on and would like to be flagged as a Contributor in Jira so that the issues can be assigned directly to you for tracking purposes, please speak up)
>
> -Hoss
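For comparison, the second approach in the message above (a common class used as a member of both variants) might look roughly like the sketch below. Everything in it is an assumption made for illustration: the class names are not the Lucene ones, the shared logic is reduced to recording events, and the JUnit 3 superclass and JUnit 4 annotations appear only in comments so the sketch compiles without JUnit on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical shared helper holding the logic extracted from the original base class. */
class SharedTestSupport {
    final List<String> events = new ArrayList<>();

    void beforeTest(String variant) {
        events.add("setUp:" + variant);
    }

    void afterTest(String variant) {
        events.add("tearDown:" + variant);
    }
}

/** JUnit 3 style variant; the real class would extend junit.framework.TestCase. */
class Junit3StyleBase /* extends TestCase */ {
    final SharedTestSupport support = new SharedTestSupport();

    // JUnit 3 locates these methods by name.
    public void setUp() {
        support.beforeTest("junit3");
    }

    public void tearDown() {
        support.afterTest("junit3");
    }
}

/** JUnit 4 style variant; the methods would carry @Before and @After. */
class Junit4StyleBase {
    final SharedTestSupport support = new SharedTestSupport();

    /* @Before */
    public void setUpJ4() {
        support.beforeTest("junit4");
    }

    /* @After */
    public void tearDownJ4() {
        support.afterTest("junit4");
    }
}
```

The delegation keeps the two variants in sync at the cost of an extra class; the cloning approach preferred in the message avoids that indirection and accepts temporary duplication until the old base class is retired.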
Re: [jira] Commented: (LUCENE-1257) Port to Java5
About formatting. I know the "How to Contribute" section of the Wiki warns against gratuitous reformatting, but if *someone* with commit privileges wanted to, they could format an entire tree in Eclipse from the context menu of, say, the contrib directory. It'd have to be coordinated for a moment when not too many others were editing the code...

I mention this since we're doing a bunch of non-functional changes for the 3.0 release, and it might be a reasonable thing to do so that future commits are easier to compare, at least after the reformatting is done. As long as we're all using the same formatting, it might be worthwhile.

Somebody mentioned uploading a new codestyle.xml for Eclipse. Were there any changes, or is this just getting the one from SOLR up there? Because I'm using IntelliJ.

Erick

On Tue, Nov 10, 2009 at 7:08 PM, Uwe Schindler (JIRA) j...@apache.org wrote:

> [ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776184#action_12776184 ]
>
> Uwe Schindler commented on LUCENE-1257:
> ---------------------------------------
> Kay Kay: We only have SuppressWarnings at some places in core, marked with a big TODO (will be done when flex indexing comes). The wanted @SuppressWarnings are only at places where generic arrays are created. There is no way to fix this (see Sun Generics Howto).
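Uwe's point about generic arrays can be illustrated with a small sketch. Java rejects `new T[n]`, so code that needs an array of a type parameter typically creates an `Object[]` and casts it, and that cast produces an unchecked warning that cannot be removed, only suppressed. The class below is hypothetical and is not taken from any Lucene patch.

```java
/** Hypothetical fixed-size container backed by a generic array. */
class GenericArraySketch<T> {
    private final T[] slots;

    // `new T[capacity]` does not compile, so the array is created as Object[]
    // and cast. The cast is unchecked, hence the annotation; it is safe here
    // because the array is never handed out to callers as a T[].
    @SuppressWarnings("unchecked")
    GenericArraySketch(int capacity) {
        this.slots = (T[]) new Object[capacity];
    }

    void set(int index, T value) {
        slots[index] = value;
    }

    T get(int index) {
        return slots[index];
    }

    int capacity() {
        return slots.length;
    }
}
```

Scoping the annotation to the one constructor that performs the cast, rather than to the whole class, keeps other unchecked operations visible to the compiler.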
> Port to Java5
> -------------
> Key: LUCENE-1257
> URL: https://issues.apache.org/jira/browse/LUCENE-1257
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis, Examples, Index, Other, Query/Scoring, QueryParser, Search, Store, Term Vectors
> Affects Versions: 3.0
> Reporter: Cédric Champeau
> Assignee: Uwe Schindler
> Priority: Minor
> Fix For: 3.0
> Attachments: instantiated_fieldable.patch, LUCENE-1257-BooleanQuery.patch, LUCENE-1257-BooleanScorer_2.patch, LUCENE-1257-BufferedDeletes_DocumentsWriter.patch, LUCENE-1257-CheckIndex.patch, LUCENE-1257-CloseableThreadLocal.patch, LUCENE-1257-CompoundFileReaderWriter.patch, LUCENE-1257-ConcurrentMergeScheduler.patch, LUCENE-1257-DirectoryReader.patch, LUCENE-1257-DisjunctionMaxQuery-more_type_safety.patch, LUCENE-1257-DocFieldProcessorPerThread.patch, LUCENE-1257-Document.patch, LUCENE-1257-FieldCacheImpl.patch, LUCENE-1257-FieldCacheRangeFilter.patch, LUCENE-1257-IndexDeleter.patch, LUCENE-1257-IndexDeletionPolicy_IndexFileDeleter.patch, LUCENE-1257-iw.patch, LUCENE-1257-MTQWF.patch, LUCENE-1257-NormalizeCharMap.patch, LUCENE-1257-o.a.l.util.patch, LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-SegmentInfos.patch, LUCENE-1257-StringBuffer.patch, LUCENE-1257-StringBuffer.patch, LUCENE-1257-StringBuffer.patch, LUCENE-1257-TopDocsCollector.patch, LUCENE-1257-WordListLoader.patch, LUCENE-1257_analysis.patch, LUCENE-1257_BooleanFilter_Generics.patch, LUCENE-1257_contrib_benchmark.patch, LUCENE-1257_contrib_benchmark_2.patch, LUCENE-1257_contrib_highlighting.patch, LUCENE-1257_contrib_memory.patch, LUCENE-1257_contrib_misc.patch, LUCENE-1257_contrib_smartcn.patch, LUCENE-1257_javacc_upgrade.patch, LUCENE-1257_lucil.patch, LUCENE-1257_lucli.patch, LUCENE-1257_messages.patch, LUCENE-1257_more_unnecessary_casts.patch, LUCENE-1257_MultiFieldQueryParser.patch, LUCENE-1257_o.a.l.queryParser.patch, LUCENE-1257_o.a.l.store.patch, LUCENE-1257_o_a_l_demo.patch, LUCENE-1257_o_a_l_index_test.patch, LUCENE-1257_o_a_l_index_test.patch, LUCENE-1257_o_a_l_search.patch, LUCENE-1257_o_a_l_search_spans.patch, LUCENE-1257_org_apache_lucene_index.patch, LUCENE-1257_org_apache_lucene_index.patch, LUCENE-1257_precendence_parser.patch, LUCENE-1257_queryParser_jj.patch, LUCENE-1257_swing_wikipedia_wordnet_xmlqp.patch, LUCENE-1257_unnecessary_casts.patch, LUCENE-1257_unnnecessary_casts_2.patch, lucene1257surround1.patch, lucene1257surround1.patch, shinglematrixfilter_generified.patch
>
> For my needs I've updated Lucene so that it uses Java 5 constructs. I know Java 5 migration had been planned for 2.1 someday in the past, but don't know when it is planned now.
>
> This patch against the trunk includes:
> - most obvious generics usage (there are tons of usages of sets, ... Those which are commonly used have been generified)
> - PriorityQueue generification
> - replacement of indexed for loops with for each constructs
> - removal of unnecessary unboxing
>
> The code is in my opinion much more readable with those features (you actually *know* what is stored in collections when reading the code, without the need to look up field definitions every time) and it simplifies many algorithms.
>
> Note that this patch also includes an interface for the Query class. This has been done for my company's needs for building custom Query
Re: [jira] Commented: (LUCENE-1257) Port to Java5
And here I was hoping to make Uwe stay up for *days* without sleep finding all the gotchas <G>.

Thanks for the response. I'll see if I can update my IntelliJ codestyle appropriately, but probably won't get there 'til this weekend. I'll upload it to the Wiki or attach it to a Jira if nobody beats me to it.

Erick

On Tue, Nov 10, 2009 at 7:37 PM, Robert Muir rcm...@gmail.com wrote:

> this was similar to the discussion we had at apachecon, where i wanted to create a jira issue as "Uwe Schindler<some invisible unicode space>" and suggest a patch to reformat all of contrib! (would never attribute such a thing to my name but this formatting issue consistently gets in my way)
>
> On Tue, Nov 10, 2009 at 7:29 PM, Uwe Schindler u...@thetaphi.de wrote:
>
>> Yes this one is new, but it is almost the default Java 1.5 style with tabs = 2 chars and the modified generics formatting. I know about the reformatting method in Eclipse, but that would break more patches now :( (a lot are already broken).
>>
>> -
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: u...@thetaphi.de
Re: Lucene - Text Classification.
Please re-post this question on the Lucene user's list; this list is intended for development discussions.

Best
Erick

On Mon, Nov 9, 2009 at 10:02 AM, lucenenew mitesh.jes...@yahoo.com wrote:

> I want to classify sentences stored as strings against a bunch of keywords related to a certain category. So I will have 10 strings, each a sentence long, and I want to compare each string to a set of 30 keywords stored somewhere, then compare it with another set of 30 keywords, and so on. I want to rank each string based on the number of times it matches a set of keywords, so basically I want to categorize each sentence.
>
> Is this possible with Lucene, or would any other approach be more efficient? Will this process take long, in terms of speed of the program? And what tools would I need? Any help would be great. Thanks.
>
> --
> View this message in context: http://old.nabble.com/Lucene---Text-Classification.-tp26267794p26267794.html
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
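The scoring scheme in the quoted question (count how often each sentence matches each keyword set, then pick the best set) can be sketched in plain Java without any Lucene classes. The category names, keywords, and tokenization rule below are assumptions made for illustration; keywords are expected to be lower-case, and no stemming or phrase matching is attempted.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical keyword-count classifier; not based on any Lucene API. */
class KeywordClassifierSketch {

    /** Counts the tokens of the sentence that appear in the keyword set; repeats count. */
    static int score(String sentence, Set<String> keywords) {
        int hits = 0;
        // Naive tokenization: lower-case, then split on anything that is not a letter or digit.
        for (String token : sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (keywords.contains(token)) {
                hits++;
            }
        }
        return hits;
    }

    /** Returns the category with the highest score, or null if no keyword matched. */
    static String classify(String sentence, Map<String, Set<String>> categories) {
        String best = null;
        int bestScore = 0;
        for (Map.Entry<String, Set<String>> entry : categories.entrySet()) {
            int s = score(sentence, entry.getValue());
            if (s > bestScore) {
                bestScore = s;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> categories = new LinkedHashMap<>();
        categories.put("sports", new HashSet<>(Arrays.asList("goal", "match", "team")));
        categories.put("finance", new HashSet<>(Arrays.asList("stock", "market", "bank")));

        System.out.println(classify("The team scored a late goal in the match", categories));
        System.out.println(classify("The stock market fell", categories));
    }
}
```

With ten sentences and keyword sets of about thirty words, a hash-set lookup per token like this is cheap; whether an index-based approach pays off at larger scale is a question for the user's list, as the reply suggests.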