RE: [Lucene.Net] new structure
LZOCompressor? Do we need it? DIGY -Original Message- From: Troy Howard [mailto:thowar...@gmail.com] Sent: Saturday, April 23, 2011 3:07 AM To: lucene-net-dev@lucene.apache.org Subject: Re: [Lucene.Net] new structure I guess by 'today' I meant 'in about 6 days'. Anyhow, I completed the commit of the new directory structure. I did not delete the OLD directory structure, because they can live side-by-side. Also, please note that I only created VS2010 solutions and upgraded the projects to the same. Please pull down the latest revision and validate these changes. If all goes well, I'll delete the old directory structure (everything under the 'C#' directory). Thanks, Troy On Sat, Apr 16, 2011 at 3:42 PM, Troy Howard thowar...@gmail.com wrote: Apologies. I got a bit derailed. Will be committing today. On Apr 16, 2011 2:20 PM, Prescott Nasser geobmx...@hotmail.com wrote: Hey Troy, any status update on the new structure? I'm hesitant to do updates since I know you're going to be modifying it all shortly. ~P
RE: [Lucene.Net] new structure
Everything seems to be OK. +1 for removing the old directory structure. Thanks Troy DIGY
[Lucene.Net] [jira] [Closed] (LUCENENET-399) Port changes from Java Lucene 2.9.3 and 2.9.4 releases
[ https://issues.apache.org/jira/browse/LUCENENET-399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy closed LUCENENET-399. -- Resolution: Fixed No more pending tasks. I am closing this issue. Port changes from Java Lucene 2.9.3 and 2.9.4 releases -- Key: LUCENENET-399 URL: https://issues.apache.org/jira/browse/LUCENENET-399 Project: Lucene.Net Issue Type: Task Components: Lucene.Net Core, Lucene.Net Test Reporter: Troy Howard Assignee: Scott Lombard Fix For: Lucene.Net 2.9.4 Time Spent: 10h Remaining Estimate: 10h Port changes from Java Lucene 2.9.3 and 2.9.4 releases. The Lucene.Net 2.9.4 release will roll up the changes from both of those releases into one. Unit tests should be added or updated to reflect the changes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Created] (LUCENENET-412) Replacing ArrayLists, Hashtables etc. with appropriate Generics.
Replacing ArrayLists, Hashtables etc. with appropriate Generics. Key: LUCENENET-412 URL: https://issues.apache.org/jira/browse/LUCENENET-412 Project: Lucene.Net Issue Type: Improvement Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Fix For: Lucene.Net 2.9.4 This will move Lucene.Net 2.9.4 closer to Lucene 3.0.3 and allow some performance gains.
Re: Lucene Jenkins slave out of disk
On Fri, Apr 22, 2011 at 5:13 PM, Robert Muir rcm...@gmail.com wrote: On Fri, Apr 22, 2011 at 9:13 AM, Uwe Schindler u...@thetaphi.de wrote: Hi Robert, Thanks for pointing to that issue. Indeed the leftover test files in Lucene take approx. 3 GB per build. With our 9 builds that’s roughly 30 GB - useless. If the tests clean up successfully after running, we should be fine. I resolved this for trunk, branch_3x, and backwards. Any other branches (realtime? docvalues?) currently being tested by Hudson should merge up as soon as we can. Thanks Robert, I will merge RT now and commit... The DocValues build is currently disabled; I will make sure that I merge before re-enabling it... simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [HUDSON] Lucene-trunk - Build # 1537 - Failure
On Fri, Apr 22, 2011 at 2:44 PM, Robert Muir rcm...@gmail.com wrote: On Fri, Apr 22, 2011 at 8:41 AM, Thomas Matthijs thomas.matth...@actonomy.com wrote: On Fri, Apr 22, 2011 at 14:15, Uwe Schindler u...@thetaphi.de wrote: Hi Simon, I had no success changing anything. As root I can at least call ulimit -n, but the limit does not rise. Lowering it is easily possible: [root@lucene ~]# ulimit -n 32768 Probably a kernel-level enforced maximum; try raising it with sysctl. I think there are options named kern.maxfilesperproc or kern.maxfiles; you can list them with # sysctl -a Are you sure we should do this? I've had this discussion with mikemccand before; the concern is that if our tests hit too many open files, that is a realistic problem (it comes up on the user list quite often). -- open files (-n) 11095 That's quite an OK setting, though... The problem I see here is that some tests could produce tons of files due to settings like maxBufferedDocs = 2; if no merge policy is used then, we get pretty close to those limits. These problems have been coming up on the user list forever, so I'm not sure what to do here then... simon
RE: Lucene Jenkins slave out of disk
any other branches (realtime? docvalues?) currently being tested by hudson should merge up as soon as we can Thanks robert, I will merge RT now and commit... DocValues build is disabled currently I will make sure that I merge before reenabling it... Don't hurry; the FreeBSD machine hosting the jail has been down for about 18 hours. Major problems, it seems - or are they updating hard disks? *g* Uwe
RE: Lucene Jenkins slave out of disk
Hi, On Sat, Apr 23, 2011 at 9:47 AM, Uwe Schindler u...@thetaphi.de wrote: Hi, Can you also check that all new tests in realtime use the new _TestUtil API for getting an index dir? That would be nice. This only applies if we are getting an explicit index dir, right? Yes! Uwe
[jira] [Assigned] (LUCENE-2868) It should be easy to make use of TermState; rewritten queries should be shared automatically
[ https://issues.apache.org/jira/browse/LUCENE-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reassigned LUCENE-2868: --- Assignee: Simon Willnauer It should be easy to make use of TermState; rewritten queries should be shared automatically Key: LUCENE-2868 URL: https://issues.apache.org/jira/browse/LUCENE-2868 Project: Lucene - Java Issue Type: Improvement Components: Query/Scoring Reporter: Karl Wright Assignee: Simon Willnauer Attachments: LUCENE-2868.patch, LUCENE-2868.patch, lucene-2868.patch, lucene-2868.patch, query-rewriter.patch When you have the same query in a query hierarchy multiple times, tremendous savings can now be had if the user knows enough to share the rewritten queries in the hierarchy, due to the TermState addition. But this is clumsy and requires a lot of coding by the user to take advantage of. Lucene should be smart enough to share the rewritten queries automatically. This can be most readily (and powerfully) done by introducing a new method to Query.java: Query rewriteUsingCache(IndexReader indexReader) ... and including a caching implementation right in Query.java which would then work for all. Of course, all callers would want to use this new method rather than the current rewrite().
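The proposal amounts to memoizing rewrites per reader. The sketch below shows that idea under stated assumptions. `SimpleQuery`, `TermLikeQuery` and `RewriteCache` are hypothetical stand-ins, not Lucene's `Query`/`IndexReader` or the attached patches. The cache is keyed on query equality, so equal sub-queries are rewritten only once.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for Query/IndexReader; this is NOT Lucene's API.
final class RewriteCacheSketch {

    /** Minimal query abstraction whose rewrite may be expensive. */
    interface SimpleQuery {
        SimpleQuery rewrite(Object reader);
    }

    /** A leaf query that counts how many times it was actually rewritten. */
    static final class TermLikeQuery implements SimpleQuery {
        final String term;
        int rewriteCount = 0;

        TermLikeQuery(String term) { this.term = term; }

        @Override
        public SimpleQuery rewrite(Object reader) {
            rewriteCount++;
            return this; // already primitive
        }

        // Equality by term, so two equal sub-queries share one cache entry.
        @Override
        public boolean equals(Object o) {
            return o instanceof TermLikeQuery && ((TermLikeQuery) o).term.equals(term);
        }

        @Override
        public int hashCode() { return term.hashCode(); }
    }

    /** Per-reader cache: equal queries are rewritten once and the result is shared. */
    static final class RewriteCache {
        private final Object reader; // rewrites depend on the reader, so one cache per reader
        private final Map<SimpleQuery, SimpleQuery> rewritten = new HashMap<>();

        RewriteCache(Object reader) { this.reader = reader; }

        SimpleQuery rewrite(SimpleQuery query) {
            SimpleQuery result = rewritten.get(query);
            if (result == null) {
                result = query.rewrite(reader);
                rewritten.put(query, result);
            }
            return result;
        }

        int size() { return rewritten.size(); }
    }
}
```

A cache scoped to a single reader avoids serving stale rewrites after the index changes. A real implementation would also need to handle compound queries whose children are rewritten through the same cache.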
RE: Lucene Jenkins slave out of disk
Hi, Addition to my previous mail: I meant that code like this should be replaced: -indexDir = new File(workDir, testIndex); +indexDir = _TestUtil.getTempDir(testIndex); Uwe
[jira] [Commented] (LUCENE-3041) Support Query Visting / Walking
[ https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13023526#comment-13023526 ] Simon Willnauer commented on LUCENE-3041: - I like the simple interface, but I think the name is somewhat misleading here. Either we make this a 'real' visitor pattern and add accept methods to Query, which I don't think is necessary, or we make the name specific to the task. Since this is really for walking the Query 'AST' during the rewrite process, we should make this very clear in the interface name. QueryRewriter or something like that would make more sense, and it would justify the Query return value, no? Support Query Visting / Walking --- Key: LUCENE-3041 URL: https://issues.apache.org/jira/browse/LUCENE-3041 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Chris Male Priority: Minor Out of the discussion in LUCENE-2868, it could be useful to add a generic Query Visitor / Walker that could be used for more advanced rewriting, optimizations, or anything that requires state to be stored as each Query is visited. We could keep the interface very simple: {code} public interface QueryVisitor { Query visit(Query query); } {code} and then use a reflection-based visitor like Earwin suggested, which would allow implementors to provide visit methods for just the Queries they are interested in.
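A reflection-based visitor of the kind described can be built with plain Java 5 reflection (`Class#getMethod`, `Method#invoke`). This is a minimal sketch, not Chris's patch. The `Query`, `TermQuery` and `PhraseQuery` classes here are hypothetical stand-ins, and `ReflectiveQueryVisitor` and `LowercasingVisitor` are illustrative names. The final `visit(Query)` looks up a public `visit(X)` overload for the query's own class, then for each superclass, and returns the query unchanged if none exists.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Locale;

// Hypothetical types standing in for Lucene's Query hierarchy; not the actual patch.
final class ReflectiveVisitorSketch {

    static class Query {}

    static class TermQuery extends Query {
        final String term;
        TermQuery(String term) { this.term = term; }
    }

    static class PhraseQuery extends Query {}

    /** The lean interface proposed in the issue. */
    interface QueryVisitor {
        Query visit(Query query);
    }

    /** Dispatches to a public visit(X) overload, trying the query's class, then its superclasses. */
    static abstract class ReflectiveQueryVisitor implements QueryVisitor {
        @Override
        public final Query visit(Query query) {
            // Stop before Query itself: visit(Query) is this dispatcher.
            for (Class<?> c = query.getClass(); c != Query.class; c = c.getSuperclass()) {
                Method handler;
                try {
                    handler = getClass().getMethod("visit", c);
                } catch (NoSuchMethodException e) {
                    continue; // no overload for this type; try the superclass
                }
                try {
                    handler.setAccessible(true); // visitor classes may be non-public
                    return (Query) handler.invoke(this, query);
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                } catch (InvocationTargetException e) {
                    throw new RuntimeException(e.getCause());
                }
            }
            return visitDefault(query);
        }

        /** Fallback when no type-specific overload exists: leave the query unchanged. */
        protected Query visitDefault(Query query) {
            return query;
        }
    }

    /** Example visitor that only cares about TermQuery. */
    static final class LowercasingVisitor extends ReflectiveQueryVisitor {
        public Query visit(TermQuery q) {
            return new TermQuery(q.term.toLowerCase(Locale.ROOT));
        }
    }
}
```

Callers must invoke it through a `Query`-typed reference. Otherwise Java's compile-time overload resolution picks `visit(TermQuery)` directly and skips the reflective dispatch.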
Re: Lucene Jenkins slave out of disk
On Sat, Apr 23, 2011 at 10:11 AM, Uwe Schindler u...@thetaphi.de wrote: Addition: I meant such code to be replaced: - indexDir = new File(workDir, testIndex); + indexDir = _TestUtil.getTempDir(testIndex); ok will do! simon
[jira] [Commented] (LUCENE-2796) Tests need to clean up after themselves
[ https://issues.apache.org/jira/browse/LUCENE-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13023528#comment-13023528 ] Simon Willnauer commented on LUCENE-2796: - I merged RT with trunk, so Hudson builds should be fine there too! Tests need to clean up after themselves --- Key: LUCENE-2796 URL: https://issues.apache.org/jira/browse/LUCENE-2796 Project: Lucene - Java Issue Type: Bug Components: Build Affects Versions: 3.1, 4.0 Reporter: Robert Muir Fix For: 3.2, 4.0 Attachments: LUCENE-2796.patch I haven't run 'ant clean' for a while. The randomly generated temporary file names just piled up from running the tests many times... so ant clean is still running after quite a long time. We should take the logic in the base Solr test cases and push it into LuceneTestCase, so a test cleans up all its temporary stuff.
[jira] [Commented] (LUCENE-3041) Support Query Visting / Walking
[ https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13023536#comment-13023536 ] Chris Male commented on LUCENE-3041: I remain wary of calling it QueryRewriter since there is already Query rewriting support through Query#rewrite, but I take your point. What about QueryOptimizer?
[jira] [Updated] (LUCENE-3042) AttributeSource can have an invalid computed state
[ https://issues.apache.org/jira/browse/LUCENE-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3042: -- Fix Version/s: 3.1 3.0.4 2.9.5 I thought about it: this bug is so serious that it should be fixed in all branches, too (even if they are never released again). This is important for e.g. 2.9 users who are stuck with that version. Committed 3.1 branch revision: 1096127 Committed 3.0 branch revision: 1096128 Committed 2.9 branch revision: 1096129 AttributeSource can have an invalid computed state -- Key: LUCENE-3042 URL: https://issues.apache.org/jira/browse/LUCENE-3042 Project: Lucene - Java Issue Type: Bug Components: Analysis Affects Versions: 2.9.4, 3.0.3, 3.1, 3.2, 4.0 Reporter: Robert Muir Assignee: Uwe Schindler Priority: Critical Fix For: 2.9.5, 3.0.4, 3.1, 3.2, 4.0 Attachments: LUCENE-3042.patch, LUCENE-3042.patch If you work with a TokenStream, consume it, then reuse it and add an attribute to it, the computed state is wrong. Thus, for example, clearAttributes() will not actually clear the added attribute. So in some situations, addAttribute does not clear the computed state when it should.
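The failure mode is a stale cache: a snapshot of the attributes is computed once and reused, so an attribute added later is invisible to `clearAttributes()`. The following is a deliberately simplified model, not the real `AttributeSource` code. `CachedStateSketch` and `Attr` are hypothetical names. It shows the invariant the fix restores, which is that adding an attribute must invalidate the computed state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of an attribute container that caches a
// "computed state" for fast clearAttributes(); not Lucene's AttributeSource.
final class CachedStateSketch {

    /** A trivially clearable attribute. */
    static final class Attr {
        String value = "";
        void clear() { value = ""; }
    }

    private final List<Attr> attributes = new ArrayList<>();
    private Attr[] computedState; // lazily built snapshot; null means "must recompute"

    Attr addAttribute() {
        Attr a = new Attr();
        attributes.add(a);
        computedState = null; // the invariant: a new attribute invalidates the snapshot
        return a;
    }

    private Attr[] state() {
        if (computedState == null) {
            computedState = attributes.toArray(new Attr[0]);
        }
        return computedState;
    }

    void clearAttributes() {
        for (Attr a : state()) {
            a.clear();
        }
    }
}
```

Without the `computedState = null` line, the second `clearAttributes()` call below would reuse the one-element snapshot and leave the late attribute set. That is the same symptom the issue describes.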
[jira] [Commented] (LUCENE-3041) Support Query Visting / Walking
[ https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13023558#comment-13023558 ] Simon Willnauer commented on LUCENE-3041: - bq. What about QueryOptimizer? QueryProcessor or QueryPreProcessor?
[jira] [Commented] (LUCENE-3041) Support Query Visting / Walking
[ https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13023560#comment-13023560 ] Chris Male commented on LUCENE-3041: I'm happy to settle with QueryProcessor#process
[Lucene.Net] [jira] [Updated] (LUCENENET-412) Replacing ArrayLists, Hashtables etc. with appropriate Generics.
[ https://issues.apache.org/jira/browse/LUCENENET-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-412: --- Attachment: LUCENENET-412.patch Patch for Document.fields (all unit tests pass). If it is OK, we can continue this way. DIGY
[jira] [Commented] (LUCENE-3041) Support Query Visting / Walking
[ https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13023561#comment-13023561 ] Simon Willnauer commented on LUCENE-3041: - bq. I'm happy to settle with QueryProcessor#process +1 - Chris, are you cranking out a patch for this? I think if we have a QueryProcessor, we should make it possible to optionally hook it into IndexSearcher to replace the direct call to Query#rewrite. Eventually it should be the QueryProcessor's responsibility to rewrite the query and pass the actual 'primitive' query to the searcher once done. I think it's good to keep that interface super lean and let fancier implementations follow up on it. Stuff like automatic dispatch for certain query types might need some cglib magic, or at least require Java 6, so those implementations might need to go to contrib/misc.
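The hook Simon describes can be sketched as a searcher that delegates query preparation to an optional `QueryProcessor` instead of calling `rewrite` itself. `QueryProcessor#process` is the name agreed in the thread. Everything else here (`Query`, `RewritingProcessor`, `Searcher.prepare`) is hypothetical and is not an existing Lucene API. The default processor repeats the usual rewrite-until-unchanged loop.

```java
// Hypothetical types; only the QueryProcessor#process name comes from the thread.
final class QueryProcessorSketch {

    interface Query {
        /** Returns a simpler equivalent query, or this if already primitive. */
        Query rewrite();
    }

    interface QueryProcessor {
        Query process(Query query);
    }

    /** Default processor: rewrite repeatedly until the query stops changing. */
    static final class RewritingProcessor implements QueryProcessor {
        @Override
        public Query process(Query query) {
            Query current = query;
            for (Query next = current.rewrite(); next != current; next = current.rewrite()) {
                current = next;
            }
            return current;
        }
    }

    /** A searcher that asks an optional processor for the primitive query. */
    static final class Searcher {
        private final QueryProcessor processor;

        Searcher(QueryProcessor processor) {
            // No processor configured: behave like a plain rewrite loop.
            this.processor = processor != null ? processor : new RewritingProcessor();
        }

        Query prepare(Query query) {
            return processor.process(query);
        }
    }
}
```

Keeping `process` as the single method matches the "super lean" interface suggested in the comment. Type-specific dispatch can then live in a processor implementation, for example one built on the reflective visitor idea.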
[jira] [Commented] (LUCENE-3041) Support Query Visting / Walking
[ https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13023562#comment-13023562 ] Chris Male commented on LUCENE-3041: Yup I have a patch cooking. bq. Stuff like automatic dispatch for certain query types might need some cglib magic or at least req. java 6 to perform so they might need to go to contrib/misc. I don't think this will be the case. I am striving to use Java 5 reflection classes and that seems to be working fine.
[HUDSON] Lucene-Solr-tests-only-realtime_search-branch - Build # 50 - Failure
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-realtime_search-branch/50/ No tests ran. Build Log (for compile errors): [...truncated 2915 lines...]
[jira] [Updated] (LUCENE-2560) random analyzer tests
[ https://issues.apache.org/jira/browse/LUCENE-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2560: Attachment: LUCENE-2560.patch Here's a patch: this test already found an attributes bug (LUCENE-3042), and I found a bug in the German stemmer (the one that was used by default before Lucene 3.1). I'll open a separate bug for that one; I @Ignored it here because I want to get the tests in. random analyzer tests - Key: LUCENE-2560 URL: https://issues.apache.org/jira/browse/LUCENE-2560 Project: Lucene - Java Issue Type: Test Components: contrib/analyzers Reporter: Robert Muir Assignee: Robert Muir Fix For: 4.0 Attachments: LUCENE-2560.patch We have been finding and fixing lots of bugs by randomizing Lucene tests. In r966878 I added a variant of random unicode string that gives you a random string within the same Unicode block (for other purposes). I think we should use this to test the analyzers better; for example, we should pound tons of random Greek strings against the Greek analyzer and at least make sure there aren't exceptions.
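The testing idea can be sketched as follows: draw random strings from a single Unicode block with a fixed seed, feed them to the component under test, and report the seed and input on any exception. The helper names `randomStringInRange` and `pound` are hypothetical and are not the `_TestUtil`/`LuceneTestCase` API. The analyzer is modeled as a plain `Function<String, ?>` (Java 8) for brevity. The range U+0370 to U+03FF is the Greek and Coptic block.

```java
import java.util.Random;
import java.util.function.Function;

// Sketch only: helper names are hypothetical, not LuceneTestCase/_TestUtil API.
final class RandomBlockStrings {

    /** Random string of 0..maxLength chars, each drawn from [start, end] (BMP only). */
    static String randomStringInRange(Random random, char start, char end, int maxLength) {
        int length = random.nextInt(maxLength + 1); // 0 is allowed: empty terms are worth testing
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) (start + random.nextInt(end - start + 1)));
        }
        return sb.toString();
    }

    /** Feeds many random Greek strings to the component; any RuntimeException fails the run. */
    static void pound(Function<String, ?> analyzer, long seed, int iterations) {
        Random random = new Random(seed); // fixed seed makes a failure reproducible
        for (int i = 0; i < iterations; i++) {
            String s = randomStringInRange(random, '\u0370', '\u03FF', 20); // Greek and Coptic block
            try {
                analyzer.apply(s);
            } catch (RuntimeException e) {
                throw new AssertionError("seed=" + seed + " input=\"" + s + "\"", e);
            }
        }
    }
}
```

Allowing length 0 matters here: the empty term is exactly the input that later crashed the German stemmer in LUCENE-3043.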
[jira] [Created] (LUCENE-3043) o.a.l.analysis.de.GermanStemmer crashes on some inputs
o.a.l.analysis.de.GermanStemmer crashes on some inputs -- Key: LUCENE-3043 URL: https://issues.apache.org/jira/browse/LUCENE-3043 Project: Lucene - Java Issue Type: Bug Reporter: Robert Muir See the tests from LUCENE-2560. GermanAnalyzer no longer uses this stemmer by default, but we should fix it.
[jira] [Resolved] (LUCENE-2560) random analyzer tests
[ https://issues.apache.org/jira/browse/LUCENE-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-2560. - Resolution: Fixed Fix Version/s: 3.2 Committed revisions 1096178, 1096186
[jira] [Updated] (LUCENE-3043) o.a.l.analysis.de.GermanStemmer crashes on some inputs
[ https://issues.apache.org/jira/browse/LUCENE-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3043: Attachment: LUCENE-3043.patch
[jira] [Commented] (LUCENE-3043) o.a.l.analysis.de.GermanStemmer crashes on some inputs
[ https://issues.apache.org/jira/browse/LUCENE-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13023583#comment-13023583 ] Robert Muir commented on LUCENE-3043: - This problem occurs if this stemmer encounters an empty term (some components, like KeywordTokenizer or regex-based tokenizers, can produce these). The fix is trivial... I'll commit soon.
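This kind of crash typically comes from suffix logic that indexes from the end of the term without first checking the term's length. The method below is a hypothetical illustration of that failure mode and its guard. It is not the actual `GermanStemmer` code or the committed patch.

```java
// Hypothetical illustration only; not the GermanStemmer source or the LUCENE-3043 fix.
final class EmptyTermGuardSketch {

    /** Strips one trailing 'e'; without the guard, charAt(-1) throws on "". */
    static String stripTrailingE(String term) {
        if (term.isEmpty()) {
            return term; // e.g. KeywordTokenizer can emit an empty term for empty input
        }
        return term.charAt(term.length() - 1) == 'e'
            ? term.substring(0, term.length() - 1)
            : term;
    }
}
```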
Re: Lucene Jenkins slave out of disk
On Sat, Apr 23, 2011 at 4:17 AM, Simon Willnauer simon.willna...@googlemail.com wrote: Addition: I meant such code to be replaced: - indexDir = new File(workDir, testIndex); + indexDir = _TestUtil.getTempDir(testIndex); Thanks for merging, Simon! Also, for what it's worth, we should at some point review these tests that create explicit directories. If a test wants to create an index, it can use newDirectory() or newFSDirectory(). The latter will only select filesystem-based implementations, never RAMDirectory, etc. When you use these methods, a unique temporary directory is automatically created (and, of course, deleted after the test if it passes). Most of these tests probably don't care what the directory's actual name is...
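The pattern behind these helpers is to hand out a uniquely named directory per call, remember it, and delete everything once the test has passed. The sketch below shows that pattern with `java.nio.file` (Java 8). `TempDirTracker`, `newTempDir` and `cleanUp` are hypothetical names and are not the `LuceneTestCase` implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper illustrating unique temp dirs plus cleanup; not LuceneTestCase.
final class TempDirTracker {
    private final List<Path> created = new ArrayList<>();

    /** Returns a fresh, uniquely named directory and remembers it for cleanup. */
    Path newTempDir(String prefix) {
        try {
            Path dir = Files.createTempDirectory(prefix); // JDK appends a unique suffix
            created.add(dir);
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Call after a passing test; a failing test can skip this to keep its files. */
    void cleanUp() {
        for (Path root : created) {
            if (!Files.exists(root)) {
                continue;
            }
            try (Stream<Path> walk = Files.walk(root)) {
                // Reverse lexical order puts children before their parent directories.
                List<Path> paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
                for (Path p : paths) {
                    Files.delete(p);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        created.clear();
    }
}
```

Because the name is generated, a test that only needs "some directory" never collides with leftovers from earlier runs. This is the situation that let files pile up in LUCENE-2796.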
[jira] [Resolved] (LUCENE-3043) o.a.l.analysis.de.GermanStemmer crashes on some inputs
[ https://issues.apache.org/jira/browse/LUCENE-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3043. - Resolution: Fixed Fix Version/s: 4.0 3.2 Assignee: Robert Muir Committed revisions 1096194, 1096199
[jira] [Commented] (LUCENE-3036) TestFSTs.testRealTerms is a terrible unit test
[ https://issues.apache.org/jira/browse/LUCENE-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13023610#comment-13023610 ] Michael McCandless commented on LUCENE-3036: Patch looks good! TestFSTs.testRealTerms is a terrible unit test -- Key: LUCENE-3036 URL: https://issues.apache.org/jira/browse/LUCENE-3036 Project: Lucene - Java Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-3036.patch This test: * uses FSDirectory.open (platform-specific behavior) * is a random test, but runs for a certain number of seconds, then quits (this makes it hard to reproduce with a seed, as its behavior depends on your computer's speed etc.) After waiting 3 hours to download the 1 gigabyte file to reproduce the corrupt index it made in (https://hudson.apache.org/hudson/job/Lucene-trunk/1533/testReport/junit/org.apache.lucene.util.automaton.fst/TestFSTs/testRealTerms/), I found some of this frustrating. I finally managed to reproduce it, but it's no fun fiddling with a test that runs for 5 minutes to reproduce a failure.
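The reproducibility problem comes from bounding a seeded random loop by wall-clock time. With a seed, an iteration-bounded loop performs exactly the same sequence of checks on every machine. A time-bounded loop stops at a machine-dependent point. This minimal contrast is a sketch, and the class and method names are hypothetical.

```java
import java.util.Random;
import java.util.function.IntConsumer;

// Hypothetical helper contrasting time-bounded and iteration-bounded random test loops.
final class BoundedRandomTest {

    /** How much work runs depends on machine speed, so a seed alone cannot replay it. */
    static int runForMillis(long seed, long millis, IntConsumer check) {
        Random random = new Random(seed);
        long deadline = System.nanoTime() + millis * 1_000_000L;
        int iterations = 0;
        while (System.nanoTime() < deadline) {
            check.accept(random.nextInt());
            iterations++;
        }
        return iterations;
    }

    /** The same seed and count always produce the same inputs, on any machine. */
    static int runIterations(long seed, int iterations, IntConsumer check) {
        Random random = new Random(seed);
        for (int i = 0; i < iterations; i++) {
            check.accept(random.nextInt());
        }
        return iterations;
    }
}
```

A failing run of the iteration-bounded variant can be replayed from its seed alone. A time-bounded test would also need to log how many iterations it completed.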
RE: Lucene Jenkins slave out of disk
Hi Robert, On Hudson there is still one very bad test; all others are cleaned up: [root@lucene /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/4]# ls -lh total 1286544 -rw--- 1 hudson hudson 602M Apr 23 22:00 core.java.9439 -rw--- 1 hudson hudson 656M Apr 23 22:00 core.java.9440 -rw-r--r-- 1 hudson hudson 10K Apr 23 22:00 hs_err_pid9439.log -rw-r--r-- 1 hudson hudson 11K Apr 23 22:00 hs_err_pid9440.log -rw-r--r-- 1 hudson hudson 0B Apr 23 22:00 quiet.ant This one should be cleaned up, or we should disable core dumps... How can we do this? This test needs no core dumps; maybe we can add a parameter to the ProcessBuilder? That’s of course your JDK-crasher: [...] Stack: [0x7e4e9000,0x7e5e9000], sp=0x7e5e8160, free space=3fc0001k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x71bc61] j sun.misc.Unsafe.putAddress(JJ)V+0 v ~StubRoutines::call_stub [...] Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Saturday, April 23, 2011 7:56 PM To: dev@lucene.apache.org; simon.willna...@gmail.com Subject: Re: Lucene Jenkins slave out of disk On Sat, Apr 23, 2011 at 4:17 AM, Simon Willnauer simon.willna...@googlemail.com wrote: Addition: I meant such code to be replaced: -indexDir = new File(workDir, testIndex); +indexDir = _TestUtil.getTempDir(testIndex); Thanks for merging Simon! also, for what its worth, we should at some point review these tests creating explicit directories. If a test wants to create an index, it can use newDirectory() or newFSDirectory(). The latter will only select filesystem-based implementations, never RAMDirectory, etc. When you use these methods, a unique temporary directory is automatically produced (and of course, deleted after the test if it passes). Most of these tests probably don't care what the directories actual name is...
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
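Robert's description of newDirectory()/newFSDirectory() (a unique temporary directory per test, deleted afterwards only if the test passes) can be sketched in plain Java. This is a minimal sketch assuming only java.nio.file; TempDirTracker and its methods are hypothetical names and are not the LuceneTestCase or _TestUtil API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, NOT Lucene's LuceneTestCase/_TestUtil API; it only mimics the
// behaviour described above: a unique directory per request, removed if the test passed.
class TempDirTracker {
    private final List<Path> created = new ArrayList<>();

    // Every call returns a fresh, uniquely named directory; the test never chooses the name.
    Path newTempDir(String prefix) throws IOException {
        Path dir = Files.createTempDirectory(prefix);
        created.add(dir);
        return dir;
    }

    // Run after the test: delete everything on success, leave it on disk for debugging on failure.
    void afterTest(boolean passed) throws IOException {
        if (passed) {
            for (Path dir : created) {
                deleteRecursively(dir);
            }
        }
        created.clear();
    }

    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse lexicographic order puts children before their parent directory.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        TempDirTracker tracker = new TempDirTracker();
        Path indexDir = tracker.newTempDir("testIndex");
        Files.write(indexDir.resolve("segments_1"), new byte[] {1, 2, 3}); // dummy index file
        System.out.println("created " + indexDir);
        tracker.afterTest(true);
        System.out.println("still exists? " + Files.exists(indexDir));
    }
}
```

Because the test never names its directory, two tests (or two JVMs) cannot collide on a path, and anything left behind after a pass is removed along with the directory.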
Re: Lucene Jenkins slave out of disk
On Sat, Apr 23, 2011 at 6:27 PM, Uwe Schindler u...@thetaphi.de wrote: Hi Robert, On Hudson there is still one very bad test; all others are cleaned up: [root@lucene /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/4]# ls -lh total 1286544 -rw------- 1 hudson hudson 602M Apr 23 22:00 core.java.9439 -rw------- 1 hudson hudson 656M Apr 23 22:00 core.java.9440 -rw-r--r-- 1 hudson hudson 10K Apr 23 22:00 hs_err_pid9439.log -rw-r--r-- 1 hudson hudson 11K Apr 23 22:00 hs_err_pid9440.log -rw-r--r-- 1 hudson hudson 0B Apr 23 22:00 quiet.ant This one should be cleaned up, or we should disable core dumps... How can we do this? This test needs no core dumps; maybe we can add a parameter to the ProcessBuilder? i think this can be configured with sysctl, hopefully it's per-jail and we can just set it... i'll take a look
Re: Lucene Jenkins slave out of disk
On Sat, Apr 23, 2011 at 6:27 PM, Uwe Schindler u...@thetaphi.de wrote: Hi Robert, On Hudson there is still one very bad test; all others are cleaned up: [root@lucene /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/4]# ls -lh total 1286544 -rw------- 1 hudson hudson 602M Apr 23 22:00 core.java.9439 -rw------- 1 hudson hudson 656M Apr 23 22:00 core.java.9440 -rw-r--r-- 1 hudson hudson 10K Apr 23 22:00 hs_err_pid9439.log -rw-r--r-- 1 hudson hudson 11K Apr 23 22:00 hs_err_pid9440.log -rw-r--r-- 1 hudson hudson 0B Apr 23 22:00 quiet.ant This one should be cleaned up, or we should disable core dumps... How can we do this? This test needs no core dumps; maybe we can add a parameter to the ProcessBuilder? we are not allowed to change the sysctl here... maybe i can change the test to use its tempdir as the CWD, so its core files will get deleted... I thought i did this already, but maybe i screwed it up
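Robert's idea of running the crashing test's child JVM with its temp dir as the working directory might look roughly like this with java.lang.ProcessBuilder. ProcessBuilder has no option to disable core dumps (that is an OS resource limit or sysctl setting), but directory(File) controls where the child process starts. This is a sketch under assumptions: ForkedJvmLauncher, childJvm, and the placeholder main class com.example.Crasher are hypothetical, and whether core files land in the working directory depends on the OS's core-file configuration.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; ForkedJvmLauncher and childJvm are illustrative names, not Lucene test code.
class ForkedJvmLauncher {
    // Builds (but does not start) a ProcessBuilder for a child JVM that runs with the
    // per-test temp dir as its working directory. By default HotSpot writes hs_err_pid*.log
    // into the working directory, and core files using a relative core-file pattern
    // typically go there too, so deleting the temp dir removes the crash leftovers.
    static ProcessBuilder childJvm(Path workDir, String mainClass, String... args) {
        String javaBin = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));
        cmd.add(mainClass);
        cmd.addAll(Arrays.asList(args));
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.directory(workDir.toFile()); // the child's CWD is the test's temp dir
        pb.redirectErrorStream(true);   // merge stderr into stdout for simpler logging
        return pb;
    }

    public static void main(String[] args) throws IOException {
        Path tempDir = Files.createTempDirectory("forkedTest");
        // "com.example.Crasher" is a placeholder main class; the process is not started here.
        ProcessBuilder pb = childJvm(tempDir, "com.example.Crasher");
        System.out.println("cwd=" + pb.directory() + " cmd=" + pb.command());
        Files.delete(tempDir);
    }
}
```

Calling pb.start() and then deleting the temp dir after the test would clean up whatever the crashed child wrote there, without needing any sysctl change.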
Re: Lucene Jenkins slave out of disk
On Sat, Apr 23, 2011 at 6:40 PM, Robert Muir rcm...@gmail.com wrote: I thought i did this already, but maybe i screwed it up Sorry, silly test bug... fixed in Revision: 1096249
[jira] [Commented] (LUCENE-2795) Genericize DirectIOLinuxDir - UnixDir
[ https://issues.apache.org/jira/browse/LUCENE-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13023614#comment-13023614 ] Michael McCandless commented on LUCENE-2795: It looks like recent Linux kernels have better behavior with the SEQUENTIAL flag: http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html?showComment=1303235497682#c2572106601600642254 If possible, we should test on kernels after that patch was merged, to see if passing SEQUENTIAL for merging prevents eviction of hot pages being used for searching... Genericize DirectIOLinuxDir - UnixDir -- Key: LUCENE-2795 URL: https://issues.apache.org/jira/browse/LUCENE-2795 Project: Lucene - Java Issue Type: Improvement Components: Store Reporter: Michael McCandless Assignee: Simon Willnauer Labels: gsoc2011, lucene-gsoc-11, mentor Today DirectIOLinuxDir is tricky/dangerous to use, because you only want to use it for IndexWriter and not IndexReader (searching). It's a trap. But, once we do LUCENE-2793, we can make it fully general purpose, because then a single native Dir impl can be used. I'd also like to make it generic to other Unices, if we can, so that it becomes UnixDirectory. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (LUCENE-2522) add simple japanese tokenizer, based on tinysegmenter
[ https://issues.apache.org/jira/browse/LUCENE-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2522: Attachment: LUCENE-2522.patch Attached is an updated patch; it's still a work in progress (it needs some more tests, benchmarking, and a few other little fixes). There's a general pattern for these segmenters (this one, smartchinese, sen) that's a little tricky: they want to really look at whole sentences to determine how to segment. So I added a base class for this, to make writing these segmenters easier and hopefully also to improve segmentation accuracy (I would like to switch smartchinese over to it). This class makes it easy to segment sentences with a sentence BreakIterator... In my opinion, it doesn't matter how theoretically good the word tokenization is for these things if the sentence tokenizer is really bad (I found this issue with both sen and smartchinese). Hope to get it committable soon. add simple japanese tokenizer, based on tinysegmenter - Key: LUCENE-2522 URL: https://issues.apache.org/jira/browse/LUCENE-2522 Project: Lucene - Java Issue Type: New Feature Components: contrib/analyzers Reporter: Robert Muir Priority: Minor Fix For: 4.0 Attachments: LUCENE-2522.patch, LUCENE-2522.patch, LUCENE-2522.patch TinySegmenter (http://www.chasen.org/~taku/software/TinySegmenter/) is a tiny Japanese segmenter. It was ported to Java/Lucene by Kohei TAKETA k-...@void.in, and is under friendly license terms (BSD; some files explicitly disclaim copyright to the source code, giving a blessing instead). Koji knows the author and has already asked about incorporating it into Lucene: {noformat} I've contacted Takeda-san, who is the creator of the Java version of TinySegmenter. He said he is happy if his program is part of Lucene. He is a co-author of my book about Solr published in Japan, BTW. ;-) {noformat}
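The sentence-first pattern Robert describes can be sketched with the JDK's java.text.BreakIterator.getSentenceInstance: split the text into sentences first, then hand each sentence to the word segmenter. This is a hypothetical sketch, not the base class in LUCENE-2522.patch; SentenceFirstSegmenter and segmentSentence are illustrative names, and the whitespace splitter in main only stands in for a real model such as TinySegmenter.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the "sentences first, then words" pattern; not the LUCENE-2522 class.
abstract class SentenceFirstSegmenter {
    private final Locale locale;

    protected SentenceFirstSegmenter(Locale locale) {
        this.locale = locale;
    }

    // Subclasses (e.g. a TinySegmenter-style model) only ever see one sentence at a time.
    protected abstract List<String> segmentSentence(String sentence);

    // Split text into sentences with the JDK's locale-sensitive sentence BreakIterator.
    List<String> sentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // The iterator keeps trailing whitespace with the sentence, so trim it off.
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                out.add(sentence);
            }
        }
        return out;
    }

    // Word-segment each sentence independently and concatenate the results.
    List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String sentence : sentences(text)) {
            tokens.addAll(segmentSentence(sentence));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Whitespace splitting is only a stand-in for a real word segmenter.
        SentenceFirstSegmenter seg = new SentenceFirstSegmenter(Locale.ENGLISH) {
            @Override
            protected List<String> segmentSentence(String sentence) {
                return Arrays.asList(sentence.split("\\s+"));
            }
        };
        System.out.println(seg.sentences("Hello world. How are you? Fine."));
        System.out.println(seg.tokenize("Hello world. How are you? Fine."));
    }
}
```

A real Lucene Tokenizer would also have to track the character offset of each token, which this sketch drops by trimming sentences; the point is only that a poor sentence split limits what any word segmenter can do.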
[jira] [Updated] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-2436: - Attachment: SOLR-2436.patch A new patch that includes updated CHANGES.txt and README.txt. I'll commit tonight. move uimaConfig to under the uima's update processor in solrconfig.xml -- Key: SOLR-2436 URL: https://issues.apache.org/jira/browse/SOLR-2436 Project: Solr Issue Type: Improvement Affects Versions: 3.1 Reporter: Koji Sekiguchi Priority: Minor Attachments: SOLR-2436-3.patch, SOLR-2436.patch, SOLR-2436.patch, SOLR-2436.patch, SOLR-2436.patch, SOLR-2436_2.patch Solr contrib UIMA has its config just beneath <config>. I think it should move under the UIMA update processor's tag.
[HUDSON] Lucene-trunk - Build # 1539 - Failure
Build: https://builds.apache.org/hudson/job/Lucene-trunk/1539/ 3 tests failed. FAILED: org.apache.lucene.util.packed.TestPackedInts.testIsOptimized Error Message: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit. Stack Trace: junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit. REGRESSION: org.apache.lucene.index.TestNRTThreads.testNRTThreads Error Message: null Stack Trace: junit.framework.AssertionFailedError: at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1247) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175) at org.apache.lucene.index.TestNRTThreads.testNRTThreads(TestNRTThreads.java:367) REGRESSION: org.apache.lucene.index.TestParallelReader.testIsOptimized Error Message: directory '/usr/home/hudson/hudson-slave/workspace/Lucene-trunk/checkout/lucene/build/test/1/test4463631349252227317tmp' exists and is a directory, but cannot be listed: list() returned null Stack Trace: java.io.IOException: directory '/usr/home/hudson/hudson-slave/workspace/Lucene-trunk/checkout/lucene/build/test/1/test4463631349252227317tmp' exists and is a directory, but cannot be listed: list() returned null at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:239) at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:250) at org.apache.lucene.store.MockDirectoryWrapper.listAll(MockDirectoryWrapper.java:519) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:568) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:84) at org.apache.lucene.index.IndexReader.open(IndexReader.java:500) at org.apache.lucene.index.IndexReader.open(IndexReader.java:293) at org.apache.lucene.index.TestParallelReader.testIsOptimized(TestParallelReader.java:198) at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1247) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175) Build Log (for compile errors): [...truncated 11810 lines...]
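The TestParallelReader failure comes from a check along these lines: java.io.File.list() reports an I/O error by returning null rather than throwing, so a directory that exists and is a directory can still be unlistable. This is a hypothetical sketch (DirListing is not Lucene's FSDirectory, and the wording of the first two messages is invented); only the null-check message mirrors the one in the stack trace.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical sketch, not org.apache.lucene.store.FSDirectory's source.
class DirListing {
    static String[] listAll(File dir) throws IOException {
        if (!dir.exists()) {
            throw new IOException("directory '" + dir + "' does not exist"); // invented wording
        }
        if (!dir.isDirectory()) {
            throw new IOException("file '" + dir + "' exists but is not a directory"); // invented wording
        }
        // File.list() signals an I/O error by returning null instead of throwing,
        // so callers must check for null explicitly.
        String[] names = dir.list();
        if (names == null) {
            throw new IOException("directory '" + dir
                + "' exists and is a directory, but cannot be listed: list() returned null");
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("listAll").toFile();
        File f = new File(dir, "segments_1"); // dummy file so the listing is non-empty
        f.createNewFile();
        System.out.println(String.join(",", listAll(dir)));
        f.delete();
        dir.delete();
    }
}
```

Turning the null into an IOException with the path in the message is what makes a failure like the one in this build report diagnosable at all, instead of surfacing later as a NullPointerException.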