Re: lucene4.0 release
On Jul 6, 2012, at 0:27, Roman Chyla roman.ch...@gmail.com wrote:

Lucene 4.0 is in alpha release and we would like to start working with pylucene4.0 already. I checked out the pylucene trunk and made the necessary changes so that it compiles. Would it be possible to incorporate (some of) these changes?

Absolutely, please send a patch to the list or file a bug and attach it there.

The issue with a PyLucene 4.0 release is not so much getting it to compile and run but rewriting all the tests and samples (originally ported from Java), since the Lucene API changed in many ways. That's a large amount of work, and some of the new analyzer/tokenizer framework stuff needs some new JCC support for generating classes on the fly. I've got that written to some extent already, but porting the samples and tests again is daunting.

Andi..

Thanks, Roman
Re: lucene4.0 release
The patch probably didn't make it to the list; I'll file a ticket later. It is definitely a lot of work with the Python code. I have gone through 1.5 test cases now, and it is just 'unpleasant', with so many API changes out there, but I'll try to convert more.

roman
[jira] [Comment Edited] (SOLR-3377) eDismax: A fielded query wrapped by parens is not recognized
[ https://issues.apache.org/jira/browse/SOLR-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406866#comment-13406866 ]

Bernd Fehling edited comment on SOLR-3377 at 7/5/12 6:10 AM:

I was willing to supply a final fix to this and was hoping that it would make it into release 4.x. But unfortunately:
- I got no enhanced unit test
- no one committed this/my patch either
- the problem is still there
So I said "was willing", that's true; I gave up on this and am now thinking about switching to ElasticSearch because they really appreciate any help.

eDismax: A fielded query wrapped by parens is not recognized

Key: SOLR-3377
URL: https://issues.apache.org/jira/browse/SOLR-3377
Project: Solr
Issue Type: Bug
Components: query parsers
Affects Versions: 3.6
Reporter: Jan Høydahl
Fix For: 4.0, 3.6.1
Attachments: SOLR-3377.patch, SOLR-3377.patch, SOLR-3377.patch, SOLR-3377.patch

As reported by bernd on the user list, a query like this {{q=(name:test)}} will yield 0 hits in 3.6 while it worked in 3.5. It works without the parens.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-trunk - Build # 1981 - Still Failing
Thanks Uwe. Maybe I'm blind but I can't really see any inner classes in this test (or the classes it extends). The file names seem to contain the parameters used to run the test method; I'm not sure where these values are taken from, so I don't know how to compress them.

On Thu, Jul 5, 2012 at 5:55 PM, Uwe Schindler u...@thetaphi.de wrote:

Hi Chris,

See my mail from yesterday: Clover does not have a problem with the code; it is more that the code nesting in this test is so deep that the generated file name is too long for the underlying OS.

Maybe make the test simpler with less code nesting (inner-inner-inner classes, overly long variable names). The file name seems to be generated from that.

- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de

From: Chris Male [mailto:gento...@gmail.com]
Sent: Thursday, July 05, 2012 7:41 AM
To: dev@lucene.apache.org
Subject: Re: [JENKINS] Lucene-trunk - Build # 1981 - Still Failing

I don't really get what is going on here, apart from that Clover is failing due to this test. The file it is looking for seems crazy, is that expected?

Having looked at the test, I haven't come across the @ParametersFactory annotation before but maybe that somehow doesn't work with Clover?

On Thu, Jul 5, 2012 at 5:34 PM, Apache Jenkins Server jenk...@builds.apache.org wrote:

Build: https://builds.apache.org/job/Lucene-trunk/1981/
All tests passed
Build Log: [...truncated 37937 lines...]

-- Chris Male | Software Developer | DutchWorks | www.dutchworks.nl
RE: [JENKINS] Lucene-trunk - Build # 1981 - Still Failing
Let's trick Clover by adding magic comments:

///CLOVER:OFF
. code .
///CLOVER:ON

See https://confluence.atlassian.com/display/CLOVER026/Using+Source+Directives

Uwe

- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de
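Uwe's suggestion of switching Clover off around the problematic code could look roughly like the sketch below. The class and method names are made up for illustration and are not taken from the failing test; only the ///CLOVER:OFF and ///CLOVER:ON directives come from the message above.

```java
// Hypothetical class, for illustration only: it is not the test that breaks the build.
final class CloverDirectiveSketch {

    ///CLOVER:OFF
    // Code between the two directives is meant to be skipped by Clover's instrumentation.
    static int notInstrumented(int x) {
        return x * 2;
    }
    ///CLOVER:ON

    // Code after CLOVER:ON is instrumented as usual.
    static int instrumented(int x) {
        return x + 1;
    }

    public static void main(String[] args) {
        System.out.println(notInstrumented(2));
        System.out.println(instrumented(2));
    }
}
```

The directives are ordinary line comments to the Java compiler, so the class compiles and behaves the same with or without Clover.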
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406874#comment-13406874 ]

Shai Erera commented on LUCENE-4190:

What if we had an object called IndexFileNames with a method accept(String name) that returns true if the file is recognized, false otherwise? That could give applications a way to create a recognized set of index files:
* Lucene would provide a DefaultIndexFileNames which recognizes all non-codec files
* Either the app would provide an extension to the default (or a wrapper) which recognizes its codec files as well
** Or, we make the Codec responsible for recognizing files too, and then the code would just query the Codec for non-default index files.

Either way, it seems like we can very easily recognize what are index files and what aren't.

When files need to be deleted, it seems simple as well:
* Lucene lists all files in the directory
* Any file that is referenced by the index (I assume we still know which files are needed, right?) is kept
* Any other file is queried against IndexFileNames.accept; if it is accepted, it's deleted, otherwise it's left alone.

Since this looks too simple to me, I'm assuming that I'm missing something. If so, can someone please clarify the problem to me?

IndexWriter deletes non-Lucene files

Key: LUCENE-4190
URL: https://issues.apache.org/jira/browse/LUCENE-4190
Project: Lucene - Java
Issue Type: Bug
Reporter: Michael McCandless
Assignee: Robert Muir
Fix For: 4.0, 5.0
Attachments: LUCENE-4190.patch, LUCENE-4190.patch

Carl Austin raised a good issue in a comment on my Lucene 4.0.0 alpha blog post: http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html

IndexWriter will now (as of 4.0) delete all foreign files from the index directory. We made this change because Codecs are free to write to any files now, so the space of filenames is hard to bound. But if the user accidentally uses the wrong directory (eg c:/) then we will in fact delete important stuff.

I think we can at least use some simple criteria (must start with _, maybe must fit certain pattern eg _base36(_X).Y), so we are much less likely to delete a non-Lucene file
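The accept(String name) idea, combined with the "must start with _" criterion from the issue description, could be sketched as below. This is only an illustration under assumptions: the class name IndexFileNameFilter and the exact regular expressions are invented here, and nothing in the thread says Lucene implements the check this way.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a Lucene class: guesses whether a file name looks like an index file.
final class IndexFileNameFilter {

    // Assumed reading of "_base36(_X).Y": underscore, base-36 segment name,
    // optional "_suffix" parts, then a dot and an extension.
    private static final Pattern SEGMENT_FILE =
            Pattern.compile("_[0-9a-z]+(_[0-9A-Za-z]+)*\\.[0-9A-Za-z]+");

    // Commit point files such as segments_2 and segments.gen.
    private static final Pattern SEGMENTS_FILE =
            Pattern.compile("segments(_[0-9a-z]+|\\.gen)");

    /** Returns true if the name matches one of the assumed index file patterns. */
    static boolean accept(String name) {
        return SEGMENT_FILE.matcher(name).matches()
                || SEGMENTS_FILE.matcher(name).matches();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"_0.cfs", "segments_2", "boot.ini", "notes.txt"}) {
            System.out.println(name + " -> " + (accept(name) ? "candidate for deletion" : "left alone"));
        }
    }
}
```

A pattern check like this only lowers the risk: a foreign file that happens to be named like a segment file would still be accepted.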
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406877#comment-13406877 ]

Gilad Barkai commented on LUCENE-4190:

Perhaps out of context, but here goes.. Users sometimes do stupid things, me included, such as putting the index in a non-dedicated directory. But should they pay the penalty just because the code should not get overly complicated?

Codecs create their own files, and no one seems able to control what files they create (other than in assert?). Then, is it possible for the codec to handle the removal of the files it created? That would make codecs work the same way the index handles the 'core' index files: each codec would be able to erase its own.

Another closely related option: let IW consult with the codecs about 'non-core files' and see which ones should/could be removed.

I only suggest this because I fear for users' files which might get erased.

Disclosure: It'll be ages before I understand Lucene 4 half as much as I do Lucene 3.6 (not that that's much), so forgive me if I stepped on anyone's toes, or just described how to implement a time machine :)
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406879#comment-13406879 ]

selckin commented on LUCENE-4190:

what if you accidentally call deleteAll() on your production index, maybe old commit points should not be deleted until after a period of 30 days
[jira] [Comment Edited] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406879#comment-13406879 ]

selckin edited comment on LUCENE-4190 at 7/5/12 6:54 AM:

edit: remove unhelpful sarcasm, sorry

was (Author: selckin): what if you accidentally call deleteAll() on your production index, maybe old commit points should not be deleted until after a period of 30 days
[jira] [Commented] (LUCENE-4094) Randomize file.encoding
[ https://issues.apache.org/jira/browse/LUCENE-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406902#comment-13406902 ]

Dawid Weiss commented on LUCENE-4094:

Follow-up discussion wrt overriding file.encoding: http://markmail.org/message/q4eeac7q6fjalbtd

Randomize file.encoding

Key: LUCENE-4094
URL: https://issues.apache.org/jira/browse/LUCENE-4094
Project: Lucene - Java
Issue Type: Sub-task
Components: general/test
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Trivial

Stated in the code:
{code}
// TODO we can't randomize this yet (it drives ant crazy) but this makes tests reproduce
// in case machines have different default charsets...
sb.append(" -Dargs=\"-Dfile.encoding=" + System.getProperty("file.encoding") + "\"");
{code}
But this should work without any problems with junit4 because communication streams are separate and we're decoding output properly (or so I hope). Try and see what happens :)
[jira] [Commented] (SOLR-3377) eDismax: A fielded query wrapped by parens is not recognized
[ https://issues.apache.org/jira/browse/SOLR-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406923#comment-13406923 ]

Jan Høydahl commented on SOLR-3377:

Bernd, I agree that this is a bug that absolutely should be fixed. I have followed it through this far but have not yet had the chance to go the last mile until committing, but I am definitely keen to pick it up again after summer holidays and parental leave, hopefully before. The reason I unassigned myself is to signal to the other committers that I'm not actively working on this and let others step in if they wish. This is the way Apache works - we are all volunteers, and I am sure that with some patience this will make it through in time for 4.0 final.

You've done a great job so far with the patch. It may be final and good to go, but personally I'd write some more tests since this particular area has been lacking - before committing.
[jira] [Updated] (SOLR-3377) eDismax: A fielded query wrapped by parens is not recognized
[ https://issues.apache.org/jira/browse/SOLR-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-3377:

Priority: Critical (was: Major)

Upgrading priority to signal the severity - i.e. a valid user query may return 0 hits, which may be pretty critical for some.
[jira] [Commented] (LUCENE-4094) Randomize file.encoding
[ https://issues.apache.org/jira/browse/LUCENE-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406946#comment-13406946 ]

Robert Muir commented on LUCENE-4094:

I totally disagree with everything the JDK developers are saying. They tend to just whine when we find bugs in their shit. We should continue to do this: it's important to seek out these default charset bugs (this is because of their stupid design).
[jira] [Resolved] (LUCENE-4191) Lucene doc pages redirect to api-4_0_0-ALPHA which results in 404
[ https://issues.apache.org/jira/browse/LUCENE-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-4191.

Resolution: Won't Fix

Don't use these /api links

Lucene doc pages redirect to api-4_0_0-ALPHA which results in 404

Key: LUCENE-4191
URL: https://issues.apache.org/jira/browse/LUCENE-4191
Project: Lucene - Java
Issue Type: Bug
Affects Versions: 3.6
Reporter: Chaim Peck
Labels: documentation

Try to go to this URL: http://lucene.apache.org/solr/api/org/apache/solr/analysis/BaseTokenFilterFactory.html

The result is that you will be redirected here, which is a 404: http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/analysis/BaseTokenFilterFactory.html

You can still get to the page from google cache: http://webcache.googleusercontent.com/search?q=cache:mCJCac4iZ0QJ:lucene.apache.org/solr/api/org/apache/solr/analysis/BaseTokenFilterFactory.html+&cd=1&hl=en&ct=clnk&gl=us
[JENKINS] Lucene-4.x - Build # 27 - Still Failing
Build: https://builds.apache.org/job/Lucene-4.x/27/
All tests passed
Build Log: [...truncated 38243 lines...]
[jira] [Commented] (LUCENE-4094) Randomize file.encoding
[ https://issues.apache.org/jira/browse/LUCENE-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406949#comment-13406949 ]

Dawid Weiss commented on LUCENE-4094:

I understand their argument (combination not encountered in practice) but I disagree with the claim that it should justify crappy code. The default charset should be independent of the OS-filesystem interaction. It should just work with UTF-16.

Anyway, when I run our stuff with enforced UTF-16, lots of weird things start to happen: new FileReader(file), benchmarks that run forever (will provide a seed) and such. I'll commit it in one by one and then we can start testing/fixing locally.
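The new FileReader(file) case mentioned above is the usual default-charset trap: the single-argument constructor decodes with whatever file.encoding the JVM happens to run under. A minimal sketch of the explicit-charset alternative follows; the class and method names are hypothetical and are not taken from the Lucene sources, and the choice of UTF-8 is an assumption for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example class: reads text with an explicit charset instead of the platform default.
final class ExplicitCharsetRead {

    /** Reads the first line of a file as UTF-8, regardless of the JVM's file.encoding. */
    static String readFirstLine(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return reader.readLine();
        }
    }

    /** Writes the text to a temporary file as UTF-8 and reads it back through readFirstLine. */
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("charset-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return readFirstLine(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Non-ASCII text written with unicode escapes so the source file itself stays ASCII.
        String text = "\u017elu\u0165ou\u010dk\u00fd";
        System.out.println(text.equals(roundTrip(text)));
    }
}
```

Because both the write and the read name their charset, the round trip should not change when the tests are run with a randomized -Dfile.encoding.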
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406951#comment-13406951 ]

Robert Muir commented on LUCENE-4190:

Again: I am totally against complicated file handling here for this reason. People can handle this some other way in their apps. We *HAVE* to keep this kind of code simple and maintainable in lucene. It was a mistake to start a slippery slope by being friendly at all to this. (I reverted)
[jira] [Created] (LUCENE-4193) Update Lucene FAQ regarding index-time field boosting
Elmer van Chastelet created LUCENE-4193:

Summary: Update Lucene FAQ regarding index-time field boosting
Key: LUCENE-4193
URL: https://issues.apache.org/jira/browse/LUCENE-4193
Project: Lucene - Java
Issue Type: Improvement
Components: general/website
Reporter: Elmer van Chastelet
Priority: Minor

Current FAQ says the following regarding index-time field boosts:
{quote}Index time field boosts are worthless if you set them on every document.{quote}
see [the FAQ|http://wiki.apache.org/lucene-java/LuceneFAQ#What_is_the_difference_between_field_.28or_document.29_boosting_and_query_boosting.3F].

I think this should be changed to
{quote}Index time field boosts are worthless if you set them on every document _and solely search on this field at query time_.{quote}
because, when searching on _multiple_ fields, a match in a properly index-time boosted field will score higher than a match in a non-boosted field. See [this discussion|https://forum.hibernate.org/viewtopic.php?f=9&t=1016615] on Hibernate Search forums.

Not sure if there are more places where similar statements are made.
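The reasoning behind the proposed FAQ wording can be checked with a toy calculation. The sketch below assumes a deliberately simplified scoring model (score = sum over fields of raw match score times that field's index-time boost); it ignores norms encoding, coord and query normalization, and the class name and numbers are invented for the example.

```java
// Hypothetical toy model, not Lucene's Similarity: a score is a boost-weighted sum of per-field scores.
final class BoostRankingSketch {

    static double score(double titleRaw, double bodyRaw, double titleBoost, double bodyBoost) {
        return titleRaw * titleBoost + bodyRaw * bodyBoost;
    }

    public static void main(String[] args) {
        // Searching only the title field: the same boost on every document scales both scores,
        // so the relative order of A and B does not change.
        boolean sameOrder = (score(0.4, 0.0, 1.0, 1.0) > score(0.3, 0.0, 1.0, 1.0))
                == (score(0.4, 0.0, 2.0, 1.0) > score(0.3, 0.0, 2.0, 1.0));
        System.out.println("single-field order unchanged: " + sameOrder);

        // Searching title and body: a title hit (raw 0.3) loses to a body hit (raw 0.4)
        // without boosts, but wins once every title carries an index-time boost of 2.0.
        System.out.println("unboosted, title hit wins: " + (score(0.3, 0.0, 1.0, 1.0) > score(0.0, 0.4, 1.0, 1.0)));
        System.out.println("boosted, title hit wins: " + (score(0.3, 0.0, 2.0, 1.0) > score(0.0, 0.4, 2.0, 1.0)));
    }
}
```

Under these assumptions a uniform boost only matters when more than one field takes part in the query, which is the distinction the proposed wording adds.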
[JENKINS] Solr-4.x - Build # 28 - Still Failing
Build: https://builds.apache.org/job/Solr-4.x/28/
No tests ran.
Build Log: [...truncated 8055 lines...]
[jira] [Resolved] (LUCENE-4110) Report long periods of forked jvm inactivity (hung tests/ suites).
[ https://issues.apache.org/jira/browse/LUCENE-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dawid Weiss resolved LUCENE-4110.
Resolution: Fixed
Report long periods of forked jvm inactivity (hung tests/ suites).
Key: LUCENE-4110
URL: https://issues.apache.org/jira/browse/LUCENE-4110
Project: Lucene - Java
Issue Type: Sub-task
Components: general/test
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Fix For: 5.0
https://github.com/carrotsearch/randomizedtesting/issues/106
I'll see what can be done about it (had some thoughts on the way back to the hotel and I think it's doable).
[jira] [Resolved] (LUCENE-4189) Test output should include timestamps (start/end for each test/ suite).
[ https://issues.apache.org/jira/browse/LUCENE-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dawid Weiss resolved LUCENE-4189.
Resolution: Fixed
Test output should include timestamps (start/end for each test/ suite).
Key: LUCENE-4189
URL: https://issues.apache.org/jira/browse/LUCENE-4189
Project: Lucene - Java
Issue Type: Improvement
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Trivial
Fix For: 4.0, 5.0
This adds more verboseness to the output -- should this be optional (overrideable using local properties but defaulting to 'off')?
{code}
[junit4] [11:54:50.259] Suite: org.apache.lucene.index.TestDeletionPolicy
[junit4] [11:54:53.706] Completed in 3.45s, 6 tests
[junit4]
[junit4] [11:54:53.709] Suite: org.apache.lucene.util.TestVirtualMethod
[junit4] [11:54:53.725] Completed in 0.02s, 2 tests
[junit4]
[junit4] [11:54:53.728] Suite: org.apache.lucene.index.TestRollingUpdates
[junit4] [11:54:55.700] Completed in 1.97s, 2 tests
[junit4]
[junit4] [11:54:55.721] Suite: org.apache.lucene.index.TestIndexWriterExceptions
[junit4] [11:55:02.394] Completed in 6.67s, 24 tests
[junit4]
[junit4] [11:55:02.398] Suite: org.apache.lucene.index.TestNoDeletionPolicy
[junit4] [11:55:02.548] Completed in 0.15s, 4 tests
...
{code}
[jira] [Updated] (LUCENE-4110) Report long periods of forked jvm inactivity (hung tests/ suites).
[ https://issues.apache.org/jira/browse/LUCENE-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dawid Weiss updated LUCENE-4110:
Fix Version/s: 4.0
Report long periods of forked jvm inactivity (hung tests/ suites).
Key: LUCENE-4110
URL: https://issues.apache.org/jira/browse/LUCENE-4110
Project: Lucene - Java
Issue Type: Sub-task
Components: general/test
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Fix For: 4.0, 5.0
https://github.com/carrotsearch/randomizedtesting/issues/106
I'll see what can be done about it (had some thoughts on the way back to the hotel and I think it's doable).
Clean your workspace (jars update)
I've updated randomizedtesting to 1.6.0. Run "ant clean resolve", please. Also, a few requested things have made it into this commit:
* Timestamps on suites/tests (disabled by default, enable in your local props or with -Dtests.timestamps=on). https://issues.apache.org/jira/browse/LUCENE-4189
* Long-running/hung tests will report back to the console now (every 60 seconds). https://issues.apache.org/jira/browse/LUCENE-4110
* [IMPORTANT] The forked JVM's file.encoding property will be randomized between the following: US-ASCII, ISO-8859-1, UTF-8, (your platform's default).
The last one is an important change and it may (will) break tests. Please help out in fixing default encoding-sensitive things both in tests and in code. If you have a Windows machine (or Java 1.6 JVM) you can go with:
ant -Dtests.file.encoding=UTF-16
This will most likely break anything that expects the lower ASCII range (which is unfortunately the same in all the above randomized encodings).
Any problems, requests, ideas, feedback -- speak up.
Dawid
[jira] [Updated] (LUCENE-4094) Randomize file.encoding
[ https://issues.apache.org/jira/browse/LUCENE-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dawid Weiss updated LUCENE-4094:
Fix Version/s: 5.0, 4.0
Randomize file.encoding
Key: LUCENE-4094
URL: https://issues.apache.org/jira/browse/LUCENE-4094
Project: Lucene - Java
Issue Type: Sub-task
Components: general/test
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Trivial
Fix For: 4.0, 5.0
Stated in the code:
{code}
// TODO we can't randomize this yet (it drives ant crazy) but this makes tests reproduce
// in case machines have different default charsets...
sb.append(" -Dargs=\"-Dfile.encoding=" + System.getProperty("file.encoding") + "\"");
{code}
But this should work without any problems with junit4 because communication streams are separate and we're decoding output properly (or so I hope). Try and see what happens :)
[jira] [Resolved] (LUCENE-4094) Randomize file.encoding
[ https://issues.apache.org/jira/browse/LUCENE-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dawid Weiss resolved LUCENE-4094.
Resolution: Fixed
Randomize file.encoding
Key: LUCENE-4094
URL: https://issues.apache.org/jira/browse/LUCENE-4094
Project: Lucene - Java
Issue Type: Sub-task
Components: general/test
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Trivial
Fix For: 4.0, 5.0
Stated in the code:
{code}
// TODO we can't randomize this yet (it drives ant crazy) but this makes tests reproduce
// in case machines have different default charsets...
sb.append(" -Dargs=\"-Dfile.encoding=" + System.getProperty("file.encoding") + "\"");
{code}
But this should work without any problems with junit4 because communication streams are separate and we're decoding output properly (or so I hope). Try and see what happens :)
Re: [JENKINS] Solr-4.x - Build # 28 - Still Failing
This has to do with ivy -- I admit I don't know what's happening. The configuration gets read once but it isn't persisted for other ivy tasks (even though the property clearly is). So you end up with:
ivy-configure:
resolve:
[ivy:retrieve] :: loading settings :: url = jar:file:/home/hudson/.ant/lib/ivy-2.2.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
[ivy:retrieve]
[ivy:retrieve] :: problems summary ::
[ivy:retrieve] WARNINGS
[ivy:retrieve] module not found: com.carrotsearch.randomizedtesting#junit4-ant;1.6.0
[ivy:retrieve] local: tried
[ivy:retrieve] /home/hudson/.ivy2/local/com.carrotsearch.randomizedtesting/junit4-ant/1.6.0/ivys/ivy.xml
[ivy:retrieve] -- artifact com.carrotsearch.randomizedtesting#junit4-ant;1.6.0!junit4-ant.jar:
[ivy:retrieve] /home/hudson/.ivy2/local/com.carrotsearch.randomizedtesting/junit4-ant/1.6.0/jars/junit4-ant.jar
[ivy:retrieve] shared: tried
[ivy:retrieve] /home/hudson/.ivy2/shared/com.carrotsearch.randomizedtesting/junit4-ant/1.6.0/ivys/ivy.xml
[ivy:retrieve] -- artifact com.carrotsearch.randomizedtesting#junit4-ant;1.6.0!junit4-ant.jar:
[ivy:retrieve] /home/hudson/.ivy2/shared/com.carrotsearch.randomizedtesting/junit4-ant/1.6.0/jars/junit4-ant.jar
[ivy:retrieve] public: tried
[ivy:retrieve] http://repo1.maven.org/maven2/com/carrotsearch/randomizedtesting/junit4-ant/1.6.0/junit4-ant-1.6.0.pom
[ivy:retrieve] -- artifact com.carrotsearch.randomizedtesting#junit4-ant;1.6.0!junit4-ant.jar:
[ivy:retrieve] http://repo1.maven.org/maven2/com/carrotsearch/randomizedtesting/junit4-ant/1.6.0/junit4-ant-1.6.0.jar
[ivy:retrieve] module not found: com.carrotsearch.randomizedtesting#randomizedtesting-runner;1.6.0
[ivy:retrieve] local: tried
Note that sonatype's release repository is NOT tried, it just checks the default chain. I'll provide a workaround fix in a second but I don't know how to fix it in a proper way.
Dawid
On Thu, Jul 5, 2012 at 12:47 PM, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Solr-4.x/28/ No tests ran. Build Log: [...truncated 8055 lines...]
[jira] [Commented] (LUCENE-4100) Maxscore - Efficient Scoring
[ https://issues.apache.org/jira/browse/LUCENE-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406992#comment-13406992 ]
Robert Muir commented on LUCENE-4100:
Hello, thank you for working on this! I have just taken a rough glance at the code, and think we should probably look at what API changes would make this sort of thing fit better into Lucene and make it easier to implement. Random thoughts:
Specifically: what you are doing in the PostingsWriter is similar to computing impacts (I don't have a copy of the paper so admittedly don't know the exact algorithm you are using). But it seems to me that you are putting a maxScore in the term dictionary metadata for all of the term's postings (as a float). With the tool you provide, this works because you have access to e.g. the segment's length normalization information etc (your postingswriter takes a reader). But we would have to think about how to give postingswriters access to this on flush... it seems possible to me though.
Giving the postingswriter full statistics (e.g. docfreq) for Similarity computation seems difficult: while I think we could accum this stuff in FreqProxTermsWriter before we flush to the codec, it wouldn't solve the problem at merge time, so you would have to do a 2-pass merge in the codec somehow...
But the alternative of splitting the impact (tf/norm) from the document-independent weight (e.g. IDF) isn't that pretty either, because it limits the scoring systems (Similarity implementations) that could use the optimization.
* As many terms will be low frequency (e.g. docfreq=1), I think it's not worth it to encode the maxscore for these low freq terms: we could save space by omitting maxscore for low freq terms and just treat it as infinitely large?
* The opposite problem: is it really optimal to encode maxscore for the entire term? Or would it be better for high-freq terms to encode maxScore for a range of postings (e.g. block)? This way, you could skip over ranges of postings that cannot compete (rather than limiting the optimization to an entire term). A codec could put this information into a block header, or at certain intervals, into the skip data, etc.
* Do we really need a full 4-byte float? How well would the algorithm work with degraded precision: e.g. something like SmallFloat. (I think this SmallFloat currently computes a lower bound, we would have to bump to the next byte to make an upper bound).
* Another idea: it might be nice if this optimization could sit underneath the codec, such that you don't need a special Scorer. One idea here would be for your collector to set an attribute on the DocsEnum (maxScore): of course a normal codec would totally ignore this and proceed as today. But codecs like this one could return NO_MORE_DOCS when postings for that term can no longer compete. I'm just not positive if this algorithm can be refactored in this way, and this would also require some clean way of getting these attributes from Collector -> Scorer -> DocsEnum. Currently Scorer is in the way here :)
Just some random thoughts, I'll try to get a copy of this paper so I have a better idea what's going on with this particular optimization...
Maxscore - Efficient Scoring
Key: LUCENE-4100
URL: https://issues.apache.org/jira/browse/LUCENE-4100
Project: Lucene - Java
Issue Type: Improvement
Components: core/codecs, core/query/scoring, core/search
Affects Versions: 4.0
Reporter: Stefan Pohl
Labels: api-change, patch, performance
Fix For: 4.0
Attachments: contrib_maxscore.tgz, maxscore.patch
At Berlin Buzzwords 2012, I will be presenting 'maxscore', an efficient algorithm first published in the IR domain in 1995 by H. Turtle & J. Flood, that I find deserves more attention among Lucene users (and developers). I implemented a proof of concept and did some performance measurements with example queries and lucenebench, the package of Mike McCandless, resulting in very significant speedups. This ticket is to get the discussion started on including the implementation in Lucene's codebase. Because the technique requires awareness about it from the Lucene user/developer, it seems best to become a contrib/module package so that it consciously can be chosen to be used.
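The suggestion in the comment above, storing a maxScore per range (block) of postings so that ranges which cannot compete are skipped, can be sketched as follows. This is only an illustration under stated assumptions: BlockMaxSketch, Block and collectCompetitive are hypothetical names, scores are precomputed floats, and nothing here is Lucene's codec API or the patch attached to the issue.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: postings split into blocks, each carrying an upper bound on its scores. */
final class BlockMaxSketch {

    /** One block of postings: doc ids, their scores, and the precomputed maximum score. */
    static final class Block {
        final int[] docIds;
        final float[] scores;
        final float maxScore;

        Block(int[] docIds, float[] scores) {
            this.docIds = docIds;
            this.scores = scores;
            float max = Float.NEGATIVE_INFINITY;
            for (float s : scores) {
                max = Math.max(max, s);
            }
            this.maxScore = max;
        }
    }

    /**
     * Returns doc ids scoring strictly above the threshold. Blocks whose maxScore
     * cannot beat the threshold are skipped without looking at their postings;
     * blocksVisited[0] counts the blocks that were actually scanned.
     */
    static List<Integer> collectCompetitive(List<Block> blocks, float threshold, int[] blocksVisited) {
        List<Integer> hits = new ArrayList<>();
        for (Block block : blocks) {
            if (block.maxScore <= threshold) {
                continue; // no posting in this block can compete
            }
            blocksVisited[0]++;
            for (int i = 0; i < block.docIds.length; i++) {
                if (block.scores[i] > threshold) {
                    hits.add(block.docIds[i]);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Block> blocks = new ArrayList<>();
        blocks.add(new Block(new int[] {1, 4}, new float[] {0.2f, 0.3f}));
        blocks.add(new Block(new int[] {7, 9}, new float[] {0.9f, 0.1f}));
        blocks.add(new Block(new int[] {12, 15}, new float[] {0.4f, 0.5f}));
        int[] visited = new int[1];
        System.out.println(collectCompetitive(blocks, 0.45f, visited) + " visited=" + visited[0]);
    }
}
```

With a threshold of 0.45 the first block (maximum 0.3) is never scanned. In a real top-k collector the threshold would rise as better hits are found, so later blocks become progressively easier to skip; that dynamic part is left out of this sketch.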
Re: [JENKINS] Solr-4.x - Build # 28 - Still Failing
Ok, I know why this is happening. We have antcall and ant calls across build files. Sometimes these calls pass only selected properties (propertysets) but do not pass references. As in here:
<ant dir="${common.dir}" target="default" inheritall="false">
  <propertyset refid="uptodate.and.compiled.properties"/>
</ant>
ivy:configure stores the default configuration as a reference, so the property will be passed down but the reference will not. I don't know how to fix it cleanly so I'll just leave my workaround patch in (which re-reads the configuration every time, unfortunately).
Dawid
On Thu, Jul 5, 2012 at 12:52 PM, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote: This has to do with ivy -- I admit I don't know what's happening. [...]
Re: [JENKINS] Solr-4.x - Build # 28 - Still Failing
Is your workaround particularly slow?
On Thu, Jul 5, 2012 at 11:18 PM, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote: Ok, I know why this is happening. [...]
-- Chris Male
Re: [JENKINS] Solr-4.x - Build # 28 - Still Failing
I don't think so -- it just reloads the config file over and over but it's probably in the os cache anyway. That property passing is broken as I explained. In general antcalls lead to a big mess with references/ properties, I hate it (but don't have an idea how to improve it).
Dawid
On Thu, Jul 5, 2012 at 1:21 PM, Chris Male gento...@gmail.com wrote: Is your workaround particularly slow?
[jira] [Resolved] (LUCENE-4145) Unhandled exception from test framework (in json parsing of test output files?)
[ https://issues.apache.org/jira/browse/LUCENE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dawid Weiss resolved LUCENE-4145.
Resolution: Fixed
Fix Version/s: 5.0, 4.0
Should be better now as events are not buffered on the client. I still wouldn't give my head for the -Dtests.iters=gazillion scenario because they're still buffered on the master (for reports, etc.) As always, it's a tradeoff -- spilling those events to disk is possible but would increase the complexity a lot. Maybe an embedded simple db like hsqldb or something would help here, I don't know. Anyway, it doesn't make sense in 99% of situations (so large iteration count/ tests number).
Unhandled exception from test framework (in json parsing of test output files?)
Key: LUCENE-4145
URL: https://issues.apache.org/jira/browse/LUCENE-4145
Project: Lucene - Java
Issue Type: Improvement
Reporter: Hoss Man
Assignee: Dawid Weiss
Fix For: 4.0, 5.0
Working on SOLR-3267 i got a weird exception printed to the junit output...
{noformat}
[junit4] Unhandled exception in thread: Thread[pumper-events,5,main]
[junit4] com.carrotsearch.ant.tasks.junit4.dependencies.com.google.gson.JsonParseException: No such reference: id#org.apache.solr.search.TestSort[3]
...
{noformat}
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407001#comment-13407001 ]
Carl Austin commented on LUCENE-4190:
I was the original commenter on the blog about this issue and have previously experienced the deletion of all files on a drive because of the exact same restriction - the fallout from this is massive. The issue here is that many people who use lucene will not realise that this can happen, and this situation will occur sooner or later. You can't expect that every developer who uses lucene will understand every in and out, read every bit of javadoc fully or every release change note. Look at the number of posts to the mailing list that are just people who haven't fully read or understood something. I firmly believe that this has to be handled by the library such that a simple mistake or misunderstanding by a developer does not lead to the loss of important files.
IndexWriter deletes non-Lucene files
Key: LUCENE-4190
URL: https://issues.apache.org/jira/browse/LUCENE-4190
Project: Lucene - Java
Issue Type: Bug
Reporter: Michael McCandless
Assignee: Robert Muir
Fix For: 4.0, 5.0
Attachments: LUCENE-4190.patch, LUCENE-4190.patch
Carl Austin raised a good issue in a comment on my Lucene 4.0.0 alpha blog post: http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html
IndexWriter will now (as of 4.0) delete all foreign files from the index directory. We made this change because Codecs are free to write to any files now, so the space of filenames is hard to bound. But if the user accidentally uses the wrong directory (eg c:/) then we will in fact delete important stuff. I think we can at least use some simple criteria (must start with _, maybe must fit certain pattern eg _base36(_X).Y), so we are much less likely to delete a non-Lucene file
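The "simple criteria" floated in the issue description (name must start with _ and fit a pattern like _base36(_X).Y) could look roughly like the sketch below. The regular expression is an assumption written for illustration, not the rule Lucene actually adopted, and IndexFileNameSketch and looksLikeSegmentFile are hypothetical names.

```java
import java.util.regex.Pattern;

/** Hypothetical filter; the pattern is an assumption for illustration, not Lucene's actual rule. */
final class IndexFileNameSketch {

    // "_" + base-36 segment name, an optional "_suffix", then ".extension".
    private static final Pattern SEGMENT_FILE =
            Pattern.compile("_[0-9a-z]+(_[0-9A-Za-z]+)?\\.[0-9A-Za-z]+");

    /** Returns true only for names that fit the assumed segment-file pattern. */
    static boolean looksLikeSegmentFile(String fileName) {
        return SEGMENT_FILE.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        String[] names = {"_0.cfs", "_a1_3.del", "boot.ini", "report.docx", "_notes.txt"};
        for (String name : names) {
            System.out.println(name + " -> " + looksLikeSegmentFile(name));
        }
    }
}
```

A name such as _notes.txt still passes this check, so a filter of this kind only makes accidental deletion less likely rather than impossible. Index files that do not start with an underscore, such as segments_N, would need their own check.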
[jira] [Created] (LUCENE-4194) Fix default charset sensitive method calls
Dawid Weiss created LUCENE-4194:
Summary: Fix default charset sensitive method calls
Key: LUCENE-4194
URL: https://issues.apache.org/jira/browse/LUCENE-4194
Project: Lucene - Java
Issue Type: Bug
Reporter: Dawid Weiss
Priority: Minor
Fix For: 4.0, 5.0
[jira] [Commented] (LUCENE-4194) Fix default charset sensitive method calls
[ https://issues.apache.org/jira/browse/LUCENE-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407018#comment-13407018 ]
Dawid Weiss commented on LUCENE-4194:
There are a number of places (in tests mostly) which call:
{code}
new FileReader(File)
String.getBytes()
new String(byte[])
{code}
The expected encoding should be provided explicitly, even if the content is mostly ASCII.
Fix default charset sensitive method calls
Key: LUCENE-4194
URL: https://issues.apache.org/jira/browse/LUCENE-4194
Project: Lucene - Java
Issue Type: Bug
Reporter: Dawid Weiss
Priority: Minor
Fix For: 4.0, 5.0
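As an illustration of what "provided explicitly" can mean for the three calls listed above, here is a minimal sketch. It assumes UTF-8 is the intended encoding and uses java.nio.charset.StandardCharsets, which needs Java 7 or later; the class ExplicitCharsetSketch and its method names are hypothetical and not taken from the Lucene code base.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helpers showing charset-explicit replacements (UTF-8 is an assumption). */
final class ExplicitCharsetSketch {

    /** Replaces String.getBytes(), which would use the platform default charset. */
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Replaces new String(byte[]), which would use the platform default charset. */
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Replaces new FileReader(File): wraps the stream in a reader with a fixed charset. */
    static String readFirstLine(File file) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    /** Writing counterpart, so the round trip below does not depend on file.encoding. */
    static void write(File file, String text) throws IOException {
        try (Writer writer = new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "na\u00efve caf\u00e9";
        File tmp = File.createTempFile("charset-sketch", ".txt");
        tmp.deleteOnExit();
        write(tmp, text);
        System.out.println(text.equals(readFirstLine(tmp)));
        System.out.println(text.equals(decode(encode(text))));
    }
}
```

Because every conversion names its charset, the round trips should behave the same whatever value the forked JVM's file.encoding property is randomized to.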
[jira] [Commented] (LUCENE-4194) Fix default charset sensitive method calls
[ https://issues.apache.org/jira/browse/LUCENE-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407019#comment-13407019 ]
Dawid Weiss commented on LUCENE-4194:
Try running:
{noformat}
ant -Dtests.file.encoding=UTF-16 test
{noformat}
on windows. This exposes most of these issues.
Fix default charset sensitive method calls
Key: LUCENE-4194
URL: https://issues.apache.org/jira/browse/LUCENE-4194
Project: Lucene - Java
Issue Type: Bug
Reporter: Dawid Weiss
Priority: Minor
Fix For: 4.0, 5.0
[jira] [Updated] (LUCENE-4194) Fix default charset sensitive method calls
[ https://issues.apache.org/jira/browse/LUCENE-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-4194: Attachment: CropperCapture[2].png CropperCapture[1].png A list of files calling forbidden methods... Fix default charset sensitive method calls -- Key: LUCENE-4194 URL: https://issues.apache.org/jira/browse/LUCENE-4194 Project: Lucene - Java Issue Type: Bug Reporter: Dawid Weiss Priority: Minor Fix For: 4.0, 5.0 Attachments: CropperCapture[1].png, CropperCapture[2].png -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407027#comment-13407027 ] Shai Erera commented on LUCENE-4190: bq. We HAVE to keep this kind of code simple and maintainable in lucene. Why? We write lots of other code that prevents users from shooting themselves in the foot, so why make an exception here? Just because the code might get complicated doesn't mean we don't need to write it. While I agree with you that Lucene is not a file manager, I think it'd be good if we could clean up after ourselves rather than delete everything that we don't recognize. Since you're more familiar than me with the 4.0 internals, can you please comment on the simple proposal I outlined above? Can it even work? IndexWriter deletes non-Lucene files Key: LUCENE-4190 URL: https://issues.apache.org/jira/browse/LUCENE-4190 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Robert Muir Fix For: 4.0, 5.0 Attachments: LUCENE-4190.patch, LUCENE-4190.patch Carl Austin raised a good issue in a comment on my Lucene 4.0.0 alpha blog post: http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html IndexWriter will now (as of 4.0) delete all foreign files from the index directory. We made this change because Codecs are free to write to any files now, so the space of filenames is hard to bound. But if the user accidentally uses the wrong directory (eg c:/) then we will in fact delete important stuff. I think we can at least use some simple criteria (must start with _, maybe must fit certain pattern eg _base36(_X).Y), so we are much less likely to delete a non-Lucene file -- This message is automatically generated by JIRA. 
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407033#comment-13407033 ] Michael McCandless commented on LUCENE-4190: I agree there is a real danger here if users accidentally point IndexWriter at the wrong directory. This was found/fixed way in the past already: LUCENE-385. But I also don't want to go back to the hairy files(), extensions() we used to require of all codec components. Yet I think there's a good middle ground: only allow a codec to write to _seg.* or _seg_*.* files (ie the ones created by IndexFileNames). All of our codecs are (should be!) using IndexFileName.* to compute a file name to write to. In reality a codec already isn't free to just write to any file, because then it may conflict with another codec doing the same thing. So de-facto codecs already have a private namespace, prefixed by _seg and further refined by _N (ie when there are multiple postings formats in a single codec). Since a general codec must already obey its private namespace (to not step on other codecs) I think it's fine to enforce it? IndexWriter deletes non-Lucene files Key: LUCENE-4190 URL: https://issues.apache.org/jira/browse/LUCENE-4190 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Robert Muir Fix For: 4.0, 5.0 Attachments: LUCENE-4190.patch, LUCENE-4190.patch Carl Austin raised a good issue in a comment on my Lucene 4.0.0 alpha blog post: http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html IndexWriter will now (as of 4.0) delete all foreign files from the index directory. We made this change because Codecs are free to write to any files now, so the space of filenames is hard to bound. But if the user accidentally uses the wrong directory (eg c:/) then we will in fact delete important stuff. 
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407034#comment-13407034 ] Robert Muir commented on LUCENE-4190: - The problem with a global filter is that which files a codec uses is an implementation detail of the codec. Today, a codec can name files pretty much whatever it wants (it must avoid _seg.cfs, segments_seg and segments.gen, of course). Other than in exceptional cases, we know which files a codec owns because a codec writes the list of files that it uses for a segment into the SegmentInfo (http://lucene.apache.org/core/4_0_0-ALPHA/core/org/apache/lucene/codecs/lucene40/Lucene40SegmentInfoFormat.html). The problem is these exceptional cases: how can IndexFileDeleter distinguish between leftover partially written index files for a segment and some files of the user, since it may not have the SegmentInfo (.si) for that segment? Previous attempts at this still didn't work well:
* listing the extensions() in the codec is not great, e.g. Sep codec uses the .doc extension for documents!
* having the codec list the files it uses for a segment isn't easy and causes a mess: previously files() had to be symmetric at read and write time and we often had bugs in this, because the files used by the codec often depend upon various things like options the user chooses (e.g. did they enable term vectors, payloads, etc.). I will do *anything* to prevent this from coming back!
So in my opinion, the only real, third option is to restrict what file names a codec can use, in a way that's not a huge imposition on the codec. My patch on this issue (which people weren't happy with) did just this: it restricted file names to begin with an underscore.
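The naming restriction discussed in the comments above (codec files confined to `_seg.*` or `_seg_*.*`, plus the `segments_N` and `segments.gen` commit files) could be sketched as a filename check along the following lines. This is an illustrative sketch only, not the attached patch or Lucene's actual API: the class name, both regular expressions, and the sample file names are assumptions.

```java
import java.util.regex.Pattern;

// Hypothetical filter, for illustration only.
class IndexFileNameFilterSketch {
    // Assumed shape of per-segment files: _<base36 segment>[_<suffix>].<extension>
    static final Pattern SEGMENT_FILE = Pattern.compile("_[a-z0-9]+(_.*)?\\..*");
    // Assumed shape of commit files: segments_<base36 generation> or segments.gen
    static final Pattern COMMIT_FILE = Pattern.compile("segments(_[a-z0-9]+|\\.gen)");

    // True if the name falls inside the namespace the index would be allowed to delete from.
    static boolean looksLikeIndexFile(String name) {
        return SEGMENT_FILE.matcher(name).matches() || COMMIT_FILE.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] names = {
            "_0.fdt", "_1a_Lucene40_0.tim", "segments_2", "segments.gen",
            "notes.txt", "_myimportantdocument.doc"
        };
        for (String name : names) {
            System.out.println(name + " -> " + looksLikeIndexFile(name));
        }
    }
}
```

A check like this protects files such as `notes.txt`, but a user file that happens to fit the pattern (the last sample name) would still be treated as an index file, which is the imperfection raised in the discussion.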
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407040#comment-13407040 ] Robert Muir commented on LUCENE-4190: - {quote} Since a general codec must already obey its private namespace (to not step on other codecs) I think it's fine to enforce it? {quote} The problem it seems is people want a perfect solution. An imperfect solution (_.*) seems to imply that its a bug if lucene deletes _myImportantDocument.doc. So if we insist on a perfect solution: then fine, the perfect solution I accept is for lucene to totally own the directory, don't put files in there! Then the behavior is clear, no bugs, we delete everything. IndexWriter deletes non-Lucene files Key: LUCENE-4190 URL: https://issues.apache.org/jira/browse/LUCENE-4190 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Robert Muir Fix For: 4.0, 5.0 Attachments: LUCENE-4190.patch, LUCENE-4190.patch Carl Austin raised a good issue in a comment on my Lucene 4.0.0 alpha blog post: http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html IndexWriter will now (as of 4.0) delete all foreign files from the index directory. We made this change because Codecs are free to write to any files now, so the space of filenames is hard to bound. But if the user accidentally uses the wrong directory (eg c:/) then we will in fact delete important stuff. I think we can at least use some simple criteria (must start with _, maybe must fit certain pattern eg _base36(_X).Y), so we are much less likely to delete a non-Lucene file -- This message is automatically generated by JIRA. 
[jira] [Commented] (LUCENE-4191) Lucene doc pages redirect to api-4_0_0-ALPHA which results in 404
[ https://issues.apache.org/jira/browse/LUCENE-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407082#comment-13407082 ] Chaim Peck commented on LUCENE-4191: Then where does one go to find documentation? The above link is the first hit when you google BaseTokenFilterFactory. Lucene doc pages redirect to api-4_0_0-ALPHA which results in 404 --- Key: LUCENE-4191 URL: https://issues.apache.org/jira/browse/LUCENE-4191 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.6 Reporter: Chaim Peck Labels: documentation Try to go to this URL: http://lucene.apache.org/solr/api/org/apache/solr/analysis/BaseTokenFilterFactory.html The result is that you will be redirected here, which is a 404: http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/analysis/BaseTokenFilterFactory.html You can still get to the page from google cache: http://webcache.googleusercontent.com/search?q=cache:mCJCac4iZ0QJ:lucene.apache.org/solr/api/org/apache/solr/analysis/BaseTokenFilterFactory.html+cd=1hl=enct=clnkgl=us
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407086#comment-13407086 ] Gilad Barkai commented on LUCENE-4190: -- {quote} So if we insist on a perfect solution: then fine, the perfect solution I accept is for lucene to totally own the directory, don't put files in there! Then the behavior is clear, no bugs, we delete everything. {quote} But then we're left with the original problem: should a poor user (say, me) accidentally put an index in an already filled directory (say /tmp), the price to pay is great. Too great IMHO.
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407088#comment-13407088 ] Yonik Seeley commented on LUCENE-4190: -- Robert, you completely ignored my explicit VETO. We had consensus, and code was committed. It's no longer your commit to do anything you want with over the objections of others. Undoubtedly, you would now revert any commit I would make to rectify the situation and fix this bug. So let's now take it to the PMC and codify if it's OK to ignore VETOs of other PMC members that you don't agree with. Perhaps we need to update the rules we operate under. IndexWriter deletes non-Lucene files Key: LUCENE-4190 URL: https://issues.apache.org/jira/browse/LUCENE-4190 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Robert Muir Fix For: 4.0, 5.0 Attachments: LUCENE-4190.patch, LUCENE-4190.patch Carl Austin raised a good issue in a comment on my Lucene 4.0.0 alpha blog post: http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html IndexWriter will now (as of 4.0) delete all foreign files from the index directory. We made this change because Codecs are free to write to any files now, so the space of filenames is hard to bound. But if the user accidentally uses the wrong directory (eg c:/) then we will in fact delete important stuff. I think we can at least use some simple criteria (must start with _, maybe must fit certain pattern eg _base36(_X).Y), so we are much less likely to delete a non-Lucene file -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407091#comment-13407091 ] Robert Muir commented on LUCENE-4190: - you can't veto me backing out my own commit. IndexWriter deletes non-Lucene files Key: LUCENE-4190 URL: https://issues.apache.org/jira/browse/LUCENE-4190 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Robert Muir Fix For: 4.0, 5.0 Attachments: LUCENE-4190.patch, LUCENE-4190.patch Carl Austin raised a good issue in a comment on my Lucene 4.0.0 alpha blog post: http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html IndexWriter will now (as of 4.0) delete all foreign files from the index directory. We made this change because Codecs are free to write to any files now, so the space of filenames is hard to bound. But if the user accidentally uses the wrong directory (eg c:/) then we will in fact delete important stuff. I think we can at least use some simple criteria (must start with _, maybe must fit certain pattern eg _base36(_X).Y), so we are much less likely to delete a non-Lucene file -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407094#comment-13407094 ] Mark Miller commented on LUCENE-4190: - bq. pretty sure I have the right to revert my own commit. Once something is in the code base, it doesn't matter who committed it - all the same rules apply. That it's your commit doesn't change anything. Unless you went insane and started threatening license revoking type...oh wait... bq. I can declare the licensing of asl2 as a mistake and instead full gpl if we want to press the point? You can't be on the PMC and play games like this if you ask me. Being on the PMC means you have an obligation to act above this. Are we going back to the revert wars now? As far as I can tell you are trying to act like a dictator on this issue. You contribute a lot to Lucene, but you are not the dictator. Why do you need to *demand* that certain things happen as you prescribe? Why do you need to make threats about revoking licenses? This issue should be about consensus, not bullying. Are you kidding me dude? I hope you start working with the community and stop trying to step on it. Your stance is far too often, it's Roberts way or the highway if you ask me. IndexWriter deletes non-Lucene files Key: LUCENE-4190 URL: https://issues.apache.org/jira/browse/LUCENE-4190 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Robert Muir Fix For: 4.0, 5.0 Attachments: LUCENE-4190.patch, LUCENE-4190.patch Carl Austin raised a good issue in a comment on my Lucene 4.0.0 alpha blog post: http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html IndexWriter will now (as of 4.0) delete all foreign files from the index directory. We made this change because Codecs are free to write to any files now, so the space of filenames is hard to bound. But if the user accidentally uses the wrong directory (eg c:/) then we will in fact delete important stuff. 
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407099#comment-13407099 ] Mark Harwood commented on LUCENE-4190: -- -1 for merrily wiping contents of whatever directory a user happens to pick for an index location +0 on requiring all codecs to declare filenames because I take on board Rob's points re complexity +1 for the _* name-spacing proposal as a sensible compromise IndexWriter deletes non-Lucene files Key: LUCENE-4190 URL: https://issues.apache.org/jira/browse/LUCENE-4190 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Robert Muir Fix For: 4.0, 5.0 Attachments: LUCENE-4190.patch, LUCENE-4190.patch Carl Austin raised a good issue in a comment on my Lucene 4.0.0 alpha blog post: http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html IndexWriter will now (as of 4.0) delete all foreign files from the index directory. We made this change because Codecs are free to write to any files now, so the space of filenames is hard to bound. But if the user accidentally uses the wrong directory (eg c:/) then we will in fact delete important stuff. I think we can at least use some simple criteria (must start with _, maybe must fit certain pattern eg _base36(_X).Y), so we are much less likely to delete a non-Lucene file -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407097#comment-13407097 ] Robert Muir commented on LUCENE-4190: - Thats ok, its clearly another typical Solr-versus-Robert battle here, where Mark+Yonik both gang up on me. Another way to look at it: I committed the patch after Mike reviewed it, because it looked like consensus. There was then a ton of questions and commentary, arguably there wasnt really consensus and i prematurely committed. So i backed it out. IndexWriter deletes non-Lucene files Key: LUCENE-4190 URL: https://issues.apache.org/jira/browse/LUCENE-4190 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Robert Muir Fix For: 4.0, 5.0 Attachments: LUCENE-4190.patch, LUCENE-4190.patch Carl Austin raised a good issue in a comment on my Lucene 4.0.0 alpha blog post: http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html IndexWriter will now (as of 4.0) delete all foreign files from the index directory. We made this change because Codecs are free to write to any files now, so the space of filenames is hard to bound. But if the user accidentally uses the wrong directory (eg c:/) then we will in fact delete important stuff. I think we can at least use some simple criteria (must start with _, maybe must fit certain pattern eg _base36(_X).Y), so we are much less likely to delete a non-Lucene file -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-Linux-Java6-64 - Build # 1168 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux-Java6-64/1168/ 5 tests failed. REGRESSION: org.apache.solr.handler.dataimport.TestSqlEntityProcessorDelta2.testCompositePk_DeltaImport_add_delete Error Message: Exception during query Stack Trace: java.lang.RuntimeException: Exception during query at __randomizedtesting.SeedInfo.seed([6E23FDC4CD0393D:D44317B3C5D732CF]:0) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:461) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:428) at org.apache.solr.handler.dataimport.TestSqlEntityProcessorDelta2.testCompositePk_DeltaImport_add_delete(TestSqlEntityProcessorDelta2.java:282) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1995) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:818) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:877) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:891) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:825) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:671) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:697) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:736) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551) Caused by: java.lang.RuntimeException: REQUEST FAILED: xpath=//*[@numFound='1'] xml response was: ?xml version=1.0 encoding=UTF-8? response lst name=responseHeaderint
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407104#comment-13407104 ] Mark Miller commented on LUCENE-4190: - bq. Thats ok, its clearly another typical Solr-versus-Robert battle here, where Mark+Yonik both gang up on me. I don't have an opinion on this issue. A lot of smart people have already given input, and I was interested to read about it. I have not formulated my own opinion yet. I also don't mind if you and Yonik have disagreement or debate. As long as you act reasonably. Anyone that ignores consensus and threatens license revoking has me not on their side. That's part of the role of a PMC member IMO. To keep an eye out for unhealthy community behavior and point it out. All the disagreement in the world is fine, but you have to play in the sandbox. IndexWriter deletes non-Lucene files Key: LUCENE-4190 URL: https://issues.apache.org/jira/browse/LUCENE-4190 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Robert Muir Fix For: 4.0, 5.0 Attachments: LUCENE-4190.patch, LUCENE-4190.patch Carl Austin raised a good issue in a comment on my Lucene 4.0.0 alpha blog post: http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html IndexWriter will now (as of 4.0) delete all foreign files from the index directory. We made this change because Codecs are free to write to any files now, so the space of filenames is hard to bound. But if the user accidentally uses the wrong directory (eg c:/) then we will in fact delete important stuff. I think we can at least use some simple criteria (must start with _, maybe must fit certain pattern eg _base36(_X).Y), so we are much less likely to delete a non-Lucene file -- This message is automatically generated by JIRA. 
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407107#comment-13407107 ] Robert Muir commented on LUCENE-4190: - I think if you read through the issue, how is consensus being ignored? Again: I committed the patch after Mike reviewed it, because it looked like consensus. But I think this was premature, because a lot of questions and comments came afterwards. Backing it out is the right thing to do. It might be that we get consensus for this patch or something else and it might even go right back in the way it was. You just don't like the words I used. IndexWriter deletes non-Lucene files Key: LUCENE-4190 URL: https://issues.apache.org/jira/browse/LUCENE-4190 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Robert Muir Fix For: 4.0, 5.0 Attachments: LUCENE-4190.patch, LUCENE-4190.patch Carl Austin raised a good issue in a comment on my Lucene 4.0.0 alpha blog post: http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html IndexWriter will now (as of 4.0) delete all foreign files from the index directory. We made this change because Codecs are free to write to any files now, so the space of filenames is hard to bound. But if the user accidentally uses the wrong directory (eg c:/) then we will in fact delete important stuff. I think we can at least use some simple criteria (must start with _, maybe must fit certain pattern eg _base36(_X).Y), so we are much less likely to delete a non-Lucene file -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-4195) Add javadocs to Codec package.html
Alan Woodward created LUCENE-4195:

Summary: Add javadocs to Codec package.html
Key: LUCENE-4195
URL: https://issues.apache.org/jira/browse/LUCENE-4195
Project: Lucene - Java
Issue Type: Improvement
Components: core/codecs
Affects Versions: 5.0
Reporter: Alan Woodward
Priority: Minor

The Codec package.html is pretty basic. Add some overview information.
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407115#comment-13407115 ] Mark Miller commented on LUCENE-4190:

{quote}Another way to look at it: I committed the patch after Mike reviewed it, because it looked like consensus. There was then a ton of questions and commentary, arguably there wasn't really consensus and I prematurely committed. So I backed it out.{quote}

That would have been a good argument and much better than the alternative argument you took IMO.
[jira] [Updated] (LUCENE-4195) Add javadocs to Codec package.html
[ https://issues.apache.org/jira/browse/LUCENE-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Woodward updated LUCENE-4195:

Attachment: LUCENE-4195.patch

Here's some basic javadoc, telling users how to register new Codecs and PostingsFormats. Pretty basic, but better than nothing!
[jira] [Commented] (LUCENE-4195) Add javadocs to Codec package.html
[ https://issues.apache.org/jira/browse/LUCENE-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407117#comment-13407117 ] Robert Muir commented on LUCENE-4195:

Awesome! Thanks for doing this.
[jira] [Resolved] (LUCENE-4195) Add javadocs to Codec package.html
[ https://issues.apache.org/jira/browse/LUCENE-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-4195.

Resolution: Fixed
Fix Version/s: 4.0

I committed this. Thanks again!
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407145#comment-13407145 ] Robert Muir commented on LUCENE-4190:

{quote} The words, the threat, the quick action - correct - that's my problem. {quote}

Right, I take offense to the idea that if I committed something too soon, I can't back it out. Sure, it didn't help that I was already frustrated with the technical situation (I thought, and still do think, that the patch is a great compromise: easy solution, low risk, simple, etc). But I think if this situation happens, e.g. someone commits prematurely and then there are a bunch of comments on the issue that make it clear there really isn't consensus, then they have the right to back it out. In fact I think it's the right thing to do. And nobody can veto that.
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407148#comment-13407148 ] Yonik Seeley commented on LUCENE-4190:

bq. But I think this was premature, because a lot of questions and comments came afterwards.

Except that if you read back through the issue, that's not what happened.

Anyway, if you're interested in consensus now, I don't see anyone opposed to the underscore solution in the short term, even if some thought it didn't go far enough. I didn't see anyone saying that deleting all files was preferable in the short term. So if there are no objections, I'll re-commit the underscore fix (which there was consensus for), and then discussion can continue about better methods.

Robert, I'll repeat your own words back to you:

bq. We can maybe improve in the future besides the _ check, but I just think this is an easy improvement that will prevent most of the problems.
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407151#comment-13407151 ] Robert Muir commented on LUCENE-4190:

{quote} I didn't see anyone saying that deleting all files was preferable in the short term. {quote}

I'm not sure this is totally true. Again, I think if it's required that we have a *perfect* solution, then deleting all files is preferable to the alternative of the codec having hairy code to detect whether it owns or doesn't own a file. But this patch is a nice *imperfect* solution that probably prevents accidental deletion of MyImportantDocument.doc or whatever.
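For readers following the thread, the "underscore check" under discussion can be sketched roughly as below. This is only an illustration of the idea, not the contents of LUCENE-4190.patch: the class and method names are hypothetical, and the real patch works inside IndexWriter's own file-deletion logic rather than on a plain list of names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: picks out the names that an
// underscore-only rule would even consider for deletion.
final class DeletionCandidates {
    static List<String> candidates(List<String> directoryListing) {
        List<String> out = new ArrayList<>();
        for (String name : directoryListing) {
            // Anything that does not start with '_' is treated as foreign
            // and left alone, e.g. MyImportantDocument.doc or .DS_Store.
            if (name.startsWith("_")) {
                out.add(name);
            }
        }
        return out;
    }
}
```

The rule is imperfect in exactly the way described above: a foreign file that happens to start with an underscore would still be a candidate, but ordinary user files are skipped.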
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407157#comment-13407157 ] Uwe Schindler commented on LUCENE-4190:

Hi,

I have been thinking about the whole thing for a while. My idea would limit us a bit more, but I really like Mike's proposal of fixed names. I would change the Directory class, so every method that handles or deletes files gets 2 parameters: the segment name and one arbitrary codec-private file name. The directory is then responsible for creating the file name, prefixing it with _ and so on. A custom directory (like hbase) could use the segment name as table name and the private file name as identifier, so all segment files go into the same hbase table. The directory would then also be responsible for doing a cleanup/list of files, where it would only return files matching the pattern. For the index-wide metadata like the segments file we would then unfortunately need a special method to get an IndexOutput :(

If we keep the current single filename, I would make the format fixed, so it throws IOException if the filename is invalid. Assert makes no sense here as it does not prevent people from doing the wrong thing. Then really nothing can create invalid files, deleting by _[0-9a-z_]+ works, and all would be happy.

Alternatively, we could switch to the following:
- If we create a *new* index, we enforce that listFiles returns an empty list (., .. excluded, but that's done already), otherwise we throw IOException(directory not empty).
- If there is a segment file already there, we can delete everything not allowed in an index.

Uwe
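A minimal sketch of the "fixed format, throw IOException if the filename is invalid" variant, for illustration only. The class name, method names and the regular expression are assumptions: the pattern is one possible reading of the "_base36(_X).Y" shape from the issue description, not a pattern taken from the patch, and it deliberately ignores index-wide files such as the segments file, which the comment above notes would need separate handling.

```java
import java.io.IOException;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene.
final class FileNameCheck {
    // Assumed shape: '_' + base36 segment name, optional "_X" suffix, '.' + extension.
    static final Pattern INDEX_FILE =
        Pattern.compile("_[0-9a-z]+(_[0-9a-z]+)?\\.[0-9a-z]+");

    static boolean looksLikeIndexFile(String name) {
        return INDEX_FILE.matcher(name).matches();
    }

    // Hard check: unlike an assert, this also runs when assertions are disabled.
    static void checkFileName(String name) throws IOException {
        if (!looksLikeIndexFile(name)) {
            throw new IOException("invalid index file name: " + name);
        }
    }
}
```

With a check like this on the write side, the same pattern could be reused on the delete side, so that only names the writer could have produced are ever removed.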
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407155#comment-13407155 ] Robert Muir commented on LUCENE-4190:

{quote} So if there are no objections, I'll re-commit the underscore fix (which there was consensus for), and then discussion can continue about better methods. {quote}

I don't object to the patch being committed (though I think it would be good to wait a little bit), but I am still very, very concerned that it starts a slippery slope back to Codec.files().
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407162#comment-13407162 ] Robert Muir commented on LUCENE-4190:

{quote} I was thinking about the whole thing for longer time. My idea would limit us a bit more, but I really like Mike's proposal of fixed names. I would change the Directory class, so every method that handles or deletes files gets 2 parameters, segment name and one arbitrary codec-private file name. the directory is then responsible to create the file name, prefix with _ and so on. A custom directory (like hbase), could use the segment name as table name and the private file name as identifier, so all segment files go into same hbase table. the directory would then also be responsible to do a cleanup/list of files, where it would only return files matching the pattern. {quote}

I'm not sure matching _[0-9a-z_]+ is really that big of an improvement over just the underscore. But I don't think we need to refactor Directory.java to do this. We could just change the underscore check to a regular expression.

{quote} Assert makes no sense here as it does not prevent people from doing the wrong thing. {quote}

I don't agree: I at first thought to do a hard check, but this is only really necessary for codec developers. So an assert is enough, because you catch it when developing your codec (it's either gonna work, or completely not work here).

{quote} If we create an new index, we enforce that listFiles returns empty list (., .. excluded, buts thats done already), otherwise we throw IOException(directory not empty). {quote}

I thought about this, but I have concerns about things like .DS_Store and .nfsX or other files that some system could be creating behind the scenes, etc.
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407167#comment-13407167 ] Robert Muir commented on LUCENE-4190:

Just to mention: the reason I don't like the Directory refactoring is some of the crazy things we do (look at CompoundFileDirectory and also IndexWriter copySegmentAsIs, etc). This is basically what I think we should avoid: adding a lot of risky complexity for little gain.
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407170#comment-13407170 ] Uwe Schindler commented on LUCENE-4190:

{quote}
bq. Assert makes no sense here as it does not prevent people from doing the wrong thing.

I don't agree: i at first thought to do a hard check, but this is only really necessary for codec developers. So an assert is enough, because you catch it when developing your codec (its either gonna work, or completely not work here).
{quote}

Why not make it a hard check? Otherwise one could write a file without _ and schwupps, it's wech :) (German). Why only an assert? If we require that all files start with _, let's enforce it; otherwise delete all files like we do currently. Using an assert would get my -1 to committing this again.
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407177#comment-13407177 ] Robert Muir commented on LUCENE-4190:

I'm not 1000% determined for it to only be an assert, but then we should change how the code works to make sure that the check is not too expensive. The current assert makes SegmentInfo.addFiles/addFile very expensive.
[jira] [Comment Edited] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407177#comment-13407177 ] Robert Muir edited comment on LUCENE-4190 at 7/5/12 2:42 PM:

I'm not 1000% determined for it to only be an assert, but then we should change how the code works to make sure that the check is not too expensive. The current assert makes SegmentInfo.addFiles/addFile very expensive (if it's turned directly into a hard check).

was (Author: rcmuir): I'm not 1000% determined for it to only be an assert, but then we should change how the code works to make sure that the check is not too expensive. The current assert makes SegmentInfo.addFiles/addFile very expensive.
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407181#comment-13407181 ] Uwe Schindler commented on LUCENE-4190:

String.startsWith("_") with a static string is cheap (a few CPU cycles, as it only needs to compare the length and one char). Please don't tell me that SI.addFiles is called in inner loops like Scorers! Not doing this check is stupid.

BTW: In CFSDirectory the assert about double entries on reading the dir should also throw CorruptIndexEx, because a CFS with duplicate file names is broken. This check is even cheaper.

I am planning to open a new issue to make all those I/O related checks hard; asserts are not appropriate here.
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407185#comment-13407185 ] Robert Muir commented on LUCENE-4190:

Uwe, no, I mean that we check the entire list each time. So if someone were to call addFile(), addFile(), addFile(), that would be very bad runtime. I'll update the patch.
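The cost concern in the last two comments can be illustrated with a small sketch. It is not the SegmentInfo code: the class, its methods and the checks counter are hypothetical and exist only to contrast re-validating the whole file set on every addFile() call with validating just the newly added name.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a per-segment file set; not Lucene's SegmentInfo.
final class AddFileCost {
    private final Set<String> files = new HashSet<>();
    long checks = 0; // number of startsWith("_") tests performed so far

    private void check(String name) {
        checks++;
        if (!name.startsWith("_")) {
            throw new IllegalArgumentException("invalid file name: " + name);
        }
    }

    // Re-validates every known file on each call: about n*n/2 checks for n calls.
    void addFileRecheckAll(String name) {
        files.add(name);
        for (String f : files) {
            check(f);
        }
    }

    // Validates only the new name: n checks for n calls.
    void addFileCheckNew(String name) {
        check(name);
        files.add(name);
    }
}
```

Adding 100 distinct names performs 5050 checks with the first method and 100 with the second, which is why a single startsWith("_") test is cheap but repeating it over the entire set on every call is not.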
[JENKINS] Lucene-Solr-trunk-Windows-Java6-64 - Build # 696 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/696/ 1 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.handler.TestReplicationHandler Error Message: ERROR: SolrIndexSearcher opens=74 closes=72 Stack Trace: java.lang.AssertionError: ERROR: SolrIndexSearcher opens=74 closes=72 at __randomizedtesting.SeedInfo.seed([8F8AC7455F54278A]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:191) at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:82) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1995) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:754) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551) Build Log: [...truncated 7663 lines...] [junit4:junit4] 2 161993 T2156 oas.SolrTestCaseJ4.endTrackingSearchers SEVERE ERROR: SolrIndexSearcher opens=74 closes=72 [junit4:junit4] 2 NOTE: test params are: codec=Appending, sim=RandomSimilarityProvider(queryNorm=false,coord=false): {}, locale=lv, timezone=Africa/Blantyre [junit4:junit4] 2 NOTE: Windows 7 6.1 amd64/Sun Microsystems Inc. 1.6.0_32 (64-bit)/cpus=2,threads=1,free=53764744,total=238157824 [junit4:junit4] 2 NOTE: All tests run in this JVM: [TestGermanLightStemFilterFactory, TestPropInjectDefaults, SolrCmdDistributorTest, TestTrimFilterFactory, TestSolrQueryParser, LukeRequestHandlerTest, SolrIndexConfigTest, PrimitiveFieldTypeTest, DistanceFunctionTest, MultiTermTest, TestThaiWordFilterFactory, SpatialFilterTest, FileBasedSpellCheckerTest, LengthFilterTest, TestRemoteStreaming, TestSort, PrimUtilsTest, TestConfig, XmlUpdateRequestHandlerTest, XsltUpdateRequestHandlerTest, LeaderElectionIntegrationTest, SolrCoreCheckLockOnStartupTest, TestKeywordMarkerFilterFactory, TestCJKWidthFilterFactory, TestCodecSupport, TestEnglishMinimalStemFilterFactory, FieldAnalysisRequestHandlerTest, FieldMutatingUpdateProcessorTest, TestDelimitedPayloadTokenFilterFactory, TestSwedishLightStemFilterFactory, TestSearchPerf, TestBinaryField, BadIndexSchemaTest, TestTurkishLowerCaseFilterFactory, TestDistributedSearch, TestMappingCharFilterFactory, TestKeepFilterFactory, TestFaceting, 
FullSolrCloudTest, IndexBasedSpellCheckerTest, IndexSchemaTest, IndexReaderFactoryTest, TestHashPartitioner, TestIndexingPerformance, CoreAdminHandlerTest, SearchHandlerTest, TestHyphenationCompoundWordTokenFilterFactory, TestPropInject, TestFrenchMinimalStemFilterFactory, TestPortugueseMinimalStemFilterFactory, TestElisionFilterFactory, TestCapitalizationFilterFactory, TestItalianLightStemFilterFactory, SnowballPorterFilterFactoryTest, TestQuerySenderNoQuery, TestHindiFilters, TestStandardFactories, ZkSolrClientTest, TestCJKBigramFilterFactory, CloudStateUpdateTest, TestDFRSimilarityFactory, RAMDirectoryFactoryTest, OpenExchangeRatesOrgProviderTest, TestDefaultSimilarityFactory, TestKStemFilterFactory,
[jira] [Updated] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4190: Attachment: LUCENE-4190.patch updated patch with _ check turned into a hard check. IndexWriter deletes non-Lucene files Key: LUCENE-4190 URL: https://issues.apache.org/jira/browse/LUCENE-4190 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Robert Muir Fix For: 4.0, 5.0 Attachments: LUCENE-4190.patch, LUCENE-4190.patch, LUCENE-4190.patch
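For readers following LUCENE-4190, here is a minimal sketch of the kind of filename check discussed in the issue. It is not taken from the attached patches: the class name, the method names and the exact regular expression are assumptions chosen to match the "_base36(_X).Y" shape from the description, and segments files are ignored entirely. The check runs once per added name, so repeated addFile() calls do not rescan the whole list.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical sketch; not Lucene's actual implementation. */
final class SegmentFileNames {
    // Assumed shape: "_" + base-36 segment name + optional "_suffix" + "." + extension.
    private static final Pattern CODEC_FILE = Pattern.compile("_[a-z0-9]+(_.*)?\\..*");

    private final Set<String> files = new HashSet<String>();

    /** Returns true if the name fits the assumed per-segment file pattern. */
    static boolean looksLikeIndexFile(String name) {
        return CODEC_FILE.matcher(name).matches();
    }

    /** Hard check on the single new name only; earlier names are not re-validated. */
    void addFile(String name) {
        if (!looksLikeIndexFile(name)) {
            throw new IllegalArgumentException("invalid codec filename: " + name);
        }
        files.add(name);
    }

    /** Read-only view, so callers cannot add unchecked names. */
    Set<String> files() {
        return Collections.unmodifiableSet(files);
    }
}
```

With a check like this, a deleter that only touches names accepted by looksLikeIndexFile() would skip something like boot.ini even if the index directory were accidentally pointed at c:/.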
[jira] [Created] (LUCENE-4196) Turn asserts in I/O related code into hard checks
Uwe Schindler created LUCENE-4196: Summary: Turn asserts in I/O related code into hard checks Key: LUCENE-4196 URL: https://issues.apache.org/jira/browse/LUCENE-4196 Project: Lucene - Java Issue Type: Task Components: core/index Affects Versions: 4.0-ALPHA Reporter: Uwe Schindler Fix For: 4.0 In lots of codecs we only assert that, e.g., some things inside files are correctly in bounds. This leads to security problems (OK, not as bad as C-style buffer overflows): for example, allocating a large array after reading a VInt from a file header and then hitting an OOM is a security issue. So we have to check all those contracts for files with hard checks, especially as a simple check in most cases doesn't cost anything (and it costs no more than the assert itself, as the assert also takes CPU power, because it has to check a static final class field once). Of course we cannot check the values we read when reading postings, but simple checks are very important: that any postings file has a correct header and something like a positive number of elements, or number of elements < file size, ...; that a bit field only contains valid bits in StoredFieldsReader; or that filenames are not duplicated (CFS). We had those checks in 3.x, but in 4.0, Mike changed all of those to asserts during the flex development (in my opinion with no real reason).
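To make the difference concrete, here is a small sketch of a hard check on a count read from a file header, done before the array is allocated. Everything in it is an assumption for illustration: the class and method names are made up, the header stores a plain int instead of a VInt, and a plain IOException stands in for whatever exception the codecs would really throw.

```java
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical sketch of a hard bounds check; not code from any Lucene codec. */
final class HeaderChecks {
    /**
     * Reads a count followed by that many 8-byte values.
     * remainingBytes is the number of bytes left in the file after the count.
     */
    static long[] readOffsets(DataInputStream in, long remainingBytes) throws IOException {
        int count = in.readInt();
        // Hard check: unlike an assert, this also runs when assertions are disabled,
        // so a corrupt count cannot trigger a huge allocation.
        if (count < 0 || (long) count * 8L > remainingBytes) {
            throw new IOException("corrupt header: count=" + count
                    + ", remaining bytes=" + remainingBytes);
        }
        long[] offsets = new long[count];
        for (int i = 0; i < count; i++) {
            offsets[i] = in.readLong();
        }
        return offsets;
    }
}
```

The check costs two comparisons per header, which is in line with the point above that such checks are essentially free compared to the I/O around them.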
[jira] [Commented] (LUCENE-4196) Turn asserts in I/O related code into hard checks
[ https://issues.apache.org/jira/browse/LUCENE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407194#comment-13407194 ] Robert Muir commented on LUCENE-4196: I agree; sometimes I added asserts that I feel should be real checks, but I feel it's safest to just do the assert. Lucene3xNormsProducer:119 is an example: if it fails, it means you have a corrupt .nrm file with norms mismatched across different fields.
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407196#comment-13407196 ] Uwe Schindler commented on LUCENE-4190: New patch looks good; I was not aware that the previous one was iterating over all files each time. As the SegmentInfo internal list should not be available outside, we have no problem with anybody else changing it in an uncontrolled way. See also issue LUCENE-4196.
RE: Question about solr config files encoding.
Config files are XML, and I changed them to be handled by the XML parser (InputStreams), so the XML parser reads the encoding from the header. But JSON is defined to be UTF-8, so we must supply the encoding (IOUtils.UTF8_CHARSET). - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Dawid Weiss [mailto:dawid.we...@gmail.com] Sent: Thursday, July 05, 2012 5:00 PM To: dev@lucene.apache.org Subject: Question about solr config files encoding. Guys, should the encoding of config files really be platform-dependent? Currently Solr tests fail massively on setup because of things like this: public OpenExchangeRates(InputStream ratesStream) throws IOException { parser = new JSONParser(new InputStreamReader(ratesStream)); This reader, when confronted with UTF-16 as file.encoding, results in funky exceptions like: Caused by: org.apache.noggit.JSONParser$ParseException: JSON Parse Error: char=笊,position=0 BEFORE='笊' AFTER='†≤楳捬慩浥爢㨠≔桩猠摡瑡猠捯汬散瑥搠晲潭⁶慲楯畳⁰牯癩摥牳 湤⁰牯癩摥搠晲' at org.apache.noggit.JSONParser.err(JSONParser.java:221) at org.apache.noggit.JSONParser.next(JSONParser.java:620) at org.apache.noggit.JSONParser.nextEvent(JSONParser.java:661) at org.apache.solr.schema.OpenExchangeRatesOrgProvider$OpenExchangeRates.<init>(OpenExchangeRatesOrgProvider.java:189) at org.apache.solr.schema.OpenExchangeRatesOrgProvider.reload(OpenExchangeRatesOrgProvider.java:129) Can we fix the encoding of these input files to UTF-8 or something? According to JSON RFC: http://tools.ietf.org/html/rfc4627#section-3 JSON text SHALL be encoded in Unicode. The default encoding is UTF-8. Since the first two characters of a JSON text will always be ASCII characters [RFC0020], it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.
00 00 00 xx  UTF-32BE
00 xx 00 xx  UTF-16BE
xx 00 00 00  UTF-32LE
xx 00 xx 00  UTF-16LE
xx xx xx xx  UTF-8
We could just enforce/require UTF-8? Alternatively, auto-detect this from a binary stream as a custom Reader class. Dawid
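The "just require UTF-8" option from this thread amounts to passing an explicit charset to the reader instead of relying on file.encoding. A minimal sketch, assuming nothing about Solr's own helpers (the class and method names below are made up; the thread mentions IOUtils.UTF8_CHARSET, which is replaced here by a plain Charset lookup so the example stands alone):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

/** Hypothetical helper; shows an explicit charset instead of the platform default. */
final class Utf8Readers {
    private static final Charset UTF_8 = Charset.forName("UTF-8");

    /** Always decodes as UTF-8, regardless of the JVM's file.encoding setting. */
    static Reader open(InputStream in) {
        return new InputStreamReader(in, UTF_8);
    }

    /** Convenience for small inputs: reads the whole stream into a String. */
    static String readAll(InputStream in) throws IOException {
        Reader reader = open(in);
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }
}
```

In the constructor quoted above, the equivalent change would be to hand the JSON parser a reader created this way rather than new InputStreamReader(ratesStream).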
[jira] [Commented] (LUCENE-4196) Turn asserts in I/O related code into hard checks
[ https://issues.apache.org/jira/browse/LUCENE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407207#comment-13407207 ] Robert Muir commented on LUCENE-4196: That one is a good example of something we should watch out for. I think it's OK because it uses IndexInput.length, but we should make sure we don't directly turn asserts that use things like Directory.fileExists or Directory.fileLength into real checks; that could cause problems for NFS (LUCENE-3727).
Re: Question about solr config files encoding.
But JSON is defined to be UTF-8, so we must supply the encoding (IOUtils.UTF8_CHARSET). That RFC says it can be any Unicode encoding... That said, I agree with you that we can probably assume it's UTF-8 and not worry about anything else. Dawid
RE: Question about solr config files encoding.
3. Encoding JSON text SHALL be encoded in Unicode. The default encoding is UTF-8. Since the first two characters of a JSON text will always be ASCII characters [RFC0020], it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.
00 00 00 xx  UTF-32BE
00 xx 00 xx  UTF-16BE
xx 00 00 00  UTF-32LE
xx 00 xx 00  UTF-16LE
xx xx xx xx  UTF-8
:-) I think we can safely assume it is UTF-8; otherwise we must do the same shit as XML parsers, with mark() on a BufferedInputStream. Most libraries out there can only read UTF-8, and Solr itself produces only UTF-8 JSON, right? Those tests only check the response from Solr. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: dawid.we...@gmail.com [mailto:dawid.we...@gmail.com] On Behalf Of Dawid Weiss Sent: Thursday, July 05, 2012 5:35 PM To: dev@lucene.apache.org Subject: Re: Question about solr config files encoding. But JSON is defined to be UTF-8, so we must supply the encoding (IOUtils.UTF8_CHARSET). That RFC says it can be any Unicode encoding... That said, I agree with you that we can probably assume it's UTF-8 and not worry about anything else. Dawid
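For completeness, the auto-detection that the RFC describes (and that the thread decides not to bother with) could be sketched as follows. The class and method names are made up, and falling back to UTF-8 for inputs shorter than four bytes is an assumption of this sketch, not something RFC 4627 specifies.

```java
/** Hypothetical sketch of the RFC 4627 section 3 null-pattern check. */
final class JsonEncodingSniffer {
    /** Guesses the encoding name from the pattern of zero bytes in the first four octets. */
    static String detect(byte[] b) {
        if (b.length < 4) {
            return "UTF-8"; // assumption: too short to sniff, use the RFC's default encoding
        }
        boolean z0 = b[0] == 0, z1 = b[1] == 0, z2 = b[2] == 0, z3 = b[3] == 0;
        if (z0 && z1 && z2 && !z3) return "UTF-32BE";  // 00 00 00 xx
        if (z0 && !z1 && z2 && !z3) return "UTF-16BE"; // 00 xx 00 xx
        if (!z0 && z1 && z2 && z3) return "UTF-32LE";  // xx 00 00 00
        if (!z0 && z1 && !z2 && z3) return "UTF-16LE"; // xx 00 xx 00
        return "UTF-8";                                // xx xx xx xx
    }
}
```

A caller would still have to buffer the first four bytes and replay them, which is exactly the mark()/reset() dance on a BufferedInputStream mentioned above; requiring UTF-8 avoids all of that.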
Re: Question about solr config files encoding.
On Thu, Jul 5, 2012 at 10:59 AM, Dawid Weiss dawid.we...@gmail.com wrote: According to JSON RFC: http://tools.ietf.org/html/rfc4627#section-3 JSON text SHALL be encoded in Unicode. One of my little pet peeves with the RFC - I think this was a bad requirement. JSON should have been text, and then there should have been an optional way to detect the encoding if other mechanisms don't cover it (like HTTP headers, etc). This effectively means that something like [hi] is not valid JSON for many of you reading this email (if your email client is internally representing it as something other than Unicode-encoded text, for example). We could just enforce/require UTF-8? Yes, Solr has normally always required/assumed UTF-8 for config files. It's simply an oversight in any places that don't. -Yonik http://lucidimagination.com
RE: Question about solr config files encoding.
Let me just add: Solr's XML files are parsed according to the XML spec, so you can choose any charset; you only have to declare it according to the XML spec! Also, an XML POST to the updatehandler can be in any encoding (it does not need to be declared in the header anymore, the <?xml ...?> header is fine). There is already a test! I fixed all this in endless sessions, but I was happy to do it, as my favourite data format is: XML :-) [I refuse to fix this for DIH, but that's another story, SOLR-2347]. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Thursday, July 05, 2012 5:43 PM To: dev@lucene.apache.org Subject: Re: Question about solr config files encoding. On Thu, Jul 5, 2012 at 10:59 AM, Dawid Weiss dawid.we...@gmail.com wrote: According to JSON RFC: http://tools.ietf.org/html/rfc4627#section-3 JSON text SHALL be encoded in Unicode. One of my little pet peeves with the RFC - I think this was a bad requirement. JSON should have been text, and then there should have been an optional way to detect the encoding if other mechanisms don't cover it (like HTTP headers, etc). This effectively means that something like [hi] is not valid JSON for many of you reading this email (if your email client is internally representing it as something other than Unicode-encoded text, for example). We could just enforce/require UTF-8? Yes, Solr has normally always required/assumed UTF-8 for config files. It's simply an oversight in any places that don't. -Yonik http://lucidimagination.com
[jira] [Commented] (LUCENE-4188) Storing Shapes shouldn't be Strategy dependent
[ https://issues.apache.org/jira/browse/LUCENE-4188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407221#comment-13407221 ] David Smiley commented on LUCENE-4188: RE createStoredField(): "I don't really like this. It is barely an improvement on the current code. The whole point of this issue is that the storing of Shapes shouldn't be related to Strategys. I think we should be explicit and require that the consumer code (Solr or something else) decides how it wants to store Shapes. If you want a convenience method then it should be static, illustrating it is a utility that the Strategys cannot override. Ideally I would like it somewhere else entirely." The client doesn't have to use this method, but in all tests and the Solr adapters I don't think there's a reason not to. I found it to be useful, and to provide a place to document how it is recommended to store the shape (notice I even included the one-liner source in the javadocs). An advantage of it being an instance method on the Strategy is that it has convenient access to both the field name and the SpatialContext. I could make this method final, and I could add more documentation that makes it clear that the user is free to store the shape in any way they wish, since the spatial module doesn't care. Storing Shapes shouldn't be Strategy dependent Key: LUCENE-4188 URL: https://issues.apache.org/jira/browse/LUCENE-4188 Project: Lucene - Java Issue Type: Bug Components: modules/spatial Reporter: Chris Male Assignee: David Smiley Attachments: LUCENE-4188_remove_field_storage_from_createField.patch The logic for storing Shape representations seems to be different for each Strategy. The PrefixTreeStrategy impls store the Shape in WKT, which is nice if you're using WKT but not much help if you're not. BBoxStrategy doesn't actually store the Shape itself, but a representation of the bounding box. TwoDoubles seems to follow the PrefixTreeStrategy approach, which is surprising since it only indexes Points and they could be stored without using WKT. I think we need to consider what storing a Shape means. If we want to store the Shape itself, then that logic should be standardised and done outside of the Strategys, since it is not really related to them. If we want to store the terms being used by the Strategys to make Shapes queryable, then we need to change the logic in the Strategys to actually do this.
RE: Question about solr config files encoding.
"updatehandler can be any encoding (it does not need to be declared in header" ...in the HTTP header, that is, sorry. -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Thursday, July 05, 2012 5:43 PM To: dev@lucene.apache.org Subject: Re: Question about solr config files encoding. On Thu, Jul 5, 2012 at 10:59 AM, Dawid Weiss dawid.we...@gmail.com wrote: According to JSON RFC: http://tools.ietf.org/html/rfc4627#section-3 JSON text SHALL be encoded in Unicode. One of my little pet peeves with the RFC - I think this was a bad requirement. JSON should have been text, and then there should have been an optional way to detect the encoding if other mechanisms don't cover it (like HTTP headers, etc). This effectively means that something like [hi] is not valid JSON for many of you reading this email (if your email client is internally representing it as something other than Unicode-encoded text, for example). We could just enforce/require UTF-8? Yes, Solr has normally always required/assumed UTF-8 for config files. It's simply an oversight in any places that don't. -Yonik http://lucidimagination.com
Re: Question about solr config files encoding.
Sure, I don't have a problem with XML. I'll assume UTF-8 for JSON and go through the issues later today. Dawid On Thu, Jul 5, 2012 at 5:47 PM, Uwe Schindler u...@thetaphi.de wrote: Let me just add: Solr's XML files are parsed according to the XML spec, so you can choose any charset; you only have to declare it according to the XML spec! Also, an XML POST to the updatehandler can be in any encoding (it does not need to be declared in the header anymore, the <?xml ...?> header is fine). There is already a test! I fixed all this in endless sessions, but I was happy to do it, as my favourite data format is: XML :-) [I refuse to fix this for DIH, but that's another story, SOLR-2347]. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Thursday, July 05, 2012 5:43 PM To: dev@lucene.apache.org Subject: Re: Question about solr config files encoding. On Thu, Jul 5, 2012 at 10:59 AM, Dawid Weiss dawid.we...@gmail.com wrote: According to JSON RFC: http://tools.ietf.org/html/rfc4627#section-3 JSON text SHALL be encoded in Unicode. One of my little pet peeves with the RFC - I think this was a bad requirement. JSON should have been text, and then there should have been an optional way to detect the encoding if other mechanisms don't cover it (like HTTP headers, etc). This effectively means that something like [hi] is not valid JSON for many of you reading this email (if your email client is internally representing it as something other than Unicode-encoded text, for example). We could just enforce/require UTF-8? Yes, Solr has normally always required/assumed UTF-8 for config files. It's simply an oversight in any places that don't. -Yonik http://lucidimagination.com
[JENKINS] Lucene-Solr-4.x-Linux-Java8-64 - Build # 5 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux-Java8-64/5/ 1 tests failed. REGRESSION: org.apache.solr.handler.component.SpellCheckComponentTest.testPerDictionary Error Message: mismatch: '0'!='2' @ spellcheck/suggestions/bar/startOffset Stack Trace: java.lang.RuntimeException: mismatch: '0'!='2' @ spellcheck/suggestions/bar/startOffset at __randomizedtesting.SeedInfo.seed([9AA8B04990EA81A6:5DA644E1988694EE]:0) at org.apache.solr.SolrTestCaseJ4.assertJQ(SolrTestCaseJ4.java:547) at org.apache.solr.SolrTestCaseJ4.assertJQ(SolrTestCaseJ4.java:495) at org.apache.solr.handler.component.SpellCheckComponentTest.testPerDictionary(SpellCheckComponentTest.java:102) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:474) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1995) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:818) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:877) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:891) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:825) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:671) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:697) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:736) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551) Build Log: [...truncated 8725 lines...] BUILD FAILED /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux-Java8-64/checkout/build.xml:29: The following error occurred while
[jira] [Updated] (SOLR-3355) Add shard name to SolrCore statistics
[ https://issues.apache.org/jira/browse/SOLR-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-3355: -- Fix Version/s: 5.0 I've added collection as well and wrote a couple tests for this - I'll commit shortly. Add shard name to SolrCore statistics - Key: SOLR-3355 URL: https://issues.apache.org/jira/browse/SOLR-3355 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Michael Garski Assignee: Mark Miller Priority: Trivial Fix For: 4.0, 5.0 Attachments: SOLR-3355.patch The JMX stats of the core do not expose the shard name that it is hosting, which could be of use. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3355) Add shard name to SolrCore statistics
[ https://issues.apache.org/jira/browse/SOLR-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-3355: -- Attachment: SOLR-3355.patch Add shard name to SolrCore statistics - Key: SOLR-3355 URL: https://issues.apache.org/jira/browse/SOLR-3355 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Michael Garski Assignee: Mark Miller Priority: Trivial Fix For: 4.0, 5.0 Attachments: SOLR-3355.patch, SOLR-3355.patch The JMX stats of the core do not expose the shard name that it is hosting, which could be of use. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-4.x - Build # 207 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-4.x/207/ 3 tests failed. REGRESSION: org.apache.solr.handler.dataimport.TestSqlEntityProcessorDelta3.testCompositePk_DeltaImport_delete Error Message: Exception during query Stack Trace: java.lang.RuntimeException: Exception during query at __randomizedtesting.SeedInfo.seed([2AB24089FB28443:2396D216FACFAC2F]:0) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:461) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:428) at org.apache.solr.handler.dataimport.TestSqlEntityProcessorDelta3.testCompositePk_DeltaImport_delete(TestSqlEntityProcessorDelta3.java:111) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1995) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:818) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:877) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:891) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:825) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:671) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:697) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:736) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551) Caused by: java.lang.RuntimeException: REQUEST FAILED: xpath=//*[@numFound='0'] xml response was: ?xml version=1.0 encoding=UTF-8? response lst name=responseHeaderint name=status0/intint
[jira] [Created] (LUCENE-4197) Small improvements to Lucene Spatial Module for v4
David Smiley created LUCENE-4197: Summary: Small improvements to Lucene Spatial Module for v4 Key: LUCENE-4197 URL: https://issues.apache.org/jira/browse/LUCENE-4197 Project: Lucene - Java Issue Type: Improvement Components: modules/spatial Reporter: David Smiley Fix For: 4.0 This issue is to capture small changes to the Lucene spatial module that don't deserve their own issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4197) Small improvements to Lucene Spatial Module for v4
[ https://issues.apache.org/jira/browse/LUCENE-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4197: - Attachment: LUCENE-4197_SpatialArgs_doesn_t_need_overloaded_toString()_with_a_ctx_param_.patch SpatialArgs.toString() shouldn't be overloaded with a ctx -- not needed for its purpose. Nobody was calling it any way. What instigated this finding was that this class depended on SimpleSpatialContext, gone in 0.3-SNAPSHOT of Spatial4j. Small improvements to Lucene Spatial Module for v4 -- Key: LUCENE-4197 URL: https://issues.apache.org/jira/browse/LUCENE-4197 Project: Lucene - Java Issue Type: Improvement Components: modules/spatial Reporter: David Smiley Fix For: 4.0 Attachments: LUCENE-4197_SpatialArgs_doesn_t_need_overloaded_toString()_with_a_ctx_param_.patch This issue is to capture small changes to the Lucene spatial module that don't deserve their own issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407365#comment-13407365 ]

Robert Muir commented on LUCENE-4190:

{quote}
I think that the way to bound the namespace of files is to put everything in a subdirectory of the index directory chosen by the user and control the name of that subdirectory, making it clear that this is semi-private to Lucene and that all files in that subdirectory are fair game.
{quote}

Well, there are a couple of challenges with that, I think:

1. Subdirectories are currently a foreign concept to Directory; we would have to make some serious changes there to support them.
2. Lucene 3.x and Lucene4-alpha indexes still need to be supported, and we don't want to leave behind baggage when we merge, so the transition would be tricky.
3. The user could also do this on their own, right? That is, we would still have the same situation we have currently, where anything in that directory can get deleted by Lucene; it's just underneath another layer.

IndexWriter deletes non-Lucene files

Key: LUCENE-4190
URL: https://issues.apache.org/jira/browse/LUCENE-4190
Project: Lucene - Java
Issue Type: Bug
Reporter: Michael McCandless
Assignee: Robert Muir
Fix For: 4.0, 5.0
Attachments: LUCENE-4190.patch, LUCENE-4190.patch, LUCENE-4190.patch

Carl Austin raised a good issue in a comment on my Lucene 4.0.0 alpha blog post: http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html

IndexWriter will now (as of 4.0) delete all foreign files from the index directory. We made this change because Codecs are free to write to any files now, so the space of filenames is hard to bound. But if the user accidentally uses the wrong directory (e.g. c:/) then we will in fact delete important stuff.

I think we can at least use some simple criteria (must start with _, maybe must fit a certain pattern, e.g. _base36(_X).Y), so we are much less likely to delete a non-Lucene file.
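The filename criterion suggested in the issue description could look roughly like the sketch below. The class name, the method name, and the exact regular expression are illustrative assumptions, not the check that IndexWriter performs; the sketch only encodes the rule of thumb from the description (leading underscore, base-36 segment name, optional suffix, extension).

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Lucene: decides whether a file name looks Lucene-owned. */
final class LuceneFileNameGuard {
    // Assumed reading of "_base36(_X).Y": underscore, base-36 digits,
    // an optional "_suffix", then a dot and an extension.
    private static final Pattern SEGMENT_FILE =
            Pattern.compile("_[0-9a-z]+(_[0-9A-Za-z_]+)?\\.[0-9A-Za-z]+");

    /** Returns true only if the whole name matches the assumed segment-file pattern. */
    public static boolean looksLikeLuceneFile(String name) {
        return SEGMENT_FILE.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] names = {"_0.cfs", "_a1_Lucene40_0.frq", "boot.ini", "notes.txt"};
        for (String name : names) {
            // Only names accepted here would be candidates for deletion.
            System.out.println(name + " -> " + looksLikeLuceneFile(name));
        }
    }
}
```

A deletion pass would then skip any file for which looksLikeLuceneFile returns false. Note that this sketch also rejects Lucene's own segments_N files, so a real check would have to allow those separately.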
[jira] [Resolved] (SOLR-3593) add /solr/api/index.html
[ https://issues.apache.org/jira/browse/SOLR-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man resolved SOLR-3593.
Resolution: Fixed

FWIW: i went ahead and used the documentation.html name after all because i realized it kept the page editing simpler. it was easy to deal with the legacy /solr/api, /solr/api/ and /solr/api/index.html type top level links using redirects...

add /solr/api/index.html

Key: SOLR-3593
URL: https://issues.apache.org/jira/browse/SOLR-3593
Project: Solr
Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man

solr historically only had one version of the javadocs on the site at a time. particularly now that we have 3.6.X and 4.X concurrently, this needs to change. both sets of javadoc are already on the site, and /solr/tutorial.html already links to both versions appropriately but there are still some improvements that should be made...

* add a /solr/api/index.html file that mirrors the type of info listed on /core/documentation.html
** we could use the same documentation.html name, but since historically lots of people have bookmarked/linked to /solr/api reusing that path as the landing page for finding docs about multiple versions seems better
** making this visible will probably mean needing to dial in the existing /solr/api redirect more
* update the Javadocs link in the right nav to link to this page
[jira] [Commented] (LUCENE-4191) Lucene doc pages redirect to api-4_0_0-ALPHA which results in 404
[ https://issues.apache.org/jira/browse/LUCENE-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407386#comment-13407386 ]

Hoss Man commented on LUCENE-4191:

BaseTokenFilterFactory no longer exists in the latest version of Solr (most of the Factory concepts were refactored up into the Lucene-Core analysis-common module) and Google has not yet updated its crawl of solr javadocs. Solr 3.6 javadocs are still available, or you can follow links from the Solr 4.0-ALPHA javadocs over to the Lucene-Core javadocs for classes like TokenFilterFactory and AbstractAnalysisFactory ... http://lucene.apache.org/solr/documentation.html

Lucene doc pages redirect to api-4_0_0-ALPHA which results in 404

Key: LUCENE-4191
URL: https://issues.apache.org/jira/browse/LUCENE-4191
Project: Lucene - Java
Issue Type: Bug
Affects Versions: 3.6
Reporter: Chaim Peck
Labels: documentation

Try to go to this URL: http://lucene.apache.org/solr/api/org/apache/solr/analysis/BaseTokenFilterFactory.html

The result is that you will be redirected here, which is a 404: http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/analysis/BaseTokenFilterFactory.html

You can still get to the page from google cache: http://webcache.googleusercontent.com/search?q=cache:mCJCac4iZ0QJ:lucene.apache.org/solr/api/org/apache/solr/analysis/BaseTokenFilterFactory.html+cd=1hl=enct=clnkgl=us
RE: jira tracking of issues fixed in 4.0-ALPHA
I've seen no comments on this: if anyone objects please speak up or i'll move forward with this soon.

: : I think for this case, the much easier fix would be to rename the 4.0
: : version to 4.0-alpha and create a new 4.0 one. All not yet fixed would
: : get this new version as fix version.
:
: Doh! ... why didn't i think of that?
:
: Anybody object to this sequence?
:
: In Jira project admin for both SOLR and LUCENE...
: 1) delete version 4.0-ALPHA (it has no issues yet in either project)
: 2) rename version 4.0 to 4.0-ALPHA
: 3) add a new version 4.0
:
: if/when we get to 4.0-BETA we can do the same thing

-Hoss
Re: Solr 3.6.0 javadocs are missing from the site
: thats the whole problem with /api: its not defined at all.

it has been very clearly defined since it was created: the latest javadocs ... just because we no longer explicitly link to it, doesn't mean we should stop trying to live up to the point of the link -- especially not when it's so fucking easy to do.

: having shit like this just turns into 'lets blame the release manager
: when things change and its not the way i want'.

where do you get that anyone is going to blame release managers for something? having this redirect isn't going to break anything, nor does it have anything to do with anything an RM should give a fuck about.

the only reason we had a hiccup with it on tuesday was because that was the day we made the change from only hosting a single copy of the solr javadocs, re-using a single path for each new version, to having multiple versions with distinct paths -- and when we made that change we did *NOT* have any redirect like this in place at all.

that change could have been made at any time, regardless of whether it involved a new release, regardless of whether it was done by an RM, and the problem would have been the same: the missing redirect meant old links broke. that was a one time change, that will never affect any other release in the future ever again: we just keep adding new directories for the new docs. having this redirect doesn't affect that in any way shape or form

: same goes for download redirect links (I will open an issue tomorrow:
: either we remove these download redirect links completely, or we fix
: them to take versions, because having to add ?'s with bogus stuff on

I already opened an issue for that when we noticed this during 3.6 .. no one who cares about the google analytics and understands javascript has bothered to pick it up...

https://issues.apache.org/jira/browse/LUCENE-3978

-Hoss
RE: jira tracking of issues fixed in 4.0-ALPHA
+1, as it was my idea :)

--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de

Chris Hostetter hossman_luc...@fucit.org wrote:

I've seen no comments on this: if anyone objects please speak up or i'll move forward with this soon.

: : I think for this case, the much easier fix would be to rename the 4.0
: : version to 4.0-alpha and create a new 4.0 one. All not yet fixed would
: : get this new version as fix version.
:
: Doh! ... why didn't i think of that?
:
: Anybody object to this sequence?
:
: In Jira project admin for both SOLR and LUCENE...
: 1) delete version 4.0-ALPHA (it has no issues yet in either project)
: 2) rename version 4.0 to 4.0-ALPHA
: 3) add a new version 4.0
:
: if/when we get to 4.0-BETA we can do the same thing

-Hoss
[jira] [Commented] (SOLR-3594) SolrCore() doesn't wait SolrCore.getSearcher() to register _searcher
[ https://issues.apache.org/jira/browse/SOLR-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407416#comment-13407416 ]

Hoss Man commented on SOLR-3594:

bq. The real question here is unrelated to tests: should the SolrCore constructor wait for a searcher to be registered before returning?

i don't think so. Just because a searcher isn't available yet, doesn't mean the SolrCore is unusable - we shouldn't block other uses of the SolrCore just because a searcher isn't available yet. the first thread that attempts to use getSearcher() is what should block on the listeners (depending on the setting of useColdSearcher)

The test failure suggests to me that something is wonky with how we are tracking the searcher opens and doing cleanup -- either in SolrCore.close() or in the test framework itself.

SolrCore() doesn't wait SolrCore.getSearcher() to register _searcher

Key: SOLR-3594
URL: https://issues.apache.org/jira/browse/SOLR-3594
Project: Solr
Issue Type: Bug
Components: clients - java
Affects Versions: 3.4
Reporter: Egor Pahomov
Priority: Minor
Labels: test
Attachments: 3594.patch, testSearchersManagement.patch
Original Estimate: 1h
Remaining Estimate: 1h

SolrCore() executes SolrCore.getSearcher(...) and returns without checking whether getSearcher(...) has already registered _searcher. As a result, if we have a SolrEventListener with a slow newSearcher(), a test can end before _searcher is registered, and the counts of searcher opens and searcher closes then don't match.
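The behaviour argued for in the comment (the constructor returns at once, and only the first caller that actually needs a searcher blocks) can be sketched as follows. This is a simplified illustration and not SolrCore's code: LazySearcherCore, isSearcherReady, and the String standing in for a searcher are all hypothetical, and useColdSearcher is not modelled.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a core; a String plays the role of the searcher. */
final class LazySearcherCore implements AutoCloseable {
    private final ExecutorService warmer = Executors.newSingleThreadExecutor();
    private final Future<String> searcher;

    /** Returns immediately; the (possibly slow) open runs on a background thread. */
    public LazySearcherCore(Callable<String> openSearcher) {
        this.searcher = warmer.submit(openSearcher);
    }

    /** Non-blocking: true once the background open has finished, successfully or not. */
    public boolean isSearcherReady() {
        return searcher.isDone();
    }

    /** Blocks the calling thread until the searcher has been opened. */
    public String getSearcher() {
        try {
            return searcher.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for searcher", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("searcher failed to open", e.getCause());
        }
    }

    /** Waits for a pending open before shutting down, so opens and closes stay balanced. */
    @Override
    public void close() {
        warmer.shutdown(); // the already-submitted open still runs to completion
        try {
            warmer.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        try (LazySearcherCore core = new LazySearcherCore(() -> {
            Thread.sleep(200); // simulate a slow newSearcher() listener
            return "searcher-1";
        })) {
            System.out.println("constructed, ready=" + core.isSearcherReady());
            System.out.println("got " + core.getSearcher());
        }
    }
}
```

In this arrangement a test that finishes early still sees a matching open and close, because close() waits for the pending open instead of racing it.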
[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407459#comment-13407459 ] Trey Grainger commented on SOLR-2894: - Hi Erik, Sorry, I missed your original message asking me if I could test out the latest patch - I'd be happy to help. I just tried both your patch and the April 25th patch against the Solr 4.0 Alpha revision and neither applied immediately. I'll see if I can find some time on Sunday to try to get a revision sorted out which will work with the current version. I think there are some changes in the April 24th patch which may need to be re-applied if your changes were based upon the earlier patch. I'll know more once I've had a chance to dig in later this weekend. Thanks, -Trey Implement distributed pivot faceting Key: SOLR-2894 URL: https://issues.apache.org/jira/browse/SOLR-2894 Project: Solr Issue Type: Improvement Reporter: Erik Hatcher Assignee: Erik Hatcher Fix For: 4.0 Attachments: SOLR-2894.patch, SOLR-2894.patch, distributed_pivot.patch, distributed_pivot.patch Following up on SOLR-792, pivot faceting currently only supports undistributed mode. Distributed pivot faceting needs to be implemented. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-4198) Allow codecs to index term impacts
Robert Muir created LUCENE-4198:

Summary: Allow codecs to index term impacts
Key: LUCENE-4198
URL: https://issues.apache.org/jira/browse/LUCENE-4198
Project: Lucene - Java
Issue Type: Sub-task
Components: core/index
Reporter: Robert Muir

Subtask of LUCENE-4100. That's an example of something similar to impact indexing (though his implementation currently stores a max for the entire term, the problem is the same). We can imagine other similar algorithms too: I think the codec API should be able to support these.

Currently it really doesn't: Stefan worked around the problem by providing a tool to 'rewrite' your index; he passes the IndexReader and Similarity to it. But it would be better if we fixed the codec API.

One problem is that the Postings writer needs to have access to the Similarity. Another problem is that it needs access to the term and collection statistics up front, rather than after the fact. This might have some cost (hopefully minimal), so I'm thinking to experiment in a branch with these changes and see if we can make it work well.
[jira] [Updated] (LUCENE-4198) Allow codecs to index term impacts
[ https://issues.apache.org/jira/browse/LUCENE-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-4198:
Attachment: LUCENE-4198_flush.patch

Here's a patch fixing how we compute stats in FreqProxTermsWriter, but the codec API is unchanged. Next I'll look at merge, which is trickier, and then see about changing the codec API.

Allow codecs to index term impacts

Key: LUCENE-4198
URL: https://issues.apache.org/jira/browse/LUCENE-4198
Project: Lucene - Java
Issue Type: Sub-task
Components: core/index
Reporter: Robert Muir
Attachments: LUCENE-4198_flush.patch

Subtask of LUCENE-4100. That's an example of something similar to impact indexing (though his implementation currently stores a max for the entire term, the problem is the same). We can imagine other similar algorithms too: I think the codec API should be able to support these.

Currently it really doesn't: Stefan worked around the problem by providing a tool to 'rewrite' your index; he passes the IndexReader and Similarity to it. But it would be better if we fixed the codec API.

One problem is that the Postings writer needs to have access to the Similarity. Another problem is that it needs access to the term and collection statistics up front, rather than after the fact. This might have some cost (hopefully minimal), so I'm thinking to experiment in a branch with these changes and see if we can make it work well.
[jira] [Commented] (LUCENE-4100) Maxscore - Efficient Scoring
[ https://issues.apache.org/jira/browse/LUCENE-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407472#comment-13407472 ]

Robert Muir commented on LUCENE-4100:

I spun off a sub-issue (LUCENE-4198) to see how we can first fix this Codec API so that you don't need an IndexRewriter and this patch could work live.

Maxscore - Efficient Scoring

Key: LUCENE-4100
URL: https://issues.apache.org/jira/browse/LUCENE-4100
Project: Lucene - Java
Issue Type: Improvement
Components: core/codecs, core/query/scoring, core/search
Affects Versions: 4.0
Reporter: Stefan Pohl
Labels: api-change, patch, performance
Fix For: 4.0
Attachments: contrib_maxscore.tgz, maxscore.patch

At Berlin Buzzwords 2012, I will be presenting 'maxscore', an efficient algorithm first published in the IR domain in 1995 by H. Turtle & J. Flood, which I find deserves more attention among Lucene users (and developers). I implemented a proof of concept and did some performance measurements with example queries and lucenebench, the package of Mike McCandless, resulting in very significant speedups.

This ticket is meant to start the discussion on including the implementation in Lucene's codebase. Because the technique requires awareness of it from the Lucene user/developer, it seems best for it to become a contrib/module package so that it can be chosen consciously.
Multi-thread UpdateProcessor
Hello,

In most cases where single-threaded streaming (http://wiki.apache.org/solr/Solrj#Streaming_documents_for_an_update) is used, I have seen low CPU utilization on the Solr server. The natural response is to use more threads to index faster, but that requires a more complicated client. I propose a special update processor that forks the stream processing onto many threads. If you like the idea, please vote for https://issues.apache.org/jira/browse/SOLR-3585 .

Regards
--
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
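The idea of forking one incoming document stream onto several worker threads can be illustrated with plain java.util.concurrent, independent of Solr. Everything named below (ForkingUpdateSketch, forkStream, the Consumer standing in for the downstream indexing step) is a hypothetical stand-in and not the processor proposed in SOLR-3585.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical sketch: one thread reads the stream, a fixed pool does the per-document work. */
final class ForkingUpdateSketch {
    public static <T> void forkStream(Iterator<T> docs, int threads, Consumer<T> indexer) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            while (docs.hasNext()) {
                T doc = docs.next(); // the stream is still consumed by a single thread
                pending.add(pool.submit(() -> indexer.accept(doc)));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for every document and surface worker failures
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        AtomicInteger indexed = new AtomicInteger();
        List<String> docs = Arrays.asList("doc1", "doc2", "doc3", "doc4");
        forkStream(docs.iterator(), 2, doc -> indexed.incrementAndGet());
        System.out.println("indexed " + indexed.get() + " documents");
    }
}
```

Two trade-offs are visible even in this small form: documents are no longer processed in stream order, and the executor's queue is unbounded, so a real processor would likely need back-pressure on the reading side.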