[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings
[ https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839187#action_12839187 ] Uwe Schindler commented on LUCENE-2285: --- Hi Shai, I applied the patch to a fresh checkout and get no compile errors. Are you sure that the patch applied correctly? I am working on Windows, so if you are not using a patch-apply tool like TortoiseSVN that accepts Windows line endings, you may have to run dos2unix on the patch first. All tests pass here with no compile errors, also in the demo webapp and so on (using Ant). > Code cleanup from all sorts of (trivial) warnings > - > > Key: LUCENE-2285 > URL: https://issues.apache.org/jira/browse/LUCENE-2285 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Shai Erera >Assignee: Uwe Schindler >Priority: Minor > Fix For: 3.1 > > Attachments: LUCENE-2285.patch, LUCENE-2285.patch, LUCENE-2285.patch > > > I would like to do some code cleanup and remove all sorts of trivial > warnings, like unnecessary casts, problems w/ javadocs, unused variables, > redundant null checks, unnecessary semicolon etc. These are all very trivial > and should not pose any problem. > I'll create another issue for getting rid of deprecated code usage, like > LuceneTestCase and all sorts of deprecated constructors. That's also trivial > because it only affects Lucene code, but it's a different type of change. > Another issue I'd like to create is about introducing more generics in the > code, where it's missing today - not changing existing API. There are many > places in the code like that. > So, with your permission, I'll start with the trivial ones first, and then > move on to the others. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Issue Comment Edited: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings
[ https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839187#action_12839187 ] Uwe Schindler edited comment on LUCENE-2285 at 2/27/10 8:08 AM: Hi Shai, I applied the patch to a fresh checkout and get no compile errors. Are you sure that the patch applied correctly? I am working on Windows, so if you are not using a patch-apply tool like TortoiseSVN that accepts Windows line endings, you may have to run dos2unix on the patch first. And don't forget to refresh your package in Eclipse (press F5). I have hit this type of error very often because Eclipse caches the sources. All tests pass here with no compile errors, also in the demo webapp and so on (using Ant).
[jira] Commented: (LUCENE-2037) Allow Junit4 tests in our environment.
[ https://issues.apache.org/jira/browse/LUCENE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839191#action_12839191 ] Uwe Schindler commented on LUCENE-2037: --- Erick, that's already fixed in trunk with my last commit, as you noticed! It does exactly what rules.TestName does :-) -- I found this class later too and realized that it does the same -- only that Lucene has the method in the base class for a better migration experience :-). Yesterday I also wrote an extra test assertion that verifies that the ported test class has all methods starting with "test" annotated with @Test. Robert and I may want to apply this patch during the migration phase, while developers are not yet used to JUnit4 and forget to add @Test. The patch is very rough and could maybe be optimized (if @BeforeClass could be used, but it cannot, as it is static). The String ctors are not used in Lucene, as the testName in Lucene should be set automatically from the current method. The additional ctors in Lucene's tests were only very very very old JUnit3 relics (later versions of JUnit3 do not need them anymore either; they set the test name after instantiating). > Allow Junit4 tests in our environment. > -- > > Key: LUCENE-2037 > URL: https://issues.apache.org/jira/browse/LUCENE-2037 > Project: Lucene - Java > Issue Type: Improvement > Components: Other >Affects Versions: 3.1 > Environment: Development >Reporter: Erick Erickson >Assignee: Michael McCandless >Priority: Minor > Fix For: 3.1 > > Attachments: junit-4.7.jar, LUCENE-2037-getName.patch, > LUCENE-2037.patch, LUCENE-2037.patch, LUCENE-2037.patch, > LUCENE-2037_remove_testwatchman.patch, LUCENE-2037_revised_2.patch > > Original Estimate: 8h > Remaining Estimate: 8h > > Now that we're dropping Java 1.4 compatibility for 3.0, we can incorporate > Junit4 in testing. Junit3 and junit4 tests can coexist, so no tests should > have to be rewritten. We should start this for the 3.1 release so we can get > a clean 3.0 out smoothly. > It's probably worthwhile to convert a small set of tests as an exemplar.
[jira] Issue Comment Edited: (LUCENE-2037) Allow Junit4 tests in our environment.
[ https://issues.apache.org/jira/browse/LUCENE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839191#action_12839191 ] Uwe Schindler edited comment on LUCENE-2037 at 2/27/10 8:16 AM: Erick, that's already fixed in trunk with my last commit, as you noticed! It does exactly what rules.TestName does :-) -- I found this class later too and realized that it does the same -- only that Lucene has the method in the base class for a better migration experience :-). Yesterday I also wrote an extra test assertion that verifies that the ported test class has all methods starting with "test" annotated with @Test. Robert and I may want to apply this patch during the migration phase, while developers are not yet used to JUnit4 and forget to add @Test. The patch is very rough and could maybe be optimized (if @BeforeClass could be used, but it cannot, as it is static). The String ctors are not used in Lucene, as the testName in Lucene should be set automatically from the current method. The additional ctors in Lucene's tests were only very very very old JUnit3 relics (later versions of JUnit3 do not need them anymore either; they set the test name after instantiating).
[jira] Updated: (LUCENE-2037) Allow Junit4 tests in our environment.
[ https://issues.apache.org/jira/browse/LUCENE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2037: -- Attachment: LUCENE-2037-LegacyChecker.patch Here is the patch for the additional assertion that checks whether all ported test classes have @Test added to all methods starting with "test".
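A check like the one described in this comment could be sketched with plain reflection. This is only an illustration, not the code in LUCENE-2037-LegacyChecker.patch: `LegacyChecker`, `findUnannotatedTests`, `PortedExample` and the stand-in annotation `MarkedTest` (used instead of `org.junit.Test` so the block compiles without JUnit on the classpath) are all hypothetical names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-in for org.junit.Test (assumption: the real check would look for that annotation).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface MarkedTest {}

// Hypothetical helper; not taken from the attached patch.
final class LegacyChecker {
  /** Returns the sorted names of declared methods that start with "test" but lack the annotation. */
  static List<String> findUnannotatedTests(Class<?> clazz) {
    List<String> missing = new ArrayList<String>();
    for (Method m : clazz.getDeclaredMethods()) {
      if (m.isSynthetic()) {
        continue; // ignore compiler- or tool-generated methods
      }
      if (m.getName().startsWith("test") && !m.isAnnotationPresent(MarkedTest.class)) {
        missing.add(m.getName());
      }
    }
    Collections.sort(missing);
    return missing;
  }
}

// Example "ported" test class in which one method lost its annotation.
class PortedExample {
  @MarkedTest public void testAnnotated() {}
  public void testForgotten() {}
  public void helper() {}
}
```

A base class could call `findUnannotatedTests(getClass())` once per test class and fail if the returned list is not empty; for `PortedExample` the list contains only `testForgotten`.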
[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings
[ https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839197#action_12839197 ] Uwe Schindler commented on LUCENE-2285: --- Maybe the patch is also outdated; it goes against revision 916685. Maybe you can downgrade your checkout using "svn up -r916685", apply the patch, and upgrade again. I use TortoiseSVN's TortoiseMerge patch tool, which is more intelligent and also applies very old patches without problems. It works like the new svn patch in svn 1.7x trunk: it uses the revision numbers in the patch's file headers, automatically fetches the requested version, patches it, and then updates it to the version of your checkout. That way it makes use of svn's standard update tools, and patches always apply without any moved-hunk problems and so on.
[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings
[ https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839202#action_12839202 ] Shai Erera commented on LUCENE-2285: Ok I see the problem now - because there are so many files, I didn't notice while applying the patch that there were errors (mismatches) on some files, so the patch wasn't applied to them, hence the compile errors. I apply the patch w/ Eclipse. The problematic files are tests in the analyzers package, and I suspect it's an encoding issue. My source encoding is UTF-8, yet when I apply the patch I see different characters in the source and the patch file. Not sure where the problem is. The patch file which I downloaded is UTF-8 as well, and TestArabicAnalyzer (for example) contains the correct Arabic characters (in both the patch file and my checkout copy). Yet when I apply the patch, Eclipse doesn't read it as UTF-8 ... Uwe, how about if we do this issue in multiple commits? I.e., commit what you've done so far (which is what we agree on), then I can update, review the remaining warnings and we continue from there? Anyway, after you commit this there will still be warnings, and over time more warnings will creep in, so we'll need to do some cleanup again :). Is that acceptable?
[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings
[ https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839203#action_12839203 ] Shai Erera commented on LUCENE-2285: Strange ... is something wrong w/ Eclipse and how it reads the patch file? I tried to apply the patch which I created originally (the copy on my computer, not the one downloaded from JIRA) and it fails on the same files ... any ideas?
[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings
[ https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839208#action_12839208 ] Shai Erera commented on LUCENE-2285: Googling around I see it's a known problem in Eclipse w/ no solution yet (at least I haven't come across one). Uwe - can you proceed w/ the commit like I suggested and then we review the remaining warnings?
[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings
[ https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839211#action_12839211 ] Uwe Schindler commented on LUCENE-2285: ---
bq. Strange ... something's wrong w/ eclipse and how it read the patch file? I tried to apply the patch which I created originally (and was on my computer, not downloaded from JIRA) and it fails on the same files ... any ideas?
No, sorry. I don't use Eclipse at all, only for some refactoring. I will commit the patch at the weekend and then update a second svn tree with your old patch applied. "svn up" then makes it possible to provide a patch with the left-out changes that we don't want to apply (some casts, sorry, and we won't fix them). I'll just say it again: it compiles without any warnings using javac; we cannot support stupid warnings of other IDEs during our development, as Eclipse is no official requirement. So I still strongly suggest disabling some of the warnings already mentioned.
[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings
[ https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839212#action_12839212 ] Shai Erera commented on LUCENE-2285: Ok, perhaps you misunderstood me. I suggested to commit *your* version of the patch, and then afterwards we can discuss the remaining warnings that are controversial. We both agree on your version. We disagree on the diff, right? So let's start w/ yours, and then we can continue arguing about the rest :).
[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings
[ https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839216#action_12839216 ] Uwe Schindler commented on LUCENE-2285: --- I understood that, I just wanted to say that some warnings will reappear with my patch, because I removed lots of generated code. That's all I wanted to say :-) Sorry, I don't want to nitpick - this issue is somehow about different opinions on code style and warnings - e.g. I totally agree with renaming private vars that hide protected ones and so on. I also want to fix generics (I am the "official generics police" *g*). But I simply want something in the code that explains the code better and prevents *future* errors, even if it is one cast too many. :-) I also applied the unused variable fixes, although in tests I think it's better to just add a "fake" local variable and place a @SuppressWarnings("unchecked") before it (you cannot apply this annotation to simple statements). So your compiler complains about unused variables - but how to fix that without placing the SuppressWarnings before the method? Which is bad, as I want to place it before only one line of code. In TestVirtualMethod, I fixed this by splitting the bigger test method into two smaller ones, only one of them with SuppressWarnings. But I still prefer to assign to an unused local var to be able to place the annotation before it (maybe it's a bug in Java 5 that you cannot add annotations to single statements). Maybe do some "fake" operation on the variable, like an assertion, to mark it "used" *g*.
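The point about annotation placement can be illustrated with a small sketch: Java accepts `@SuppressWarnings` on a local variable declaration, which limits the suppression to that one line, whereas a bare expression statement cannot carry an annotation. The class and method names below (`SuppressScopeDemo`, `legacyValue`, `sizeOfLegacyValue`) are hypothetical and not taken from the patch or from TestVirtualMethod.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; it only illustrates where the annotation may be placed.
final class SuppressScopeDemo {

  // Stands in for an untyped legacy API that hands back a plain Object.
  static Object legacyValue() {
    List<String> list = new ArrayList<String>();
    list.add("a");
    return list;
  }

  static int sizeOfLegacyValue() {
    // The annotation is attached to the declaration, so only the unchecked cast
    // on this line is suppressed; the rest of the method is still checked.
    @SuppressWarnings("unchecked")
    List<String> typed = (List<String>) legacyValue();

    // Using the variable afterwards also keeps "unused variable" warnings away.
    return typed.size();
  }
}
```

The alternative mentioned in the comment, annotating the whole method, would compile as well but would hide every unchecked warning in that method.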
[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings
[ https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839219#action_12839219 ] Shai Erera commented on LUCENE-2285: Ok then we understand each other. Indeed I have a different opinion about casts. They are called unnecessary for a reason. When a method declares it returns an int, there is no reason to cast a char to an int. The compiler will do it for you. More than that, if the method will declare it returns a long in the future, the cast will generate a compile error. Styling like that always generates lots of opinions :). We shouldn't however limit ourselves to only two opinions. The fact that you are not using Eclipse, and therefore don't see all the warnings, doesn't mean the rest of us who do use Eclipse should see them. If they are justified then ok, but otherwise, saying 'javac does not complain' is just not enough for me. Eclipse is where I spend a large portion of my day ... So I suggest we get more opinions from others. It's not just about what you or I think ... but we can do this after the larger portion of the warnings is removed.
[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings
[ https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839224#action_12839224 ] Uwe Schindler commented on LUCENE-2285: --- bq. When a method declares it returns an int, there no reason to cast a char to an int. The compiler will do it for you. I aggree with that, those casts were left in the patch, no problem at all. For me it is a problem esp. in some file handling methods that use longs, but sometimes also use ints (e.g. when reading a block of data). One example: MMapDirectory had a lot of problems with integer overflows in the past. The problems occurred because some formulas were simply not using casts at all, so the compiler did what was in the specs, but which is not always easy to see. So if you explicitely add a cast to (long) in your formula you are fine and really nobody gets hurt. An everybody understands whats happening. The Sun Code formatting guidelines explicitely says that, that you should use integer casts, if it is for more clarity. And if you dont like the warnings, then switch them off for only lucene in eclipse (you dont need to do that globally). I dont agree with using char inside a method when calling other functions without casting to int. E.g. we have some backwards compatibility layer for Unicode 4 that uses Character.toUpperCase(int) and Character.toUpperCase(char) in parallel. And these two methods differ, so i explicitely cast to be sure, which method is called (that was especially important (for Lucene 2.9) in Java 1.4 when compiled with Java 5 - because the javac could use the wrong method without casting to char (even with -source 1.4 -target 1.4) etc. For easy merging to 2.9 (as it is still supported), I want to keep the casts. Thats all. If you like, add some @SuppressWarnings("foobar") to it if you would be happy. bq. More than that, if the method will declare it returns a long in the future, the cast will generate a compile error. 
Changing return values of public methods will never happen. And if somebody would do this by accident, the cast helps to find the error :-) -- thats only a funny addition.
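The two points Uwe makes about casts can be illustrated with a small sketch. This is not code from Lucene or from the patch; the class and method names (CastDemo, offsetWithoutCast, offsetWithCast) are made up for illustration. It only shows, under those assumptions, how an int multiplication wraps before it is widened to long, and how a cast selects between the Character.toUpperCase(char) and Character.toUpperCase(int) overloads.

```java
// Hypothetical demo class; the names are illustrative and not taken from Lucene.
final class CastDemo {

    /** The product is computed in 32-bit int arithmetic and only then widened to long. */
    static long offsetWithoutCast(int blockIndex, int blockSize) {
        return blockIndex * blockSize; // may wrap around before the widening happens
    }

    /** The explicit cast widens the first operand, so the product is computed in 64 bits. */
    static long offsetWithCast(int blockIndex, int blockSize) {
        return (long) blockIndex * blockSize;
    }

    public static void main(String[] args) {
        int blockSize = 1 << 30; // 1 GiB blocks, chosen only to force an overflow
        System.out.println(offsetWithoutCast(3, blockSize)); // wrapped, negative value
        System.out.println(offsetWithCast(3, blockSize));    // 3221225472

        char c = 'a';
        // Overload resolution depends on the static type of the argument:
        System.out.println(Character.toUpperCase(c));       // char overload, returns a char
        System.out.println(Character.toUpperCase((int) c)); // int overload, returns a code point
    }
}
```

Whether such casts should stay in a given file is exactly the judgment call discussed in this thread; the sketch only shows why they are not always redundant.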
[jira] Updated: (LUCENE-1990) Add unsigned packed int impls in oal.util
[ https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1990: --- Affects Version/s: Flex Branch Fix Version/s: Flex Branch > Add unsigned packed int impls in oal.util > - > > Key: LUCENE-1990 > URL: https://issues.apache.org/jira/browse/LUCENE-1990 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Affects Versions: Flex Branch >Reporter: Michael McCandless >Priority: Minor > Fix For: Flex Branch > > Attachments: generated_performance-te20100226.txt, > LUCENE-1990-te20100122.patch, LUCENE-1990-te20100210.patch, > LUCENE-1990-te20100212.patch, LUCENE-1990-te20100223.patch, > LUCENE-1990-te20100226.patch, LUCENE-1990-te20100226b.patch, > LUCENE-1990-te20100226c.patch, LUCENE-1990.patch, > LUCENE-1990_PerformanceMeasurements20100104.zip, perf-mkm-20100227.txt, > performance-te20100226.txt > > > There are various places in Lucene that could take advantage of an > efficient packed unsigned int/long impl. EG the terms dict index in > the standard codec in LUCENE-1458 could subsantially reduce it's RAM > usage. FieldCache.StringIndex could as well. And I think "load into > RAM" codecs like the one in TestExternalCodecs could use this too. > I'm picturing something very basic like: > {code} > interface PackedUnsignedLongs { > long get(long index); > void set(long index, long value); > } > {code} > Plus maybe an iterator for getting and maybe also for setting. If it > helps, most of the usages of this inside Lucene will be "write once" > so eg the set could make that an assumption/requirement. > And a factory somewhere: > {code} > PackedUnsignedLongs create(int count, long maxValue); > {code} > I think we should simply autogen the code (we can start from the > autogen code in LUCENE-1410), or, if there is an good existing impl > that has a compatible license that'd be great. > I don't have time near-term to do this... 
so if anyone has the itch, > please jump! -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
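As a rough idea of what the interface in the description could look like in practice, here is a minimal sketch of a fixed-width bit-packed array. It is not the implementation from any of the attached patches: the class name SimplePackedLongs is hypothetical, it uses an int index rather than the long index in the description (the factory takes an int count anyway), and it makes no attempt at the aligned or generated variants discussed on the issue.

```java
// Hypothetical sketch; not the code attached to LUCENE-1990.
final class SimplePackedLongs {
    private final long[] blocks;     // values are packed back to back into 64-bit blocks
    private final int bitsPerValue;  // 1..63, derived from maxValue
    private final long mask;         // low bitsPerValue bits set
    private final int count;

    SimplePackedLongs(int count, long maxValue) {
        if (count < 0 || maxValue < 0) {
            throw new IllegalArgumentException("count and maxValue must be non-negative");
        }
        this.count = count;
        // Number of bits needed to represent maxValue (at least one bit).
        this.bitsPerValue = Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
        this.mask = (1L << bitsPerValue) - 1;
        long totalBits = (long) count * bitsPerValue; // cast avoids int overflow
        this.blocks = new long[(int) ((totalBits + 63) >>> 6)];
    }

    long get(int index) {
        checkIndex(index);
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int bitsInFirst = 64 - shift;
        if (bitsInFirst < bitsPerValue) {
            // The value straddles two blocks: take the remaining high bits from the next one.
            value |= blocks[block + 1] << bitsInFirst;
        }
        return value & mask;
    }

    void set(int index, long value) {
        checkIndex(index);
        if ((value & ~mask) != 0) {
            throw new IllegalArgumentException("value does not fit in " + bitsPerValue + " bits");
        }
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
        int bitsInFirst = 64 - shift;
        if (bitsInFirst < bitsPerValue) {
            // Store the spilled high bits in the low bits of the next block.
            blocks[block + 1] = (blocks[block + 1] & ~(mask >>> bitsInFirst)) | (value >>> bitsInFirst);
        }
    }

    int bitsPerValue() {
        return bitsPerValue;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index + ", count " + count);
        }
    }
}
```

With create(10, 1000) this would use 10 bits per value, so ten values fit in two longs instead of ten.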
[jira] Updated: (LUCENE-1990) Add unsigned packed int impls in oal.util
[ https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1990: --- Attachment: perf-mkm-20100227.txt {quote} bq. Airplane blocking snow drifts!? Where on earth are you anyway? In Denmark. The guy responsible for clearing the runway did indeed clear the runway. He just forgot that the plane needs to taxi into the runway in the first place. That made us miss our connecting flight. {quote} Good grief! {quote} bq. It's very interesting that align is never a win - I think in that case removing it makes sense? It'll be a nice simplification. Well, practically never wins for the machines I tested on and never wins with my implementation. {quote} I think we should remove it... {quote} bq. Did we ever test performance of the specialized (generated) decoders using switch statements? I just did a quick hack in order to measure performance and I was very surprised that the generated switch-based implementations perform so well. It's nearly on par with packed most of the time and exceeds it in some cases. I only tested on 3 machines though. The hack is in the LUCENE-1990-te20100226c.patch and is called when the performance test is executed. {quote} Thanks for testing this! It is interesting. I ran the perf test on a CentOS 5.4 machine, java 1.6.0_17-b04 64 bit server, Intel Core 2 Duo E8400 (3 GHz) -- attached perf-mkm-20100227.txt. I also show the switch impl close, though always a bit behind. Seems like we should just stick with the non-gen'd packed impl? bq. Note to self: Switch is not equivalent to a series of if-else, when we're talking performance and when we switch without omissions in the cases. Right, if the switch cases are compact, it should compile into a fast jump table (though it may still do an unnecessary bounds check). I think, once we remove aligned, this is ready to commit? I think we should land this on flex branch? 
(It's using CodecUtil, BytesRef -- I'll merge them when I commit). Then I can cutover the terms index to use packed ints.
[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings
[ https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839242#action_12839242 ] Robert Muir commented on LUCENE-2285: - bq. Googling around I see it's a known problem in Eclipse w/ no solution yet (at least I haven't come across one). Uwe - can you proceed w/ the commit like I suggested and then we review the remaining warnings? this drives me nuts. here's what you can do: copy the entire patch to clipboard, then apply patch from clipboard, no encoding problems.
[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings
[ https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839254#action_12839254 ] Shai Erera commented on LUCENE-2285: bq. copy the entire patch to clipboard Super Robert ! That worked !! Now that I apply the patch, I'm back to 1,400 warnings (900 up). Many of them related to generated code and Snowball, but here are few comments: * AnalyzingQueyrParser (contrib/misc), line 144 --> wlist cannot be null at this point because it is created (line 80) as new ArrayList. The same should be removed in line 161, though Eclipse does not complain, which is weird. But wlist cannot be null. ** Besides, the entire code segment in lines 158-164 can be improved, but let's leave it outside the scope of this issue. * TestCharacterUtils - Uwe, note how there are many unnecessary casts to int (from char), while the actual assert method that's called is a long :). Do you still think these are required? * UnicodeUtil - the chars are cast to int in code like this: _int utf32 = (int) str.charAt(i); --> Is that necessary too? Besides these, I'm fine w/ the rest. Still a 2400 warnings reduction, and many of the ones left are either in generated code, or related to deliberate use of deprecated API.
[jira] Issue Comment Edited: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings
[ https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839254#action_12839254 ] Shai Erera edited comment on LUCENE-2285 at 2/27/10 3:00 PM: - bq. copy the entire patch to clipboard Super Robert ! That worked !! Now that I apply the patch, I'm back to 1,400 warnings (900 up). Many of them related to generated code and Snowball, but here are few comments: * AnalyzingQueyrParser (contrib/misc), line 144 --> wlist cannot be null at this point because it is created (line 80) as new ArrayList. The same should be removed in line 161, though Eclipse does not complain, which is weird. But wlist cannot be null. ** Besides, the entire code segment in lines 158-164 can be improved, but let's leave it outside the scope of this issue. * TestCharacterUtils - Uwe, note how there are many unnecessary casts to int (from char), while the actual assert method that's called is a long :). Do you still think these are required? * UnicodeUtil - the chars are cast to int in code like this: _int utf32 = (int) str.charAt(i+1);_ --> Is that necessary too? Besides these, I'm fine w/ the rest. Still a 2400 warnings reduction, and many of the ones left are either in generated code, or related to deliberate use of deprecated API.
[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings
[ https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839256#action_12839256 ] Robert Muir commented on LUCENE-2285: - bq. In my internal project, I also make use of Snowball code directly, and improved it to fit better in the analysis process. I should actually diff my changes w/ yours, perhaps I can use yours, or contribute mine. I'd be curious to know what you did, if its possible for you to, I'd like to hear what you did (maybe on the list so it wont be lost on this issue?) my recent mods to snowball itself are on LUCENE-2194, LUCENE-2201. I think previously Karl modified it so that it doesnt use reflection (except the Lovins stemmer)
[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings
[ https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839258#action_12839258 ] Shai Erera commented on LUCENE-2285: bq. I'd be curious to know what you did Ok, now you've made me compare the two :). I'm happy to see we both did the same thing, only you call your buffer 'current' while I call it 'buf'. Besides that I've included a static final EMPTY_ARGS instead of all the places where 'new Object[0]' is passed. Nothing too fancy. Another thing is that I wrote an Arabic and Hebrew stemmer, and combined them w/ the Snowball ones by introducing a stemmer class which can be either Snowball or anything else. I'll check if we're allowed to contribute the Hebrew stemmer to Lucene ... BTW FYI - our legal department forbid us to use the Hungarian stemmer because of licensing/legal issues. Besides the stemmers that were originally provided, the Snowball project accepted additional ones like the Hungarian stemmer. However, for that one we weren't able to get a confirmation from the contributor his University indeed gave him permission to contribute the code. Don't know if it matters to anyone here (I've notified Martin Porter as well), but FYI. Our legal department does not permit us to use it (which is not surprising - they are legal ...). I don't want to derail this issue into Snowball discussion, so if you want to talk about it, I suggest we move it to the list.
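The EMPTY_ARGS change Shai mentions is the usual "share one immutable empty array" idiom. The sketch below is only a guess at its shape: the class EmptyArgsDemo and the helper callNoArg are hypothetical, and the actual SnowballProgram/Among code is not reproduced here. It assumes the constant is passed to reflective no-argument calls, which is where 'new Object[0]' typically shows up.

```java
// Hypothetical sketch of the idiom; not the actual Snowball or Lucene code.
final class EmptyArgsDemo {

    // A zero-length array has no mutable contents, so a single shared instance is safe.
    private static final Object[] EMPTY_ARGS = new Object[0];

    /** Invokes a public no-argument method by name, reusing the shared empty argument array. */
    static Object callNoArg(Object target, String methodName) {
        try {
            java.lang.reflect.Method method = target.getClass().getMethod(methodName);
            return method.invoke(target, EMPTY_ARGS); // instead of allocating new Object[0] per call
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot invoke " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callNoArg("lucene", "length")); // prints 6
    }
}
```

The saving per call is small; the main benefit is avoiding repeated allocations in a hot loop such as a stemmer.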
[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings
[ https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839259#action_12839259 ] Uwe Schindler commented on LUCENE-2285: --- bq. UnicodeUtil I reverted the whole class as it is very sensitive to binary encoding, so please leave it as it is. Tell me any @SuppressWarnings parameter that makes eclipse happy, I will add it! bq. TestCharacterUtils I missed that this test is junit4, in junit3 the casts are necessary. But if you bug me longer with these casts I will give the issue to somebody else :-) bq. AnalyzingQueyrParser I simply reverted everything with QueryParser in it, because it is normally generated code. :-) As I said before, let us first commit this patch. But i will no longer discuss about casts :-) Thanks for reviewing the patch, its a good help to get code cleaner!
snowball discussion on LUCENE-2285
i wanted to continue this here to not clog up the issue! Shai Erera commented on LUCENE-2285: > bq. I'd be curious to know what you did > > Ok, now you've made me compare the two :). I'm happy to see we both did the > same thing, only you call your buffer 'current' while I call it 'buf'. > Besides that I've included a static final EMPTY_ARGS instead of all the > places where 'new Object[0]' is passed. Nothing too fancy. > hmm, i didnt think of this second optimization, does it affect generated code or is it in Among/SnowballProgram? > > Another thing is that I wrote an Arabic and Hebrew stemmer, and combined > them w/ the Snowball ones by introducing a stemmer class which can be either > Snowball or anything else. I'll check if we're allowed to contribute the > Hebrew stemmer to Lucene ... > please do. as far as integration goes, i guess we took a different approach with LUCENE-2055 (from the Analyzer perspective, the user does not care if it uses snowball or something else behind the scenes, etc). > BTW FYI - our legal department forbid us to use the Hungarian stemmer > because of licensing/legal issues. Besides the stemmers that were originally > provided, the Snowball project accepted additional ones like the Hungarian > stemmer. However, for that one we weren't able to get a confirmation from > the contributor his University indeed gave him permission to contribute the > code. Don't know if it matters to anyone here (I've notified Martin Porter > as well), but FYI. Our legal department does not permit us to use it (which > is not surprising - they are legal ...). I don't want to derail this issue > into Snowball discussion, so if you want to talk about it, I suggest we move > it to the list. 
this is concerning to me, i mean the thing is sitting there on the universities website: http://ilps.science.uva.nl/resources/snowball-hun :) but if apache is concerned about this situation too, someone let me know and i can take savoy's (clearly marked BSD) and we can add that instead, and remove the ambiguous snowball one, even if its temporary: http://members.unine.ch/jacques.savoy/clef/index.html -- Robert Muir rcm...@gmail.com
[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings
[ https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839262#action_12839262 ] Uwe Schindler commented on LUCENE-2285: --- I will commit the current patch soon and post the remaing diff of Shais original patch to the issue.
[jira] Updated: (LUCENE-2089) explore using automaton for fuzzyquery
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2089: --- Attachment: LUCENE-2089.patch New rev of gen.py, that uses packed arrays for the states/offsets. It's much more compact -- Lev1 is now 5KB, Lev2 is 11KB, Lev3 is 160KB. And Lev3 compiles! (Robert now you need a test case for Lev3 ;) ). The class files are OK too: Lev1 3.9KB, Lev2 is 7.3KB, Lev3 is 102KB. > explore using automaton for fuzzyquery > -- > > Key: LUCENE-2089 > URL: https://issues.apache.org/jira/browse/LUCENE-2089 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Affects Versions: Flex Branch >Reporter: Robert Muir >Assignee: Mark Miller >Priority: Minor > Fix For: Flex Branch > > Attachments: ContrivedFuzzyBenchmark.java, gen.py, gen.py, gen.py, > gen.py, gen.py, gen.py, Lev2ParametricDescription.java, > Lev2ParametricDescription.java, Lev2ParametricDescription.java, > Lev2ParametricDescription.java, LUCENE-2089.patch, LUCENE-2089.patch, > LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089.patch, > LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089_concat.patch, > Moman-0.2.1.tar.gz, TestFuzzy.java > > > we can optimize fuzzyquery by using AutomatonTermsEnum. The idea is to speed > up the core FuzzyQuery in similar fashion to Wildcard and Regex speedups, > maintaining all backwards compatibility. > The advantages are: > * we can seek to terms that are useful, instead of brute-forcing the entire > terms dict > * we can determine matches faster, as true/false from a DFA is array lookup, > don't even need to run levenshtein. > We build Levenshtein DFAs in linear time with respect to the length of the > word: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.652 > To implement support for 'prefix' length, we simply concatenate two DFAs, > which doesn't require us to do NFA->DFA conversion, as the prefix portion is > a singleton. 
the concatenation is also constant time with respect to the size > of the fuzzy DFA, it only need examine its start state. > with this algorithm, parametric tables are precomputed so that DFAs can be > constructed very quickly. > if the required number of edits is too large (we don't have a table for it), > we use "dumb mode" at first (no seeking, no DFA, just brute force like now). > As the priority queue fills up during enumeration, the similarity score > required to be a competitive term increases, so, the enum gets faster and > faster as this happens. This is because terms in core FuzzyQuery are sorted > by boost value, then by term (in lexicographic order). > For a large term dictionary with a low minimal similarity, you will fill the > pq very quickly since you will match many terms. > This not only provides a mechanism to switch to more efficient DFAs (edit > distance of 2 -> edit distance of 1 -> edit distance of 0) during > enumeration, but also to switch from "dumb mode" to "smart mode". > With this design, we can add more DFAs at any time by adding additional > tables. The tradeoff is the tables get rather large, so for very high K, we > would start to increase the size of Lucene's jar file. The idea is we don't > have include large tables for very high K, by using the 'competitive boost' > attribute of the priority queue. > For more information, see http://en.wikipedia.org/wiki/Levenshtein_automaton -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
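To make the "true/false from a DFA is array lookup" point concrete, here is a deliberately tiny sketch of a table-driven DFA. It only covers the degenerate edit-distance-0 case (exact match of one word) and an ASCII alphabet, both of which are simplifying assumptions; it does not use the parametric tables from the paper or from gen.py, and the class name ExactMatchDfa is made up. The point is only that matching a term costs one table lookup per character, with no edit-distance computation.

```java
// Hypothetical illustration; not the automaton code from the LUCENE-2089 patches.
final class ExactMatchDfa {
    private static final int ALPHABET = 128; // assumption: ASCII terms only

    private final int[][] transitions; // transitions[state][character] -> next state
    private final int acceptState;
    private final int deadState;

    ExactMatchDfa(String word) {
        int n = word.length();
        acceptState = n;   // state i means "the first i characters of word have matched"
        deadState = n + 1; // sink state: no match is possible any more
        transitions = new int[n + 2][ALPHABET];
        for (int[] row : transitions) {
            java.util.Arrays.fill(row, deadState);
        }
        for (int i = 0; i < n; i++) {
            char c = word.charAt(i);
            if (c >= ALPHABET) {
                throw new IllegalArgumentException("this sketch only supports ASCII");
            }
            transitions[i][c] = i + 1;
        }
    }

    /** Runs the DFA over the term: one array lookup per character. */
    boolean accepts(String term) {
        int state = 0;
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c >= ALPHABET) {
                return false;
            }
            state = transitions[state][c];
            if (state == deadState) {
                return false; // early exit once the term can no longer match
            }
        }
        return state == acceptState;
    }
}
```

A real Levenshtein DFA for K = 1 or K = 2 has more states per position, but the accepts loop would look the same; only the way the table is built changes.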
[jira] Updated: (LUCENE-2089) explore using automaton for fuzzyquery
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2089:
---------------------------------------
Attachment: LUCENE-2089.patch

New patch... just fixes a few small things Uwe noticed (moved the unpack method & MASKS up to the super class; use newlines to make the massive tables a bit more friendly to look at).

> explore using automaton for fuzzyquery
> --------------------------------------
>
> Key: LUCENE-2089
> URL: https://issues.apache.org/jira/browse/LUCENE-2089
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Affects Versions: Flex Branch
> Reporter: Robert Muir
> Assignee: Mark Miller
> Priority: Minor
> Fix For: Flex Branch
> Attachments: ContrivedFuzzyBenchmark.java, gen.py, gen.py, gen.py, gen.py, gen.py, gen.py, Lev2ParametricDescription.java, Lev2ParametricDescription.java, Lev2ParametricDescription.java, Lev2ParametricDescription.java, LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089_concat.patch, Moman-0.2.1.tar.gz, TestFuzzy.java
>
> We can optimize FuzzyQuery by using AutomatonTermsEnum. The idea is to speed up the core FuzzyQuery in similar fashion to the Wildcard and Regex speedups, maintaining all backwards compatibility.
> The advantages are:
> * we can seek to terms that are useful, instead of brute-forcing the entire terms dict
> * we can determine matches faster, as true/false from a DFA is an array lookup; we don't even need to run Levenshtein.
> We build Levenshtein DFAs in linear time with respect to the length of the word: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.652
> To implement support for 'prefix' length, we simply concatenate two DFAs, which doesn't require us to do NFA->DFA conversion, as the prefix portion is a singleton. The concatenation is also constant time with respect to the size of the fuzzy DFA; it only needs to examine its start state.
> With this algorithm, parametric tables are precomputed so that DFAs can be constructed very quickly.
> If the required number of edits is too large (we don't have a table for it), we use "dumb mode" at first (no seeking, no DFA, just brute force like now).
> As the priority queue fills up during enumeration, the similarity score required to be a competitive term increases, so the enum gets faster and faster as this happens. This is because terms in core FuzzyQuery are sorted by boost value, then by term (in lexicographic order).
> For a large term dictionary with a low minimal similarity, you will fill the pq very quickly since you will match many terms.
> This not only provides a mechanism to switch to more efficient DFAs (edit distance of 2 -> edit distance of 1 -> edit distance of 0) during enumeration, but also to switch from "dumb mode" to "smart mode".
> With this design, we can add more DFAs at any time by adding additional tables. The tradeoff is that the tables get rather large, so for very high K, we would start to increase the size of Lucene's jar file. The idea is that we don't have to include large tables for very high K, by using the 'competitive boost' attribute of the priority queue.
> For more information, see http://en.wikipedia.org/wiki/Levenshtein_automaton
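The interaction between the priority queue and the allowed edit distance, described in the issue text above, can be illustrated with a small Java sketch. This is not the code from the patch: the class name `CompetitiveEditsSketch`, the method `scan`, and the use of a raw edit distance in place of a boost score are all assumptions made for illustration, and the tie-break by term order is not modelled.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical sketch; not Lucene's fuzzy enumeration code. */
final class CompetitiveEditsSketch {

    /** Plain dynamic-programming Levenshtein distance (the brute-force check). */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Scans the terms while keeping the pqSize best distances seen so far.
     * Once the queue is full, the worst distance in it becomes the new bound,
     * so the bound can only shrink (2 -> 1 -> 0) as enumeration proceeds.
     * Returns the bound that is still competitive at the end of the scan.
     */
    static int scan(String query, String[] terms, int pqSize, int startMaxEdits) {
        // Max-heap: the worst (largest) distance sits on top.
        PriorityQueue<Integer> worstOnTop = new PriorityQueue<>(Comparator.reverseOrder());
        int maxEdits = startMaxEdits;
        for (String term : terms) {
            int d = editDistance(query, term);
            if (d > maxEdits) {
                continue; // not competitive any more
            }
            worstOnTop.add(d);
            if (worstOnTop.size() > pqSize) {
                worstOnTop.poll(); // evict the worst entry
            }
            if (worstOnTop.size() == pqSize) {
                maxEdits = Math.min(maxEdits, worstOnTop.peek());
            }
        }
        return maxEdits;
    }
}
```

In a real enumerator, each drop of the bound would be the point at which a smaller automaton (or a switch from brute force to a DFA) becomes usable; the sketch only tracks the bound itself.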
Re: snowball discussion on LUCENE-2285
Hi Robert, the EMPTY_ARGS stuff is just in SnowballProgram. I didn't touch the generated code, besides handling calls to deprecated API.

We've actually taken the same approach I think :). In my Analyzer, the user passes a Locale to create the proper Analyzer. The analyzer comes pre-configured w/ a whole bunch of filters, like those that handle email tokens produced by the tokenizer (or hosts, acronyms and more), character normalization, ngram/stemmer filters etc. The StemmerFilter creates the proper stemmer based on the language code, and for that I created a SnowballWrapper - that allows me to instantiate Arabic/Hebrew or Snowball ones. The wrapper is only needed for the stemmer filter instance ...

I have on my TODO checking contrib/analyzers. Unfortunately our legal department is very suspicious of everything (guess they wouldn't make good legal folks otherwise ;)). If I want to use the contrib/analyzers, they'll need to scan the code and identify the owners of the various analyzers ... That's what's on my TODO - going through the process w/ them :).

I personally think that the work you're doing on the analyzers is extraordinary, and since I don't have much time for maintaining my own package, it has fallen a bit behind in terms of Unicode differences and such. I've come to appreciate the power of open source long ago - for me it'd be best to join forces on this analysis package. I'm sure that will happen one day :).

About the Hungarian stemmer - Martin Porter told us that the original (12?) stemmers were written by him and so there are no IP issues. The rest were contributed by other people. All but the Hun contributor responded w/ their rights to contribute the code. It's just the Hun that never responded, even though we've sent a couple of emails. That is problematic. When someone contributes code to Lucene, he grants the ASF license (forgot the wording that's used). That's very reassuring to lawyers, because it doesn't leave them too exposed. But there isn't any similar process in Snowball ... I can look up the correspondence we've had with Martin Porter to refresh my memory on the details.

Shai

On Sat, Feb 27, 2010 at 5:35 PM, Robert Muir wrote:

> i wanted to continue this here to not clog up the issue!
>
> Shai Erera commented on LUCENE-2285:
>
>> bq. I'd be curious to know what you did
>>
>> Ok, now you've made me compare the two :). I'm happy to see we both did the same thing, only you call your buffer 'current' while I call it 'buf'. Besides that I've included a static final EMPTY_ARGS instead of all the places where 'new Object[0]' is passed. Nothing too fancy.
>
> hmm, i didn't think of this second optimization, does it affect generated code or is it in Among/SnowballProgram?
>
>> Another thing is that I wrote an Arabic and Hebrew stemmer, and combined them w/ the Snowball ones by introducing a stemmer class which can be either Snowball or anything else. I'll check if we're allowed to contribute the Hebrew stemmer to Lucene ...
>
> please do. as far as integration goes, i guess we took a different approach with LUCENE-2055 (from the Analyzer perspective, the user does not care if it uses snowball or something else behind the scenes, etc).
>
>> BTW FYI - our legal department forbade us to use the Hungarian stemmer because of licensing/legal issues. Besides the stemmers that were originally provided, the Snowball project accepted additional ones like the Hungarian stemmer. However, for that one we weren't able to get a confirmation from the contributor that his University indeed gave him permission to contribute the code. Don't know if it matters to anyone here (I've notified Martin Porter as well), but FYI. Our legal department does not permit us to use it (which is not surprising - they are legal ...). I don't want to derail this issue into a Snowball discussion, so if you want to talk about it, I suggest we move it to the list.
>
> this is concerning to me, i mean the thing is sitting there on the university's website: http://ilps.science.uva.nl/resources/snowball-hun :)
> but if apache is concerned about this situation too, someone let me know and i can take savoy's (clearly marked BSD) and we can add that instead, and remove the ambiguous snowball one, even if it's temporary: http://members.unine.ch/jacques.savoy/clef/index.html
>
> --
> Robert Muir
> rcm...@gmail.com
Re: snowball discussion on LUCENE-2285
Can you open an issue for the new Object[0]? It's sad about the Hungarian issue. I'm inclined to think we should add Savoy's and default to it instead. I don't see this as code duplication, as it's a different algorithm. Normally I just don't spend a lot of effort towards adding alternative stemmers, but here it makes sense.

It sounds really exciting if you are able to merge in what you have done in the future!
[jira] Resolved: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings
[ https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-2285. --- Resolution: Fixed Committed revision: 917019 I will attach the open changes as separate patch. Please reopen, if new changes. > Code cleanup from all sorts of (trivial) warnings > - > > Key: LUCENE-2285 > URL: https://issues.apache.org/jira/browse/LUCENE-2285 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Shai Erera >Assignee: Uwe Schindler >Priority: Minor > Fix For: 3.1 > > Attachments: LUCENE-2285.patch, LUCENE-2285.patch, LUCENE-2285.patch > > > I would like to do some code cleanup and remove all sorts of trivial > warnings, like unnecessary casts, problems w/ javadocs, unused variables, > redundant null checks, unnecessary semicolon etc. These are all very trivial > and should not pose any problem. > I'll create another issue for getting rid of deprecated code usage, like > LuceneTestCase and all sorts of deprecated constructors. That's also trivial > because it only affects Lucene code, but it's a different type of change. > Another issue I'd like to create is about introducing more generics in the > code, where it's missing today - not changing existing API. There are many > places in the code like that. > So, with you permission, I'll start with the trivial ones first, and then > move on to the others. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings
[ https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2285:
----------------------------------
Attachment: LUCENE-2285-remaining+generated.patch
            LUCENE-2285-remaining.patch

Here, for reference, are the remaining changes from Shai that I rejected (mostly casts, which in my opinion should stay). Important: UnicodeUtils, for example, is not tested thoroughly enough to be really sure that removing the casts does not change the code's behavior at all. In my opinion, this should stay as it is.

The smaller one, LUCENE-2285-remaining.patch, contains the real changes. The bigger one, LUCENE-2285-remaining+generated.patch, also includes changes in generated code (just for reference).

> Code cleanup from all sorts of (trivial) warnings
> -------------------------------------------------
>
> Key: LUCENE-2285
> URL: https://issues.apache.org/jira/browse/LUCENE-2285
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Shai Erera
> Assignee: Uwe Schindler
> Priority: Minor
> Fix For: 3.1
> Attachments: LUCENE-2285-remaining+generated.patch, LUCENE-2285-remaining.patch, LUCENE-2285.patch, LUCENE-2285.patch, LUCENE-2285.patch
>
> I would like to do some code cleanup and remove all sorts of trivial warnings, like unnecessary casts, problems w/ javadocs, unused variables, redundant null checks, unnecessary semicolons etc. These are all very trivial and should not pose any problem.
> I'll create another issue for getting rid of deprecated code usage, like LuceneTestCase and all sorts of deprecated constructors. That's also trivial because it only affects Lucene code, but it's a different type of change.
> Another issue I'd like to create is about introducing more generics in the code, where it's missing today - not changing existing API. There are many places in the code like that.
> So, with your permission, I'll start with the trivial ones first, and then move on to the others.
[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839307#action_12839307 ]

Robert Muir commented on LUCENE-2089:
-------------------------------------

Mike, this is awesome. I ran benchmarks: we are just as fast as before (with only Lev1 and Lev2 enabled), but with smaller generated code.

When I turn on Lev3, it speeds up the worst-case ones (no prefix, pq=1024, fuzzy of n=3, n=4), but slows down some of the "better-case" n=3/n=4 cases where there is a prefix or a small PQ. I think this is because the benchmark is contrived, but realistically n=3 (with seeking!) should be a win for users. A less-contrived benchmark (a 'typical' massive term dictionary) would help for tuning.

Separately, I think we can add heuristics: e.g. for n > 3 WITH a prefix, use the DFA in "linear mode" until you drop to n=2, as you already have a nice prefix anyway, stuff like that. But if the user doesn't supply a prefix, I think seeking is always a win.

Here are the results anyway: I ran it many times and it's consistent (obviously differences of just a few ms are not significant). I bolded the ones I think illustrate the differences I am talking about. It's cool to be at the point where we are actually able to measure these kinds of tradeoffs!
{{Minimum Sim = 0.73f (edit distance of 1)}}
||Prefix Length||PQ Size||Avg ms (flex trunk)||Avg ms (1,2)||Avg ms (1,2,3)||
|0|1024|3286.0|7.8|7.6|
|0|64|3320.4|7.6|8.0|
|1|1024|316.8|5.6|5.3|
|1|64|314.3|5.6|5.2|
|2|1024|31.8|3.8|4.2|
|2|64|31.9|3.7|4.5|

{{Minimum Sim = 0.58f (edit distance of 2)}}
||Prefix Length||PQ Size||Avg ms (flex trunk)||Avg ms (1,2)||Avg ms (1,2,3)||
|0|1024|4223.3|87.7|91.2|
|0|64|4199.7|12.6|13.2|
|1|1024|430.1|56.4|62.0|
|1|64|392.8|9.3|8.5|
|2|1024|82.5|45.5|48.0|
|2|64|38.4|6.2|6.3|

{{Minimum Sim = 0.43f (edit distance of 3)}}
||Prefix Length||PQ Size||Avg ms (flex trunk)||Avg ms (1,2)||Avg ms (1,2,3)||
|0|1024|5299.9|424.0|*199.8*|
|0|64|5231.8|54.1|*93.2*|
|1|1024|522.9|103.6|107.9|
|1|64|480.9|14.5|*49.3*|
|2|1024|89.0|67.9|70.8|
|2|64|46.3|6.8|*19.7*|

{{Minimum Sim = 0.29f (edit distance of 4)}}
||Prefix Length||PQ Size||Avg ms (flex trunk)||Avg ms (1,2)||Avg ms (1,2,3)||
|0|1024|6258.1|363.7|*206.5*|
|0|64|6247.6|75.6|78.8|
|1|1024|609.9|108.3|110.0|
|1|64|567.1|13.3|*45.5*|
|2|1024|98.6|66.6|73.8|
|2|64|55.6|6.8|*22.3*|

> explore using automaton for fuzzyquery
> --------------------------------------
>
> Key: LUCENE-2089
> URL: https://issues.apache.org/jira/browse/LUCENE-2089
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Affects Versions: Flex Branch
> Reporter: Robert Muir
> Assignee: Mark Miller
> Priority: Minor
> Fix For: Flex Branch
> Attachments: ContrivedFuzzyBenchmark.java, gen.py, gen.py, gen.py, gen.py, gen.py, gen.py, Lev2ParametricDescription.java, Lev2ParametricDescription.java, Lev2ParametricDescription.java, Lev2ParametricDescription.java, LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089_concat.patch, Moman-0.2.1.tar.gz, TestFuzzy.java
>
> we can optimize fuzzyquery by using AutomatonTermsEnum.
The idea is to speed > up the core FuzzyQuery in similar fashion to Wildcard and Regex speedups, > maintaining all backwards compatibility. > The advantages are: > * we can seek to terms that are useful, instead of brute-forcing the entire > terms dict > * we can determine matches faster, as true/false from a DFA is array lookup, > don't even need to run levenshtein. > We build Levenshtein DFAs in linear time with respect to the length of the > word: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.652 > To implement support for 'prefix' length, we simply concatenate two DFAs, > which doesn't require us to do NFA->DFA conversion, as the prefix portion is > a singleton. the concatenation is also constant time with respect to the size > of the fuzzy DFA, it only need examine its start state. > with this algorithm, parametric tables are precomputed so that DFAs can be > constructed very quickly. > if the required number of edits is too large (we don't have a table for it), > we use "dumb mode" at first (no seeking, no DFA, just brute force like now). > As the priority queue fills up during enumeration, the similarity score > required to be a competitive term increases, so, the enum gets faster and > faster as this happens. This is because terms in core FuzzyQuery are sorted > by boost value, then by term (in lexicographic order). > For a large term dictionary with a low minimal similarity, you will fill the > pq very quickly since you will match many terms. > This not only provides a mechanism to switch to more efficient DFAs (edit > distance
[jira] Created: (LUCENE-2288) Create EMPTY_ARGS constant in SnowballProgram instead of allocating new Object[0]
Create EMPTY_ARGS constant in SnowballProgram instead of allocating new Object[0]
----------------------------------------------------------------------------------

Key: LUCENE-2288
URL: https://issues.apache.org/jira/browse/LUCENE-2288
Project: Lucene - Java
Issue Type: Improvement
Components: Analysis
Reporter: Shai Erera
Fix For: 3.1
Attachments: LUCENE--2288.patch

Instead of allocating new Object[0], create a proper constant in SnowballProgram. The same (for new Class[0]) is created in Among, although it's less critical because Among is called from static initializers ... Patch will follow shortly.
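As a rough illustration of the change described above (not the attached patch itself), a shared zero-length array can replace a fresh `new Object[0]` at every reflective call. `EMPTY_ARGS` is the name used in the issue; the class `EmptyArgsSketch`, the constant `EMPTY_PARAMS`, and the helper `callNoArg` are hypothetical names chosen for this sketch.

```java
import java.lang.reflect.Method;

/** Hypothetical sketch; SnowballProgram and Among are not reproduced here. */
final class EmptyArgsSketch {

    // A zero-length array has no elements to mutate, so sharing one instance is safe.
    private static final Object[] EMPTY_ARGS = new Object[0];
    private static final Class<?>[] EMPTY_PARAMS = new Class<?>[0];

    /** Looks up and invokes a public no-argument method without allocating per call. */
    static Object callNoArg(Object target, String methodName) throws ReflectiveOperationException {
        Method method = target.getClass().getMethod(methodName, EMPTY_PARAMS);
        return method.invoke(target, EMPTY_ARGS);
    }
}
```

For example, `EmptyArgsSketch.callNoArg("abc", "length")` invokes `String.length()` reflectively and reuses the same two arrays on every call.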
[jira] Updated: (LUCENE-2288) Create EMPTY_ARGS constant in SnowballProgram instead of allocating new Object[0]
[ https://issues.apache.org/jira/browse/LUCENE-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-2288:
-------------------------------
Attachment: LUCENE--2288.patch

Patch w/ the trivial change.

> Create EMPTY_ARGS constant in SnowballProgram instead of allocating new Object[0]
> ----------------------------------------------------------------------------------
>
> Key: LUCENE-2288
> URL: https://issues.apache.org/jira/browse/LUCENE-2288
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Reporter: Shai Erera
> Fix For: 3.1
> Attachments: LUCENE--2288.patch
>
> Instead of allocating new Object[0], create a proper constant in SnowballProgram. The same (for new Class[0]) is created in Among, although it's less critical because Among is called from static initializers ... Patch will follow shortly.
Re: snowball discussion on LUCENE-2285
I created LUCENE-2288 for handling the Object[] thingy in SnowballProgram (and Class[] in Among).

Shai
[jira] Commented: (LUCENE-2288) Create EMPTY_ARGS constant in SnowballProgram instead of allocating new Object[0]
[ https://issues.apache.org/jira/browse/LUCENE-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839311#action_12839311 ]

Shai Erera commented on LUCENE-2288:
------------------------------------

Forgot to mention that all analysis tests pass.

> Create EMPTY_ARGS constant in SnowballProgram instead of allocating new Object[0]
> ----------------------------------------------------------------------------------
>
> Key: LUCENE-2288
> URL: https://issues.apache.org/jira/browse/LUCENE-2288
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Reporter: Shai Erera
> Fix For: 3.1
> Attachments: LUCENE--2288.patch
>
> Instead of allocating new Object[0], create a proper constant in SnowballProgram. The same (for new Class[0]) is created in Among, although it's less critical because Among is called from static initializers ... Patch will follow shortly.
[jira] Created: (LUCENE-2289) Calls to SegmentInfos.message should be wrapped w/ infoStream != null checks
Calls to SegmentInfos.message should be wrapped w/ infoStream != null checks
-----------------------------------------------------------------------------

Key: LUCENE-2289
URL: https://issues.apache.org/jira/browse/LUCENE-2289
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Shai Erera
Fix For: 3.1

To avoid the expensive message creation (which involves the '+' operator on strings), calls to message should be wrapped w/ an infoStream != null check at the call site, rather than doing the check inside message(). I'll attach a patch which does that.
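The difference between checking inside message() and checking at the call site can be shown with a small sketch. It is only an illustration under assumptions, not the SegmentInfos code: the classes `InfoStreamGuardSketch` and `CountingDetail` and the methods `logCheckedInside` and `logGuarded` are hypothetical, and only the `infoStream` field name is taken from the issue.

```java
import java.io.PrintStream;

/** Hypothetical helper whose toString() counts how often a message string was built. */
final class CountingDetail {
    int calls;

    @Override
    public String toString() {
        calls++;
        return "detail";
    }
}

/** Hypothetical sketch; not the real SegmentInfos class. */
final class InfoStreamGuardSketch {

    private final PrintStream infoStream; // null means diagnostics are switched off

    InfoStreamGuardSketch(PrintStream infoStream) {
        this.infoStream = infoStream;
    }

    /** Null check inside: the caller has already paid for building the string. */
    private void messageCheckedInside(String msg) {
        if (infoStream != null) {
            infoStream.println(msg);
        }
    }

    /** Concatenation (and toString()) runs even when nothing will be printed. */
    void logCheckedInside(Object detail) {
        messageCheckedInside("state: " + detail);
    }

    /** Guard at the call site: the string is only built when it will be printed. */
    void logGuarded(Object detail) {
        if (infoStream != null) {
            infoStream.println("state: " + detail);
        }
    }
}
```

With `infoStream` set to null, `logCheckedInside` still calls `toString()` on its argument, while `logGuarded` does no work at all, which is the saving the issue is after.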
[jira] Updated: (LUCENE-2289) Calls to SegmentInfos.message should be wrapped w/ infoStream != null checks
[ https://issues.apache.org/jira/browse/LUCENE-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-2289:
-------------------------------
Attachment: LUCENE--2289.patch

Patch w/ the proposed changes.

> Calls to SegmentInfos.message should be wrapped w/ infoStream != null checks
> -----------------------------------------------------------------------------
>
> Key: LUCENE-2289
> URL: https://issues.apache.org/jira/browse/LUCENE-2289
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Shai Erera
> Fix For: 3.1
> Attachments: LUCENE--2289.patch
>
> To avoid the expensive message creation (which involves the '+' operator on strings), calls to message should be wrapped w/ an infoStream != null check at the call site, rather than doing the check inside message(). I'll attach a patch which does that.
[jira] Assigned: (LUCENE-2288) Create EMPTY_ARGS constant in SnowballProgram instead of allocating new Object[0]
[ https://issues.apache.org/jira/browse/LUCENE-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir reassigned LUCENE-2288:
-----------------------------------
Assignee: Robert Muir

> Create EMPTY_ARGS constant in SnowballProgram instead of allocating new Object[0]
> ----------------------------------------------------------------------------------
>
> Key: LUCENE-2288
> URL: https://issues.apache.org/jira/browse/LUCENE-2288
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Reporter: Shai Erera
> Assignee: Robert Muir
> Fix For: 3.1
> Attachments: LUCENE--2288.patch
>
> Instead of allocating new Object[0], create a proper constant in SnowballProgram. The same (for new Class[0]) is created in Among, although it's less critical because Among is called from static initializers ... Patch will follow shortly.
[jira] Commented: (LUCENE-2288) Create EMPTY_ARGS constant in SnowballProgram instead of allocating new Object[0]
[ https://issues.apache.org/jira/browse/LUCENE-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839315#action_12839315 ]

Robert Muir commented on LUCENE-2288:
-------------------------------------

Thanks Shai, the patch looks good to me, though I hope it only affects the Lovins stemmer (or the case where someone has written their own Snowball stemmer), as the others should not be using this reflection!

Will commit in a few days unless someone objects.

> Create EMPTY_ARGS constant in SnowballProgram instead of allocating new Object[0]
> ----------------------------------------------------------------------------------
>
> Key: LUCENE-2288
> URL: https://issues.apache.org/jira/browse/LUCENE-2288
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Reporter: Shai Erera
> Fix For: 3.1
> Attachments: LUCENE--2288.patch
>
> Instead of allocating new Object[0], create a proper constant in SnowballProgram. The same (for new Class[0]) is created in Among, although it's less critical because Among is called from static initializers ... Patch will follow shortly.
[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings
[ https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839362#action_12839362 ]

Shai Erera commented on LUCENE-2285:
------------------------------------

Thanks Uwe for committing this. I think that further discussion is pointless if you feel that I *bug* you, and you "will no longer discuss about casts" ... Kind of kills any chance of having a serious and 'open' discussion.

I can live with the code as it is now ... If someone else feels otherwise, then I don't mind continuing to discuss this.

> Code cleanup from all sorts of (trivial) warnings
> -------------------------------------------------
>
> Key: LUCENE-2285
> URL: https://issues.apache.org/jira/browse/LUCENE-2285
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Shai Erera
> Assignee: Uwe Schindler
> Priority: Minor
> Fix For: 3.1
> Attachments: LUCENE-2285-remaining+generated.patch, LUCENE-2285-remaining.patch, LUCENE-2285.patch, LUCENE-2285.patch, LUCENE-2285.patch
>
> I would like to do some code cleanup and remove all sorts of trivial warnings, like unnecessary casts, problems w/ javadocs, unused variables, redundant null checks, unnecessary semicolons etc. These are all very trivial and should not pose any problem.
> I'll create another issue for getting rid of deprecated code usage, like LuceneTestCase and all sorts of deprecated constructors. That's also trivial because it only affects Lucene code, but it's a different type of change.
> Another issue I'd like to create is about introducing more generics in the code, where it's missing today - not changing existing API. There are many places in the code like that.
> So, with your permission, I'll start with the trivial ones first, and then move on to the others.
[jira] Commented: (LUCENE-2288) Create EMPTY_ARGS constant in SnowballProgram instead of allocating new Object[0]
[ https://issues.apache.org/jira/browse/LUCENE-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839364#action_12839364 ]

Shai Erera commented on LUCENE-2288:
------------------------------------

Thanks Robert. I never checked if those methods are actually called, as I didn't do it to earn any CPU cycles back. I just followed good coding practice, and since it appeared in two places, I thought that a constant would look a bit less wasteful. If you're sure those are not called by the other stemmers (and I'm sure you are :)), then I'm fine if you leave those out as well ;)

> Create EMPTY_ARGS constant in SnowballProgram instead of allocating new Object[0]
> ----------------------------------------------------------------------------------
>
> Key: LUCENE-2288
> URL: https://issues.apache.org/jira/browse/LUCENE-2288
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Reporter: Shai Erera
> Assignee: Robert Muir
> Fix For: 3.1
> Attachments: LUCENE--2288.patch
>
> Instead of allocating new Object[0], create a proper constant in SnowballProgram. The same (for new Class[0]) is created in Among, although it's less critical because Among is called from static initializers ... Patch will follow shortly.
Re: Adding .classpath.tmpl
I uploaded the file to http://wiki.apache.org/lucene-java/HowToContribute (bottom of the page). But I don't see any good spot to stuff it in the README. There is no pointer to the HowToContribute page at all, nor to the code formatting styles ... what do you think - create such a section at the bottom of the README, or leave it out?

On Fri, Feb 26, 2010 at 2:58 PM, Shai Erera wrote:

> Thanks for your response. I will update the Wiki with the file. After I do that, I'll add some text to the README file. I'll need one of you to help me commit it though.
>
> Thanks again,
> Shai
>
> On Thu, Feb 25, 2010 at 6:21 PM, Mark Miller wrote:
>
>> +1 - I'd prefer this stay out of svn as well - I'd rather it go on the wiki too - perhaps in the same place that you can find the formatting file for eclipse and intellij.
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>> On 02/25/2010 11:10 AM, Grant Ingersoll wrote:
>>
>>> To me, this is stuff that can go on the wiki or somewhere else, otherwise over time, there will be others to add in, etc. We could simply add a pointer to the wiki page in the README.
>>>
>>> On Feb 24, 2010, at 11:55 PM, Shai Erera wrote:
>>>
>>>> Hi
>>>>
>>>> I always find it annoying when I checkout the code to a new project in eclipse, that I need to put everything that I care about in the classpath and add the dependent libraries. On another project I'm involved with, we did that process once, adding all the source code to the classpath and the libraries, and created a .classpath.tmpl. Now when people checkout the code, they can copy the content of that file to their .classpath file, and setting up the project is reduced from a couple of minutes to a few seconds.
>>>>
>>>> I don't want to check-in .classpath because not everyone wants all the code in their classpath.
>>>>
>>>> I attached such a file to the mail. Note that the only dependency which will break on other machines is the ant.jar dependency, which on my Windows is located under c:\ant. That jar is required to compile contrib/ant from eclipse. Not sure how to resolve that, besides removing that line from the file and documenting separately that that's what you need to do if you want to add contrib/ant ...
>>>>
>>>> The file is sorted by name, putting the core stuff at the top - so it's easy for people to selectively add the interesting packages.
>>>>
>>>> I don't know if an issue is required, if so I can create one and move the discussion there.
>>>>
>>>> Shai
Turning IndexReader.isDeleted implementations to final
Hi,

Do you think it's worth making some of the isDeleted method impls final, like in ReadOnlySegmentReader and (maybe) DirectoryReader? I'm thinking the classes that are perceived as final could benefit from that, since their impl could be inlined. Maybe just make these classes final entirely?

Shai
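To make the question concrete, here is a minimal sketch of what marking such a class final looks like. It is not the Lucene code: `ReaderSketch`, `ReadOnlyReaderSketch`, and `countLive` are hypothetical stand-ins, and whether this helps in practice is an open question, since a JIT compiler can often devirtualize and inline a method that has no loaded overrides even without the final keyword.

```java
import java.util.BitSet;

/** Hypothetical stand-in for an abstract reader with a virtual isDeleted(). */
abstract class ReaderSketch {
    abstract boolean isDeleted(int docId);
}

/**
 * Hypothetical read-only implementation. Declaring the class final guarantees
 * that isDeleted() can never be overridden, so calls through this type have
 * exactly one possible target.
 */
final class ReadOnlyReaderSketch extends ReaderSketch {

    private final BitSet deletedDocs;

    ReadOnlyReaderSketch(BitSet deletedDocs) {
        this.deletedDocs = (BitSet) deletedDocs.clone(); // defensive copy: read-only view
    }

    @Override
    boolean isDeleted(int docId) {
        return deletedDocs.get(docId);
    }

    /** A hot loop of the kind where an inlined isDeleted() might matter. */
    int countLive(int maxDoc) {
        int live = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            if (!isDeleted(doc)) {
                live++;
            }
        }
        return live;
    }
}
```

Making the whole class final, as suggested at the end of the mail, covers every method at once; making only isDeleted() final would leave the class open for subclassing in other respects.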