[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-27 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839187#action_12839187
 ] 

Uwe Schindler commented on LUCENE-2285:
---

Hi Shai,

I applied the patch to a fresh checkout and get no compile errors. Are you sure 
that the patch applied correctly? I am working on Windows, so if you are not 
using a patch tool like TortoiseSVN that accepts Windows line endings, you may 
have to run dos2unix on the patch first.
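
If dos2unix is not at hand, the same normalization can be sketched in a few 
lines of Java. This is only an illustration: the class and method names are 
made up, and it assumes the patch file is UTF-8 and small enough to be read 
into memory in one go.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Lucene or of any patch tool.
class LineEndings {

  /** Converts Windows (CRLF) line endings to Unix (LF); lone LFs stay untouched. */
  static String toUnix(String text) {
    return text.replace("\r\n", "\n");
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("usage: java LineEndings.java <patch-file>");
      return;
    }
    // Rewrites the given file in place (assumed to be UTF-8).
    Path patch = Paths.get(args[0]);
    String content = new String(Files.readAllBytes(patch), StandardCharsets.UTF_8);
    Files.write(patch, toUnix(content).getBytes(StandardCharsets.UTF_8));
  }
}
```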

All tests pass here with no compile errors, including the demo webapp and so on 
(using Ant).

> Code cleanup from all sorts of (trivial) warnings
> -
>
> Key: LUCENE-2285
> URL: https://issues.apache.org/jira/browse/LUCENE-2285
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Shai Erera
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2285.patch, LUCENE-2285.patch, LUCENE-2285.patch
>
>
> I would like to do some code cleanup and remove all sorts of trivial 
> warnings, like unnecessary casts, problems w/ javadocs, unused variables, 
> redundant null checks, unnecessary semicolon etc. These are all very trivial 
> and should not pose any problem.
> I'll create another issue for getting rid of deprecated code usage, like 
> LuceneTestCase and all sorts of deprecated constructors. That's also trivial 
> because it only affects Lucene code, but it's a different type of change.
> Another issue I'd like to create is about introducing more generics in the 
> code, where it's missing today - not changing existing API. There are many 
> places in the code like that.
> So, with you permission, I'll start with the trivial ones first, and then 
> move on to the others.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-27 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839187#action_12839187
 ] 

Uwe Schindler edited comment on LUCENE-2285 at 2/27/10 8:08 AM:


Hi Shai,

I applied the patch to a fresh checkout and get no compile errors. Are you sure 
that the patch applied correctly? I am working on Windows, so if you are not 
using a patch tool like TortoiseSVN that accepts Windows line endings, you may 
have to run dos2unix on the patch first.

And don't forget to refresh your package in Eclipse (press F5). I have had this 
kind of error very often because Eclipse caches the sources.

All tests pass here with no compile errors, including the demo webapp and so on 
(using Ant).




[jira] Commented: (LUCENE-2037) Allow Junit4 tests in our environment.

2010-02-27 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839191#action_12839191
 ] 

Uwe Schindler commented on LUCENE-2037:
---

Erick,

that's already fixed in trunk with my last commit, as you noticed! It does 
exactly what rules.TestName does :-) -- I found this class later too and 
realized that it does the same -- only that Lucene has the method in the base 
class for a better migration experience :-).

Yesterday I also wrote an extra test assertion that verifies that the ported 
test case has all methods starting with "test" annotated with @Test. Robert and 
I may want to apply this patch during the migration phase, while developers are 
not yet used to JUnit4 and forget to add @Test. The patch is very rough and 
could probably be optimized (if @BeforeClass could be used, but it cannot, as 
it is static).
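
The patch itself is not reproduced in this mail. As a rough sketch of how such 
a check could work with reflection: the annotation Test below is only a local 
stand-in for org.junit.Test (so the example compiles without the JUnit jar), 
and LegacyCheck, missingTestAnnotations and PortedTest are invented names.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class LegacyCheck {

  // Stand-in for org.junit.Test, so that the sketch needs no JUnit jar.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface Test {}

  /** Returns the names of public methods that start with "test" but lack the marker. */
  static List<String> missingTestAnnotations(Class<?> clazz,
                                             Class<? extends Annotation> marker) {
    List<String> missing = new ArrayList<>();
    for (Method m : clazz.getMethods()) { // public methods, inherited ones included
      if (m.getName().startsWith("test") && !m.isAnnotationPresent(marker)) {
        missing.add(m.getName());
      }
    }
    Collections.sort(missing);
    return missing;
  }

  // A half-ported test class: testB lost its annotation during migration.
  public static class PortedTest {
    @Test public void testA() {}
    public void testB() {}
    public void helper() {}
  }

  public static void main(String[] args) {
    System.out.println(missingTestAnnotations(PortedTest.class, Test.class));
  }
}
```

In a real base class the same loop would run over getClass() and fail the test 
when the returned list is not empty.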

The String ctors are not used in Lucene, as the test name in Lucene should be 
taken automatically from the current method. The additional ctors in Lucene's 
tests were only very, very old JUnit3 relics (later versions of JUnit3 do not 
need them anymore either; they set the test name after instantiation).

> Allow Junit4 tests in our environment.
> --
>
> Key: LUCENE-2037
> URL: https://issues.apache.org/jira/browse/LUCENE-2037
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Other
>Affects Versions: 3.1
> Environment: Development
>Reporter: Erick Erickson
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.1
>
> Attachments: junit-4.7.jar, LUCENE-2037-getName.patch, 
> LUCENE-2037.patch, LUCENE-2037.patch, LUCENE-2037.patch, 
> LUCENE-2037_remove_testwatchman.patch, LUCENE-2037_revised_2.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> Now that we're dropping Java 1.4 compatibility for 3.0, we can incorporate 
> Junit4 in testing. Junit3 and junit4 tests can coexist, so no tests should 
> have to be rewritten. We should start this for the 3.1 release so we can get 
> a clean 3.0 out smoothly.
> It's probably worthwhile to convert a small set of tests as an exemplar.




[jira] Issue Comment Edited: (LUCENE-2037) Allow Junit4 tests in our environment.

2010-02-27 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839191#action_12839191
 ] 

Uwe Schindler edited comment on LUCENE-2037 at 2/27/10 8:16 AM:


Erick,

that's already fixed in trunk with my last commit, as you noticed! It does 
exactly what rules.TestName does :-) -- I found this class later too and 
realized that it does the same -- only that Lucene has the method in the base 
class for a better migration experience :-).

Yesterday I also wrote an extra test assertion that verifies that the ported 
test class has all methods starting with "test" annotated with @Test. Robert 
and I may want to apply this patch during the migration phase, while 
developers are not yet used to JUnit4 and forget to add @Test. The patch is 
very rough and could probably be optimized (if @BeforeClass could be used, but 
it cannot, as it is static).

The String ctors are not used in Lucene, as the test name in Lucene should be 
taken automatically from the current method. The additional ctors in Lucene's 
tests were only very, very old JUnit3 relics (later versions of JUnit3 do not 
need them anymore either; they set the test name after instantiation).




[jira] Updated: (LUCENE-2037) Allow Junit4 tests in our environment.

2010-02-27 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2037:
--

Attachment: LUCENE-2037-LegacyChecker.patch

Here is the patch for the additional assertion that checks whether all ported 
test classes have @Test added to all methods starting with "test".

> Allow Junit4 tests in our environment.
> --
>
> Key: LUCENE-2037
> URL: https://issues.apache.org/jira/browse/LUCENE-2037
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Other
>Affects Versions: 3.1
> Environment: Development
>Reporter: Erick Erickson
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.1
>
> Attachments: junit-4.7.jar, LUCENE-2037-getName.patch, 
> LUCENE-2037-LegacyChecker.patch, LUCENE-2037.patch, LUCENE-2037.patch, 
> LUCENE-2037.patch, LUCENE-2037_remove_testwatchman.patch, 
> LUCENE-2037_revised_2.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> Now that we're dropping Java 1.4 compatibility for 3.0, we can incorporate 
> Junit4 in testing. Junit3 and junit4 tests can coexist, so no tests should 
> have to be rewritten. We should start this for the 3.1 release so we can get 
> a clean 3.0 out smoothly.
> It's probably worthwhile to convert a small set of tests as an exemplar.




[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-27 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839197#action_12839197
 ] 

Uwe Schindler commented on LUCENE-2285:
---

Maybe the patch is also outdated: it was made against revision 916685. Maybe 
you can downgrade your checkout using "svn up -r916685", patch, and upgrade 
again.

I use TortoiseSVN's TortoiseMerge patch tool, which is more intelligent and 
also applies very old patches without problems. It works like the new svn 
patch in svn 1.7.x trunk: it uses the revision numbers in the patch's file 
headers, automatically fetches the requested version, patches it and then 
updates it to the version of your checkout. That way it makes use of svn's 
standard update machinery, and patches always apply without any moved-hunk 
problems and so on.




[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-27 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839202#action_12839202
 ] 

Shai Erera commented on LUCENE-2285:


Ok, I see the problem now. Because there are so many files, I didn't notice 
while applying the patch that there were errors (mismatches) on some files. The 
patch wasn't applied to those files, hence the compile errors. I apply the 
patch w/ Eclipse. The problematic files are tests in the analyzers package, and 
I suspect it's an encoding issue. My source encoding is UTF-8, yet when I apply 
the patch I still see different characters in the source and in the patch file. 
Not sure where the problem is. The patch file which I downloaded is UTF-8 as 
well, and TestArabicAnalyzer (for example) contains the correct Arabic 
characters (in both the patch file and my checkout copy). Yet when I apply the 
patch, Eclipse doesn't read it as UTF-8 ...
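
One way such a mismatch can arise is that a tool decodes the UTF-8 bytes of the 
patch with a single-byte charset. The sketch below uses ISO-8859-1 purely as an 
example of a wrong charset; which charset Eclipse actually picked is not known 
from this thread, and the class and method names are invented.

```java
import java.nio.charset.StandardCharsets;

class EncodingMismatch {

  /** Decodes the UTF-8 bytes of a string with the wrong charset, as a misconfigured tool might. */
  static String misread(String original) {
    byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
    return new String(utf8, StandardCharsets.ISO_8859_1);
  }

  public static void main(String[] args) {
    String arabic = "\u0643\u062A\u0627\u0628"; // four Arabic letters: kaf, teh, alef, beh
    String garbled = misread(arabic);
    // Each of these letters takes two bytes in UTF-8, so the misread text is twice as long.
    System.out.println(arabic.length() + " chars vs. " + garbled.length() + " chars");
    // A context line decoded this way can no longer match the source file.
    System.out.println("same text? " + arabic.equals(garbled));
  }
}
```

Pure ASCII lines survive the wrong decoding unchanged, which would fit the 
observation that only files with non-ASCII content fail to patch.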

Uwe, how about if we do this issue in multiple commits? I.e., commit what 
you've done so far (which is what we agree on), then I can update, review the 
remaining warnings and we continue from there? Anyway after you commit this 
there will be warnings, and over time more warnings will creep in, so we'll 
need to do some cleanup again :). Is that acceptable?




[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-27 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839203#action_12839203
 ] 

Shai Erera commented on LUCENE-2285:


Strange ... is something wrong w/ Eclipse and how it reads the patch file? I 
tried to apply the patch which I created originally (it was on my computer, 
not downloaded from JIRA) and it fails on the same files ... any ideas?




[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-27 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839208#action_12839208
 ] 

Shai Erera commented on LUCENE-2285:


Googling around I see it's a known problem in Eclipse w/ no solution yet (at 
least I haven't come across one). Uwe - can you proceed w/ the commit like I 
suggested and then we review the remaining warnings?




[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-27 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839211#action_12839211
 ] 

Uwe Schindler commented on LUCENE-2285:
---

bq. Strange ... something's wrong w/ eclipse and how it read the patch file? I 
tried to apply the patch which I created originally (and was on my computer, 
not downloaded from JIRA) and it fails on the same files ... any ideas? 

No, sorry. I don't use Eclipse at all, only for some refactoring.

I will commit the patch at the weekend and then update a second svn tree that 
has your old patch applied. "svn up" then makes it possible to produce a patch 
with the left-out changes that we don't want to apply (some casts, sorry, and 
we won't fix them). Just to say it again: it compiles without any warnings 
using javac. We cannot support stupid warnings of other IDEs during our 
development, as Eclipse is not an official requirement. So I still strongly 
suggest disabling some of the warnings already mentioned.




[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-27 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839212#action_12839212
 ] 

Shai Erera commented on LUCENE-2285:


Ok, perhaps you misunderstood me. I suggested to commit *your* version of the 
patch, and then afterwards we can discuss the remaining warnings that are 
controversial. We both agree on your version. We disagree on the diff, right? 
So let's start w/ yours, and then we can continue arguing about the rest :).




[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-27 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839216#action_12839216
 ] 

Uwe Schindler commented on LUCENE-2285:
---

I understood that; I just wanted to say that some warnings will reappear with 
my patch, because I removed lots of generated code. That's all I wanted to say 
:-)

Sorry, I don't want to nitpick - this issue is somehow about different opinions 
on code style and warnings - e.g. I totally agree with renaming private vars 
that hide protected ones and so on. I also want to fix generics (I am the 
"official generics police" *g*). But I simply want something in the code that 
explains the code better and prevents *future* errors, even if it is one cast 
too many. :-) I also applied the unused variable fixes, although in tests I 
think it's better to just add a "fake" local variable and place a 
@SuppressWarnings("unchecked") before it (you cannot apply this annotation to 
simple statements). So your compiler complains about unused variables - but how 
to fix that without placing the SuppressWarnings before the method? Which is 
bad, as I want to place it before only one line of code. In TestVirtualMethod, 
I fixed this by splitting the bigger test method into two smaller ones, only 
one of them with SuppressWarnings. But I still prefer to assign to an unused 
local var to be able to place the annotation before it (maybe that's a bug in 
Java 5, that you cannot add annotations to single statements). Maybe do some 
"fake" operation on the variable, like an assertion, to mark it "used" *g*.
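
The restriction mentioned here can be illustrated as follows. Java accepts 
annotations on declarations, local variable declarations included, but not on 
plain expression statements, so a local variable gives the annotation a place 
to live. The names below are invented for this sketch and are not taken from 
TestVirtualMethod.

```java
import java.util.ArrayList;
import java.util.List;

class SuppressOnLocal {

  // Returns a raw List on purpose, like a pre-generics API would.
  @SuppressWarnings({"rawtypes", "unchecked"})
  static List legacyList() {
    List raw = new ArrayList();
    raw.add("hello");
    return raw;
  }

  static int narrowSuppression() {
    // The annotation covers only this one declaration, not the whole method.
    @SuppressWarnings("unchecked")
    List<String> typed = legacyList(); // unchecked conversion from the raw type

    // Not legal Java: an annotation cannot be attached to a plain statement.
    // @SuppressWarnings("unchecked")
    // consume(legacyList());

    // Reading the variable also keeps "unused variable" warnings away.
    return typed.size();
  }

  public static void main(String[] args) {
    System.out.println(narrowSuppression());
  }
}
```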




[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-27 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839219#action_12839219
 ] 

Shai Erera commented on LUCENE-2285:


Ok, then we understand each other. Indeed I have a different opinion about 
casts. They are called unnecessary for a reason. When a method declares that 
it returns an int, there is no reason to cast a char to an int. The compiler 
will do it for you. More than that, if the method will declare it returns a 
long in the future, the cast will generate a compile error.
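
The kind of cast under discussion looks roughly like this (invented method 
names, not code from the patch). Both variants compile and return the same 
value, because Java widens a char to an int implicitly.

```java
class WideningCast {

  // With the explicit cast that Eclipse flags as unnecessary.
  static int withCast(char c) {
    return (int) c;
  }

  // Without the cast; the compiler applies the same widening conversion.
  static int withoutCast(char c) {
    return c;
  }

  public static void main(String[] args) {
    System.out.println(withCast('A') + " " + withoutCast('A')); // prints "65 65"
  }
}
```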

Styling like that always generates lots of opinions :). We shouldn't however 
limit ourselves to only two opinions. The fact that you are not using Eclipse, 
and therefore don't see all the warnings, doesn't mean the rest of us who do 
use Eclipse should see them. If they are justified then ok, but otherwise, 
saying 'javac does not complain' is just not enough for me. Eclipse is where I 
spend a large portion of my day ...

So I suggest we get more opinions from others. It's not just about what you or 
I think ... but we can do this after the larger portion of the warnings is 
removed.




[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-27 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839224#action_12839224
 ] 

Uwe Schindler commented on LUCENE-2285:
---

bq. When a method declares it returns an int, there is no reason to cast a char to 
an int. The compiler will do it for you. 

I agree with that; those casts were left in the patch, no problem at all. For 
me it is a problem especially in some file handling methods that use longs, but 
sometimes also use ints (e.g. when reading a block of data). One example: 
MMapDirectory had a lot of problems with integer overflows in the past. The 
problems occurred because some formulas were simply not using casts at all, so 
the compiler did what was in the specs, but which is not always easy to see. So 
if you explicitly add a cast to (long) in your formula you are fine and really 
nobody gets hurt. And everybody understands what's happening. The Sun Code 
formatting guidelines explicitly say that you should use integer casts 
if it is for more clarity. And if you don't like the warnings, then switch them 
off for Lucene only in Eclipse (you don't need to do that globally).
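
To make the overflow point concrete, here is a minimal sketch. It is not taken from MMapDirectory; the class and method names are made up for illustration. It only shows how a formula on ints silently wraps unless one operand is cast to long first.

```java
// Hypothetical illustration, not Lucene code.
class LongCastDemo {
    // The multiplication happens in 32-bit int arithmetic and wraps
    // before the result is widened to long for the return.
    static long offsetWithoutCast(int blockIndex, int blockSize) {
        return blockIndex * blockSize;
    }

    // Casting one operand first forces the multiplication into 64-bit arithmetic.
    static long offsetWithCast(int blockIndex, int blockSize) {
        return (long) blockIndex * blockSize;
    }

    public static void main(String[] args) {
        int blockIndex = 3;
        int blockSize = 1 << 30; // 1 GiB blocks
        System.out.println(offsetWithoutCast(blockIndex, blockSize)); // -1073741824
        System.out.println(offsetWithCast(blockIndex, blockSize));    // 3221225472
    }
}
```

Both versions compile without any complaint from javac, which is why the explicit cast is useful as documentation in formulas that mix ints and longs.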

I don't agree with using char inside a method when calling other functions 
without casting to int. E.g. we have some backwards compatibility layer for 
Unicode 4 that uses Character.toUpperCase(int) and Character.toUpperCase(char) 
in parallel. And these two methods differ, so I explicitly cast to be sure 
which method is called. That was especially important for Lucene 2.9 on Java 
1.4 when compiled with Java 5, because javac could pick the wrong method 
without the cast to char (even with -source 1.4 -target 1.4). For easy 
merging to 2.9 (as it is still supported), I want to keep the casts. That's all. 
If it makes you happy, add some @SuppressWarnings("foobar") to it.
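
As a small sketch of why the cast matters when both a char and an int overload exist: the which() methods below are hypothetical stand-ins for the two Character.toUpperCase overloads, so that the chosen overload is visible in the output.

```java
// Hypothetical illustration of overload selection between char and int.
class OverloadCastDemo {
    static String which(char c) { return "char"; }
    static String which(int codePoint) { return "int"; }

    public static void main(String[] args) {
        char c = 'a';
        System.out.println(which(c));       // char
        System.out.println(which((int) c)); // int
        // Arithmetic promotes char to int, so the int overload is chosen
        // here even though no cast is written.
        System.out.println(which(c + 1));   // int
    }
}
```

The explicit cast documents which overload is intended, so the call does not depend on the reader (or a later refactoring) getting the promotion rules right.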

bq. More than that, if the method will declare it returns a long in the future, 
the cast will generate a compile error. 

Changing the return types of public methods will never happen. And if somebody 
did this by accident, the cast would help to find the error :-) -- that's only a 
funny addition.




[jira] Updated: (LUCENE-1990) Add unsigned packed int impls in oal.util

2010-02-27 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1990:
---

Affects Version/s: Flex Branch
Fix Version/s: Flex Branch

> Add unsigned packed int impls in oal.util
> -
>
> Key: LUCENE-1990
> URL: https://issues.apache.org/jira/browse/LUCENE-1990
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: Flex Branch
>Reporter: Michael McCandless
>Priority: Minor
> Fix For: Flex Branch
>
> Attachments: generated_performance-te20100226.txt, 
> LUCENE-1990-te20100122.patch, LUCENE-1990-te20100210.patch, 
> LUCENE-1990-te20100212.patch, LUCENE-1990-te20100223.patch, 
> LUCENE-1990-te20100226.patch, LUCENE-1990-te20100226b.patch, 
> LUCENE-1990-te20100226c.patch, LUCENE-1990.patch, 
> LUCENE-1990_PerformanceMeasurements20100104.zip, perf-mkm-20100227.txt, 
> performance-te20100226.txt
>
>
> There are various places in Lucene that could take advantage of an
> efficient packed unsigned int/long impl.  EG the terms dict index in
> the standard codec in LUCENE-1458 could substantially reduce its RAM
> usage.  FieldCache.StringIndex could as well.  And I think "load into
> RAM" codecs like the one in TestExternalCodecs could use this too.
> I'm picturing something very basic like:
> {code}
> interface PackedUnsignedLongs  {
>   long get(long index);
>   void set(long index, long value);
> }
> {code}
> Plus maybe an iterator for getting and maybe also for setting.  If it
> helps, most of the usages of this inside Lucene will be "write once"
> so eg the set could make that an assumption/requirement.
> And a factory somewhere:
> {code}
>   PackedUnsignedLongs create(int count, long maxValue);
> {code}
> I think we should simply autogen the code (we can start from the
> autogen code in LUCENE-1410), or, if there is a good existing impl
> that has a compatible license that'd be great.
> I don't have time near-term to do this... so if anyone has the itch,
> please jump!
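
For readers who want a picture of what such a packed structure could look like, here is a minimal sketch. It is not the code from any of the attached patches. The class name is hypothetical, and it uses an int index instead of the long index in the interface above, purely to keep the example short. Values are stored back to back in a long[] using just enough bits for maxValue.

```java
// Hypothetical sketch of a packed unsigned int store; not the LUCENE-1990 patch.
class PackedLongsSketch {
    private final long[] blocks;
    private final int bitsPerValue;
    private final long mask;

    PackedLongsSketch(int count, long maxValue) {
        // Bits needed to represent maxValue as an unsigned number (at least 1).
        this.bitsPerValue = Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
        this.mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
        long totalBits = (long) count * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) >>> 6)];
    }

    long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int bitsInFirst = 64 - shift;
        if (bitsInFirst < bitsPerValue) {
            // The value straddles two longs: fetch the high bits from the next one.
            value |= blocks[block + 1] << bitsInFirst;
        }
        return value & mask;
    }

    void set(int index, long value) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long v = value & mask;
        blocks[block] = (blocks[block] & ~(mask << shift)) | (v << shift);
        int bitsInFirst = 64 - shift;
        if (bitsInFirst < bitsPerValue) {
            // Write the spilled high bits into the start of the next long.
            blocks[block + 1] =
                (blocks[block + 1] & ~(mask >>> bitsInFirst)) | (v >>> bitsInFirst);
        }
    }
}
```

With count = 10 and maxValue = 1000 this needs 10 bits per value, so all ten values fit in two longs instead of ten.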




[jira] Updated: (LUCENE-1990) Add unsigned packed int impls in oal.util

2010-02-27 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1990:
---

Attachment: perf-mkm-20100227.txt

{quote}
bq. Airplane blocking snow drifts!? Where on earth are you anyway?

In Denmark. The guy responsible for clearing the runway did indeed clear the 
runway. He just forgot that the plane needs to taxi into the runway in the 
first place. That made us miss our connecting flight.
{quote}

Good grief!

{quote}
bq. It's very interesting that align is never a win - I think in that case 
removing it makes sense? It'll be a nice simplification.

Well, practically never wins for the machines I tested on and never wins with 
my implementation.
{quote}

I think we should remove it...

{quote}
bq. Did we ever test performance of the specialized (generated) decoders using 
switch statements?

I just did a quick hack in order to measure performance and I was very 
surprised that the generated switch-based implementations performs so well. 
It's nearly on par with packed most of the time and exceeds it in some cases. I 
only tested on 3 machines though. The hack is in the 
LUCENE-1990-te20100226c.patch and is called when the performance test is 
executed.
{quote}

Thanks for testing this!  It is interesting.

I ran the perf test on a CentOS 5.4 machine, Java
1.6.0_17-b04 64-bit server, Intel Core 2 Duo E8400 (3 GHz) -- attached
perf-mkm-20100227.txt.  My results also show the switch impl close, though
always a bit behind.

Seems like we should just stick with the non-gen'd packed impl?

bq. Note to self: Switch is not equivalent to a series of if-else, when we're 
talking performance and when we switch without omissions in the cases.

Right, if the switch cases are compact, it should compile into a fast jump
table (though it may still do an unnecessary bounds check).
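
A small sketch of what "compact" means here. The decoder below is hypothetical and much simpler than the generated ones in the patch. Its case labels are contiguous with no gaps, which is the shape javac can typically compile to a tableswitch (a jump table) rather than a lookupswitch (a search over the keys).

```java
// Hypothetical illustration of a dense switch; not the generated decoder from the patch.
class DenseSwitchDemo {
    // Extracts one of four 16-bit slots from a 64-bit block.
    static int decode(long block, int slot) {
        switch (slot) { // labels 0..3 are contiguous, no omissions
            case 0: return (int) (block & 0xFFFF);
            case 1: return (int) ((block >>> 16) & 0xFFFF);
            case 2: return (int) ((block >>> 32) & 0xFFFF);
            case 3: return (int) (block >>> 48);
            default: throw new IllegalArgumentException("slot out of range: " + slot);
        }
    }

    public static void main(String[] args) {
        long block = 0x0004000300020001L;
        for (int slot = 0; slot < 4; slot++) {
            System.out.println(decode(block, slot)); // 1, 2, 3, 4
        }
    }
}
```

Running javap -c on the compiled class shows which of the two instructions was actually emitted.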

I think, once we remove aligned, this is ready to commit?  I think we
should land this on flex branch?  (It's using CodecUtil, BytesRef --
I'll merge them when I commit).  Then I can cutover the terms index to
use packed ints.





[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839242#action_12839242
 ] 

Robert Muir commented on LUCENE-2285:
-

bq. Googling around I see it's a known problem in Eclipse w/ no solution yet 
(at least I haven't come across one). Uwe - can you proceed w/ the commit like 
I suggested and then we review the remaining warnings?

this drives me nuts. here's what you can do: copy the entire patch to 
clipboard, then apply patch from clipboard, no encoding problems.




[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-27 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839254#action_12839254
 ] 

Shai Erera commented on LUCENE-2285:


bq. copy the entire patch to clipboard

Super Robert ! That worked !!

Now that I have applied the patch, I'm back to 1,400 warnings (900 up). Many of 
them are related to generated code and Snowball, but here are a few comments:
* AnalyzingQueryParser (contrib/misc), line 144 --> wlist cannot be null at 
this point because it is created (line 80) as new ArrayList. The same check 
should be removed in line 161, though Eclipse does not complain, which is weird. 
But wlist cannot be null.
** Besides, the entire code segment in lines 158-164 can be improved, but let's 
leave it outside the scope of this issue.
* TestCharacterUtils - Uwe, note how there are many unnecessary casts to int 
(from char), while the assert method that's actually called takes a long :). Do 
you still think these are required?
* UnicodeUtil - the chars are cast to int in code like this: _int utf32 = (int) 
str.charAt(i);_ --> Is that necessary too?

Besides these, I'm fine w/ the rest. Still a 2400 warnings reduction, and many 
of the ones left are either in generated code, or related to deliberate use of 
deprecated API.




[jira] Issue Comment Edited: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-27 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839254#action_12839254
 ] 

Shai Erera edited comment on LUCENE-2285 at 2/27/10 3:00 PM:
-

bq. copy the entire patch to clipboard

Super Robert ! That worked !!

Now that I have applied the patch, I'm back to 1,400 warnings (900 up). Many of 
them are related to generated code and Snowball, but here are a few comments:
* AnalyzingQueryParser (contrib/misc), line 144 --> wlist cannot be null at 
this point because it is created (line 80) as new ArrayList. The same check 
should be removed in line 161, though Eclipse does not complain, which is weird. 
But wlist cannot be null.
** Besides, the entire code segment in lines 158-164 can be improved, but let's 
leave it outside the scope of this issue.
* TestCharacterUtils - Uwe, note how there are many unnecessary casts to int 
(from char), while the assert method that's actually called takes a long :). Do 
you still think these are required?
* UnicodeUtil - the chars are cast to int in code like this: _int utf32 = (int) 
str.charAt(i+1);_ --> Is that necessary too?

Besides these, I'm fine w/ the rest. Still a 2400 warnings reduction, and many 
of the ones left are either in generated code, or related to deliberate use of 
deprecated API.
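
For reference, a minimal sketch of the language rule behind the last two bullets. The class and method names are hypothetical and this is not the UnicodeUtil or TestCharacterUtils code. A char widens to int or long automatically, so the cast changes nothing in these particular cases.

```java
// Hypothetical illustration of implicit char widening.
class CharWideningDemo {
    // Only a long parameter exists, so a char argument is widened automatically.
    static String describe(long value) { return "long:" + value; }

    static int codeUnitAt(String str, int i) {
        // No cast needed: char -> int is a widening primitive conversion.
        return str.charAt(i);
    }

    public static void main(String[] args) {
        String str = "aB";
        int withCast = (int) str.charAt(1);
        int withoutCast = codeUnitAt(str, 1);
        System.out.println(withCast == withoutCast);  // true
        System.out.println(describe(str.charAt(0)));  // long:97
    }
}
```

This only covers assignment and single-overload calls; where a char overload also exists, the cast does change which method is called.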




[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839256#action_12839256
 ] 

Robert Muir commented on LUCENE-2285:
-

bq. In my internal project, I also make use of Snowball code directly, and 
improved it to fit better in the analysis process. I should actually diff my 
changes w/ yours, perhaps I can use yours, or contribute mine.

I'd be curious to know what you did, if it's possible for you to share it 
(maybe on the list so it won't be lost on this issue?)

my recent mods to snowball itself are on LUCENE-2194, LUCENE-2201. I think 
previously Karl modified it so that it doesn't use reflection (except the Lovins 
stemmer)




[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-27 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839258#action_12839258
 ] 

Shai Erera commented on LUCENE-2285:


bq. I'd be curious to know what you did

Ok, now you've made me compare the two :). I'm happy to see we both did the 
same thing, only you call your buffer 'current' while I call it 'buf'. Besides 
that I've included a static final EMPTY_ARGS instead of all the places where 
'new Object[0]' is passed. Nothing too fancy.
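
A rough sketch of the EMPTY_ARGS idea, assuming the array is only ever passed to reflective no-argument calls. The surrounding class and the invokeNoArgs helper are hypothetical and not the Snowball code. A single shared zero-length array replaces a fresh 'new Object[0]' allocation at every call site.

```java
import java.lang.reflect.Method;

// Hypothetical illustration; not the actual Snowball/Among code.
class EmptyArgsDemo {
    // A zero-length array has no elements to mutate, so one instance can be shared.
    private static final Object[] EMPTY_ARGS = new Object[0];

    static Object invokeNoArgs(Object target, String methodName) {
        try {
            Method m = target.getClass().getMethod(methodName);
            return m.invoke(target, EMPTY_ARGS); // instead of new Object[0]
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(invokeNoArgs("abc", "length")); // 3
    }
}
```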

Another thing is that I wrote an Arabic and Hebrew stemmer, and combined them 
w/ the Snowball ones by introducing a stemmer class which can be either 
Snowball or anything else. I'll check if we're allowed to contribute the Hebrew 
stemmer to Lucene ...

BTW FYI - our legal department forbids us to use the Hungarian stemmer because 
of licensing/legal issues. Besides the stemmers that were originally provided, 
the Snowball project accepted additional ones like the Hungarian stemmer. 
However, for that one we weren't able to get a confirmation from the 
contributor that his university indeed gave him permission to contribute the 
code. Don't know if it matters to anyone here (I've notified Martin Porter as 
well), but FYI. Our legal department does not permit us to use it (which is not 
surprising - they are legal ...). I don't want to derail this issue into a 
Snowball discussion, so if you want to talk about it, I suggest we move it to 
the list.




[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-27 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839259#action_12839259
 ] 

Uwe Schindler commented on LUCENE-2285:
---

bq. UnicodeUtil 

I reverted the whole class as it is very sensitive to binary encoding, so 
please leave it as it is. Tell me any @SuppressWarnings parameter that makes 
Eclipse happy, and I will add it!

bq. TestCharacterUtils

I missed that this test is JUnit 4; in JUnit 3 the casts are necessary. But if 
you bug me any longer with these casts I will give the issue to somebody else :-)

bq. AnalyzingQueryParser

I simply reverted everything with QueryParser in it, because it is normally 
generated code. :-) As I said before, let us first commit this patch. But I 
will no longer discuss casts :-)

Thanks for reviewing the patch, it's a good help in getting the code cleaner!




snowball discussion on LUCENE-2285

2010-02-27 Thread Robert Muir
i wanted to continue this here to not clog up the issue!

Shai Erera commented on LUCENE-2285:

> bq. I'd be curious to know what you did
>
> Ok, now you've made me compare the two :). I'm happy to see we both did the
> same thing, only you call your buffer 'current' while I call it 'buf'.
> Besides that I've included a static final EMPTY_ARGS instead of all the
> places where 'new Object[0]' is passed. Nothing too fancy.
>

hmm, i didn't think of this second optimization. does it affect generated
code or is it in Among/SnowballProgram?

>
> Another thing is that I wrote an Arabic and Hebrew stemmer, and combined
> them w/ the Snowball ones by introducing a stemmer class which can be either
> Snowball or anything else. I'll check if we're allowed to contribute the
> Hebrew stemmer to Lucene ...
>

please do.  as far as integration goes, i guess we took a different approach
with LUCENE-2055 (from the Analyzer perspective, the user does not care if
it uses snowball or something else behind the scenes, etc).


> BTW FYI - our legal department forbid us to use the Hungarian stemmer
> because of licensing/legal issues. Besides the stemmers that were originally
> provided, the Snowball project accepted additional ones like the Hungarian
> stemmer. However, for that one we weren't able to get a confirmation from
> the contributor his University indeed gave him permission to contribute the
> code. Don't know if it matters to anyone here (I've notified Martin Porter
> as well), but FYI. Our legal department does not permit us to use it (which
> is not surprising - they are legal ...). I don't want to derail this issue
> into Snowball discussion, so if you want to talk about it, I suggest we move
> it to the list.


this is concerning to me, i mean the thing is sitting there on the
university's website: http://ilps.science.uva.nl/resources/snowball-hun :)
but if Apache is concerned about this situation too, someone let me know and
i can take Savoy's (clearly marked BSD) and we can add that instead, and
remove the ambiguous Snowball one, even if it's temporary:
http://members.unine.ch/jacques.savoy/clef/index.html



-- 
Robert Muir
rcm...@gmail.com


[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-27 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839262#action_12839262
 ] 

Uwe Schindler commented on LUCENE-2285:
---

I will commit the current patch soon and post the remaining diff of Shai's 
original patch to the issue.




[jira] Updated: (LUCENE-2089) explore using automaton for fuzzyquery

2010-02-27 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2089:
---

Attachment: LUCENE-2089.patch

New rev of gen.py, that uses packed arrays for the states/offsets.

It's much more compact -- Lev1 is now 5KB, Lev2 is 11KB, Lev3 is 160KB.  And 
Lev3 compiles!  (Robert now you need a test case for Lev3 ;) ).  The class 
files are OK too: Lev1 3.9KB, Lev2 is 7.3KB, Lev3 is 102KB.

> explore using automaton for fuzzyquery
> --
>
> Key: LUCENE-2089
> URL: https://issues.apache.org/jira/browse/LUCENE-2089
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: Flex Branch
>Reporter: Robert Muir
>Assignee: Mark Miller
>Priority: Minor
> Fix For: Flex Branch
>
> Attachments: ContrivedFuzzyBenchmark.java, gen.py, gen.py, gen.py, 
> gen.py, gen.py, gen.py, Lev2ParametricDescription.java, 
> Lev2ParametricDescription.java, Lev2ParametricDescription.java, 
> Lev2ParametricDescription.java, LUCENE-2089.patch, LUCENE-2089.patch, 
> LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089.patch, 
> LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089_concat.patch, 
> Moman-0.2.1.tar.gz, TestFuzzy.java
>
>
> we can optimize fuzzyquery by using AutomatonTermsEnum. The idea is to speed 
> up the core FuzzyQuery in similar fashion to Wildcard and Regex speedups, 
> maintaining all backwards compatibility.
> The advantages are:
> * we can seek to terms that are useful, instead of brute-forcing the entire 
> terms dict
> * we can determine matches faster, as true/false from a DFA is array lookup, 
> don't even need to run levenshtein.
> We build Levenshtein DFAs in linear time with respect to the length of the 
> word: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.652
> To implement support for 'prefix' length, we simply concatenate two DFAs, 
> which doesn't require us to do NFA->DFA conversion, as the prefix portion is 
> a singleton. The concatenation is also constant time with respect to the size 
> of the fuzzy DFA, since it only needs to examine its start state.
> With this algorithm, parametric tables are precomputed so that DFAs can be 
> constructed very quickly.
> If the required number of edits is too large (we don't have a table for it), 
> we use "dumb mode" at first (no seeking, no DFA, just brute force like now).
> As the priority queue fills up during enumeration, the similarity score 
> required to be a competitive term increases, so, the enum gets faster and 
> faster as this happens. This is because terms in core FuzzyQuery are sorted 
> by boost value, then by term (in lexicographic order).
> For a large term dictionary with a low minimal similarity, you will fill the 
> pq very quickly since you will match many terms. 
> This not only provides a mechanism to switch to more efficient DFAs (edit 
> distance of 2 -> edit distance of 1 -> edit distance of 0) during 
> enumeration, but also to switch from "dumb mode" to "smart mode".
> With this design, we can add more DFAs at any time by adding additional 
> tables. The tradeoff is the tables get rather large, so for very high K, we 
> would start to increase the size of Lucene's jar file. The idea is we don't 
> have to include large tables for very high K, by using the 'competitive boost' 
> attribute of the priority queue.
> For more information, see http://en.wikipedia.org/wiki/Levenshtein_automaton
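
The priority-queue argument in the description above can be sketched as follows. This is only an illustration under stated assumptions: the class CompetitiveThreshold is hypothetical, ties are simply rejected, and similarity is assumed to relate to edit distance as similarity = 1 - edits / termLength, which the description does not spell out.

```java
import java.util.PriorityQueue;

// Hypothetical sketch: as the queue of best scores fills up, the score needed
// to be competitive rises, so fewer edits have to be considered.
final class CompetitiveThreshold {
    private final PriorityQueue<Double> kept = new PriorityQueue<>(); // min-heap of kept scores
    private final int capacity;
    private final double minSimilarity;

    CompetitiveThreshold(int capacity, double minSimilarity) {
        this.capacity = capacity;
        this.minSimilarity = minSimilarity;
    }

    // Score a new term must reach (or, once the queue is full, exceed) to be kept.
    double threshold() {
        return kept.size() < capacity ? minSimilarity : kept.peek();
    }

    // Offers a term's similarity; returns true if it entered the queue.
    boolean offer(double similarity) {
        if (similarity < minSimilarity) {
            return false;
        }
        if (kept.size() < capacity) {
            kept.add(similarity);
            return true;
        }
        if (similarity <= kept.peek()) {
            return false; // not better than the weakest kept term
        }
        kept.poll();
        kept.add(similarity);
        return true;
    }

    // Upper bound on edits worth considering, assuming similarity = 1 - edits / termLength.
    int maxEdits(int termLength) {
        return (int) ((1.0 - threshold()) * termLength);
    }
}
```

Once maxEdits drops to a value for which a precomputed table exists, the enumeration could switch from brute force to the corresponding DFA, and later to the DFA for an even smaller distance.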




[jira] Updated: (LUCENE-2089) explore using automaton for fuzzyquery

2010-02-27 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2089:
---

Attachment: LUCENE-2089.patch

New patch... just fixes a few small things Uwe noticed (moved unpack method & 
MASKS up to super class; use newlines to make the massive tables a bit more 
friendly to look at).

> explore using automaton for fuzzyquery
> --
>
> Key: LUCENE-2089
> URL: https://issues.apache.org/jira/browse/LUCENE-2089
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: Flex Branch
>Reporter: Robert Muir
>Assignee: Mark Miller
>Priority: Minor
> Fix For: Flex Branch
>
> Attachments: ContrivedFuzzyBenchmark.java, gen.py, gen.py, gen.py, 
> gen.py, gen.py, gen.py, Lev2ParametricDescription.java, 
> Lev2ParametricDescription.java, Lev2ParametricDescription.java, 
> Lev2ParametricDescription.java, LUCENE-2089.patch, LUCENE-2089.patch, 
> LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089.patch, 
> LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089.patch, 
> LUCENE-2089_concat.patch, Moman-0.2.1.tar.gz, TestFuzzy.java



Re: snowball discussion on LUCENE-2285

2010-02-27 Thread Shai Erera
Hi Robert, the EMPTY_ARGS stuff is just in SnowballProgram. I didn't touch
the generated code, besides handling calling deprecated API.

We've actually taken the same approach I think :). In my Analyzer, the user
passes a Locale to create the proper Analyzer. The analyzer comes
pre-configured w/ all bunch of filters, like those that handle email tokens
produced by the tokenizer (or hosts, acronyms and more), character
normalization, ngram/stemmer filters etc. The StemmerFilter creates the
proper stemmer based on the language code, and for that I created a
SnowballWrapper - that allows me to instantiate Arabic/Hebrew or Snowball
ones. The wrapper is only needed for the stemmer filter instance ...

Checking contrib/analyzers is on my TODO. Unfortunately our legal
department is very suspicious of everything (guess they wouldn't make good
legal folks otherwise ;)). If I want to use the contrib/analyzers,
they'll need to scan the code and identify the owners of the various
analyzers ... That's what's on my TODO - going through the process w/ them
:).

I personally think that the work you're doing on the analyzers is
extraordinary, and since I don't have much time maintaining my own package,
it has fallen a bit behind in terms of Unicode differences and such. I've
come to appreciate the power of open source long ago - for me it'd be best
to join forces on this analysis package. I'm sure that will happen one day
:).

About the Hungarian stemmer - Martin Porter told us that the original (12?)
stemmers were written by him and so there are no IP issues. The rest were
contributed by other people. All but the Hungarian contributor confirmed their
rights to contribute the code. Only the Hungarian contributor never responded,
even though we've sent a couple of emails. That is problematic. When someone
contributes code to Lucene, they grant the ASF a license (forgot the wording
that's used). That's very reassuring to lawyers, because it doesn't leave
them too exposed. But there isn't any similar process in Snowball ... I can
look up the correspondence we've had with Martin Porter to refresh my memory
on the details.
Shai
On Sat, Feb 27, 2010 at 5:35 PM, Robert Muir  wrote:

> i wanted to continue this here to not clog up the issue!
>
> Shai Erera commented on LUCENE-2285:
>
>> bq. I'd be curious to know what you did
>>
>> Ok, now you've made me compare the two :). I'm happy to see we both did
>> the same thing, only you call your buffer 'current' while I call it 'buf'.
>> Besides that I've included a static final EMPTY_ARGS instead of all the
>> places where 'new Object[0]' is passed. Nothing too fancy.
>>
>
> hmm, i didnt think of this second optimization, does it affect generated
> code or is it in Among/SnowballProgram?
>
>>
>> Another thing is that I wrote an Arabic and Hebrew stemmer, and combined
>> them w/ the Snowball ones by introducing a stemmer class which can be either
>> Snowball or anything else. I'll check if we're allowed to contribute the
>> Hebrew stemmer to Lucene ...
>>
>
> please do.  as far as integration goes, i guess we took a different
> approach with LUCENE-2055 (from the Analyzer perspective, the user does not
> care if it uses snowball or something else behind the scenes, etc).
>
>
>> BTW FYI - our legal department forbid us to use the Hungarian stemmer
>> because of licensing/legal issues. Besides the stemmers that were originally
>> provided, the Snowball project accepted additional ones like the Hungarian
>> stemmer. However, for that one we weren't able to get a confirmation from
>> the contributor his University indeed gave him permission to contribute the
>> code. Don't know if it matters to anyone here (I've notified Martin Porter
>> as well), but FYI. Our legal department does not permit us to use it (which
>> is not surprising - they are legal ...). I don't want to derail this issue
>> into Snowball discussion, so if you want to talk about it, I suggest we move
>> it to the list.
>
>
> this is concerning to me, i mean the thing is sitting there on the
> universities website: http://ilps.science.uva.nl/resources/snowball-hun :)
> but if apache is concerned about this situation too, someone let me know
> and i can take savoy's (clearly marked BSD) and we can add that instead, and
> remove the ambiguous snowball one, even if its temporary:
> http://members.unine.ch/jacques.savoy/clef/index.html
>
>
>
> --
> Robert Muir
> rcm...@gmail.com
>


Re: snowball discussion on LUCENE-2285

2010-02-27 Thread Robert Muir
Can you open an issue for the new Object[0]?  It's sad about the Hungarian
issue.  I'm inclined to think we should add Savoy's and default to it
instead.  I don't see this as code duplication, as it's a different algorithm.
Normally I wouldn't spend a lot of effort on adding alternative
stemmers, but here it makes sense.

It sounds really exciting if you are able to merge in what you have done in
the future!

On Feb 27, 2010 1:16 PM, "Shai Erera"  wrote:

Hi Robert, the EMPTY_ARGS stuff is just in SnowballProgram. I didn't touch
the generated code, besides handling calling deprecated API.

We've actually taken the same approach I think :). In my Analyzer, the user
passes a Locale to create the proper Analyzer. The analyzer comes
pre-configured w/ all bunch of filters, like those that handle email tokens
produced by the tokenizer (or hosts, acronyms and more), character
normalization, ngram/stemmer filters etc. The StemmerFilter creates the
proper stemmer based on the language code, and for that I created a
SnowballWrapper - that allows me to instantiate Arabic/Hebrew or Snowball
ones. The wrapper is only needed for the stemmer filter instance ...

Checking contrib/analyzers is on my TODO. Unfortunately our legal
department is very suspicious of everything (guess they wouldn't make good
legal folks otherwise ;)). If I want to use the contrib/analyzers,
they'll need to scan the code and identify the owners of the various
analyzers ... That's what's on my TODO - going through the process w/ them
:).

I personally think that the work you're doing on the analyzers is
extraordinary, and since I don't have much time maintaining my own package,
it has fallen a bit behind in terms of Unicode differences and such. I've
come to appreciate the power of open source long ago - for me it'd be best
to join forces on this analysis package. I'm sure that will happen one day
:).

About the Hungarian stemmer - Martin Porter told us that the original (12?)
stemmers were written by him and so there are no IP issues. The rest were
contributed by other people. All but the Hungarian contributor confirmed their
rights to contribute the code. Only the Hungarian contributor never responded,
even though we've sent a couple of emails. That is problematic. When someone
contributes code to Lucene, they grant the ASF a license (forgot the wording
that's used). That's very reassuring to lawyers, because it doesn't leave
them too exposed. But there isn't any similar process in Snowball ... I can
look up the correspondence we've had with Martin Porter to refresh my memory
on the details.
 Shai

On Sat, Feb 27, 2010 at 5:35 PM, Robert Muir  wrote:
>
> i wanted to continue this...


[jira] Resolved: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-27 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-2285.
---

Resolution: Fixed

Committed revision: 917019

I will attach the remaining open changes as a separate patch. Please reopen if there are new changes.

> Code cleanup from all sorts of (trivial) warnings
> -
>
> Key: LUCENE-2285
> URL: https://issues.apache.org/jira/browse/LUCENE-2285
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Shai Erera
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2285.patch, LUCENE-2285.patch, LUCENE-2285.patch



[jira] Updated: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-27 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2285:
--

Attachment: LUCENE-2285-remaining+generated.patch
LUCENE-2285-remaining.patch

For reference, here are the remaining changes from Shai that I rejected (mostly 
casts, which in my opinion should stay). Important: The UnicodeUtils, for 
example, are not tested thoroughly enough to be sure that removing the casts 
does not change the behavior at all. In my opinion, this should stay as it is.

The smaller one, LUCENE-2285-remaining.patch, contains the real changes. The 
bigger one, LUCENE-2285-remaining+generated.patch, also includes changes in 
generated code (just for reference).
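
As a general illustration of why a cast can carry meaning in Java, here is a small sketch. None of these lines are taken from the patch or from Lucene; the class CastMatters and its methods are made up for this example.

```java
// Hypothetical illustration: casts that select an overload or an arithmetic width.
final class CastMatters {

    // Without a cast, the char overload of String.valueOf is chosen.
    static String asChar(char c) {
        return String.valueOf(c);
    }

    // With the cast, the int overload is chosen and the numeric value is printed.
    static String asCodeUnit(char c) {
        return String.valueOf((int) c);
    }

    // The multiplication happens in int and can wrap before widening to long.
    static long product(int a, int b) {
        return a * b;
    }

    // The cast forces the multiplication to happen in long.
    static long wideProduct(int a, int b) {
        return (long) a * b;
    }
}
```

Here asChar('a') yields "a" while asCodeUnit('a') yields "97", and product(100000, 100000) yields 1410065408 while wideProduct(100000, 100000) yields 10000000000.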

> Code cleanup from all sorts of (trivial) warnings
> -
>
> Key: LUCENE-2285
> URL: https://issues.apache.org/jira/browse/LUCENE-2285
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Shai Erera
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2285-remaining+generated.patch, 
> LUCENE-2285-remaining.patch, LUCENE-2285.patch, LUCENE-2285.patch, 
> LUCENE-2285.patch



[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2010-02-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839307#action_12839307
 ] 

Robert Muir commented on LUCENE-2089:
-

Mike, this is awesome. I ran benchmarks: we are just as fast as before (with 
only Lev1 and Lev2 enabled), but with smaller generated code.
When i turn on Lev3, it speeds up the worst-case ones (no prefix, pq=1024, 
fuzzy of n=3, n=4), but slows down some of the "better-case" n=3/n=4 cases 
where there is a prefix or PQ.

I think this is because the benchmark is contrived, but realistically n=3 (with 
seeking!) should be a win for users. A less-contrived benchmark (a 'typical' 
massive term dictionary) would help for tuning.

separately, I think we can add heuristics: e.g. for n > 3 WITH a prefix, use 
the DFA in "linear mode" until you drop to n=2, as you already have a nice 
prefix anyway, stuff like that. But if the user doesn't supply a prefix, i 
think seeking is always a win.

Here are the results anyway: I ran it many times and it's consistent (obviously 
differences of just a few ms are not significant). I bolded the ones i think 
illustrate the differences I am talking about.

It's cool to be at the point where we are actually able to measure these kinds 
of tradeoffs!

{{Minimum Sim = 0.73f (edit distance of 1)}} 
||Prefix Length||PQ Size||Avg MS (flex trunk)||Avg MS (1,2)||Avg MS (1,2,3)||
|0|1024|3286.0|7.8|7.6
|0|64|3320.4|7.6|8.0
|1|1024|316.8|5.6|5.3
|1|64|314.3|5.6|5.2
|2|1024|31.8|3.8|4.2
|2|64|31.9|3.7|4.5

{{Minimum Sim = 0.58f (edit distance of 2)}}
||Prefix Length||PQ Size||Avg MS (flex trunk)||Avg MS (1,2)||Avg MS (1,2,3)||
|0|1024|4223.3|87.7|91.2
|0|64|4199.7|12.6|13.2
|1|1024|430.1|56.4|62.0
|1|64|392.8|9.3|8.5
|2|1024|82.5|45.5|48.0
|2|64|38.4|6.2|6.3


{{Minimum Sim = 0.43f (edit distance of 3)}}
||Prefix Length||PQ Size||Avg MS (flex trunk)||Avg MS (1,2)||Avg MS (1,2,3)||
|0|1024|5299.9|424.0|*199.8*
|0|64|5231.8|54.1|*93.2*
|1|1024|522.9|103.6|107.9
|1|64|480.9|14.5|*49.3*
|2|1024|89.0|67.9|70.8
|2|64|46.3|6.8|*19.7*


{{Minimum Sim = 0.29f (edit distance of 4)}}
||Prefix Length||PQ Size||Avg MS (flex trunk)||Avg MS (1,2)||Avg MS (1,2,3)||
|0|1024|6258.1|363.7|*206.5*
|0|64|6247.6|75.6|78.8
|1|1024|609.9|108.3|110.0
|1|64|567.1|13.3|*45.5*
|2|1024|98.6|66.6|73.8
|2|64|55.6|6.8|*22.3*


> explore using automaton for fuzzyquery
> --
>
> Key: LUCENE-2089
> URL: https://issues.apache.org/jira/browse/LUCENE-2089
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: Flex Branch
>Reporter: Robert Muir
>Assignee: Mark Miller
>Priority: Minor
> Fix For: Flex Branch
>
> Attachments: ContrivedFuzzyBenchmark.java, gen.py, gen.py, gen.py, 
> gen.py, gen.py, gen.py, Lev2ParametricDescription.java, 
> Lev2ParametricDescription.java, Lev2ParametricDescription.java, 
> Lev2ParametricDescription.java, LUCENE-2089.patch, LUCENE-2089.patch, 
> LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089.patch, 
> LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089.patch, 
> LUCENE-2089_concat.patch, Moman-0.2.1.tar.gz, TestFuzzy.java

[jira] Created: (LUCENE-2288) Create EMPTY_ARGS constsant in SnowballProgram instead of allocating new Object[0]

2010-02-27 Thread Shai Erera (JIRA)
Create EMPTY_ARGS constsant in SnowballProgram instead of allocating new 
Object[0]
--

 Key: LUCENE-2288
 URL: https://issues.apache.org/jira/browse/LUCENE-2288
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Reporter: Shai Erera
 Fix For: 3.1
 Attachments: LUCENE--2288.patch

Instead of allocating new Object[0] create a proper constant in 
SnowballProgram. The same (for new Class[0]) is created in Among, although it's 
less critical because Among is called from static initializers ... Patch will 
follow shortly.
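
A minimal sketch of the idea, assuming a reflective call to a no-argument method: the empty arrays are allocated once as constants instead of on every call. The class EmptyArgsDemo, the constant NO_PARAMS, and the helper callNoArg are hypothetical and do not reproduce SnowballProgram or Among.

```java
import java.lang.reflect.Method;

// Hypothetical sketch: reuse shared empty arrays for reflective no-arg calls.
final class EmptyArgsDemo {
    // Empty arrays are immutable in practice, so one shared instance is enough.
    private static final Object[] EMPTY_ARGS = new Object[0];
    private static final Class<?>[] NO_PARAMS = new Class<?>[0];

    // Invokes a public no-argument method by name without allocating new arrays.
    static Object callNoArg(Object target, String methodName) {
        try {
            Method m = target.getClass().getMethod(methodName, NO_PARAMS);
            return m.invoke(target, EMPTY_ARGS);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot call " + methodName, e);
        }
    }
}
```

For example, EmptyArgsDemo.callNoArg("abc", "length") returns the Integer 3.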




[jira] Updated: (LUCENE-2288) Create EMPTY_ARGS constsant in SnowballProgram instead of allocating new Object[0]

2010-02-27 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2288:
---

Attachment: LUCENE--2288.patch

Patch w/ the trivial change.

> Create EMPTY_ARGS constsant in SnowballProgram instead of allocating new 
> Object[0]
> --
>
> Key: LUCENE-2288
> URL: https://issues.apache.org/jira/browse/LUCENE-2288
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis
>Reporter: Shai Erera
> Fix For: 3.1
>
> Attachments: LUCENE--2288.patch



Re: snowball discussion on LUCENE-2285

2010-02-27 Thread Shai Erera
I created LUCENE-2288 for handling the Object[] thingy in SnowballProgram
(and Class[] in Among).

Shai

On Sat, Feb 27, 2010 at 8:48 PM, Robert Muir  wrote:

> Can you open an issue for the new Object[0]?  It's sad about the Hungarian
> issue.  I'm inclined to think we should add Savoy's and default to it
> instead.  I don't see this as code duplication, as it's a different algorithm.
> Normally I wouldn't spend a lot of effort on adding alternative
> stemmers, but here it makes sense.
>
> It sounds really exciting if you are able to merge in what you have done in
> the future!
>
> On Feb 27, 2010 1:16 PM, "Shai Erera"  wrote:
>
> Hi Robert, the EMPTY_ARGS stuff is just in SnowballProgram. I didn't touch
> the generated code, besides handling calling deprecated API.
>
> We've actually taken the same approach I think :). In my Analyzer, the user
> passes a Locale to create the proper Analyzer. The analyzer comes
> pre-configured w/ all bunch of filters, like those that handle email tokens
> produced by the tokenizer (or hosts, acronyms and more), character
> normalization, ngram/stemmer filters etc. The StemmerFilter creates the
> proper stemmer based on the language code, and for that I created a
> SnowballWrapper - that allows me to instantiate Arabic/Hebrew or Snowball
> ones. The wrapper is only needed for the stemmer filter instance ...
>
> Checking contrib/analyzers is on my TODO. Unfortunately our legal
> department is very suspicious of everything (guess they wouldn't make good
> legal folks otherwise ;)). If I want to use the contrib/analyzers,
> they'll need to scan the code and identify the owners of the various
> analyzers ... That's what's on my TODO - going through the process w/ them
> :).
>
> I personally think that the work you're doing on the analyzers is
> extraordinary, and since I don't have much time maintaining my own package,
> it has fallen a bit behind in terms of Unicode differences and such. I've
> come to appreciate the power of open source long ago - for me it'd be best
> to join forces on this analysis package. I'm sure that will happen one day
> :).
>
> About the Hungarian stemmer - Martin Porter told us that the original (12?)
> stemmers were written by him and so there are no IP issues. The rest were
> contributed by other people. All but the Hungarian contributor confirmed their
> rights to contribute the code. Only the Hungarian contributor never responded,
> even though we've sent a couple of emails. That is problematic. When someone
> contributes code to Lucene, they grant the ASF a license (forgot the wording
> that's used). That's very reassuring to lawyers, because it doesn't leave
> them too exposed. But there isn't any similar process in Snowball ... I can
> look up the correspondence we've had with Martin Porter to refresh my memory
> on the details.
>  Shai
>
> On Sat, Feb 27, 2010 at 5:35 PM, Robert Muir  wrote:
> >
> > i wanted to continue this...
>
>


[jira] Commented: (LUCENE-2288) Create EMPTY_ARGS constsant in SnowballProgram instead of allocating new Object[0]

2010-02-27 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839311#action_12839311
 ] 

Shai Erera commented on LUCENE-2288:


Forgot to mention all analysis tests pass.

> Create EMPTY_ARGS constsant in SnowballProgram instead of allocating new 
> Object[0]
> --
>
> Key: LUCENE-2288
> URL: https://issues.apache.org/jira/browse/LUCENE-2288
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis
>Reporter: Shai Erera
> Fix For: 3.1
>
> Attachments: LUCENE--2288.patch



[jira] Created: (LUCENE-2289) Calls to SegmentInfos.message should be wrapped w/ infoStream != null checks

2010-02-27 Thread Shai Erera (JIRA)
Calls to SegmentInfos.message should be wrapped w/ infoStream != null checks


 Key: LUCENE-2289
 URL: https://issues.apache.org/jira/browse/LUCENE-2289
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shai Erera
 Fix For: 3.1


To avoid the expensive message creation (which involves the '+' operator on 
strings), calls to message should be wrapped w/ an infoStream != null check, 
rather than doing the check inside message(). I'll attach a patch which does that.
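
The reasoning can be sketched like this: if the null check lives inside message(), the argument string is still concatenated on every call, whereas a check at the call site skips the concatenation entirely when logging is off. The class GuardedLogging, the counter messagesBuilt, and the "SIS: " prefix are hypothetical and only serve the illustration; this is not the SegmentInfos code.

```java
import java.io.PrintStream;

// Hypothetical sketch: guard the call site so the message string is never built
// when no infoStream is set.
final class GuardedLogging {
    private PrintStream infoStream; // null means logging is disabled
    int messagesBuilt = 0;          // counts how often the expensive part ran

    void setInfoStream(PrintStream stream) {
        this.infoStream = stream;
    }

    // Only called when infoStream is known to be non-null.
    private void message(String msg) {
        infoStream.println("SIS: " + msg);
    }

    // Stands in for any costly computation that feeds the log message.
    private String describe(int segments) {
        messagesBuilt++;
        return "segments=" + segments;
    }

    void commit(int segments) {
        if (infoStream != null) {
            // The concatenation and describe() run only inside the guard.
            message("commit " + describe(segments));
        }
        // ... the actual commit work would go here ...
    }
}
```

With no stream set, commit(3) leaves messagesBuilt at 0; after setInfoStream(System.out), the same call prints one line and increments the counter.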




[jira] Updated: (LUCENE-2289) Calls to SegmentInfos.message should be wrapped w/ infoStream != null checks

2010-02-27 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2289:
---

Attachment: LUCENE--2289.patch

Patch w/ the proposed changes.

> Calls to SegmentInfos.message should be wrapped w/ infoStream != null checks
> 
>
> Key: LUCENE-2289
> URL: https://issues.apache.org/jira/browse/LUCENE-2289
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shai Erera
> Fix For: 3.1
>
> Attachments: LUCENE--2289.patch



[jira] Assigned: (LUCENE-2288) Create EMPTY_ARGS constsant in SnowballProgram instead of allocating new Object[0]

2010-02-27 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir reassigned LUCENE-2288:
---

Assignee: Robert Muir

> Create EMPTY_ARGS constsant in SnowballProgram instead of allocating new 
> Object[0]
> --
>
> Key: LUCENE-2288
> URL: https://issues.apache.org/jira/browse/LUCENE-2288
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis
>Reporter: Shai Erera
>Assignee: Robert Muir
> Fix For: 3.1
>
> Attachments: LUCENE--2288.patch



[jira] Commented: (LUCENE-2288) Create EMPTY_ARGS constant in SnowballProgram instead of allocating new Object[0]

2010-02-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839315#action_12839315
 ] 

Robert Muir commented on LUCENE-2288:
-

Thanks Shai, the patch looks good to me, though I hope it only affects the 
Lovins stemmer (or a Snowball stemmer that someone has written on their own), as 
the others should not be using this reflection!

Will commit in a few days unless someone objects.

> Create EMPTY_ARGS constant in SnowballProgram instead of allocating new 
> Object[0]
> --
>
> Key: LUCENE-2288
> URL: https://issues.apache.org/jira/browse/LUCENE-2288
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis
>Reporter: Shai Erera
> Fix For: 3.1
>
> Attachments: LUCENE--2288.patch
>
>
> Instead of allocating new Object[0] create a proper constant in 
> SnowballProgram. The same (for new Class[0]) is created in Among, although 
> it's less critical because Among is called from static initializers ... Patch 
> will follow shortly.




[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-27 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839362#action_12839362
 ] 

Shai Erera commented on LUCENE-2285:


Thanks Uwe for committing this. I think that further discussion is pointless if 
you feel that I *bug* you, and you "will no longer discuss about casts" ... 
That kind of kills any chance of having a serious and 'open' discussion. I can live 
with the code as it is now ...

If someone else feels otherwise, then I don't mind continuing to discuss this.

> Code cleanup from all sorts of (trivial) warnings
> -
>
> Key: LUCENE-2285
> URL: https://issues.apache.org/jira/browse/LUCENE-2285
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Shai Erera
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2285-remaining+generated.patch, 
> LUCENE-2285-remaining.patch, LUCENE-2285.patch, LUCENE-2285.patch, 
> LUCENE-2285.patch
>
>
> I would like to do some code cleanup and remove all sorts of trivial 
> warnings, like unnecessary casts, problems w/ javadocs, unused variables, 
> redundant null checks, unnecessary semicolon etc. These are all very trivial 
> and should not pose any problem.
> I'll create another issue for getting rid of deprecated code usage, like 
> LuceneTestCase and all sorts of deprecated constructors. That's also trivial 
> because it only affects Lucene code, but it's a different type of change.
> Another issue I'd like to create is about introducing more generics in the 
> code, where it's missing today - not changing existing API. There are many 
> places in the code like that.
> So, with your permission, I'll start with the trivial ones first, and then 
> move on to the others.




[jira] Commented: (LUCENE-2288) Create EMPTY_ARGS constant in SnowballProgram instead of allocating new Object[0]

2010-02-27 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839364#action_12839364
 ] 

Shai Erera commented on LUCENE-2288:


Thanks Robert. I never checked if those methods are actually called, as I didn't 
do it to win any CPU cycles back. I just followed good coding practice, and 
since it appeared in two places, thought that a constant would look a bit 
less wasteful. If you're sure those are not called by the other stemmers (and I'm 
sure you are :)), then I'm fine if you leave those out as well ;)

> Create EMPTY_ARGS constant in SnowballProgram instead of allocating new 
> Object[0]
> --
>
> Key: LUCENE-2288
> URL: https://issues.apache.org/jira/browse/LUCENE-2288
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis
>Reporter: Shai Erera
>Assignee: Robert Muir
> Fix For: 3.1
>
> Attachments: LUCENE--2288.patch
>
>
> Instead of allocating new Object[0] create a proper constant in 
> SnowballProgram. The same (for new Class[0]) is created in Among, although 
> it's less critical because Among is called from static initializers ... Patch 
> will follow shortly.




Re: Adding .classpath.tmpl

2010-02-27 Thread Shai Erera
I uploaded the file to
http://wiki.apache.org/lucene-java/HowToContribute (bottom of the
page). But I don't see any good spot for it in the
README. There is no pointer to the HowToContribute page at all, nor to the
code formatting styles ... what do you think - create such a section at the
bottom of README, or leave it out?

On Fri, Feb 26, 2010 at 2:58 PM, Shai Erera  wrote:

> Thanks for your response. I will update the Wiki with the file. After I do
> that, I'll add some text to the README file. I'll need one of you to help me
> commit it though.
>
> Thanks again,
> Shai
>
>
> On Thu, Feb 25, 2010 at 6:21 PM, Mark Miller wrote:
>
>> +1 - I'd prefer this stay out of svn as well - I'd rather it go on the
>> wiki too - perhaps in the same place that you can find the formatting file
>> for eclipse and intellij.
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>>
>> On 02/25/2010 11:10 AM, Grant Ingersoll wrote:
>>
>>> To me, this is stuff that can go on the wiki or somewhere else, otherwise
>>> over time, there will be others to add in, etc.  We could simply add a
>>> pointer to the wiki page in the README.
>>>
>>> On Feb 24, 2010, at 11:55 PM, Shai Erera wrote:
>>>
>>>
>>>
>>>> Hi
>>>>
>>>> I always find it annoying when I check out the code to a new project in
>>>> eclipse, that I need to put everything that I care about in the classpath
>>>> and add the dependent libraries. On another project I'm involved with, we
>>>> did that process once, adding all the source code and the libraries to the
>>>> classpath, and created a .classpath.tmpl. Now when people check out the code,
>>>> they can copy the content of that file to their .classpath file, and setting
>>>> up the project is reduced from a couple of minutes to a few seconds.
>>>>
>>>> I don't want to check in .classpath because not everyone wants all the
>>>> code in their classpath.
>>>>
>>>> I attached such a file to the mail. Note that the only dependency which
>>>> will break on other machines is the ant.jar dependency, which on my Windows
>>>> is located under c:\ant. That jar is required to compile contrib/ant from
>>>> eclipse. Not sure how to resolve that, other than by removing that line
>>>> from the file and documenting separately that that's what you need to do if
>>>> you want to add contrib/ant ...
>>>>
>>>> The file is sorted by name, putting the core stuff at the top - so it's
>>>> easy for people to selectively add the interesting packages.
>>>>
>>>> I don't know if an issue is required; if so, I can create it and move
>>>> the discussion there.
>>>>
>>>> Shai
 


>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>>
>


Turning IndexReader.isDeleted implementations to final

2010-02-27 Thread Shai Erera
Hi

Do you think it's worth making some of the isDeleted method impls final,
like in ReadOnlySegmentReader and (maybe) DirectoryReader? I'm thinking the
classes that are perceived as final could benefit from that, since their
impl could be inlined. Maybe just make these classes final entirely?

Shai
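
To make the question concrete, here is a small sketch of what the change would look like. Reader and ReadOnlyReader are hypothetical stand-ins, not the real IndexReader or ReadOnlySegmentReader. Whether this buys anything measurable is an open question: a JIT compiler can often inline non-final methods too, when only one implementation is loaded.

```java
import java.util.BitSet;

public class FinalIsDeletedSketch {
    // Hypothetical stand-in for the abstract reader API.
    abstract static class Reader {
        abstract boolean isDeleted(int doc);
    }

    // Making the whole class final means isDeleted cannot be overridden,
    // so a call through a ReadOnlyReader reference has exactly one target.
    // The narrower option is to mark only the method final.
    static final class ReadOnlyReader extends Reader {
        private final BitSet deletedDocs;

        ReadOnlyReader(BitSet deletedDocs) {
            this.deletedDocs = deletedDocs;
        }

        @Override
        boolean isDeleted(int doc) {
            return deletedDocs.get(doc);
        }
    }

    // A hot loop of the kind where inlining isDeleted could matter.
    static int countLive(ReadOnlyReader reader, int maxDoc) {
        int live = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            if (!reader.isDeleted(doc)) {
                live++;
            }
        }
        return live;
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(1);
        deleted.set(3);
        System.out.println(countLive(new ReadOnlyReader(deleted), 5));
    }
}
```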