date:20100225

[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-25 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838243#action_12838243
 ] 

Uwe Schindler commented on LUCENE-2285:
---

I forgot: Lucene should compile with javac without warnings - and this is 
working.

> Code cleanup from all sorts of (trivial) warnings
> -
>
> Key: LUCENE-2285
> URL: https://issues.apache.org/jira/browse/LUCENE-2285
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Shai Erera
>Priority: Minor
> Fix For: 3.1
>
>
> I would like to do some code cleanup and remove all sorts of trivial 
> warnings, like unnecessary casts, problems w/ javadocs, unused variables, 
> redundant null checks, unnecessary semicolon etc. These are all very trivial 
> and should not pose any problem.
> I'll create another issue for getting rid of deprecated code usage, like 
> LuceneTestCase and all sorts of deprecated constructors. That's also trivial 
> because it only affects Lucene code, but it's a different type of change.
> Another issue I'd like to create is about introducing more generics in the 
> code, where it's missing today - not changing existing API. There are many 
> places in the code like that.
> So, with your permission, I'll start with the trivial ones first, and then 
> move on to the others.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-25 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838240#action_12838240
 ] 

Uwe Schindler commented on LUCENE-2285:
---

bq. I don't share the same feeling ... I think it's a strong capability - write 
a method which doesn't need to start w/ testXYZ just to be run by JUnit (though 
I do both for clarity). I think moving to JUnit 4 only simplifies things, as it 
allows testing classes w/o the need to extend TestCase. But I'm not going to 
argue about it here, I'd like to keep this issue contained, and short. So I 
won't touch the LuceneTestCase deprecation, as it's still controversial judging 
by what you say. 

This was discussed many times in JIRA and on freenode IRC (#lucene). The 
important thing is: we want all test cases in Lucene to extend one base 
class (LuceneTestCaseJ4 for JUnit 4), because we have some additional checks 
that should run before/after each test: FieldCache checkups, merge scheduler 
tests, reproducing random test seeds, ...
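
The shared-base-class idea can be sketched in plain Java without JUnit. This is only an illustration: BaseTestSketch, runTest and ExampleTest are hypothetical names, and the "checks" are placeholders rather than what LuceneTestCase actually does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for a shared test base class; not Lucene's code.
abstract class BaseTestSketch {
    final List<String> log = new ArrayList<String>();
    long seed;

    // Subclasses put the actual test logic here.
    abstract void doTest(Random random);

    // Template method: common setup, the test body, then common checks.
    final void runTest(long seed) {
        this.seed = seed;                  // remembered so a failure can be reproduced
        log.add("setUp seed=" + seed);
        try {
            doTest(new Random(seed));
        } finally {
            log.add("tearDown checks");    // placeholder for the after-test checks
        }
    }
}

class ExampleTest extends BaseTestSketch {
    int value;

    @Override
    void doTest(Random random) {
        value = random.nextInt(100);
        log.add("test body");
    }
}
```

Because every test goes through runTest, the before/after steps cannot be forgotten, and rerunning with the same seed reproduces the same random values.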

bq. I'll remove those SuppressWarnings then?

Yes. I only remove them whenever I come across them.

At the moment, such code cleanups, especially if they also touch whitespace, 
have the problem that they make merging with the new flex branch (flexible 
indexing) harder, so the most important thing is to not simply reformat all the 
code. When adding generics, please use the above Eclipse code style, as we don't 
want to have whitespace after commas inside generics.

bq. About generics, there are the internal parts of the code, like using List, 
ArrayList etc. Scanning quickly through the list, it looks like most of the 
Lucene related warnings are about referencing them ... so it should be also 
easy to fix.

The last time I opened Lucene in Eclipse, it produced only one type of warning: 
"Use of raw type" - but this warning only affected instanceof checks. This is 
stupid and a bug in Eclipse. Instanceof checks can never be generic, because 
type parameters are erased and cannot be tested at runtime. So a check for 
"instanceof ArrayList" without generics is perfectly fine. Adding generics 
there makes you feel that there is some compiler check or runtime check, but 
neither is done. Instanceof is runtime-only and can never check any generics. 
So I am against changing those checks! But if you find some unneeded casts and 
missing generics in general, +1.
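
A small Java sketch of the point about instanceof and erasure. The class and method names are made up for illustration.

```java
import java.util.ArrayList;

class InstanceofErasure {
    // instanceof can only test the erased (raw) class at runtime.
    static boolean isArrayList(Object o) {
        return o instanceof ArrayList;
        // "o instanceof ArrayList<String>" is rejected by javac when o is a
        // plain Object, because the element type does not exist at runtime.
    }

    // The unbounded wildcard form is legal and silences the raw-type warning,
    // but it performs exactly the same runtime check as the raw form.
    static boolean isArrayListWildcard(Object o) {
        return o instanceof ArrayList<?>;
    }
}
```

Both methods return the same answer for any argument; the wildcard variant adds no compile-time or runtime safety.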

The instanceof warnings can somehow be switched off in Eclipse, but I don't know 
how. For coding I don't use IDEs (only for automatic refactoring).


[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2010-02-25 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2074:
--

Attachment: LUCENE-2074.patch

Here is the updated patch; do the svn copy/move before applying it, as mentioned above.

> Use a separate JFlex generated Unicode 4 by Java 5 compatible 
> StandardTokenizer
> ---
>
> Key: LUCENE-2074
> URL: https://issues.apache.org/jira/browse/LUCENE-2074
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 3.0
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.1
>
> Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, 
> LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
> LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
> LUCENE-2074.patch
>
>
> The current trunk version of StandardTokenizerImpl was generated by Java 1.4 
> (according to the warning). In Lucene 3.0 we switched to Java 1.5, so we should 
> regenerate the file.
> After regeneration the Tokenizer behaves differently for some characters. 
> Because of that we should only use the new TokenizerImpl when 
> Version.LUCENE_30 or LUCENE_31 is used as matchVersion.
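
The matchVersion rule described in the issue can be sketched roughly as follows. VersionSketch and TokenizerChooser are hypothetical stand-ins, not Lucene's Version or StandardTokenizer classes, and the returned strings are just labels.

```java
// Hypothetical stand-in for a version enum ordered by release.
enum VersionSketch {
    LUCENE_24, LUCENE_29, LUCENE_30, LUCENE_31;

    boolean onOrAfter(VersionSketch other) {
        return compareTo(other) >= 0;
    }
}

class TokenizerChooser {
    // Picks the regenerated implementation only for newer match versions,
    // so indexes built with the old behavior keep tokenizing the same way.
    static String implFor(VersionSketch matchVersion) {
        return matchVersion.onOrAfter(VersionSketch.LUCENE_30)
            ? "regenerated implementation"
            : "legacy implementation";
    }
}
```

Callers that pass an older version keep the old tokenization; only callers that opt in to a newer version see the changed behavior.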


[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-25 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838266#action_12838266
 ] 

Shai Erera commented on LUCENE-2285:


How about if I undeprecate LuceneTestCase for now, and if there will be a 
decision to remove it, then we'll remove it? It will eliminate lots (!!!) of 
deprecation warnings.

About the generics - any reason why not have a space after commas inside 
generics?


[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-25 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838271#action_12838271
 ] 

Uwe Schindler commented on LUCENE-2285:
---

See the mailing list. It is done like that everywhere in Lucene's current 
source tree. And you will have no problem: just use the code style XML file for 
the project and you are done.


[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-25 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838277#action_12838277
 ] 

Shai Erera commented on LUCENE-2285:


bq. See the mailing list.

What does it mean? What should I look for? Is it related to undeprecate 
LuceneTestCase? If so, can you give me the short answer - yes/no?


[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-25 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838279#action_12838279
 ] 

Uwe Schindler commented on LUCENE-2285:
---

bq. What does it mean?

Sorry, I changed my comment above.


[jira] Issue Comment Edited: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-25 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838271#action_12838271
 ] 

Uwe Schindler edited comment on LUCENE-2285 at 2/25/10 10:00 AM:
-

bq. About the generics - any reason why not have a space after commas inside 
generics?

See the mailing list. It is done like that everywhere in Lucene's current 
source tree. And you will have no problem: just use the code style XML file for 
the project and you are done.

[http://www.lucidimagination.com/search/document/62fe00098351dbe3/whitespace_inside_generics_parameters]

  was (Author: thetaphi):
See the mailing list. It is done like that everywhere in Lucene's current 
source tree. And you will have no problem: just use the code style XML file 
for the project and you are done.
  

Field level document update

2010-02-25 Thread Anshum

Hi,
I'd like to know whether field-level document updates are planned for the
near future: something that would not require the document to be re-added
whenever it is modified.

-- 
Anshum 
--
You tread upon my patience.
-- William Shakespeare, "Henry IV"


[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-25 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838295#action_12838295
 ] 

Shai Erera commented on LUCENE-2285:


Ok I'm fine w/ the formatting. What about un-deprecating LuceneTestCase for now?


[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-25 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838297#action_12838297
 ] 

Uwe Schindler commented on LUCENE-2285:
---

bq. Ok I'm fine w/ the formatting. What about un-deprecating LuceneTestCase for 
now? 

+1, I wanted to do this already, but forgot about it.


Re: Field level document update

2010-02-25 Thread Michael McCandless

Possible approaches have been discussed on the list, fairly recently,
but I don't think there's active work against it...

Mike

On Thu, Feb 25, 2010 at 5:24 AM, Anshum  wrote:
> Hi,
> I'd like to know do we have something for field level document updation
> planned for the near future? Something that would not require the
> document to be readded in case of any modification.
>
> --
> Anshum
> --
> You tread upon my patience.
>                -- William Shakespeare, "Henry IV"
>
> -
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
>


[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-25 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838319#action_12838319
 ] 

Shai Erera commented on LUCENE-2285:


Uwe, what should I do w/ Version.LUCENE_CURRENT? It is deprecated, however lots 
of tests are using it. Do they need to reference LUCENE_31? What will happen in 
future versions? Every release we'll change all the tests? I remember a 
discussion about this, just trying to figure out how to change the tests now.


Stored fields access

2010-02-25 Thread Earwin Burrfoot

I'm wondering: should Lucene introduce a new interface for reading stored
document fields?

The current 'Document document(int n)' mechanism is barely usable due to
the overhead involved. While I believe the underlying index structure works
pretty fast (if it fits in memory, as is the case for most
performance-concerned installations), there's no adequate access to it
and people are forced to introduce contraptions like LinkedIn's
payload-assisted luceneId<->appId mapping or similar caches we employ.

What I am thinking about is something along the lines of existing
iterators like TermDocs/TermPositions. Iterate over docs, then iterate
over fields stored for each, extract data, ???, profit.
Comments?
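
One possible shape for such an iterator, as a sketch only. StoredFieldsCursor, InMemoryCursor and StoredFieldsDemo are hypothetical names invented here; nothing like this is claimed to exist in Lucene, and the in-memory array merely stands in for the index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cursor-style API, loosely following the next()/doc() style of TermDocs.
interface StoredFieldsCursor {
    boolean nextDoc();     // advance to the next document; false when exhausted
    int doc();             // current document number
    boolean nextField();   // advance to the next stored field of the current document
    String fieldName();
    String fieldValue();
}

// Toy implementation over an array: docs[docId][fieldIndex] = {name, value}.
class InMemoryCursor implements StoredFieldsCursor {
    private final String[][][] docs;
    private int doc = -1;
    private int field = -1;

    InMemoryCursor(String[][][] docs) { this.docs = docs; }

    public boolean nextDoc() { field = -1; return ++doc < docs.length; }
    public int doc() { return doc; }
    public boolean nextField() { return ++field < docs[doc].length; }
    public String fieldName() { return docs[doc][field][0]; }
    public String fieldValue() { return docs[doc][field][1]; }
}

class StoredFieldsDemo {
    // Collects the values of one field across all documents without
    // materializing a Document object per hit.
    static List<String> collect(StoredFieldsCursor cursor, String name) {
        List<String> values = new ArrayList<String>();
        while (cursor.nextDoc()) {
            while (cursor.nextField()) {
                if (name.equals(cursor.fieldName())) {
                    values.add(cursor.fieldValue());
                }
            }
        }
        return values;
    }
}
```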

-- 
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785


[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-25 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838333#action_12838333
 ] 

Uwe Schindler commented on LUCENE-2285:
---

There is an issue open about that, ignore it please, Simon Willnauer will 
repair that. You have to use LuceneTestCase.TEST_VERSION_CURRENT.
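
A rough sketch of why a single constant in the test base class helps with the "change all the tests every release" concern. ReleaseVersion, VersionedTestBase and AnalyzerTestSketch are hypothetical stand-ins, not the real LuceneTestCase or Version.

```java
// Hypothetical stand-in for a version enum.
enum ReleaseVersion { V_30, V_31 }

abstract class VersionedTestBase {
    // The only line that has to change when a new release is cut.
    static final ReleaseVersion TEST_VERSION_CURRENT = ReleaseVersion.V_31;
}

class AnalyzerTestSketch extends VersionedTestBase {
    // Tests refer to the inherited constant instead of naming a concrete version.
    static String describe() {
        return "running against " + TEST_VERSION_CURRENT;
    }
}
```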


[jira] Issue Comment Edited: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-25 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838333#action_12838333
 ] 

Uwe Schindler edited comment on LUCENE-2285 at 2/25/10 1:05 PM:


There is an issue open about that, ignore it please, Simon Willnauer will 
repair that. You have to use LuceneTestCase.TEST_VERSION_CURRENT; as all tests 
extend one of LuceneTestCase/-J4, it is simple to refactor using "sed".

  was (Author: thetaphi):
There is an issue open about that, ignore it please, Simon Willnauer will 
repair that. You have to use LuceneTestCase.TEST_VERSION_CURRENT.
  

[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-25 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838335#action_12838335
 ] 

Uwe Schindler commented on LUCENE-2285:
---

By the way, there are lots of tests that explicitly test deprecated 
APIs, so warnings are fine there. We know about the compile warnings, but cannot 
change these tests to work differently (how should we?) - we can only add 
@SuppressWarnings("deprecation") to them. I would suggest that you ignore tests 
and only look at the main class files.
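
For illustration, a minimal Java sketch of a test that deliberately exercises a deprecated method and suppresses only that warning. LegacyApi and LegacyApiTest are made-up names.

```java
class LegacyApi {
    /** @deprecated kept only for backwards compatibility. */
    @Deprecated
    static int oldSum(int a, int b) {
        return a + b;
    }
}

class LegacyApiTest {
    // The test exists to cover the deprecated method, so the deprecation
    // warning is expected; it is silenced on this method only.
    @SuppressWarnings("deprecation")
    static boolean testOldSum() {
        return LegacyApi.oldSum(2, 3) == 5;
    }
}
```

Keeping the annotation on the narrowest possible scope means new, unintended uses of deprecated APIs elsewhere still produce warnings.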


[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-25 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838349#action_12838349
 ] 

Shai Erera commented on LUCENE-2285:


Thanks for the concerns Uwe, I've noticed the tests that test deprecated code 
and I know better than to try to fix them ... uploading the patch now


[jira] Created: (LUCENE-2286) enable DefaultSimilarity.setDiscountOverlaps by default

2010-02-25 Thread Robert Muir (JIRA)

enable DefaultSimilarity.setDiscountOverlaps by default
---

 Key: LUCENE-2286
 URL: https://issues.apache.org/jira/browse/LUCENE-2286
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Query/Scoring
Reporter: Robert Muir


I think we should enable setDiscountOverlaps in DefaultSimilarity by default.

If you are using synonyms, commongrams, or any of a number of other methods 
that inject terms with a position increment of zero, these currently screw up 
your length normalization.
These terms have a position increment of zero, so they shouldn't count towards 
the length of the document.

I've done relevance tests with Persian showing that the difference is 
significant, and I think it's a big trap for anyone using synonyms, etc.: your 
relevance can actually get worse if you don't flip this boolean flag.
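
A minimal sketch of the effect described above, not Lucene's actual code. It 
assumes the classic length norm of 1/sqrt(numTerms); the class 
DiscountOverlapsSketch, its Token type and the sampleField() helper are 
hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;

class DiscountOverlapsSketch {
    /** Hypothetical token: only the position increment matters here. */
    static final class Token {
        final String text;
        final int positionIncrement;

        Token(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    /**
     * Length norm of 1/sqrt(numTerms). When discountOverlaps is true, tokens
     * with a position increment of zero (injected synonyms and the like) are
     * not counted towards the field length.
     */
    static float lengthNorm(List<Token> tokens, boolean discountOverlaps) {
        int numTerms = 0;
        for (Token t : tokens) {
            if (discountOverlaps && t.positionIncrement == 0) {
                continue; // overlap token: same position as the previous one
            }
            numTerms++;
        }
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** Four "real" tokens plus five injected synonyms at the same positions. */
    static List<Token> sampleField() {
        return Arrays.asList(
            new Token("fast", 1), new Token("quick", 0), new Token("rapid", 0),
            new Token("car", 1), new Token("automobile", 0),
            new Token("for", 1),
            new Token("sale", 1), new Token("deal", 0), new Token("offer", 0));
    }

    public static void main(String[] args) {
        List<Token> field = sampleField();
        // Counting all 9 tokens makes the 4-word field look much longer.
        System.out.println("overlaps counted:    " + lengthNorm(field, false));
        // Counting only the 4 positions keeps the norm of the original text.
        System.out.println("overlaps discounted: " + lengthNorm(field, true));
    }
}
```

With overlaps counted, the four-word field is normalized as if it had nine 
terms, so a heavily expanded document is penalized relative to an unexpanded 
one of the same real length.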


[jira] Updated: (LUCENE-2286) enable DefaultSimilarity.setDiscountOverlaps by default

2010-02-25 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2286:


Attachment: LUCENE-2286.patch

Attached is a patch, with the backwards-compatibility break noted in CHANGES.


[jira] Commented: (LUCENE-2286) enable DefaultSimilarity.setDiscountOverlaps by default

2010-02-25 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838359#action_12838359
 ] 

Michael McCandless commented on LUCENE-2286:


+1


[jira] Updated: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-25 Thread Shai Erera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2285:
---

Attachment: LUCENE-2285.patch

Quite a large patch. I started with 3832 compiler warnings under my Eclipse 
settings, and we're now down to 510. All tests pass, including core, contrib 
and tag. I've also fixed a bunch of javadocs warnings, and "ant javadocs" now 
passes cleanly. I did not reformat any code, in order to keep the patch as 
clear and focused as possible, even though it's a very large one ...

It touches a lot of files, so the sooner someone can help me commit it the 
better (before these files change).


[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-25 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838375#action_12838375
 ] 

Uwe Schindler commented on LUCENE-2285:
---

I will look into it. Some of the changes are problematic because they appear in 
generated classes (like QueryParser), so I will leave those out. Also 
@SuppressWarnings("unused") is not a javac annotation. Thanks for fixing the 
javadoc warnings and cleaning up some import statements. As far as I can see, 
you duplicated work and replaced all LUCENE_CURRENT constants in tests, so I 
may close the other bug report when committing this, too.

This may take some time :-)


[jira] Assigned: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-25 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-2285:
-

Assignee: Uwe Schindler


[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-25 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838384#action_12838384
 ] 

Uwe Schindler commented on LUCENE-2285:
---

Hi Shai,

I applied the patch to my checkout, so it will not get out of date. As 
mentioned before, I have to review each change: on my first quick skim I found 
a removed cast in TestCharArraySet/Map that is needed to call the right 
method. Without the cast the test would still pass, but the affected method 
would never be called. I also do not want to remove some casts in NumericRange 
and other parts, where the casts were added to make the code clearer. 
Especially in places where, without the cast, it is not clear what javac will 
do, the cast is there for more "security" even if it is not needed.

So please excuse my complaints, but having two people look over such a large 
patch is really needed.

Thanks for the work! Uwe


[jira] Updated: (LUCENE-2283) Possible Memory Leak in StoredFieldsWriter

2010-02-25 Thread Tim Smith (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Smith updated LUCENE-2283:
--

Attachment: LUCENE-2283.patch

Here's a patch for using a pool for stored fields buffers


> Possible Memory Leak in StoredFieldsWriter
> --
>
> Key: LUCENE-2283
> URL: https://issues.apache.org/jira/browse/LUCENE-2283
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 2.4.1
>Reporter: Tim Smith
>Assignee: Michael McCandless
> Fix For: 3.1
>
> Attachments: LUCENE-2283.patch
>
>
> StoredFieldsWriter creates a pool of PerDoc instances
> this pool will grow but never be reclaimed by any mechanism
> furthermore, each PerDoc instance contains a RAMFile.
> this RAMFile will also never be truncated (and will only ever grow) (as far 
> as i can tell)
> When feeding documents with large number of stored fields (or one large 
> dominating stored field) this can result in memory being consumed in the 
> RAMFile but never reclaimed. Eventually, each pooled PerDoc could grow very 
> large, even if large documents are rare.
> Seems like there should be some attempt to reclaim memory from the PerDoc[] 
> instance pool (or otherwise limit the size of RAMFiles that are cached) etc
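
A minimal sketch of the kind of limit suggested in the last paragraph above. 
It is not the attached patch and not Lucene code: BoundedBufferPool and its two 
limits are hypothetical, and plain byte[] buffers stand in for the RAMFile held 
by each PerDoc.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical pool that refuses to cache oversized or surplus buffers. */
class BoundedBufferPool {
    private final int maxPooledBytes;   // larger buffers are never kept
    private final int maxPooledBuffers; // cap on the number of idle buffers
    private final Deque<byte[]> free = new ArrayDeque<byte[]>();

    BoundedBufferPool(int maxPooledBytes, int maxPooledBuffers) {
        this.maxPooledBytes = maxPooledBytes;
        this.maxPooledBuffers = maxPooledBuffers;
    }

    /** Returns a pooled buffer if one is big enough, else a fresh one. */
    synchronized byte[] acquire(int minSize) {
        byte[] b = free.pollFirst();
        if (b == null || b.length < minSize) {
            // A too-small pooled buffer is simply dropped here.
            return new byte[minSize];
        }
        return b;
    }

    /** Keeps the buffer only if it is small and the pool is not full. */
    synchronized void release(byte[] buffer) {
        if (buffer.length <= maxPooledBytes && free.size() < maxPooledBuffers) {
            free.addFirst(buffer);
        }
        // Otherwise the buffer is left for the garbage collector, so one
        // rare large document cannot pin its memory in the pool.
    }

    synchronized int pooledCount() {
        return free.size();
    }

    public static void main(String[] args) {
        BoundedBufferPool pool = new BoundedBufferPool(1024, 2);
        byte[] small = pool.acquire(100);
        pool.release(small);                 // kept: small enough
        byte[] big = pool.acquire(1 << 20);  // pooled buffer too small, dropped
        pool.release(big);                   // not kept: exceeds maxPooledBytes
        System.out.println("idle buffers: " + pool.pooledCount());
    }
}
```

The design choice is that the check happens on release, so a large buffer is 
usable for the one document that needs it but is not retained afterwards.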


Re: Stored fields access

2010-02-25 Thread Erick Erickson

Does LazyLoading address this? I'm assuming your issue is
that the default behavior loads the entire document regardless
of whether you actually want all the fields.

Erick

On Thu, Feb 25, 2010 at 7:52 AM, Earwin Burrfoot  wrote:

> I'm thinking, should Lucene introduce new interface to read stored
> document fields?
>
> Current 'Document document(int n)' mechanism is barely usable due to
> overhead involved. While I believe underlying index structure works
> pretty fast (if it fits in memory, as is the case for most
> performance-concerned installations), there's no adequate access to it
> and people are forced to introduce contraptions like LinkedIn's
> payload-assisted luceneId<->appId mapping or similar caches we employ.
>
> What I am thinking about is something along the lines of existing
> iterators like TermDocs/TermPositions. Iterate over docs, then iterate
> over fields stored for each, extract data, ???, profit.
> Comments?
>
> --
> Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
> ICQ: 104465785

Re: Adding .classpath.tmpl

2010-02-25 Thread Grant Ingersoll

To me, this is stuff that can go on the wiki or somewhere else, otherwise over 
time, there will be others to add in, etc.  We could simply add a pointer to 
the wiki page in the README.

On Feb 24, 2010, at 11:55 PM, Shai Erera wrote:

> Hi
> 
> I always find it annoying, when I check out the code into a new Eclipse 
> project, that I need to put everything I care about on the classpath and add 
> the dependent libraries. On another project I'm involved with, we did that 
> process once, adding all the source code and the libraries to the classpath, 
> and created a .classpath.tmpl. Now when people check out the code, they can 
> copy the content of that file to their .classpath file, and setting up the 
> project takes a few seconds instead of a couple of minutes.
> 
> I don't want to check in .classpath because not everyone wants all the code 
> on their classpath.
> 
> I attached such a file to this mail. Note that the only dependency which will 
> break on other machines is the ant.jar dependency, which on my Windows machine 
> is located under c:\ant. That jar is required to compile contrib/ant from 
> Eclipse. I'm not sure how to resolve that, other than removing that line from 
> the file and documenting separately that it is what you need to add if you 
> want contrib/ant ...
> 
> The file is sorted by name, putting the core stuff at the top, so it's easy 
> for people to selectively add the interesting packages.
> 
> I don't know if an issue is required; if so, I can create one and move the 
> discussion there.
> 
> Shai

Re: [jira] Updated: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-25 Thread Erick Erickson

I'm so glad somebody else gets bugged by all the trivial warnings; all along
I thought it was a personal problem.

As I remember, I deprecated LuceneTestCase entirely to encourage people
to migrate to the JUnit 4 variant (LuceneTestCaseJ4). So removing those
deprecations should be approached with some caution. Of course, this
may have changed in the interim.

Erick

On Thu, Feb 25, 2010 at 10:01 AM, Shai Erera (JIRA)  wrote:

>
> [
> https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>
> Shai Erera updated LUCENE-2285:
> ---
>
>Attachment: LUCENE-2285.patch
>
> Quite a large patch. I've started off with 3832 compiler warnings based on
> my eclipse settings and we're now down to 510. All tests pass, including
> core, contrib and tag. I've also fixed a bunch of javadocs warnings, and
> "ant javadocs" now passes cleanly. I did not do any formatting to the code,
> in order to preserve the patch as clear and focused as possible, even though
> it's a very large one ...
>
> It touches a lot of files. So the sooner someone can help me commit it the
> better (before these files change).
>

Re: Adding .classpath.tmpl

2010-02-25 Thread Mark Miller

+1 - I'd prefer this stay out of svn as well - I'd rather it go on the 
wiki too - perhaps in the same place that you can find the formatting 
file for eclipse and intellij.


--
- Mark

http://www.lucidimagination.com



On 02/25/2010 11:10 AM, Grant Ingersoll wrote:

> To me, this is stuff that can go on the wiki or somewhere else, otherwise over 
> time, there will be others to add in, etc.  We could simply add a pointer to 
> the wiki page in the README.
>
> On Feb 24, 2010, at 11:55 PM, Shai Erera wrote:
>
>> Hi
>>
>> I always find it annoying, when I check out the code into a new Eclipse 
>> project, that I need to put everything I care about on the classpath and add 
>> the dependent libraries. On another project I'm involved with, we did that 
>> process once, adding all the source code and the libraries to the classpath, 
>> and created a .classpath.tmpl. Now when people check out the code, they can 
>> copy the content of that file to their .classpath file, and setting up the 
>> project takes a few seconds instead of a couple of minutes.
>>
>> I don't want to check in .classpath because not everyone wants all the code 
>> on their classpath.
>>
>> I attached such a file to this mail. Note that the only dependency which will 
>> break on other machines is the ant.jar dependency, which on my Windows machine 
>> is located under c:\ant. That jar is required to compile contrib/ant from 
>> Eclipse. I'm not sure how to resolve that, other than removing that line from 
>> the file and documenting separately that it is what you need to add if you 
>> want contrib/ant ...
>>
>> The file is sorted by name, putting the core stuff at the top, so it's easy 
>> for people to selectively add the interesting packages.
>>
>> I don't know if an issue is required; if so, I can create one and move the 
>> discussion there.
>>
>> Shai


Re: [jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-25 Thread Erick Erickson

JUnit 4:

Well, simply disliking the @Test annotation seems like a poor reason to stay
with JUnit 3, although I admit it's a pain in the neck to change, which is
why I didn't try to change all of them. The current system lends itself to
the practice of mangling the test name as a way of not running it, which far
too easily allows the test case to be ignored forever. One concrete
advantage of annotations in JUnit 4 is the ability to add another "stupid"
annotation, @Ignore, which then gets reported and thus doesn't get lost.

As I remember, the last place we left localization was that Mike (?) saw
some intermittent problem that I couldn't reproduce. I could dust off that
code and see what the current state of affairs is, since this has come up
again. The other problem was that the implementation I used led to
*increased* test run times. The localization tests basically spun through
all the Locales available and ran all the tests in the class against them.
The current system only runs *some* of the tests in a test class through the
localization process. This can be addressed by, at worst, splitting the test
class up, but in my proof of concept that seemed like too much detail...

My purpose in deprecating LuceneTestCase was to explicitly encourage
migration to JUnit 4, the deprecation warnings being the goad. I vote against
removing it.

FWIW
Erick

On Thu, Feb 25, 2010 at 10:54 AM, Uwe Schindler (JIRA) wrote:

>
>[
> https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838384#action_12838384]
>
> Uwe Schindler commented on LUCENE-2285:
> ---
>
> Hi Shai,
>
> I applied the patch to my checkout, so it will not get out-of date. As
> mentioned before, I have to review each change, as on my first diagonal
> look-around I found a removed cast in TestCharArraySet/Map that is important
> to call the right method, without the cast the test would pass, but the
> affected method is never called. I am also not want to remove some casts in
> NumericRange and other parts, where the casts were added for more clearness
> in code. Especially at some places without the cast it is not clear what
> javac will do, so the cast is for more "security" even if not needed.
>
> So please excuse by complaints, but two people looking over such a large
> patch is really needed.
>
> Thanks for the work! Uwe
>

[jira] Updated: (LUCENE-2286) enable DefaultSimilarity.setDiscountOverlaps by default

2010-02-25 Thread Michael McCandless (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2286:
---

Fix Version/s: 3.1


[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-25 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838456#action_12838456
 ] 

Shai Erera commented on LUCENE-2285:


bq. some of the changes are problematic because they appear in generated 
classes (like QueryParser),

So? All I did was remove unnecessary semicolons and casts ... the next time 
those files are generated, the warnings will return. But at least until then 
we can live with a few fewer warnings ... which will allow my perfectionist 
eyes to rest a little :).

bq. replaced all LUCENE_CURRENT constants in tests

Yes, I figured I'm already touching these files, let's do it all at once. 
Reduced another ~300 warnings.

About the removed casts: Eclipse really does warn you about unnecessary casts, 
and I have never found a case where it was wrong. Removing the cast from 
TestCharArraySet is justified because you want to test the contains(Object) 
method, which is exactly what happens. In fact, when I look at the code, I 
think there is a wrong cast:
{code}
// invokes the contains(Object) method
assertFalse(CharArraySet.EMPTY_SET.contains((Object) "foo"));
// invokes the contains(Object) method
assertFalse(CharArraySet.EMPTY_SET.contains("foo".toCharArray()));
// invokes the contains(char[], int, int) method
assertFalse(CharArraySet.EMPTY_SET.contains("foo".toCharArray(),0,3));
{code}

If the intention was to check all 3 contains methods, then the first cast 
should have been to CharSequence? Anyway, the removed cast (the second, which 
cast to Object) is justified as it's indeed unnecessary.
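
A small self-contained sketch of the overload resolution being discussed. It 
does not use CharArraySet; OverloadSketch and its static contains methods are 
hypothetical stand-ins that mirror the three overloads named above and simply 
report which one the compiler selected.

```java
class OverloadSketch {
    static String contains(char[] text, int off, int len) {
        return "contains(char[], int, int)";
    }

    static String contains(CharSequence cs) {
        return "contains(CharSequence)";
    }

    static String contains(Object o) {
        return "contains(Object)";
    }

    public static void main(String[] args) {
        // A String argument picks the most specific overload: CharSequence.
        System.out.println(contains("foo"));
        // Casting the String to Object changes which overload is called,
        // so this cast is not redundant.
        System.out.println(contains((Object) "foo"));
        // char[] is not a CharSequence, so only contains(Object) applies.
        System.out.println(contains("foo".toCharArray()));
        // Here the cast to Object changes nothing: same overload as above.
        System.out.println(contains((Object) "foo".toCharArray()));
        // Three arguments match only the (char[], int, int) overload.
        System.out.println(contains("foo".toCharArray(), 0, 3));
    }
}
```

Under these assumptions, a cast to Object on a String argument selects a 
different method, while the same cast on a char[] argument is a no-op; which 
of the two a given removed cast was decides whether removing it is safe.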

bq. Also @SuppressWarnings("unused") is not a javac annotation

Are you sure? I have another project which compiles w/ javac and it works fine. 
I'll check it, but I trust you :).
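
A sketch of why both observations can hold at once. The javadoc of 
java.lang.SuppressWarnings says compilers must ignore warning names they do 
not recognize, so a class annotated with "unused" still compiles with javac; 
as far as I know, "unused" is a name Eclipse's compiler acts on while javac 
does not. SuppressWarningsSketch below is a hypothetical example class.

```java
import java.util.ArrayList;
import java.util.List;

class SuppressWarningsSketch {
    // "unused": understood by Eclipse's compiler. A compiler that does not
    // know the name is required to ignore it, so this still compiles.
    @SuppressWarnings("unused")
    private int neverRead = 42;

    // "unchecked": a name javac itself acts on; it silences the unchecked
    // cast warning that -Xlint:unchecked would otherwise report here.
    @SuppressWarnings("unchecked")
    static List<String> uncheckedCast(Object o) {
        return (List<String>) o;
    }

    public static void main(String[] args) {
        List<String> in = new ArrayList<String>();
        in.add("ok");
        System.out.println(uncheckedCast(in).get(0));
    }
}
```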

About adding casts for clarity of code - I guess that's a matter of styling, 
but the cast is truly unnecessary, and just produces a warning. I would like 
the code to be free of those, but that's only my opinion.


Re: Stored fields access

2010-02-25 Thread Erick Erickson

OK, never mind 

Erick

On Thu, Feb 25, 2010 at 1:48 PM, Earwin Burrfoot  wrote:

> My issue is with extra objects created in the process. Field selection
> can be handled with, well, FieldSelector.
>
> 2010/2/25 Erick Erickson :
> > Does LazyLoading address this? I'm assuming your issue is
> > that the default behavior loads the entire document regardless
> > of whether you actually want all the fields.
> > Erick
> >
> > On Thu, Feb 25, 2010 at 7:52 AM, Earwin Burrfoot 
> wrote:
> >>
> >> I'm thinking, should Lucene introduce new interface to read stored
> >> document fields?
> >>
> >> Current 'Document document(int n)' mechanism is barely usable due to
> >> overhead involved. While I believe underlying index structure works
> >> pretty fast (if it fits in memory, as is the case for most
> >> performance-concerned installations), there's no adequate access to it
> >> and people are forced to introduce contraptions like LinkedIn's
> >> payload-assisted luceneId<->appId mapping or similar caches we employ.
> >>
> >> What I am thinking about is something along the lines of existing
> >> iterators like TermDocs/TermPositions. Iterate over docs, then iterate
> >> over fields stored for each, extract data, ???, profit.
> >> Comments?
> >>
> >> --
> >> Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
> >> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
> >> ICQ: 104465785
> >>
> >> -
> >> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-dev-h...@lucene.apache.org
> >>
> >
> >
>
>
>
> --
> Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
> ICQ: 104465785
>
> -
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
>

[jira] Updated: (LUCENE-2277) QueryNodeImpl throws ConcurrentModificationException on add(List)

2010-02-25 Thread Michael McCandless (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2277:
---

Fix Version/s: 3.1

> QueryNodeImpl throws ConcurrentModificationException on add(List)
> 
>
> Key: LUCENE-2277
> URL: https://issues.apache.org/jira/browse/LUCENE-2277
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/*
>Affects Versions: 3.0
> Environment: all
>Reporter: Frank Wesemann
>Priority: Critical
> Fix For: 3.1
>
> Attachments: addChildren.patch
>
>
> When adding a List of children to a QueryNodeImpl, a 
> ConcurrentModificationException is thrown.
> This is because QueryNodeImpl, instead of iterating over the 
> supplied list, iterates over its internal clauses List.
> Patch:
> Index: QueryNodeImpl.java
> ===
> --- QueryNodeImpl.java(revision 911642)
> +++ QueryNodeImpl.java(working copy)
> @@ -74,7 +74,7 @@
>
> .getLocalizedMessage(QueryParserMessages.NODE_ACTION_NOT_SUPPORTED));
>  }
>  
> -for (QueryNode child : getChildren()) {
> +for (QueryNode child : children) {
>add(child);
>  }
>  
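
A minimal sketch of the failure described in this issue, assuming (as the report states) that the loop walks the node's live internal list while add() appends to it. NodeSketch and its method names are hypothetical stand-ins, not the real QueryNodeImpl:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical stand-in for QueryNodeImpl; only the add/iterate logic is modeled.
class NodeSketch {
    private final List<String> clauses = new ArrayList<String>();

    void add(String child) {
        clauses.add(child);
    }

    List<String> getChildren() {
        return clauses; // assumption: the live internal list is exposed
    }

    // Mirrors the pre-patch loop: iterates the internal list while add() grows it.
    void addAllBuggy(List<String> children) {
        for (String child : getChildren()) {
            add(child);
        }
    }

    // Mirrors the patched loop: iterates the supplied list instead.
    void addAllFixed(List<String> children) {
        for (String child : children) {
            add(child);
        }
    }

    public static void main(String[] args) {
        NodeSketch buggy = new NodeSketch();
        buggy.add("a");
        try {
            buggy.addAllBuggy(Arrays.asList("b", "c"));
        } catch (ConcurrentModificationException e) {
            System.out.println("buggy loop failed: " + e);
        }

        NodeSketch fixed = new NodeSketch();
        fixed.add("a");
        fixed.addAllFixed(Arrays.asList("b", "c"));
        System.out.println(fixed.getChildren());
    }
}
```

Note that the buggy loop only fails once the node already has at least one child; on an empty node it silently adds nothing, which is arguably worse.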


[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-25 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838464#action_12838464
 ] 

Uwe Schindler commented on LUCENE-2285:
---

bq. The removed cast from TestCharArraySet is justified because you want to 
test the contains(Object) method, which is exactly what happens. In fact, when 
I look at the code, I think there is a wrong cast:

This is not true:
{noformat}
Index: src/test/org/apache/lucene/analysis/TestCharArrayMap.java
===
--- src/test/org/apache/lucene/analysis/TestCharArrayMap.java   (revision 916146)
+++ src/test/org/apache/lucene/analysis/TestCharArrayMap.java   (working copy)
@@ -76,7 +76,7 @@
 int n=0;
 for (Object o : cs) {
   assertTrue(cm.containsKey(o));
-  assertTrue(cm.containsKey((char[]) o));
+  assertTrue(cm.containsKey(o));
   n++;
 }
 assertEquals(hm.size(), n);
Index: src/test/org/apache/lucene/analysis/TestCharArraySet.java
===
--- src/test/org/apache/lucene/analysis/TestCharArraySet.java   (revision 916146)
+++ src/test/org/apache/lucene/analysis/TestCharArraySet.java   (working copy)
@@ -475,7 +475,7 @@
   assertFalse(CharArraySet.EMPTY_SET.contains(stopword));
 }
 assertFalse(CharArraySet.EMPTY_SET.contains((Object) "foo"));
-assertFalse(CharArraySet.EMPTY_SET.contains((Object) "foo".toCharArray()));
+assertFalse(CharArraySet.EMPTY_SET.contains("foo".toCharArray()));
 assertFalse(CharArraySet.EMPTY_SET.contains("foo".toCharArray(),0,3));
   }
{noformat}

The problem here is:
We have a char[] method and an Object method. The check tests that the Object 
method also accepts a char[]. This is important if you cast CharArraySet to 
java.util.Set and call it with a char[]. So removing the cast in this test is 
simply wrong. You can check this with Clover. And your patch even adds the same 
check twice instead of forcing the right method.
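
To make the overload argument concrete, here is a small sketch. OverloadSketch is a hypothetical class, not the real CharArraySet, and its methods return a label instead of a boolean so that the chosen overload is visible:

```java
// Hypothetical miniature of a set with both an Object and a char[] overload;
// this is not the real CharArraySet, only the overload pattern.
class OverloadSketch {
    String contains(Object o) {
        return "contains(Object)";
    }

    String contains(char[] text) {
        return "contains(char[])";
    }

    public static void main(String[] args) {
        OverloadSketch set = new OverloadSketch();
        char[] foo = "foo".toCharArray();

        // Without a cast, the compiler picks the most specific overload.
        System.out.println(set.contains(foo));
        // The cast is not redundant: it forces the Object overload,
        // which is the path taken when the set is used as a java.util.Set.
        System.out.println(set.contains((Object) foo));
    }
}
```

Overloads are resolved at compile time from the static type of the argument, so dropping the cast changes which method the test exercises even though the runtime value is the same.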

And by the way: When I run "ant" and it compiles I get no warning message at 
all.

bq. Are you sure? I have another project which compiles w/ javac and it works 
fine. I'll check it, but I trust you .

As I said before, Java compilers simply need to ignore unknown SuppressWarnings 
(see the Java Language Specification). It's only Eclipse that is to over

bq. About adding casts for clarity of code - I guess that's a matter of 
styling, but the cast is truly unnecessary, and just produces a warning. I 
would like the code to be free of those, but that's only my opinion.

Yes, it's a matter of style. For the same reason we don't want to have 
autoboxing in internal Lucene code: autoboxing has a speed impact on 
some of Lucene's code (like collectors), so we want to see what happens.

I want to see explicitly where I apply a char -> int conversion. Especially in 
our new TokenFilters you can run into problems very fast, because the Character 
methods in the Unicode part are very sensitive to whether they are called with 
char or int. And I say it again: javac shows no warning, so I don't see any need 
to change this just because Eclipse prints a useless warning. You can switch 
these warnings off; there are some I simply switch off after creating a project 
with Eclipse, like the one for generified instanceof checks, which is a bug in 
Eclipse.
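
A short sketch of why the char versus int distinction matters for the Character methods. It uses only java.lang.Character; the code point U+10400 is picked purely for illustration because it lies outside the Basic Multilingual Plane:

```java
class CodePointSketch {
    public static void main(String[] args) {
        // U+10400 (DESERET CAPITAL LETTER LONG I) needs two chars, a surrogate pair.
        String s = new String(Character.toChars(0x10400));
        char first = s.charAt(0);          // high surrogate only
        int codePoint = s.codePointAt(0);  // the full code point

        // char overload: a lone surrogate is not a letter.
        System.out.println(Character.isLetter(first));
        // int overload: the complete code point is a letter.
        System.out.println(Character.isLetter(codePoint));
        // An explicit cast documents that the int overload is being called
        // with a single UTF-16 unit, which is easy to miss when it is implicit.
        System.out.println(Character.isLetter((int) first));
    }
}
```

The implicit widening compiles either way, so the explicit cast serves only as a visible marker of which overload is in play.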

> Code cleanup from all sorts of (trivial) warnings
> -
>
> Key: LUCENE-2285
> URL: https://issues.apache.org/jira/browse/LUCENE-2285
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Shai Erera
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2285.patch
>
>
> I would like to do some code cleanup and remove all sorts of trivial 
> warnings, like unnecessary casts, problems w/ javadocs, unused variables, 
> redundant null checks, unnecessary semicolon etc. These are all very trivial 
> and should not pose any problem.
> I'll create another issue for getting rid of deprecated code usage, like 
> LuceneTestCase and all sorts of deprecated constructors. That's also trivial 
> because it only affects Lucene code, but it's a different type of change.
> Another issue I'd like to create is about introducing more generics in the 
> code, where it's missing today - not changing existing API. There are many 
> places in the code like that.
> So, with your permission, I'll start with the trivial ones first, and then 
> move on to the others.


Re: Stored fields access

2010-02-25 Thread Earwin Burrfoot

My issue is with extra objects created in the process. Field selection
can be handled with, well, FieldSelector.

2010/2/25 Erick Erickson :
> Does LazyLoading address this? I'm assuming your issue is
> that the default behavior loads the entire document regardless
> of whether you actually want all the fields.
> Erick
>
> On Thu, Feb 25, 2010 at 7:52 AM, Earwin Burrfoot  wrote:
>>
>> I'm thinking, should Lucene introduce new interface to read stored
>> document fields?
>>
>> Current 'Document document(int n)' mechanism is barely usable due to
>> overhead involved. While I believe underlying index structure works
>> pretty fast (if it fits in memory, as is the case for most
>> performance-concerned installations), there's no adequate access to it
>> and people are forced to introduce contraptions like LinkedIn's
>> payload-assisted luceneId<->appId mapping or similar caches we employ.
>>
>> What I am thinking about is something along the lines of existing
>> iterators like TermDocs/TermPositions. Iterate over docs, then iterate
>> over fields stored for each, extract data, ???, profit.
>> Comments?
>>
>> --
>> Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
>> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
>> ICQ: 104465785
>>
>> -
>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>>
>
>



-- 
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785


Re: [jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-25 Thread Erick Erickson

I don't have my heart set on keeping the deprecation, so taking it off works
for me. I'd also agree that we either need a concerted effort to convert
completely or should leave it un-deprecated, so feel free.

Let's move the junit4 stuff off to another discussion.

Erick


On Thu, Feb 25, 2010 at 1:27 PM, Shai Erera  wrote:

> Erick, I'm totally with you on JUnit 4. I think the @Test annotation is
> really not a big deal (it's actually very easy to migrate all the current
> tests to JUnit 4 with the added import using some script). Even manually it
> shouldn't be such a big deal.
>
> @Ignore is a perfect other advantage of JUnit4. I've found some tests which
> were prefixed with _, i.e. _testXYZ just to disable them. Nobody knows about
> it until he looks at the code (and pays attention). @Ignore would have been
> better.
>
> And there are lots of other advantages, like the @Before and @After (not
> only class). Another problem I've found in the tests is that not all
> extended LuceneTestCase, and usually their setUp and tearDown
> implementations were wrong - not calling super first/last. When I moved them
> to extend LuceneTestCase they broke (I fixed them, don't worry). However,
> that could never happen if the super's methods were tagged w/ @Before/After,
> because JUnit would take care running them before/after their sub-classes'
> @Before/After. So that's another win for JUnit4.
>
> And of course the @Before/AfterClass are really great !
>
> So all in all, I'm a big fan of JUnit4, and if the discussion will start
> again, I'll pay more attention to it and participate (I admit I didn't
> follow it before). As long as it happens on the list and not on some IRC
> channel (!?!?).
>
> But like Uwe said, that's slightly unrelated to that issue. Because that
> deprecation alone produced > 500 warnings (probably even much more), I
> un-deprecated it, and when we make a decision one way or the other, we
> should simply remove it (in case that's the decision). Until then, let's get
> rid of the unnecessary noise, agree?
>
> Shai
>
>
> On Thu, Feb 25, 2010 at 7:15 PM, Uwe Schindler  wrote:
>
>>  This discussion is out of the scope of this issue. We can start the
>> flamewar again. In IRC we came to the conclusion that our primary intent is
>> to make the test runs faster, which we achieved by patching lots of tests to
>> not change static defaults and so be able to run all tests in the same JVM
>> without forking. More speed improvements can be done by moving read-only
>> index creation for search tests into static @BeforeClass and setting
>> IndexReaders/-Searchers to NULL in @AfterClass to allow GC of static fields
>> holding RAMDirectory and so on.
>>
>>
>>
>> The @Test annotation led to more confusion and errors among our developers.
>> E.g. we had a test merged back from 3.0 (without Junit4) to trunk or even
>> new tests were added, but nobody added @Test to it, with the result that
>> the tests were never run. So the most important change to LuceneTestCaseJ4
>> would be to emulate the old test* method names as if they have @Test. By
>> that you could still disable them as mentioned, but it would reduce the
>> burden of these dumb import statements and useless annotations.
>>
>>
>>
>> By the way, why does LuceneTestCaseJ4 extend TestWatchman and also an
>> instance field extends that class? I do not understand the whole magic
>> behind, this is totally confusing to me – annotating a field that is never
>> used in code by an annotation is stupid and looks totally incorrect (I mean
>> the field holding the TestWatchman-subclass). - This is another thing why I
>> am against the migration of our already proven tests.
>>
>>
>>
>> Because of that we don’t want to deprecate LuceneTestCase and instead only
>> transform new tests and such needing @BeforeClass/@AfterClass for more speed
>> to the new API.
>>
>>
>>
>> -
>>
>> Uwe Schindler
>>
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>
>> http://www.thetaphi.de
>>
>> eMail: u...@thetaphi.de
>>
>>
>>
>> *From:* Erick Erickson [mailto:erickerick...@gmail.com]
>> *Sent:* Thursday, February 25, 2010 5:27 PM
>> *To:* java-dev@lucene.apache.org
>> *Subject:* Re: [jira] Commented: (LUCENE-2285) Code cleanup from all
>> sorts of (trivial) warnings
>>
>>
>>
>> Junit4:
>>
>>
>>
>> Well, simply disliking the @Test annotation seems like a poor reason to
>> stay with Junit3, although I admit it's a pain in the neck to change. Which
>> is why I didn't try to change all of them. The current system lends itself
>> to the practice of mangling the test name as a way of not running it, which
>> far too easily allows the test case to be forever ignored. One concrete
>> advantage of  annotations in Junit4 is the ability to add another "stupid"
>> annotation @Ignore, which then gets reported and thus doesn't get lost.
>>
>> As I remember, the last place we left localization was that Mike (?) saw
>> some intermittent problem that I couldn't reproduce. I could dust off that
>> code and s

[jira] Assigned: (LUCENE-2286) enable DefaultSimilarity.setDiscountOverlaps by default

2010-02-25 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir reassigned LUCENE-2286:
---

Assignee: Robert Muir

> enable DefaultSimilarity.setDiscountOverlaps by default
> ---
>
> Key: LUCENE-2286
> URL: https://issues.apache.org/jira/browse/LUCENE-2286
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Query/Scoring
>Reporter: Robert Muir
>Assignee: Robert Muir
> Fix For: 3.1
>
> Attachments: LUCENE-2286.patch
>
>
> I think we should enable setDiscountOverlaps in DefaultSimilarity by default.
> If you are using synonyms or commongrams or a number of other 
> 0-posInc-term-injecting methods, these currently screw up your length 
> normalization.
> These terms have a position increment of zero, so they shouldn't count towards 
> the length of the document.
> I've done relevance tests with Persian showing the difference is significant, 
> and I think it's a big trap for anyone using synonyms, etc: your relevance can 
> actually get worse if you don't flip this boolean flag.
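
A rough numeric sketch of the effect described above, assuming the classic length normalization of 1/sqrt(numTerms). LengthNormSketch and its helper are hypothetical and only model the arithmetic, not the actual DefaultSimilarity code:

```java
class LengthNormSketch {
    // Assumed classic length normalization: 1 / sqrt(number of terms counted).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Hypothetical helper: numTokens counts every token in the field,
    // numOverlaps counts the tokens with a position increment of zero.
    static float norm(int numTokens, int numOverlaps, boolean discountOverlaps) {
        int counted = discountOverlaps ? numTokens - numOverlaps : numTokens;
        return lengthNorm(counted);
    }

    public static void main(String[] args) {
        // 4 real tokens plus 12 injected synonyms at position increment 0.
        System.out.println(norm(16, 12, false)); // overlaps counted: 0.25
        System.out.println(norm(16, 12, true));  // overlaps discounted: 0.5
    }
}
```

In this made-up case the injected synonyms halve the norm of a four-word field, which is the kind of penalty the issue argues should not apply by default.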


Re: [jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-25 Thread Shai Erera

Erick, I'm totally with you on JUnit 4. I think the @Test annotation is
really not a big deal (it's actually very easy to migrate all the current
tests to JUnit 4 with the added import using some script). Even manually it
shouldn't be such a big deal.

@Ignore is another clear advantage of JUnit4. I've found some tests which
were prefixed with _, i.e. _testXYZ, just to disable them. Nobody knows about
it until he looks at the code (and pays attention). @Ignore would have been
better.

And there are lots of other advantages, like the @Before and @After (not
only class). Another problem I've found in the tests is that not all
extended LuceneTestCase, and usually their setUp and tearDown
implementations were wrong - not calling super first/last. When I moved them
to extend LuceneTestCase they broke (I fixed them, don't worry). However,
that could never happen if the super's methods were tagged w/ @Before/After,
because JUnit would take care running them before/after their sub-classes'
@Before/After. So that's another win for JUnit4.
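
The ordering guarantee mentioned above can be imitated with a tiny reflection-based runner. This is only a simulation under stated assumptions: @BeforeLike and runBefores are hypothetical stand-ins, not JUnit's annotation or JUnit's implementation. It shows the idea that the runner, not the subclass, is responsible for invoking the superclass set-up first:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class BeforeOrderSketch {
    // Hypothetical stand-in for JUnit 4's @Before; not the real annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface BeforeLike {}

    static final List<String> LOG = new ArrayList<String>();

    static class BaseTest {
        @BeforeLike public void baseSetUp() { LOG.add("base"); }
    }

    static class SubTest extends BaseTest {
        // No super.setUp() call that could be forgotten or misplaced.
        @BeforeLike public void subSetUp() { LOG.add("sub"); }
    }

    // Collects the class hierarchy root-first, then invokes annotated methods,
    // so superclass set-up always runs before subclass set-up.
    static void runBefores(Object test) {
        List<Class<?>> chain = new ArrayList<Class<?>>();
        for (Class<?> c = test.getClass(); c != Object.class; c = c.getSuperclass()) {
            chain.add(0, c);
        }
        try {
            for (Class<?> c : chain) {
                for (Method m : c.getDeclaredMethods()) {
                    if (m.isAnnotationPresent(BeforeLike.class)) {
                        m.invoke(test);
                    }
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        runBefores(new SubTest());
        System.out.println(LOG);
    }
}
```

With the JUnit 3 style, the same guarantee depends on every subclass remembering to call super.setUp() first and super.tearDown() last.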

And of course the @Before/AfterClass are really great !

So all in all, I'm a big fan of JUnit4, and if the discussion starts
again, I'll pay more attention to it and participate (I admit I didn't
follow it before). As long as it happens on the list and not on some IRC
channel (!?!?).

But like Uwe said, that's slightly unrelated to that issue. Because that
deprecation alone produced > 500 warnings (probably even much more), I
un-deprecated it, and when we make a decision one way or the other, we
should simply remove it (in case that's the decision). Until then, let's get
rid of the unnecessary noise, agree?

Shai

On Thu, Feb 25, 2010 at 7:15 PM, Uwe Schindler  wrote:

>  This discussion is out of the scope of this issue. We can start the
> flamewar again. In IRC we came to the conclusion that our primary intent is
> to make the test runs faster, which we achieved by patching lots of tests to
> not change static defaults and so be able to run all tests in the same JVM
> without forking. More speed improvements can be done by moving read-only
> index creation for search tests into static @BeforeClass and setting
> IndexReaders/-Searchers to NULL in @AfterClass to allow GC of static fields
> holding RAMDirectory and so on.
>
>
>
> The @Test annotation led to more confusion and errors among our developers.
> E.g. we had a test merged back from 3.0 (without Junit4) to trunk or even
> new tests were added, but nobody added @Test to it, with the result that
> the tests were never run. So the most important change to LuceneTestCaseJ4
> would be to emulate the old test* method names as if they have @Test. By
> that you could still disable them as mentioned, but it would reduce the
> burden of these dumb import statements and useless annotations.
>
>
>
> By the way, why does LuceneTestCaseJ4 extend TestWatchman and also an
> instance field extends that class? I do not understand the whole magic
> behind, this is totally confusing to me – annotating a field that is never
> used in code by an annotation is stupid and looks totally incorrect (I mean
> the field holding the TestWatchman-subclass). - This is another thing why I
> am against the migration of our already proven tests.
>
>
>
> Because of that we don’t want to deprecate LuceneTestCase and instead only
> transform new tests and such needing @BeforeClass/@AfterClass for more speed
> to the new API.
>
>
>
> -
>
> Uwe Schindler
>
> H.-H.-Meier-Allee 63, D-28213 Bremen
>
> http://www.thetaphi.de
>
> eMail: u...@thetaphi.de
>
>
>
> *From:* Erick Erickson [mailto:erickerick...@gmail.com]
> *Sent:* Thursday, February 25, 2010 5:27 PM
> *To:* java-dev@lucene.apache.org
> *Subject:* Re: [jira] Commented: (LUCENE-2285) Code cleanup from all sorts
> of (trivial) warnings
>
>
>
> Junit4:
>
>
>
> Well, simply disliking the @Test annotation seems like a poor reason to
> stay with Junit3, although I admit it's a pain in the neck to change. Which
> is why I didn't try to change all of them. The current system lends itself
> to the practice of mangling the test name as a way of not running it, which
> far too easily allows the test case to be forever ignored. One concrete
> advantage of  annotations in Junit4 is the ability to add another "stupid"
> annotation @Ignore, which then gets reported and thus doesn't get lost.
>
> As I remember, the last place we left localization was that Mike (?) saw
> some intermittent problem that I couldn't reproduce. I could dust off that
> code and see what the current state of affairs is since this has come up
> again. The other problem was that the implementation I used led to
> *increased* test run times. The localization tests basically spun through
> all the Locales available and ran all the tests in the class against them.
> The current system only runs *some* of the tests in a test class through the
> localization process. This can be addressed by, at worst, splitting the test
> class up, but in my

[jira] Commented: (LUCENE-2286) enable DefaultSimilarity.setDiscountOverlaps by default

2010-02-25 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838459#action_12838459
 ] 

Robert Muir commented on LUCENE-2286:
-

OK, I will commit in a few days if no one objects. In my opinion the backwards 
break is the easiest way to go.

In practice it won't hurt existing docs, and if someone is concerned about bad 
ranking (because the newly indexed docs suddenly are ranked better), they can 
turn this off with the boolean until they get a chance to reindex all docs.

> enable DefaultSimilarity.setDiscountOverlaps by default
> ---
>
> Key: LUCENE-2286
> URL: https://issues.apache.org/jira/browse/LUCENE-2286
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Query/Scoring
>Reporter: Robert Muir
>Assignee: Robert Muir
> Fix For: 3.1
>
> Attachments: LUCENE-2286.patch
>
>
> I think we should enable setDiscountOverlaps in DefaultSimilarity by default.
> If you are using synonyms or commongrams or a number of other 
> 0-posInc-term-injecting methods, these currently screw up your length 
> normalization.
> These terms have a position increment of zero, so they shouldn't count towards 
> the length of the document.
> I've done relevance tests with Persian showing the difference is significant, 
> and I think it's a big trap for anyone using synonyms, etc: your relevance can 
> actually get worse if you don't flip this boolean flag.


[jira] Commented: (LUCENE-2286) enable DefaultSimilarity.setDiscountOverlaps by default

2010-02-25 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838458#action_12838458
 ] 

Michael McCandless commented on LUCENE-2286:


Patch looks good (trivial).

> enable DefaultSimilarity.setDiscountOverlaps by default
> ---
>
> Key: LUCENE-2286
> URL: https://issues.apache.org/jira/browse/LUCENE-2286
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Query/Scoring
>Reporter: Robert Muir
> Fix For: 3.1
>
> Attachments: LUCENE-2286.patch
>
>
> I think we should enable setDiscountOverlaps in DefaultSimilarity by default.
> If you are using synonyms or commongrams or a number of other 
> 0-posInc-term-injecting methods, these currently screw up your length 
> normalization.
> These terms have a position increment of zero, so they shouldn't count towards 
> the length of the document.
> I've done relevance tests with Persian showing the difference is significant, 
> and I think it's a big trap for anyone using synonyms, etc: your relevance can 
> actually get worse if you don't flip this boolean flag.


[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-02-25 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2111:


Attachment: LUCENE-2111.patch

a few more easy nocommits

> Wrapup flexible indexing
> 
>
> Key: LUCENE-2111
> URL: https://issues.apache.org/jira/browse/LUCENE-2111
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: Flex Branch
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.1
>
> Attachments: LUCENE-2111-EmptyTermsEnum.patch, 
> LUCENE-2111-EmptyTermsEnum.patch, LUCENE-2111.patch, LUCENE-2111.patch, 
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, 
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, 
> LUCENE-2111_bytesRef.patch, LUCENE-2111_experimental.patch, 
> LUCENE-2111_fuzzy.patch, LUCENE-2111_toString.patch
>
>
> Spinoff from LUCENE-1458.
> The flex branch is in fairly good shape -- all tests pass, initial search 
> performance testing looks good, it survived several visits from the Unicode 
> policeman ;)
> But it still has a number of nocommits, could use some more scrutiny 
> especially on the "emulate old API on flex index" and vice/versa code paths, 
> and still needs some more performance testing.  I'll do these under this 
> issue, and we should open separate issues for other self contained fixes.
> The end is in sight!


Re: Stored fields access

2010-02-25 Thread Tim Smith

I created LUCENE-2276 a couple of days ago to at least allow reusing Document 
objects (didn't see any interest from anyone though)

 -- Tim

Erick Erickson wrote:
> OK, never mind 
>
> Erick
>
> On Thu, Feb 25, 2010 at 1:48 PM, Earwin Burrfoot wrote:
>
> > My issue is with extra objects created in the process. Field selection
> > can be handled with, well, FieldSelector.
> >
> > 2010/2/25 Erick Erickson:
> > > Does LazyLoading address this? I'm assuming your issue is
> > > that the default behavior loads the entire document regardless
> > > of whether you actually want all the fields.
> > > Erick
> > >
> > > On Thu, Feb 25, 2010 at 7:52 AM, Earwin Burrfoot wrote:
> > >>
> > >> I'm thinking, should Lucene introduce new interface to read stored
> > >> document fields?
> > >>
> > >> Current 'Document document(int n)' mechanism is barely usable due to
> > >> overhead involved. While I believe underlying index structure works
> > >> pretty fast (if it fits in memory, as is the case for most
> > >> performance-concerned installations), there's no adequate access to it
> > >> and people are forced to introduce contraptions like LinkedIn's
> > >> payload-assisted luceneId<->appId mapping or similar caches we employ.
> > >>
> > >> What I am thinking about is something along the lines of existing
> > >> iterators like TermDocs/TermPositions. Iterate over docs, then iterate
> > >> over fields stored for each, extract data, ???, profit.
> > >> Comments?
> > >>
> > >> --
> > >> Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
> > >> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
> > >> ICQ: 104465785
> > >>
> > >> -
> > >> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> > >> For additional commands, e-mail: java-dev-h...@lucene.apache.org
> > >>
> > >
> > >
> >
> > --
> > Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
> > Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
> > ICQ: 104465785
> >
> > -
> > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-dev-h...@lucene.apache.org
>

Re: Stored fields access

2010-02-25 Thread Earwin Burrfoot

Missed that, I have a heap of unread Jira mails :/

Okay, you're reusing Document object and the list inside. To reuse
Fieldable instances you'd have to do some very awkward things.
More awkward things are required to extract your longed-for values
from the Document.
To add insult to injury, Document and Fieldable define a boatload of
stuff that is used at indexing time, but has zero meaning at
search time.
This is just a broken, quickly-hacked-together API.

2010/2/25 Tim Smith :
> I created LUCENE-2276 a couple of days ago to at least allow reusing
> Document objects (didn't see any interest from anyone though)
>
>  -- Tim
>
> Erick Erickson wrote:
>
> OK, never mind 
> Erick
>
> On Thu, Feb 25, 2010 at 1:48 PM, Earwin Burrfoot  wrote:
>>
>> My issue is with extra objects created in the process. Field selection
>> can be handled with, well, FieldSelector.
>>
>> 2010/2/25 Erick Erickson :
>> > Does LazyLoading address this? I'm assuming your issue is
>> > that the default behavior loads the entire document regardless
>> > of whether you actually want all the fields.
>> > Erick
>> >
>> > On Thu, Feb 25, 2010 at 7:52 AM, Earwin Burrfoot 
>> > wrote:
>> >>
>> >> I'm thinking, should Lucene introduce new interface to read stored
>> >> document fields?
>> >>
>> >> Current 'Document document(int n)' mechanism is barely usable due to
>> >> overhead involved. While I believe underlying index structure works
>> >> pretty fast (if it fits in memory, as is the case for most
>> >> performance-concerned installations), there's no adequate access to it
>> >> and people are forced to introduce contraptions like LinkedIn's
>> >> payload-assisted luceneId<->appId mapping or similar caches we employ.
>> >>
>> >> What I am thinking about is something along the lines of existing
>> >> iterators like TermDocs/TermPositions. Iterate over docs, then iterate
>> >> over fields stored for each, extract data, ???, profit.
>> >> Comments?
>> >>
>> >> --
>> >> Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
>> >> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
>> >> ICQ: 104465785
>> >>
>> >> -
>> >> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> >> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>> >>
>> >
>> >
>>
>>
>>
>> --
>> Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
>> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
>> ICQ: 104465785
>>
>> -
>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>>
>
>
>



-- 
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785


Re: Stored fields access

2010-02-25 Thread Tim Smith

yeah,

i would like to see a more "term-vector"/sax like api for extracting
values that requires no extra object overhead as well

pass in a "collector" that will call methods as fields are encountered
(and can return false if walking the document should stop (or some Enum
for more options))
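
One possible shape for such a callback API, as a sketch only. FieldCollector and visit are hypothetical names, no such interface is claimed to exist in Lucene here, and the stored fields are modeled as a plain in-memory map so that the example is self-contained:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class StoredFieldVisitSketch {
    // Hypothetical callback interface for walking the stored fields of one document.
    interface FieldCollector {
        // Return false to stop walking the remaining fields of the document.
        boolean field(String name, String value);
    }

    // Stand-in for a stored-fields reader: walks name/value pairs and calls back
    // without building Document or Field objects for the caller.
    static void visit(Map<String, String> storedFields, FieldCollector collector) {
        for (Map.Entry<String, String> e : storedFields.entrySet()) {
            if (!collector.field(e.getKey(), e.getValue())) {
                return;
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<String, String>();
        doc.put("id", "42");
        doc.put("title", "Stored fields access");
        doc.put("body", "...");

        final String[] id = new String[1];
        visit(doc, new FieldCollector() {
            public boolean field(String name, String value) {
                if ("id".equals(name)) {
                    id[0] = value;
                    return false; // found what we need, stop walking
                }
                return true;
            }
        });
        System.out.println(id[0]);
    }
}
```

The boolean return value could be swapped for an enum if more options (for example, skip the rest of this field only) are wanted, as suggested in the message.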

i just throw away the lucene Document and Field objects when i'm done
with them anyway (well i'll cache them in an LRU cache for later reuse,
but i could do smarter things if i didn't need the lucene Document
object in the first place)

 -- Tim

Earwin Burrfoot wrote:
> Missed that, I have a heap of unread Jira mails :/
>
> Okay, you're reusing Document object and the list inside. To reuse
> Fieldable instances you'd have to do some very awkward things.
> More awkward things are required to extract your longed-for values
> from the Document.
> To add insult to injury, Document and Fieldable define a boatload of
> stuff that is used at indexation-time, but has zero meaning at
> search-time.
> This is just broken, quickly-hacked-together API.
>
> 2010/2/25 Tim Smith :
>   
>> I created LUCENE-2276 a couple of days ago to at least allow reusing
>> Document objects (didn't see any interest from anyone though)
>>
>>  -- Tim
>>
>> Erick Erickson wrote:
>>
>> OK, never mind 
>> Erick
>>
>> On Thu, Feb 25, 2010 at 1:48 PM, Earwin Burrfoot  wrote:
>> 
>>> My issue is with extra objects created in the process. Field selection
>>> can be handled with, well, FieldSelector.
>>>
>>> 2010/2/25 Erick Erickson :
>>>   
>>>> Does LazyLoading address this? I'm assuming your issue is
>>>> that the default behavior loads the entire document regardless
>>>> of whether you actually want all the fields.
>>>> Erick
>>>>
>>>> On Thu, Feb 25, 2010 at 7:52 AM, Earwin Burrfoot
>>>> wrote:
>>>>
>>>>> I'm thinking, should Lucene introduce new interface to read stored
>>>>> document fields?
>>>>>
>>>>> Current 'Document document(int n)' mechanism is barely usable due to
>>>>> overhead involved. While I believe underlying index structure works
>>>>> pretty fast (if it fits in memory, as is the case for most
>>>>> performance-concerned installations), there's no adequate access to it
>>>>> and people are forced to introduce contraptions like LinkedIn's
>>>>> payload-assisted luceneId<->appId mapping or similar caches we employ.
>>>>>
>>>>> What I am thinking about is something along the lines of existing
>>>>> iterators like TermDocs/TermPositions. Iterate over docs, then iterate
>>>>> over fields stored for each, extract data, ???, profit.
>>>>> Comments?

[jira] Commented: (LUCENE-1732) Multi-threaded Spatial Search

2010-02-25 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838544#action_12838544
 ] 

David Smiley commented on LUCENE-1732:
--

If I have a machine with say four CPU cores also running Solr with four cores 
(a distributed -- i.e. sharded index), would it be fair to say that the 
optimization presented in this issue is of no use?

> Multi-threaded Spatial Search
> -
>
> Key: LUCENE-1732
> URL: https://issues.apache.org/jira/browse/LUCENE-1732
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/spatial
>Affects Versions: 2.9
>Reporter: Chris Male
> Attachments: LUCENE-1732_multi_threaded_spatial_search.patch
>
>
> The attached patch is a large refactoring of the spatial search contrib.  The 
> primary contribution is the creation of the ThreadedDistanceFilter, which 
> uses an ExecutorService to filter the documents in multiple threads.  As a 
> result of doing the filtering in multiple threads, the time taken to filter 
> 1.2 million documents has been reduced from nearly 3s, to between 500-800ms.
> As part of this work, the DistanceQueryBuilder has been replaced by the 
> SpatialFilter, a Lucene Filter, some unused functionality has been removed, 
> and the package hierarchy has changed.  Consequently this patch breaks 
> backwards compatibility with the existing spatial search contrib.
> Also during the process of making these changes, abstractions have been added 
> so that the one implementation of the ThreadedDistanceFilter can work with 
> lat/long and geohash data formats, and so that precise but costly arc 
> distance calculations can be replaced by less precise but much more efficient 
> flat plane calculations if needed.
> This patch will be used in an upcoming patch for Solr which will improve 
> Solr's support for spatial search.
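
A minimal sketch of the general technique the description names: splitting a filtering pass across threads with an ExecutorService. This is not the code from the attached patch. The class name ThreadedFilterSketch, the precomputed distances array and the slicing scheme are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThreadedFilterSketch {
    // Returns the doc ids whose distance is <= radius; each thread checks one slice.
    // Assumes threads >= 1.
    static BitSet filter(final double[] distances, final double radius, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<BitSet>> parts = new ArrayList<Future<BitSet>>();
            int slice = (distances.length + threads - 1) / threads; // ceiling division
            for (int t = 0; t < threads; t++) {
                final int from = Math.min(distances.length, t * slice);
                final int to = Math.min(distances.length, from + slice);
                parts.add(pool.submit(new Callable<BitSet>() {
                    public BitSet call() {
                        BitSet bits = new BitSet(distances.length);
                        for (int doc = from; doc < to; doc++) {
                            if (distances[doc] <= radius) {
                                bits.set(doc);
                            }
                        }
                        return bits;
                    }
                }));
            }
            // Merge the per-slice results on the calling thread.
            BitSet result = new BitSet(distances.length);
            for (Future<BitSet> part : parts) {
                result.or(part.get());
            }
            return result;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Any speedup from this kind of split depends on idle cores being available, which is the concern raised in the comment above.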


[jira] Commented: (LUCENE-2167) StandardTokenizer Javadoc does not correctly describe tokenization around punctuation characters

2010-02-25 Thread Shyamal Prasad (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838574#action_12838574
 ] 

Shyamal Prasad commented on LUCENE-2167:


Robert Muir wrote:
{quote}
I would love it if you could produce a grammar that implemented UAX#29!

If so, in my opinion it should become the StandardAnalyzer for the next lucene 
version. If I thought I could do it correctly, I would have already done it, as 
the support for the unicode properties needed to do this is now in the trunk of 
Jflex!
{quote}

I'm not smart enough to know if I should even try to do it at all (let alone 
correctly), but am always willing to learn! Thanks for the references, I will 
certainly give it an honest try.

/Shyamal

> StandardTokenizer Javadoc does not correctly describe tokenization around 
> punctuation characters
> 
>
> Key: LUCENE-2167
> URL: https://issues.apache.org/jira/browse/LUCENE-2167
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 2.4.1, 2.9, 2.9.1, 3.0
>Reporter: Shyamal Prasad
>Priority: Minor
> Attachments: LUCENE-2167.patch, LUCENE-2167.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The Javadoc for StandardTokenizer states:
> {quote}
> Splits words at punctuation characters, removing punctuation. 
> However, a dot that's not followed by whitespace is considered part of a 
> token.
> Splits words at hyphens, unless there's a number in the token, in which case 
> the whole 
> token is interpreted as a product number and is not split.
> {quote}
> This is not accurate. The actual JFlex implementation treats hyphens 
> interchangeably with
> punctuation. So, for example "video,mp4,test" results in a *single* token and 
> not three tokens
> as the documentation would suggest.
> Additionally, the documentation suggests that "video-mp4-test-again" would 
> become a single
> token, but in reality it results in two tokens: "video-mp4-test" and "again".
> IMHO the parser implementation is fine as is since it is hard to keep 
> everyone happy, but it is probably
> worth cleaning up the documentation string. 
> The patch included here updates the documentation string and adds a few test 
> cases to confirm the cases described above.


[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-02-25 Thread Michael McCandless (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2111:
---

Attachment: LUCENE-2111.patch

Attached patch, fixes flex APIs to not return null (instead return .EMPTY 
objects).

> Wrapup flexible indexing
> 
>
> Key: LUCENE-2111
> URL: https://issues.apache.org/jira/browse/LUCENE-2111
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: Flex Branch
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.1
>
> Attachments: LUCENE-2111-EmptyTermsEnum.patch, 
> LUCENE-2111-EmptyTermsEnum.patch, LUCENE-2111.patch, LUCENE-2111.patch, 
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, 
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, 
> LUCENE-2111_bytesRef.patch, LUCENE-2111_experimental.patch, 
> LUCENE-2111_fuzzy.patch, LUCENE-2111_toString.patch
>
>
> Spinoff from LUCENE-1458.
> The flex branch is in fairly good shape -- all tests pass, initial search 
> performance testing looks good, it survived several visits from the Unicode 
> policeman ;)
> But it still has a number of nocommits, could use some more scrutiny 
> especially on the "emulate old API on flex index" and vice/versa code paths, 
> and still needs some more performance testing.  I'll do these under this 
> issue, and we should open separate issues for other self contained fixes.
> The end is in sight!


Uwe's question

2010-02-25 Thread Erick Erickson

<<[...] instance field extends that class?>>
No good reason, I plead confusion when figuring out how to use it. I've
attached a patch to LUCENE-2037 that stops LuceneTestCaseJ4 from extending
TestWatchman.

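<<[...] me – annotating a field that is never used in code by an annotation is
stupid and looks totally incorrect (I mean the field holding the
TestWatchman-subclass).>>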

Well, this is to provide the same functionality as LuceneTestCase. I'm
reaching a bit here since I haven't been in that code lately, but...

LocalizedTestCase called runBare in LuceneTestCase which reported the seed
value if an exception was thrown. I couldn't find a good way to access
runBare or analogs in Junit4, but the interceptor pattern worked as well.
The interceptor is called by the Junit framework on test events, so there
aren't references to it in the Lucene test code. There are other places that
call runBare, so I assumed that if anyone wanted to use Junit4 with those
classes it would be a good thing to allow.

I think the interceptor pattern is an elegant way to "do something" at
discrete points in the test run, although it is a bit opaque.

Most of this was put in when I was trying to move LocalizedTestCase to the
Junit4 world. We didn't do that, but this still needs to be kept if we want
LuceneTestCaseJ4 to be a drop-in replacement for LuceneTestCase.

<<< - This is another thing why I am against the migration of our already
proven tests.>>>

If you'll recall the discussion at the time, neither am I. I do believe,
though, that if anyone wants to change a test class to use Junit4 it's a
good thing to have something that'll drop in without surprises, which is
what I was trying for.

Erick

[jira] Updated: (LUCENE-2037) Allow Junit4 tests in our environment.

2010-02-25 Thread Erick Erickson (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated LUCENE-2037:
---

Attachment: LUCENE-2037_remove_testwatchman.patch

Removed unnecessary derivation from TestWatchman.

Corrected minor typo in comment.

> Allow Junit4 tests in our environment.
> --
>
> Key: LUCENE-2037
> URL: https://issues.apache.org/jira/browse/LUCENE-2037
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Other
>Affects Versions: 3.1
> Environment: Development
>Reporter: Erick Erickson
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.1
>
> Attachments: junit-4.7.jar, LUCENE-2037.patch, LUCENE-2037.patch, 
> LUCENE-2037.patch, LUCENE-2037_remove_testwatchman.patch, 
> LUCENE-2037_revised_2.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> Now that we're dropping Java 1.4 compatibility for 3.0, we can incorporate 
> Junit4 in testing. Junit3 and junit4 tests can coexist, so no tests should 
> have to be rewritten. We should start this for the 3.1 release so we can get 
> a clean 3.0 out smoothly.
> It's probably worthwhile to convert a small set of tests as an exemplar.


Re: Uwe's question

2010-02-25 Thread Erick Erickson

Hmmm, didn't reopen the JIRA, should I? Or will it just magically get into
Michael's queue?

On Thu, Feb 25, 2010 at 8:52 PM, Erick Erickson wrote:

> < instance field extends that class?>>
> No good reason, I plead confusion when figuring out how to use it. I've
> attached a patch to Lucene 2037 that removes the LuceneTestCaseJ4 extending
> TestWatchman.
>
> < me – annotating a field that is never used in code by an annotation is
> stupid and looks totally incorrect (I mean the field holding the
> TestWatchman-subclass).>>
>
> Well, this is to provide the same functionality as LuceneTestCase. I'm
> reaching a bit here since I haven't been in that code lately, but...
>
> LocalizedTestCase called runBare in LuceneTestCase which reported the seed
> value if an exception was thrown. I couldn't find a good way to access
> runBare or analogs in Junit4, but the interceptor pattern worked as well.
> The interceptor is called by the Junit framework on test events, so there
> aren't references to it in the Lucene test code. There are other places that
> call runBare, so I assumed that if anyone wanted to use Junit4 with those
> classes it would be a good thing to allow.
>
> I think the interceptor pattern is an elegant way to "do something" at
> discrete points in the test run, although it is a bit opaque.
>
> Most of this was put in when I was trying to move LocalizedTestCase to the
> Junit4 world. We didn't do that, but this still needs to be kept if we want
> LuceneTestCaseJ4 to be a drop-in replacement for LuceneTestCase.
>
> <<< - This is another thing why I am against the migration of our already
> proven tests.>>>
>
> If you'll recall the discussion at the time, neither am I. I do believe,
> though, that if anyone wants to change a test class to use Junit4 it's a
> good thing to have something that'll drop in without surprises, which is
> what I was trying for.
>
> Erick
>

Re: Uwe's question

2010-02-25 Thread Robert Muir

> LocalizedTestCase called runBare in LuceneTestCase which reported the seed
> value if an exception was thrown. I couldn't find a good way to access
> runBare or analogs in Junit4, but the interceptor pattern worked as well.
> The interceptor is called by the Junit framework on test events, so there
> aren't references to it in the Lucene test code. There are other places that
> call runBare, so I assumed that if anyone wanted to use Junit4 with those
> classes it would be a good thing to allow.
>

I didn't forget about your patch Erick, in my opinion there is nothing wrong
with it. I hope it's not discouraging you, the problem is a few of us have
spent countless hours trying to debug this hard-to-reproduce Thai test
failure problem.

It failed in the existing tests, too, with Junit 3 on hudson (one time!). At
this point, I start to wonder if it could be related to stuff like this:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6683975

I don't think we should let this stop progress with the tests, if you think
we should move LocalizedTestCase to junit 4 let's do it.

-- 
Robert Muir
rcm...@gmail.com

[jira] Resolved: (LUCENE-2284) MatchAllDocsQueryNode toString() creates invalid XML-Tag

2010-02-25 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-2284.
-

Resolution: Fixed

Committed revision 916543.

Thanks Frank!

> MatchAllDocsQueryNode toString() creates invalid XML-Tag
> 
>
> Key: LUCENE-2284
> URL: https://issues.apache.org/jira/browse/LUCENE-2284
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/*
> Environment: all
>Reporter: Frank Wesemann
>Assignee: Robert Muir
> Fix For: 3.1
>
> Attachments: LUCENE-2284.patch
>
>
> MatchAllDocsQueryNode.toString() returns "", 
> which is invalid XML; it should read ".


Re: Uwe's question

2010-02-25 Thread Erick Erickson

Well, "Things got busy (tm)". Uwe's point is valid; unless there's
demonstrable gain, moving things to Junit4 "just for fun" is wasted motion,
indeed dangerous. I was focusing on LocalizedTestCase to understand the
place of runBare etc. in the scheme of things since when I created
LuceneTestCaseJ4 that was something I wanted to figure out to make it a
replacement for LuceneTestCase.

I can't point to a compelling reason to shake up the code, the only
improvement it would have is having a demonstration of using the Junit4
@RunWith annotation for future reference.

So, I've no compelling reason to push that patch forward. If y'all think
it's worth it I'll be happy to crank that patch back up again, it'll take a
few days though. It does affect several files, and if the main value here
is an exemplar of the @RunWith annotation, perhaps there's a better place to
put that in.

Erick


Re: Uwe's question

2010-02-25 Thread Shai Erera

OK, this seems to be a discussion related to JUnit 4, so I'll port what I've said
about it from the other thread (doing the code cleanup):

{quote}
Erick, I'm totally with you on JUnit 4. I think the @Test annotation is
really not a big deal (it's actually very easy to migrate all the current
tests to JUnit 4 with the added import using some script). Even manually it
shouldn't be such a big deal.

@Ignore is a perfect other advantage of JUnit4. I've found some tests which
were prefixed with _, i.e. _testXYZ just to disable them. Nobody knows about
them until he looks at the code (and pays attention). @Ignore would have
been better.

And there are lots of other advantages, like the @Before and @After (not
only class). Another problem I've found in the tests is that not all
extended LuceneTestCase, and usually their setUp and tearDown
implementations were wrong - not calling super first/last. When I moved them
to extend LuceneTestCase they broke (I fixed them, don't worry). However,
that could never happen if the super's methods were tagged w/ @Before/After,
because JUnit would take care running them before/after their sub-classes'
@Before/After. So that's another win for JUnit4.

And of course the @Before/AfterClass are really great!
{quote}

I think the @Before/After annotations can be a real win for our tests.

My two cents,
Shai


[jira] Updated: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-25 Thread Shai Erera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2285:
---

Attachment: LUCENE-2285.patch

Fixes TestCharArrayMap/Test original bug.

> Code cleanup from all sorts of (trivial) warnings
> -
>
> Key: LUCENE-2285
> URL: https://issues.apache.org/jira/browse/LUCENE-2285
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Shai Erera
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2285.patch, LUCENE-2285.patch
>
>
> I would like to do some code cleanup and remove all sorts of trivial 
> warnings, like unnecessary casts, problems w/ javadocs, unused variables, 
> redundant null checks, unnecessary semicolon etc. These are all very trivial 
> and should not pose any problem.
> I'll create another issue for getting rid of deprecated code usage, like 
> LuceneTestCase and all sorts of deprecated constructors. That's also trivial 
> because it only affects Lucene code, but it's a different type of change.
> Another issue I'd like to create is about introducing more generics in the 
> code, where it's missing today - not changing existing API. There are many 
> places in the code like that.
> So, with your permission, I'll start with the trivial ones first, and then 
> move on to the others.


[jira] Commented: (LUCENE-2285) Code cleanup from all sorts of (trivial) warnings

2010-02-25 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838708#action_12838708
 ] 

Shai Erera commented on LUCENE-2285:


Uwe, your examples are still wrong. CharArrayMap has 3 methods: 
containsKey(CharSequence), containsKey(Object) and containsKey(char[], int, 
int). There is no contains(char[]). Therefore when you cast to char[], the 
Object method is the one that's called, not the char[],int,int one.

If I change the code to:  assertTrue(cm.containsKey((char[]) o, 0, ((char[]) 
o).length )); then the right method is invoked. So I guess the tests were 
defective in the first place .. and like I said, eclipse doesn't lie when it 
says a cast is unnecessary, at least I haven't seen such a case yet.
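
A minimal sketch of the overload resolution described above, assuming a hypothetical stand-in class with the same three containsKey signatures (it is not CharArrayMap itself). Each method returns the name of the overload that was chosen, so the effect of the cast can be observed directly.

```java
class OverloadSketch {
    // Stand-ins for the three overloads; each reports which one the compiler picked.
    static String containsKey(CharSequence cs) { return "CharSequence"; }
    static String containsKey(Object o) { return "Object"; }
    static String containsKey(char[] text, int off, int len) { return "char[],int,int"; }

    // char[] is not a CharSequence and there is no one-argument char[] overload,
    // so the cast changes nothing: the Object overload is still selected.
    static String withCast(Object o) { return containsKey((char[]) o); }

    static String withoutCast(Object o) { return containsKey(o); }

    // Passing offset and length explicitly is what selects the char[] variant.
    static String explicit(Object o) {
        char[] chars = (char[]) o;
        return containsKey(chars, 0, chars.length);
    }
}
```

With an Object that holds a char[], withCast and withoutCast both report the Object overload, and only explicit reaches the char[],int,int one.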

I'll fix those two tests now, because they were defective right from the 
beginning. Thanks for spotting this, because you've just revealed a bug which 
existed in the tests :).

bq. because of the same we don't want to have autoboxing in internal lucene code

I don't see how autoboxing is related to casting ... If a map returns an 
Integer, and you assign it to 'int', then whether or not you'll do the cast it 
will autounbox it. If you assign it to an Integer, then you won't be able to 
cast to 'int' (I think?) and hence the cast is redundant as well.

About Character methods, eclipse is smart enough to detect that when you call a 
method w/ a char type, then the right one is called, vs. if you call it with 
the int type. Hovering over the method call reveals immediately the method 
variant that's called. So I see no reason why a char would need to be cast to 
(char). If you want to call an int variant method, then you'll need to cast to 
int, and eclipse won't complain about that.
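
A minimal sketch of the char versus int overload selection described above. It uses String.valueOf(char) and String.valueOf(int) rather than the Character methods, since those two overloads produce visibly different results; the class and method names are hypothetical.

```java
class CharOverloadSketch {
    // A char argument selects String.valueOf(char): the character itself.
    static String viaChar(char c) { return String.valueOf(c); }

    // An explicit cast to int selects String.valueOf(int): the numeric code.
    static String viaInt(char c) { return String.valueOf((int) c); }

    // Casting a char to char is redundant; the same overload as viaChar is chosen.
    static String redundantCast(char c) { return String.valueOf((char) c); }
}
```

For 'a', viaChar and redundantCast both yield the one-character string, while viaInt yields the decimal code of the character. Only the cast to int changes which method is called.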

Switching off compiler warnings in eclipse is your choice ... the Lucene code 
is full of 'hidden' casting because that's how Java works. When you do 'int' * 
1.0, it's cast to double, and people are aware of that ... in fact, they have 
to assign the result to a double, or they'll be forced to cast to anything 
else. When you work w/ integers and the method returns a long, it's cast 
automatically. So if there are few cases where you'd want to alert the user, 
put it in a comment, or like int charint = /*(int)*/ c;

Like I said, it's a styling issue. I'm not going to turn off my warnings in 
eclipse ... and I don't understand what these 'generified instanceof checks' are - 
can you give an example?



[jira] Commented: (LUCENE-2037) Allow Junit4 tests in our environment.

2010-02-25 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838736#action_12838736
 ] 

Uwe Schindler commented on LUCENE-2037:
---

I just committed this.

One question: In JUnit3, the call to getName() always created a correct 
testName (because JUnit3 took care of the current test running). If I inject 
one bug into a random test using newRandom() that is not using the ctor with 
name param, the additional error message about the random seed simply prints "" 
as test name. In the past this worked. Ideally it should print out the current 
@Test method name as before.

How to do this? I would like to have this and remove the getName() stuff from 
the class.


