[jira] Updated: (LUCENE-1567) New flexible query parser

2009-07-29 Thread Adriano Crestani (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adriano Crestani updated LUCENE-1567:
-

Attachment: lucene_trunk_FlexQueryParser_2009july29_v11.patch

{quote}
The warnings occur because you put links to the new contrib queryparser into 
the core queryparser. That doesn't work as the contribs are not in the 
classpath of the core, so I think we should remove those links and change them 
just to plain text.

Also, please make sure to add to the main build.xml appropriate entries for the 
javadocs, otherwise the "All" javadocs will not contain the contrib QP classes.

There are also some TODOs in the docs; especially in top-level places, such as 
the package.html of your new package, we should not have TODOs in the docs. 
Please fix that soon, 2.9 is coming quickly. 
{quote}

Done!

I also fixed and added some other javadocs that were missing and renamed 
ConstantScoreRewriteAttribute (and its impl) to MultiTermRewriteMethodAttribute.

I think the only thing remaining is to add a package.html to the 
org.apache.queryParser.messages package with a good description of it. Luis 
knows this package well. Luis, if you have time, could you add this 
file to that package? Thanks :)

> New flexible query parser
> -
>
> Key: LUCENE-1567
> URL: https://issues.apache.org/jira/browse/LUCENE-1567
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: QueryParser
> Environment: N/A
>Reporter: Luis Alves
>Assignee: Michael Busch
> Fix For: 2.9
>
> Attachments: lucene-1567.patch, 
> lucene_1567_adriano_crestani_07_13_2009.patch, 
> lucene_trunk_FlexQueryParser_2009July09_v4.patch, 
> lucene_trunk_FlexQueryParser_2009July10_v5.patch, 
> lucene_trunk_FlexQueryParser_2009july15_v6.patch, 
> lucene_trunk_FlexQueryParser_2009july16_v7.patch, 
> lucene_trunk_FlexQueryParser_2009july23_v8.patch, 
> lucene_trunk_FlexQueryParser_2009july27_v9.patch, 
> lucene_trunk_FlexQueryParser_2009july28_v10.patch, 
> lucene_trunk_FlexQueryParser_2009july29_v11.patch, 
> lucene_trunk_FlexQueryParser_2009March24.patch, 
> lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, 
> QueryParser_restructure_meetup_june2009_v2.pdf, 
> wiki_switching_to_the_new_query_parser.txt
>
>
> From "New flexible query parser" thread by Michael Busch
> in my team at IBM we have used a different query parser than Lucene's in
> our products for quite a while. Recently we spent a significant amount
> of time in refactoring the code and designing a very generic
> architecture, so that this query parser can be easily used for different
> products with varying query syntaxes.
> This work was originally driven by Andreas Neumann (who, however, left
> our team); most of the code was written by Luis Alves, who has been a
> bit active in Lucene in the past, and Adriano Campos, who joined our
> team at IBM half a year ago. Adriano is an Apache committer and PMC member
> on the Tuscany project and is getting familiar with Lucene now too.
> We think this code is much more flexible and extensible than the current
> Lucene query parser, and would therefore like to contribute it to
> Lucene. I'd like to give a very brief architecture overview here,
> Adriano and Luis can then answer more detailed questions as they're much
> more familiar with the code than I am.
> The goal was to separate syntax and semantics of a query. E.g. 'a AND
> b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query.
> We distinguish the semantics of the different query components, e.g.
> whether and how to tokenize/lemmatize/normalize the different terms or
> which Query objects to create for the terms. We wanted to be able to
> write a parser with a new syntax, while reusing the underlying
> semantics, as quickly as possible.
> In fact, Adriano is currently working on a 100% Lucene-syntax compatible
> implementation to make it easy for people who are using Lucene's query
> parser to switch.
> The query parser has three layers and its core is what we call the
> QueryNodeTree. It is a tree that initially represents the syntax of the
> original query, e.g. for 'a AND b':
>       AND
>      /   \
>     A     B
> The three layers are:
> 1. QueryParser
> 2. QueryNodeProcessor
> 3. QueryBuilder
> 1. The upper layer is the parsing layer which simply transforms the
> query text string into a QueryNodeTree. Currently our implementations of
> this layer use javacc.
> 2. The query node processors do most of the work. This layer is in fact a
> configurable chain of processors. Each processor can walk the tree and
> modify nodes or even the tree's structure. That makes it possible to
> e.g. do query optimization before the query is executed or to tokenize
> terms.
> 3. The third layer is also a configurable chain of builders, which
> transform the QueryNodeTree into Lucene Query objects.
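
The three layers described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from the patch: `Node`, `parse`, `lowercaseTerms`, `build` and `run` are hypothetical names, the toy parser only understands `x AND y`, and the builder emits a string instead of a Lucene Query object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

/** Hypothetical three-layer pipeline: parse text, process the tree, build a query. */
final class QueryPipelineSketch {

    /** A node of the (hypothetical) query node tree: an operator with children, or a term leaf. */
    static final class Node {
        final String value;
        final List<Node> children = new ArrayList<>();

        Node(String value) {
            this.value = value;
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    /** Layer 1: text parser. Toy syntax only: "x AND y" or a single term. */
    static Node parse(String text) {
        String[] parts = text.trim().split("\\s+AND\\s+");
        if (parts.length == 1) {
            return new Node(parts[0]);
        }
        Node and = new Node("AND");
        for (String p : parts) {
            and.children.add(new Node(p));
        }
        return and;
    }

    /** Layer 2: one processor in the chain; this one lowercases every term leaf. */
    static Node lowercaseTerms(Node n) {
        if (n.isLeaf()) {
            return new Node(n.value.toLowerCase(Locale.ROOT));
        }
        Node copy = new Node(n.value);
        for (Node c : n.children) {
            copy.children.add(lowercaseTerms(c));
        }
        return copy;
    }

    /** Layer 3: builder. Emits a "+a +b" style string instead of a real Query object. */
    static String build(Node n) {
        if (n.isLeaf()) {
            return n.value;
        }
        StringBuilder sb = new StringBuilder();
        for (Node c : n.children) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('+').append(build(c));
        }
        return sb.toString();
    }

    /** Runs the layers in order; the processors are applied as a configurable chain. */
    static String run(String text, List<UnaryOperator<Node>> processors) {
        Node tree = parse(text);
        for (UnaryOperator<Node> p : processors) {
            tree = p.apply(tree);
        }
        return build(tree);
    }

    public static void main(String[] args) {
        List<UnaryOperator<Node>> chain = new ArrayList<>();
        chain.add(QueryPipelineSketch::lowercaseTerms);
        System.out.println(run("A AND B", chain)); // prints "+a +b"
    }
}
```

Because each layer only sees the tree, a different text syntax would replace `parse` alone, and a different target query type would replace `build` alone.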

[jira] Updated: (LUCENE-1628) Persian Analyzer

2009-07-29 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1628:


Attachment: LUCENE-1628.patch

Add LowerCaseFilter, consistent with the Arabic analyzer; it's user-friendly for 
the common case where there is also some English text.


> Persian Analyzer
> 
>
> Key: LUCENE-1628
> URL: https://issues.apache.org/jira/browse/LUCENE-1628
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: contrib/analyzers
>Reporter: Robert Muir
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.patch, 
> LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.txt
>
>
> A simple Persian analyzer.
> I measured TREC scores with the benchmark package against 
> http://ece.ut.ac.ir/DBRG/Hamshahri/ ; the results are below:
> SimpleAnalyzer:
> SUMMARY
>   Search Seconds:       0.012
>   DocName Seconds:      0.020
>   Num Points:         981.015
>   Num Good Points:     33.738
>   Max Good Points:     36.185
>   Average Precision:    0.374
>   MRR:                  0.667
>   Recall:               0.905
>   Precision At 1:       0.585
>   Precision At 2:       0.531
>   Precision At 3:       0.513
>   Precision At 4:       0.496
>   Precision At 5:       0.486
>   Precision At 6:       0.487
>   Precision At 7:       0.479
>   Precision At 8:       0.465
>   Precision At 9:       0.458
>   Precision At 10:      0.460
>   Precision At 11:      0.453
>   Precision At 12:      0.453
>   Precision At 13:      0.445
>   Precision At 14:      0.438
>   Precision At 15:      0.438
>   Precision At 16:      0.438
>   Precision At 17:      0.429
>   Precision At 18:      0.429
>   Precision At 19:      0.419
>   Precision At 20:      0.415
> PersianAnalyzer:
> SUMMARY
>   Search Seconds:       0.004
>   DocName Seconds:      0.011
>   Num Points:         987.692
>   Num Good Points:     36.123
>   Max Good Points:     36.185
>   Average Precision:    0.481
>   MRR:                  0.833
>   Recall:               0.998
>   Precision At 1:       0.754
>   Precision At 2:       0.715
>   Precision At 3:       0.646
>   Precision At 4:       0.646
>   Precision At 5:       0.631
>   Precision At 6:       0.621
>   Precision At 7:       0.593
>   Precision At 8:       0.577
>   Precision At 9:       0.573
>   Precision At 10:      0.566
>   Precision At 11:      0.572
>   Precision At 12:      0.562
>   Precision At 13:      0.554
>   Precision At 14:      0.549
>   Precision At 15:      0.542
>   Precision At 16:      0.538
>   Precision At 17:      0.533
>   Precision At 18:      0.527
>   Precision At 19:      0.525
>   Precision At 20:      0.518

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1758) improve arabic analyzer: light8 -> light10

2009-07-29 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1758:


Attachment: LUCENE-1758.patch

Add LowerCaseFilter, and replace the "TODO: more tests" note with some actual tests.

> improve arabic analyzer: light8 -> light10
> --
>
> Key: LUCENE-1758
> URL: https://issues.apache.org/jira/browse/LUCENE-1758
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/analyzers
>Reporter: Robert Muir
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1758.patch, LUCENE-1758.patch, LUCENE-1758.txt
>
>
> Someone mentioned on the java user list that the arabic analysis was not as 
> good as they would like.
> This patch adds the لل- prefix (light10 algorithm versus light8 algorithm).
> In the light10 paper, this improves precision from .390 to .413
> They mention this is not statistically significant, but it makes linguistic 
> sense and at least has been shown not to hurt.
> In the future, I hope openrelevance will allow us to try some more 
> approaches. 
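
To illustrate what adding one prefix to a light stemmer means in practice, here is a minimal sketch of prefix stripping. It is not the code from the patch: the class and method names are hypothetical, the base prefix list (al-, wal-, bal-, kal-, fal-) is recalled from the light-stemming literature and should be checked against the paper, the single-letter wa- prefix is left out, and the minimum-remainder guard is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of light prefix stripping; not the code from the patch. */
final class LightPrefixSketch {

    /** Article-style prefixes, longest first, written as Unicode escapes. */
    static final List<String> BASE_PREFIXES = Arrays.asList(
        "\u0648\u0627\u0644",  // wal-
        "\u0628\u0627\u0644",  // bal-
        "\u0643\u0627\u0644",  // kal-
        "\u0641\u0627\u0644",  // fal-
        "\u0627\u0644");       // al-

    /** The extra prefix discussed in this issue: lam lam. */
    static final String LL_PREFIX = "\u0644\u0644";

    /** Base list plus the lam-lam prefix, standing in for the extended behavior. */
    static List<String> withLl() {
        List<String> all = new ArrayList<>(BASE_PREFIXES);
        all.add(LL_PREFIX);
        return all;
    }

    /**
     * Strips the first matching prefix, provided at least minRemainder
     * characters are left afterwards. The guard value is an assumption.
     */
    static String stripPrefix(String word, List<String> prefixes, int minRemainder) {
        for (String p : prefixes) {
            if (word.startsWith(p) && word.length() - p.length() >= minRemainder) {
                return word.substring(p.length());
            }
        }
        return word;
    }

    public static void main(String[] args) {
        // lam lam kaf teh alef beh: the word for "book" with the lam-lam prefix.
        String word = "\u0644\u0644\u0643\u062A\u0627\u0628";
        System.out.println(stripPrefix(word, BASE_PREFIXES, 2).equals(word)); // prints "true"
        System.out.println(stripPrefix(word, withLl(), 2).length());          // prints "4"
    }
}
```

With the base list the word is left untouched, so it would not match documents that contain the bare form; with the extra prefix both reduce to the same four-letter string.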




[jira] Commented: (LUCENE-1567) New flexible query parser

2009-07-29 Thread Luis Alves (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736982#action_12736982
 ] 

Luis Alves commented on LUCENE-1567:


Hi Uwe,
{quote}
Will it be possible to specify some type of "schema" for the query parser in 
future, to automatically create NumericRangeQuery for different numeric types? 
It would then be possible to index a numeric value (double,float,long,int) 
using NumericField and then the query parser knows, which type of field this is 
and so it correctly creates a NumericRangeQuery for strings like "[1.567..*]" 
or "(1.787..19.5]". NumericRangeQuery also supports the rewrite modes, only 
some type of schema support is missing.
{quote}

I think this is doable.
I don't think there is a way to determine from the index whether a field is 
numeric, so the user will have to configure the FieldConfig objects in the 
ConfigHandler. But once that is done, the rest should not be difficult to implement.

Can you create a new JIRA issue with a description of the feature,
so we can discuss the details there?
I'll try to implement that once we agree on all the details.
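
As a rough sketch of the idea (the user declares which fields are numeric, and the parser then interprets range text for those fields), consider the following. Everything here is hypothetical: `NumericSchemaSketch`, `declareNumeric` and `Range` are illustration names, not the FieldConfig/ConfigHandler API, and the bracket convention (`[`/`]` inclusive, `(`/`)` exclusive, `*` open) is assumed from the example strings quoted above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-field numeric configuration; not the LUCENE-1567 API. */
final class NumericSchemaSketch {

    enum NumericType { INT, LONG, FLOAT, DOUBLE }

    /** A parsed range; a null bound means "open" (written as * in the query). */
    static final class Range {
        final Double lower;
        final Double upper;
        final boolean lowerInclusive;
        final boolean upperInclusive;

        Range(Double lower, Double upper, boolean lowerInclusive, boolean upperInclusive) {
            this.lower = lower;
            this.upper = upper;
            this.lowerInclusive = lowerInclusive;
            this.upperInclusive = upperInclusive;
        }
    }

    /** User-supplied schema: field name to numeric type. Other fields are treated as text. */
    private final Map<String, NumericType> numericFields = new HashMap<>();

    void declareNumeric(String field, NumericType type) {
        numericFields.put(field, type);
    }

    /** Parses strings such as "[1.567..*]" or "(1.787..19.5]". */
    static Range parseRange(String text) {
        String s = text.trim();
        if (s.length() < 2) {
            throw new IllegalArgumentException("not a range: " + text);
        }
        char open = s.charAt(0);
        char close = s.charAt(s.length() - 1);
        if ((open != '[' && open != '(') || (close != ']' && close != ')')) {
            throw new IllegalArgumentException("missing range brackets: " + text);
        }
        String body = s.substring(1, s.length() - 1);
        int sep = body.indexOf("..");
        if (sep < 0) {
            throw new IllegalArgumentException("missing '..' separator: " + text);
        }
        Double lower = bound(body.substring(0, sep).trim());
        Double upper = bound(body.substring(sep + 2).trim());
        return new Range(lower, upper, open == '[', close == ']');
    }

    private static Double bound(String s) {
        return s.equals("*") ? null : Double.valueOf(s);
    }

    /** Returns a range only when the schema declares the field numeric; otherwise null. */
    Range numericRangeFor(String field, String text) {
        return numericFields.containsKey(field) ? parseRange(text) : null;
    }

    public static void main(String[] args) {
        NumericSchemaSketch schema = new NumericSchemaSketch();
        schema.declareNumeric("price", NumericType.DOUBLE);
        Range r = schema.numericRangeFor("price", "(1.787..19.5]");
        System.out.println(r.lower + " .. " + r.upper);
        System.out.println(schema.numericRangeFor("title", "[a..b]") == null);
    }
}
```

A real implementation would also use the declared type to pick int, long, float or double bounds; the sketch parses everything as double to stay short.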




RE: [jira] Commented: (LUCENE-1764) SampleComparable doesn't work well in contrib/remote tests

2009-07-29 Thread Chris Hostetter

: SortField.equals() and hashCode() contain a hint:
: 
:   /** Returns true if o is equal to this.  If a
:*  {@link SortComparatorSource} (deprecated) or {@link
:*  FieldCache.Parser} was provided, it must properly
:*  implement equals (unless a singleton is always used). */
: 
: Maybe we should make this more visible, contain all different SortField
: comparator/parsers and place it in the setter methods for parser and
: comparators.

SortField doesn't seem like the right place at all -- people constructing 
instances of SortField, or calling setter methods of SortField shouldn't 
have to care about this at all -- it's people who extend 
SortComparatorSource or FieldCache.Parser who need to be aware of these 
issues, so shouldn't the class level javadocs for those packages spell it 
out?

(ideally those abstract classes would declare hashCode and equals as 
abstract to *force* people to implement them ... but that ship has sailed)
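
A small sketch of why this matters when a parser or comparator source ends up inside a cache key. The types below are hypothetical stand-ins written for this illustration (`IntParser` is not `FieldCache.Parser`), and the `HashMap` stands in for whatever structure keys on the parser.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a parser used as part of a cache key needs equals/hashCode. */
final class ParserEqualitySketch {

    /** Stand-in for a parser interface; not the Lucene type. */
    interface IntParser {
        int parse(String value);
    }

    /** Two instances with the same radix behave identically, and say so via equals. */
    static final class RadixParser implements IntParser {
        private final int radix;

        RadixParser(int radix) {
            this.radix = radix;
        }

        @Override
        public int parse(String value) {
            return Integer.parseInt(value, radix);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof RadixParser && ((RadixParser) o).radix == radix;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(radix);
        }
    }

    /** Same behavior, but it inherits identity-based equals/hashCode from Object. */
    static final class IdentityParser implements IntParser {
        @Override
        public int parse(String value) {
            return Integer.parseInt(value, 16);
        }
    }

    /** Counts the cache entries created when each parser instance is used as a key. */
    static int distinctEntries(IntParser first, IntParser second) {
        Map<IntParser, int[]> cache = new HashMap<>();
        cache.computeIfAbsent(first, p -> new int[0]);
        cache.computeIfAbsent(second, p -> new int[0]);
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctEntries(new RadixParser(16), new RadixParser(16))); // prints "1"
        System.out.println(distinctEntries(new IdentityParser(), new IdentityParser())); // prints "2"
    }
}
```

The second case is the trap: every new instance creates a fresh entry, which is harmless with a singleton but wasteful, or worse, when instances are created per request.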




-Hoss





[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

2009-07-29 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736966#action_12736966
 ] 

Mark Miller commented on LUCENE-1486:
-

Okay thanks. I think we should pull it for 2.9.

> Wildcards, ORs etc inside Phrase queries
> 
>
> Key: LUCENE-1486
> URL: https://issues.apache.org/jira/browse/LUCENE-1486
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: QueryParser
>Affects Versions: 2.4
>Reporter: Mark Harwood
>Assignee: Mark Harwood
>Priority: Minor
> Fix For: 2.9
>
> Attachments: ComplexPhraseQueryParser.java, 
> junit_complex_phrase_qp_07_21_2009.patch, 
> junit_complex_phrase_qp_07_22_2009.patch, Lucene-1486 non default 
> field.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, 
> LUCENE-1486.patch, TestComplexPhraseQuery.java
>
>
> An extension to the default QueryParser that overrides the parsing of 
> PhraseQueries to allow more complex syntax e.g. wildcards in phrase queries.
> The implementation feels a little hacky - this is arguably better handled in 
> QueryParser itself. This works as a proof of concept  for much of the query 
> parser syntax. Examples from the Junit test include:
>   checkMatches("\"j*   smyth~\"", "1,2"); //wildcards and fuzzies are OK in phrases
>   checkMatches("\"(jo* -john)  smith\"", "2"); // boolean logic works
>   checkMatches("\"jo*  smith\"~2", "1,2,3"); // position logic works.
>
>   checkBadQuery("\"jo*  id:1 smith\""); //mixing fields in a phrase is bad
>   checkBadQuery("\"jo* \"smith\" \""); //phrases inside phrases is bad
>   checkBadQuery("\"jo* [sma TO smZ]\" \""); //range queries inside phrases not supported
> Code plus Junit test to follow...




[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

2009-07-29 Thread Luis Alves (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736965#action_12736965
 ] 

Luis Alves commented on LUCENE-1486:


My understanding is that with "New flexible query parser" (LUCENE-1567),
the old QueryParser classes will be deprecated in 2.9
and removed in 3.0 (or moved to contrib in 3.0).

This change will also make ComplexPhraseQueryParser deprecated
because it currently extends the old queryparser.

ComplexPhraseQueryParser was not part of any lucene release
and was only checked in 2 months ago in trunk.

For the reasons above I think we should re-implement this functionality
using the new flexible query parser.

3.0 and 2.9 releases will be very similar 
but 3.0 will have all deprecated APIs removed (at least this is my 
understanding).

In my view the path should be:
- Wait for LUCENE-1567 to be in trunk
- re-implement this feature using the "New flexible query parser"
- and probably do it using a super set of the current syntax with a new 
TextParser.

I'm not sure if I'll have the time to write a compatible implementation of
ComplexPhraseQueryParser before the 2.9 release :(

I'm currently working on 1567 to finalize the patch,
cleaning up javadocs and some small clean up to the APIs.

I'll try to work on ComplexPhraseQueryParser,
once lucene-1567 is in the trunk.

So in my view, ComplexPhraseQueryParser depends on 1567, 
and will require some extra work after 1567 is in the trunk.

I think we have the following options:
# We could wait until 1567 is in trunk, and then wait for a compatible implementation 
of ComplexPhraseQueryParser built on 1567, before we release 2.9. (This would 
still remove the current ComplexPhraseQueryParser class and provide this feature 
through the LuceneQueryParserHelper class, or through a new TextParser named 
complexphrase.)
# We could release 2.9 with only 1567, but that would require 
ComplexPhraseQueryParser to be removed from trunk, or at least deprecated, in 
2.9, and re-implemented in 3.X using the "New flexible query parser" APIs.

I hope this helps :)






[jira] Updated: (LUCENE-1695) Update the Highlighter to use the new TokenStream API

2009-07-29 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-1695:


Attachment: LUCENE-1695.patch

To trunk

> Update the Highlighter to use the new TokenStream API
> -
>
> Key: LUCENE-1695
> URL: https://issues.apache.org/jira/browse/LUCENE-1695
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/highlighter
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 2.9
>
> Attachments: LUCENE-1695.patch, LUCENE-1695.patch, LUCENE-1695.patch, 
> LUCENE-1695.patch, LUCENE-1695.patch
>
>





[jira] Commented: (LUCENE-625) Query auto completer

2009-07-29 Thread Karl Wettin (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736923#action_12736923
 ] 

Karl Wettin commented on LUCENE-625:


bq. Karl, did you ever proceed on this patch? I'm interested in adding 
autosuggest to Solr.

I used this patch for a few things a couple of years ago. If I recall 
correctly, I ended up using the bootstrapped apriori corpus of LUCENE-626 
as training data the last time. That made the corpus rather small, speedy and still 
relevant for most users.

But the major caveat is that this patch is a trie and is thus a "precise 
forward only" thing. So that might not fit all use cases. It might be easier to 
get things going using an index with ngrams of untokenized user queries (i.e. 
including whitespace) or subject-like fields. 

But I really prefer user queries, as using only the last n queries will make it 
sensitive to trends. That will however require quite a bit of data to work 
well: hundreds of thousands of user queries, in my 
experience.

Not sure if this was an answer to your question.. : )
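
The "precise forward only" caveat is easy to see in a minimal prefix trie. The sketch below is an assumption-laden illustration, not the code from the attachments: `QueryTrieSketch`, `add` and `complete` are hypothetical names, and there is no ranking, training or pruning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical prefix-completion trie; not the code from the LUCENE-625 attachments. */
final class QueryTrieSketch {

    private static final class TrieNode {
        final Map<Character, TrieNode> children = new TreeMap<>();
        boolean endOfQuery;
    }

    private final TrieNode root = new TrieNode();

    /** Records one user query in the trie. */
    void add(String query) {
        TrieNode node = root;
        for (char c : query.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new TrieNode());
        }
        node.endOfQuery = true;
    }

    /** Returns every stored query that starts with the prefix, in sorted order. */
    List<String> complete(String prefix) {
        TrieNode node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return new ArrayList<>();
            }
        }
        List<String> out = new ArrayList<>();
        collect(node, new StringBuilder(prefix), out);
        return out;
    }

    /** Depth-first walk that appends each complete query below the given node. */
    private static void collect(TrieNode node, StringBuilder path, List<String> out) {
        if (node.endOfQuery) {
            out.add(path.toString());
        }
        for (Map.Entry<Character, TrieNode> e : node.children.entrySet()) {
            path.append(e.getKey().charValue());
            collect(e.getValue(), path, out);
            path.setLength(path.length() - 1);
        }
    }

    public static void main(String[] args) {
        QueryTrieSketch trie = new QueryTrieSketch();
        trie.add("lucene");
        trie.add("lucid");
        trie.add("solr");
        System.out.println(trie.complete("luc"));  // prints "[lucene, lucid]"
        System.out.println(trie.complete("cene")); // prints "[]"
    }
}
```

The second lookup returns nothing because a trie can only extend what was typed from the first character onward; an n-gram index over whole queries would match in the middle as well, at the cost of more index data.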

> Query auto completer
> 
>
> Key: LUCENE-625
> URL: https://issues.apache.org/jira/browse/LUCENE-625
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Search
>Reporter: Karl Wettin
>Priority: Minor
> Attachments: autocomplete_0.0.1.tar.gz, autocomplete_20060730.tar.gz
>
>
> A trie that helps users to type in their query. Made for AJAX, works great 
> with Ruby on Rails common scripts. Similar to the 
> Google Labs suggester.
> Trained by user queries. Optimizable. Uses an in memory corpus. Serializable.




[jira] Updated: (LUCENE-1749) FieldCache introspection API

2009-07-29 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-1749:


Attachment: LUCENE-1749.patch

Updates:

* merged in updated ram usage estimator code
* updated most failing tests to work without creating top level FieldCaches
* removed offending calls to explain - I left nocommit comments here - 
depending on what we decide, we could turn off the subreader check for these
* Turned off the subreader check for stress sort test - it sorts in back compat 
mode and compares to the new mode - so it loads both on purpose.
* I don't remember if I touched anything else.

tests pass now

> FieldCache introspection API
> 
>
> Key: LUCENE-1749
> URL: https://issues.apache.org/jira/browse/LUCENE-1749
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Priority: Minor
> Fix For: 2.9
>
> Attachments: fieldcache-introspection.patch, LUCENE-1749.patch, 
> LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, 
> LUCENE-1749.patch
>
>
> FieldCache should expose an Expert level API for runtime introspection of the 
> FieldCache to provide info about what is in the FieldCache at any given 
> moment.  We should also provide utility methods for sanity checking that the 
> FieldCache doesn't contain anything "odd"...
>* entries for the same reader/field with different types/parsers
>* entries for the same field/type/parser in a reader and its subreader(s)
>* etc...
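
The first kind of "odd" entry listed above can be sketched as a scan over a snapshot of cache entries. This is a hypothetical illustration, not the API from the patch: `Entry`, `findTypeConflicts` and the string identifiers are invented for the example, and a real check would compare reader objects and parser instances rather than strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sanity check over cache entries; not the LUCENE-1749 API. */
final class CacheSanitySketch {

    /** One cache entry: the reader and field it belongs to, and the cached value type. */
    static final class Entry {
        final String readerId;
        final String field;
        final String type;

        Entry(String readerId, String field, String type) {
            this.readerId = readerId;
            this.field = field;
            this.type = type;
        }
    }

    /** Reports reader/field pairs that are cached under more than one type. */
    static List<String> findTypeConflicts(List<Entry> entries) {
        Map<String, String> firstType = new HashMap<>();
        List<String> problems = new ArrayList<>();
        for (Entry e : entries) {
            String key = e.readerId + "/" + e.field;
            String seen = firstType.putIfAbsent(key, e.type);
            if (seen != null && !seen.equals(e.type)) {
                problems.add(key + ": " + seen + " vs " + e.type);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Entry> snapshot = new ArrayList<>();
        snapshot.add(new Entry("r1", "price", "int"));
        snapshot.add(new Entry("r1", "price", "string"));
        snapshot.add(new Entry("r2", "price", "int"));
        System.out.println(findTypeConflicts(snapshot)); // prints "[r1/price: int vs string]"
    }
}
```

The reader/subreader check would follow the same pattern with a different key: group by field, type and parser, then flag groups that contain both a reader and one of its subreaders.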




[jira] Commented: (LUCENE-1567) New flexible query parser

2009-07-29 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736879#action_12736879
 ] 

Michael Busch commented on LUCENE-1567:
---

{quote}
Could you also please fix the javadocs? When I'm building the javadocs I'm 
getting a lot of warnings about not found references.
{quote}

The warnings occur because you put links to the new contrib queryparser into 
the core queryparser. That doesn't work as the contribs are not in the 
classpath of the core, so I think we should remove those links and change them 
just to plain text.

Also, please make sure to add to the main build.xml appropriate entries for the 
javadocs, otherwise the "All" javadocs will not contain the contrib QP classes.

There are also some TODOs in the docs; especially in top-level places, such as 
the package.html of your new package, we should not have TODOs in the docs. 
Please fix that soon, 2.9 is coming quickly. 

> New flexible query parser
> -
>
> Key: LUCENE-1567
> URL: https://issues.apache.org/jira/browse/LUCENE-1567
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: QueryParser
> Environment: N/A
>Reporter: Luis Alves
>Assignee: Michael Busch
> Fix For: 2.9
>
> Attachments: lucene-1567.patch, 
> lucene_1567_adriano_crestani_07_13_2009.patch, 
> lucene_trunk_FlexQueryParser_2009July09_v4.patch, 
> lucene_trunk_FlexQueryParser_2009July10_v5.patch, 
> lucene_trunk_FlexQueryParser_2009july15_v6.patch, 
> lucene_trunk_FlexQueryParser_2009july16_v7.patch, 
> lucene_trunk_FlexQueryParser_2009july23_v8.patch, 
> lucene_trunk_FlexQueryParser_2009july27_v9.patch, 
> lucene_trunk_FlexQueryParser_2009july28_v10.patch, 
> lucene_trunk_FlexQueryParser_2009March24.patch, 
> lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, 
> QueryParser_restructure_meetup_june2009_v2.pdf, 
> wiki_switching_to_the_new_query_parser.txt
>
>
> From "New flexible query parser" thread by Michael Busch
> in my team at IBM we have used a different query parser than Lucene's in
> our products for quite a while. Recently we spent a significant amount
> of time in refactoring the code and designing a very generic
> architecture, so that this query parser can be easily used for different
> products with varying query syntaxes.
> This work was originally driven by Andreas Neumann (who, however, left
> our team); most of the code was written by Luis Alves, who has been a
> bit active in Lucene in the past, and Adriano Campos, who joined our
> team at IBM half a year ago. Adriano is an Apache committer and PMC member
> on the Tuscany project and is getting familiar with Lucene now too.
> We think this code is much more flexible and extensible than the current
> Lucene query parser, and would therefore like to contribute it to
> Lucene. I'd like to give a very brief architecture overview here,
> Adriano and Luis can then answer more detailed questions as they're much
> more familiar with the code than I am.
> The goal was to separate syntax and semantics of a query. E.g. 'a AND
> b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query.
> We distinguish the semantics of the different query components, e.g.
> whether and how to tokenize/lemmatize/normalize the different terms or
> which Query objects to create for the terms. We wanted to be able to
> write a parser with a new syntax, while reusing the underlying
> semantics, as quickly as possible.
> In fact, Adriano is currently working on a 100% Lucene-syntax compatible
> implementation to make it easy for people who are using Lucene's query
> parser to switch.
> The query parser has three layers and its core is what we call the
> QueryNodeTree. It is a tree that initially represents the syntax of the
> original query, e.g. for 'a AND b':
>       AND
>      /   \
>     A     B
> The three layers are:
> 1. QueryParser
> 2. QueryNodeProcessor
> 3. QueryBuilder
> 1. The upper layer is the parsing layer which simply transforms the
> query text string into a QueryNodeTree. Currently our implementations of
> this layer use javacc.
> 2. The query node processors do most of the work. This layer is in fact a
> configurable chain of processors. Each processor can walk the tree and
> modify nodes or even the tree's structure. That makes it possible to
> e.g. do query optimization before the query is executed or to tokenize
> terms.
> 3. The third layer is also a configurable chain of builders, which
> transform the QueryNodeTree into Lucene Query objects.
> Furthermore the query parser uses flexible configuration objects, which
> are based on AttributeSource/Attribute. It also uses message classes that
> allow resource bundles to be attached. This makes it possible to translate
> messages, which is an important feature of a query parser.
> This design a
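
The point about separating syntax from semantics, using the three example spellings from the description ('a AND b', '+a +b', 'AND(a,b)'), can be sketched as three tiny front ends that produce the same list of required terms. All names below are hypothetical and the parsers are deliberately naive; none of this is taken from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: three surface syntaxes that yield the same conjunction of terms. */
final class SyntaxVariantsSketch {

    /** Infix syntax: "a AND b". */
    static List<String> parseInfix(String q) {
        return Arrays.asList(q.trim().split("\\s+AND\\s+"));
    }

    /** Prefix-operator syntax: "+a +b". */
    static List<String> parsePlus(String q) {
        List<String> terms = new ArrayList<>();
        for (String tok : q.trim().split("\\s+")) {
            if (!tok.startsWith("+") || tok.length() < 2) {
                throw new IllegalArgumentException("expected +term, got: " + tok);
            }
            terms.add(tok.substring(1));
        }
        return terms;
    }

    /** Functional syntax: "AND(a,b)". */
    static List<String> parseFunctional(String q) {
        String s = q.trim();
        if (!s.startsWith("AND(") || !s.endsWith(")")) {
            throw new IllegalArgumentException("expected AND(...), got: " + q);
        }
        List<String> terms = new ArrayList<>();
        for (String t : s.substring(4, s.length() - 1).split(",")) {
            terms.add(t.trim());
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(parseInfix("a AND b"));        // prints "[a, b]"
        System.out.println(parsePlus("+a +b"));           // prints "[a, b]"
        System.out.println(parseFunctional("AND(a,b)"));  // prints "[a, b]"
    }
}
```

Everything downstream of these methods (tokenizing, normalizing, building query objects) would only ever see the shared result, which is what lets a new syntax reuse the existing semantics.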

[jira] Commented: (LUCENE-625) Query auto completer

2009-07-29 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736858#action_12736858
 ] 

Jason Rutherglen commented on LUCENE-625:
-

Karl, did you ever proceed on this patch?  I'm interested in adding autosuggest 
to Solr.




[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

2009-07-29 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736851#action_12736851
 ] 

Mark Miller commented on LUCENE-1486:
-

If we don't have a clear path for this very soon I think we should pull it from 
this release.

> Wildcards, ORs etc inside Phrase queries
> 
>
> Key: LUCENE-1486
> URL: https://issues.apache.org/jira/browse/LUCENE-1486
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: QueryParser
>Affects Versions: 2.4
>Reporter: Mark Harwood
>Assignee: Mark Harwood
>Priority: Minor
> Fix For: 2.9
>
> Attachments: ComplexPhraseQueryParser.java, 
> junit_complex_phrase_qp_07_21_2009.patch, 
> junit_complex_phrase_qp_07_22_2009.patch, Lucene-1486 non default 
> field.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, 
> LUCENE-1486.patch, TestComplexPhraseQuery.java
>
>
> An extension to the default QueryParser that overrides the parsing of 
> PhraseQueries to allow more complex syntax e.g. wildcards in phrase queries.
> The implementation feels a little hacky - this is arguably better handled in 
> QueryParser itself. This works as a proof of concept  for much of the query 
> parser syntax. Examples from the Junit test include:
>   checkMatches("\"j*   smyth~\"", "1,2"); // wildcards and fuzzies are OK in phrases
>   checkMatches("\"(jo* -john)  smith\"", "2"); // boolean logic works
>   checkMatches("\"jo*  smith\"~2", "1,2,3"); // position logic works
>
>   checkBadQuery("\"jo*  id:1 smith\""); // mixing fields in a phrase is bad
>   checkBadQuery("\"jo* \"smith\" \""); // phrases inside phrases are bad
>   checkBadQuery("\"jo* [sma TO smZ]\" \""); // range queries inside phrases are not supported
> Code plus Junit test to follow...




RE: backwards compat tests

2009-07-29 Thread Uwe Schindler
> > My suggestion was to write the build script in a way that it checks out the
> > branch with the same revision number as the current base dir (trunk).
> 
> I think this would work, as long as we always commit top-level and
> back-compat tag in one transaction (commit)?
> 
> (And, even if we don't do it as one commit, the risk that someone
> happens to do a checkout between the two commits is presumably
> negligible).

I think if you first commit in backwards-branch and then in trunk, you never
get an inconsistent state. The trunk revision is lower than the new branch
revision, so nothing changes, as a trunk checkout and test-tag would run the
tests from its current revision (that did not change).

This is the same as now. You can modify the bw-branch and create a new tag,
but as trunk's common-build is not updated, nobody would see it.

You only get an inconsistent state if you have run test-tag before and have
a current checkout of the bw-branch. If you then do svn update on the
bw-branch you will update it to the latest revision. But if you do this, you
will also update trunk (otherwise it would not make sense).

There is only one problem: If you have already checked out the branch at a
specific revision and then update trunk, the next test run will use the old
tests (because the dir already exists; currently it would check out a new tag
because the dir name changed). Because of this, test-tag should also do a svn
update to the current trunk's revision.

> > Alternatively instead of putting a tag name into common-build.xml, it could
> > be the revision number. So it would check out ./branches/
> > lucene_2_4_back_compat_tests with the revision given in common-build.
> 
> This would also be better than what we have today (saves the extra
> "svn copy" step), but if we can make the first approach work that's
> even better!

I suggest two variables in common-build.xml:
- backwards-branch or backwards-branch-url (must be changed when 3.0 is out
and 3.1 starts in trunk).
- backwards-revision

The same problem with trunk updated and branch still available also happens
here. So each run of test-tag should first do a svn update to the revision
from the config (maybe with the possibility to switch this off, or to only
update and never downgrade).
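
The decision that test-tag would have to make on each run could look roughly like the following Java sketch. The class, the method plan and the neverDowngrade flag are hypothetical names, and BRANCH_URL is a placeholder because the thread does not give the full branch URL. Only the svn invocations themselves (checkout and update with -r) are standard.

```java
// Hypothetical helper; names and the neverDowngrade flag are assumptions, not existing build code.
final class BackCompatCheckoutSketch {

    /**
     * Returns the svn command test-tag would run, or null if the working copy can be used as is.
     *
     * @param branchUrl      value of the proposed backwards-branch-url setting
     * @param dir            local directory of the back-compat working copy
     * @param checkedOut     revision of the existing working copy, or -1 if dir does not exist
     * @param wanted         value of the proposed backwards-revision setting
     * @param neverDowngrade if true, keep a working copy that is newer than wanted
     */
    static String plan(String branchUrl, String dir, long checkedOut, long wanted,
                       boolean neverDowngrade) {
        if (checkedOut < 0) {
            // No working copy yet: check out the branch at the configured revision.
            return "svn checkout -r " + wanted + " " + branchUrl + " " + dir;
        }
        if (checkedOut == wanted || (neverDowngrade && checkedOut > wanted)) {
            // Already at the right revision, or newer and downgrades are disabled.
            return null;
        }
        // Existing working copy at another revision: move it to the configured one.
        return "svn update -r " + wanted + " " + dir;
    }

    public static void main(String[] args) {
        String url = "BRANCH_URL"; // placeholder, the real URL is not given in the thread
        String dir = "tags/lucene_2_4_back_compat_tests";
        System.out.println(plan(url, dir, -1, 120, true));  // no working copy yet
        System.out.println(plan(url, dir, 100, 120, true)); // stale working copy
        System.out.println(plan(url, dir, 130, 120, true)); // newer working copy, kept
    }
}
```

The revision numbers in main are made up for illustration only.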





Re: backwards compat tests

2009-07-29 Thread Michael McCandless
On Wed, Jul 29, 2009 at 4:31 PM, Uwe Schindler wrote:

> My suggestion was to write the build script in a way that it checks out the
> branch with the same revision number as the current base dir (trunk).

I think this would work, as long as we always commit top-level and
back-compat tag in one transaction (commit)?

(And, even if we don't do it as one commit, the risk that someone
happens to do a checkout between the two commits is presumably
negligible).

> Alternatively instead of putting a tag name into common-build.xml, it could
> be the revision number. So it would check out …/branches/
> lucene_2_4_back_compat_tests with the revision given in common-build.

This would also be better than what we have today (saves the extra
"svn copy" step), but if we can make the first approach work that's
even better!

Mike




Re: Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)

2009-07-29 Thread Michael McCandless
On Wed, Jul 29, 2009 at 4:06 PM, Shai Erera wrote:
>> Or the undelete methods in IndexReader could just acquire the write lock?
>
> I'll need to open IndexReader w/ readOnly=false if I want to delete/undelete
> a document, no? And then I'll need to acquire the write lock, just like any
> other "write" operation done through IndexReader, right?
>
> Or do you suggest we allow this for readOnly IndexReaders too?

Right, you'll definitely need to acquire the write lock for undeleteDoc.

>> That might be too restrictive?
>
> Yes - I pointed that out just as a safety measure. However, sometimes
> (especially following the 'agile' guidelines) it's better to develop
> something for a problem we know exists, rather than trying to over-engineer
> for something we 'think might exist'. If a good use case is presented
> in the future which requires the undelete to work also in readers that did
> not do the delete themselves, we can change that behavior then, no?
>
> Maybe I'll start to work on it and we can decide that as we go? There's no
> point making decisions now, when we don't know if it is a major thing to
> support or not. Maybe it can be supported 'for free', and then it won't be a
> question at all.

I agree!  There's no need to decide now.  So let's defer.

Mike




RE: backwards compat tests

2009-07-29 Thread Uwe Schindler
Thanks for the hint, Shai!

 

It is related, but the reason behind the tags was clear to me.

 

My suggestion was to write the build script in a way that it checks out the
branch with the same revision number as the current base dir (trunk).
Alternatively instead of putting a tag name into common-build.xml, it could
be the revision number. So it would check out ./branches/
lucene_2_4_back_compat_tests with the revision given in common-build.

 

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
eMail: u...@thetaphi.de

  _  

From: Shai Erera [mailto:ser...@gmail.com] 
Sent: Wednesday, July 29, 2009 9:22 PM
To: java-dev@lucene.apache.org
Subject: Re: backwards compat tests

 

Uwe - I asked this question a while ago on LUCENE-1529 and this is an answer
Mike gave:
http://issues.apache.org/jira/browse/LUCENE-1529?focusedCommentId=12699177&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12699177

I think it's related to what you ask

Shai

On Wed, Jul 29, 2009 at 10:01 PM, Uwe Schindler  wrote:

I do it that way:

 

-  Checkout the backwards branch (not the tag) to
trunk/tags/lucene_2_4_back_compat_tests. I keep this checkout there all the
time and update it regularly together with trunk.

-  Place and leave a build.properties file with the following line
in your trunk dir: "tag=lucene_2_4_back_compat_tests"

-  You can then test using ant test / test-tag and so on, the java
property fixes the tag directory to your branch checkout. The good thing is,
that you always have the last revision of branch and can modify and commit
it directly.

-  If everything is ok, do a tag from your checked out branch (svn
copy .) and then update the main common-build.xml

 

I was always wondering: Why do we need tags for the backwards tests? Why not
just automatically check out the revision equal to the current trunk revision
for testing (what I did manually)? Currently we always have to create a new
tag after each commit to the backwards branch, which is somewhat strange (OK,
that pins the revision used for testing this trunk checkout, but if you check
out the same revision number in the backwards branch that trunk currently
has, the two would always be correctly related).

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
eMail: u...@thetaphi.de

  _  

From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Wednesday, July 29, 2009 6:24 PM
To: java-dev@lucene.apache.org
Subject: backwards compat tests

 

Is there a wiki page on how to handle updating the back compat tests? I
found some mail regarding it, but most of what I found was older. The latest
I saw talked about the separate branch, and updating that branch with fixes
if you need to - but I see now it seems to work with tags?

 

Do I update the branch, tag it with the current date, then update the build
file to point to the new tag (compatibility.tag)?

-- 
- Mark

http://www.lucidimagination.com

 



Re: Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)

2009-07-29 Thread Shai Erera
>
> Or the undelete methods in IndexReader could just acquire the write lock?
>

I'll need to open IndexReader w/ readOnly=false if I want to delete/undelete
a document, no? And then I'll need to acquire the write lock, just like any
other "write" operation done through IndexReader, right?

Or do you suggest we allow this for readOnly IndexReaders too?

That might be too restrictive?
>

Yes - I pointed that out just as a safety measure. However, sometimes
(especially following the 'agile' guidelines) it's better to develop
something for a problem we know exists, rather than trying to over-engineer
for something we 'think might exist'. If a good use case is presented
in the future which requires the undelete to work also in readers that did
not do the delete themselves, we can change that behavior then, no?

Maybe I'll start to work on it and we can decide that as we go? There's no
point making decisions now, when we don't know if it is a major thing to
support or not. Maybe it can be supported 'for free', and then it won't be a
question at all.

Shai

On Wed, Jul 29, 2009 at 10:58 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> undeleteAll doesn't have such a restriction.
>


Re: Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)

2009-07-29 Thread Michael McCandless
On Wed, Jul 29, 2009 at 3:05 PM, Shai Erera wrote:
> Yes of course. I meant to create an undeleteDoc variant for every deleteDoc.
> So if IndexWriter has deleteDocuments(Term), I will add
> undeleteDocuments(Term). If IndexReader has deleteDocument(int), I will add
> undeleteDocument(int).

OK.

> It is up to the caller to make sure whatever he undeletes was indeed
> deleted, i.e., if you reader.deleteDocument(4) and then
> reader.undeleteDocument(4), you should make sure that 4 represents the same
> document.

Presumably in IndexReader we can return int count (how many deleted),
but in IndexWriter it's void.

> In fact, I think it might be useful to restrict the undeleteDoc methods to
> the same reader instance with which they were deleted? It's easy to do by
> checking if deletedDocs does not contain any of the docs passed to the
> undelete method. The rationale is that I believe the best use case for these
> undelete methods to be a mini "undo" of the last delete. Using the same
> reader instance you're guaranteed that the document is still "deleted"
> between delete() and undelete().

That might be too restrictive?  Ie, this is the best use case we can
picture today, but others could come up with different use cases, and
there's no technical reason for such a restriction?

undeleteAll doesn't have such a restriction.

> Also, since I can only open the index for write once, whether by IndexWriter
> or IndexReader w/ readOnly=false, we can guarantee that an undelete followed
> by delete is safe?

Or the undelete methods in IndexReader could just acquire the write lock?

Mike




Re: backwards compat tests

2009-07-29 Thread Shai Erera
Uwe - I asked this question a while ago on LUCENE-1529 and this is an answer
Mike gave:
http://issues.apache.org/jira/browse/LUCENE-1529?focusedCommentId=12699177&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12699177

I think it's related to what you ask

Shai

On Wed, Jul 29, 2009 at 10:01 PM, Uwe Schindler  wrote:

>  I do it that way:
>
>
>
> -  Checkout the backwards branch (not the tag) to
> trunk/tags/lucene_2_4_back_compat_tests. I keep this checkout there all the
> time and update it regularly together with trunk.
>
> -  Place and leave a build.properties file with the following
> line in your trunk dir: “tag=lucene_2_4_back_compat_tests”
>
> -  You can then test using ant test / test-tag and so on, the java
> property fixes the tag directory to your branch checkout. The good thing is,
> that you always have the last revision of branch and can modify and commit
> it directly.
>
> -  If everything is ok, do a tag from your checked out branch (svn
> copy …) and then update the main common-build.xml
>
>
>
> I was always wondering: Why do we need tags for the backwards tests? Why
> not just automatically check out the revision equal to the current trunk
> revision for testing (what I did manually)? Currently we always have to
> create a new tag after each commit to the backwards branch, which is
> somewhat strange (OK, that pins the revision used for testing this trunk
> checkout, but if you check out the same revision number in the backwards
> branch that trunk currently has, the two would always be correctly related).
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>   --
>
> *From:* Mark Miller [mailto:markrmil...@gmail.com]
> *Sent:* Wednesday, July 29, 2009 6:24 PM
> *To:* java-dev@lucene.apache.org
> *Subject:* backwards compat tests
>
>
>
> Is there a wiki page on how to handle updating the back compat tests? I
> found some mail regarding it, but most of what I found was older. The latest
> I saw talked about the separate branch, and updating that branch with fixes
> if you need to - but I see now it seems to work with tags?
>
>
>
> Do I update the branch, tag it with the current date, then update the build
> file to point to the new tag (compatibility.tag)?
>
> --
> - Mark
>
> http://www.lucidimagination.com
>


[jira] Closed: (LUCENE-1762) Slightly more readable code in Token/TermAttributeImpl

2009-07-29 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler closed LUCENE-1762.
-

Resolution: Fixed

Committed revision: 799025
This is without CHANGES.txt updates, because nothing was changed that is 
visible to the outside :-)

Thanks Eks!

> Slightly more readable code in Token/TermAttributeImpl
> --
>
> Key: LUCENE-1762
> URL: https://issues.apache.org/jira/browse/LUCENE-1762
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis
>Affects Versions: 2.9
>Reporter: Eks Dev
>Assignee: Uwe Schindler
>Priority: Trivial
> Fix For: 2.9
>
> Attachments: LUCENE-1762-bw.patch, LUCENE-1762.patch, 
> LUCENE-1762.patch, LUCENE-1762.patch, LUCENE-1762.patch, LUCENE-1762.patch
>
>
> No big deal. 
> growTermBuffer(int newSize) was using correct, but slightly hard to follow 
> code. 
> The method returned null as a hint to the upstream code that the current 
> termBuffer has enough space, or otherwise returned the reallocated buffer.
> This patch simplifies the logic by making this method only reallocate the 
> buffer, nothing more.
> It reduces the number of if(null) checks in a few methods and reduces the 
> amount of code.
> All tests pass.
> This also adds tests for the new basic attribute impls (copies of the Token 
> tests).
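
The simplification described in the issue can be illustrated with a small Java sketch. This is not the committed code: TermBufferSketch, its fields and the doubling growth policy are assumptions made for the example. The point is only that growTermBuffer reallocates when needed and returns nothing, so callers need no null check.

```java
// Illustrative only; not the code committed for LUCENE-1762. The growth policy is an assumption.
final class TermBufferSketch {
    private char[] termBuffer = new char[4];
    private int termLength;

    /** Only reallocates when the buffer is too small; existing content is not preserved. */
    private void growTermBuffer(int newSize) {
        if (termBuffer.length < newSize) {
            termBuffer = new char[Math.max(newSize, termBuffer.length * 2)];
        }
    }

    /** Copies the term into the buffer, growing it first if required. */
    void setTermBuffer(String term) {
        growTermBuffer(term.length());
        // No "if (result == null)" branch here: the field is always the buffer to use.
        term.getChars(0, term.length(), termBuffer, 0);
        termLength = term.length();
    }

    String term() {
        return new String(termBuffer, 0, termLength);
    }

    int capacity() {
        return termBuffer.length;
    }

    public static void main(String[] args) {
        TermBufferSketch t = new TermBufferSketch();
        t.setTermBuffer("lucene");
        System.out.println(t.term() + " " + t.capacity());
    }
}
```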




[jira] Updated: (LUCENE-1762) Slightly more readable code in Token/TermAttributeImpl

2009-07-29 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1762:
--

Description: 
No big deal. 

growTermBuffer(int newSize) was using correct, but slightly hard to follow 
code. 

The method returned null as a hint to the upstream code that the current 
termBuffer has enough space, or otherwise returned the reallocated buffer.

This patch simplifies the logic by making this method only reallocate the 
buffer, nothing more.
It reduces the number of if(null) checks in a few methods and reduces the 
amount of code.
All tests pass.

This also adds tests for the new basic attribute impls (copies of the Token 
tests).

  was:
No big deal. 

growTermBuffer(int newSize) was using correct, but slightly hard to follow 
code. 

The method returned null as a hint to the upstream code that the current 
termBuffer has enough space, or otherwise returned the reallocated buffer.

This patch simplifies the logic by making this method only reallocate the 
buffer, nothing more.
It reduces the number of if(null) checks in a few methods and reduces the 
amount of code.
All tests pass.

Summary: Slightly more readable code in Token/TermAttributeImpl  (was: 
Slightly more readable code in TermAttributeImpl )

> Slightly more readable code in Token/TermAttributeImpl
> --
>
> Key: LUCENE-1762
> URL: https://issues.apache.org/jira/browse/LUCENE-1762
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis
>Affects Versions: 2.9
>Reporter: Eks Dev
>Assignee: Uwe Schindler
>Priority: Trivial
> Fix For: 2.9
>
> Attachments: LUCENE-1762-bw.patch, LUCENE-1762.patch, 
> LUCENE-1762.patch, LUCENE-1762.patch, LUCENE-1762.patch, LUCENE-1762.patch
>
>
> No big deal. 
> growTermBuffer(int newSize) was using correct, but slightly hard to follow 
> code. 
> The method returned null as a hint to the upstream code that the current 
> termBuffer has enough space, or otherwise returned the reallocated buffer.
> This patch simplifies the logic by making this method only reallocate the 
> buffer, nothing more.
> It reduces the number of if(null) checks in a few methods and reduces the 
> amount of code.
> All tests pass.
> This also adds tests for the new basic attribute impls (copies of the Token 
> tests).




Re: Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)

2009-07-29 Thread Shai Erera
Yes of course. I meant to create an undeleteDoc variant for every deleteDoc.
So if IndexWriter has deleteDocuments(Term), I will add
undeleteDocuments(Term). If IndexReader has deleteDocument(int), I will add
undeleteDocument(int).

It is up to the caller to make sure whatever he undeletes was indeed
deleted, i.e., if you reader.deleteDocument(4) and then
reader.undeleteDocument(4), you should make sure that 4 represents the same
document.

In fact, I think it might be useful to restrict the undeleteDoc methods to
the same reader instance with which they were deleted? It's easy to do by
checking if deletedDocs does not contain any of the docs passed to the
undelete method. The rationale is that I believe the best use case for these
undelete methods to be a mini "undo" of the last delete. Using the same
reader instance you're guaranteed that the document is still "deleted"
between delete() and undelete().
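
A minimal sketch of that restriction, assuming a toy reader that tracks its own deletes in a BitSet. UndeleteSketch is not Lucene's IndexReader, and undeleteDocument is the method being proposed here, not an existing one.

```java
import java.util.BitSet;

// Toy stand-in for a reader; not Lucene's IndexReader. undeleteDocument is the proposed method.
final class UndeleteSketch {
    // Documents deleted through this instance only.
    private final BitSet deletedByThisReader = new BitSet();

    void deleteDocument(int docId) {
        deletedByThisReader.set(docId);
    }

    /** Undoes a delete made through this instance and rejects any other docId. */
    void undeleteDocument(int docId) {
        if (!deletedByThisReader.get(docId)) {
            throw new IllegalStateException("doc " + docId + " was not deleted by this reader");
        }
        deletedByThisReader.clear(docId);
    }

    boolean isDeleted(int docId) {
        return deletedByThisReader.get(docId);
    }

    public static void main(String[] args) {
        UndeleteSketch reader = new UndeleteSketch();
        reader.deleteDocument(4);
        reader.undeleteDocument(4); // allowed: this instance deleted doc 4
        System.out.println(reader.isDeleted(4));
    }
}
```

Throwing on an unknown docId is one possible choice; silently ignoring it or returning a count would fit the same check.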

Also, since I can only open the index for write once, whether by IndexWriter
or IndexReader w/ readOnly=false, we can guarantee that an undelete followed
by delete is safe?

Shai

On Wed, Jul 29, 2009 at 7:26 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> +1
>
> Though not by docID (since they aren't reliable in context of
> IndexWriter)... and it should be undeleteDocuments (with an "s") since
> it could affect more than one doc.
>
> Mike
>
> On Wed, Jul 29, 2009 at 10:55 AM, Shai Erera wrote:
> > Hi
> >
> > I think such methods are useful for a Lucene app, which needs to rollback a
> > single document delete. Today, IndexReader offers undeleteAll(), which is a
> > bit extreme. There are two scenarios for this, that I know of:
> > 1) (recently showed up on the user list) I'd like to synchronize documents
> > on disk and in the index. So if I have a document in the index which I want
> > to delete, and also a file on the file system (corresponds to an ID or
> > something), and the file delete fails, I may want to undelete that document.
> > This has alternatives, but still an undeleteDocument will be useful in this
> > case.
> >
> > 2) ParallelReader allows one to add a document to two indexes, some fields
> > to one index and others to the second index, and then read those indexes in
> > parallel. Such applications will need to delete documents sometimes, and an
> > undeleteDocument will be useful if a "transactional delete" is needed: i.e.,
> > if the first delete succeeds, and the second fails, undo the first delete.
> >
> > 3) ParallelReader doesn't support deleteDocument well currently - i.e., if
> > one of the deletes fails, some readers will be left w/ the document and some
> > won't (this is I think a bug).
> >
> > What do you think?
> >
> > Shai
> >
>
> -
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
>
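
The "transactional delete" from scenario 2 in the quoted mail could be sketched as follows. The Index interface and MemoryIndex class are hypothetical stand-ins for the parallel indexes, and undelete is the proposed operation, not an existing Lucene call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical types; they only model "delete may fail" and "undelete restores".
final class TransactionalDeleteSketch {

    interface Index {
        void delete(String id) throws Exception;
        void undelete(String id);
    }

    /** In-memory index whose delete can be made to fail, for demonstration. */
    static final class MemoryIndex implements Index {
        final Set<String> live = new HashSet<>();
        final boolean failOnDelete;

        MemoryIndex(boolean failOnDelete, String... ids) {
            this.failOnDelete = failOnDelete;
            live.addAll(Arrays.asList(ids));
        }

        public void delete(String id) throws Exception {
            if (failOnDelete) {
                throw new Exception("simulated delete failure");
            }
            live.remove(id);
        }

        public void undelete(String id) {
            live.add(id);
        }
    }

    /** Deletes id from every index or from none: on failure, earlier deletes are undone. */
    static boolean deleteFromAll(List<Index> indexes, String id) {
        List<Index> done = new ArrayList<>();
        for (Index index : indexes) {
            try {
                index.delete(id);
                done.add(index);
            } catch (Exception e) {
                for (Index d : done) {
                    d.undelete(id); // roll back the deletes that already succeeded
                }
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Index> indexes = new ArrayList<>();
        indexes.add(new MemoryIndex(false, "doc1"));
        indexes.add(new MemoryIndex(true, "doc1"));
        System.out.println(deleteFromAll(indexes, "doc1"));
    }
}
```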


RE: backwards compat tests

2009-07-29 Thread Uwe Schindler
I do it that way:

 

-  Checkout the backwards branch (not the tag) to
trunk/tags/lucene_2_4_back_compat_tests. I keep this checkout there all the
time and update it regularly together with trunk.

-  Place and leave a build.properties file with the following line
in your trunk dir: "tag=lucene_2_4_back_compat_tests"

-  You can then test using ant test / test-tag and so on, the java
property fixes the tag directory to your branch checkout. The good thing is,
that you always have the last revision of branch and can modify and commit
it directly.

-  If everything is ok, do a tag from your checked out branch (svn
copy .) and then update the main common-build.xml

 

I was always wondering: Why do we need tags for the backwards tests? Why not
just automatically check out the revision equal to the current trunk revision
for testing (what I did manually)? Currently we always have to create a new
tag after each commit to the backwards branch, which is somewhat strange (OK,
that pins the revision used for testing this trunk checkout, but if you check
out the same revision number in the backwards branch that trunk currently
has, the two would always be correctly related).

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
eMail: u...@thetaphi.de

  _  

From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Wednesday, July 29, 2009 6:24 PM
To: java-dev@lucene.apache.org
Subject: backwards compat tests

 

Is there a wiki page on how to handle updating the back compat tests? I
found some mail regarding it, but most of what I found was older. The latest
I saw talked about the separate branch, and updating that branch with fixes
if you need to - but I see now it seems to work with tags?

 

Do I update the branch, tag it with the current date, then update the build
file to point to the new tag (compatibility.tag)?

-- 
- Mark

http://www.lucidimagination.com



[jira] Commented: (LUCENE-1749) FieldCache introspection API

2009-07-29 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736750#action_12736750
 ] 

Mark Miller commented on LUCENE-1749:
-

bq. And then there is explain - IndexSearcher passes the top level reader to 
the weight explain, and valuesourcequery will get a fieldcache based on that 
reader. I guess that one is a bug.

I don't even know what to do about this one. All I can think is that you pump 
out an explain for each sub reader - but that's pretty unhelpful.

Perhaps the best we can do is javadoc the extra requirements that may be needed 
when you use explain?

> FieldCache introspection API
> 
>
> Key: LUCENE-1749
> URL: https://issues.apache.org/jira/browse/LUCENE-1749
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Priority: Minor
> Fix For: 2.9
>
> Attachments: fieldcache-introspection.patch, LUCENE-1749.patch, 
> LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch
>
>
> FieldCache should expose an Expert level API for runtime introspection of the 
> FieldCache to provide info about what is in the FieldCache at any given 
> moment.  We should also provide utility methods for sanity checking that the 
> FieldCache doesn't contain anything "odd"...
>* entries for the same reader/field with different types/parsers
>* entries for the same field/type/parser in a reader and its subreader(s)
>* etc...
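
The first sanity check in the list above (same reader/field cached under different types) could be sketched like this in Java. CacheEntry and conflictingKeys are hypothetical names and do not reflect the API in the attached patches. Readers, fields and types are modelled as plain strings.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of cache entries; not the FieldCache introspection API from the patch.
final class CacheSanitySketch {

    static final class CacheEntry {
        final String reader;
        final String field;
        final String type;

        CacheEntry(String reader, String field, String type) {
            this.reader = reader;
            this.field = field;
            this.type = type;
        }
    }

    /** Returns the "reader/field" keys that are cached under more than one type. */
    static Set<String> conflictingKeys(List<CacheEntry> entries) {
        Map<String, Set<String>> typesByKey = new HashMap<>();
        for (CacheEntry e : entries) {
            typesByKey.computeIfAbsent(e.reader + "/" + e.field, k -> new HashSet<>()).add(e.type);
        }
        Set<String> conflicts = new TreeSet<>();
        for (Map.Entry<String, Set<String>> me : typesByKey.entrySet()) {
            if (me.getValue().size() > 1) {
                conflicts.add(me.getKey());
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        List<CacheEntry> entries = Arrays.asList(
            new CacheEntry("r1", "price", "int"),
            new CacheEntry("r1", "price", "float"),
            new CacheEntry("r1", "title", "String"));
        System.out.println(conflictingKeys(entries));
    }
}
```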




[jira] Resolved: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc

2009-07-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-1766.


Resolution: Fixed

> Add Thread-Safety note to IndexWriter JavaDoc
> -
>
> Key: LUCENE-1766
> URL: https://issues.apache.org/jira/browse/LUCENE-1766
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Simon Willnauer
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1766.patch, LUCENE-1766.patch, LUCENE-1766.patch, 
> LUCENE-1766.patch, LUCENE-1766.patch
>
>
> IndexWriter Javadocs should contain a note about thread-safety. This is 
> already mentioned on the wiki FAQ page but such essential information 
> should be part of the module documentation too.




[jira] Commented: (LUCENE-1749) FieldCache introspection API

2009-07-29 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736732#action_12736732
 ] 

Mark Miller commented on LUCENE-1749:
-

bq. figure out why previously mentioned tests are breaking (need help with this 
one ... don't know enough about the code these tests exercise

Eh - it's yucky. There are parts where the tests are passing the top level 
reader (say to a collector) when they should be using the sub readers. I fixed 
one :)
But then there is more - I looked at a couple more difficult ones that also pass 
the top level reader for the test.

And then there is explain - IndexSearcher passes the top level reader to the 
weight explain, and valuesourcequery will get a fieldcache based on that 
reader. I guess that one is a bug.

And there are prob a few other similar type things...

> FieldCache introspection API
> 
>
> Key: LUCENE-1749
> URL: https://issues.apache.org/jira/browse/LUCENE-1749
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Priority: Minor
> Fix For: 2.9
>
> Attachments: fieldcache-introspection.patch, LUCENE-1749.patch, 
> LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch
>
>
> FieldCache should expose an Expert level API for runtime introspection of the 
> FieldCache to provide info about what is in the FieldCache at any given 
> moment.  We should also provide utility methods for sanity checking that the 
> FieldCache doesn't contain anything "odd"...
>* entries for the same reader/field with different types/parsers
>* entries for the same field/type/parser in a reader and its subreader(s)
>* etc...




[jira] Commented: (LUCENE-1767) Add sizeof to OpenBitSet

2009-07-29 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736724#action_12736724
 ] 

Simon Willnauer commented on LUCENE-1767:
-

Jason, I would expect a sizeOf method to return the size of the bitset itself 
(what #size() returns). Maybe you can find another name for that method. I also 
think you can safely leave the constants out - once you leave those out this 
method is almost identical to #capacity / #size.

I'm not sure whether such a method would confuse users / developers more than 
it helps. If we add it I would rather go for a very meaningful name like 
allocatedBytes.

simon
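
As a rough illustration of the suggestion above, a method that leaves the overhead constants out reduces to the size of the backing long[] words. The following Java sketch uses hypothetical names (BitSetSizeSketch, allocatedBytes, wordsFor) and works on a plain long[] rather than on OpenBitSet itself.

```java
// Hypothetical helper working on a plain long[]; not a method of OpenBitSet.
final class BitSetSizeSketch {

    /** Bytes taken by the backing 64-bit words only, without object or array headers. */
    static long allocatedBytes(long[] bits) {
        return 8L * bits.length;
    }

    /** Number of 64-bit words needed to hold numBits bits. */
    static int wordsFor(long numBits) {
        return (int) ((numBits + 63) >>> 6);
    }

    public static void main(String[] args) {
        long[] bits = new long[wordsFor(1000)];
        System.out.println(allocatedBytes(bits));
    }
}
```

Summing this value over all cached bitsets would give a lower bound on their RAM usage, since per-object overhead is deliberately ignored.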

> Add sizeof to OpenBitSet
> 
>
> Key: LUCENE-1767
> URL: https://issues.apache.org/jira/browse/LUCENE-1767
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.4.1
>Reporter: Jason Rutherglen
>Priority: Trivial
> Fix For: 2.9
>
> Attachments: LUCENE-1767.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Adding a sizeof method to OpenBitSet will facilitate estimating RAM usage 
> when many OpenBitSets are cached (such as in Solr).




[jira] Updated: (LUCENE-1767) Add sizeof to OpenBitSet

2009-07-29 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-1767:
-

Attachment: LUCENE-1767.patch

Added sizeOf method

> Add sizeof to OpenBitSet
> 
>
> Key: LUCENE-1767
> URL: https://issues.apache.org/jira/browse/LUCENE-1767
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.4.1
>Reporter: Jason Rutherglen
>Priority: Trivial
> Fix For: 2.9
>
> Attachments: LUCENE-1767.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Adding a sizeof method to OpenBitSet will facilitate estimating RAM usage 
> when many OBS' are cached (such as Solr).




[jira] Created: (LUCENE-1767) Add sizeof to OpenBitSet

2009-07-29 Thread Jason Rutherglen (JIRA)
Add sizeof to OpenBitSet


 Key: LUCENE-1767
 URL: https://issues.apache.org/jira/browse/LUCENE-1767
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Trivial
 Fix For: 2.9


Adding a sizeof method to OpenBitSet will facilitate estimating RAM usage when 
many OBS' are cached (such as Solr).




[jira] Resolved: (LUCENE-1752) incorrect snippet returned with SpanScorer

2009-07-29 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved LUCENE-1752.
-

   Resolution: Fixed
Lucene Fields: [New, Patch Available]  (was: [New])

Thanks Koji!

> incorrect snippet returned with SpanScorer
> --
>
> Key: LUCENE-1752
> URL: https://issues.apache.org/jira/browse/LUCENE-1752
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/highlighter
>Affects Versions: 2.9
>Reporter: Koji Sekiguchi
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1752.patch
>
>
> This problem was reported by my customer. They are using Solr 1.3 and 
> uni-gram, but it can be reproduced with Lucene 2.9 and WhitespaceAnalyzer.
> {panel:title=Query}
> (f1:"a b c d" OR f2:"a b c d") AND (f1:"b c g" OR f2:"b c g")
> {panel}
> The snippet we expected is:
> {panel}
> x y z a b c d e f g b c g
> {panel}
> but we got:
> {panel}
> x y z a b c d e f g b c g
> {panel}
> Program to reproduce the problem:
> {code}
> public class TestHighlighter {
>   static final String CONTENT = "x y z a b c d e f g b c g";
>   static final String PH1 = "\"a b c d\"";
>   static final String PH2 = "\"b c g\"";
>   static final String F1 = "f1";
>   static final String F2 = "f2";
>   static final String F1C = F1 + ":";
>   static final String F2C = F2 + ":";
>   static final String QUERY_STRING =
>     "(" + F1C + PH1 + " OR " + F2C + PH1 + ") AND ("
>     + F1C + PH2 + " OR " + F2C + PH2 + ")";
>   static Analyzer analyzer = new WhitespaceAnalyzer();
>
>   public static void main(String[] args) throws Exception {
>     QueryParser qp = new QueryParser( F1, analyzer );
>     Query query = qp.parse( QUERY_STRING );
>     CachingTokenFilter stream = new CachingTokenFilter(
>       analyzer.tokenStream( F1, new StringReader( CONTENT ) ) );
>     Scorer scorer = new SpanScorer( query, F1, stream, false );
>     Highlighter h = new Highlighter( scorer );
>     System.out.println( "query : " + QUERY_STRING );
>     System.out.println( h.getBestFragment( analyzer, F1, CONTENT ) );
>   }
> }
> {code}




[jira] Commented: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc

2009-07-29 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736696#action_12736696
 ] 

Simon Willnauer commented on LUCENE-1766:
-

Looks good. Using a private final Object as the lock is a general best practice 
rather than something Lucene- or module-specific.
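
For readers unfamiliar with the idiom, a minimal sketch of the "private final Object" practice follows. IndexingCoordinatorSketch is a hypothetical application class invented for illustration; the writer itself is not modelled, and nothing here is taken from the attached patch:

```java
/** Hypothetical application class; it only shows the locking idiom. */
final class IndexingCoordinatorSketch {
    // Private lock: no outside code, including library internals,
    // can ever synchronize on this object.
    private final Object lock = new Object();
    private int pendingDocs = 0;

    /** Records that one more document is waiting to be indexed. */
    void docQueued() {
        synchronized (lock) {
            pendingDocs++;
        }
    }

    /** Returns the number of waiting documents and resets the counter. */
    int drain() {
        synchronized (lock) {
            int n = pendingDocs;
            pendingDocs = 0;
            return n;
        }
    }
}
```

Because the monitor is private, application-level locking cannot collide with whatever monitors a library object uses internally.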

simon

> Add Thread-Safety note to IndexWriter JavaDoc
> -
>
> Key: LUCENE-1766
> URL: https://issues.apache.org/jira/browse/LUCENE-1766
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Simon Willnauer
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1766.patch, LUCENE-1766.patch, LUCENE-1766.patch, 
> LUCENE-1766.patch, LUCENE-1766.patch
>
>
> IndexWriter Javadocs should contain a note about thread-safety. This is 
> already mentioned on the wiki FAQ page but such essential information 
> should be part of the module documentation too.




Re: backwards compat tests

2009-07-29 Thread Michael McCandless
I think it's not documented anywhere... roughly these are the steps:

  * Make mods to tags/lucene_2_4_.../* so "ant test-tag" passes

  * Use "svn switch" to switch that tags checkout from a "tag" to the
2_4 back compat branch

  * Commit from that dir & plant a new tag

  * Update common-build.xml to point to the new tag

  * Maybe run "ant test-tag" again and confirm everything passes

  * Commit at the top level

Mike

On Wed, Jul 29, 2009 at 12:23 PM, Mark Miller wrote:
> Is there a wiki page on how to handle updating the back compat tests? I
> found some mail regarding it, but most of what I found was older. The latest
> I saw talked about the separate branch, and updating that branch with fixes
> if you need to - but I see now it seems to work with tags?
> Do I update the branch, tag it with the current date, then update the build
> file to point to the new tag (compatibility.tag)?
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>




Re: Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)

2009-07-29 Thread Michael McCandless
+1

Though not by docID (since they aren't reliable in context of
IndexWriter)... and it should be undeleteDocuments (with an "s") since
it could affect more than one doc.

Mike

On Wed, Jul 29, 2009 at 10:55 AM, Shai Erera wrote:
> Hi
>
> I think such methods are useful for a Lucene app, which needs to rollback a
> single document delete. Today, IndexReader offers undeleteAll(), which is a
> bit extreme. There are two scenarios for this, that I know of:
> 1) (recently showed up on the user list) I'd like to synchronize documents
> on disk and in the index. So if I have a document in the index which I want
> to delete, and also a file on the file system (corresponds to an ID or
> something), and the file delete fails, I may want to undelete that document.
> This has alternatives, but still an undeleteDocument will be useful in this
> case.
>
> 2) ParallelReader allows one to add a document to two indexes, some fields
> to one index and other to the second index, and then read those indexes in
> parallel. Such applications will need to delete documents sometimes, and an
> undeleteDocument will be useful if a "transactional delete" is needed: i.e.,
> if the first delete succeeds, and the second fails, undo the first delete.
>
> 3) ParallelReader doesn't support deleteDocument well currently - i.e., if
> one of the deletes fails, some readers will be left w/ the document and some
> won't (this is, I think, a bug).
>
> What do you think?
>
> Shai
>




backwards compat tests

2009-07-29 Thread Mark Miller
Is there a wiki page on how to handle updating the back compat tests? I
found some mail regarding it, but most of what I found was older. The latest
I saw talked about the separate branch, and updating that branch with fixes
if you need to - but I see now it seems to work with tags?
Do I update the branch, tag it with the current date, then update the build
file to point to the new tag (compatibility.tag)?

-- 
- Mark

http://www.lucidimagination.com


[jira] Updated: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc

2009-07-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1766:
---

Attachment: LUCENE-1766.patch

OK another rev!

I backed away from giving particulars on how one should synchronize and just said 
generically "use your own (non-Lucene) objects instead".

> Add Thread-Safety note to IndexWriter JavaDoc
> -
>
> Key: LUCENE-1766
> URL: https://issues.apache.org/jira/browse/LUCENE-1766
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Simon Willnauer
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1766.patch, LUCENE-1766.patch, LUCENE-1766.patch, 
> LUCENE-1766.patch, LUCENE-1766.patch
>
>
> IndexWriter Javadocs should contain a note about thread-safety. This is 
> already mentioned on the wiki FAQ page but such essential information 
> should be part of the module documentation too.




[jira] Commented: (LUCENE-1748) getPayloadSpans on org.apache.lucene.search.spans.SpanQuery should be abstract

2009-07-29 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736674#action_12736674
 ] 

Mark Miller commented on LUCENE-1748:
-

This is going to require a patch to the 2.4 back compat branch to pass tests.

> getPayloadSpans on org.apache.lucene.search.spans.SpanQuery should be abstract
> --
>
> Key: LUCENE-1748
> URL: https://issues.apache.org/jira/browse/LUCENE-1748
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Query/Scoring
>Affects Versions: 2.4, 2.4.1
> Environment: all
>Reporter: Hugh Cayless
>Assignee: Mark Miller
> Fix For: 2.9, 3.0, 3.1
>
> Attachments: LUCENE-1748.patch
>
>
> I just spent a long time tracking down a bug resulting from upgrading to 
> Lucene 2.4.1 on a project that implements some SpanQuerys of its own and was 
> written against 2.3.  Since the project's SpanQuerys didn't implement 
> getPayloadSpans, the call to that method went to SpanQuery.getPayloadSpans 
> which returned null and caused a NullPointerException in the Lucene code, far 
> away from the actual source of the problem.  
> It would be much better for this kind of thing to show up at compile time, I 
> think.
> Thanks!
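
The failure mode described above can be reproduced in miniature. The sketch below uses hypothetical stand-in classes (none of them are Lucene's SpanQuery types) to contrast a non-abstract default that returns null with an abstract method that forces every subclass to implement it:

```java
/** Hypothetical base with a lenient default; not Lucene's SpanQuery. */
abstract class LenientQuerySketch {
    // Non-abstract default: subclasses written before this method
    // existed still compile, but silently inherit a null result.
    String payloadSpans() {
        return null;
    }
}

/** Hypothetical base where the method is abstract. */
abstract class StrictQuerySketch {
    // A subclass that forgets to implement this does not compile.
    abstract String payloadSpans();
}

/** Written against the old API: does not know about payloadSpans(). */
final class OldLenientQuery extends LenientQuerySketch { }

final class NewStrictQuery extends StrictQuerySketch {
    @Override
    String payloadSpans() {
        return "spans";
    }
}

final class PayloadSpansDemo {
    /** Stands in for library code that consumes the result much later. */
    static int spanLength(String spans) {
        return spans.length();
    }

    public static void main(String[] args) {
        System.out.println(spanLength(new NewStrictQuery().payloadSpans()));
        try {
            spanLength(new OldLenientQuery().payloadSpans());
        } catch (NullPointerException e) {
            System.out.println("NPE surfaced in the caller, not in the subclass");
        }
    }
}
```

With the abstract variant, the equivalent of OldLenientQuery is rejected by the compiler, which is the behaviour the reporter asks for.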




[jira] Commented: (LUCENE-1567) New flexible query parser

2009-07-29 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736662#action_12736662
 ] 

Uwe Schindler commented on LUCENE-1567:
---

Just a question: Will it be possible to specify some type of "schema" for the 
query parser in the future, to automatically create a NumericRangeQuery for 
different numeric types? It would then be possible to index a numeric value 
(double,float,long,int) using NumericField, and the query parser would know 
which type of field this is and so correctly create a NumericRangeQuery for 
strings like "[1.567..*]" or "(1.787..19.5]". NumericRangeQuery also supports 
the rewrite modes; only some type of schema support is missing.

I ask because someone on java-user asked for such a feature in the query 
parser.

> New flexible query parser
> -
>
> Key: LUCENE-1567
> URL: https://issues.apache.org/jira/browse/LUCENE-1567
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: QueryParser
> Environment: N/A
>Reporter: Luis Alves
>Assignee: Michael Busch
> Fix For: 2.9
>
> Attachments: lucene-1567.patch, 
> lucene_1567_adriano_crestani_07_13_2009.patch, 
> lucene_trunk_FlexQueryParser_2009July09_v4.patch, 
> lucene_trunk_FlexQueryParser_2009July10_v5.patch, 
> lucene_trunk_FlexQueryParser_2009july15_v6.patch, 
> lucene_trunk_FlexQueryParser_2009july16_v7.patch, 
> lucene_trunk_FlexQueryParser_2009july23_v8.patch, 
> lucene_trunk_FlexQueryParser_2009july27_v9.patch, 
> lucene_trunk_FlexQueryParser_2009july28_v10.patch, 
> lucene_trunk_FlexQueryParser_2009March24.patch, 
> lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, 
> QueryParser_restructure_meetup_june2009_v2.pdf, 
> wiki_switching_to_the_new_query_parser.txt
>
>
> From "New flexible query parser" thread by Michael Busch
> in my team at IBM we have used a different query parser than Lucene's in
> our products for quite a while. Recently we spent a significant amount
> of time in refactoring the code and designing a very generic
> architecture, so that this query parser can be easily used for different
> products with varying query syntaxes.
> This work was originally driven by Andreas Neumann (who, however, left
> our team); most of the code was written by Luis Alves, who has been a
> bit active in Lucene in the past, and Adriano Campos, who joined our
> team at IBM half a year ago. Adriano is Apache committer and PMC member
> on the Tuscany project and getting familiar with Lucene now too.
> We think this code is much more flexible and extensible than the current
> Lucene query parser, and would therefore like to contribute it to
> Lucene. I'd like to give a very brief architecture overview here,
> Adriano and Luis can then answer more detailed questions as they're much
> more familiar with the code than I am.
> The goal was to separate syntax and semantics of a query. E.g. 'a AND
> b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query.
> We distinguish the semantics of the different query components, e.g.
> whether and how to tokenize/lemmatize/normalize the different terms or
> which Query objects to create for the terms. We wanted to be able to
> write a parser with a new syntax, while reusing the underlying
> semantics, as quickly as possible.
> In fact, Adriano is currently working on a 100% Lucene-syntax compatible
> implementation to make it easy for people who are using Lucene's query
> parser to switch.
> The query parser has three layers and its core is what we call the
> QueryNodeTree. It is a tree that initially represents the syntax of the
> original query, e.g. for 'a AND b':
>    AND
>   /   \
>  A     B
> The three layers are:
> 1. QueryParser
> 2. QueryNodeProcessor
> 3. QueryBuilder
> 1. The upper layer is the parsing layer which simply transforms the
> query text string into a QueryNodeTree. Currently our implementations of
> this layer use javacc.
> 2. The query node processors do most of the work. It is in fact a
> configurable chain of processors. Each processor can walk the tree and
> modify nodes or even the tree's structure. That makes it possible to
> e.g. do query optimization before the query is executed or to tokenize
> terms.
> 3. The third layer is also a configurable chain of builders, which
> transform the QueryNodeTree into Lucene Query objects.
> Furthermore the query parser uses flexible configuration objects, which
> are based on AttributeSource/Attribute. It also uses message classes that
> allow attaching resource bundles. This makes it possible to translate
> messages, which is an important feature of a query parser.
> This design allows us to develop different query syntaxes very quickly.
> Adriano wrote the Lucene-compatible syntax in a matter of hours, and the
> underlying pro
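
The three layers described in the quoted text (parse, process, build) can be sketched in a few lines. Everything below is a toy under stated assumptions: NodeSketch and FlexParserSketch are hypothetical names, the "parser" only understands flat "x AND y" input, and the "builder" emits a '+term' string rather than real Lucene Query objects. It is not the contributed code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

/** Hypothetical node: a term (no children) or an operator over children. */
final class NodeSketch {
    final String label;
    final List<NodeSketch> children;

    NodeSketch(String label, List<NodeSketch> children) {
        this.label = label;
        this.children = children;
    }

    static NodeSketch term(String text) {
        return new NodeSketch(text, new ArrayList<>());
    }
}

final class FlexParserSketch {
    // Layer 1: syntax only. Turns "x AND y" into an AND node over terms.
    static NodeSketch parse(String text) {
        List<NodeSketch> terms = new ArrayList<>();
        for (String part : text.split(" AND ")) {
            terms.add(NodeSketch.term(part.trim()));
        }
        return new NodeSketch("AND", terms);
    }

    // Layer 2: one processor that rewrites every term and keeps operators.
    static NodeSketch process(NodeSketch node, UnaryOperator<String> termRewrite) {
        if (node.children.isEmpty()) {
            return NodeSketch.term(termRewrite.apply(node.label));
        }
        List<NodeSketch> out = new ArrayList<>();
        for (NodeSketch child : node.children) {
            out.add(process(child, termRewrite));
        }
        return new NodeSketch(node.label, out);
    }

    // Layer 3: builder. Emits '+term' strings instead of Query objects.
    static String build(NodeSketch node) {
        if (node.children.isEmpty()) {
            return node.label;
        }
        StringBuilder sb = new StringBuilder();
        for (NodeSketch child : node.children) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('+').append(build(child));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        NodeSketch tree = parse("A AND b");
        NodeSketch normalized = process(tree, s -> s.toLowerCase(Locale.ROOT));
        System.out.println(build(normalized)); // prints +a +b
    }
}
```

Swapping parse for a different syntax, or build for a different target, leaves the other two layers untouched, which is the separation the proposal is after.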

[jira] Updated: (LUCENE-1749) FieldCache introspection API

2009-07-29 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated LUCENE-1749:
-

Attachment: LUCENE-1749.patch

checkpoint: refactored the sanity checking code into a utility class and wrote 
tests specifically for it to prove it finds insane stuff.

TODO:
* clean up the api, make it less clunky (and not static)
** return structured data showing exactly which combinations in FieldCache are 
insane
* javadocs
* figure out why previously mentioned tests are breaking (need help with this 
one ... don't know enough about the code these tests exercise)

> FieldCache introspection API
> 
>
> Key: LUCENE-1749
> URL: https://issues.apache.org/jira/browse/LUCENE-1749
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Priority: Minor
> Fix For: 2.9
>
> Attachments: fieldcache-introspection.patch, LUCENE-1749.patch, 
> LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch
>
>
> FieldCache should expose an Expert level API for runtime introspection of the 
> FieldCache to provide info about what is in the FieldCache at any given 
> moment.  We should also provide utility methods for sanity checking that the 
> FieldCache doesn't contain anything "odd"...
>* entries for the same reader/field with different types/parsers
>* entries for the same field/type/parser in a reader and its subreader(s)
>* etc...
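
As an illustration of the first check in the list above (same reader/field cached under different types), here is a minimal sketch. CacheEntrySketch and CacheSanitySketch are hypothetical and operate on plain strings; they are not the FieldCache API or the utility class from the patch:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical description of one cache entry; not the FieldCache API. */
final class CacheEntrySketch {
    final String reader;
    final String field;
    final String type;

    CacheEntrySketch(String reader, String field, String type) {
        this.reader = reader;
        this.field = field;
        this.type = type;
    }
}

final class CacheSanitySketch {
    /** Returns the "reader/field" keys cached under more than one type. */
    static List<String> conflictingTypes(List<CacheEntrySketch> entries) {
        // Group the distinct types seen for each reader/field pair.
        Map<String, Set<String>> typesByKey = new LinkedHashMap<>();
        for (CacheEntrySketch e : entries) {
            typesByKey
                .computeIfAbsent(e.reader + "/" + e.field, k -> new LinkedHashSet<>())
                .add(e.type);
        }
        // Any pair with two or more types is reported as suspicious.
        List<String> insane = new ArrayList<>();
        for (Map.Entry<String, Set<String>> kv : typesByKey.entrySet()) {
            if (kv.getValue().size() > 1) {
                insane.add(kv.getKey());
            }
        }
        return insane;
    }
}
```

The reader/subreader check would need knowledge of the reader hierarchy, which this sketch does not model.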




[jira] Updated: (LUCENE-1763) MergePolicy should require an IndexWriter upon construction

2009-07-29 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-1763:
---

Attachment: LUCENE-1763.patch

Adds a ctor w/ IndexWriter to MergePolicy, LogMergePolicy, and its extensions.
Fixed tests and IndexWriter code
Fixed tags

All tests pass

> MergePolicy should require an IndexWriter upon construction
> ---
>
> Key: LUCENE-1763
> URL: https://issues.apache.org/jira/browse/LUCENE-1763
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shai Erera
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1763.patch
>
>
> MergePolicy does not require an IW upon construction, but requires one to be 
> passed as method arg to various methods. This gives the impression as if a 
> single MP instance can be shared across various IW instances, which is not 
> true for all MPs (if at all). In addition, LogMergePolicy uses the IW 
> instance passed to these methods inconsistently, and is currently exposed to 
> potential NPEs.
> This issue will change MP to require an IW instance, however for back-compat 
> reasons the following changes will be made:
> # A new MP ctor w/ IW as arg will be introduced. Additionally, for 
> back-compat a default ctor will also be declared which will assign null to 
> the member IW.
> # Methods that require IW will be deprecated, and new ones will be declared.
> #* For back-compat, the new ones will not be made abstract, but will throw 
> UOE, with a comment that they will become abstract in 3.0.
> # All current MP impls will move to use the member instance.
> # The code which calls MP methods will continue to use the deprecated 
> methods, passing an IW even though it won't be necessary --> this is strictly 
> for back-compat.
> In 3.0, we'll remove the deprecated default ctor and methods, and change the 
> code to not call the IW method variants anymore.
> I hope that I didn't leave anything out. I'm sure I'll find out when I work 
> on the patch :).
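
The back-compat plan in the numbered list above can be sketched as follows. All class and method names (WriterSketch, PolicySketch, findMerges) are hypothetical stand-ins chosen for illustration; the real signatures are whatever the attached patch defines:

```java
/** Hypothetical stand-in for IndexWriter. */
final class WriterSketch {
    final String name;

    WriterSketch(String name) {
        this.name = name;
    }
}

/** Hypothetical policy base class following the plan in the issue text. */
abstract class PolicySketch {
    protected final WriterSketch writer;

    /** @deprecated kept for back-compat; leaves the member writer null. */
    @Deprecated
    protected PolicySketch() {
        this.writer = null;
    }

    protected PolicySketch(WriterSketch writer) {
        this.writer = writer;
    }

    /** @deprecated old variant that receives the writer on every call. */
    @Deprecated
    abstract String findMerges(WriterSketch writer);

    /** New variant: not abstract yet, so old subclasses still compile. */
    String findMerges() {
        throw new UnsupportedOperationException("will become abstract in 3.0");
    }
}

/** A policy migrated to the member writer. */
final class MemberWriterPolicy extends PolicySketch {
    MemberWriterPolicy(WriterSketch writer) {
        super(writer);
    }

    @Override
    String findMerges() {
        return "merges for " + writer.name;
    }

    @Override
    @Deprecated
    String findMerges(WriterSketch ignored) {
        return findMerges();
    }
}

/** A policy written against the old API only. */
final class OldStylePolicy extends PolicySketch {
    @SuppressWarnings("deprecation")
    OldStylePolicy() {
        super();
    }

    @Override
    @Deprecated
    String findMerges(WriterSketch w) {
        return "merges for " + w.name;
    }
}
```

An old-style policy keeps working through the deprecated method, while calling the new variant on it fails loudly instead of hitting a null member.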




Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)

2009-07-29 Thread Shai Erera
Hi

I think such methods are useful for a Lucene app, which needs to rollback a
single document delete. Today, IndexReader offers undeleteAll(), which is a
bit extreme. There are two scenarios for this, that I know of:
1) (recently showed up on the user list) I'd like to synchronize documents
on disk and in the index. So if I have a document in the index which I want
to delete, and also a file on the file system (corresponds to an ID or
something), and the file delete fails, I may want to undelete that document.
This has alternatives, but still an undeleteDocument will be useful in this
case.

2) ParallelReader allows one to add a document to two indexes, some fields
to one index and other to the second index, and then read those indexes in
parallel. Such applications will need to delete documents sometimes, and an
undeleteDocument will be useful if a "transactional delete" is needed: i.e.,
if the first delete succeeds, and the second fails, undo the first delete.

3) ParallelReader doesn't support deleteDocument well currently - i.e., if
one of the deletes fails, some readers will be left w/ the document and some
won't (this is, I think, a bug).
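
The "transactional delete" of scenario 2 can be sketched like this. IndexSketch is a hypothetical in-memory stand-in, and its undelete(id) method models the proposed operation; it is not an existing IndexReader or IndexWriter method:

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical index stand-in; undelete models the proposed method. */
final class IndexSketch {
    private final Set<String> live = new HashSet<>();
    private final boolean failDeletes;

    IndexSketch(boolean failDeletes, String... ids) {
        this.failDeletes = failDeletes;
        for (String id : ids) {
            live.add(id);
        }
    }

    void delete(String id) {
        if (failDeletes) {
            throw new IllegalStateException("delete failed: " + id);
        }
        live.remove(id);
    }

    void undelete(String id) {
        live.add(id);
    }

    boolean contains(String id) {
        return live.contains(id);
    }
}

final class ParallelDeleteSketch {
    /** Deletes id from both indexes or from neither; true on success. */
    static boolean deleteFromBoth(IndexSketch first, IndexSketch second, String id) {
        first.delete(id); // if this throws, nothing needs undoing
        try {
            second.delete(id);
            return true;
        } catch (RuntimeException e) {
            first.undelete(id); // roll back the delete that already succeeded
            return false;
        }
    }
}
```

Without a per-document undelete, the catch branch has no way to restore the first index short of undeleteAll().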

What do you think?

Shai


[jira] Commented: (LUCENE-1695) Update the Highlighter to use the new TokenStream API

2009-07-29 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736646#action_12736646
 ] 

Mark Miller commented on LUCENE-1695:
-

So without further objection, I'm going to commit this so that I can finish the 
'make spanscorer the default' issue.

> Update the Highlighter to use the new TokenStream API
> -
>
> Key: LUCENE-1695
> URL: https://issues.apache.org/jira/browse/LUCENE-1695
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/highlighter
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 2.9
>
> Attachments: LUCENE-1695.patch, LUCENE-1695.patch, LUCENE-1695.patch, 
> LUCENE-1695.patch
>
>





[jira] Commented: (LUCENE-1763) MergePolicy should require an IndexWriter upon construction

2009-07-29 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736643#action_12736643
 ] 

Michael McCandless commented on LUCENE-1763:


I think subclassing LMP is also extremely advanced, ie, it's OK to make an 
exception to our back-compat policy.

> MergePolicy should require an IndexWriter upon construction
> ---
>
> Key: LUCENE-1763
> URL: https://issues.apache.org/jira/browse/LUCENE-1763
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shai Erera
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.9
>
>
> MergePolicy does not require an IW upon construction, but requires one to be 
> passed as method arg to various methods. This gives the impression as if a 
> single MP instance can be shared across various IW instances, which is not 
> true for all MPs (if at all). In addition, LogMergePolicy uses the IW 
> instance passed to these methods inconsistently, and is currently exposed to 
> potential NPEs.
> This issue will change MP to require an IW instance, however for back-compat 
> reasons the following changes will be made:
> # A new MP ctor w/ IW as arg will be introduced. Additionally, for 
> back-compat a default ctor will also be declared which will assign null to 
> the member IW.
> # Methods that require IW will be deprecated, and new ones will be declared.
> #* For back-compat, the new ones will not be made abstract, but will throw 
> UOE, with a comment that they will become abstract in 3.0.
> # All current MP impls will move to use the member instance.
> # The code which calls MP methods will continue to use the deprecated 
> methods, passing an IW even though it won't be necessary --> this is strictly 
> for back-compat.
> In 3.0, we'll remove the deprecated default ctor and methods, and change the 
> code to not call the IW method variants anymore.
> I hope that I didn't leave anything out. I'm sure I'll find out when I work 
> on the patch :).




[jira] Commented: (LUCENE-1460) Change all contrib TokenStreams/Filters to use the new TokenStream API

2009-07-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736634#action_12736634
 ] 

Robert Muir commented on LUCENE-1460:
-

Michael, sorry to leave it incomplete; I don't think I am the best person for 
the remaining ones.

For example I am a little intimidated by things such as this note in 
ShingleMatrix: 
{code}
  * This method exists in order to avoid reursive calls to the method
  * as the complexity of a fairlt small matrix then easily would require
  * a gigabyte sized stack per thread.
{code}


> Change all contrib TokenStreams/Filters to use the new TokenStream API
> --
>
> Key: LUCENE-1460
> URL: https://issues.apache.org/jira/browse/LUCENE-1460
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1460.patch, lucene-1460.patch, lucene-1460.patch, 
> lucene-1460.patch, LUCENE-1460_contrib_partial.txt, 
> LUCENE-1460_contrib_partial.txt, LUCENE-1460_contrib_partial.txt, 
> LUCENE-1460_core.txt, LUCENE-1460_partial.txt
>
>
> Now that we have the new TokenStream API (LUCENE-1422) we should change all 
> contrib modules to use it.




[jira] Commented: (LUCENE-1752) incorrect snippet returned with SpanScorer

2009-07-29 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736629#action_12736629
 ] 

Mark Miller commented on LUCENE-1752:
-

Thanks Koji - I had forgotten about this one. I'll commit it in a bit.

> incorrect snippet returned with SpanScorer
> --
>
> Key: LUCENE-1752
> URL: https://issues.apache.org/jira/browse/LUCENE-1752
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/highlighter
>Affects Versions: 2.9
>Reporter: Koji Sekiguchi
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1752.patch
>
>
> This problem was reported by my customer. They are using Solr 1.3 and 
> uni-gram, but it can be reproduced with Lucene 2.9 and WhitespaceAnalyzer.
> {panel:title=Query}
> (f1:"a b c d" OR f2:"a b c d") AND (f1:"b c g" OR f2:"b c g")
> {panel}
> The snippet we expected is:
> {panel}
> x y z a b c d e f g b c g
> {panel}
> but we got:
> {panel}
> x y z a b c d e f g b c g
> {panel}
> Program to reproduce the problem:
> {code}
> public class TestHighlighter {
>   static final String CONTENT = "x y z a b c d e f g b c g";
>   static final String PH1 = "\"a b c d\"";
>   static final String PH2 = "\"b c g\"";
>   static final String F1 = "f1";
>   static final String F2 = "f2";
>   static final String F1C = F1 + ":";
>   static final String F2C = F2 + ":";
>   static final String QUERY_STRING =
>     "(" + F1C + PH1 + " OR " + F2C + PH1 + ") AND ("
>     + F1C + PH2 + " OR " + F2C + PH2 + ")";
>   static Analyzer analyzer = new WhitespaceAnalyzer();
>
>   public static void main(String[] args) throws Exception {
>     QueryParser qp = new QueryParser( F1, analyzer );
>     Query query = qp.parse( QUERY_STRING );
>     CachingTokenFilter stream = new CachingTokenFilter(
>       analyzer.tokenStream( F1, new StringReader( CONTENT ) ) );
>     Scorer scorer = new SpanScorer( query, F1, stream, false );
>     Highlighter h = new Highlighter( scorer );
>     System.out.println( "query : " + QUERY_STRING );
>     System.out.println( h.getBestFragment( analyzer, F1, CONTENT ) );
>   }
> }
> {code}




[jira] Updated: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc

2009-07-29 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-1766:


Attachment: LUCENE-1766.patch

Added small but important fact about the synchronization Object.
Everything else looks good to me!

> Add Thread-Safety note to IndexWriter JavaDoc
> -
>
> Key: LUCENE-1766
> URL: https://issues.apache.org/jira/browse/LUCENE-1766
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Simon Willnauer
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1766.patch, LUCENE-1766.patch, LUCENE-1766.patch, 
> LUCENE-1766.patch
>
>
> IndexWriter Javadocs should contain a note about thread-safety. This is 
> already mentioned on the wiki FAQ page but such essential information 
> should be part of the module documentation too.




[jira] Updated: (LUCENE-1752) incorrect snippet returned with SpanScorer

2009-07-29 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-1752:
---

Fix Version/s: 2.9

I'd like to set the fix version to 2.9. With the patch, the highlighter works 
perfectly in our production environment.

> incorrect snippet returned with SpanScorer
> --
>
> Key: LUCENE-1752
> URL: https://issues.apache.org/jira/browse/LUCENE-1752
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/highlighter
>Affects Versions: 2.9
>Reporter: Koji Sekiguchi
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1752.patch
>
>
> This problem was reported by my customer. They are using Solr 1.3 and 
> uni-gram, but it can be reproduced with Lucene 2.9 and WhitespaceAnalyzer.
> {panel:title=Query}
> (f1:"a b c d" OR f2:"a b c d") AND (f1:"b c g" OR f2:"b c g")
> {panel}
> The snippet we expected is:
> {panel}
> x y z a b c d e f g b c g
> {panel}
> but we got:
> {panel}
> x y z a b c d e f g b c g
> {panel}
> Program to reproduce the problem:
> {code}
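> // Imports assume the Lucene 2.9 package layout.
> import java.io.StringReader;
>
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.CachingTokenFilter;
> import org.apache.lucene.analysis.WhitespaceAnalyzer;
> import org.apache.lucene.queryParser.QueryParser;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.search.highlight.Highlighter;
> import org.apache.lucene.search.highlight.Scorer;
> import org.apache.lucene.search.highlight.SpanScorer;
>
> public class TestHighlighter {
>   static final String CONTENT = "x y z a b c d e f g b c g";
>   static final String PH1 = "\"a b c d\"";
>   static final String PH2 = "\"b c g\"";
>   static final String F1 = "f1";
>   static final String F2 = "f2";
>   static final String F1C = F1 + ":";
>   static final String F2C = F2 + ":";
>   // (f1:"a b c d" OR f2:"a b c d") AND (f1:"b c g" OR f2:"b c g")
>   static final String QUERY_STRING =
>       "(" + F1C + PH1 + " OR " + F2C + PH1 + ") AND ("
>       + F1C + PH2 + " OR " + F2C + PH2 + ")";
>   static Analyzer analyzer = new WhitespaceAnalyzer();
>
>   public static void main(String[] args) throws Exception {
>     QueryParser qp = new QueryParser( F1, analyzer );
>     Query query = qp.parse( QUERY_STRING );
>     // Wrap the token stream so its tokens are cached for the scorer.
>     CachingTokenFilter stream = new CachingTokenFilter(
>         analyzer.tokenStream( F1, new StringReader( CONTENT ) ) );
>     Scorer scorer = new SpanScorer( query, F1, stream, false );
>     Highlighter h = new Highlighter( scorer );
>     System.out.println( "query : " + QUERY_STRING );
>     System.out.println( h.getBestFragment( analyzer, F1, CONTENT ) );
>   }
> }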
> {code}




[jira] Commented: (LUCENE-1763) MergePolicy should require an IndexWriter upon construction

2009-07-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736617#action_12736617
 ] 

Shai Erera commented on LUCENE-1763:


I don't mind doing that ... but note that LMP's methods are public (it 
overrides them and declares them public), so I was thinking that someone could 
potentially have written their own LMP (no one can write their own MP today). But 
if you're fine w/ me doing that, it's fine by me as well.

BTW - I don't need to come up w/ new names after all, since just adding the 
same method w/o the IW arg changes its signature. But I agree that having just 
the right form makes more sense.

> MergePolicy should require an IndexWriter upon construction
> ---
>
> Key: LUCENE-1763
> URL: https://issues.apache.org/jira/browse/LUCENE-1763
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shai Erera
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.9
>
>
> MergePolicy does not require an IW upon construction, but requires one to be 
> passed as a method arg to various methods. This gives the impression that a 
> single MP instance can be shared across various IW instances, which is not 
> true for all MPs (if at all). In addition, LogMergePolicy uses the IW 
> instance passed to these methods inconsistently, and is currently exposed to 
> potential NPEs.
> This issue will change MP to require an IW instance, however for back-compat 
> reasons the following changes will be made:
> # A new MP ctor w/ IW as arg will be introduced. Additionally, for 
> back-compat a default ctor will also be declared which will assign null to 
> the member IW.
> # Methods that require IW will be deprecated, and new ones will be declared.
> #* For back-compat, the new ones will not be made abstract, but will throw 
> UOE, with a comment that they will become abstract in 3.0.
> # All current MP impls will move to use the member instance.
> # The code which calls MP methods will continue to use the deprecated 
> methods, passing an IW even though it won't be necessary --> this is strictly 
> for back-compat.
> In 3.0, we'll remove the deprecated default ctor and methods, and change the 
> code to not call the IW method variants anymore.
> I hope that I didn't leave anything out. I'm sure I'll find out when I work 
> on the patch :).
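
The numbered plan above can be sketched in plain Java. This is only an
illustration under stated assumptions: StubWriter stands in for IndexWriter,
SketchMergePolicy for MergePolicy, and findMerges is reduced to returning a
String. None of these names or signatures are taken from the actual patch.

```java
/** Hypothetical stand-in for IndexWriter, only here so the sketch compiles. */
class StubWriter {
    final String name;
    StubWriter(String name) { this.name = name; }
}

/** Hypothetical policy base class following the numbered plan. */
abstract class SketchMergePolicy {
    protected final StubWriter writer;

    /** New constructor: the policy is bound to exactly one writer. */
    protected SketchMergePolicy(StubWriter writer) { this.writer = writer; }

    /** @deprecated back-compat only; leaves the member writer null. */
    @Deprecated
    protected SketchMergePolicy() { this(null); }

    /** @deprecated the writer argument is redundant; use {@link #findMerges()}. */
    @Deprecated
    public abstract String findMerges(StubWriter writer);

    /** New variant; not abstract yet, so existing subclasses still compile. */
    public String findMerges() {
        throw new UnsupportedOperationException("will become abstract in 3.0");
    }
}

/** An implementation that has moved to the member instance. */
class SketchLogMergePolicy extends SketchMergePolicy {
    SketchLogMergePolicy(StubWriter writer) { super(writer); }

    @Override
    public String findMerges() { return "merges for " + writer.name; }

    /** Old entry point delegates to the new one and ignores its argument. */
    @Override
    @Deprecated
    public String findMerges(StubWriter ignored) { return findMerges(); }
}

/** A pre-existing subclass that only knows the old method. */
class LegacyMergePolicy extends SketchMergePolicy {
    @Override
    public String findMerges(StubWriter writer) {
        return "legacy merges for " + writer.name;
    }
}
```

In this sketch a legacy subclass keeps compiling and keeps working through
the deprecated method, while calling the new no-arg variant on it fails
loudly with UnsupportedOperationException instead of silently using a null
writer.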




[jira] Updated: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc

2009-07-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1766:
---

Attachment: LUCENE-1766.patch

IndexReader & IndexSearcher as well.

> Add Thread-Safety note to IndexWriter JavaDoc
> -
>
> Key: LUCENE-1766
> URL: https://issues.apache.org/jira/browse/LUCENE-1766
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Simon Willnauer
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1766.patch, LUCENE-1766.patch, LUCENE-1766.patch
>
>
> IndexWriter Javadocs should contain a note about thread-safety. This is 
> already mentioned on the wiki FAQ page, but such essential information 
> should be part of the module documentation too.




[jira] Commented: (LUCENE-1763) MergePolicy should require an IndexWriter upon construction

2009-07-29 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736614#action_12736614
 ] 

Michael McCandless commented on LUCENE-1763:


How about we:

  * Simply change the methods.  Yes it's technically a break in back-compat, 
but since they are package private, and so advanced (I think very few people 
have customized their merge policy/scheduler), a compile time error on upgrade 
seems fine.

  * Make the APIs public (perhaps add a unit test, outside of oal.index 
package, asserting that all that's required is in fact public)

  * Mark the APIs as subject to change.
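
One way to approximate the "everything required is public" check from the
second bullet is reflection. This is a swapped-in technique, not necessarily
what the eventual unit test does (a test subclass compiled in another package
would be the more direct check). ExtensionPointCheck, OpenPolicy and
ClosedPolicy are hypothetical names used only for this sketch.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: finds abstract methods hidden from other packages. */
class ExtensionPointCheck {
    /** Returns names of abstract methods that are neither public nor protected. */
    static List<String> hiddenAbstractMethods(Class<?> type) {
        List<String> hidden = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            int mod = m.getModifiers();
            boolean visibleOutside = Modifier.isPublic(mod) || Modifier.isProtected(mod);
            if (Modifier.isAbstract(mod) && !visibleOutside) {
                hidden.add(m.getName());
            }
        }
        return hidden;
    }
}

/** Can be subclassed from any package: its only abstract method is public. */
abstract class OpenPolicy {
    public abstract void merge();
}

/** Cannot be subclassed elsewhere: close() is package-private and abstract. */
abstract class ClosedPolicy {
    public abstract void merge();
    abstract void close();
}
```

A test could then assert that the list is empty for each class meant to be
extended by users.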

> MergePolicy should require an IndexWriter upon construction
> ---
>
> Key: LUCENE-1763
> URL: https://issues.apache.org/jira/browse/LUCENE-1763
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shai Erera
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.9
>
>
> MergePolicy does not require an IW upon construction, but requires one to be 
> passed as a method arg to various methods. This gives the impression that a 
> single MP instance can be shared across various IW instances, which is not 
> true for all MPs (if at all). In addition, LogMergePolicy uses the IW 
> instance passed to these methods inconsistently, and is currently exposed to 
> potential NPEs.
> This issue will change MP to require an IW instance, however for back-compat 
> reasons the following changes will be made:
> # A new MP ctor w/ IW as arg will be introduced. Additionally, for 
> back-compat a default ctor will also be declared which will assign null to 
> the member IW.
> # Methods that require IW will be deprecated, and new ones will be declared.
> #* For back-compat, the new ones will not be made abstract, but will throw 
> UOE, with a comment that they will become abstract in 3.0.
> # All current MP impls will move to use the member instance.
> # The code which calls MP methods will continue to use the deprecated 
> methods, passing an IW even though it won't be necessary --> this is strictly 
> for back-compat.
> In 3.0, we'll remove the deprecated default ctor and methods, and change the 
> code to not call the IW method variants anymore.
> I hope that I didn't leave anything out. I'm sure I'll find out when I work 
> on the patch :).




[jira] Reopened: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc

2009-07-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reopened LUCENE-1766:



I'll add to IndexReader & IndexSearcher as well.

> Add Thread-Safety note to IndexWriter JavaDoc
> -
>
> Key: LUCENE-1766
> URL: https://issues.apache.org/jira/browse/LUCENE-1766
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Simon Willnauer
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1766.patch, LUCENE-1766.patch
>
>
> IndexWriter Javadocs should contain a note about thread-safety. This is 
> already mentioned on the wiki FAQ page, but such essential information 
> should be part of the module documentation too.




[jira] Commented: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc

2009-07-29 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736557#action_12736557
 ] 

Simon Willnauer commented on LUCENE-1766:
-

We don't afaik.

> Add Thread-Safety note to IndexWriter JavaDoc
> -
>
> Key: LUCENE-1766
> URL: https://issues.apache.org/jira/browse/LUCENE-1766
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Simon Willnauer
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1766.patch, LUCENE-1766.patch
>
>
> IndexWriter Javadocs should contain a note about thread-safety. This is 
> already mentioned on the wiki FAQ page, but such essential information 
> should be part of the module documentation too.




[jira] Resolved: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc

2009-07-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-1766.


Resolution: Fixed

OK thanks Simon!

> Add Thread-Safety note to IndexWriter JavaDoc
> -
>
> Key: LUCENE-1766
> URL: https://issues.apache.org/jira/browse/LUCENE-1766
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Simon Willnauer
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1766.patch, LUCENE-1766.patch
>
>
> IndexWriter Javadocs should contain a note about thread-safety. This is 
> already mentioned on the wiki FAQ page, but such essential information 
> should be part of the module documentation too.




[jira] Commented: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc

2009-07-29 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736556#action_12736556
 ] 

Uwe Schindler commented on LUCENE-1766:
---

By the way: Do we have a TS note for IndexReader?

> Add Thread-Safety note to IndexWriter JavaDoc
> -
>
> Key: LUCENE-1766
> URL: https://issues.apache.org/jira/browse/LUCENE-1766
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Simon Willnauer
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1766.patch, LUCENE-1766.patch
>
>
> IndexWriter Javadocs should contain a note about thread-safety. This is 
> already mentioned on the wiki FAQ page, but such essential information 
> should be part of the module documentation too.




[jira] Commented: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc

2009-07-29 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736551#action_12736551
 ] 

Simon Willnauer commented on LUCENE-1766:
-

looks good to me.

> Add Thread-Safety note to IndexWriter JavaDoc
> -
>
> Key: LUCENE-1766
> URL: https://issues.apache.org/jira/browse/LUCENE-1766
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Simon Willnauer
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1766.patch, LUCENE-1766.patch
>
>
> IndexWriter Javadocs should contain a note about thread-safety. This is 
> already mentioned on the wiki FAQ page, but such essential information 
> should be part of the module documentation too.




[jira] Commented: (LUCENE-1758) improve arabic analyzer: light8 -> light10

2009-07-29 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736548#action_12736548
 ] 

Michael McCandless commented on LUCENE-1758:


bq. perhaps both this and LUCENE-1628 should include LowerCaseFilter.

That seems reasonable?

> improve arabic analyzer: light8 -> light10
> --
>
> Key: LUCENE-1758
> URL: https://issues.apache.org/jira/browse/LUCENE-1758
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/analyzers
>Reporter: Robert Muir
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1758.patch, LUCENE-1758.txt
>
>
> Someone mentioned on the java user list that the arabic analysis was not as 
> good as they would like.
> This patch adds the لل- prefix (light10 algorithm versus light8 algorithm).
> In the light10 paper, this improves precision from .390 to .413
> They mention this is not statistically significant, but it makes linguistic 
> sense and at least has been shown not to hurt.
> In the future, I hope openrelevance will allow us to try some more 
> approaches. 




[jira] Updated: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc

2009-07-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1766:
---

Attachment: LUCENE-1766.patch

Tweaked the wording... Simon if this looks OK to you I'll commit shortly!

> Add Thread-Safety note to IndexWriter JavaDoc
> -
>
> Key: LUCENE-1766
> URL: https://issues.apache.org/jira/browse/LUCENE-1766
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Simon Willnauer
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1766.patch, LUCENE-1766.patch
>
>
> IndexWriter Javadocs should contain a note about thread-safety. This is 
> already mentioned on the wiki FAQ page, but such essential information 
> should be part of the module documentation too.




[jira] Assigned: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc

2009-07-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-1766:
--

Assignee: Michael McCandless

> Add Thread-Safety note to IndexWriter JavaDoc
> -
>
> Key: LUCENE-1766
> URL: https://issues.apache.org/jira/browse/LUCENE-1766
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Simon Willnauer
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1766.patch
>
>
> IndexWriter Javadocs should contain a note about thread-safety. This is 
> already mentioned on the wiki FAQ page, but such essential information 
> should be part of the module documentation too.




[jira] Updated: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc

2009-07-29 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-1766:


Attachment: LUCENE-1766.patch

> Add Thread-Safety note to IndexWriter JavaDoc
> -
>
> Key: LUCENE-1766
> URL: https://issues.apache.org/jira/browse/LUCENE-1766
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Simon Willnauer
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1766.patch
>
>
> IndexWriter Javadocs should contain a note about thread-safety. This is 
> already mentioned on the wiki FAQ page, but such essential information 
> should be part of the module documentation too.




[jira] Created: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc

2009-07-29 Thread Simon Willnauer (JIRA)
Add Thread-Safety note to IndexWriter JavaDoc
-

 Key: LUCENE-1766
 URL: https://issues.apache.org/jira/browse/LUCENE-1766
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Simon Willnauer
Priority: Minor
 Fix For: 2.9


IndexWriter Javadocs should contain a note about thread-safety. This is already 
mentioned on the wiki FAQ page, but such essential information should be part 
of the module documentation too.





Re: Build failed in Hudson: Lucene-trunk #902

2009-07-29 Thread Michael McCandless
I'm guessing it was the empty source file I accidentally left in for
LUCENE-1754, that Hoss removed (thanks!). I think clover saw that as
an attempt to instrument a source in the empty-string package.

I'm unfamiliar w/ how to configure clover, but I agree we should make
sure it's testing coverage for our "normal" unit tests.  Rather than
turn it off for test-tag, can we measure coverage of all tests
(test-tag, test-core, test-contrib)?

Is there someone familiar w/ clover who can look into this?

Mike

On Wed, Jul 29, 2009 at 3:10 AM, Uwe Schindler wrote:
> This seems to be fixed now. But there is something completely wrong with
> clover:
>
> If you look into the clover reports, there are a lot of classes having 0%
> code coverage, but there are tests available (e.g. my new NumericRange
> things). Also *all* contribs have 0%.
>
> After thinking a little bit about it, it seems that the coverage report is
> built not from the normal test run but from the results of
> test-tag. This explains why NumericRange and Spatial seem to have no
> tests for clover.
>
> Does anybody know how to fix this? Maybe clover coverage should be disabled
> for the test run in test-tag? What can be changed in build.xml to do this?
>
> I have no clover installed locally, so I cannot try this out.
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>> -Original Message-
>> From: Michael McCandless [mailto:luc...@mikemccandless.com]
>> Sent: Tuesday, July 28, 2009 12:13 PM
>> To: java-dev@lucene.apache.org
>> Subject: Re: Build failed in Hudson: Lucene-trunk #902
>>
>> Hmm... the build looks like it failed because of some odd clover
>> licensing issue:
>>
>>    [clover] Sorry, you are not licensed to instrument files in the package
>> ''.
>>
>> Anyone have any ideas?
>>
>> Mike
>>
>> On Mon, Jul 27, 2009 at 11:26 PM, Apache Hudson
>> Server wrote:
>> > See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/902/changes
>> >
>> > Changes:
>> >
>> > [uschindler] LUCENE-1754: JavaDoc updates
>> >
>> > [mikemccand] LUCENE-1754: EMPTY_DOCIDSET subclasses DocIdSet directly
>> >
>> > [mikemccand] LUCENE-1754: just use EMPTY_DOCIDSET.iterator() instead of
>> new EmptyDocIdSetIterator
>> >
>> > [mikemccand] LUCENE-1595: don't use SortField.AUTO; deprecate
>> LineDocMaker & EnwikiDocMaker
>> >
>> > [mikemccand] LUCENE-1754: add EmptyDocIdSetIterator
>> >
>> > [mikemccand] LUCENE-1754: update back-compat test
>> >
>> > [mikemccand] LUCENE-1754: BooleanQuery detects up front if it won't
>> match any docs and returns null from its scorer() instead of
>> NonMatchingScorer
>> >
>> > --
>> > [...truncated 21062 lines...]
>> >  [javadoc] Note: Custom tags that were not seen: @todo, @uml.property
>> >  [javadoc] 1 error
>> >  [javadoc] 32 warnings
>> >      [jar] Building jar:
>> http://hudson.zones.apache.org/hudson/job/Lucene-
>> trunk/ws/trunk/build/contrib/spatial/lucene-spatial-2009-07-28_02-04-46-
>> javadoc.jar
>> >     [echo] Building spellchecker...
>> >
>> > javadocs:
>> >  [javadoc] Generating Javadoc
>> >  [javadoc] Javadoc execution
>> >  [javadoc] Loading source files for package
>> org.apache.lucene.search.spell...
>> >  [javadoc] Constructing Javadoc information...
>> >  [javadoc] javadoc: warning - Error reading file:
>> http://hudson.zones.apache.org/hudson/job/Lucene-
>> trunk/ws/trunk/build/docs/api/contrib-spellchecker/../package-list
>> >  [javadoc] Standard Doclet version 1.5.0_14
>> >  [javadoc] Building tree for all the packages and classes...
>> >  [javadoc] Building index for all the packages and classes...
>> >  [javadoc] Building index for all classes...
>> >  [javadoc] javadoc: error - Error while reading file
>> http://hudson.zones.apache.org/hudson/job/Lucene-
>> trunk/ws/trunk/contrib/spellchecker/src/java/overview.html
>> >  [javadoc] Generating http://hudson.zones.apache.org/hudson/job/Lucene-
>> trunk/ws/trunk/build/docs/api/contrib-spellchecker/stylesheet.css...
>> >  [javadoc] Note: Custom tags that could override future standard tags:
>> @todo. To avoid potential overrides, use at least one period character
>> (.) in custom tag names.
>> >  [javadoc] Note: Custom tags that were not seen: @todo, @uml.property
>> >  [javadoc] 1 error
>> >  [javadoc] 1 warning
>> >      [jar] Building jar:
>> http://hudson.zones.apache.org/hudson/job/Lucene-
>> trunk/ws/trunk/build/contrib/spellchecker/lucene-spellchecker-2009-07-
>> 28_02-04-46-javadoc.jar
>> >     [echo] Building surround...
>> >
>> > javadocs:
>> >  [javadoc] Generating Javadoc
>> >  [javadoc] Javadoc execution
>> >  [javadoc] Loading source files for package
>> org.apache.lucene.queryParser.surround.parser...
>> >  [javadoc] Loading source files for package
>> org.apache.lucene.queryParser.surround.query...
>> >  [javadoc] Constructing Javadoc information...
>> >  [javadoc] javadoc: warning - Error reading file:

[jira] Commented: (LUCENE-1690) Morelikethis queries are very slow compared to other search types

2009-07-29 Thread Richard Marr (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736525#action_12736525
 ] 

Richard Marr commented on LUCENE-1690:
--

There's also another problem I've just noticed. Please ignore the latest patch.

> Morelikethis queries are very slow compared to other search types
> -
>
> Key: LUCENE-1690
> URL: https://issues.apache.org/jira/browse/LUCENE-1690
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/*
>Affects Versions: 2.4.1
>Reporter: Richard Marr
>Priority: Minor
> Attachments: LruCache.patch, LUCENE-1690.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The MoreLikeThis object performs term frequency lookups for every query.  
> From my testing that's what seems to take up the majority of time for 
> MoreLikeThis searches.  
> For some (I'd venture many) applications it's not necessary for term 
> statistics to be looked up every time. A fairly naive opt-in caching 
> mechanism tied to the life of the MoreLikeThis object would allow 
> applications to cache term statistics for the duration that suits them.
> I've got this working in my test code. I'll put together a patch file when I 
> get a minute. From my testing this can improve performance by a factor of 
> around 10.
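
The opt-in caching idea described above can be sketched with a small LRU map
in plain Java. This is a minimal sketch under stated assumptions, not the
content of the attached patches: TermStatsCache is a hypothetical name, the
key is simplified to a term string, and the expensive index lookup is passed
in as a function.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

/** Hypothetical opt-in cache of per-term document frequencies. */
class TermStatsCache {
    private final Map<String, Integer> cache;
    private final ToIntFunction<String> lookup; // stands in for the index lookup
    private int misses = 0;

    TermStatsCache(final int maxEntries, ToIntFunction<String> lookup) {
        this.lookup = lookup;
        // accessOrder=true keeps entries ordered least recently used first.
        this.cache = new LinkedHashMap<String, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
                return size() > maxEntries; // evict once the bound is exceeded
            }
        };
    }

    /** Returns the cached frequency, calling the lookup only on a miss. */
    synchronized int docFreq(String term) {
        Integer cached = cache.get(term);
        if (cached != null) {
            return cached;
        }
        misses++;
        int freq = lookup.applyAsInt(term);
        cache.put(term, freq);
        return freq;
    }

    /** Number of times the underlying lookup was actually called. */
    synchronized int misses() {
        return misses;
    }
}
```

Because the cache lives inside one object, its lifetime follows that object:
an application that wants fresh statistics simply creates a new instance,
which matches the "tied to the life of the MoreLikeThis object" idea.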




[jira] Created: (LUCENE-1765) incorrect doc description of fielded query syntax

2009-07-29 Thread solrize (JIRA)
incorrect doc description of fielded query syntax
-

 Key: LUCENE-1765
 URL: https://issues.apache.org/jira/browse/LUCENE-1765
 Project: Lucene - Java
  Issue Type: Bug
  Components: Other
Affects Versions: 2.4.1
 Environment: lucene.apache.org docs
Reporter: solrize
Priority: Minor


http://lucene.apache.org/java/2_4_1/queryparsersyntax.html#Fields says:

  You can search any field by typing the field name followed by a colon ":" and 
then the term you are looking for. 

This is slightly incomplete since the stuff after the fieldname can be a more 
complex query, not necessarily a term.  For example, 

title:(do it right)

seemed to work when I tried it.  It would be good if the doc were updated to 
describe the syntax exactly.

Also, "documentation" should be one of the components selectable in bug reports.





RE: Build failed in Hudson: Lucene-trunk #902

2009-07-29 Thread Uwe Schindler
This seems to be fixed now. But there is something completely wrong with
clover:

If you look into the clover reports, there are a lot of classes having 0%
code coverage, but there are tests available (e.g. my new NumericRange
things). Also *all* contribs have 0%.

After thinking a little bit about it, it seems that the coverage report is
built not from the normal test run but from the results of
test-tag. This explains why NumericRange and Spatial seem to have no
tests for clover.

Does anybody know how to fix this? Maybe clover coverage should be disabled
for the test run in test-tag? What can be changed in build.xml to do this?

I have no clover installed locally, so I cannot try this out.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Tuesday, July 28, 2009 12:13 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Build failed in Hudson: Lucene-trunk #902
> 
> Hmm... the build looks like it failed because of some odd clover
> licensing issue:
> 
>[clover] Sorry, you are not licensed to instrument files in the package
> ''.
> 
> Anyone have any ideas?
> 
> Mike
> 
> On Mon, Jul 27, 2009 at 11:26 PM, Apache Hudson
> Server wrote:
> > See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/902/changes
> >
> > Changes:
> >
> > [uschindler] LUCENE-1754: JavaDoc updates
> >
> > [mikemccand] LUCENE-1754: EMPTY_DOCIDSET subclasses DocIdSet directly
> >
> > [mikemccand] LUCENE-1754: just use EMPTY_DOCIDSET.iterator() instead of
> new EmptyDocIdSetIterator
> >
> > [mikemccand] LUCENE-1595: don't use SortField.AUTO; deprecate
> LineDocMaker & EnwikiDocMaker
> >
> > [mikemccand] LUCENE-1754: add EmptyDocIdSetIterator
> >
> > [mikemccand] LUCENE-1754: update back-compat test
> >
> > [mikemccand] LUCENE-1754: BooleanQuery detects up front if it won't
> match any docs and returns null from its scorer() instead of
> NonMatchingScorer
> >
> > --
> > [...truncated 21062 lines...]
> >  [javadoc] Note: Custom tags that were not seen: @todo, @uml.property
> >  [javadoc] 1 error
> >  [javadoc] 32 warnings
> >      [jar] Building jar:
> http://hudson.zones.apache.org/hudson/job/Lucene-
> trunk/ws/trunk/build/contrib/spatial/lucene-spatial-2009-07-28_02-04-46-
> javadoc.jar
> >     [echo] Building spellchecker...
> >
> > javadocs:
> >  [javadoc] Generating Javadoc
> >  [javadoc] Javadoc execution
> >  [javadoc] Loading source files for package
> org.apache.lucene.search.spell...
> >  [javadoc] Constructing Javadoc information...
> >  [javadoc] javadoc: warning - Error reading file:
> http://hudson.zones.apache.org/hudson/job/Lucene-
> trunk/ws/trunk/build/docs/api/contrib-spellchecker/../package-list
> >  [javadoc] Standard Doclet version 1.5.0_14
> >  [javadoc] Building tree for all the packages and classes...
> >  [javadoc] Building index for all the packages and classes...
> >  [javadoc] Building index for all classes...
> >  [javadoc] javadoc: error - Error while reading file
> http://hudson.zones.apache.org/hudson/job/Lucene-
> trunk/ws/trunk/contrib/spellchecker/src/java/overview.html
> >  [javadoc] Generating http://hudson.zones.apache.org/hudson/job/Lucene-
> trunk/ws/trunk/build/docs/api/contrib-spellchecker/stylesheet.css...
> >  [javadoc] Note: Custom tags that could override future standard tags:
> @todo. To avoid potential overrides, use at least one period character
> (.) in custom tag names.
> >  [javadoc] Note: Custom tags that were not seen: @todo, @uml.property
> >  [javadoc] 1 error
> >  [javadoc] 1 warning
> >      [jar] Building jar:
> http://hudson.zones.apache.org/hudson/job/Lucene-
> trunk/ws/trunk/build/contrib/spellchecker/lucene-spellchecker-2009-07-
> 28_02-04-46-javadoc.jar
> >     [echo] Building surround...
> >
> > javadocs:
> >  [javadoc] Generating Javadoc
> >  [javadoc] Javadoc execution
> >  [javadoc] Loading source files for package
> org.apache.lucene.queryParser.surround.parser...
> >  [javadoc] Loading source files for package
> org.apache.lucene.queryParser.surround.query...
> >  [javadoc] Constructing Javadoc information...
> >  [javadoc] javadoc: warning - Error reading file:
> http://hudson.zones.apache.org/hudson/job/Lucene-
> trunk/ws/trunk/build/docs/api/contrib-surround/../package-list
> >  [javadoc] Standard Doclet version 1.5.0_14
> >  [javadoc] Building tree for all the packages and classes...
> >  [javadoc] Building index for all the packages and classes...
> >  [javadoc] Building index for all classes...
> >  [javadoc] javadoc: error - Error while reading file
> http://hudson.zones.apache.org/hudson/job/Lucene-
> trunk/ws/trunk/contrib/surround/src/java/overview.html
> >  [javadoc] Generating http://hudson.zones.apache.org/hudson/job/Lucene-
> trunk/ws/trunk/build/docs/api/contrib-surround/stylesheet.css...
> >  [javadoc] Note: Custom tags