[jira] Resolved: (LUCENE-2458) queryparser makes all CJK queries phrase queries regardless of analyzer

2010-05-25 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-2458.
-

Resolution: Fixed

Committed revision 948326 (trunk) / 948325 (3x)
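
For reference, a minimal sketch of the post-fix behavior described in the issue below, assuming the 3.x QueryParser API; the setAutoGeneratePhraseQueries toggle is my reading of the backwards-compatibility switch, so treat it as an assumption rather than a confirmed method name. Unquoted multi-token text now yields a BooleanQuery, and a PhraseQuery is only produced for quoted input:

{code}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class PhraseBehaviorSketch {
  public static void main(String[] args) throws Exception {
    QueryParser qp = new QueryParser(Version.LUCENE_31, "f",
        new StandardAnalyzer(Version.LUCENE_31));
    Query unquoted = qp.parse("hello dolly");    // BooleanQuery: f:hello f:dolly
    Query quoted = qp.parse("\"hello dolly\"");  // PhraseQuery: f:"hello dolly"
    // Hypothetical escape hatch for apps that relied on the old heuristic:
    qp.setAutoGeneratePhraseQueries(true);
    System.out.println(unquoted + " / " + quoted);
  }
}
{code}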

> queryparser makes all CJK queries phrase queries regardless of analyzer
> ---
>
> Key: LUCENE-2458
> URL: https://issues.apache.org/jira/browse/LUCENE-2458
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: QueryParser
>Reporter: Robert Muir
>Assignee: Robert Muir
>Priority: Blocker
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2458.patch, LUCENE-2458.patch, LUCENE-2458.patch
>
>
> The queryparser automatically makes *ALL* CJK, Thai, Lao, Myanmar, Tibetan, 
> ... queries into phrase queries, even though you didn't ask for one, and 
> there isn't a way to turn this off.
> This completely breaks Lucene for these languages, as it treats all queries 
> like 'grep'.
> Example: if you query for f:abcd with StandardAnalyzer, where a, b, c, d are 
> Chinese characters, you get a phrase query of "a b c d". If you use the CJK 
> analyzer, it's no better: you get a phrase query of "ab bc cd", and if you use 
> the SmartChinese analyzer, you get a phrase query like "ab cd". But the user 
> didn't ask for one, and they cannot turn it off.
> The reason is that the code to form phrase queries is not internationally 
> appropriate and assumes whitespace tokenization. If more than one token comes 
> out of whitespace-delimited text, it's automatically a phrase query no matter 
> what.
> The proposed patch fixes the core queryparser (with all backwards compat 
> kept) to only form phrase queries when the double quote operator is used. 
> Implementing subclasses can always extend the QP and auto-generate whatever 
> kind of queries they want that might completely break search for languages 
> they don't care about, but core general-purpose QPs should be language 
> independent.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2458) queryparser makes all CJK queries phrase queries regardless of analyzer

2010-05-25 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871501#action_12871501
 ] 

Robert Muir commented on LUCENE-2458:
-

This patch fixes the bug in all queryparsers. I plan to commit soon.

If desired, someone can make their own euro-centric queryparser in the contrib 
section and I have no objection, as long as it's clearly documented that it's 
unsuitable for many languages (just like the JDK does).

> queryparser makes all CJK queries phrase queries regardless of analyzer
> ---
>
> Key: LUCENE-2458
> URL: https://issues.apache.org/jira/browse/LUCENE-2458
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: QueryParser
>Reporter: Robert Muir
>Assignee: Robert Muir
>Priority: Blocker
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2458.patch, LUCENE-2458.patch, LUCENE-2458.patch
>
>
> The queryparser automatically makes *ALL* CJK, Thai, Lao, Myanmar, Tibetan, 
> ... queries into phrase queries, even though you didn't ask for one, and 
> there isn't a way to turn this off.
> This completely breaks Lucene for these languages, as it treats all queries 
> like 'grep'.
> Example: if you query for f:abcd with StandardAnalyzer, where a, b, c, d are 
> Chinese characters, you get a phrase query of "a b c d". If you use the CJK 
> analyzer, it's no better: you get a phrase query of "ab bc cd", and if you use 
> the SmartChinese analyzer, you get a phrase query like "ab cd". But the user 
> didn't ask for one, and they cannot turn it off.
> The reason is that the code to form phrase queries is not internationally 
> appropriate and assumes whitespace tokenization. If more than one token comes 
> out of whitespace-delimited text, it's automatically a phrase query no matter 
> what.
> The proposed patch fixes the core queryparser (with all backwards compat 
> kept) to only form phrase queries when the double quote operator is used. 
> Implementing subclasses can always extend the QP and auto-generate whatever 
> kind of queries they want that might completely break search for languages 
> they don't care about, but core general-purpose QPs should be language 
> independent.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2458) queryparser makes all CJK queries phrase queries regardless of analyzer

2010-05-25 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2458:


Summary: queryparser makes all CJK queries phrase queries regardless of 
analyzer  (was: queryparser shouldn't generate phrasequeries based on term 
count)
Description: 
The queryparser automatically makes *ALL* CJK, Thai, Lao, Myanmar, Tibetan, ... 
queries into phrase queries, even though you didn't ask for one, and there 
isn't a way to turn this off.

This completely breaks Lucene for these languages, as it treats all queries 
like 'grep'.

Example: if you query for f:abcd with StandardAnalyzer, where a, b, c, d are 
Chinese characters, you get a phrase query of "a b c d". If you use the CJK 
analyzer, it's no better: you get a phrase query of "ab bc cd", and if you use 
the SmartChinese analyzer, you get a phrase query like "ab cd". But the user 
didn't ask for one, and they cannot turn it off.

The reason is that the code to form phrase queries is not internationally 
appropriate and assumes whitespace tokenization. If more than one token comes 
out of whitespace-delimited text, it's automatically a phrase query no matter 
what.

The proposed patch fixes the core queryparser (with all backwards compat kept) 
to only form phrase queries when the double quote operator is used. 

Implementing subclasses can always extend the QP and auto-generate whatever 
kind of queries they want that might completely break search for languages they 
don't care about, but core general-purpose QPs should be language independent.


  was:
The current method in the queryparser to generate phrasequeries is wrong:

The Query Syntax documentation 
(http://lucene.apache.org/java/3_0_1/queryparsersyntax.html) states:
{noformat}
A Phrase is a group of words surrounded by double quotes such as "hello dolly".
{noformat}

But as we know, this isn't actually true.

Instead, the terms are first divided on whitespace, and then the analyzer term 
count is used as some sort of "heuristic" to determine whether it's a phrase 
query or not. This assumption is a disaster for languages that don't use 
whitespace separation: CJK, compounding European languages like German, 
Finnish, etc. It also makes it difficult for people to use n-gram analysis 
techniques. In these cases you get bad relevance (MAP improves nearly *10x* if 
you use a PositionFilter at query time to "turn this off" for Chinese).

Even for English, this undocumented behavior is bad. Perhaps in some cases it's 
being abused as a heuristic to "second-guess" the tokenizer and piece back 
together things it shouldn't have split, but for large collections, doing 
things like generating phrase queries because StandardTokenizer split a 
compound on a dash can cause serious performance problems. Instead, people 
should analyze their text with the appropriate methods, and QueryParser should 
only generate phrase queries when the syntax asks for one.

The PositionFilter in contrib can be seen as a workaround, but it's pretty 
obscure and people are not familiar with it. The result is that we have bad 
out-of-the-box behavior for many languages, and bad performance for others on 
some inputs.

I propose instead that we change the grammar to actually look for double quotes 
to determine when to generate a phrase query, consistent with the documentation.
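
A minimal sketch of the PositionFilter workaround mentioned above, assuming the contrib org.apache.lucene.analysis.position.PositionFilter and the 3.x Analyzer API; flattening position increments at query time keeps the pre-fix parser from turning multi-token CJK text into a phrase query:

{code}
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.position.PositionFilter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

// Query-time-only analyzer: delegate tokenization, then collapse all tokens onto
// the same position so the pre-fix QueryParser emits a BooleanQuery, not a PhraseQuery.
public final class PositionFlatteningAnalyzer extends Analyzer {
  private final Analyzer delegate = new StandardAnalyzer(Version.LUCENE_30);

  @Override
  public TokenStream tokenStream(String fieldName, Reader reader) {
    return new PositionFilter(delegate.tokenStream(fieldName, reader));
  }
}
{code}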



editing this issue to make it easier to understand.

> queryparser makes all CJK queries phrase queries regardless of analyzer
> ---
>
> Key: LUCENE-2458
> URL: https://issues.apache.org/jira/browse/LUCENE-2458
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: QueryParser
>Reporter: Robert Muir
>Assignee: Robert Muir
>Priority: Blocker
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2458.patch, LUCENE-2458.patch, LUCENE-2458.patch
>
>
> The queryparser automatically makes *ALL* CJK, Thai, Lao, Myanmar, Tibetan, 
> ... queries into phrase queries, even though you didn't ask for one, and 
> there isn't a way to turn this off.
> This completely breaks Lucene for these languages, as it treats all queries 
> like 'grep'.
> Example: if you query for f:abcd with StandardAnalyzer, where a, b, c, d are 
> Chinese characters, you get a phrase query of "a b c d". If you use the CJK 
> analyzer, it's no better: you get a phrase query of "ab bc cd", and if you use 
> the SmartChinese analyzer, you get a phrase query like "ab cd". But the user 
> didn't ask for one, and they cannot turn it off.
> The reason is that the code to form phrase queries is not internationally 
> appropriate and assumes whitespace tokenization. If more than one token comes 
> out of whitespace-delimited text, it's automatically a phrase query no matter 
> what.
> The proposed patch fixes the core queryparser (with all backwards compat 
> kept) to onl

[jira] Resolved: (SOLR-1928) terms component doesn't tiebreak by index order

2010-05-25 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-1928.


Resolution: Fixed

> terms component doesn't tiebreak by index order
> ---
>
> Key: SOLR-1928
> URL: https://issues.apache.org/jira/browse/SOLR-1928
> Project: Solr
>  Issue Type: Bug
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
> Fix For: 4.0
>
>
> The external/readable representation is used in the CountPair, so tiebreaks 
> won't be by index order when sorting by count.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Build failed in Hudson: Lucene-trunk #1199

2010-05-25 Thread Apache Hudson Server
See 

Changes:

[uschindler] Generics Policeman ticket

[rmuir] LUCENE-2413: move more core analysis to analyzers module

[rmuir] LUCENE-2413: move more core analysis to analyzers module

[rmuir] LUCENE-2413: consolidate remaining concrete core analyzers to 
modules/analysis

[rmuir] LUCENE-2413: consolidate remaining concrete core analyzers to 
modules/analysis

[mikemccand] LUCENE-2476: release write lock on any exception during 
IndexWriter ctor

[uschindler] Add main version number into autogenerated hudson version number

--
[...truncated 7455 lines...]
  [javadoc] Building tree for all the packages and classes...
  [javadoc] Building index for all the packages and classes...
  [javadoc] Building index for all classes...
  [javadoc] Generating 

  [javadoc] Note: Custom tags that were not seen:  @lucene.experimental, 
@lucene.internal
  [jar] Building jar: 

 [echo] Building misc...

javadocs:
[mkdir] Created dir: 

  [javadoc] Generating Javadoc
  [javadoc] Javadoc execution
  [javadoc] Loading source files for package org.apache.lucene.index...
  [javadoc] Loading source files for package org.apache.lucene.misc...
  [javadoc] Constructing Javadoc information...
  [javadoc] Standard Doclet version 1.5.0_22
  [javadoc] Building tree for all the packages and classes...
  [javadoc] 
:43:
 warning - Tag @link: reference not found: IndexWriter#addIndexes(IndexReader[])
  [javadoc] Building index for all the packages and classes...
  [javadoc] Building index for all classes...
  [javadoc] Generating 

  [javadoc] Note: Custom tags that were not seen:  @lucene.internal
  [javadoc] 1 warning
  [jar] Building jar: 

 [echo] Building queries...

javadocs:
[mkdir] Created dir: 

  [javadoc] Generating Javadoc
  [javadoc] Javadoc execution
  [javadoc] Loading source files for package org.apache.lucene.search...
  [javadoc] Loading source files for package org.apache.lucene.search.regex...
  [javadoc] Loading source files for package org.apache.lucene.search.similar...
  [javadoc] Constructing Javadoc information...
  [javadoc] Standard Doclet version 1.5.0_22
  [javadoc] Building tree for all the packages and classes...
  [javadoc] 
:525:
 warning - Tag @see: reference not found: 
org.apache.lucene.analysis.StopFilter#makeStopSet StopFilter.makeStopSet()
  [javadoc] Building index for all the packages and classes...
  [javadoc] Building index for all classes...
  [javadoc] Generating 

  [javadoc] Note: Custom tags that were not seen:  @lucene.experimental, 
@lucene.internal
  [javadoc] 1 warning
  [jar] Building jar: 

 [echo] Building queryparser...

javadocs:
[mkdir] Created dir: 

  [javadoc] Generating Javadoc
  [javadoc] Javadoc execution
  [javadoc] Loading source files for package 
org.apache.lucene.queryParser.analyzing...
  [javadoc] Loading source files for package 
org.apache.lucene.queryParser.complexPhrase...
  [javadoc] Loading source files for package 
org.apache.lucene.queryParser.core...
  [javadoc] Loading source files for package 
org.apache.lucene.queryParser.core.builders...
  [javadoc] Loading source files for package 
org.apache.lucene.queryParser.core.config...
  [javadoc] Loading source files for package 
org.apache.lucene.queryParser.core.messages...
  [javadoc] Loading source files for package 
org.apache.lucene.queryParser.core.nodes...
  [javadoc] Loading source files for package 
org.apache.lucene.queryParser.core.parser...
  [javadoc] Loading source files for package 
org.apache.lucene.queryParser.core.processors...
  [javadoc] Loading sour

[jira] Commented: (SOLR-236) Field collapsing

2010-05-25 Thread Christophe Biocca (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871470#action_12871470
 ] 

Christophe Biocca commented on SOLR-236:


I'd just like to throw in a suggestion about the AbstractDocumentCollapser & 
CollapseCollectorFactory APIs: it seems to me that changing 
factory.createCollapseCollector(SolrRequest req) to 
factory.createCollapseCollector(ResponseBuilder rb) would allow for more 
specialized collapse collectors that would be able to use, among other things, 
the SortSpec in the implementation of the collector. Our use case is that we 
want to show possibly more than one document for a given value of a collapse 
field, depending on relative scores. Passing in the ResponseBuilder would allow 
us to do that much more easily. Since the caching uses the ResponseBuilder 
object as its key, it won't introduce any new issues.
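
As a sketch of the suggested signature change (these types come from the SOLR-236 patch, so the shapes below are assumptions about the patch, not its exact API):

{code}
import org.apache.solr.handler.component.ResponseBuilder;

// Illustrative only: CollapseCollector and this factory are patch-specific types.
interface CollapseCollector { /* gathers collapsed groups during collection */ }

interface CollapseCollectorFactory {
  // Suggested signature: the ResponseBuilder exposes the SortSpec, the parsed
  // query and the request, so a specialized collector could, for example, keep
  // more than one document per collapse value based on relative scores.
  CollapseCollector createCollapseCollector(ResponseBuilder rb);
}
{code}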

> Field collapsing
> 
>
> Key: SOLR-236
> URL: https://issues.apache.org/jira/browse/SOLR-236
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.3
>Reporter: Emmanuel Keller
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.5
>
> Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
> collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
> collapsing-patch-to-1.3.0-ivan_3.patch, DocSetScoreCollector.java, 
> field-collapse-3.patch, field-collapse-4-with-solrj.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
> NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, 
> quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, 
> SOLR-236-trunk.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
> SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
> solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch include a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given 
> field to a single entry in the result set. Site collapsing is a special case 
> of this, where all results for a given web site is collapsed into one or two 
> entries in the result set, typically with an associated "more documents from 
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before 
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)
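
A hypothetical request using the three parameters described above (host, port, query, and collapse field name are placeholders):

{noformat}
http://localhost:8983/solr/select?q=ipod&collapse.field=site&collapse.type=normal&collapse.max=1
{noformat}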

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: solr and analyzers module

2010-05-25 Thread Chris Hostetter

: > FWIW: the other thing you may not be aware of is that schema.xml has
: > always had a "version" attribute on the top level  declaration
...
: You are right, I was unaware of this. But I'm confused that it's currently 1.3.

it's always been completely independent of the Solr version.  It's the 
"Schema version".

: To deal with improved defaults in this new modularized world, I feel
: that we just shouldn't have so many concrete Analyzers in Java that
: really should be "examples"
...
: examples of how to do analysis for Solr users. I'd like to be able to
: just have these, and get rid of the concrete Java implementations
: entirely.

+1 ... on the other hand, the best way to ensure that "examples" work is 
to compile & test them, so ... damned if we do, damned if we don't.


-Hoss


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-1915) DebugComponent should use NamedList to output Explanations instead of Explanation.toString()

2010-05-25 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-1915:
---

Attachment: SOLR-1915.patch

patch making the necessary changes.

Just in case some poor soul was actually attempting to programmatically parse 
the old "toString()" response, I added a new "debug.explain.stringFormat=true" 
param to force the old behavior, but the default is a structured response. 
Frankly, I don't think we should even document that param except in the upgrade 
section of CHANGES.txt (where I explicitly indicated that it would be removed 
in the next release -- it should only be there as a red flag and a stop gap in 
case people don't notice the change until it breaks something).
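
For example, a hypothetical request forcing the old plain-text format during the transition (host, port, and query are placeholders):

{noformat}
http://localhost:8983/solr/select?q=solr&debugQuery=true&debug.explain.stringFormat=true
{noformat}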



> DebugComponent should use NamedList to output Explanations instead of 
> Explanation.toString()
> 
>
> Key: SOLR-1915
> URL: https://issues.apache.org/jira/browse/SOLR-1915
> Project: Solr
>  Issue Type: Improvement
>  Components: SearchComponents - other
>Reporter: Hoss Man
>Assignee: Hoss Man
>Priority: Minor
> Attachments: SOLR-1915.patch
>
>
> DebugComponent currently uses Explanation.toString() to "format" score 
> explanations for each document as plain text with whitespace indenting to 
> denote the hierarchical relationship, and then adds those explanations to the 
> SolrQueryResponse.
> Instead DebugComponent should transform the Explanation objects into 
> NamedLists so that the full structure can be formatted in a logical way by 
> the ResponseWriter
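
A minimal sketch of the kind of transformation being proposed, assuming the Lucene Explanation API (isMatch/getValue/getDescription/getDetails) and Solr's SimpleOrderedMap; this is not the patch itself:

{code}
import org.apache.lucene.search.Explanation;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.util.SimpleOrderedMap;

public class ExplanationToNamedList {
  // Recursively convert an Explanation tree into a NamedList so response
  // writers can render the structure natively (XML, JSON, etc.).
  public static NamedList<Object> explanationToNamedList(Explanation e) {
    NamedList<Object> out = new SimpleOrderedMap<Object>();
    out.add("match", e.isMatch());
    out.add("value", e.getValue());
    out.add("description", e.getDescription());
    Explanation[] details = e.getDetails();
    if (details != null && details.length > 0) {
      NamedList<Object> kids = new SimpleOrderedMap<Object>();
      for (Explanation detail : details) {
        kids.add("detail", explanationToNamedList(detail));
      }
      out.add("details", kids);
    }
    return out;
  }
}
{code}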

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2413) Consolidate all (Solr's & Lucene's) analyzers into modules/analysis

2010-05-25 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871392#action_12871392
 ] 

Robert Muir commented on LUCENE-2413:
-

Committed LUCENE-2413_coreUtils.patch revision 948225

> Consolidate all (Solr's & Lucene's) analyzers into modules/analysis
> ---
>
> Key: LUCENE-2413
> URL: https://issues.apache.org/jira/browse/LUCENE-2413
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/analyzers
>Reporter: Michael McCandless
>Assignee: Robert Muir
> Fix For: 4.0
>
> Attachments: LUCENE-2413-charfilter.patch, LUCENE-2413-PFAW+LF.patch, 
> LUCENE-2413_commongrams.patch, LUCENE-2413_coreAnalyzers.patch, 
> LUCENE-2413_coreUtils.patch, LUCENE-2413_folding.patch, 
> LUCENE-2413_htmlstrip.patch, LUCENE-2413_icu.patch, 
> LUCENE-2413_keep_hyphen_trim.patch, LUCENE-2413_keyword.patch, 
> LUCENE-2413_mockfilter.patch, LUCENE-2413_mockfilter.patch, 
> LUCENE-2413_pattern.patch, LUCENE-2413_porter.patch, 
> LUCENE-2413_removeDups.patch, LUCENE-2413_synonym.patch, 
> LUCENE-2413_teesink.patch, LUCENE-2413_test4.patch, 
> LUCENE-2413_testanalyzer.patch, LUCENE-2413_testanalyzer.patch, 
> LUCENE-2413_tests2.patch, LUCENE-2413_tests3.patch, LUCENE-2413_wdf.patch
>
>
> We've been wanting to do this for quite some time now...  I think, now that 
> Solr/Lucene are merged, and we're looking at opening an unstable line of 
> development for Solr/Lucene, now is the right time to do it.
> A standalone module for all analyzers also empowers apps to separately 
> version the analyzers from which version of Solr/Lucene they use, possibly 
> enabling us to remove Version entirely from the analyzers.
> We should also do LUCENE-2309 (decouple, as much as possible, indexer from 
> the analysis API), but I don't think that issue needs to block this 
> consolidation.
> Once we do this, there is one place where our users can find all the 
> analyzers that Solr/Lucene provide.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: Simon Willnauer -> core Lucene/Solr committer

2010-05-25 Thread Uwe Schindler
Welcome Simon! Heavy Committing!

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant
Ingersoll
> Sent: Tuesday, May 25, 2010 11:28 PM
> To: dev@lucene.apache.org
> Subject: Simon Willnauer -> core Lucene/Solr committer
> 
> I'm happy to announce Simon Willnauer is now a core Lucene/Solr
> committer.  Simon has been a contrib committer for quite some time and
> should make for an excellent core committer, that is if he can stay away
from
> conference planning!
> 
> Cheers,
> Grant
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
> commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2413) Consolidate all (Solr's & Lucene's) analyzers into modules/analysis

2010-05-25 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2413:


Attachment: LUCENE-2413_coreUtils.patch

moves CharFilter, CharArraySet, and CharArrayMap

> Consolidate all (Solr's & Lucene's) analyzers into modules/analysis
> ---
>
> Key: LUCENE-2413
> URL: https://issues.apache.org/jira/browse/LUCENE-2413
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/analyzers
>Reporter: Michael McCandless
>Assignee: Robert Muir
> Fix For: 4.0
>
> Attachments: LUCENE-2413-charfilter.patch, LUCENE-2413-PFAW+LF.patch, 
> LUCENE-2413_commongrams.patch, LUCENE-2413_coreAnalyzers.patch, 
> LUCENE-2413_coreUtils.patch, LUCENE-2413_folding.patch, 
> LUCENE-2413_htmlstrip.patch, LUCENE-2413_icu.patch, 
> LUCENE-2413_keep_hyphen_trim.patch, LUCENE-2413_keyword.patch, 
> LUCENE-2413_mockfilter.patch, LUCENE-2413_mockfilter.patch, 
> LUCENE-2413_pattern.patch, LUCENE-2413_porter.patch, 
> LUCENE-2413_removeDups.patch, LUCENE-2413_synonym.patch, 
> LUCENE-2413_teesink.patch, LUCENE-2413_test4.patch, 
> LUCENE-2413_testanalyzer.patch, LUCENE-2413_testanalyzer.patch, 
> LUCENE-2413_tests2.patch, LUCENE-2413_tests3.patch, LUCENE-2413_wdf.patch
>
>
> We've been wanting to do this for quite some time now...  I think, now that 
> Solr/Lucene are merged, and we're looking at opening an unstable line of 
> development for Solr/Lucene, now is the right time to do it.
> A standalone module for all analyzers also empowers apps to separately 
> version the analyzers from which version of Solr/Lucene they use, possibly 
> enabling us to remove Version entirely from the analyzers.
> We should also do LUCENE-2309 (decouple, as much as possible, indexer from 
> the analysis API), but I don't think that issue needs to block this 
> consolidation.
> Once we do this, there is one place where our users can find all the 
> analyzers that Solr/Lucene provide.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Simon Willnauer -> core Lucene/Solr committer

2010-05-25 Thread Grant Ingersoll
I'm happy to announce Simon Willnauer is now a core Lucene/Solr committer.  
Simon has been a contrib committer for quite some time and should make for an 
excellent core committer, that is if he can stay away from conference planning!

Cheers,
Grant
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [jira] Created: (SOLR-1920) Need generic placemarker for DIH delta-import

2010-05-25 Thread Lance Norskog
+1

SOLR-1499 allows you to search the Solr index and find the most recent
record you indexed, if you added a timestamp or unique id. It requires
a sorted search, though.

On Wed, May 19, 2010 at 4:16 PM, Shawn Heisey (JIRA)  wrote:
> Need generic placemarker for DIH delta-import
> -
>
>                 Key: SOLR-1920
>                 URL: https://issues.apache.org/jira/browse/SOLR-1920
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>            Reporter: Shawn Heisey
>            Priority: Minor
>             Fix For: 3.1
>
>
> The dataimporthandler currently is only capable of saving the index timestamp 
> for later use in delta-import commands.  It should be extended to allow any 
> arbitrary data to be used as a placemarker for the next import.
>
> It is possible to use externally supplied variables in data-config.xml and 
> send values in via the URL that starts the import, but if the config can 
> support it natively, that is better.
>
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>



-- 
Lance Norskog
goks...@gmail.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-1928) terms component doesn't tiebreak by index order

2010-05-25 Thread Yonik Seeley (JIRA)
terms component doesn't tiebreak by index order
---

 Key: SOLR-1928
 URL: https://issues.apache.org/jira/browse/SOLR-1928
 Project: Solr
  Issue Type: Bug
Reporter: Yonik Seeley
Assignee: Yonik Seeley
 Fix For: 4.0


The external/readable representation is used in the CountPair, so tiebreaks 
won't be by index order when sorting by count.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2380) Add FieldCache.getTermBytes, to load term data as byte[]

2010-05-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871342#action_12871342
 ] 

Michael McCandless commented on LUCENE-2380:


I did some rough estimates of RAM usage for StringIndex (trunk) vs
TermIndex (patch).

Java String is an object, so estimate 8 byte object header in the JRE.
It seems to have 3 int fields (offset, count, hashCode), from
OpenJDK's sources, plus ref to char[].

The char[] has 8 byte object header, int length, and actual array
data.

So in trunk's StringIndex:

  per-unique-term: 40 bytes (48 on 64bit jre) + 2*length-of-string-in-UTF16
  per-doc: 4 bytes (8 bytes on 64 bit)

In the patch:

  per-unique-term: ceil(log2(totalUTF8BytesTermData)) + utf8 bytes + 1 or 2 
bytes (vInt, for term length)
  per-doc: ceil(log2(numUniqueTerm)) bits

So eg say you have an English title field, avg length 40 chars, and
assume always unique.  On a 5M doc index, trunk would take ~591MB and
patch would take ~226 MB (32bit JRE) = 62% less.
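
Spelling out that English-title arithmetic (my reading of the per-term formula above, treating the ceil(log2(...)) pointer as bits per entry):

{noformat}
trunk StringIndex: 5M * (40 obj/fields + 2*40 UTF-16 + 4 per-doc ord) bytes = 620,000,000 bytes ~= 591 MB
patch TermIndex:   5M * (28 bits/8 + 40 utf8 + 1 vInt + 23 bits/8) bytes   ~= 237,000,000 bytes ~= 226 MB
  where 28 = ceil(log2(5M * 40 utf8 bytes of term data)) and 23 = ceil(log2(5M unique terms))
{noformat}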

But if you have a CJK title field, avg 10 chars (may be highish), the savings
are smaller because UTF-8 takes 50% more RAM than UTF-16 does for CJK
(and others).  Trunk would take ~305MB and patch ~178MB (32bit JRE) =
42% less.

Also don't forget the GC load of having 5M String & char[] objects...


> Add FieldCache.getTermBytes, to load term data as byte[]
> 
>
> Key: LUCENE-2380
> URL: https://issues.apache.org/jira/browse/LUCENE-2380
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2380.patch, LUCENE-2380.patch, LUCENE-2380.patch
>
>
> With flex, a term is now an opaque byte[] (typically, utf8 encoded unicode 
> string, but not necessarily), so we need to push this up the search stack.
> FieldCache now has getStrings and getStringIndex; we need corresponding 
> methods to load terms as native byte[], since in general they may not be 
> representable as String.  This should be quite a bit more RAM efficient too, 
> for US ascii content since each character would then use 1 byte not 2.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Welcome Andrzej Bialecki as Lucene/Solr committer

2010-05-25 Thread Grant Ingersoll
Welcome!  I was frankly surprised, way back when, to realize you weren't a 
committer, given the ubiquity of Luke and your involvement with Nutch, etc.

On May 25, 2010, at 8:01 AM, Andrzej Bialecki wrote:
>  I tend the garden
> and enjoy working with various powered tools - recently acquired a
> welding machine, you can imagine the possibilities... 

Mmm, power tools...  

-Grant
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2413) Consolidate all (Solr's & Lucene's) analyzers into modules/analysis

2010-05-25 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871338#action_12871338
 ] 

Robert Muir commented on LUCENE-2413:
-

Committed LUCENE-2413_coreAnalyzers.patch revision 948195.

> Consolidate all (Solr's & Lucene's) analyzers into modules/analysis
> ---
>
> Key: LUCENE-2413
> URL: https://issues.apache.org/jira/browse/LUCENE-2413
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/analyzers
>Reporter: Michael McCandless
>Assignee: Robert Muir
> Fix For: 4.0
>
> Attachments: LUCENE-2413-charfilter.patch, LUCENE-2413-PFAW+LF.patch, 
> LUCENE-2413_commongrams.patch, LUCENE-2413_coreAnalyzers.patch, 
> LUCENE-2413_folding.patch, LUCENE-2413_htmlstrip.patch, 
> LUCENE-2413_icu.patch, LUCENE-2413_keep_hyphen_trim.patch, 
> LUCENE-2413_keyword.patch, LUCENE-2413_mockfilter.patch, 
> LUCENE-2413_mockfilter.patch, LUCENE-2413_pattern.patch, 
> LUCENE-2413_porter.patch, LUCENE-2413_removeDups.patch, 
> LUCENE-2413_synonym.patch, LUCENE-2413_teesink.patch, 
> LUCENE-2413_test4.patch, LUCENE-2413_testanalyzer.patch, 
> LUCENE-2413_testanalyzer.patch, LUCENE-2413_tests2.patch, 
> LUCENE-2413_tests3.patch, LUCENE-2413_wdf.patch
>
>
> We've been wanting to do this for quite some time now...  I think, now that 
> Solr/Lucene are merged, and we're looking at opening an unstable line of 
> development for Solr/Lucene, now is the right time to do it.
> A standalone module for all analyzers also empowers apps to separately 
> version the analyzers from which version of Solr/Lucene they use, possibly 
> enabling us to remove Version entirely from the analyzers.
> We should also do LUCENE-2309 (decouple, as much as possible, indexer from 
> the analysis API), but I don't think that issue needs to block this 
> consolidation.
> Once we do this, there is one place where our users can find all the 
> analyzers that Solr/Lucene provide.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2380) Add FieldCache.getTermBytes, to load term data as byte[]

2010-05-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871308#action_12871308
 ] 

Michael McCandless commented on LUCENE-2380:


OK I ran some sort perf tests.  I picked the worst case -- trivial
query (TermQuery) matching all docs, sorting by either a highly unique
string field (random string) or enumerated field (country ~ a couple
hundred values), from benchmark's SortableSingleDocSource.

Index has 5M docs.  Each run is best of 3.

Results:

||Sort||Trunk QPS||Patch QPS||Change %||
|random|7.75|5.64|{color:red}-27.2%{color}
|country|8.05|7.62|{color:red}-5.3%{color}

So the packed ints lookups are more costly than trunk today (but,
at a large reduction in RAM used).

Then I tried another test, asking packed ints to upgrade to an array
of the nearest native type (ie byte[], short[], int[], long[]) for the
doc -> ord map.  This is faster since lookups don't require
shift/mask, but, wastes some space since you have unused bits:

||Sort||Trunk QPS||Patch QPS||Change %||
|random|7.75|7.89|{color:green}1.8%{color}
|country|8.05|7.64|{color:red}-5.1%{color}

The country case didn't get any better (noise) because it happened to
already be using 8 bits (byte[]) for doc->ord map.

Remember this is a worst-case test -- if your query matches fewer
results than your entire index, or your query is more costly to
evaluate than the simple single TermQuery, this FieldCache lookup cost
will be relatively smaller.

So... I think we should expose in the new FieldCache methods an
optional param to control time/space tradeoff; I'll add this,
defaulting to upgrading to nearest native type.  I think the 5.3%
slowdown on the country field is acceptable given the large reduction
in RAM used...
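
A hypothetical sketch of the "upgrade to the nearest native type" idea above (not the patch's code; names are illustrative): round the bits needed for the doc->ord map up to the next native width so lookups avoid the shift/mask, at the cost of some wasted bits.

{code}
public class DocToOrdSketch {
  // Pick a backing array for the doc->ord map based on how many bits an ord needs.
  static Object allocateDocToOrd(int maxDoc, int numUniqueTerms) {
    int bitsRequired = 64 - Long.numberOfLeadingZeros(Math.max(1, numUniqueTerms - 1));
    if (bitsRequired <= 8)  return new byte[maxDoc];   // e.g. the "country" field
    if (bitsRequired <= 16) return new short[maxDoc];
    if (bitsRequired <= 32) return new int[maxDoc];    // e.g. the "random" field, 5M terms
    return new long[maxDoc];
  }
}
{code}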


> Add FieldCache.getTermBytes, to load term data as byte[]
> 
>
> Key: LUCENE-2380
> URL: https://issues.apache.org/jira/browse/LUCENE-2380
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2380.patch, LUCENE-2380.patch, LUCENE-2380.patch
>
>
> With flex, a term is now an opaque byte[] (typically, utf8 encoded unicode 
> string, but not necessarily), so we need to push this up the search stack.
> FieldCache now has getStrings and getStringIndex; we need corresponding 
> methods to load terms as native byte[], since in general they may not be 
> representable as String.  This should be quite a bit more RAM efficient too, 
> for US ascii content since each character would then use 1 byte not 2.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-1927) DocBuilder Inefficiency

2010-05-25 Thread Robert Zotter (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Zotter updated SOLR-1927:


Attachment: SOLR-1927.patch

> DocBuilder Inefficiency
> ---
>
> Key: SOLR-1927
> URL: https://issues.apache.org/jira/browse/SOLR-1927
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Robert Zotter
>Priority: Trivial
> Attachments: SOLR-1927.patch
>
>
> I am looking into collectDelta method in DocBuilder.java and I noticed that
> to determine the deltaRemoveSet it currently loops through the whole
> deltaSet for each deleted row.
> Does anyone else agree with the fact that this is quite inefficient?
> For delta-imports with a large deltaSet and deletedSet I found a
> considerable improvement in speed if we just save all deleted keys in a set.
> Then we just have to loop through the deltaSet once to determine which rows
> should be removed by checking if the deleted key set contains the delta row
> key.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-1927) DocBuilder Inefficiency

2010-05-25 Thread Robert Zotter (JIRA)
DocBuilder Inefficiency
---

 Key: SOLR-1927
 URL: https://issues.apache.org/jira/browse/SOLR-1927
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Robert Zotter
Priority: Trivial


I am looking into collectDelta method in DocBuilder.java and I noticed that
to determine the deltaRemoveSet it currently loops through the whole
deltaSet for each deleted row. (Version 1.4.0 line 641)

Does anyone else agree with the fact that this is quite inefficient?

For delta-imports with a large deltaSet and deletedSet I found a
considerable improvement in speed if we just save all deleted keys in a set.
Then we just have to loop through the deltaSet once to determine which rows
should be removed by checking if the deleted key set contains the delta row
key.
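
A minimal sketch of the suggested single-pass approach (illustrative names and row/key representation, not the actual DocBuilder code): collect the deleted keys into a HashSet once, then make one pass over the deltaSet instead of rescanning it for every deleted row.

{code}
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CollectDeltaSketch {
  static Set<Map<String, Object>> findDeltaRemoveSet(
      List<Map<String, Object>> deletedSet,
      List<Map<String, Object>> deltaSet,
      String pkName) {
    // One pass over the deleted rows to build a key lookup...
    Set<Object> deletedKeys = new HashSet<Object>();
    for (Map<String, Object> deletedRow : deletedSet) {
      deletedKeys.add(deletedRow.get(pkName));
    }
    // ...then one pass over the delta rows instead of a nested loop.
    Set<Map<String, Object>> deltaRemoveSet = new HashSet<Map<String, Object>>();
    for (Map<String, Object> deltaRow : deltaSet) {
      if (deletedKeys.contains(deltaRow.get(pkName))) {
        deltaRemoveSet.add(deltaRow);
      }
    }
    return deltaRemoveSet;
  }
}
{code}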


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-1927) DocBuilder Inefficiency

2010-05-25 Thread Robert Zotter (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Zotter updated SOLR-1927:


Description: 
I am looking into collectDelta method in DocBuilder.java and I noticed that
to determine the deltaRemoveSet it currently loops through the whole
deltaSet for each deleted row.

Does anyone else agree with the fact that this is quite inefficient?

For delta-imports with a large deltaSet and deletedSet I found a
considerable improvement in speed if we just save all deleted keys in a set.
Then we just have to loop through the deltaSet once to determine which rows
should be removed by checking if the deleted key set contains the delta row
key.


  was:
I am looking into collectDelta method in DocBuilder.java and I noticed that
to determine the deltaRemoveSet it currently loops through the whole
deltaSet for each deleted row. (Version 1.4.0 line 641)

Does anyone else agree with the fact that this is quite inefficient?

For delta-imports with a large deltaSet and deletedSet I found a
considerable improvement in speed if we just save all deleted keys in a set.
Then we just have to loop through the deltaSet once to determine which rows
should be removed by checking if the deleted key set contains the delta row
key.



> DocBuilder Inefficiency
> ---
>
> Key: SOLR-1927
> URL: https://issues.apache.org/jira/browse/SOLR-1927
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Robert Zotter
>Priority: Trivial
>
> I am looking into collectDelta method in DocBuilder.java and I noticed that
> to determine the deltaRemoveSet it currently loops through the whole
> deltaSet for each deleted row.
> Does anyone else agree with the fact that this is quite inefficient?
> For delta-imports with a large deltaSet and deletedSet I found a
> considerable improvement in speed if we just save all deleted keys in a set.
> Then we just have to loop through the deltaSet once to determine which rows
> should be removed by checking if the deleted key set contains the delta row
> key.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2477) remove MoreLikeThis's default analyzer

2010-05-25 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871294#action_12871294
 ] 

Uwe Schindler commented on LUCENE-2477:
---

+1

> remove MoreLikeThis's default analyzer
> --
>
> Key: LUCENE-2477
> URL: https://issues.apache.org/jira/browse/LUCENE-2477
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/*
>Affects Versions: 3.1, 4.0
>Reporter: Robert Muir
>Assignee: Robert Muir
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2477.patch
>
>
> MoreLikeThis has the following:
> {code}
> public static final Analyzer DEFAULT_ANALYZER = new 
> StandardAnalyzer(Version.LUCENE_CURRENT);
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Solr updateRequestHandler and performance vs. atomicity

2010-05-25 Thread Paul Elschot
Sounds like a distributed two phase commit is needed.
Would http://activemq.apache.org/ do the job?
If it does, camel (split off of activemq) has a lucene component
that could be of interest, too.

Regards,
Paul Elschot

On Tuesday, 25 May 2010 at 14:59:29, Yonik Seeley wrote:
> On Mon, May 24, 2010 at 9:10 AM,   wrote:
> > In particular, it would be nice to be able to post documents in such a way
> > that you can guarantee that the document is permanently in Solr’s queue,
> > safe in the event of a Solr restart, etc., even if the document has not yet
> > been “committed”.
> 
> Yep, this is a longer term goal of SolrCloud.
> And to be truly safe, committing to stable storage is not enough -
> that still might crash and never recover.  One needs to write to
> multiple nodes.
> 
> -Yonik
> http://www.lucidimagination.com
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 
> 
> 

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2413) Consolidate all (Solr's & Lucene's) analyzers into modules/analysis

2010-05-25 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2413:


Attachment: LUCENE-2413_coreAnalyzers.patch

Attached is a patch that pulls out the rest of Lucene's concrete analyzers and 
puts them in the analyzers module.

In order to do this, I had to rearrange the demo: I made it contrib/demo, and 
this really simplified the build system.


> Consolidate all (Solr's & Lucene's) analyzers into modules/analysis
> ---
>
> Key: LUCENE-2413
> URL: https://issues.apache.org/jira/browse/LUCENE-2413
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/analyzers
>Reporter: Michael McCandless
>Assignee: Robert Muir
> Fix For: 4.0
>
> Attachments: LUCENE-2413-charfilter.patch, LUCENE-2413-PFAW+LF.patch, 
> LUCENE-2413_commongrams.patch, LUCENE-2413_coreAnalyzers.patch, 
> LUCENE-2413_folding.patch, LUCENE-2413_htmlstrip.patch, 
> LUCENE-2413_icu.patch, LUCENE-2413_keep_hyphen_trim.patch, 
> LUCENE-2413_keyword.patch, LUCENE-2413_mockfilter.patch, 
> LUCENE-2413_mockfilter.patch, LUCENE-2413_pattern.patch, 
> LUCENE-2413_porter.patch, LUCENE-2413_removeDups.patch, 
> LUCENE-2413_synonym.patch, LUCENE-2413_teesink.patch, 
> LUCENE-2413_test4.patch, LUCENE-2413_testanalyzer.patch, 
> LUCENE-2413_testanalyzer.patch, LUCENE-2413_tests2.patch, 
> LUCENE-2413_tests3.patch, LUCENE-2413_wdf.patch
>
>
> We've been wanting to do this for quite some time now...  I think, now that 
> Solr/Lucene are merged, and we're looking at opening an unstable line of 
> development for Solr/Lucene, now is the right time to do it.
> A standalone module for all analyzers also empowers apps to separately 
> version the analyzers from which version of Solr/Lucene they use, possibly 
> enabling us to remove Version entirely from the analyzers.
> We should also do LUCENE-2309 (decouple, as much as possible, indexer from 
> the analysis API), but I don't think that issue needs to block this 
> consolidation.
> Once we do this, there is one place where our users can find all the 
> analyzers that Solr/Lucene provide.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Solr version housekeeping in Jira/Wiki (1.5, 1.6, 3.x, 4.x, etc...)

2010-05-25 Thread Chris Hostetter


A while back, after the trunk merge (but before the 3x branch fork), Yonik 
and I spearheaded a healthy debate on the list about whether the next 
version of Solr should have a lock-step version number with Lucene ... 
while I've generally come around to Yonik's way of thinking, that's *not* 
what this thread is about (I say that up front in the hopes of preventing 
this thread from devolving into a continued debate about internal vs 
marketing version numbers).


Independent of the questions of what branch the next version of Solr 
should be released on, or what version number "label" it should be called, 
is the issue of keeping straight what bug fixes and features have been 
added to what branches.  Several issues in Jira were marked as "Fixed" in 
1.5 prior to the trunk merge but, with the ambiguity about how the 
versioning was going to evolve, were never bulk updated to indicate that 
they were actually going to be fixed in 3.1 (or 4.0).  Now that we may (or 
may not) ever have a 1.5 release, it can be hard to look at a Jira issue 
and make sense of where the changes were actually committed.  This has been 
compounded by some committers (I take responsibility for being the majority 
of the problem) continuing to mark issues they commit as being fixed in 
"1.5" even though they committed to the "trunk" (after the Lucene/Solr 
trunk merge).


Likewise for the way we annotate information in the Solr wiki.  Several 
bits of documentation are annotated as being in 1.5, but nothing is marked 
as 3.1 or 4.0.


What I'd like to propose is that we focus on making sure the 
"Fix Version" in Jira and the annotations on the wiki correctly reflect 
the "next" version of the *branches* where changes have been committed. 
Even if (in the unlikely event) the final version numbers that we 
release are ultimately different, we can at least be reasonably confident 
that a simple batch replace will work.


In concrete terms, these are the steps I'm planning to take in a few days 
unless someone objects, or suggests a simpler path...


1) create a new Jira version for Solr called "next" as a way to track 
unresolved issues that people generally feel should be fixed in the "next" 
feature release.


2) bulk change any Solr issue currently UNRESOLVED with a "Fix Version" of 
1.5, 1.6, 3.1, or 4.0 so that its new Fix Version is "next"


3) Compute three diffs, one for each of these three 
CHANGES.txt files...


http://svn.apache.org/viewvc/lucene/solr/branches/branch-1.5-dev/CHANGES.txt
http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/CHANGES.txt
http://svn.apache.org/viewvc/lucene/dev/trunk/solr/CHANGES.txt
...against the official 1.4 CHANGES.txt...
http://svn.apache.org/viewvc/lucene/solr/tags/release-1.4.0/CHANGES.txt

4) merge the diffs from step #3 into a 4-column report, listing every issue 
mentioned in any of those three CHANGES.txt files and which "branches" it 
has been committed to.


5) using the report from step #4, manually update every individual issue so 
that the Fix Version accurately reflects the list of *possible* versions that 
issue will be fixed in, if there is a release off of those respective branches 
(ie: some subset of (1.5, 3.1, 4.0))


6) delete "1.6" as a Solr version in Jira.

7) Update the Solr1.5 wiki page to link to the 1.5 branch in SVN, and add 
a note that such a release may never actually happen... 
http://wiki.apache.org/solr/Solr1.5


8) Create new wiki pages for Solr3.1 and Solr4.0, modeled after the 
Solr1.5 page, with pointers to the SVN branch where development is taking 
place and to where issues fixed on those branches can be tracked.  (we can 
also add verbiage here about the merged Lucene/Solr dev model, and why the 
3x branch was created, but we can worry about that later)


9) Audit every link to the Solr1.5 page, and add links to the new Solr3.1 
and Solr4.0 pages as needed...

http://wiki.apache.org/solr/Solr1.5?action=fullsearch&context=180&value=linkto%3A%22Solr1.5%22


...I'm not particularly looking forward to step #5, but it's the only safe 
way I can think of to make sure everything is correct.  I'm open to other 
suggestions.



-Hoss


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2477) remove MoreLikeThis's default analyzer

2010-05-25 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2477:


Attachment: LUCENE-2477.patch

Patch for 4.0, which sets the default to null.

If you don't have term vectors and don't set the analyzer, it throws a UOE.

All tests pass. For the 3.1 backport, instead of setting it to null, we simply 
deprecate DEFAULT_ANALYZER and specify that you must provide it if you are 
using stored fields.
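
A minimal sketch of how a caller is affected, assuming the contrib MoreLikeThis API (setAnalyzer, setFieldNames, like); the field name is illustrative. Without term vectors on the target fields, an analyzer now has to be supplied explicitly, otherwise the UOE is thrown:

{code}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.similar.MoreLikeThis;
import org.apache.lucene.util.Version;

public class MoreLikeThisSketch {
  static Query buildLikeQuery(IndexReader reader, int docId) throws Exception {
    MoreLikeThis mlt = new MoreLikeThis(reader);
    mlt.setFieldNames(new String[] { "body" });                    // illustrative field
    mlt.setAnalyzer(new StandardAnalyzer(Version.LUCENE_CURRENT)); // now required if "body" has no term vectors
    return mlt.like(docId);
  }
}
{code}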


> remove MoreLikeThis's default analyzer
> --
>
> Key: LUCENE-2477
> URL: https://issues.apache.org/jira/browse/LUCENE-2477
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/*
>Affects Versions: 3.1, 4.0
>Reporter: Robert Muir
>Assignee: Robert Muir
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2477.patch
>
>
> MoreLikeThis has the following:
> {code}
> public static final Analyzer DEFAULT_ANALYZER = new 
> StandardAnalyzer(Version.LUCENE_CURRENT);
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2477) remove MoreLikeThis's default analyzer

2010-05-25 Thread Robert Muir (JIRA)
remove MoreLikeThis's default analyzer
--

 Key: LUCENE-2477
 URL: https://issues.apache.org/jira/browse/LUCENE-2477
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/*
Affects Versions: 3.1, 4.0
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 3.1, 4.0
 Attachments: LUCENE-2477.patch

MoreLikeThis has the following:

{code}
public static final Analyzer DEFAULT_ANALYZER = new 
StandardAnalyzer(Version.LUCENE_CURRENT);
{code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1553) extended dismax query parser

2010-05-25 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871244#action_12871244
 ] 

Hoss Man commented on SOLR-1553:


Another bug noticed on the user list: edismax doesn't seem to respect MM all 
the time -- in particular when there are negated clauses...

Compare...
http://localhost:8983/solr/select?debugQuery=true&defType=dismax&qf=text&q=xxx+yyy+zzz+-1234&mm=2
http://localhost:8983/solr/select?debugQuery=true&defType=edismax&qf=text&q=xxx+yyy+zzz+-1234&mm=2

> extended dismax query parser
> 
>
> Key: SOLR-1553
> URL: https://issues.apache.org/jira/browse/SOLR-1553
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
> Fix For: 1.5
>
> Attachments: edismax.unescapedcolon.bug.test.patch, 
> edismax.userFields.patch, SOLR-1553.patch, SOLR-1553.pf-refactor.patch
>
>
> An improved user-facing query parser based on dismax

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2476) Constructor of IndexWriter lets runtime exceptions pop up, while keeping the writeLock obtained

2010-05-25 Thread Cservenak, Tamas (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871236#action_12871236
 ] 

Cservenak, Tamas commented on LUCENE-2476:
--

This is a Lucene index _known_ to be corrupt (either taken from a "live" Nexus 
or "broken" manually by tampering with a hex editor, I don't remember anymore). 
The Lucene version used to create this index is 2.3.2, so I believe an index 
upgrade happens during this unit test too.

{noformat}
[INFO] Failed to configure timeline index, trying to repair it.
org.sonatype.timeline.TimelineException: Fail to configure timeline index!
at 
org.sonatype.timeline.DefaultTimelineIndexer.configure(DefaultTimelineIndexer.java:107)
at 
org.sonatype.timeline.DefaultTimeline.configure(DefaultTimeline.java:49)
at 
org.sonatype.timeline.TimelineTest.testRepairIndexCouldNotRead(TimelineTest.java:103)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:164)
at junit.framework.TestCase.runBare(TestCase.java:130)
at junit.framework.TestResult$1.protect(TestResult.java:106)
at junit.framework.TestResult.runProtected(TestResult.java:124)
at junit.framework.TestResult.run(TestResult.java:109)
at junit.framework.TestCase.run(TestCase.java:120)
at 
org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
at 
org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: java.lang.NegativeArraySizeException
at org.apache.lucene.store.IndexInput.readString(IndexInput.java:126)
at org.apache.lucene.index.SegmentInfo.<init>(SegmentInfo.java:173)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:258)
at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:312)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:677)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:521)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:308)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1076)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:868)
at 
org.sonatype.timeline.DefaultTimelineIndexer.configure(DefaultTimelineIndexer.java:100)
... 18 more
{noformat}


> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained
> 
>
> Key: LUCENE-2476
> URL: https://issues.apache.org/jira/browse/LUCENE-2476
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 3.0.1
>Reporter: Cservenak, Tamas
>Assignee: Michael McCandless
>Priority: Blocker
> Fix For: 3.0.2, 3.1, 4.0
>
> Attachments: LUCENE-2476.patch
>
>
> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained.
> The init method in IndexWriter catches IOException only (I got 
> NegativeArraySize by reading up a _corrupt_ index), and now, there is no way 
> to recover, since the writeLock will be kept obtained. Moreover, I don't have 
> IndexWriter instance either, to "grab" the lock somehow, since the init() 
> method is called from IndexWriter constructor.
> Either broaden the catch to all exceptions, or at least provide some 
> circumvention to clear up. In my case, I'd like to "fallback", just delete 
> the corrupted index from disk and recreate it, but it is impossible, since 
> the LOCK_HELD NativeFSLockFactory's entry about obtained WriteLock is _never_ 
> cleaned out and is no (at least apparent) way to clean it out forcibly. I 
> can't create new IndexWriter, since it will always fail with 
> LockObtainFailedException.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2104) IndexWriter.unlock does does nothing if NativeFSLockFactory is used

2010-05-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871235#action_12871235
 ] 

Michael McCandless commented on LUCENE-2104:


bq. Should I backport?

No -- it was already fixed on 3x (I just marked it as such in the issue), 
probably because we branched 3x off after this was committed.

> IndexWriter.unlock does does nothing if NativeFSLockFactory is used
> ---
>
> Key: LUCENE-2104
> URL: https://issues.apache.org/jira/browse/LUCENE-2104
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 2.9, 2.9.1, 3.0
>Reporter: Shai Erera
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2104.patch, LUCENE-2104.patch, LUCENE-2104.patch
>
>
> If NativeFSLockFactory is used, IndexWriter.unlock will return, silently 
> doing nothing. The reason is that NativeFSLockFactory's makeLock always 
> creates a new NativeFSLock. NativeFSLock's release first checks if its lock 
> is not null. However, only if obtain() is called, that lock is not null. So 
> release actually does nothing, and so IndexWriter.unlock does not delete the 
> lock, or fail w/ exception.
> This is only a problem in NativeFSLock, and not in other Lock 
> implementations, at least as I was able to see.
> Need to think first how to reproduce in a test, and then fix it. I'll work on 
> it.
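
A simplified illustration of the no-op described above (an assumed sketch, not the 
actual NativeFSLockFactory source): makeLock() hands back a brand-new lock whose 
internal state is only populated by obtain(), so release() finds nothing to release.

{code}
// Illustrative only -- what IndexWriter.unlock effectively does today:
Lock writeLock = directory.makeLock(IndexWriter.WRITE_LOCK_NAME); // fresh NativeFSLock
writeLock.release(); // the internal lock is still null, so this silently does nothing
                     // and the on-disk write.lock file is never deleted
{code}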

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1926) add hl.q parameter

2010-05-25 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871234#action_12871234
 ] 

Hoss Man commented on SOLR-1926:


Koji: highlighting isn't something i think about much, but i have to wonder if 
an alternate "highlight _query_" is really the best concept here (specificly 
the part where it's parsed by a QParser into a Query and then the terms are 
extracted)

would it make more sense to imagine a multivalued "hl.text" param, such that 
each value is passed to the analyzer for each "hl.fl" field, and the resulting 
terms are all highlighted (in their respective fields) ... thus bypassing the 
complications of extracting terms from queries?

would that be more useful or less useful?

(although: i suppose hl.requireFieldMatch wouldn't really make any sense in a 
situation like this ... and there'd be no way to say "highlight DELL only in 
the maker field")

Hmmm... anyway, just something i wanted to toss out there in case it inspired 
you.
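
To make the suggestion concrete, a purely hypothetical request along those lines 
(hl.text does not exist today; the parameter name and values are illustrative only):

{code}
q=PC
&fq=maker:DELL
&hl=on&hl.fl=desc,maker
&hl.text=PC&hl.text=DELL
{code}

Each hl.text value would be analyzed against each hl.fl field and the resulting terms 
highlighted, with no query parsing or term extraction involved.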

> add hl.q parameter
> --
>
> Key: SOLR-1926
> URL: https://issues.apache.org/jira/browse/SOLR-1926
> Project: Solr
>  Issue Type: Improvement
>  Components: highlighter
>Affects Versions: 1.4
>Reporter: Koji Sekiguchi
>Priority: Trivial
>
> If hl.q parameter is set, HighlightComponent uses it rather than q.
> Use case:
> You search "PC" with highlight and facet capability:
> {code}
> q=PC
> &facet=on&facet.field=maker&facet.field=something
> &hl=on&hl.fl=desc
> {code}
> You get a lot of results with snippets (term "PC" highlighted in desc field). 
> Then you click a link "maker:DELL(50)" to narrow the result:
> {code}
> q=PC
> &facet=on&facet.field=something
> &fq=maker:DELL
> &hl=on&hl.fl=desc
> {code}
> You'll get narrowed result with term "PC" highlighted snippets. But, 
> sometimes I'd like to see "DELL" to be highlighted as well, because I clicked 
> "DELL". In this case, hl.q can be used:
> {code}
> q=PC
> &facet=on&facet.field=something
> &fq=maker:DELL
> &hl=on&hl.fl=desc&*hl.q=PC+maker:DELL*
> {code}
> Note that hl.requireFieldMatch should be false (false is default) in this 
> scenario.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2455) Some house cleaning in addIndexes*

2010-05-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871233#action_12871233
 ] 

Michael McCandless commented on LUCENE-2455:


Patch looks good Shai!  Thanks.

> Some house cleaning in addIndexes*
> --
>
> Key: LUCENE-2455
> URL: https://issues.apache.org/jira/browse/LUCENE-2455
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shai Erera
>Assignee: Shai Erera
>Priority: Trivial
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2455_3x.patch, LUCENE-2455_3x.patch, 
> LUCENE-2455_3x.patch, LUCENE-2455_3x.patch, LUCENE-2455_3x.patch
>
>
> Today, the use of addIndexes and addIndexesNoOptimize is confusing - 
> especially on when to invoke each. Also, addIndexes calls optimize() in 
> the beginning, but only on the target index. It also includes the 
> following jdoc statement, which from how I understand the code, is 
> wrong: _After this completes, the index is optimized._ -- optimize() is 
> called in the beginning and not in the end. 
> On the other hand, addIndexesNoOptimize does not call optimize(), and 
> relies on the MergeScheduler and MergePolicy to handle the merges. 
> After a short discussion about that on the list (Thanks Mike for the 
> clarifications!) I understand that there are really two core differences 
> between the two: 
> * addIndexes supports IndexReader extensions
> * addIndexesNoOptimize performs better
> This issue proposes the following:
> # Clear up the documentation of each, spelling out the pros/cons of 
>   calling them clearly in the javadocs.
> # Rename addIndexesNoOptimize to addIndexes
> # Remove optimize() call from addIndexes(IndexReader...)
> # Document that clearly in both, w/ a recommendation to call optimize() 
>   before on any of the Directories/Indexes if it's a concern. 
> That way, we maintain all the flexibility in the API - 
> addIndexes(IndexReader...) allows for using IR extensions, 
> addIndexes(Directory...) is considered more efficient, by allowing the 
> merges to happen concurrently (depending on MS) and also factors in the 
> MP. So unless you have an IR extension, addDirectories is really the one 
> you should be using. And you have the freedom to call optimize() before 
> each if you care about it, or don't if you don't care. Either way, 
> incurring the cost of optimize() is entirely in the user's hands. 
> BTW, addIndexes(IndexReader...) does not use neither the MergeScheduler 
> nor MergePolicy, but rather call SegmentMerger directly. This might be 
> another place for improvement. I'll look into it, and if it's not too 
> complicated, I may cover it by this issue as well. If you have any hints 
> that can give me a good head start on that, please don't be shy :). 
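
For context, a minimal usage sketch of the pattern described above, assuming the 
proposed rename to addIndexes(Directory...) lands as planned (directory names, 
analyzer, and the wrapper method are illustrative only):

{code}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Version;

public class AddIndexesExample {
  // Merge the source directories into the target; optimizing is now an
  // explicit, optional choice of the caller rather than an implicit cost.
  public static void mergeInto(Directory target, Directory... sources) throws Exception {
    IndexWriter writer = new IndexWriter(target,
        new IndexWriterConfig(Version.LUCENE_31, new StandardAnalyzer(Version.LUCENE_31)));
    try {
      // writer.optimize(); // only if the caller decides it is worth the cost
      writer.addIndexes(sources); // merges run via the MergeScheduler/MergePolicy
    } finally {
      writer.close();
    }
  }
}
{code}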

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2476) Constructor of IndexWriter let's runtime exceptions pop up, while keeping the writeLock obtained

2010-05-25 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2476.


Fix Version/s: 3.0.2
   Resolution: Fixed

> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained
> 
>
> Key: LUCENE-2476
> URL: https://issues.apache.org/jira/browse/LUCENE-2476
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 3.0.1
>Reporter: Cservenak, Tamas
>Assignee: Michael McCandless
>Priority: Blocker
> Fix For: 3.0.2, 3.1, 4.0
>
> Attachments: LUCENE-2476.patch
>
>
> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained.
> The init method in IndexWriter catches IOException only (I got 
> NegativeArraySize by reading up a _corrupt_ index), and now, there is no way 
> to recover, since the writeLock will be kept obtained. Moreover, I don't have 
> IndexWriter instance either, to "grab" the lock somehow, since the init() 
> method is called from IndexWriter constructor.
> Either broaden the catch to all exceptions, or at least provide some 
> circumvention to clear up. In my case, I'd like to "fallback", just delete 
> the corrupted index from disk and recreate it, but it is impossible, since 
> the LOCK_HELD NativeFSLockFactory's entry about obtained WriteLock is _never_ 
> cleaned out and is no (at least apparent) way to clean it out forcibly. I 
> can't create new IndexWriter, since it will always fail with 
> LockObtainFailedException.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1870) Binary Update Request (javabin) fails when the field type of a multivalued SolrInputDocument field is a Set (or any type that is identified as an instance of iterable)

2010-05-25 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871231#action_12871231
 ] 

Hoss Man commented on SOLR-1870:


bq. Iterator was created as a special type in javabin codec so that items can 
be streamed. Any collection should have been written as a List of specific size.

I'm confused ... if Iterator support was only ever meant to be "special" for 
streaming items, then why did writeKnownType have support for Iterator?  and 
why did JavaBinUpdateRequestCodec override the default behavior of readIterator 
to treat it special?

As far as your patch goes: instead of adding a new "if (val instanceof 
Collection)" test to writeKnownType, shouldn't you replace the existing 
"instanceof List" with "instanceof Collection" ?

I'm still not understanding all of this, but it also seems like *both* patches 
would be a good idea -- your change ensures that all Collections are serialized 
as an Array, but it still leaves open the possibility of a bug if someone tries 
to use the codec to stream something which is *not* a Collection but is 
Iterable.  perhaps that was not originally meant to be supported, but is there 
any harm in it?  is the special case behavior for Iterators for streaming used 
in a way besides the "top level" docs iterator that i mentioned?
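
To make the suggested change concrete, a rough sketch (not the actual JavaBinCodec 
source; writeArray/writeIterator are assumed helper names for the codec's existing 
array and iterator writers):

{code}
// Inside a writeKnownType-style dispatch, illustrative only:
if (val instanceof Collection) {   // was: val instanceof List
  // serialize any Collection as a sized array instead of falling through
  // to the Iterable/Iterator streaming path
  writeArray(new ArrayList<Object>((Collection<?>) val));
  return true;
}
if (val instanceof Iterator) {     // streaming special case, unchanged
  writeIterator((Iterator<?>) val);
  return true;
}
{code}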

> Binary Update Request (javabin) fails when the field type of a multivalued 
> SolrInputDocument field is a Set (or any type that is identified as an 
> instance of iterable) 
> 
>
> Key: SOLR-1870
> URL: https://issues.apache.org/jira/browse/SOLR-1870
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java, update
>Affects Versions: 1.4
>Reporter: Prasanna Ranganathan
> Attachments: SOLR-1870-test.patch, SOLR-1870.patch, SOLR-1870.patch
>
>
> When the field type of a field in a SolrInputDocument is a Collection based 
> on the Set interface, the JavaBinUpdate request fails. It works when sending 
> the document data over XML.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2476) Constructor of IndexWriter let's runtime exceptions pop up, while keeping the writeLock obtained

2010-05-25 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871230#action_12871230
 ] 

Shai Erera commented on LUCENE-2476:


This stack trace shows a LockObtainFailedException - can you post the one that 
resulted in NegativeArraySizeException? Curious to know where you hit it, and what 
sort of corruption leads to that :).

> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained
> 
>
> Key: LUCENE-2476
> URL: https://issues.apache.org/jira/browse/LUCENE-2476
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 3.0.1
>Reporter: Cservenak, Tamas
>Assignee: Michael McCandless
>Priority: Blocker
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2476.patch
>
>
> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained.
> The init method in IndexWriter catches IOException only (I got 
> NegativeArraySize by reading up a _corrupt_ index), and now, there is no way 
> to recover, since the writeLock will be kept obtained. Moreover, I don't have 
> IndexWriter instance either, to "grab" the lock somehow, since the init() 
> method is called from IndexWriter constructor.
> Either broaden the catch to all exceptions, or at least provide some 
> circumvention to clear up. In my case, I'd like to "fallback", just delete 
> the corrupted index from disk and recreate it, but it is impossible, since 
> the LOCK_HELD NativeFSLockFactory's entry about obtained WriteLock is _never_ 
> cleaned out and is no (at least apparent) way to clean it out forcibly. I 
> can't create new IndexWriter, since it will always fail with 
> LockObtainFailedException.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2476) Constructor of IndexWriter let's runtime exceptions pop up, while keeping the writeLock obtained

2010-05-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871227#action_12871227
 ] 

Michael McCandless commented on LUCENE-2476:


bq. The patch applied to 3.0.1 (I had to do it manually, since I believe this 
patch is against trunk, not 3.0.1) does fix my problem. The IndexWriter is now 
successfully recreated and my UT does recover just fine from corrupted indexes.

OK thanks for confirming -- I'll backport to 3.0.x as well.

(Yes patch is against trunk).

> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained
> 
>
> Key: LUCENE-2476
> URL: https://issues.apache.org/jira/browse/LUCENE-2476
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 3.0.1
>Reporter: Cservenak, Tamas
>Assignee: Michael McCandless
>Priority: Blocker
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2476.patch
>
>
> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained.
> The init method in IndexWriter catches IOException only (I got 
> NegativeArraySize by reading up a _corrupt_ index), and now, there is no way 
> to recover, since the writeLock will be kept obtained. Moreover, I don't have 
> IndexWriter instance either, to "grab" the lock somehow, since the init() 
> method is called from IndexWriter constructor.
> Either broaden the catch to all exceptions, or at least provide some 
> circumvention to clear up. In my case, I'd like to "fallback", just delete 
> the corrupted index from disk and recreate it, but it is impossible, since 
> the LOCK_HELD NativeFSLockFactory's entry about obtained WriteLock is _never_ 
> cleaned out and is no (at least apparent) way to clean it out forcibly. I 
> can't create new IndexWriter, since it will always fail with 
> LockObtainFailedException.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2476) Constructor of IndexWriter let's runtime exceptions pop up, while keeping the writeLock obtained

2010-05-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871226#action_12871226
 ] 

Michael McCandless commented on LUCENE-2476:


bq. The patch applied to 3.0.1 (I had to do it manually, since I believe this 
patch is against trunk, not 3.0.1) does fix my problem. The IndexWriter is now 
successfully recreated and my UT does recover just fine from corrupted indexes.

OK thanks for confirming -- I'll backport to 3.0.x as well.

(Yes patch is against trunk).

> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained
> 
>
> Key: LUCENE-2476
> URL: https://issues.apache.org/jira/browse/LUCENE-2476
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 3.0.1
>Reporter: Cservenak, Tamas
>Assignee: Michael McCandless
>Priority: Blocker
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2476.patch
>
>
> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained.
> The init method in IndexWriter catches IOException only (I got 
> NegativeArraySize by reading up a _corrupt_ index), and now, there is no way 
> to recover, since the writeLock will be kept obtained. Moreover, I don't have 
> IndexWriter instance either, to "grab" the lock somehow, since the init() 
> method is called from IndexWriter constructor.
> Either broaden the catch to all exceptions, or at least provide some 
> circumvention to clear up. In my case, I'd like to "fallback", just delete 
> the corrupted index from disk and recreate it, but it is impossible, since 
> the LOCK_HELD NativeFSLockFactory's entry about obtained WriteLock is _never_ 
> cleaned out and is no (at least apparent) way to clean it out forcibly. I 
> can't create new IndexWriter, since it will always fail with 
> LockObtainFailedException.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2455) Some house cleaning in addIndexes*

2010-05-25 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2455:
---

Attachment: LUCENE-2455_3x.patch

Update w/ comments. I plan to commit this either later today or tomorrow (and 
then port it to trunk). So if you haven't reviewed it yet and want a last chance 
to do so - now is the time.

> Some house cleaning in addIndexes*
> --
>
> Key: LUCENE-2455
> URL: https://issues.apache.org/jira/browse/LUCENE-2455
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shai Erera
>Assignee: Shai Erera
>Priority: Trivial
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2455_3x.patch, LUCENE-2455_3x.patch, 
> LUCENE-2455_3x.patch, LUCENE-2455_3x.patch, LUCENE-2455_3x.patch
>
>
> Today, the use of addIndexes and addIndexesNoOptimize is confusing - 
> especially on when to invoke each. Also, addIndexes calls optimize() in 
> the beginning, but only on the target index. It also includes the 
> following jdoc statement, which from how I understand the code, is 
> wrong: _After this completes, the index is optimized._ -- optimize() is 
> called in the beginning and not in the end. 
> On the other hand, addIndexesNoOptimize does not call optimize(), and 
> relies on the MergeScheduler and MergePolicy to handle the merges. 
> After a short discussion about that on the list (Thanks Mike for the 
> clarifications!) I understand that there are really two core differences 
> between the two: 
> * addIndexes supports IndexReader extensions
> * addIndexesNoOptimize performs better
> This issue proposes the following:
> # Clear up the documentation of each, spelling out the pros/cons of 
>   calling them clearly in the javadocs.
> # Rename addIndexesNoOptimize to addIndexes
> # Remove optimize() call from addIndexes(IndexReader...)
> # Document that clearly in both, w/ a recommendation to call optimize() 
>   before on any of the Directories/Indexes if it's a concern. 
> That way, we maintain all the flexibility in the API - 
> addIndexes(IndexReader...) allows for using IR extensions, 
> addIndexes(Directory...) is considered more efficient, by allowing the 
> merges to happen concurrently (depending on MS) and also factors in the 
> MP. So unless you have an IR extension, addDirectories is really the one 
> you should be using. And you have the freedom to call optimize() before 
> each if you care about it, or don't if you don't care. Either way, 
> incurring the cost of optimize() is entirely in the user's hands. 
> BTW, addIndexes(IndexReader...) does not use neither the MergeScheduler 
> nor MergePolicy, but rather call SegmentMerger directly. This might be 
> another place for improvement. I'll look into it, and if it's not too 
> complicated, I may cover it by this issue as well. If you have any hints 
> that can give me a good head start on that, please don't be shy :). 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2476) Constructor of IndexWriter let's runtime exceptions pop up, while keeping the writeLock obtained

2010-05-25 Thread Cservenak, Tamas (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871224#action_12871224
 ] 

Cservenak, Tamas commented on LUCENE-2476:
--

This is a UT that first _copies_ known (broken) index files into place and then 
tries to use them. Naturally, it fails (since the index files are corrupted); it 
then tries to _recreate_ the index files and index content, but fails to obtain the 
write lock again. After the patch above is applied to 3.0.1, the UT passes okay.

This is the stack trace I have with vanilla 3.0.1:

{noformat}
org.sonatype.timeline.TimelineException: Fail to configure timeline index!
at 
org.sonatype.timeline.DefaultTimelineIndexer.configure(DefaultTimelineIndexer.java:106)
at 
org.sonatype.timeline.DefaultTimeline.repairTimelineIndexer(DefaultTimeline.java:79)
at 
org.sonatype.timeline.DefaultTimeline.configure(DefaultTimeline.java:60)
at 
org.sonatype.timeline.TimelineTest.testRepairIndexCouldNotRead(TimelineTest.java:103)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:164)
at junit.framework.TestCase.runBare(TestCase.java:130)
at junit.framework.TestResult$1.protect(TestResult.java:106)
at junit.framework.TestResult.runProtected(TestResult.java:124)
at junit.framework.TestResult.run(TestResult.java:109)
at junit.framework.TestCase.run(TestCase.java:120)
at 
org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
at 
org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed 
out: 
NativeFSLock@/Users/cstamas/worx/sonatype/spice/trunk/spice-timeline/target/index/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1045)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:868)
at 
org.sonatype.timeline.DefaultTimelineIndexer.configure(DefaultTimelineIndexer.java:99)
... 19 more
{noformat}

> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained
> 
>
> Key: LUCENE-2476
> URL: https://issues.apache.org/jira/browse/LUCENE-2476
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 3.0.1
>Reporter: Cservenak, Tamas
>Assignee: Michael McCandless
>Priority: Blocker
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2476.patch
>
>
> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained.
> The init method in IndexWriter catches IOException only (I got 
> NegativeArraySize by reading up a _corrupt_ index), and now, there is no way 
> to recover, since the writeLock will be kept obtained. Moreover, I don't have 
> IndexWriter instance either, to "grab" the lock somehow, since the init() 
> method is called from IndexWriter constructor.
> Either broaden the catch to all exceptions, or at least provide some 
> circumvention to clear up. In my case, I'd like to "fallback", just delete 
> the corrupted index from disk and recreate it, but it is impossible, since 
> the LOCK_HELD NativeFSLockFactory's entry about obtained WriteLock is _never_ 
> cleaned out and is no (at least apparent) way to clean it out forcibly. I 
> can't create new IndexWriter, since it will always fail with 
> LockObtainFailedException.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2104) IndexWriter.unlock does does nothing if NativeFSLockFactory is used

2010-05-25 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871223#action_12871223
 ] 

Uwe Schindler commented on LUCENE-2104:
---

Should I backport?

> IndexWriter.unlock does does nothing if NativeFSLockFactory is used
> ---
>
> Key: LUCENE-2104
> URL: https://issues.apache.org/jira/browse/LUCENE-2104
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 2.9, 2.9.1, 3.0
>Reporter: Shai Erera
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2104.patch, LUCENE-2104.patch, LUCENE-2104.patch
>
>
> If NativeFSLockFactory is used, IndexWriter.unlock will return, silently 
> doing nothing. The reason is that NativeFSLockFactory's makeLock always 
> creates a new NativeFSLock. NativeFSLock's release first checks if its lock 
> is not null. However, only if obtain() is called, that lock is not null. So 
> release actually does nothing, and so IndexWriter.unlock does not delete the 
> lock, or fail w/ exception.
> This is only a problem in NativeFSLock, and not in other Lock 
> implementations, at least as I was able to see.
> Need to think first how to reproduce in a test, and then fix it. I'll work on 
> it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-1926) add hl.q parameter

2010-05-25 Thread Koji Sekiguchi (JIRA)
add hl.q parameter
--

 Key: SOLR-1926
 URL: https://issues.apache.org/jira/browse/SOLR-1926
 Project: Solr
  Issue Type: Improvement
  Components: highlighter
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Priority: Trivial


If hl.q parameter is set, HighlightComponent uses it rather than q.

Use case:

You search "PC" with highlight and facet capability:

{code}
q=PC
&facet=on&facet.field=maker&facet.field=something
&hl=on&hl.fl=desc
{code}

You get a lot of results with snippets (term "PC" highlighted in desc field). 
Then you click a link "maker:DELL(50)" to narrow the result:

{code}
q=PC
&facet=on&facet.field=something
&fq=maker:DELL
&hl=on&hl.fl=desc
{code}

You'll get a narrowed result with snippets where the term "PC" is highlighted. But 
sometimes I'd like to see "DELL" highlighted as well, because I clicked "DELL". In 
this case, hl.q can be used:

{code}
q=PC
&facet=on&facet.field=something
&fq=maker:DELL
&hl=on&hl.fl=desc&*hl.q=PC+maker:DELL*
{code}

Note that hl.requireFieldMatch should be false (false is the default) in this 
scenario.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: NPE Within IndexWriter.optimize (Solr Trunk Nightly)

2010-05-25 Thread Uwe Schindler
There is one way this error can occur: maybe you extracted the build onto an 
existing (previous) snapshot folder. As the Lucene JARs contain version numbers, an 
old JAR file may now be mixed in with the new ones, and that can cause errors like 
these.

Please extract to an empty folder and put your indexes there (if you created
some before).

Please note: The trunk version has no stable index format, so indexes may
become corrupted easily. If you want a "preview" of the coming stable version,
download the 3.1 artifacts (not 4.0):
http://hudson.zones.apache.org/hudson/job/Solr-3.x/lastSuccessfulBuild/artifact/branch_3x/solr/dist/

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Chris Herron [mailto:che...@gmail.com]
> Sent: Tuesday, May 25, 2010 5:38 PM
> To: dev@lucene.apache.org
> Subject: Re: NPE Within IndexWriter.optimize (Solr Trunk Nightly)
> 
> Uwe, Mike,
> 
> I downloaded the nightly build by visiting the wiki:
> http://wiki.apache.org/solr/FrontPage
> ... and then clicking on "Download newest Solr nightly build here"
> http://hudson.zones.apache.org/hudson/job/Solr-trunk/lastSuccessfulBuild/artifact/trunk/solr/dist/
> The exact file I downloaded yesterday was:
> http://hudson.zones.apache.org/hudson/job/Solr-trunk/lastSuccessfulBuild/artifact/trunk/solr/dist/apache-solr-2010-05-24_08-05-13.tgz (no longer exists).
> 
> I have no patches or mods added. The CHANGES.txt file in the download
> includes this:
> 
> ~~
> ~~~
> $Id: CHANGES.txt 945897 2010-05-18 21:30:41Z hossman $
> 
> ==  4.0.0-dev == Versions of Major
> Components
> -
> Apache Lucene trunk
> Apache Tika 0.6
> Carrot2 3.1.0
> ~~
> ~~~
> 
> Did I fetch the wrong version? If so, where can I grab the (1.5.x) nightly
> builds?
> 
> Mike, thanks for the CheckIndex suggestion. Shall run that once I've
> confirmed which version I'm running.
> 
> Thanks,
> 
> Chris
> 
> On May 25, 2010, at 5:09 AM, Uwe Schindler wrote:
> 
> > Maybe it's the 3x version?
> >
> > The artifact names in Hudson are currently identical for solr-trunk
> > and solr-3x. You have to specify which version you use!
> >
> > -
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >> -Original Message-
> >> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> >> Sent: Tuesday, May 25, 2010 11:01 AM
> >> To: dev@lucene.apache.org
> >> Subject: Re: NPE Within IndexWriter.optimize (Solr Trunk Nightly)
> >>
> >> Hmmm spooky.
> >>
> >> For some reason I can't correlate the line numbers in
> >> TermInfosReader.java with the current trunk sources; the line numbers for
> >> all the other sources line up.
> >> This is a stock nightly build right?
> >> You don't have any mods/patches?
> >>
> >> Can you enable assertions when you optimize and see if anything trips?
> >>
> >> Also, can you run CheckIndex on your index (java -ea
> >> org.apache.lucene.index.CheckIndex /path/to/index), and post the
> output?
> >>
> >> Mike
> >>
> >> On Mon, May 24, 2010 at 7:43 PM, Chris Herron 
> wrote:
> >>> Hi,
> >>>
> >>> I'm using the latest nightly build of solr
> > (apache-solr-2010-05-24_08-05-13)
> >> and am repeatedly experiencing a NullPointerException after calling
> > delete,
> >> commit, optimize. Stack trace below. The index is ~20Gb.
> >>>
> >>> I'm not doing Lucene/Solr core development - I just figured this was
> >>> a
> >> better place to ask given that this was a nightly build.
> >>>
> >>> Any observations that would help resolve?
> >>>
> >>> Thanks,
> >>>
> >>> Chris
> >>>
> >>> SEVERE: java.io.IOException: background merge hit exception:
> >>> _gr5a:C127 _gsbj:C486/3 _gsbk:C1 _gsbl:C1/1 _gsbm:C1 _gsbn:C1 _gsbo:C1
> >>> _gsbp:C1 _gsbq:C1 _gssn:C69 into _gsss [optimize] [mergeDocStores]
> >>> at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2418)
> >>> at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2343)
> >>> at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:403)
> >>> at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
> >>> at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:107)
> >>> at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:48)
> >>> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> >>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1321)
> >>> at org

[jira] Commented: (LUCENE-2476) Constructor of IndexWriter let's runtime exceptions pop up, while keeping the writeLock obtained

2010-05-25 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871220#action_12871220
 ] 

Shai Erera commented on LUCENE-2476:


Out of curiosity - would you mind posting here the exception?

> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained
> 
>
> Key: LUCENE-2476
> URL: https://issues.apache.org/jira/browse/LUCENE-2476
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 3.0.1
>Reporter: Cservenak, Tamas
>Assignee: Michael McCandless
>Priority: Blocker
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2476.patch
>
>
> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained.
> The init method in IndexWriter catches IOException only (I got 
> NegativeArraySize by reading up a _corrupt_ index), and now, there is no way 
> to recover, since the writeLock will be kept obtained. Moreover, I don't have 
> IndexWriter instance either, to "grab" the lock somehow, since the init() 
> method is called from IndexWriter constructor.
> Either broaden the catch to all exceptions, or at least provide some 
> circumvention to clear up. In my case, I'd like to "fallback", just delete 
> the corrupted index from disk and recreate it, but it is impossible, since 
> the LOCK_HELD NativeFSLockFactory's entry about obtained WriteLock is _never_ 
> cleaned out and is no (at least apparent) way to clean it out forcibly. I 
> can't create new IndexWriter, since it will always fail with 
> LockObtainFailedException.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2476) Constructor of IndexWriter let's runtime exceptions pop up, while keeping the writeLock obtained

2010-05-25 Thread Cservenak, Tamas (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871219#action_12871219
 ] 

Cservenak, Tamas commented on LUCENE-2476:
--

Yes, I do hit LUCENE-2104 at the same time... nice.

> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained
> 
>
> Key: LUCENE-2476
> URL: https://issues.apache.org/jira/browse/LUCENE-2476
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 3.0.1
>Reporter: Cservenak, Tamas
>Assignee: Michael McCandless
>Priority: Blocker
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2476.patch
>
>
> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained.
> The init method in IndexWriter catches IOException only (I got 
> NegativeArraySize by reading up a _corrupt_ index), and now, there is no way 
> to recover, since the writeLock will be kept obtained. Moreover, I don't have 
> IndexWriter instance either, to "grab" the lock somehow, since the init() 
> method is called from IndexWriter constructor.
> Either broaden the catch to all exceptions, or at least provide some 
> circumvention to clear up. In my case, I'd like to "fallback", just delete 
> the corrupted index from disk and recreate it, but it is impossible, since 
> the LOCK_HELD NativeFSLockFactory's entry about obtained WriteLock is _never_ 
> cleaned out and is no (at least apparent) way to clean it out forcibly. I 
> can't create new IndexWriter, since it will always fail with 
> LockObtainFailedException.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2476) Constructor of IndexWriter let's runtime exceptions pop up, while keeping the writeLock obtained

2010-05-25 Thread Cservenak, Tamas (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871218#action_12871218
 ] 

Cservenak, Tamas commented on LUCENE-2476:
--

Just to confirm this patch as the fix.

The patch applied to 3.0.1 (I had to do it manually, since I believe this patch 
is against trunk, not 3.0.1) does fix my problem. The IndexWriter is now 
successfully recreated and my UT does recover just fine from corrupted indexes.

> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained
> 
>
> Key: LUCENE-2476
> URL: https://issues.apache.org/jira/browse/LUCENE-2476
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 3.0.1
>Reporter: Cservenak, Tamas
>Assignee: Michael McCandless
>Priority: Blocker
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2476.patch
>
>
> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained.
> The init method in IndexWriter catches IOException only (I got 
> NegativeArraySize by reading up a _corrupt_ index), and now, there is no way 
> to recover, since the writeLock will be kept obtained. Moreover, I don't have 
> IndexWriter instance either, to "grab" the lock somehow, since the init() 
> method is called from IndexWriter constructor.
> Either broaden the catch to all exceptions, or at least provide some 
> circumvention to clear up. In my case, I'd like to "fallback", just delete 
> the corrupted index from disk and recreate it, but it is impossible, since 
> the LOCK_HELD NativeFSLockFactory's entry about obtained WriteLock is _never_ 
> cleaned out and is no (at least apparent) way to clean it out forcibly. I 
> can't create new IndexWriter, since it will always fail with 
> LockObtainFailedException.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2476) Constructor of IndexWriter let's runtime exceptions pop up, while keeping the writeLock obtained

2010-05-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871217#action_12871217
 ] 

Michael McCandless commented on LUCENE-2476:


bq. I tried both IndexWriter#unlock and 
Directory#clearLock(IndexWriter.WRITE_LOCK_NAME), but none of those removed the 
entry from the LOCK_HELD HashSet. It was unchanged.

Ahh, sorry, I think you are hitting LUCENE-2104.

> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained
> 
>
> Key: LUCENE-2476
> URL: https://issues.apache.org/jira/browse/LUCENE-2476
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 3.0.1
>Reporter: Cservenak, Tamas
>Assignee: Michael McCandless
>Priority: Blocker
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2476.patch
>
>
> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained.
> The init method in IndexWriter catches IOException only (I got 
> NegativeArraySize by reading up a _corrupt_ index), and now, there is no way 
> to recover, since the writeLock will be kept obtained. Moreover, I don't have 
> IndexWriter instance either, to "grab" the lock somehow, since the init() 
> method is called from IndexWriter constructor.
> Either broaden the catch to all exceptions, or at least provide some 
> circumvention to clear up. In my case, I'd like to "fallback", just delete 
> the corrupted index from disk and recreate it, but it is impossible, since 
> the LOCK_HELD NativeFSLockFactory's entry about obtained WriteLock is _never_ 
> cleaned out and is no (at least apparent) way to clean it out forcibly. I 
> can't create new IndexWriter, since it will always fail with 
> LockObtainFailedException.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2104) IndexWriter.unlock does does nothing if NativeFSLockFactory is used

2010-05-25 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2104:
---

Fix Version/s: 3.1

> IndexWriter.unlock does does nothing if NativeFSLockFactory is used
> ---
>
> Key: LUCENE-2104
> URL: https://issues.apache.org/jira/browse/LUCENE-2104
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 2.9, 2.9.1, 3.0
>Reporter: Shai Erera
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2104.patch, LUCENE-2104.patch, LUCENE-2104.patch
>
>
> If NativeFSLockFactory is used, IndexWriter.unlock will return, silently 
> doing nothing. The reason is that NativeFSLockFactory's makeLock always 
> creates a new NativeFSLock. NativeFSLock's release first checks if its lock 
> is not null. However, only if obtain() is called, that lock is not null. So 
> release actually does nothing, and so IndexWriter.unlock does not delete the 
> lock, or fail w/ exception.
> This is only a problem in NativeFSLock, and not in other Lock 
> implementations, at least as I was able to see.
> Need to think first how to reproduce in a test, and then fix it. I'll work on 
> it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2476) Constructor of IndexWriter let's runtime exceptions pop up, while keeping the writeLock obtained

2010-05-25 Thread Cservenak, Tamas (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871206#action_12871206
 ] 

Cservenak, Tamas commented on LUCENE-2476:
--

I tried both IndexWriter#unlock and 
Directory#clearLock(IndexWriter.WRITE_LOCK_NAME), but none of those removed the 
entry from the LOCK_HELD HashSet. It was unchanged.

The NativeFSLock#release() was returning false in both cases.

So, this is what I meant by "provide some circumvention": up to now I have not 
figured out any other means to remove the entry from LOCK_HELD. None of these 
removed it.

> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained
> 
>
> Key: LUCENE-2476
> URL: https://issues.apache.org/jira/browse/LUCENE-2476
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 3.0.1
>Reporter: Cservenak, Tamas
>Assignee: Michael McCandless
>Priority: Blocker
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2476.patch
>
>
> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained.
> The init method in IndexWriter catches IOException only (I got 
> NegativeArraySize by reading up a _corrupt_ index), and now, there is no way 
> to recover, since the writeLock will be kept obtained. Moreover, I don't have 
> IndexWriter instance either, to "grab" the lock somehow, since the init() 
> method is called from IndexWriter constructor.
> Either broaden the catch to all exceptions, or at least provide some 
> circumvention to clear up. In my case, I'd like to "fallback", just delete 
> the corrupted index from disk and recreate it, but it is impossible, since 
> the LOCK_HELD NativeFSLockFactory's entry about obtained WriteLock is _never_ 
> cleaned out and is no (at least apparent) way to clean it out forcibly. I 
> can't create new IndexWriter, since it will always fail with 
> LockObtainFailedException.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2476) Constructor of IndexWriter let's runtime exceptions pop up, while keeping the writeLock obtained

2010-05-25 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2476:
---

Attachment: LUCENE-2476.patch

Patch.

> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained
> 
>
> Key: LUCENE-2476
> URL: https://issues.apache.org/jira/browse/LUCENE-2476
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 3.0.1
>Reporter: Cservenak, Tamas
>Assignee: Michael McCandless
>Priority: Blocker
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2476.patch
>
>
> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained.
> The init method in IndexWriter catches IOException only (I got 
> NegativeArraySize by reading up a _corrupt_ index), and now, there is no way 
> to recover, since the writeLock will be kept obtained. Moreover, I don't have 
> IndexWriter instance either, to "grab" the lock somehow, since the init() 
> method is called from IndexWriter constructor.
> Either broaden the catch to all exceptions, or at least provide some 
> circumvention to clear up. In my case, I'd like to "fallback", just delete 
> the corrupted index from disk and recreate it, but it is impossible, since 
> the LOCK_HELD NativeFSLockFactory's entry about obtained WriteLock is _never_ 
> cleaned out and is no (at least apparent) way to clean it out forcibly. I 
> can't create new IndexWriter, since it will always fail with 
> LockObtainFailedException.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2476) Constructor of IndexWriter let's runtime exceptions pop up, while keeping the writeLock obtained

2010-05-25 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2476:
---

Fix Version/s: 3.1
   4.0

> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained
> 
>
> Key: LUCENE-2476
> URL: https://issues.apache.org/jira/browse/LUCENE-2476
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 3.0.1
>Reporter: Cservenak, Tamas
>Assignee: Michael McCandless
>Priority: Blocker
> Fix For: 3.1, 4.0
>
>
> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained.
> The init method in IndexWriter catches IOException only (I got 
> NegativeArraySize by reading up a _corrupt_ index), and now, there is no way 
> to recover, since the writeLock will be kept obtained. Moreover, I don't have 
> IndexWriter instance either, to "grab" the lock somehow, since the init() 
> method is called from IndexWriter constructor.
> Either broaden the catch to all exceptions, or at least provide some 
> circumvention to clear up. In my case, I'd like to "fallback", just delete 
> the corrupted index from disk and recreate it, but it is impossible, since 
> the LOCK_HELD NativeFSLockFactory's entry about obtained WriteLock is _never_ 
> cleaned out and is no (at least apparent) way to clean it out forcibly. I 
> can't create new IndexWriter, since it will always fail with 
> LockObtainFailedException.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2476) Constructor of IndexWriter let's runtime exceptions pop up, while keeping the writeLock obtained

2010-05-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871201#action_12871201
 ] 

Michael McCandless commented on LUCENE-2476:


I agree, we should fix this.  I'll change to a try/finally w/ a success boolean.

You can use IndexWriter#unlock to forcefully remove the lock, as a workaround.
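
A minimal sketch of that pattern (illustrative only, not the committed patch; the 
elided init() work and the writeLock field stand in for IndexWriter's internals):

{code}
boolean success = false;
try {
  // ... the existing init() work, which may throw runtime exceptions
  // such as NegativeArraySizeException when reading a corrupt index ...
  success = true;
} finally {
  if (!success) {
    // release the write lock on *any* failure, not only IOException
    writeLock.release();
    writeLock = null;
  }
}
{code}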

> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained
> 
>
> Key: LUCENE-2476
> URL: https://issues.apache.org/jira/browse/LUCENE-2476
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 3.0.1
>Reporter: Cservenak, Tamas
>Priority: Blocker
> Fix For: 3.1, 4.0
>
>
> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained.
> The init method in IndexWriter catches IOException only (I got 
> NegativeArraySize by reading up a _corrupt_ index), and now, there is no way 
> to recover, since the writeLock will be kept obtained. Moreover, I don't have 
> IndexWriter instance either, to "grab" the lock somehow, since the init() 
> method is called from IndexWriter constructor.
> Either broaden the catch to all exceptions, or at least provide some 
> circumvention to clear up. In my case, I'd like to "fallback", just delete 
> the corrupted index from disk and recreate it, but it is impossible, since 
> the LOCK_HELD NativeFSLockFactory's entry about obtained WriteLock is _never_ 
> cleaned out and is no (at least apparent) way to clean it out forcibly. I 
> can't create new IndexWriter, since it will always fail with 
> LockObtainFailedException.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2476) Constructor of IndexWriter let's runtime exceptions pop up, while keeping the writeLock obtained

2010-05-25 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871200#action_12871200
 ] 

Shai Erera commented on LUCENE-2476:


Can you post here the full stacktrace?

> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained
> 
>
> Key: LUCENE-2476
> URL: https://issues.apache.org/jira/browse/LUCENE-2476
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 3.0.1
>Reporter: Cservenak, Tamas
>Priority: Blocker
> Fix For: 3.1, 4.0
>
>
> Constructor of IndexWriter let's runtime exceptions pop up, while keeping the 
> writeLock obtained.
> The init method in IndexWriter catches IOException only (I got a 
> NegativeArraySizeException while reading a _corrupt_ index), and now there is no way 
> to recover, since the writeLock will be kept obtained. Moreover, I don't have 
> IndexWriter instance either, to "grab" the lock somehow, since the init() 
> method is called from IndexWriter constructor.
> Either broaden the catch to all exceptions, or at least provide some 
> circumvention to clear up. In my case, I'd like to "fallback", just delete 
> the corrupted index from disk and recreate it, but it is impossible, since 
> the LOCK_HELD NativeFSLockFactory's entry about obtained WriteLock is _never_ 
> cleaned out and there is no (at least apparent) way to clean it out forcibly. I 
> can't create a new IndexWriter, since it will always fail with 
> LockObtainFailedException.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Assigned: (LUCENE-2476) Constructor of IndexWriter lets runtime exceptions pop up, while keeping the writeLock obtained

2010-05-25 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-2476:
--

Assignee: Michael McCandless

> Constructor of IndexWriter lets runtime exceptions pop up, while keeping the 
> writeLock obtained
> 
>
> Key: LUCENE-2476
> URL: https://issues.apache.org/jira/browse/LUCENE-2476
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Store
>Affects Versions: 3.0.1
>Reporter: Cservenak, Tamas
>Assignee: Michael McCandless
>Priority: Blocker
> Fix For: 3.1, 4.0
>
>
> Constructor of IndexWriter lets runtime exceptions pop up, while keeping the 
> writeLock obtained.
> The init method in IndexWriter catches IOException only (I got a 
> NegativeArraySizeException while reading a _corrupt_ index), and now there is no way 
> to recover, since the writeLock will be kept obtained. Moreover, I don't have 
> IndexWriter instance either, to "grab" the lock somehow, since the init() 
> method is called from IndexWriter constructor.
> Either broaden the catch to all exceptions, or at least provide some 
> circumvention to clear up. In my case, I'd like to "fallback", just delete 
> the corrupted index from disk and recreate it, but it is impossible, since 
> the LOCK_HELD NativeFSLockFactory's entry about obtained WriteLock is _never_ 
> cleaned out and there is no (at least apparent) way to clean it out forcibly. I 
> can't create a new IndexWriter, since it will always fail with 
> LockObtainFailedException.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: NPE Within IndexWriter.optimize (Solr Trunk Nightly)

2010-05-25 Thread Chris Herron
Uwe, Mike,

I downloaded the nightly build by visiting the wiki:
http://wiki.apache.org/solr/FrontPage
... and then clicking on "Download newest Solr nightly build here"
http://hudson.zones.apache.org/hudson/job/Solr-trunk/lastSuccessfulBuild/artifact/trunk/solr/dist/
The exact file I download yesterday was:
http://hudson.zones.apache.org/hudson/job/Solr-trunk/lastSuccessfulBuild/artifact/trunk/solr/dist/apache-solr-2010-05-24_08-05-13.tgz
 (no longer exists).

I have no patches or mods added. The CHANGES.txt file in the download includes 
this:

~
$Id: CHANGES.txt 945897 2010-05-18 21:30:41Z hossman $

==  4.0.0-dev ==
Versions of Major Components
-
Apache Lucene trunk
Apache Tika 0.6
Carrot2 3.1.0
~

Did I fetch the wrong version? If so, where can I grab the (1.5.x) nightly 
builds?

Mike, thanks for the CheckIndex suggestion. Shall run that once I've confirmed 
which version I'm running.

Thanks,

Chris

On May 25, 2010, at 5:09 AM, Uwe Schindler wrote:

> Maybe it's the 3x version?
> 
> The artifact names in Hudson are currently identical for solr-trunk and
> solr-3x. You have to specify which version you use!
> 
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
> 
>> -Original Message-
>> From: Michael McCandless [mailto:luc...@mikemccandless.com]
>> Sent: Tuesday, May 25, 2010 11:01 AM
>> To: dev@lucene.apache.org
>> Subject: Re: NPE Within IndexWriter.optimize (Solr Trunk Nightly)
>> 
>> Hmmm spooky.
>> 
>> For some reason I can't correlate the line numbers in TermInfosReader.java
>> with the current trunk sources; the line numbers for all the other sources line
>> up.
>> This is a stock nightly build right?
>> You don't have any mods/patches?
>> 
>> Can you enable assertions when you optimize and see if anything trips?
>> 
>> Also, can you run CheckIndex on your index (java -ea
>> org.apache.lucene.index.CheckIndex /path/to/index), and post the output?
>> 
>> Mike
>> 
>> On Mon, May 24, 2010 at 7:43 PM, Chris Herron  wrote:
>>> Hi,
>>> 
>>> I'm using the latest nightly build of solr
> (apache-solr-2010-05-24_08-05-13)
>> and am repeatedly experiencing a NullPointerException after calling
> delete,
>> commit, optimize. Stack trace below. The index is ~20Gb.
>>> 
>>> I'm not doing Lucene/Solr core development - I just figured this was a
>> better place to ask given that this was a nightly build.
>>> 
>>> Any observations that would help resolve?
>>> 
>>> Thanks,
>>> 
>>> Chris
>>> 
>>> SEVERE: java.io.IOException: background merge hit exception:
>>> _gr5a:C127 _gsbj:C486/3 _gsbk:C1 _gsbl:C1/1 _gsbm:C1 _gsbn:C1 _gsbo:C1
>>> _gsbp:C1 _gsbq:C1 _gssn:C69 into _gsss [optimize] [mergeDocStores]
>>>at
>>> org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2418)
>>>at
>>> org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2343)
>>>at
>>> 
>> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler
>>> 2.java:403)
>>>at
>>> 
>> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(Run
>> U
>>> pdateProcessorFactory.java:85)
>>>at
>>> 
>> org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandle
>>> rUtils.java:107)
>>>at
>>> 
>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(Co
>> n
>>> tentStreamHandlerBase.java:48)
>>>at
>>> 
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandl
>> e
>>> rBase.java:131)
>>>at org.apache.solr.core.SolrCore.execute(SolrCore.java:1321)
>>>at
>>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.
>>> java:341)
>>>at
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter
>>> .java:244)
>>>at
>>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletH
>>> andler.java:1190)
>>>at
>>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:
>>> 424)
>>>at
>>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.ja
>>> va:119)
>>>at
>>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java
>>> :457)
>>>at
>>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandle
>>> r.java:229)
>>>at
>>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandle
>>> r.java:931)
>>>at
>>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:3
>>> 61)
>>>at
>>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler
>>> .java:186)
>>>at
>>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler
>>> .java:867)
>>>at
>>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.ja
>>> va:117)
>>>at
>>> org.

[jira] Created: (LUCENE-2476) Constructor of IndexWriter lets runtime exceptions pop up, while keeping the writeLock obtained

2010-05-25 Thread Cservenak, Tamas (JIRA)
Constructor of IndexWriter lets runtime exceptions pop up, while keeping the 
writeLock obtained


 Key: LUCENE-2476
 URL: https://issues.apache.org/jira/browse/LUCENE-2476
 Project: Lucene - Java
  Issue Type: Bug
  Components: Store
Affects Versions: 3.0.1
Reporter: Cservenak, Tamas
Priority: Blocker


Constructor of IndexWriter lets runtime exceptions pop up, while keeping the 
writeLock obtained.

The init method in IndexWriter catches IOException only (I got a 
NegativeArraySizeException while reading a _corrupt_ index), and now there is no way to 
recover, since the writeLock will be kept obtained. Moreover, I don't have 
IndexWriter instance either, to "grab" the lock somehow, since the init() 
method is called from IndexWriter constructor.

Either broaden the catch to all exceptions, or at least provide some 
circumvention to clear up. In my case, I'd like to "fallback", just delete the 
corrupted index from disk and recreate it, but it is impossible, since the 
LOCK_HELD NativeFSLockFactory's entry about obtained WriteLock is _never_ 
cleaned out and there is no (at least apparent) way to clean it out forcibly. I can't 
create a new IndexWriter, since it will always fail with 
LockObtainFailedException.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1925) CSV Response Writer

2010-05-25 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871186#action_12871186
 ] 

Yonik Seeley commented on SOLR-1925:


This is something that some people have asked for since my CNET days...  I 
thought there was already an open issue for this, but I can't seem to find it 
(so I guess not!)

> CSV Response Writer
> ---
>
> Key: SOLR-1925
> URL: https://issues.apache.org/jira/browse/SOLR-1925
> Project: Solr
>  Issue Type: New Feature
>  Components: Response Writers
> Environment: indep. of env.
>Reporter: Chris A. Mattmann
> Fix For: 1.5
>
>
> As part of some work I'm doing, I put together a CSV Response Writer. It 
> currently takes all the docs resultant from a query and then outputs their 
> metadata in simple CSV format. The use of a delimiter is configurable (by 
> default if there are multiple values for a particular field they are 
> separated with a | symbol).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1925) CSV Response Writer

2010-05-25 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871180#action_12871180
 ] 

Chris A. Mattmann commented on SOLR-1925:
-

Hey Erik, cool!

Sure, I'd love to collaborate with you on this. Patch forthcoming, then let's 
work it...

Cheers,
Chris


> CSV Response Writer
> ---
>
> Key: SOLR-1925
> URL: https://issues.apache.org/jira/browse/SOLR-1925
> Project: Solr
>  Issue Type: New Feature
>  Components: Response Writers
> Environment: indep. of env.
>Reporter: Chris A. Mattmann
> Fix For: 1.5
>
>
> As part of some work I'm doing, I put together a CSV Response Writer. It 
> currently takes all the docs resultant from a query and then outputs their 
> metadata in simple CSV format. The use of a delimiter is configurable (by 
> default if there are multiple values for a particular field they are 
> separated with a | symbol).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1925) CSV Response Writer

2010-05-25 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871172#action_12871172
 ] 

Erik Hatcher commented on SOLR-1925:


I was _just_ thinking of writing this very thing the other day.   

I think this should use the same default delimiters and header as the CSV 
update handler does so that data is easily ingested and output in the same 
format (provided the field data is stored of course).   

> CSV Response Writer
> ---
>
> Key: SOLR-1925
> URL: https://issues.apache.org/jira/browse/SOLR-1925
> Project: Solr
>  Issue Type: New Feature
>  Components: Response Writers
> Environment: indep. of env.
>Reporter: Chris A. Mattmann
> Fix For: 1.5
>
>
> As part of some work I'm doing, I put together a CSV Response Writer. It 
> currently takes all the docs resultant from a query and then outputs their 
> metadata in simple CSV format. The use of a delimiter is configurable (by 
> default if there are multiple values for a particular field they are 
> separated with a | symbol).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-236) Field collapsing

2010-05-25 Thread Kallin Nagelberg (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871168#action_12871168
 ] 

Kallin Nagelberg commented on SOLR-236:
---

I tried asking this question on the user list, but perhaps this is a more 
appropriate forum. 

As I understand it, field collapsing has been disabled on multi-valued fields. Is 
this really necessary?

Let's say I have a multi-valued field, 'my-mv-field'. I have a query like 
(my-mv-field:1 OR my-mv-field:5) that returns docs with the following values 
for 'my-mv-field':
 
Doc1: 1, 2, 3, 
Doc2: 1, 3 
Doc3: 2, 4, 5, 6
Doc4: 1

If I collapse on that field with that query I imagine it should mean 'collect 
the docs, starting from the top, so that I find 1 and 5'. In this case if it 
returned Doc1 and Doc3 I would be happy. 

There must be some ambiguity or implementation detail I am unaware of that is 
preventing this. It may be a critical piece of functionality for an application 
I'm working on, so I'm curious whether there is a point in pursuing development of 
this functionality or whether I am missing something.

Thanks,
Kallin Nagelberg

> Field collapsing
> 
>
> Key: SOLR-236
> URL: https://issues.apache.org/jira/browse/SOLR-236
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.3
>Reporter: Emmanuel Keller
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.5
>
> Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
> collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
> collapsing-patch-to-1.3.0-ivan_3.patch, DocSetScoreCollector.java, 
> field-collapse-3.patch, field-collapse-4-with-solrj.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
> NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, 
> quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, 
> SOLR-236-trunk.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
> SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
> solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch include a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given 
> field to a single entry in the result set. Site collapsing is a special case 
> of this, where all results for a given web site is collapsed into one or two 
> entries in the result set, typically with an associated "more documents from 
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before 
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-1925) CSV Response Writer

2010-05-25 Thread Chris A. Mattmann (JIRA)
CSV Response Writer
---

 Key: SOLR-1925
 URL: https://issues.apache.org/jira/browse/SOLR-1925
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
 Environment: indep. of env.
Reporter: Chris A. Mattmann
 Fix For: 1.5


As part of some work I'm doing, I put together a CSV Response Writer. It 
currently takes all the docs resultant from a query and then outputs their 
metadata in simple CSV format. The use of a delimiter is configurable (by 
default if there are multiple values for a particular field they are separated 
with a | symbol).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Solr updateRequestHandler and performance vs. atomicity

2010-05-25 Thread Yonik Seeley
On Mon, May 24, 2010 at 9:10 AM,   wrote:
> In particular, it would be nice to be able to post documents in such a way
> that you can guarantee that the document is permanently in Solr’s queue,
> safe in the event of a Solr restart, etc., even if the document has not yet
> been “committed”.

Yep, this is a longer term goal of SolrCloud.
And to be truly safe, committing to stable storage is not enough -
that still might crash and never recover.  One needs to write to
multiple nodes.

-Yonik
http://www.lucidimagination.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1852) enablePositionIncrements="true" can cause searches to fail when they are parsed as phrase queries

2010-05-25 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871123#action_12871123
 ] 

Peter Wolanin commented on SOLR-1852:
-

I'm thinking about 1.4 backporting - not sure what's happening with 1.5

Yes, you'd have to re-index if we have to backport to 1.4, but I assume that's 
only going to affect documents that would currently have broken searches?

> enablePositionIncrements="true" can cause searches to fail when they are 
> parsed as phrase queries
> -
>
> Key: SOLR-1852
> URL: https://issues.apache.org/jira/browse/SOLR-1852
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 1.4
>Reporter: Peter Wolanin
>Assignee: Robert Muir
> Attachments: SOLR-1852.patch, SOLR-1852_testcase.patch
>
>
> Symptom: searching for a string like a domain name containing a '.', the Solr 
> 1.4 analyzer tells me that I will get a match, but when I enter the search 
> either in the client or directly in Solr, the search fails. 
> test string:  Identi.ca
> queries that fail:  IdentiCa, Identi.ca, Identi-ca
> query that matches: Identi ca
> schema in use is:
> http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1
> Screen shots:
> analysis:  http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png
> dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png
> dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png
> standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png
> Whether or not the bug appears is determined by the surrounding text:
> "would be great to have support for Identi.ca on the follow block"
> fails to match "Identi.ca", but putting the content on its own or in another 
> sentence:
> "Support Identi.ca"
> the search matches.  Testing suggests the word "for" is the problem, and it 
> looks like the bug occurs when a stop word precedes a word that is split up 
> using the word delimiter filter.
> Setting enablePositionIncrements="false" in the stop filter and reindexing 
> causes the searches to match.
> According to Mark Miller in #solr, this bug appears to be fixed already in 
> Solr trunk, either due to the upgraded lucene or changes to the 
> WordDelimiterFactory
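
To make the enablePositionIncrements effect concrete, here is a rough, self-contained sketch of just the stop-filter half of such a chain, written against the Lucene 2.9/3.0 token API; the stop word set and sample text are made up for illustration:

{code}
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public class StopGapDemo {
  public static void main(String[] args) throws Exception {
    Set<String> stopWords = new HashSet<String>(Arrays.asList("for"));
    // With enablePositionIncrements=true the removed stop word leaves a gap.
    TokenStream ts = new StopFilter(true,
        new WhitespaceTokenizer(new StringReader("support for identi ca")),
        stopWords);
    TermAttribute term = ts.addAttribute(TermAttribute.class);
    PositionIncrementAttribute posIncr =
        ts.addAttribute(PositionIncrementAttribute.class);
    while (ts.incrementToken()) {
      // Prints: support +1, identi +2, ca +1 -- the token after the removed
      // stop word carries a position increment of 2, which is the input the
      // report above says trips up the word delimiter filter's positions.
      System.out.println(term.term() + " +" + posIncr.getPositionIncrement());
    }
  }
}
{code}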

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (SOLR-1923) add caverphone to phoneticfilter

2010-05-25 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-1923.
---

Resolution: Fixed

Committed revision 948011 (trunk) / 948027 (3x)

> add caverphone to phoneticfilter
> 
>
> Key: SOLR-1923
> URL: https://issues.apache.org/jira/browse/SOLR-1923
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Affects Versions: 3.1
>Reporter: Robert Muir
>Assignee: Robert Muir
>Priority: Trivial
> Fix For: 3.1, 4.0
>
> Attachments: SOLR-1923.patch
>
>
> we upgraded commons-codec but didn't add this new one.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2455) Some house cleaning in addIndexes*

2010-05-25 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871109#action_12871109
 ] 

Shai Erera commented on LUCENE-2455:


The only place where I see firstInt used perhaps unnecessarily is in the for-loop. 
So I've changed the code to look like this:

{code}
int count, version;
if (firstInt < CompoundFileWriter.FORMAT_PRE_VERSION) {
  count = stream.readVInt();
  version = firstInt;
} else {
  count = firstInt;
  version = CompoundFileWriter.FORMAT_PRE_VERSION;
}
{code}

And then I query for version == CompoundFileWriter.FORMAT_PRE_VERSION inside 
the for-loop. Is that what you meant?

There is a check before all that, ensuring that the firstInt we read does not 
indicate index corruption -- that should remain as-is, right?

> Some house cleaning in addIndexes*
> --
>
> Key: LUCENE-2455
> URL: https://issues.apache.org/jira/browse/LUCENE-2455
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shai Erera
>Assignee: Shai Erera
>Priority: Trivial
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2455_3x.patch, LUCENE-2455_3x.patch, 
> LUCENE-2455_3x.patch, LUCENE-2455_3x.patch
>
>
> Today, the use of addIndexes and addIndexesNoOptimize is confusing - 
> especially on when to invoke each. Also, addIndexes calls optimize() in 
> the beginning, but only on the target index. It also includes the 
> following jdoc statement, which from how I understand the code, is 
> wrong: _After this completes, the index is optimized._ -- optimize() is 
> called at the beginning and not at the end. 
> On the other hand, addIndexesNoOptimize does not call optimize(), and 
> relies on the MergeScheduler and MergePolicy to handle the merges. 
> After a short discussion about that on the list (Thanks Mike for the 
> clarifications!) I understand that there are really two core differences 
> between the two: 
> * addIndexes supports IndexReader extensions
> * addIndexesNoOptimize performs better
> This issue proposes the following:
> # Clear up the documentation of each, spelling out the pros/cons of 
>   calling them clearly in the javadocs.
> # Rename addIndexesNoOptimize to addIndexes
> # Remove optimize() call from addIndexes(IndexReader...)
> # Document that clearly in both, w/ a recommendation to call optimize() 
>   beforehand on any of the Directories/Indexes if it's a concern. 
> That way, we maintain all the flexibility in the API - 
> addIndexes(IndexReader...) allows for using IR extensions, 
> addIndexes(Directory...) is considered more efficient, by allowing the 
> merges to happen concurrently (depending on MS) and also factors in the 
> MP. So unless you have an IR extension, addDirectories is really the one 
> you should be using. And you have the freedom to call optimize() before 
> each if you care about it, or don't if you don't care. Either way, 
> incurring the cost of optimize() is entirely in the user's hands. 
> BTW, addIndexes(IndexReader...) uses neither the MergeScheduler 
> nor the MergePolicy, but rather calls SegmentMerger directly. This might be 
> another place for improvement. I'll look into it, and if it's not too 
> complicated, I may cover it by this issue as well. If you have any hints 
> that can give me a good head start on that, please don't be shy :). 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Welcome Andrzej Bialecki as Lucene/Solr committer

2010-05-25 Thread Robert Muir
Welcome!

On Tue, May 25, 2010 at 8:01 AM, Andrzej Bialecki  wrote:
> On 2010-05-24 23:06, Yonik Seeley wrote:
>> On Mon, May 24, 2010 at 5:33 AM, Michael McCandless
>>  wrote:
>>> I'm happy to announce that the PMC has accepted Andrzej Bialecki as
>>> Lucene/Solr committer!
>>>
>>> Welcome aboard Andrzej,
>>
>> An enthusiastic jet lagged +1 ;-)
>
> :) Thanks everyone!
>
> Regarding the customary introduction: I've been hanging around for some
> time already ... so a few less-known tidbits are: I live in Poland with
> my wife and 2 kids, in a country house in the vicinity of Warsaw (using
> radiolink hookup if you're curious). In my free time I play the guitar
> (decently) and piano (poorly, but enthusiastically), I tend the garden
> and enjoy working with various powered tools - recently acquired a
> welding machine, you can imagine the possibilities... and I read ca. 2-3
> SF paperbacks per week. I've been a happy Lucene user since 2003,
> created the Luke tool, then became Nutch/Hadoop committer, and now I'm
> proud to become a Lucene committer.
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>



-- 
Robert Muir
rcm...@gmail.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENENET-368) Similarity.Net doesn't compile with Lucene trunk

2010-05-25 Thread Simone Chiaretta (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENENET-368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simone Chiaretta updated LUCENENET-368:
---

Attachment: similarity-net-cantcompile.patch

Attached is the patch to fix this compile error

> Similarity.Net doesn't compile with Lucene trunk
> 
>
> Key: LUCENENET-368
> URL: https://issues.apache.org/jira/browse/LUCENENET-368
> Project: Lucene.Net
>  Issue Type: Bug
>Reporter: Simone Chiaretta
>Priority: Minor
> Attachments: similarity-net-cantcompile.patch
>
>
> If you compile Similarity.Net using Lucene.net 2.9.2 (or also from trunk) you 
> get the following compile error:
> C:\Projects\lucene.net\Lucene.Net_2_9_2\contrib\Similarity.Net\Similarity.Net\Similar\MoreLikeThis.cs(500,57):
>  error CS0266: Cannot implicitly convert type 
> 'System.Collections.Generic.ICollection' to 
> 'System.Collections.ICollection'. An explicit conversion exists (are you 
> missing a cast?)
> C:\Projects\lucene.net\Lucene.Net_2_9_2\contrib\Similarity.Net\Similarity.Net\Similar\MoreLikeThis.cs(521,57):
>  error CS0266: Cannot implicitly convert type 
> 'System.Collections.Generic.ICollection' to 
> 'System.Collections.ICollection'. An explicit conversion exists (are you 
> missing a cast?)
> This is caused by IndexReader.GetFieldNames now returning a 
> System.Collections.Generic.ICollection rather than just a 
> System.Collections.ICollection as before.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (LUCENENET-368) Similarity.Net doesn't compile with Lucene trunk

2010-05-25 Thread Simone Chiaretta (JIRA)
Similarity.Net doesn't compile with Lucene trunk


 Key: LUCENENET-368
 URL: https://issues.apache.org/jira/browse/LUCENENET-368
 Project: Lucene.Net
  Issue Type: Bug
Reporter: Simone Chiaretta
Priority: Minor


If you compile Similarity.Net using Lucene.net 2.9.2 (or also from trunk) you 
get the following compile error:

C:\Projects\lucene.net\Lucene.Net_2_9_2\contrib\Similarity.Net\Similarity.Net\Similar\MoreLikeThis.cs(500,57):
 error CS0266: Cannot implicitly convert type 
'System.Collections.Generic.ICollection' to 
'System.Collections.ICollection'. An explicit conversion exists (are you 
missing a cast?)
C:\Projects\lucene.net\Lucene.Net_2_9_2\contrib\Similarity.Net\Similarity.Net\Similar\MoreLikeThis.cs(521,57):
 error CS0266: Cannot implicitly convert type 
'System.Collections.Generic.ICollection' to 
'System.Collections.ICollection'. An explicit conversion exists (are you 
missing a cast?)

This is caused by IndexReader.GetFieldNames now returning a 
System.Collections.Generic.ICollection rather than just a 
System.Collections.ICollection as before.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Welcome Andrzej Bialecki as Lucene/Solr committer

2010-05-25 Thread Andrzej Bialecki
On 2010-05-24 23:06, Yonik Seeley wrote:
> On Mon, May 24, 2010 at 5:33 AM, Michael McCandless
>  wrote:
>> I'm happy to announce that the PMC has accepted Andrzej Bialecki as
>> Lucene/Solr committer!
>>
>> Welcome aboard Andrzej,
> 
> An enthusiastic jet lagged +1 ;-)

:) Thanks everyone!

Regarding the customary introduction: I've been hanging around for some
time already ... so a few less-known tidbits are: I live in Poland with
my wife and 2 kids, in a country house in the vicinity of Warsaw (using
radiolink hookup if you're curious). In my free time I play the guitar
(decently) and piano (poorly, but enthusiastically), I tend the garden
and enjoy working with various powered tools - recently acquired a
welding machine, you can imagine the possibilities... and I read ca. 2-3
SF paperbacks per week. I've been a happy Lucene user since 2003,
created the Luke tool, then became Nutch/Hadoop committer, and now I'm
proud to become a Lucene committer.

-- 
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2380) Add FieldCache.getTermBytes, to load term data as byte[]

2010-05-25 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2380:
---

Attachment: LUCENE-2380.patch

New patch -- now all tests pass.  Getting closer... but I still have to perf 
tes...

> Add FieldCache.getTermBytes, to load term data as byte[]
> 
>
> Key: LUCENE-2380
> URL: https://issues.apache.org/jira/browse/LUCENE-2380
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2380.patch, LUCENE-2380.patch, LUCENE-2380.patch
>
>
> With flex, a term is now an opaque byte[] (typically, utf8 encoded unicode 
> string, but not necessarily), so we need to push this up the search stack.
> FieldCache now has getStrings and getStringIndex; we need corresponding 
> methods to load terms as native byte[], since in general they may not be 
> representable as String.  This should be quite a bit more RAM efficient too, 
> for US ascii content since each character would then use 1 byte not 2.
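
For context, a tiny sketch of the existing String-based loading this issue wants to complement (3.x-style API; the field name and content are made up):

{code}
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.store.RAMDirectory;

public class FieldCacheStringsDemo {
  public static void main(String[] args) throws Exception {
    RAMDirectory dir = new RAMDirectory();
    IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(),
        true, IndexWriter.MaxFieldLength.UNLIMITED);
    Document doc = new Document();
    doc.add(new Field("title", "lucene", Field.Store.NO, Field.Index.NOT_ANALYZED));
    writer.addDocument(doc);
    writer.close();

    IndexReader reader = IndexReader.open(dir);
    // Today's API: one java.lang.String per document (2 bytes per char even
    // for pure-ASCII terms), which is the RAM cost a byte[] variant avoids.
    String[] titles = FieldCache.DEFAULT.getStrings(reader, "title");
    System.out.println(titles[0]);
    reader.close();
  }
}
{code}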

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2455) Some house cleaning in addIndexes*

2010-05-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871075#action_12871075
 ] 

Michael McCandless commented on LUCENE-2455:


Could you fix "firstInt' to have a very short life?

Meaning, you read firstInt, and very quickly use that to assign to version & 
count, and no longer use it again.  Ie, all subsequent checks when loading 
should be against version, not firstInt...

Also, can you maybe rename CFW.PRE_VERSION -> CFW.FORMAT_PRE_VERSION?  (to 
match the other FORMAT_X).

Otherwise looks great!

> Some house cleaning in addIndexes*
> --
>
> Key: LUCENE-2455
> URL: https://issues.apache.org/jira/browse/LUCENE-2455
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shai Erera
>Assignee: Shai Erera
>Priority: Trivial
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2455_3x.patch, LUCENE-2455_3x.patch, 
> LUCENE-2455_3x.patch, LUCENE-2455_3x.patch
>
>
> Today, the use of addIndexes and addIndexesNoOptimize is confusing - 
> especially on when to invoke each. Also, addIndexes calls optimize() in 
> the beginning, but only on the target index. It also includes the 
> following jdoc statement, which from how I understand the code, is 
> wrong: _After this completes, the index is optimized._ -- optimize() is 
> called at the beginning and not at the end. 
> On the other hand, addIndexesNoOptimize does not call optimize(), and 
> relies on the MergeScheduler and MergePolicy to handle the merges. 
> After a short discussion about that on the list (Thanks Mike for the 
> clarifications!) I understand that there are really two core differences 
> between the two: 
> * addIndexes supports IndexReader extensions
> * addIndexesNoOptimize performs better
> This issue proposes the following:
> # Clear up the documentation of each, spelling out the pros/cons of 
>   calling them clearly in the javadocs.
> # Rename addIndexesNoOptimize to addIndexes
> # Remove optimize() call from addIndexes(IndexReader...)
> # Document that clearly in both, w/ a recommendation to call optimize() 
>   beforehand on any of the Directories/Indexes if it's a concern. 
> That way, we maintain all the flexibility in the API - 
> addIndexes(IndexReader...) allows for using IR extensions, 
> addIndexes(Directory...) is considered more efficient, by allowing the 
> merges to happen concurrently (depending on MS) and also factors in the 
> MP. So unless you have an IR extension, addDirectories is really the one 
> you should be using. And you have the freedom to call optimize() before 
> each if you care about it, or don't if you don't care. Either way, 
> incurring the cost of optimize() is entirely in the user's hands. 
> BTW, addIndexes(IndexReader...) uses neither the MergeScheduler 
> nor the MergePolicy, but rather calls SegmentMerger directly. This might be 
> another place for improvement. I'll look into it, and if it's not too 
> complicated, I may cover it by this issue as well. If you have any hints 
> that can give me a good head start on that, please don't be shy :). 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: Solr updateRequestHandler and performance vs. atomicity

2010-05-25 Thread karl.wright
I created SOLR-1924.  Let me know if it's clear enough, or if you'd like me to 
modify the ticket in any way.
Thanks,
Karl

From: ext Mark Miller [markrmil...@gmail.com]
Sent: Tuesday, May 25, 2010 5:20 AM
To: dev@lucene.apache.org
Subject: Re: Solr updateRequestHandler and performance vs. atomicity

Okay, makes sense.

Perhaps one easier way to explore this is the aux index idea, but only
use stored fields - that gets us Lucene's commit stuff for free,
without analysis.  Then there is just the more difficult part of
ensuring transfer from this mini index to the main index for indexing
on commit.

I'd def open a jira issue for this functionality. You will still pay
for committing so often (frequent fsync is costly, especially on some
fs) but I'm sure you can pay a lot less than currently.

On Tuesday, May 25, 2010,   wrote:
> The reason for this is simple.  LCF keeps track of which documents it has 
> handed off to Solr, and has a fairly involved mechanism for making sure that 
> every document LCF *thinks* got there, actually does.  It even uses a 
> mechanism akin to a 2-phase commit to make sure that its internal records and 
> those of the downstream index are never out of synch.
>
> Now, along comes Solr, and the system loses a good deal of its resilience, 
> because there is a chance that somebody or something will kick Solr after a 
> document (or a set of documents) has been transmitted to it, but LCF will 
> have no awareness of this situation at all, and will thus never try to fix 
> the problem on the next job run (or whatever).  So instead of automatic 
> resilience, you get one of two possible solutions:
>
> (1) Manual intervention.  Somebody has to manually inform LCF of the Solr 
> hiccup, and LCF thus will have to invalidate all documents it ever sent to 
> Solr (because it doesn't know what documents could have been affected).
> (2) A solr commit on every post.  This slows down LCF significantly, because 
> each document post takes something like 10x as long to do.
>
> Does this help?
> Karl
>
> -Original Message-
> From: ext Mark Miller [mailto:markrmil...@gmail.com]
> Sent: Monday, May 24, 2010 4:40 PM
> To: dev@lucene.apache.org
> Subject: Re: Solr updateRequestHandler and performance vs. atomicity
>
> Indexing a doc won't be as fast as raw disk IO. But you won't be doing
> just raw disk IO to guarantee acceptance. And that will have a cost and
> complexity that really makes me wonder if it's worth the speed advantage.
> For very large documents with complex analyzers...perhaps. But it's not
> going to be an easily implementable feature (if it's a true guarantee).
> And it's still got to involve logs and/or fsync and all that.
>
> The reasoning for this is not ringing a bell - can you elaborate on the
> motivations?
>
> Is this so that you can commit on every doc? Every few docs?
>
> I can def see how this would be desirable in general, but just to be
> clear on your motivations.
>
>
> - Mark
>
> On 5/24/10 10:03 PM, karl.wri...@nokia.com wrote:
>> Hi Mark,
>>
>> Unfortunately, indexing performance *is* of concern, otherwise I'd already 
>> be committing on every post.
>>
>> If your guess is correct, you are basically saying that adding a document to 
>> an index in Solr/Lucene is just as fast as writing that file directly to the 
>> disk.  Because, obviously, if we want guaranteed delivery, that's what we'd 
>> have to do.  But I think this is worth the experiment - Solr/Lucene may be 
>> fast, but I have doubts that it can perform as well as raw disk I/O and 
>> still manage to do anything in the way of document analysis or (heaven 
>> forbid) text extraction.
>>
>>
>>
>> -Original Message-
>> From: ext Mark Miller [mailto:markrmil...@gmail.com]
>> Sent: Monday, May 24, 2010 3:33 PM
>> To: dev@lucene.apache.org
>> Subject: Re: Solr updateRequestHandler and performance vs. atomicity
>>
>> On 5/24/10 3:10 PM, karl.wri...@nokia.com wrote:
>>> Hi all,
>>> It seems to me that the "commit" logic in the Solr updateRequestHandler
>>> (or wherever the logic is actually located) conflates two different
>>> semantics. One semantic is what you need to do to make the index process
>>> perform well. The other semantic is guaranteed atomicity of document
>>> reception by Solr.
>>> In particular, it would be nice to be able to post documents in such a
>>> way that you can guarantee that the document is permanently in Solr's
>>> queue, safe in the event of a Solr restart, etc., even if the document
>>> has not yet been "committed".
>>> This issue came up in the LCF talk that I gave, and I initially thought
>>> that separating the two kinds of events would necessarily be an LCF
>>> change, but the more I thought about it the more I realized that other
>>> Solr indexing clients may also benefit from such a separation.
>>> Does anyone agree? Where should this logic properly live?
>>> Thanks,
>>> Karl
>>
> It's an interesting idea - but I think you would 

[jira] Created: (SOLR-1924) Solr's updateRequestHandler does not have a fast way of guaranteeing document delivery

2010-05-25 Thread Karl Wright (JIRA)
Solr's updateRequestHandler does not have a fast way of guaranteeing document 
delivery
--

 Key: SOLR-1924
 URL: https://issues.apache.org/jira/browse/SOLR-1924
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4
Reporter: Karl Wright


It is currently not possible, without performing a commit on every document, to 
use updateRequestHandler to guarantee delivery into the index of any document.  
The reason is that whenever Solr is restarted, some or all documents that have 
not been committed yet are dropped on the floor, and there is no way for a 
client of updateRequestHandler to know which ones this happened to.

I believe it is not even possible to write a middleware-style layer that stores 
documents and performs periodic commits on its own, because the update request 
handler never ACKs individual documents on a commit, but merely everything it 
has seen since the last time Solr bounced.  So you have this potential scenario:

- middleware layer receives document 1, saves it
- middleware layer receives document 2, saves it
Now it's time for the commit, so:
- middleware layer sends document 1 to updateRequestHandler
- solr is restarted, dropping all uncommitted documents on the floor
- middleware layer sends document 2 to updateRequestHandler
- middleware layer sends COMMIT to updateRequestHandler, but solr adds only 
document 2 to the index
- middleware believes incorrectly that it has successfully committed both 
documents

An ideal solution would be for Solr to separate the semantics of commit (the 
index building variety) from the semantics of commit (the 'I got the document' 
variety).  Perhaps this will involve a persistent document queue that will 
persist over a Solr restart.

An alternative mechanism might be for updateRequestHandler to acknowledge 
specifically committed documents in its response to an explicit commit.  But 
this would make it difficult or impossible to use autocommit usefully in such 
situations.  The only other alternative is to require clients that need 
guaranteed delivery to commit on every document, with a considerable 
performance penalty.
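
For illustration, a rough SolrJ sketch of that costly commit-per-document alternative (the URL and field names are hypothetical):

{code}
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CommitPerDocument {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    for (int i = 0; i < 3; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-" + i);
      doc.addField("text", "example body " + i);
      server.add(doc);
      // Committing after every add is currently the only way to know the
      // document survived a restart, but it makes each post far slower.
      server.commit();
    }
  }
}
{code}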

This ticket is related to LCF in that LCF is one of the clients that really 
needs some kind of guaranteed delivery mechanism.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: Solr updateRequestHandler and performance vs. atomicity

2010-05-25 Thread karl.wright
Hi Simon,

I think you are on the right track.

I believe it is not even possible to write a middleware-style layer that stores 
documents and performs periodic commits on its own, because the update request 
handler never ACKs individual documents on a commit, but merely everything it 
has seen since the last time Solr bounced.  So you have this potential scenario:

- middleware layer receives document 1, saves it
- middleware layer receives document 2, saves it
Now it's time for the commit, so:
- middleware layer sends document 1 to updateRequestHandler
- solr is restarted, dropping all uncommitted documents on the floor
- middleware layer sends document 2 to updateRequestHandler
- middleware layer sends COMMIT to updateRequestHandler, but solr adds only 
document 2 to the index
- middleware believes incorrectly that it has successfully committed both 
documents

If I were any kind of mathematician, I suspect I could even prove that the 
current API has this inherent race condition built into its semantics.

I never claimed this was going to be easy :-).  But it does seem to be 
valuable, perhaps critically so.

Karl


From: ext Simon Willnauer [simon.willna...@googlemail.com]
Sent: Monday, May 24, 2010 4:29 PM
To: dev@lucene.apache.org
Subject: Re: Solr updateRequestHandler and performance vs. atomicity

Hi Karl,

what you are describing seems to be a good use case for something like
a message queue, where you push a document or record to a queue which
guarantees the queue's persistence. I look at this from a slightly
different perspective: in a distributed environment you would have to
guarantee delivery not just to a single Solr instance but to several, or at
least n, instances - but that is a different story.

From a Solr point of view this sounds like a need for a write-ahead
log that guarantees durability and atomicity. I like this idea as it
might also solve lots of problems in distributed environments (solr
cloud) etc.

Very interesting topic - should investigate more in this direction


simon


On Mon, May 24, 2010 at 10:03 PM,   wrote:
> Hi Mark,
>
> Unfortunately, indexing performance *is* of concern, otherwise I'd already be 
> committing on every post.
>
> If your guess is correct, you are basically saying that adding a document to 
> an index in Solr/Lucene is just as fast as writing that file directly to the 
> disk.  Because, obviously, if we want guaranteed delivery, that's what we'd 
> have to do.  But I think this is worth the experiment - Solr/Lucene may be 
> fast, but I have doubts that it can perform as well as raw disk I/O and still 
> manage to do anything in the way of document analysis or (heaven forbid) text 
> extraction.
>
>
>
> -Original Message-
> From: ext Mark Miller [mailto:markrmil...@gmail.com]
> Sent: Monday, May 24, 2010 3:33 PM
> To: dev@lucene.apache.org
> Subject: Re: Solr updateRequestHandler and performance vs. atomicity
>
> On 5/24/10 3:10 PM, karl.wri...@nokia.com wrote:
>> Hi all,
>> It seems to me that the "commit" logic in the Solr updateRequestHandler
>> (or wherever the logic is actually located) conflates two different
>> semantics. One semantic is what you need to do to make the index process
>> perform well. The other semantic is guaranteed atomicity of document
>> reception by Solr.
>> In particular, it would be nice to be able to post documents in such a
>> way that you can guarantee that the document is permanently in Solr's
>> queue, safe in the event of a Solr restart, etc., even if the document
>> has not yet been "committed".
>> This issue came up in the LCF talk that I gave, and I initially thought
>> that separating the two kinds of events would necessarily be an LCF
>> change, but the more I thought about it the more I realized that other
>> Solr indexing clients may also benefit from such a separation.
>> Does anyone agree? Where should this logic properly live?
>> Thanks,
>> Karl
>
> It's an interesting idea - but I think you would likely pay a similar
> cost to guarantee reception as you would to commit (also, I'm not sure
> Lucene guarantees it - it works for consistency, but I'm not so sure it
> achieves durability).
>
> I can think of two things offhand -
>
> Perhaps store the text and use fsync to quasi guarantee acceptance -
> then index from the store on the commit.
>
> Another simpler idea if only the separation is important and not the
> performance - index to another side index, taking advantage of Lucene's
> current commit functionality, and then use addIndex to merge to the main
> index on commit.
>
> Just spit balling though.
>
> I think this would obviously need to be an optional mode.
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>
> --

Re: Solr updateRequestHandler and performance vs. atomicity

2010-05-25 Thread Mark Miller
Okay, makes sense.

Perhaps one easier way to explore this is the aux index idea, but only
use stored fields - that gets us Lucene's commit stuff for free,
without analysis.  Then there is just the more difficult part of
ensuring transfer from this mini index to the main index for indexing
on commit.
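
A very rough sketch of that side-index idea against the 3.x API (paths,
analyzer and field names are made up, create=true is only for the toy example,
and clearing the side index after the transfer is left out):

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class SideIndexSketch {
  public static void main(String[] args) throws Exception {
    Directory side = FSDirectory.open(new File("/path/to/side-index"));
    Directory main = FSDirectory.open(new File("/path/to/main-index"));

    // 1) Accept a document: store the raw field (no analysis) in the side
    //    index and commit, so it survives a restart.
    IndexWriter sideWriter = new IndexWriter(side,
        new StandardAnalyzer(Version.LUCENE_30), true,
        IndexWriter.MaxFieldLength.UNLIMITED);
    Document raw = new Document();
    raw.add(new Field("body", "raw document text",
        Field.Store.YES, Field.Index.NO));
    sideWriter.addDocument(raw);
    sideWriter.commit();
    sideWriter.close();

    // 2) On the "real" commit: re-read the stored fields and index them
    //    properly into the main index.
    IndexWriter mainWriter = new IndexWriter(main,
        new StandardAnalyzer(Version.LUCENE_30), true,
        IndexWriter.MaxFieldLength.UNLIMITED);
    IndexReader sideReader = IndexReader.open(side);
    for (int i = 0; i < sideReader.maxDoc(); i++) {
      Document stored = sideReader.document(i);
      Document indexed = new Document();
      indexed.add(new Field("body", stored.get("body"),
          Field.Store.YES, Field.Index.ANALYZED));
      mainWriter.addDocument(indexed);
    }
    sideReader.close();
    mainWriter.commit();
    mainWriter.close();
  }
}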

I'd def open a jira issue for this functionality. You will still pay
for committing so often (frequent fsync is costly, especially on some
fs) but I'm sure you can pay a lot less than currently.

On Tuesday, May 25, 2010,   wrote:
> The reason for this is simple.  LCF keeps track of which documents it has 
> handed off to Solr, and has a fairly involved mechanism for making sure that 
> every document LCF *thinks* got there, actually does.  It even uses a 
> mechanism akin to a 2-phase commit to make sure that its internal records and 
> those of the downstream index are never out of synch.
>
> Now, along comes Solr, and the system loses a good deal of its resilience, 
> because there is a chance that somebody or something will kick Solr after a 
> document (or a set of documents) has been transmitted to it, but LCF will 
> have no awareness of this situation at all, and will thus never try to fix 
> the problem on the next job run (or whatever).  So instead of automatic 
> resilience, you get one of two possible solutions:
>
> (1) Manual intervention.  Somebody has to manually inform LCF of the Solr 
> hiccup, and LCF thus will have to invalidate all documents it ever sent to 
> Solr (because it doesn't know what documents could have been affected).
> (2) A solr commit on every post.  This slows down LCF significantly, because 
> each document post takes something like 10x as long to do.
>
> Does this help?
> Karl
>
> -Original Message-
> From: ext Mark Miller [mailto:markrmil...@gmail.com]
> Sent: Monday, May 24, 2010 4:40 PM
> To: dev@lucene.apache.org
> Subject: Re: Solr updateRequestHandler and performance vs. atomicity
>
> Indexing a doc won't be as fast as raw disk IO. But you won't be doing
> just raw disk IO to guarantee acceptance. And that will have a cost and
> complexity that really makes me wonder if it's worth the speed advantage.
> For very large documents with complex analyzers...perhaps. But it's not
> going to be an easily implementable feature (if it's a true guarantee).
> And it's still got to involve logs and/or fsync and all that.
>
> The reasoning for this is not ringing a bell - can you elaborate on the
> motivations?
>
> Is this so that you can commit on every doc? Every few docs?
>
> I can def see how this would be desirable in general, but just to be
> clear on your motivations.
>
>
> - Mark
>
> On 5/24/10 10:03 PM, karl.wri...@nokia.com wrote:
>> Hi Mark,
>>
>> Unfortunately, indexing performance *is* of concern, otherwise I'd already 
>> be committing on every post.
>>
>> If your guess is correct, you are basically saying that adding a document to 
>> an index in Solr/Lucene is just as fast as writing that file directly to the 
>> disk.  Because, obviously, if we want guaranteed delivery, that's what we'd 
>> have to do.  But I think this is worth the experiment - Solr/Lucene may be 
>> fast, but I have doubts that it can perform as well as raw disk I/O and 
>> still manage to do anything in the way of document analysis or (heaven 
>> forbid) text extraction.
>>
>>
>>
>> -Original Message-
>> From: ext Mark Miller [mailto:markrmil...@gmail.com]
>> Sent: Monday, May 24, 2010 3:33 PM
>> To: dev@lucene.apache.org
>> Subject: Re: Solr updateRequestHandler and performance vs. atomicity
>>
>> On 5/24/10 3:10 PM, karl.wri...@nokia.com wrote:
>>> Hi all,
>>> It seems to me that the "commit" logic in the Solr updateRequestHandler
>>> (or wherever the logic is actually located) conflates two different
>>> semantics. One semantic is what you need to do to make the index process
>>> perform well. The other semantic is guaranteed atomicity of document
>>> reception by Solr.
>>> In particular, it would be nice to be able to post documents in such a
>>> way that you can guarantee that the document is permanently in Solr's
>>> queue, safe in the event of a Solr restart, etc., even if the document
>>> has not yet been "committed".
>>> This issue came up in the LCF talk that I gave, and I initially thought
>>> that separating the two kinds of events would necessarily be an LCF
>>> change, but the more I thought about it the more I realized that other
>>> Solr indexing clients may also benefit from such a separation.
>>> Does anyone agree? Where should this logic properly live?
>>> Thanks,
>>> Karl
>>
> It's an interesting idea - but I think you would likely pay a similar
>> cost to guarantee reception as you would to commit (also, I'm not sure
>> Lucene guarantees it - it works for consistency, but I'm not so sure it
>> achieves durability).
>>
>> I can think of two things offhand -
>>
>> Perhaps store the text and use fsync to quasi guarantee acceptance -
>> then index from the store o

RE: NPE Within IndexWriter.optimize (Solr Trunk Nightly)

2010-05-25 Thread Uwe Schindler
Maybe it's the 3x version?

The artifact names in Hudson are currently identical for solr-trunk and
solr-3x. You have to specify which version you use!

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Tuesday, May 25, 2010 11:01 AM
> To: dev@lucene.apache.org
> Subject: Re: NPE Within IndexWriter.optimize (Solr Trunk Nightly)
> 
> Hmmm spooky.
> 
> For some reason I can't correlate the line numbers in TermInfosReader.java
> with the current trunk sources; the line numbers for all the other sources line
> up.
> This is a stock nightly build right?
> You don't have any mods/patches?
> 
> Can you enable assertions when you optimize and see if anything trips?
> 
> Also, can you run CheckIndex on your index (java -ea
> org.apache.lucene.index.CheckIndex /path/to/index), and post the output?
> 
> Mike
> 
> On Mon, May 24, 2010 at 7:43 PM, Chris Herron  wrote:
> > Hi,
> >
> > I'm using the latest nightly build of solr
(apache-solr-2010-05-24_08-05-13)
> and am repeatedly experiencing a NullPointerException after calling
delete,
> commit, optimize. Stack trace below. The index is ~20Gb.
> >
> > I'm not doing Lucene/Solr core development - I just figured this was a
> better place to ask given that this was a nightly build.
> >
> > Any observations that would help resolve?
> >
> > Thanks,
> >
> > Chris
> >
> > SEVERE: java.io.IOException: background merge hit exception: _gr5a:C127
> > _gsbj:C486/3 _gsbk:C1 _gsbl:C1/1 _gsbm:C1 _gsbn:C1 _gsbo:C1 _gsbp:C1 _gsbq:C1
> > _gssn:C69 into _gsss [optimize] [mergeDocStores]
> >        at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2418)
> >        at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2343)
> >        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:403)
> >        at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
> >        at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:107)
> >        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:48)
> >        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> >        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1321)
> >        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341)
> >        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
> >        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1190)
> >        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:424)
> >        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
> >        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:457)
> >        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229)
> >        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:931)
> >        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:361)
> >        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:186)
> >        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:867)
> >        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
> >        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:245)
> >        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
> >        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:113)
> >        at org.eclipse.jetty.server.Server.handle(Server.java:337)
> >        at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:581)
> >        at org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:1005)
> >        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:560)
> >        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:222)
> >        at org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:417)
> >        at org.eclipse.jetty.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:474)
> >        at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:437)
> >        at java.lang.Thread.run(Thread.java:619)
> > Caused by: java.lang.NullPointerException
> >        at org.apache.lucene.index.codecs.p

Re: NPE Within IndexWriter.optimize (Solr Trunk Nightly)

2010-05-25 Thread Michael McCandless
Hmmm spooky.

For some reason I can't correlate the line numbers in
TermInfosReader.java with the current trunk sources; the line numbers for
all the other sources line up.  This is a stock nightly build, right?
You don't have any mods/patches?

Can you enable assertions when you optimize and see if anything trips?

Also, can you run CheckIndex on your index (java -ea
org.apache.lucene.index.CheckIndex /path/to/index), and post the
output?
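
The same check can also be run programmatically; a rough sketch against the
CheckIndex API of that era (the index path is a placeholder, and the method
names are worth double-checking against the exact build in use):

    import java.io.File;
    import org.apache.lucene.index.CheckIndex;
    import org.apache.lucene.store.FSDirectory;

    public class CheckMyIndex {
      public static void main(String[] args) throws Exception {
        // Open the Solr data/index directory and inspect every segment.
        FSDirectory dir = FSDirectory.open(new File("/path/to/index"));
        try {
          CheckIndex checker = new CheckIndex(dir);
          checker.setInfoStream(System.out);          // print per-segment diagnostics
          CheckIndex.Status status = checker.checkIndex();
          System.out.println(status.clean ? "index is clean" : "index has problems");
        } finally {
          dir.close();
        }
      }
    }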

Mike

On Mon, May 24, 2010 at 7:43 PM, Chris Herron  wrote:
> Hi,
>
> I'm using the latest nightly build of solr (apache-solr-2010-05-24_08-05-13) 
> and am repeatedly experiencing a NullPointerException after calling delete, 
> commit, optimize. Stack trace below. The index is ~20Gb.
>
> I'm not doing Lucene/Solr core development - I just figured this was a better 
> place to ask given that this was a nightly build.
>
> Any observations that would help resolve?
>
> Thanks,
>
> Chris
>
> SEVERE: java.io.IOException: background merge hit exception: _gr5a:C127 
> _gsbj:C486/3 _gsbk:C1 _gsbl:C1/1 _gsbm:C1 _gsbn:C1 _gsbo:C1 _gsbp:C1 _gsbq:C1 
> _gssn:C69 into _gsss [optimize] [mergeDocStores]
>        at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2418)
>        at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2343)
>        at 
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:403)
>        at 
> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
>        at 
> org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:107)
>        at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:48)
>        at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1321)
>        at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341)
>        at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
>        at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1190)
>        at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:424)
>        at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
>        at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:457)
>        at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229)
>        at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:931)
>        at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:361)
>        at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:186)
>        at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:867)
>        at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
>        at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:245)
>        at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
>        at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:113)
>        at org.eclipse.jetty.server.Server.handle(Server.java:337)
>        at 
> org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:581)
>        at 
> org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:1005)
>        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:560)
>        at 
> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:222)
>        at 
> org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:417)
>        at 
> org.eclipse.jetty.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:474)
>        at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:437)
>        at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.NullPointerException
>        at 
> org.apache.lucene.index.codecs.preflex.TermInfosReader.seekEnum(TermInfosReader.java:224)
>        at 
> org.apache.lucene.index.codecs.preflex.TermInfosReader.seekEnum(TermInfosReader.java:214)
>        at 
> org.apache.lucene.index.codecs.preflex.PreFlexFields$PreTermsEnum.reset(PreFlexFields.java:251)
>        at 
> org.apache.lucene.index.codecs.preflex.PreFlexFields$PreFlexFieldsEnum.terms(PreFlexFields.java:198)
>        at 
> org.apache.lucene.index.MultiFieldsEnum.terms(MultiFieldsEnum.java:103)
>        at 
> org.apache.lucene.index.codecs.FieldsConsumer.merge(FieldsConsumer.java:48)
>        at 
> org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:647)
>        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:151)
>        at 
> org.apache.lucene.index.IndexWriter.mergeMiddl