[jira] [Updated] (LUCENE-3419) Resolve JUnit assert deprecations

2011-09-06 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-3419:
---

Attachment: LUCENE-3419.patch

Patch which adds appropriate epsilons to the float and double assertions and 
converts array assertions to assertArrayEquals.  

Everything passes.

Once this is committed, I want to nuke the deprecated assert* methods from 
LuceneTestCase, as they're no longer used.
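For context, the deprecated overloads compare floating-point values exactly, which is fragile under rounding; a small standalone sketch (plain Java, no JUnit dependency) of why the epsilon overloads and assertArrayEquals exist:

```java
import java.util.Arrays;

// Why JUnit deprecated assertEquals(double, double) without a delta:
// binary floating point makes exact equality unreliable, so the
// replacement overload takes an explicit epsilon.
public class EpsilonDemo {
    public static void main(String[] args) {
        double expected = 0.3;
        double actual = 0.1 + 0.2;  // actually 0.30000000000000004

        System.out.println(expected == actual);                  // false: exact comparison fails
        System.out.println(Math.abs(expected - actual) <= 1e-9); // true: epsilon comparison passes

        // assertArrayEquals compares arrays element by element,
        // the same contract as Arrays.equals:
        System.out.println(Arrays.equals(new int[]{1, 2, 3}, new int[]{1, 2, 3})); // true
    }
}
```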

> Resolve JUnit assert deprecations
> ---------------------------------
>
> Key: LUCENE-3419
> URL: https://issues.apache.org/jira/browse/LUCENE-3419
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Chris Male
>Priority: Minor
> Attachments: LUCENE-3419.patch
>
>
> Many tests use assertEquals methods which have been deprecated.  The culprits 
> are assertEquals(float, float), assertEquals(double, double) and 
> assertEquals(Object[], Object[]).  Although not a big issue, they annoy me 
> every time I see them so I'm going to fix them.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene

2011-09-06 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2959:


Attachment: LUCENE-2959_nocommits.patch

Patch removing all nocommits.

For the fake IDF/phrase issue, I thought it best not to "fake" statistics to 
SimilarityBase, since the whole point is to make it simpler for 
implementing/testing ranking models.

Instead it sums scores across terms (kinda like boolean query).

For DFR P and D, I don't think there are really any great practical ways out of 
the fundamental problem. I added notes to both of these.

I think the workaround for Dirichlet is fine; I looked around and found another 
implementation of this smoothing by Hiemstra and it had the same workaround 
(http://mirex.sourceforge.net, 
trec.nist.gov/pubs/trec19/papers/univ.twente.web.rev.pdf).

All the other similarities seem to work fine being randomly swapped into 
Lucene's tests.

> [GSoC] Implementing State of the Art Ranking for Lucene
> -------------------------------------------------------
>
> Key: LUCENE-2959
> URL: https://issues.apache.org/jira/browse/LUCENE-2959
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/query/scoring, general/javadocs, modules/examples
>Reporter: David Mark Nemeskey
>Assignee: Robert Muir
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: flexscoring branch
>
> Attachments: LUCENE-2959_mockdfr.patch, LUCENE-2959_nocommits.patch, 
> implementation_plan.pdf, proposal.pdf
>
>
> Lucene employs the Vector Space Model (VSM) to rank documents, which compares
> unfavorably to state of the art algorithms, such as BM25. Moreover, the
> architecture is tailored specifically to VSM, which makes the addition of new
> ranking functions a non-trivial task.
> This project aims to bring state of the art ranking methods to Lucene and to
> implement a query architecture with pluggable ranking functions.
> The wiki page for the project can be found at 
> http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking.




[jira] [Created] (LUCENE-3419) Resolve JUnit assert deprecations

2011-09-06 Thread Chris Male (JIRA)
Resolve JUnit assert deprecations
---------------------------------

 Key: LUCENE-3419
 URL: https://issues.apache.org/jira/browse/LUCENE-3419
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Chris Male
Priority: Minor


Many tests use assertEquals methods which have been deprecated.  The culprits 
are assertEquals(float, float), assertEquals(double, double) and 
assertEquals(Object[], Object[]).  Although not a big issue, they annoy me 
every time I see them so I'm going to fix them.




[jira] [Resolved] (LUCENE-3410) Make WordDelimiterFilter's instantiation more readable

2011-09-06 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male resolved LUCENE-3410.


   Resolution: Fixed
Fix Version/s: 4.0
 Assignee: Chris Male

Committed revision 1165995.

> Make WordDelimiterFilter's instantiation more readable
> ------------------------------------------------------
>
> Key: LUCENE-3410
> URL: https://issues.apache.org/jira/browse/LUCENE-3410
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Chris Male
>Assignee: Chris Male
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-3410.patch, LUCENE-3410.patch, LUCENE-3410.patch
>
>
> Currently WordDelimiterFilter's constructor is:
> {code}
> public WordDelimiterFilter(TokenStream in,
>byte[] charTypeTable,
>int generateWordParts,
>int generateNumberParts,
>int catenateWords,
>int catenateNumbers,
>int catenateAll,
>int splitOnCaseChange,
>int preserveOriginal,
>int splitOnNumerics,
>int stemEnglishPossessive,
>CharArraySet protWords) {
> {code}
> which means its instantiation is an unreadable combination of 1s and 0s.  
> We should improve this by either using a Builder, 'int flags' or an EnumSet.
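As a rough illustration of the 'int flags' option mentioned above (the constant names here are hypothetical, not the actual Lucene API), the call site becomes self-describing instead of a run of 1s and 0s:

```java
// Hypothetical sketch of the 'int flags' approach: each option is a
// bit, and the constructor would take a single OR-ed flags value.
public class FlagsSketch {
    static final int GENERATE_WORD_PARTS   = 1 << 0;
    static final int GENERATE_NUMBER_PARTS = 1 << 1;
    static final int CATENATE_WORDS        = 1 << 2;
    static final int SPLIT_ON_CASE_CHANGE  = 1 << 3;

    static boolean has(int flags, int flag) {
        return (flags & flag) != 0;
    }

    public static void main(String[] args) {
        // Readable at the call site, unlike eleven positional ints:
        int flags = GENERATE_WORD_PARTS | CATENATE_WORDS | SPLIT_ON_CASE_CHANGE;

        System.out.println(has(flags, GENERATE_WORD_PARTS));   // true
        System.out.println(has(flags, GENERATE_NUMBER_PARTS)); // false
    }
}
```

A Builder or EnumSet would achieve the same readability; the bit-flag version is simply the cheapest to pass around.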




[jira] [Updated] (LUCENE-3410) Make WordDelimiterFilter's instantiation more readable

2011-09-06 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-3410:
---

Attachment: LUCENE-3410.patch

Better patch.

> Make WordDelimiterFilter's instantiation more readable
> ------------------------------------------------------
>
> Key: LUCENE-3410
> URL: https://issues.apache.org/jira/browse/LUCENE-3410
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Chris Male
>Priority: Minor
> Attachments: LUCENE-3410.patch, LUCENE-3410.patch, LUCENE-3410.patch
>
>
> Currently WordDelimiterFilter's constructor is:
> {code}
> public WordDelimiterFilter(TokenStream in,
>byte[] charTypeTable,
>int generateWordParts,
>int generateNumberParts,
>int catenateWords,
>int catenateNumbers,
>int catenateAll,
>int splitOnCaseChange,
>int preserveOriginal,
>int splitOnNumerics,
>int stemEnglishPossessive,
>CharArraySet protWords) {
> {code}
> which means its instantiation is an unreadable combination of 1s and 0s.  
> We should improve this by either using a Builder, 'int flags' or an EnumSet.




[jira] [Updated] (LUCENE-3410) Make WordDelimiterFilter's instantiation more readable

2011-09-06 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-3410:
---

Attachment: LUCENE-3410.patch

Patch with the Iterator back to using booleans.  Going to commit.

> Make WordDelimiterFilter's instantiation more readable
> ------------------------------------------------------
>
> Key: LUCENE-3410
> URL: https://issues.apache.org/jira/browse/LUCENE-3410
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Chris Male
>Priority: Minor
> Attachments: LUCENE-3410.patch, LUCENE-3410.patch
>
>
> Currently WordDelimiterFilter's constructor is:
> {code}
> public WordDelimiterFilter(TokenStream in,
>byte[] charTypeTable,
>int generateWordParts,
>int generateNumberParts,
>int catenateWords,
>int catenateNumbers,
>int catenateAll,
>int splitOnCaseChange,
>int preserveOriginal,
>int splitOnNumerics,
>int stemEnglishPossessive,
>CharArraySet protWords) {
> {code}
> which means its instantiation is an unreadable combination of 1s and 0s.  
> We should improve this by either using a Builder, 'int flags' or an EnumSet.




[jira] [Updated] (LUCENE-2308) Separately specify a field's type

2011-09-06 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-2308:
---

Attachment: LUCENE-2308-FT-interface.patch

Patch updated following Yonik's advice.  I've removed the freeze() calls from 
Field so that it can now accept a FieldType instance.  If freezing is 
important, it's up to the creator of the CoreFieldType.

> Separately specify a field's type
> -
>
> Key: LUCENE-2308
> URL: https://issues.apache.org/jira/browse/LUCENE-2308
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, 
> LUCENE-2308-12.patch, LUCENE-2308-13.patch, LUCENE-2308-14.patch, 
> LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch, 
> LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, 
> LUCENE-2308-20.patch, LUCENE-2308-21.patch, LUCENE-2308-3.patch, 
> LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, 
> LUCENE-2308-7.patch, LUCENE-2308-8.patch, LUCENE-2308-9.patch, 
> LUCENE-2308-FT-interface.patch, LUCENE-2308-FT-interface.patch, 
> LUCENE-2308-branch.patch, LUCENE-2308-final.patch, LUCENE-2308-ltc.patch, 
> LUCENE-2308-merge-1.patch, LUCENE-2308-merge-2.patch, 
> LUCENE-2308-merge-3.patch, LUCENE-2308.branchdiffs, 
> LUCENE-2308.branchdiffs.moved, LUCENE-2308.patch, LUCENE-2308.patch, 
> LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch
>
>
> This came up from dicussions on IRC.  I'm summarizing here...
> Today when you make a Field to add to a document you can set things
> index or not, stored or not, analyzed or not, details like omitTfAP,
> omitNorms, index term vectors (separately controlling
> offsets/positions), etc.
> I think we should factor these out into a new class (FieldType?).
> Then you could re-use this FieldType instance across multiple fields.
> The Field instance would still hold the actual value.
> We could then do per-field analyzers by adding a setAnalyzer on the
> FieldType, instead of the separate PerFieldAnalzyerWrapper (likewise
> for per-field codecs (with flex), where we now have
> PerFieldCodecWrapper).
> This would NOT be a schema!  It's just refactoring what we already
> specify today.  EG it's not serialized into the index.
> This has been discussed before, and I know Michael Busch opened a more
> ambitious (I think?) issue.  I think this is a good first baby step.  We could
> consider a hierarchy of FIeldType (NumericFieldType, etc.) but maybe hold
> off on that for starters...
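A minimal sketch of the refactoring described above, with illustrative names only (not the eventual Lucene API): the per-field options move into a FieldType object that many Field instances can share, while each Field keeps only its own value:

```java
// Illustrative-only sketch of factoring index/store options out of
// Field into a reusable FieldType; names do not match real Lucene.
public class FieldTypeSketch {
    static class FieldType {
        final boolean indexed, stored, tokenized;
        FieldType(boolean indexed, boolean stored, boolean tokenized) {
            this.indexed = indexed; this.stored = stored; this.tokenized = tokenized;
        }
    }

    static class Field {
        final String name, value;
        final FieldType type;
        Field(String name, String value, FieldType type) {
            this.name = name; this.value = value; this.type = type;
        }
    }

    public static void main(String[] args) {
        // One FieldType instance reused across many fields; only the value differs.
        FieldType storedText = new FieldType(true, true, true);
        Field title = new Field("title", "Lucene in Action", storedText);
        Field body  = new Field("body",  "some document text", storedText);

        System.out.println(title.type == body.type); // true: shared configuration
    }
}
```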




[jira] [Updated] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

2011-09-06 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-3396:
---

Attachment: LUCENE-3396-rab.patch

Patch updated so that reset now returns void.  I'll make sure to note this 
compat break in CHANGES.txt.

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
> Key: LUCENE-3396
> URL: https://issues.apache.org/jira/browse/LUCENE-3396
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Chris Male
> Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, 
> LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, 
> LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having 
> to return reusable TokenStreams.  This is a big chunk of work, but its time 
> to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the 
> TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already 
> seen that some of the TokenStreams created in tests need some work to be 
> reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using 
> .tokenStream (which will now be returning reusable TokenStreams).




[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

2011-09-06 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098580#comment-13098580
 ] 

Chris Male commented on LUCENE-3396:


Hmmm, I agree that we should change it to void.  If the source cannot be reset, 
it should throw an Exception.  We need to be able to rely on the fact that we 
are using reusable components.

I'll update the patch.
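The contract being discussed can be sketched as follows (illustrative names, not the actual Lucene signatures): reset() returns void and throws when the underlying source cannot be rewound, so callers can rely on reuse actually happening instead of checking a boolean:

```java
import java.io.Reader;
import java.io.StringReader;

// Sketch of a void reset() that throws instead of returning false
// when the stream has no resettable source. Names are illustrative.
public class ResetSketch {
    static class ReusableStream {
        private Reader input;

        void setReader(Reader r) { this.input = r; }

        // void, per the discussion: failure to reset is exceptional.
        void reset() {
            if (input == null) {
                throw new IllegalStateException("no source to reset to");
            }
        }
    }

    public static void main(String[] args) {
        ReusableStream ts = new ReusableStream();
        try {
            ts.reset();
        } catch (IllegalStateException e) {
            System.out.println("not reusable: " + e.getMessage());
        }

        ts.setReader(new StringReader("some text"));
        ts.reset(); // succeeds silently once a source is attached
        System.out.println("reset ok");
    }
}
```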


> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
> Key: LUCENE-3396
> URL: https://issues.apache.org/jira/browse/LUCENE-3396
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Chris Male
> Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, 
> LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having 
> to return reusable TokenStreams.  This is a big chunk of work, but its time 
> to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the 
> TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already 
> seen that some of the TokenStreams created in tests need some work to be 
> reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using 
> .tokenStream (which will now be returning reusable TokenStreams).




[jira] [Commented] (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

2011-09-06 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098568#comment-13098568
 ] 

Koji Sekiguchi commented on LUCENE-1824:


Uh, I forgot to add testSentenceBoundary(), testLineBoundary() etc., rather 
than only the word boundary test. I will add them in the next patch.

> FastVectorHighlighter truncates words at beginning and end of fragments
> -----------------------------------------------------------------------
>
> Key: LUCENE-1824
> URL: https://issues.apache.org/jira/browse/LUCENE-1824
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/highlighter
> Environment: any
>Reporter: Alex Vigdor
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-1824.patch, LUCENE-1824.patch, LUCENE-1824.patch
>
>
> FastVectorHighlighter does not take word boundaries into consideration when 
> building fragments, so that in most cases the first and last word of a 
> fragment are truncated.  This makes the highlights less legible than they 
> should be.  I will attach a patch to BaseFragmentBuilder that resolves this 
> by expanding the start and end boundaries of the fragment to the first 
> whitespace character on either side of the fragment, or the beginning or end 
> of the source text, whichever comes first.  This significantly improves 
> legibility, at the cost of returning a slightly larger number of characters 
> than specified for the fragment size.




[jira] [Updated] (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

2011-09-06 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-1824:
---

Attachment: LUCENE-1824.patch

I added test cases for BoundaryScanner. Still need to modify 
FragmentsBuilderTests so that they can pass.

> FastVectorHighlighter truncates words at beginning and end of fragments
> -----------------------------------------------------------------------
>
> Key: LUCENE-1824
> URL: https://issues.apache.org/jira/browse/LUCENE-1824
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/highlighter
> Environment: any
>Reporter: Alex Vigdor
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-1824.patch, LUCENE-1824.patch, LUCENE-1824.patch
>
>
> FastVectorHighlighter does not take word boundaries into consideration when 
> building fragments, so that in most cases the first and last word of a 
> fragment are truncated.  This makes the highlights less legible than they 
> should be.  I will attach a patch to BaseFragmentBuilder that resolves this 
> by expanding the start and end boundaries of the fragment to the first 
> whitespace character on either side of the fragment, or the beginning or end 
> of the source text, whichever comes first.  This significantly improves 
> legibility, at the cost of returning a slightly larger number of characters 
> than specified for the fragment size.




Rollback to old index stored with SolrDeletionPolicy

2011-09-06 Thread Emmanuel Espina
Apparently this feature does not exist, so I am copying the same mail I sent to
the "user" list. What are your opinions on creating a Jira issue requesting
this new feature?

With SolrDeletionPolicy you can choose the number of "versions" of the index
to store (maxCommitsToKeep, which defaults to 1). Well, how can you revert to
an arbitrary version that you have stored? Is there anything in Solr or in
Lucene to pick the version of the index to load?

The idea arose from a discussion with some fellows about "really paranoid
users" who want to keep several backup versions of the index and pick one
that worked in the past (after the index was corrupted in some way, probably
not immediately noticeable, and without the possibility of re-indexing the
data).


Thank you

Emmanuel


[jira] [Resolved] (SOLR-2745) Sorting on a field whose name resembles an integer in scientific notation

2011-09-06 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-2745.


Resolution: Duplicate

This is a dup of SOLR-2606, fixed in 3x (will be 3.4)

> Sorting on a field whose name resembles an integer in scientific notation
> -------------------------------------------------------------------------
>
> Key: SOLR-2745
> URL: https://issues.apache.org/jira/browse/SOLR-2745
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 3.3
>Reporter: Joey
>Priority: Minor
>
> I have created a schema where the field names are in a uuid format eg: 
> 1cf1691c0-a1a4-4255-8943-57d87c923e31_t. I am also implementing dynamic 
> fields via the 'star underscore' format eg: *_t.
> Whenever I try and sort on a field name that has a format of one or more 
> integers followed by an 'e', I get a NumberFormatException like the 
> following: *java.lang.NumberFormatException: For input string: "8e"*. This 
> particular error comes from trying to sort on a field name 
> *8ecdced6f-3eb4-e508-4e7d-d40a86305096_dt*. If the field name started with 
> 12345e, I would get an error *java.lang.NumberFormatException: For input 
> string: "12345e"*.
> I'm not sure if this is a major issue or not but it is something that has 
> appeared in our testing quite often. You would be surprised at how often 
> randomly generated uuid's start with a number and then 'e'...




[jira] [Commented] (SOLR-2745) Sorting on a field whose name resembles an integer in scientific notation

2011-09-06 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098517#comment-13098517
 ] 

Erick Erickson commented on SOLR-2745:
--

That is odd. I just reproduced it with 3.3. Here are the field defs (stock Solr 
schema):



It happens when sorting on either field. I have one doc in my index with these 
fields, put in this way:
  stuff and nonsense is rampant in our society
  2010-02-03T00:00:00Z

One other peculiar thing: when I named the fields 8e32_dt and 8e32 it 
succeeded, but 832e and 832e_dt produced the number format exception, a little 
at odds with the original statement, but still.

Why the field #name# should show a number format exception 
is...er...interesting.

Note also that my sort fragment was &sort=832e_dt, which seems to be getting 
the _dt truncated.

But it's late, so I may be seeing things.

Full stack trace (minus most of the Jetty stuff):

HTTP ERROR 500

Problem accessing /solr/select. Reason:

For input string: "832e"

java.lang.NumberFormatException: For input string: "832e"
at 
sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1222)
at java.lang.Double.parseDouble(Double.java:510)
at 
org.apache.solr.search.QueryParsing$StrParser.getNumber(QueryParsing.java:694)
at 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:293)
at org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:67)
at org.apache.solr.search.QParser.getQuery(QParser.java:142)
at org.apache.solr.search.QueryParsing.parseSort(QueryParsing.java:303)
at org.apache.solr.search.QParser.getSort(QParser.java:222)
at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:82)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
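The parse failure in the stack trace above can be reproduced directly with Double.parseDouble, which is effectively what the sort parser ends up calling: "8e32" is a complete scientific-notation literal, while "832e" ends at the exponent marker with no digits after it:

```java
// Reproduces the parse behavior behind the stack trace: the sort
// parser treats the leading token as a number, and "832e" is not
// a valid double literal while "8e32" is.
public class ParseSketch {
    public static void main(String[] args) {
        System.out.println(Double.parseDouble("8e32")); // 8.0E32: parses fine

        try {
            Double.parseDouble("832e");
            System.out.println("parsed");
        } catch (NumberFormatException e) {
            // Matches the error in the report: For input string: "832e"
            System.out.println("NumberFormatException: " + e.getMessage());
        }
    }
}
```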



> Sorting on a field whose name resembles an integer in scientific notation
> -------------------------------------------------------------------------
>
> Key: SOLR-2745
> URL: https://issues.apache.org/jira/browse/SOLR-2745
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 3.3
>Reporter: Joey
>Priority: Minor
>
> I have created a schema where the field names are in a uuid format eg: 
> 1cf1691c0-a1a4-4255-8943-57d87c923e31_t. I am also implementing dynamic 
> fields via the 'star underscore' format eg: *_t.
> Whenever I try and sort on a field name that has a format of one or more 
> integers followed by an 'e', I get a NumberFormatException like the 
> following: *java.lang.NumberFormatException: For input string: "8e"*. This 
> particular error comes from trying to sort on a field name 
> *8ecdced6f-3eb4-e508-4e7d-d40a86305096_dt*. If the field name started with 
> 12345e, I would get an error *java.lang.NumberFormatException: For input 
> string: "12345e"*.
> I'm not sure if this is a major issue or not but it is something that has 
> appeared in our testing quite often. You would be surprised at how often 
> randomly generated uuid's start with a number and then 'e'...




[jira] [Commented] (SOLR-2745) Sorting on a field whose name resembles an integer in scientific notation

2011-09-06 Thread Joey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098507#comment-13098507
 ] 

Joey commented on SOLR-2745:


I store the data that each dynamic field expects: *_t is text, *_dt stores 
only dates, *_i stores only integers. I'm not putting unexpected data into 
these fields. Also, I'm not allowing users on the front-end of my system to 
sort on text fields, but that's beside the point.

I reinstalled 3.3 to make sure I had the latest version and started from 
scratch with only 3 docs indexed. Sorting on the _dt field (actual field name 
*8ecdced6f-3eb4-e508-4e7d-d40a86305096_dt*) still gives me that same error. 

I can confirm that there are only datestamps in my dynamic date fields in the 
3 indexed docs I now have.

I am no expert, but from this it seems that the error is not a data issue but 
a field-name parsing issue.

> Sorting on a field whose name resembles an integer in scientific notation
> -------------------------------------------------------------------------
>
> Key: SOLR-2745
> URL: https://issues.apache.org/jira/browse/SOLR-2745
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 3.3
>Reporter: Joey
>Priority: Minor
>
> I have created a schema where the field names are in a uuid format eg: 
> 1cf1691c0-a1a4-4255-8943-57d87c923e31_t. I am also implementing dynamic 
> fields via the 'star underscore' format eg: *_t.
> Whenever I try and sort on a field name that has a format of one or more 
> integers followed by an 'e', I get a NumberFormatException like the 
> following: *java.lang.NumberFormatException: For input string: "8e"*. This 
> particular error comes from trying to sort on a field name 
> *8ecdced6f-3eb4-e508-4e7d-d40a86305096_dt*. If the field name started with 
> 12345e, I would get an error *java.lang.NumberFormatException: For input 
> string: "12345e"*.
> I'm not sure if this is a major issue or not but it is something that has 
> appeared in our testing quite often. You would be surprised at how often 
> randomly generated uuid's start with a number and then 'e'...




[jira] [Commented] (SOLR-2745) Sorting on a field whose name resembles an integer in scientific notation

2011-09-06 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098499#comment-13098499
 ] 

Lance Norskog commented on SOLR-2745:
-

What kind of data do you want to store in these fields? What do you expect to 
get when you sort on the field?

Fields named "*_dt" are datestamps. They are stored in a numerical format. Solr 
should not even let you index non-dates into this field.

Fields named '*_t' are text fields. These are indexed, and sorting on them does 
not make sense. It used to be that sorting would blow up if there were more 
term facets (what you sort on) than documents. In recent Solr (trunk) this will 
not blow up, but it still does not make sense.



> Sorting on a field whose name resembles an integer in scientific notation
> -------------------------------------------------------------------------
>
> Key: SOLR-2745
> URL: https://issues.apache.org/jira/browse/SOLR-2745
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 3.3
>Reporter: Joey
>Priority: Minor
>
> I have created a schema where the field names are in a uuid format eg: 
> 1cf1691c0-a1a4-4255-8943-57d87c923e31_t. I am also implementing dynamic 
> fields via the 'star underscore' format eg: *_t.
> Whenever I try and sort on a field name that has a format of one or more 
> integers followed by an 'e', I get a NumberFormatException like the 
> following: *java.lang.NumberFormatException: For input string: "8e"*. This 
> particular error comes from trying to sort on a field name 
> *8ecdced6f-3eb4-e508-4e7d-d40a86305096_dt*. If the field name started with 
> 12345e, I would get an error *java.lang.NumberFormatException: For input 
> string: "12345e"*.
> I'm not sure if this is a major issue or not but it is something that has 
> appeared in our testing quite often. You would be surprised at how often 
> randomly generated uuid's start with a number and then 'e'...




[jira] [Created] (SOLR-2745) Sorting on a field whose name resembles an integer in scientific notation

2011-09-06 Thread Joey (JIRA)
Sorting on a field whose name resembles an integer in scientific notation
-------------------------------------------------------------------------

 Key: SOLR-2745
 URL: https://issues.apache.org/jira/browse/SOLR-2745
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.3
Reporter: Joey
Priority: Minor


I have created a schema where the field names are in a uuid format eg: 
1cf1691c0-a1a4-4255-8943-57d87c923e31_t. I am also implementing dynamic fields 
via the 'star underscore' format eg: *_t.

Whenever I try and sort on a field name that has a format of one or more 
integers followed by an 'e', I get a NumberFormatException like the following: 
*java.lang.NumberFormatException: For input string: "8e"*. This particular 
error comes from trying to sort on a field name 
*8ecdced6f-3eb4-e508-4e7d-d40a86305096_dt*. If the field name started with 
12345e, I would get an error *java.lang.NumberFormatException: For input 
string: "12345e"*.

I'm not sure if this is a major issue or not but it is something that has 
appeared in our testing quite often. You would be surprised at how often 
randomly generated uuid's start with a number and then 'e'...




Re: edismax and wildcards

2011-09-06 Thread Erick Erickson
Hmmm, I looked at this a bit more, and I don't think we can
legitimately call this a bug. The exact same thing happens
with non-edismax. That is, with a default field of "text", entering
nonsense * silliness parses to

text:nonsens text:* text:silli

Doing the same thing with a minimal edismax (qf of
text and features) produces:

+((title:nonsens | text:nonsens) (title:* | text:*) (title:silli | text:silli))

After all, the asterisk #is# just a token. And we do accept
things like field:*, so this seems consistent.

And nonsense ~ silliness parses to
+((title:nonsense~2.0 | text:nonsense~2.0) (title:silli | text:silli))

So I think GIGO is accurate here.

Erick

On Tue, Sep 6, 2011 at 6:10 PM, Chris Hostetter
 wrote:
>
> : GIGO is a valid response...
>
> I don't think GIGO is a valid attitude for a parser whose whole purpose
> is to accept anything an end user might throw at it and "try to do its
> best"
>
> I agree with David: I think it's a bug that 0 length prefix/wildcard
> queries are accepted by default with no config option to disable them.
>
> (FWIW: this kind of scneerio is exactly the type of thing i was
> worried about i fought to not map "dismax"=>ExtendedDisMasQParser in 3.x)
>
> : >> blah blah * blah blah
> : >>
> : >> or
> : >>
> : >> blah blah ~ blah blah
> : >>
> : >> edismax happily spreads these across all of the fields leading
> : >> to...er...interesting behavior, and 5+ minute query responses.
>
>
> -Hoss
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3418) Lucene is not fsync'ing files on commit

2011-09-06 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3418.


Resolution: Fixed

> Lucene is not fsync'ing files on commit
> ---
>
> Key: LUCENE-3418
> URL: https://issues.apache.org/jira/browse/LUCENE-3418
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/store
>Affects Versions: 3.1, 3.2, 3.3, 3.4, 4.0
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Blocker
> Fix For: 3.4, 4.0
>
>
> Thanks to hurricane Irene, when Mark's electricity became unreliable, he 
> discovered that on power loss Lucene could easily corrupt the index, which 
> of course should never happen...
> I was able to easily repro, by pulling the plug on an Ubuntu box during 
> indexing.  On digging, I discovered, to my horror, that Lucene is failing to 
> fsync any files, ever!
> This bug was unfortunately created when we committed LUCENE-2328... that 
> issue added tracking, in FSDir, of which files have been closed but not 
> sync'd, so that when sync is called during IW.commit we only sync those files 
> that haven't already been sync'd.
> That tracking is done via the FSDir.onIndexOutputClosed callback, called when 
> an FSIndexOutput is closed.  The bug is that we only call it on exception 
> during close:
> {noformat}
> @Override
> public void close() throws IOException {
>   // only close the file if it has not been closed yet
>   if (isOpen) {
> boolean success = false;
> try {
>   super.close();
>   success = true;
> } finally {
>   isOpen = false;
>   if (!success) {
> try {
>   file.close();
>   parent.onIndexOutputClosed(this);
> } catch (Throwable t) {
>   // Suppress so we don't mask original exception
> }
>   } else
> file.close();
> }
>   }
> }
> {noformat}
> And so FSDir thinks no files need syncing when its sync method is called
> I think instead we should call it up-front; better to over-sync than 
> under-sync.
> The fix is trivial (move the callback up-front), but I'd love to somehow have 
> a test that can catch such a bad regression in the future... still I think 
> we can do that test separately and commit this fix first.
> Note that even though LUCENE-2328 was backported to 2.9.x and 3.0.x, this bug 
> wasn't, ie the backport was a much simpler fix (to just address the original 
> memory leak); it's 3.1, 3.2, 3.3 and trunk when this bug is present.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: svn commit: r1165902 - in /lucene/dev/trunk/lucene: CHANGES.txt src/java/org/apache/lucene/store/FSDirectory.java

2011-09-06 Thread Yonik Seeley
On Tue, Sep 6, 2011 at 6:12 PM,   wrote:
> +* LUCENE-3418: Lucene was failing to fsync index files on commit,
> +  meaning a crash or power loss could easily corrupt the index (Mark
> +  Miller, Robert Muir, Mike McCandless)

Perhaps "crash" should be expanded to "operating system crash"?  Some
might not realize that the more common JVM "crash", OOM, kill -9, etc,
wouldn't be an issue here.

-Yonik
http://www.lucene-eurocon.com - The Lucene/Solr User Conference

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: edismax and wildcards

2011-09-06 Thread Chris Hostetter

: GIGO is a valid response...

I don't think GIGO is a valid attitude for a parser whose whole purpose 
is to accept anything an end user might throw at it and "try to do its 
best"

I agree with David: I think it's a bug that 0 length prefix/wildcard 
queries are accepted by default with no config option to disable them.  

(FWIW: this kind of scenario is exactly the type of thing i was 
worried about when i fought to not map "dismax"=>ExtendedDismaxQParser in 3.x)

: >> blah blah * blah blah
: >>
: >> or
: >>
: >> blah blah ~ blah blah
: >>
: >> edismax happily spreads these across all of the fields leading
: >> to...er...interesting behavior, and 5+ minute query responses.


-Hoss

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3418) Lucene is not fsync'ing files on commit

2011-09-06 Thread Michael McCandless (JIRA)
Lucene is not fsync'ing files on commit
---

 Key: LUCENE-3418
 URL: https://issues.apache.org/jira/browse/LUCENE-3418
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/store
Affects Versions: 3.3, 3.2, 3.1, 3.4, 4.0
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Blocker
 Fix For: 3.4, 4.0


Thanks to hurricane Irene, when Mark's electricity became unreliable, he 
discovered that on power loss Lucene could easily corrupt the index, which of 
course should never happen...

I was able to easily repro, by pulling the plug on an Ubuntu box during 
indexing.  On digging, I discovered, to my horror, that Lucene is failing to 
fsync any files, ever!

This bug was unfortunately created when we committed LUCENE-2328... that issue 
added tracking, in FSDir, of which files have been closed but not sync'd, so 
that when sync is called during IW.commit we only sync those files that haven't 
already been sync'd.

That tracking is done via the FSDir.onIndexOutputClosed callback, called when 
an FSIndexOutput is closed.  The bug is that we only call it on exception 
during close:

{noformat}

@Override
public void close() throws IOException {
  // only close the file if it has not been closed yet
  if (isOpen) {
boolean success = false;
try {
  super.close();
  success = true;
} finally {
  isOpen = false;
  if (!success) {
try {
  file.close();
  parent.onIndexOutputClosed(this);
} catch (Throwable t) {
  // Suppress so we don't mask original exception
}
  } else
file.close();
}
  }
}
{noformat}

And so FSDir thinks no files need syncing when its sync method is called

I think instead we should call it up-front; better to over-sync than under-sync.

The fix is trivial (move the callback up-front), but I'd love to somehow have a 
test that can catch such a bad regression in the future... still I think we 
can do that test separately and commit this fix first.

Note that even though LUCENE-2328 was backported to 2.9.x and 3.0.x, this bug 
wasn't, ie the backport was a much simpler fix (to just address the original 
memory leak); it's 3.1, 3.2, 3.3 and trunk when this bug is present.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-2656) realtime get

2011-09-06 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-2656.


   Resolution: Fixed
Fix Version/s: 4.0

I just committed the implementation attached to SOLR-2700.
Since the transaction logging does not yet provide durability, realtime-get is 
the actual feature completed, and hence I used this issue number in CHANGES.

> realtime get
> 
>
> Key: SOLR-2656
> URL: https://issues.apache.org/jira/browse/SOLR-2656
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
> Fix For: 4.0
>
> Attachments: SOLR-2656.patch, SOLR-2656_test.patch
>
>
> Provide a non point-in-time interface to get a document.
> For example, if you add a new document, you will be able to get it, 
> regardless of if the searcher has been refreshed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (SOLR-2066) Search Grouping: support distributed search

2011-09-06 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098385#comment-13098385
 ] 

Martijn van Groningen edited comment on SOLR-2066 at 9/6/11 9:38 PM:
-

Updated patch.
* Fixes for the errors that Matt reported.
* If group.ngroups is specified the groupCount is also merged. It is important 
that all documents of one group are in the same shard. Otherwise the groupCount 
will be incorrect.
* A lot of renames and refactorings.

  was (Author: martijn.v.groningen):
Updated patch.
* Fixes the errors that Matt reported.
* If group.ngroups is specified the groupCount is also merged. It is important 
that all documents of one group are in the same shard. Otherwise the groupCount 
will be incorrect.
* A lot of renames and refactorings.
  
> Search Grouping: support distributed search
> ---
>
> Key: SOLR-2066
> URL: https://issues.apache.org/jira/browse/SOLR-2066
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Yonik Seeley
> Fix For: 3.4, 4.0
>
> Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
> SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch
>
>
> Support distributed field collapsing / search grouping.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2066) Search Grouping: support distributed search

2011-09-06 Thread Martijn van Groningen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn van Groningen updated SOLR-2066:


Attachment: SOLR-2066.patch

Updated patch.
* Fixes the errors that Matt reported.
* If group.ngroups is specified the groupCount is also merged. It is important 
that all documents of one group are in the same shard. Otherwise the groupCount 
will be incorrect.
* A lot of renames and refactorings.

> Search Grouping: support distributed search
> ---
>
> Key: SOLR-2066
> URL: https://issues.apache.org/jira/browse/SOLR-2066
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Yonik Seeley
> Fix For: 3.4, 4.0
>
> Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
> SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch
>
>
> Support distributed field collapsing / search grouping.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



DirectoryReader package protected?

2011-09-06 Thread Jason Rutherglen
I was browsing code, and noticed DirectoryReader is package protected.
Why is this? I.e., SegmentReader is not.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (SOLR-2650) Empty docs array on response with grouping and result pagination

2011-09-06 Thread Mike Lerley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098333#comment-13098333
 ] 

Mike Lerley edited comment on SOLR-2650 at 9/6/11 8:39 PM:
---

I seem to be having the same problem.  I've just tried the latest code from 
branch_3x (r1165749) and it's still a problem.  Note that I'm trying to output 
JSON, not XML. I get a similar exception:
{noformat}
Sep 6, 2011 4:11:31 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: 49
at org.apache.solr.search.DocSlice$1.nextDoc(DocSlice.java:117)
at 
org.apache.solr.response.JSONWriter.writeDocList(JSONResponseWriter.java:492)
at 
org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:129)
at 
org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:180)
at 
org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:296)
at 
org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:93)
at 
org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:52)
at 
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:343)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:679)
{noformat}

I seem to be able to trigger it using quoted strings, among other random 
things.  I hope this can get fixed soon.

  was (Author: mlerley):
I seem to be having the same problem.  I've just tried the latest code from 
branch_3x (r1165749) and it's still a problem.  I get a similar exception:
{noformat}
Sep 6, 2011 4:11:31 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: 49
at org.apache.solr.search.DocSlice$1.nextDoc(DocSlice.java:117)
at 
org.apache.solr.response.JSONWriter.writeDocList(JSONResponseWriter.java:492)
at 
org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:129)
at 
org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:180)
at 
org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:296)
at 
org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:93)
at 
org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:52)
at 
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:343)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:679)
{noformat}

I seem to be able to trigger it using quoted strings, among other random 
things.  I hope this can get fixed soon.

[jira] [Commented] (SOLR-2650) Empty docs array on response with grouping and result pagination

2011-09-06 Thread Mike Lerley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098333#comment-13098333
 ] 

Mike Lerley commented on SOLR-2650:
---

I seem to be having the same problem.  I've just tried the latest code from 
branch_3x (r1165749) and it's still a problem.  I get a similar exception:
{noformat}
Sep 6, 2011 4:11:31 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: 49
at org.apache.solr.search.DocSlice$1.nextDoc(DocSlice.java:117)
at 
org.apache.solr.response.JSONWriter.writeDocList(JSONResponseWriter.java:492)
at 
org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:129)
at 
org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:180)
at 
org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:296)
at 
org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:93)
at 
org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:52)
at 
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:343)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:679)
{noformat}

I seem to be able to trigger it using quoted strings, among other random 
things.  I hope this can get fixed soon.

> Empty docs array on response with grouping and result pagination
> 
>
> Key: SOLR-2650
> URL: https://issues.apache.org/jira/browse/SOLR-2650
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 3.3
>Reporter: Massimo Schiavon
> Attachments: grouping_patch.txt
>
>
> Requesting a certain number of rows and setting start parameter to a greater 
> value returns 0 results with grouping enabled.
> For example, requesting:
> http://localhost:8080/solr/web/select/?q=*:*&rows=1&start=2
> (grouping and highlighting are enabled by default)
> I get this response:
> [...]
>   response: {
>   numFound: 117852
>   start: 2
>   docs: [ ]
>   }
>   highlighting: {
> 0938630598: {
>   title: [ "..." ]
>   content: [ "..." ]
> }
>   }
> [...]
> docs array is empty while the highlighted values of the document are present
> Debugging the request in
> org.apache.solr.search.Grouping.Command.createSimpleResponse() at row 534
> [...]
>  int len = Math.min(numGroups, docsGathered);
>   if (offset > len) {
> len = 0;
>   }
> [...]
> The initial vars values are:
> numGroups = 1
> docsGathered = 3
> offset = 2
> so after the execution len = 0
> I've tried commenting out the if statement, and this resolves the issue, but 
> it could introduce some other bugs.
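The arithmetic in the report can be traced directly; a minimal sketch using the variable values quoted above shows how the offset check zeroes out the result:

```java
// Reproduces the arithmetic from Grouping.Command.createSimpleResponse()
// with the reported values: one group, three docs gathered, start=2.
public class GroupingLenDemo {
    public static void main(String[] args) {
        int numGroups = 1, docsGathered = 3, offset = 2;
        int len = Math.min(numGroups, docsGathered); // len = 1
        if (offset > len) { // 2 > 1, so the entire page is discarded
            len = 0;
        }
        System.out.println(len); // len = 0: the docs array comes back empty
    }
}
```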

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3417) DictionaryCompoundWordTokenFilter does not properly add tokens from the end compound word.

2011-09-06 Thread Njal Karevoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Njal Karevoll updated LUCENE-3417:
--

Description: 
Due to an off-by-one error, a subword placed at the end of a compound word will 
not get a token added to the token stream.


For example (from the unit test in the attached patch):
Dictionary: {"ab", "cd", "ef"}
Input: "abcdef"
Created tokens: {"abcdef", "ab", "cd"}
Expected tokens: {"abcdef", "ab", "cd", "ef"}


Additionally, it could produce tokens that were shorter than the minSubwordSize 
due to another off-by-one error. For example (again, from the attached patch):


Dictionary: {"abc", "d", "efg"}
Minimum subword length: 2
Input: "abcdefg"
Created tokens: {"abcdef", "abc", "d", "efg"}
Expected tokens: {"abcdef", "abc", "efg"}


  was:
Due to an off-by-one error, a subword placed at the end of a compound word will 
not get a token added to the token stream.


Example:
Dictionary: {"ab", "cd", "ef"}
word: "abcdef"
Created tokens: {"abcdef", "ab", "cd"}
Expected tokens: {"abcdef", "ab", "cd", "ef"}


Additionally, it could produce tokens that were shorter than the minSubwordSize 
due to another off-by-one error.


> DictionaryCompoundWordTokenFilter does not properly add tokens from the end 
> compound word.
> --
>
> Key: LUCENE-3417
> URL: https://issues.apache.org/jira/browse/LUCENE-3417
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/analysis
>Affects Versions: 3.3, 4.0
>Reporter: Njal Karevoll
> Attachments: LUCENE-3417.patch
>
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> Due to an off-by-one error, a subword placed at the end of a compound word 
> will not get a token added to the token stream.
> For example (from the unit test in the attached patch):
> Dictionary: {"ab", "cd", "ef"}
> Input: "abcdef"
> Created tokens: {"abcdef", "ab", "cd"}
> Expected tokens: {"abcdef", "ab", "cd", "ef"}
> Additionally, it could produce tokens that were shorter than the 
> minSubwordSize due to another off-by-one error. For example (again, from the 
> attached patch):
> Dictionary: {"abc", "d", "efg"}
> Minimum subword length: 2
> Input: "abcdefg"
> Created tokens: {"abcdef", "abc", "d", "efg"}
> Expected tokens: {"abcdef", "abc", "efg"}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: [Lucene.Net] 2.9.4

2011-09-06 Thread Digy
+1 for an official release.
DIGY

-Original Message-
From: Prescott Nasser [mailto:geobmx...@hotmail.com] 
Sent: Monday, September 05, 2011 9:22 PM
To: lucene-net-...@lucene.apache.org; lucene-net-u...@lucene.apache.org
Subject: [Lucene.Net] 2.9.4


Hey All,

 

How do people feel about the 2.9.4 code base? I've been using it for
some time, and for my use cases it's been excellent. Do we feel we are ready to
package this up and make it an official release? Or do we have some tasks
left to take care of?

 

~Prescott



[jira] [Commented] (LUCENE-3417) DictionaryCompoundWordTokenFilter does not properly add tokens from the end compound word.

2011-09-06 Thread Njal Karevoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098313#comment-13098313
 ] 

Njal Karevoll commented on LUCENE-3417:
---

The above patch is trivial to backport for 3.3/3.4.

It is similar to LUCENE-3038, but is not duplicated by LUCENE-3022, which deals 
with issues surrounding the interpretation of onlyLongestMatch.

> DictionaryCompoundWordTokenFilter does not properly add tokens from the end 
> compound word.
> --
>
> Key: LUCENE-3417
> URL: https://issues.apache.org/jira/browse/LUCENE-3417
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/analysis
>Affects Versions: 3.3, 4.0
>Reporter: Njal Karevoll
> Attachments: LUCENE-3417.patch
>
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> Due to an off-by-one error, a subword placed at the end of a compound word 
> will not get a token added to the token stream.
> Example:
> Dictionary: {"ab", "cd", "ef"}
> word: "abcdef"
> Created tokens: {"abcdef", "ab", "cd"}
> Expected tokens: {"abcdef", "ab", "cd", "ef"}
> Additionally, it could produce tokens that were shorter than the 
> minSubwordSize due to another off-by-one error.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3417) DictionaryCompoundWordTokenFilter does not properly add tokens from the end compound word.

2011-09-06 Thread Njal Karevoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Njal Karevoll updated LUCENE-3417:
--

Attachment: LUCENE-3417.patch

Adds two unit tests, one showing each behavior, and a fix for both issues.

> DictionaryCompoundWordTokenFilter does not properly add tokens from the end 
> compound word.
> --
>
> Key: LUCENE-3417
> URL: https://issues.apache.org/jira/browse/LUCENE-3417
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/analysis
>Affects Versions: 3.3, 4.0
>Reporter: Njal Karevoll
> Attachments: LUCENE-3417.patch
>
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> Due to an off-by-one error, a subword placed at the end of a compound word 
> will not get a token added to the token stream.
> Example:
> Dictionary: {"ab", "cd", "ef"}
> word: "abcdef"
> Created tokens: {"abcdef", "ab", "cd"}
> Expected tokens: {"abcdef", "ab", "cd", "ef"}
> Additionally, it could produce tokens that were shorter than the 
> minSubwordSize due to another off-by-one error.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3417) DictionaryCompoundWordTokenFilter does not properly add tokens from the end compound word.

2011-09-06 Thread Njal Karevoll (JIRA)
DictionaryCompoundWordTokenFilter does not properly add tokens from the end 
compound word.
--

 Key: LUCENE-3417
 URL: https://issues.apache.org/jira/browse/LUCENE-3417
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 3.3, 4.0
Reporter: Njal Karevoll


Due to an off-by-one error, a subword placed at the end of a compound word will 
not get a token added to the token stream.


Example:
Dictionary: {"ab", "cd", "ef"}
word: "abcdef"
Created tokens: {"abcdef", "ab", "cd"}
Expected tokens: {"abcdef", "ab", "cd", "ef"}


Additionally, it could produce tokens that were shorter than the minSubwordSize 
due to another off-by-one error.
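The reported off-by-one can be illustrated with a simplified decompounding loop. This is a stand-in sketch, not the filter's actual code: the bug class described above corresponds to using a strict `<` bound where `<=` is needed, which silently drops a subword that ends at the last character of the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class DecompoundDemo {
    // Simplified dictionary decompounder: emits every dictionary word found
    // inside the input. The loop condition must allow a candidate ending at
    // the final character (i + len <= input.length()); a strict "<" there is
    // exactly the kind of off-by-one that loses "ef" in "abcdef".
    static List<String> decompound(String input, Set<String> dict, int minSubwordSize) {
        List<String> tokens = new ArrayList<>();
        for (int len = minSubwordSize; len <= input.length(); len++) {
            for (int i = 0; i + len <= input.length(); i++) {
                String candidate = input.substring(i, i + len);
                if (dict.contains(candidate)) {
                    tokens.add(candidate);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // With the corrected bound, the trailing subword "ef" is emitted.
        System.out.println(decompound("abcdef", Set.of("ab", "cd", "ef"), 2));
    }
}
```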

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances

2011-09-06 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098286#comment-13098286
 ] 

Simon Willnauer commented on LUCENE-3416:
-

I see, I guess that is kind of overkill here. This patch looks fine to me, 
though I wonder why this needs to be synchronized, since we don't read it from 
a synced block. If you want this to take immediate effect, you should rather 
use volatile here? I doubt that this is necessary in this context - I'd rather 
not invalidate a cache line for each IndexOutput creation.
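The volatile suggestion can be sketched as follows. Class and field names here (RateLimitedDir, mergeRateLimiter) are hypothetical, not the patch's actual API: a volatile reference makes a newly-set RateLimiter immediately visible to threads creating IndexOutputs, at the cost of a volatile read per access; a plain field avoids that cost but gives no visibility guarantee, and synchronizing only the writer would not help readers that don't also synchronize.

```java
// Hypothetical sketch of the trade-off discussed above: volatile gives
// cross-thread visibility for a single reference without locking.
public class RateLimitedDir {
    public static class RateLimiter {
        private final double mbPerSec;
        public RateLimiter(double mbPerSec) { this.mbPerSec = mbPerSec; }
        public double getMbPerSec() { return mbPerSec; }
    }

    private volatile RateLimiter mergeRateLimiter; // visible across threads

    public void setRateLimiter(RateLimiter limiter) {
        mergeRateLimiter = limiter; // single reference write, no lock needed
    }

    public RateLimiter getRateLimiter() {
        return mergeRateLimiter;
    }

    public static void main(String[] args) {
        RateLimitedDir dir = new RateLimitedDir();
        dir.setRateLimiter(new RateLimiter(20.0));
        System.out.println(dir.getRateLimiter().getMbPerSec());
    }
}
```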

> Allow to pass an instance of RateLimiter to FSDirectory allowing to rate 
> limit merge IO across several directories / instances
> --
>
> Key: LUCENE-3416
> URL: https://issues.apache.org/jira/browse/LUCENE-3416
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/store
>Reporter: Shay Banon
> Attachments: LUCENE-3416.patch
>
>
> This can come in handy when running several Lucene indices in the same VM, 
> and wishing to rate limit merge across all of them.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances

2011-09-06 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer reassigned LUCENE-3416:
---

Assignee: Simon Willnauer

> Allow to pass an instance of RateLimiter to FSDirectory allowing to rate 
> limit merge IO across several directories / instances
> --
>
> Key: LUCENE-3416
> URL: https://issues.apache.org/jira/browse/LUCENE-3416
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/store
>Reporter: Shay Banon
>Assignee: Simon Willnauer
> Attachments: LUCENE-3416.patch
>
>
> This can come in handy when running several Lucene indices in the same VM, 
> and wishing to rate limit merge across all of them.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2066) Search Grouping: support distributed search

2011-09-06 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098249#comment-13098249
 ] 

Martijn van Groningen commented on SOLR-2066:
-

Thanks for reporting these issues, Matt! I'll update the patch soon.

> Search Grouping: support distributed search
> ---
>
> Key: SOLR-2066
> URL: https://issues.apache.org/jira/browse/SOLR-2066
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Yonik Seeley
> Fix For: 3.4, 4.0
>
> Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
> SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch
>
>
> Support distributed field collapsing / search grouping.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-2540) CommitWithin as an Update Request parameter

2011-09-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl resolved SOLR-2540.
---

Resolution: Fixed

> CommitWithin as an Update Request parameter
> ---
>
> Key: SOLR-2540
> URL: https://issues.apache.org/jira/browse/SOLR-2540
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>  Labels: commit, commitWithin
> Fix For: 3.4, 4.0
>
> Attachments: SOLR-2540.patch, SOLR-2540.patch
>
>
> It would be useful to support commitWithin HTTP GET request param on all 
> UpdateRequestHandlers.
> That way, you could set commitWithin on the request (for XML, JSON, CSV, 
> Binary and Extracting handlers) with this syntax:
> {code}
>   curl 
> http://localhost:8983/solr/update/extract?literal.id=123&commitWithin=1
>-H "Content-Type: application/pdf" --data-binary @file.pdf
> {code}
> PS: The JsonUpdateRequestHandler and BinaryUpdateRequestHandler already 
> support this syntax.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2540) CommitWithin as an Update Request parameter

2011-09-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2540:
--

Affects Version/s: (was: 3.1)
Fix Version/s: 4.0
   3.4

> CommitWithin as an Update Request parameter
> ---
>
> Key: SOLR-2540
> URL: https://issues.apache.org/jira/browse/SOLR-2540
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>  Labels: commit, commitWithin
> Fix For: 3.4, 4.0
>
> Attachments: SOLR-2540.patch, SOLR-2540.patch
>
>
> It would be useful to support commitWithin HTTP GET request param on all 
> UpdateRequestHandlers.
> That way, you could set commitWithin on the request (for XML, JSON, CSV, 
> Binary and Extracting handlers) with this syntax:
> {code}
>   curl 
> http://localhost:8983/solr/update/extract?literal.id=123&commitWithin=1
>-H "Content-Type: application/pdf" --data-binary @file.pdf
> {code}
> PS: The JsonUpdateRequestHandler and BinaryUpdateRequestHandler already 
> support this syntax.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-2703) Add support for the Lucene Surround Parser

2011-09-06 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher reassigned SOLR-2703:
--

Assignee: Erik Hatcher

> Add support for the Lucene Surround Parser
> --
>
> Key: SOLR-2703
> URL: https://issues.apache.org/jira/browse/SOLR-2703
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.0
>Reporter: Simon Rosenthal
>Assignee: Erik Hatcher
>Priority: Minor
> Attachments: SOLR-2703.patch, SOLR-2703.patch, SOLR-2703.patch
>
>
> The Lucene/contrib surround parser provides support for span queries. This 
> issue adds a Solr plugin for this parser

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2703) Add support for the Lucene Surround Parser

2011-09-06 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098133#comment-13098133
 ] 

Simon Rosenthal commented on SOLR-2703:
---

We should hold off on the commit until the 
https://issues.apache.org/jira/browse/LUCENE-2945 patch has been committed; 
otherwise query caching is very broken. I updated the patch for that issue to 
work with trunk a few weeks ago.

> Add support for the Lucene Surround Parser
> --
>
> Key: SOLR-2703
> URL: https://issues.apache.org/jira/browse/SOLR-2703
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.0
>Reporter: Simon Rosenthal
>Assignee: Erik Hatcher
>Priority: Minor
> Attachments: SOLR-2703.patch, SOLR-2703.patch, SOLR-2703.patch
>
>
> The Lucene/contrib surround parser provides support for span queries. This 
> issue adds a Solr plugin for this parser

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2703) Add support for the Lucene Surround Parser

2011-09-06 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098106#comment-13098106
 ] 

Simon Rosenthal commented on SOLR-2703:
---

Wiki page to follow at http://wiki.apache.org/solr/SurroundQueryParser

> Add support for the Lucene Surround Parser
> --
>
> Key: SOLR-2703
> URL: https://issues.apache.org/jira/browse/SOLR-2703
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.0
>Reporter: Simon Rosenthal
>Priority: Minor
> Attachments: SOLR-2703.patch, SOLR-2703.patch, SOLR-2703.patch
>
>
> The Lucene/contrib surround parser provides support for span queries. This 
> issue adds a Solr plugin for this parser

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2703) Add support for the Lucene Surround Parser

2011-09-06 Thread Simon Rosenthal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Rosenthal updated SOLR-2703:
--

Attachment: SOLR-2703.patch

New patch. The query parser is not registered by default, and a commented-out 
entry was added to the example solrconfig.

Hopefully ready to commit

> Add support for the Lucene Surround Parser
> --
>
> Key: SOLR-2703
> URL: https://issues.apache.org/jira/browse/SOLR-2703
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.0
>Reporter: Simon Rosenthal
>Priority: Minor
> Attachments: SOLR-2703.patch, SOLR-2703.patch, SOLR-2703.patch
>
>
> The Lucene/contrib surround parser provides support for span queries. This 
> issue adds a Solr plugin for this parser

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

2011-09-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098057#comment-13098057
 ] 

Robert Muir commented on LUCENE-3396:
-

The patch looks great. One question: should we keep the 'reset can return 
false and we do not reuse' behavior?

This seems like it might be obsolete. Though it would introduce an API break, 
I think maybe we should change it to void?


> Make TokenStream Reuse Mandatory for Analyzers
> --
>
> Key: LUCENE-3396
> URL: https://issues.apache.org/jira/browse/LUCENE-3396
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Chris Male
> Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, 
> LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having 
> to return reusable TokenStreams.  This is a big chunk of work, but its time 
> to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the 
> TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already 
> seen that some of the TokenStreams created in tests need some work to be 
> reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using 
> .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3410) Make WordDelimiterFilter's instantiation more readable

2011-09-06 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098053#comment-13098053
 ] 

Chris Male commented on LUCENE-3410:


{quote}
should the iterator maybe keep the booleans and not use flags? Just an idea, 
because the iterator doesn't "make use" of all the flags. It's also not a 
public class, just a helper class to simplify the filter, so I think it's OK 
for it to take 3 booleans?
{quote}

Yeah I thought about this as well.  It would make the iterator clearer since it 
wouldn't rely on people looking at the Filter's flags.  I will make the change.

> Make WordDelimiterFilter's instantiation more readable
> --
>
> Key: LUCENE-3410
> URL: https://issues.apache.org/jira/browse/LUCENE-3410
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Chris Male
>Priority: Minor
> Attachments: LUCENE-3410.patch
>
>
> Currently WordDelimiterFilter's constructor is:
> {code}
> public WordDelimiterFilter(TokenStream in,
>byte[] charTypeTable,
>int generateWordParts,
>int generateNumberParts,
>int catenateWords,
>int catenateNumbers,
>int catenateAll,
>int splitOnCaseChange,
>int preserveOriginal,
>int splitOnNumerics,
>int stemEnglishPossessive,
>CharArraySet protWords) {
> {code}
> which means its instantiation is an unreadable combination of 1s and 0s.  
> We should improve this by either using a Builder, 'int flags' or an EnumSet.
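A minimal sketch of the 'int flags' option mentioned above (constant names are illustrative, not Lucene's actual constants):

```java
// Illustrative sketch of an int-flags configuration style: each option is a
// distinct bit, combined with | and tested with a bitwise AND.
class WordDelimiterFlags {
    static final int GENERATE_WORD_PARTS   = 1;
    static final int GENERATE_NUMBER_PARTS = 1 << 1;
    static final int CATENATE_WORDS        = 1 << 2;
    static final int SPLIT_ON_CASE_CHANGE  = 1 << 3;
    static final int PRESERVE_ORIGINAL     = 1 << 4;

    // true if the given flag bit is set in the combined flags word
    static boolean has(int flags, int flag) {
        return (flags & flag) != 0;
    }
}
```

A call site then reads as `GENERATE_WORD_PARTS | SPLIT_ON_CASE_CHANGE` rather than a positional string of 1s and 0s.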

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-09-06 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098051#comment-13098051
 ] 

Yonik Seeley commented on LUCENE-2308:
--

Instead of introducing a dependency on CoreFieldType in many places (only to 
have to change it back later when some sort of consensus is finally reached), 
it would seem much cleaner to either
 - remove freeze() until we decide on the right approach
 - move freeze() to the FieldType interface temporarily (and remove it later if 
the approach changes)

The other changes in the patch look fine.

> Separately specify a field's type
> -
>
> Key: LUCENE-2308
> URL: https://issues.apache.org/jira/browse/LUCENE-2308
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, 
> LUCENE-2308-12.patch, LUCENE-2308-13.patch, LUCENE-2308-14.patch, 
> LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch, 
> LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, 
> LUCENE-2308-20.patch, LUCENE-2308-21.patch, LUCENE-2308-3.patch, 
> LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, 
> LUCENE-2308-7.patch, LUCENE-2308-8.patch, LUCENE-2308-9.patch, 
> LUCENE-2308-FT-interface.patch, LUCENE-2308-branch.patch, 
> LUCENE-2308-final.patch, LUCENE-2308-ltc.patch, LUCENE-2308-merge-1.patch, 
> LUCENE-2308-merge-2.patch, LUCENE-2308-merge-3.patch, 
> LUCENE-2308.branchdiffs, LUCENE-2308.branchdiffs.moved, LUCENE-2308.patch, 
> LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch
>
>
> This came up from discussions on IRC.  I'm summarizing here...
> Today when you make a Field to add to a document you can set things
> index or not, stored or not, analyzed or not, details like omitTfAP,
> omitNorms, index term vectors (separately controlling
> offsets/positions), etc.
> I think we should factor these out into a new class (FieldType?).
> Then you could re-use this FieldType instance across multiple fields.
> The Field instance would still hold the actual value.
> We could then do per-field analyzers by adding a setAnalyzer on the
> FieldType, instead of the separate PerFieldAnalzyerWrapper (likewise
> for per-field codecs (with flex), where we now have
> PerFieldCodecWrapper).
> This would NOT be a schema!  It's just refactoring what we already
> specify today.  EG it's not serialized into the index.
> This has been discussed before, and I know Michael Busch opened a more
> ambitious (I think?) issue.  I think this is a good first baby step.  We could
> consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
> off on that for starters...
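The split described above can be sketched roughly like this (names are illustrative; the eventual Lucene API may differ): configuration lives in a reusable type object, while the field keeps only its name and value.

```java
// Illustrative sketch: per-field indexing options factored out of the field
// into a shared type object, so many fields can reuse one configuration.
class MyFieldType {
    boolean stored;
    boolean tokenized;
    boolean omitNorms;
}

class MyField {
    final String name;
    final String value;
    final MyFieldType type; // shared across many fields

    MyField(String name, String value, MyFieldType type) {
        this.name = name;
        this.value = value;
        this.type = type;
    }
}
```

One MyFieldType instance configured once can back every field of that kind, which is the reuse the issue is after.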

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3410) Make WordDelimiterFilter's instantiation more readable

2011-09-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098046#comment-13098046
 ] 

Robert Muir commented on LUCENE-3410:
-

looks good overall, a couple tiny nitpicks:

* looks like there is some dead code in WordDelimiterIterator (the booleans)
* should the iterator maybe keep the booleans and not use flags? Just an idea, 
because the iterator doesn't "make use" of all the flags. It's also not a 
public class, just a helper class to simplify the filter, so I think it's OK 
for it to take 3 booleans?

> Make WordDelimiterFilter's instantiation more readable
> --
>
> Key: LUCENE-3410
> URL: https://issues.apache.org/jira/browse/LUCENE-3410
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Chris Male
>Priority: Minor
> Attachments: LUCENE-3410.patch
>
>
> Currently WordDelimiterFilter's constructor is:
> {code}
> public WordDelimiterFilter(TokenStream in,
>byte[] charTypeTable,
>int generateWordParts,
>int generateNumberParts,
>int catenateWords,
>int catenateNumbers,
>int catenateAll,
>int splitOnCaseChange,
>int preserveOriginal,
>int splitOnNumerics,
>int stemEnglishPossessive,
>CharArraySet protWords) {
> {code}
> which means its instantiation is an unreadable combination of 1s and 0s.  
> We should improve this by either using a Builder, 'int flags' or an EnumSet.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

2011-09-06 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098045#comment-13098045
 ] 

Chris Male commented on LUCENE-3396:


I'm going to commit this soon and then work on the remaining Analyzers.

> Make TokenStream Reuse Mandatory for Analyzers
> --
>
> Key: LUCENE-3396
> URL: https://issues.apache.org/jira/browse/LUCENE-3396
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Chris Male
> Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, 
> LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having 
> to return reusable TokenStreams.  This is a big chunk of work, but its time 
> to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the 
> TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already 
> seen that some of the TokenStreams created in tests need some work to be 
> reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using 
> .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3410) Make WordDelimiterFilter's instantiation more readable

2011-09-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098041#comment-13098041
 ] 

Uwe Schindler commented on LUCENE-3410:
---

+1

> Make WordDelimiterFilter's instantiation more readable
> --
>
> Key: LUCENE-3410
> URL: https://issues.apache.org/jira/browse/LUCENE-3410
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Chris Male
>Priority: Minor
> Attachments: LUCENE-3410.patch
>
>
> Currently WordDelimiterFilter's constructor is:
> {code}
> public WordDelimiterFilter(TokenStream in,
>byte[] charTypeTable,
>int generateWordParts,
>int generateNumberParts,
>int catenateWords,
>int catenateNumbers,
>int catenateAll,
>int splitOnCaseChange,
>int preserveOriginal,
>int splitOnNumerics,
>int stemEnglishPossessive,
>CharArraySet protWords) {
> {code}
> which means its instantiation is an unreadable combination of 1s and 0s.  
> We should improve this by either using a Builder, 'int flags' or an EnumSet.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3410) Make WordDelimiterFilter's instantiation more readable

2011-09-06 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098036#comment-13098036
 ] 

Chris Male commented on LUCENE-3410:


Plan to commit soon if there are no objections.

> Make WordDelimiterFilter's instantiation more readable
> --
>
> Key: LUCENE-3410
> URL: https://issues.apache.org/jira/browse/LUCENE-3410
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Chris Male
>Priority: Minor
> Attachments: LUCENE-3410.patch
>
>
> Currently WordDelimiterFilter's constructor is:
> {code}
> public WordDelimiterFilter(TokenStream in,
>byte[] charTypeTable,
>int generateWordParts,
>int generateNumberParts,
>int catenateWords,
>int catenateNumbers,
>int catenateAll,
>int splitOnCaseChange,
>int preserveOriginal,
>int splitOnNumerics,
>int stemEnglishPossessive,
>CharArraySet protWords) {
> {code}
> which means its instantiation is an unreadable combination of 1s and 0s.  
> We should improve this by either using a Builder, 'int flags' or an EnumSet.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-09-06 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098032#comment-13098032
 ] 

Chris Male commented on LUCENE-2308:


Anyone else have any thoughts? Any objections to committing this patch as a 
first step?

> Separately specify a field's type
> -
>
> Key: LUCENE-2308
> URL: https://issues.apache.org/jira/browse/LUCENE-2308
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, 
> LUCENE-2308-12.patch, LUCENE-2308-13.patch, LUCENE-2308-14.patch, 
> LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch, 
> LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, 
> LUCENE-2308-20.patch, LUCENE-2308-21.patch, LUCENE-2308-3.patch, 
> LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, 
> LUCENE-2308-7.patch, LUCENE-2308-8.patch, LUCENE-2308-9.patch, 
> LUCENE-2308-FT-interface.patch, LUCENE-2308-branch.patch, 
> LUCENE-2308-final.patch, LUCENE-2308-ltc.patch, LUCENE-2308-merge-1.patch, 
> LUCENE-2308-merge-2.patch, LUCENE-2308-merge-3.patch, 
> LUCENE-2308.branchdiffs, LUCENE-2308.branchdiffs.moved, LUCENE-2308.patch, 
> LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch
>
>
> This came up from discussions on IRC.  I'm summarizing here...
> Today when you make a Field to add to a document you can set things
> index or not, stored or not, analyzed or not, details like omitTfAP,
> omitNorms, index term vectors (separately controlling
> offsets/positions), etc.
> I think we should factor these out into a new class (FieldType?).
> Then you could re-use this FieldType instance across multiple fields.
> The Field instance would still hold the actual value.
> We could then do per-field analyzers by adding a setAnalyzer on the
> FieldType, instead of the separate PerFieldAnalzyerWrapper (likewise
> for per-field codecs (with flex), where we now have
> PerFieldCodecWrapper).
> This would NOT be a schema!  It's just refactoring what we already
> specify today.  EG it's not serialized into the index.
> This has been discussed before, and I know Michael Busch opened a more
> ambitious (I think?) issue.  I think this is a good first baby step.  We could
> consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
> off on that for starters...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances

2011-09-06 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098018#comment-13098018
 ] 

Shay Banon commented on LUCENE-3416:


It is possible, but requires more work, and depends on overriding the 
createOutput method (as well as all the other methods in Directory). If rate 
limiting makes sense as a "feature" exposed at the directory level, I think 
this small change allows for greater control over it.

> Allow to pass an instance of RateLimiter to FSDirectory allowing to rate 
> limit merge IO across several directories / instances
> --
>
> Key: LUCENE-3416
> URL: https://issues.apache.org/jira/browse/LUCENE-3416
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/store
>Reporter: Shay Banon
> Attachments: LUCENE-3416.patch
>
>
> This can come in handy when running several Lucene indices in the same VM, 
> and wishing to rate limit merge across all of them.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-2383) Velocity: Generalize range and date facet display

2011-09-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl resolved SOLR-2383.
---

Resolution: Fixed

Checked in backport to 3.x

> Velocity: Generalize range and date facet display
> -
>
> Key: SOLR-2383
> URL: https://issues.apache.org/jira/browse/SOLR-2383
> Project: Solr
>  Issue Type: Bug
>  Components: Response Writers
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>  Labels: facet, range, velocity
> Fix For: 3.4, 4.0
>
> Attachments: SOLR-2383-branch_32.patch, SOLR-2383-branch_3x.patch, 
> SOLR-2383-branch_3x.patch, SOLR-2383-branch_3x.patch, 
> SOLR-2383-branch_3x.patch, SOLR-2383.patch, SOLR-2383.patch, SOLR-2383.patch, 
> SOLR-2383.patch, SOLR-2383.patch
>
>
> Velocity (/browse) GUI has hardcoded price range facet and a hardcoded 
> manufacturedate_dt date facet. Need general solution which work for any 
> facet.range and facet.date.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-2741) Bugs in facet range display in trunk

2011-09-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl resolved SOLR-2741.
---

Resolution: Fixed

> Bugs in facet range display in trunk
> 
>
> Key: SOLR-2741
> URL: https://issues.apache.org/jira/browse/SOLR-2741
> Project: Solr
>  Issue Type: Sub-task
>  Components: web gui
>Affects Versions: 4.0
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
> Fix For: 4.0
>
> Attachments: SOLR-2741.patch, SOLR-2741.patch
>
>
> In SOLR-2383 the hardcoded display of some facet ranges were replaced with 
> automatic, dynamic display.
> There were some shortcomings:
> a) Float range to-values were sometimes displayed as int
> b) Capitalizing the facet name was a mistake, sometimes looks good, sometimes 
> not
> c) facet.range on a date did not work - dates were displayed in whatever 
> locale formatting
> d) The deprecated facet.date syntax was used in solrconfig.xml instead of the 
> new facet.range

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[Lucene.Net] [jira] [Resolved] (LUCENENET-414) The definition of CharArraySet is dangerously confusing and leads to bugs when used.

2011-09-06 Thread Digy (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENENET-414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Digy resolved LUCENENET-414.


Resolution: Fixed

Fixed.

DIGY

> The definition of CharArraySet is dangerously confusing and leads to bugs 
> when used.
> 
>
> Key: LUCENENET-414
> URL: https://issues.apache.org/jira/browse/LUCENENET-414
> Project: Lucene.Net
>  Issue Type: Bug
>  Components: Lucene.Net Core
>Affects Versions: Lucene.Net 2.9.2
> Environment: Irrelevant
>Reporter: Vincent Van Den Berghe
>Priority: Minor
> Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g
>
>
> Right now, CharArraySet derives from System.Collections.Hashtable, but 
> doesn't actually use this base type for storing elements.
> However, the StandardAnalyzer.STOP_WORDS_SET is exposed as a 
> System.Collections.Hashtable. The trivial code to build your own stopword set 
> using the StandardAnalyzer.STOP_WORDS_SET and adding your own set of 
> stopwords like this:
> CharArraySet myStopWords = new CharArraySet(StandardAnalyzer.STOP_WORDS_SET, 
> ignoreCase: false);
> foreach (string domainSpecificStopWord in DomainSpecificStopWords)
> myStopWords.Add(domainSpecificStopWord);
> ... will fail because the CharArraySet accepts an ICollection, which will be 
> passed the Hashtable instance of STOP_WORDS_SET: the resulting myStopWords 
> will only contain the DomainSpecificStopWords, and not those from 
> STOP_WORDS_SET.
> One workaround would be to replace the first line with this:
> CharArraySet stopWords = new 
> CharArraySet(StandardAnalyzer.STOP_WORDS_SET.Count + 
> DomainSpecificStopWords.Length, ignoreCase: false);
> foreach (string domainSpecificStopWord in 
> (CharArraySet)StandardAnalyzer.STOP_WORDS_SET)
> stopWords.Add(domainSpecificStopWord);
> ... but this makes use of the implementation detail (the STOP_WORDS_SET is 
> really an UnmodifiableCharArraySet which is itself a CharArraySet). It works 
> because it forces the foreach() to use the correct 
> CharArraySet.GetEnumerator(), which is defined as a "new" method (this has a 
> bad code smell to it)
> At least 2 possibilities exist to solve this problem:
> - Make CharArraySet use the Hashtable instance and a custom comparator, 
> instead of its own implementation.
> - Make CharArraySet use HashSet, defined in .NET 4.0.
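Both suggested fixes amount to composition instead of inheritance: wrap a real set rather than deriving from a collection whose storage is never used. A rough Java sketch of the idea (hypothetical names; not Lucene.Net's actual API, which is C#):

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch: a stopword set that delegates to a backing HashSet
// instead of inheriting from one, so the constructor's copy behavior and the
// case handling are unambiguous.
class StopWordSet {
    private final Set<String> words = new HashSet<>();
    private final boolean ignoreCase;

    StopWordSet(Iterable<String> initial, boolean ignoreCase) {
        this.ignoreCase = ignoreCase;
        for (String w : initial) {
            add(w); // every initial word is actually copied into this set
        }
    }

    void add(String word) {
        words.add(ignoreCase ? word.toLowerCase() : word);
    }

    boolean contains(String word) {
        return words.contains(ignoreCase ? word.toLowerCase() : word);
    }

    int size() { return words.size(); }
}
```

Because the wrapper owns its storage, there is no way for a caller to accidentally enumerate an empty inherited Hashtable as happened in the bug report.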

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (LUCENE-3414) Bring Hunspell for Lucene into analysis module

2011-09-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097945#comment-13097945
 ] 

Robert Muir commented on LUCENE-3414:
-

I don't think we should do anything with the dictionaries ever; it's much better 
to make small "test" dictionaries that are really more like unit tests and 
exercise specific behaviors, like what you did in the patch.


> Bring Hunspell for Lucene into analysis module
> --
>
> Key: LUCENE-3414
> URL: https://issues.apache.org/jira/browse/LUCENE-3414
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Chris Male
> Attachments: LUCENE-3414.patch
>
>
> Some time ago I along with Robert and Uwe, wrote an Stemmer which uses the 
> Hunspell algorithm.  It has the benefit of supporting dictionaries for a wide 
> array of languages.   
> It seems to still be being used but has fallen out of date.  I think it would 
> benefit from being inside the analysis module where additional features such 
> as decompounding support, could be added.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3414) Bring Hunspell for Lucene into analysis module

2011-09-06 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097937#comment-13097937
 ] 

Chris Male commented on LUCENE-3414:


Okay, good spotting. So how do we want to proceed? Do we want to bring some of 
the dictionaries in? Should we address that in a later issue once it's become 
clearer in OO what they're doing?

> Bring Hunspell for Lucene into analysis module
> --
>
> Key: LUCENE-3414
> URL: https://issues.apache.org/jira/browse/LUCENE-3414
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Chris Male
> Attachments: LUCENE-3414.patch
>
>
> Some time ago I along with Robert and Uwe, wrote an Stemmer which uses the 
> Hunspell algorithm.  It has the benefit of supporting dictionaries for a wide 
> array of languages.   
> It seems to still be being used but has fallen out of date.  I think it would 
> benefit from being inside the analysis module where additional features such 
> as decompounding support, could be added.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3414) Bring Hunspell for Lucene into analysis module

2011-09-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097916#comment-13097916
 ] 

Robert Muir commented on LUCENE-3414:
-

{quote}
Bizarrely, from what I can see in the OpenOffice SVN, they are still under 
their original license. 
{quote}

I don't think we should read too much into that text file: it's not even obvious 
which of the many dictionaries in that folder it applies to!

I know for a fact that some of the files in there are *NOT* GPL, for example 
the en_US dictionary: 
http://svn.apache.org/viewvc/incubator/ooo/trunk/main/dictionaries/en/README_en_US.txt?revision=1162288&view=markup

> Bring Hunspell for Lucene into analysis module
> --
>
> Key: LUCENE-3414
> URL: https://issues.apache.org/jira/browse/LUCENE-3414
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Chris Male
> Attachments: LUCENE-3414.patch
>
>
> Some time ago I along with Robert and Uwe, wrote an Stemmer which uses the 
> Hunspell algorithm.  It has the benefit of supporting dictionaries for a wide 
> array of languages.   
> It seems to still be being used but has fallen out of date.  I think it would 
> benefit from being inside the analysis module where additional features such 
> as decompounding support, could be added.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3415) Snowball filter to include original word too

2011-09-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097907#comment-13097907
 ] 

Robert Muir commented on LUCENE-3415:
-

{quote}
The index size increases because we dont have the option of with / without 
stemming in a single field and as a reason, we have to store in 2 separate 
fields. 
{quote}

This is not true: there are just as many postings either way.

> Snowball filter to include original word too
> 
>
> Key: LUCENE-3415
> URL: https://issues.apache.org/jira/browse/LUCENE-3415
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 3.3
> Environment: All
>Reporter: Manish
>  Labels: features
> Fix For: 3.4, 4.0
>
>
> 1. Currently, snowball filter deletes the original word and adds the stemmed 
> word to the index. So, if i want to do search with / without stemming, i have 
> to keep 2 fields, one with stemming and one without it. 
> 2. Rather than doing this, if we have configurable item to preserve original, 
> it would solve more business problem. 
> 3. Using single field, i can do search using stemming / without stemming by 
> changing the query filters. 
> The same can also be done for phonetic filters too. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3415) Snowball filter to include original word too

2011-09-06 Thread Manish (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097904#comment-13097904
 ] 

Manish commented on LUCENE-3415:


The index size increases because we don't have the option of with/without 
stemming in a single field, and as a result we have to store 2 separate 
fields. 

Even with highlighting, we can highlight another field, but since the term 
vector information is different, it cannot highlight it properly.

> Snowball filter to include original word too
> 
>
> Key: LUCENE-3415
> URL: https://issues.apache.org/jira/browse/LUCENE-3415
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 3.3
> Environment: All
>Reporter: Manish
>  Labels: features
> Fix For: 3.4, 4.0
>
>
> 1. Currently, snowball filter deletes the original word and adds the stemmed 
> word to the index. So, if i want to do search with / without stemming, i have 
> to keep 2 fields, one with stemming and one without it. 
> 2. Rather than doing this, if we have configurable item to preserve original, 
> it would solve more business problem. 
> 3. Using single field, i can do search using stemming / without stemming by 
> changing the query filters. 
> The same can also be done for phonetic filters too. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3415) Snowball filter to include original word too

2011-09-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097898#comment-13097898
 ] 

Robert Muir commented on LUCENE-3415:
-

The index size is increasing because you are storing both fields; this has 
nothing to do with how the analysis is done.

I don't think we should modify every tokenfilter to optionally inject things 
instead of changing terms, or create a hack with KeywordAttribute.

Instead, if the problem is the Highlighter, why not propose a modification to 
the highlighter so it can highlight field A with field B's stored value?

> Snowball filter to include original word too
> 
>
> Key: LUCENE-3415
> URL: https://issues.apache.org/jira/browse/LUCENE-3415
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 3.3
> Environment: All
>Reporter: Manish
>  Labels: features
> Fix For: 3.4, 4.0
>
>
> 1. Currently, snowball filter deletes the original word and adds the stemmed 
> word to the index. So, if i want to do search with / without stemming, i have 
> to keep 2 fields, one with stemming and one without it. 
> 2. Rather than doing this, if we have configurable item to preserve original, 
> it would solve more business problem. 
> 3. Using single field, i can do search using stemming / without stemming by 
> changing the query filters. 
> The same can also be done for phonetic filters too. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3415) Snowball filter to include original word too

2011-09-06 Thread Manish (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097890#comment-13097890
 ] 

Manish commented on LUCENE-3415:


The index size becomes huge (in fact, double). 
We have 2 fields, both indexed and stored, one with stemming and one without 
stemming. We thought of removing stored=true from one of the fields, but 
highlighting becomes the problem (that field won't have the original words, and 
hence term vectors won't highlight it properly).

I have an idea based on Simon's comments; I don't know if it's going to work or not. 

1. Create a new filter factory which emits both the stemmed word and the 
original word. 
2. Field 1 -> indexed=true, stored=true, using the above filter.
3. Field 2 -> indexed=true, stored=false, not using the above filter. 

I can search against the corresponding fields. For highlighting, I can always 
use Field 1, and since term vectors, offsets and positions are present for the 
original words too, it will highlight properly. 

Do let me know your thoughts on this. 

> Snowball filter to include original word too
> 
>
> Key: LUCENE-3415
> URL: https://issues.apache.org/jira/browse/LUCENE-3415
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 3.3
> Environment: All
>Reporter: Manish
>  Labels: features
> Fix For: 3.4, 4.0
>
>
> 1. Currently, snowball filter deletes the original word and adds the stemmed 
> word to the index. So, if i want to do search with / without stemming, i have 
> to keep 2 fields, one with stemming and one without it. 
> 2. Rather than doing this, if we have configurable item to preserve original, 
> it would solve more business problem. 
> 3. Using single field, i can do search using stemming / without stemming by 
> changing the query filters. 
> The same can also be done for phonetic filters too. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3415) Snowball filter to include original word too

2011-09-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097878#comment-13097878
 ] 

Robert Muir commented on LUCENE-3415:
-

Manish, why not put your content in a different field without stemming?

You can use e.g. MultiFieldQueryParser to make this transparent.


> Snowball filter to include original word too
> 
>
> Key: LUCENE-3415
> URL: https://issues.apache.org/jira/browse/LUCENE-3415
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 3.3
> Environment: All
>Reporter: Manish
>  Labels: features
> Fix For: 3.4, 4.0
>
>
> 1. Currently, snowball filter deletes the original word and adds the stemmed 
> word to the index. So, if i want to do search with / without stemming, i have 
> to keep 2 fields, one with stemming and one without it. 
> 2. Rather than doing this, if we have configurable item to preserve original, 
> it would solve more business problem. 
> 3. Using single field, i can do search using stemming / without stemming by 
> changing the query filters. 
> The same can also be done for phonetic filters too. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances

2011-09-06 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097873#comment-13097873
 ] 

Simon Willnauer commented on LUCENE-3416:
-

Shay, can't you use an Input/Output wrapper on a 
RateLimitingDirectoryDelegate? With Lucene 4.0 you get the IOContext when 
opening/creating streams, so you can decide based on that.

> Allow to pass an instance of RateLimiter to FSDirectory allowing to rate 
> limit merge IO across several directories / instances
> --
>
> Key: LUCENE-3416
> URL: https://issues.apache.org/jira/browse/LUCENE-3416
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/store
>Reporter: Shay Banon
> Attachments: LUCENE-3416.patch
>
>
> This can come in handy when running several Lucene indices in the same VM, 
> and wishing to rate limit merge across all of them.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances

2011-09-06 Thread Shay Banon (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shay Banon updated LUCENE-3416:
---

Attachment: LUCENE-3416.patch

> Allow to pass an instance of RateLimiter to FSDirectory allowing to rate 
> limit merge IO across several directories / instances
> --
>
> Key: LUCENE-3416
> URL: https://issues.apache.org/jira/browse/LUCENE-3416
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/store
>Reporter: Shay Banon
> Attachments: LUCENE-3416.patch
>
>
> This can come in handy when running several Lucene indices in the same VM, 
> and wishing to rate limit merge across all of them.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances

2011-09-06 Thread Shay Banon (JIRA)
Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit 
merge IO across several directories / instances
--

 Key: LUCENE-3416
 URL: https://issues.apache.org/jira/browse/LUCENE-3416
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Shay Banon


This can come in handy when running several Lucene indices in the same VM, and 
wishing to rate limit merge across all of them.
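As a rough illustration of the idea (this is not the attached patch's API; SimpleRateLimiter and its method names are made up here), a single limiter instance shared by several index writers can compute how long each write must pause so that their combined throughput stays under one global MB/sec budget:

```java
/**
 * Minimal sketch of a rate limiter shared across several writers.
 * Each caller reports how many bytes it wrote and receives the pause
 * (in nanoseconds) it owes to keep the global rate under the cap.
 */
public class SimpleRateLimiter {
    private final double bytesPerSec;
    private long lastNs = System.nanoTime(); // time the budget was last consumed up to

    public SimpleRateLimiter(double mbPerSec) {
        this.bytesPerSec = mbPerSec * 1024 * 1024;
    }

    public synchronized long pauseNanos(long bytes) {
        long now = System.nanoTime();
        // The earliest time at which these bytes fit within the budget.
        long targetNs = lastNs + (long) (bytes / bytesPerSec * 1_000_000_000.0);
        lastNs = Math.max(now, targetNs);
        return Math.max(0, targetNs - now);
    }
}
```

Because the pause is computed against shared state, it doesn't matter how many directories feed the same limiter: merges across all of them are throttled as one.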

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3415) Snowball filter to include original word too

2011-09-06 Thread Manish (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097864#comment-13097864
 ] 

Manish commented on LUCENE-3415:


How would we handle 2 different analyzers for the query? I guess that's not 
possible in the current design. 

> Snowball filter to include original word too
> 
>
> Key: LUCENE-3415
> URL: https://issues.apache.org/jira/browse/LUCENE-3415
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 3.3
> Environment: All
>Reporter: Manish
>  Labels: features
> Fix For: 3.4, 4.0
>
>
> 1. Currently, snowball filter deletes the original word and adds the stemmed 
> word to the index. So, if i want to do search with / without stemming, i have 
> to keep 2 fields, one with stemming and one without it. 
> 2. Rather than doing this, if we have configurable item to preserve original, 
> it would solve more business problem. 
> 3. Using single field, i can do search using stemming / without stemming by 
> changing the query filters. 
> The same can also be done for phonetic filters too. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3414) Bring Hunspell for Lucene into analysis module

2011-09-06 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097861#comment-13097861
 ] 

Chris Male commented on LUCENE-3414:


bq. how is OpenOffice dealing with those dictionaries since they are now an ASF 
incubation project? Maybe the dictionaries are under ASL eventually?

Bizarrely, from what I can see in the OpenOffice 
[SVN|http://svn.apache.org/viewvc/incubator/ooo/trunk/main/dictionaries/en/license.txt?revision=1162288&view=markup],
 they are still under their original license. I guess that's something they 
will have to sort out during incubation.

I don't see the licenses changing since the dictionaries tend to be developed 
by national language organisations, but maybe the ASF will negotiate.

> Bring Hunspell for Lucene into analysis module
> --
>
> Key: LUCENE-3414
> URL: https://issues.apache.org/jira/browse/LUCENE-3414
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Chris Male
> Attachments: LUCENE-3414.patch
>
>
> Some time ago I along with Robert and Uwe, wrote an Stemmer which uses the 
> Hunspell algorithm.  It has the benefit of supporting dictionaries for a wide 
> array of languages.   
> It seems to still be being used but has fallen out of date.  I think it would 
> benefit from being inside the analysis module where additional features such 
> as decompounding support, could be added.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3414) Bring Hunspell for Lucene into analysis module

2011-09-06 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097859#comment-13097859
 ] 

Simon Willnauer commented on LUCENE-3414:
-

bq. ...so it should really be in Lucene, except the dictionaries.
How is OpenOffice dealing with those dictionaries now that it is an ASF 
incubation project? Maybe the dictionaries will be under the ASL eventually?

> Bring Hunspell for Lucene into analysis module
> --
>
> Key: LUCENE-3414
> URL: https://issues.apache.org/jira/browse/LUCENE-3414
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Chris Male
> Attachments: LUCENE-3414.patch
>
>
> Some time ago I along with Robert and Uwe, wrote an Stemmer which uses the 
> Hunspell algorithm.  It has the benefit of supporting dictionaries for a wide 
> array of languages.   
> It seems to still be being used but has fallen out of date.  I think it would 
> benefit from being inside the analysis module where additional features such 
> as decompounding support, could be added.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3415) Snowball filter to include original word too

2011-09-06 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097857#comment-13097857
 ] 

Simon Willnauer commented on LUCENE-3415:
-

Instead of modifying the snowball filter, you could write a filter that buffers 
the term and emits it twice: first you simply pass on the term, and the second 
time you set KeywordAttribute#setKeyword(boolean) to true. This will force the 
stemmer to ignore that term and pass it along the tokenstream pipeline without 
modification. Would that solve your problem? I am not sure we should actually 
provide such a filter, but others have more insight into this. Robert?
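The buffering idea above can be sketched outside of Lucene's TokenStream API. The following is a hypothetical, self-contained model (Token, expand, and the toy "ing"-stripping stemmer are illustrative stand-ins, not Lucene classes): for each input term it emits the original marked as a keyword, so a downstream stemmer would skip it, followed by the stemmed variant:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Pure-JDK sketch of the "buffer and emit twice" preserve-original idea. */
public class PreserveOriginal {
    /** Stand-in for a term attribute plus a KeywordAttribute flag. */
    record Token(String term, boolean keyword) {}

    static List<Token> expand(List<String> terms, UnaryOperator<String> stemmer) {
        List<Token> out = new ArrayList<>();
        for (String t : terms) {
            out.add(new Token(t, true));            // original, protected from stemming
            String stemmed = stemmer.apply(t);
            if (!stemmed.equals(t)) {
                out.add(new Token(stemmed, false)); // stemmed variant
            }
        }
        return out;
    }
}
```

A real TokenFilter would additionally set the position increment of the stemmed copy to 0, so both variants occupy the same position for phrase queries and highlighting.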

> Snowball filter to include original word too
> 
>
> Key: LUCENE-3415
> URL: https://issues.apache.org/jira/browse/LUCENE-3415
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 3.3
> Environment: All
>Reporter: Manish
>  Labels: features
> Fix For: 3.4, 4.0
>
>
> 1. Currently, snowball filter deletes the original word and adds the stemmed 
> word to the index. So, if i want to do search with / without stemming, i have 
> to keep 2 fields, one with stemming and one without it. 
> 2. Rather than doing this, if we have configurable item to preserve original, 
> it would solve more business problem. 
> 3. Using single field, i can do search using stemming / without stemming by 
> changing the query filters. 
> The same can also be done for phonetic filters too. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2066) Search Grouping: support distributed search

2011-09-06 Thread Matt Beaumont (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097855#comment-13097855
 ] 

Matt Beaumont commented on SOLR-2066:
-

Found two issues with this patch:
1. When using faceting in combination with sharding and grouping in our 
queries, an error occurs.
2. When one shard returns no results and the other shards do, an error occurs.

Thanks
Matt.

> Search Grouping: support distributed search
> ---
>
> Key: SOLR-2066
> URL: https://issues.apache.org/jira/browse/SOLR-2066
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Yonik Seeley
> Fix For: 3.4, 4.0
>
> Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
> SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch
>
>
> Support distributed field collapsing / search grouping.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3414) Bring Hunspell for Lucene into analysis module

2011-09-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097852#comment-13097852
 ] 

Uwe Schindler commented on LUCENE-3414:
---

Thanks Chris for adding this to the Lucene analysis module. We did lots of work 
on Google Code, so it should really be in Lucene, except for the dictionaries. 
We should only add links to the web pages where they can be obtained.

> Bring Hunspell for Lucene into analysis module
> --
>
> Key: LUCENE-3414
> URL: https://issues.apache.org/jira/browse/LUCENE-3414
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Chris Male
> Attachments: LUCENE-3414.patch
>
>
> Some time ago I along with Robert and Uwe, wrote an Stemmer which uses the 
> Hunspell algorithm.  It has the benefit of supporting dictionaries for a wide 
> array of languages.   
> It seems to still be being used but has fallen out of date.  I think it would 
> benefit from being inside the analysis module where additional features such 
> as decompounding support, could be added.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3415) Snowball filter to include original word too

2011-09-06 Thread Manish (JIRA)
Snowball filter to include original word too


 Key: LUCENE-3415
 URL: https://issues.apache.org/jira/browse/LUCENE-3415
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 3.3
 Environment: All
Reporter: Manish
 Fix For: 3.4, 4.0



1. Currently, the snowball filter deletes the original word and adds only the 
stemmed word to the index. So, if I want to search with/without stemming, I 
have to keep 2 fields, one with stemming and one without it. 
2. Rather than doing this, a configurable option to preserve the original 
would solve more business problems. 
3. Using a single field, I could search with/without stemming just by changing 
the query filters. 

The same could also be done for the phonetic filters. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3412) SloppyPhraseScorer returns non-deterministic results for queries with many repeats

2011-09-06 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3412:


Attachment: LUCENE-3412.patch

I am able to see this inconsistent behavior!

The attached patch contains a test that fails on this. The test currently 
prints the trial number; the first loop always passes all 30 trials (expected), 
while the second loop always fails (for me) but is inconsistent about when it 
fails. Sometimes it fails on the first iteration, other times on the 3rd, 9th, 
etc.

Quite peculiar... investigating...
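The HashSet suspicion is easy to demonstrate in isolation: java.util.HashSet makes no guarantee about iteration order, so any logic keyed on the order in which elements come back (such as the order of the "repeats" array) is fragile, while an order-preserving structure like LinkedHashSet is deterministic. A small pure-JDK sketch (IterationOrder and its sample values are illustrative only):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class IterationOrder {
    /**
     * Collect values into a set and freeze its iteration order into a list.
     * ordered = true uses LinkedHashSet (iteration order = insertion order);
     * ordered = false uses HashSet (iteration order unspecified).
     */
    static List<Integer> iterate(boolean ordered, int... values) {
        Set<Integer> set = ordered ? new LinkedHashSet<>() : new HashSet<>();
        for (int v : values) {
            set.add(v);
        }
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        int[] vals = {64, 1, 32, 2, 16, 4};
        // Same elements either way, but only the first order is specified.
        System.out.println(iterate(true, vals));  // [64, 1, 32, 2, 16, 4]
        System.out.println(iterate(false, vals)); // some unspecified order
    }
}
```

Replacing the HashSet behind the repeats array with a LinkedHashSet (or sorting before use) would make the scorer process repeats in a fixed order and should make the results reproducible.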

> SloppyPhraseScorer returns non-deterministic results for queries with many 
> repeats
> --
>
> Key: LUCENE-3412
> URL: https://issues.apache.org/jira/browse/LUCENE-3412
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.1, 3.2, 3.3, 4.0
>Reporter: Michael Ryan
>Assignee: Doron Cohen
> Attachments: LUCENE-3412.patch
>
>
> Proximity queries with many repeats (four or more, based on my testing) 
> return non-deterministic results. I run the same query multiple times with 
> the same data set and get different results.
> So far I've reproduced this with Solr 1.4.1, 3.1, 3.2, 3.3, and latest 4.0 
> trunk.
> Steps to reproduce (using the Solr example):
> 1) In solrconfig.xml, set queryResultCache size to 0.
> 2) Add some documents with text "dog dog dog" and "dog dog dog dog". 
> http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E1%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%3C/field%3E%3C/doc%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E2%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%20dog%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true
> 3) Do a "dog dog dog dog"~1 query. 
> http://localhost:8983/solr/select?q=%22dog%20dog%20dog%20dog%22~1
> 4) Repeat step 3 many times.
> Expected results: The document with id 2 should be returned.
> Actual results: The document with id 2 is always returned. The document with 
> id 1 is sometimes returned.
> Different proximity values show the same bug - "dog dog dog dog"~5, "dog dog 
> dog dog"~100, etc show the same behavior.
> So far I've traced it down to the "repeats" array in 
> SloppyPhraseScorer.initPhrasePositions() - depending on the order of the 
> elements in this array, the document may or may not match. I think the 
> HashSet may be to blame, but I'm not sure - that at least seems to be where 
> the non-determinism is coming from.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-3412) SloppyPhraseScorer returns non-deterministic results for queries with many repeats

2011-09-06 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen reassigned LUCENE-3412:
---

Assignee: Doron Cohen

> SloppyPhraseScorer returns non-deterministic results for queries with many 
> repeats
> --
>
> Key: LUCENE-3412
> URL: https://issues.apache.org/jira/browse/LUCENE-3412
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.1, 3.2, 3.3, 4.0
>Reporter: Michael Ryan
>Assignee: Doron Cohen
>
> Proximity queries with many repeats (four or more, based on my testing) 
> return non-deterministic results. I run the same query multiple times with 
> the same data set and get different results.
> So far I've reproduced this with Solr 1.4.1, 3.1, 3.2, 3.3, and latest 4.0 
> trunk.
> Steps to reproduce (using the Solr example):
> 1) In solrconfig.xml, set queryResultCache size to 0.
> 2) Add some documents with text "dog dog dog" and "dog dog dog dog". 
> http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E1%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%3C/field%3E%3C/doc%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E2%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%20dog%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true
> 3) Do a "dog dog dog dog"~1 query. 
> http://localhost:8983/solr/select?q=%22dog%20dog%20dog%20dog%22~1
> 4) Repeat step 3 many times.
> Expected results: The document with id 2 should be returned.
> Actual results: The document with id 2 is always returned. The document with 
> id 1 is sometimes returned.
> Different proximity values show the same bug - "dog dog dog dog"~5, "dog dog 
> dog dog"~100, etc show the same behavior.
> So far I've traced it down to the "repeats" array in 
> SloppyPhraseScorer.initPhrasePositions() - depending on the order of the 
> elements in this array, the document may or may not match. I think the 
> HashSet may be to blame, but I'm not sure - that at least seems to be where 
> the non-determinism is coming from.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org