[jira] [Updated] (LUCENE-3690) JFlex-based HTMLStripCharFilter replacement
[ https://issues.apache.org/jira/browse/LUCENE-3690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated LUCENE-3690: Attachment: LUCENE-3690.patch Here is the final patch. {quote} bq. sarowe: oh, you mean: don't even attempt back-compat - just provide the ability to use the previous implementation right, this is what we did with DateField a while back, note the CHANGES.txt entry on r658003. now that we have luceneMatchVersion though i kind of go back and forth on when to use it to pick an impl vs when to do stuff like this. dealers choice... https://svn.apache.org/viewvc?view=revision&revision=658003 {quote} I took the same approach - here are the changes from the previous version of the patch: # The previous {{HTMLStripCharFilter}} implementation is moved to Solr, renamed to {{LegacyHTMLStripCharFilter}}, and deprecated, and a Factory is added for it. # {{JFlexHTMLStripCharFilter}} is renamed to {{HTMLStripCharFilter}}. # Support for {{HTMLStripCharFilter}}'s "escapedTags" functionality is added to {{HTMLStripCharFilterFactory}}. # Added {{TestHTMLStripCharFilterFactory}}. # Solr and Lucene {{CHANGES.txt}} entries are added. 
Run the following svn copy script before applying the patch:
{noformat}
svn cp modules/analysis/common/src/java/org/apache/lucene/analysis/charfilter/HTMLStripCharFilter.java solr/core/src/java/org/apache/solr/analysis/LegacyHTMLStripCharFilter.java
svn cp modules/analysis/common/src/test/org/apache/lucene/analysis/charfilter/htmlStripReaderTest.html solr/core/src/test/org/apache/solr/analysis/
svn cp modules/analysis/common/src/test/org/apache/lucene/analysis/charfilter/HTMLStripCharFilterTest.java solr/core/src/test/org/apache/solr/analysis/LegacyHTMLStripCharFilterTest.java
svn cp solr/core/src/java/org/apache/solr/analysis/HTMLStripCharFilterFactory.java solr/core/src/java/org/apache/solr/analysis/LegacyHTMLStripCharFilterFactory.java
{noformat}
I plan to commit to trunk shortly, then backport and commit to branch_3x. > JFlex-based HTMLStripCharFilter replacement > --- > > Key: LUCENE-3690 > URL: https://issues.apache.org/jira/browse/LUCENE-3690 > Project: Lucene - Java > Issue Type: New Feature > Components: modules/analysis >Affects Versions: 3.5, 4.0 >Reporter: Steven Rowe >Assignee: Steven Rowe > Fix For: 3.6, 4.0 > > Attachments: BaselineWarcTest.java, HTMLStripCharFilterWarcTest.java, > JFlexHTMLStripCharFilterWarcTest.java, LUCENE-3690.patch, LUCENE-3690.patch, > LUCENE-3690.patch, LUCENE-3690.patch, LUCENE-3690.patch > > > A JFlex-based HTMLStripCharFilter replacement would be more performant and > easier to understand and maintain. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3703) DirectoryTaxonomyReader.refresh misbehaves with ref counts
[ https://issues.apache.org/jira/browse/LUCENE-3703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-3703. Resolution: Fixed Committed revision 1234450 (3x), 1234451 (trunk). Thanks Doron ! > DirectoryTaxonomyReader.refresh misbehaves with ref counts > -- > > Key: LUCENE-3703 > URL: https://issues.apache.org/jira/browse/LUCENE-3703 > Project: Lucene - Java > Issue Type: Bug > Components: modules/facet >Reporter: Shai Erera >Assignee: Shai Erera > Fix For: 3.6, 4.0 > > Attachments: LUCENE-3703.patch, LUCENE-3703.patch > > > DirectoryTaxonomyReader uses the internal IndexReader in order to track its > own reference counting. However, when you call refresh(), it reopens the > internal IndexReader, and from that point, all previous reference counting > gets lost (since the new IndexReader's refCount is 1). > The solution is to track reference counting in DTR itself. I wrote a simple > unit test which exposes the bug (will be attached with the patch shortly).
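The fix described in the issue above — tracking the reference count in the taxonomy reader itself rather than delegating it to an inner reader that gets replaced on refresh — can be sketched as follows. This is an illustrative, Lucene-free sketch; the class and method names are hypothetical stand-ins, not the actual DirectoryTaxonomyReader code:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountingWrapper {
    // The wrapper owns the refCount, so swapping the inner resource on
    // refresh() leaves counts held by existing callers intact. Delegating
    // counting to the inner object is exactly the bug described above:
    // a reopened inner object starts over at refCount == 1.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private Object inner = new Object(); // stands in for the internal IndexReader

    public void incRef() { refCount.incrementAndGet(); }

    public void decRef() {
        if (refCount.decrementAndGet() == 0) {
            inner = null; // last reference released: close underlying resources
        }
    }

    public void refresh() {
        inner = new Object(); // "reopen" the inner resource; refCount untouched
    }

    public int getRefCount() { return refCount.get(); }

    public static void main(String[] args) {
        RefCountingWrapper r = new RefCountingWrapper();
        r.incRef();   // a second holder takes a reference
        r.refresh();  // reopening must not reset the count
        System.out.println(r.getRefCount()); // 2
    }
}
```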
[jira] [Commented] (SOLR-1283) Mark Invalid error on indexing
[ https://issues.apache.org/jira/browse/SOLR-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190605#comment-13190605 ] Steven Rowe commented on SOLR-1283: --- The below-listed exception, which appears to be the same as that in other reports on this issue, is triggered when processing with {{HTMLStripCharFilter}} the ClueWeb09 documents with TREC-IDs clueweb09-en-00-14171, clueweb09-en-00-14228, clueweb09-en-00-14235, clueweb09-en-00-14240, clueweb09-en-00-14248, and clueweb09-en-00-14265: {noformat} java.io.IOException: Mark invalid at java.io.BufferedReader.reset(BufferedReader.java:485) at org.apache.lucene.analysis.CharReader.reset(CharReader.java:69) at org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.restoreState(HTMLStripCharFilter.java:171) at org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.read(HTMLStripCharFilter.java:734) {noformat} Once LUCENE-3690 has been committed, this will only affect the (deprecated) old implementation, which will be renamed to {{LegacyHTMLStripCharFilter}}. > Mark Invalid error on indexing > -- > > Key: SOLR-1283 > URL: https://issues.apache.org/jira/browse/SOLR-1283 > Project: Solr > Issue Type: Bug >Affects Versions: 1.3 > Environment: Ubuntu 8.04, Sun Java 6 >Reporter: solrize >Assignee: Yonik Seeley > Fix For: 3.1, 4.0 > > Attachments: SOLR-1283.modules.patch, SOLR-1283.patch > > > When indexing large (1 megabyte) documents I get a lot of exceptions with > stack traces like the below. It happens both in the Solr 1.3 release and in > the July 9 1.4 nightly. I believe this to NOT be the same issue as SOLR-42. > I found some further discussion on solr-user: > http://www.nabble.com/IOException:-Mark-invalid-while-analyzing-HTML-td17052153.html > > In that discussion, Grant asked the original poster to open a Jira issue, but > I didn't see one so I'm opening one; please feel free to merge or close if > it's redundant. > My stack trace follows. 
> Jul 15, 2009 8:36:42 AM org.apache.solr.core.SolrCore execute > INFO: [] webapp=/solr path=/update params={} status=500 QTime=3 > Jul 15, 2009 8:36:42 AM org.apache.solr.common.SolrException log > SEVERE: java.io.IOException: Mark invalid > at java.io.BufferedReader.reset(BufferedReader.java:485) > at > org.apache.solr.analysis.HTMLStripReader.restoreState(HTMLStripReader.java:171) > at > org.apache.solr.analysis.HTMLStripReader.read(HTMLStripReader.java:728) > at > org.apache.solr.analysis.HTMLStripReader.read(HTMLStripReader.java:742) > at java.io.Reader.read(Reader.java:123) > at > org.apache.lucene.analysis.CharTokenizer.next(CharTokenizer.java:108) > at org.apache.lucene.analysis.StopFilter.next(StopFilter.java:178) > at > org.apache.lucene.analysis.standard.StandardFilter.next(StandardFilter.java:84) > at > org.apache.lucene.analysis.LowerCaseFilter.next(LowerCaseFilter.java:53) > at > org.apache.solr.analysis.WordDelimiterFilter.next(WordDelimiterFilter.java:347) > at > org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:159) > at > org.apache.lucene.index.DocFieldConsumersPerField.processFields(DocFieldConsumersPerField.java:36) > at > org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:234) > at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:765) > at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:748) > at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2512) > at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2484) > at > org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:240) > at > org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61) > at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140) > at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69) > at > 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1292) > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) > at > org.mortbay.jetty.ser
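The "Mark invalid" error in the trace above is standard `java.io.BufferedReader` behavior: `mark(readAheadLimit)` only guarantees `reset()` works if at most `readAheadLimit` characters are read afterwards; once the buffer has to refill past that limit, the mark is discarded and `reset()` throws. A self-contained demonstration, using a deliberately tiny buffer (nothing Lucene-specific):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class MarkInvalidDemo {
    // Returns true if reset() failed because the mark was invalidated.
    static boolean markGetsInvalidated() {
        try {
            // Small buffer (16 chars) so reading past the read-ahead limit
            // forces a refill, which discards the mark.
            BufferedReader r = new BufferedReader(
                    new StringReader("x".repeat(100)), 16);
            r.mark(4);   // mark is valid for at most 4 chars of read-ahead
            r.skip(50);  // read far past the limit -> buffer refilled
            r.reset();   // throws IOException("Mark invalid")
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(markGetsInvalidated()); // true
    }
}
```

This is why the old HTMLStripReader/HTMLStripCharFilter, which relied on mark/reset of the input reader, fails on inputs that look far ahead.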
[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 1577 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/1577/ 1 tests failed. REGRESSION: org.apache.solr.search.TestRealTimeGet.testStressGetRealtime Error Message: java.lang.AssertionError: Some threads threw uncaught exceptions! Stack Trace: java.lang.RuntimeException: java.lang.AssertionError: Some threads threw uncaught exceptions! at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:658) at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:86) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57) at org.apache.lucene.util.LuceneTestCase.checkUncaughtExceptionsAfter(LuceneTestCase.java:686) at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:630) Build Log (for compile errors): [...truncated 9820 lines...]
[jira] [Commented] (LUCENE-3690) JFlex-based HTMLStripCharFilter replacement
[ https://issues.apache.org/jira/browse/LUCENE-3690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190588#comment-13190588 ] Steven Rowe commented on LUCENE-3690: - bq. AFAICT, SOLR-2891 will be fixed by this implementation. I misspoke, having misread that issue - despite the reference to {{HTMLStripCharFilter}} in the most recent comment on the issue, SOLR-2891 is not about {{HTMLStripCharFilter}}.
[jira] [Created] (SOLR-3055) Use NGramPhraseQuery in Solr
Use NGramPhraseQuery in Solr Key: SOLR-3055 URL: https://issues.apache.org/jira/browse/SOLR-3055 Project: Solr Issue Type: New Feature Components: Schema and Analysis, search Reporter: Koji Sekiguchi Priority: Minor Solr should use NGramPhraseQuery when searching with default slop on n-gram field.
[jira] [Commented] (LUCENE-3426) optimizer for n-gram PhraseQuery
[ https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190560#comment-13190560 ] Koji Sekiguchi commented on LUCENE-3426: bq. Is this automatic in SOLR? No. I've opened SOLR-3055. > optimizer for n-gram PhraseQuery > > > Key: LUCENE-3426 > URL: https://issues.apache.org/jira/browse/LUCENE-3426 > Project: Lucene - Java > Issue Type: Improvement > Components: core/search >Affects Versions: 2.9.4, 3.0.3, 3.1, 3.2, 3.3, 3.4, 4.0 >Reporter: Koji Sekiguchi >Assignee: Koji Sekiguchi >Priority: Trivial > Fix For: 3.5, 4.0 > > Attachments: LUCENE-3426.patch, LUCENE-3426.patch, LUCENE-3426.patch, > LUCENE-3426.patch, LUCENE-3426.patch, LUCENE-3426.patch, PerfTest.java, > PerfTest.java > > > If 2-gram is used and the length of query string is 4, for example q="ABCD", > QueryParser generates (when autoGeneratePhraseQueries is true) > PhraseQuery("AB BC CD") with slop 0. But it can be optimized PhraseQuery("AB > CD") with appropriate positions. > The idea came from the Japanese paper "N.M-gram: Implementation of Inverted > Index Using N-gram with Hash Values" by Mikio Hirabayashi, et al. (The main > theme of the paper is different from the idea that I'm using here, though)
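The optimization in the issue description can be illustrated without Lucene: keeping every n-th gram plus the final gram still covers every character of the query, so a phrase match on the kept grams at their original positions is equivalent to matching all overlapping grams. A minimal sketch (helper names are hypothetical; the real rewrite lives in the attached LUCENE-3426 patches):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NGramPhraseOptimizer {
    // Given the n-grams of a query string (at positions 0,1,2,...),
    // keep only every n-th gram plus the final gram; together they still
    // cover every character, so the intermediate overlapping grams are
    // redundant for an exact (slop 0) phrase match.
    static Map<Integer, String> optimize(List<String> grams, int n) {
        Map<Integer, String> kept = new LinkedHashMap<>();
        for (int pos = 0; pos < grams.size(); pos++) {
            if (pos % n == 0 || pos == grams.size() - 1) {
                kept.put(pos, grams.get(pos)); // gram kept at its original position
            }
        }
        return kept;
    }

    static List<String> ngrams(String s, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= s.length(); i++) out.add(s.substring(i, i + n));
        return out;
    }

    public static void main(String[] args) {
        // "ABCD" with 2-grams: AB BC CD -> keep AB@0 and CD@2
        System.out.println(optimize(ngrams("ABCD", 2), 2)); // {0=AB, 2=CD}
    }
}
```

Fewer phrase terms means fewer postings to intersect, which is where the speedup in the attached PerfTest comes from.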
[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders
[ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190559#comment-13190559 ] Uwe Schindler commented on LUCENE-2858: --- I created the branch at [https://svn.apache.org/repos/asf/lucene/dev/branches/lucene2858] and committed my first steps: - Add CompositeIndexReader and AtomicIndexReader - Moved methods around, still not yet finished (see below) - DirectoryReader is public now and is returned by IR.open() and IW.getReader() TODO: - IR.openIfChanged makes no sense for any reader other than DirectoryReader, let's move it also there - isCurrent and getVersion() is also useless for atomic readers and composite readers except DR - The strange generics in ReaderContext caused by the final field will go away, when changing reader field to accessor method returning the correct type (by return type overloading). Comments welcome and also heavy committing. > Separate SegmentReaders (and other atomic readers) from composite IndexReaders > -- > > Key: LUCENE-2858 > URL: https://issues.apache.org/jira/browse/LUCENE-2858 > Project: Lucene - Java > Issue Type: Task >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Blocker > Fix For: 4.0 > > > With current trunk, whenever you open an IndexReader on a directory you get > back a DirectoryReader which is a composite reader. The interface of > IndexReader has now lots of methods that simply throw UOE (in fact more than > 50% of all methods that are commonly used ones are unusable now). This > confuses users and makes the API hard to understand. > This issue should split "atomic readers" from "reader collections" with a > separate API. After that, you are no longer able to get TermsEnum without > wrapping from those composite readers. 
We currently have helper classes for > wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or > Multi*), those should be retrofitted to implement the correct classes > (SlowMultiReaderWrapper would be an atomic reader but takes a composite > reader as ctor param, maybe it could also simply take a List). > In my opinion, maybe composite readers could implement some collection APIs > and also have the ReaderUtil method directly built in (possibly as a "view" > in the util.Collection sense). In general composite readers do not really > need to look like the previous IndexReaders, they could simply be a > "collection" of SegmentReaders with some functionality like reopen. > On the other side, atomic readers do not need reopen logic anymore? When a > segment changes, you need a new atomic reader? - maybe because of deletions > that's not the best idea, but we should investigate. Maybe make the whole > reopen logic simpler to use (at least on the collection reader level). > We should decide about good names, I have no preference at the moment.
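The split being proposed can be reduced to a plain-Java shape (all names here are placeholder sketches, not the actual LUCENE-2858 API): atomic readers expose per-segment functionality directly, a composite reader is little more than a collection of atomic leaves, and a "slow wrapper" presents a composite as a single atomic view by merging its leaves on demand:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ReaderSplitSketch {
    // Atomic level: per-segment APIs live only here, so nothing throws UOE.
    interface AtomicR {
        List<String> terms();
    }

    // Composite level: essentially a collection of atomic leaves.
    static class CompositeR {
        final List<AtomicR> leaves;
        CompositeR(List<AtomicR> leaves) { this.leaves = List.copyOf(leaves); }
        List<AtomicR> getLeaves() { return leaves; }
    }

    // Atomic view over a composite: merges leaves on demand, which is the
    // cost that makes such a wrapper "slow".
    static class SlowCompositeWrapper implements AtomicR {
        final CompositeR in;
        SlowCompositeWrapper(CompositeR in) { this.in = in; }
        public List<String> terms() {
            List<String> merged = new ArrayList<>();
            for (AtomicR leaf : in.getLeaves()) merged.addAll(leaf.terms());
            Collections.sort(merged); // simulate the merge work
            return merged;
        }
    }

    public static void main(String[] args) {
        AtomicR seg1 = () -> List.of("apache", "lucene");
        AtomicR seg2 = () -> List.of("index", "solr");
        CompositeR dir = new CompositeR(List.of(seg1, seg2));
        System.out.println(new SlowCompositeWrapper(dir).terms());
        // [apache, index, lucene, solr]
    }
}
```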
[jira] [Issue Comment Edited] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders
[ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190559#comment-13190559 ] Uwe Schindler edited comment on LUCENE-2858 at 1/21/12 11:51 PM: - I created the branch at [https://svn.apache.org/repos/asf/lucene/dev/branches/lucene2858] and committed my first steps: - Add CompositeIndexReader and AtomicIndexReader - Moved methods around, still not yet finished (see below) - DirectoryReader is public now and is returned by IR.open() and IW.getReader() TODO: - IR.openIfChanged makes no sense for any reader other than DirectoryReader, let's move it also there - isCurrent and getVersion() is also useless for atomic readers and composite readers except DR - The strange generics in ReaderContext caused by the final field will go away, when changing reader field to aaccessor method returning the correct type (by return type overloading). Comments welcome and also heavy committing. was (Author: thetaphi): I created the branch at [https://svn.apache.org/repos/asf/lucene/dev/branches/lucene2858] and committed my first steps: - Add CompositeIndexReader and AtomicIndexReader - Moved methods around, still now finished (see below) - DirectoryReader is public now and is returned by IR.open() and IW.getReader() TODO: - IR.openIfChanged makes no sense for any reader other than DirectoryReader, let's move it also there - isCurrent and getVersion() is also useless for atomic readers and composite readers except DR - The strange generics in ReaderContext caused by the final field will go away, when changing reader field to aaccessor method returning the correct type (by return type overloading). Comments welcome and also heavy committing. 
[jira] [Commented] (LUCENE-3426) optimizer for n-gram PhraseQuery
[ https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190550#comment-13190550 ] Bill Bell commented on LUCENE-3426: --- Is this automatic in SOLR? Or do we need to add a feature to support this in SOLR?
[jira] [Commented] (LUCENE-3714) add suggester that uses shortest path/wFST instead of buckets
[ https://issues.apache.org/jira/browse/LUCENE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190547#comment-13190547 ] Dawid Weiss commented on LUCENE-3714: - If my feeling is right and the PQ can be kept constant-size then it won't matter much at runtime I think. With realistic data distributions the number of elements to be inserted into the PQ before you reach the top-N will be pretty much the same (?). And the benefit would be a much cleaner traversal (no need to deal with buckets, early termination, etc.). > add suggester that uses shortest path/wFST instead of buckets > - > > Key: LUCENE-3714 > URL: https://issues.apache.org/jira/browse/LUCENE-3714 > Project: Lucene - Java > Issue Type: New Feature > Components: modules/spellchecker >Reporter: Robert Muir > Attachments: LUCENE-3714.patch, out.png > > > Currently the FST suggester (really an FSA) quantizes weights into buckets > (e.g. single byte) and puts them in front of the word. > This makes it fast, but you lose granularity in your suggestions. > Lately the question was raised, if you build lucene's FST with > positiveintoutputs, does it behave the same as a tropical semiring wFST? > In other words, after completing the word, we instead traverse min(output) at > each node to find the 'shortest path' to the > best suggestion (with the highest score). > This means we wouldn't need to quantize weights at all and it might make some > operations (e.g. adding fuzzy matching etc) a lot easier.
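The PQ-based traversal under discussion can be illustrated on a toy weighted trie (not Lucene's FST API; all names below are a sketch of the idea only): each arc is labeled with the minimum weight of any completion below it, so a best-first search pops partial paths from a priority queue in order of that lower bound and can stop as soon as N final outputs have been emitted — no buckets, no quantizing:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

public class TopNBestFirst {
    static class Node {
        final TreeMap<Character, Node> children = new TreeMap<>();
        final Map<Character, Integer> arcMin = new HashMap<>(); // min weight below each arc
        int finalWeight = Integer.MAX_VALUE;                    // weight if a word ends here
    }

    static void add(Node root, String word, int weight) {
        Node n = root;
        for (char c : word.toCharArray()) {
            n.arcMin.merge(c, weight, Math::min); // arc carries min(output) of its subtree
            n = n.children.computeIfAbsent(c, k -> new Node());
        }
        n.finalWeight = Math.min(n.finalWeight, weight);
    }

    // Entries with node == null are completed words (exact priority); entries
    // with a node carry a lower bound, so popping in priority order never
    // emits a word while a cheaper one is still hidden in the queue.
    record Path(int priority, String word, Node node) {}

    static List<String> topN(Node root, int n) {
        PriorityQueue<Path> pq =
                new PriorityQueue<>(Comparator.comparingInt(Path::priority));
        pq.add(new Path(0, "", root));
        List<String> out = new ArrayList<>();
        while (!pq.isEmpty() && out.size() < n) {
            Path p = pq.poll();
            if (p.node() == null) { out.add(p.word()); continue; }
            if (p.node().finalWeight != Integer.MAX_VALUE)
                pq.add(new Path(p.node().finalWeight, p.word(), null));
            for (var e : p.node().children.entrySet())
                pq.add(new Path(p.node().arcMin.get(e.getKey()),
                        p.word() + e.getKey(), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        Node root = new Node();
        add(root, "car", 3);
        add(root, "cat", 1);
        add(root, "cart", 2);
        System.out.println(topN(root, 2)); // [cat, cart]
    }
}
```

The open question raised in the comments — how large the queue can grow on degenerate inputs — corresponds here to how many lower-bound entries sit in `pq` before the N-th null entry is popped.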
[jira] [Created] (SOLR-3054) Add a TypeTokenFilterFactory
Add a TypeTokenFilterFactory Key: SOLR-3054 URL: https://issues.apache.org/jira/browse/SOLR-3054 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Tommaso Teofili Fix For: 3.6, 4.0 Create a TypeTokenFilterFactory to make the TypeTokenFilter (filtering tokens depending on token types, see LUCENE-3671) available in Solr too.
[jira] [Commented] (LUCENE-3671) Add a TypeTokenFilter
[ https://issues.apache.org/jira/browse/LUCENE-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190544#comment-13190544 ] Tommaso Teofili commented on LUCENE-3671: - Sure Uwe, I'll open a new one for the related Solr factory > Add a TypeTokenFilter > - > > Key: LUCENE-3671 > URL: https://issues.apache.org/jira/browse/LUCENE-3671 > Project: Lucene - Java > Issue Type: New Feature > Components: core/queryparser >Reporter: Santiago M. Mola >Assignee: Uwe Schindler > Fix For: 3.6, 4.0 > > Attachments: LUCENE-3671.patch, LUCENE-3671_2.patch, > LUCENE-3671_3.patch > > > It would be convenient to have a TypeTokenFilter that filters tokens by its > type, either with an exclude or include list. This might be a stupid thing to > provide for people who use Lucene directly, but it would be very useful to > later expose it to Solr and other Lucene-backed search solutions.
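A Lucene-free sketch of what such a type-based filter does (the actual TypeTokenFilter and its factory live in the LUCENE-3671 patches; `useWhiteList` is an assumed parameter name): each token carries a type string, and the filter keeps a token exactly when its membership in the type set matches the include/exclude mode:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class TypeFilterSketch {
    // (token, type) pairs stand in for a TokenStream with a TypeAttribute.
    static List<String> filterByType(List<Map.Entry<String, String>> tokens,
                                     Set<String> types, boolean useWhiteList) {
        return tokens.stream()
                // include mode: keep listed types; exclude mode: drop them
                .filter(t -> types.contains(t.getValue()) == useWhiteList)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> tokens = List.of(
                Map.entry("apache", "<ALPHANUM>"),
                Map.entry("3", "<NUM>"),
                Map.entry("solr", "<ALPHANUM>"));
        // Exclude numeric tokens:
        System.out.println(filterByType(tokens, Set.of("<NUM>"), false));
        // [apache, solr]
    }
}
```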
[jira] [Commented] (LUCENE-3714) add suggester that uses shortest path/wFST instead of buckets
[ https://issues.apache.org/jira/browse/LUCENE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190543#comment-13190543 ] Robert Muir commented on LUCENE-3714: - {quote} we could combine the two approaches: still use buckets, but within each bucket we have a wFST (ie, use the "true" score), so we don't actually do any quantizing in the end results. Then bucketing is purely an optimization... {quote} I like this idea!
[jira] [Commented] (LUCENE-3714) add suggester that uses shortest path/wFST instead of buckets
[ https://issues.apache.org/jira/browse/LUCENE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190541#comment-13190541 ] Dawid Weiss commented on LUCENE-3714: - If I seem inconsistent above then it's because I don't have ready-to-use answers and I'm sort of thinking out loud :)
[jira] [Commented] (LUCENE-3714) add suggester that uses shortest path/wFST instead of buckets
[ https://issues.apache.org/jira/browse/LUCENE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190539#comment-13190539 ] Dawid Weiss commented on LUCENE-3714: - I thought you had a solution that collects top-N, but your patch selects one (best) matching solution only. I don't know how you planned to go around selecting top-N, but in my understanding (at that moment) top-N selection is not going to work via recursive scan because an output at the given level doesn't tell you much about which arcs to follow. I can see how this can be solved by picking the arc/direction with the "next smallest/largest" output among all arcs traversed so far but this will be more complex and I cannot provide any bounds on how large the queue can be or what the worst case lookup then is. I do have a feeling a degenerate example can be devised, but then I also have a feeling these are uncommon in practice. Sorting arcs by score doesn't help if you use the pq -- you need to add all of them to the pq and then pick the smallest path. In a way it is like what you did, but the pq is maintaining fast access to the next-smaller-cost path. Another feeling is that the PQ can be bound to a maximum size of N? Every arc leads to at least one leaf so while traversing you'd drop those arcs that definitely would have fallen out of the first N smallest/largest weights... Yes, this could work. I'd still try to devise a degenerate example to see what the cost of maintaining the PQ can be. > add suggester that uses shortest path/wFST instead of buckets > - > > Key: LUCENE-3714 > URL: https://issues.apache.org/jira/browse/LUCENE-3714 > Project: Lucene - Java > Issue Type: New Feature > Components: modules/spellchecker >Reporter: Robert Muir > Attachments: LUCENE-3714.patch, out.png > > > Currently the FST suggester (really an FSA) quantizes weights into buckets > (e.g. single byte) and puts them in front of the word. 
> This makes it fast, but you lose granularity in your suggestions. > Lately the question was raised, if you build lucene's FST with > positiveintoutputs, does it behave the same as a tropical semiring wFST? > In other words, after completing the word, we instead traverse min(output) at > each node to find the 'shortest path' to the > best suggestion (with the highest score). > This means we wouldnt need to quantize weights at all and it might make some > operations (e.g. adding fuzzy matching etc) a lot easier. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
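The PQ-based top-N search Dawid and Mike discuss above can be sketched as a best-first search over a weighted trie, where each node's `minBelow` plays the role of the FST's `min(output)` guidance. This is a standalone illustration with hypothetical names, not Lucene's actual FST API:

```java
import java.util.*;

public class WfstTopN {

  /** Trie node; minBelow plays the role of the FST's min(output) guidance. */
  static final class Node {
    final TreeMap<Character, Node> arcs = new TreeMap<>();
    Integer cost;     // non-null if a word ends here (smaller = better)
    int minBelow;     // smallest cost of any word in this subtree
  }

  static Node build(Map<String, Integer> wordCost) {
    Node root = new Node();
    for (Map.Entry<String, Integer> e : wordCost.entrySet()) {
      Node cur = root;
      for (char c : e.getKey().toCharArray()) {
        cur = cur.arcs.computeIfAbsent(c, k -> new Node());
      }
      cur.cost = e.getValue();
    }
    computeMinBelow(root);
    return root;
  }

  private static int computeMinBelow(Node n) {
    int m = (n.cost == null) ? Integer.MAX_VALUE : n.cost;
    for (Node child : n.arcs.values()) {
      m = Math.min(m, computeMinBelow(child));
    }
    n.minBelow = m;
    return m;
  }

  /** Best-first search: always pop the partial path with the smallest bound. */
  static List<String> topN(Node root, int n) {
    // Entry: {bound, prefix, node, emit?}. An "emit" entry's bound is the
    // exact word cost, so popping in bound order yields words cheapest-first.
    PriorityQueue<Object[]> pq =
        new PriorityQueue<>((a, b) -> Integer.compare((int) a[0], (int) b[0]));
    pq.add(new Object[]{root.minBelow, "", root, false});
    List<String> out = new ArrayList<>();
    while (!pq.isEmpty() && out.size() < n) {
      Object[] e = pq.poll();
      String prefix = (String) e[1];
      Node node = (Node) e[2];
      if ((boolean) e[3]) {          // a completed word surfaced
        out.add(prefix);
        continue;
      }
      if (node.cost != null) {       // word ends here: queue it for emission
        pq.add(new Object[]{node.cost, prefix, node, true});
      }
      for (Map.Entry<Character, Node> arc : node.arcs.entrySet()) {
        Node child = arc.getValue();
        pq.add(new Object[]{child.minBelow, prefix + arc.getKey(), child, false});
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Node root = build(Map.of("aa", 9, "ab", 1, "ba", 9, "bb", 2, "ca", 9, "cb", 3));
    System.out.println(topN(root, 3)); // prints [ab, bb, cb]
  }
}
```

As Dawid notes, the cost of this approach is the size the queue can grow to: every arc whose bound is popped gets its children pushed, even if most of them never produce a top-N result.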
[jira] [Commented] (LUCENE-3714) add suggester that uses shortest path/wFST instead of buckets
[ https://issues.apache.org/jira/browse/LUCENE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190536#comment-13190536 ] Michael McCandless commented on LUCENE-3714: Dawid, by problematic example, you mean you think this approach is functionally correct but may not perform very well...? That is definitely the worst-case performance (for either top-1 or top-K on a wFST with a simple PQ), but this (the number of non-competitive arcs you have to scan and discard) is a constant factor on the overall complexity, right? I think we should at least test the simple PQ on a PositiveIntOutputs wFST and see how it performs in practice. If indeed having everything "in one bucket" is too slow, we could combine the two approaches: still use buckets, but within each bucket we have a wFST (ie, use the "true" score), so we don't actually do any quantizing in the end results. Then bucketing is purely an optimization... Or, maybe, we could keep one bucket but sort each node's arcs by their output instead of by label. This'd mean the initial lookup-by-prefix gets slower (linear scan instead of a binary search, assuming those nodes had array'd arcs), but then producing the top-N is very fast (no wasted arcs need to be scanned). Maybe we could keep the by-label sort for nodes within depth N, and then sort by output beyond that... Or we could change the outputs algebra so that more "lookahead" is stored in each output so we have more guidance on which arcs are worth pursuing...
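Dawid's earlier idea of bounding the queue to a maximum size of N can be sketched separately: since every arc leads to at least one leaf, a candidate whose cost lower bound is worse than all N live candidates can never produce a top-N result and may be dropped on arrival. A minimal standalone sketch (hypothetical names, not Lucene code):

```java
import java.util.*;

/** Keeps at most n live candidates; prunes offers that cannot reach the top-N. */
class BoundedCandidates {
  private final int n;
  // Max-heap on bound, so the worst live candidate is cheap to inspect/evict.
  private final PriorityQueue<int[]> worstFirst =
      new PriorityQueue<>((a, b) -> Integer.compare(b[0], a[0]));

  BoundedCandidates(int n) { this.n = n; }

  /** Offer a candidate path with the given cost lower bound (smaller = better);
   *  returns false if it was pruned. */
  boolean offer(int bound, int pathId) {
    if (worstFirst.size() < n) {
      worstFirst.add(new int[]{bound, pathId});
      return true;
    }
    if (bound >= worstFirst.peek()[0]) {
      return false;                       // worse than all N live bounds: prune
    }
    worstFirst.poll();                    // evict the current worst candidate
    worstFirst.add(new int[]{bound, pathId});
    return true;
  }

  int size() { return worstFirst.size(); }
}
```

This caps the queue at N entries, but as discussed above, a degenerate input can still force many offers to be made (and pruned) before the true top-N emerges.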
[jira] [Commented] (LUCENE-3714) add suggester that uses shortest path/wFST instead of buckets
[ https://issues.apache.org/jira/browse/LUCENE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190535#comment-13190535 ] Robert Muir commented on LUCENE-3714: - Yeah I think we should try that first, and see how it performs.
[jira] [Commented] (LUCENE-3714) add suggester that uses shortest path/wFST instead of buckets
[ https://issues.apache.org/jira/browse/LUCENE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190526#comment-13190526 ] Dawid Weiss commented on LUCENE-3714: - I'm sure there are solutions to the problem if you change algebra ops -- the pq is a naive solution that would work on top of positive outputs.
[jira] [Commented] (LUCENE-3714) add suggester that uses shortest path/wFST instead of buckets
[ https://issues.apache.org/jira/browse/LUCENE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190524#comment-13190524 ] Dawid Weiss commented on LUCENE-3714: - The patch works because it finds the first (topmost) suggestion, but collecting suggestions with max-N (or min-N) will require a priority queue so that one knows which arc to follow next (and this will also require storing partially collected paths for pointers in the fst/queue)?
[jira] [Commented] (LUCENE-3714) add suggester that uses shortest path/wFST instead of buckets
[ https://issues.apache.org/jira/browse/LUCENE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190525#comment-13190525 ] Robert Muir commented on LUCENE-3714: - Not sure it requires one, http://www.cs.nyu.edu/~mohri/pub/nbest.ps has some solutions.
[jira] [Updated] (LUCENE-3714) add suggester that uses shortest path/wFST instead of buckets
[ https://issues.apache.org/jira/browse/LUCENE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-3714: Attachment: out.png A problematic example where root arcs, traversed min-to-max, collect outputs, but every outgoing arc only leads to a single better suggestion (and should skip possibly lots of other suggestions). This is created by the following input: aa|N ab|1 ba|N bb|2 ca|N cb|3 .. collecting the K-th suggestion with the smallest score will pessimistically require scanning all of the arcs. Note that you can put arbitrarily large subtrees on _a|N nodes like: aaa|N aab|N aac|N etc.
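The degenerate input Dawid describes can be generated programmatically for worst-case testing. A sketch (standalone, hypothetical helper -- not part of the patch): under each of `branches` root arcs it places one good word (cost 1, 2, 3, ...) buried among `fillPerBranch` words that all share the worst cost N:

```java
import java.util.*;

public class DegenerateInput {
  /** branches root arcs; under each, one good word (cost 1,2,3,...) plus
   *  fillPerBranch filler words that all carry the worst cost n. */
  static Map<String, Integer> generate(int branches, int fillPerBranch, int n) {
    Map<String, Integer> words = new LinkedHashMap<>();
    for (int i = 0; i < branches; i++) {
      char c = (char) ('a' + i);
      words.put("" + c + 'b', i + 1);                    // ab|1, bb|2, cb|3 ...
      for (int j = 0; j < fillPerBranch; j++) {
        words.put("" + c + 'a' + (char) ('a' + j), n);   // aaa|N, aab|N ...
      }
    }
    return words;
  }

  public static void main(String[] args) {
    System.out.println(generate(3, 2, 100));
  }
}
```

Collecting the top-K from this input forces the search across K distinct root subtrees, since each root arc's min(output) guidance points at exactly one competitive word.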
[jira] [Created] (LUCENE-3715) TestStressIndexing2 fails with AssertionFailedError
TestStressIndexing2 failes with AssertionFailedError Key: LUCENE-3715 URL: https://issues.apache.org/jira/browse/LUCENE-3715 Project: Lucene - Java Issue Type: Bug Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Fix For: 4.0 JENKINS reported this lately, I suspect a test issue due to the RandomDWPThreadPool but I need to dig deeper. here is the failure to reproduce: {noformat} [junit] Testcase: testMultiConfig(org.apache.lucene.index.TestStressIndexing2): FAILED [junit] r1 is not empty but r2 is [junit] junit.framework.AssertionFailedError: r1 is not empty but r2 is [junit] at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165) [junit] at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57) [junit] at org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:339) [junit] at org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:277) [junit] at org.apache.lucene.index.TestStressIndexing2.testMultiConfig(TestStressIndexing2.java:126) [junit] at org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:529) [junit] [junit] [junit] Tests run: 3, Failures: 1, Errors: 0, Time elapsed: 2.598 sec [junit] [junit] - Standard Error - [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 -Dtestmethod=testMultiConfig -Dtests.seed=5df78431615a5fbf:45b35512c8b8741a:235b5758de97148e -Dtests.multiplier=3 -Dtests.nightly=true -Dargs="-Dfile.encoding=ISO8859-1" [junit] NOTE: test params are: codec=Lucene3x, sim=RandomSimilarityProvider(queryNorm=true,coord=true): {f34=DFR GZ(0.3), f33=IB SPL-D2, f32=DFR I(n)B2, f31=DFR I(ne)B1, f30=IB LL-L2, f79=DFR I(n)3(800.0), f78=DFR I(F)L2, f75=DFR I(n)BZ(0.3), f76=DFR GLZ(0.3), f39=DFR I(n)BZ(0.3), f38=DFR I(F)3(800.0), f73=DFR I(ne)L1, f74=DFR I(F)3(800.0), f37=DFR I(ne)L1, f36=DFR I(ne)3(800.0), f71=DFR I(F)B3(800.0), f35=DFR I(F)B3(800.0), f72=DFR I(ne)3(800.0), f81=DFR GZ(0.3), 
f80=IB SPL-D2, f43=DFR I(ne)BZ(0.3), f42=DFR I(F)Z(0.3), f45=IB SPL-L2, f41=DFR I(F)BZ(0.3), f40=DFR I(n)B1, f86=DFR I(ne)B3(800.0), f87=DFR GB1, f88=IB SPL-D3(800.0), f89=DFR I(F)L3(800.0), f82=DFR GL2, f47=DFR I(ne)LZ(0.3), f46=DFR GL2, f83=DFR I(ne)LZ(0.3), f49=DFR I(ne)Z(0.3), f84=DFR I(F)B2, f48=DFR I(F)B2, f85=DFR I(ne)Z(0.3), f90=DFR I(ne)BZ(0.3), f92=IB SPL-L2, f91=DFR I(n)Z(0.3), f59=DFR G2, f6=IB SPL-DZ(0.3), f7=IB LL-L1, f57=IB LL-L3(800.0), f8=DFR I(n)L3(800.0), f58=DFR I(n)LZ(0.3), f12=DFR I(F)1, f11=DFR I(n)L2, f10=DFR I(F)LZ(0.3), f51=DFR I(n)L1, f15=DFR I(n)L1, f52=DFR I(F)L2, f14=DFR GLZ(0.3), f13=DFR I(n)BZ(0.3), f55=DFR GL3(800.0), f19=DFR GL3(800.0), f56=IB LL-L2, f53=DFR I(F)L1, f18=BM25(k1=1.2,b=0.75), f17=DFR I(F)L1, f54=BM25(k1=1.2,b=0.75), id=DFR I(F)L2, f1=DFR I(n)B3(800.0), f0=DFR G2, f3=DFR I(ne)3(800.0), f2=DFR I(F)B3(800.0), f5=DFR I(F)3(800.0), f4=DFR I(ne)L1, f68=DFR I(n)2, f69=DFR I(ne)2, f21=IB LL-LZ(0.3), f20=DFR I(n)1, f23=DFR GB2, f22=DFR I(ne)B2, f60=DFR I(ne)B3(800.0), f25=DFR GB1, f61=DFR GB1, f24=DFR I(ne)B3(800.0), f62=IB SPL-D3(800.0), f27=DFR I(F)L3(800.0), f26=IB SPL-D3(800.0), f63=DFR I(F)L3(800.0), f64=DFR GL1, f29=DFR I(ne)1, f65=DFR I(ne)1, f28=DFR GL1, f66=DFR I(n)B1, f67=DFR I(F)BZ(0.3), f98=DFR I(n)LZ(0.3), f97=IB LL-L3(800.0), f99=DFR G2, f94=DefaultSimilarity, f93=DFR I(n)3(800.0), f70=DFR GB2, f96=LM Jelinek-Mercer(0.70), f95=DFR GBZ(0.3)}, locale=ms, timezone=Africa/Bangui [junit] NOTE: all tests run in this JVM: [junit] [TestDemo, TestSearch, TestCachingTokenFilter, TestSurrogates, TestPulsingReuse, TestAddIndexes, TestBinaryTerms, TestCodecs, TestCrashCausesCorruptIndex, TestDocsAndPositions, TestFieldInfos, TestFilterIndexReader, TestFlex, TestIndexReader, TestIndexWriterMergePolicy, TestIndexWriterNRTIsCurrent, TestIndexWriterOnJRECrash, TestIndexWriterWithThreads, TestNeverDelete, TestNoDeletionPolicy, TestOmitNorms, TestParallelReader, TestPayloads, TestRandomStoredFields, TestRollback, 
TestRollingUpdates, TestSegmentInfo, TestStressIndexing2] [junit] NOTE: FreeBSD 8.2-RELEASE amd64/Sun Microsystems Inc. 1.6.0 (64-bit)/cpus=16,threads=1,free=349545000,total=477233152 {noformat} this failed on revision: http://svn.apache.org/repos/asf/lucene/dev/trunk : 1233708
[jira] [Updated] (LUCENE-3714) add suggester that uses shortest path/wFST instead of buckets
[ https://issues.apache.org/jira/browse/LUCENE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3714: Attachment: LUCENE-3714.patch A patch that Mike and I came up with that finds the minimal output from an arc, plus a random test showing it works.
[jira] [Resolved] (LUCENE-3713) TestIndexWriterOnDiskFull.testAddIndexOnDiskFull fails with java.lang.IllegalStateException: CFS has pending open files
[ https://issues.apache.org/jira/browse/LUCENE-3713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-3713. - Resolution: Fixed > TestIndexWriterOnDiskFull.testAddIndexOnDiskFull fails with > java.lang.IllegalStateException: CFS has pending open files > > > Key: LUCENE-3713 > URL: https://issues.apache.org/jira/browse/LUCENE-3713 > Project: Lucene - Java > Issue Type: Bug > Components: core/index >Affects Versions: 4.0 >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Fix For: 4.0 > > Attachments: LUCENE-3713.patch > > > {noformat} > Testsuite: org.apache.lucene.index.TestIndexWriterOnDiskFull > [junit] Testcase: > testAddIndexOnDiskFull(org.apache.lucene.index.TestIndexWriterOnDiskFull): > Caused an ERROR > [junit] CFS has pending open files > [junit] java.lang.IllegalStateException: CFS has pending open files > [junit] at > org.apache.lucene.store.CompoundFileWriter.close(CompoundFileWriter.java:162) > [junit] at > org.apache.lucene.store.CompoundFileDirectory.close(CompoundFileDirectory.java:206) > [junit] at > org.apache.lucene.index.IndexWriter.createCompoundFile(IndexWriter.java:4099) > [junit] at > org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3661) > [junit] at > org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3260) > [junit] at > org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37) > [junit] at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1902) > [junit] at > org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1716) > [junit] at > org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1670) > [junit] at > org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddIndexOnDiskFull(TestIndexWriterOnDiskFull.java:304) > [junit] at > org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:529) > [junit] at > org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165) > 
[junit] at > org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57) > [junit] > [junit] > [junit] Tests run: 4, Failures: 0, Errors: 1, Time elapsed: 31.96 sec > [junit] > [junit] - Standard Error - > [junit] NOTE: reproduce with: ant test > -Dtestcase=TestIndexWriterOnDiskFull -Dtestmethod=testAddIndexOnDiskFull > -Dtests.seed=-7dd066d256827211:127c018cbf5b0975:20481cd18a7d8b6e > -Dtests.multiplier=3 -Dtests.nightly=true -Dargs="-Dfile.encoding=ISO8859-1" > [junit] NOTE: test params are: codec=SimpleText, > sim=RandomSimilarityProvider(queryNorm=true,coord=false): {field=DFR GB1, > id=DFR I(F)L1, content=IB SPL-D3(800.0), f=DFR G2}, locale=de_AT, > timezone=America/Cambridge_Bay > [junit] NOTE: all tests run in this JVM: > [junit] [TestAssertions, TestSearchForDuplicates, TestMockAnalyzer, > TestDocValues, TestPerFieldPostingsFormat, TestDocument, TestAddIndexes, > TestConcurrentMergeScheduler, TestCrashCausesCorruptIndex, TestDocCount, > TestDocumentsWriterDeleteQueue, TestFieldInfos, TestFilterIndexReader, > TestFlex, TestIndexInput, TestIndexWriter, TestIndexWriterMergePolicy, > TestIndexWriterMerging, TestIndexWriterNRTIsCurrent, > TestIndexWriterOnDiskFull] > [junit] NOTE: FreeBSD 8.2-RELEASE amd64/Sun Microsystems Inc. 1.6.0 > (64-bit)/cpus=16,threads=1,free=39156976,total=180748288 > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3714) add suggester that uses shortest path/wFST instead of buckets
add suggester that uses shortest path/wFST instead of buckets - Key: LUCENE-3714 URL: https://issues.apache.org/jira/browse/LUCENE-3714 Project: Lucene - Java Issue Type: New Feature Components: modules/spellchecker Reporter: Robert Muir Currently the FST suggester (really an FSA) quantizes weights into buckets (e.g. single byte) and puts them in front of the word. This makes it fast, but you lose granularity in your suggestions. Lately the question was raised, if you build lucene's FST with positiveintoutputs, does it behave the same as a tropical semiring wFST? In other words, after completing the word, we instead traverse min(output) at each node to find the 'shortest path' to the best suggestion (with the highest score). This means we wouldn't need to quantize weights at all and it might make some operations (e.g. adding fuzzy matching etc) a lot easier.
[jira] [Resolved] (LUCENE-3671) Add a TypeTokenFilter
[ https://issues.apache.org/jira/browse/LUCENE-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-3671. --- Resolution: Fixed Committed trunk revision: 1234396 Committed 3.x revision: 1234397 Tommaso: Can you maybe provide a Solr factory in a separate Solr issue (or reopen this one)? > Add a TypeTokenFilter > - > > Key: LUCENE-3671 > URL: https://issues.apache.org/jira/browse/LUCENE-3671 > Project: Lucene - Java > Issue Type: New Feature > Components: core/queryparser >Reporter: Santiago M. Mola >Assignee: Uwe Schindler > Fix For: 3.6, 4.0 > > Attachments: LUCENE-3671.patch, LUCENE-3671_2.patch, > LUCENE-3671_3.patch > > > It would be convenient to have a TypeTokenFilter that filters tokens by its > type, either with an exclude or include list. This might be a stupid thing to > provide for people who use Lucene directly, but it would be very useful to > later expose it to Solr and other Lucene-backed search solutions.
[jira] [Updated] (LUCENE-3713) TestIndexWriterOnDiskFull.testAddIndexOnDiskFull fails with java.lang.IllegalStateException: CFS has pending open files
[ https://issues.apache.org/jira/browse/LUCENE-3713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3713: Attachment: LUCENE-3713.patch Sneaky -- a great example of why random testing rocks! I really wonder why this took so long to fail right there. Here is a patch; it's kind of obvious what went wrong. Essentially, we don't release the "direct output" lock, since the assignment to the flag marking the lock as taken happens after the IO resource is accessed. I plan to commit shortly.
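The bug pattern Simon describes can be sketched in isolation (hypothetical names, not the actual CompoundFileWriter code): the flag marking the "direct output" as taken is only cleared after the IO call, so when the IO call throws (e.g. on a simulated disk-full), the flag stays set and close() later reports pending open files:

```java
public class DirectOutputLock {
  private boolean taken;

  void acquire() { taken = true; }
  boolean isTaken() { return taken; }

  /** Buggy ordering: the flag is cleared only if the IO call succeeds. */
  void releaseBuggy(Runnable io) {
    io.run();       // may throw, e.g. on a simulated disk-full
    taken = false;  // skipped on exception: the "lock" leaks
  }

  /** Fix: clear the flag in finally, whether or not the IO call throws. */
  void releaseFixed(Runnable io) {
    try {
      io.run();
    } finally {
      taken = false;
    }
  }

  public static void main(String[] args) {
    DirectOutputLock lock = new DirectOutputLock();
    lock.acquire();
    try {
      lock.releaseBuggy(() -> { throw new RuntimeException("disk full"); });
    } catch (RuntimeException expected) { }
    System.out.println("buggy leaks lock: " + lock.isTaken());   // true

    lock.acquire();
    try {
      lock.releaseFixed(() -> { throw new RuntimeException("disk full"); });
    } catch (RuntimeException expected) { }
    System.out.println("fixed leaks lock: " + lock.isTaken());   // false
  }
}
```

This also explains why only a rare random schedule triggered the failure: the leak is only visible when an IO exception lands between taking and clearing the flag.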
[jira] [Updated] (LUCENE-3713) TestIndexWriterOnDiskFull.testAddIndexOnDiskFull fails with java.lang.IllegalStateException: CFS has pending open files
[ https://issues.apache.org/jira/browse/LUCENE-3713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3713: Lucene Fields: New,Patch Available (was: New)
Re: [JENKINS] Lucene-trunk - Build # 1805 - Still Failing
I opened LUCENE-3713 for this failure. On Sat, Jan 21, 2012 at 6:20 AM, Apache Jenkins Server wrote: > Build: https://builds.apache.org/job/Lucene-trunk/1805/ > > 1 tests failed. > REGRESSION: > org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddIndexOnDiskFull > > Error Message: > CFS has pending open files > > Stack Trace: > java.lang.IllegalStateException: CFS has pending open files > at > org.apache.lucene.store.CompoundFileWriter.close(CompoundFileWriter.java:162) > at > org.apache.lucene.store.CompoundFileDirectory.close(CompoundFileDirectory.java:206) > at > org.apache.lucene.index.IndexWriter.createCompoundFile(IndexWriter.java:4099) > at > org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3661) > at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3260) > at > org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37) > at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1902) > at > org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1716) > at > org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1670) > at > org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddIndexOnDiskFull(TestIndexWriterOnDiskFull.java:304) > at > org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:529) > at > org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165) > at > org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57) > > > > > Build Log (for compile errors): > [...truncated 13077 lines...]
[jira] [Commented] (SOLR-2983) Unable to load custom MergePolicy
[ https://issues.apache.org/jira/browse/SOLR-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190500#comment-13190500 ] Simon Willnauer commented on SOLR-2983: --- Tommaso, can you update CHANGES.txt too? Once this is done I can just commit it, thanks! > Unable to load custom MergePolicy > - > > Key: SOLR-2983 > URL: https://issues.apache.org/jira/browse/SOLR-2983 > Project: Solr > Issue Type: Bug >Reporter: Mathias Herberts >Assignee: Simon Willnauer >Priority: Minor > Fix For: 3.6, 4.0 > > Attachments: SOLR-2983.patch > > > As part of a recent upgrade to Solr 3.5.0 we encountered an error related to > our use of LinkedIn's ZoieMergePolicy. > It seems the code that loads a custom MergePolicy was at some point moved > into SolrIndexConfig.java from SolrIndexWriter.java, but as this code was > copied verbatim it now contains a bug: > try { > policy = (MergePolicy) > schema.getResourceLoader().newInstance(mpClassName, null, new > Class[]{IndexWriter.class}, new Object[]{this}); > } catch (Exception e) { > policy = (MergePolicy) > schema.getResourceLoader().newInstance(mpClassName); > } > 'this' is no longer an IndexWriter but a SolrIndexConfig, therefore the call > to newInstance will always throw an exception and the catch clause will be > executed. If the custom MergePolicy does not have a default constructor > (which is the case of ZoieMergePolicy), the second attempt to create the > MergePolicy will also fail and Solr won't start.
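The failure mode described in the issue can be reproduced with plain reflection, independent of Solr. The sketch below uses hypothetical stand-in classes (IndexWriterStandIn, SolrIndexConfigStandIn, WriterOnlyPolicy) in place of the real Lucene/Solr types, to show why passing the wrong `this` defeats both constructor lookups:

```java
import java.lang.reflect.Constructor;

public class MergePolicyLoaderDemo {
    // Stand-ins for the Lucene/Solr types involved (hypothetical, for illustration only).
    static class IndexWriterStandIn {}
    static class SolrIndexConfigStandIn {}

    // Like ZoieMergePolicy: only a constructor taking an IndexWriter, no default one.
    public static class WriterOnlyPolicy {
        public WriterOnlyPolicy(IndexWriterStandIn writer) {}
    }

    // Mirrors the two-step loading logic quoted in the issue: try the
    // IndexWriter-arg constructor, and on any failure fall back to the
    // no-arg constructor.
    static Object load(Class<?> clazz, Object ctorArg) throws Exception {
        try {
            Constructor<?> c = clazz.getConstructor(IndexWriterStandIn.class);
            return c.newInstance(ctorArg); // throws if ctorArg is not an IndexWriter
        } catch (Exception e) {
            // With the bug, execution always ends up here, and this line throws
            // NoSuchMethodException for policies without a default constructor.
            return clazz.getConstructor().newInstance();
        }
    }

    public static void main(String[] args) throws Exception {
        // Correct call site: passing an actual IndexWriter works.
        System.out.println(load(WriterOnlyPolicy.class,
                new IndexWriterStandIn()).getClass().getSimpleName());

        // Buggy call site (the 'this' in SolrIndexConfig): the first attempt
        // throws, and the fallback fails because there is no default constructor.
        try {
            load(WriterOnlyPolicy.class, new SolrIndexConfigStandIn());
        } catch (NoSuchMethodException e) {
            System.out.println("fallback failed: no default constructor");
        }
    }
}
```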
[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders
[ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190456#comment-13190456 ] Uwe Schindler commented on LUCENE-2858: --- Simon: Just to inform you, I am working on this. Currently I have a heavily broken checkout that no longer compiles at all :( Working, working, working... It's a mess! Once I have something initially compiling for core (not tests), I will create a branch! > Separate SegmentReaders (and other atomic readers) from composite IndexReaders > -- > > Key: LUCENE-2858 > URL: https://issues.apache.org/jira/browse/LUCENE-2858 > Project: Lucene - Java > Issue Type: Task >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Blocker > Fix For: 4.0 > > > With current trunk, whenever you open an IndexReader on a directory you get > back a DirectoryReader which is a composite reader. The interface of > IndexReader now has lots of methods that simply throw UOE (in fact more than > 50% of the commonly used methods are unusable now). This > confuses users and makes the API hard to understand. > This issue should split "atomic readers" from "reader collections" with a > separate API. After that, you are no longer able to get TermsEnum without > wrapping from those composite readers. We currently have helper classes for > wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or > Multi*), those should be retrofitted to implement the correct classes > (SlowMultiReaderWrapper would be an atomic reader but takes a composite > reader as ctor param, maybe it could also simply take a List). > In my opinion, maybe composite readers could implement some collection APIs > and also have the ReaderUtil method directly built in (possibly as a "view" > in the util.Collection sense). 
In general composite readers do not really > need to look like the previous IndexReaders, they could simply be a > "collection" of SegmentReaders with some functionality like reopen. > On the other side, atomic readers do not need reopen logic anymore? When a > segment changes, you need a new atomic reader? - maybe because of deletions > that's not the best idea, but we should investigate. Maybe make the whole > reopen logic simpler to use (at least on the collection reader level). > We should decide about good names, I have no preference at the moment.
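The proposed atomic/composite split can be sketched in miniature. All class names below are hypothetical placeholders, not the API that eventually landed; the sketch only illustrates the idea that term-level access belongs on atomic readers while a composite reader acts as a collection of sub-readers:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Conceptual sketch of the proposed split (names hypothetical, not the
// final Lucene API).
public class ReaderSplitSketch {
    abstract static class Reader {
        abstract int maxDoc();
    }

    abstract static class AtomicReaderSketch extends Reader {
        // Term-level access lives only on atomic readers, so composite
        // readers no longer need to throw UnsupportedOperationException.
        abstract List<String> terms(String field);
    }

    static class CompositeReaderSketch extends Reader {
        private final List<AtomicReaderSketch> leaves;
        CompositeReaderSketch(List<AtomicReaderSketch> leaves) {
            this.leaves = leaves;
        }
        // The "collection view" idea from the issue: expose the sub-readers
        // directly instead of gathering them via ReaderUtil.
        List<AtomicReaderSketch> leaves() {
            return Collections.unmodifiableList(leaves);
        }
        @Override int maxDoc() {
            int sum = 0;
            for (AtomicReaderSketch r : leaves) sum += r.maxDoc();
            return sum;
        }
    }

    // Trivial atomic reader over an in-memory term list, just for the demo.
    static class InMemoryAtomic extends AtomicReaderSketch {
        private final int maxDoc;
        private final List<String> terms;
        InMemoryAtomic(int maxDoc, List<String> terms) {
            this.maxDoc = maxDoc;
            this.terms = terms;
        }
        @Override int maxDoc() { return maxDoc; }
        @Override List<String> terms(String field) { return terms; }
    }

    public static void main(String[] args) {
        CompositeReaderSketch dir = new CompositeReaderSketch(
            Arrays.<AtomicReaderSketch>asList(
                new InMemoryAtomic(3, Arrays.asList("apache", "lucene")),
                new InMemoryAtomic(2, Arrays.asList("solr"))));
        System.out.println(dir.leaves().size() + " leaves, maxDoc=" + dir.maxDoc());
    }
}
```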
[jira] [Updated] (LUCENE-3706) add offsets into lucene40 postings
[ https://issues.apache.org/jira/browse/LUCENE-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3706: Attachment: LUCENE-3706.patch Updated patch with tests for skipping and offsets + payloads. This found a bad assert in FieldInfosWriter; I think it's ready now. > add offsets into lucene40 postings > -- > > Key: LUCENE-3706 > URL: https://issues.apache.org/jira/browse/LUCENE-3706 > Project: Lucene - Java > Issue Type: New Feature >Affects Versions: 4.0 >Reporter: Robert Muir > Attachments: LUCENE-3706.patch, LUCENE-3706.patch > > > LUCENE-3684 added support for > IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS, but > only SimpleText implements it. > I think we should implement it in the other 4.0 codecs (starting with > Lucene40PostingsFormat).
[jira] [Commented] (LUCENE-3712) Remove unused (and untested) methods from ReaderUtil that are also veeeeery ineffective
[ https://issues.apache.org/jira/browse/LUCENE-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190429#comment-13190429 ] Robert Muir commented on LUCENE-3712: - +1, untested and unused, nuke it. > Remove unused (and untested) methods from ReaderUtil that are also very > ineffective > --- > > Key: LUCENE-3712 > URL: https://issues.apache.org/jira/browse/LUCENE-3712 > Project: Lucene - Java > Issue Type: Task > Components: core/other >Affects Versions: 3.5 >Reporter: Uwe Schindler >Assignee: Uwe Schindler > Fix For: 3.6, 4.0 > > Attachments: LUCENE-3712.patch > > > ReaderUtil contains two methods that are nowhere used and not even tested. > Additionally those are implemented with useless List->array copying; > ineffective docStart calculation for a binary search later instead directly > returning the reader while scanning -- and I am not sure if they really work > as expected. As ReaderUtil is @lucene.internal we should remove them in 3.x > and trunk, alternatively the useless array copy / docStarts handling should > be removed and tests added: > {code:java} > public static IndexReader subReader(int doc, IndexReader reader) > public static IndexReader subReader(IndexReader reader, int subIndex) > {code}
[jira] [Updated] (LUCENE-3712) Remove unused (and untested) methods from ReaderUtil that are also veeeeery ineffective
[ https://issues.apache.org/jira/browse/LUCENE-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3712: -- Affects Version/s: 3.5 Fix Version/s: 4.0 3.6 > Remove unused (and untested) methods from ReaderUtil that are also very > ineffective > --- > > Key: LUCENE-3712 > URL: https://issues.apache.org/jira/browse/LUCENE-3712 > Project: Lucene - Java > Issue Type: Task > Components: core/other >Affects Versions: 3.5 >Reporter: Uwe Schindler >Assignee: Uwe Schindler > Fix For: 3.6, 4.0 > > Attachments: LUCENE-3712.patch > > > ReaderUtil contains two methods that are nowhere used and not even tested. > Additionally those are implemented with useless List->array copying; > ineffective docStart calculation for a binary search later instead directly > returning the reader while scanning -- and I am not sure if they really work > as expected. As ReaderUtil is @lucene.internal we should remove them in 3.x > and trunk, alternatively the useless array copy / docStarts handling should > be removed and tests added: > {code:java} > public static IndexReader subReader(int doc, IndexReader reader) > public static IndexReader subReader(IndexReader reader, int subIndex) > {code}
[jira] [Updated] (LUCENE-3712) Remove unused (and untested) methods from ReaderUtil that are also veeeeery ineffective
[ https://issues.apache.org/jira/browse/LUCENE-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3712: -- Attachment: LUCENE-3712.patch > Remove unused (and untested) methods from ReaderUtil that are also very > ineffective > --- > > Key: LUCENE-3712 > URL: https://issues.apache.org/jira/browse/LUCENE-3712 > Project: Lucene - Java > Issue Type: Task > Components: core/other >Affects Versions: 3.5 >Reporter: Uwe Schindler >Assignee: Uwe Schindler > Fix For: 3.6, 4.0 > > Attachments: LUCENE-3712.patch > > > ReaderUtil contains two methods that are nowhere used and not even tested. > Additionally those are implemented with useless List->array copying; > ineffective docStart calculation for a binary search later instead directly > returning the reader while scanning -- and I am not sure if they really work > as expected. As ReaderUtil is @lucene.internal we should remove them in 3.x > and trunk, alternatively the useless array copy / docStarts handling should > be removed and tests added: > {code:java} > public static IndexReader subReader(int doc, IndexReader reader) > public static IndexReader subReader(IndexReader reader, int subIndex) > {code}
[jira] [Created] (LUCENE-3712) Remove unused (and untested) methods from ReaderUtil that are also veeeeery ineffective
Remove unused (and untested) methods from ReaderUtil that are also very ineffective --- Key: LUCENE-3712 URL: https://issues.apache.org/jira/browse/LUCENE-3712 Project: Lucene - Java Issue Type: Task Components: core/other Reporter: Uwe Schindler Assignee: Uwe Schindler ReaderUtil contains two methods that are nowhere used and not even tested. Additionally those are implemented with useless List->array copying; ineffective docStart calculation for a binary search later instead of directly returning the reader while scanning -- and I am not sure if they really work as expected. As ReaderUtil is @lucene.internal we should remove them in 3.x and trunk; alternatively, the useless array copy / docStarts handling should be removed and tests added: {code:java} public static IndexReader subReader(int doc, IndexReader reader) public static IndexReader subReader(IndexReader reader, int subIndex) {code}
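The inefficiency the issue describes (copying the readers into an array and computing docStarts only to binary-search them afterwards) can be contrasted with returning the sub-reader during a single scan. A minimal sketch using a hypothetical stand-in Sub type, not ReaderUtil's actual signatures:

```java
public class SubReaderScan {
    // Stand-in for a sub-reader; only maxDoc matters here (hypothetical type).
    static class Sub {
        final int maxDoc;
        Sub(int maxDoc) { this.maxDoc = maxDoc; }
    }

    // Returns the sub-reader containing the given doc by scanning once and
    // accumulating a doc base: no List->array copy, no docStarts array,
    // no second binary-search pass.
    static Sub subReaderFor(int doc, Sub[] subs) {
        int base = 0;
        for (Sub s : subs) {
            if (doc < base + s.maxDoc) return s;
            base += s.maxDoc;
        }
        throw new IllegalArgumentException("doc " + doc + " out of range");
    }

    public static void main(String[] args) {
        Sub[] subs = { new Sub(10), new Sub(5), new Sub(20) };
        // Doc 12 falls into the second segment (doc bases 0, 10, 15).
        System.out.println(subReaderFor(12, subs) == subs[1]);
    }
}
```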
[jira] [Commented] (LUCENE-3671) Add a TypeTokenFilter
[ https://issues.apache.org/jira/browse/LUCENE-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190425#comment-13190425 ] Tommaso Teofili commented on LUCENE-3671: - Thanks Uwe for taking care of it :) > Add a TypeTokenFilter > - > > Key: LUCENE-3671 > URL: https://issues.apache.org/jira/browse/LUCENE-3671 > Project: Lucene - Java > Issue Type: New Feature > Components: core/queryparser >Reporter: Santiago M. Mola >Assignee: Uwe Schindler > Fix For: 3.6, 4.0 > > Attachments: LUCENE-3671.patch, LUCENE-3671_2.patch, > LUCENE-3671_3.patch > > > It would be convenient to have a TypeTokenFilter that filters tokens by their > type, with either an exclude or include list. This might be a stupid thing to > provide for people who use Lucene directly, but it would be very useful to > later expose it to Solr and other Lucene-backed search solutions.
[jira] [Commented] (SOLR-3045) eDismax: Allow virtual fields
[ https://issues.apache.org/jira/browse/SOLR-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190371#comment-13190371 ] Jan Høydahl commented on SOLR-3045: --- Alternatively, Hoss' suggestion from SOLR-3026, with per-field override syntax for the virtual fields that will cause DMQ sub-queries. I like this syntax better than mine :) {noformat} q=elephant title:dumbo who:george &qf=title^3 firstname lastname^2 description^2 catchall &uf=title^5 who^2 * &f.who.qf=firstname lastname^10 {noformat} > eDismax: Allow virtual fields > - > > Key: SOLR-3045 > URL: https://issues.apache.org/jira/browse/SOLR-3045 > Project: Solr > Issue Type: New Feature > Components: search >Reporter: Jan Høydahl > > Imagine a one-field yellow page search using eDisMax across fields > {noformat} > qf=firstname middlename lastname companyname category^10.0 subcategory > products address street zip city^5.0 state > {noformat} > Now this of course works well. But what if I want to offer my users fielded > search on "who", "what" and "where". > A way to do this now is copyField into three new fields with these names. But > then you lose the internal weight between the sub fields. > A more elegant way would be allowing virtual field names mapping to multiple > fields, so user can search where:london and match address, street, zip, city > or state automatically.
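The per-field qf override discussed above amounts to expanding a fielded term like who:george into a disjunction over the configured sub-fields with their boosts. A toy sketch of that expansion; the generated query string is purely illustrative (eDisMax builds DisjunctionMaxQuery objects internally, not strings), and the field names and boosts mirror the f.who.qf example in the comment:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class VirtualFieldExpansion {
    // Expands a term queried against a virtual field into a dismax-style
    // disjunction over the configured sub-fields, applying per-field boosts.
    static String expand(String term, Map<String, Float> qf) {
        StringJoiner clauses = new StringJoiner(" OR ", "(", ")");
        for (Map.Entry<String, Float> e : qf.entrySet()) {
            String clause = e.getKey() + ":" + term;
            if (e.getValue() != 1.0f) clause += "^" + e.getValue();
            clauses.add(clause);
        }
        return clauses.toString();
    }

    public static void main(String[] args) {
        // Sub-field config for the virtual "who" field, as in f.who.qf.
        Map<String, Float> whoQf = new LinkedHashMap<>();
        whoQf.put("firstname", 1.0f);
        whoQf.put("lastname", 10.0f);
        // who:george expands to a boosted disjunction over the sub-fields.
        System.out.println(expand("george", whoQf));
    }
}
```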
[jira] [Commented] (SOLR-3026) eDismax: Locking down which fields can be explicitly queried (user fields aka uf)
[ https://issues.apache.org/jira/browse/SOLR-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190369#comment-13190369 ] Jan Høydahl commented on SOLR-3026: --- I like the f.who.qf style, and the fact that you can then boost the whole DMQ clause as a whole. I'll add that to SOLR-3045 as a suggestion. But it's a bit overkill to spin up a DMQ for simple single-field aliasing, i.e. my example &uf=title:searchable_title_t. Ideally such simple field name aliasing should be supported at the Lucene parser level. Alternatively it could be another per-field param {noformat} &f.title.fmap=searchable_title_t {noformat} I'm still not sure how to use the built-in aliasing to implement this. > eDismax: Locking down which fields can be explicitly queried (user fields aka > uf) > - > > Key: SOLR-3026 > URL: https://issues.apache.org/jira/browse/SOLR-3026 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 3.1, 3.2, 3.3, 3.4, 3.5 >Reporter: Jan Høydahl >Assignee: Jan Høydahl > Fix For: 3.6, 4.0 > > Attachments: SOLR-3026.patch > > > We need a way to specify exactly what fields should be available to the end > user as fielded search. > In the original SOLR-1553, there's a patch implementing "user fields", but it > was never committed even though that issue was closed.
[jira] [Updated] (SOLR-3045) eDismax: Allow virtual fields
[ https://issues.apache.org/jira/browse/SOLR-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-3045: -- Description: Imagine a one-field yellow page search using eDisMax across fields {noformat} qf=firstname middlename lastname companyname category^10.0 subcategory products address street zip city^5.0 state {noformat} Now this of course works well. But what if I want to offer my users fielded search on "who", "what" and "where". A way to do this now is copyField into three new fields with these names. But then you lose the internal weight between the sub fields. A more elegant way would be allowing virtual field names mapping to multiple fields, so user can search where:london and match address, street, zip, city or state automatically. was: Imagine a one-field yellow page search using eDisMax across fields {noformat} qf=firstname middlename lastname companyname category^10.0 subcategory products address street zip city^5.0 state {noformat} Now this of course works well. But what if I want to offer my users fielded search on "who", "what" and "where". A way to do this now is copyField into three new fields with these names. But then you lose the internal weight between the sub fields. A more elegant way would be allowing virtual field names mapping to multiple fields. 
Imagine uf extended further: {noformat} &uf=who:firstname,middlename,lastname^2.0,companyname what:category,subcategory,products where:address,street,zip,city^10.0,state {noformat} This could probably be solved by adding each as a dismax sub-Query > eDismax: Allow virtual fields > - > > Key: SOLR-3045 > URL: https://issues.apache.org/jira/browse/SOLR-3045 > Project: Solr > Issue Type: New Feature > Components: search >Reporter: Jan Høydahl > > Imagine a one-field yellow page search using eDisMax across fields > {noformat} > qf=firstname middlename lastname companyname category^10.0 subcategory > products address street zip city^5.0 state > {noformat} > Now this of course works well. But what if I want to offer my users fielded > search on "who", "what" and "where". > A way to do this now is copyField into three new fields with these names. But > then you lose the internal weight between the sub fields. > A more elegant way would be allowing virtual field names mapping to multiple > fields, so user can search where:london and match address, street, zip, city > or state automatically.