[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory

2012-08-02 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427840#comment-13427840
 ] 

Steven Rowe commented on SOLR-1725:
---

The tests committed here 3 weeks ago have never succeeded under the Jenkins 
trunk and branch_4x Maven builds.  (For some reason the failure notification 
emails aren't making it to the dev list.)  E.g. 
[https://builds.apache.org/job/Lucene-Solr-Maven-trunk/554/].  The JavaScript 
engine apparently can't be found.  I don't understand why that would be the 
case, though, since the Ant tests succeed under the same JVM.
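
For reference, a minimal standalone check (not part of the patch; the class name is made up) that prints which javax.script engines a given JVM/classloader can actually see, which might help narrow down why the Maven-run tests can't resolve one:

{code}
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;

public class CheckScriptEngines {
  public static void main(String[] args) {
    ScriptEngineManager mgr = new ScriptEngineManager();
    // List every engine registered with this JVM/classloader...
    for (ScriptEngineFactory f : mgr.getEngineFactories()) {
      System.out.println(f.getEngineName() + " -> extensions=" + f.getExtensions());
    }
    // ...and try the two lookups a factory would typically use.
    System.out.println("by extension 'js':    " + mgr.getEngineByExtension("js"));
    System.out.println("by name 'JavaScript': " + mgr.getEngineByName("JavaScript"));
  }
}
{code}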

> Script based UpdateRequestProcessorFactory
> --
>
> Key: SOLR-1725
> URL: https://issues.apache.org/jira/browse/SOLR-1725
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Affects Versions: 1.4
>Reporter: Uri Boness
>Assignee: Erik Hatcher
>  Labels: UpdateProcessor
> Fix For: 4.0
>
> Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, 
> SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
> SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
> SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (uses JDK6 script engine 
> support). The main goal of this plugin is to be able to configure/write 
> update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in 
> scripts located in the {{solr.solr.home}} directory. The factory accepts one 
> (mandatory) configuration parameter named {{scripts}} which accepts a 
> comma-separated list of file names. It will look for these files under the 
> {{conf}} directory in solr home. When multiple scripts are defined, their 
> execution order is defined by the lexicographical order of the script file 
> name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, 
> *.js files will be treated as JavaScript), therefore an extension is 
> mandatory.
> Each script file is expected to have one or more methods with the same 
> signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
> *not* required to define all methods, only those that are required by the 
> processing logic.
> The following variables are defined as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}} - The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script
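
A rough sketch of the JDK6 scripting flow described above (illustrative only - the method and variable wiring here are assumptions, not the committed factory code):

{code}
import java.io.FileReader;
import javax.script.Invocable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class ScriptProcessorSketch {
  // req/rsp/logger/cmd are passed as plain Objects here purely for illustration.
  public static void runProcessAdd(Object req, Object rsp, Object logger,
                                   Object cmd, String scriptPath) throws Exception {
    // Resolve the engine from the file extension ("js" -> JavaScript).
    ScriptEngine engine = new ScriptEngineManager().getEngineByExtension("js");
    // Expose the global variables listed above.
    engine.put("req", req);
    engine.put("rsp", rsp);
    engine.put("logger", logger);
    // Evaluate the script, then invoke a function matching one of the
    // UpdateRequestProcessor methods (here assumed to be named processAdd).
    engine.eval(new FileReader(scriptPath));
    ((Invocable) engine).invokeFunction("processAdd", cmd);
  }
}
{code}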

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4225) New FixedPostingsFormat for less overhead than SepPostingsFormat

2012-08-02 Thread Han Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427811#comment-13427811
 ] 

Han Jiang commented on LUCENE-4225:
---

OK, thanks Mike!

> New FixedPostingsFormat for less overhead than SepPostingsFormat
> 
>
> Key: LUCENE-4225
> URL: https://issues.apache.org/jira/browse/LUCENE-4225
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Attachments: LUCENE-4225-on-rev-1362013.patch, LUCENE-4225.patch, 
> LUCENE-4225.patch, LUCENE-4225.patch, LUCENE-4225.patch
>
>
> I've worked out the start at a new postings format that should have
> less overhead for fixed-int[] encoders (For,PFor)... using ideas from
> the old bulk branch, and new ideas from Robert.
> It's only a start: there's no payloads support yet, and I haven't run
> Lucene's tests with it, except for one new test I added that tries to
> be a thorough PostingsFormat tester (to make it easier to create new
> postings formats).  It does pass luceneutil's performance test, so
> it's at least able to run those queries correctly...
> Like Lucene40, it uses two files (though once we add payloads it may
> be 3).  The .doc file interleaves doc delta and freq blocks, and .pos
> has position delta blocks.  Unlike sep, blocks are NOT shared across
> terms; instead, it uses block encoding if there are enough ints to
> encode, else the same Lucene40 vInt format.  This means low-freq terms
> (< 128 = current default block size) are always vInts, and high-freq
> terms will have some number of blocks, with a vInt final block.
> Skip points are only recorded at block starts.
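
A sketch of the per-term encoding decision described above; the block size constant and helper methods are placeholders, not the format's real internals:

{code}
public class DocDeltaWriterSketch {
  static final int BLOCK_SIZE = 128;

  void writeDocDeltas(int[] deltas, int docFreq) {
    int upto = 0;
    // High-freq terms: every full group of 128 deltas becomes a packed block.
    while (docFreq - upto >= BLOCK_SIZE) {
      writePackedBlock(deltas, upto, BLOCK_SIZE);
      upto += BLOCK_SIZE;
    }
    // Whatever remains (everything, for terms with docFreq < 128) is written
    // as the Lucene40-style vInt tail mentioned above.
    for (; upto < docFreq; upto++) {
      writeVInt(deltas[upto]);
    }
  }

  void writePackedBlock(int[] deltas, int offset, int len) { /* placeholder */ }
  void writeVInt(int delta) { /* placeholder */ }
}
{code}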

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Reopened] (SOLR-3639) We should update to ZooKeeper 3.3.5

2012-08-02 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reopened SOLR-3639:
---


3.3.6 just hit - once ivy can find it, I'll update: 
http://www.cloudera.com/blog/2012/08/apache-zookeeper-3-3-6-has-been-released/

> We should update to ZooKeeper 3.3.5
> ---
>
> Key: SOLR-3639
> URL: https://issues.apache.org/jira/browse/SOLR-3639
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 4.0, 5.0
>
>
> We should update to 3.3.5 - there was a corruption issue fixed.
> http://zookeeper.apache.org/doc/r3.3.5/releasenotes.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-4203) Add IndexWriter.tryDeleteDocument, to delete by document id when possible

2012-08-02 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-4203.


   Resolution: Fixed
Fix Version/s: 5.0
   4.0

> Add IndexWriter.tryDeleteDocument, to delete by document id when possible
> -
>
> Key: LUCENE-4203
> URL: https://issues.apache.org/jira/browse/LUCENE-4203
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
> Fix For: 4.0, 5.0
>
> Attachments: LUCENE-4203.patch, LUCENE-4203.patch
>
>
> Spinoff from LUCENE-4069.
> In that use case, where the app needs to first lookup a document, then
> call updateDocument, it's wasteful today because the relatively costly
> lookup (by a primary key field, eg "id") is done twice.
> But, since you already resolved the PK to docID on the first lookup,
> it would be nice to then delete by that docID and then you can call
> addDocument instead.
> So I worked out a rough start at this, by adding
> IndexWriter.tryDeleteDocument.  It'd be a very expert API: it takes a
> SegmentInfo (referencing the segment that contains the docID), and as
> long as that segment hasn't yet been merged away, it will mark the
> document for deletion and return true (success).  If it has been
> merged away it returns false and the app must then delete-by-term.  It
> only works if the writer is in NRT mode (ie you've opened an NRT
> reader).
> In LUCENE-4069 using tryDeleteDocument gave a ~20% net speedup.
> I think tryDeleteDocument would also be useful when Solr "updates" a
> document by loading all stored fields, changing them, and calling
> updateDocument.
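
A usage sketch of the pattern outlined above, following the signature as described in this issue (SegmentInfo + docID); the committed API may well differ:

{code}
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.SegmentInfo;
import org.apache.lucene.index.Term;

public class TryDeleteSketch {
  void update(IndexWriter writer, SegmentInfo info, int docID,
              String pk, Document newDoc) throws Exception {
    // Expert path: if the segment holding docID hasn't been merged away,
    // this marks the doc deleted and returns true (per the description).
    if (!writer.tryDeleteDocument(info, docID)) {
      // Segment is gone; fall back to the usual delete-by-term.
      writer.deleteDocuments(new Term("id", pk));
    }
    // Either way, add the new version of the document.
    writer.addDocument(newDoc);
  }
}
{code}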

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4283) Support more frequent skip with Block Postings Format

2012-08-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427667#comment-13427667
 ] 

Michael McCandless commented on LUCENE-4283:


I think we shouldn't have to do our own buffering up of the skip points within 
one block?

Can't we call skipWriter.bufferSkip every skipInterval docs (and pass it 
lastDocID, etc.)?  Then it can write the skip point immediately.

Also, in BlockPostingsReader, why do we need a separate docBufferOffset?  Can't 
we just set docBufferUpto to wherever (36, 64, 96) we had skipped to within the 
block?

> Support more frequent skip with Block Postings Format
> -
>
> Key: LUCENE-4283
> URL: https://issues.apache.org/jira/browse/LUCENE-4283
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Han Jiang
>Priority: Minor
> Attachments: LUCENE-4283-buggy.patch, LUCENE-4283-buggy.patch
>
>
> This change works on the new bulk branch.
> Currently, our BlockPostingsFormat only supports skipInterval==blockSize. 
> Every time the skipper reaches the last level 0 skip point, we'll have to 
> decode a whole block to read doc/freq data. Also,  a higher level skip list 
> will be created only for those df>blockSize^k, which means for most terms, 
> skipping will just be a linear scan. If we increase the current blockSize for 
> better bulk I/O performance, the current skip setting will be a bottleneck. 
> For ForPF, the encoded block can easily be split if we set 
> skipInterval=32*k. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3658) SolrCmdDistributor can briefly create spikes of threads in the thousands.

2012-08-02 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427659#comment-13427659
 ] 

Mark Miller commented on SOLR-3658:
---

There were some real problems with my previous solution - it somewhat worked 
by accident, but I think it probably hurt performance quite badly.

I just committed a new approach that has tested out nicely so far.

> SolrCmdDistributor can briefly create spikes of threads in the thousands.
> -
>
> Key: SOLR-3658
> URL: https://issues.apache.org/jira/browse/SOLR-3658
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 4.0, 5.0
>
> Attachments: SOLR-3658.patch
>
>
> see mailing list http://markmail.org/thread/yy5b7g6g7733wgcp

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-3703) Escape character which is in the query, is getting ignored in solr 3.6 with lucene parser

2012-08-02 Thread srinivas (JIRA)
srinivas created SOLR-3703:
--

 Summary: Escape character which is in the query, is getting 
ignored in solr 3.6 with lucene parser
 Key: SOLR-3703
 URL: https://issues.apache.org/jira/browse/SOLR-3703
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.6
 Environment: Linux
Reporter: srinivas


I noticed that the escape character in the query is getting ignored in Solr 
3.6 with the lucene parser. With edismax, the same query gives the expected 
results. 

select?q=author:David\ Duke&defType=lucene 
would render the same results as: 
select?q=author:(David OR Duke)&defType=lucene 

But 
select?q=author:David\ Duke&defType=edismax 
would render the same results as: 
select?q=author:"David Duke"&defType=lucene 
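
A small SolrJ sketch of the phrase form that currently gives the intended match with the lucene parser (a possible client-side workaround, not a fix for the escaping bug):

{code}
import org.apache.solr.client.solrj.SolrQuery;

public class AuthorQueryWorkaround {
  public static SolrQuery build(String author) {
    // Quote the value as a phrase instead of backslash-escaping the space,
    // since the report says the lucene parser ignores the escape in 3.6.
    String escaped = author.replace("\"", "\\\"");
    SolrQuery q = new SolrQuery("author:\"" + escaped + "\"");
    q.set("defType", "lucene");
    return q;
  }
}
{code}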

Regards
Srini


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2115) DataImportHandler config file *must* be specified in "defaults" or status will be "DataImportHandler started. Not Initialized. No commands can be run"

2012-08-02 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer updated SOLR-2115:
-

Attachment: SOLR-2115.patch

Updated patch, which I plan to commit soon.

> DataImportHandler config file *must* be specified in "defaults" or status 
> will be "DataImportHandler started. Not Initialized. No commands can be run"
> --
>
> Key: SOLR-2115
> URL: https://issues.apache.org/jira/browse/SOLR-2115
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 1.4.1, 1.4.2, 3.1, 4.0-ALPHA
>Reporter: Lance Norskog
>Assignee: James Dyer
>Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-2115.patch, SOLR-2115.patch
>
>
> The DataImportHandler has two URL parameters for defining the data-config.xml 
> file to be used for the command. 'config' is used in some places and 
> 'dataConfig' is used in other places.
> 'config' does not work from an HTTP request. However, if it is in the 
> "defaults" section of the DIH request handler definition, it works. If the 
> 'config' parameter is used in an HTTP request, the DIH uses the default from 
> the request handler definition anyway.
> This is the exception stack received by the client if there is no default. 
> (This is the 3.X branch.)
> Error 500 
> HTTP ERROR: 500 null
> java.lang.NullPointerException
>   at 
> org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:146)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
> ..etc..

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3699) SolrIndexWriter constructor leaks Directory if Exception creating IndexWriterConfig

2012-08-02 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-3699:
---

Attachment: SOLR-3699.patch

My quick and dirty attempt to fix this: make SolrIndexWriter's constructor 
private and add a static "create" method that takes care of calling 
directoryFactory.release() if the private constructor fails.

Unfortunately it's still not working ... not clear to me why, but I'm about to 
get on a plane and won't have a chance to dig into it anymore for another 3-4 
days, so I wanted to get what I have into Jira in case anyone else wants to 
take a stab at it.
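
A minimal sketch of that pattern, with a made-up factory interface standing in for Solr's DirectoryFactory (this is not the attached patch, just the shape of it):

{code}
import org.apache.lucene.store.Directory;

class WriterCreateSketch {
  interface DirFactory {                         // hypothetical stand-in
    Directory get(String path) throws Exception;
    void release(Directory d) throws Exception;
  }

  private WriterCreateSketch(Directory d) throws Exception {
    // building the IndexWriterConfig etc. may throw here
  }

  static WriterCreateSketch create(DirFactory factory, String path) throws Exception {
    Directory d = factory.get(path);
    try {
      return new WriterCreateSketch(d);
    } catch (Exception e) {
      factory.release(d);                        // avoid leaking the Directory
      throw e;
    }
  }
}
{code}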

> SolrIndexWriter constructor leaks Directory if Exception creating 
> IndexWriterConfig
> ---
>
> Key: SOLR-3699
> URL: https://issues.apache.org/jira/browse/SOLR-3699
> Project: Solr
>  Issue Type: Bug
>Reporter: Robert Muir
> Fix For: 4.0, 5.0
>
> Attachments: SOLR-3699.patch, SOLR-3699.patch
>
>
> In LUCENE-4278 I had to add a hack to force SimpleFSDir for 
> CoreContainerCoreInitFailuresTest, because it doesn't close its Directory on 
> certain errors.
> This might indicate a problem where leaks happen if certain errors occur 
> (e.g. not handled in finally).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2446) org.apache.solr.client.solrj.beans.DocumentObjectBinder customization

2012-08-02 Thread Michael Andrews (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427604#comment-13427604
 ] 

Michael Andrews commented on SOLR-2446:
---

This would be very helpful.

> org.apache.solr.client.solrj.beans.DocumentObjectBinder customization
> -
>
> Key: SOLR-2446
> URL: https://issues.apache.org/jira/browse/SOLR-2446
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 1.4.1
>Reporter: Alexander Suslov
>Priority: Minor
> Fix For: 3.1.1
>
> Attachments: patch.zip
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I suggest adding a way to customize DocumentObjectBinder behavior.  It is not 
> always possible to perform the mapping between SolrInputDocument and beans 
> using the default implementation, and SolrServer doesn't have a way to swap 
> the default binder for a different implementation. My suggestion is very 
> simple: introduce an interface for the binder, and add the ability to set a 
> custom binder on the SolrServer. Please find the suggested changes in the 
> attached file. Such an addition would make the SolrJ library more flexible.
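
A rough sketch of the suggestion (the interface and setter names are made up here, not necessarily what the attached patch uses):

{code}
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

// A pluggable binder contract mirroring DocumentObjectBinder's two directions.
interface ObjectBinder {
  SolrInputDocument toSolrInputDocument(Object bean);
  <T> T getBean(Class<T> clazz, SolrDocument doc);
}

// On the server side one would then expose something like:
//   solrServer.setDocumentObjectBinder(customBinder);
{code}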

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4283) Support more frequent skip with Block Postings Format

2012-08-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427573#comment-13427573
 ] 

Michael McCandless commented on LUCENE-4283:


Billy, it looks like this patch is a bit stale (it doesn't apply on the current 
branch)?  Can you please update it?  Thanks.

> Support more frequent skip with Block Postings Format
> -
>
> Key: LUCENE-4283
> URL: https://issues.apache.org/jira/browse/LUCENE-4283
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Han Jiang
>Priority: Minor
> Attachments: LUCENE-4283-buggy.patch, LUCENE-4283-buggy.patch
>
>
> This change works on the new bulk branch.
> Currently, our BlockPostingsFormat only supports skipInterval==blockSize. 
> Every time the skipper reaches the last level 0 skip point, we'll have to 
> decode a whole block to read doc/freq data. Also,  a higher level skip list 
> will be created only for those df>blockSize^k, which means for most terms, 
> skipping will just be a linear scan. If we increase the current blockSize for 
> better bulk I/O performance, the current skip setting will be a bottleneck. 
> For ForPF, the encoded block can easily be split if we set 
> skipInterval=32*k. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3167) Make lucene/solr a OSGI bundle through Ant

2012-08-02 Thread Luca Stancapiano (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427569#comment-13427569
 ] 

Luca Stancapiano commented on LUCENE-3167:
--

Hi guys, Nicolas, I confirm that as of October 2011 the patch worked. I'm 
surprised that no one has committed the work yet... I'll be here to help. Let 
me know!

> Make lucene/solr a OSGI bundle through Ant
> --
>
> Key: LUCENE-3167
> URL: https://issues.apache.org/jira/browse/LUCENE-3167
> Project: Lucene - Core
>  Issue Type: New Feature
> Environment: bndtools
>Reporter: Luca Stancapiano
> Attachments: LUCENE-3167.patch, LUCENE-3167.patch, LUCENE-3167.patch, 
> lucene_trunk.patch, lucene_trunk.patch
>
>
> We need to build the bundle through Ant, so the binary can be published and 
> downloading the sources is no longer needed. Currently, to get an OSGi bundle 
> we need to use Maven tooling and build from the sources. Here is the reference 
> for creating the OSGi bundle through Maven:
> https://issues.apache.org/jira/browse/LUCENE-1344
> Bndtools could be used inside Ant.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4225) New FixedPostingsFormat for less overhead than SepPostingsFormat

2012-08-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427562#comment-13427562
 ] 

Michael McCandless commented on LUCENE-4225:


OK I committed the fix: Block/PackedPF was incorrectly encoding offsets as 
startOffset - lastEndOffset.  It must instead be startOffset - lastStartOffset 
because it is possible (though rare) for startOffset - lastEndOffset to be 
negative.

I also separately committed a fix for NPEs that tests were hitting when the 
index didn't index payloads or offsets.  Tests should now pass for BlockPF and 
BlockPackedPF...
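
A tiny worked example of why the start-vs-lastStart delta is the safe one (the offsets are made up, but the pattern - a later token starting before the previous token's end - is the rare case mentioned above):

{code}
public class OffsetDeltaExample {
  public static void main(String[] args) {
    // Suppose a tokenizer emits "wi-fi"[0,5] and then "wi"[0,2] at the same
    // position, so the second token STARTS before the first token's END.
    int lastStartOffset = 0, lastEndOffset = 5;
    int startOffset = 0;
    System.out.println("start - lastEnd   = " + (startOffset - lastEndOffset));   // -5: cannot be a vInt delta
    System.out.println("start - lastStart = " + (startOffset - lastStartOffset)); //  0: start offsets never decrease
  }
}
{code}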

> New FixedPostingsFormat for less overhead than SepPostingsFormat
> 
>
> Key: LUCENE-4225
> URL: https://issues.apache.org/jira/browse/LUCENE-4225
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Attachments: LUCENE-4225-on-rev-1362013.patch, LUCENE-4225.patch, 
> LUCENE-4225.patch, LUCENE-4225.patch, LUCENE-4225.patch
>
>
> I've worked out the start at a new postings format that should have
> less overhead for fixed-int[] encoders (For,PFor)... using ideas from
> the old bulk branch, and new ideas from Robert.
> It's only a start: there's no payloads support yet, and I haven't run
> Lucene's tests with it, except for one new test I added that tries to
> be a thorough PostingsFormat tester (to make it easier to create new
> postings formats).  It does pass luceneutil's performance test, so
> it's at least able to run those queries correctly...
> Like Lucene40, it uses two files (though once we add payloads it may
> be 3).  The .doc file interleaves doc delta and freq blocks, and .pos
> has position delta blocks.  Unlike sep, blocks are NOT shared across
> terms; instead, it uses block encoding if there are enough ints to
> encode, else the same Lucene40 vInt format.  This means low-freq terms
> (< 128 = current default block size) are always vInts, and high-freq
> terms will have some number of blocks, with a vInt final block.
> Skip points are only recorded at block starts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427559#comment-13427559
 ] 

Michael McCandless commented on LUCENE-4282:


+1

> Automaton Fuzzy Query doesn't deliver all results
> -
>
> Key: LUCENE-4282
> URL: https://issues.apache.org/jira/browse/LUCENE-4282
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Johannes Christen
>Assignee: Robert Muir
>  Labels: newbie
> Attachments: LUCENE-4282-tests.patch, LUCENE-4282.patch, 
> LUCENE-4282.patch, ModifiedFuzzyTermsEnum.java, ModifiedFuzzyTermsEnum.java
>
>
> Having a small index with n documents where each document has one of the 
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR 
> which have an edit distance of 2 as well are missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice

2012-08-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427554#comment-13427554
 ] 

Michael McCandless commented on LUCENE-2501:


I committed the patch, but I'll leave this open until we can hear back from Tim 
or Gili that this has resolved the issue ...

> ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
> --
>
> Key: LUCENE-2501
> URL: https://issues.apache.org/jira/browse/LUCENE-2501
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 3.0.1
>Reporter: Tim Smith
> Attachments: LUCENE-2501.patch
>
>
> I'm seeing the following exception during indexing:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 14
> at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118)
> at 
> org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490)
> at 
> org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120)
> at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468)
> at 
> org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
> at 
> org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757)
> at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085)
> ... 37 more
> {code}
> This seems to be caused by the following code:
> {code}
> final int level = slice[upto] & 15;
> final int newLevel = nextLevelArray[level];
> final int newSize = levelSizeArray[newLevel];
> {code}
> this can result in "level" being a value between 0 and 14
> the array nextLevelArray is only of size 10
> i suspect the solution would be to either max the level to 10, or to add more 
> entries to the nextLevelArray so it has 15 entries
> however, i don't know if something more is going wrong here and this is just 
> where the exception hits from a deeper issue
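
For context, a tiny illustration of where the exception surfaces (the array contents match 3.x ByteBlockPool as far as I can tell, and 14 is the value from the stack trace); as discussed elsewhere on this issue, the corrupt level byte itself comes from a thread-safety bug, so this only demonstrates the symptom, not the cause:

{code}
public class LevelBoundsExample {
  public static void main(String[] args) {
    // nextLevelArray has 10 entries, but the level nibble read from the slice
    // is masked with & 15 and can therefore index past the end when corrupted.
    int[] nextLevelArray = {1, 2, 3, 4, 5, 6, 7, 8, 9, 9};
    int corruptedLevelByte = 14;                 // value seen in the stack trace
    int level = corruptedLevelByte & 15;         // = 14
    System.out.println(nextLevelArray[level]);   // ArrayIndexOutOfBoundsException: 14
  }
}
{code}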

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice

2012-08-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427531#comment-13427531
 ] 

Robert Muir commented on LUCENE-2501:
-

+1

> ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
> --
>
> Key: LUCENE-2501
> URL: https://issues.apache.org/jira/browse/LUCENE-2501
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 3.0.1
>Reporter: Tim Smith
> Attachments: LUCENE-2501.patch
>
>
> I'm seeing the following exception during indexing:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 14
> at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118)
> at 
> org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490)
> at 
> org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120)
> at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468)
> at 
> org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
> at 
> org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757)
> at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085)
> ... 37 more
> {code}
> This seems to be caused by the following code:
> {code}
> final int level = slice[upto] & 15;
> final int newLevel = nextLevelArray[level];
> final int newSize = levelSizeArray[newLevel];
> {code}
> this can result in "level" being a value between 0 and 14
> the array nextLevelArray is only of size 10
> i suspect the solution would be to either max the level to 10, or to add more 
> entries to the nextLevelArray so it has 15 entries
> however, i don't know if something more is going wrong here and this is just 
> where the exception hits from a deeper issue

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4282:


Attachment: LUCENE-4282.patch

A simpler patch, which I also benchmarked.

The problem is this comment in the legacy scoring (in all previous lucene 
versions):
{noformat}
  // this will return less than 0.0 when the edit distance is
  // greater than the number of characters in the shorter word.
  // but this was the formula that was previously used in FuzzyTermEnum,
  // so it has not been changed (even though minimumSimilarity must be
  // greater than 0.0)
{noformat}

Because of that it's really impossible to fix until we remove that deprecated 
one completely :)

So I think this one is good to commit, and separately I will look at removing 
the deprecated one from trunk and cleaning all this up when I have time (I 
would port the math-proof tests from the automata package to run as queries so 
we are sure).
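
A quick worked example of the legacy boundary being described (assuming the old formula was 1 - editDistance / length of the shorter term; treat the exact formula as an assumption here):

{code}
public class LegacyFuzzySim {
  static float legacySim(String a, String b, int editDistance) {
    // Assumed legacy FuzzyTermEnum-style score: 1 - distance / shorter length.
    return 1.0f - ((float) editDistance / Math.min(a.length(), b.length()));
  }

  public static void main(String[] args) {
    System.out.println(legacySim("WEBER", "WEB", 2)); // 1 - 2/3 ~ 0.33
    System.out.println(legacySim("WEBER", "WE", 3));  // 1 - 3/2 = -0.5, below 0.0 as the comment warns
  }
}
{code}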


> Automaton Fuzzy Query doesn't deliver all results
> -
>
> Key: LUCENE-4282
> URL: https://issues.apache.org/jira/browse/LUCENE-4282
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Johannes Christen
>Assignee: Robert Muir
>  Labels: newbie
> Attachments: LUCENE-4282-tests.patch, LUCENE-4282.patch, 
> LUCENE-4282.patch, ModifiedFuzzyTermsEnum.java, ModifiedFuzzyTermsEnum.java
>
>
> Having a small index with n documents where each document has one of the 
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR 
> which have an edit distance of 2 as well are missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice

2012-08-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427518#comment-13427518
 ] 

Michael McCandless commented on LUCENE-2501:


I'm glad too!

> ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
> --
>
> Key: LUCENE-2501
> URL: https://issues.apache.org/jira/browse/LUCENE-2501
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 3.0.1
>Reporter: Tim Smith
> Attachments: LUCENE-2501.patch
>
>
> I'm seeing the following exception during indexing:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 14
> at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118)
> at 
> org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490)
> at 
> org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120)
> at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468)
> at 
> org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
> at 
> org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757)
> at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085)
> ... 37 more
> {code}
> This seems to be caused by the following code:
> {code}
> final int level = slice[upto] & 15;
> final int newLevel = nextLevelArray[level];
> final int newSize = levelSizeArray[newLevel];
> {code}
> this can result in "level" being a value between 0 and 14
> the array nextLevelArray is only of size 10
> i suspect the solution would be to either max the level to 10, or to add more 
> entries to the nextLevelArray so it has 15 entries
> however, i don't know if something more is going wrong here and this is just 
> where the exception hits from a deeper issue

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice

2012-08-02 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427517#comment-13427517
 ] 

Simon Willnauer commented on LUCENE-2501:
-

sneaky, glad that this stuff is single threaded in 4.0 :)

> ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
> --
>
> Key: LUCENE-2501
> URL: https://issues.apache.org/jira/browse/LUCENE-2501
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 3.0.1
>Reporter: Tim Smith
> Attachments: LUCENE-2501.patch
>
>
> I'm seeing the following exception during indexing:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 14
> at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118)
> at 
> org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490)
> at 
> org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120)
> at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468)
> at 
> org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
> at 
> org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757)
> at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085)
> ... 37 more
> {code}
> This seems to be caused by the following code:
> {code}
> final int level = slice[upto] & 15;
> final int newLevel = nextLevelArray[level];
> final int newSize = levelSizeArray[newLevel];
> {code}
> this can result in "level" being a value between 0 and 14
> the array nextLevelArray is only of size 10
> i suspect the solution would be to either max the level to 10, or to add more 
> entries to the nextLevelArray so it has 15 entries
> however, i don't know if something more is going wrong here and this is just 
> where the exception hits from a deeper issue

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice

2012-08-02 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2501:
---

Attachment: LUCENE-2501.patch

OK I found a possible cause behind this ... it was something I had
fixed in LUCENE-3684 but didn't pull out and backport to 3.x.

It's a thread safety issue, when FieldInfo.indexOptions changes from
DOCS_AND_FREQS_AND_POSITIONS to not indexing positions.  If this
happens in one thread while a new thread is suddenly indexing that
same field, there's a narrow window where the 2nd thread's
FreqProxTermsWriterPerField can mis-report the streamCount as 1 when
it should be 2.

Attached patch (3.6.x) should fix it.  I tried to get a thread test to
provoke this but couldn't ... I think the window is too small (if I
forcefully add sleeps at the "right time" in
FreqProxTermsWriterPerField then I could provoke it...).


> ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
> --
>
> Key: LUCENE-2501
> URL: https://issues.apache.org/jira/browse/LUCENE-2501
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 3.0.1
>Reporter: Tim Smith
> Attachments: LUCENE-2501.patch
>
>
> I'm seeing the following exception during indexing:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 14
> at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118)
> at 
> org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490)
> at 
> org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120)
> at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468)
> at 
> org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
> at 
> org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757)
> at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085)
> ... 37 more
> {code}
> This seems to be caused by the following code:
> {code}
> final int level = slice[upto] & 15;
> final int newLevel = nextLevelArray[level];
> final int newSize = levelSizeArray[newLevel];
> {code}
> this can result in "level" being a value between 0 and 14
> the array nextLevelArray is only of size 10
> i suspect the solution would be to either max the level to 10, or to add more 
> entries to the nextLevelArray so it has 15 entries
> however, i don't know if something more is going wrong here and this is just 
> where the exception hits from a deeper issue

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4177) TestPerfTasksLogic.testBGSearchTaskThreads sometimes fails or hangs on Windows

2012-08-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427479#comment-13427479
 ] 

Michael McCandless commented on LUCENE-4177:


bq. As a side note, wouldn't it be easier to propagate a single flag object 
instead of method calls? I

I completely agree AtomicBoolean is the right solution here ... but I don't 
have time now to fix it.  I'll commit the patch ...
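
For what it's worth, a sketch of the single-flag idea (illustrative class names, not the benchmark module's actual code):

{code}
import java.util.concurrent.atomic.AtomicBoolean;

class BackgroundTaskSketch implements Runnable {
  private final AtomicBoolean stop;

  BackgroundTaskSketch(AtomicBoolean stop) {
    this.stop = stop;
  }

  @Override
  public void run() {
    while (!stop.get()) {
      // run one background search iteration ...
    }
  }
}

// Driver side: share one AtomicBoolean with every task; stop.set(true)
// lets all of them observe the change and exit their loops.
{code}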

> TestPerfTasksLogic.testBGSearchTaskThreads sometimes fails or hangs on Windows
> --
>
> Key: LUCENE-4177
> URL: https://issues.apache.org/jira/browse/LUCENE-4177
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-4177.patch
>
>
> e.g.
> http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows-Java6-64/147/
> http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java7-64/408/
> this has happened a couple times... but always on Windows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: TestIndexWriterDelete fails with OOM

2012-08-02 Thread Michael McCandless
Thanks Robert.

Mike McCandless

http://blog.mikemccandless.com

On Thu, Aug 2, 2012 at 1:35 PM, Robert Muir  wrote:
> this test method uses only one field, but disables simpletext and
> memory already. I think directPF was just an omission. I'll add it
> too.
>
> On Thu, Aug 2, 2012 at 11:07 AM, Simon Willnauer
>  wrote:
>> I see this on http://85.25.120.39/job/Lucene-trunk-Linux-Java6-64/162/console
>> should we disable the mem intensive codecs for this test?
>>
>> [junit4:junit4] Suite: org.apache.lucene.index.TestIndexWriterDelete
>> [junit4:junit4] ERROR   35.5s J1 |
>> TestIndexWriterDelete.testIndexingThenDeleting
>> [junit4:junit4]> Throwable #1: java.lang.OutOfMemoryError: Java heap 
>> space
>> [junit4:junit4]>at
>> __randomizedtesting.SeedInfo.seed([F645C683FAA013CE:8FD749EA602CCB95]:0)
>> [junit4:junit4]>at
>> org.apache.lucene.codecs.memory.DirectPostingsFormat$DirectField.(DirectPostingsFormat.java:385)
>> [junit4:junit4]>at
>> org.apache.lucene.codecs.memory.DirectPostingsFormat$DirectFields.(DirectPostingsFormat.java:130)
>> [junit4:junit4]>at
>> org.apache.lucene.codecs.memory.DirectPostingsFormat.fieldsProducer(DirectPostingsFormat.java:112)
>> [junit4:junit4]>at
>> org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.(PerFieldPostingsFormat.java:186)
>> [junit4:junit4]>at
>> org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:250)
>> [junit4:junit4]>at
>> org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:107)
>> [junit4:junit4]>at
>> org.apache.lucene.index.SegmentReader.(SegmentReader.java:55)
>> [junit4:junit4]>at
>> org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:62)
>> [junit4:junit4]>at
>> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:638)
>> [junit4:junit4]>at
>> org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
>> [junit4:junit4]>at
>> org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:62)
>> [junit4:junit4]>at
>> org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:583)
>> [junit4:junit4]>at
>> org.apache.lucene.index.TestIndexWriterDelete.testIndexingThenDeleting(TestIndexWriterDelete.java:935)
>> [junit4:junit4]>at
>> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> [junit4:junit4]>at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> [junit4:junit4]>at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> [junit4:junit4]>at java.lang.reflect.Method.invoke(Method.java:597)
>> [junit4:junit4]>at
>> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1995)
>> [junit4:junit4]>at
>> com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
>> [junit4:junit4]>at
>> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:818)
>> [junit4:junit4]>at
>> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:877)
>> [junit4:junit4]>at
>> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:891)
>> [junit4:junit4]>at
>> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>> [junit4:junit4]>at
>> org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
>> [junit4:junit4]>at
>> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>> [junit4:junit4]>at
>> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>> [junit4:junit4]>at
>> org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>> [junit4:junit4]>at
>> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>> [junit4:junit4]>at
>> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
>> [junit4:junit4]>at
>> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>> [junit4:junit4]>at
>> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:825)
>> [junit4:junit4]>at
>> com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
>> [junit4:junit4]>
>> [junit4:junit4]   2> NOTE: download the large Jenkins line-docs file
>> by running 'ant get-jenkins-line-docs' in the lucene directory.
>> [junit4:junit4]   2> NOTE: reproduce with: ant test
>> -Dtestcase=TestIndexWriterDelete
>> -Dtests.method=t

Re: TestIndexWriterDelete fails with OOM

2012-08-02 Thread Robert Muir
this test method uses only one field, but disables simpletext and
memory already. I think directPF was just an omission. I'll add it
too.

On Thu, Aug 2, 2012 at 11:07 AM, Simon Willnauer
 wrote:
> I see this on http://85.25.120.39/job/Lucene-trunk-Linux-Java6-64/162/console
> should we disable the mem intensive codecs for this test?
>
> [junit4:junit4] Suite: org.apache.lucene.index.TestIndexWriterDelete
> [junit4:junit4] ERROR   35.5s J1 |
> TestIndexWriterDelete.testIndexingThenDeleting
> [junit4:junit4]> Throwable #1: java.lang.OutOfMemoryError: Java heap space
> [junit4:junit4]>at
> __randomizedtesting.SeedInfo.seed([F645C683FAA013CE:8FD749EA602CCB95]:0)
> [junit4:junit4]>at
> org.apache.lucene.codecs.memory.DirectPostingsFormat$DirectField.(DirectPostingsFormat.java:385)
> [junit4:junit4]>at
> org.apache.lucene.codecs.memory.DirectPostingsFormat$DirectFields.(DirectPostingsFormat.java:130)
> [junit4:junit4]>at
> org.apache.lucene.codecs.memory.DirectPostingsFormat.fieldsProducer(DirectPostingsFormat.java:112)
> [junit4:junit4]>at
> org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.(PerFieldPostingsFormat.java:186)
> [junit4:junit4]>at
> org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:250)
> [junit4:junit4]>at
> org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:107)
> [junit4:junit4]>at
> org.apache.lucene.index.SegmentReader.(SegmentReader.java:55)
> [junit4:junit4]>at
> org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:62)
> [junit4:junit4]>at
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:638)
> [junit4:junit4]>at
> org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
> [junit4:junit4]>at
> org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:62)
> [junit4:junit4]>at
> org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:583)
> [junit4:junit4]>at
> org.apache.lucene.index.TestIndexWriterDelete.testIndexingThenDeleting(TestIndexWriterDelete.java:935)
> [junit4:junit4]>at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [junit4:junit4]>at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> [junit4:junit4]>at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> [junit4:junit4]>at java.lang.reflect.Method.invoke(Method.java:597)
> [junit4:junit4]>at
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1995)
> [junit4:junit4]>at
> com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
> [junit4:junit4]>at
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:818)
> [junit4:junit4]>at
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:877)
> [junit4:junit4]>at
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:891)
> [junit4:junit4]>at
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
> [junit4:junit4]>at
> org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
> [junit4:junit4]>at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> [junit4:junit4]>at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> [junit4:junit4]>at
> org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
> [junit4:junit4]>at
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
> [junit4:junit4]>at
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
> [junit4:junit4]>at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> [junit4:junit4]>at
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:825)
> [junit4:junit4]>at
> com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
> [junit4:junit4]>
> [junit4:junit4]   2> NOTE: download the large Jenkins line-docs file
> by running 'ant get-jenkins-line-docs' in the lucene directory.
> [junit4:junit4]   2> NOTE: reproduce with: ant test
> -Dtestcase=TestIndexWriterDelete
> -Dtests.method=testIndexingThenDeleting -Dtests.seed=F645C683FAA013CE
> -Dtests.multiplier=3 -Dtests.nightly=true -Dtests.slow=true
> -Dtests.linedocsfile=/var/lib/jenkins/lucene-data/enwiki.random.lines.txt
> -Dtests.locale=v

[jira] [Updated] (SOLR-3428) SolrCmdDistributor flushAdds/flushDeletes problems

2012-08-02 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-3428:
--

Fix Version/s: 5.0
   4.0

> SolrCmdDistributor flushAdds/flushDeletes problems
> --
>
> Key: SOLR-3428
> URL: https://issues.apache.org/jira/browse/SOLR-3428
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java), SolrCloud, update
>Affects Versions: 4.0-ALPHA
>Reporter: Per Steffensen
>Assignee: Per Steffensen
>  Labels: add, delete, replica, solrcloud, update
> Fix For: 4.0, 5.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> A few problems with SolrCmdDistributor.flushAdds/flushDeletes:
> * If the number of AddRequests/DeleteRequests in alist/dlist is below the 
> limit for a specific node, the method returns immediately and doesn't flush 
> for subsequent nodes.
> * When returning immediately because there are fewer requests than the limit 
> for a given node, previous nodes that have already been flushed/submitted are 
> not removed from the adds/deletes maps (causing them to be flushed/submitted 
> again the next time flushAdds/flushDeletes is executed).
> * The idea of just combining params does not work for SEEN_LEADER params 
> (and probably others as well). Since SEEN_LEADER cannot be expressed (unlike 
> commitWithin and overwrite) for individual operations in the request, you 
> need to send two separate submits: one containing requests with 
> SEEN_LEADER=true and one with SEEN_LEADER=false.
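
A sketch of the flush behaviour the first two bullets ask for (simplified names, not SolrCmdDistributor's real fields): each node is checked independently, and a node's pending list is dropped from the map once it has actually been submitted:

{code}
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class FlushSketch<NodeT, ReqT> {
  int limit = 10;

  void flush(Map<NodeT, List<ReqT>> pending) {
    Iterator<Map.Entry<NodeT, List<ReqT>>> it = pending.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<NodeT, List<ReqT>> e = it.next();
      if (e.getValue().size() < limit) {
        continue;                          // skip only this node, keep checking the rest
      }
      submit(e.getKey(), e.getValue());    // placeholder for the actual submit
      it.remove();                         // so it isn't submitted again next flush
    }
  }

  void submit(NodeT node, List<ReqT> reqs) { /* placeholder */ }
}
{code}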

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-trunk - Build # 14992 - Failure

2012-08-02 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/14992/

All tests passed

Build Log:
[...truncated 10303 lines...]
javadocs-lint:

[...truncated 1667 lines...]
BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/build.xml:47:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build.xml:525:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build.xml:515:
 exec returned: 7

Total time: 2 minutes 18 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Publishing Clover coverage report...
No Clover report will be published due to a Build Failure
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-3428) SolrCmdDistributor flushAdds/flushDeletes problems

2012-08-02 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427463#comment-13427463
 ] 

Mark Miller commented on SOLR-3428:
---

I've committed the simple fix for the flush issue and added a test.

> SolrCmdDistributor flushAdds/flushDeletes problems
> --
>
> Key: SOLR-3428
> URL: https://issues.apache.org/jira/browse/SOLR-3428
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java), SolrCloud, update
>Affects Versions: 4.0-ALPHA
>Reporter: Per Steffensen
>Assignee: Per Steffensen
>  Labels: add, delete, replica, solrcloud, update
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> A few problems with SolrCmdDistributor.flushAdds/flushDeletes:
> * If the number of AddRequests/DeleteRequests in alist/dlist is below the limit for a 
> specific node, the method returns immediately and doesn't flush for subsequent 
> nodes.
> * When returning immediately because a given node has fewer requests than the limit, 
> previous nodes that have already been flushed/submitted are not removed from the 
> adds/deletes maps (causing them to be flushed/submitted again the next time 
> flushAdds/flushDeletes is executed).
> * The idea of just combining params does not work for SEEN_LEADER params 
> (and probably others as well). Since SEEN_LEADER cannot be expressed (unlike 
> commitWithin and overwrite) for individual operations in the request, you 
> need to send two separate submits: one containing requests with 
> SEEN_LEADER=true and one with SEEN_LEADER=false.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4285) Improve FST API usability for mere mortals

2012-08-02 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427435#comment-13427435
 ] 

David Smiley commented on LUCENE-4285:
--

Keep in mind, from an FST outsider like me, *FSTs are basically a fancy 
SortedMap*.  Yet Lucene's FST API is so complicated that there is a dedicated 
package of classes, and I need to understand a fair amount of it.  I'm not 
saying the package should go away, or that a single class is realistic, just that 
conceptually, for outsiders, it can and should be simpler than it is.

The Util.get* methods should be instance methods on the FST.  I shouldn't 
need to look at Util, I think.

The BytesReader concept is confusing and should be hidden.

Outputs... this aspect of the API is over-exposed; maybe it can be hidden more? 
I know I need to choose an implementation at construction.

FSTEnum is pretty cool, and improving it or creating variants of it could help 
to simplify using the overall API.  The FST should have a getter for it.  It 
would be nice if FSTEnum could advance to the next arc by a label (I need 
this).  It would be something like next(int).  Can it be improved to the point 
where, for example, SynonymFilter can use it?  It would be nice to reduce the 
use-cases where users/client-code have to even see an Arc.
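
For context, a rough sketch of what the "fancy SortedMap" usage looks like today through the static Builder/Util helpers (names recalled from the 4.x sources and simplified; treat the exact signatures as approximate rather than authoritative):

{code}
import java.io.IOException;

import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.IntsRef;
import org.apache.lucene.util.fst.Builder;
import org.apache.lucene.util.fst.FST;
import org.apache.lucene.util.fst.PositiveIntOutputs;
import org.apache.lucene.util.fst.Util;

public class FstAsSortedMap {
  public static void main(String[] args) throws IOException {
    PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton(true);
    Builder<Long> builder = new Builder<Long>(FST.INPUT_TYPE.BYTE1, outputs);

    // Keys must be added in sorted order, like building a SortedMap from a sorted stream.
    IntsRef scratch = new IntsRef();
    builder.add(Util.toIntsRef(new BytesRef("cat"), scratch), 5L);
    builder.add(Util.toIntsRef(new BytesRef("dog"), scratch), 7L);
    builder.add(Util.toIntsRef(new BytesRef("dogs"), scratch), 12L);

    FST<Long> fst = builder.finish();

    // Lookup goes through the static Util helper rather than an instance method on
    // the FST itself, which is part of what the comment above is asking to change.
    Long value = Util.get(fst, new BytesRef("dog"));
    System.out.println(value); // 7
  }
}
{code}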

> Improve FST API usability for mere mortals
> --
>
> Key: LUCENE-4285
> URL: https://issues.apache.org/jira/browse/LUCENE-4285
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/FSTs
>Reporter: David Smiley
>
> FST technology is something that has brought amazing advances to Lucene, yet 
> the API is hard to use for the vast majority of users like me.  I know that 
> performance of FSTs is really important, but surely a lot can be done without 
> sacrificing that.
> (comments will hold specific ideas and problems)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Outstanding issues for 3.0.3

2012-08-02 Thread Michael Herndon
If you do rename stuff other than poorly named signature parameters, it's
helpful to document the Java version's name and the reason for the rename.
Even for Java, some of the internal naming makes the code that much harder
to understand and follow.

On Thu, Aug 2, 2012 at 12:33 PM, Prescott Nasser wrote:

> Excellent Idea - I'll do that monday to give you guys the weekend to do
> any last minute code cleaning you want.
>
>
> 
> > Date: Thu, 2 Aug 2012 19:30:02 +0300
> > Subject: Re: Outstanding issues for 3.0.3
> > From: ita...@code972.com
> > To: lucene-net-...@lucene.apache.org
> >
> > Prescott - we could make an RC and push it to Nuget as a PreRelease, to
> get
> > real feedback.
> >
> > On Thu, Aug 2, 2012 at 7:13 PM, Prescott Nasser  >wrote:
> >
> > > I don't think we ever fully adopted the style guidelines, probably not
> a
> > > terrible discussion to have. As for this release, I think that by lazy
> > > consensus we should branch the trunk at the end of this weekend (say
> > > monday), and begin the process of cutting a release. - my $.02 below
> > >
> > >
> > > > 1) Usage of "this" prefix when not required.
> > > >
> > > > this.blah = blah; <- required this.
> > > > this.aBlah = blah; <- optional this, which Re# doesn't like.
> > > >
> > > > I'm assuming consistency wins here, and 'this.' stays, but wanted to
> > > > double check.
> > >
> > > I'd err on the side of consistency
> > >
> > >
> > > >
> > > > 2) Using different conventions for fields and parameters\local vars.
> > > >
> > > > blah vs. _blah
> > > >
> > >
> > > > Combined with 1, Re# wants (and I'm personally accustomed to):
> > > >
> > > > _blah = blah;
> > > >
> > >
> > >
> > > For private variables _ is ok, for anything else, don't use _ as it's
> not
> > > CLR compliant
> > >
> > >
> > > > However, that seems to violate the adopted style.
> > > >
> > > > 3) Full qualification of type names.
> > > >
> > > > Re # wants to remove redundant namespace qualifiers. Leave them or
> > > remove them?
> > > >
> > >
> > > I try to remove them
> > >
> > > > 4) Removing unreferenced classes.
> > > >
> > > > Should I remove non-public unreferenced classes? The ones I've come
> > > across so far are private.
> > > >
> > >
> > > I'm not sure I understand - are you saying we have classes that are
> never
> > > used in random places? If so, I think before removing them we should
> have a
> > > conversation; what are they, why are they there, etc. - I'm hoping
> there
> > > aren't too many of these..
> > >
> > > > 5) var vs. explicit
> > > >
> > > > I know this has been brought up before, but not sure of the final
> > > disposition. FWIW, I prefer var.
> > > >
> > >
> > > I use var when it's plainly obvious what the object is: var obj = new MyClass().
> > > I usually use explicit when it's an object returned from some function that
> > > makes it unclear what the return value is:
> > >
> > >
> > > var items = search.GetResults();
> > >
> > > vs
> > >
> > > IList items = search.GetResults(); //prefer
> > >
> > >
> > > >
> > > > There are some non-Re# issues I came across as well that look like
> > > artifacts of code generation:
> > > >
> > > > 6) Weird param names.
> > > >
> > > > Param1 vs. directory
> > > >
> > > > I assume it's okay to replace 'Param1' with a descriptive name
> > > > like 'directory'.
> > > >
> > >
> > > Weird - I think a rename is OK for this release (Since we're ticking
> up a
> > > full version number), but I believe changing param names can
> potentially
> > > break code. That said, I don't really think we need to change the
> names and
> > > push the 3.0.3 release out, and if it does in fact cause breaking
> changes,
> > > I'd be a little careful about how we do it going forward to 3.6.
> > >
> > > > 7) Field names that follow local variable naming conventions.
> > > >
> > > > Lots of issues related to private vars with names like i, j, k, etc.
> It
> > > feels like the right thing to do is to change the scope so that they go
> > > back to being local vars instead of fields. However, this requires a
> much
> > > more significant refactoring, and I didn't want to assume it was okay
> to do
> > > that.
> > > >
> > >
> > > I'd avoid this for now - a lot of this is a carry over from the java
> > > version and to rename all those, it starts to get a bit confusing if we
> > > have to compare java to C# and these are all changed around.
> > >
> > >
> > >
> > > > If these questions have already been answered elsewhere and I missed
> the
> > > documentation/FAQ/developer guide, then I apologize and would
> appreciate
> > > the links. Alternatively, if someone has a Re# rule config that they
> are
> > > willing to post somewhere, I would be glad to use it.
> > > >
> > >
> > > I think we talked about Re#'s rules at one point, I'll try to dig that
> > > conversation up and see where it landed. It's probably a good idea for
> us
> > > to build rules though.
> > >
> > > > - Zack
> > > >
> > > >
> > > > On J

[jira] [Updated] (SOLR-3699) SolrIndexWriter constructor leaks Directory if Exception creating IndexWriterConfig

2012-08-02 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-3699:
---

Fix Version/s: 5.0
   4.0
  Summary: SolrIndexWriter constructor leaks Directory if Exception 
creating IndexWriterConfig  (was: fix CoreContainerCoreInitFailuresTest 
directory leak)

> SolrIndexWriter constructor leaks Directory if Exception creating 
> IndexWriterConfig
> ---
>
> Key: SOLR-3699
> URL: https://issues.apache.org/jira/browse/SOLR-3699
> Project: Solr
>  Issue Type: Bug
>Reporter: Robert Muir
> Fix For: 4.0, 5.0
>
> Attachments: SOLR-3699.patch
>
>
> in LUCENE-4278 i had to add a hack to force SimpleFSDir for 
> CoreContainerCoreInitFailuresTest, because it doesnt close its Directory on 
> certain errors.
> This might indicate a problem that leaks happen if certain errors happen 
> (e.g. not handled in finally)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3699) fix CoreContainerCoreInitFailuresTest directory leak

2012-08-02 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-3699:
---

Attachment: SOLR-3699.patch

Tracked the problem down to SolrIndexWriter ... the attached patch demonstrates it 
in the simplest use case: a SolrCore that constructs a SolrIndexWriter where the 
Directory is created fine, but then the IndexWriterConfig has a problem.

Unfortunately there's no clear and easy route to a fix because of how this is 
all done inline in a call to {{super(...)}} ... as noted in the test comments...

{code}
  public void testBogusMergePolicy() throws Exception {
    // Directory is leaked because SolrIndexWriter constructor has inline
    // calls to both DirectoryFactory (which succeeds) and
    // Config.toIndexWriterConfig (which fails) -- but there is nothing to
    // decref the DirectoryFactory when Config throws an Exception
    //
    // Not good to require the caller of "new SolrIndexWriter(...)" to decref
    // the DirectoryFactory on exception, because they would have to be sure
    // the exception didn't already come from the DirectoryFactory in the first place.
    // I think we need to re-work the inline calls in the SolrIndexWriter constructor
{code}

(Ironically: the "bad-mp-config.xml" I was using in 
CoreContainerCoreInitFailuresTest has existed for a while, but wasn't already 
being used in the "TestBadConfig" class that tries to create SolrCores with bad 
configs -- if it had, we would have caught this a long time ago.  It was only 
being used in SolrIndexConfigTest, where it was micro-testing the 
SolrIndexConfig and the DirectoryFactory wasn't used.)
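
One possible shape for the re-work, sketched here with plain Lucene classes rather than Solr's actual SolrIndexWriter/DirectoryFactory (the names below are illustrative only): acquire the Directory first, build the failure-prone config inside a try, and release the Directory before propagating if the config step throws.

{code}
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public final class SafeWriterFactory {
  // Static factory instead of doing everything inline in a super(...) call.
  public static IndexWriter open(File path) throws IOException {
    Directory dir = FSDirectory.open(path);
    boolean success = false;
    try {
      IndexWriterConfig iwc =
          new IndexWriterConfig(Version.LUCENE_40, new StandardAnalyzer(Version.LUCENE_40));
      IndexWriter writer = new IndexWriter(dir, iwc); // the writer now owns dir
      success = true;
      return writer;
    } finally {
      if (!success) {
        dir.close(); // the analogue of decref'ing the DirectoryFactory on failure
      }
    }
  }
}
{code}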


> fix CoreContainerCoreInitFailuresTest directory leak
> 
>
> Key: SOLR-3699
> URL: https://issues.apache.org/jira/browse/SOLR-3699
> Project: Solr
>  Issue Type: Bug
>Reporter: Robert Muir
> Fix For: 4.0, 5.0
>
> Attachments: SOLR-3699.patch
>
>
> in LUCENE-4278 i had to add a hack to force SimpleFSDir for 
> CoreContainerCoreInitFailuresTest, because it doesnt close its Directory on 
> certain errors.
> This might indicate a problem that leaks happen if certain errors happen 
> (e.g. not handled in finally)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4285) Improve FST API usability for mere mortals

2012-08-02 Thread David Smiley (JIRA)
David Smiley created LUCENE-4285:


 Summary: Improve FST API usability for mere mortals
 Key: LUCENE-4285
 URL: https://issues.apache.org/jira/browse/LUCENE-4285
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/FSTs
Reporter: David Smiley


FST technology is something that has brought amazing advances to Lucene, yet 
the API is hard to use for the vast majority of users like me.  I know that 
performance of FSTs is really important, but surely a lot can be done without 
sacrificing that.

(comments will hold specific ideas and problems)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4225) New FixedPostingsFormat for less overhead than SepPostingsFormat

2012-08-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427413#comment-13427413
 ] 

Michael McCandless commented on LUCENE-4225:


Thanks Billy, I'll dig...

> New FixedPostingsFormat for less overhead than SepPostingsFormat
> 
>
> Key: LUCENE-4225
> URL: https://issues.apache.org/jira/browse/LUCENE-4225
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Attachments: LUCENE-4225-on-rev-1362013.patch, LUCENE-4225.patch, 
> LUCENE-4225.patch, LUCENE-4225.patch, LUCENE-4225.patch
>
>
> I've worked out the start at a new postings format that should have
> less overhead for fixed-int[] encoders (For,PFor)... using ideas from
> the old bulk branch, and new ideas from Robert.
> It's only a start: there's no payloads support yet, and I haven't run
> Lucene's tests with it, except for one new test I added that tries to
> be a thorough PostingsFormat tester (to make it easier to create new
> postings formats).  It does pass luceneutil's performance test, so
> it's at least able to run those queries correctly...
> Like Lucene40, it uses two files (though once we add payloads it may
> be 3).  The .doc file interleaves doc delta and freq blocks, and .pos
> has position delta blocks.  Unlike sep, blocks are NOT shared across
> terms; instead, it uses block encoding if there are enough ints to
> encode, else the same Lucene40 vInt format.  This means low-freq terms
> (< 128 = current default block size) are always vInts, and high-freq
> terms will have some number of blocks, with a vInt final block.
> Skip points are only recorded at block starts.
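
An illustrative sketch of the block-or-vInt decision described above (this is not the patch's actual encoder; the block body is a placeholder for the real For/PFor bit-packing):

{code}
import java.io.IOException;

import org.apache.lucene.store.DataOutput;

final class BlockOrVIntWriter {
  static final int BLOCK_SIZE = 128;

  // Write a term's doc deltas: packed blocks while a full block remains,
  // then a Lucene40-style vInt tail for the leftover (and for low-freq terms).
  void writeDocDeltas(int[] deltas, int count, DataOutput out) throws IOException {
    int i = 0;
    while (count - i >= BLOCK_SIZE) {
      writePackedBlock(deltas, i, out);
      i += BLOCK_SIZE;
    }
    for (; i < count; i++) {
      out.writeVInt(deltas[i]);
    }
  }

  private void writePackedBlock(int[] deltas, int start, DataOutput out) throws IOException {
    // Placeholder: a real implementation would bit-pack BLOCK_SIZE deltas here.
    for (int j = 0; j < BLOCK_SIZE; j++) {
      out.writeVInt(deltas[start + j]);
    }
  }
}
{code}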

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: Outstanding issues for 3.0.3

2012-08-02 Thread Prescott Nasser
I don't think we ever fully adopted the style guidelines, probably not a 
terrible discussion to have. As for this release, I think that by lazy 
consensus we should branch the trunk at the end of this weekend (say monday), 
and begin the process of cutting a release. - my $.02 below


> 1) Usage of "this" prefix when not required.
>
> this.blah = blah; <- required this.
> this.aBlah = blah; <- optional this, which Re# doesn't like.
>
> I'm assuming consistency wins here, and 'this.' stays, but wanted to double 
> check.

I'd err on the side of consistency


>
> 2) Using different conventions for fields and parameters\local vars.
>
> blah vs. _blah
>

> Combined with 1, Re# wants (and I'm personally accustomed to):
>
> _blah = blah;
>


For private variables _ is ok, for anything else, don't use _ as it's not CLR 
compliant


> However, that seems to violate the adopted style.
>
> 3) Full qualification of type names.
>
> Re # wants to remove redundant namespace qualifiers. Leave them or remove 
> them?
>

I try to remove them

> 4) Removing unreferenced classes.
>
> Should I remove non-public unreferenced classes? The ones I've come across so 
> far are private.
>

I'm not sure I understand - are you saying we have classes that are never used 
in random places? If so, I think before removing them we should have a 
conversation; what are they, why are they there, etc. - I'm hoping there aren't 
too many of these..

> 5) var vs. explicit
>
> I know this has been brought up before, but not sure of the final 
> disposition. FWIW, I prefer var.
>

I use var when it's plainly obvious what the object is: var obj = new MyClass(). I 
usually use explicit when it's an object returned from some function that makes 
it unclear what the return value is:


var items = search.GetResults();

vs

IList items = search.GetResults(); //prefer


>
> There are some non-Re# issues I came across as well that look like artifacts 
> of code generation:
>
> 6) Weird param names.
>
> Param1 vs. directory
>
> I assume it's okay to replace 'Param1' with a descriptive name like 
> 'directory'.
>

Weird - I think a rename is OK for this release (Since we're ticking up a full 
version number), but I believe changing param names can potentially break code. 
That said, I don't really think we need to change the names and push the 3.0.3 
release out, and if it does in fact cause breaking changes, I'd be a little 
careful about how we do it going forward to 3.6.

> 7) Field names that follow local variable naming conventions.
>
> Lots of issues related to private vars with names like i, j, k, etc. It feels 
> like the right thing to do is to change the scope so that they go back to 
> being local vars instead of fields. However, this requires a much more 
> significant refactoring, and I didn't want to assume it was okay to do that.
>

I'd avoid this for now - a lot of this is a carry over from the java version 
and to rename all those, it starts to get a bit confusing if we have to compare 
java to C# and these are all changed around.



> If these questions have already been answered elsewhere and I missed the 
> documentation/FAQ/developer guide, then I apologize and would appreciate the 
> links. Alternatively, if someone has a Re# rule config that they are willing 
> to post somewhere, I would be glad to use it.
>

I think we talked about Re#'s rules at one point, I'll try to dig that 
conversation up and see where it landed. It's probably a good idea for us to 
build rules though.

> - Zack
>
>
> On Jul 27, 2012, at 12:00 PM, Itamar Syn-Hershko wrote:
>
> > The cleanup consists mainly of going file by file with ReSharper and trying
> > to get them as green as possible. Making a lot of fields readonly, removing
> > unused vars and stuff like that. There are still loads of files left.
> >
> > I was also hoping to get to updating the spatial module with some recent
> > updates, and to also support polygon searches. But that may take a bit more
> > time, so it's really up to you guys (or we can open a vote for it).
> 

[jira] [Updated] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-08-02 Thread Mark Harwood (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Harwood updated LUCENE-4069:
-

Fix Version/s: 5.0

Applied to trunk in revision 1368567

> Segment-level Bloom filters for a 2 x speed up on rare term searches
> 
>
> Key: LUCENE-4069
> URL: https://issues.apache.org/jira/browse/LUCENE-4069
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 3.6, 4.0-ALPHA
>Reporter: Mark Harwood
>Assignee: Mark Harwood
>Priority: Minor
> Fix For: 4.0, 5.0
>
> Attachments: 4069Failure.zip, BloomFilterPostingsBranch4x.patch, 
> LUCENE-4069-tryDeleteDocument.patch, LUCENE-4203.patch, 
> MHBloomFilterOn3.6Branch.patch, PKLookupUpdatePerfTest.java, 
> PKLookupUpdatePerfTest.java, PKLookupUpdatePerfTest.java, 
> PKLookupUpdatePerfTest.java, PrimaryKeyPerfTest40.java
>
>
> An addition to each segment which stores a Bloom filter for selected fields 
> in order to give fast-fail to term searches, helping avoid wasted disk access.
> Best suited for low-frequency fields e.g. primary keys on big indexes with 
> many segments but also speeds up general searching in my tests.
> Overview slideshow here: 
> http://www.slideshare.net/MarkHarwood/lucene-bloomfilteredsegments
> Benchmarks based on Wikipedia content here: http://goo.gl/X7QqU
> Patch based on 3.6 codebase attached.
> There are no 3.6 API changes currently - to play just add a field with "_blm" 
> on the end of the name to invoke special indexing/querying capability. 
> Clearly a new Field or schema declaration(!) would need adding to APIs to 
> configure the service properly.
> Also, a patch for Lucene4.0 codebase introducing a new PostingsFormat
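
A conceptual sketch of the fast-fail idea (this is generic, not Lucene's FuzzySet or the bloom postings format): the per-segment filter answers "definitely absent" cheaply, and only "maybe present" falls through to the real terms-dictionary lookup on disk.

{code}
import java.util.BitSet;

final class TermBloomFilter {
  private final BitSet bits;
  private final int size;

  TermBloomFilter(int size) {
    this.size = size;
    this.bits = new BitSet(size);
  }

  void add(String term) {
    bits.set(hash1(term));
    bits.set(hash2(term));
  }

  /** false = definitely not indexed (skip the disk lookup); true = maybe, do the real lookup. */
  boolean mayContain(String term) {
    return bits.get(hash1(term)) && bits.get(hash2(term));
  }

  private int hash1(String s) { return (s.hashCode() & 0x7fffffff) % size; }
  private int hash2(String s) { return ((s + "#").hashCode() & 0x7fffffff) % size; }
}
{code}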

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-3702) String concatenation function

2012-08-02 Thread Ted Strauss (JIRA)
Ted Strauss created SOLR-3702:
-

 Summary: String concatenation function
 Key: SOLR-3702
 URL: https://issues.apache.org/jira/browse/SOLR-3702
 Project: Solr
  Issue Type: New Feature
  Components: query parsers
Affects Versions: 4.0-ALPHA
Reporter: Ted Strauss


Related to https://issues.apache.org/jira/browse/SOLR-2526

Add query function to support concatenation of Strings.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-4.x-Windows (64bit/jdk1.6.0_33) - Build # 58 - Failure!

2012-08-02 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows/58/
Java: 64bit/jdk1.6.0_33 -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 19185 lines...]
javadocs-lint:

[...truncated 1728 lines...]
javadocs-lint:
 [exec] 
 [exec] Crawl/parse...
 [exec] 
 [exec]   build/docs/core/org/apache/lucene/store/package-use.html
 [exec] WARNING: anchor 
"../../../../org/apache/lucene/store/subclasses" appears more than once
 [exec] 
 [exec] Verify...
 [exec] 
 [exec] build/docs\core/overview-summary.html
 [exec]   missing: org.apache.lucene.util.hash
 [exec] 
 [exec] build/docs\test-framework/overview-summary.html
 [exec]   missing: org.apache.lucene.codecs.bloom
 [exec] 
 [exec] Missing javadocs were found!

BUILD FAILED
C:\Jenkins\workspace\Lucene-Solr-4.x-Windows\build.xml:47: The following error 
occurred while executing this line:
C:\Jenkins\workspace\Lucene-Solr-4.x-Windows\lucene\build.xml:246: The 
following error occurred while executing this line:
C:\Jenkins\workspace\Lucene-Solr-4.x-Windows\lucene\common-build.xml:1704: exec 
returned: 1

Total time: 51 minutes 24 seconds
Build step 'Invoke Ant' marked build as failure
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

RE: Outstanding issues for 3.0.3

2012-08-02 Thread Prescott Nasser
Actually that's a good point, I don't think Mercurial is an option for Apache 
software projects - but I know git was rolled out over the last year as an 
option.


> Subject: Re: Outstanding issues for 3.0.3
> From: zgram...@gmail.com
> Date: Thu, 2 Aug 2012 10:42:14 -0400
> To: lucene-net-...@lucene.apache.org
>
> On Aug 2, 2012, at 3:04 AM, Itamar Syn-Hershko wrote:
>
> > Nowadays git works just great for Windows, and it's much easier to work
> > with than Hg
>
> In the interest of full disclosure, I have done a lot of work on hosting 
> Mercurial in C# apps and have committed to both Mercurial and IronPython, so 
> one might guess, I view hg > git. I didn't realize the Apache Foundation 
> already had its own git server + github mirror, though. If the choice is 
> between git and svn, git wins my vote every time. 
>  

[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427360#comment-13427360
 ] 

Robert Muir commented on LUCENE-4282:
-

I will think about this one more: the patch is correct for 'edits' but the 
scoring becomes crazy. This is because of the historical behavior of this query.

Just try porting Uwe's test to 3.6 and you will see what I mean :)

I think it's too tricky for the query in core (and used by the spellchecker) to also
be the base for the SlowFuzzyQuery, which is supposed to mimic the old crazy 
behavior.



> Automaton Fuzzy Query doesn't deliver all results
> -
>
> Key: LUCENE-4282
> URL: https://issues.apache.org/jira/browse/LUCENE-4282
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Johannes Christen
>Assignee: Robert Muir
>  Labels: newbie
> Attachments: LUCENE-4282-tests.patch, LUCENE-4282.patch, 
> ModifiedFuzzyTermsEnum.java, ModifiedFuzzyTermsEnum.java
>
>
> Having a small index with n documents where each document has one of the 
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR 
> which have an edit distance of 2 as well are missing.
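
For reference, WEB and WBR really are within two edits of WEBER; a plain dynamic-programming Levenshtein check (nothing Lucene-specific) confirms the distances claimed above:

{code}
public class EditDistanceCheck {
  static int levenshtein(String a, String b) {
    int[][] d = new int[a.length() + 1][b.length() + 1];
    for (int i = 0; i <= a.length(); i++) d[i][0] = i;
    for (int j = 0; j <= b.length(); j++) d[0][j] = j;
    for (int i = 1; i <= a.length(); i++) {
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
      }
    }
    return d[a.length()][b.length()];
  }

  public static void main(String[] args) {
    System.out.println(levenshtein("WEBER", "WEB")); // 2 -> should match at maxEdits=2
    System.out.println(levenshtein("WEBER", "WBR")); // 2 -> should match at maxEdits=2
    System.out.println(levenshtein("WEBER", "WE"));  // 3 -> correctly excluded
  }
}
{code}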

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (LUCENE-4284) RFE: stopword filter without lowercase side-effect

2012-08-02 Thread Sam Halliday (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Halliday closed LUCENE-4284.


Resolution: Invalid

> RFE: stopword filter without lowercase side-effect
> --
>
> Key: LUCENE-4284
> URL: https://issues.apache.org/jira/browse/LUCENE-4284
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Sam Halliday
>Priority: Minor
>
> It would appear that accept()-time lowercasing of Tokens is not favourable 
> anymore, due to the @Deprecation of the only constructor in StopFilter that 
> allows this.
> Please support some way to allow stop-word removal without lowercasing the 
> output:
>   http://stackoverflow.com/questions/1185

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4284) RFE: stopword filter without lowercase side-effect

2012-08-02 Thread Sam Halliday (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427353#comment-13427353
 ] 

Sam Halliday commented on LUCENE-4284:
--

OK, thanks. Actually all I needed was to remove stop words from a String, so 
the following did the trick

{noformat}
Set stops = StopFilter.makeStopSet(Version.LUCENE_36,
    Lists.newArrayList(StopAnalyzer.ENGLISH_STOP_WORDS_SET), true);
Tokenizer tokeniser = new ClassicTokenizer(Version.LUCENE_36, new StringReader(text));
StopFilter stopFilter = new StopFilter(Version.LUCENE_36, tokeniser, stops);

List words = Lists.newArrayList();
try {
    while (stopFilter.incrementToken()) {
        String token = stopFilter.getAttribute(CharTermAttribute.class).toString();
        words.add(token);
    }
} catch (IOException ex) {
    throw new GuruMeditationFailure();
}
{noformat}

The API is a bit of a labyrinth - it'll take me some time to understand many of 
the design decisions.

> RFE: stopword filter without lowercase side-effect
> --
>
> Key: LUCENE-4284
> URL: https://issues.apache.org/jira/browse/LUCENE-4284
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Sam Halliday
>Priority: Minor
>
> It would appear that accept()-time lowercasing of Tokens is not favourable 
> anymore, due to the @Deprecation of the only constructor in StopFilter that 
> allows this.
> Please support some way to allow stop-word removal without lowercasing the 
> output:
>   http://stackoverflow.com/questions/1185

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Outstanding issues for 3.0.3

2012-08-02 Thread Zachary Gramana
On Aug 2, 2012, at 3:04 AM, Itamar Syn-Hershko wrote:

> Nowadays git works just great for Windows, and it's much easier to work
> with than Hg

In the interest of full disclosure, I have done a lot of work on hosting 
Mercurial in C# apps and have committed to both Mercurial and IronPython, so 
one might guess, I view hg > git. I didn't realize the Apache Foundation 
already had its own git server + github mirror, though. If the choice is 
between git and svn, git wins my vote every time.

[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.6.0_33) - Build # 107 - Still Failing!

2012-08-02 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux/107/
Java: 32bit/jdk1.6.0_33 -client -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 10288 lines...]
javadocs-lint:

[...truncated 1696 lines...]
javadocs-lint:
 [exec] 
 [exec] Crawl/parse...
 [exec] 
 [exec]   build/docs/core/org/apache/lucene/store/package-use.html
 [exec] WARNING: anchor 
"../../../../org/apache/lucene/store/subclasses" appears more than once
 [exec] 
 [exec] Verify...
 [exec] 
 [exec] build/docs/core/overview-summary.html
 [exec]   missing: org.apache.lucene.util.hash
 [exec] 
 [exec] build/docs/test-framework/overview-summary.html
 [exec]   missing: org.apache.lucene.codecs.bloom
 [exec] 
 [exec] Missing javadocs were found!

BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/checkout/build.xml:47: The 
following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/checkout/lucene/build.xml:246: 
The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/checkout/lucene/common-build.xml:1704:
 exec returned: 1

Total time: 2 minutes 23 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4284) RFE: stopword filter without lowercase side-effect

2012-08-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427350#comment-13427350
 ] 

Robert Muir commented on LUCENE-4284:
-

Really, all these analyzers are just simple examples and not intended to solve 
all use cases. You can just make your own that doesn't lowercase at all with 
hardly any code, and if you want to control case sensitivity of the stopword 
set, again do this on your stopset itself (pass the boolean to 
StopFilter.makeStopSet etc).

{noformat}
Analyzer a = new ReusableAnalyzerBase() {
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    Tokenizer source = new LetterTokenizer(matchVersion, reader);
    return new TokenStreamComponents(source, new StopFilter(matchVersion, source, stopwords));
  }
};
{noformat}

Otherwise we have to add options to all Analyzers for everyone's possible use cases, 
which is too many (we will never make everyone happy).


> RFE: stopword filter without lowercase side-effect
> --
>
> Key: LUCENE-4284
> URL: https://issues.apache.org/jira/browse/LUCENE-4284
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Sam Halliday
>Priority: Minor
>
> It would appear that accept()-time lowercasing of Tokens is not favourable 
> anymore, due to the @Deprecation of the only constructor in StopFilter that 
> allows this.
> Please support some way to allow stop-word removal without lowercasing the 
> output:
>   http://stackoverflow.com/questions/1185

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/ibm-j9-jdk7) - Build # 67 - Failure!

2012-08-02 Thread Michael McCandless
On Wed, Aug 1, 2012 at 9:17 AM, Robert Muir  wrote:
> On Wed, Aug 1, 2012 at 4:35 AM, Uwe Schindler  wrote:
>> Hi Robert,
>>
>> I checked Jenkin's settings and the "ulimit -n" is 8192 by default for 
>> jenkins. To prevent this problem I raised this to 32768.
>> The thing with IBM J9 is that is has several caches for class files (so 
>> compiled class files can be cached in shared memory for parallel JVMs using 
>> the same classes), but I assume this needs more file handles.
>>
>
> Right I bumped this on charlie cron too and forgot about it, until i
> installed this IBM jre on this machine.
>
> IBM J9 using a few extra files compared to Sun doesn't seem to explain
> to me why SimpleFS/NIOFS use more filehandles than mmap though? And
> that this problem never happens with other JREs
>
> I feel like something might be wrong here.

Rob and I dug a bit on this ... the hard open-file limit was 4096 and
the soft limit was 1024, and curiously, it looks like Oracle JVMs
"allow" themselves to go up to the hard limit (likely change the soft
limit on startup), while the IBM JVM uses the soft limit.  Has anyone
heard of JVMs doing this (increasing the soft limit to the hard limit
for open files) before...?  I see this curious "-XX:+MaxFDLimit"
option here:

http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html

But it says it's Solaris only ...

If I use ulimit to set the hard and soft limit to 1024 then both
Oracle and IBM JVMs fail TestShardSearching with NIOFSDir due to too
many open files.

But, for some reason if you run the test with MMapDir, far fewer file
descriptors are consumed ...
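
A quick way to see which limit a given JVM actually ended up with is to ask the
OS MXBean (a minimal sketch; the com.sun.management interface is JDK-specific
and Unix-only, and IBM J9 may expose it differently or not at all):

{code}
import java.lang.management.ManagementFactory;

import com.sun.management.UnixOperatingSystemMXBean;

public class FdLimitCheck {
  public static void main(String[] args) {
    Object os = ManagementFactory.getOperatingSystemMXBean();
    if (os instanceof UnixOperatingSystemMXBean) {
      UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
      System.out.println("max fds:  " + unix.getMaxFileDescriptorCount());
      System.out.println("open fds: " + unix.getOpenFileDescriptorCount());
    } else {
      System.out.println("No Unix fd counts available on this JVM");
    }
  }
}
{code}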

Mike McCandless

http://blog.mikemccandless.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-4.x - Build # 386 - Failure

2012-08-02 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-4.x/386/

All tests passed

Build Log:
[...truncated 10352 lines...]
javadocs-lint:

[...truncated 6 lines...]
  [javadoc] Generating Javadoc
  [javadoc] Javadoc execution
  [javadoc] Loading source files for package org.apache.lucene...
  [javadoc] Loading source files for package org.apache.lucene.analysis...
  [javadoc] Loading source files for package 
org.apache.lucene.analysis.tokenattributes...
  [javadoc] Loading source files for package org.apache.lucene.codecs...
  [javadoc] Loading source files for package 
org.apache.lucene.codecs.appending...
  [javadoc] Loading source files for package org.apache.lucene.codecs.bloom...
  [javadoc] Loading source files for package 
org.apache.lucene.codecs.intblock...
  [javadoc] Loading source files for package 
org.apache.lucene.codecs.lucene3x...
  [javadoc] Loading source files for package 
org.apache.lucene.codecs.lucene40...
  [javadoc] Loading source files for package 
org.apache.lucene.codecs.lucene40.values...
  [javadoc] Loading source files for package org.apache.lucene.codecs.memory...
  [javadoc] Loading source files for package 
org.apache.lucene.codecs.perfield...
  [javadoc] Loading source files for package org.apache.lucene.codecs.pulsing...
  [javadoc] Loading source files for package org.apache.lucene.codecs.sep...
  [javadoc] Loading source files for package 
org.apache.lucene.codecs.simpletext...
  [javadoc] Loading source files for package org.apache.lucene.document...
  [javadoc] Loading source files for package org.apache.lucene.index...
  [javadoc] Loading source files for package org.apache.lucene.search...
  [javadoc] Loading source files for package 
org.apache.lucene.search.payloads...
  [javadoc] Loading source files for package 
org.apache.lucene.search.similarities...
  [javadoc] Loading source files for package org.apache.lucene.search.spans...
  [javadoc] Loading source files for package org.apache.lucene.store...
  [javadoc] Loading source files for package org.apache.lucene.util...
  [javadoc] Loading source files for package org.apache.lucene.util.automaton...
  [javadoc] Loading source files for package org.apache.lucene.util.fst...
  [javadoc] Loading source files for package org.apache.lucene.util.hash...
  [javadoc] Loading source files for package org.apache.lucene.util.mutable...
  [javadoc] Loading source files for package org.apache.lucene.util.packed...
  [javadoc] Constructing Javadoc information...
  [javadoc] Standard Doclet version 1.6.0_32
  [javadoc] Building tree for all the packages and classes...
  [javadoc] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-4.x/checkout/lucene/core/src/java/org/apache/lucene/codecs/bloom/BloomFilterFactory.java:61:
 warning - @return tag has no arguments.
  [javadoc] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-4.x/checkout/lucene/core/src/java/org/apache/lucene/util/FuzzySet.java:101:
 warning - @return tag has no arguments.
  [javadoc] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-4.x/checkout/lucene/core/src/java/org/apache/lucene/util/FuzzySet.java:219:
 warning - @param argument "bytes" is not a parameter name.
  [javadoc] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-4.x/checkout/lucene/core/src/java/org/apache/lucene/util/FuzzySet.java:236:
 warning - @param argument "targetSaturation" is not a parameter name.
  [javadoc] Building index for all the packages and classes...
  [javadoc] Building index for all classes...
  [javadoc] Generating 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-4.x/checkout/lucene/build/docs/core/stylesheet.css...
  [javadoc] 4 warnings

BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-4.x/checkout/build.xml:47:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-4.x/checkout/lucene/common-build.xml:621:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-4.x/checkout/lucene/core/build.xml:49:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-4.x/checkout/lucene/common-build.xml:1480:
 Javadocs warnings were found!

Total time: 11 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Publishing Clover coverage report...
No Clover report will be published due to a Build Failure
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4284) RFE: stopword filter without lowercase side-effect

2012-08-02 Thread Sam Halliday (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427329#comment-13427329
 ] 

Sam Halliday commented on LUCENE-4284:
--

OK, but wouldn't it then be a good idea to have a StopAnalyzer that didn't 
enforce lowercase? It seems bizarre that the StopAnalyzer would be tied to the 
character and case filters.

> RFE: stopword filter without lowercase side-effect
> --
>
> Key: LUCENE-4284
> URL: https://issues.apache.org/jira/browse/LUCENE-4284
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Sam Halliday
>Priority: Minor
>
> It would appear that accept()-time lowercasing of Tokens is not favourable 
> anymore, due to the @Deprecation of the only constructor in StopFilter that 
> allows this.
> Please support some way to allow stop-word removal without lowercasing the 
> output:
>   http://stackoverflow.com/questions/1185

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4281) Delegate to default thread factory in NamedThreadFactory

2012-08-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427328#comment-13427328
 ] 

Robert Muir commented on LUCENE-4281:
-

+1 to the patch: the forbidden check is a 2nd priority. It can be a separate 
.txt file with its own ant fileset.

> Delegate to default thread factory in NamedThreadFactory
> 
>
> Key: LUCENE-4281
> URL: https://issues.apache.org/jira/browse/LUCENE-4281
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 3.6.1, 4.0, 5.0
>Reporter: Simon Willnauer
>Priority: Minor
> Fix For: 4.0, 5.0, 3.6.2
>
> Attachments: LUCENE-4281.patch
>
>
> Currently we state that we yield the same behavior as 
> Executors#defaultThreadFactory(), but this behavior could change over time 
> even if it is compatible. We should just delegate to the default thread 
> factory instead of creating the threads ourselves.
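
A rough sketch of the delegation idea (not Lucene's actual NamedThreadFactory): let Executors.defaultThreadFactory() create and configure the thread, and only override the name afterwards.

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public final class DelegatingNamedThreadFactory implements ThreadFactory {
  private final ThreadFactory delegate = Executors.defaultThreadFactory();
  private final AtomicInteger counter = new AtomicInteger(1);
  private final String prefix;

  public DelegatingNamedThreadFactory(String prefix) {
    this.prefix = prefix;
  }

  @Override
  public Thread newThread(Runnable r) {
    Thread t = delegate.newThread(r); // group, priority, daemon flag come from the default factory
    t.setName(prefix + "-" + counter.getAndIncrement());
    return t;
  }
}
{code}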

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4282:


Attachment: LUCENE-4282.patch

Here's a patch, with Uwe's test.

The float comparison is wasted CPU for FuzzyQuery, as you already know it's 
accepted by the automaton.

But the deprecated SlowFuzzyQuery in sandbox needs this, because it has crazier 
logic. So it overrides the logic and does the float comparison. We should 
really remove that one from trunk since it's been deprecated since 4.x; that will 
make it easier to clean this up to be much simpler.


> Automaton Fuzzy Query doesn't deliver all results
> -
>
> Key: LUCENE-4282
> URL: https://issues.apache.org/jira/browse/LUCENE-4282
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Johannes Christen
>Assignee: Robert Muir
>  Labels: newbie
> Attachments: LUCENE-4282-tests.patch, LUCENE-4282.patch, 
> ModifiedFuzzyTermsEnum.java, ModifiedFuzzyTermsEnum.java
>
>
> Having a small index with n documents where each document has one of the 
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR 
> which have an edit distance of 2 as well are missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-08-02 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427322#comment-13427322
 ] 

Mark Harwood commented on LUCENE-4069:
--

Will do.

> Segment-level Bloom filters for a 2 x speed up on rare term searches
> 
>
> Key: LUCENE-4069
> URL: https://issues.apache.org/jira/browse/LUCENE-4069
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 3.6, 4.0-ALPHA
>Reporter: Mark Harwood
>Assignee: Mark Harwood
>Priority: Minor
> Fix For: 4.0
>
> Attachments: 4069Failure.zip, BloomFilterPostingsBranch4x.patch, 
> LUCENE-4069-tryDeleteDocument.patch, LUCENE-4203.patch, 
> MHBloomFilterOn3.6Branch.patch, PKLookupUpdatePerfTest.java, 
> PKLookupUpdatePerfTest.java, PKLookupUpdatePerfTest.java, 
> PKLookupUpdatePerfTest.java, PrimaryKeyPerfTest40.java
>
>
> An addition to each segment which stores a Bloom filter for selected fields 
> in order to give fast-fail to term searches, helping avoid wasted disk access.
> Best suited for low-frequency fields e.g. primary keys on big indexes with 
> many segments but also speeds up general searching in my tests.
> Overview slideshow here: 
> http://www.slideshare.net/MarkHarwood/lucene-bloomfilteredsegments
> Benchmarks based on Wikipedia content here: http://goo.gl/X7QqU
> Patch based on 3.6 codebase attached.
> There are no 3.6 API changes currently - to play just add a field with "_blm" 
> on the end of the name to invoke special indexing/querying capability. 
> Clearly a new Field or schema declaration(!) would need adding to APIs to 
> configure the service properly.
> Also, a patch for Lucene4.0 codebase introducing a new PostingsFormat

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-08-02 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427318#comment-13427318
 ] 

Adrien Grand commented on LUCENE-4069:
--

Mark, is there a reason why this patch hasn't been committed to trunk too?

> Segment-level Bloom filters for a 2 x speed up on rare term searches
> 
>
> Key: LUCENE-4069
> URL: https://issues.apache.org/jira/browse/LUCENE-4069
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 3.6, 4.0-ALPHA
>Reporter: Mark Harwood
>Assignee: Mark Harwood
>Priority: Minor
> Fix For: 4.0
>
> Attachments: 4069Failure.zip, BloomFilterPostingsBranch4x.patch, 
> LUCENE-4069-tryDeleteDocument.patch, LUCENE-4203.patch, 
> MHBloomFilterOn3.6Branch.patch, PKLookupUpdatePerfTest.java, 
> PKLookupUpdatePerfTest.java, PKLookupUpdatePerfTest.java, 
> PKLookupUpdatePerfTest.java, PrimaryKeyPerfTest40.java
>
>
> An addition to each segment which stores a Bloom filter for selected fields 
> in order to give fast-fail to term searches, helping avoid wasted disk access.
> Best suited for low-frequency fields e.g. primary keys on big indexes with 
> many segments but also speeds up general searching in my tests.
> Overview slideshow here: 
> http://www.slideshare.net/MarkHarwood/lucene-bloomfilteredsegments
> Benchmarks based on Wikipedia content here: http://goo.gl/X7QqU
> Patch based on 3.6 codebase attached.
> There are no 3.6 API changes currently - to play just add a field with "_blm" 
> on the end of the name to invoke special indexing/querying capability. 
> Clearly a new Field or schema declaration(!) would need adding to APIs to 
> configure the service properly.
> Also, a patch for Lucene4.0 codebase introducing a new PostingsFormat

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-08-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427317#comment-13427317
 ] 

Robert Muir commented on LUCENE-4069:
-

Hi Mark: I noticed this was committed only to the 4.x branch. 

Can you also merge the change to trunk?


> Segment-level Bloom filters for a 2 x speed up on rare term searches
> 
>
> Key: LUCENE-4069
> URL: https://issues.apache.org/jira/browse/LUCENE-4069
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 3.6, 4.0-ALPHA
>Reporter: Mark Harwood
>Assignee: Mark Harwood
>Priority: Minor
> Fix For: 4.0
>
> Attachments: 4069Failure.zip, BloomFilterPostingsBranch4x.patch, 
> LUCENE-4069-tryDeleteDocument.patch, LUCENE-4203.patch, 
> MHBloomFilterOn3.6Branch.patch, PKLookupUpdatePerfTest.java, 
> PKLookupUpdatePerfTest.java, PKLookupUpdatePerfTest.java, 
> PKLookupUpdatePerfTest.java, PrimaryKeyPerfTest40.java
>
>
> An addition to each segment which stores a Bloom filter for selected fields 
> in order to give fast-fail to term searches, helping avoid wasted disk access.
> Best suited for low-frequency fields e.g. primary keys on big indexes with 
> many segments but also speeds up general searching in my tests.
> Overview slideshow here: 
> http://www.slideshare.net/MarkHarwood/lucene-bloomfilteredsegments
> Benchmarks based on Wikipedia content here: http://goo.gl/X7QqU
> Patch based on 3.6 codebase attached.
> There are no 3.6 API changes currently - to play just add a field with "_blm" 
> on the end of the name to invoke special indexing/querying capability. 
> Clearly a new Field or schema declaration(!) would need adding to APIs to 
> configure the service properly.
> Also, a patch for Lucene4.0 codebase introducing a new PostingsFormat

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.6.0_33) - Build # 106 - Failure!

2012-08-02 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux/106/
Java: 32bit/jdk1.6.0_33 -server -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 10304 lines...]
javadocs-lint:

[...truncated 6 lines...]
  [javadoc] Generating Javadoc
  [javadoc] Javadoc execution
  [javadoc] Loading source files for package org.apache.lucene...
  [javadoc] Loading source files for package org.apache.lucene.analysis...
  [javadoc] Loading source files for package 
org.apache.lucene.analysis.tokenattributes...
  [javadoc] Loading source files for package org.apache.lucene.codecs...
  [javadoc] Loading source files for package 
org.apache.lucene.codecs.appending...
  [javadoc] Loading source files for package org.apache.lucene.codecs.bloom...
  [javadoc] Loading source files for package 
org.apache.lucene.codecs.intblock...
  [javadoc] Loading source files for package 
org.apache.lucene.codecs.lucene3x...
  [javadoc] Loading source files for package 
org.apache.lucene.codecs.lucene40...
  [javadoc] Loading source files for package 
org.apache.lucene.codecs.lucene40.values...
  [javadoc] Loading source files for package org.apache.lucene.codecs.memory...
  [javadoc] Loading source files for package 
org.apache.lucene.codecs.perfield...
  [javadoc] Loading source files for package org.apache.lucene.codecs.pulsing...
  [javadoc] Loading source files for package org.apache.lucene.codecs.sep...
  [javadoc] Loading source files for package 
org.apache.lucene.codecs.simpletext...
  [javadoc] Loading source files for package org.apache.lucene.document...
  [javadoc] Loading source files for package org.apache.lucene.index...
  [javadoc] Loading source files for package org.apache.lucene.search...
  [javadoc] Loading source files for package 
org.apache.lucene.search.payloads...
  [javadoc] Loading source files for package 
org.apache.lucene.search.similarities...
  [javadoc] Loading source files for package org.apache.lucene.search.spans...
  [javadoc] Loading source files for package org.apache.lucene.store...
  [javadoc] Loading source files for package org.apache.lucene.util...
  [javadoc] Loading source files for package org.apache.lucene.util.automaton...
  [javadoc] Loading source files for package org.apache.lucene.util.fst...
  [javadoc] Loading source files for package org.apache.lucene.util.hash...
  [javadoc] Loading source files for package org.apache.lucene.util.mutable...
  [javadoc] Loading source files for package org.apache.lucene.util.packed...
  [javadoc] Constructing Javadoc information...
  [javadoc] Standard Doclet version 1.6.0_33
  [javadoc] Building tree for all the packages and classes...
  [javadoc] 
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/checkout/lucene/core/src/java/org/apache/lucene/codecs/bloom/BloomFilterFactory.java:61:
 warning - @return tag has no arguments.
  [javadoc] 
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/checkout/lucene/core/src/java/org/apache/lucene/util/FuzzySet.java:101:
 warning - @return tag has no arguments.
  [javadoc] 
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/checkout/lucene/core/src/java/org/apache/lucene/util/FuzzySet.java:219:
 warning - @param argument "bytes" is not a parameter name.
  [javadoc] 
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/checkout/lucene/core/src/java/org/apache/lucene/util/FuzzySet.java:236:
 warning - @param argument "targetSaturation" is not a parameter name.
  [javadoc] Building index for all the packages and classes...
  [javadoc] Building index for all classes...
  [javadoc] Generating 
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/checkout/lucene/build/docs/core/stylesheet.css...
  [javadoc] 4 warnings

BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/checkout/build.xml:47: The 
following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/checkout/lucene/common-build.xml:621:
 The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/checkout/lucene/core/build.xml:49:
 The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/checkout/lucene/common-build.xml:1480:
 Javadocs warnings were found!

Total time: 7 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: ToParentBlockJoinQuery vs filtered search

2012-08-02 Thread Martijn v Groningen
Hi Mikhail,

I'd love to have a look at the patch. However, I wasn't involved at all with
reviewing the work, I just pointed you to the relevant issue, and I'm
currently quite busy, so on my end it will take time until I manage
to get to reviewing it.
Maybe someone else involved with the patch can take this over to speed things up.

Martijn

On 2 August 2012 08:51, Mikhail Khludnev  wrote:
> Martin,
> Half a year ago you asked me to attach my work to SOLR-3076. From my point of
> view the latest patch is ready to be considered for commit. I want to add "override"
> support for block indexing, but I'm not really sure that it's needed for
> anyone.
>
> Could you please provide feedback for the latest patch, and/or move it forward
> or back?
>
>  Regards
>
>> by Martijn v Groningen-2 on Feb 06, 2012; 7:57pm
>> URL:
>> http://lucene.472066.n3.nabble.com/ToParentBlockJoinQuery-vs-filtered-search-tp3717911p3719987.html
>
>> Hi Mikhail,
>
>> There is already an issue open for supporting block join in Solr:
> https://issues.apache.org/jira/browse/SOLR-3076
>
>> Maybe you can attach your work in that issue and we can iterate from
>> there.
>
>> Martijn
>
> --
> Sincerely yours
> Mikhail Khludnev
> Tech Lead
> Grid Dynamics
>
>
>



-- 
Met vriendelijke groet,

Martijn van Groningen

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-08-02 Thread Mark Harwood (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Harwood resolved LUCENE-4069.
--

Resolution: Fixed
  Assignee: Mark Harwood

Committed to 4.0 branch, revision 1368442

> Segment-level Bloom filters for a 2 x speed up on rare term searches
> 
>
> Key: LUCENE-4069
> URL: https://issues.apache.org/jira/browse/LUCENE-4069
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 3.6, 4.0-ALPHA
>Reporter: Mark Harwood
>Assignee: Mark Harwood
>Priority: Minor
> Fix For: 4.0
>
> Attachments: 4069Failure.zip, BloomFilterPostingsBranch4x.patch, 
> LUCENE-4069-tryDeleteDocument.patch, LUCENE-4203.patch, 
> MHBloomFilterOn3.6Branch.patch, PKLookupUpdatePerfTest.java, 
> PKLookupUpdatePerfTest.java, PKLookupUpdatePerfTest.java, 
> PKLookupUpdatePerfTest.java, PrimaryKeyPerfTest40.java
>
>
> An addition to each segment which stores a Bloom filter for selected fields 
> in order to give fast-fail to term searches, helping avoid wasted disk access.
> Best suited for low-frequency fields e.g. primary keys on big indexes with 
> many segments but also speeds up general searching in my tests.
> Overview slideshow here: 
> http://www.slideshare.net/MarkHarwood/lucene-bloomfilteredsegments
> Benchmarks based on Wikipedia content here: http://goo.gl/X7QqU
> Patch based on 3.6 codebase attached.
> There are no 3.6 API changes currently - to play just add a field with "_blm" 
> on the end of the name to invoke special indexing/querying capability. 
> Clearly a new Field or schema declaration(!) would need adding to APIs to 
> configure the service properly.
> Also, a patch for Lucene4.0 codebase introducing a new PostingsFormat

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4225) New FixedPostingsFormat for less overhead than SepPostingsFormat

2012-08-02 Thread Han Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427304#comment-13427304
 ] 

Han Jiang commented on LUCENE-4225:
---

Just hit an error on BlockPostingsFormat; this should reproduce on the latest branch:

{noformat}
ant test-core -Dtestcase=TestGraphTokenizers 
-Dtests.method=testDoubleMockGraphTokenFilterRandom 
-Dtests.seed=1FD78436D5E26B9A -Dtests.postingsformat=Block
{noformat}

> New FixedPostingsFormat for less overhead than SepPostingsFormat
> 
>
> Key: LUCENE-4225
> URL: https://issues.apache.org/jira/browse/LUCENE-4225
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Attachments: LUCENE-4225-on-rev-1362013.patch, LUCENE-4225.patch, 
> LUCENE-4225.patch, LUCENE-4225.patch, LUCENE-4225.patch
>
>
> I've worked out the start at a new postings format that should have
> less overhead for fixed-int[] encoders (For,PFor)... using ideas from
> the old bulk branch, and new ideas from Robert.
> It's only a start: there's no payloads support yet, and I haven't run
> Lucene's tests with it, except for one new test I added that tries to
> be a thorough PostingsFormat tester (to make it easier to create new
> postings formats).  It does pass luceneutil's performance test, so
> it's at least able to run those queries correctly...
> Like Lucene40, it uses two files (though once we add payloads it may
> be 3).  The .doc file interleaves doc delta and freq blocks, and .pos
> has position delta blocks.  Unlike sep, blocks are NOT shared across
> terms; instead, it uses block encoding if there are enough ints to
> encode, else the same Lucene40 vInt format.  This means low-freq terms
> (< 128 = current default block size) are always vInts, and high-freq
> terms will have some number of blocks, with a vInt final block.
> Skip points are only recorded at block starts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4284) RFE: stopword filter without lowercase side-effect

2012-08-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427301#comment-13427301
 ] 

Robert Muir commented on LUCENE-4284:
-

The constructor is deprecated because you should set the ignoreCase
property on the CharArraySet (the stopwords list itself) that you pass in.

This is described in the javadocs: basically StopFilter does not have any case 
sensitivity options.
Case sensitivity is instead controlled by the set (see makeStopSet etc.; you can construct a 
case-sensitive one).

{noformat}
   * If stopWords is an instance of {@link CharArraySet} (true if
   * makeStopSet() was used to construct the set) it will be 
directly used
   * and ignoreCase will be ignored since CharArraySet
   * directly controls case sensitivity.
   * 
   * If stopWords is not an instance of {@link CharArraySet},
   * a new CharArraySet will be constructed and ignoreCase will be
   * used to specify the case sensitivity of that set.
   * @deprecated Use {@link #StopFilter(Version, TokenStream, Set)} instead
{noformat}
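
For illustration, a minimal sketch of the pattern described above: build the case sensitivity into the stop set itself and hand that set to StopFilter. Package names and exact signatures are from memory (Lucene 4.0-era analyzers-common), so treat this as a sketch rather than canonical usage; {{input}} is whatever TokenStream is being wrapped.

{code:java}
import java.util.Arrays;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.util.Version;

public class CaseSensitiveStopExample {
  // Removes "The" and "A" (exact case) without lowercasing the remaining tokens.
  static TokenStream removeStopsCaseSensitive(TokenStream input) {
    // ignoreCase = false: the set, not StopFilter, decides case sensitivity,
    // so "The" is dropped while "the" passes through untouched.
    CharArraySet stops = StopFilter.makeStopSet(Version.LUCENE_40, Arrays.asList("The", "A"), false);
    return new StopFilter(Version.LUCENE_40, input, stops);
  }
}
{code}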


> RFE: stopword filter without lowercase side-effect
> --
>
> Key: LUCENE-4284
> URL: https://issues.apache.org/jira/browse/LUCENE-4284
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Sam Halliday
>Priority: Minor
>
> It would appear that accept()-time lowercasing of Tokens is not favourable 
> anymore, due to the @Deprecation of the only constructor in StopFilter that 
> allows this.
> Please support some way to allow stop-word removal without lowercasing the 
> output:
>   http://stackoverflow.com/questions/1185

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4280) TestReaderClosed leaks threads

2012-08-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427296#comment-13427296
 ] 

Robert Muir commented on LUCENE-4280:
-

Do we know if the problem happens from the method 'test' or from 
'testReaderChaining'?

Here are my notes, basically for 'test'. I think we could apply the same logic 
to 'testReaderChaining',
but I want Uwe's opinion:

{noformat}
@@ -65,6 +66,17 @@
   searcher.search(query, 5);
 } catch (AlreadyClosedException ace) {
   // expected
+} finally {
+  // we may have wrapped the reader1 in newSearcher, meaning we created 
reader2(reader1)
+  // but we only closed the inner reader1, not the reader2 which is the 
one with the
+  // close hook to shut down the executor service.
+  //
+  // a better general solution is probably to fix 
LuceneTestCase.newSearcher to add 
+  // the close hook to the underlying reader that was passed in (reader1), 
however
+  // if we do that, is this test still just as good? we will get an 
exception from
+  // IndexSearcher instead?
+  IOUtils.close(searcher.getIndexReader());
 }
{noformat}

I think we need Uwe to review :)


> TestReaderClosed leaks threads
> --
>
> Key: LUCENE-4280
> URL: https://issues.apache.org/jira/browse/LUCENE-4280
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Dawid Weiss
>Assignee: Robert Muir
>Priority: Minor
>
> {code}
> -ea
> -Dtests.seed=9449688B90185FA5
> -Dtests.iters=100
> {code}
> reproduces 100% for me, multiple thread leak out from newSearcher's internal 
> threadfactory:
> {code}
> Aug 02, 2012 8:46:05 AM com.carrotsearch.randomizedtesting.ThreadLeakControl 
> checkThreadLeaks
> SEVERE: 6 threads leaked from SUITE scope at 
> org.apache.lucene.index.TestReaderClosed: 
>1) Thread[id=13, name=LuceneTestCase-1-thread-1, state=WAITING, 
> group=TGRP-TestReaderClosed]
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
>2) Thread[id=15, name=LuceneTestCase-3-thread-1, state=WAITING, 
> group=TGRP-TestReaderClosed]
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
>3) Thread[id=17, name=LuceneTestCase-5-thread-1, state=WAITING, 
> group=TGRP-TestReaderClosed]
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
>4) Thread[id=18, name=LuceneTestCase-6-thread-1, state=WAITING, 
> group=TGRP-TestReaderClosed]
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.

[jira] [Created] (LUCENE-4284) RFE: stopword filter without lowercase side-effect

2012-08-02 Thread Sam Halliday (JIRA)
Sam Halliday created LUCENE-4284:


 Summary: RFE: stopword filter without lowercase side-effect
 Key: LUCENE-4284
 URL: https://issues.apache.org/jira/browse/LUCENE-4284
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Sam Halliday
Priority: Minor


It would appear that accept()-time lowercasing of Tokens is not favourable 
anymore, due to the @Deprecation of the only constructor in StopFilter that 
allows this.

Please support some way to allow stop-word removal without lowercasing the 
output:

  http://stackoverflow.com/questions/1185


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-3527) Optimize ignores maxSegments in distributed environment

2012-08-02 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reassigned SOLR-3527:
-

Assignee: Mark Miller

> Optimize ignores maxSegments in distributed environment
> ---
>
> Key: SOLR-3527
> URL: https://issues.apache.org/jira/browse/SOLR-3527
> Project: Solr
>  Issue Type: Bug
>  Components: SearchComponents - other
>Affects Versions: 4.0-ALPHA
>Reporter: Andy Laird
>Assignee: Mark Miller
>
> Send the following command to a Solr server with many segments in a 
> multi-shard, multi-server environment:
> curl 
> "http://localhost:8080/solr/update?optimize=true&waitFlush=true&maxSegments=6&distrib=false";
> The local server will end up with the number of segments at 6, as requested, 
> but all other shards in the index will be optimized with maxSegments=1, which 
> takes far longer to complete.  All shards should be optimized to the 
> requested value of 6.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3985) Refactor support for thread leaks

2012-08-02 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427284#comment-13427284
 ] 

Mark Miller commented on LUCENE-3985:
-

Hopefully I can look at my piece of this today or tomorrow.

> Refactor support for thread leaks
> -
>
> Key: LUCENE-3985
> URL: https://issues.apache.org/jira/browse/LUCENE-3985
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/test
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-3985.patch, LUCENE-3985.patch, LUCENE-3985.patch, 
> LUCENE-3985.patch
>
>
> This will be duplicated in the runner and in LuceneTestCase; try to 
> consolidate.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427276#comment-13427276
 ] 

Robert Muir commented on LUCENE-4282:
-

Johannes: we will have the same scoring when I say 'removing floats', only less 
code actually (we can remove this entire if-block, I think).

The only floats will be what is put into the boost attribute, but no 
*comparisons* against floats; the latter is what causes the bug.
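
To make the effect concrete with the terms from this report, here is a tiny standalone illustration. It assumes the scaled-similarity formula similarity = 1 - ed / min(|indexTerm|, |queryTerm|) and minSimilarity = 1 - maxEdits / |queryTerm|; those are my reading of the enum, not code copied from it.

{code:java}
public class FuzzyFloatDemo {
  public static void main(String[] args) {
    String query = "WEBER";   // 5 chars, maxEdits = 2
    int maxEdits = 2;
    float minSimilarity = 1f - ((float) maxEdits / query.length());                      // 0.6

    String indexTerm = "WEB"; // edit distance to WEBER is 2
    int ed = 2;
    float similarity = 1f - ((float) ed / Math.min(indexTerm.length(), query.length())); // ~0.33

    System.out.println("ed <= maxEdits      : " + (ed <= maxEdits));                // true  -> should match
    System.out.println("similarity > minSim : " + (similarity > minSimilarity));    // false -> float test drops WEB
  }
}
{code}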

> Automaton Fuzzy Query doesn't deliver all results
> -
>
> Key: LUCENE-4282
> URL: https://issues.apache.org/jira/browse/LUCENE-4282
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Johannes Christen
>Assignee: Robert Muir
>  Labels: newbie
> Attachments: LUCENE-4282-tests.patch, ModifiedFuzzyTermsEnum.java, 
> ModifiedFuzzyTermsEnum.java
>
>
> Having a small index with n documents where each document has one of the 
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR 
> which have an edit distance of 2 as well are missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Johannes Christen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427269#comment-13427269
 ] 

Johannes Christen commented on LUCENE-4282:
---

Hi Robert. Yes, this might be right, but I am still using the similarity float-based 
stuff, since 2 edits on a three-letter word is much more of a difference to me 
than 2 edits on a 10-letter word.
If you apply the changes I sent, it will work for both cases.


> Automaton Fuzzy Query doesn't deliver all results
> -
>
> Key: LUCENE-4282
> URL: https://issues.apache.org/jira/browse/LUCENE-4282
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Johannes Christen
>Assignee: Robert Muir
>  Labels: newbie
> Attachments: LUCENE-4282-tests.patch, ModifiedFuzzyTermsEnum.java, 
> ModifiedFuzzyTermsEnum.java
>
>
> Having a small index with n documents where each document has one of the 
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR 
> which have an edit distance of 2 as well are missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-4282:
--

Attachment: LUCENE-4282-tests.patch

Robert Muir: I added my tests as a patch. TestFuzzyQuery is currently not the 
best test we have: all terms there have equal length, which helps here. I added 
some more terms (longer ones, too); the 2 shorter ones still fail without a fix.

I am now away, I hope that helps.

> Automaton Fuzzy Query doesn't deliver all results
> -
>
> Key: LUCENE-4282
> URL: https://issues.apache.org/jira/browse/LUCENE-4282
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Johannes Christen
>Assignee: Robert Muir
>  Labels: newbie
> Attachments: LUCENE-4282-tests.patch, ModifiedFuzzyTermsEnum.java, 
> ModifiedFuzzyTermsEnum.java
>
>
> Having a small index with n documents where each document has one of the 
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR 
> which have an edit distance of 2 as well are missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427264#comment-13427264
 ] 

Robert Muir commented on LUCENE-4282:
-

Thanks for reporting and looking into this!

I think the bug is just the use of floats at all in this enum.
{noformat}
-if (similarity > minSimilarity) {
+if (ed <= maxEdits) {
   boostAtt.setBoost((similarity - minSimilarity) * scale_factor);
   //System.out.println("  yes");
   return AcceptStatus.YES;
 } else {
+  System.out.println("reject: " + term.utf8ToString());
   return AcceptStatus.NO;
 }
{noformat}

This seems to fix it for me. We should remove all the float crap from this enum;
we don't need it, only a slower deprecated class in the sandbox needs it.



> Automaton Fuzzy Query doesn't deliver all results
> -
>
> Key: LUCENE-4282
> URL: https://issues.apache.org/jira/browse/LUCENE-4282
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Johannes Christen
>Assignee: Robert Muir
>  Labels: newbie
> Attachments: ModifiedFuzzyTermsEnum.java, ModifiedFuzzyTermsEnum.java
>
>
> Having a small index with n documents where each document has one of the 
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR 
> which have an edit distance of 2 as well are missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427262#comment-13427262
 ] 

Uwe Schindler commented on LUCENE-4282:
---

Thanks for the help. We are starting to investigate what's wrong!

I did another test in parallel:
{code:java}
query.setRewriteMethod(FuzzyQuery.SCORING_BOOLEAN_QUERY_REWRITE);
{code}

With that one it is also failing, so the boost attribute itself is not the 
problem, because this rewrite method does not use it at all (no PriorityQueue).

Also, the Automaton is correct: if you run the terms through the automaton, they all 
pass:

{code:java}
LevenshteinAutomata builder = new LevenshteinAutomata("EBER", true);
Automaton a = builder.toAutomaton(2);
a = BasicOperations.concatenate(BasicAutomata.makeChar('W'), a);
System.out.println(BasicOperations.run(a, "WBR"));
System.out.println(BasicOperations.run(a, "WEB"));
System.out.println(BasicOperations.run(a, "WEBE"));
System.out.println(BasicOperations.run(a, "WEBER"));
{code}

> Automaton Fuzzy Query doesn't deliver all results
> -
>
> Key: LUCENE-4282
> URL: https://issues.apache.org/jira/browse/LUCENE-4282
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Johannes Christen
>Assignee: Robert Muir
>  Labels: newbie
> Attachments: ModifiedFuzzyTermsEnum.java, ModifiedFuzzyTermsEnum.java
>
>
> Having a small index with n documents where each document has one of the 
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR 
> which have an edit distance of 2 as well are missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Johannes Christen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427254#comment-13427254
 ] 

Johannes Christen edited comment on LUCENE-4282 at 8/2/12 12:01 PM:


Well. I think I found the solution.
You were right Uwe. It happens in the FuzzyTermsEnum:AutomatonFuzzyTermsEnum 
class.
Calculating the similarity in the accept() method is based on the offset of the 
smallest length of request term and index term.

I attached my ModifiedFuzzyTermEnum class, where you can find the modification 
which makes it work.
BTW. There are some more modifications, fixing bugs in calculating the 
similarity out of the edit distance and vice versa.
The modification of the boost factor was only necessary for my boolean address 
search approach and possibly doesn't apply here.
The modified bits are marked with USERCODE_BEGIN and USERCODE_END tags.




  was (Author: superjo):
Well. I think I found the solution.
You were right Uwe. It happens in the FuzzyTermsEnum:AutomatonFuzzyTermsEnum 
class.
Calculating the similarity in the accept() method is based on the offset of the 
smallest length of request term and index term.

I will attach my ModifiedFuzzyTermEnum class, where you can find the 
modification which makes it work.
BTW. There are some more modifications, fixing bugs in calculating the 
similarity out of the edit distance and vise versa.
The modification of the boost factor was only necessary for my boolean address 
search approach and possibly doesn't apply here.
The modified bits are marked with USERCODE_BEGIN and USERCODE_END tags.



  
> Automaton Fuzzy Query doesn't deliver all results
> -
>
> Key: LUCENE-4282
> URL: https://issues.apache.org/jira/browse/LUCENE-4282
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Johannes Christen
>Assignee: Robert Muir
>  Labels: newbie
> Attachments: ModifiedFuzzyTermsEnum.java, ModifiedFuzzyTermsEnum.java
>
>
> Having a small index with n documents where each document has one of the 
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR 
> which have an edit distance of 2 as well are missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Johannes Christen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johannes Christen updated LUCENE-4282:
--

Comment: was deleted

(was: Modification of FuzzyTermsEnum class fixing issue LUCENE-4282)

> Automaton Fuzzy Query doesn't deliver all results
> -
>
> Key: LUCENE-4282
> URL: https://issues.apache.org/jira/browse/LUCENE-4282
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Johannes Christen
>Assignee: Robert Muir
>  Labels: newbie
> Attachments: ModifiedFuzzyTermsEnum.java, ModifiedFuzzyTermsEnum.java
>
>
> Having a small index with n documents where each document has one of the 
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR 
> which have an edit distance of 2 as well are missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Johannes Christen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johannes Christen updated LUCENE-4282:
--

Attachment: ModifiedFuzzyTermsEnum.java

Modification of FuzzyTermsEnum class fixing issue LUCENE-4282

> Automaton Fuzzy Query doesn't deliver all results
> -
>
> Key: LUCENE-4282
> URL: https://issues.apache.org/jira/browse/LUCENE-4282
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Johannes Christen
>Assignee: Robert Muir
>  Labels: newbie
> Attachments: ModifiedFuzzyTermsEnum.java, ModifiedFuzzyTermsEnum.java
>
>
> Having a small index with n documents where each document has one of the 
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR 
> which have an edit distance of 2 as well are missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Johannes Christen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427254#comment-13427254
 ] 

Johannes Christen commented on LUCENE-4282:
---

Well. I think I found the solution.
You were right Uwe. It happens in the FuzzyTermsEnum:AutomatonFuzzyTermsEnum 
class.
Calculating the similarity in the accept() method is based on the offset of the 
smallest length of request term and index term.

I will attach my ModifiedFuzzyTermEnum class, where you can find the 
modification which makes it work.
BTW. There are some more modifications, fixing bugs in calculating the 
similarity out of the edit distance and vice versa.
The modification of the boost factor was only necessary for my boolean address 
search approach and possibly doesn't apply here.
The modified bits are marked with USERCODE_BEGIN and USERCODE_END tags.




> Automaton Fuzzy Query doesn't deliver all results
> -
>
> Key: LUCENE-4282
> URL: https://issues.apache.org/jira/browse/LUCENE-4282
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Johannes Christen
>Assignee: Robert Muir
>  Labels: newbie
> Attachments: ModifiedFuzzyTermsEnum.java, ModifiedFuzzyTermsEnum.java
>
>
> Having a small index with n documents where each document has one of the 
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR 
> which have an edit distance of 2 as well are missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Johannes Christen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johannes Christen updated LUCENE-4282:
--

Attachment: ModifiedFuzzyTermsEnum.java

Modification of FuzzyTermEnum class fixing issue LUCENE-4282

> Automaton Fuzzy Query doesn't deliver all results
> -
>
> Key: LUCENE-4282
> URL: https://issues.apache.org/jira/browse/LUCENE-4282
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Johannes Christen
>Assignee: Robert Muir
>  Labels: newbie
> Attachments: ModifiedFuzzyTermsEnum.java
>
>
> Having a small index with n documents where each document has one of the 
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR 
> which have an edit distance of 2 as well are missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-3701) Solr Spellcheck for words with apostrophe

2012-08-02 Thread Shri Kanishka (JIRA)
Shri Kanishka created SOLR-3701:
---

 Summary: Solr Spellcheck for words with apostrophe
 Key: SOLR-3701
 URL: https://issues.apache.org/jira/browse/SOLR-3701
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 3.5
 Environment: All
Reporter: Shri Kanishka


Solr Spellcheck is incorrect for words with an apostrophe.

http://10.224.64.10/solr5/select?q=pandora's star 
&spellcheck=true&spellcheck.collate=true&spellcheck.count=5

The result is (the response XML markup was stripped from this message; only the element values survive):

  2
  6
  13
  pandora's
  sandra
  spell:pandora's's star

The textSpell configuration from the schema was included here, but its XML was stripped from this message.

But the same query, when given in the &spellcheck.q parameter, works:
http://10.224.64.10/solr5/select?q=spell:pandora's 
star&spellcheck=true&spellcheck.collate=true&spellcheck.q=pandora's star

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4283) Support more frequent skip with Block Postings Format

2012-08-02 Thread Han Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Han Jiang updated LUCENE-4283:
--

Attachment: LUCENE-4283-buggy.patch

Oh, forgot to revert TestPF.

> Support more frequent skip with Block Postings Format
> -
>
> Key: LUCENE-4283
> URL: https://issues.apache.org/jira/browse/LUCENE-4283
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Han Jiang
>Priority: Minor
> Attachments: LUCENE-4283-buggy.patch, LUCENE-4283-buggy.patch
>
>
> This change works on the new bulk branch.
> Currently, our BlockPostingsFormat only supports skipInterval==blockSize. 
> Every time the skipper reaches the last level 0 skip point, we'll have to 
> decode a whole block to read doc/freq data. Also,  a higher level skip list 
> will be created only for those df>blockSize^k, which means for most terms, 
> skipping will just be a linear scan. If we increase current blockSize for 
> better bulk i/o performance, current skip setting will be a bottleneck. 
> For ForPF, the encoded block can be easily splitted if we set 
> skipInterval=32*k. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4283) Support more frequent skip with Block Postings Format

2012-08-02 Thread Han Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Han Jiang updated LUCENE-4283:
--

Attachment: LUCENE-4283-buggy.patch

An initial try to support partial decode & skipInterval == 32. Details about 
the skip format are mentioned in BlockSkipWriter. This patch works against the 
pfor-3892 branch, at revision 1365112.

It passes TestPostingsFormat, but still fails to pass CheckIndex. Mike, these 
test seeds should fail the patch.
{noformat}

ant test-core -Dtestcase=TestLongPostings 
-Dtests.method=testLongPostingsNoPositions -Dtests.seed=EC8F49E9088B926C 
-Dtests.postingsformat=Block

ant test-core  -Dtestcase=TestCustomSearcherSort 
-Dtests.method=testFieldSortSingleSearcher -Dtests.seed=EC8F49E9088B926C 
-Dtests.postingsformat=Block 

{noformat}

> Support more frequent skip with Block Postings Format
> -
>
> Key: LUCENE-4283
> URL: https://issues.apache.org/jira/browse/LUCENE-4283
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Han Jiang
>Priority: Minor
> Attachments: LUCENE-4283-buggy.patch
>
>
> This change works on the new bulk branch.
> Currently, our BlockPostingsFormat only supports skipInterval==blockSize. 
> Every time the skipper reaches the last level 0 skip point, we'll have to 
> decode a whole block to read doc/freq data. Also,  a higher level skip list 
> will be created only for those df>blockSize^k, which means for most terms, 
> skipping will just be a linear scan. If we increase current blockSize for 
> better bulk i/o performance, current skip setting will be a bottleneck. 
> For ForPF, the encoded block can be easily splitted if we set 
> skipInterval=32*k. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4283) Support more frequent skip with Block Postings Format

2012-08-02 Thread Han Jiang (JIRA)
Han Jiang created LUCENE-4283:
-

 Summary: Support more frequent skip with Block Postings Format
 Key: LUCENE-4283
 URL: https://issues.apache.org/jira/browse/LUCENE-4283
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Han Jiang
Priority: Minor


This change works on the new bulk branch.

Currently, our BlockPostingsFormat only supports skipInterval==blockSize. Every 
time the skipper reaches the last level-0 skip point, we'll have to decode a 
whole block to read doc/freq data. Also, a higher-level skip list will be 
created only for terms with df>blockSize^k, which means for most terms, skipping 
will just be a linear scan. If we increase the current blockSize for better bulk 
I/O performance, the current skip setting will be a bottleneck. 

For ForPF, the encoded block can easily be split if we set 
skipInterval=32*k. 
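
Purely as illustration of the idea (a toy model, not the format implemented in BlockSkipWriter; all names below are made up), buffering a block while recording a skip entry every skipInterval documents looks roughly like this:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Toy model: buffer up to BLOCK_SIZE doc ids and remember (docId, positionInBlock)
// every SKIP_INTERVAL docs, so a reader could jump into the block and decode only
// the tail instead of the whole block.
class ToySkipBuffer {
  static final int BLOCK_SIZE = 128;
  static final int SKIP_INTERVAL = 32;   // skipInterval = 32 * k, here k = 1

  final List<int[]> skipEntries = new ArrayList<int[]>();
  int buffered = 0;

  void addDoc(int docId) {
    if (buffered > 0 && buffered % SKIP_INTERVAL == 0) {
      skipEntries.add(new int[] { docId, buffered });
    }
    buffered++;
    if (buffered == BLOCK_SIZE) {
      // flush the encoded block plus its skip entries, then start a new block
      skipEntries.clear();
      buffered = 0;
    }
  }
}
{code}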

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Johannes Christen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427229#comment-13427229
 ] 

Johannes Christen commented on LUCENE-4282:
---

OK, I'll keep on digging in the code and come back when I've found something.

> Automaton Fuzzy Query doesn't deliver all results
> -
>
> Key: LUCENE-4282
> URL: https://issues.apache.org/jira/browse/LUCENE-4282
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Johannes Christen
>Assignee: Robert Muir
>  Labels: newbie
>
> Having a small index with n documents where each document has one of the 
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR 
> which have an edit distance of 2 as well are missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4281) Delegate to default thread factory in NamedThreadFactory

2012-08-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427230#comment-13427230
 ] 

Uwe Schindler commented on LUCENE-4281:
---

bq. This will require manual exclusion of that source file once the ban on 
Executors.defaultThreadFactory() is in

Then we need a separate forbiddenApis.txt file... :-)

> Delegate to default thread factory in NamedThreadFactory
> 
>
> Key: LUCENE-4281
> URL: https://issues.apache.org/jira/browse/LUCENE-4281
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 3.6.1, 4.0, 5.0
>Reporter: Simon Willnauer
>Priority: Minor
> Fix For: 4.0, 5.0, 3.6.2
>
> Attachments: LUCENE-4281.patch
>
>
> currently we state that we yield the same behavior as 
> Executors#defaultThreadFactory() but this behavior could change over time 
> even if it is compatible. We should just delegate to the default thread 
> factory instead of creating the threads ourself.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4281) Delegate to default thread factory in NamedThreadFactory

2012-08-02 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427228#comment-13427228
 ] 

Dawid Weiss commented on LUCENE-4281:
-

This will require manual exclusion of that source file once the ban on 
Executors.defaultThreadFactory() is in. An alternate route is to change the 
documentation and not claim compatibility with defaultThreadFactory, and instead 
just say that we create non-daemon threads with NORM_PRIORITY?

> Delegate to default thread factory in NamedThreadFactory
> 
>
> Key: LUCENE-4281
> URL: https://issues.apache.org/jira/browse/LUCENE-4281
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 3.6.1, 4.0, 5.0
>Reporter: Simon Willnauer
>Priority: Minor
> Fix For: 4.0, 5.0, 3.6.2
>
> Attachments: LUCENE-4281.patch
>
>
> currently we state that we yield the same behavior as 
> Executors#defaultThreadFactory() but this behavior could change over time 
> even if it is compatible. We should just delegate to the default thread 
> factory instead of creating the threads ourself.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4281) Delegate to default thread factory in NamedThreadFactory

2012-08-02 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427224#comment-13427224
 ] 

Simon Willnauer commented on LUCENE-4281:
-

bq. Uh, sorry – I see what you did now. Anything on your mind in particular 
when you talk about behavioral changes?

I don't have anything in mind; I just want to replace the logic with already existing 
logic that is "guaranteed" to be consistent with the documentation. This won't change 
anything, really.
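
For what it's worth, a minimal sketch of the delegation idea (not the attached patch; the class name and fields here are invented for illustration):

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Delegate thread creation to the JDK default factory and only rename the thread,
// so daemon flag and priority always stay consistent with Executors.defaultThreadFactory().
public final class DelegatingNamedThreadFactory implements ThreadFactory {
  private final ThreadFactory delegate = Executors.defaultThreadFactory();
  private final AtomicInteger counter = new AtomicInteger(1);
  private final String prefix;

  public DelegatingNamedThreadFactory(String prefix) {
    this.prefix = prefix;
  }

  @Override
  public Thread newThread(Runnable r) {
    Thread t = delegate.newThread(r);
    t.setName(prefix + "-" + counter.getAndIncrement());
    return t;
  }
}
{code}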



> Delegate to default thread factory in NamedThreadFactory
> 
>
> Key: LUCENE-4281
> URL: https://issues.apache.org/jira/browse/LUCENE-4281
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 3.6.1, 4.0, 5.0
>Reporter: Simon Willnauer
>Priority: Minor
> Fix For: 4.0, 5.0, 3.6.2
>
> Attachments: LUCENE-4281.patch
>
>
> currently we state that we yield the same behavior as 
> Executors#defaultThreadFactory() but this behavior could change over time 
> even if it is compatible. We should just delegate to the default thread 
> factory instead of creating the threads ourself.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427223#comment-13427223
 ] 

Uwe Schindler commented on LUCENE-4282:
---

I also added more terms that are in fact *longer* than WEBER (WEBERE and 
WEBERES), both are returned, only the shorter ones not. WBRE also works. I don't 
think the automaton is broken; it may be the FuzzyTermsEnum that does some 
stuff on top of AutomatonTermsEnum. We have to wait for Robert, he might 
understand what's going on.

> Automaton Fuzzy Query doesn't deliver all results
> -
>
> Key: LUCENE-4282
> URL: https://issues.apache.org/jira/browse/LUCENE-4282
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Johannes Christen
>  Labels: newbie
>
> Having a small index with n documents where each document has one of the 
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR 
> which have an edit distance of 2 as well are missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427223#comment-13427223
 ] 

Uwe Schindler edited comment on LUCENE-4282 at 8/2/12 10:06 AM:


I also added more terms that are in fact *longer* than WEBER (WEBERE and 
WEBERES), both are returned, only the shorter ones not. WBRE also works. I don't 
think the automaton is broken; it may be the FuzzyTermsEnum that does some 
stuff on top of AutomatonTermsEnum. We have to wait for Robert, he might 
understand what's going on.

  was (Author: thetaphi):
I also added more terms that are in fact *longer* than WEBER (WEBERE and 
WEBERES), both are returned, only the shorter ones now. WBRE also works. I dont 
think the automaton is broken, it may be the FuzzyTermsEnum that does some 
stuff on top of AutomatonTermsEnum. We have to wait for Robert, he might 
understand whats going on.
  
> Automaton Fuzzy Query doesn't deliver all results
> -
>
> Key: LUCENE-4282
> URL: https://issues.apache.org/jira/browse/LUCENE-4282
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Johannes Christen
>  Labels: newbie
>
> Having a small index with n documents where each document has one of the 
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR 
> which have an edit distance of 2 as well are missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-4282:
-

Assignee: Robert Muir

> Automaton Fuzzy Query doesn't deliver all results
> -
>
> Key: LUCENE-4282
> URL: https://issues.apache.org/jira/browse/LUCENE-4282
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Johannes Christen
>Assignee: Robert Muir
>  Labels: newbie
>
> Having a small index with n documents where each document has one of the 
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR 
> which have an edit distance of 2 as well are missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Johannes Christen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427220#comment-13427220
 ] 

Johannes Christen commented on LUCENE-4282:
---

Yes, I tried this as well. The prefix is not the problem either.
I suspect the error is deep in the automaton.


> Automaton Fuzzy Query doesn't deliver all results
> -
>
> Key: LUCENE-4282
> URL: https://issues.apache.org/jira/browse/LUCENE-4282
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Johannes Christen
>  Labels: newbie
>
> Having a small index with n documents where each document has one of the 
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR 
> which have an edit distance of 2 as well are missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427219#comment-13427219
 ] 

Uwe Schindler commented on LUCENE-4282:
---

The same happens if I disable transpositions, so the transposition-supporting 
automata are not the problem.

> Automaton Fuzzy Query doesn't deliver all results
> -
>
> Key: LUCENE-4282
> URL: https://issues.apache.org/jira/browse/LUCENE-4282
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Johannes Christen
>  Labels: newbie
>
> Having a small index with n documents where each document has one of the 
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR 
> which have an edit distance of 2 as well are missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427204#comment-13427204
 ] 

Uwe Schindler commented on LUCENE-4282:
---

There is indeed something strange; I have to wait for Robert to wake up. The 
following test fails (when added to TestFuzzyQuery.java):

{code:java}
  public void test2() throws Exception {
    Directory directory = newDirectory();
    RandomIndexWriter writer = new RandomIndexWriter(random(), directory,
        new MockAnalyzer(random(), MockTokenizer.KEYWORD, false));
    addDoc("LANGE", writer);
    addDoc("LUETH", writer);
    addDoc("PIRSING", writer);
    addDoc("RIEGEL", writer);
    addDoc("TRZECZIAK", writer);
    addDoc("WALKER", writer);
    addDoc("WBR", writer);
    addDoc("WE", writer);
    addDoc("WEB", writer);
    addDoc("WEBE", writer);
    addDoc("WEBER", writer);
    addDoc("WITTKOPF", writer);
    addDoc("WOJNAROWSKI", writer);
    addDoc("WRICKE", writer);

    IndexReader reader = writer.getReader();
    IndexSearcher searcher = newSearcher(reader);
    writer.close();

    FuzzyQuery query = new FuzzyQuery(new Term("field", "WEBER"), 2, 1);
    ScoreDoc[] hits = searcher.search(query, null, 1000).scoreDocs;
    assertEquals(4, hits.length);

    reader.close();
    directory.close();
  }
{code}

The two missing terms have 2 deletions, so they are within the edit distance.

> Automaton Fuzzy Query doesn't deliver all results
> -
>
> Key: LUCENE-4282
> URL: https://issues.apache.org/jira/browse/LUCENE-4282
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Johannes Christen
>  Labels: newbie
>
> Having a small index with n documents where each document has one of the 
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR 
> which have an edit distance of 2 as well are missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4256) Improve Analysis Factory configuration workflow

2012-08-02 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-4256:
---

Attachment: LUCENE-4256-version.patch

Going to do this in smaller steps so they are easier to review and be sure 
about.

This patch moves the Version back into the args Map.

Once this is committed I'll tackle the constructor stuff.
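
For illustration of the workflow under discussion, a minimal hypothetical sketch (class, package and parameter names are made up, not this issue's patch) of a factory whose {{init}} takes the args Map, with the Version carried inside that Map and the resource loader passed in directly:

{code:java}
// Hypothetical sketch only -- names and signatures are illustrative, not the committed API.
import java.io.IOException;
import java.util.Map;

public abstract class SketchAnalysisFactory {
  protected org.apache.lucene.util.Version luceneMatchVersion;

  // Single entry point: the args Map (carrying the version) plus a resource loader,
  // so no separate ResourceLoaderAware/inform() step would be needed.
  public void init(Map<String,String> args, SketchResourceLoader loader) throws IOException {
    String v = args.remove("luceneMatchVersion");
    this.luceneMatchVersion = (v == null) ? null : org.apache.lucene.util.Version.valueOf(v);
    // subclasses would read their own settings from args and open files via loader
  }

  /** Stand-in for the resource loader interface discussed in this issue. */
  public interface SketchResourceLoader {
    java.io.InputStream openResource(String resource) throws IOException;
  }
}
{code}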

> Improve Analysis Factory configuration workflow
> ---
>
> Key: LUCENE-4256
> URL: https://issues.apache.org/jira/browse/LUCENE-4256
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Chris Male
> Attachments: LUCENE-4256-further.patch, LUCENE-4256-version.patch, 
> LUCENE-4256_incomplete.patch
>
>
> With the Factorys now available for more general use, I'd like to look at 
> ways to improve the configuration workflow.  Currently it's a little disjoint 
> and confusing, especially around using {{inform(ResourceLoader)}}.
> What I think we should do is:
> - Remove the need for {{ResourceLoaderAware}} and pass in the ResourceLoader 
> in {{init}}, so it'd become {{init(Map args, ResourceLoader 
> loader)}}
> - Consider moving away from the generic args Map and using setters.  This 
> gives us better typing and could mitigate bugs due to using the wrong 
> configure key.  However it does force the consumer to invoke each setter.
> - If we're going to stick with using the args Map, then move the Version 
> parameter into {{init}} as well, rather than being a setter as I currently 
> made it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Johannes Christen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427188#comment-13427188
 ] 

Johannes Christen edited comment on LUCENE-4282 at 8/2/12 9:15 AM:
---

Query query = new FuzzyQuery(new Term("NAME", "WEBER"),2,1);

Here are all the terms for the field NAME in my index:
LANGE
LUETH
PIRSING
RIEGEL
TRZECZIAK
WALKER
WBR
WE
WEB
WEBE
WEBER
WITTKOPF
WOJNAROWSKI
WRICKE


  was (Author: superjo):
Query query = new FuzzyQuery(new Term("NAME", "WEBER"),2,1);

  
> Automaton Fuzzy Query doesn't deliver all results
> -
>
> Key: LUCENE-4282
> URL: https://issues.apache.org/jira/browse/LUCENE-4282
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Johannes Christen
>  Labels: newbie
>
> Having a small index with n documents where each document has one of the 
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR 
> which have an edit distance of 2 as well are missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.6.0_33) - Build # 56 - Failure!

2012-08-02 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows/56/
Java: 32bit/jdk1.6.0_33 -client -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 18997 lines...]
javadocs-lint:

[...truncated 1670 lines...]
BUILD FAILED
C:\Jenkins\workspace\Lucene-Solr-trunk-Windows\build.xml:47: The following 
error occurred while executing this line:
C:\Jenkins\workspace\Lucene-Solr-trunk-Windows\lucene\build.xml:525: The 
following error occurred while executing this line:
C:\Jenkins\workspace\Lucene-Solr-trunk-Windows\lucene\build.xml:515: exec 
returned: 1

Total time: 54 minutes 48 seconds
Build step 'Invoke Ant' marked build as failure
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Johannes Christen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427188#comment-13427188
 ] 

Johannes Christen commented on LUCENE-4282:
---

Query query = new FuzzyQuery(new Term("NAME", "WEBER"),2,1);


> Automaton Fuzzy Query doesn't deliver all results
> -
>
> Key: LUCENE-4282
> URL: https://issues.apache.org/jira/browse/LUCENE-4282
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Johannes Christen
>  Labels: newbie
>
> Having a small index with n documents where each document has one of the 
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR 
> which have an edit distance of 2 as well are missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427187#comment-13427187
 ] 

Uwe Schindler commented on LUCENE-4282:
---

What was your query?

> Automaton Fuzzy Query doesn't deliver all results
> -
>
> Key: LUCENE-4282
> URL: https://issues.apache.org/jira/browse/LUCENE-4282
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Johannes Christen
>  Labels: newbie
>
> Having a small index with n documents where each document has one of the 
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR 
> which have an edit distance of 2 as well are missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Johannes Christen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427185#comment-13427185
 ] 

Johannes Christen commented on LUCENE-4282:
---

Thanks for the quick response, Uwe.
I don't think that is the cause. My test index is very small (fewer than 100 
terms), so I don't think the terms get dropped. I think they are missed by the 
automaton.
My rewritten query has only 2 terms: NAME:WEBE^0.584 NAME:WEBER


> Automaton Fuzzy Query doesn't deliver all results
> -
>
> Key: LUCENE-4282
> URL: https://issues.apache.org/jira/browse/LUCENE-4282
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Johannes Christen
>  Labels: newbie
>
> Having a small index with n documents where each document has one of the 
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR 
> which have an edit distance of 2 as well are missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427184#comment-13427184
 ] 

Uwe Schindler edited comment on LUCENE-4282 at 8/2/12 8:54 AM:
---

This is caused by the rewrite method, not FuzzyQuery itself. The rewrite method 
uses an internal priority queue where it collects all terms from the index 
that match the Levenshtein distance. If more terms are available than fit, some 
are dropped; this depends on their distance and other factors. If you want to 
use a larger PQ, create a separate instance of 
TopTermsScoringBooleanQueryRewrite, giving it a queue size.

  was (Author: thetaphi):
This is caused by the rewrite method not FuzzyQuery itsself. The rewrite 
mode uses an internal priority queue, where it collects all terms from the 
index, that match the levensthein distance. If there are more terms available, 
some are dropped. This depends on their distance and other factors. If you want 
to use a larger PQ, create a separate instance of the TopTermsRewriteMethod, 
giving a queue size.
  
> Automaton Fuzzy Query doesn't deliver all results
> -
>
> Key: LUCENE-4282
> URL: https://issues.apache.org/jira/browse/LUCENE-4282
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Johannes Christen
>  Labels: newbie
>
> Having a small index with n documents where each document has one of the 
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR 
> which have an edit distance of 2 as well are missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427184#comment-13427184
 ] 

Uwe Schindler commented on LUCENE-4282:
---

This is caused by the rewrite method, not FuzzyQuery itself. The rewrite method 
uses an internal priority queue where it collects all terms from the index 
that match the Levenshtein distance. If more terms are available than fit, some 
are dropped; this depends on their distance and other factors. If you want to 
use a larger PQ, create a separate instance of the TopTermsRewriteMethod, 
giving it a queue size.
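
For reference, a minimal sketch of the workaround described above (the top-terms rewrite is exposed as {{MultiTermQuery.TopTermsScoringBooleanQueryRewrite}} in 4.0; the queue size, field name and the surrounding searcher are illustrative assumptions):

{code:java}
// Sketch of the workaround: plug in a top-terms rewrite with a larger priority queue,
// so fewer terms within the edit distance are dropped during rewrite.
// Assumes an existing IndexSearcher named "searcher" over a field called "NAME".
FuzzyQuery query = new FuzzyQuery(new Term("NAME", "WEBER"), 2, 1);
query.setRewriteMethod(new MultiTermQuery.TopTermsScoringBooleanQueryRewrite(1024));
TopDocs hits = searcher.search(query, 1000);
{code}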

> Automaton Fuzzy Query doesn't deliver all results
> -
>
> Key: LUCENE-4282
> URL: https://issues.apache.org/jira/browse/LUCENE-4282
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Johannes Christen
>  Labels: newbie
>
> Having a small index with n documents where each document has one of the 
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected 
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR 
> which have an edit distance of 2 as well are missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results

2012-08-02 Thread Johannes Christen (JIRA)
Johannes Christen created LUCENE-4282:
-

 Summary: Automaton Fuzzy Query doesn't deliver all results
 Key: LUCENE-4282
 URL: https://issues.apache.org/jira/browse/LUCENE-4282
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 4.0-ALPHA
Reporter: Johannes Christen


Having a small index with n documents where each document has one of the 
following terms:
WEBER, WEBE, WEB, WBR, WE, (and some more)
The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected terms 
WEBER and WEBE in the rewritten query. The expected terms WEB and WBR which 
have an edit distance of 2 as well are missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice

2012-08-02 Thread Gili Nachum (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427179#comment-13427179
 ] 

Gili Nachum commented on LUCENE-2501:
-

Seeing a similar issue on 3.1.0. Was this ever resolved, or is there a 
workaround?

Stack:
{quote}
0049 SeedlistOpera Failed to process operation ADD
java.lang.ArrayIndexOutOfBoundsException
  at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:135)
  at org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:502)
  at org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:523)
  at org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:106)
  at org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:126)
  at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:479)
  at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:169)
  at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:248)
  at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:701)
  at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2194)
  at ...
{quote}

> ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
> --
>
> Key: LUCENE-2501
> URL: https://issues.apache.org/jira/browse/LUCENE-2501
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 3.0.1
>Reporter: Tim Smith
>
> I'm seeing the following exception during indexing:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 14
> at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118)
> at 
> org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490)
> at 
> org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104)
> at 
> org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120)
> at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468)
> at 
> org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
> at 
> org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757)
> at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085)
> ... 37 more
> {code}
> This seems to be caused by the following code:
> {code}
> final int level = slice[upto] & 15;
> final int newLevel = nextLevelArray[level];
> final int newSize = levelSizeArray[newLevel];
> {code}
> this can result in "level" being a value between 0 and 14
> the array nextLevelArray is only of size 10
> i suspect the solution would be to either max the level to 10, or to add more 
> entries to the nextLevelArray so it has 15 entries
> however, i don't know if something more is going wrong here and this is just 
> where the exception hits from a deeper issue
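
A standalone sketch of the mismatch the description points at (the array size and mask are taken from the quoted snippet; the value of the slice byte is just an illustrative assumption):

{code:java}
// Illustration only -- not the actual Lucene code.
public class AllocSliceBoundsSketch {
  public static void main(String[] args) {
    int[] nextLevelArray = new int[10];   // the description says this table has only 10 entries
    byte sliceEndByte = (byte) 0xCE;      // an assumed end-marker byte with high bits set in its low nibble
    int level = sliceEndByte & 15;        // the mask yields values in 0..15
    System.out.println("level = " + level);   // prints 14 here
    // nextLevelArray[level] would throw ArrayIndexOutOfBoundsException for any level >= 10
  }
}
{code}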

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3985) Refactor support for thread leaks

2012-08-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427177#comment-13427177
 ] 

Uwe Schindler commented on LUCENE-3985:
---

OK, doesn't matter. I would just prefer to have it merged in, or we should 
rename the other files, too.

> Refactor support for thread leaks
> -
>
> Key: LUCENE-3985
> URL: https://issues.apache.org/jira/browse/LUCENE-3985
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/test
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-3985.patch, LUCENE-3985.patch, LUCENE-3985.patch, 
> LUCENE-3985.patch
>
>
> This will be duplicated in the runner and in LuceneTestCase; try to 
> consolidate.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-3985) Refactor support for thread leaks

2012-08-02 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427174#comment-13427174
 ] 

Dawid Weiss edited comment on LUCENE-3985 at 8/2/12 8:34 AM:
-

It is a separate file; I wanted it to be somewhat explicit. We can merge it in 
later on, not a problem.

  was (Author: dweiss):
I already added it to a patch in LUCENE-3985 and fixed most of the calls 
there. It is a separate file, I wanted it to be somewhat explicit.
  
> Refactor support for thread leaks
> -
>
> Key: LUCENE-3985
> URL: https://issues.apache.org/jira/browse/LUCENE-3985
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/test
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-3985.patch, LUCENE-3985.patch, LUCENE-3985.patch, 
> LUCENE-3985.patch
>
>
> This will be duplicated in the runner and in LuceneTestCase; try to 
> consolidate.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3985) Refactor support for thread leaks

2012-08-02 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427174#comment-13427174
 ] 

Dawid Weiss commented on LUCENE-3985:
-

I already added it to a patch in LUCENE-3985 and fixed most of the calls there. 
It is a separate file; I wanted it to be somewhat explicit.

> Refactor support for thread leaks
> -
>
> Key: LUCENE-3985
> URL: https://issues.apache.org/jira/browse/LUCENE-3985
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/test
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-3985.patch, LUCENE-3985.patch, LUCENE-3985.patch, 
> LUCENE-3985.patch
>
>
> This will be duplicated in the runner and in LuceneTestCase; try to 
> consolidate.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3985) Refactor support for thread leaks

2012-08-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427168#comment-13427168
 ] 

Uwe Schindler commented on LUCENE-3985:
---

bq. You can add the signatures to the forbidden API list under jdk.txt (with 
comment) or a new file (but don't forget to place this new signature file in 
Lucene and Solr's filesets).

I think, to not complicate the filesets, we should simply use jdk.txt for this 
case and not a separate file (as all the signatures refer to the JDK; otherwise 
we would have to rename jdk.txt to defaultCharsJdk.txt or whatever). Just place 
a comment in the introduction and add the signatures to jdk.txt. The other txt 
files with banned methods are more for other parts of the Lucene code-base 
(like test-only) or, like commons-io, refer to a Solr-only lib.

> Refactor support for thread leaks
> -
>
> Key: LUCENE-3985
> URL: https://issues.apache.org/jira/browse/LUCENE-3985
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/test
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-3985.patch, LUCENE-3985.patch, LUCENE-3985.patch, 
> LUCENE-3985.patch
>
>
> This will be duplicated in the runner and in LuceneTestCase; try to 
> consolidate.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4281) Delegate to default thread factory in NamedThreadFactory

2012-08-02 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427167#comment-13427167
 ] 

Dawid Weiss commented on LUCENE-4281:
-

I see the default one resets the inherited priority and daemon status. I 
wouldn't worry about the security manager...
{code}
if (t.isDaemon())
    t.setDaemon(false);
if (t.getPriority() != Thread.NORM_PRIORITY)
    t.setPriority(Thread.NORM_PRIORITY);
{code}
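
For comparison, a minimal sketch (not the patch attached here; the class name is made up) of what delegating to {{Executors.defaultThreadFactory()}} and only renaming the created thread could look like:

{code:java}
// Sketch only: delegate thread creation to the JDK default factory instead of
// constructing Threads directly; the class name is illustrative.
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class DelegatingNamedThreadFactory implements ThreadFactory {
  private final ThreadFactory delegate = Executors.defaultThreadFactory();
  private final AtomicInteger counter = new AtomicInteger(1);
  private final String prefix;

  public DelegatingNamedThreadFactory(String prefix) {
    this.prefix = prefix;
  }

  @Override
  public Thread newThread(Runnable r) {
    Thread t = delegate.newThread(r);   // default factory normalizes daemon flag and priority
    t.setName(prefix + "-" + counter.getAndIncrement());
    return t;
  }
}
{code}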

> Delegate to default thread factory in NamedThreadFactory
> 
>
> Key: LUCENE-4281
> URL: https://issues.apache.org/jira/browse/LUCENE-4281
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 3.6.1, 4.0, 5.0
>Reporter: Simon Willnauer
>Priority: Minor
> Fix For: 4.0, 5.0, 3.6.2
>
> Attachments: LUCENE-4281.patch
>
>
> currently we state that we yield the same behavior as 
> Executors#defaultThreadFactory() but this behavior could change over time 
> even if it is compatible. We should just delegate to the default thread 
> factory instead of creating the threads ourself.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4281) Delegate to default thread factory in NamedThreadFactory

2012-08-02 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427165#comment-13427165
 ] 

Dawid Weiss commented on LUCENE-4281:
-

Uh, sorry -- I see what you did now. Is there anything particular on your mind 
when you talk about behavioral changes?

> Delegate to default thread factory in NamedThreadFactory
> 
>
> Key: LUCENE-4281
> URL: https://issues.apache.org/jira/browse/LUCENE-4281
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 3.6.1, 4.0, 5.0
>Reporter: Simon Willnauer
>Priority: Minor
> Fix For: 4.0, 5.0, 3.6.2
>
> Attachments: LUCENE-4281.patch
>
>
> currently we state that we yield the same behavior as 
> Executors#defaultThreadFactory() but this behavior could change over time 
> even if it is compatible. We should just delegate to the default thread 
> factory instead of creating the threads ourself.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


