[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427840#comment-13427840 ] Steven Rowe commented on SOLR-1725: --- The tests committed here 3 weeks ago have never succeeded under the Jenkins trunk and branch_4x Maven builds. (For some reason failure notification emails aren't making it to the dev list.) E.g. [https://builds.apache.org/job/Lucene-Solr-Maven-trunk/554/]. The JavaScript engine appears not to be found. I don't understand why this would be the case, though, since the Ant tests succeed running under the same JVM. > Script based UpdateRequestProcessorFactory > -- > > Key: SOLR-1725 > URL: https://issues.apache.org/jira/browse/SOLR-1725 > Project: Solr > Issue Type: New Feature > Components: update >Affects Versions: 1.4 >Reporter: Uri Boness >Assignee: Erik Hatcher > Labels: UpdateProcessor > Fix For: 4.0 > > Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, > SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, > SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, > SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch > > > A script based UpdateRequestProcessorFactory (Uses JDK6 script engine > support). The main goal of this plugin is to be able to configure/write > update processors without the need to write and package Java code. > The update request processor factory enables writing update processors in > scripts located in {{solr.solr.home}} directory. The factory accepts one > (mandatory) configuration parameter named {{scripts}} which accepts a > comma-separated list of file names. It will look for these files under the > {{conf}} directory in solr home. When multiple scripts are defined, their > execution order is defined by the lexicographical order of the script file > name (so {{scriptA.js}} will be executed before {{scriptB.js}}). 
> The script language is resolved based on the script file extension (that is, > a *.js file will be treated as a JavaScript script), therefore an extension > is mandatory. > Each script file is expected to have one or more methods with the same > signature as the methods in the {{UpdateRequestProcessor}} interface. It is > *not* required to define all methods, only those that are required by the > processing logic. > The following variables are defined as global variables for each script: > * {{req}} - The SolrQueryRequest > * {{rsp}} - The SolrQueryResponse > * {{logger}} - A logger that can be used for logging purposes in the script -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
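The ordering and extension rules in the SOLR-1725 description, and the "engine not found" symptom from the comment, can be sketched with the JDK's javax.script API. This is only an illustration, not the patch's code: the class and method names (ScriptResolutionSketch, executionOrder, extensionOf, engineAvailable) are hypothetical. The one library fact it relies on is that ScriptEngineManager.getEngineByExtension returns null when no engine is registered for an extension.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.script.ScriptEngineManager;

// Hypothetical sketch; not the code from the SOLR-1725 patch.
final class ScriptResolutionSketch {

    /** Splits the comma-separated "scripts" value and sorts the names lexicographically. */
    static List<String> executionOrder(String scriptsParam) {
        List<String> names = new ArrayList<>();
        for (String raw : scriptsParam.split(",")) {
            String name = raw.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        Collections.sort(names);
        return names;
    }

    /** Returns the extension without the dot; an extension is mandatory. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            throw new IllegalArgumentException("script file needs an extension: " + fileName);
        }
        return fileName.substring(dot + 1);
    }

    /** True if the running JVM can supply an engine for this file's extension. */
    static boolean engineAvailable(String fileName) {
        // getEngineByExtension returns null when no matching engine is registered.
        return new ScriptEngineManager().getEngineByExtension(extensionOf(fileName)) != null;
    }

    public static void main(String[] args) {
        for (String name : executionOrder("scriptB.js, scriptA.js")) {
            System.out.println(name + " -> engine available: " + engineAvailable(name));
        }
    }
}
```

Running a check like engineAvailable under both the Ant and the Maven test classpaths might help narrow down where the engine lookup differs; whether a "js" engine is present depends on the JVM and classpath in use.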
[jira] [Commented] (LUCENE-4225) New FixedPostingsFormat for less overhead than SepPostingsFormat
[ https://issues.apache.org/jira/browse/LUCENE-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427811#comment-13427811 ] Han Jiang commented on LUCENE-4225: --- OK, thanks Mike! > New FixedPostingsFormat for less overhead than SepPostingsFormat > > > Key: LUCENE-4225 > URL: https://issues.apache.org/jira/browse/LUCENE-4225 > Project: Lucene - Core > Issue Type: Bug >Reporter: Michael McCandless >Assignee: Michael McCandless > Attachments: LUCENE-4225-on-rev-1362013.patch, LUCENE-4225.patch, > LUCENE-4225.patch, LUCENE-4225.patch, LUCENE-4225.patch > > > I've worked out the start at a new postings format that should have > less overhead for fixed-int[] encoders (For,PFor)... using ideas from > the old bulk branch, and new ideas from Robert. > It's only a start: there's no payloads support yet, and I haven't run > Lucene's tests with it, except for one new test I added that tries to > be a thorough PostingsFormat tester (to make it easier to create new > postings formats). It does pass luceneutil's performance test, so > it's at least able to run those queries correctly... > Like Lucene40, it uses two files (though once we add payloads it may > be 3). The .doc file interleaves doc delta and freq blocks, and .pos > has position delta blocks. Unlike sep, blocks are NOT shared across > terms; instead, it uses block encoding if there are enough ints to > encode, else the same Lucene40 vInt format. This means low-freq terms > (< 128 = current default block size) are always vInts, and high-freq > terms will have some number of blocks, with a vInt final block. > Skip points are only recorded at block starts. -- This message is automatically generated by JIRA. 
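The block/vInt split described in LUCENE-4225 comes down to simple arithmetic. The sketch below assumes the stated default block size of 128 and uses hypothetical names (BlockLayoutSketch, fullBlocks, vIntTail); it is not taken from the patch.

```java
// Hypothetical sketch of the layout rule described in the issue; not the patch's code.
final class BlockLayoutSketch {

    /** Current default block size according to the issue description. */
    static final int BLOCK_SIZE = 128;

    /** Number of full, block-encoded runs for a term with the given document frequency. */
    static int fullBlocks(int docFreq) {
        return docFreq / BLOCK_SIZE;
    }

    /** Number of trailing entries that fall back to vInt encoding. */
    static int vIntTail(int docFreq) {
        return docFreq % BLOCK_SIZE;
    }

    public static void main(String[] args) {
        for (int docFreq : new int[] {5, 127, 128, 300}) {
            System.out.println("docFreq=" + docFreq
                + " blocks=" + fullBlocks(docFreq)
                + " vInts=" + vIntTail(docFreq));
        }
    }
}
```

Under this reading, a term with 300 documents would get two full blocks followed by 44 vInt-encoded entries, and any term below 128 documents would be all vInts.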
[jira] [Reopened] (SOLR-3639) We should update to ZooKeeper 3.3.5
[ https://issues.apache.org/jira/browse/SOLR-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reopened SOLR-3639: --- 3.3.6 just hit - once ivy can find it, I'll update: http://www.cloudera.com/blog/2012/08/apache-zookeeper-3-3-6-has-been-released/ > We should update to ZooKeeper 3.3.5 > --- > > Key: SOLR-3639 > URL: https://issues.apache.org/jira/browse/SOLR-3639 > Project: Solr > Issue Type: Improvement > Components: SolrCloud >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 4.0, 5.0 > > > We should update to 3.3.5 - there was a corruption issue fixed. > http://zookeeper.apache.org/doc/r3.3.5/releasenotes.html -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-4203) Add IndexWriter.tryDeleteDocument, to delete by document id when possible
[ https://issues.apache.org/jira/browse/LUCENE-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-4203. Resolution: Fixed Fix Version/s: 5.0 4.0 > Add IndexWriter.tryDeleteDocument, to delete by document id when possible > - > > Key: LUCENE-4203 > URL: https://issues.apache.org/jira/browse/LUCENE-4203 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless > Fix For: 4.0, 5.0 > > Attachments: LUCENE-4203.patch, LUCENE-4203.patch > > > Spinoff from LUCENE-4069. > In that use case, where the app needs to first lookup a document, then > call updateDocument, it's wasteful today because the relatively costly > lookup (by a primary key field, eg "id") is done twice. > But, since you already resolved the PK to docID on the first lookup, > it would be nice to then delete by that docID and then you can call > addDocument instead. > So I worked out a rough start at this, by adding > IndexWriter.tryDeleteDocument. It'd be a very expert API: it takes a > SegmentInfo (referencing the segment that contains the docID), and as > long as that segment hasn't yet been merged away, it will mark the > document for deletion and return true (success). If it has been > merged away it returns false and the app must then delete-by-term. It > only works if the writer is in NRT mode (ie you've opened an NRT > reader). > In LUCENE-4069 using tryDeleteDocument gave a ~20% net speedup. > I think tryDeleteDocument would also be useful when Solr "updates" a > document by loading all stored fields, changing them, and calling > updateDocument. -- This message is automatically generated by JIRA. 
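The calling pattern described in LUCENE-4203 (try the cheap delete by docID, fall back to delete-by-term, then add) might look like the sketch below. The Writer interface here is a hypothetical stand-in that takes only a docID; it is not Lucene's IndexWriter, whose method as described in the issue also needs a reference to the containing segment.

```java
// Hypothetical sketch of the call pattern; Writer is NOT Lucene's IndexWriter.
final class TryDeleteSketch {

    /** Simplified stand-in for the writer operations mentioned in the issue. */
    interface Writer {
        /** Returns false if the segment holding docId has already been merged away. */
        boolean tryDeleteDocument(int docId);

        void deleteByTerm(String field, String value);

        void addDocument(String doc);
    }

    /**
     * Replaces a document whose docID was already resolved by an earlier lookup.
     * Returns true if the fast path (delete by docID) succeeded.
     */
    static boolean replace(Writer writer, int docId, String pkField, String pkValue, String newDoc) {
        boolean fastPath = writer.tryDeleteDocument(docId);
        if (!fastPath) {
            // Segment was merged away: fall back to the costlier primary-key delete.
            writer.deleteByTerm(pkField, pkValue);
        }
        writer.addDocument(newDoc);
        return fastPath;
    }
}
```

The point of the design is that the second primary-key lookup happens only on the fallback path.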
[jira] [Commented] (LUCENE-4283) Support more frequent skip with Block Postings Format
[ https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427667#comment-13427667 ] Michael McCandless commented on LUCENE-4283: I think we shouldn't have to do our own buffering up of the skip points within one block? Can't we call skipWriter.bufferSkip every skipInterval docs (and pass it lastDocID, etc.)? Then it can write the skip point immediately. Also, in BlockPostingsReader, why do we need a separate docBufferOffset? Can't we just set docBufferUpto to wherever (36, 64, 96) we had skipped to within the block? > Support more frequent skip with Block Postings Format > - > > Key: LUCENE-4283 > URL: https://issues.apache.org/jira/browse/LUCENE-4283 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Han Jiang >Priority: Minor > Attachments: LUCENE-4283-buggy.patch, LUCENE-4283-buggy.patch > > > This change works on the new bulk branch. > Currently, our BlockPostingsFormat only supports skipInterval==blockSize. > Every time the skipper reaches the last level 0 skip point, we'll have to > decode a whole block to read doc/freq data. Also, a higher level skip list > will be created only for those df>blockSize^k, which means for most terms, > skipping will just be a linear scan. If we increase current blockSize for > better bulk i/o performance, current skip setting will be a bottleneck. > For ForPF, the encoded block can be easily split if we set > skipInterval=32*k.
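To make the intra-block skip positions concrete, here is a small sketch that lists the offsets inside one block at which a skip point would fall when skipInterval divides blockSize. The names (SkipPointSketch, skipOffsets) and the treatment of offset 0 as the block start are assumptions for illustration, not the branch's code.

```java
// Hypothetical sketch; not the code on the bulk branch.
final class SkipPointSketch {

    /**
     * Offsets within one block where a skip point would be recorded,
     * excluding offset 0 (the block start itself).
     */
    static int[] skipOffsets(int blockSize, int skipInterval) {
        if (skipInterval <= 0 || blockSize % skipInterval != 0) {
            throw new IllegalArgumentException("skipInterval must evenly divide blockSize");
        }
        int count = blockSize / skipInterval - 1;
        int[] offsets = new int[count];
        for (int i = 0; i < count; i++) {
            offsets[i] = (i + 1) * skipInterval;
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Prints [32, 64, 96] for a 128-entry block with skipInterval 32.
        System.out.println(java.util.Arrays.toString(skipOffsets(128, 32)));
    }
}
```

With skipInterval equal to blockSize the result is empty, which matches the current behavior described in the issue, where skip points exist only at block starts.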
[jira] [Commented] (SOLR-3658) SolrCmdDistributor can briefly create spikes of threads in the thousands.
[ https://issues.apache.org/jira/browse/SOLR-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427659#comment-13427659 ] Mark Miller commented on SOLR-3658: --- There were some real problems with my previous solution - it somewhat worked accidentally - but I think it probably really damaged performance. I just committed a new approach that has tested out nicely so far. > SolrCmdDistributor can briefly create spikes of threads in the thousands. > - > > Key: SOLR-3658 > URL: https://issues.apache.org/jira/browse/SOLR-3658 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 4.0, 5.0 > > Attachments: SOLR-3658.patch > > > see mailing list http://markmail.org/thread/yy5b7g6g7733wgcp
[jira] [Created] (SOLR-3703) Escape character which is in the query, is getting ignored in solr 3.6 with lucene parser
srinivas created SOLR-3703: -- Summary: Escape character which is in the query, is getting ignored in solr 3.6 with lucene parser Key: SOLR-3703 URL: https://issues.apache.org/jira/browse/SOLR-3703 Project: Solr Issue Type: Bug Affects Versions: 3.6 Environment: Linux Reporter: srinivas I noticed that an escape character in the query is ignored in Solr 3.6 with the lucene parser. With edismax, the following query gives the expected results. select?q=author:David\ Duke&defType=lucene returns the same results as: select?q=author:(David OR Duke)&defType=lucene But select?q=author:David\ Duke&defType=edismax returns the same results as: select?q=author:"David Duke"&defType=lucene Regards Srini
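When reproducing a report like SOLR-3703, one thing worth ruling out is that the backslash or the space is lost in transport before the query parser sees it. The sketch below percent-encodes the escaped query with the JDK's URLEncoder (the Charset overload needs Java 10 or later). The helper names are hypothetical, and this does not diagnose the reported parser behavior.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for building the request from the report; not Solr code.
final class EscapedQuerySketch {

    /** Backslash-escapes spaces so the term is meant to be parsed as one token. */
    static String escapeSpaces(String term) {
        return term.replace(" ", "\\ ");
    }

    /** Builds the select request with the q parameter percent-encoded. */
    static String selectUrl(String field, String term, String defType) {
        String q = field + ":" + escapeSpaces(term);
        return "select?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8) + "&defType=" + defType;
    }

    public static void main(String[] args) {
        // Prints select?q=author%3ADavid%5C+Duke&defType=lucene
        System.out.println(selectUrl("author", "David Duke", "lucene"));
    }
}
```

If the encoded form still behaves like author:(David OR Duke) under defType=lucene, the difference is more likely in the parser than in how the request was sent.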
[jira] [Updated] (SOLR-2115) DataImportHandler config file *must* be specified in "defaults" or status will be "DataImportHandler started. Not Initialized. No commands can be run"
[ https://issues.apache.org/jira/browse/SOLR-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-2115: - Attachment: SOLR-2115.patch Updated patch, which I plan to commit soon. > DataImportHandler config file *must* be specified in "defaults" or status > will be "DataImportHandler started. Not Initialized. No commands can be run" > -- > > Key: SOLR-2115 > URL: https://issues.apache.org/jira/browse/SOLR-2115 > Project: Solr > Issue Type: Bug > Components: contrib - DataImportHandler >Affects Versions: 1.4.1, 1.4.2, 3.1, 4.0-ALPHA >Reporter: Lance Norskog >Assignee: James Dyer >Priority: Minor > Fix For: 4.0 > > Attachments: SOLR-2115.patch, SOLR-2115.patch > > > The DataImportHandler has two URL parameters for defining the data-config.xml > file to be used for the command. 'config' is used in some places and > 'dataConfig' is used in other places. > 'config' does not work from an HTTP request. However, if it is in the > "defaults" section of the DIH definition, it works. If the > 'config' parameter is used in an HTTP request, the DIH uses the default in > the anyway. > This is the exception stack received by the client if there is no default. > (This is the 3.X branch.) > > > > Error 500 > > HTTP ERROR: 500null > java.lang.NullPointerException > at > org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:146) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) > ..etc..
[jira] [Updated] (SOLR-3699) SolrIndexWriter constructor leaks Directory if Exception creating IndexWriterConfig
[ https://issues.apache.org/jira/browse/SOLR-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-3699: --- Attachment: SOLR-3699.patch My quick and dirty attempt to fix this by making SolrIndexWriter's constructor private and adding a static "create" method that deals with calling directoryFactory.release() if the private constructor fails. Unfortunately it's still not working ... not clear to me why, but i'm about to get on a plane and won't have a chance to dig into it anymore for another 3-4 days, so i wanted to get what i have into Jira in case anyone else wants to take a stab at it. > SolrIndexWriter constructor leaks Directory if Exception creating > IndexWriterConfig > --- > > Key: SOLR-3699 > URL: https://issues.apache.org/jira/browse/SOLR-3699 > Project: Solr > Issue Type: Bug >Reporter: Robert Muir > Fix For: 4.0, 5.0 > > Attachments: SOLR-3699.patch, SOLR-3699.patch > > > in LUCENE-4278 i had to add a hack to force SimpleFSDir for > CoreContainerCoreInitFailuresTest, because it doesn't close its Directory on > certain errors. > This might indicate a problem that leaks happen if certain errors happen > (e.g. not handled in finally)
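The shape of the approach described in the SOLR-3699 comment (private constructor plus a static create method that releases the directory when construction fails) can be sketched as below. All types here are hypothetical stand-ins, not Solr's SolrIndexWriter or DirectoryFactory, and the sketch says nothing about why the attached patch is still failing.

```java
// Hypothetical sketch of the pattern; these are NOT Solr's classes.
final class ReleaseOnFailureSketch {

    /** Stand-in for a directory factory; counts get/release calls for demonstration. */
    static final class CountingFactory {
        int acquired;
        int released;

        Object get(String path) {
            acquired++;
            return new Object();
        }

        void release(Object dir) {
            released++;
        }
    }

    static final class Writer {
        final Object dir;

        // Private: callers must go through create(), which owns the cleanup.
        private Writer(Object dir, boolean badConfig) {
            if (badConfig) {
                throw new IllegalStateException("could not build writer config");
            }
            this.dir = dir;
        }

        /** Acquires the directory and releases it again if construction fails. */
        static Writer create(CountingFactory factory, String path, boolean badConfig) {
            Object dir = factory.get(path);
            Writer writer = null;
            try {
                writer = new Writer(dir, badConfig);
                return writer;
            } finally {
                if (writer == null) {
                    // Constructor threw: do not leak the directory.
                    factory.release(dir);
                }
            }
        }
    }
}
```

Using a null check in finally rather than catching a specific exception type means the directory is released for any failure, including errors, while a successfully built writer keeps ownership.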
[jira] [Commented] (SOLR-2446) org.apache.solr.client.solrj.beans.DocumentObjectBinder customization
[ https://issues.apache.org/jira/browse/SOLR-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427604#comment-13427604 ] Michael Andrews commented on SOLR-2446: --- This would be very helpful. > org.apache.solr.client.solrj.beans.DocumentObjectBinder customization > - > > Key: SOLR-2446 > URL: https://issues.apache.org/jira/browse/SOLR-2446 > Project: Solr > Issue Type: Improvement > Components: clients - java >Affects Versions: 1.4.1 >Reporter: Alexander Suslov >Priority: Minor > Fix For: 3.1.1 > > Attachments: patch.zip > > Original Estimate: 1h > Remaining Estimate: 1h > > I suggest adding a way to customize DocumentObjectBinder behavior. It is not > always possible to perform mapping between SolrInputDocument and beans using > the default implementation. And SolrServer doesn't have a way to change the > default binder for a different implementation. My suggestion is very simple: > introduce an interface for binder, and add the ability to set the custom > binder for the SolrServer. Please find suggested changes in the attached > file. Such an addition will make the SolrJ library more flexible.
[jira] [Commented] (LUCENE-4283) Support more frequent skip with Block Postings Format
[ https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427573#comment-13427573 ] Michael McCandless commented on LUCENE-4283: Billy, it looks like this patch is a bit stale (it doesn't apply on the current branch)? Can you please update it? Thanks. > Support more frequent skip with Block Postings Format > - > > Key: LUCENE-4283 > URL: https://issues.apache.org/jira/browse/LUCENE-4283 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Han Jiang >Priority: Minor > Attachments: LUCENE-4283-buggy.patch, LUCENE-4283-buggy.patch > > > This change works on the new bulk branch. > Currently, our BlockPostingsFormat only supports skipInterval==blockSize. > Every time the skipper reaches the last level 0 skip point, we'll have to > decode a whole block to read doc/freq data. Also, a higher level skip list > will be created only for those df>blockSize^k, which means for most terms, > skipping will just be a linear scan. If we increase current blockSize for > better bulk i/o performance, current skip setting will be a bottleneck. > For ForPF, the encoded block can be easily split if we set > skipInterval=32*k.
[jira] [Commented] (LUCENE-3167) Make lucene/solr a OSGI bundle through Ant
[ https://issues.apache.org/jira/browse/LUCENE-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427569#comment-13427569 ] Luca Stancapiano commented on LUCENE-3167: -- Hi guys, Nicolas, I can confirm that as of October 2011 the patch worked. I'm surprised that no one has committed the work yet. I'm here to help. Let me know! > Make lucene/solr a OSGI bundle through Ant > -- > > Key: LUCENE-3167 > URL: https://issues.apache.org/jira/browse/LUCENE-3167 > Project: Lucene - Core > Issue Type: New Feature > Environment: bndtools >Reporter: Luca Stancapiano > Attachments: LUCENE-3167.patch, LUCENE-3167.patch, LUCENE-3167.patch, > lucene_trunk.patch, lucene_trunk.patch > > > We need to make a bundle through Ant, so the binary can be published and > downloading the sources is no longer needed. Currently, to get an OSGI bundle we need > to use Maven tools and build the sources. Here is the reference for the creation > of the OSGI bundle through Maven: > https://issues.apache.org/jira/browse/LUCENE-1344 > Bndtools could be used inside Ant
[jira] [Commented] (LUCENE-4225) New FixedPostingsFormat for less overhead than SepPostingsFormat
[ https://issues.apache.org/jira/browse/LUCENE-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427562#comment-13427562 ] Michael McCandless commented on LUCENE-4225: OK I committed the fix: Block/PackedPF was incorrectly encoding offsets as startOffset - lastEndOffset. It must instead be startOffset - lastStartOffset because it is possible (though rare) for startOffset - lastEndOffset to be negative. I also separately committed a fix for NPEs that tests were hitting when the index didn't index payloads nor offsets. Tests should now pass for BlockPF and BlockPackedPF... > New FixedPostingsFormat for less overhead than SepPostingsFormat > > > Key: LUCENE-4225 > URL: https://issues.apache.org/jira/browse/LUCENE-4225 > Project: Lucene - Core > Issue Type: Bug >Reporter: Michael McCandless >Assignee: Michael McCandless > Attachments: LUCENE-4225-on-rev-1362013.patch, LUCENE-4225.patch, > LUCENE-4225.patch, LUCENE-4225.patch, LUCENE-4225.patch > > > I've worked out the start at a new postings format that should have > less overhead for fixed-int[] encoders (For,PFor)... using ideas from > the old bulk branch, and new ideas from Robert. > It's only a start: there's no payloads support yet, and I haven't run > Lucene's tests with it, except for one new test I added that tries to > be a thorough PostingsFormat tester (to make it easier to create new > postings formats). It does pass luceneutil's performance test, so > it's at least able to run those queries correctly... > Like Lucene40, it uses two files (though once we add payloads it may > be 3). The .doc file interleaves doc delta and freq blocks, and .pos > has position delta blocks. Unlike sep, blocks are NOT shared across > terms; instead, it uses block encoding if there are enough ints to > encode, else the same Lucene40 vInt format. 
This means low-freq terms > (< 128 = current default block size) are always vInts, and high-freq > terms will have some number of blocks, with a vInt final block. > Skip points are only recorded at block starts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
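The offset fix described in the LUCENE-4225 comment can be illustrated with two delta schemes side by side. The sketch assumes start offsets never decrease from one position to the next, which is what makes deltas against the previous start offset non-negative. The class and method names are hypothetical.

```java
// Hypothetical sketch of the two delta schemes; not the committed code.
final class OffsetDeltaSketch {

    /** Deltas against the previous START offset; non-negative if starts never decrease. */
    static int[] deltasFromLastStart(int[] startOffsets) {
        int[] deltas = new int[startOffsets.length];
        int lastStart = 0;
        for (int i = 0; i < startOffsets.length; i++) {
            deltas[i] = startOffsets[i] - lastStart;
            lastStart = startOffsets[i];
        }
        return deltas;
    }

    /** Deltas against the previous END offset; can go negative when tokens overlap. */
    static int[] deltasFromLastEnd(int[] startOffsets, int[] endOffsets) {
        int[] deltas = new int[startOffsets.length];
        int lastEnd = 0;
        for (int i = 0; i < startOffsets.length; i++) {
            deltas[i] = startOffsets[i] - lastEnd;
            lastEnd = endOffsets[i];
        }
        return deltas;
    }

    public static void main(String[] args) {
        // Two overlapping tokens: [0,10) and [5,8).
        int[] starts = {0, 5};
        int[] ends = {10, 8};
        System.out.println(java.util.Arrays.toString(deltasFromLastStart(starts)));
        System.out.println(java.util.Arrays.toString(deltasFromLastEnd(starts, ends)));
    }
}
```

For the overlapping tokens in main, the start-based deltas are 0 and 5, while the end-based scheme yields 5 - 10 = -5 for the second token, the rare negative case the comment mentions.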
[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427559#comment-13427559 ] Michael McCandless commented on LUCENE-4282: +1 > Automaton Fuzzy Query doesn't deliver all results > - > > Key: LUCENE-4282 > URL: https://issues.apache.org/jira/browse/LUCENE-4282 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Johannes Christen >Assignee: Robert Muir > Labels: newbie > Attachments: LUCENE-4282-tests.patch, LUCENE-4282.patch, > LUCENE-4282.patch, ModifiedFuzzyTermsEnum.java, ModifiedFuzzyTermsEnum.java > > > Having a small index with n documents where each document has one of the > following terms: > WEBER, WEBE, WEB, WBR, WE, (and some more) > The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected > terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR > which have an edit distance of 2 as well are missing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
[ https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427554#comment-13427554 ] Michael McCandless commented on LUCENE-2501: I committed the patch, but I'll leave this open until we can hear back from Tim or Gili that this has resolved the issue ... > ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice > -- > > Key: LUCENE-2501 > URL: https://issues.apache.org/jira/browse/LUCENE-2501 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: 3.0.1 >Reporter: Tim Smith > Attachments: LUCENE-2501.patch > > > I'm seeing the following exception during indexing: > {code} > Caused by: java.lang.ArrayIndexOutOfBoundsException: 14 > at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118) > at > org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490) > at > org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511) > at > org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104) > at > org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120) > at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468) > at > org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174) > at > org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246) > at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774) > at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757) > at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085) > ... 
37 more > {code} > This seems to be caused by the following code: > {code} > final int level = slice[upto] & 15; > final int newLevel = nextLevelArray[level]; > final int newSize = levelSizeArray[newLevel]; > {code} > this can result in "level" being a value between 0 and 14 > the array nextLevelArray is only of size 10 > i suspect the solution would be to either max the level to 10, or to add more > entries to the nextLevelArray so it has 15 entries > however, i don't know if something more is going wrong here and this is just > where the exception hits from a deeper issue -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
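The mismatch reported in LUCENE-2501 can be reproduced in isolation: a 4-bit mask yields values from 0 through 15, while the lookup table has only 10 entries. The table contents below are illustrative, chosen only to match the size stated in the report, and the sketch does not represent the patch that was committed.

```java
// Hypothetical sketch of the index-range problem; not the committed fix.
final class SliceLevelSketch {

    /** Illustrative 10-entry table, matching the size mentioned in the report. */
    static final int[] NEXT_LEVEL = {1, 2, 3, 4, 5, 6, 7, 8, 9, 9};

    /** Extracts the level the same way the quoted snippet does. */
    static int levelOf(byte sliceEndByte) {
        return sliceEndByte & 15;
    }

    /** True if the extracted level is a valid index into NEXT_LEVEL. */
    static boolean levelInRange(byte sliceEndByte) {
        return levelOf(sliceEndByte) < NEXT_LEVEL.length;
    }

    public static void main(String[] args) {
        byte valid = (byte) 0x19;   // low nibble 9: last valid level
        byte corrupt = (byte) 0x0E; // low nibble 14: the index from the stack trace
        System.out.println(levelOf(valid) + " in range: " + levelInRange(valid));
        System.out.println(levelOf(corrupt) + " in range: " + levelInRange(corrupt));
    }
}
```

A low nibble above 9 suggests the byte being read was not a slice-end marker written by the allocator, which fits the reporter's suspicion that the exception may be a symptom of a deeper issue.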
[jira] [Commented] (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
[ https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427531#comment-13427531 ] Robert Muir commented on LUCENE-2501: - +1 > ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice > -- > > Key: LUCENE-2501 > URL: https://issues.apache.org/jira/browse/LUCENE-2501 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: 3.0.1 >Reporter: Tim Smith > Attachments: LUCENE-2501.patch > > > I'm seeing the following exception during indexing: > {code} > Caused by: java.lang.ArrayIndexOutOfBoundsException: 14 > at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118) > at > org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490) > at > org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511) > at > org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104) > at > org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120) > at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468) > at > org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174) > at > org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246) > at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774) > at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757) > at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085) > ... 
37 more > {code} > This seems to be caused by the following code: > {code} > final int level = slice[upto] & 15; > final int newLevel = nextLevelArray[level]; > final int newSize = levelSizeArray[newLevel]; > {code} > this can result in "level" being a value between 0 and 14 > the array nextLevelArray is only of size 10 > i suspect the solution would be to either max the level to 10, or to add more > entries to the nextLevelArray so it has 15 entries > however, i don't know if something more is going wrong here and this is just > where the exception hits from a deeper issue -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4282: Attachment: LUCENE-4282.patch A simpler patch, which I also benchmarked. The problem is this comment in the legacy scoring (in all previous Lucene versions): {noformat} // this will return less than 0.0 when the edit distance is // greater than the number of characters in the shorter word. // but this was the formula that was previously used in FuzzyTermEnum, // so it has not been changed (even though minimumSimilarity must be // greater than 0.0) {noformat} Because of that it's really impossible to fix until we remove that deprecated one completely :) So I think this one is good to commit, and separately I will look at removing the deprecated one from trunk and cleaning all this up when I have time (I would port the math-proof tests from the automata package to run as queries so we are sure). > Automaton Fuzzy Query doesn't deliver all results > - > > Key: LUCENE-4282 > URL: https://issues.apache.org/jira/browse/LUCENE-4282 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Johannes Christen >Assignee: Robert Muir > Labels: newbie > Attachments: LUCENE-4282-tests.patch, LUCENE-4282.patch, > LUCENE-4282.patch, ModifiedFuzzyTermsEnum.java, ModifiedFuzzyTermsEnum.java > > > Having a small index with n documents where each document has one of the > following terms: > WEBER, WEBE, WEB, WBR, WE, (and some more) > The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected > terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR > which have an edit distance of 2 as well are missing.
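For readers following LUCENE-4282, here is a minimal sketch of the arithmetic behind the report. It assumes the query term is WEBER (the issue does not say so explicitly) and it assumes the legacy similarity has the shape "1 - distance / length of the shorter word", which is only what the quoted comment implies; the real FuzzyTermEnum formula may differ in details such as prefix handling. The class and method names are hypothetical and are not Lucene API.

```java
public class FuzzyDistanceSketch {

    /** Plain Levenshtein distance (insert, delete, substitute; no transpositions). */
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** ASSUMED legacy-style similarity: 1 - distance / length of the shorter word. */
    public static float legacySimilarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        return 1.0f - (float) editDistance(a, b) / shorter;
    }

    public static void main(String[] args) {
        String query = "WEBER"; // assumed query term
        for (String term : new String[] {"WEBER", "WEBE", "WEB", "WBR", "WE"}) {
            System.out.println(term + " distance=" + editDistance(query, term)
                + " legacySimilarity=" + legacySimilarity(query, term));
        }
    }
}
```

Under these assumptions WEB and WBR are both within two edits of WEBER, which is why the reporter expects them in the rewritten query, while WE is three edits away and gets a negative score from the assumed formula, the case the quoted comment warns about.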
[jira] [Commented] (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
[ https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427518#comment-13427518 ] Michael McCandless commented on LUCENE-2501: I'm glad too! > ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice > -- > > Key: LUCENE-2501 > URL: https://issues.apache.org/jira/browse/LUCENE-2501 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: 3.0.1 >Reporter: Tim Smith > Attachments: LUCENE-2501.patch > > > I'm seeing the following exception during indexing: > {code} > Caused by: java.lang.ArrayIndexOutOfBoundsException: 14 > at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118) > at > org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490) > at > org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511) > at > org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104) > at > org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120) > at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468) > at > org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174) > at > org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246) > at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774) > at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757) > at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085) > ... 
37 more > {code} > This seems to be caused by the following code: > {code} > final int level = slice[upto] & 15; > final int newLevel = nextLevelArray[level]; > final int newSize = levelSizeArray[newLevel]; > {code} > this can result in "level" being a value between 0 and 14 > the array nextLevelArray is only of size 10 > i suspect the solution would be to either max the level to 10, or to add more > entries to the nextLevelArray so it has 15 entries > however, i don't know if something more is going wrong here and this is just > where the exception hits from a deeper issue -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
[ https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427517#comment-13427517 ] Simon Willnauer commented on LUCENE-2501: - sneaky, glad that this stuff is single threaded in 4.0 :) > ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice > -- > > Key: LUCENE-2501 > URL: https://issues.apache.org/jira/browse/LUCENE-2501 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: 3.0.1 >Reporter: Tim Smith > Attachments: LUCENE-2501.patch > > > I'm seeing the following exception during indexing: > {code} > Caused by: java.lang.ArrayIndexOutOfBoundsException: 14 > at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118) > at > org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490) > at > org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511) > at > org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104) > at > org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120) > at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468) > at > org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174) > at > org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246) > at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774) > at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757) > at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085) > ... 
37 more > {code} > This seems to be caused by the following code: > {code} > final int level = slice[upto] & 15; > final int newLevel = nextLevelArray[level]; > final int newSize = levelSizeArray[newLevel]; > {code} > this can result in "level" being a value between 0 and 14 > the array nextLevelArray is only of size 10 > i suspect the solution would be to either max the level to 10, or to add more > entries to the nextLevelArray so it has 15 entries > however, i don't know if something more is going wrong here and this is just > where the exception hits from a deeper issue -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
[ https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2501: --- Attachment: LUCENE-2501.patch OK I found a possible cause behind this ... it was something I had fixed but didn't pull out and backport to 3.x: LUCENE-3684. It's a thread safety issue, when FieldInfo.indexOptions changes from DOCS_AND_FREQS_AND_POSITIONS to not indexing positions. If this happens in one thread while a new thread is suddenly indexing that same field there's a narrow window where the 2nd thread's FreqProxTermsWriterPerField can mis-report the streamCount as 1 when it should be 2. Attached patch (3.6.x) should fix it. I tried to get a thread test to provoke this but couldn't ... I think the window is too small (if I forcefully add sleeps at the "right time" in FreqProxTermsWriterPerField then I could provoke it...). > ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice > -- > > Key: LUCENE-2501 > URL: https://issues.apache.org/jira/browse/LUCENE-2501 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: 3.0.1 >Reporter: Tim Smith > Attachments: LUCENE-2501.patch > > > I'm seeing the following exception during indexing: > {code} > Caused by: java.lang.ArrayIndexOutOfBoundsException: 14 > at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118) > at > org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490) > at > org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511) > at > org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104) > at > org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120) > at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468) > at > org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174) > at > org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246) > at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774) > at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757) > at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085) > ... 37 more > {code} > This seems to be caused by the following code: > {code} > final int level = slice[upto] & 15; > final int newLevel = nextLevelArray[level]; > final int newSize = levelSizeArray[newLevel]; > {code} > this can result in "level" being a value between 0 and 14 > the array nextLevelArray is only of size 10 > i suspect the solution would be to either max the level to 10, or to add more > entries to the nextLevelArray so it has 15 entries > however, i don't know if something more is going wrong here and this is just > where the exception hits from a deeper issue
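To make the reporter's observation on LUCENE-2501 concrete, here is a small sketch of why the quoted three-line lookup can throw. The two 10-entry tables are illustrative stand-ins (an assumption, not copied from the Lucene source), and the class and method names are hypothetical. Masking a byte with 15 can yield any value from 0 to 15, so a byte that is not a genuine slice-end marker can decode to a level the 10-entry table does not cover.

```java
public class SliceLevelSketch {

    // ASSUMED, illustrative 10-entry tables standing in for nextLevelArray / levelSizeArray.
    static final int[] NEXT_LEVEL = {1, 2, 3, 4, 5, 6, 7, 8, 9, 9};
    static final int[] LEVEL_SIZE = {5, 14, 20, 30, 40, 40, 80, 80, 120, 200};

    /** Same shape as the quoted code: no bounds check before the table lookup. */
    public static int uncheckedNextSliceSize(byte marker) {
        int level = marker & 15;            // can be anything in 0..15
        int newLevel = NEXT_LEVEL[level];   // throws for level >= 10
        return LEVEL_SIZE[newLevel];
    }

    /** Diagnostic variant: returns -1 instead of throwing when the level is out of range. */
    public static int checkedNextSliceSize(byte marker) {
        int level = marker & 15;
        if (level >= NEXT_LEVEL.length) {
            return -1; // the byte was not a valid slice-end marker
        }
        return LEVEL_SIZE[NEXT_LEVEL[level]];
    }

    public static void main(String[] args) {
        byte valid = (byte) (16 | 0); // marker encoding level 0
        byte bogus = (byte) 14;       // decodes to level 14, as in the reported exception
        System.out.println("valid marker -> " + checkedNextSliceSize(valid));
        System.out.println("bogus marker -> " + checkedNextSliceSize(bogus));
        try {
            uncheckedNextSliceSize(bogus);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unchecked lookup threw: " + e.getMessage());
        }
    }
}
```

The bounds check only shows where the exception surfaces. As the comment above explains, the suspected root cause is a race that makes the writer read a byte that was never a slice-end marker, so clamping the level or padding the table would hide the symptom rather than fix it.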
[jira] [Commented] (LUCENE-4177) TestPerfTasksLogic.testBGSearchTaskThreads sometimes fails or hangs on Windows
[ https://issues.apache.org/jira/browse/LUCENE-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427479#comment-13427479 ] Michael McCandless commented on LUCENE-4177: bq. As a side note, wouldn't it be easier to propagate a single flag object instead of method calls? I completely agree AtomicBoolean is the right solution here ... but I don't have time now to fix it. I'll commit the patch ... > TestPerfTasksLogic.testBGSearchTaskThreads sometimes fails or hangs on Windows > -- > > Key: LUCENE-4177 > URL: https://issues.apache.org/jira/browse/LUCENE-4177 > Project: Lucene - Core > Issue Type: Bug >Reporter: Robert Muir > Attachments: LUCENE-4177.patch > > > e.g. > http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows-Java6-64/147/ > http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java7-64/408/ > this has happened a couple times... but always on Windows.
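A minimal sketch of the "single flag object" idea mentioned in the LUCENE-4177 comment, assuming nothing about the benchmark module's actual task classes: BackgroundTask and runAndStop are hypothetical names. One AtomicBoolean is handed to every background thread, and setting it once is visible to all of them without chaining stop calls through each task.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class StopFlagSketch {

    /** Hypothetical background task that polls one shared stop flag. */
    static final class BackgroundTask implements Runnable {
        private final AtomicBoolean stop;
        final AtomicInteger iterations = new AtomicInteger();

        BackgroundTask(AtomicBoolean stop) {
            this.stop = stop;
        }

        @Override
        public void run() {
            while (!stop.get()) {       // every task reads the same flag
                iterations.incrementAndGet();
                Thread.yield();
            }
        }
    }

    /** Starts taskCount threads, flips the flag once, and reports whether all of them exited. */
    public static boolean runAndStop(int taskCount) {
        AtomicBoolean stop = new AtomicBoolean(false);
        Thread[] threads = new Thread[taskCount];
        for (int i = 0; i < taskCount; i++) {
            threads[i] = new Thread(new BackgroundTask(stop), "bg-" + i);
            threads[i].start();
        }
        stop.set(true);                 // a single write stops every task
        try {
            for (Thread t : threads) {
                t.join(5000);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        for (Thread t : threads) {
            if (t.isAlive()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("all tasks stopped: " + runAndStop(4));
    }
}
```

The design point is that the flag is created once by the owner and shared by reference, so a task started late still observes a stop request that was issued before it began running.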
Re: TestIndexWriterDelete fails with OOM
Thanks Robert. Mike McCandless http://blog.mikemccandless.com On Thu, Aug 2, 2012 at 1:35 PM, Robert Muir wrote: > this test method uses only one field, but disables simpletext and > memory already. i think directPF was just an omission. I'll add it > too. > > On Thu, Aug 2, 2012 at 11:07 AM, Simon Willnauer > wrote: >> I see this on http://85.25.120.39/job/Lucene-trunk-Linux-Java6-64/162/console >> should we disable the mem intensive codecs for this test? >> >> [junit4:junit4] Suite: org.apache.lucene.index.TestIndexWriterDelete >> [junit4:junit4] ERROR 35.5s J1 | >> TestIndexWriterDelete.testIndexingThenDeleting >> [junit4:junit4]> Throwable #1: java.lang.OutOfMemoryError: Java heap >> space >> [junit4:junit4]>at >> __randomizedtesting.SeedInfo.seed([F645C683FAA013CE:8FD749EA602CCB95]:0) >> [junit4:junit4]>at >> org.apache.lucene.codecs.memory.DirectPostingsFormat$DirectField.(DirectPostingsFormat.java:385) >> [junit4:junit4]>at >> org.apache.lucene.codecs.memory.DirectPostingsFormat$DirectFields.(DirectPostingsFormat.java:130) >> [junit4:junit4]>at >> org.apache.lucene.codecs.memory.DirectPostingsFormat.fieldsProducer(DirectPostingsFormat.java:112) >> [junit4:junit4]>at >> org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.(PerFieldPostingsFormat.java:186) >> [junit4:junit4]>at >> org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:250) >> [junit4:junit4]>at >> org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:107) >> [junit4:junit4]>at >> org.apache.lucene.index.SegmentReader.(SegmentReader.java:55) >> [junit4:junit4]>at >> org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:62) >> [junit4:junit4]>at >> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:638) >> [junit4:junit4]>at >> org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52) >> [junit4:junit4]>at >> 
org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:62) >> [junit4:junit4]>at >> org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:583) >> [junit4:junit4]>at >> org.apache.lucene.index.TestIndexWriterDelete.testIndexingThenDeleting(TestIndexWriterDelete.java:935) >> [junit4:junit4]>at >> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> [junit4:junit4]>at >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) >> [junit4:junit4]>at >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >> [junit4:junit4]>at java.lang.reflect.Method.invoke(Method.java:597) >> [junit4:junit4]>at >> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1995) >> [junit4:junit4]>at >> com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) >> [junit4:junit4]>at >> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:818) >> [junit4:junit4]>at >> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:877) >> [junit4:junit4]>at >> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:891) >> [junit4:junit4]>at >> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) >> [junit4:junit4]>at >> org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32) >> [junit4:junit4]>at >> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) >> [junit4:junit4]>at >> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) >> [junit4:junit4]>at >> org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) >> [junit4:junit4]>at >> 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) >> [junit4:junit4]>at >> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) >> [junit4:junit4]>at >> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) >> [junit4:junit4]>at >> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:825) >> [junit4:junit4]>at >> com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) >> [junit4:junit4]> >> [junit4:junit4] 2> NOTE: download the large Jenkins line-docs file >> by running 'ant get-jenkins-line-docs' in the lucene directory. >> [junit4:junit4] 2> NOTE: reproduce with: ant test >> -Dtestcase=TestIndexWriterDelete >> -Dtests.method=t
Re: TestIndexWriterDelete fails with OOM
this test method uses only one field, but disables simpletext and memory already. i think directPF was just an omission. I'll add it too. On Thu, Aug 2, 2012 at 11:07 AM, Simon Willnauer wrote: > I see this on http://85.25.120.39/job/Lucene-trunk-Linux-Java6-64/162/console > should we disable the mem intensive codecs for this test? > > [junit4:junit4] Suite: org.apache.lucene.index.TestIndexWriterDelete > [junit4:junit4] ERROR 35.5s J1 | > TestIndexWriterDelete.testIndexingThenDeleting > [junit4:junit4]> Throwable #1: java.lang.OutOfMemoryError: Java heap space > [junit4:junit4]>at > __randomizedtesting.SeedInfo.seed([F645C683FAA013CE:8FD749EA602CCB95]:0) > [junit4:junit4]>at > org.apache.lucene.codecs.memory.DirectPostingsFormat$DirectField.(DirectPostingsFormat.java:385) > [junit4:junit4]>at > org.apache.lucene.codecs.memory.DirectPostingsFormat$DirectFields.(DirectPostingsFormat.java:130) > [junit4:junit4]>at > org.apache.lucene.codecs.memory.DirectPostingsFormat.fieldsProducer(DirectPostingsFormat.java:112) > [junit4:junit4]>at > org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.(PerFieldPostingsFormat.java:186) > [junit4:junit4]>at > org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:250) > [junit4:junit4]>at > org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:107) > [junit4:junit4]>at > org.apache.lucene.index.SegmentReader.(SegmentReader.java:55) > [junit4:junit4]>at > org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:62) > [junit4:junit4]>at > org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:638) > [junit4:junit4]>at > org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52) > [junit4:junit4]>at > org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:62) > [junit4:junit4]>at > org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:583) > 
[junit4:junit4]>at > org.apache.lucene.index.TestIndexWriterDelete.testIndexingThenDeleting(TestIndexWriterDelete.java:935) > [junit4:junit4]>at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > [junit4:junit4]>at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > [junit4:junit4]>at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > [junit4:junit4]>at java.lang.reflect.Method.invoke(Method.java:597) > [junit4:junit4]>at > com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1995) > [junit4:junit4]>at > com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) > [junit4:junit4]>at > com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:818) > [junit4:junit4]>at > com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:877) > [junit4:junit4]>at > com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:891) > [junit4:junit4]>at > org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) > [junit4:junit4]>at > org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32) > [junit4:junit4]>at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > [junit4:junit4]>at > com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) > [junit4:junit4]>at > org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) > [junit4:junit4]>at > org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) > [junit4:junit4]>at > org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) > [junit4:junit4]>at > 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) > [junit4:junit4]>at > com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:825) > [junit4:junit4]>at > com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) > [junit4:junit4]> > [junit4:junit4] 2> NOTE: download the large Jenkins line-docs file > by running 'ant get-jenkins-line-docs' in the lucene directory. > [junit4:junit4] 2> NOTE: reproduce with: ant test > -Dtestcase=TestIndexWriterDelete > -Dtests.method=testIndexingThenDeleting -Dtests.seed=F645C683FAA013CE > -Dtests.multiplier=3 -Dtests.nightly=true -Dtests.slow=true > -Dtests.linedocsfile=/var/lib/jenkins/lucene-data/enwiki.random.lines.txt > -Dtests.locale=v
[jira] [Updated] (SOLR-3428) SolrCmdDistributor flushAdds/flushDeletes problems
[ https://issues.apache.org/jira/browse/SOLR-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-3428: -- Fix Version/s: 5.0 4.0 > SolrCmdDistributor flushAdds/flushDeletes problems > -- > > Key: SOLR-3428 > URL: https://issues.apache.org/jira/browse/SOLR-3428 > Project: Solr > Issue Type: Bug > Components: replication (java), SolrCloud, update >Affects Versions: 4.0-ALPHA >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: add, delete, replica, solrcloud, update > Fix For: 4.0, 5.0 > > Original Estimate: 24h > Remaining Estimate: 24h > > A few problems with SolrCmdDistributor.flushAdds/flushDeletes > * If the number of AddRequests/DeleteRequests in alist/dlist is below the limit for a > specific node, the method returns immediately and doesn't flush for subsequent > nodes > * When returning immediately because there are fewer requests than the limit for a > given node, previous nodes that have already been flushed/submitted are > not removed from the adds/deletes maps (causing them to be flushed/submitted > again the next time flushAdds/flushDeletes is executed) > * The idea about just combining params does not work for SEEN_LEADER params > (and probably others as well). Since SEEN_LEADER cannot be expressed (unlike > commitWithin and overwrite) for individual operations in the request, you > need to send two separate submits. One containing requests with > SEEN_LEADER=true and one with SEEN_LEADER=false.
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 14992 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/14992/ All tests passed Build Log: [...truncated 10303 lines...] javadocs-lint: [...truncated 1667 lines...] BUILD FAILED /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/build.xml:47: The following error occurred while executing this line: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build.xml:525: The following error occurred while executing this line: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build.xml:515: exec returned: 7 Total time: 2 minutes 18 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Recording test results Publishing Clover coverage report... No Clover report will be published due to a Build Failure Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3428) SolrCmdDistributor flushAdds/flushDeletes problems
[ https://issues.apache.org/jira/browse/SOLR-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427463#comment-13427463 ] Mark Miller commented on SOLR-3428: --- I've committed the simple fix for the flush issue and added a test. > SolrCmdDistributor flushAdds/flushDeletes problems > -- > > Key: SOLR-3428 > URL: https://issues.apache.org/jira/browse/SOLR-3428 > Project: Solr > Issue Type: Bug > Components: replication (java), SolrCloud, update >Affects Versions: 4.0-ALPHA >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: add, delete, replica, solrcloud, update > Original Estimate: 24h > Remaining Estimate: 24h > > A few problems with SolrCmdDistributor.flushAdds/flushDeletes > * If the number of AddRequests/DeleteRequests in alist/dlist is below the limit for a > specific node, the method returns immediately and doesn't flush for subsequent > nodes > * When returning immediately because there are fewer requests than the limit for a > given node, previous nodes that have already been flushed/submitted are > not removed from the adds/deletes maps (causing them to be flushed/submitted > again the next time flushAdds/flushDeletes is executed) > * The idea about just combining params does not work for SEEN_LEADER params > (and probably others as well). Since SEEN_LEADER cannot be expressed (unlike > commitWithin and overwrite) for individual operations in the request, you > need to send two separate submits. One containing requests with > SEEN_LEADER=true and one with SEEN_LEADER=false.
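The first two bullets of the SOLR-3428 description can be sketched as follows. This is not the SolrCmdDistributor code and not the committed fix; node names, the map layout, and both method names are hypothetical, and "flushing" is reduced to recording which node was submitted. The buggy variant returns on the first node below the limit; the corrected variant skips that node, keeps going, and removes each flushed node from the pending map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlushSketch {

    /** Hypothetical pending requests per node; insertion order is preserved. */
    public static Map<String, List<String>> sample() {
        Map<String, List<String>> pending = new LinkedHashMap<>();
        pending.put("nodeA", new ArrayList<>(Arrays.asList("a1", "a2")));
        pending.put("nodeB", new ArrayList<>(Arrays.asList("b1")));
        pending.put("nodeC", new ArrayList<>(Arrays.asList("c1", "c2")));
        return pending;
    }

    /** Buggy shape: an early return skips later nodes and leaves flushed nodes in the map. */
    public static List<String> flushBuggy(Map<String, List<String>> pending, int limit) {
        List<String> flushed = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : pending.entrySet()) {
            if (e.getValue().size() < limit) {
                return flushed;         // bug: nodeC is never considered, nodeA stays pending
            }
            flushed.add(e.getKey());    // stands in for submitting the batch
        }
        pending.keySet().removeAll(flushed);
        return flushed;
    }

    /** Corrected shape: skip nodes below the limit and remove each flushed node immediately. */
    public static List<String> flushFixed(Map<String, List<String>> pending, int limit) {
        List<String> flushed = new ArrayList<>();
        Iterator<Map.Entry<String, List<String>>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, List<String>> e = it.next();
            if (e.getValue().size() < limit) {
                continue;               // leave this node for a later flush
            }
            flushed.add(e.getKey());
            it.remove();                // cannot be re-submitted on the next call
        }
        return flushed;
    }

    public static void main(String[] args) {
        Map<String, List<String>> a = sample();
        System.out.println("buggy flushed " + flushBuggy(a, 2) + ", still pending " + a.keySet());
        Map<String, List<String>> b = sample();
        System.out.println("fixed flushed " + flushFixed(b, 2) + ", still pending " + b.keySet());
    }
}
```

The SEEN_LEADER point in the third bullet is a separate concern (splitting one submit into two) and is deliberately not modeled here.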
[jira] [Commented] (LUCENE-4285) Improve FST API usability for mere mortals
[ https://issues.apache.org/jira/browse/LUCENE-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427435#comment-13427435 ] David Smiley commented on LUCENE-4285: -- Keep in mind, from an FST outsider like me, *FSTs are basically a fancy SortedMap*. Yet Lucene's FST API is so complicated that there is a dedicated package of classes, and I need to understand a fair amount of it. I'm not saying the package should go away or that just one class is realistic, just that conceptually for outsiders it can and should be simpler than it is. The Util.get* methods should be instance methods on the FST. I shouldn't need to look at Util, I think. The BytesReader concept is confusing and should be hidden. Outputs... this aspect of the API is over-exposed; maybe it can be hidden more? I know I need to choose an implementation at construction. FSTEnum is pretty cool, and improving it or creating variants of it could help to simplify using the overall API. The FST should have a getter for it. It would be nice if FSTEnum could advance to the next arc by a label (I need this). It would be something like next(int). Can it be improved to the point where, for example, SynonymFilter can use it? It would be nice to reduce the use-cases where users/client-code have to see an Arc at all. > Improve FST API usability for mere mortals > -- > > Key: LUCENE-4285 > URL: https://issues.apache.org/jira/browse/LUCENE-4285 > Project: Lucene - Core > Issue Type: Improvement > Components: core/FSTs >Reporter: David Smiley > > FST technology is something that has brought amazing advances to Lucene, yet > the API is hard to use for the vast majority of users like me. I know that > performance of FSTs is really important, but surely a lot can be done without > sacrificing that. > (comments will hold specific ideas and problems)
Re: Outstanding issues for 3.0.3
If you do rename stuff other than poorly named signature parameters, it's helpful to document the Java version's name and the reason for the rename. Even for Java, some of the internal naming makes the code that much harder to understand and follow. On Thu, Aug 2, 2012 at 12:33 PM, Prescott Nasser wrote: > Excellent Idea - I'll do that monday to give you guys the weekend to do > any last minute code cleaning you want. > > > > > Date: Thu, 2 Aug 2012 19:30:02 +0300 > > Subject: Re: Outstanding issues for 3.0.3 > > From: ita...@code972.com > > To: lucene-net-...@lucene.apache.org > > > > Prescott - we could make an RC and push it to Nuget as a PreRelease, to > get > > real feedback. > > > > On Thu, Aug 2, 2012 at 7:13 PM, Prescott Nasser >wrote: > > > > > I don't think we ever fully adopted the style guidelines, probably not > a > > > terrible discussion to have. As for this release, I think that by lazy > > > consensus we should branch the trunk at the end of this weekend (say > > > monday), and begin the process of cutting a release. - my $.02 below > > > > > > > > > > 1) Usage of "this" prefix when not required. > > > > > > > > this.blah = blah; <- required this. > > > > this.aBlah = blah; <- optional this, which Re# doesn't like. > > > > > > > > I'm assuming consistency wins here, and 'this.' stays, but wanted to > > > double check. > > > > > > I'd err on the side of consistency > > > > > > > > > > > > > > 2) Using different conventions for fields and parameters\local vars. > > > > > > > > blah vs. _blah > > > > > > > > > > > Combined with 1, Re# wants (and I'm personally accustomed to): > > > > > > > > _blah = blah; > > > > > > > > > > > > > For private variables _ is ok, for anything else, don't use _ as it's not > > > CLR compliant > > > > > > > > > > However, that seems to violate the adopted style. > > > > > > > > 3) Full qualification of type names. > > > > > > > > Re # wants to remove redundant namespace qualifiers. Leave them or > > > remove them? 
> > > > > > > > > > I try to remove them > > > > > > > 4) Removing unreferenced classes. > > > > > > > > Should I remove non-public unreferenced classes? The ones I've come > > > across so far are private. > > > > > > > > > > I'm not sure I understand - are you saying we have classes that are > never > > > used in random places? If so, I think before removing them we should > have a > > > conversation; what are they, why are they there, etc. - I'm hoping > there > > > aren't too many of these.. > > > > > > > 5) var vs. explicit > > > > > > > > I know this has been brought up before, but not sure of the final > > > disposition. FWIW, I prefer var. > > > > > > > > > > I use var with it's plainly obvious the object var obj = new > MyClass(). I > > > usually use explicit when it's an object returned from some function > that > > > makes it unclear what the return value is: > > > > > > > > > var items = search.GetResults(); > > > > > > vs > > > > > > IList items = search.GetResults(); //prefer > > > > > > > > > > > > > > There are some non-Re# issues I came across as well that look like > > > artifacts of code generation: > > > > > > > > 6) Weird param names. > > > > > > > > Param1 vs. directory > > > > > > > > I assume it's okay to replace 'Param1' with something a descriptive > name > > > like 'directory'. > > > > > > > > > > Weird - I think a rename is OK for this release (Since we're ticking > up a > > > full version number), but I believe changing param names can > potentially > > > break code. That said, I don't really think we need to change the > names and > > > push the 3.0.3 release out, and if it does in fact cause breaking > changes, > > > I'd be a little careful about how we do it going forward to 3.6. > > > > > > > 7) Field names that follow local variable naming conventions. > > > > > > > > Lots of issues related to private vars with names like i, j, k, etc. 
> It > > > feels like the right thing to do is to change the scope so that they go > > > back to being local vars instead of fields. However, this requires a > much > > > more significant refactoring, and I didn't want to assume it was okay > to do > > > that. > > > > > > > > > > I'd avoid this for now - a lot of this is a carry over from the java > > > version and to rename all those, it starts to get a bit confusing if we > > > have to compare java to C# and these are all changed around. > > > > > > > > > > > > > If these questions have already been answered elsewhere and I missed > the > > > documentation/FAQ/developer guide, then I apologize and would > appreciate > > > the links. Alternatively, if someone has a Re# rule config that they > are > > > willing to post somewhere, I would be glad to use it. > > > > > > > > > > I think we talked about Re#'s rules at one point, I'll try to dig that > > > conversation up and see where it landed. It's probably a good idea for > us > > > to build rules though. > > > > > > > - Zack > > > > > > > > > > > > On J
[jira] [Updated] (SOLR-3699) SolrIndexWriter constructor leaks Directory if Exception creating IndexWriterConfig
[ https://issues.apache.org/jira/browse/SOLR-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-3699: --- Fix Version/s: 5.0 4.0 Summary: SolrIndexWriter constructor leaks Directory if Exception creating IndexWriterConfig (was: fix CoreContainerCoreInitFailuresTest directory leak) > SolrIndexWriter constructor leaks Directory if Exception creating > IndexWriterConfig > --- > > Key: SOLR-3699 > URL: https://issues.apache.org/jira/browse/SOLR-3699 > Project: Solr > Issue Type: Bug >Reporter: Robert Muir > Fix For: 4.0, 5.0 > > Attachments: SOLR-3699.patch > > > in LUCENE-4278 i had to add a hack to force SimpleFSDir for > CoreContainerCoreInitFailuresTest, because it doesnt close its Directory on > certain errors. > This might indicate a problem that leaks happen if certain errors happen > (e.g. not handled in finally) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3699) fix CoreContainerCoreInitFailuresTest directory leak
[ https://issues.apache.org/jira/browse/SOLR-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-3699:
---
    Attachment: SOLR-3699.patch

Tracked the problem down to SolrIndexWriter. The attached patch demonstrates it in the simplest use case: a SolrCore that constructs a SolrIndexWriter where the Directory is created fine, but then the IndexWriterConfig has a problem.

Unfortunately there's no clear and easy route to a fix because of how this is all done inline in a call to {{super(...)}}, as noted in the test comments...

{code}
  public void testBogusMergePolicy() throws Exception {
    // Directory is leaked because SolrIndexWriter constructor has inline
    // calls to both DirectoryFactory (which succeeds) and
    // Config.toIndexWriterConfig (which fails) -- but there is nothing to
    // decref the DirectoryFactory when Config throws an Exception
    //
    // Not good to require the caller of "new SolrIndexWriter(...)" to decref
    // the DirectoryFactory on exception, because they would have to be sure
    // the exception didn't already come from the DirectoryFactory in the first place.
    // I think we need to re-work the inline calls in SolrIndexWriter construct
{code}

(Ironically: this "bad-mp-config.xml" I was using in CoreContainerCoreInitFailuresTest has existed for a while, but wasn't already being used in the "TestBadConfig" class that tries to create SolrCores with bad configs -- if it had, we would have caught this a long time ago. It was only being used in SolrIndexConfigTest, where it was micro-testing the SolrIndexConfig and the DirectoryFactory wasn't used.)

> fix CoreContainerCoreInitFailuresTest directory leak
> ---
>
>          Key: SOLR-3699
>          URL: https://issues.apache.org/jira/browse/SOLR-3699
>      Project: Solr
>   Issue Type: Bug
>     Reporter: Robert Muir
>      Fix For: 4.0, 5.0
>
>  Attachments: SOLR-3699.patch
>
>
> in LUCENE-4278 i had to add a hack to force SimpleFSDir for
> CoreContainerCoreInitFailuresTest, because it doesn't close its Directory on
> certain errors.
> This might indicate a problem that leaks happen if certain errors happen
> (e.g. not handled in finally)
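The leak pattern described in this comment can be reproduced without Solr. The following is a minimal sketch under stated assumptions: every name in it (TrackedDir, BaseWriter, LeakyWriter, SafeWriter, ConfigParser) is hypothetical and not taken from the Solr codebase, and the static-factory rework is only one possible shape of a fix, not the change that was eventually committed for SOLR-3699.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a ref-counted Directory; not a Solr or Lucene class. */
final class TrackedDir implements AutoCloseable {
    static final AtomicInteger OPEN = new AtomicInteger();
    TrackedDir() { OPEN.incrementAndGet(); }
    @Override public void close() { OPEN.decrementAndGet(); }
}

/** Hypothetical config step that can fail, like the bad merge policy config. */
final class ConfigParser {
    static String parse(String raw) {
        if (raw == null || raw.isEmpty()) {
            throw new IllegalArgumentException("bad config");
        }
        return raw.trim();
    }
}

/** Stand-in for the superclass whose constructor needs both values. */
class BaseWriter {
    final TrackedDir dir;
    final String config;
    BaseWriter(TrackedDir dir, String config) {
        this.dir = dir;
        this.config = config;
    }
}

/** Leaky shape: both arguments are evaluated inline in the super(...) call. */
final class LeakyWriter extends BaseWriter {
    LeakyWriter(String rawConfig) {
        // Arguments are evaluated left to right: the directory is opened first,
        // and if parse() then throws, nothing is left holding a reference to close it.
        super(new TrackedDir(), ConfigParser.parse(rawConfig));
    }
}

/** One possible rework: acquire in a static factory and release on failure. */
final class SafeWriter extends BaseWriter {
    private SafeWriter(TrackedDir dir, String config) {
        super(dir, config);
    }

    static SafeWriter create(String rawConfig) {
        TrackedDir dir = new TrackedDir();
        try {
            return new SafeWriter(dir, ConfigParser.parse(rawConfig));
        } catch (RuntimeException e) {
            dir.close(); // the directory was opened here, so it is released here
            throw e;
        }
    }
}
```

The factory knows that the directory was opened successfully before the config step ran, which is exactly the knowledge a caller of the leaky constructor lacks.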
[jira] [Created] (LUCENE-4285) Improve FST API usability for mere mortals
David Smiley created LUCENE-4285: Summary: Improve FST API usability for mere mortals Key: LUCENE-4285 URL: https://issues.apache.org/jira/browse/LUCENE-4285 Project: Lucene - Core Issue Type: Improvement Components: core/FSTs Reporter: David Smiley FST technology is something that has brought amazing advances to Lucene, yet the API is hard to use for the vast majority of users like me. I know that performance of FSTs is really important, but surely a lot can be done without sacrificing that. (comments will hold specific ideas and problems)
[jira] [Commented] (LUCENE-4225) New FixedPostingsFormat for less overhead than SepPostingsFormat
[ https://issues.apache.org/jira/browse/LUCENE-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427413#comment-13427413 ] Michael McCandless commented on LUCENE-4225: Thanks Billy, I'll dig... > New FixedPostingsFormat for less overhead than SepPostingsFormat > > > Key: LUCENE-4225 > URL: https://issues.apache.org/jira/browse/LUCENE-4225 > Project: Lucene - Core > Issue Type: Bug >Reporter: Michael McCandless >Assignee: Michael McCandless > Attachments: LUCENE-4225-on-rev-1362013.patch, LUCENE-4225.patch, > LUCENE-4225.patch, LUCENE-4225.patch, LUCENE-4225.patch > > > I've worked out the start at a new postings format that should have > less overhead for fixed-int[] encoders (For,PFor)... using ideas from > the old bulk branch, and new ideas from Robert. > It's only a start: there's no payloads support yet, and I haven't run > Lucene's tests with it, except for one new test I added that tries to > be a thorough PostingsFormat tester (to make it easier to create new > postings formats). It does pass luceneutil's performance test, so > it's at least able to run those queries correctly... > Like Lucene40, it uses two files (though once we add payloads it may > be 3). The .doc file interleaves doc delta and freq blocks, and .pos > has position delta blocks. Unlike sep, blocks are NOT shared across > terms; instead, it uses block encoding if there are enough ints to > encode, else the same Lucene40 vInt format. This means low-freq terms > (< 128 = current default block size) are always vInts, and high-freq > terms will have some number of blocks, with a vInt final block. > Skip points are only recorded at block starts. -- This message is automatically generated by JIRA. 
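The block-versus-vInt split described in the LUCENE-4225 summary can be illustrated with a little arithmetic. This is a sketch only, not code from the patch: the class and method names are hypothetical, the block size of 128 is the default quoted in the issue, and treating an exact multiple of the block size as leaving an empty vInt tail is an assumption. The vInt routine follows the usual 7-bits-per-byte layout with the high bit as a continuation flag.

```java
import java.io.ByteArrayOutputStream;

final class BlockLayoutSketch {
    static final int BLOCK_SIZE = 128; // default block size quoted in the issue

    /** Number of full fixed-size blocks for a term with docFreq postings. */
    static int fullBlocks(int docFreq) {
        return docFreq / BLOCK_SIZE;
    }

    /** Number of trailing postings that fall back to vInt encoding. */
    static int vIntTail(int docFreq) {
        return docFreq % BLOCK_SIZE;
    }

    /** Variable-length int: 7 payload bits per byte, high bit set while more bytes follow. */
    static byte[] vInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }
}
```

Under these assumptions a term with 100 postings uses no blocks and 100 vInts, while a term with 300 postings uses two blocks followed by a 44-entry vInt tail.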
RE: Outstanding issues for 3.0.3
I don't think we ever fully adopted the style guidelines; that's probably not a terrible discussion to have. As for this release, I think that by lazy consensus we should branch the trunk at the end of this weekend (say Monday) and begin the process of cutting a release.

My $.02 below.

> 1) Usage of "this" prefix when not required.
>
> this.blah = blah; <- required this.
> this.aBlah = blah; <- optional this, which Re# doesn't like.
>
> I'm assuming consistency wins here, and 'this.' stays, but wanted to double
> check.

I'd err on the side of consistency.

> 2) Using different conventions for fields and parameters\local vars.
>
> blah vs. _blah
>
> Combined with 1, Re# wants (and I'm personally accustomed to):
>
> _blah = blah;

For private variables _ is OK; for anything else, don't use _ as it's not CLS compliant.

> However, that seems to violate the adopted style.
>
> 3) Full qualification of type names.
>
> Re# wants to remove redundant namespace qualifiers. Leave them or remove
> them?

I try to remove them.

> 4) Removing unreferenced classes.
>
> Should I remove non-public unreferenced classes? The ones I've come across so
> far are private.

I'm not sure I understand - are you saying we have classes that are never used in random places? If so, I think before removing them we should have a conversation: what are they, why are they there, etc. I'm hoping there aren't too many of these.

> 5) var vs. explicit
>
> I know this has been brought up before, but not sure of the final
> disposition. FWIW, I prefer var.

I use var when the type is plainly obvious, e.g. var obj = new MyClass(). I usually use an explicit type when the object is returned from some function and it's unclear what the return value is:

    var items = search.GetResults();

vs

    IList items = search.GetResults(); //prefer

> There are some non-Re# issues I came across as well that look like artifacts
> of code generation:
>
> 6) Weird param names.
>
> Param1 vs. directory
>
> I assume it's okay to replace 'Param1' with a descriptive name like
> 'directory'.

Weird - I think a rename is OK for this release (since we're ticking up a full version number), but I believe changing param names can potentially break code. That said, I don't really think we need to change the names and push the 3.0.3 release out, and if it does in fact cause breaking changes, I'd be a little careful about how we do it going forward to 3.6.

> 7) Field names that follow local variable naming conventions.
>
> Lots of issues related to private vars with names like i, j, k, etc. It feels
> like the right thing to do is to change the scope so that they go back to
> being local vars instead of fields. However, this requires a much more
> significant refactoring, and I didn't want to assume it was okay to do that.

I'd avoid this for now - a lot of this is a carry-over from the Java version, and if we rename all of those it gets confusing when we have to compare the Java and C# code.

> If these questions have already been answered elsewhere and I missed the
> documentation/FAQ/developer guide, then I apologize and would appreciate the
> links. Alternatively, if someone has a Re# rule config that they are willing
> to post somewhere, I would be glad to use it.

I think we talked about Re#'s rules at one point; I'll try to dig that conversation up and see where it landed. It's probably a good idea for us to build rules, though.

> - Zack
>
>
> On Jul 27, 2012, at 12:00 PM, Itamar Syn-Hershko wrote:
>
> > The cleanup consists mainly of going file by file with ReSharper and trying
> > to get them as green as possible. Making a lot of fields readonly, removing
> > unused vars and stuff like that. There are still loads of files left.
> >
> > I was also hoping to get to updating the spatial module with some recent
> > updates, and to also support polygon searches. But that may take a bit more
> > time, so it's really up to you guys (or we can open a vote for it).
[jira] [Updated] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches
[ https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Harwood updated LUCENE-4069: - Fix Version/s: 5.0 Applied to trunk in revision 1368567 > Segment-level Bloom filters for a 2 x speed up on rare term searches > > > Key: LUCENE-4069 > URL: https://issues.apache.org/jira/browse/LUCENE-4069 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Affects Versions: 3.6, 4.0-ALPHA >Reporter: Mark Harwood >Assignee: Mark Harwood >Priority: Minor > Fix For: 4.0, 5.0 > > Attachments: 4069Failure.zip, BloomFilterPostingsBranch4x.patch, > LUCENE-4069-tryDeleteDocument.patch, LUCENE-4203.patch, > MHBloomFilterOn3.6Branch.patch, PKLookupUpdatePerfTest.java, > PKLookupUpdatePerfTest.java, PKLookupUpdatePerfTest.java, > PKLookupUpdatePerfTest.java, PrimaryKeyPerfTest40.java > > > An addition to each segment which stores a Bloom filter for selected fields > in order to give fast-fail to term searches, helping avoid wasted disk access. > Best suited for low-frequency fields e.g. primary keys on big indexes with > many segments but also speeds up general searching in my tests. > Overview slideshow here: > http://www.slideshare.net/MarkHarwood/lucene-bloomfilteredsegments > Benchmarks based on Wikipedia content here: http://goo.gl/X7QqU > Patch based on 3.6 codebase attached. > There are no 3.6 API changes currently - to play just add a field with "_blm" > on the end of the name to invoke special indexing/querying capability. > Clearly a new Field or schema declaration(!) would need adding to APIs to > configure the service properly. > Also, a patch for Lucene4.0 codebase introducing a new PostingsFormat -- This message is automatically generated by JIRA. 
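The fast-fail idea behind LUCENE-4069 can be sketched with a small, generic Bloom filter. This is not the implementation from the patch: the class name, the FNV-1a-style hashing, and the double-hashing scheme are all choices made here for illustration. The property that matters for a per-segment lookup is that a negative answer is definitive, so the segment's term dictionary need not be touched, while a positive answer only means "maybe".

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

final class TermBloomSketch {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    TermBloomSketch(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    /** Records a term (for example a primary key) written to this segment. */
    void add(String term) {
        for (int i = 0; i < hashes; i++) {
            bits.set(index(term, i));
        }
    }

    /** false means the term is definitely absent; true means it may be present. */
    boolean mightContain(String term) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(index(term, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: derive the i-th probe from two base hashes.
    private int index(String term, int i) {
        byte[] b = term.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(b, 0x811C9DC5);
        int h2 = fnv1a(b, 0x01000193) | 1; // force an odd step
        return Math.floorMod(h1 + i * h2, size);
    }

    // FNV-1a style mixing with a caller-supplied seed (illustrative, not tuned).
    private static int fnv1a(byte[] data, int seed) {
        int h = seed;
        for (byte x : data) {
            h ^= (x & 0xFF);
            h *= 16777619;
        }
        return h;
    }
}
```

A searcher would consult `mightContain` for each segment before doing the real term lookup and skip the segments that answer false.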
[jira] [Created] (SOLR-3702) String concatenation function
Ted Strauss created SOLR-3702: - Summary: String concatenation function Key: SOLR-3702 URL: https://issues.apache.org/jira/browse/SOLR-3702 Project: Solr Issue Type: New Feature Components: query parsers Affects Versions: 4.0-ALPHA Reporter: Ted Strauss Related to https://issues.apache.org/jira/browse/SOLR-2526 Add query function to support concatenation of Strings.
[JENKINS] Lucene-Solr-4.x-Windows (64bit/jdk1.6.0_33) - Build # 58 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows/58/ Java: 64bit/jdk1.6.0_33 -XX:+UseConcMarkSweepGC All tests passed Build Log: [...truncated 19185 lines...] javadocs-lint: [...truncated 1728 lines...] javadocs-lint: [exec] [exec] Crawl/parse... [exec] [exec] build/docs/core/org/apache/lucene/store/package-use.html [exec] WARNING: anchor "../../../../org/apache/lucene/store/subclasses" appears more than once [exec] [exec] Verify... [exec] [exec] build/docs\core/overview-summary.html [exec] missing: org.apache.lucene.util.hash [exec] [exec] build/docs\test-framework/overview-summary.html [exec] missing: org.apache.lucene.codecs.bloom [exec] [exec] Missing javadocs were found! BUILD FAILED C:\Jenkins\workspace\Lucene-Solr-4.x-Windows\build.xml:47: The following error occurred while executing this line: C:\Jenkins\workspace\Lucene-Solr-4.x-Windows\lucene\build.xml:246: The following error occurred while executing this line: C:\Jenkins\workspace\Lucene-Solr-4.x-Windows\lucene\common-build.xml:1704: exec returned: 1 Total time: 51 minutes 24 seconds Build step 'Invoke Ant' marked build as failure Recording test results Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Outstanding issues for 3.0.3
Actually that's a good point. I don't think Mercurial is an option for Apache software projects, but I know git was rolled out over the last year as an option.

> Subject: Re: Outstanding issues for 3.0.3
> From: zgram...@gmail.com
> Date: Thu, 2 Aug 2012 10:42:14 -0400
> To: lucene-net-...@lucene.apache.org
>
> On Aug 2, 2012, at 3:04 AM, Itamar Syn-Hershko wrote:
>
> > Nowadays git works just great for Windows, and it's much easier to work
> > with than Hg
>
> In the interest of full disclosure, I have done a lot of work on hosting
> Mercurial in C# apps and have committed to both Mercurial and IronPython, so
> one might guess, I view hg > git. I didn't realize the Apache Foundation
> already had its own git server + github mirror, though. If the choice is
> between git and svn, git wins my vote every time.
[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427360#comment-13427360 ]

Robert Muir commented on LUCENE-4282:
---

I will think about this one more: the patch is correct for 'edits', but the scoring becomes crazy. This is because of the historical behavior of this query. Just try porting Uwe's test to 3.6 and you will see what I mean :)

I think it's too tricky for the query in core (and used by the spellchecker) to also be the base for the SlowFuzzyQuery, which is supposed to mimic the old crazy behavior.

> Automaton Fuzzy Query doesn't deliver all results
> ---
>
>                 Key: LUCENE-4282
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4282
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 4.0-ALPHA
>            Reporter: Johannes Christen
>            Assignee: Robert Muir
>              Labels: newbie
>         Attachments: LUCENE-4282-tests.patch, LUCENE-4282.patch,
> ModifiedFuzzyTermsEnum.java, ModifiedFuzzyTermsEnum.java
>
>
> Having a small index with n documents where each document has one of the
> following terms:
> WEBER, WEBE, WEB, WBR, WE, (and some more)
> The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected
> terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR
> which have an edit distance of 2 as well are missing.
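To make the expectation in the report concrete, the edit distances can be checked by hand or with a few lines of code. The sketch below assumes the query term is WEBER (the report does not state it explicitly) and uses plain Levenshtein distance without transpositions, which is a simplification chosen here and is enough for these terms. The class name is hypothetical.

```java
final class EditDistanceSketch {
    /** Classic two-row dynamic-programming Levenshtein distance. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        for (String term : new String[] {"WEBER", "WEBE", "WEB", "WBR", "WE"}) {
            System.out.println(term + " -> " + levenshtein("WEBER", term));
        }
    }
}
```

With WEBER as the query, WEBE is one edit away, WEB and WBR are two edits away, and WE is three, so a maxEdits=2 query would be expected to match the first four indexed terms but not WE.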
[jira] [Closed] (LUCENE-4284) RFE: stopword filter without lowercase side-effect
[ https://issues.apache.org/jira/browse/LUCENE-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Halliday closed LUCENE-4284. Resolution: Invalid > RFE: stopword filter without lowercase side-effect > -- > > Key: LUCENE-4284 > URL: https://issues.apache.org/jira/browse/LUCENE-4284 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Sam Halliday >Priority: Minor > > It would appear that accept()-time lowercasing of Tokens is not favourable > anymore, due to the @Deprecation of the only constructor in StopFilter that > allows this. > Please support some way to allow stop-word removal without lowercasing the > output: > http://stackoverflow.com/questions/1185
[jira] [Commented] (LUCENE-4284) RFE: stopword filter without lowercase side-effect
[ https://issues.apache.org/jira/browse/LUCENE-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427353#comment-13427353 ]

Sam Halliday commented on LUCENE-4284:
---

OK, thanks. Actually all I needed was to remove stop words from a String, so the following did the trick:

{noformat}
Set<?> stops = StopFilter.makeStopSet(Version.LUCENE_36,
    Lists.newArrayList(StopAnalyzer.ENGLISH_STOP_WORDS_SET), true);
Tokenizer tokeniser = new ClassicTokenizer(Version.LUCENE_36, new StringReader(text));
StopFilter stopFilter = new StopFilter(Version.LUCENE_36, tokeniser, stops);
List<String> words = Lists.newArrayList();
try {
  while (stopFilter.incrementToken()) {
    // the term text keeps its original case; only the stop-set lookup ignores case
    String token = stopFilter.getAttribute(CharTermAttribute.class).toString();
    words.add(token);
  }
} catch (IOException ex) {
  throw new GuruMeditationFailure();
}
{noformat}

The API is a bit of a labyrinth - it'll take me some time to understand many of the design decisions.

> RFE: stopword filter without lowercase side-effect
> ---
>
>                 Key: LUCENE-4284
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4284
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Sam Halliday
>            Priority: Minor
>
> It would appear that accept()-time lowercasing of Tokens is not favourable
> anymore, due to the @Deprecation of the only constructor in StopFilter that
> allows this.
> Please support some way to allow stop-word removal without lowercasing the
> output:
> http://stackoverflow.com/questions/1185
Re: Outstanding issues for 3.0.3
On Aug 2, 2012, at 3:04 AM, Itamar Syn-Hershko wrote:

> Nowadays git works just great for Windows, and it's much easier to work
> with than Hg

In the interest of full disclosure, I have done a lot of work on hosting Mercurial in C# apps and have committed to both Mercurial and IronPython, so one might guess, I view hg > git. I didn't realize the Apache Foundation already had its own git server + github mirror, though. If the choice is between git and svn, git wins my vote every time.
[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.6.0_33) - Build # 107 - Still Failing!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux/107/ Java: 32bit/jdk1.6.0_33 -client -XX:+UseConcMarkSweepGC All tests passed Build Log: [...truncated 10288 lines...] javadocs-lint: [...truncated 1696 lines...] javadocs-lint: [exec] [exec] Crawl/parse... [exec] [exec] build/docs/core/org/apache/lucene/store/package-use.html [exec] WARNING: anchor "../../../../org/apache/lucene/store/subclasses" appears more than once [exec] [exec] Verify... [exec] [exec] build/docs/core/overview-summary.html [exec] missing: org.apache.lucene.util.hash [exec] [exec] build/docs/test-framework/overview-summary.html [exec] missing: org.apache.lucene.codecs.bloom [exec] [exec] Missing javadocs were found! BUILD FAILED /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/checkout/build.xml:47: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/checkout/lucene/build.xml:246: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/checkout/lucene/common-build.xml:1704: exec returned: 1 Total time: 2 minutes 23 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Recording test results Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4284) RFE: stopword filter without lowercase side-effect
[ https://issues.apache.org/jira/browse/LUCENE-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427350#comment-13427350 ]

Robert Muir commented on LUCENE-4284:
---

Really all these analyzers are just simple examples and not intended to solve all use cases.

You can just make your own that doesn't lowercase at all with hardly any code, and if you want to control case sensitivity of the stopword set, again do this on your stopset itself (pass the boolean to StopFilter.makeStopSet etc).

{noformat}
Analyzer a = new ReusableAnalyzerBase() {
  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    // no LowerCaseFilter in the chain, so token case is left untouched
    Tokenizer source = new LetterTokenizer(matchVersion, reader);
    return new TokenStreamComponents(source, new StopFilter(matchVersion, source, stopwords));
  }
};
{noformat}

Otherwise we have to implement options to all Analyzers for everyone's possible use cases, which is too many (we will never make everyone happy).

> RFE: stopword filter without lowercase side-effect
> ---
>
>                 Key: LUCENE-4284
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4284
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Sam Halliday
>            Priority: Minor
>
> It would appear that accept()-time lowercasing of Tokens is not favourable
> anymore, due to the @Deprecation of the only constructor in StopFilter that
> allows this.
> Please support some way to allow stop-word removal without lowercasing the
> output:
> http://stackoverflow.com/questions/1185
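The point made in this comment, that case sensitivity belongs to the stop set lookup rather than to the tokens themselves, can be shown without any Lucene classes. The following is a plain-Java sketch with hypothetical names (StopWordSketch, removeStopWords); it is not Lucene's StopFilter and makes no claim about how that class is implemented.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class StopWordSketch {
    /**
     * Drops stop words from tokens. When ignoreCase is true the comparison is
     * case-insensitive, but surviving tokens are returned exactly as they came in.
     */
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords, boolean ignoreCase) {
        Set<String> lookup = new HashSet<>();
        for (String w : stopWords) {
            lookup.add(ignoreCase ? w.toLowerCase(Locale.ROOT) : w);
        }
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            String key = ignoreCase ? t.toLowerCase(Locale.ROOT) : t;
            if (!lookup.contains(key)) {
                kept.add(t); // original case preserved in the output
            }
        }
        return kept;
    }
}
```

Only the lookup key is lowercased, so the output keeps the input's casing whichever way the flag is set.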
Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/ibm-j9-jdk7) - Build # 67 - Failure!
On Wed, Aug 1, 2012 at 9:17 AM, Robert Muir wrote:
> On Wed, Aug 1, 2012 at 4:35 AM, Uwe Schindler wrote:
>> Hi Robert,
>>
>> I checked Jenkins's settings and the "ulimit -n" is 8192 by default for
>> jenkins. To prevent this problem I raised this to 32768.
>> The thing with IBM J9 is that it has several caches for class files (so
>> compiled class files can be cached in shared memory for parallel JVMs using
>> the same classes), but I assume this needs more file handles.
>>
>
> Right I bumped this on charlie cron too and forgot about it, until i
> installed this IBM jre on this machine.
>
> IBM J9 using a few extra files compared to sun doesn't seem to explain
> to me why SimpleFS/NIOFS use more filehandles than mmap though? And
> that this problem never happens with other JREs
>
> I feel like something might be wrong here.

Rob and I dug a bit on this ... the hard open-file limit was 4096 and the soft limit was 1024, and curiously, it looks like Oracle JVMs "allow" themselves to go up to the hard limit (likely change the soft limit on startup), while the IBM JVM uses the soft limit.

Has anyone heard of JVMs doing this (increasing the soft limit to the hard limit for open files) before...?

I see this curious "-XX:+MaxFDLimit" option here:

    http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html

But it says it's Solaris only ...

If I use ulimit to set the hard and soft limit to 1024 then both Oracle and IBM JVMs fail TestShardSearching with NIOFSDir due to too many open files. But, for some reason if you run the test with MMapDir, far fewer file descriptors are consumed ...

Mike McCandless

http://blog.mikemccandless.com
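One way to see which open-file limit a given JVM is actually running with is to ask the JVM itself. The sketch below assumes an Oracle/OpenJDK-style runtime on a Unix-like system, where the platform MXBean implements com.sun.management.UnixOperatingSystemMXBean; on other runtimes (that interface may be absent or unimplemented, and IBM J9 is not verified here) it simply reports -1. The class name FdLimitSketch is hypothetical. Whether the reported maximum equals the shell's soft limit or the hard limit is exactly the open question in this thread; the code only prints what the running JVM reports.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

final class FdLimitSketch {
    /** Returns {open, max} file descriptor counts, or {-1, -1} if the JVM does not expose them. */
    static long[] fdCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return new long[] { -1L, -1L };
    }

    public static void main(String[] args) {
        long[] counts = fdCounts();
        System.out.println("open file descriptors: " + counts[0]);
        System.out.println("max file descriptors:  " + counts[1]);
    }
}
```

Running it under each JVM after the same `ulimit -n` setting, and comparing the printed maximum with `ulimit -Sn` and `ulimit -Hn` in the launching shell, would show whether a particular JVM raised its own limit at startup.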
[JENKINS] Lucene-Solr-tests-only-4.x - Build # 386 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-4.x/386/ All tests passed Build Log: [...truncated 10352 lines...] javadocs-lint: [...truncated 6 lines...] [javadoc] Generating Javadoc [javadoc] Javadoc execution [javadoc] Loading source files for package org.apache.lucene... [javadoc] Loading source files for package org.apache.lucene.analysis... [javadoc] Loading source files for package org.apache.lucene.analysis.tokenattributes... [javadoc] Loading source files for package org.apache.lucene.codecs... [javadoc] Loading source files for package org.apache.lucene.codecs.appending... [javadoc] Loading source files for package org.apache.lucene.codecs.bloom... [javadoc] Loading source files for package org.apache.lucene.codecs.intblock... [javadoc] Loading source files for package org.apache.lucene.codecs.lucene3x... [javadoc] Loading source files for package org.apache.lucene.codecs.lucene40... [javadoc] Loading source files for package org.apache.lucene.codecs.lucene40.values... [javadoc] Loading source files for package org.apache.lucene.codecs.memory... [javadoc] Loading source files for package org.apache.lucene.codecs.perfield... [javadoc] Loading source files for package org.apache.lucene.codecs.pulsing... [javadoc] Loading source files for package org.apache.lucene.codecs.sep... [javadoc] Loading source files for package org.apache.lucene.codecs.simpletext... [javadoc] Loading source files for package org.apache.lucene.document... [javadoc] Loading source files for package org.apache.lucene.index... [javadoc] Loading source files for package org.apache.lucene.search... [javadoc] Loading source files for package org.apache.lucene.search.payloads... [javadoc] Loading source files for package org.apache.lucene.search.similarities... [javadoc] Loading source files for package org.apache.lucene.search.spans... [javadoc] Loading source files for package org.apache.lucene.store... [javadoc] Loading source files for package org.apache.lucene.util... 
[javadoc] Loading source files for package org.apache.lucene.util.automaton... [javadoc] Loading source files for package org.apache.lucene.util.fst... [javadoc] Loading source files for package org.apache.lucene.util.hash... [javadoc] Loading source files for package org.apache.lucene.util.mutable... [javadoc] Loading source files for package org.apache.lucene.util.packed... [javadoc] Constructing Javadoc information... [javadoc] Standard Doclet version 1.6.0_32 [javadoc] Building tree for all the packages and classes... [javadoc] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-4.x/checkout/lucene/core/src/java/org/apache/lucene/codecs/bloom/BloomFilterFactory.java:61: warning - @return tag has no arguments. [javadoc] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-4.x/checkout/lucene/core/src/java/org/apache/lucene/util/FuzzySet.java:101: warning - @return tag has no arguments. [javadoc] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-4.x/checkout/lucene/core/src/java/org/apache/lucene/util/FuzzySet.java:219: warning - @param argument "bytes" is not a parameter name. [javadoc] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-4.x/checkout/lucene/core/src/java/org/apache/lucene/util/FuzzySet.java:236: warning - @param argument "targetSaturation" is not a parameter name. [javadoc] Building index for all the packages and classes... [javadoc] Building index for all classes... [javadoc] Generating /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-4.x/checkout/lucene/build/docs/core/stylesheet.css... 
[javadoc] 4 warnings BUILD FAILED /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-4.x/checkout/build.xml:47: The following error occurred while executing this line: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-4.x/checkout/lucene/common-build.xml:621: The following error occurred while executing this line: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-4.x/checkout/lucene/core/build.xml:49: The following error occurred while executing this line: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-4.x/checkout/lucene/common-build.xml:1480: Javadocs warnings were found! Total time: 11 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Recording test results Publishing Clover coverage report... No Clover report will be published due to a Build Failure Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4284) RFE: stopword filter without lowercase side-effect
[ https://issues.apache.org/jira/browse/LUCENE-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427329#comment-13427329 ] Sam Halliday commented on LUCENE-4284: -- ok, but wouldn't it then be a good idea to have a StopAnalyzer that didn't enforce lowercase? It seems bizarre that the StopAnalyzer would be tied to the character and case filters. > RFE: stopword filter without lowercase side-effect > -- > > Key: LUCENE-4284 > URL: https://issues.apache.org/jira/browse/LUCENE-4284 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Sam Halliday >Priority: Minor > > It would appear that accept()-time lowercasing of Tokens is not favourable > anymore, due to the @Deprecation of the only constructor in StopFilter that > allows this. > Please support some way to allow stop-word removal without lowercasing the > output: > http://stackoverflow.com/questions/1185 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4281) Delegate to default thread factory in NamedThreadFactory
[ https://issues.apache.org/jira/browse/LUCENE-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427328#comment-13427328 ] Robert Muir commented on LUCENE-4281: - +1 to the patch: the forbidden check is a 2nd priority. it can be a separate .txt file with its own ant fileset. > Delegate to default thread factory in NamedThreadFactory > > > Key: LUCENE-4281 > URL: https://issues.apache.org/jira/browse/LUCENE-4281 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 3.6.1, 4.0, 5.0 >Reporter: Simon Willnauer >Priority: Minor > Fix For: 4.0, 5.0, 3.6.2 > > Attachments: LUCENE-4281.patch > > > currently we state that we yield the same behavior as > Executors#defaultThreadFactory() but this behavior could change over time > even if it is compatible. We should just delegate to the default thread > factory instead of creating the threads ourself. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
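For readers following LUCENE-4281, the delegation idea in the description can be sketched roughly as below. This is not the attached patch: the class name, the prefix handling, and the "-thread-" naming scheme are assumptions made up for illustration. The only point is that the thread itself comes from Executors.defaultThreadFactory() and only its name is changed afterwards.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class for illustration; not the code from LUCENE-4281.patch.
final class DelegatingNamedThreadFactory implements ThreadFactory {
  // The JDK factory decides thread group, daemon flag and priority.
  private final ThreadFactory delegate = Executors.defaultThreadFactory();
  private final AtomicInteger threadNumber = new AtomicInteger(1);
  private final String prefix;

  DelegatingNamedThreadFactory(String prefix) {
    this.prefix = prefix;
  }

  @Override
  public Thread newThread(Runnable r) {
    // Create the thread exactly as the default factory would...
    Thread t = delegate.newThread(r);
    // ...and only override the name (naming scheme is an assumption).
    t.setName(prefix + "-thread-" + threadNumber.getAndIncrement());
    return t;
  }
}
```

With this shape, any future change in what the default factory does is picked up automatically, which seems to be the motivation stated in the issue.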
[jira] [Updated] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4282: Attachment: LUCENE-4282.patch here's a patch, with Uwe's test. The float comparison is wasted cpu for FuzzyQuery, as you already know its accepted by the automaton. But the deprecated SlowFuzzyQuery in sandbox needs this, because it has crazier logic. So it overrides the logic and does the float comparison. We should really remove that one from trunk since its deprecated since 4.x, it will make it easier to clean this up to be much simpler. > Automaton Fuzzy Query doesn't deliver all results > - > > Key: LUCENE-4282 > URL: https://issues.apache.org/jira/browse/LUCENE-4282 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Johannes Christen >Assignee: Robert Muir > Labels: newbie > Attachments: LUCENE-4282-tests.patch, LUCENE-4282.patch, > ModifiedFuzzyTermsEnum.java, ModifiedFuzzyTermsEnum.java > > > Having a small index with n documents where each document has one of the > following terms: > WEBER, WEBE, WEB, WBR, WE, (and some more) > The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected > terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR > which have an edit distance of 2 as well are missing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches
[ https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427322#comment-13427322 ] Mark Harwood commented on LUCENE-4069: -- Will do. > Segment-level Bloom filters for a 2 x speed up on rare term searches > > > Key: LUCENE-4069 > URL: https://issues.apache.org/jira/browse/LUCENE-4069 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Affects Versions: 3.6, 4.0-ALPHA >Reporter: Mark Harwood >Assignee: Mark Harwood >Priority: Minor > Fix For: 4.0 > > Attachments: 4069Failure.zip, BloomFilterPostingsBranch4x.patch, > LUCENE-4069-tryDeleteDocument.patch, LUCENE-4203.patch, > MHBloomFilterOn3.6Branch.patch, PKLookupUpdatePerfTest.java, > PKLookupUpdatePerfTest.java, PKLookupUpdatePerfTest.java, > PKLookupUpdatePerfTest.java, PrimaryKeyPerfTest40.java > > > An addition to each segment which stores a Bloom filter for selected fields > in order to give fast-fail to term searches, helping avoid wasted disk access. > Best suited for low-frequency fields e.g. primary keys on big indexes with > many segments but also speeds up general searching in my tests. > Overview slideshow here: > http://www.slideshare.net/MarkHarwood/lucene-bloomfilteredsegments > Benchmarks based on Wikipedia content here: http://goo.gl/X7QqU > Patch based on 3.6 codebase attached. > There are no 3.6 API changes currently - to play just add a field with "_blm" > on the end of the name to invoke special indexing/querying capability. > Clearly a new Field or schema declaration(!) would need adding to APIs to > configure the service properly. > Also, a patch for Lucene4.0 codebase introducing a new PostingsFormat -- This message is automatically generated by JIRA. 
[jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches
[ https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427318#comment-13427318 ] Adrien Grand commented on LUCENE-4069: -- Mark, is there a reason why this patch hasn't been committed to trunk too? > Segment-level Bloom filters for a 2 x speed up on rare term searches > > > Key: LUCENE-4069 > URL: https://issues.apache.org/jira/browse/LUCENE-4069 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Affects Versions: 3.6, 4.0-ALPHA >Reporter: Mark Harwood >Assignee: Mark Harwood >Priority: Minor > Fix For: 4.0 > > Attachments: 4069Failure.zip, BloomFilterPostingsBranch4x.patch, > LUCENE-4069-tryDeleteDocument.patch, LUCENE-4203.patch, > MHBloomFilterOn3.6Branch.patch, PKLookupUpdatePerfTest.java, > PKLookupUpdatePerfTest.java, PKLookupUpdatePerfTest.java, > PKLookupUpdatePerfTest.java, PrimaryKeyPerfTest40.java > > > An addition to each segment which stores a Bloom filter for selected fields > in order to give fast-fail to term searches, helping avoid wasted disk access. > Best suited for low-frequency fields e.g. primary keys on big indexes with > many segments but also speeds up general searching in my tests. > Overview slideshow here: > http://www.slideshare.net/MarkHarwood/lucene-bloomfilteredsegments > Benchmarks based on Wikipedia content here: http://goo.gl/X7QqU > Patch based on 3.6 codebase attached. > There are no 3.6 API changes currently - to play just add a field with "_blm" > on the end of the name to invoke special indexing/querying capability. > Clearly a new Field or schema declaration(!) would need adding to APIs to > configure the service properly. > Also, a patch for Lucene4.0 codebase introducing a new PostingsFormat -- This message is automatically generated by JIRA. 
[jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches
[ https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427317#comment-13427317 ] Robert Muir commented on LUCENE-4069: - Hi Mark: I noticed this was committed only to the 4.x branch. can you also merge the change to trunk? > Segment-level Bloom filters for a 2 x speed up on rare term searches > > > Key: LUCENE-4069 > URL: https://issues.apache.org/jira/browse/LUCENE-4069 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Affects Versions: 3.6, 4.0-ALPHA >Reporter: Mark Harwood >Assignee: Mark Harwood >Priority: Minor > Fix For: 4.0 > > Attachments: 4069Failure.zip, BloomFilterPostingsBranch4x.patch, > LUCENE-4069-tryDeleteDocument.patch, LUCENE-4203.patch, > MHBloomFilterOn3.6Branch.patch, PKLookupUpdatePerfTest.java, > PKLookupUpdatePerfTest.java, PKLookupUpdatePerfTest.java, > PKLookupUpdatePerfTest.java, PrimaryKeyPerfTest40.java > > > An addition to each segment which stores a Bloom filter for selected fields > in order to give fast-fail to term searches, helping avoid wasted disk access. > Best suited for low-frequency fields e.g. primary keys on big indexes with > many segments but also speeds up general searching in my tests. > Overview slideshow here: > http://www.slideshare.net/MarkHarwood/lucene-bloomfilteredsegments > Benchmarks based on Wikipedia content here: http://goo.gl/X7QqU > Patch based on 3.6 codebase attached. > There are no 3.6 API changes currently - to play just add a field with "_blm" > on the end of the name to invoke special indexing/querying capability. > Clearly a new Field or schema declaration(!) would need adding to APIs to > configure the service properly. > Also, a patch for Lucene4.0 codebase introducing a new PostingsFormat -- This message is automatically generated by JIRA. 
[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.6.0_33) - Build # 106 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux/106/ Java: 32bit/jdk1.6.0_33 -server -XX:+UseConcMarkSweepGC All tests passed Build Log: [...truncated 10304 lines...] javadocs-lint: [...truncated 6 lines...] [javadoc] Generating Javadoc [javadoc] Javadoc execution [javadoc] Loading source files for package org.apache.lucene... [javadoc] Loading source files for package org.apache.lucene.analysis... [javadoc] Loading source files for package org.apache.lucene.analysis.tokenattributes... [javadoc] Loading source files for package org.apache.lucene.codecs... [javadoc] Loading source files for package org.apache.lucene.codecs.appending... [javadoc] Loading source files for package org.apache.lucene.codecs.bloom... [javadoc] Loading source files for package org.apache.lucene.codecs.intblock... [javadoc] Loading source files for package org.apache.lucene.codecs.lucene3x... [javadoc] Loading source files for package org.apache.lucene.codecs.lucene40... [javadoc] Loading source files for package org.apache.lucene.codecs.lucene40.values... [javadoc] Loading source files for package org.apache.lucene.codecs.memory... [javadoc] Loading source files for package org.apache.lucene.codecs.perfield... [javadoc] Loading source files for package org.apache.lucene.codecs.pulsing... [javadoc] Loading source files for package org.apache.lucene.codecs.sep... [javadoc] Loading source files for package org.apache.lucene.codecs.simpletext... [javadoc] Loading source files for package org.apache.lucene.document... [javadoc] Loading source files for package org.apache.lucene.index... [javadoc] Loading source files for package org.apache.lucene.search... [javadoc] Loading source files for package org.apache.lucene.search.payloads... [javadoc] Loading source files for package org.apache.lucene.search.similarities... [javadoc] Loading source files for package org.apache.lucene.search.spans... [javadoc] Loading source files for package org.apache.lucene.store... 
[javadoc] Loading source files for package org.apache.lucene.util... [javadoc] Loading source files for package org.apache.lucene.util.automaton... [javadoc] Loading source files for package org.apache.lucene.util.fst... [javadoc] Loading source files for package org.apache.lucene.util.hash... [javadoc] Loading source files for package org.apache.lucene.util.mutable... [javadoc] Loading source files for package org.apache.lucene.util.packed... [javadoc] Constructing Javadoc information... [javadoc] Standard Doclet version 1.6.0_33 [javadoc] Building tree for all the packages and classes... [javadoc] /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/checkout/lucene/core/src/java/org/apache/lucene/codecs/bloom/BloomFilterFactory.java:61: warning - @return tag has no arguments. [javadoc] /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/checkout/lucene/core/src/java/org/apache/lucene/util/FuzzySet.java:101: warning - @return tag has no arguments. [javadoc] /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/checkout/lucene/core/src/java/org/apache/lucene/util/FuzzySet.java:219: warning - @param argument "bytes" is not a parameter name. [javadoc] /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/checkout/lucene/core/src/java/org/apache/lucene/util/FuzzySet.java:236: warning - @param argument "targetSaturation" is not a parameter name. [javadoc] Building index for all the packages and classes... [javadoc] Building index for all classes... [javadoc] Generating /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/checkout/lucene/build/docs/core/stylesheet.css... 
[javadoc] 4 warnings BUILD FAILED /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/checkout/build.xml:47: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/checkout/lucene/common-build.xml:621: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/checkout/lucene/core/build.xml:49: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/checkout/lucene/common-build.xml:1480: Javadocs warnings were found! Total time: 7 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Recording test results Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
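The four warnings in the log above are of two kinds: an @return tag with no description, and an @param tag whose name does not match any parameter of the method. The actual sources of BloomFilterFactory and FuzzySet are not shown in this thread, so the method below is purely hypothetical; it only illustrates what tags look like when javadoc accepts them without warnings.

```java
// Hypothetical example; not taken from FuzzySet or BloomFilterFactory.
final class JavadocTagDemo {
  /**
   * Estimates how full a bit set is.
   *
   * @param setBits number of bits currently set (name matches the parameter)
   * @param totalBits total number of bits in the set
   * @return the fraction of bits that are set, between 0 and 1
   */
  static float saturation(int setBits, int totalBits) {
    return (float) setBits / totalBits;
  }
}
```

A bare "@return" line, or "@param bytes" on a method that has no parameter called bytes, would produce the same kind of warning as in the build log.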
Re: ToParentBlockJoinQuery vs filtered search
Hi Mikhail, I'd love to have a look at the patch. I wasn't involved at all with reviewing the work, just pointed you to the relevant issue, and currently quite busy, so on my end, it will take time until I manage to get to reviewing it. Maybe someone else involved with the patch can take this over to speed things up Martijn On 2 August 2012 08:51, Mikhail Khludnev wrote: > Martin, > Half year ago you asked me attach my work to SOLR-3076. From my point of > view the latest patch is considerable for commit. I want to add "override" > support for block indexing, but I'm not really sure that it's needed for > anyone. > > Could you please provide feedback for the latest patch, and/or move it forth > or back? > > Regards > >> by Martijn v Groningen-2 on Feb 06, 2012; 7:57pm >> URL: >> http://lucene.472066.n3.nabble.com/ToParentBlockJoinQuery-vs-filtered-search-tp3717911p3719987.html > >> Hi Mikhail, > >> There is already an issue open for supporting block join in Solr: > https://issues.apache.org/jira/browse/SOLR-3076 > >> Maybe you can attach your work in that issue and we can iterate from >> there. > >> Martijn > > -- > Sincerely yours > Mikhail Khludnev > Tech Lead > Grid Dynamics > > > -- Met vriendelijke groet, Martijn van Groningen - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches
[ https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Harwood resolved LUCENE-4069. -- Resolution: Fixed Assignee: Mark Harwood Committed to 4.0 branch, revision 1368442 > Segment-level Bloom filters for a 2 x speed up on rare term searches > > > Key: LUCENE-4069 > URL: https://issues.apache.org/jira/browse/LUCENE-4069 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Affects Versions: 3.6, 4.0-ALPHA >Reporter: Mark Harwood >Assignee: Mark Harwood >Priority: Minor > Fix For: 4.0 > > Attachments: 4069Failure.zip, BloomFilterPostingsBranch4x.patch, > LUCENE-4069-tryDeleteDocument.patch, LUCENE-4203.patch, > MHBloomFilterOn3.6Branch.patch, PKLookupUpdatePerfTest.java, > PKLookupUpdatePerfTest.java, PKLookupUpdatePerfTest.java, > PKLookupUpdatePerfTest.java, PrimaryKeyPerfTest40.java > > > An addition to each segment which stores a Bloom filter for selected fields > in order to give fast-fail to term searches, helping avoid wasted disk access. > Best suited for low-frequency fields e.g. primary keys on big indexes with > many segments but also speeds up general searching in my tests. > Overview slideshow here: > http://www.slideshare.net/MarkHarwood/lucene-bloomfilteredsegments > Benchmarks based on Wikipedia content here: http://goo.gl/X7QqU > Patch based on 3.6 codebase attached. > There are no 3.6 API changes currently - to play just add a field with "_blm" > on the end of the name to invoke special indexing/querying capability. > Clearly a new Field or schema declaration(!) would need adding to APIs to > configure the service properly. > Also, a patch for Lucene4.0 codebase introducing a new PostingsFormat -- This message is automatically generated by JIRA. 
[jira] [Commented] (LUCENE-4225) New FixedPostingsFormat for less overhead than SepPostingsFormat
[ https://issues.apache.org/jira/browse/LUCENE-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427304#comment-13427304 ] Han Jiang commented on LUCENE-4225: --- Just hit an error on BlockPostingsFormat, this should reproduce in latest branch {noformat} ant test-core -Dtestcase=TestGraphTokenizers -Dtests.method=testDoubleMockGraphTokenFilterRandom -Dtests.seed=1FD78436D5E26B9A -Dtests.postingsformat=Block {noformat} > New FixedPostingsFormat for less overhead than SepPostingsFormat > > > Key: LUCENE-4225 > URL: https://issues.apache.org/jira/browse/LUCENE-4225 > Project: Lucene - Core > Issue Type: Bug >Reporter: Michael McCandless >Assignee: Michael McCandless > Attachments: LUCENE-4225-on-rev-1362013.patch, LUCENE-4225.patch, > LUCENE-4225.patch, LUCENE-4225.patch, LUCENE-4225.patch > > > I've worked out the start at a new postings format that should have > less overhead for fixed-int[] encoders (For,PFor)... using ideas from > the old bulk branch, and new ideas from Robert. > It's only a start: there's no payloads support yet, and I haven't run > Lucene's tests with it, except for one new test I added that tries to > be a thorough PostingsFormat tester (to make it easier to create new > postings formats). It does pass luceneutil's performance test, so > it's at least able to run those queries correctly... > Like Lucene40, it uses two files (though once we add payloads it may > be 3). The .doc file interleaves doc delta and freq blocks, and .pos > has position delta blocks. Unlike sep, blocks are NOT shared across > terms; instead, it uses block encoding if there are enough ints to > encode, else the same Lucene40 vInt format. This means low-freq terms > (< 128 = current default block size) are always vInts, and high-freq > terms will have some number of blocks, with a vInt final block. > Skip points are only recorded at block starts. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
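As a rough illustration of the layout described in the LUCENE-4225 summary (full fixed-size blocks, then a vInt-encoded remainder), the split for a given term can be computed as below. The class and method names are hypothetical, and the treatment of a term whose frequency is an exact multiple of the block size is an assumption; the issue text does not spell that case out.

```java
// Hypothetical helper; names are illustrative and not from the patch.
final class BlockLayoutDemo {
  // "current default block size" according to the issue description.
  static final int BLOCK_SIZE = 128;

  /** Number of full block-encoded chunks for a term with docFreq postings. */
  static int fullBlocks(int docFreq) {
    return docFreq / BLOCK_SIZE;
  }

  /** Number of trailing postings that fall back to vInt encoding. */
  static int vIntTail(int docFreq) {
    return docFreq % BLOCK_SIZE;
  }

  public static void main(String[] args) {
    for (int docFreq : new int[] {5, 127, 128, 300}) {
      System.out.println("docFreq=" + docFreq
          + " blocks=" + fullBlocks(docFreq)
          + " vInts=" + vIntTail(docFreq));
    }
  }
}
```

So a term with 127 postings gets no blocks at all, while a term with 300 postings gets two blocks and 44 vInts, which matches the "low-freq terms are always vInts" remark in the description.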
[jira] [Commented] (LUCENE-4284) RFE: stopword filter without lowercase side-effect
[ https://issues.apache.org/jira/browse/LUCENE-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427301#comment-13427301 ] Robert Muir commented on LUCENE-4284: - The constructor is deprecated because you should set the ignoreCase property on the CharArraySet (the stopwords list itself) that you pass in. This is described in the javadocs, basically stopfilter does not have any case sensitivity options. this is instead controlled in the set (see makeStopSet etc, you can construct a case-sensitive ones) {noformat} * If stopWords is an instance of {@link CharArraySet} (true if * makeStopSet() was used to construct the set) it will be directly used * and ignoreCase will be ignored since CharArraySet * directly controls case sensitivity. * * If stopWords is not an instance of {@link CharArraySet}, * a new CharArraySet will be constructed and ignoreCase will be * used to specify the case sensitivity of that set. * @deprecated Use {@link #StopFilter(Version, TokenStream, Set)} instead {noformat} > RFE: stopword filter without lowercase side-effect > -- > > Key: LUCENE-4284 > URL: https://issues.apache.org/jira/browse/LUCENE-4284 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Sam Halliday >Priority: Minor > > It would appear that accept()-time lowercasing of Tokens is not favourable > anymore, due to the @Deprecation of the only constructor in StopFilter that > allows this. > Please support some way to allow stop-word removal without lowercasing the > output: > http://stackoverflow.com/questions/1185 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
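To make the point in the comment above concrete without depending on a particular Lucene version, here is a plain-JDK sketch of the same design: the filter never changes a token, and case sensitivity is a property of the stop-word set that is passed in. It deliberately does not use StopFilter or CharArraySet; the class and both methods are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Plain-JDK illustration only; this is not the Lucene API.
final class StopWordDemo {
  /** Builds the stop set; case handling is decided here, not in the filter. */
  static Set<String> makeStopSet(boolean ignoreCase, String... words) {
    Set<String> set = ignoreCase
        ? new TreeSet<String>(String.CASE_INSENSITIVE_ORDER)
        : new HashSet<String>();
    set.addAll(Arrays.asList(words));
    return set;
  }

  /** Drops stop words; surviving tokens are passed through unchanged. */
  static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
    List<String> kept = new ArrayList<String>();
    for (String token : tokens) {
      if (!stopWords.contains(token)) {
        kept.add(token); // no lowercasing of the output
      }
    }
    return kept;
  }
}
```

With tokens "The", "Quick", "the", "Fox" and the single stop word "the", a case-sensitive set removes only "the", while a case-insensitive set removes both "The" and "the"; in both cases "Quick" and "Fox" keep their capitals.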
[jira] [Commented] (LUCENE-4280) TestReaderClosed leaks threads
[ https://issues.apache.org/jira/browse/LUCENE-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427296#comment-13427296 ] Robert Muir commented on LUCENE-4280: - Do we know if the problem happens from the method 'test' or from 'testReaderChaining'? here are my notes basically for 'test'. I think we could apply the same logic to 'testReaderChaining', but I want Uwe's opinion: {noformat} @@ -65,6 +66,17 @@ searcher.search(query, 5); } catch (AlreadyClosedException ace) { // expected +} finally { + // we may have wrapped the reader1 in newSearcher, meaning we created reader2(reader1) + // but we only closed the inner reader1, not the reader2 which is the one with the + // close hook to shut down the executor service. + // + // a better general solution is probably to fix LuceneTestCase.newSearcher to add + // the close hook to the underlying reader that was passed in (reader1), however + // if we do that, is this test still just as good? we will get an exception from + // IndexSearcher instead? 
+ IOUtils.close(searcher.getIndexReader()); } {noformat} I think we need Uwe to review :) > TestReaderClosed leaks threads > -- > > Key: LUCENE-4280 > URL: https://issues.apache.org/jira/browse/LUCENE-4280 > Project: Lucene - Core > Issue Type: Bug >Reporter: Dawid Weiss >Assignee: Robert Muir >Priority: Minor > > {code} > -ea > -Dtests.seed=9449688B90185FA5 > -Dtests.iters=100 > {code} > reproduces 100% for me, multiple thread leak out from newSearcher's internal > threadfactory: > {code} > Aug 02, 2012 8:46:05 AM com.carrotsearch.randomizedtesting.ThreadLeakControl > checkThreadLeaks > SEVERE: 6 threads leaked from SUITE scope at > org.apache.lucene.index.TestReaderClosed: >1) Thread[id=13, name=LuceneTestCase-1-thread-1, state=WAITING, > group=TGRP-TestReaderClosed] > at sun.misc.Unsafe.park(Native Method) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at > java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:722) >2) Thread[id=15, name=LuceneTestCase-3-thread-1, state=WAITING, > group=TGRP-TestReaderClosed] > at sun.misc.Unsafe.park(Native Method) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at > java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:722) >3) Thread[id=17, name=LuceneTestCase-5-thread-1, state=WAITING, > group=TGRP-TestReaderClosed] > at sun.misc.Unsafe.park(Native Method) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at > java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:722) >4) Thread[id=18, name=LuceneTestCase-6-thread-1, state=WAITING, > group=TGRP-TestReaderClosed] > at sun.misc.Unsafe.park(Native Method) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at > java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
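The situation described in the notes for LUCENE-4280 (an outer wrapper owns the close hook that shuts down the executor, but only the inner object is closed) can be reproduced in miniature with plain JDK classes. Everything below is a hypothetical stand-in: InnerResource plays the role of reader1, WrappingResource the role of reader2, and neither is Lucene code.

```java
import java.io.Closeable;
import java.util.concurrent.ExecutorService;

// Stand-ins for illustration only; not LuceneTestCase or IndexReader.
final class CloseHookDemo {
  /** Plays the role of reader1: closing it knows nothing about the executor. */
  static final class InnerResource implements Closeable {
    boolean closed;

    @Override
    public void close() {
      closed = true;
    }
  }

  /** Plays the role of reader2: the executor shutdown hangs off this object. */
  static final class WrappingResource implements Closeable {
    private final InnerResource inner;
    private final ExecutorService executor;

    WrappingResource(InnerResource inner, ExecutorService executor) {
      this.inner = inner;
      this.executor = executor;
    }

    @Override
    public void close() {
      inner.close();
      executor.shutdown(); // without this call, pool threads stay parked in take()
    }
  }
}
```

Closing only the inner object leaves the executor running, which is the kind of leak the WAITING threads in the trace point to; closing the wrapper, as the proposed finally block does through searcher.getIndexReader(), also shuts the executor down.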
[jira] [Created] (LUCENE-4284) RFE: stopword filter without lowercase side-effect
Sam Halliday created LUCENE-4284: Summary: RFE: stopword filter without lowercase side-effect Key: LUCENE-4284 URL: https://issues.apache.org/jira/browse/LUCENE-4284 Project: Lucene - Core Issue Type: Improvement Reporter: Sam Halliday Priority: Minor It would appear that accept()-time lowercasing of Tokens is not favourable anymore, due to the @Deprecation of the only constructor in StopFilter that allows this. Please support some way to allow stop-word removal without lowercasing the output: http://stackoverflow.com/questions/1185 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-3527) Optimize ignores maxSegments in distributed environment
[ https://issues.apache.org/jira/browse/SOLR-3527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reassigned SOLR-3527: - Assignee: Mark Miller > Optimize ignores maxSegments in distributed environment > --- > > Key: SOLR-3527 > URL: https://issues.apache.org/jira/browse/SOLR-3527 > Project: Solr > Issue Type: Bug > Components: SearchComponents - other >Affects Versions: 4.0-ALPHA >Reporter: Andy Laird >Assignee: Mark Miller > > Send the following command to a Solr server with many segments in a > multi-shard, multi-server environment: > curl > "http://localhost:8080/solr/update?optimize=true&waitFlush=true&maxSegments=6&distrib=false" > The local server will end up with the number of segments at 6, as requested, > but all other shards in the index will be optimized with maxSegments=1, which > takes far longer to complete. All shards should be optimized to the > requested value of 6. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3985) Refactor support for thread leaks
[ https://issues.apache.org/jira/browse/LUCENE-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427284#comment-13427284 ] Mark Miller commented on LUCENE-3985: - Hopefully I can look at my piece of this today or tomorrow. > Refactor support for thread leaks > - > > Key: LUCENE-3985 > URL: https://issues.apache.org/jira/browse/LUCENE-3985 > Project: Lucene - Core > Issue Type: Improvement > Components: general/test >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Minor > Fix For: 4.0 > > Attachments: LUCENE-3985.patch, LUCENE-3985.patch, LUCENE-3985.patch, > LUCENE-3985.patch > > > This will be duplicated in the runner and in LuceneTestCase; try to > consolidate. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427276#comment-13427276 ] Robert Muir commented on LUCENE-4282: - Johannes: we will have the same scoring when I say 'removing floats', only less code actually (we can remove this entire if, I think). The only floats will be what is put into the boost attribute, but there will be no *comparisons* against floats. The latter is what causes the bug. > Automaton Fuzzy Query doesn't deliver all results > - > > Key: LUCENE-4282 > URL: https://issues.apache.org/jira/browse/LUCENE-4282 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Johannes Christen >Assignee: Robert Muir > Labels: newbie > Attachments: LUCENE-4282-tests.patch, ModifiedFuzzyTermsEnum.java, > ModifiedFuzzyTermsEnum.java > > > Having a small index with n documents where each document has one of the > following terms: > WEBER, WEBE, WEB, WBR, WE, (and some more) > The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected > terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR > which have an edit distance of 2 as well are missing.
[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427269#comment-13427269 ] Johannes Christen commented on LUCENE-4282: --- Hi Robert. Yes this might be right, but I am still using the similarity float based stuff, since 2 edits on a three letter word is much more difference to me than 2 edits on a 10 letter word. If you apply the stuff I sent, it will work for both cases. > Automaton Fuzzy Query doesn't deliver all results > - > > Key: LUCENE-4282 > URL: https://issues.apache.org/jira/browse/LUCENE-4282 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Johannes Christen >Assignee: Robert Muir > Labels: newbie > Attachments: LUCENE-4282-tests.patch, ModifiedFuzzyTermsEnum.java, > ModifiedFuzzyTermsEnum.java > > > Having a small index with n documents where each document has one of the > following terms: > WEBER, WEBE, WEB, WBR, WE, (and some more) > The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected > terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR > which have an edit distance of 2 as well are missing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-4282: -- Attachment: LUCENE-4282-tests.patch Robert Muir: I added my tests as patch. TestFuzzyQuery is currently not the best test we have: All terms there have equal length, this helps here. I added some more terms (longer ones, too), still the 2 shorter ones fail without a fix. I am now away, I hope that helps. > Automaton Fuzzy Query doesn't deliver all results > - > > Key: LUCENE-4282 > URL: https://issues.apache.org/jira/browse/LUCENE-4282 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Johannes Christen >Assignee: Robert Muir > Labels: newbie > Attachments: LUCENE-4282-tests.patch, ModifiedFuzzyTermsEnum.java, > ModifiedFuzzyTermsEnum.java > > > Having a small index with n documents where each document has one of the > following terms: > WEBER, WEBE, WEB, WBR, WE, (and some more) > The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected > terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR > which have an edit distance of 2 as well are missing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427264#comment-13427264 ] Robert Muir commented on LUCENE-4282: - Thanks for reporting and looking into this! I think the bug is just the use of floats at all in this enum.
{noformat}
-if (similarity > minSimilarity) {
+if (ed <= maxEdits) {
   boostAtt.setBoost((similarity - minSimilarity) * scale_factor);
   //System.out.println(" yes");
   return AcceptStatus.YES;
 } else {
+  System.out.println("reject: " + term.utf8ToString());
   return AcceptStatus.NO;
 }
{noformat}
This seems to fix it for me. We should remove all float crap from this enum; we don't need it. Only a slower, deprecated class in the sandbox needs it.
> Automaton Fuzzy Query doesn't deliver all results > - > > Key: LUCENE-4282 > URL: https://issues.apache.org/jira/browse/LUCENE-4282 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Johannes Christen >Assignee: Robert Muir > Labels: newbie > Attachments: ModifiedFuzzyTermsEnum.java, ModifiedFuzzyTermsEnum.java > > > Having a small index with n documents where each document has one of the > following terms: > WEBER, WEBE, WEB, WBR, WE, (and some more) > The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected > terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR > which have an edit distance of 2 as well are missing.
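Editor's sketch, not code from the thread or from Lucene: the difference between the float comparison and the integer comparison can be reproduced in isolation. The class name FuzzyAcceptSketch and both formulas are assumptions made for illustration. The similarity is taken to be normalized by the shorter of the two terms (as Johannes describes for accept()), and the threshold is taken to be derived from maxEdits and the query length. The real FuzzyTermsEnum may differ in detail.

```java
final class FuzzyAcceptSketch {

    // Plain dynamic-programming Levenshtein distance (insert, delete, substitute).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // ASSUMED float-based check: similarity is normalized by the shorter term,
    // the threshold by the query length. Short index terms are penalized.
    static boolean acceptBySimilarity(String query, String term, int maxEdits) {
        float minSimilarity = 1.0f - (float) maxEdits / query.length();
        int ed = editDistance(query, term);
        float similarity = 1.0f - (float) ed / Math.min(query.length(), term.length());
        return similarity > minSimilarity;
    }

    // Integer check in the spirit of the proposed fix: compare edits directly.
    static boolean acceptByEdits(String query, String term, int maxEdits) {
        return editDistance(query, term) <= maxEdits;
    }
}
```

Under these assumptions, a query for WEBER with maxEdits=2 accepts WEB and WBR with the integer check but rejects them with the float check (similarity 1 - 2/3 against a threshold of 1 - 2/5), which matches the symptom in the report.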
[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427262#comment-13427262 ] Uwe Schindler commented on LUCENE-4282: --- Thanks for the help. We are starting to investigate what's wrong! I did another test in parallel:
{code:java}
query.setRewriteMethod(FuzzyQuery.SCORING_BOOLEAN_QUERY_REWRITE);
{code}
With that one it is also failing, so the boost attribute itself is not the problem, because this rewrite method does not use it at all (no PriorityQueue). Also, the automaton is correct: if you pass the terms to the automaton, they all pass:
{code:java}
LevenshteinAutomata builder = new LevenshteinAutomata("EBER", true);
Automaton a = builder.toAutomaton(2);
a = BasicOperations.concatenate(BasicAutomata.makeChar('W'), a);
System.out.println(BasicOperations.run(a, "WBR"));
System.out.println(BasicOperations.run(a, "WEB"));
System.out.println(BasicOperations.run(a, "WEBE"));
System.out.println(BasicOperations.run(a, "WEBER"));
{code}
> Automaton Fuzzy Query doesn't deliver all results > - > > Key: LUCENE-4282 > URL: https://issues.apache.org/jira/browse/LUCENE-4282 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Johannes Christen >Assignee: Robert Muir > Labels: newbie > Attachments: ModifiedFuzzyTermsEnum.java, ModifiedFuzzyTermsEnum.java > > > Having a small index with n documents where each document has one of the > following terms: > WEBER, WEBE, WEB, WBR, WE, (and some more) > The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected > terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR > which have an edit distance of 2 as well are missing.
[jira] [Comment Edited] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427254#comment-13427254 ] Johannes Christen edited comment on LUCENE-4282 at 8/2/12 12:01 PM: Well. I think I found the solution. You were right, Uwe. It happens in the FuzzyTermsEnum:AutomatonFuzzyTermsEnum class. Calculating the similarity in the accept() method is based on the offset of the smallest length of request term and index term. I attached my ModifiedFuzzyTermEnum class, where you can find the modification which makes it work. BTW. There are some more modifications, fixing bugs in calculating the similarity from the edit distance and vice versa. The modification of the boost factor was only necessary for my boolean address search approach and possibly doesn't apply here. The modified bits are marked with USERCODE_BEGIN and USERCODE_END tags. was (Author: superjo): Well. I think I found the solution. You were right, Uwe. It happens in the FuzzyTermsEnum:AutomatonFuzzyTermsEnum class. Calculating the similarity in the accept() method is based on the offset of the smallest length of request term and index term. I will attach my ModifiedFuzzyTermEnum class, where you can find the modification which makes it work. BTW. There are some more modifications, fixing bugs in calculating the similarity from the edit distance and vice versa. The modification of the boost factor was only necessary for my boolean address search approach and possibly doesn't apply here. The modified bits are marked with USERCODE_BEGIN and USERCODE_END tags.
> Automaton Fuzzy Query doesn't deliver all results > - > > Key: LUCENE-4282 > URL: https://issues.apache.org/jira/browse/LUCENE-4282 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Johannes Christen >Assignee: Robert Muir > Labels: newbie > Attachments: ModifiedFuzzyTermsEnum.java, ModifiedFuzzyTermsEnum.java > > > Having a small index with n documents where each document has one of the > following terms: > WEBER, WEBE, WEB, WBR, WE, (and some more) > The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected > terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR > which have an edit distance of 2 as well are missing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Johannes Christen updated LUCENE-4282: -- Comment: was deleted (was: Modification of FuzzyTermsEnum class fixing issue LUCENE-4282) > Automaton Fuzzy Query doesn't deliver all results > - > > Key: LUCENE-4282 > URL: https://issues.apache.org/jira/browse/LUCENE-4282 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Johannes Christen >Assignee: Robert Muir > Labels: newbie > Attachments: ModifiedFuzzyTermsEnum.java, ModifiedFuzzyTermsEnum.java > > > Having a small index with n documents where each document has one of the > following terms: > WEBER, WEBE, WEB, WBR, WE, (and some more) > The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected > terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR > which have an edit distance of 2 as well are missing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Johannes Christen updated LUCENE-4282: -- Attachment: ModifiedFuzzyTermsEnum.java Modification of FuzzyTermsEnum class fixing issue LUCENE-4282 > Automaton Fuzzy Query doesn't deliver all results > - > > Key: LUCENE-4282 > URL: https://issues.apache.org/jira/browse/LUCENE-4282 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Johannes Christen >Assignee: Robert Muir > Labels: newbie > Attachments: ModifiedFuzzyTermsEnum.java, ModifiedFuzzyTermsEnum.java > > > Having a small index with n documents where each document has one of the > following terms: > WEBER, WEBE, WEB, WBR, WE, (and some more) > The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected > terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR > which have an edit distance of 2 as well are missing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427254#comment-13427254 ] Johannes Christen commented on LUCENE-4282: --- Well. I think I found the solution. You were right, Uwe. It happens in the FuzzyTermsEnum:AutomatonFuzzyTermsEnum class. Calculating the similarity in the accept() method is based on the offset of the smallest length of request term and index term. I will attach my ModifiedFuzzyTermEnum class, where you can find the modification which makes it work. BTW. There are some more modifications, fixing bugs in calculating the similarity from the edit distance and vice versa. The modification of the boost factor was only necessary for my boolean address search approach and possibly doesn't apply here. The modified bits are marked with USERCODE_BEGIN and USERCODE_END tags. > Automaton Fuzzy Query doesn't deliver all results > - > > Key: LUCENE-4282 > URL: https://issues.apache.org/jira/browse/LUCENE-4282 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Johannes Christen >Assignee: Robert Muir > Labels: newbie > Attachments: ModifiedFuzzyTermsEnum.java, ModifiedFuzzyTermsEnum.java > > > Having a small index with n documents where each document has one of the > following terms: > WEBER, WEBE, WEB, WBR, WE, (and some more) > The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected > terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR > which have an edit distance of 2 as well are missing.
[jira] [Updated] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Johannes Christen updated LUCENE-4282: -- Attachment: ModifiedFuzzyTermsEnum.java Modification of FuzzyTermEnum class fixing issue LUCENE-4282 > Automaton Fuzzy Query doesn't deliver all results > - > > Key: LUCENE-4282 > URL: https://issues.apache.org/jira/browse/LUCENE-4282 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Johannes Christen >Assignee: Robert Muir > Labels: newbie > Attachments: ModifiedFuzzyTermsEnum.java > > > Having a small index with n documents where each document has one of the > following terms: > WEBER, WEBE, WEB, WBR, WE, (and some more) > The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected > terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR > which have an edit distance of 2 as well are missing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-3701) Solr Spellcheck for words with apostrophe
Shri Kanishka created SOLR-3701: --- Summary: Solr Spellcheck for words with apostrophe Key: SOLR-3701 URL: https://issues.apache.org/jira/browse/SOLR-3701 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 3.5 Environment: All Reporter: Shri Kanishka Solr spellcheck is incorrect for words with an apostrophe. http://10.224.64.10/solr5/select?q=pandora's star &spellcheck=true&spellcheck.collate=true&spellcheck.count=5 The result is - - - 2 6 13 - pandora's sandra spell:pandora's's star textSpell configuration in schema is as below But when the same query is given in the &spellcheck.q parameter, it works: http://10.224.64.10/solr5/select?q=spell:pandora's star&spellcheck=true&spellcheck.collate=true&spellcheck.q=pandora's star
[jira] [Updated] (LUCENE-4283) Support more frequent skip with Block Postings Format
[ https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Han Jiang updated LUCENE-4283: -- Attachment: LUCENE-4283-buggy.patch oh, forgot to revert TestPF > Support more frequent skip with Block Postings Format > - > > Key: LUCENE-4283 > URL: https://issues.apache.org/jira/browse/LUCENE-4283 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Han Jiang >Priority: Minor > Attachments: LUCENE-4283-buggy.patch, LUCENE-4283-buggy.patch > > > This change works on the new bulk branch. > Currently, our BlockPostingsFormat only supports skipInterval==blockSize. > Every time the skipper reaches the last level 0 skip point, we'll have to > decode a whole block to read doc/freq data. Also, a higher level skip list > will be created only for those df>blockSize^k, which means for most terms, > skipping will just be a linear scan. If we increase current blockSize for > better bulk i/o performance, current skip setting will be a bottleneck. > For ForPF, the encoded block can be easily splitted if we set > skipInterval=32*k. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4283) Support more frequent skip with Block Postings Format
[ https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Han Jiang updated LUCENE-4283: -- Attachment: LUCENE-4283-buggy.patch An initial try to support partial decode & skipInterval == 32. Details about the skip format are given in BlockSkipWriter. This patch works against the pfor-3892 branch, at revision 1365112. It passes TestPostingsFormat, but still fails CheckIndex. Mike, these test seeds should fail with the patch.
{noformat}
ant test-core -Dtestcase=TestLongPostings -Dtests.method=testLongPostingsNoPositions -Dtests.seed=EC8F49E9088B926C -Dtests.postingsformat=Block
ant test-core -Dtestcase=TestCustomSearcherSort -Dtests.method=testFieldSortSingleSearcher -Dtests.seed=EC8F49E9088B926C -Dtests.postingsformat=Block
{noformat}
> Support more frequent skip with Block Postings Format > - > > Key: LUCENE-4283 > URL: https://issues.apache.org/jira/browse/LUCENE-4283 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Han Jiang >Priority: Minor > Attachments: LUCENE-4283-buggy.patch > > > This change works on the new bulk branch. > Currently, our BlockPostingsFormat only supports skipInterval==blockSize. > Every time the skipper reaches the last level 0 skip point, we'll have to > decode a whole block to read doc/freq data. Also, a higher level skip list > will be created only for those df>blockSize^k, which means for most terms, > skipping will just be a linear scan. If we increase current blockSize for > better bulk i/o performance, current skip setting will be a bottleneck. > For ForPF, the encoded block can be easily split if we set > skipInterval=32*k.
[jira] [Created] (LUCENE-4283) Support more frequent skip with Block Postings Format
Han Jiang created LUCENE-4283: - Summary: Support more frequent skip with Block Postings Format Key: LUCENE-4283 URL: https://issues.apache.org/jira/browse/LUCENE-4283 Project: Lucene - Core Issue Type: Improvement Reporter: Han Jiang Priority: Minor This change works on the new bulk branch. Currently, our BlockPostingsFormat only supports skipInterval==blockSize. Every time the skipper reaches the last level 0 skip point, we'll have to decode a whole block to read doc/freq data. Also, a higher level skip list will be created only for those df>blockSize^k, which means for most terms, skipping will just be a linear scan. If we increase the current blockSize for better bulk i/o performance, the current skip setting will be a bottleneck. For ForPF, the encoded block can be easily split if we set skipInterval=32*k.
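Editor's sketch of the df>blockSize^k condition quoted in the description, to show why a large skip interval leaves most terms without higher skip levels. The class and method names are hypothetical, and the rule is taken literally from the issue text. The actual skip-list writer on the branch may use a different multiplier or level cap.

```java
final class SkipLevelSketch {

    // Counts the levels k >= 1 for which docFreq > interval^k holds,
    // reading the condition "df > blockSize^k" from the issue literally.
    static int higherSkipLevels(long docFreq, int interval) {
        if (interval < 2) {
            throw new IllegalArgumentException("interval must be at least 2");
        }
        int levels = 0;
        long threshold = interval; // interval^1
        while (docFreq > threshold) {
            levels++;
            if (threshold > Long.MAX_VALUE / interval) {
                break; // the next power would overflow a long
            }
            threshold *= interval;
        }
        return levels;
    }
}
```

With an interval of 128, a term with df=100 gets no higher level at all and a term with df=1000 gets one. With an interval of 32, a term with df=2000 already gets two.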
[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427229#comment-13427229 ] Johannes Christen commented on LUCENE-4282: --- OK. I'll keep digging in the code and come back when I find something. > Automaton Fuzzy Query doesn't deliver all results > - > > Key: LUCENE-4282 > URL: https://issues.apache.org/jira/browse/LUCENE-4282 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Johannes Christen >Assignee: Robert Muir > Labels: newbie > > Having a small index with n documents where each document has one of the > following terms: > WEBER, WEBE, WEB, WBR, WE, (and some more) > The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected > terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR > which have an edit distance of 2 as well are missing.
[jira] [Commented] (LUCENE-4281) Delegate to default thread factory in NamedThreadFactory
[ https://issues.apache.org/jira/browse/LUCENE-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427230#comment-13427230 ] Uwe Schindler commented on LUCENE-4281: --- bq. This will require manual exclusion of that source file once the ban on Executors.defaultThreadFactory() is in Then we need a separate forbiddenApis.txt file... :-) > Delegate to default thread factory in NamedThreadFactory > > > Key: LUCENE-4281 > URL: https://issues.apache.org/jira/browse/LUCENE-4281 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 3.6.1, 4.0, 5.0 >Reporter: Simon Willnauer >Priority: Minor > Fix For: 4.0, 5.0, 3.6.2 > > Attachments: LUCENE-4281.patch > > > currently we state that we yield the same behavior as > Executors#defaultThreadFactory() but this behavior could change over time > even if it is compatible. We should just delegate to the default thread > factory instead of creating the threads ourself. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4281) Delegate to default thread factory in NamedThreadFactory
[ https://issues.apache.org/jira/browse/LUCENE-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427228#comment-13427228 ] Dawid Weiss commented on LUCENE-4281: - This will require manual exclusion of that source file once the ban on Executors.defaultThreadFactory() is in. An alternate route is to change the documentation and not claim compatibility with defaultThreadFactory, instead just say that we create non-daemon threads with NORM_PRIORITY? > Delegate to default thread factory in NamedThreadFactory > > > Key: LUCENE-4281 > URL: https://issues.apache.org/jira/browse/LUCENE-4281 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 3.6.1, 4.0, 5.0 >Reporter: Simon Willnauer >Priority: Minor > Fix For: 4.0, 5.0, 3.6.2 > > Attachments: LUCENE-4281.patch > > > currently we state that we yield the same behavior as > Executors#defaultThreadFactory() but this behavior could change over time > even if it is compatible. We should just delegate to the default thread > factory instead of creating the threads ourself. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4281) Delegate to default thread factory in NamedThreadFactory
[ https://issues.apache.org/jira/browse/LUCENE-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427224#comment-13427224 ] Simon Willnauer commented on LUCENE-4281: - bq. Uh, sorry – I see what you did now. Anything on your mind in particular when you talk about behavioral changes? I don't have anything in mind I just wanna replace logic with already existing logic that is "guaranteed" consistent with the documentation. This won't change anything really. > Delegate to default thread factory in NamedThreadFactory > > > Key: LUCENE-4281 > URL: https://issues.apache.org/jira/browse/LUCENE-4281 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 3.6.1, 4.0, 5.0 >Reporter: Simon Willnauer >Priority: Minor > Fix For: 4.0, 5.0, 3.6.2 > > Attachments: LUCENE-4281.patch > > > currently we state that we yield the same behavior as > Executors#defaultThreadFactory() but this behavior could change over time > even if it is compatible. We should just delegate to the default thread > factory instead of creating the threads ourself. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
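Editor's sketch of the delegation idea discussed in this thread, not the attached LUCENE-4281.patch: thread creation is left to Executors.defaultThreadFactory() and only the name is changed afterwards. The class name DelegatingNamedThreadFactory and the naming pattern are assumptions made for illustration. Lucene's NamedThreadFactory may name its threads differently.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

final class DelegatingNamedThreadFactory implements ThreadFactory {

    // The JDK default factory decides thread group, daemon status and priority.
    private final ThreadFactory delegate = Executors.defaultThreadFactory();
    private final AtomicInteger threadNumber = new AtomicInteger(1);
    private final String prefix;

    DelegatingNamedThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = delegate.newThread(r);
        // Only the name is overridden; everything else stays as the delegate set it.
        t.setName(prefix + "-thread-" + threadNumber.getAndIncrement());
        return t;
    }
}
```

As Dawid points out, a source file like this would need an explicit exclusion once Executors.defaultThreadFactory() is on the forbidden-API list. The alternative he mentions is to keep creating the threads directly and document them as non-daemon threads with NORM_PRIORITY.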
[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427223#comment-13427223 ] Uwe Schindler commented on LUCENE-4282: --- I also added more terms that are in fact *longer* than WEBER (WEBERE and WEBERES), both are returned, only the shorter ones now. WBRE also works. I dont think the automaton is broken, it may be the FuzzyTermsEnum that does some stuff on top of AutomatonTermsEnum. We have to wait for Robert, he might understand whats going on. > Automaton Fuzzy Query doesn't deliver all results > - > > Key: LUCENE-4282 > URL: https://issues.apache.org/jira/browse/LUCENE-4282 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Johannes Christen > Labels: newbie > > Having a small index with n documents where each document has one of the > following terms: > WEBER, WEBE, WEB, WBR, WE, (and some more) > The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected > terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR > which have an edit distance of 2 as well are missing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427223#comment-13427223 ] Uwe Schindler edited comment on LUCENE-4282 at 8/2/12 10:06 AM: I also added more terms that are in fact *longer* than WEBER (WEBERE and WEBERES), both are returned, only the shorter ones not. WBRE also works. I dont think the automaton is broken, it may be the FuzzyTermsEnum that does some stuff on top of AutomatonTermsEnum. We have to wait for Robert, he might understand whats going on. was (Author: thetaphi): I also added more terms that are in fact *longer* than WEBER (WEBERE and WEBERES), both are returned, only the shorter ones now. WBRE also works. I dont think the automaton is broken, it may be the FuzzyTermsEnum that does some stuff on top of AutomatonTermsEnum. We have to wait for Robert, he might understand whats going on. > Automaton Fuzzy Query doesn't deliver all results > - > > Key: LUCENE-4282 > URL: https://issues.apache.org/jira/browse/LUCENE-4282 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Johannes Christen > Labels: newbie > > Having a small index with n documents where each document has one of the > following terms: > WEBER, WEBE, WEB, WBR, WE, (and some more) > The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected > terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR > which have an edit distance of 2 as well are missing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reassigned LUCENE-4282: - Assignee: Robert Muir > Automaton Fuzzy Query doesn't deliver all results > - > > Key: LUCENE-4282 > URL: https://issues.apache.org/jira/browse/LUCENE-4282 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Johannes Christen >Assignee: Robert Muir > Labels: newbie > > Having a small index with n documents where each document has one of the > following terms: > WEBER, WEBE, WEB, WBR, WE, (and some more) > The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected > terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR > which have an edit distance of 2 as well are missing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427220#comment-13427220 ] Johannes Christen commented on LUCENE-4282: --- Yes I tried this as well. Also the prefix is not the problem. I expect the error deep in the automaton. > Automaton Fuzzy Query doesn't deliver all results > - > > Key: LUCENE-4282 > URL: https://issues.apache.org/jira/browse/LUCENE-4282 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Johannes Christen > Labels: newbie > > Having a small index with n documents where each document has one of the > following terms: > WEBER, WEBE, WEB, WBR, WE, (and some more) > The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected > terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR > which have an edit distance of 2 as well are missing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427219#comment-13427219 ] Uwe Schindler commented on LUCENE-4282: --- The same happens if I disable transpositions, so the transposition-supporting automata are not the problem. > Automaton Fuzzy Query doesn't deliver all results > - > > Key: LUCENE-4282 > URL: https://issues.apache.org/jira/browse/LUCENE-4282 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Johannes Christen > Labels: newbie > > Having a small index with n documents where each document has one of the > following terms: > WEBER, WEBE, WEB, WBR, WE, (and some more) > The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected > terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR > which have an edit distance of 2 as well are missing.
[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427204#comment-13427204 ] Uwe Schindler commented on LUCENE-4282: --- There is indeed something strange; I have to wait for Robert to wake up. The following test fails (when added to TestFuzzyQuery.java):
{code:java}
public void test2() throws Exception {
  Directory directory = newDirectory();
  RandomIndexWriter writer = new RandomIndexWriter(random(), directory, new MockAnalyzer(random(), MockTokenizer.KEYWORD, false));
  addDoc("LANGE", writer);
  addDoc("LUETH", writer);
  addDoc("PIRSING", writer);
  addDoc("RIEGEL", writer);
  addDoc("TRZECZIAK", writer);
  addDoc("WALKER", writer);
  addDoc("WBR", writer);
  addDoc("WE", writer);
  addDoc("WEB", writer);
  addDoc("WEBE", writer);
  addDoc("WEBER", writer);
  addDoc("WITTKOPF", writer);
  addDoc("WOJNAROWSKI", writer);
  addDoc("WRICKE", writer);
  IndexReader reader = writer.getReader();
  IndexSearcher searcher = newSearcher(reader);
  writer.close();

  FuzzyQuery query = new FuzzyQuery(new Term("field", "WEBER"), 2, 1);
  ScoreDoc[] hits = searcher.search(query, null, 1000).scoreDocs;
  assertEquals(4, hits.length);
  reader.close();
  directory.close();
}
{code}
The two missing terms each differ from WEBER by 2 deletions, so they are within the edit distance. > Automaton Fuzzy Query doesn't deliver all results > - > > Key: LUCENE-4282 > URL: https://issues.apache.org/jira/browse/LUCENE-4282 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Johannes Christen > Labels: newbie > > Having a small index with n documents where each document has one of the > following terms: > WEBER, WEBE, WEB, WBR, WE, (and some more) > The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected > terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR > which have an edit distance of 2 as well are missing.
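For readers who want to check the expected hit count by hand, here is a minimal sketch of the classic dynamic-programming Levenshtein distance. It is a stand-alone illustration, not Lucene's automaton-based implementation, and the class name EditDistanceSketch is hypothetical. It does not model transpositions or the prefix length passed to FuzzyQuery.

```java
// Hypothetical helper for illustration only; not part of Lucene.
final class EditDistanceSketch {

    // Classic two-row dynamic-programming Levenshtein distance
    // (insertions, deletions, substitutions; no transpositions).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Terms taken from the issue description.
        for (String term : new String[] {"WEBER", "WEBE", "WEB", "WBR", "WE"}) {
            System.out.println(term + " -> " + levenshtein("WEBER", term));
        }
    }
}
```

Under this definition WEB and WBR are both at distance 2 from WEBER, which is consistent with the test above expecting 4 hits for maxEdits=2.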
[jira] [Updated] (LUCENE-4256) Improve Analysis Factory configuration workflow
[ https://issues.apache.org/jira/browse/LUCENE-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male updated LUCENE-4256: --- Attachment: LUCENE-4256-version.patch Going to do this in smaller steps so they are easier to review and be sure about. This patch moves the Version back into the args Map. Once this is committed I'll tackle the constructor stuff. > Improve Analysis Factory configuration workflow > --- > > Key: LUCENE-4256 > URL: https://issues.apache.org/jira/browse/LUCENE-4256 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Reporter: Chris Male > Attachments: LUCENE-4256-further.patch, LUCENE-4256-version.patch, > LUCENE-4256_incomplete.patch > > > With the Factorys now available for more general use, I'd like to look at > ways to improve the configuration workflow. Currently it's a little disjoint > and confusing, especially around using {{inform(ResourceLoader)}}. > What I think we should do is: > - Remove the need for {{ResourceLoaderAware}} and pass in the ResourceLoader > in {{init}}, so it'd become {{init(Map args, ResourceLoader > loader)}} > - Consider moving away from the generic args Map and using setters. This > gives us better typing and could mitigate bugs due to using the wrong > configure key. However it does force the consumer to invoke each setter. > - If we're going to stick with using the args Map, then move the Version > parameter into {{init}} as well, rather than being a setter as I currently > made it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427188#comment-13427188 ] Johannes Christen edited comment on LUCENE-4282 at 8/2/12 9:15 AM: --- Query query = new FuzzyQuery(new Term("NAME", "WEBER"),2,1); Here are all the terms for the field NAME in my index: LANGE LUETH PIRSING RIEGEL TRZECZIAK WALKER WBR WE WEB WEBE WEBER WITTKOPF WOJNAROWSKI WRICKE was (Author: superjo): Query query = new FuzzyQuery(new Term("NAME", "WEBER"),2,1); > Automaton Fuzzy Query doesn't deliver all results > - > > Key: LUCENE-4282 > URL: https://issues.apache.org/jira/browse/LUCENE-4282 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Johannes Christen > Labels: newbie > > Having a small index with n documents where each document has one of the > following terms: > WEBER, WEBE, WEB, WBR, WE, (and some more) > The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected > terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR > which have an edit distance of 2 as well are missing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.6.0_33) - Build # 56 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows/56/ Java: 32bit/jdk1.6.0_33 -client -XX:+UseParallelGC All tests passed Build Log: [...truncated 18997 lines...] javadocs-lint: [...truncated 1670 lines...] BUILD FAILED C:\Jenkins\workspace\Lucene-Solr-trunk-Windows\build.xml:47: The following error occurred while executing this line: C:\Jenkins\workspace\Lucene-Solr-trunk-Windows\lucene\build.xml:525: The following error occurred while executing this line: C:\Jenkins\workspace\Lucene-Solr-trunk-Windows\lucene\build.xml:515: exec returned: 1 Total time: 54 minutes 48 seconds Build step 'Invoke Ant' marked build as failure Recording test results Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427188#comment-13427188 ] Johannes Christen commented on LUCENE-4282: --- Query query = new FuzzyQuery(new Term("NAME", "WEBER"),2,1); > Automaton Fuzzy Query doesn't deliver all results > - > > Key: LUCENE-4282 > URL: https://issues.apache.org/jira/browse/LUCENE-4282 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Johannes Christen > Labels: newbie > > Having a small index with n documents where each document has one of the > following terms: > WEBER, WEBE, WEB, WBR, WE, (and some more) > The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected > terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR > which have an edit distance of 2 as well are missing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427187#comment-13427187 ] Uwe Schindler commented on LUCENE-4282: --- What was your query? > Automaton Fuzzy Query doesn't deliver all results > - > > Key: LUCENE-4282 > URL: https://issues.apache.org/jira/browse/LUCENE-4282 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Johannes Christen > Labels: newbie > > Having a small index with n documents where each document has one of the > following terms: > WEBER, WEBE, WEB, WBR, WE, (and some more) > The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected > terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR > which have an edit distance of 2 as well are missing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427185#comment-13427185 ] Johannes Christen commented on LUCENE-4282: --- Thanks for the quick response, Uwe. I don't think that is the cause. My test index is very small (fewer than 100 terms), so I don't think the terms get dropped. I think they are missed by the automaton. My rewritten query has only 2 terms: NAME:WEBE^0.584 NAME:WEBER > Automaton Fuzzy Query doesn't deliver all results > - > > Key: LUCENE-4282 > URL: https://issues.apache.org/jira/browse/LUCENE-4282 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Johannes Christen > Labels: newbie > > Having a small index with n documents where each document has one of the > following terms: > WEBER, WEBE, WEB, WBR, WE, (and some more) > The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected > terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR > which have an edit distance of 2 as well are missing.
[jira] [Comment Edited] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427184#comment-13427184 ] Uwe Schindler edited comment on LUCENE-4282 at 8/2/12 8:54 AM: --- This is caused by the rewrite method, not FuzzyQuery itself. The rewrite method uses an internal priority queue in which it collects all terms from the index that match the Levenshtein distance. If more terms are available than the queue holds, some are dropped. This depends on their distance and other factors. If you want to use a larger PQ, create a separate instance of TopTermsScoringBooleanQueryRewrite, giving a queue size. was (Author: thetaphi): This is caused by the rewrite method, not FuzzyQuery itself. The rewrite method uses an internal priority queue in which it collects all terms from the index that match the Levenshtein distance. If more terms are available than the queue holds, some are dropped. This depends on their distance and other factors. If you want to use a larger PQ, create a separate instance of the TopTermsRewriteMethod, giving a queue size. > Automaton Fuzzy Query doesn't deliver all results > - > > Key: LUCENE-4282 > URL: https://issues.apache.org/jira/browse/LUCENE-4282 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Johannes Christen > Labels: newbie > > Having a small index with n documents where each document has one of the > following terms: > WEBER, WEBE, WEB, WBR, WE, (and some more) > The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected > terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR > which have an edit distance of 2 as well are missing.
[jira] [Commented] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
[ https://issues.apache.org/jira/browse/LUCENE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427184#comment-13427184 ] Uwe Schindler commented on LUCENE-4282: --- This is caused by the rewrite method, not FuzzyQuery itself. The rewrite method uses an internal priority queue in which it collects all terms from the index that match the Levenshtein distance. If more terms are available than the queue holds, some are dropped. This depends on their distance and other factors. If you want to use a larger PQ, create a separate instance of the TopTermsRewriteMethod, giving a queue size. > Automaton Fuzzy Query doesn't deliver all results > - > > Key: LUCENE-4282 > URL: https://issues.apache.org/jira/browse/LUCENE-4282 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Johannes Christen > Labels: newbie > > Having a small index with n documents where each document has one of the > following terms: > WEBER, WEBE, WEB, WBR, WE, (and some more) > The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected > terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR > which have an edit distance of 2 as well are missing.
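The dropping behaviour described in this comment can be pictured with a size-bounded priority queue. The following is only a sketch using java.util.PriorityQueue; it is not Lucene's rewrite code, the names TopTermsSketch and ScoredTerm are hypothetical, and every boost value except WEBE's 0.584 (quoted elsewhere in this thread) is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration of a bounded "top terms" queue; not Lucene code.
final class TopTermsSketch {

    static final class ScoredTerm {
        final String text;
        final float boost;

        ScoredTerm(String text, float boost) {
            this.text = text;
            this.boost = boost;
        }
    }

    // Keeps at most maxSize terms; when the queue overflows, the term with
    // the lowest boost is dropped. Returns the survivors, best first.
    static List<String> topTerms(List<ScoredTerm> candidates, int maxSize) {
        Comparator<ScoredTerm> byBoost = Comparator.comparingDouble(t -> t.boost);
        PriorityQueue<ScoredTerm> queue = new PriorityQueue<>(byBoost);
        for (ScoredTerm candidate : candidates) {
            queue.add(candidate);
            if (queue.size() > maxSize) {
                queue.poll(); // evict the current weakest term
            }
        }
        List<ScoredTerm> kept = new ArrayList<>(queue);
        kept.sort(byBoost.reversed());
        List<String> result = new ArrayList<>();
        for (ScoredTerm term : kept) {
            result.add(term.text);
        }
        return result;
    }
}
```

With a queue size of 2, only the two highest-boosted terms survive; with a queue larger than the candidate set, nothing is dropped. Whether this mechanism is what actually hides WEB and WBR is exactly what the rest of the thread questions.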
[jira] [Created] (LUCENE-4282) Automaton Fuzzy Query doesn't deliver all results
Johannes Christen created LUCENE-4282: - Summary: Automaton Fuzzy Query doesn't deliver all results Key: LUCENE-4282 URL: https://issues.apache.org/jira/browse/LUCENE-4282 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 4.0-ALPHA Reporter: Johannes Christen Having a small index with n documents where each document has one of the following terms: WEBER, WEBE, WEB, WBR, WE, (and some more) The new FuzzyQuery (Automaton) with maxEdits=2 only delivers the expected terms WEBER and WEBE in the rewritten query. The expected terms WEB and WBR which have an edit distance of 2 as well are missing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
[ https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427179#comment-13427179 ] Gili Nachum commented on LUCENE-2501: - Seeing a similar issue on 3.1.0. Was this ever resolved, or is there a workaround? Stack:
{quote}
0049 SeedlistOpera Failed to process operation ADD
java.lang.ArrayIndexOutOfBoundsException
at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:135)
at org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:502)
at org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:523)
at org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:106)
at org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:126)
at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:479)
at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:169)
at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:248)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:701)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2194)
at ...
{quote}
> ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
>
> Key: LUCENE-2501
> URL: https://issues.apache.org/jira/browse/LUCENE-2501
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Affects Versions: 3.0.1
> Reporter: Tim Smith
>
> I'm seeing the following exception during indexing:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 14
> at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118)
> at org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490)
> at org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511)
> at org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104)
> at org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120)
> at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468)
> at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
> at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246)
> at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774)
> at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757)
> at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085)
> ... 37 more
> {code}
> This seems to be caused by the following code:
> {code}
> final int level = slice[upto] & 15;
> final int newLevel = nextLevelArray[level];
> final int newSize = levelSizeArray[newLevel];
> {code}
> this can result in "level" being a value between 0 and 14
> the array nextLevelArray is only of size 10
> i suspect the solution would be to either max the level to 10, or to add more entries to the nextLevelArray so it has 15 entries
> however, i don't know if something more is going wrong here and this is just where the exception hits from a deeper issue
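To make the reporter's reasoning concrete, here is a small stand-alone sketch of the masking step. It is not the Lucene source: the class SliceLevelSketch is hypothetical, and the contents of the 10-entry table are an assumption chosen only to match the array size mentioned in the report. Note that a mask of 15 keeps the low four bits, so the result can be anywhere in 0..15, while only 0..9 are valid indexes into a 10-entry table.

```java
// Hypothetical illustration of the masking described in the report; not Lucene code.
final class SliceLevelSketch {

    // Assumed 10-entry table (contents are illustrative, the length is from the report).
    static final int[] NEXT_LEVEL = {1, 2, 3, 4, 5, 6, 7, 8, 9, 9};

    // Mirrors "slice[upto] & 15": keeps the low four bits, so the result is 0..15.
    static int levelOf(byte marker) {
        return marker & 15;
    }

    static boolean isValidLevel(int level) {
        return level >= 0 && level < NEXT_LEVEL.length;
    }

    // Fails with a descriptive message instead of an ArrayIndexOutOfBoundsException
    // when the byte being read is not a well-formed level marker.
    static int nextLevel(byte marker) {
        int level = levelOf(marker);
        if (!isValidLevel(level)) {
            throw new IllegalStateException(
                "unexpected slice level " + level + " (table has " + NEXT_LEVEL.length + " entries)");
        }
        return NEXT_LEVEL[level];
    }
}
```

A guard like this would not fix whatever wrote the unexpected byte; it would only turn the symptom into a clearer error, which fits the reporter's suspicion that the exception may be surfacing a deeper issue.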
[jira] [Commented] (LUCENE-3985) Refactor support for thread leaks
[ https://issues.apache.org/jira/browse/LUCENE-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427177#comment-13427177 ] Uwe Schindler commented on LUCENE-3985: --- OK, doesn't matter. I would just prefer to have it merged in, or we should rename the other files, too. > Refactor support for thread leaks > - > > Key: LUCENE-3985 > URL: https://issues.apache.org/jira/browse/LUCENE-3985 > Project: Lucene - Core > Issue Type: Improvement > Components: general/test >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Minor > Fix For: 4.0 > > Attachments: LUCENE-3985.patch, LUCENE-3985.patch, LUCENE-3985.patch, > LUCENE-3985.patch > > > This will be duplicated in the runner and in LuceneTestCase; try to > consolidate.
[jira] [Comment Edited] (LUCENE-3985) Refactor support for thread leaks
[ https://issues.apache.org/jira/browse/LUCENE-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427174#comment-13427174 ] Dawid Weiss edited comment on LUCENE-3985 at 8/2/12 8:34 AM: - It is a separate file, I wanted it to be somewhat explicit. We can merge in later on, not a problem. was (Author: dweiss): I already added it to a patch in LUCENE-3985 and fixed most of the calls there. It is a separate file, I wanted it to be somewhat explicit. > Refactor support for thread leaks > - > > Key: LUCENE-3985 > URL: https://issues.apache.org/jira/browse/LUCENE-3985 > Project: Lucene - Core > Issue Type: Improvement > Components: general/test >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Minor > Fix For: 4.0 > > Attachments: LUCENE-3985.patch, LUCENE-3985.patch, LUCENE-3985.patch, > LUCENE-3985.patch > > > This will be duplicated in the runner and in LuceneTestCase; try to > consolidate. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3985) Refactor support for thread leaks
[ https://issues.apache.org/jira/browse/LUCENE-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427174#comment-13427174 ] Dawid Weiss commented on LUCENE-3985: - I already added it to a patch in LUCENE-3985 and fixed most of the calls there. It is a separate file, I wanted it to be somewhat explicit. > Refactor support for thread leaks > - > > Key: LUCENE-3985 > URL: https://issues.apache.org/jira/browse/LUCENE-3985 > Project: Lucene - Core > Issue Type: Improvement > Components: general/test >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Minor > Fix For: 4.0 > > Attachments: LUCENE-3985.patch, LUCENE-3985.patch, LUCENE-3985.patch, > LUCENE-3985.patch > > > This will be duplicated in the runner and in LuceneTestCase; try to > consolidate. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3985) Refactor support for thread leaks
[ https://issues.apache.org/jira/browse/LUCENE-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427168#comment-13427168 ] Uwe Schindler commented on LUCENE-3985: --- bq. You can add the signatures to the forbidden API list under jdk.txt (with comment) or a new file (but don't forget to place this new signature file in Lucene and Solr's filesets). I think, to not complicate the filesets, we should simply use jdk.txt for this case and not a separate file (all signatures refer to the JDK; otherwise we must rename jdk.txt to defaultCharsJdk.txt or whatever). Just place a comment in the introduction and add the signatures to jdk.txt. The other txt files in banned methods are more for other parts of the Lucene code base (like test-only) or, like commons-io, refer to a Solr-only lib. > Refactor support for thread leaks > - > > Key: LUCENE-3985 > URL: https://issues.apache.org/jira/browse/LUCENE-3985 > Project: Lucene - Core > Issue Type: Improvement > Components: general/test >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Minor > Fix For: 4.0 > > Attachments: LUCENE-3985.patch, LUCENE-3985.patch, LUCENE-3985.patch, > LUCENE-3985.patch > > > This will be duplicated in the runner and in LuceneTestCase; try to > consolidate.
[jira] [Commented] (LUCENE-4281) Delegate to default thread factory in NamedThreadFactory
[ https://issues.apache.org/jira/browse/LUCENE-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427167#comment-13427167 ]

Dawid Weiss commented on LUCENE-4281:
-------------------------------------

I see the default one resets inherited priority and daemon status. Security manager I wouldn't worry about...

{code}
if (t.isDaemon())
    t.setDaemon(false);
if (t.getPriority() != Thread.NORM_PRIORITY)
    t.setPriority(Thread.NORM_PRIORITY);
{code}

> Delegate to default thread factory in NamedThreadFactory
> --------------------------------------------------------
>
> Key: LUCENE-4281
> URL: https://issues.apache.org/jira/browse/LUCENE-4281
> Project: Lucene - Core
> Issue Type: Improvement
> Affects Versions: 3.6.1, 4.0, 5.0
> Reporter: Simon Willnauer
> Priority: Minor
> Fix For: 4.0, 5.0, 3.6.2
> Attachments: LUCENE-4281.patch
>
> Currently we state that we yield the same behavior as Executors#defaultThreadFactory(), but this behavior could change over time even if it is compatible. We should just delegate to the default thread factory instead of creating the threads ourselves.
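For readers following the thread, the delegation idea in the issue description could look roughly like the sketch below. This is only an illustration, not the code from LUCENE-4281.patch: the class name DelegatingNamedThreadFactory, the prefix parameter and the "-thread-N" naming scheme are all hypothetical. The only JDK calls used are Executors.defaultThreadFactory(), ThreadFactory.newThread() and Thread.setName().

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: the class name and naming scheme are assumptions,
// not taken from the attached patch.
final class DelegatingNamedThreadFactory implements ThreadFactory {
    // The JDK default factory decides thread group, daemon status and priority.
    private final ThreadFactory delegate = Executors.defaultThreadFactory();
    private final AtomicInteger threadNumber = new AtomicInteger(1);
    private final String prefix;

    DelegatingNamedThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable r) {
        // Let the default factory create the thread, then only override its name.
        Thread t = delegate.newThread(r);
        t.setName(prefix + "-thread-" + threadNumber.getAndIncrement());
        return t;
    }
}
```

With this shape, the daemon and priority resets quoted in the comment above are inherited from the default factory rather than copied, so a later change in the JDK's behavior would carry over without touching this class.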
[jira] [Commented] (LUCENE-4281) Delegate to default thread factory in NamedThreadFactory
[ https://issues.apache.org/jira/browse/LUCENE-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427165#comment-13427165 ]

Dawid Weiss commented on LUCENE-4281:
-------------------------------------

Uh, sorry -- I see what you did now. Anything on your mind in particular when you talk about behavioral changes?

> Delegate to default thread factory in NamedThreadFactory
> --------------------------------------------------------
>
> Key: LUCENE-4281
> URL: https://issues.apache.org/jira/browse/LUCENE-4281
> Project: Lucene - Core
> Issue Type: Improvement
> Affects Versions: 3.6.1, 4.0, 5.0
> Reporter: Simon Willnauer
> Priority: Minor
> Fix For: 4.0, 5.0, 3.6.2
> Attachments: LUCENE-4281.patch
>
> Currently we state that we yield the same behavior as Executors#defaultThreadFactory(), but this behavior could change over time even if it is compatible. We should just delegate to the default thread factory instead of creating the threads ourselves.