[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163346#comment-13163346 ]

Mark Miller commented on SOLR-2358:
-----------------------------------

I just made it so that version can be specified on deletes in solrxml, and did the work necessary for distrib deletes to work with versioning. You can do delete-by-id now.

> Distributing Indexing
> ---------------------
>
>                 Key: SOLR-2358
>                 URL: https://issues.apache.org/jira/browse/SOLR-2358
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud, update
>            Reporter: William Mayor
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-2358.patch
>
>
> The first steps towards creating distributed indexing functionality in Solr

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3370) Support for a "SpanNotNearQuery"
[ https://issues.apache.org/jira/browse/LUCENE-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163307#comment-13163307 ]

Trejkaz commented on LUCENE-3370:
---------------------------------

Well, I ran with a modified version of SpanNotQuery for some time and nobody noticed any issues with it, but I just found the one thing SpanNotQuery does differently from SpanNearQuery which makes it unsuitable for this task.

With a SpanNearQuery, if you have "cat" in the document only once and you search for span-near("cat", "cat"), you will get no hits: it doesn't regard a term as being "near" itself. However, with a SpanNotQuery, if you have "cat" in the document only once and you search for span-not("cat", "cat"), you *also* get no hits, because you have subtracted all the spans you got in the first round. Since SpanNotNearQuery works like an expanded SpanNotQuery, it inherits this behaviour.

Thus SpanNearQuery and SpanNotNearQuery end up in a situation where, quite confusingly to someone who doesn't know how they work, the results added together do not give the full set of spans you would have had before applying the additional query.

> Support for a "SpanNotNearQuery"
> --------------------------------
>
>                 Key: LUCENE-3370
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3370
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: core/search
>            Reporter: Trejkaz
>
> Sometimes you want to find an instance of a span which does not hit near some
> other span query. SpanNotQuery only excludes exact hits on the term, but
> sometimes you want to exclude hits 1 away from the first, and other times you
> might want the range to be wider.
> So a SpanNotNearQuery could be useful.
> SpanNotQuery is actually very close, and adding slop+inOrder support to it is
> probably sufficient to make a SpanNotNearQuery. :)
> There appears to be one project which has done it in this fashion, although
> this particular code looks like it's out of date:
> http://www.koders.com/java/fid933A84488EBE1F3492B19DE01B2A4FC1D68DA258.aspx?s=ArrayQuery
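[Editor's note] The asymmetry Trejkaz describes can be modeled without Lucene at all. The sketch below is plain Python, not the Lucene API: spans are just (start, end) position tuples, and the function names are invented for illustration. It shows why span-near("cat", "cat") finds nothing (no distinct pair exists) while span-not("cat", "cat") also finds nothing (the single span subtracts itself), so the two behaviours look alike here even though the underlying reasons differ.

```python
# Model spans as (start, end) position tuples within one document.

def span_near(spans_a, spans_b, slop=0):
    """Pairs of *distinct* spans within `slop` positions of each other.
    A span is never considered "near" itself, so with a single
    occurrence of "cat", near(cat, cat) matches nothing."""
    hits = []
    for a in spans_a:
        for b in spans_b:
            if a == b:                      # a span is not near itself
                continue
            gap = max(a[0], b[0]) - min(a[1], b[1])
            if gap <= slop:
                hits.append((min(a[0], b[0]), max(a[1], b[1])))
    return hits

def span_not(include, exclude):
    """Spans from `include` that overlap no span in `exclude`.
    not(cat, cat) subtracts every span found in the first round."""
    return [a for a in include
            if not any(a[0] < b[1] and b[0] < a[1] for b in exclude)]

cat = [(3, 4)]                  # "cat" occurs once, at position 3
print(span_near(cat, cat))      # [] -- no distinct pair exists
print(span_not(cat, cat))       # [] -- the one span subtracts itself
```

Under this model, adding the span-not results back to the excluded results still does not recover the original span set, which is the confusion the comment points out.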
[jira] [Commented] (SOLR-2487) Do not include slf4j-jdk14 jar in WAR
[ https://issues.apache.org/jira/browse/SOLR-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163224#comment-13163224 ]

Hoss Man commented on SOLR-2487:
--------------------------------

Jan: +1

> Do not include slf4j-jdk14 jar in WAR
> -------------------------------------
>
>                 Key: SOLR-2487
>                 URL: https://issues.apache.org/jira/browse/SOLR-2487
>             Project: Solr
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 3.2, 4.0
>            Reporter: Jan Høydahl
>              Labels: logging, slf4j
>         Attachments: SOLR-2487.patch, SOLR-2487.patch
>
>
> I know we've intentionally bundled slf4j-jdk14-1.5.5.jar in the war to help
> newbies get up and running. But I find myself re-packaging the war for every
> customer when adapting to their choice of logger framework, which is
> counter-productive.
> It would be sufficient to have the jdk-logging binding in example/lib to let
> the example and tutorial still work OOTB, but as soon as you deploy solr.war
> to production you're forced to explicitly decide what logging to use.
[jira] [Resolved] (SOLR-2935) Better docs for numeric FieldTypes
[ https://issues.apache.org/jira/browse/SOLR-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man resolved SOLR-2935.
----------------------------
    Resolution: Fixed
 Fix Version/s: 4.0
                3.6

Committed revision 1210714. - trunk
Committed revision 1210718. - 3x

> Better docs for numeric FieldTypes
> ----------------------------------
>
>                 Key: SOLR-2935
>                 URL: https://issues.apache.org/jira/browse/SOLR-2935
>             Project: Solr
>          Issue Type: Improvement
>          Components: documentation
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>             Fix For: 3.6, 4.0
>
>         Attachments: SOLR-2935.patch
>
>
> It was recently pointed out to me that if you don't come from a Java
> background, understanding the range of legal values for "TrieIntField" vs
> "TrieLongField" may not be obvious to you (particularly if you are used to
> dealing with databases that have INT, SMALLINT, TINYINT, etc... with UNSIGNED
> vs SIGNED modifiers). That subsequently made me realize that to this day the
> javadocs for the various FieldTypes don't explain the diff between the
> TrieFoo, SortableFoo, and Foo field types.
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163178#comment-13163178 ]

Mark Miller commented on SOLR-2358:
-----------------------------------

We are starting to get some stable, usable stuff here (even though there is much to do!). We are also starting to get some users that are interested in using this stuff (critical feedback there).

So I'd like to propose we try to merge the branch into trunk sooner rather than later, and then iterate from there. Anything too experimental in the future could move back onto a branch again. This will also make the merge more digestible, rather than building up a crazy amount of differences on the branch. There are also a variety of improvements and fixes in the testing framework and elsewhere that would be nice to get back into trunk.

Perhaps within a couple or a few weeks, after we stabilize and finish up some hanging work?

> Distributing Indexing
> ---------------------
>
>                 Key: SOLR-2358
>                 URL: https://issues.apache.org/jira/browse/SOLR-2358
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud, update
>            Reporter: William Mayor
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-2358.patch
>
>
> The first steps towards creating distributed indexing functionality in Solr
[jira] [Commented] (SOLR-2509) spellcheck: StringIndexOutOfBoundsException: String index out of range: -1
[ https://issues.apache.org/jira/browse/SOLR-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163114#comment-13163114 ]

Yonik Seeley commented on SOLR-2509:
------------------------------------

bq. Indeed, this test scenario was added during a refactoring (r1022768) with no JIRA # or bug mentioned at all in the comments.

My commit :-) The commit comment said "tests: fix resource leaks and simplify", and hopefully that's all I did! Looking back wrt pixma, it looks like I replaced this:

{code}
-  @Test
-  public void testCollate2() throws Exception {
-    SolrCore core = h.getCore();
-    SearchComponent speller = core.getSearchComponent("spellcheck");
-    assertTrue("speller is null and it shouldn't be", speller != null);
-
-    ModifiableSolrParams params = new ModifiableSolrParams();
-    params.add(CommonParams.QT, "spellCheckCompRH");
-    params.add(SpellCheckComponent.SPELLCHECK_BUILD, "true");
-    params.add(CommonParams.Q, "pixma-a-b-c-d-e-f-g");
-    params.add(SpellCheckComponent.COMPONENT_NAME, "true");
-    params.add(SpellCheckComponent.SPELLCHECK_COLLATE, "true");
-
-    SolrRequestHandler handler = core.getRequestHandler("spellCheckCompRH");
-    SolrQueryResponse rsp = new SolrQueryResponse();
-    rsp.add("responseHeader", new SimpleOrderedMap());
-    handler.handleRequest(new LocalSolrQueryRequest(core, params), rsp);
-    NamedList values = rsp.getValues();
-    NamedList spellCheck = (NamedList) values.get("spellcheck");
-    NamedList suggestions = (NamedList) spellCheck.get("suggestions");
-    String collation = (String) suggestions.get("collation");
-    assertEquals("pixmaa", collation);
-  }
{code}

With this:

{code}
+    assertJQ(req("json.nl","map", "qt",rh,
+                 SpellCheckComponent.COMPONENT_NAME, "true",
+                 "q","pixma-a-b-c-d-e-f-g",
+                 SpellCheckComponent.SPELLCHECK_COLLATE, "true")
+        ,"/spellcheck/suggestions/collation=='pixmaa'"
+    );
{code}

> spellcheck: StringIndexOutOfBoundsException: String index out of range: -1
> --------------------------------------------------------------------------
>
>                 Key: SOLR-2509
>                 URL: https://issues.apache.org/jira/browse/SOLR-2509
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 3.1
>         Environment: Debian Lenny
> JAVA Version "1.6.0_20"
>            Reporter: Thomas Gambier
>            Assignee: Erick Erickson
>            Priority: Blocker
>         Attachments: SOLR-2509.patch, SOLR-2509.patch, document.xml,
> schema.xml, solrconfig.xml
>
>
> Hi,
> I'm a French user of Solr and I've encountered a problem since I installed
> Solr 3.1.
> I get an error with this query:
> cle_frbr:"LYSROUGE1149-73190"
> *SEE COMMENTS BELOW*
> If I escape the minus character, the query works:
> cle_frbr:"LYSROUGE1149(BACKSLASH)-73190"
> But, strangely, if I change one letter in my query it also works:
> cle_frbr:"LASROUGE1149-73190"
> I've tested the same query on Solr 1.4 and it works!
> Can someone test the query on the next line on a 3.1 Solr version and tell me
> whether they see the same problem?
> yourfield:"LYSROUGE1149-73190"
> Where does the problem come from?
> Thank you in advance for your help.
> Tom
[jira] [Commented] (SOLR-2509) spellcheck: StringIndexOutOfBoundsException: String index out of range: -1
[ https://issues.apache.org/jira/browse/SOLR-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163103#comment-13163103 ]

James Dyer commented on SOLR-2509:
----------------------------------

Steffen's changes are most certainly correct. The index contains "pixmaa" and we are querying on "pixma-a-b-c-d-e-f-g". The spelling index is using analyzer "lowerpunctfilt" (solrconfig-spellcheckcomponent.xml, line 44), which includes WordDelimiterFilter with "generateWordParts=1". So we would expect this query to tokenize down to "pixma" "a" "b" "c" "d" "e" "f" "g".

As the Collate feature is only supposed to replace the misspelled token with the new one, I wonder why this test scenario would expect all 8 tokens to be replaced by 1 token (!). Indeed, this test scenario was added during a refactoring (r1022768) with no JIRA # or bug mentioned at all in the comments, so we can't know for sure why it was added. I'm thinking this is invalid. I would expect the correct collation to be "pixma-a-b-c-d-e-f-g".

Just for grins, I put a "println" in SpellingQueryConverter to show the start & end offsets for each token before and after the patch. In both cases we get the same token texts, but prior to the patch the offset values are clearly wrong.

--before:
TOKEN: pixma so=0 eo=19
TOKEN: a so=0 eo=19
TOKEN: b so=0 eo=19
TOKEN: c so=0 eo=19
TOKEN: d so=0 eo=19
TOKEN: e so=0 eo=19
TOKEN: f so=0 eo=19
TOKEN: g so=0 eo=19
TOKEN: pixmaabcdefg so=0 eo=19

--after:
TOKEN: pixma so=0 eo=5
TOKEN: a so=6 eo=7
TOKEN: b so=8 eo=9
TOKEN: c so=10 eo=11
TOKEN: d so=12 eo=13
TOKEN: e so=14 eo=15
TOKEN: f so=16 eo=17
TOKEN: g so=18 eo=19
TOKEN: pixmaabcdefg so=0 eo=19

> spellcheck: StringIndexOutOfBoundsException: String index out of range: -1
> --------------------------------------------------------------------------
>
>                 Key: SOLR-2509
>                 URL: https://issues.apache.org/jira/browse/SOLR-2509
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 3.1
>         Environment: Debian Lenny
> JAVA Version "1.6.0_20"
>            Reporter: Thomas Gambier
>            Assignee: Erick Erickson
>            Priority: Blocker
>         Attachments: SOLR-2509.patch, SOLR-2509.patch, document.xml,
> schema.xml, solrconfig.xml
>
>
> Hi,
> I'm a French user of Solr and I've encountered a problem since I installed
> Solr 3.1.
> I get an error with this query:
> cle_frbr:"LYSROUGE1149-73190"
> *SEE COMMENTS BELOW*
> If I escape the minus character, the query works:
> cle_frbr:"LYSROUGE1149(BACKSLASH)-73190"
> But, strangely, if I change one letter in my query it also works:
> cle_frbr:"LASROUGE1149-73190"
> I've tested the same query on Solr 1.4 and it works!
> Can someone test the query on the next line on a 3.1 Solr version and tell me
> whether they see the same problem?
> yourfield:"LYSROUGE1149-73190"
> Where does the problem come from?
> Thank you in advance for your help.
> Tom
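[Editor's note] The "--after" column above shows each sub-token carrying its own start/end offsets into the original query string, rather than the whole query's 0..19 range. A small Python sketch (not the actual SpellingQueryConverter code; just an illustration of the patched offset behavior) reproduces those word-part offsets for "pixma-a-b-c-d-e-f-g". The concatenated WordDelimiterFilter token ("pixmaabcdefg") is omitted from this sketch.

```python
import re

def tokenize_with_offsets(query):
    """Split on non-word characters, keeping each sub-token's own
    start/end offset in the original string -- matching the "--after"
    behavior shown in the comment above (catenated token omitted)."""
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\w+", query)]

q = "pixma-a-b-c-d-e-f-g"
for text, so, eo in tokenize_with_offsets(q):
    print(f"TOKEN: {text} so={so} eo={eo}")
```

Correct per-token offsets are what lets Collate splice a replacement word back into the original query string; with every token claiming so=0 eo=19, the string surgery underflows, which is consistent with the reported StringIndexOutOfBoundsException.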
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 11692 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/11692/

1 tests failed.

REGRESSION: org.apache.lucene.index.TestIndexWriterExceptions.testRandomExceptionsThreads

Error Message:
thread Indexer 3: hit unexpected failure

Stack Trace:
junit.framework.AssertionFailedError: thread Indexer 3: hit unexpected failure
	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147)
	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
	at org.apache.lucene.index.TestIndexWriterExceptions.testRandomExceptionsThreads(TestIndexWriterExceptions.java:237)
	at org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:432)

Build Log (for compile errors):
[...truncated 7954 lines...]
[jira] [Commented] (LUCENE-3615) Make it easier to run Test2BTerms
[ https://issues.apache.org/jira/browse/LUCENE-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162986#comment-13162986 ]

Hoss Man commented on LUCENE-3615:
----------------------------------

bq. Because of this, all of the tests behave in totally different ways that you can't really assign a weight to, e.g. take a look at the history of test times for this test:

"Weight" may be the wrong term... I wasn't suggesting that it would be any sort of quantitative, comparable metric of how long the test would take -- my point was just that having a numeric annotation where bigger means "this test does more stuff" would allow people to run more or fewer tests as they see fit with simple configuration, regardless of whether their idea of a test to be run nightly matches the @Nightly annotation (maybe I want to run @Nightly tests only on weekends?).

As things stand, we have regular tests, then @Nightly tests, then @Slow tests... Hypothetically: if we later add a new test that's not nearly as bad as Test2BTerms, so we still want it to run as part of a "full test run" but it's bad enough that we don't want Jenkins to run it as part of our @Nightly run, we have to consider some intermediate "@SortOfSlow" annotation... hence my suggestion that instead of adding more special-case annotations (and more build params for deciding when to execute what), we just use an arbitrary range of numbers and two simple "min" and "max" build params to pick the tests to run.

...anyway... it was just an idea.

> Make it easier to run Test2BTerms
> ---------------------------------
>
>                 Key: LUCENE-3615
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3615
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: LUCENE-3615.patch, LUCENE-3615.patch, LUCENE-3615.patch
>
>
> Currently, Test2BTerms has an @Ignore annotation, which means that the only
> way to run it as a test is to edit the file.
> There are a couple of options to fix this:
> # Add a main() so it can be invoked via the command line outside of the test
> framework
> # Add some new annotations that mark it as slow or weekly or something like
> that, and have the test target ignore @slow (or whatever) by default, but
> also be able to turn it on.
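[Editor's note] Hoss Man's min/max idea can be sketched in a few lines. This is a hypothetical Python model, not the Lucene test framework: the `weight` decorator stands in for a numeric test annotation, and `select` stands in for the proposed min/max build params. All names are invented for illustration.

```python
# Sketch of the suggestion above: annotate each test with an arbitrary
# "weight" (bigger = does more stuff) and select tests with simple
# min/max bounds instead of stacking up @Nightly / @Slow / @SortOfSlow.

REGISTRY = []

def weight(w):
    """Decorator standing in for a numeric test annotation."""
    def wrap(fn):
        REGISTRY.append((w, fn))
        return fn
    return wrap

def select(min_weight=0, max_weight=float("inf")):
    """Pick tests whose weight falls within [min_weight, max_weight]."""
    return [fn.__name__ for w, fn in REGISTRY
            if min_weight <= w <= max_weight]

@weight(1)
def test_fast(): pass

@weight(50)
def test_slow(): pass

@weight(1000)
def test_2b_terms(): pass

print(select(max_weight=10))    # a quick default run
print(select(min_weight=10))    # a heavier nightly/weekend run
```

A new intermediate-cost test then just picks a number between the existing ones; no new annotation or build parameter is needed.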
[jira] [Updated] (SOLR-2943) DIHCacheWriter & DIHCacheProcessor (entity processor)
[ https://issues.apache.org/jira/browse/SOLR-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2943:
-----------------------------
    Description:

This is a spin-off of SOLR-2382.

Currently DIH requires users to retrieve, join and index all data for a full or delta update in one big step. This issue is to allow us to break this into individual steps. The idea is to have multiple "data-config.xml" files, some of which retrieve and cache data while others join and index data.

This is useful when Solr Records are a conglomeration of several data sources. With this feature, each data source can be retrieved and cached separately. Once all data sources have been retrieved, they can be joined and indexed in a final step. When doing a delta update, only the data sources that change need to have their caches updated (or frequently-changing data can remain un-cached while caching the more static data). This is particularly useful in light of the fact that Lucene/Solr cannot do a true "update" operation. DIH Caches also provide a handy way to archive source data for which there is no stable system-of-record.

Implementation Details:
- The DIHCacheWriter allows us to write the final (root entity) DIH output to a DIHCache rather than to Solr. Caches can be created from scratch ("full-update") or existing caches can be modified ("delta-update").
- The DIHCacheProcessor is an Entity Processor that reads a DIHCache. This Entity Processor can be used for both Root Entities and Child Entities. Cached data can be read back, joined to other Entities and indexed.
- Both DIHCacheWriter and DIHCacheProcessor support partitioning. DIHCacheWriter can write to a partitioned cache while DIHCacheProcessor can read back a particular partition. This can be handy when indexing to multiple shards.
- This patch is 100% stand-alone from the rest of DIH, so while users can patch and rebuild the DIH .jar file to include these classes, it is unnecessary. To use this functionality, simply include the code here in the classpath. (ex: in SOLR_HOME/lib)
- In addition to this patch, a persistent cache implementation is required.
- See SOLR-2948 for a DIH Cache Implementation built on Lucene (no additional dependencies).
- See SOLR-2613 for a DIH Cache Implementation backed with BDB-JE (we use this in Production).
- Other Cache Implementations (hopefully) will be developed in the future and become available for general use.
- This patch includes extensive unit tests. A MockDIHCache that supports persistence and delta updates facilitates the tests. Do not attempt to use MockDIHCache for anything other than testing or as a reference for developing your own DIHCache implementations.

  was:

This is a spin-off of SOLR-2382.

Currently DIH requires users to retrieve, join and index all data for a full or delta update in one big step. This issue is to allow us to break this into individual steps. The idea is to have multiple "data-config.xml" files, some of which retrieve and cache data while others join and index data.

This is useful when Solr Records are a conglomeration of several data sources. With this feature, each data source can be retrieved and cached separately. Once all data sources have been retrieved, they can be joined and indexed in a final step. When doing a delta update, only the data sources that change need to have their caches updated (or frequently-changing data can remain un-cached while caching the more static data). This is particularly useful in light of the fact that Lucene/Solr cannot do a true "update" operation. DIH Caches also provide a handy way to archive source data for which there is no stable system-of-record.

Implementation Details:
- The DIHCacheWriter allows us to write the final (root entity) DIH output to a DIHCache rather than to Solr. Caches can be created from scratch ("full-update") or existing caches can be modified ("delta-update").
- The DIHCacheProcessor is an Entity Processor that reads a DIHCache. This Entity Processor can be used for both Root Entities and Child Entities. Cached data can be read back, joined to other Entities and indexed.
- Both DIHCacheWriter and DIHCacheProcessor support partitioning. DIHCacheWriter can write to a partitioned cache while DIHCacheProcessor can read back a particular partition. This can be handy when indexing to multiple shards.
- This patch is 100% stand-alone from the rest of DIH, so while users can patch and rebuild the DIH .jar file to include these classes, it is unnecessary. To use this functionality, simply include the code here in the classpath. (ex: in SOLR_HOME/lib)
- In addition to this patch, a persistent cache implementation is required. See SOLR-2613 for a DIH Cache Implementation backed with BDB-JE. Other Cache Implementations (hopefully) will be developed in the future and become available
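[Editor's note] The retrieve-and-cache-then-join flow described above can be modeled in miniature. This is a hypothetical Python model, not the DIH API: `DIHCacheModel` and `join` are invented names, and a cache is just a dict keyed by the join key. It shows each source cached separately, a delta update touching only the cache whose source changed, and a final step joining the caches into indexable records.

```python
# Hypothetical model of the DIH cache flow (not the DIH API): each data
# source is retrieved and cached separately; a final step joins the
# caches into documents; a delta update rewrites only one cache.

class DIHCacheModel:
    def __init__(self):
        self.rows = {}                    # join key -> row dict

    def write(self, rows, key="id"):
        """Full or delta update: existing keys are replaced in place,
        new keys are added."""
        for row in rows:
            self.rows[row[key]] = row

def join(root, *children):
    """Join child caches onto the root-entity cache by shared key."""
    docs = []
    for key, row in root.rows.items():
        doc = dict(row)
        for child in children:
            doc.update(child.rows.get(key, {}))
        docs.append(doc)
    return docs

people = DIHCacheModel()
people.write([{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}])
orders = DIHCacheModel()
orders.write([{"id": 1, "total": 9.5}])

orders.write([{"id": 1, "total": 12.0}])  # delta: only this cache changes
print(join(people, orders))
```

The point of the real feature is the same shape at scale: the expensive per-source retrieval is decoupled from the join/index step, so a delta in one source does not force re-fetching the others.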
[JENKINS] Lucene-Solr-tests-only-3.x-java7 - Build # 1175 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x-java7/1175/

2 tests failed.

FAILED: junit.framework.TestSuite.org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest

Error Message:
Cannot delete /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/solr/build/solr-core/test/3/solrtest-SignatureUpdateProcessorFactoryTest-1323111362955/index/_e.frq

Stack Trace:
java.io.IOException: Cannot delete /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/solr/build/solr-core/test/3/solrtest-SignatureUpdateProcessorFactoryTest-1323111362955/index/_e.frq
	at org.apache.lucene.store.FSDirectory.deleteFile(FSDirectory.java:296)
	at org.apache.lucene.store.MockDirectoryWrapper.deleteFile(MockDirectoryWrapper.java:370)
	at org.apache.lucene.store.MockDirectoryWrapper.crash(MockDirectoryWrapper.java:243)
	at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:535)
	at org.apache.solr.SolrTestCaseJ4.closeDirectories(SolrTestCaseJ4.java:82)
	at org.apache.solr.SolrTestCaseJ4.deleteCore(SolrTestCaseJ4.java:290)
	at org.apache.solr.SolrTestCaseJ4.afterClassSolrTestCase(SolrTestCaseJ4.java:72)

FAILED: junit.framework.TestSuite.org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest

Error Message:
java.lang.AssertionError: directory of test was not closed, opened from: org.apache.solr.core.MockDirectoryFactory.open(MockDirectoryFactory.java:34)

Stack Trace:
java.lang.RuntimeException: java.lang.AssertionError: directory of test was not closed, opened from: org.apache.solr.core.MockDirectoryFactory.open(MockDirectoryFactory.java:34)
	at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:310)
	at org.apache.lucene.util.LuceneTestCase.checkResourcesAfterClass(LuceneTestCase.java:349)
	at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:278)

Build Log (for compile errors):
[...truncated 14636 lines...]
[jira] [Updated] (SOLR-2948) DIH Cache backed w/Lucene
[ https://issues.apache.org/jira/browse/SOLR-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2948:
-----------------------------
    Attachment: SOLR-2948.patch

This initial version has never been used in a production environment, but I have used (an earlier version of) this in a development context. No doubt it would be adequate in many situations but likely could stand some improvement. Unit tests are included and all pass.

> DIH Cache backed w/Lucene
> -------------------------
>
>                 Key: SOLR-2948
>                 URL: https://issues.apache.org/jira/browse/SOLR-2948
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: James Dyer
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-2948.patch
>
>
> This is a DIH Cache Implementation that supports persistence and delta
> updates on the cache. The cache is backed by a stand-alone Lucene index. By
> requiring no additional dependencies, this allows users to easily use the DIH
> Cache persistence functionality (see SOLR-2943).
[jira] [Created] (SOLR-2948) DIH Cache backed w/Lucene
DIH Cache backed w/Lucene
-------------------------

                 Key: SOLR-2948
                 URL: https://issues.apache.org/jira/browse/SOLR-2948
             Project: Solr
          Issue Type: Improvement
          Components: contrib - DataImportHandler
    Affects Versions: 4.0
            Reporter: James Dyer
            Priority: Minor
             Fix For: 4.0


This is a DIH Cache Implementation that supports persistence and delta updates on the cache. The cache is backed by a stand-alone Lucene index. By requiring no additional dependencies, this allows users to easily use the DIH Cache persistence functionality (see SOLR-2943).
[jira] [Updated] (SOLR-2613) DIH Cache backed w/bdb-je
[ https://issues.apache.org/jira/browse/SOLR-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2613:
-----------------------------
    Attachment: SOLR-2613.patch

Updated to fix a parameter-naming bug.

> DIH Cache backed w/bdb-je
> -------------------------
>
>                 Key: SOLR-2613
>                 URL: https://issues.apache.org/jira/browse/SOLR-2613
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: James Dyer
>            Priority: Minor
>         Attachments: SOLR-2613.patch, SOLR-2613.patch, SOLR-2613.patch,
> SOLR-2613.patch, SOLR-2613.patch, SOLR-2613.patch, SOLR-2613.patch
>
>
> This is spun out of SOLR-2382, which provides a framework for multiple
> caching implementations with DIH. This cache implementation is fast &
> flexible, supporting persistence and delta updates. However, it depends on
> Berkeley Database Java Edition, so in order to evaluate and use it you
> must download bdb-je from Oracle and accept the license requirements.
[jira] [Updated] (SOLR-2943) DIHCacheWriter & DIHCacheProcessor (entity processor)
[ https://issues.apache.org/jira/browse/SOLR-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-2943: - Attachment: SOLR-2943.patch updated patch. fixes a parameter-naming bug. > DIHCacheWriter & DIHCacheProcessor (entity processor) > - > > Key: SOLR-2943 > URL: https://issues.apache.org/jira/browse/SOLR-2943 > Project: Solr > Issue Type: New Feature > Components: contrib - DataImportHandler >Affects Versions: 4.0 >Reporter: James Dyer >Priority: Minor > Fix For: 4.0 > > Attachments: SOLR-2943.patch, SOLR-2943.patch > > > This is a spin-off of SOLR-2382. > Currently DIH requires users to retrieve, join and index all data for a full > or delta update in one big step. This issue is to allow us to break this > into individual steps. The idea is to have multiple "data-config.xml" files, > some of which retrieve and cache data while others join and index data. > This is useful when Solr Records are a conglomeration of several data > sources. With this feature, each data source can be retrieved and cached > separately. Once all data sources have been retrieved, they can be joined > and indexed in a final step. When doing a delta update, only the data > sources that change need to have their caches updated (or frequently-changing > data can remain un-cached while caching the more static data). This is > particularly useful in light of the fact that Lucene/Solr cannot do a true > "update" operation. DIH Caches also provide a handy way to archive source > data for which there is no stable system-of-record. > Implementation Details: > - The DIHCacheWriter allows us to write the final (root entity) DIH output to > a DIHCache rather than to Solr. Caches can be created from scratch > ("full-update") or existing caches can be modified ("delta-update"). > - The DIHCacheProcessor is an Entity Processor that reads a DIHCache. This > Entity Processor can be used for both Root Entities and Child Entities. 
> Cached data can be read back, joined to other Entities and indexed. > - Both DIHCacheWriter and DIHCacheProcessor support partitioning. > DIHCacheWriter can write to a partitioned cache while DIHCacheProcessor can > read back a particular partition. This can be handy when indexing to > multiple shards. > - This patch is 100% stand-alone from the rest of DIH, so while users can > patch and rebuild the DIH .jar file to include these classes, it is > unnecessary. To use this functionality, simply include the code here in the > classpath. (ex: in SOLR_HOME/lib) > - In addition to this patch, a persistent cache implementation is required. > See SOLR-2613 for a DIH Cache Implementation backed with BDB-JE. Other Cache > Implementations (hopefully) will be developed in the future and become > available for general use. > - This patch includes extensive unit tests. A MockDIHCache that supports > persistence and delta updates facilitates the tests. Do not attempt to use > MockDIHCache for anything other than testing or as a reference for developing > your own DIHCache implementations. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
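The retrieve-and-cache / join-and-index workflow described above can be sketched generically. This is an illustrative model only, not the DIHCacheWriter/DIHCacheProcessor API; the function names here are hypothetical:

```python
# Illustrative model of the cache-then-join workflow (hypothetical names,
# not the DIHCacheWriter/DIHCacheProcessor API).

def build_cache(rows, key):
    # Step 1: retrieve one data source and cache it by its join key
    # ("full-update" rebuilds the dict; "delta-update" merges into it).
    return {row[key]: row for row in rows}

def join_and_index(*caches):
    # Final step: join the cached sources on keys present in all of them
    # and produce the documents that would be sent to Solr.
    shared = set.intersection(*(set(c) for c in caches))
    return [
        {field: value for c in caches for field, value in c[k].items()}
        for k in sorted(shared)
    ]

products = build_cache([{"id": 1, "name": "widget"}], "id")
prices = build_cache([{"id": 1, "price": 9.99}], "id")
docs = join_and_index(products, prices)

# Delta update: only the price source changed, so only its cache is
# refreshed; the product cache is reused as-is.
prices.update(build_cache([{"id": 1, "price": 7.99}], "id"))
docs = join_and_index(products, prices)
```

The point of the split is visible in the last step: a delta touches one cache, while the join over all caches stays a cheap final pass.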
[jira] [Commented] (LUCENE-2208) Token div exceeds length of provided text sized 4114
[ https://issues.apache.org/jira/browse/LUCENE-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162898#comment-13162898 ] Matan Zinger commented on LUCENE-2208: -- Hello guys, I am blocked by this bug as well. Is there any update / progress on this subject? Thank you in advance... > Token div exceeds length of provided text sized 4114 > > > Key: LUCENE-2208 > URL: https://issues.apache.org/jira/browse/LUCENE-2208 > Project: Lucene - Java > Issue Type: Bug > Components: modules/highlighter >Affects Versions: 3.0 > Environment: diagnostics = {os.version=5.1, os=Windows XP, > lucene.version=3.0.0 883080 - 2009-11-22 15:43:58, source=flush, os.arch=x86, > java.version=1.6.0_12, java.vendor=Sun Microsystems Inc.} > >Reporter: Ramazan VARLIKLI > Attachments: LUCENE-2208.patch, LUCENE-2208_test.patch > > > I have a doc which contains html code. I want to strip the html tags to get > clean text and then apply the highlighter to the clean text. But the > highlighter throws an exception if I strip out the html characters; if I > don't strip them out, it works fine. It just confuses me at the moment. > I copy-paste 3 things here from the console, as they may contain special > characters which might cause the problem. > 1 -) Here is the html text > Starter > > > > Learning path: History > Key question > Did transport fuel the industrial revolution? > Learning Objective > > To categorise points as for or against an argument > > > What to do? > > Watch the clip: Transport fuelled the industrial > revolution. > > The clips claims that transport fuelled the industrial > revolution. Some historians argue that the industrial revolution only > happened because of developments in transport. > > Read the statements below and decide which > points are for and which points are against the argument > that industry expanded in the 18th and 19th centuries because of developments > in transport. 
> > > > Industry expanded because of inventions and > the discovery of steam power. > Improvements in transport allowed goods to > be sold all over the country and all over the world so there were more > customers to develop industry for. > Developments in transport allowed > resources, such as coal from mines and cotton from America to come together > to manufacture products. > Transport only developed because industry > needed it. It was slow to develop as money was spent on improving roads, then > building canals and the replacing them with railways in order to keep up with > industry. > > > Now try to think of 2 more statements of your > own. > > > > > Main activity > > > Learning path: > History > Learning Objective > > To select evidence to support points > > What to do? > > Choose the 4 points that you think are most important - > try to be balanced by having two for and two > against. > Write one in each of the point boxes of the > paragraphs on the sheet class="link-internal">Constructing a balanced argument. You > might like to re write the points in your own words and use connectives to > link the paragraphs. > > In history and in any argument, you need evidence > to support your points. > Find evidence from these sources and from > your own knowledge to support each of your points: > > href="../servlet/link?template=vid¯o=setResource&resourceID=2044" > class="link-internal">At a toll gate > href="../servlet/link?macro=setResource&template=vid&resourceID=2046" > class="link-internal">Canals > href="../servlet/link?macro=setResource&template=vid&resourceID=2043" > class="link-internal">Growing cities: traffic >href="../servlet/link?macro=setResource&template=vid&resourceID=2047" > class="link-internal">Impact of the railway >href="../servlet/link?macro=setResource&template=vid&resourceID=
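The exception reported above is characteristic of an offset mismatch between analysis and highlighting. As a hypothetical illustration only (not a confirmed diagnosis of LUCENE-2208), the sketch below shows how token offsets recorded against the original HTML can point past the end of the stripped text the highlighter is handed:

```python
# Hypothetical illustration of an offset mismatch, one classic source of
# "token ... exceeds length of provided text" highlighter errors: token
# offsets recorded against the original HTML are applied to the shorter
# stripped text.

import re

html = "<b>transport</b> fuelled the <i>industrial</i> revolution"
stripped = re.sub(r"<[^>]+>", "", html)

# End offset of "revolution" measured against the original markup...
end = html.index("revolution") + len("revolution")

# ...points past the end of the stripped text the highlighter is given.
mismatch = end > len(stripped)
```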
[jira] [Commented] (SOLR-2880) Investigate adding an overseer that can assign shards, later do re-balancing, etc
[ https://issues.apache.org/jira/browse/SOLR-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162811#comment-13162811 ] Sami Siren commented on SOLR-2880: -- bq. Why does the overseer class have its own cloud state and watches on live nodes and stuff? The watch for live nodes is also used for adding watches for node states: when a new node pops up, a watch is registered on /node_states/ bq. The ZkController's ZkStateReader is already tracking all this stuff and should be the owner of the cloud state, shouldn't it? Yeah, makes sense. I'll see how that would work. > Investigate adding an overseer that can assign shards, later do re-balancing, > etc > - > > Key: SOLR-2880 > URL: https://issues.apache.org/jira/browse/SOLR-2880 > Project: Solr > Issue Type: Sub-task > Components: SolrCloud >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 4.0 > > Attachments: SOLR-2880-merge-elections.patch, SOLR-2880.patch > >
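The watch wiring Sami describes can be modeled with a toy in-process sketch. This is not the ZooKeeper client API and all names are hypothetical; it only shows the pattern of a single live-nodes watch installing a per-node state watch as nodes appear:

```python
# Toy in-process model (not the ZooKeeper API) of the watch wiring described
# above: the live-nodes watch is what installs a state watch per new node.

class FakeZk:
    """Minimal stand-in holding path watches and registered state watches."""

    def __init__(self):
        self.watches = {}           # path -> callback
        self.state_watched = set()  # node-state paths currently watched

    def watch(self, path, callback):
        self.watches[path] = callback

    def node_appears(self, node):
        # A new child under /live_nodes fires the parent watch.
        if "/live_nodes" in self.watches:
            self.watches["/live_nodes"](node)

zk = FakeZk()

# The overseer's live-nodes watch registers a watch on /node_states/<node>
# whenever a node pops up.
zk.watch("/live_nodes", lambda node: zk.state_watched.add("/node_states/" + node))
zk.node_appears("node1")
```

(Real ZooKeeper watches are one-shot and must be re-registered on every trigger; that detail is elided here.)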
[Lucene.Net] [jira] [Created] (LUCENENET-459) Italian stemmer (from SnowballAnalyzer) does not work
Italian stemmer (from SnowballAnalyzer) does not work
Key: LUCENENET-459
URL: https://issues.apache.org/jira/browse/LUCENENET-459
Project: Lucene.Net
Issue Type: Bug
Components: Lucene.Net Contrib
Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4
Reporter: Santiago M. Mola

Italian stemmer does not work. Consider this code:

    var englishAnalyzer = new SnowballAnalyzer("English");
    var tk = englishAnalyzer.TokenStream("text", new StringReader("horses"));
    var ta = (TermAttribute)tk.GetAttribute(typeof(TermAttribute));
    tk.IncrementToken();
    Console.WriteLine("English stemmer: horses -> " + ta.Term());

    var italianAnalyzer = new SnowballAnalyzer("Italian");
    tk = italianAnalyzer.TokenStream("text", new StringReader("abbandonata"));
    ta = (TermAttribute)tk.GetAttribute(typeof(TermAttribute));
    tk.IncrementToken();
    Console.WriteLine("Italian stemmer: abbandonata -> " + ta.Term());

It outputs:

    English stemmer: horses -> hors
    Italian stemmer: abbandonata -> abbandonata

While Java Lucene 2.9.4 outputs:

    English stemmer: horses -> hors
    Italian stemmer: abbandonata -> abbandon
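One failure mode consistent with the output above is a language-keyed stemmer lookup that silently falls back to an identity transform when the requested stemmer cannot be resolved. The sketch below is a hypothetical illustration of that pattern, not a confirmed diagnosis of LUCENENET-459, and the English "stemming rule" is a crude stand-in, not the Snowball algorithm:

```python
# Hypothetical sketch (not the Lucene.Net implementation) of how a
# name-keyed stemmer factory can silently degrade to a no-op: an unknown
# language resolves to the identity function, so tokens pass through
# unchanged, which matches the reported Italian output.

STEMMERS = {
    # Crude stand-in for the English Snowball stemmer, enough for "horses".
    "English": lambda w: w[:-2] if w.endswith("es") else w,
    # "Italian" missing here plays the role of a stemmer that failed to load.
}

def stem(language, word):
    # The silent fallback: no error raised, just the word unchanged.
    return STEMMERS.get(language, lambda w: w)(word)
```

If this is the failing pattern, the fix is usually to make the lookup fail loudly (or verify the Italian stemmer class is actually present in the Contrib assembly) rather than to return the identity.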
[jira] [Resolved] (LUCENE-3619) in trunk if you switch up omitNorms while indexing, you get a corrupt norms file
[ https://issues.apache.org/jira/browse/LUCENE-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3619. - Resolution: Fixed Fix Version/s: 4.0 > in trunk if you switch up omitNorms while indexing, you get a corrupt norms > file > - > > Key: LUCENE-3619 > URL: https://issues.apache.org/jira/browse/LUCENE-3619 > Project: Lucene - Java > Issue Type: Bug >Affects Versions: 4.0 >Reporter: Robert Muir > Fix For: 4.0 > > Attachments: LUCENE-3619.patch > > > document 1 has > body: norms=true > title: norms=true > document 2 has > body: norms=false > title: norms=true > when seeing 'body' for the first time, NormsWriterPerField gets the 'initial > fieldinfo' and > saves it away, which says norms=true > however, at flush time we don't check, so we write the norms happily anyway. > then SegmentReader reads the norms later: it skips "body" since it omits norms, > and if you ask for the norms of 'title' it instead returns the bogus "body" > norms. > asserting that SegmentReader "plans to" read the whole .nrm file exposes the > bug.
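The misalignment described in the issue can be modeled abstractly. This is a toy sketch, not Lucene code: if the writer emits norms for a field that the reader believes omits norms, every subsequent field's norms shift by one slot:

```python
# Toy model of the bug (not Lucene code): the writer, using a stale
# omit-norms flag, writes norms for "body"; the reader believes "body"
# omits norms and skips it, so "body"'s bytes are returned as "title"'s.

def write_norms(fields, writer_omits):
    # Each entry stands in for one field's norms bytes in the .nrm file.
    return [f for f in fields if not writer_omits[f]]

def read_norms(fields, reader_omits, data):
    norms, i = {}, 0
    for f in fields:
        if reader_omits[f]:
            continue  # reader skips fields it thinks omit norms
        norms[f] = data[i]
        i += 1
    return norms

fields = ["body", "title"]

# Writer still thinks body has norms (stale initial FieldInfo), writes both.
data = write_norms(fields, {"body": False, "title": False})

# Reader knows body ended up with omitNorms=true and skips it, misaligning
# every later field: title now gets body's norms.
norms = read_norms(fields, {"body": True, "title": False}, data)
```

The assertion mentioned in the issue (that the reader consumes the whole .nrm file) catches exactly the leftover entry this model produces.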
Re: [Lucene.Net] Lucene.net twitter account and chat room
Oh.. I thought the CMS was a bit more "dynamic"... not just a bunch of static files :) Maybe an unofficial site could be done... the official stuff, releases, and other institutional news will stay on the ASF site, while announcements, articles, demos and more dynamic and frequently released news might stay on this external site. But yes, the twitter gadget might help... also adding a "Follow me on twitter" button in the page will help raise awareness about the twitter account Simo On Mon, Dec 5, 2011 at 2:08 PM, Michael Herndon wrote: > oh. I totally misread that. > > it's *possible. though the data would need a place to live outside of the > apache cms and I don't know apache's view on that kind of thing at the > moment or what the other available options are. > > To my knowledge, we're currently limited to putting things into svn. I > don't like putting docs into the source when it's a ridiculous amount of > static html files. I'd rather store them as a zip and have the site unzip > them into a directory. I think others feel the same way but don't quote me > on that. On the plus side it does have a staging mechanism built into the > process before you publish to production. > > Though it would be nice to have a .NET or Mono website for the simple fact > we could put up live demos of Lucene.Net and index the mailing list, > website, wiki, twitter feed, chat logs, articles, etc in one place. Also > the docs would then be able to use the binary format, which is more compact > than the purely html format. > > But a twitter widget on the site would help visibility =). Just my 2.5 > cents worth. > > @Prescott, let me know when you want to work on that release checklist. > > - Michael. > > > > > > On Sun, Dec 4, 2011 at 5:12 PM, Simone Chiaretta < > simone.chiare...@gmail.com> wrote: > >> That's not exactly the same thing... 
>> I meant a way to have a kind of blog for release announcements and >> similar things (just the same things that are now in the homepage), not a >> list of 140chars messages :) >> >> --- >> Simone Chiaretta >> @simonech >> Sent from a tablet >> >> On 04/dic/2011, at 22:02, Michael Herndon wrote: >> >> > The JavaScript twitter widget should work. >> > >> > Sent from my Windows Phone >> > From: Simone Chiaretta >> > Sent: 12/4/2011 2:34 PM >> > To: lucene-net-...@lucene.apache.org >> > Subject: Re: [Lucene.Net] Lucene.net twitter account and chat room >> > One good idea, but not sure if possible with ASF CMS, is to have a >> > feed with the news that are now in home page, and the possibility to >> > link to single news. Well... a blog :) >> > Would be much better than adding all news one after the other in the >> home page >> > >> > Simo >> > >> > --- >> > Simone Chiaretta >> > @simonech >> > Sent from a tablet >> > >> > On 04/dic/2011, at 07:27, michael herndon >> wrote: >> > >> >> Maybe we should build a mail/chat/social media search with >> >> lucene.net at some point in the future? I'm sure there is a way to log >> >> the chat. >> >> >> >> I posted up a tweet tonight on the release. better late than never. >> >> >> >> I have two more tweets scheduled using hootsuite, one to thank Simone >> for >> >> the packages, another to ask for article and application submissions on >> >> monday. If anyone else has ideas for the branding aspect, do share. >> I'll >> >> try to check the feed daily. >> >> >> >> hashtag: #lucenenet >> >> >> >> - Michael. >> >> >> >> On Fri, Dec 2, 2011 at 2:14 PM, Troy Howard >> wrote: >> >> >> >>> Re: Twitter >> >>> >> >>> Sadly not a single tweet has been sent out on our twitter account. >> >>> Really need to remedy that. 
>> >>> >> >>> Re: IRC/realtime chat >> >>> >> >>> There have been some good reasons expressed by various folks at Apache >> >>> (and in our team) that realtime chat in channels which are not >> >>> publicly logged should generally be discouraged. This is because it's >> >>> all too easy to have a discussion in which only a few members of the >> >>> community are present, and make decisions without any opportunity for >> >>> the rest of the community to have input and without the ability to >> >>> review the reasoning or discourse later. The same holds true for user >> >>> support, as it's much better to have that public and logged in a >> >>> mailing list message so that others might find that through searches >> >>> and use as a reference. >> >>> >> >>> That said, people do use IRC/IM from time to time, but we prefer to >> >>> keep most if not all of the communications public and on the Apache >> >>> mailing lists. So feel free to set up a chat room and chat with >> >>> whomever wants to join about whatever topic, but for most things at >> >>> Apache the philosophy is "mailing list, or it didn't happen". :) >> >>> >> >>> Thanks, >> >>> Troy >> >>> >> >>> >> >>> On Fri, Dec 2, 2011 at 10:03 AM, Prescott Nasser < >> geobmx...@hotmail.com> >> >>> wrote: >> >> > >> > I just saw that there is a twitter account for Lucene.n
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 11685 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/11685/ 2 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest Error Message: Cannot delete /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/build/solr-core/test/4/solrtest-SignatureUpdateProcessorFactoryTest-1323090765194/index/_d.tii Stack Trace: java.io.IOException: Cannot delete /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/build/solr-core/test/4/solrtest-SignatureUpdateProcessorFactoryTest-1323090765194/index/_d.tii at org.apache.lucene.store.FSDirectory.deleteFile(FSDirectory.java:296) at org.apache.lucene.store.MockDirectoryWrapper.deleteFile(MockDirectoryWrapper.java:370) at org.apache.lucene.store.MockDirectoryWrapper.crash(MockDirectoryWrapper.java:243) at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:535) at org.apache.solr.SolrTestCaseJ4.closeDirectories(SolrTestCaseJ4.java:82) at org.apache.solr.SolrTestCaseJ4.deleteCore(SolrTestCaseJ4.java:290) at org.apache.solr.SolrTestCaseJ4.afterClassSolrTestCase(SolrTestCaseJ4.java:72) FAILED: junit.framework.TestSuite.org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest Error Message: java.lang.AssertionError: directory of test was not closed, opened from: org.apache.solr.core.MockDirectoryFactory.open(MockDirectoryFactory.java:34) Stack Trace: java.lang.RuntimeException: java.lang.AssertionError: directory of test was not closed, opened from: org.apache.solr.core.MockDirectoryFactory.open(MockDirectoryFactory.java:34) at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:310) at org.apache.lucene.util.LuceneTestCase.checkResourcesAfterClass(LuceneTestCase.java:349) at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:278) Build Log (for compile errors): [...truncated 14640 lines...] 
[jira] [Commented] (LUCENE-3298) FST has hard limit max size of 2.1 GB
[ https://issues.apache.org/jira/browse/LUCENE-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162752#comment-13162752 ] Carlos González-Cadenas commented on LUCENE-3298: - Yeap, at the beginning of this project we tried to implement this autocomplete system using regular inverted indexes, but the response time required for autocomplete to work from a user perspective is very low (<50ms), and it would be quite hard to achieve such performance with inverted indexes. I still think this is the way to go, but as you say we have to be careful with the data generation part. Most of the work should be put into making sure that the data is well distributed and organized in order to avoid combinatorial explosion. Let me go into detail on the sources of data permutations and the reasoning behind them:

1) With regards to infix matches, if a user types "barcelona" we want to match "hotels in barcelona". In order to achieve this, we generate:

    hotels in barcelona => hotels in barcelona
    in barcelona => hotels in barcelona
    barcelona => hotels in barcelona

The FST should be able to conflate these prefixes nicely in just one path, right? Therefore this part shouldn't be a problem.

2) In addition, another feature we want to achieve is to be able to match inputs without prepositions. That means that if the user types "hotels barcelona jacuzzi", we should be able to match "hotels in barcelona with jacuzzi". Now the only way we envision doing it properly is to generate this permutation within the data:

    hotels barcelona jacuzzi => hotels in barcelona with jacuzzi

I can see how this can explode the FST by creating different branches. Theoretically this could be done at runtime without needing to generate the data, but we don't see a way to do it cleanly. 
To make things more complicated :) we've implemented fuzzy matching at query time (we use a Levenshtein automaton generated from the user input plus an edit distance, and then we intersect it with the FST), and this makes it very complicated to do preposition handling at query time.

3) PP permutations (i.e. "hotels in barcelona with jacuzzi" and "hotels with jacuzzi in barcelona"). I don't really see a way to work around this. Probably we need to be careful and only generate these permutations for the top-K cities, in order to limit the potential size.

Summarizing, I believe that we can reduce the set of "bad permutations" a lot if we can figure out how to implement the prepositions at runtime. If you have any ideas, let me know. Thanks! :) > FST has hard limit max size of 2.1 GB > - > > Key: LUCENE-3298 > URL: https://issues.apache.org/jira/browse/LUCENE-3298 > Project: Lucene - Java > Issue Type: Improvement > Components: core/FSTs >Reporter: Michael McCandless >Priority: Minor > Attachments: LUCENE-3298.patch > > > The FST uses a single contiguous byte[] under the hood, which in java is > indexed by int so we cannot grow this over Integer.MAX_VALUE. It also > internally encodes references to this array as vInt. > We could switch this to a paged byte[] and make the max size far larger. > But I think this is low priority... I'm not going to work on it any time soon.
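The generation scheme in (1) and the blow-up risk in (3) can be sketched as follows. This is illustrative Python, not the actual data pipeline; `infix_entries` and `pp_orderings` are hypothetical names:

```python
# Illustrative Python for the generation scheme above (hypothetical helper
# names, not the actual data pipeline).

from itertools import permutations

def infix_entries(phrase):
    # (1): every word-suffix becomes a key mapping back to the full phrase,
    # so typing "barcelona" can surface "hotels in barcelona".
    words = phrase.split()
    return {" ".join(words[i:]): phrase for i in range(len(words))}

entries = infix_entries("hotels in barcelona")

# (3): emitting every ordering of the prepositional phrases grows
# factorially with the number of phrases, which is where the FST blows up.
def pp_orderings(head, phrases):
    return sorted(" ".join([head, *p]) for p in permutations(phrases))

variants = pp_orderings("hotels", ["in barcelona", "with jacuzzi"])
# k prepositional phrases -> k! orderings per suggestion
```

The suffix entries in (1) share a common tail and compress well in an FST; the k! orderings in (3) do not, which is why restricting them (e.g. to top-K cities) matters.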
[jira] [Commented] (LUCENE-3298) FST has hard limit max size of 2.1 GB
[ https://issues.apache.org/jira/browse/LUCENE-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162719#comment-13162719 ] Dawid Weiss commented on LUCENE-3298: - If you have so many permutations then they become different paths in the FST and it will grow exponentially with the number of input words/combinations. To be honest, this looks more suitable for a regular inverted index search. > FST has hard limit max size of 2.1 GB > - > > Key: LUCENE-3298 > URL: https://issues.apache.org/jira/browse/LUCENE-3298 > Project: Lucene - Java > Issue Type: Improvement > Components: core/FSTs >Reporter: Michael McCandless >Priority: Minor > Attachments: LUCENE-3298.patch > > > The FST uses a single contiguous byte[] under the hood, which in java is > indexed by int so we cannot grow this over Integer.MAX_VALUE. It also > internally encodes references to this array as vInt. > We could switch this to a paged byte[] and make the max size far larger. > But I think this is low priority... I'm not going to work on it any time soon.
[jira] [Commented] (LUCENE-3298) FST has hard limit max size of 2.1 GB
[ https://issues.apache.org/jira/browse/LUCENE-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162718#comment-13162718 ] Carlos González-Cadenas commented on LUCENE-3298: - Hello Dawid, The sentences have variants at different levels. The first is the one you mention, different prefixes for different accommodation types. The second is different positions of the prepositional phrases of the query (i.e. "hotels in barcelona with jacuzzi" and "hotels with jacuzzi in barcelona"). The third is sentences with and without prepositions ("hotels barcelona jacuzzi"). W.r.t. the patch, sorry, I got confused. James, do you have a version of this patch that works with trunk? Thanks a lot. > FST has hard limit max size of 2.1 GB > - > > Key: LUCENE-3298 > URL: https://issues.apache.org/jira/browse/LUCENE-3298 > Project: Lucene - Java > Issue Type: Improvement > Components: core/FSTs >Reporter: Michael McCandless >Priority: Minor > Attachments: LUCENE-3298.patch > > > The FST uses a single contiguous byte[] under the hood, which in java is > indexed by int so we cannot grow this over Integer.MAX_VALUE. It also > internally encodes references to this array as vInt. > We could switch this to a paged byte[] and make the max size far larger. > But I think this is low priority... I'm not going to work on it any time soon.
[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 1162 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/1162/ 2 tests failed. REGRESSION: org.apache.solr.cloud.BasicDistributedZkTest.testDistribSearch Error Message: Could not connect to ZooKeeper 127.0.0.1:46439 within 3 ms Stack Trace: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:46439 within 3 ms at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:124) at org.apache.solr.common.cloud.SolrZkClient.(SolrZkClient.java:121) at org.apache.solr.common.cloud.SolrZkClient.(SolrZkClient.java:84) at org.apache.solr.common.cloud.SolrZkClient.(SolrZkClient.java:65) at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:71) at org.apache.solr.cloud.AbstractDistributedZkTestCase.setUp(AbstractDistributedZkTestCase.java:47) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57) FAILED: junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest Error Message: java.lang.AssertionError: ensure your setUp() calls super.setUp() and your tearDown() calls super.tearDown()!!! Stack Trace: java.lang.RuntimeException: java.lang.AssertionError: ensure your setUp() calls super.setUp() and your tearDown() calls super.tearDown()!!! at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:402) at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:344) Build Log (for compile errors): [...truncated 11483 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 11682 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/11682/ 2 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest Error Message: Cannot delete /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/build/solr-core/test/4/solrtest-SignatureUpdateProcessorFactoryTest-1323078096363/index/_9.prx Stack Trace: java.io.IOException: Cannot delete /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/build/solr-core/test/4/solrtest-SignatureUpdateProcessorFactoryTest-1323078096363/index/_9.prx at org.apache.lucene.store.FSDirectory.deleteFile(FSDirectory.java:296) at org.apache.lucene.store.MockDirectoryWrapper.deleteFile(MockDirectoryWrapper.java:370) at org.apache.lucene.store.MockDirectoryWrapper.crash(MockDirectoryWrapper.java:243) at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:535) at org.apache.solr.SolrTestCaseJ4.closeDirectories(SolrTestCaseJ4.java:82) at org.apache.solr.SolrTestCaseJ4.deleteCore(SolrTestCaseJ4.java:290) at org.apache.solr.SolrTestCaseJ4.afterClassSolrTestCase(SolrTestCaseJ4.java:72) FAILED: junit.framework.TestSuite.org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest Error Message: java.lang.AssertionError: directory of test was not closed, opened from: org.apache.solr.core.MockDirectoryFactory.open(MockDirectoryFactory.java:34) Stack Trace: java.lang.RuntimeException: java.lang.AssertionError: directory of test was not closed, opened from: org.apache.solr.core.MockDirectoryFactory.open(MockDirectoryFactory.java:34) at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:310) at org.apache.lucene.util.LuceneTestCase.checkResourcesAfterClass(LuceneTestCase.java:349) at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:278) Build Log (for compile errors): [...truncated 14639 lines...] 