Re: Czech stemmer
Hi,

I would recommend looking at a stemmer or token filter based on Hunspell dictionaries. I am not a Solr user, so I cannot point you to the appropriate documentation, but the Czech dictionary that can be used with Hunspell is of high quality. It can be downloaded from OpenOffice here: http://extensions.services.openoffice.org/en/project/czech-dictionary-pack-ceske-slovniky-cs-cz (distributed under GPL).

Note: the last time I looked at it, the dictionary contained one broken affix rule, which may require a manual fix depending on how strict the rule loader in Solr is. If you are interested in more details and cannot figure it out yourself, feel free to ping me again; I can point you to some resources on how I used it with Elasticsearch, and I assume the basic concepts apply to Solr as well.

Regards,
Lukas

2014-09-09 22:14 GMT+02:00 Shamik Bandopadhyay sham...@gmail.com:

Hi,

I'm facing stemming issues with Czech-language search. Solr/Lucene currently provides CzechStemFilterFactory as the sole option; Snowball Porter doesn't seem to be available for Czech. Here's the issue: a search for "posunout" ("move" in English) returns results, but fails if I use "posunulo" ("moved" in English). I used the following text as the field content for the search:

Pomocí multifunkčních uzlů je možné odkazy mnoha způsoby upravovat. Můžete přidat a odstranit odkazy, přidat a odstranit vrcholy, prodloužit nebo přesunout prodloužení čáry nebo přesunout text odkazu. Přístup k požadované možnosti získáte po přesunutí ukazatele myši na uzel. Z uzlu prodloužení čáry můžete zvolit tyto možnosti: Protáhnout: Umožňuje posunout prodloužení odkazové čáry. Délka prodloužení čáry: Umožňuje prodloužit prodloužení čáry. Přidat odkaz: Umožňuje přidat jednu nebo více odkazových čar. Z uzlu koncového bodu odkazu můžete zvolit tyto možnosti: Protáhnout: Umožňuje posunout koncový bod odkazové čáry. Přidat vrchol: Umožňuje přidat vrchol k odkazové čáře.
Odstranit odkaz: Umožňuje odstranit vybranou odkazovou čáru. Z uzlu vrcholu odkazu můžete zvolit tyto možnosti: Protáhnout: Umožňuje posunout vrchol. Přidat vrchol: Umožňuje přidat vrchol na odkazovou čáru. Odstranit vrchol: Umožňuje odstranit vrchol.

Just wondering if there's a different stemmer available or another way to address this.

Schema:

<fieldType name="text_csy" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_cz.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_csy.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.CzechStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_cz.txt"/>
    <filter class="solr.CzechStemFilterFactory"/>
  </analyzer>
</fieldType>

Any pointers will be appreciated.

- Thanks,
Shamik
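A field type along the lines Lukas suggests might look like this. This is a hedged sketch, not a tested config: solr.HunspellStemFilterFactory is the stock Solr filter, but the dictionary/affix file names below are assumptions based on the OpenOffice dictionary pack he links; check the file names inside the downloaded archive.

```xml
<!-- sketch: Czech analysis via Hunspell instead of CzechStemFilterFactory;
     cs_CZ.dic / cs_CZ.aff names are assumptions from the OpenOffice pack -->
<fieldType name="text_cz_hunspell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_cz.txt"/>
    <filter class="solr.HunspellStemFilterFactory"
            dictionary="cs_CZ.dic" affix="cs_CZ.aff" ignoreCase="true"/>
  </analyzer>
</fieldType>
```

With a dictionary-based stemmer, "posunout" and "posunulo" should reduce to a common lemma where the algorithmic CzechStemFilterFactory misses the inflection, which is the behavior Shamik is after.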
FileNotFoundException, Error closing IndexWriter, Error opening new searcher
Hi,

in the last few days we have had some trouble with one of our clusters (5 machines, each running 4.7.2 inside a Jetty container, no replication, Java 1.7.21). Twice we had trouble restarting one server (the same machine) because of a FileNotFoundException.

1. First time: stopping Solr while indexing resulted in the following log output:

2014-09-04 10:09:45,633 INFO o.a.s.s.SolrIndexSearcher [recoveryExecutor-6-thread-1] Opening Searcher@2b94db[shard2_replica1] realtime
2014-09-04 10:09:45,634 INFO o.a.s.u.DirectUpdateHandler2 [recoveryExecutor-6-thread-1] Reordered DBQs detected. Update=add{...} DBQs=[...]
2014-09-04 10:09:45,646 ERROR o.a.s.c.SolrException [recoveryExecutor-6-thread-1] Error opening realtime searcher for deleteByQuery:org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1521)
    at org.apache.solr.update.UpdateLog.add(UpdateLog.java:422)
    at org.apache.solr.update.DirectUpdateHandler2.addAndDelete(DirectUpdateHandler2.java:449)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:216)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:160)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:704)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:858)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:557)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
    at org.apache.solr.update.UpdateLog$LogReplayer.doReplay(UpdateLog.java:1326)
    at org.apache.solr.update.UpdateLog$LogReplayer.run(UpdateLog.java:1215)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: _7omin_Lucene41_0.tip
    at org.apache.lucene.index.TieredMergePolicy$SegmentByteSizeDescending.compare(TieredMergePolicy.java:252)
    at org.apache.lucene.index.TieredMergePolicy$SegmentByteSizeDescending.compare(TieredMergePolicy.java:238)
    at java.util.TimSort.binarySort(TimSort.java:265)
    at java.util.TimSort.sort(TimSort.java:208)
    at java.util.TimSort.sort(TimSort.java:173)
    at java.util.Arrays.sort(Arrays.java:659)
    at java.util.Collections.sort(Collections.java:217)
    at org.apache.lucene.index.TieredMergePolicy.findMerges(TieredMergePolicy.java:286)
    at org.apache.lucene.index.IndexWriter.updatePendingMerges(IndexWriter.java:1970)
    at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1940)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:404)
    at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:289)
    at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:274)
    at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:250)
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1445)
    ... 21 more

2. Second time: we brought some updates to the init.d scripts and had to restart each server in the cluster. No indexing was going on at this time.
The same server now crashed with this output while shutting down:

2014-09-05 15:13:39,204 INFO o.a.s.c.c.ZkStateReader$2 [main-EventThread] A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
2014-09-05 15:13:39,585 INFO o.a.s.c.c.ZkStateReader$2 [main-EventThread] A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
2014-09-05 15:13:39,586 INFO o.a.s.c.c.ZkStateReader$2 [main-EventThread] A cluster state change: WatchedEvent state:SyncConnected
Solr Spellcheck suggestions only return from /select handler when returning search results
Hi,

I'm experimenting with the Spellcheck component and have therefore used the example spell-checking configuration to try things out. My solrconfig.xml looks like this:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">spell</str>
  <!-- Multiple Spell Checkers can be declared and used by this component -->
  <!-- a spellchecker built from a field of the main index -->
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
    <str name="distanceMeasure">internal</str>
    <!-- uncomment this to require suggestions to occur in 1% of the documents
      <float name="thresholdTokenFrequency">.01</float>
    -->
  </lst>
  <!-- a spellchecker that can break or combine words. See /spell handler below for usage -->
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">spell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
</searchComponent>

And I've added the spellcheck component to my /select request handler:

<requestHandler name="/select" class="solr.SearchHandler">
  ...
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

I have built up the spellchecker source in the schema.xml from the name field:

<field name="spell" type="spell" indexed="true" stored="true" required="false" multiValued="false"/>
<copyField source="name" dest="spell" maxChars="3"/>
...
<fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>

As I'm querying the /select request handler, I should get spellcheck suggestions with my results. However, I rarely get a suggestion.
Examples:

query: Sichtscheibe, spellcheck suggestion: Sichtscheiben (works)
query: Sichtscheib, spellcheck suggestion: Sichtscheiben (works)
query: ichtscheiben, no spellcheck suggestions

As far as I can tell, I only get suggestions when I get real search results. I get results for the first two examples because the German StemFilterFactory reduces Sichtscheibe and Sichtscheiben to Sichtscheib, so matches are found. However, the third query should result in a suggestion, as the Levenshtein distance is smaller than in the second example.

Suggestions, improvements, corrections?
Re: Integrate solr with openNLP
Hi,

What is the progress of the integration of NLP with Solr? If you have achieved this integration successfully, please share it with us.

With Regards
Aman Tandon

On Tue, Jun 10, 2014 at 11:04 AM, Vivekanand Ittigi vi...@biginfolabs.com wrote:

Hi Aman,

Yeah, we are also thinking the same: using UIMA is better. And thanks to everyone; you guys really showed us the way (UIMA). We'll work on it.

Thanks,
Vivek

On Fri, Jun 6, 2014 at 5:54 PM, Aman Tandon amantandon...@gmail.com wrote:

Hi Vivek,

As everybody on the mailing list mentioned, you should go for UIMA. The OpenNLP issues are not being tracked properly, which could leave your development stuck if any issue comes up, so it's better to start investigating UIMA.

With Regards
Aman Tandon

On Fri, Jun 6, 2014 at 11:00 AM, Vivekanand Ittigi vi...@biginfolabs.com wrote:

Can anyone please reply?

Thanks,
Vivek

---------- Forwarded message ----------
From: Vivekanand Ittigi vi...@biginfolabs.com
Date: Wed, Jun 4, 2014 at 4:38 PM
Subject: Re: Integrate solr with openNLP
To: Tommaso Teofili tommaso.teof...@gmail.com
Cc: solr-user@lucene.apache.org, Ahmet Arslan iori...@yahoo.com

Hi Tommaso,

Yes, you are right: the 4.4 version works, and I'm able to compile now. I'm trying to apply named-entity recognition (person names) to tokens, but I'm not seeing any change. My schema.xml looks like this:

<field name="text" type="text_opennlp_pos_ner" indexed="true" stored="true" multiValued="true"/>

<fieldType name="text_opennlp_pos_ner" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.OpenNLPTokenizerFactory" tokenizerModel="opennlp/en-token.bin"/>
    <filter class="solr.OpenNLPFilterFactory" nerTaggerModels="opennlp/en-ner-person.bin"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Please guide?
Thanks,
Vivek

On Wed, Jun 4, 2014 at 1:27 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote:

Hi all,

Ahmet was suggesting eventually using the UIMA integration because OpenNLP already has an integration with Apache UIMA, so you would just have to use that [1]. And that's one of the main reasons the UIMA integration was done: it's a framework that you can easily hook into in order to plug in your NLP algorithm.

If you want to use just OpenNLP, then it's up to you to either write your own UpdateRequestProcessor plugin [2] to add metadata extracted by OpenNLP to your documents, or write a dedicated analyzer / tokenizer / token filter. For the OpenNLP integration (LUCENE-2899), the patch is not up to date with the latest APIs in trunk; however, you should be able to apply it (if I recall correctly) to the 4.4 version or so, and adapting it to the latest API shouldn't be too hard.

Regards,
Tommaso

[1] : http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#org.apche.opennlp.uima
[2] : http://wiki.apache.org/solr/UpdateRequestProcessor

2014-06-03 15:34 GMT+02:00 Ahmet Arslan iori...@yahoo.com.invalid:

Can you extract names, locations, etc. using OpenNLP in a plain/straight Java program? If yes, here are two separate options:

1) Use http://searchhub.org/2012/02/14/indexing-with-solrj/ as an example, integrate your NER code into it, and write your own indexing code. You have the full power here; no Solr plugins are involved.

2) Use 'Implementing a conditional copyField' given here: http://wiki.apache.org/solr/UpdateRequestProcessor as an example and integrate your NER code into it.

Please note that these are separate ways to enrich your incoming documents; choose either (1) or (2).

On Tuesday, June 3, 2014 3:30 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote:

Okay, but I didn't understand what you said. Can you please elaborate?
Thanks,
Vivek

On Tue, Jun 3, 2014 at 5:36 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi Vivekanand,

I have never used UIMA+Solr before. Personally, I think it takes more time to learn how to configure/use the UIMA stuff. If you are familiar with Java, write a class that extends UpdateRequestProcessor(Factory), use OpenNLP for NER, and add the new fields (organisation, city, person name, etc.) to your document. This phase is usually called 'enrichment'.

Does that make sense?

On Tuesday, June 3, 2014 2:57 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote:

Hi Ahmet,

I followed what you said
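The "enrichment" phase Ahmet describes can be sketched in plain Java. This is only an illustration of the idea, not Solr or OpenNLP API: `extractPersons` is a naive placeholder (it just treats capitalized tokens as names) standing in for a real OpenNLP `NameFinderME` call, and the field name `person_ss` is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EnrichSketch {
    // Placeholder NER: treat capitalized tokens as person names.
    // A real implementation would call OpenNLP's NameFinderME here.
    static List<String> extractPersons(String text) {
        List<String> names = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty() && Character.isUpperCase(token.charAt(0))) {
                names.add(token);
            }
        }
        return names;
    }

    // Enrichment step: add the extracted names as a new multi-valued field
    // on the document map before it is sent to Solr.
    static Map<String, Object> enrich(Map<String, Object> doc) {
        List<String> persons = extractPersons((String) doc.get("text"));
        if (!persons.isEmpty()) {
            doc.put("person_ss", persons);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("text", "met Alice in paris");
        System.out.println(enrich(doc).get("person_ss")); // prints [Alice]
    }
}
```

In option (1) from Ahmet's list this logic lives in your own SolrJ indexing code; in option (2) the same `enrich` step would sit inside an UpdateRequestProcessor.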
Genre classification/Document classification for apache solr
Hi,

I want to crawl links and identify whether a link is a company website. For example, if I search for 'financial advisory' in the Google search engine, I get a list of URLs in the search results, and some of those links are company websites. I want to identify the links that are company websites and index them into Solr. Does anybody know of an API/tool that can identify whether a link is a company website, or one that can identify a URL's genre/type on the basis of a taxonomy?

Thanks,
Vineet Yadav
Re: Edismax mm and efficiency
I implemented a custom QueryComponent that issues the edismax query with mm=100% and, if no results are found, reissues the query with mm=1. This doubled our query throughput (compared to always using mm=1), as we do some expensive RankQuery processing. For your very long student queries, mm=100% would obviously be too high, so you'd have to experiment.

On Fri, Sep 5, 2014 at 1:34 PM, Walter Underwood wun...@wunderwood.org wrote:

Great! We have some very long queries, where students paste entire homework problems. One of them was 1051 words; many of them are over 100 words. This could help.

In the Jira discussion, I saw some comments about handling the most sparse lists first. We did something like that in the Infoseek Ultra engine about twenty years ago: short termlists (documents matching a term) were processed first, which kept the in-memory lists of matching docs small. It also allowed early short-circuiting for no-hits queries.

What would be a high mm value, 75%?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/

On Sep 4, 2014, at 11:52 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote:

indeed https://issues.apache.org/jira/browse/LUCENE-4571
my feeling is it gives a significant gain at high mm values.

On Fri, Sep 5, 2014 at 3:01 AM, Walter Underwood wun...@wunderwood.org wrote:

Are there any speed advantages to using "mm"? I can imagine pruning the set of matching documents early, which could help, but is that (or something else) done?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
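The two-pass strategy described at the top of this thread can be sketched as follows. This is a hedged illustration of the control flow only: `search` stands in for the real edismax request (issued via SolrJ or inside a custom QueryComponent, neither of which is shown), and the mm values are just the ones discussed above.

```java
import java.util.List;
import java.util.function.Function;

public class MmFallback {
    // First pass with a strict mm value; if it finds nothing, reissue with mm=1.
    // The strict pass keeps the candidate set small (cheap RankQuery work) when
    // it matches, and the fallback preserves recall when it does not.
    static <T> List<T> searchWithFallback(Function<String, List<T>> search,
                                          String strictMm) {
        List<T> strict = search.apply(strictMm);
        return strict.isEmpty() ? search.apply("1") : strict;
    }

    public static void main(String[] args) {
        // Fake "index": the strict pass finds nothing, the lenient pass finds a doc.
        Function<String, List<String>> search =
                mm -> mm.equals("100%") ? List.of() : List.of("doc1");
        System.out.println(searchWithFallback(search, "100%")); // prints [doc1]
    }
}
```

The trade-off is one extra round trip on zero-hit strict passes, which the thread's throughput numbers suggest is worth paying when most strict passes succeed.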
Re: Integrate solr with openNLP
Actually, we dropped integrating NLP with Solr, but we took two different directions:

* we're using NLP separately, not with Solr
* we're taking the help of UIMA for Solr; it's more advanced

If you have a specific question, you can ask me. I'll tell you if I know.

-Vivek

On Wed, Sep 10, 2014 at 3:46 PM, Aman Tandon amantandon...@gmail.com wrote:

Hi,

What is the progress of the integration of NLP with Solr? If you have achieved this integration successfully, please share it with us.

With Regards
Aman Tandon

On Tue, Jun 10, 2014 at 11:04 AM, Vivekanand Ittigi vi...@biginfolabs.com wrote:

Hi Aman,

Yeah, we are also thinking the same: using UIMA is better. And thanks to everyone; you guys really showed us the way (UIMA). We'll work on it.

Thanks,
Vivek

On Fri, Jun 6, 2014 at 5:54 PM, Aman Tandon amantandon...@gmail.com wrote:

Hi Vivek,

As everybody on the mailing list mentioned, you should go for UIMA. The OpenNLP issues are not being tracked properly, which could leave your development stuck if any issue comes up, so it's better to start investigating UIMA.

With Regards
Aman Tandon

On Fri, Jun 6, 2014 at 11:00 AM, Vivekanand Ittigi vi...@biginfolabs.com wrote:

Can anyone please reply?

Thanks,
Vivek

---------- Forwarded message ----------
From: Vivekanand Ittigi vi...@biginfolabs.com
Date: Wed, Jun 4, 2014 at 4:38 PM
Subject: Re: Integrate solr with openNLP
To: Tommaso Teofili tommaso.teof...@gmail.com
Cc: solr-user@lucene.apache.org, Ahmet Arslan iori...@yahoo.com

Hi Tommaso,

Yes, you are right: the 4.4 version works, and I'm able to compile now. I'm trying to apply named-entity recognition (person names) to tokens, but I'm not seeing any change.
My schema.xml looks like this:

<field name="text" type="text_opennlp_pos_ner" indexed="true" stored="true" multiValued="true"/>

<fieldType name="text_opennlp_pos_ner" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.OpenNLPTokenizerFactory" tokenizerModel="opennlp/en-token.bin"/>
    <filter class="solr.OpenNLPFilterFactory" nerTaggerModels="opennlp/en-ner-person.bin"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Please guide?

Thanks,
Vivek

On Wed, Jun 4, 2014 at 1:27 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote:

Hi all,

Ahmet was suggesting eventually using the UIMA integration because OpenNLP already has an integration with Apache UIMA, so you would just have to use that [1]. And that's one of the main reasons the UIMA integration was done: it's a framework that you can easily hook into in order to plug in your NLP algorithm.

If you want to use just OpenNLP, then it's up to you to either write your own UpdateRequestProcessor plugin [2] to add metadata extracted by OpenNLP to your documents, or write a dedicated analyzer / tokenizer / token filter. For the OpenNLP integration (LUCENE-2899), the patch is not up to date with the latest APIs in trunk; however, you should be able to apply it (if I recall correctly) to the 4.4 version or so, and adapting it to the latest API shouldn't be too hard.

Regards,
Tommaso

[1] : http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#org.apche.opennlp.uima
[2] : http://wiki.apache.org/solr/UpdateRequestProcessor

2014-06-03 15:34 GMT+02:00 Ahmet Arslan iori...@yahoo.com.invalid:

Can you extract names, locations, etc. using OpenNLP in a plain/straight Java program? If yes, here are two separate options:

1) Use http://searchhub.org/2012/02/14/indexing-with-solrj/ as an example, integrate your NER code into it, and write your own indexing code. You have the full power here; no Solr plugins are involved.
2) Use 'Implementing a conditional copyField' given here: http://wiki.apache.org/solr/UpdateRequestProcessor as an example and integrate your NER code into it.

Please note that these are separate ways to enrich your incoming documents; choose either (1) or (2).

On Tuesday, June 3, 2014 3:30 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote:

Okay, but I didn't understand what you said. Can you please elaborate?

Thanks,
Vivek

On Tue, Jun 3, 2014 at 5:36 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi Vivekanand,

I have never used UIMA+Solr before. Personally, I think it takes more time to learn how to configure/use these
Installing solr on tomcat 7.x | Window 8
I am trying to follow the official documentation as well as other resources available on the net, but I am unable to run Solr on my Tomcat. I am trying to install and run `solr-4.10.0` on Tomcat. This is what I have done so far:

1. Copied solr-4.10.0.war to the Tomcat webapps directory and renamed it to solr.war.
2. Created a folder on my `D` drive named `solr-home`.
3. Copied everything from `solr-4.10.0\example\solr` and pasted it into the `solr-home` folder.
4. Through the Environment Variables dialog, under user variables, I set `solr.solr.home=D:\solr-home`.

I started the Tomcat server; it starts without any error/exception, but when I hit `http://localhost:8080/solr/`, I get the following error:

msg=SolrCore 'collection1' is not available due to init failure: Could not load conf for core collection1: Error loading solr config from solr/collection1\solrconfig.xml

trace=org.apache.solr.common.SolrException: SolrCore 'collection1' is not available due to init failure: Could not load conf for core collection1: Error loading solr config from solr/collection1\solrconfig.xml
    at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:745)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:307)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1041)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:603)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: Could not load conf for core collection1: Error loading solr config from solr/collection1\solrconfig.xml
    at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:66)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:489)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:255)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:249)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    ... 3 more
Caused by: org.apache.solr.common.SolrException: Error loading solr config from solr/collection1\solrconfig.xml
    at org.apache.solr.core.SolrConfig.readFromResourceLoader(SolrConfig.java:148)
    at org.apache.solr.core.ConfigSetService.createSolrConfig(ConfigSetService.java:80)
    at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:61)
    ... 7 more
Caused by: java.io.IOException: Can't find resource 'solrconfig.xml' in classpath or 'C:\Program Files\Apache Software Foundation\Tomcat 7.0\solr\collection1\conf'
    at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:362)
    at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:308)
    at org.apache.solr.core.Config.<init>(Config.java:116)
    at org.apache.solr.core.Config.<init>(Config.java:86)
    at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:161)
    at org.apache.solr.core.SolrConfig.readFromResourceLoader(SolrConfig.java:144)
    ... 9 more
code=500

I am not sure where I am going wrong, or is it really so tricky to install Solr?

--
With Regards
Umesh Awasthi
http://www.travellingrants.com/
Problem while extending TokenizerFactory in Solr 4.4.0
Hi All,

I'm using the Solr 4.4.0 distro, and I have a strange issue while extending TokenizerFactory with a custom class. This is an excerpt of the pom I use:

<properties>
  <solr.version>4.4.0</solr.version>
</properties>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-core</artifactId>
  <version>${solr.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-analyzers-common</artifactId>
  <version>${solr.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-queryparser</artifactId>
  <version>${solr.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-core</artifactId>
  <version>${solr.version}</version>
</dependency>

I always get the exception below during Solr engine initialization:

com.mytest.tokenizer.RelationChunkTokenizerFactory'
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:467)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:164)
    at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
    at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
    at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:619)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:657)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/tokenizer: Error instantiating class: 'com.mytest.tokenizer.RelationChunkTokenizerFactory'
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
    at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:362)
    at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95)
    at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
    ... 14 more
Caused by: org.apache.solr.common.SolrException: Error instantiating class: 'com.mytest.tokenizer.RelationChunkTokenizerFactory'
    at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:556)
    at org.apache.solr.schema.FieldTypePluginLoader$2.create(FieldTypePluginLoader.java:342)
    at org.apache.solr.schema.FieldTypePluginLoader$2.create(FieldTypePluginLoader.java:335)
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
    ... 18 more
Caused by: java.lang.NoSuchMethodException: com.mytest.tokenizer.RelationChunkTokenizerFactory.<init>(java.util.Map)
    at java.lang.Class.getConstructor0(Unknown Source)
    at java.lang.Class.getConstructor(Unknown Source)
    at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:552)
    ... 21 more

8604 [coreLoadExecutor-3-thread-1] ERROR org.apache.solr.core.CoreContainer - null:org.apache.solr.common.SolrException: Unable to create core: collection1
    at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1150)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:666)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType rel: Plugin init failure for [schema.xml] analyzer/tokenizer: Error instantiating class: 'com.altilia.platform.tokenizer.RelationChunkTokenizerFactory'
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177) at
RE: Solr Spellcheck suggestions only return from /select handler when returning search results
Thomas,

It looks like you've set things up correctly, in that while the user is searching against a stemmed field (name), spellcheck is checking against a lightly-analyzed copy of it (spell). This is the right way to do it, as spellcheck against stemmed forms is usually undesirable. But as you've experienced, you will sometimes get results (due to stemming) and also suggestions (because the spellchecker is looking at unstemmed forms). If you do not want spellcheck to return anything when you get results, you can set spellcheck.maxResultsForSuggest=0.

Now, keeping in mind that we're comparing unstemmed forms, can you verify you indeed have something in your index that is within 2 edits of "ichtscheiben"? My guess is you probably don't, which would be why you do not get spelling results in that case. Also, even if you do have something within 2 edits, if "ichtscheiben" itself occurs in your index, by default it won't be corrected at all (even if the query returns nothing, perhaps because of filters or other required terms on the query). In that case you need to set spellcheck.alternativeTermCount to a non-zero value (try maybe 5). See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.alternativeTermCount and the following sections.

James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Thomas Michael Engelke [mailto:thomas.enge...@posteo.de]
Sent: Wednesday, September 10, 2014 5:00 AM
To: Solr user
Subject: Solr Spellcheck suggestions only return from /select handler when returning search results

Hi,

I'm experimenting with the Spellcheck component and have therefore used the example spell-checking configuration to try things out.
My solrconfig.xml looks like this:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">spell</str>
  <!-- Multiple Spell Checkers can be declared and used by this component -->
  <!-- a spellchecker built from a field of the main index -->
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
    <str name="distanceMeasure">internal</str>
    <!-- uncomment this to require suggestions to occur in 1% of the documents
      <float name="thresholdTokenFrequency">.01</float>
    -->
  </lst>
  <!-- a spellchecker that can break or combine words. See /spell handler below for usage -->
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">spell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
</searchComponent>

And I've added the spellcheck component to my /select request handler:

<requestHandler name="/select" class="solr.SearchHandler">
  ...
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

I have built up the spellchecker source in the schema.xml from the name field:

<field name="spell" type="spell" indexed="true" stored="true" required="false" multiValued="false"/>
<copyField source="name" dest="spell" maxChars="3"/>
...
<fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>

As I'm querying the /select request handler, I should get spellcheck suggestions with my results. However, I rarely get a suggestion.
Examples:

- query: Sichtscheibe, spellcheck suggestion: Sichtscheiben (works)
- query: Sichtscheib, spellcheck suggestion: Sichtscheiben (works)
- query: ichtscheiben, no spellcheck suggestions

As far as I can identify, I only get suggestions when I get real search results. I get results for the first 2 examples because the German StemFilterFactory stems both Sichtscheibe and Sichtscheiben to Sichtscheib, so matches are found. However, the third query should result in a suggestion, as the Levenshtein distance is smaller than in the second example. Suggestions, improvements, corrections?
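The edit-distance question raised in this thread can be checked directly. DirectSolrSpellChecker compares unstemmed terms with a default maximum of 2 edits; a plain dynamic-programming Levenshtein implementation (a sketch, not Solr's actual code) shows both failing and working queries are within that limit:

```java
public class EditDistance {
    // Classic dynamic-programming Levenshtein distance (insert/delete/substitute, cost 1 each).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // "ichtscheiben" is one insertion (the leading 'S') away from "Sichtscheiben",
        // so it is well within the default limit of 2 edits.
        System.out.println(levenshtein("ichtscheiben", "Sichtscheiben")); // 1
        System.out.println(levenshtein("Sichtscheib", "Sichtscheiben"));  // 2
    }
}
```

So if no suggestion comes back for ichtscheiben even though Sichtscheiben is in the unstemmed spell field, the cause is more likely the alternativeTermCount/maxResultsForSuggest behavior James describes than the distance itself.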
Re: Problem while extending TokenizerFactory in Solr 4.4.0
On 9/10/2014 7:14 AM, Francesco Valentini wrote: I’m using Solr 4.4.0 distro and now, I have a strange issue while extending TokenizerFactory with a custom class. I think what we have here is a basic Java error, nothing specific to Solr. This jumps out at me:

Caused by: java.lang.NoSuchMethodException: com.mytest.tokenizer.RelationChunkTokenizerFactory.<init>(java.util.Map)
    at java.lang.Class.getConstructor0(Unknown Source)
    at java.lang.Class.getConstructor(Unknown Source)
    at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:552)
    ... 21 more

Java is trying to execute a method that doesn't exist. The getConstructor pieces after the message suggest that perhaps it's a constructor with a Map as an argument, but I'm not familiar enough with this error to know whether it's trying to run a constructor that doesn't exist, or whether it's trying to actually use a method called init. The constructor in TokenizerFactory is protected, and all of the existing descendants that I looked at have a public constructor ... this message would make sense in all of the following situations:

1) You didn't create a constructor for your object with a Map argument.
2) You made your constructor protected.
3) You made your constructor private.

Thanks, Shawn
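Shawn's diagnosis can be reproduced with plain reflection: the stack trace shows the loader calling Class.getConstructor with a Map argument, and getConstructor only finds *public* constructors. The two toy classes below are hypothetical stand-ins, not real Solr factories, but they show exactly when the NoSuchMethodException appears:

```java
import java.util.Map;

public class CtorCheck {
    // A factory with the public Map constructor the loader expects.
    static class GoodFactory {
        public GoodFactory(Map<String, String> args) { }
    }

    // A factory with only a protected constructor -- getConstructor() will not find it.
    static class BadFactory {
        protected BadFactory(Map<String, String> args) { }
    }

    static boolean hasPublicMapCtor(Class<?> clazz) {
        try {
            clazz.getConstructor(Map.class); // only locates public constructors
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasPublicMapCtor(GoodFactory.class)); // true
        System.out.println(hasPublicMapCtor(BadFactory.class));  // false
    }
}
```

So any of Shawn's three situations (missing, protected, or private constructor) produces the same exception.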
Re: Installing solr on tomcat 7.x | Window 8
On 9/10/2014 6:45 AM, Umesh Awasthi wrote: I am trying to follow the official documentation as well as other resources available on the net, but I am unable to run solr on my tomcat. I am trying to install and run `solr-4.10.0` on tomcat. This is what I have done so far:

1. Copied solr-4.10.0.war to tomcat's webapps folder and renamed it to solr.war.
2. Created a folder on my `D` drive with the name `solr-home`.
3. Copied everything from `solr-4.10.0\example\solr` and pasted it into the `solr-home` folder.
4. Through an environment variable (under user variables), I set the following: `solr.solr.home=D:\solr-home`

Started the tomcat server; it starts without any error / exception, but when I try to hit the URL `http://localhost:8080/solr/`, I get the following error: `message {msg=SolrCore 'collection1' is not available due to init failure: Could not load conf for core collection1: Error loading solr config from solr/collection1\solrconfig.xml,trace=org.apache.solr.common.SolrException: SolrCore 'collection1' is not available due to init failure: Could not load conf for core collection1: Error loading solr config from solr/collection1\solrconfig.xml at` The path in the error message is wrong. See SOLR-5814. https://issues.apache.org/jira/browse/SOLR-5814 Is there anything in the log before this message? You will need to find the actual solr logfile ... because you used tomcat and not the jetty included with Solr, I cannot tell you where this logfile is, although if you copied the logging jars and the logging config from the example, it will be logs\solr.log, relative to the current working directory of the process that started tomcat. A side note: Windows 8 is a client operating system. Microsoft has crippled their client operating systems in some way compared to their server operating systems. Heavy multi-threaded server workloads like Solr will not work as well. I don't know exactly what the differences are.
If you really want to run Solr on a Windows system, you really should put it on Server 2012, not Windows 8. Unfortunately, their server operating systems have a rather high price tag. It will be the opinion of most people here that you and your pocketbook would be far happier with the results of running on Linux -- better performance and no cost. Thanks, Shawn
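For completeness, one conventional alternative to a user environment variable is passing the property to the Tomcat JVM through a `bin\setenv.bat` file, which Tomcat reads at startup if present. The path below is just Umesh's example location, not a recommendation:

```bat
REM %CATALINA_HOME%\bin\setenv.bat -- picked up automatically by catalina.bat
set JAVA_OPTS=%JAVA_OPTS% -Dsolr.solr.home=D:\solr-home
```

A system property set this way is visible regardless of which user account starts Tomcat, which avoids a common pitfall with per-user environment variables when Tomcat runs as a service.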
Re: Problem while extending TokenizerFactory in Solr 4.4.0
Hi Shawn, thank you very much for your quick answer, I fixed it. Thanks Francesco 2014-09-10 15:34 GMT+02:00 Shawn Heisey s...@elyograg.org: On 9/10/2014 7:14 AM, Francesco Valentini wrote: I’m using Solr 4.4.0 distro and now, I have a strange issue while extending TokenizerFactory with a custom class. I think what we have here is a basic Java error, nothing specific to Solr. This jumps out at me: Caused by: java.lang.NoSuchMethodException: com.mytest.tokenizer.RelationChunkTokenizerFactory.<init>(java.util.Map) at java.lang.Class.getConstructor0(Unknown Source) at java.lang.Class.getConstructor(Unknown Source) at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:552) ... 21 more Java is trying to execute a method that doesn't exist. The getConstructor pieces after the message suggest that perhaps it's a constructor with a Map as an argument, but I'm not familiar enough with this error to know whether it's trying to run a constructor that doesn't exist, or whether it's trying to actually use a method called init. The constructor in TokenizerFactory is protected, and all of the existing descendants that I looked at have a public constructor ... this message would make sense in all of the following situations: 1) You didn't create a constructor for your object with a Map argument. 2) You made your constructor protected. 3) You made your constructor private. Thanks, Shawn
Modify Schema - Schema API
In addition to adding new fields to the schema, is there a way to modify an existing field? What if I created a field called userID as a long, but decided later that it should be a string? Thank you! -Joe
Re: Edismax mm and efficiency
We do that strict/loose query sequence, but on the client side with two requests. Would you consider contributing the QueryComponent? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Sep 10, 2014, at 3:47 AM, Peter Keegan peterlkee...@gmail.com wrote: I implemented a custom QueryComponent that issues the edismax query with mm=100%, and if no results are found, it reissues the query with mm=1. This doubled our query throughput (compared to mm=1 always), as we do some expensive RankQuery processing. For your very long student queries, mm=100% would obviously be too high, so you'd have to experiment. On Fri, Sep 5, 2014 at 1:34 PM, Walter Underwood wun...@wunderwood.org wrote: Great! We have some very long queries, where students paste entire homework problems. One of them was 1051 words. Many of them are over 100 words. This could help. In the Jira discussion, I saw some comments about handling the most sparse lists first. We did something like that in the Infoseek Ultra engine about twenty years ago. Short termlists (documents matching a term) were processed first, which kept the in-memory lists of matching docs small. It also allowed early short-circuiting for no-hits queries. What would be a high mm value, 75%? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Sep 4, 2014, at 11:52 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: indeed https://issues.apache.org/jira/browse/LUCENE-4571 my feeling is it gives a significant gain in mm high values. On Fri, Sep 5, 2014 at 3:01 AM, Walter Underwood wun...@wunderwood.org wrote: Are there any speed advantages to using “mm”? I can imagine pruning the set of matching documents early, which could help, but is that (or something else) done? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Modify Schema - Schema API
Hi Joseph, It isn't supported by an existing REST API (if that was your question) but you can always edit the schema manually (if it isn't managed), upload the new schema and reload the collections (or cores in case of non-SolrCloud mode). Do remember that changing the field type might require you to reindex your data. There's an open JIRA for that one and I think someone will get to it sometime in the reasonably near future. JIRA: https://issues.apache.org/jira/browse/SOLR-5289 On Wed, Sep 10, 2014 at 8:05 AM, Joseph Obernberger joseph.obernber...@gmail.com wrote: In addition to adding new fields to the schema, is there a way to modify an existing field? If I created a field called userID as a long, but decided later that it should be a string? Thank you! -Joe -- Anshum Gupta http://www.anshumgupta.net
Re: Modify Schema - Schema API
Thank you - yes that was my question. I should have stated that it was for SolrCloud and hence a managed schema. Could I bring down the shards, edit the managed schema on zookeeper, fire the shards back up and re-index? -Joe On Wed, Sep 10, 2014 at 11:50 AM, Anshum Gupta ans...@anshumgupta.net wrote: Hi Joseph, It isn't supported by an exiting REST API (if that was your question) but you can always edit the schema manually (if it isn't managed), upload the new schema and reload the collections (or cores in case of non-SolrCloud mode). Do remember that changing the field type might require you to reindex your data. There's an open JIRA for that one and I think someone would get to it sometime in the reasonably near future. JIRA: https://issues.apache.org/jira/browse/SOLR-5289 On Wed, Sep 10, 2014 at 8:05 AM, Joseph Obernberger joseph.obernber...@gmail.com wrote: In addition to adding new fields to the schema, is there a way to modify an existing field? If I created a field called userID as a long, but decided later that it should be a string? Thank you! -Joe -- Anshum Gupta http://www.anshumgupta.net
Problems for indexing large documents on SolrCloud
Hi, I have some problems indexing large documents in a SolrCloud cluster of 3 servers (Solr 4.8.1) with 3 shards and 2 replicas for each shard, on Tomcat 7. For a specific document (with 300K values in a multivalued field), I couldn't index it on SolrCloud, but I could do it in a single instance of Solr on my own PC. The indexing is done with Solarium from a database. The data indexed are e-commerce products with classic fields like name, price, description, instock, etc... The large field (type int) is constituted of other products' ids. The only difference from the documents well-indexed on Solr is the size of that multivalued field. Indeed, the well-indexed documents all have between 100K and 200K values for that field. The index size is 11 Mb for 20 documents. To solve it, I tried to change several parameters, including zkClientTimeout in solr.xml.

In the solrcloud section:

<int name="zkClientTimeout">6</int>
<int name="distribUpdateConnTimeout">10</int>
<int name="distribUpdateSoTimeout">10</int>

In the shardHandlerFactory section:

<int name="socketTimeout">${socketTimeout:10}</int>
<int name="connTimeout">${connTimeout:10}</int>

I also tried to increase these values in solrconfig.xml:

<requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="1" formdataUploadLimitInKB="10" addHttpRequestToContext="false"/>

I also tried to increase the quantity of RAM (they are VMs): each server has 4 Gb of RAM, with 3 Gb for the JVM. Are there other settings which could solve the problem that I have forgotten?
The error messages are:

ERROR SolrDispatchFilter null:java.lang.RuntimeException: [was class java.net.SocketException] Connection reset
ERROR SolrDispatchFilter null:ClientAbortException: java.net.SocketException: broken pipe
ERROR SolrDispatchFilter null:ClientAbortException: java.net.SocketException: broken pipe
ERROR SolrCore org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier
ERROR SolrCore org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier
ERROR SolrCore org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier
ERROR SolrCore org.apache.solr.common.SolrException: Unexpected EOF in attribute value
ERROR SolrCore org.apache.solr.common.SolrException: Unexpected end of input block in start tag

Thanks, Olivier
Re: Wildcard in FL parameter not working with Solr 4.10.0
This may have been introduced by changes made to solve https://issues.apache.org/jira/browse/SOLR-5968 I created https://issues.apache.org/jira/browse/SOLR-6501 to track the new bug. On Tue, Sep 9, 2014 at 4:53 PM, Mike Hugo m...@piragua.com wrote: Hello, With Solr 4.7 we had some queries that return dynamic fields by passing in a fl=*_exact parameter; this is not working for us after upgrading to Solr 4.10.0. This appears to only be a problem when requesting wildcarded fields via SolrJ. With Solr 4.10.0 - I downloaded the binary and set up the example:

cd example
java -jar start.jar
java -jar post.jar solr.xml monitor.xml

In a browser, if I request http://localhost:8983/solr/collection1/select?q=*:*&wt=json&indent=true&fl=*d all is well with the world:

{responseHeader: {status: 0, QTime: 1, params: {fl: "*d", indent: "true", q: "*:*", wt: "json"}}, response: {numFound: 2, start: 0, docs: [{id: "SOLR1000"}, {id: "3007WFP"}]}}

However if I do the same query with SolrJ (groovy script):

@Grab(group = 'org.apache.solr', module = 'solr-solrj', version = '4.10.0')
import org.apache.solr.client.solrj.SolrQuery
import org.apache.solr.client.solrj.impl.HttpSolrServer

HttpSolrServer solrServer = new HttpSolrServer("http://localhost:8983/solr/collection1")
SolrQuery q = new SolrQuery("*:*")
q.setFields("*d")
println solrServer.query(q)

No fields are returned:

{responseHeader={status=0,QTime=0,params={fl=*d,q=*:*,wt=javabin,version=2}},response={numFound=2,start=0,docs=[SolrDocument{}, SolrDocument{}]}}

Any ideas as to why wildcarded fl fields are not returned when using SolrJ? Thanks, Mike
Re: Modify Schema - Schema API
You don't need to bring down the shards/collections; instead, here's what you can do:

* Retain the filename (managed_schema, if you didn't change the default resource name).
* Edit the file locally.
* Upload it to replace the current zk file.
* Reload the collection(s).
* Reindex.

Here's another thing you can do:

* Upload the updated configs to zk.
* Create a new collection (different name) using the new configs.
* Reindex data to the new collection.
* Use collection aliasing to swap the old/new collections. (http://www.anshumgupta.net/2013/10/collection-aliasing-in-solrcloud.html)

All this while, you wouldn't really need to shut down the Solr cluster/collection etc. On Wed, Sep 10, 2014 at 8:56 AM, Joseph Obernberger joseph.obernber...@gmail.com wrote: Thank you - yes that was my question. I should have stated that it was for SolrCloud and hence a managed schema. Could I bring down the shards, edit the managed schema on zookeeper, fire the shards back up and re-index? -Joe On Wed, Sep 10, 2014 at 11:50 AM, Anshum Gupta ans...@anshumgupta.net wrote: Hi Joseph, It isn't supported by an exiting REST API (if that was your question) but you can always edit the schema manually (if it isn't managed), upload the new schema and reload the collections (or cores in case of non-SolrCloud mode). Do remember that changing the field type might require you to reindex your data. There's an open JIRA for that one and I think someone would get to it sometime in the reasonably near future. JIRA: https://issues.apache.org/jira/browse/SOLR-5289 On Wed, Sep 10, 2014 at 8:05 AM, Joseph Obernberger joseph.obernber...@gmail.com wrote: In addition to adding new fields to the schema, is there a way to modify an existing field? If I created a field called userID as a long, but decided later that it should be a string? Thank you! -Joe -- Anshum Gupta http://www.anshumgupta.net -- Anshum Gupta http://www.anshumgupta.net
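The first list of steps can be sketched as shell commands. The host names, paths, and config name below are placeholders, and zkcli.sh ships under example/scripts/cloud-scripts in Solr 4.x distributions; adjust for your installation:

```shell
# 1. Push the edited config set (including the managed schema file) back to ZooKeeper
./zkcli.sh -zkhost zk1:2181 -cmd upconfig \
    -confdir /path/to/edited/conf -confname myconf

# 2. Reload the collection so every core picks up the new schema
curl "http://solr1:8983/solr/admin/collections?action=RELOAD&name=collection1"

# 3. Reindex your data through your normal indexing pipeline
```

None of these steps require taking nodes offline, though queries keep running against the old schema until the reload completes and the data is reindexed.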
RE: [Announce] Apache Solr 4.10 with RankingAlgorithm 1.5.4 available now with complex-lsa algorithm (simulates human language acquisition and recognition)
Hi Diego: Not sure of solr-sense, but complex-lsa is an enhanced lsa implementation with TERM-DOCUMENT Similarity, etc. (not found in lsa). The relevance/ranking is again different and is more accurate, as it uses the RankingAlgorithm scoring model. The query performance gain with this version is significant over the last release; a TERM-SIMILARITY query that used to take about 8-9 seconds now takes just 30ms to 80ms. Lots of performance improvements ... Warm Regards, Nagendra Nagarajayya http://solr-ra.tgels.org http://elasticsearch-ra.tgels.org http://rankingalgorithm.tgels.org (accurate and relevant, simulates human language acquisition and recognition) -----Original Message----- From: Diego Fernandez [mailto:difer...@redhat.com] Sent: Tuesday, September 9, 2014 10:38 AM To: solr-user@lucene.apache.org Cc: gene...@lucene.apache.org Subject: Re: [Announce] Apache Solr 4.10 with RankingAlgorithm 1.5.4 available now with complex-lsa algorithm (simulates human language acquisition and recognition) Interesting. Does anyone know how that compares to this http://www.searchbox.com/products/searchbox-plugins/solr-sense/? Diego Fernandez - 爱国 Software Engineer US GSS Supportability - Diagnostics - Original Message - Hi! I am very excited to announce the availability of Apache Solr 4.10 with RankingAlgorithm 1.5.4. Solr 4.10.0 with RankingAlgorithm 1.5.4 includes support for complex-lsa. complex-lsa simulates human language acquisition and recognition (see demo http://solr-ra.tgels.org/rankingsearchlsa.jsp ) and can retrieve semantically related/hidden relationships between terms, sentences, paragraphs, chapters, books, images, etc. Three new similarities, TERM_SIMILARITY, DOCUMENT_SIMILARITY, TERM_DOCUMENT_SIMILARITY enable these with improved precision. A query for “holy AND ghost” returns jesus/christ as the top results for the bible corpus with no effort to introduce this relationship (see demo http://solr-ra.tgels.org/rankingsearchlsa.jsp ).
This version adds support for multiple linear algebra libraries. complex-lsa does a large amount of these calculations, so speeding this up should speed up retrieval, etc. EJML is the fastest if you are using complex-lsa for a smaller set of documents, while MTJ is faster as your document collection becomes bigger. MTJ can also use BLAS/LAPACK, etc. installed on your system to further improve performance with native execution. The performance is similar to a C/C++ application. It can also make use of GPUs or Intel's MKL library if you have access to it. RankingAlgorithm 1.5.4 with complex-lsa supports the entire Lucene Query Syntax, ± and/or boolean/dismax/glob/regular expression/wildcard/fuzzy/prefix/suffix queries with boosting, etc. This version increases performance, with increased accuracy and relevance for Document similarity, and fixes problems with phrase queries, Boolean queries, etc. You can get more information about complex-lsa and realtime-search performance from here: http://solr-ra.tgels.org/wiki/en/Complex-lsa-demo You can download Solr 4.10 with RankingAlgorithm 1.5.4 from here: http://solr-ra.tgels.org Please download and give the new version a try. Regards, Nagendra Nagarajayya http://solr-ra.tgels.org http://elasticsearch-ra.tgels.org http://rankingalgorithm.tgels.org Note: 1. Apache Solr 4.10 with RankingAlgorithm 1.5.4 is an external project.
Re: Modify Schema - Schema API
Wow - that's really cool! Thank you! -Joe On Wed, Sep 10, 2014 at 12:29 PM, Anshum Gupta ans...@anshumgupta.net wrote: You don't need to bring down the shards/collections, instead here's what you can do: * Retain the filename (managed_schema, if you didn't change the default resource name). * Edit the file locally * Upload it to replace the current zk file. * Reload the collection(s). * Reindex Here's another thing you can do: * Upload the updated configs to zk * Create a new collection (different name) using the new configs * Reindex data to the new collection. * Use collection aliasing to swap the old/new collections. (http://www.anshumgupta.net/2013/10/collection-aliasing-in-solrcloud.html) All this while, you wouldn't really need to shut down the Solr cluster/collection etc. On Wed, Sep 10, 2014 at 8:56 AM, Joseph Obernberger joseph.obernber...@gmail.com wrote: Thank you - yes that was my question. I should have stated that it was for SolrCloud and hence a managed schema. Could I bring down the shards, edit the managed schema on zookeeper, fire the shards back up and re-index? -Joe On Wed, Sep 10, 2014 at 11:50 AM, Anshum Gupta ans...@anshumgupta.net wrote: Hi Joseph, It isn't supported by an exiting REST API (if that was your question) but you can always edit the schema manually (if it isn't managed), upload the new schema and reload the collections (or cores in case of non-SolrCloud mode). Do remember that changing the field type might require you to reindex your data. There's an open JIRA for that one and I think someone would get to it sometime in the reasonably near future. JIRA: https://issues.apache.org/jira/browse/SOLR-5289 On Wed, Sep 10, 2014 at 8:05 AM, Joseph Obernberger joseph.obernber...@gmail.com wrote: In addition to adding new fields to the schema, is there a way to modify an existing field? If I created a field called userID as a long, but decided later that it should be a string? Thank you! 
-Joe -- Anshum Gupta http://www.anshumgupta.net -- Anshum Gupta http://www.anshumgupta.net
Re: Edismax mm and efficiency
Sure. I created SOLR-6502. The tricky part was handling the behavior in a sharded index. When the index is sharded, the response from each shard will contain a parameter that indicates whether the search results are from the conjunction of all keywords (mm=100%) or from the disjunction (mm=1). If the shards contain both types, then only return the results from the conjunction. This is necessary in order to get the same results independent of the number of shards. Peter On Wed, Sep 10, 2014 at 11:07 AM, Walter Underwood wun...@wunderwood.org wrote: We do that strict/loose query sequence, but on the client side with two requests. Would you consider contributing the QueryComponent? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Sep 10, 2014, at 3:47 AM, Peter Keegan peterlkee...@gmail.com wrote: I implemented a custom QueryComponent that issues the edismax query with mm=100%, and if no results are found, it reissues the query with mm=1. This doubled our query throughput (compared to mm=1 always), as we do some expensive RankQuery processing. For your very long student queries, mm=100% would obviously be too high, so you'd have to experiment. On Fri, Sep 5, 2014 at 1:34 PM, Walter Underwood wun...@wunderwood.org wrote: Great! We have some very long queries, where students paste entire homework problems. One of them was 1051 words. Many of them are over 100 words. This could help. In the Jira discussion, I saw some comments about handling the most sparse lists first. We did something like that in the Infoseek Ultra engine about twenty years ago. Short termlists (documents matching a term) were processed first, which kept the in-memory lists of matching docs small. It also allowed early short-circuiting for no-hits queries. What would be a high mm value, 75%?
wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Sep 4, 2014, at 11:52 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: indeed https://issues.apache.org/jira/browse/LUCENE-4571 my feeling is it gives a significant gain in mm high values. On Fri, Sep 5, 2014 at 3:01 AM, Walter Underwood wun...@wunderwood.org wrote: Are there any speed advantages to using “mm”? I can imagine pruning the set of matching documents early, which could help, but is that (or something else) done? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
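Whether done in a custom QueryComponent or client-side with two requests, the strict-then-loose fallback described in this thread boils down to one piece of control flow. The sketch below uses a plain function stub in place of a real Solr call, so all names are illustrative only:

```java
import java.util.List;
import java.util.function.Function;

public class MmFallback {
    // Issue the query with mm=100% first; only if that returns nothing, retry with mm=1.
    static <T> List<T> searchWithFallback(Function<String, List<T>> searchWithMm) {
        List<T> strict = searchWithMm.apply("100%");
        return strict.isEmpty() ? searchWithMm.apply("1") : strict;
    }

    public static void main(String[] args) {
        // Stub "index": the strict query misses, the loose query matches.
        Function<String, List<String>> stub =
            mm -> mm.equals("100%") ? List.of() : List.of("doc1", "doc2");
        System.out.println(searchWithFallback(stub)); // [doc1, doc2]
    }
}
```

The throughput win Peter reports comes from the common case where the strict pass finds hits and the expensive second pass is skipped entirely; the cost is one extra round trip only for queries whose strict pass comes up empty.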
Re: Reading files in default Conf dir
Thank you for the inputs Jorge. Now I am getting the ResourceLoader using the SolrCore API.

Before:

return new HashSet<String>(new SolrResourceLoader(null).getLines("stopwords.txt"));

After:

return new HashSet<String>(core.getResourceLoader().getLines("stopwords.txt"));

I am able to load the resource successfully. Thanks, Ramana. On Wed, Sep 10, 2014 at 12:34 PM, Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote: What are you developing? A custom search component? An update processor? A different class for one of the zillion moving parts of Solr? If you have access to a SolrCore instance you could use it to get access; essentially, using the SolrCore instance specific to the current core will cause the lookup of the file to be local to the conf directory of the specified core. In a custom UpdateRequestProcessorFactory which implements the SolrCoreAware interface I have the following code:

@Override
public void inform(SolrCore solrCore) {
    SolrResourceLoader loader = solrCore.getResourceLoader();
    try {
        List<String> lines = loader.getLines(patternFile);
        if (false == lines.isEmpty()) {
            for (String s : lines) {
                this.patterns.add(Pattern.compile(s));
            }
        }
    } catch (IOException e) {
        SolrCore.log.error(String.format("File %s could not be loaded", patternFile));
    }
}

Essentially I ask the actual core (solrCore) to provide a SolrResourceLoader for its conf directory. In your case you are just passing null, which causes (I think, haven't tested) it to instantiate a SolrResourceLoader for the Solr instance (judging by the paths you've placed in your mail) instead of a SolrResourceLoader relative to your core/collection, which is what you want. So, bottom line: implement the SolrCoreAware interface and use the SolrResourceLoader provided by this instance. A little more info would also be helpful, as we can't figure out what Solr “part” you are developing.
Regards, On Sep 9, 2014, at 2:37 PM, Ramana OpenSource ramanaopensou...@gmail.com wrote: Hi, I am trying to load one of the files in the conf directory in SOLR, using the code below.

return new HashSet<String>(new SolrResourceLoader(null).getLines("stopwords.txt"));

The stopwords.txt file is available in the location solr\example\solr\collection1\conf. When I debugged the SolrResourceLoader API, it was looking at the below locations to load the file:

...solr\example\solr\conf\stopwords.txt
...solr\example\stopwords.txt

But as the file was not in either of those locations, it failed. How do I load files in the default conf directory using the SolrResourceLoader API? I am a newbie to SOLR. Any help would be appreciated. Thanks, Ramana.
How to get access to SolrCore in init method of Handler Class
Hi, I need to load a file in the instance's conf directory, and this data is going to be used in the handleRequestBody() implementation. As of now, I am loading the file in the handleRequestBody method like below:

SolrCore solrCore = req.getCore();
solrCore.getResourceLoader().getLines(fileToLoad);

But, to make it better, I would like to load this file only once, in the init() method of the handler class. I am not sure how to get access to the SolrCore in the init method. Any help would be appreciated. Thanks, Ramana.
Re: How to get access to SolrCore in init method of Handler Class
: But, to make it better, I would like to load this file only once and in the : init() method of the handler class. I am not sure how to get access to the : SolrCore in the init method. You can't access the SolrCore during the init() method, because at the time it's called the SolrCore itself is not yet fully initialized. What you can do is implement the SolrCoreAware interface; then you are guaranteed that *after* your init method is called, and before you are ever asked to handle any requests, your inform(SolrCore) method will be called... -Hoss http://www.lucidworks.com/
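Hoss's ordering guarantee (init first, inform later, requests only after both) can be sketched with stand-in types. Nothing below is real Solr API; it's a toy model of the lifecycle that shows where the one-time file load belongs:

```java
import java.util.List;
import java.util.Map;

public class LifecycleSketch {
    // Stand-in for the one SolrCore method this sketch cares about.
    interface Core {
        List<String> getLines(String resource);
    }

    // Stand-in for a SolrCoreAware handler: inform(core) runs after init(args),
    // and before any request is handled.
    static class MyHandler {
        private List<String> fileData; // loaded once, in inform(), not per request

        void init(Map<String, String> args) {
            // No core is available here yet -- only read init args.
        }

        void inform(Core core) {
            // Safe: the core is fully initialized by the time inform() is called.
            fileData = core.getLines("fileToLoad.txt");
        }

        String handleRequest() {
            return "loaded " + fileData.size() + " lines";
        }
    }

    public static void main(String[] args) {
        MyHandler h = new MyHandler();
        h.init(Map.of());
        h.inform(resource -> List.of("a", "b", "c")); // toy core
        System.out.println(h.handleRequest()); // loaded 3 lines
    }
}
```

In real Solr code the same shape applies: keep init() limited to argument parsing, and move any resource loading that needs the core into inform(SolrCore).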
Inconsistent relevancy score between browser refreshes
I am seeing different relevancy scores for the same documents between browser refreshes. Any ideas why? The query is the same, the index is the same - why would the score change? Example:

First request returns:

<doc>
  <str name="title">Stroke Anticoagulation and Prophylaxis</str>
  <float name="score">3.463463</float>
</doc>
<doc>
  <str name="title">Hemorrhagic Stroke</str>
  <float name="score">3.463463</float>
</doc>
<doc>
  <str name="title">Vertebrobasilar Stroke</str>
  <float name="score">3.460521</float>
</doc>

Second request:

<doc>
  <str name="title">Vertebrobasilar Stroke</str>
  <float name="score">3.460521</float>
</doc>
<doc>
  <str name="title">Hemorrhagic Stroke</str>
  <float name="score">3.4484053</float>
</doc>
<doc>
  <str name="title">Stroke Anticoagulation and Prophylaxis</str>
  <float name="score">3.4484053</float>
</doc>

Third request:

<doc>
  <str name="title">Stroke Anticoagulation and Prophylaxis</str>
  <float name="score">3.463463</float>
</doc>
<doc>
  <str name="title">Hemorrhagic Stroke</str>
  <float name="score">3.463463</float>
</doc>
<doc>
  <str name="title">Vertebrobasilar Stroke</str>
  <float name="score">3.402718</float>
</doc>

Jing
Re: Creating Solr servers dynamically in Multicore folder
You should be good to go. Do note that you can set the variables referenced in your schema.xml in the individual core.properties file for the core in question if you need to, although the defaults work for most people's needs. Best, Erick On Tue, Sep 9, 2014 at 9:15 PM, nishwanth nishwanth.vupp...@gmail.com wrote: Hello Erick, Thanks for the response. My cores got created now after removing the core.properties in this location and the existing core folders. Also I commented out the core related information in solr.xml. Are there going to be any further problems with the approach I followed? For the new cores I created, I could see the conf, data and core.properties files getting created. Thanks.. -- View this message in context: http://lucene.472066.n3.nabble.com/Creating-Solr-servers-dynamically-in-Multicore-folder-tp4157550p4157747.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Problem while extending TokenizerFactory in Solr 4.4.0
Francesco: What was the fix? It'll help others with the same issue. On Wed, Sep 10, 2014 at 6:53 AM, Francesco Valentini valentin...@gmail.com wrote: Hi Shawn, thank you very much for your quick anwser, I fixed it. Thanks Francesco 2014-09-10 15:34 GMT+02:00 Shawn Heisey s...@elyograg.org: On 9/10/2014 7:14 AM, Francesco Valentini wrote: I’m using Solr 4.4.0 distro and now, I have a strange issue while extending TokenizerFactory with a custom class. I think what we have here is a basic Java error, nothing specific to Solr. This jumps out at me: Caused by: java.lang.NoSuchMethodException: com.mytest.tokenizer.RelationChunkTokenizerFactory.init(java.util.Map) at java.lang.Class.getConstructor0(Unknown Source) at java.lang.Class.getConstructor(Unknown Source) at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:552) ... 21 more Java is trying to execute a method that doesn't exist. The getConstructor pieces after the message suggest that perhaps it's a constructor with a Map as an argument, but I'm not familiar enough with this error to know whether it's trying to run a constructor that doesn't exist, or whether it's trying to actually use a method called init. The constructor in TokenizerFactory is protected, and all of the existing descendants that I looked at have a public constructor ... this message would make sense in all of the following situations: 1) You didn't create a constructor for your object with a Map argument. 2) You made your constructor protected. 3) You made your constructor private. Thanks, Shawn
Re: Problems for indexing large documents on SolrCloud
bq: org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier This is very often an indication that your packets are being truncated by something in the chain. In your case, make sure that Tomcat is configured to handle inputs of the size that you're sending. This may be happening before things get to Solr, in which case your settings in solrconfig.xml aren't germane; the problem is earlier than that. A semi-smoking-gun here is that there's a size of your multivalued field that seems to break things... That doesn't rule out time problems of course. But I'd look at the Tomcat settings for maximum packet size first. Best, Erick On Wed, Sep 10, 2014 at 9:11 AM, Olivier olivau...@gmail.com wrote: Hi, I have some problems for indexing large documents in a SolrCloud cluster of 3 servers (Solr 4.8.1) with 3 shards and 2 replicas for each shard on Tomcat 7. For a specific document (with 300 K values in a multivalued field), I couldn't index it on SolrCloud but I could do it in a single instance of Solr on my own PC. The indexation is done with Solarium from a database. The data indexed are e-commerce products with classic fields like name, price, description, instock, etc... The large field (type int) is constitued of other products ids. The only difference with other documents well-indexed on Solr is the size of that multivalued field. Indeed, other documents well-indexed have all between 100K values and 200 K values for that field. The index size is 11 Mb for 20 documents.
To solve it, I tried to change several parameters, including the ZK timeouts in solr.xml. In the solrcloud section:

<int name="zkClientTimeout">6</int>
<int name="distribUpdateConnTimeout">10</int>
<int name="distribUpdateSoTimeout">10</int>

In the shardHandlerFactory section:

<int name="socketTimeout">${socketTimeout:10}</int>
<int name="connTimeout">${connTimeout:10}</int>

I also tried to increase these values in solrconfig.xml:

<requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="1" formdataUploadLimitInKB="10" addHttpRequestToContext="false"/>

I also tried to increase the amount of RAM (they are VMs): each server has 4 GB of RAM, with 3 GB for the JVM. Are there other settings that could solve the problem that I might have forgotten? The error messages are:

ERROR SolrDispatchFilter null:java.lang.RuntimeException: [was class java.net.SocketException] Connection reset
ERROR SolrDispatchFilter null:ClientAbortException: java.net.SocketException: broken pipe
ERROR SolrDispatchFilter null:ClientAbortException: java.net.SocketException: broken pipe
ERROR SolrCore org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier
ERROR SolrCore org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier
ERROR SolrCore org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier
ERROR SolrCore org.apache.solr.common.SolrException: Unexpected EOF in attribute value
ERROR SolrCore org.apache.solr.common.SolrException: Unexpected end of input block in start tag

Thanks, Olivier
Re: Inconsistent relevancy score between browser refreshes
More info please.
1) Are there replicas involved?
2) Is there any indexing going on?
3) If more than one node, did you optimize?
4) Did you optimize between refreshes?
Best, Erick

On Wed, Sep 10, 2014 at 12:28 PM, Tao, Jing j...@webmd.net wrote: I am seeing different relevancy scores for the same documents between browser refreshes. Any ideas why? The query is the same, the index is the same - why would the score change? Example:

First request returns:
<doc><str name="title">Stroke Anticoagulation and Prophylaxis</str><float name="score">3.463463</float></doc>
<doc><str name="title">Hemorrhagic Stroke</str><float name="score">3.463463</float></doc>
<doc><str name="title">Vertebrobasilar Stroke</str><float name="score">3.460521</float></doc>

Second request:
<doc><str name="title">Vertebrobasilar Stroke</str><float name="score">3.460521</float></doc>
<doc><str name="title">Hemorrhagic Stroke</str><float name="score">3.4484053</float></doc>
<doc><str name="title">Stroke Anticoagulation and Prophylaxis</str><float name="score">3.4484053</float></doc>

Third request:
<doc><str name="title">Stroke Anticoagulation and Prophylaxis</str><float name="score">3.463463</float></doc>
<doc><str name="title">Hemorrhagic Stroke</str><float name="score">3.463463</float></doc>
<doc><str name="title">Vertebrobasilar Stroke</str><float name="score">3.402718</float></doc>

Jing
RE: Inconsistent relevancy score between browser refreshes
1) It is a SolrCloud setup on 4 servers, 4 shards, replication factor of 2.
2) There is no indexing going on.
3) No, I did not optimize.
4) Did not optimize between refreshes.
Thanks, Jing

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, September 10, 2014 4:09 PM
To: solr-user@lucene.apache.org
Subject: Re: Inconsistent relevancy score between browser refreshes

More info please.
1) Are there replicas involved?
2) Is there any indexing going on?
3) If more than one node, did you optimize?
4) Did you optimize between refreshes?
Best, Erick

On Wed, Sep 10, 2014 at 12:28 PM, Tao, Jing j...@webmd.net wrote: I am seeing different relevancy scores for the same documents between browser refreshes. Any ideas why? The query is the same, the index is the same - why would the score change? Example:

First request returns:
<doc><str name="title">Stroke Anticoagulation and Prophylaxis</str><float name="score">3.463463</float></doc>
<doc><str name="title">Hemorrhagic Stroke</str><float name="score">3.463463</float></doc>
<doc><str name="title">Vertebrobasilar Stroke</str><float name="score">3.460521</float></doc>

Second request:
<doc><str name="title">Vertebrobasilar Stroke</str><float name="score">3.460521</float></doc>
<doc><str name="title">Hemorrhagic Stroke</str><float name="score">3.4484053</float></doc>
<doc><str name="title">Stroke Anticoagulation and Prophylaxis</str><float name="score">3.4484053</float></doc>

Third request:
<doc><str name="title">Stroke Anticoagulation and Prophylaxis</str><float name="score">3.463463</float></doc>
<doc><str name="title">Hemorrhagic Stroke</str><float name="score">3.463463</float></doc>
<doc><str name="title">Vertebrobasilar Stroke</str><float name="score">3.402718</float></doc>

Jing
Re: ExtractingRequestHandler indexing zip files
Thanks for the info Sergio. I updated my 4.8.1 version with that patch and SOLR-4216 (which was really the same thing). It took a day to get it to compile on my network, and it still doesn't work. Did my config file look correct? I'm wondering if I need another param somewhere.

Sergio wrote: The patch has to be applied to the source code and solr.war recompiled. If you do that, then it works for extracting the content of documents.

-- View this message in context: http://lucene.472066.n3.nabble.com/ExtractingRequestHandler-indexing-zip-files-tp4138172p4158024.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr WARN Log
: I'm trying to upgrade Solr from version 4.2 to 4.9, since then I'm ... : haven't configured it. You can ignore this message. To get it to go The fact that a WARN is logged at all was a bug in 4.9 that got fixed in 4.10... https://issues.apache.org/jira/browse/SOLR-6179 -Hoss http://www.lucidworks.com/
Re: Problems for indexing large documents on SolrCloud
On 9/10/2014 2:05 PM, Erick Erickson wrote: bq: org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier This is very often an indication that your packets are being truncated by something in the chain. In your case, make sure that Tomcat is configured to handle inputs of the size that you're sending. This may be happening before things get to Solr, in which case your settings in solrconfig.xml aren't germane; the problem is earlier than that. A semi-smoking-gun here is that there's a size of your multivalued field that seems to break things... That doesn't rule out time problems, of course. But I'd look at the Tomcat settings for maximum packet size first.

The maximum HTTP request size is actually controlled by Solr itself since 4.1, with the changes committed for SOLR-4265. Changing the setting on Tomcat probably will not help. An example from my own config which sets this to 32MB - the default is 2048, i.e. 2MB:

<requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="32768" formdataUploadLimitInKB="32768"/>

Thanks, Shawn
Re: How to get access to SolrCore in init method of Handler Class
Thanks Chris. I have implemented the SolrCoreAware interface and am loading the required file in the inform method. Thanks, Ramana.

On Wed, Sep 10, 2014 at 10:59 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : But, to make it better, I would like to load this file only once, in the : init() method of the handler class. I am not sure how to get access to : SolrCore in the init method. You can't access the SolrCore during the init() method, because at the time it's called the SolrCore itself is not yet fully initialized. What you can do is implement the SolrCoreAware interface, and then you are guaranteed that *after* your init method is called, and before you are ever asked to handle any requests, your inform(SolrCore) method will be called... -Hoss http://www.lucidworks.com/
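The two-phase lifecycle Hoss describes can be sketched without Solr dependencies. The classes below are stand-ins, not real Solr types: the point is only the ordering contract - `init()` runs before the core exists, `inform(core)` runs exactly once after it does and before any request, so one-time core-dependent loading belongs in `inform()`.

```java
// Standalone sketch of the SolrCoreAware-style two-phase lifecycle
// (all names here are hypothetical stand-ins, not real Solr classes).
public class CoreAwareLifecycleDemo {

    interface CoreAware { void inform(Core core); }

    // Stand-in for SolrCore with a resource-loading facility.
    static class Core {
        String resource(String name) { return "contents of " + name; }
    }

    static class MyHandler implements CoreAware {
        String loadedFile; // loaded once in inform(), reused per request

        void init() {
            // Phase 1: the core is NOT available yet - only plain
            // configuration parsing is safe here.
        }

        @Override public void inform(Core core) {
            // Phase 2: core is ready; guaranteed to run before any request.
            loadedFile = core.resource("my-file.txt");
        }

        String handleRequest() { return "handled using: " + loadedFile; }
    }

    public static void main(String[] args) {
        MyHandler handler = new MyHandler();
        handler.init();          // container calls init() first
        Core core = new Core();  // ...then finishes building the core
        handler.inform(core);    // ...then informs core-aware plugins
        System.out.println(handler.handleRequest());
    }
}
```

In real Solr code the equivalent is implementing `org.apache.solr.util.plugin.SolrCoreAware` and doing the file load inside `inform(SolrCore)`, as Ramana ended up doing.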
Re: Creating Solr servers dynamically in Multicore folder
Hello Erick, Thanks for the response. I have attached core.properties and solr.xml for your reference.

solr.xml http://lucene.472066.n3.nabble.com/file/n4158124/solr.xml
core.properties http://lucene.472066.n3.nabble.com/file/n4158124/core.properties

Below is our plan for creating cores. Every tenant (user) is bound to some contacts, sales, orders, and other information. The number of tenants for our application will be approximately 10,000. We are planning to create a core for every tenant and maintain the contacts, sales, orders, and other information as a collection, so every time a tenant logs in this information will be used. Could you please let us know your thoughts on this approach? Regards, Nishwanth

-- View this message in context: http://lucene.472066.n3.nabble.com/Creating-Solr-servers-dynamically-in-Multicore-folder-tp4157550p4158124.html Sent from the Solr - User mailing list archive at Nabble.com.