Re: EmbeddedSolrServer and StreamingUpdateSolrServer
Hi Mikhail Khludnev, Thank you for your help. Let me explain the JVM scenario. The JVM in which Tomcat (and therefore StreamingUpdateSolrServer) runs is not restarted between indexing runs, whereas the EmbeddedSolrServer runs in a fresh JVM instance (a new process) every time. In this scenario the index gets corrupted. If I restart Tomcat (i.e. restart the JVM in which StreamingUpdateSolrServer is running) after each indexing run, the index does not get corrupted. However, this is not a viable option for us because Solr would be unavailable to users during the restart. Let me know if you have any more thoughts on this. If not, can you also let me know how I can seek help from others? Thanks again, PC Rao. -- View this message in context: http://lucene.472066.n3.nabble.com/EmbeddedSolrServer-and-StreamingUpdateSolrServer-tp3889073p3931636.html Sent from the Solr - User mailing list archive at Nabble.com.
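For what it's worth, my reading of the scenario is the pattern sketched below: a fresh embedded writer process plus the always-running Tomcat instance, both pointed at the same Solr home. This is only an illustration (untested; the solr home, core name and URL are placeholders, not PC Rao's actual code):

// Sketch of the two-writer pattern described above (untested).
// If -Dsolr.solr.home points at the same solr home that Tomcat serves,
// both code paths end up writing to the same index directory.
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.core.CoreContainer;

public class TwoWriters {
  public static void main(String[] args) throws Exception {
    // Writer 1: embedded server, fresh JVM each batch run
    CoreContainer container = new CoreContainer.Initializer().initialize();
    SolrServer embedded = new EmbeddedSolrServer(container, "collection1");

    // Writer 2: the always-on Tomcat instance, fed over HTTP
    SolrServer streaming =
        new StreamingUpdateSolrServer("http://localhost:8080/solr", 100, 4);

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1");
    embedded.add(doc);   // writes via an in-process IndexWriter
    streaming.add(doc);  // writes via the Tomcat-hosted Solr
    embedded.commit();
    streaming.commit();
    container.shutdown();
  }
}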
Exception fixing docBase for context [error in opening zip file]
Hi, I am experiencing a problem starting Solr with Tomcat 6. My system: Ubuntu 11.

ii  tomcat6        6.0.32-5ubuntu1.2            Servlet and JSP engine
ii  openjdk-6-jre  6b23~pre11-0ubuntu1.11.10.2  OpenJDK Java runtime, using Hotspot JIT

I'm using the nightly build war file: apache-solr-4.0-2012-04-21_08-25-44.war. Can anyone give me a pointer? Thanks. Below is the error message I got.

2012/4/23 02:24:42 PM org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-8080
2012/4/23 02:24:42 PM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 575 ms
2012/4/23 02:24:42 PM org.apache.catalina.core.StandardService start
INFO: Starting service Catalina
2012/4/23 02:24:42 PM org.apache.catalina.core.StandardEngine start
INFO: Starting Servlet Engine: Apache Tomcat/6.0.32
2012/4/23 02:24:42 PM org.apache.catalina.startup.HostConfig deployDescriptor
INFO: Deploying configuration descriptor ROOT.xml
2012/4/23 02:24:42 PM org.apache.catalina.startup.HostConfig deployDescriptor
INFO: Deploying configuration descriptor solr.xml
2012/4/23 02:24:42 PM org.apache.catalina.startup.ContextConfig init
SEVERE: Exception fixing docBase for context [/solr]
java.util.zip.ZipException: error in opening zip file
    at java.util.zip.ZipFile.open(Native Method)
    at java.util.zip.ZipFile.init(ZipFile.java:131)
    at java.util.jar.JarFile.init(JarFile.java:150)
    at java.util.jar.JarFile.init(JarFile.java:87)
    at sun.net.www.protocol.jar.URLJarFile.init(URLJarFile.java:90)
    at sun.net.www.protocol.jar.URLJarFile.getJarFile(URLJarFile.java:66)
    at sun.net.www.protocol.jar.JarFileFactory.get(JarFileFactory.java:86)
    at sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:122)
    at sun.net.www.protocol.jar.JarURLConnection.getJarFile(JarURLConnection.java:89)
    at org.apache.catalina.startup.ExpandWar.expand(ExpandWar.java:148)
    at org.apache.catalina.startup.ContextConfig.fixDocBase(ContextConfig.java:886)
    at org.apache.catalina.startup.ContextConfig.init(ContextConfig.java:1021)
    at org.apache.catalina.startup.ContextConfig.lifecycleEvent(ContextConfig.java:279)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
    at org.apache.catalina.core.StandardContext.init(StandardContext.java:5707)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4449)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
    at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
    at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
    at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
    at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1315)
    at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1061)
    at org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
    at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
    at org.apache.catalina.core.StandardService.start(StandardService.java:525)
    at org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
    at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
2012/4/23 02:24:42 PM org.apache.catalina.core.StandardContext resourcesStart
SEVERE: Error starting static Resources
java.lang.IllegalArgumentException: Invalid or unreadable WAR file : /home/yclin/Projects/search/search/solr/wars/apache-solr-4.0-2012-04-21_08-25-44.war
    at org.apache.naming.resources.WARDirContext.setDocBase(WARDirContext.java:130)
    at org.apache.catalina.core.StandardContext.resourcesStart(StandardContext.java:4320)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4489)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
    at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
    at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
    at
Re: # open files with SolrCloud
On Sat, Apr 21, 2012 at 9:57 PM, Yonik Seeley yo...@lucidimagination.com wrote:
I can reproduce some kind of searcher leak issue here, even w/o SolrCloud, and I've opened https://issues.apache.org/jira/browse/SOLR-3392

With the fix integrated, I do not see the leak problem anymore with my setup, so it seems to be working now. -- Sami Siren
Re: 'Error 404: missing core name in path' in Solr
Looks like you need to select a core name in the admin UI before selecting search. Have a look at the solr.xml file in your Solr home directory - what cores are defined? Solr is expecting the core name in the URL, i.e. http://localhost:8080/solr/CORENAME/admin/ rather than http://localhost:8080/solr/admin/

On Mon, Apr 23, 2012 at 12:58 AM, vasuj vasu.j...@live.in wrote:
I used server.deleteByQuery("*:*"); // CAUTION: deletes everything! in my Solr indexing program (screenshot: http://lucene.472066.n3.nabble.com/file/n3931194/Screenshot_%2847%29.png). Since then I am receiving the error below whenever I go to http://localhost:8080/solr/admin/ and press search with a query string:
HTTP Status 400 - Missing solr core name in path
type: Status report
message: Missing solr core name in path
description: The request sent by the client was syntactically incorrect (Missing solr core name in path).
Apache Tomcat/7.0.21
-- View this message in context: http://lucene.472066.n3.nabble.com/Error-404-missing-core-name-in-path-in-Solr-tp3931194p3931194.html Sent from the Solr - User mailing list archive at Nabble.com.
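For illustration, a minimal multi-core solr.xml looks something like this (core names here are only examples, not necessarily what is in your solr home):

<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="core0">
    <core name="core0" instanceDir="core0" />
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>

With a setup like that, the admin UI is reached at http://localhost:8080/solr/core0/admin/ and queries at http://localhost:8080/solr/core0/select?q=*:*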
Re: 'Error 404: missing core name in path' in Solr
Hi, Perhaps your search server uses a multi-core setup? In that case you need the core name as part of the URL: http://wiki.apache.org/solr/CoreAdmin#Example -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com

On 23. apr. 2012, at 01:58, vasuj wrote:
I used server.deleteByQuery("*:*"); // CAUTION: deletes everything! in my Solr indexing program (screenshot: http://lucene.472066.n3.nabble.com/file/n3931194/Screenshot_%2847%29.png). Since then I am receiving the error below whenever I go to http://localhost:8080/solr/admin/ and press search with a query string:
HTTP Status 400 - Missing solr core name in path
type: Status report
message: Missing solr core name in path
description: The request sent by the client was syntactically incorrect (Missing solr core name in path).
Apache Tomcat/7.0.21
-- View this message in context: http://lucene.472066.n3.nabble.com/Error-404-missing-core-name-in-path-in-Solr-tp3931194p3931194.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: StandardTokenizer and domain names containing digits
Steven A Rowe sarowe at syr.edu writes:
StandardTokenizer in Lucene/Solr v3.1+ implements the Word Boundary rules from Unicode 6.0.0 Standard Annex #29, a.k.a. UAX#29: http://www.unicode.org/reports/tr29/tr29-17.html#Word_Boundaries. These rules don't include recognition of URLs or domain names. Lucene/Solr includes another tokenizer that does recognize URLs and domain names, in addition to the UAX#29 Word Boundary rules: UAX29URLEmailTokenizer http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.UAX29URLEmailTokenizerFactory (stand-alone domain names are recognized as URLs). My suggestion is that you add a filter (for both indexing and querying) that splits tokens containing periods: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory, something like (untested!):

<filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="0" generateWordParts="1" preserveOriginal="1" />

Steve, Thank you very much for this reply, it helped immensely. In the end I've gone for your suggestion, plus a swap of StandardTokenizer -> UAX29URLEmailTokenizer and setting autoGeneratePhraseQueries="true". The fieldType now looks like:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" splitOnNumerics="0" stemEnglishPossessive="0" generateWordParts="1" preserveOriginal="1" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" splitOnNumerics="0" stemEnglishPossessive="0" generateWordParts="1" preserveOriginal="1" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

autoGeneratePhraseQueries is set so that the tokens generated in the query analyzer behave more like tokens from a space-delimited query. So "ns1.define.logica.com" finds a similar set of documents to "ns1 define logica com" (i.e. ns1 AND define AND logica AND com), rather than ns1 OR define OR logica OR com. Many thanks, Alex
Re: Solr Hanging
Hi I have succeeded in reproducing the scenario with two Solr instances running. They cover a single collection with two slices and two replicas, two cores in each Solr instance. I have changed the number of threads that Jetty is allowed to use as follows:

<New class="org.mortbay.thread.QueuedThreadPool">
  <Set name="minThreads">3</Set>
  <Set name="maxThreads">3</Set>
  <Set name="lowThreads">0</Set>
</New>

When indexing a single document this works fine, but when concurrently indexing 10 documents, Solr frequently hangs. I know that Jetty by default is allowed to use 10,000 threads, but in my other setup all of these 10,000 allowed threads end up in use on a single Solr instance (I have 7 Solr instances) after some days and the hanging scenario occurs. I'm not sure that just adjusting the allowed number of threads is the best solution and would like to get some input as to what to expect and whether there are other things I can adjust. My setup is, as written before, 7 Solr instances handling a single collection with 28 leaders and 28 replicas distributed fairly across the Solrs (8 cores on each Solr). Thanks for any input. Best regards Trym

Den 19-04-2012 14:36, Yonik Seeley skrev:
On Thu, Apr 19, 2012 at 4:25 AM, Trym R. Møller t...@sigmat.dk wrote:
Hi I am using Solr trunk and have 7 Solr instances running with 28 leaders and 28 replicas for a single collection. After indexing a while (a couple of days) the Solrs start hanging, and doing a thread dump on the JVM I see blocked threads like the following:

Thread 2369: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=158 (Compiled frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() @bci=42, line=1987 (Compiled frame)
 - java.util.concurrent.LinkedBlockingQueue.take() @bci=29, line=399 (Compiled frame)
 - java.util.concurrent.ExecutorCompletionService.take() @bci=4, line=164 (Compiled frame)
 - org.apache.solr.update.SolrCmdDistributor.checkResponses(boolean) @bci=27, line=350 (Compiled frame)
 - org.apache.solr.update.SolrCmdDistributor.finish() @bci=18, line=98 (Compiled frame)
 - org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish() @bci=4, line=299 (Compiled frame)
 - org.apache.solr.update.processor.DistributedUpdateProcessor.finish() @bci=1, line=817 (Compiled frame)
 ...
 - org.mortbay.thread.QueuedThreadPool$PoolThread.run() @bci=25, line=582 (Interpreted frame)

I read the stack trace as: my indexing client has indexed a document and this Solr is now waiting for the replica(?) to respond before returning an answer to the client.

Correct. What's the full stack trace like on both a leader and replica? We need to know what the replica is blocking on. What version of trunk are you using? -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: Exception fixing docBase for context [error in opening zip file]
Hi, I have figured this out on my own. It was just a stupid permission thing. This error

Exception fixing docBase for context
java.util.zip.ZipException: error in opening zip file

can be fixed by changing the permissions of the parent directories to 0755:

find PARENT_PATH -type d -exec chmod 0755 {} \;

Yung-chung Lin

2012/4/23 ☼ 林永忠 ☼ (Yung-chung Lin) henearkrx...@gmail.com wrote:
[original report with the full Tomcat log and stack trace, quoted in full above, snipped]
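For anyone hitting the same thing, a quick way to check the permissions along the path to the war (the path is the one from the log above; namei is part of util-linux):

namei -l /home/yclin/Projects/search/search/solr/wars/apache-solr-4.0-2012-04-21_08-25-44.war

Every parent directory should be at least r-x (0755) for the user Tomcat runs as.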
Facing problem to integrate UIMA in SOLR
Hello all, I am facing a problem integrating UIMA in Solr. I followed the steps below, provided in the README file shipped with the UIMA contrib, to integrate it in Solr.

Step 1: I set the <lib/> tags in solrconfig.xml appropriately to point at the jar files:

<lib dir="../../contrib/uima/lib" />
<lib dir="../../dist/" regex="apache-solr-uima-\d.*\.jar" />

Step 2: I modified my schema.xml, adding the fields I wanted to hold the metadata, specifying proper values for the type, indexed, stored and multiValued options as follows:

<field name="language" type="string" indexed="true" stored="true" required="false"/>
<field name="concept" type="string" indexed="true" stored="true" multiValued="true" required="false"/>
<field name="sentence" type="text" indexed="true" stored="true" multiValued="true" required="false"/>

Step 3: I modified my solrconfig.xml, adding the following snippet:

<updateRequestProcessorChain name="uima" default="true">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters">
        <str name="keyword_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="concept_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="lang_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="cat_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="entities_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="oc_licenseID">VALID_OPENCALAIS_KEY</str>
      </lst>
      <str name="analysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
      <bool name="ignoreErrors">true</bool>
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields">
          <str>text</str>
        </arr>
      </lst>
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">org.apache.uima.alchemy.ts.concept.ConceptFS</str>
          <lst name="mapping">
            <str name="feature">text</str>
            <str name="field">concept</str>
          </lst>
        </lst>
        <lst name="type">
          <str name="name">org.apache.uima.alchemy.ts.language.LanguageFS</str>
          <lst name="mapping">
            <str name="feature">language</str>
            <str name="field">language</str>
          </lst>
        </lst>
        <lst name="type">
          <str name="name">org.apache.uima.SentenceAnnotation</str>
          <lst name="mapping">
            <str name="feature">coveredText</str>
            <str name="field">sentence</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Step 4: Finally, I created a new UpdateRequestHandler with the following:

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">uima</str>
  </lst>
</requestHandler>

Further, I indexed a Word file called text.docx using the following command:

curl "http://localhost:8983/solr/update/extract?fmap.content=content&literal.id=doc47&commit=true" -F "file=@test.docx"

When I searched for the same document with http://localhost:8983/solr/select?q=id:doc47, I got the following result, i.e. I am not getting the additional UIMA fields in the response:

<result name="response" numFound="1" start="0">
  <doc>
    <str name="author">divakar</str>
    <arr name="content_type">
      <str>application/vnd.openxmlformats-officedocument.wordprocessingml.document</str>
    </arr>
    <str name="id">doc47</str>
    <date name="last_modified">2012-04-18T14:19:00Z</date>
  </doc>
</result>

Can anyone help me fix this problem? With Regds Thanks Divakar -- View this message in context: http://lucene.472066.n3.nabble.com/Facing-problem-to-integrate-UIMA-in-SOLR-tp3932008p3932008.html Sent from the Solr - User mailing list archive at Nabble.com.
Performance problem with DIH in solr 3.3
Hi All, I am using the delta import handler (Solr 3.3) to index data from my database (using 19 tables). The total number of Solr documents that get created from these 19 tables is 444. The total number of requests sent to the data source during a clean full import is 91083. My problem is that DIH makes too many calls and puts load on my database. 1. Can we batch these calls? 2. Can we use a view instead? If yes, can I get some examples of using a view with DIH? 3. What kind of locks does the Solr DIH acquire while querying the DB? Note: we are using both the full-import and delta-import handlers. Thanks in advance Pravin Agrawal DISCLAIMER == This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.
Re: null pointer error with solr deduplication
A better error would be nicer. In the past, when I have had docs with the same id on multiple shards, I never saw an NPE problem. A lot has changed since then, though. I guess, to me, checking whether the id is stored sticks out a bit more. Roughly based on the stack trace, it looks to me like it's not finding an id value and that is causing the NPE. If it's a legit problem we should probably make a JIRA issue about improving the error message you end up getting. -- - Mark http://www.lucidimagination.com

On Sat, Apr 21, 2012 at 5:21 AM, Alexander Aristov alexander.aris...@gmail.com wrote:
Hi I might be wrong, but it's your responsibility to put unique doc IDs across shards. Read this page http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations particularly:
- Documents must have a unique key and the unique key must be stored (stored="true" in schema.xml)
- *The unique key field must be unique across all shards.* If docs with duplicate unique keys are encountered, Solr will make an attempt to return valid results, but the behavior may be non-deterministic.
So Solr behaves as it should :) _unexpectedly_ But I agree, in the sense that there should be no error, especially not an NPE. Best Regards Alexander Aristov

On 21 April 2012 03:42, Peter Markey sudoma...@gmail.com wrote:
Hello, I have been trying out deduplication in Solr by following http://wiki.apache.org/solr/Deduplication. I have defined a signature field to hold the values of the signature created based on a few other fields in a document, and the idea seems to work like a charm in a single Solr instance. But when I have multiple cores and try to do a distributed search (http://localhost:8080/solr/core0/select?q=*&shards=localhost:8080/solr/dedupe,localhost:8080/solr/dedupe2&facet=true&facet.field=doc_id) I get the error pasted below. While normal search (with just q) works fine, the facet/stats queries seem to be the culprit. The doc_id contains duplicate ids since I'm testing the same set of documents indexed in both cores (dedupe, dedupe2). Any insights would be highly appreciated.
Thanks

20-Apr-2012 11:39:35 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
    at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:887)
    at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:633)
    at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:612)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:307)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:307)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
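For reference, the distributed-search requirement Alexander quotes above translates into something like this in schema.xml (a sketch only - the field name is whatever your unique key actually is):

<field name="id" type="string" indexed="true" stored="true" required="true" />
<uniqueKey>id</uniqueKey>

With duplicate key values across shards you are in the non-deterministic territory the wiki warns about, regardless of how the NPE itself gets fixed.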
Synonyms file in solr
I have some problems with the synonyms file; it seems I can't make it work the way I'd want. Here is an example. I have these words: cat, animal, dog, living thing, baby shark.
- If I search for animal OR animals, I'd like to have the results for cat, animal, dog, baby shark, as well as their plurals cats, dogs, animals and baby sharks.
- If I search for cat, I only want the results with cat or cats. Same for dog.
- If I search for living thing, I want the results with living thing, living things, animal or animals. So no dogs, cats...
So the words are in a hierarchy: living thing(s) -> animal(s) -> [dog(s), cat(s), baby shark(s)]. I've tried a lot of things but I can't get the results I want and I really need your help :-( -- View this message in context: http://lucene.472066.n3.nabble.com/Synonyms-file-in-solr-tp3931838p3931838.html Sent from the Solr - User mailing list archive at Nabble.com.
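One way to sketch that kind of one-way hierarchy is with explicit-mapping lines in a query-time synonyms.txt (an untested sketch of the idea only; the plural forms would normally come from a stemmer rather than from synonyms, and multi-word query-time synonyms have known limitations):

animal => animal, cat, dog, baby shark
living thing => living thing, animal

with the synonym filter applied only in the query analyzer, e.g.:

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

That way "cat" stays just "cat", "animal" expands to its children, and "living thing" expands only one level down to "animal".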
Re: Solr Hanging
Perhaps related is http://www.lucidimagination.com/search/document/6d0e168c82c86a38#45c945b2de6543f4

On Apr 23, 2012, at 5:37 AM, Trym R. Møller wrote:
[Trym's message from earlier in this thread, quoted in full above, snipped]

- Mark Miller lucidimagination.com
Re: Solr Hanging
And see https://issues.apache.org/jira/browse/SOLR-683 as it also may be related or have helpful info...

On Apr 23, 2012, at 8:17 AM, Mark Miller wrote:
[previous messages in this thread, quoted in full above, snipped]

- Mark Miller lucidimagination.com
RE: StandardTokenizer and domain names containing digits
Hi Alex, Thanks for reporting back with concrete details of what worked for you - very helpful for others with similar projects. Steve

-----Original Message-----
From: Alex Willmer [mailto:al.will...@logica.com]
Sent: Monday, April 23, 2012 5:35 AM
To: solr-user@lucene.apache.org
Subject: Re: StandardTokenizer and domain names containing digits
[Alex's reply with the full fieldType configuration, quoted in full above, snipped]
Re: The index speed in the solr
Hard to say. Here's the basic approach I'd use to try to narrow it down:
1> Take out the ngrams. What does that do to your speed?
2> Are you committing very often? Lengthen the interval if so.
3> Posting is probably not the most performant thing in the world. Consider using SolrJ.
4> What does a document look like? Are they structured docs (Word, PDF, etc.)? If so, try offloading that extraction to client machines.
Basically, you haven't given enough information to make much of a guess here... 50 hours is a really long time for 2M docs though, so something doesn't seem right unless the docs are really unusual. If you need to offload the structured docs, here's a way to get started: http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/ Best Erick

On Sun, Apr 22, 2012 at 9:58 PM, neosky neosk...@yahoo.com wrote:
It takes me 50 hours to index a 9 GB file in total (about 2,000,000 documents) with an n-gram filter from min=6 to max=10; my tokens before the ngram filter are long (not single words; up to 300,000 bytes including whitespace). I split the input into 4 files and use post.sh to upload them at the same time. I also tried writing a Lucene indexer myself (single thread). The time is almost the same. I would like to know what the general bottleneck for indexing in Solr is. Doesn't Solr handle index update requests concurrently?
1. Posting file /ngram_678910/file1.xml to http://localhost:8988/solr/update
   51% of 3005M (1557M) uploaded, avg 18902 bytes/s, 23:59:46 spent, 22:19:28 left (est. total 46:19:14), current speed 0
2. Posting file /ngram_678910/file2.xml to http://localhost:8988/solr/update
   62% of 2623M (1632M) uploaded, avg 19839 bytes/s, 23:58:01 spent, 14:33:15 left (est. total 38:31:16), current speed 76629
3. Posting file /ngram_678910/file3.xml to http://localhost:8988/solr/update
   65% of 2667M (1737M) uploaded, avg 21113 bytes/s, 23:58:06 spent, 12:50:17 left (est. total 36:48:23), current speed 25537
4. Posting file /ngram_678910/file4.xml to http://localhost:8988/solr/update
   58% of 2766M (1625M) uploaded, avg 19752 bytes/s, 23:58:28 spent, 16:49:06 left (est. total 40:47:34), current speed 81435
-- View this message in context: http://lucene.472066.n3.nabble.com/The-index-speed-in-the-solr-tp3931338p3931338.html Sent from the Solr - User mailing list archive at Nabble.com.
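If you go the SolrJ route the linked post describes, a bare-bones indexer looks roughly like the sketch below (untested; the URL, field names and the loop standing in for your real data source are placeholders):

// Minimal SolrJ indexing sketch for Solr 3.x (names are examples only).
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SimpleIndexer {
  public static void main(String[] args) throws Exception {
    // queue size 1000, 4 background threads draining it
    StreamingUpdateSolrServer server =
        new StreamingUpdateSolrServer("http://localhost:8988/solr", 1000, 4);

    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    for (int i = 0; i < 1000; i++) {           // replace with your real data source
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-" + i);
      doc.addField("sequence", "ACGT...");     // the long field that gets n-grammed
      batch.add(doc);
      if (batch.size() == 100) {               // send in batches rather than one-by-one
        server.add(batch);
        batch.clear();
      }
    }
    if (!batch.isEmpty()) server.add(batch);
    server.commit();                           // commit once at the end, not per batch
  }
}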
Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: com.mysql.jdbc.CommunicationsException: Communications link failure due to underlying exception
Hi, When I am trying to index 16 million documents using the DataImportHandler, I intermittently get the exception below and the indexing stops.

STACKTRACE:
java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
    at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1997)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2411)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2916)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:885)
    at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1360)
    at com.mysql.jdbc.MysqlIO.fetchRowsViaCursor(MysqlIO.java:4044)
    at com.mysql.jdbc.CursorRowProvider.fetchMoreRows(CursorRowProvider.java:396)
    at com.mysql.jdbc.CursorRowProvider.hasNext(CursorRowProvider.java:313)
    at com.mysql.jdbc.ResultSet.next(ResultSet.java:7296)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:331)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$600(JdbcDataSource.java:228)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:262)
    at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:77)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:75)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:591)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)

** END NESTED EXCEPTION **

Last packet sent to the server was 2 ms ago.
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2622)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2916)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:885)
    at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1360)
    at com.mysql.jdbc.MysqlIO.fetchRowsViaCursor(MysqlIO.java:4044)
    at com.mysql.jdbc.CursorRowProvider.fetchMoreRows(CursorRowProvider.java:396)
    at com.mysql.jdbc.CursorRowProvider.hasNext(CursorRowProvider.java:313)
    at com.mysql.jdbc.ResultSet.next(ResultSet.java:7296)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:331)
    ... 11 more

2012-04-23 08:25:35,693 SEVERE [org.apache.solr.handler.dataimport.DataImporter] (Thread-21) Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: com.mysql.jdbc.CommunicationsException: Communications link failure due to underlying exception:

And db-config.xml has the below configuration:

<dataSource driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost:3306/phpq"
            user="slrmgr"
            defaultFetchSize="30"
            useCursorFetch="true"
            autoReconnect="true"
            tcpKeepAlive="true"
            connectionTimeout="12"
            password="pqmgr123"
            batch-size="-1"/>

Any help on this is much appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/Full-Import-failed-org-apache-solr-handler-dataimport-DataImportHandlerException-com-mysql-jdbc-Commn-tp3932521p3932521.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Using two repeater to rapidly switching Master and Slave (Replication)?
On 23-04-2012 10:28 am, A Vorderegger wrote:
This setup would be highly convenient and perfect for the purpose of failing over the master role, however it does not work for me. Resolving http://slave_host:port/solr/replication?command=enablepoll I am met with

<str name="status">ERROR</str><str name="message">No slave configured</str>

no matter what order I enable polling / replication in. I am confident that I have set up my solrconfig.xml file exactly as described. Could you please further describe how this setup is successfully achieved? Thanks in advance

Can you please share your repeater configuration (just the replication handler definition)? It looks like master is enabled on the slave host; executing the enablepoll command on a master will result in a response like:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
  </lst>
  <str name="status">ERROR</str>
  <str name="message">No slave configured</str>
</response>

-Jeevanandam
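For comparison, a repeater-style handler (both roles defined in the one handler, toggled with the enable.master/enable.slave system properties) usually looks something like the sketch below - an illustration only, with the masterUrl, poll interval and file list to be adapted to your hosts:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://master_host:port/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>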
Re: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: com.mysql.jdbc.CommunicationsException: Communications link failure due to underlying exception
On 23-04-2012 8:18 pm, sivaprasad wrote:
[original message and stack trace, quoted in full above, snipped]
Sivaprasad, just a clarification about the batch size attribute: is it a typo, or is it really like that in your db-config.xml? The supported attribute name is batchSize="-1" (http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource) -Jeevanandam
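In other words, keeping the rest of the attributes as posted, the data source element would become something like (sketch only):

<dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/phpq"
            user="slrmgr" password="pqmgr123" defaultFetchSize="30" useCursorFetch="true"
            autoReconnect="true" tcpKeepAlive="true" connectionTimeout="12" batchSize="-1"/>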
RE: Performance problem with DIH in solr 3.3
See this page for an alternate way to use DIH for Delta updates that does not generate n+1 Selects: http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311

-----Original Message-----
From: Pravin Agrawal [mailto:pravin_agra...@persistent.co.in]
Sent: Monday, April 23, 2012 5:51 AM
To: solr-user@lucene.apache.org
Subject: Performance problem with DIH in solr 3.3
[original message, quoted in full above, snipped]
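The pattern on that wiki page boils down to a single parameterized query in data-config.xml, roughly like the sketch below (table and column names are made up for illustration); you then run command=full-import&clean=false instead of delta-import:

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/mydb" user="user" password="pass"/>
  <document>
    <entity name="item" pk="id"
            query="SELECT id, title FROM item
                   WHERE '${dataimporter.request.clean}' != 'false'
                      OR last_modified > '${dataimporter.last_index_time}'">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>

This runs one SELECT for the whole (full or delta) import instead of one query per changed row, which is usually what puts the load on the database.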
Re: The index speed in the solr
On Apr 23, 2012, at 9:27 AM, Erick Erickson wrote: 50 hours is a really long time for 2M docs though, so something doesn't seem right unless the docs are really unusual. Don't forget he's n-gramming ;-) There's not much more demanding you could ask of text analysis except for throwing shingling in there too for good measure[*]. Neosky, you should consider using Solr trunk which has dramatic multithreaded indexing performance improvements if your hardware is capable. If you try trunk, use a large ramBufferSizeMB (say 2GB worth), but if you stick with Solr 3.x, use 1GB. And finally, increasing your mergeFactor will increase indexing performance at the expense of search speed. You could throw in an optimize at the very end with a maxSegments=10 or something to compensate. ~ David Smiley [*] that was a joke
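Concretely, the knobs mentioned above live in solrconfig.xml; a rough sketch (the values are only examples, tune for your hardware):

<indexDefaults>
  <ramBufferSizeMB>1024</ramBufferSizeMB> <!-- ~2048 on trunk if you have the heap for it -->
  <mergeFactor>20</mergeFactor>           <!-- higher = faster indexing, slower searching -->
</indexDefaults>

and the final optimize down to a bounded number of segments can be sent as:

curl "http://localhost:8988/solr/update" -H "Content-Type: text/xml" --data-binary '<optimize maxSegments="10"/>'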
Spatial4j
Hello Solr Community, We are interested in polygon spatial queries. I believe that Spatial4j supports them. Is there a Solr branch available that includes Spatial4j? Will this be part of a future Solr release? Thank you. Best Regards Ericz
Re: Solr Core Admin Question on Trunk
So I believe I see the reason now. Basically, in app.js we check whether there is more than 1 core deployed to decide whether to show the core admin or not. I am not sure whether this is intended, but I would think this isn't what we want the default behaviour to be. Shouldn't we always show the core admin menu option so users can grow their Solr instances without having to execute the core admin commands from curl or something? On Mon, Apr 23, 2012 at 11:04 AM, Jamie Johnson jej2...@gmail.com wrote: I just updated to the latest Solr nightly build to address the issue Yonik fixed in 3392 and have noticed that I no longer have a core admin button in my admin interface. What specifically controls whether this is shown or not? I am also not ruling out the chance that I've messed something up, but I was wondering if there is a set of conditions that controls whether this is shown.
Re: Spatial4j
Ericz, See this issue: https://issues.apache.org/jira/browse/SOLR-3304 It's just a TODO issue right now but when it's completed, you'll be able to do polygon spatial queries. All the software is written to do it right now but the missing Solr piece is temporarily at Spatial4j.com. If you were to try to use it, you would need to build it as of the same date that the Lucene spatial module was added, in LUCENE-3795. Also, FYI to do polygons, you need a 3rd party jar, JTS. I'm working through a backlog of things to get to but will get to it. ~ David Smiley On Apr 23, 2012, at 11:09 AM, Eric Grobler wrote: Hello Solr Community, We are interested in polygon spatial queries. I believe that Spatial4j supports it. Is there a solr branch available that includes Spatial4j? Will this be part of a furure solr release? Thank you. Best Regards Ericz
Re: Solr Core Admin Question on Trunk
Jamie, right... that makes sense. Right now the core admin will not work in single-core mode because we have no core name there. https://issues.apache.org/jira/browse/SOLR-2605 should fix this; afterwards we can show the core admin for every configuration. Would you mind opening a ticket for that?

On Monday, April 23, 2012 at 5:25 PM, Jamie Johnson wrote:
[Jamie's messages, quoted in full above, snipped]
solr replication failing with error: Master at: is not available. Index fetch failed
hello all, environment: CentOS and Solr 3.5. I am attempting to set up replication between two Solr boxes (master and slave). I am getting the following in the logs on the slave box:

2012-04-23 10:54:59,985 SEVERE [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Master at: http://someip:someport/somepath/somecore/admin/replication/ is not available. Index fetch failed. Exception: Invalid version (expected 2, but 10) or the data in not in 'javabin' format

The master JVM (JBoss host) is being started like this: -Denable.master=true
The slave JVM (JBoss host) is being started like this: -Denable.slave=true

Does anyone have any ideas? I have done the following:
- used curl http://someip:someport/somepath/somecore/admin/replication/ from the slave to successfully see the master
- used ping from the slave to the master
- switched out the DNS name for the master for a hard-coded IP address
- made sure I can see http://someip:someport/somepath/somecore/admin/replication/ in a browser

This is my request handler - I am using the same config file on both the master and the slave, but sending in the appropriate switch on startup (per the Solr wiki page on replication):

  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
    <str name="commitReserveDuration">00:00:10</str>
  </lst>
  <str name="maxNumberOfBackups">1</str>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://someip:someport/somecore/admin/replication/</str>
    <str name="pollInterval">00:00:20</str>
    <str name="compression">internal</str>
    <str name="httpConnTimeout">5000</str>
    <str name="httpReadTimeout">1</str>
  </lst>
</requestHandler>

Any suggestions would be great. thank you, mark -- View this message in context: http://lucene.472066.n3.nabble.com/solr-replication-failing-with-error-Master-at-is-not-available-Index-fetch-failed-tp3932921p3932921.html Sent from the Solr - User mailing list archive at Nabble.com.
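One quick sanity check (the host/port/path below are just the placeholders from the message above, substitute your real values): hit the replication handler on the master directly from the slave box and confirm it answers with a normal replication response rather than an HTML page, e.g.

curl "http://someip:someport/somepath/somecore/replication?command=indexversion"

If I understand the error right, the "Invalid version ... javabin" message on the slave generally means the slave got something other than a javabin replication response back from the masterUrl it was given.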
Re: Spatial4j
Hi David, Thank you for the information. I am glad to hear that it is basically ready to be integrated into Lucene. Regarding your backlog, is it realistic to expect 3304 to be resolved before June? Best Regards Ericz On Mon, Apr 23, 2012 at 4:38 PM, Smiley, David W. dsmi...@mitre.org wrote: Ericz, See this issue: https://issues.apache.org/jira/browse/SOLR-3304 It's just a TODO issue right now but when it's completed, you'll be able to do polygon spatial queries. All the software is written to do it right now but the missing Solr piece is temporarily at Spatial4j.com. If you were to try to use it, you would need to build it as of the same date that the Lucene spatial module was added, in LUCENE-3795. Also, FYI, to do polygons you need a 3rd party jar, JTS. I'm working through a backlog of things to get to but will get to it. ~ David Smiley On Apr 23, 2012, at 11:09 AM, Eric Grobler wrote: Hello Solr Community, We are interested in polygon spatial queries. I believe that Spatial4j supports it. Is there a Solr branch available that includes Spatial4j? Will this be part of a future Solr release? Thank you. Best Regards Ericz
Kernel methods in SOLR
Hi, Has there been any work that tries to integrate Kernel methods [1] with SOLR? I am interested in using kernel methods to solve synonym, hyponym and polysemy (disambiguation) problems which SOLR's vector space model (bag of words) does not capture. For example, imagine we have only 3 words in our corpus: puma, cougar and feline. The 3 words have obvious interdependencies (puma disambiguates to cougar; cougar and puma are instances of felines - hyponyms). Now, imagine 2 docs, d1 and d2, that have the following TF-IDF vectors:

        puma, cougar, feline
d1 = [  2,    0,      0   ]
d2 = [  0,    1,      0   ]

i.e. d1 has no mention of the terms cougar or feline and, conversely, d2 has no mention of the terms puma or feline. Hence under the vector approach d1 and d2 are not related at all (and each interpretation of the terms has a unique vector), which is not what we want to conclude. What I need is to include a kernel matrix (as data) such as the following that captures these relationships:

           puma, cougar, feline
puma   = [ 1,    1,      0.4 ]
cougar = [ 1,    1,      0.4 ]
feline = [ 0.4,  0.4,    1   ]

then recompute the TF-IDF vector as a product of (1) the original vector and (2) the kernel matrix, resulting in

        puma, cougar, feline
d1 = [  2,    2,      0.8 ]
d2 = [  1,    1,      0.4 ]

(note, the new vectors are much less sparse). I can solve this problem (inefficiently) at the application layer, but I was wondering if there have been any attempts within the community to solve similar problems efficiently, without paying a hefty response-time price? thank you Peyman [1] http://en.wikipedia.org/wiki/Kernel_methods
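To make the re-weighting step concrete, here is a small, self-contained Java sketch (not tied to any Solr API; all names are invented for the example). It computes new_d[j] = sum_i d[i] * K[i][j], i.e. the document vector times the kernel matrix, and reproduces the d1 and d2 values above.

public class KernelReweight {

    // Term order everywhere: puma, cougar, feline
    static final double[][] K = {
            {1.0, 1.0, 0.4},   // puma
            {1.0, 1.0, 0.4},   // cougar
            {0.4, 0.4, 1.0}    // feline
    };

    // Row vector times matrix: out[j] = sum_i d[i] * K[i][j]
    static double[] reweight(double[] d) {
        double[] out = new double[K[0].length];
        for (int j = 0; j < out.length; j++) {
            for (int i = 0; i < d.length; i++) {
                out[j] += d[i] * K[i][j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] d1 = {2, 0, 0};
        double[] d2 = {0, 1, 0};
        System.out.println(java.util.Arrays.toString(reweight(d1))); // [2.0, 2.0, 0.8]
        System.out.println(java.util.Arrays.toString(reweight(d2))); // [1.0, 1.0, 0.4]
    }
}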
Re: Language Identification
I was under the impression that Solr supports both Tika and the language identifier that Shuyo wrote. The page at http://wiki.apache.org/solr/LanguageDetection lists them both:

<processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory"/>
<processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory"/>

Again, I'm just trying to understand why it was moved to Solr. On Fri, Apr 20, 2012 at 6:02 PM, Jan Høydahl jan@cominvent.com wrote: Hi, Solr just reuses Tika's language identifier. But you are of course free to do your language detection on the Nutch side if you choose and not invoke the one in Solr. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 20. apr. 2012, at 21:49, Bai Shen wrote: I'm working on using Shuyo's work to improve the language identification of our search. Apparently, it's been moved from Nutch to Solr. Is there a reason for this? http://code.google.com/p/language-detection/issues/detail?id=34 I would prefer to have the processing done in Nutch as that has the benefit of more hardware and not interfering with Solr latency. Thanks.
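For anyone who wants to try the in-Solr route being discussed, the LanguageDetection wiki page above wires one of those factories into an update processor chain along these lines. This is only a sketch based on that page, not a drop-in config: the chain name, field list and language field are placeholders, and the parameter names should be checked against the wiki for your Solr version.

<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <str name="langid.fl">title,body</str>
    <str name="langid.langField">language_s</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

The chain is then referenced from the /update request handler (for example as an update.chain default) so that documents get a detected-language field as they are indexed.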
Apache Tomcat 6 service terminated unexpectedly. It has done this 2 time(s).
Solr 3.5 was not returning results. To my surprise, Tomcat 6.x (64 bit) was not running on my Windows machine. There were absolutely no errors in the logs, no crash dumps, nothing. I restarted it and everything seems to be fine now. I went to the Windows Event Viewer and exported the following information as it relates to Tomcat:

Level  Date and Time  Source  Event ID  Task Category
Information  04/23/2012 8:51:58 AM  Service Control Manager  7036  None  The Apache Tomcat 6 service entered the running state.
Error  04/23/2012 4:17:12 AM  Service Control Manager  7034  None  The Apache Tomcat 6 service terminated unexpectedly. It has done this 2 time(s).
Information  04/16/2012 3:13:15 PM  Service Control Manager  7036  None  The Apache Tomcat 6 service entered the running state.
Error  04/16/2012 1:12:47 PM  Service Control Manager  7034  None  The Apache Tomcat 6 service terminated unexpectedly. It has done this 1 time(s).
Information  04/07/2012 10:02:25 PM  Service Control Manager  7036  None  The Apache Tomcat 6 service entered the running state.

It is a mystery to me as I don't have any errors in the Tomcat logs. How should I go about debugging this problem? Any help would be appreciated.
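Since the service is dying without leaving anything in the Tomcat logs, one low-risk first step is to make the JVM itself leave evidence the next time it goes down. The options below are standard HotSpot flags; the paths are placeholders, and on Windows they would typically be added to the Tomcat service's Java options (for example via the tomcat6w.exe service configuration dialog). This is a sketch, not a guaranteed fix:

  -XX:ErrorFile=C:\tomcat6\logs\hs_err_pid%p.log
  -XX:+HeapDumpOnOutOfMemoryError
  -XX:HeapDumpPath=C:\tomcat6\logs

If the JVM is crashing natively you should then find an hs_err_pid*.log, and if it is dying from an OutOfMemoryError you get a heap dump; either narrows things down far faster than empty catalina logs. The stdout/stderr and commons-daemon log files the Tomcat Windows service writes in its logs directory are also worth checking.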
Re: # open files with SolrCloud
Great! I am going to try the new Solr 4 build from April 23rd. On Sun, Apr 22, 2012 at 11:35 PM, Sami Siren ssi...@gmail.com wrote: On Sat, Apr 21, 2012 at 9:57 PM, Yonik Seeley yo...@lucidimagination.com wrote: I can reproduce some kind of searcher leak issue here, even w/o SolrCloud, and I've opened https://issues.apache.org/jira/browse/SOLR-3392 With the fix integrated, I do not see the leaking problem anymore with my setup, so it seems to be working now. -- Sami Siren
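For anyone wanting to confirm on their own setup that the descriptor leak is gone after picking up the fix, a simple check is to watch the Solr JVM's open file count while indexing and committing. A rough sketch for Linux; the process match (start.jar) is an assumption about how Solr was launched:

  pid=$(pgrep -f start.jar | head -n 1)
  lsof -p "$pid" | wc -l

Run it periodically during an indexing run: a count that climbs without bound after commits is the symptom discussed in this thread, while a count that plateaus suggests the leak is fixed.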
Re: Solr Core Admin Question on Trunk
No problem, created this. https://issues.apache.org/jira/browse/SOLR-3401 and related to 2605. On Mon, Apr 23, 2012 at 11:39 AM, Stefan Matheis matheis.ste...@googlemail.com wrote: Jamie, right .. that makes sense. right now the core-admin will not work in singlecore-mode because we have no core-name there. https://issues.apache.org/jira/browse/SOLR-2605 should fix this, afterwards we can show the core-admin for every configuration. would you mind to open a ticket for that? On Monday, April 23, 2012 at 5:25 PM, Jamie Johnson wrote: So I believe I see the reason now. Basically in app.js we check to see if there is more than 1 core deployed to decide if we show the core admin or not. I am not sure this is intended or not, but I would think this isn't what we want the default action to be. Shouldn't we always show the core admin menu option so users can grow their solr instances without having to execute the core admin commands from curl or something? On Mon, Apr 23, 2012 at 11:04 AM, Jamie Johnson jej2...@gmail.com (mailto:jej2...@gmail.com) wrote: I just updated to the latest Solr nightly build to address the issue Yonik fixed in 3392 and have noticed that I no longer have a core admin button in my admin interface. What specifically controls if this is shown or not? I am also not ruling out the chance I've messed something up but I was wondering if there are a set of conditions that controls if this is shown or not.
Re: Spatial4j
Yes, I definitely think so. At a minimum, I expect there will at least be a patch or built jar file for you to get going by 1 June. - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/Spatial4j-tp3932748p3933368.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Deciding whether to stem at query time
There is a third approach. Create two fields and always query both of them, with the exact field given a higher weight. This works great and performs well. It is what we did at Netflix and what I'm doing at Chegg. wunder On Apr 23, 2012, at 12:21 PM, Andrew Wagner wrote: So I just realized the other day that stemming basically happens at index time. If I'm understanding correctly, there's no way to allow a user to specify, at run time, whether to stem particular words or not based on a single index. I think there are two options, but I'd love to hear that I'm wrong: 1.) Incrementally build up a white list of words that don't stem very well. To pick a random example out of the blue, light isn't super closely related to, lighter, so I might choose not to stem that. If I wanted to do this, I think (if I understand correctly), stemmerOverrideFilter would help me out with this. I'm not a big fan of this approach. 2.) Index all the text in two fields, once with stemming and once without. Then build some kind of option into the UI for specifying whether to stem the words or not, and search the appropriate field. Unfortunately, this would roughly double the size of my index, and probably affect query times too. Plus, the UI would probably suck. Am I missing an option? Has anyone tried one of these approaches? Thanks! Andrew
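A minimal sketch of the two-field approach Walter describes, with hypothetical names: text_stemmed and text_exact stand for analyzer chains with and without a stemmer, the content is indexed into both fields via copyField, and the exact field is boosted at query time with edismax:

  <field name="body" type="text_stemmed" indexed="true" stored="true"/>
  <field name="body_exact" type="text_exact" indexed="true" stored="false"/>
  <copyField source="body" dest="body_exact"/>

  &defType=edismax&q=lighter&qf=body_exact^2+body

Only one copy needs to be stored for display and highlighting, so the extra cost is the additional indexed terms rather than a full duplicate of the stored text.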
Re: Spatial4j
Thank you David, it is fantastic what people like you do for the Solr community. On Mon, Apr 23, 2012 at 8:08 PM, David Smiley (@MITRE.org) dsmi...@mitre.org wrote: Yes, I definitely think so. At a minimum, I expect there will at least be a patch or built jar file for you to get going by 1 June. - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/Spatial4j-tp3932748p3933368.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Language Identification
On Mon, Apr 23, 2012 at 1:27 PM, Bai Shen baishen.li...@gmail.com wrote: I was under the impression that solr does Tika and the language identifier that Shuyo did. The page at http://wiki.apache.org/solr/LanguageDetectionlists them both. processor class=org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory processor class=org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory Again, I'm just trying to understand why it was moved to solr. Because it offers a number of features above Tika's implementation, and is available under the Apache 2.0 License so we are free to do that. -- lucidimagination.com
RE: Apache Tomcat 6 service terminated unexpectedly. It has done this 2 time(s).
I am sorry, I should have raised this issue on the Tomcat forums. However, I was just trying my luck here as it was indirectly related to Solr. From: Husain, Yavar Sent: Monday, April 23, 2012 11:07 PM To: solr-user@lucene.apache.org Subject: Apache Tomcat 6 service terminated unexpectedly. It has done this 2 time(s).
Re: null pointer error with solr deduplication
Thanks for the response. Yes, I agree with you that I have to check for the uniqueness of doc ids, but our requirement is such that we need to send it to Solr, and I know that Solr discards duplicate documents and it does not work fine when we manually create the unique id. But I just wanted to report the error since in this scenario (I guess the components for deduplication are pretty new), it would probably help the devs to make the behavior more deterministic towards duplicate documents. On Sat, Apr 21, 2012 at 2:21 AM, Alexander Aristov alexander.aris...@gmail.com wrote: Hi, I might be wrong but it's your responsibility to put unique doc IDs across shards. Read this page http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations particularly - Documents must have a unique key and the unique key must be stored (stored=true in schema.xml) - *The unique key field must be unique across all shards.* If docs with duplicate unique keys are encountered, Solr will make an attempt to return valid results, but the behavior may be non-deterministic. So Solr behaves as it should :) _unexpectedly_ But I agree in the sense that there must be no error, especially such as an NPE. Best Regards Alexander Aristov On 21 April 2012 03:42, Peter Markey sudoma...@gmail.com wrote: Hello, I have been trying out deduplication in Solr by following: http://wiki.apache.org/solr/Deduplication. I have defined a signature field to hold the values of the signature created based on a few other fields in a document, and the idea seems to work like a charm in a single Solr instance. But, when I have multiple cores and try to do a distributed search ( http://localhost:8080/solr/core0/select?q=*&shards=localhost:8080/solr/dedupe,localhost:8080/solr/dedupe2&facet=true&facet.field=doc_id ) I get the error pasted below. While normal search (with just q) works fine, the facet/stats queries seem to be the culprit. The doc_id contains duplicate ids since I'm testing the same set of documents indexed in both the cores (dedupe, dedupe2). Any insights would be highly appreciated.
Thanks 20-Apr-2012 11:39:35 PM org.apache.solr.common.SolrException log SEVERE: java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:887) at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:633) at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:612) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:307) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:307) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662)
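For reference, the Deduplication wiki page cited above builds the signature with an update processor chain roughly like the one below. This is a sketch of that configuration rather than the poster's actual setup; the chain name, field list and signature field are placeholders:

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

The processor runs per core, so even with overwriteDupes it only collapses duplicates within a single index; nothing prevents two shards from holding documents with the same unique key, which is exactly the situation behind the mergeIds NullPointerException reported in this thread.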
Re: FastVectorHighlighter - no highlights
This does not appear to be shingle specific. A non-shingled field is also NOT highlighted in the same manner with FVH. I can see in the timing information that it takes much longer to run FVH than no highlighting at all, so Solr must be doing something. But why it just lists the document IDs and little or no field highlights is still a mystery. Any ideas on where I should look in the configuration, parameters to try etc.? Cheers, Jeff On Apr 19, 2012, at 7:51 AM, Jeff Schmidt wrote: I am using Solr 4.0, and debug=timing shows Solr spending the great majority of its time in the HighlightComponent. It seemed logical to look into the FastVectorHighlighter. It does seem much faster, but on the other hand, I'm not getting the highlights I need. :) I've seen references to FVH not supporting MultiTerm and (non-fixed size) ngrams. I'm using edismax, and I don't know if a certain configuration of that becomes multi-term and that's my problem, or if this is something completely different. I don't have ngrams, but I do shingle. For the examples below, I have these fields defined:

<field name="n_macromolecule_name" type="text_lc_np_shingle" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>
<field name="n_protein_family" type="text_lc_np_shingle" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>
<field name="n_pathway_name" type="text_lc_np_shingle" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>
<field name="n_cellreg_regulated_by" type="text_lc_np_shingle" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>
<field name="n_cellreg_disease" type="text_lc_np_shingle" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>
<field name="n_macromolecule_summary" type="text_lc_np_shingle" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>

Note that all are both indexed and stored, multi-valued, and I have termVectors="true" termPositions="true" termOffsets="true" to enable FVH. When I had missed that in a field, I could see the log indicating such and reverting to the regular highlighter. I no longer see those messages. All of the above fields are of this type:

<!-- A text field that forces lowercase, removes punctuation and generates shingles for phrase matching -->
<fieldType name="text_lc_np_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <!-- strip punctuation -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="([\p{Punct}])" replacement="" replace="all"/>
    <!-- Remove any 0-length tokens. -->
    <filter class="solr.LengthFilterFactory" min="1" max="100"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="4" outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- strip punctuation -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="([\p{Punct}])" replacement="" replace="all"/>
    <!-- Remove any 0-length tokens. -->
    <filter class="solr.LengthFilterFactory" min="1" max="100"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="4" outputUnigrams="false" outputUnigramsIfNoShingles="true"/>
  </analyzer>
</fieldType>

Using the standard highlight component, for the search term cancer (rows=2), I get the highlights I've come to appreciate:

<lst name="highlighting">
  <lst name="ING:3lzx">
    <arr name="n_macromolecule_name">
      <str>&lt;span class="ingReasonText"&gt;cancer&lt;/span&gt; susceptibility candidate 1</str>
    </arr>
    <arr name="n_protein_family">
      <str>&lt;span class="ingReasonText"&gt;Cancer&lt;/span&gt; susceptibility candidate 1</str>
    </arr>
  </lst>
  <lst name="ING:8lj">
    <arr name="n_macromolecule_name">
      <str>breast &lt;span class="ingReasonText"&gt;cancer&lt;/span&gt; 2, early onset</str>
    </arr>
    <arr name="n_pathway_name">
      <str>Hereditary Breast &lt;span class="ingReasonText"&gt;Cancer&lt;/span&gt; Signaling</str>
    </arr>
    <arr name="n_cellreg_regulated_by">
      <str>prostate &lt;span class="ingReasonText"&gt;cancer&lt;/span&gt; cells</str>
    </arr>
    <arr name="n_cellreg_disease">
      <str>breast &lt;span class="ingReasonText"&gt;cancer&lt;/span&gt;</str>
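For anyone debugging something similar, the request-level switch between the two highlighters is a small set of parameters. This is just a sketch with a shortened field list; the parameters are standard Solr highlighting parameters, but defaults may also already be set in the /select handler of solrconfig.xml:

  &hl=true&hl.fl=n_macromolecule_name,n_pathway_name                                    (standard highlighter)
  &hl=true&hl.fl=n_macromolecule_name,n_pathway_name&hl.useFastVectorHighlighter=true   (FastVectorHighlighter)

FVH builds its snippets from the stored term vector offsets, so when it returns entries with no fragments it can be worth experimenting with hl.fragsize and hl.fragListBuilder (for example hl.fragListBuilder=single to return the whole field value) in addition to checking the analysis chain.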
Re: Language Identification
I think nothing has moved. We just offer Solr users to do language detection inside of Solr, using any of these two libs. If you choose to do language detection on client side instead, using any of these, what is stopping you? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 23. apr. 2012, at 19:27, Bai Shen wrote: I was under the impression that solr does Tika and the language identifier that Shuyo did. The page at http://wiki.apache.org/solr/LanguageDetectionlists them both. processor class=org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory processor class=org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory Again, I'm just trying to understand why it was moved to solr. On Fri, Apr 20, 2012 at 6:02 PM, Jan Høydahl jan@cominvent.com wrote: Hi, Solr just reuses Tika's language identifier. But you are of course free to do your language detection on the Nutch side if you choose and not invoke the one in Solr. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 20. apr. 2012, at 21:49, Bai Shen wrote: I'm working on using Shuyo's work to improve the language identification of our search. Apparently, it's been moved from Nutch to Solr. Is there a reason for this? http://code.google.com/p/language-detection/issues/detail?id=34 I would prefer to have the processing done in Nutch as that has the benefit of more hardware and not interfering with Solr latency. Thanks.
java 1.6 requirement not documented clearly?
Both the wiki http://wiki.apache.org/solr/SolrInstall and the tutorial http://lucene.apache.org/solr/api/doc-files/tutorial.html state that Java 1.5 is required, but trying to run Solr 3.6 with Java 1.5 was giving a cryptic error to a colleague. xab -- View this message in context: http://lucene.472066.n3.nabble.com/java-1-6-requirement-not-documented-clearly-tp3933799p3933799.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: java 1.6 requirement not documented clearly?
: Both wiki http://wiki.apache.org/solr/SolrInstall and tutorial : http://lucene.apache.org/solr/api/doc-files/tutorial.html state java 1.5 is : required, but trying to run solr3.6 with java 1.5 was giving some cryptic : error to a colleague. You'll have to be more specific about what you (or your colleague) were doing, and what error you got. Solr 3.6 should work fine with Java 1.5 -Hoss
Re: Deciding whether to stem at query time
Yes, and you might choose to use different options for different fields. For dictionary searches, where users are searching for specific words, and a high degree of precision is called for, stemming is less helpful, but for full text searches, more so. -Mike On 4/23/2012 3:35 PM, Walter Underwood wrote: There is a third approach. Create two fields and always query both of them, with the exact field given a higher weight. This works great and performs well. It is what we did at Netflix and what I'm doing at Chegg. wunder On Apr 23, 2012, at 12:21 PM, Andrew Wagner wrote: So I just realized the other day that stemming basically happens at index time. If I'm understanding correctly, there's no way to allow a user to specify, at run time, whether to stem particular words or not based on a single index. I think there are two options, but I'd love to hear that I'm wrong: 1.) Incrementally build up a white list of words that don't stem very well. To pick a random example out of the blue, light isn't super closely related to, lighter, so I might choose not to stem that. If I wanted to do this, I think (if I understand correctly), stemmerOverrideFilter would help me out with this. I'm not a big fan of this approach. 2.) Index all the text in two fields, once with stemming and once without. Then build some kind of option into the UI for specifying whether to stem the words or not, and search the appropriate field. Unfortunately, this would roughly double the size of my index, and probably affect query times too. Plus, the UI would probably suck. Am I missing an option? Has anyone tried one of these approaches? Thanks! Andrew
Re: Deciding whether to stem at query time
Right. Stemming is less useful for author fields, you don't need to match bill gate or steve job. Also, if you want to do fuzzy matching, you should only do that on the exact fields, not the stemmed fields. wunder On Apr 23, 2012, at 3:45 PM, Michael Sokolov wrote: Yes, and you might choose to use different options for different fields. For dictionary searches, where users are searching for specific words, and a high degree of precision is called for, stemming is less helpful, but for full text searches, more so. -Mike On 4/23/2012 3:35 PM, Walter Underwood wrote: There is a third approach. Create two fields and always query both of them, with the exact field given a higher weight. This works great and performs well. It is what we did at Netflix and what I'm doing at Chegg. wunder On Apr 23, 2012, at 12:21 PM, Andrew Wagner wrote: So I just realized the other day that stemming basically happens at index time. If I'm understanding correctly, there's no way to allow a user to specify, at run time, whether to stem particular words or not based on a single index. I think there are two options, but I'd love to hear that I'm wrong: 1.) Incrementally build up a white list of words that don't stem very well. To pick a random example out of the blue, light isn't super closely related to, lighter, so I might choose not to stem that. If I wanted to do this, I think (if I understand correctly), stemmerOverrideFilter would help me out with this. I'm not a big fan of this approach. 2.) Index all the text in two fields, once with stemming and once without. Then build some kind of option into the UI for specifying whether to stem the words or not, and search the appropriate field. Unfortunately, this would roughly double the size of my index, and probably affect query times too. Plus, the UI would probably suck. Am I missing an option? Has anyone tried one of these approaches? Thanks! Andrew -- Walter Underwood wun...@wunderwood.org
Re: java 1.6 requirement not documented clearly?
Oh, then it should work with 1.5?? OK, I know what happened then. I did not see it happening myself, but he unzipped 3.6, started Solr with the example config and got the error. He had Java 1.5, so I told him to upgrade, and it worked, so I assumed Solr required 1.6. But this was on a Linux box, so most probably the Java 1.5 it was using was GCJ... thanks xab -- View this message in context: http://lucene.472066.n3.nabble.com/java-1-6-requirement-not-documented-clearly-tp3933799p3933920.html Sent from the Solr - User mailing list archive at Nabble.com.
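A quick way to confirm or rule out the GCJ theory on a box like that; a sketch for Debian/Ubuntu-style systems, where package names and paths vary:

  java -version                              # GCJ identifies itself as gij / "GNU libgcj"; Sun or OpenJDK builds report HotSpot
  sudo update-alternatives --config java     # pick an OpenJDK or Sun JDK install as the system default

If java -version mentions gij or libgcj, switching to a real 1.5+ JDK (or simply installing a 1.6 JRE) should make the cryptic startup error go away.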
Re: solr replication failing with error: Master at: is not available. Index fetch failed
Hmmm, does your master have an index? In other words, have you added anything to it? I actually doubt that's the issue, but... As an aside, a polling interval of 20 seconds is rather short; beware of your autowarming time exceeding your index updates. But my _first_ guess is that somehow your Solrs aren't the same version, or you have a foo'd index on your master. Best Erick On Mon, Apr 23, 2012 at 12:10 PM, geeky2 gee...@hotmail.com wrote: Hello all, environment: CentOS and Solr 3.5. I am attempting to set up replication between two Solr boxes (master and slave). I am getting the following in the logs on the slave box:

2012-04-23 10:54:59,985 SEVERE [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Master at: http://someip:someport/somepath/somecore/admin/replication/ is not available. Index fetch failed. Exception: Invalid version (expected 2, but 10) or the data in not in 'javabin' format

The master JVM (JBoss host) is being started like this: -Denable.master=true
The slave JVM (JBoss host) is being started like this: -Denable.slave=true

Does anyone have any ideas? I have done the following:
- used curl http://someip:someport/somepath/somecore/admin/replication/ from the slave to successfully see the master
- used ping from the slave to the master
- switched out the DNS name for the master to a hard-coded IP address
- made sure I can see http://someip:someport/somepath/somecore/admin/replication/ in a browser

This is my request handler - I am using the same config file on both the master and slave, but sending in the appropriate switch on start up (per the Solr wiki page on replication):

  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
    <str name="commitReserveDuration">00:00:10</str>
  </lst>
  <str name="maxNumberOfBackups">1</str>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://someip:someport/somecore/admin/replication/</str>
    <str name="pollInterval">00:00:20</str>
    <str name="compression">internal</str>
    <str name="httpConnTimeout">5000</str>
    <str name="httpReadTimeout">1</str>
  </lst>
</requestHandler>

Any suggestions would be great. Thank you, mark -- View this message in context: http://lucene.472066.n3.nabble.com/solr-replication-failing-with-error-Master-at-is-not-available-Index-fetch-failed-tp3932921p3932921.html Sent from the Solr - User mailing list archive at Nabble.com.