Solr Replication
Hi, I am using a Solr 4 setup. For backup purposes, once a day I start one additional Tomcat server with cores that have empty data folders, which acts as a slave server. However, it does not replicate data from the master unless there is a commit on the master. Is there a way to pull data from a master core without firing a commit operation on that core? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Replication-tp4047266.html Sent from the Solr - User mailing list archive at Nabble.com.
New-Question On Search data who does not have x field
My previous question: I have indexed 250 records in Solr, and some of them have a category field while some don't. For example: { id:321, name:anurag, category:30 }, { id:3, name:john }. I wanted to find the docs that do not have that field, and I got an answer: I can use http://localhost:8983/search?q=*:*&fq=-category:[* TO *]. But now I am facing a problem: I want to find all docs that either do not have a category field or have category field value = 20. I wrote the following query: http://localhost:8983/search?q=*:*&wt=json&start=0&fq=category:20 OR -category:[* TO *] but it gives me zero output. http://localhost:8983/search?q=*:*&wt=json&start=0&fq=category:20 gives output = 2689, and http://localhost:8983/search?q=*:*&wt=json&start=0&fq=-category:[* TO *] gives output = 2644684. What is the problem? Am I making a mistake? -- View this message in context: http://lucene.472066.n3.nabble.com/New-Question-On-Search-data-who-does-not-have-x-field-tp4047270.html
Re: Solr Replication
Hi Vicky, maybe <str name="replicateAfter">startup</str>? For backups, http://master_host:port/solr/replication?command=backup would be more suitable, or <str name="backupAfter">startup</str>. --- On Thu, 3/14/13, vicky desai <vicky.de...@germinait.com> wrote: From: vicky desai Subject: Solr Replication To: solr-user@lucene.apache.org Date: Thursday, March 14, 2013, 9:20 AM [...]
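As suggested above, a backup can be triggered through the replication handler without waiting for a commit. A minimal Python sketch of building (and optionally issuing) that request — the host, port, and core path here are placeholders, not taken from the thread:

```python
# Hedged sketch: construct the URL for Solr's /replication handler
# (command=backup, as recommended above). "master_host:8983/solr" is an
# assumed endpoint, not from the original message.
from urllib.parse import urlencode

def replication_url(base: str, command: str, **params) -> str:
    """Build a URL for the Solr replication handler, e.g. command=backup."""
    query = urlencode({"command": command, **params})
    return f"{base}/replication?{query}"

backup_url = replication_url("http://master_host:8983/solr", "backup")
# print(backup_url)
# Against a live master one would then GET this URL, e.g. with
# urllib.request.urlopen(backup_url).
```

The same helper covers command=details or command=fetchindex by changing the `command` argument.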
Blog Post: Integration Testing SOLR Index with Maven
Hi all, this is not a question. I just wanted to announce that I've written a blog post on how to set up Maven for packaging and automatic testing of a SOLR index configuration. http://blog.it-agenten.com/2013/03/integration-testing-your-solr-index-with-maven/ Feedback or comments appreciated! And again, thanks for that great piece of software. Chantal
Re: Blog Post: Integration Testing SOLR Index with Maven
Informative. Useful. Thanks! On Thu, Mar 14, 2013 at 1:59 PM, Chantal Ackermann <c.ackerm...@it-agenten.com> wrote: [...]
Re: Blog Post: Integration Testing SOLR Index with Maven
Nice! Chantal, can you indicate, there or here, what kind of speed you've reached for integration tests with this, from bare source to a successfully tested application (e.g. with 100 documents)? Thanks in advance, Paul. On 14 March 2013, at 09:29, Chantal Ackermann wrote: [...]
OutOfMemoryError
Hi,

I'm getting this error after a few hours of filling Solr with documents. Tomcat is running with -Xms1024m -Xmx4096m. Total memory of the host is 12GB. Soft commits are done every second and hard commits every minute. Any idea why this is happening and how to avoid it?

top:
  PID USER PR NI  VIRT  RES  SHR S %CPU %MEM  TIME+     COMMAND
13666 root 20  0 86.8g 4.7g 248m S  101 39.7  478:37.45 /usr/bin/java -Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties -server -Xms1024m -Xmx4096m -XX:PermSize=64m -XX:MaxPermSize=128m -Duser.timezone=UTC -Dfile.encoding=UTF8 -Dsolr.solr.home=/opt/solr/ -Dport=8983 -Dcollection.configName
22247 root 20  0 2430m 409m 4176 S    0  3.4    1:23.43 java -Dzookeeper.log.dir=. -Dzookeeper.root.logger=INFO,CONSOLE -cp /opt/zookeeper/bin/../build/classes:/opt/zookeeper/bin/../build/lib/*.jar:/opt/zookeeper/bi

free -m:
             total   used   free  shared  buffers  cached
Mem:         12047  11942    105       0      180    6363
-/+ buffers/cache:   5399   6648
Swap:          956     75    881

log:
SEVERE: null:java.lang.RuntimeException: java.lang.OutOfMemoryError
  at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:462)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:290)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
  at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:931)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
  at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.OutOfMemoryError
  at java.util.zip.ZipFile.open(Native Method)
  at java.util.zip.ZipFile.<init>(ZipFile.java:127)
  at java.util.zip.ZipFile.<init>(ZipFile.java:144)
  at org.apache.poi.openxml4j.opc.internal.ZipHelper.openZipFile(ZipHelper.java:157)
  at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:101)
  at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:207)
  at org.apache.tika.parser.pkg.ZipContainerDetector.detectOfficeOpenXML(ZipContainerDetector.java:194)
  at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:134)
  at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:77)
  at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
  at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:113)
  at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:242)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:448)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:269)
  ... 15 more
Java HotSpot(TM) 64-Bit Server VM warning: Attempt to allocate stack guard pages failed. mmap failed for CEN and END part of zip file

-- Kind regards, Arkadi Colson. Smartbit bvba . Hoogstraat 13 . 3670 Meeuwen T +32 11 64 08 80 . F +32 11 64 08 81
Re: Blog Post: Integration Testing SOLR Index with Maven
Hi Paul, I'm sorry, I cannot provide you with any numbers. I also doubt it would be wise to post any, as I think the speed depends highly on what you are doing in your integration tests. Say you have several request handlers that you want to test (on different cores), and some more complex use cases, like using the output from one request handler as input to others. You would also import test data representative enough to exercise these request handlers and use cases. The requests themselves, of course, only take as long as SolrJ takes to run and Solr takes to answer them. In addition, there is the overhead of Maven starting up, running all the plugins, importing the data, and executing the tests. Well, Maven is certainly not the fastest tool to start up and get going… If you are asking because you want to run rather a lot of requests and test their output, JMeter might be preferable. Hope that was not too vague an answer, Chantal. On 14.03.2013 at 09:51, Paul Libbrecht wrote: [...]
Re: Blog Post: Integration Testing SOLR Index with Maven
Chantal, the goal is different: to get a general feeling for how practical it is to integrate this into the routine. If, on your contemporary machine (which I assume is not a supercomputer of some special sort), you are able to run this whole process, in a form somewhat useful to you, in about 2 minutes, then I'll be very interested. If, like quite a few setups where Maven starts and integration is measured from all facets, it takes more than 15 minutes to run the process once it is useful, then I will be less motivated. I'm not asking for a performance measurement, and certainly not of Solr itself, which I trust largely and which depends a lot on good caching. Yes, for that, JMeter or others are useful. Paul. On 14 March 2013, at 12:20, Chantal Ackermann wrote: [...]
Re: OutOfMemoryError
When I shut down Tomcat, free -m and top keep showing the same values. Almost no free memory... Any idea? On 03/14/2013 10:35 AM, Arkadi Colson wrote: [...]
Solr 4.1 monitoring with /solr/replication?command=details - indexVersion?
Hi All. I am monitoring two Solr 4.1 instances in a master-slave setup. On both nodes I check the URL /solr/replication?command=details and parse it to get: on the master, whether replication is enabled (field replicationEnabled); on the slave, whether replication is enabled (field replicationEnabled); and on the slave, whether polling is disabled (field isPollingDisabled). For Solr 3.6 I've also used the URL solr/replication?command=indexversion, but for 4.1 it gives me different results on master and slave — on the slave the version is higher despite the fact that replication is enabled, polling is enabled, and in the admin GUI /solr/#/collection1/replication I have:

        Index Version   Gen  Size
Master: 1363259808632   3    22.59 KB
Slave:  1363259808632   3    22.59 KB

So as I see it, master and slave have the same index version, despite the fact that /solr/replication?command=indexversion gives: on the master, <long name="indexversion">1363259808632</long>; on the slave, <long name="indexversion">1363259880360</long> (a higher value). Is this a bug? Best regards, Rafal Radecki.
Re: New-Question On Search data who does not have x field
Writing OR - is effectively the same as just -, so the query matches documents containing category 20 and then removes all documents that have any category (including 20) specified, giving you nothing. Try: http://localhost:8983/search?q=*:*&wt=json&start=0&fq=category:20 OR (*:* -category:[* TO *]). Technically, the following should also work, but there have been bugs with pure negative queries and sub-queries, so it may or may not: http://localhost:8983/search?q=*:*&wt=json&start=0&fq=category:20 OR (-category:[* TO *]). -- Jack Krupansky -----Original Message----- From: anurag.jain Sent: Thursday, March 14, 2013 3:48 AM To: solr-user@lucene.apache.org Subject: New-Question On Search data who does not have x field [...]
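The set logic behind the corrected filter query can be checked offline. A small Python sketch applying the same predicates to in-memory dicts instead of a live index — the sample documents are assumptions modeled on the examples in the question:

```python
# Offline sketch of the filter-query semantics discussed above, applied to
# plain dicts (doc id 7 is an invented example with category=20).
docs = [
    {"id": 321, "name": "anurag", "category": 30},
    {"id": 3, "name": "john"},                # no category field at all
    {"id": 7, "name": "alice", "category": 20},
]

def missing_category(doc):
    # analogue of fq=-category:[* TO *]
    return "category" not in doc

def category_20_or_missing(doc):
    # analogue of the corrected fq=category:20 OR (*:* -category:[* TO *])
    return doc.get("category") == 20 or "category" not in doc

matches = [d["id"] for d in docs if category_20_or_missing(d)]
# ids 3 and 7 match; id 321 (category 30) is excluded
```

The key point mirrored here: the negative clause must be anchored to the full set (`*:*`) before subtraction, otherwise "OR -" degenerates to a plain exclusion over the whole result.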
Re: Solr 4.1 monitoring with /solr/replication?command=details - indexVersion?
In the output of /solr/replication?command=details, indexVersion is mentioned many times:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">3</int>
  </lst>
  <lst name="details">
    <str name="indexSize">22.59 KB</str>
    <str name="indexPath">/usr/share/solr/data/index/</str>
    <arr name="commits">
      <lst>
        <long name="indexVersion">1363259880360</long>
        <long name="generation">4</long>
        <arr name="filelist">
          <str>_1.tvx</str><str>_1_nrm.cfs</str><str>_1_Lucene41_0.doc</str>
          <str>_1_Lucene41_0.tim</str><str>_1_Lucene41_0.tip</str><str>_1.fnm</str>
          <str>_1_nrm.cfe</str><str>_1.fdx</str><str>_1_Lucene41_0.pos</str>
          <str>_1.tvf</str><str>_1.fdt</str><str>_1_Lucene41_0.pay</str>
          <str>_1.si</str><str>_1.tvd</str><str>segments_4</str>
        </arr>
      </lst>
    </arr>
    <str name="isMaster">false</str>
    <str name="isSlave">true</str>
    <long name="indexVersion">1363259808632</long>
    <long name="generation">3</long>
    <lst name="slave">
      <lst name="masterDetails">
        <str name="indexSize">22.59 KB</str>
        <str name="indexPath">/usr/share/solr/data/index/</str>
        <arr name="commits">
          <lst>
            <long name="indexVersion">1363263304585</long>
            <long name="generation">4</long>
            <arr name="filelist">
              <str>_2_Lucene41_0.pos</str><str>_2.si</str><str>_2_Lucene41_0.tim</str>
              <str>_2.fdt</str><str>_2_Lucene41_0.doc</str><str>_2_Lucene41_0.tip</str>
              <str>_2.fdx</str><str>_2.tvx</str><str>_2.fnm</str>
              <str>_2_nrm.cfe</str><str>_2.tvd</str><str>_2_Lucene41_0.pay</str>
              <str>_2_nrm.cfs</str><str>_2.tvf</str><str>segments_4</str>
            </arr>
          </lst>
        </arr>
        <str name="isMaster">true</str>
        <str name="isSlave">false</str>
        <long name="indexVersion">1363263304585</long>
        <long name="generation">4</long>
        <lst name="master">
          <str name="confFiles">schema.xml,stopwords.txt</str>
          <arr name="replicateAfter"><str>commit</str><str>startup</str></arr>
          <str name="replicationEnabled">false</str>
          <long name="replicatableGeneration">4</long>
        </lst>
      </lst>
      <str name="masterUrl">http://172.18.19.204:8080/solr</str>
      <str name="pollInterval">00:00:60</str>
      <str name="nextExecutionAt">Polling disabled</str>
      <str name="indexReplicatedAt">Thu Mar 14 12:18:00 CET 2013</str>
      <arr name="indexReplicatedAtList">
        <str>Thu Mar 14 12:18:00 CET 2013</str>
        <str>Thu Mar 14 12:17:00 CET 2013</str>
        <str>Fri Mar 08 14:55:00 CET 2013</str>
        <str>Fri Mar 08 14:50:52 CET 2013</str>
        <str>Fri Mar 08 14:32:00 CET 2013</str>
      </arr>
      <str name="timesIndexReplicated">5</str>
      <str name="lastCycleBytesDownloaded">23214</str>
      <str name="previousCycleTimeInSeconds">0</str>
      <str name="currentDate">Thu Mar 14 13:15:53 CET 2013</str>
      <str name="isPollingDisabled">true</str>
      <str name="isReplicating">false</str>
    </lst>
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>

Which one should be used? Is there any other way to monitor index version on master and slave? Best regards, Rafał Radecki. 2013/3/14 Rafał Radecki <radecki.ra...@gmail.com>: [...]
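For monitoring, the top-level indexVersion of the details response can be extracted programmatically rather than eyeballed. A hedged Python sketch using the standard library; the embedded XML is an abbreviated stand-in for the real handler output, not a verbatim capture:

```python
# Hedged sketch: pull the top-level <long name="indexVersion"> out of a
# /replication?command=details response. SAMPLE is an invented, abbreviated
# document shaped like the handler output quoted above.
import xml.etree.ElementTree as ET

SAMPLE = """<response>
  <lst name="details">
    <str name="isMaster">false</str>
    <str name="isSlave">true</str>
    <long name="indexVersion">1363259808632</long>
    <long name="generation">3</long>
  </lst>
</response>"""

def details_index_version(xml_text: str) -> int:
    """Return the searcher-level indexVersion (direct child of 'details')."""
    root = ET.fromstring(xml_text)
    details = root.find("lst[@name='details']")
    # deliberately take the direct child, not the per-commit values
    # nested inside <arr name="commits">
    return int(details.find("long[@name='indexVersion']").text)

version = details_index_version(SAMPLE)
```

Comparing this direct-child value on master and slave (rather than the per-commit entries) may be the more reliable check, since it reflects the index the searcher is actually serving.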
Re: SolrCloud with Zookeeper ensemble in production environment: SEVERE problems.
Hello! Thanks a lot, Erick! I've attached some stack traces taken during a normal 'engine' run. Cheers, - Luis Cappa

2013/3/13 Erick Erickson <erickerick...@gmail.com> wrote: Stack traces... First, jps -l — that will give you the process IDs of your running Java processes. Then: jstack <pid from above>. Usually I pipe the output from jstack into a text file... Best, Erick

On Wed, Mar 13, 2013 at 1:48 PM, Luis Cappa Banda <luisca...@gmail.com> wrote: Uhm, how can I do that... 'cleanly'? I know that with JConsole it's possible to output these traces, but with a .war application built on top of Spring I don't know how I can do that. In any case, here is my CloudSolrServer wrapper that is used by other classes. There is no synchronized method or piece of code:

public class BinaryLBHttpSolrServer extends LBHttpSolrServer {

    private static final long serialVersionUID = 3905956120804659445L;

    public BinaryLBHttpSolrServer(String[] endpoints) throws MalformedURLException {
        super(endpoints);
    }

    @Override
    protected HttpSolrServer makeServer(String server) throws MalformedURLException {
        HttpSolrServer solrServer = super.makeServer(server);
        solrServer.setRequestWriter(new BinaryRequestWriter());
        return solrServer;
    }
}

public class CloudSolrHttpServerImpl implements CloudSolrHttpServer {

    private CloudSolrServer cloudSolrServer;
    private Logger log = Logger.getLogger(CloudSolrHttpServerImpl.class);

    public CloudSolrHttpServerImpl(String zookeeperEndpoints, String[] endpoints,
            int clientTimeout, int connectTimeout, String cloudCollection) {
        try {
            BinaryLBHttpSolrServer lbSolrServer = new BinaryLBHttpSolrServer(endpoints);
            this.cloudSolrServer = new CloudSolrServer(zookeeperEndpoints, lbSolrServer);
            this.cloudSolrServer.setZkConnectTimeout(connectTimeout);
            this.cloudSolrServer.setZkClientTimeout(clientTimeout);
            this.cloudSolrServer.setDefaultCollection(cloudCollection);
        } catch (MalformedURLException e) {
            log.error(e);
        }
    }

    @Override
    public QueryResponse search(SolrQuery query) throws SolrServerException {
        return cloudSolrServer.query(query, METHOD.POST);
    }

    @Override
    public boolean index(DocumentBean user) {
        boolean indexed = false;
        int retries = 0;
        do {
            indexed = addBean(user);
            retries++;
        } while (!indexed && retries < 4);
        return indexed;
    }

    @Override
    public boolean update(SolrInputDocument updateDoc) {
        boolean update = false;
        int retries = 0;
        do {
            update = addSolrInputDocument(updateDoc);
            retries++;
        } while (!update && retries < 4);
        return update;
    }

    @Override
    public void commit() {
        try {
            cloudSolrServer.commit();
        } catch (SolrServerException e) {
            log.error(e);
        } catch (IOException e) {
            log.error(e);
        }
    }

    @Override
    public boolean delete(String... ids) {
        boolean deleted = false;
        List<String> idList = Arrays.asList(ids);
        try {
            this.cloudSolrServer.deleteById(idList);
            this.cloudSolrServer.commit(true, true);
            deleted = true;
        } catch (SolrServerException e) {
            log.error(e);
        } catch (IOException e) {
            log.error(e);
        }
        return deleted;
    }

    @Override
    public void optimize() {
        try {
            this.cloudSolrServer.optimize();
        } catch (SolrServerException e) {
            log.error(e);
        } catch (IOException e) {
            log.error(e);
        }
    }

    /* Getters & setters */

    public CloudSolrServer getSolrServer() {
        return cloudSolrServer;
    }

    public void setSolrServer(CloudSolrServer solrServer) {
        this.cloudSolrServer = solrServer;
    }

    private boolean addBean(DocumentBean user) {
        boolean added = false;   // note: 'added' is never set to true here, so index() always retries
        try {
            this.cloudSolrServer.addBean(user, 100);
            this.commit();
        } catch (IOException e) {
            log.error(e);
        } catch (SolrServerException e) {
            log.error(e);
        } catch (SolrException e) {
            log.error(e);
        }
        return added;
    }

    private boolean addSolrInputDocument(SolrInputDocument updateDoc) {
        boolean added = false;
        try {
            this.cloudSolrServer.add(updateDoc, 100);
            this.commit();
            added = true;
        } catch (IOException e) {
            log.error(e);
        } catch (SolrServerException e) {
            log.error(e);
        } catch (SolrException e) {
            log.error(e);
        }
        return added;
    }
}

Thank you very much, Mark. - Luis Cappa

2013/3/13 Mark Miller <markrmil...@gmail.com> wrote: Could you capture some thread stack traces in the 'engine' and see if there are any blocking methods? - Mark On Mar 13, 2013, at 1:34 PM, Luis Cappa Banda <luisca...@gmail.com> wrote: Just one
Re: Poll: Largest SolrCloud out there?
Does it only count if you are using SolrCloud? We are using a traditional master/slave setup with Solr 4.1, one master per 14 days:

Documents: ~15mio
Index size: ~150GB (stored fields)
# of masters: 30+
Performance: sucks big time until the caches catch up; unfortunately that takes quite some time.

Issues:
#1: Storage: to use SAN or not.
#2: Cores per instance: what is ideal?
#3: Size of cores: is 14 days optimal?
#4: Performance when searching across shards.
#5: Would SolrCloud be the solution for us?

Med venlig hilsen / Best Regards, Christian von Wendt-Jensen, IT Team Lead, Customer Solutions, Infopaq International A/S, Kgs. Nytorv 22, DK-1050 København K. Phone +45 36 99 00 00, Mobile +45 31 17 10 07, Email christian.sonne.jen...@infopaq.com, Web www.infopaq.com

DISCLAIMER: This e-mail and accompanying documents contain privileged confidential information. The information is intended only for the recipient(s) named. Any unauthorised disclosure, copying, distribution, exploitation or the taking of any action in reliance of the content of this e-mail is strictly prohibited. If you have received this e-mail in error we would be obliged if you would delete the e-mail and attachments and notify the dispatcher by return e-mail or at +45 36 99 00 00. Please consider the environment before printing this mail note.

From: Annette Newton <annette.new...@servicetick.com> Reply-To: solr-user@lucene.apache.org Date: Wed, 13 Mar 2013 15:49:34 +0100 To: solr-user@lucene.apache.org Subject: Re: Poll: Largest SolrCloud out there?

8 AWS hosts, 35GB memory per host, 10GB allocated to the JVM, 13 AWS compute units per instance. 4 shards, 2 replicas, 25M docs in total, 22.4GB index per shard. High writes, low reads.

On 13 March 2013 09:12, adm1n <evgeni.evg...@gmail.com> wrote: 4 AWS hosts. Memory: 30822868k total. CPU: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz x8. 17M docs, 5GB index, 8 master-slave shards (2 shards/host). 57 msec/query avg. time (~110K queries/24 hours). -- View this message in context: http://lucene.472066.n3.nabble.com/Poll-Largest-SolrCloud-out-there-tp4043293p4046915.html

-- Annette Newton, Database Administrator, ServiceTick Ltd. T: +44(0)1603 618326. Seebohm House, 2-4 Queen Street, Norwich, England NR2 4SQ. www.servicetick.com, www.sessioncam.com

This message is confidential and is intended to be read solely by the addressee. The contents should not be disclosed to any other person or copies taken unless authorised to do so. If you are not the intended recipient, please notify the sender and permanently delete this message. As Internet communications are not secure ServiceTick accepts neither legal responsibility for the contents of this message nor responsibility for any change made to this message after it was forwarded by the original author.
Advice: solrCloud + DIH
Hello, I need some advice with my SolrCloud cluster and the DIH. I have a cluster with 3 cloud servers. Every server has a Solr instance and a ZooKeeper instance, and I start it with the -Dzkhost parameter. It works great, and I send updates by curl (XML) like this: curl http://ip:SOLRport/solr/update -H 'Content-Type: text/xml' --data-binary '<add><doc><field name="id">223232</field><field name="content">test</field></doc></add>'. Solr has 2 million docs in the index. Now I want an extra field: content2. I add this in my schema and upload it again to the cluster with -Dbootstrap_confdir and -Dcollection.configName; it's replicated to the whole cluster. Now I need a re-index to add the field to every doc. I have a database with all the data and want to use the full-import of the DIH (this was the way I did it in previous Solr versions). When I run this it goes at 3 docs/s (really slow). When I run Solr alone (not SolrCloud) it goes at 600 docs/s. What's the best way to do a full re-index with SolrCloud? Does SolrCloud support the DIH? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Advice-solrCloud-DIH-tp4047339.html
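One option alongside the DIH is a custom reindex script that batches many documents into a single update request instead of one curl per document, which generally improves indexing throughput. A hedged Python sketch of building such a batched XML payload — the field values and endpoint are invented for illustration:

```python
# Hedged sketch: build one <add> payload containing several <doc> elements,
# to be POSTed to /solr/update in a single request. The sample ids/values
# are assumptions, not data from the thread.
import xml.etree.ElementTree as ET

def build_add_xml(docs: list) -> bytes:
    """Serialize a list of {field_name: value} dicts into a Solr <add> body."""
    add = ET.Element("add")
    for d in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in d.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add)

payload = build_add_xml([
    {"id": "223232", "content": "test"},
    {"id": "223233", "content": "test2"},
])
# POST payload to http://ip:SOLRport/solr/update with
# Content-Type: text/xml, e.g. via urllib.request or curl --data-binary.
```

Batch sizes of a few hundred to a few thousand documents per request are a common starting point; the optimal size depends on document size and cluster configuration.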
Re: Poll: Largest SolrCloud out there?
Christian, SSDs will warm up muuuch faster. Your other questions require more info / discussion. Otis -- Solr & ElasticSearch Support http://sematext.com/ On Mar 14, 2013 8:47 AM, Christian von Wendt-Jensen christian.vonwendt-jen...@infopaq.com wrote: Does it only count if you are using SolrCloud? We are using a traditional master/slave setup with Solr 4.1: 1 master per 14 days: Documents: ~15M Index size: ~150GB (stored fields) # of masters: 30+ Performance: SUCKS big time until the caches catch up. Unfortunately that takes quite some time. Issues: #1: Storage: to use SAN or not. #2: Cores per instance: what is ideal? #3: Size of cores: is 14 days optimal? #4: Performance when searching across shards. #5: Would SolrCloud be the solution for us? Med venlig hilsen / Best Regards Christian von Wendt-Jensen IT Team Lead, Customer Solutions Infopaq International A/S Kgs. Nytorv 22 DK-1050 København K Phone: +45 36 99 00 00 Mobile: +45 31 17 10 07 Email: christian.sonne.jen...@infopaq.com Web: www.infopaq.com DISCLAIMER: This e-mail and accompanying documents contain privileged confidential information. The information is intended only for the recipient(s) named. Any unauthorised disclosure, copying, distribution, exploitation or the taking of any action in reliance of the content of this e-mail is strictly prohibited. If you have received this e-mail in error we would be obliged if you would delete the e-mail and attachments and notify the dispatcher by return e-mail or at +45 36 99 00 00. Please consider the environment before printing this mail note.
From: Annette Newton annette.new...@servicetick.com Reply-To: solr-user@lucene.apache.org Date: Wed, 13 Mar 2013 15:49:34 +0100 To: solr-user@lucene.apache.org Subject: Re: Poll: Largest SolrCloud out there? 8 AWS hosts. 35GB memory per host, 10GB allocated to the JVM, 13 AWS compute units per instance. 4 shards, 2 replicas. 25M docs in total, 22.4GB index per shard. High writes, low reads. On 13 March 2013 09:12, adm1n evgeni.evg...@gmail.com wrote: 4 AWS hosts: Memory: 30822868k total CPU: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz x8 17M docs, 5GB index. 8 master-slave shards (2 shards/host). 57 msec/query avg. time (~110K queries/24 hours). -- View this message in context: http://lucene.472066.n3.nabble.com/Poll-Largest-SolrCloud-out-there-tp4043293p4046915.html Sent from the Solr - User mailing list archive at Nabble.com. -- Annette Newton Database Administrator ServiceTick Ltd T: +44 (0)1603 618326 Seebohm House, 2-4 Queen Street, Norwich, England NR2 4SQ www.servicetick.com www.sessioncam.com -- This message is confidential and is intended to be read solely by the addressee. The contents should not be disclosed to any other person or copies taken unless authorised to do so. If you are not the intended recipient, please notify the sender and permanently delete this message. As Internet communications are not secure ServiceTick accepts neither legal responsibility for the contents of this message nor responsibility for any change made to this message after it was forwarded by the original author.
Re: OutOfMemoryError
On Thu, 2013-03-14 at 13:10 +0100, Arkadi Colson wrote: When I shutdown tomcat free -m and top keeps telling me the same values. Almost no free memory... Any idea?

Are you reading top & free right? It is standard behaviour for most modern operating systems to have very little free memory. As long as the sum of free memory and cache is high, everything is fine. Looking at the stats you gave previously we have

*top*
  PID USER PR NI VIRT  RES  SHR  S %CPU %MEM TIME+
13666 root 20  0 86.8g 4.7g 248m S  101 39.7 478:37.45

4.7GB physical memory used and ~80GB used for memory mapping the index.

*free -m*
             total  used  free shared buffers cached
Mem:         12047 11942   105      0     180   6363
-/+ buffers/cache:  5399  6648
Swap:          956    75   881

So 6648MB used for either general disk cache or memory mapping the index. This really translates to 6648MB (plus the 105MB above) of available memory, as any application asking for memory will get it immediately from that pool (sorry if this is basic stuff for you).

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.OutOfMemoryError at java.util.zip.ZipFile.open(Native Method) at java.util.zip.ZipFile.<init>(ZipFile.java:127) at java.util.zip.ZipFile.<init>(ZipFile.java:144) at org.apache.poi.openxml4j.opc.internal.ZipHelper.openZipFile(ZipHelper.java:157) [...] Java HotSpot(TM) 64-Bit Server VM warning: Attempt to allocate stack guard pages failed. mmap failed for CEN and END part of zip file

A quick search shows that other people have had problems with ZipFile in at least some sub-versions of Java 1.7. However, another very common cause for OOM with memory mapping is that the limit for allocating virtual memory is too low. Try doing a ulimit -v on the machine. If the number is somewhere around 100000000 (100GB), Lucene's memory mapping of your index (the 80GB) plus the ZipFile's memory mapping plus other processes might hit the ceiling. If that is the case, simply raise the limit. 
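The ulimit check above can be scripted; a minimal sketch, assuming a POSIX shell (the decision of what value to raise the cap to is yours, and making it permanent is system-specific):

```shell
# Inspect the per-process virtual memory cap (reported in kB) that
# Lucene's memory-mapped index files run under.
limit=$(ulimit -v)
echo "ulimit -v: $limit"
if [ "$limit" = "unlimited" ]; then
  echo "no virtual memory cap - mmap is not the bottleneck here"
else
  # Raising the cap only sticks for this shell and its children; make it
  # permanent via /etc/security/limits.conf or the script that starts Tomcat.
  ulimit -v unlimited 2>/dev/null || echo "could not raise cap in this shell"
fi
```

Run it in the same shell that launches Tomcat, since resource limits are inherited by child processes.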
- Toke
Re: Solr 4.1 monitoring with /solr/replication?command=details - indexVersion?
On Mar 14, 2013, at 8:10 AM, Rafał Radecki radecki.ra...@gmail.com wrote: Is this a bug? Yes, 4.1 had some replication issues just like the ones you describe here. They should all be fixed in 4.2, which is available now and is a simple upgrade. - Mark
Re: Advice: solrCloud + DIH
On Mar 14, 2013, at 9:22 AM, roySolr royrutten1...@gmail.com wrote: Hello, When i run this it goes with 3 doc/s(Really slow). When i run solr alone(not solrcloud) it goes 600 docs/sec. What's the best way to do a full re-index with solrcloud? Does solrcloud support DIH? Thanks

SolrCloud supports DIH, but not fully and happily. It's set up to work pretty nicely without SolrCloud - it will load pretty quickly. With SolrCloud a few things can happen - one is that you might be running DIH on a replica rather than a leader, and that can change without your consent; in that case all docs will go to another node and then come back. SolrCloud also works best with multiple threads, and DIH will only use one to my knowledge. Still, at 3 docs/s, something sounds wrong. That's too slow. - Mark
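One workaround in the spirit of the multi-threading point above is to bypass DIH for the re-index: dump the database rows to plain update files and post them in parallel over HTTP. A rough sketch with placeholder host, collection, and batch paths; this is just parallel curl, not a DIH feature:

```shell
# post pre-generated XML update batches with 4 parallel curl workers,
# then issue a single commit at the end (host/collection are placeholders)
ls batches/*.xml | xargs -P 4 -I{} \
  curl -s 'http://localhost:8983/solr/collection1/update' \
       -H 'Content-Type: text/xml' --data-binary @{}
curl -s 'http://localhost:8983/solr/collection1/update?commit=true'
```

Posting to the collection's update handler lets SolrCloud route each document to its leader, and multiple workers keep all shards busy at once.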
Re: OutOfMemoryError
On 03/14/2013 03:11 PM, Toke Eskildsen wrote: [...] A quick search shows that other people have had problems with ZipFile in at least some sub-versions of Java 1.7.

We do not index zip files, so that could not be the cause.

[...] Try doing a ulimit -v on the machine. If the number is too low [...] simply raise the limit. - Toke

ulimit -v shows me unlimited. I decreased the hard commit time to 10 seconds and set ramBufferSizeMB to 250. Hope this helps... Will keep you informed! Thanks for the explanation!
Replication
What makes Solr replicate the whole shard again from scratch? From time to time, after a restart of Tomcat, Solr copies the whole shard over to the replica instead of transferring only the changes. BR, Arkadi
Question about email search
I'm using Solr 3.6.2 to index data crawled with Nutch. In my schema I have one field with all the content extracted from the page, which could possibly include email addresses. This is the configuration in my schema:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" languange="Spanish"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

The thing is that I'm trying to search against a field of this type (text) with a value like @gmail.com, and I'd expect to get documents containing that text. Any advice? slds -- It is only in the mysterious equation of love that any logical reasons can be found. Good programmers often confuse halloween (31 OCT) with christmas (25 DEC)
Re: Solr 4.1 monitoring with /solr/replication?command=details - indexVersion?
I believe this is the same issue as described. I'm running 4.2 and, as you can see, my slave is a couple of versions ahead of the master (all three slaves show the same behavior). This was never the case until I upgraded from 4.0 to 4.2. Master: 1363272681951 / 93 / 1,022.31 MB Slave: 1363273274085 / 95 / 1,022.31 MB -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-1-monitoring-with-solr-replication-command-details-indexVersion-tp4047329p4047380.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Question about email search
Hi, Since you have the word delimiter filter in your analysis chain, I am not sure if e-mail addresses survive as one token. You can check that on the Solr admin UI, analysis page. If e-mail addresses are kept as one token, I would use a leading wildcard query: q=*@gmail.com There was a similar question recently: http://search-lucene.com/m/XF2ejnM6Vi2 --- On Thu, 3/14/13, Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote: From: Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu Subject: Question about email search To: solr-user@lucene.apache.org Date: Thursday, March 14, 2013, 5:11 PM [...]
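If the leading-wildcard route proves too slow, another option is to index the content in a companion field whose tokenizer keeps e-mail addresses whole. A sketch of such a field type, assuming solr.UAX29URLEmailTokenizerFactory is available in your 3.6.2 install (it emits URLs and e-mail addresses as single tokens); the field type name here is made up:

```xml
<!-- hypothetical companion type: e-mails and URLs stay single tokens -->
<fieldType name="text_email" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a copyField of the page content into a field of this type, an address like john@gmail.com survives as one token, so a query such as *@gmail.com can match it directly.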
Strange error in Solr 4.2
Hi, We have been using Solr 4.0 for a while now and wanted to upgrade to 4.2, but our application stopped working. When we tried 4.1 it was working as expected. Here is a description of the situation: We deploy a Solr web application under Java 7 on a Glassfish 3.1.2.2 server. We added some classes to the standard Solr webapp which listen to a JMS service and update the index according to the message content (e.g. "fetch the document with this id from that URL and add it to the index"). The documents are fetched via SSL from a repository server. This has been working well since Solr 1.2, for about 6 years now. With Solr 4.2 we suddenly get the following error: javax.ejb.CreateException: Initialization failed for Singleton IndexMessageClientFactory at com.sun.ejb.containers.AbstractSingletonContainer.createSingletonEJB(AbstractSingletonContainer.java:547) ... Caused by: org.apache.http.conn.ssl.SSLInitializationException: Failure initializing default system SSL context at org.apache.http.conn.ssl.SSLSocketFactory.createSystemSSLContext(SSLSocketFactory.java:368) at org.apache.http.conn.ssl.SSLSocketFactory.getSystemSocketFactory(SSLSocketFactory.java:204) at org.apache.http.impl.conn.SchemeRegistryFactory.createSystemDefault(SchemeRegistryFactory.java:82) at org.apache.http.impl.client.SystemDefaultHttpClient.createClientConnectionManager(SystemDefaultHttpClient.java:118) at org.apache.http.impl.client.AbstractHttpClient.getConnectionManager(AbstractHttpClient.java:466) at org.apache.solr.client.solrj.impl.HttpClientUtil.setMaxConnections(HttpClientUtil.java:179) at org.apache.solr.client.solrj.impl.HttpClientConfigurer.configure(HttpClientConfigurer.java:33) at org.apache.solr.client.solrj.impl.HttpClientUtil.configureClient(HttpClientUtil.java:115) at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:105) at org.apache.solr.client.solrj.impl.HttpSolrServer.init(HttpSolrServer.java:155) at 
org.apache.solr.client.solrj.impl.HttpSolrServer.init(HttpSolrServer.java:132) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.init(ConcurrentUpdateSolrServer.java:101) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.init(ConcurrentUpdateSolrServer.java:93) at diva.commons.search.cdi.SolrServerFactory.init(SolrServerFactory.java:56) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at com.sun.ejb.containers.interceptors.BeanCallbackInterceptor.intercept(InterceptorManager.java:1009) at com.sun.ejb.containers.interceptors.CallbackChainImpl.invokeNext(CallbackChainImpl.java:65) at com.sun.ejb.containers.interceptors.CallbackInvocationContext.proceed(CallbackInvocationContext.java:113) at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.doCallback(SystemInterceptorProxy.java:138) at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.init(SystemInterceptorProxy.java:120) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at com.sun.ejb.containers.interceptors.CallbackInterceptor.intercept(InterceptorManager.java:964) at com.sun.ejb.containers.interceptors.CallbackChainImpl.invokeNext(CallbackChainImpl.java:65) at com.sun.ejb.containers.interceptors.InterceptorManager.intercept(InterceptorManager.java:393) at com.sun.ejb.containers.interceptors.InterceptorManager.intercept(InterceptorManager.java:376) at com.sun.ejb.containers.AbstractSingletonContainer.createSingletonEJB(AbstractSingletonContainer.java:538) ... 
103 more Caused by: java.io.IOException: Keystore was tampered with, or password was incorrect at sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:772) at sun.security.provider.JavaKeyStore$JKS.engineLoad(JavaKeyStore.java:55) at java.security.KeyStore.load(KeyStore.java:1214) at org.apache.http.conn.ssl.SSLSocketFactory.createSystemSSLContext(SSLSocketFactory.java:281) at org.apache.http.conn.ssl.SSLSocketFactory.createSystemSSLContext(SSLSocketFactory.java:366) ... 134 more Caused by: java.security.UnrecoverableKeyException: Password verification failed at sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:770) This exception occurs in this part new
Re: Solr 4.1 monitoring with /solr/replication?command=details - indexVersion?
What calls are you using to get the versions? Or is it the admin UI? Also, can you add any details about your setup - if this is a problem, we need to duplicate it in one of our unit tests. Also, is it affecting proper replication in any way that you can tell? - Mark On Mar 14, 2013, at 11:12 AM, richardg richa...@dvdempire.com wrote: [...]
Re: Strange error in Solr 4.2
Perhaps as a result of https://issues.apache.org/jira/browse/SOLR-4451 ? Just a guess. The root cause looks to be: Caused by: java.io.IOException: Keystore was tampered with, or password was incorrect - Mark On Mar 14, 2013, at 11:24 AM, Uwe Klosa uwe.kl...@gmail.com wrote: [...]
need general advice on how others version and mange core deployments over time
hello everyone, i know this is a general topic - but i would really appreciate info from others that are doing this now. - how are others managing core deployments so that users are impacted the least? - how are others handling the scenario where users don't want to migrate forward? thx mark -- View this message in context: http://lucene.472066.n3.nabble.com/need-general-advice-on-how-others-version-and-mange-core-deployments-over-time-tp4047390.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Strange error in Solr 4.2
Thanks, but nobody has tampered with any keystore. I have tested the application on different machines; the same exception is always thrown. Do we have to set some system property to fix this? /Uwe On 14 March 2013 16:36, Mark Miller markrmil...@gmail.com wrote: Perhaps as a result of https://issues.apache.org/jira/browse/SOLR-4451 ? Just a guess. The root cause looks to be: Caused by: java.io.IOException: Keystore was tampered with, or password was incorrect - Mark On Mar 14, 2013, at 11:24 AM, Uwe Klosa uwe.kl...@gmail.com wrote: [...]
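For what it's worth, the stack trace points at HttpClient's system-default SSL context (SSLSocketFactory.createSystemSSLContext via SystemDefaultHttpClient), which is initialized from the standard javax.net.ssl system properties. If any of those are set in the container but stale - wrong password, moved or unreadable keystore file - KeyStore.load fails exactly like this. A hedged sketch of the JVM flags worth auditing; paths and passwords are placeholders, not values from this thread:

```
# placeholders only: point these at a keystore/truststore the JVM can open,
# or unset them if the container defines them with stale values
-Djavax.net.ssl.keyStore=/path/to/keystore.jks
-Djavax.net.ssl.keyStorePassword=changeit
-Djavax.net.ssl.trustStore=/path/to/truststore.jks
-Djavax.net.ssl.trustStorePassword=changeit
```

In Glassfish these would live in the JVM options of the server config; the point is that SolrJ in 4.2 now reads them where earlier versions did not.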
Handling a closed IndexWriter in SOLR 4.0
Hey all, We're using a Solr 4 core to handle our article data. When someone in our CMS publishes an article, we have a listener that indexes it straight to solr. We use the previously instantiated HttpSolrServer, build the solr document, add it with server.add(doc) .. then do a server.commit() right away. For some reason, sometimes this exception is thrown, which I suspect is related to a simultaneous data import done from another client which sometimes errors: Feb 26, 2013 5:07:51 PM org.apache.solr.common.SolrException log SEVERE: null:org.apache.solr.common.SolrException: Error opening new searcher at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1310) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1422) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1200) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:560) at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:87) at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64) at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1007) at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157) at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:999) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:565) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:309) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:679) Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:550) at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:563) at org.apache.lucene.index.IndexWriter.nrtIsCurrent(IndexWriter.java:4196) at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:266) at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:245) at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:235) at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:169) at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1256) ... 
28 more I'm not sure if the error is causing the IndexWriter to close, or why an IndexWriter would be shared across clients, but usually I can get around this by creating a new HttpSolrServer and trying again. It doesn't always work, though, perhaps due to frequency… I don't like the idea of an infinite loop of creating connections until one works. I'd rather understand what's going on. What's the proper way to fix this? I see I can add a doc with a commitWithinMs of 0, and maybe this couples the add tightly with the commit and would prevent interference. But am I totally off the mark here as to the problem? Suggestions? Posted this on java-user before, but then realized solr-user existed, so please forgive the redundancy… Thanks for reading! - Scott
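For concreteness, the commitWithin variant being considered would look something like this on the wire (a sketch only; the core name "articles" and the 1000 ms window are placeholders, not from the original setup):

```shell
# Assumption: a local Solr instance with a core named "articles".
# commitWithin asks Solr itself to commit within the given number of
# milliseconds, so no client issues an explicit commit and concurrent
# clients stop racing each other to open new searchers.
curl "http://localhost:8983/solr/articles/update?commitWithin=1000" \
     -H "Content-Type: text/xml" \
     --data-binary '<add><doc><field name="id">article-1</field></doc></add>'
```

In SolrJ the equivalent is server.add(doc, 1000), where the second argument is the commitWithin window in milliseconds.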
Re: Strange error in Solr 4.2
I found the answer myself. Thanks for the pointer. Cheers Uwe On 14 March 2013 16:48, Uwe Klosa uwe.kl...@gmail.com wrote: Thanks, but nobody has tampered with keystores. I have tested the application on different machines. Always the same exception is thrown. Do we have to set some system property to fix this? /Uwe On 14 March 2013 16:36, Mark Miller markrmil...@gmail.com wrote: Perhaps as a result of https://issues.apache.org/jira/browse/SOLR-4451 ? Just a guess. The root cause looks to be: Caused by: java.io.IOException: Keystore was tampered with, or password was incorrect - Mark On Mar 14, 2013, at 11:24 AM, Uwe Klosa uwe.kl...@gmail.com wrote: Hi We have been using Solr 4.0 for a while now and wanted to upgrade to 4.2, but our application stopped working. When we tried 4.1 it worked as expected. Here is a description of the situation. We deploy a Solr web application under Java 7 on a Glassfish 3.1.2.2 server. We added some classes to the standard Solr webapp which listen to a JMS service and update the index according to the message content, which can be "fetch the document with this id from that URL and add it to the index". The documents are fetched via SSL from a repository server. This has been working well since Solr 1.2, for about 6 years now. With Solr 4.2 we suddenly get the following error: javax.ejb.CreateException: Initialization failed for Singleton IndexMessageClientFactory at com.sun.ejb.containers.AbstractSingletonContainer.createSingletonEJB(AbstractSingletonContainer.java:547) ... 
Caused by: org.apache.http.conn.ssl.SSLInitializationException: Failure initializing default system SSL context at org.apache.http.conn.ssl.SSLSocketFactory.createSystemSSLContext(SSLSocketFactory.java:368) at org.apache.http.conn.ssl.SSLSocketFactory.getSystemSocketFactory(SSLSocketFactory.java:204) at org.apache.http.impl.conn.SchemeRegistryFactory.createSystemDefault(SchemeRegistryFactory.java:82) at org.apache.http.impl.client.SystemDefaultHttpClient.createClientConnectionManager(SystemDefaultHttpClient.java:118) at org.apache.http.impl.client.AbstractHttpClient.getConnectionManager(AbstractHttpClient.java:466) at org.apache.solr.client.solrj.impl.HttpClientUtil.setMaxConnections(HttpClientUtil.java:179) at org.apache.solr.client.solrj.impl.HttpClientConfigurer.configure(HttpClientConfigurer.java:33) at org.apache.solr.client.solrj.impl.HttpClientUtil.configureClient(HttpClientUtil.java:115) at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:105) at org.apache.solr.client.solrj.impl.HttpSolrServer.init(HttpSolrServer.java:155) at org.apache.solr.client.solrj.impl.HttpSolrServer.init(HttpSolrServer.java:132) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.init(ConcurrentUpdateSolrServer.java:101) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.init(ConcurrentUpdateSolrServer.java:93) at diva.commons.search.cdi.SolrServerFactory.init(SolrServerFactory.java:56) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at com.sun.ejb.containers.interceptors.BeanCallbackInterceptor.intercept(InterceptorManager.java:1009) at com.sun.ejb.containers.interceptors.CallbackChainImpl.invokeNext(CallbackChainImpl.java:65) at 
com.sun.ejb.containers.interceptors.CallbackInvocationContext.proceed(CallbackInvocationContext.java:113) at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.doCallback(SystemInterceptorProxy.java:138) at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.init(SystemInterceptorProxy.java:120) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at com.sun.ejb.containers.interceptors.CallbackInterceptor.intercept(InterceptorManager.java:964) at com.sun.ejb.containers.interceptors.CallbackChainImpl.invokeNext(CallbackChainImpl.java:65) at com.sun.ejb.containers.interceptors.InterceptorManager.intercept(InterceptorManager.java:393) at com.sun.ejb.containers.interceptors.InterceptorManager.intercept(InterceptorManager.java:376) at
Re: Replication
Hi Arkadi, If the update delta between the shard leader and replica exceeds 100 docs, then Solr punts and replicates the entire index. Last I heard, the 100 was hard-coded in 4.0, so it is not configurable. This makes sense because the replica shouldn't be out of sync with the leader unless it has been offline. Cheers, Tim On Thu, Mar 14, 2013 at 9:05 AM, Arkadi Colson ark...@smartbit.be wrote: Based on what does solr replicate the whole shard again from zero? From time to time after a restart of tomcat solr copies over the whole shard to the replica instead of doing only the changes. BR, Arkadi
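For reference, later releases eventually made this peer-sync window configurable on the transaction log. A sketch of what that solrconfig.xml fragment looks like (the value 500 is an arbitrary example, and availability depends on your exact Solr version):

```xml
<!-- Inside <updateHandler> in solrconfig.xml. numRecordsToKeep widens the
     update-log window used for peer sync; a replica that has fallen further
     behind than this falls back to full index replication. -->
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">500</int>
</updateLog>
```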
Out of Memory doing a query Solr 4.2
Hi After doing a query to Solr to get the uniqueIds (strings of 20 characters) of 700 documents in a collection, I'm getting an out of memory error using Solr 4.2. I tried to increase the JVM memory by 1G (from 3G to 4G), however this didn't change anything. This was working on 3.5. I've moved from 3.5 to 4.2. Did anyone have the same problem? Thanks -- Details: Solr 4.2 Solr Index 20G approx. JVM: IBM J9 VM(1.6.0.2.4) JVM-Memory: 4G OS: Linux Processors: 8 RAM: 101G org.apache.solr.common.SolrException log SEVERE: null:java.lang.RuntimeException: java.lang.OutOfMemoryError at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:651) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:364) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:164) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164) at org.apache.catalina.ha.session.JvmRouteBinderValve.invoke(JvmRouteBinderValve.java:218) at org.apache.catalina.ha.tcp.ReplicationValve.invoke(ReplicationValve.java:333) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:394) at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:284) at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:322) at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1714) at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:898) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:920) at java.lang.Thread.run(Thread.java:736) Caused by: java.lang.OutOfMemoryError at java.util.Arrays.copyOfRange(Arrays.java:4114) at java.util.Arrays.copyOf(Arrays.java:3833) at java.lang.StringCoding.safeTrim(StringCoding.java:686) at java.lang.StringCoding.access$300(StringCoding.java:41) at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:739) at java.lang.StringCoding.decode(StringCoding.java:746) at java.lang.String.init(String.java:2036) at java.lang.String.init(String.java:2011) at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.readField(CompressingStoredFieldsReader.java:143) at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:272) at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:139) at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:116) at org.apache.lucene.index.IndexReader.document(IndexReader.java:436) at org.apache.lucene.document.LazyDocument.getDocument(LazyDocument.java:65) at org.apache.lucene.document.LazyDocument.access$000(LazyDocument.java:36) at org.apache.lucene.document.LazyDocument$LazyField.stringValue(LazyDocument.java:105) at org.apache.solr.schema.FieldType.toExternal(FieldType.java:346) at org.apache.solr.schema.FieldType.toObject(FieldType.java:355) at org.apache.solr.response.BinaryResponseWriter$Resolver.getValue(BinaryResponseWriter.java:208) at org.apache.solr.response.BinaryResponseWriter$Resolver.getDoc(BinaryResponseWriter.java:186) at org.apache.solr.response.BinaryResponseWriter$Resolver.writeResultsBody(BinaryResponseWriter.java:147) at org.apache.solr.response.BinaryResponseWriter$Resolver.writeResults(BinaryResponseWriter.java:173) at 
org.apache.solr.response.BinaryResponseWriter$Resolver.resolve(BinaryResponseWriter.java:86) at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:154) at org.apache.solr.common.util.JavaBinCodec.writeNamedList(JavaBinCodec.java:144) at org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:234) at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:149) at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:92) at
ids request to shard with star query are slow
I have a distributed Solr environment and I am investigating all the requests where a shard took a significant amount of time. One common pattern I saw was that all the ids requests with q=*:* and ids=<some id> took around 2-3 sec. I picked some shard requests with q=xyz and ids=<some id> and all of them took only a few milliseconds. I copied the params and manually sent the same request to that particular shard and again it took around 2.5 sec. But when I removed the query (q=*:*) parameter and sent the same set of params to the same shard I got the response back in about 10 milliseconds. In both cases the response had the document I am looking for. took 2-3 sec - q=*:* qt=search ids=123 isShard=true took 20ms - qt=search ids=123 isShard=true In my understanding the ids param is used to get the stored fields in a distributed search. Why does the query parameter (q=) matter here? Thanks Srini -- View this message in context: http://lucene.472066.n3.nabble.com/ids-request-to-shard-with-star-query-are-slow-tp4047395.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Strange error in Solr 4.2
On Thursday, March 14, 2013 at 4:57 PM, Uwe Klosa wrote: I found the answer myself. Thanks for the pointer. Would you mind sharing your answer, Uwe?
Re: Out of Memory doing a query Solr 4.2
On Thu, Mar 14, 2013 at 12:07 PM, raulgrande83 raulgrand...@hotmail.com wrote: JVM: IBM J9 VM(1.6.0.2.4) I don't recommend using this JVM.
Re: Strange error in Solr 4.2
On 3/14/2013 9:24 AM, Uwe Klosa wrote: This exception occurs in this part: new ConcurrentUpdateSolrServer("http://solr.diva-portal.org:8080/search", 5, 50) Side comment, unrelated to your question: If you're already aware that ConcurrentUpdateSolrServer has no built-in error handling and you're OK with that, then you don't need to be concerned with this message. ConcurrentUpdateSolrServer swallows any exception that happens during its operation. Errors get logged, but are not passed back to the calling application. Update requests always succeed, even if Solr is completely down. I have been told that it is possible to override the handleError method to fix this, but I don't know what code to actually use. Thanks, Shawn
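For anyone looking for the shape of such an override, here is a rough, untested sketch against the SolrJ 4.x API. The URL, queue size (5), and thread count (50) are placeholders, and what you do inside handleError (log, count, re-queue) is up to the application:

```java
import java.util.concurrent.atomic.AtomicReference;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;

// Sketch: subclass ConcurrentUpdateSolrServer so failures become visible
// to the indexing code instead of being silently swallowed.
final AtomicReference<Throwable> lastError = new AtomicReference<Throwable>();
ConcurrentUpdateSolrServer server =
    new ConcurrentUpdateSolrServer("http://localhost:8080/solr/core1", 5, 50) {
      @Override
      public void handleError(Throwable ex) {
        // remember the failure; the caller checks lastError after a batch
        lastError.set(ex);
      }
    };
```

As Mark notes in the follow-up, this mitigates rather than fixes the problem: the caller still has to poll for the recorded error and decide how to retry.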
Re: Strange error in Solr 4.2
On Mar 14, 2013, at 1:27 PM, Shawn Heisey s...@elyograg.org wrote: I have been told that it is possible to override the handleError method to fix this I'd say mitigate more than fix. I think the real fix requires some dev work. - Mark
Re: OutOfMemoryError
On 3/14/2013 3:35 AM, Arkadi Colson wrote: Hi I'm getting this error after a few hours of filling solr with documents. Tomcat is running with -Xms1024m -Xmx4096m. Total memory of host is 12GB. Softcommits are done every second and hard commits every minute. Any idea why this is happening and how to avoid this? *top* PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 13666 root 20 0 86.8g 4.7g 248m S 101 39.7 478:37.45 /usr/bin/java -Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties -server -Xms1024m -Xmx4096m -XX:PermSize=64m -XX:MaxPermSize=128m -Duser.timezone=UTC -Dfile.encoding=UTF8 -Dsolr.solr.home=/opt/solr/ -Dport=8983 -Dcollection.configName 22247 root 20 0 2430m 409m 4176 S 0 3.4 1:23.43 java -Dzookeeper.log.dir=. -Dzookeeper.root.logger=INFO,CONSOLE -cp /opt/zookeeper/bin/../build/classes:/opt/zookeeper/bin/../build/lib/*.jar:/opt/zookeeper/bi *free -m* total used free shared buffers cached Mem: 12047 11942 105 0 180 6363 -/+ buffers/cache: 5399 6648 Swap: 956 75 881 As you've already been told, this looks like you have about 80GB of index. I ran into Out Of Memory problems with heavy indexing with a 4GB heap on a total index size just a little bit smaller than this. I had to increase the heap size to 8GB. With heap sizes this large, you'll see garbage collection pause problems without careful tuning. You're probably already having these problems with the 4GB heap, but they'll get much worse with an 8GB heap. Here are the memory options I'm using that got rid of my GC pause problem. I'm using these with the Sun/Oracle JVM, on both 1.6 and 1.7: -Xmx8192M -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:NewRatio=3 -XX:MaxTenuringThreshold=8 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts I notice that you've got options that change the PermSize and MaxPermSize. You probably don't need these options, unless you know that you'll run into problems without them. 
Additional note: if you have greatly increased ramBufferSizeMB, try reducing it to 100, the default on recent versions. The default used to be 32. Either amount is usually plenty, unless you have huge documents. Side comment: 12GB total RAM isn't going to be enough memory for top performance with 80GB of index. You'll probably need 8GB of Java heap, plus between 40 and 80GB of memory for the OS disk cache, to fit a large chunk (or all) of your index into RAM. 48GB would be a good start; 64 to 128GB would be better. Thanks, Shawn
Meaning of Current in Solr Cloud Statistics
Hi everyone, Is there an official definition of the Current flag under Core Home Statistics? What would it mean if a shard leader is not Current? Thanks, Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game
Solr 4.2 mechanism proxy request error
Hi, I think that in Solr 4.2 the new feature that proxies a request when the collection is not on the requested node has a bug. If I do a query with the parameter rows=0 against a node that doesn't have the collection, the request fails. If the parameter is rows=4 or higher, then the search works as expected and curl returns the results. The output of wget is: Connecting to 192.168.20.48:8983... connected. HTTP request sent, awaiting response... 200 OK Length: 210 [application/xml] Saving to: ‘select?q=*:*&rows=0’ 0% [ ] 0 --.-K/s in 0s 2013-03-14 18:01:04 (0.00 B/s) - Connection closed at byte 0. Retrying. Curl says: curl "http://192.168.20.48:8983/solr/ST-3A856BBCA3_12/select?q=*%3A*&rows=0" curl: (56) Problem (2) in the Chunked-Encoded data Chrome says: This webpage is not available The webpage at http://192.168.20.48:8983/solr/ST-3A856BBCA3_12/select?q=*%3A*&rows=0&wt=xml&indent=true might be temporarily down or it may have moved permanently to a new web address. Error 321 (net::ERR_INVALID_CHUNKED_ENCODING): Unknown error. Does anyone have the same issue? - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-2-mechanism-proxy-request-error-tp4047433.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.2 mechanism proxy request error
I'll add a test with rows=0 and see how easy it is to replicate. Looks to me like you should file a JIRA issue in any case. - Mark
Re: Solr 4.2 mechanism proxy request error
The log of the UI null:org.apache.solr.common.SolrException: Error trying to proxy request for url: http://192.168.20.47:8983/solr/ST-3A856BBCA3_12/select I will open the issue in Jira. Thanks - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-2-mechanism-proxy-request-error-tp4047433p4047440.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Version conflict during data import from another Solr instance into clean Solr
: It looks strange to me that if there is no document yet (foundVersion < 0) : then the only case when the document will be imported is when the input version is : negative. Guess I need to test specific cases using SolrJ or smth. to be sure. You're assuming that foundVersion < 0 means no document *yet* ... it could also mean there was a document, and it's been deleted. Either way, if the client has said "(replace|update) version X of doc D", the code is failing because it can't: doc D does not exist with version X. Regardless of whether someone deleted doc D, or replaced it with a newer version, or it never existed in the first place, Solr can't do what you asked it to do. : Anyway I'll also check if I can inherit from SolrEntityProcessor and override : _version_ field there before insertion. Easier solutions to consider (off the cuff, not tested)... 1) in your SolrEntityProcessor, configure fl with something like this to alias the _version_ field to something else: fl=*,old_version:_version_ 2) configure your destination Solr instance with an update chain that ignores the _version_ field (you wouldn't want this for most normal usage, but it would be suitable for these kinds of from-scratch imports from other Solr instances)... https://lucene.apache.org/solr/4_2_0/solr-core/org/apache/solr/update/processor/IgnoreFieldUpdateProcessorFactory.html -Hoss
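Spelled out as concrete (equally untested) config fragments — the entity name, source URL, and chain name below are made up for illustration:

```xml
<!-- Option 1, DIH side: alias _version_ away so the import doesn't carry it -->
<entity name="src" processor="SolrEntityProcessor"
        url="http://source-host:8983/solr/core1"
        query="*:*" fl="*,old_version:_version_"/>

<!-- Option 2, destination side: an update chain that drops _version_.
     Use it only for these from-scratch imports, not for normal updates. -->
<updateRequestProcessorChain name="ignore-version">
  <processor class="solr.IgnoreFieldUpdateProcessorFactory">
    <str name="fieldName">_version_</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```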
Re: Question about email search
Sorry for the duplicated mail :-(, any advice on a configuration for searching emails in a field that does not have only email addresses, i.e. where the email addresses are contained in larger textual messages? - Mensaje original - De: Ahmet Arslan iori...@yahoo.com Para: solr-user@lucene.apache.org Enviados: Jueves, 14 de Marzo 2013 11:23:47 Asunto: Re: Question about email search Hi, Since you have a word delimiter filter in your analysis chain, I am not sure if e-mail addresses are recognised. You can check that on the Solr admin UI, analysis page. If e-mail addresses are kept as one token, I would use a leading wildcard query: q=*@gmail.com There was a similar question recently: http://search-lucene.com/m/XF2ejnM6Vi2 --- On Thu, 3/14/13, Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote: From: Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu Subject: Question about email search To: solr-user@lucene.apache.org Date: Thursday, March 14, 2013, 5:11 PM I'm using Solr 3.6.2 to crawl some data using Nutch. In my schema I have one field with all the content extracted from the page, which could possibly include email addresses. This is the configuration of my schema:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" languange="Spanish"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

The thing is that I'm trying to search against a field of this type (text) with a value like @gmail.com, and I'm expecting to get documents with that text. Any advice? slds -- It is only in the mysterious equation of love that any logical reasons can be found. Good programmers often confuse halloween (31 OCT) with christmas (25 DEC)
Searching across multiple collections (cores)
I've been looking all over for a clear answer to this question and can't seem to find one. It seems like a very basic concept to me though so maybe I'm using the wrong terminology. I want to be able to search across multiple collections (as it is now called in SolrCloud world, previously called Cores). I want the scoring, sorting, faceting etc. to be blended, that is to be relevant to data from all the collections, not just a set of independent results per collection. Is that possible? A real-world example would be a merchandise site that has books, movies and music. The index for each of those is quite different and they would have their own schema.xml (and therefore be their own Collection). When in the 'books' area of a website the users could search on fields specific to books (ISBN for example). However on a 'home' page a search would span across all 3 product lines, and the results should be scored relative to each other, not just relative to other items in their specific collection. Is this possible in v4.0? I'm pretty sure it wasn't in v1.4.1. But it seems to be a fundamentally useful concept, I was wondering if it had been addressed yet. Thanks, Ken -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-across-multiple-collections-cores-tp4047457.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Searching across multiple collections (cores)
Yes, with SolrCloud it's just the collection param (as long as the schemas are compatible for this): http://wiki.apache.org/solr/SolrCloud#Distributed_Requests - Mark
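A sketch of what that looks like on the wire (the collection names here are the ones from Ken's example and are assumptions):

```shell
# The request is sent to a node hosting "books", but the collection param
# tells SolrCloud to execute it across all three collections; results are
# merged into one ranked list.
curl "http://localhost:8983/solr/books/select?q=harry&collection=books,movies,music"
```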
Re: Meaning of Current in Solr Cloud Statistics
Hey Michael I was a bit confused because you mentioned SolrCloud in the subject. We're talking about http://host:port/solr/#/collection1 (f.e.) right? And there, the upper-left Statistics box? If so, the output comes from /solr/collection1/admin/luke ( http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/admin/LukeRequestHandler.java?view=markup#l551 ) which uses DirectoryReader.isCurrent() under the hood. That method contains an explanation in its javadocs: http://lucene.apache.org/core/4_2_0/core/org/apache/lucene/index/DirectoryReader.html#isCurrent() HTH Stefan
Re: Meaning of Current in Solr Cloud Statistics
Stefan, Thanks a lot! Makes sense. So I don't have to worry about my leader thinking it's out of date, then. Michael Della Bitta
Re: Meaning of Current in Solr Cloud Statistics
Perhaps the wording of Current is a bit too generic in that context? I'd like to change that description if that clarifies things .. but not sure which one is a better fit?
Re: Meaning of Current in Solr Cloud Statistics
Something like 'Reader is Current' might be better. Personally, I don't even know if it's worth showing. - Mark
Solr indexing binary files
Hi, I am new to Solr and I am extracting metadata from binary files through URLs stored in my database. I would like to know what fields are available for indexing from PDFs (the ones that would be initiated as in column=""). For example, how would I extract something like file size, format or file type? I would also like to know how to create customized fields in Solr. How are that metadata and text content mapped into the Solr schema? Would I have to declare that in solrconfig.xml or do some more tweaking somewhere else? If someone has a code snippet that could show me it would be greatly appreciated. Thank you in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-indexing-binary-files-tp4047470.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr indexing binary files
Take a look at Solr Cell: http://wiki.apache.org/solr/ExtractingRequestHandler Include a dynamicField with a * pattern and you will see the wide variety of metadata that is available for PDF and other rich document formats. -- Jack Krupansky -Original Message- From: Luis Sent: Thursday, March 14, 2013 3:30 PM To: solr-user@lucene.apache.org Subject: Solr indexing binary files Hi, I am new with Solr and I am extracting metadata from binary files through URLs stored in my database. I would like to know what fields are available for indexing from PDFs (the ones that would be initiated as in column=””). For example how would I extract something like file size, format or file type. I would also like to know how to create customized fields in Solr. How those metadata and text content are mapped into Solr schema? Would I have to declare that in the solrconfig.xml or do some more tweaking somewhere else? If someone has a code snippet that could show me it would be greatly appreciated. Thank you in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-indexing-binary-files-tp4047470.html Sent from the Solr - User mailing list archive at Nabble.com.
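For example, a catch-all dynamic field along these lines would capture whatever metadata Solr Cell extracts (a sketch, not from the original message; the type name text_general is an assumption and must exist in your schema):

```xml
<!-- sketch: catch-all dynamic field; "text_general" is assumed to be defined in schema.xml -->
<dynamicField name="*" type="text_general" indexed="true" stored="true" multiValued="true"/>
```

With this in place, metadata keys emitted by the extracting handler (such as stream_size or content_type) land in fields of the same name, so you can see what is available before deciding which fields to declare explicitly.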
Re: Question about email search
Sure. copyField it into a new indexed non-stored field with the following type definition:

<fieldType name="address_email" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.TypeTokenFilterFactory" types="filter_email.txt" enablePositionIncrements="true" useWhitelist="true"/>
  </analyzer>
</fieldType>

Content of filter_email.txt is (including the <> signs):

<EMAIL>

You will have only the emails left as tokens. Can't display them easily, but can certainly search. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, Mar 14, 2013 at 2:33 PM, Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote: Sorry for the duplicated mail :-(, any advice on a configuration for searching emails in a field that does not have only email addresses, so the email addresses are contained in larger textual messages? - Original message - From: Ahmet Arslan iori...@yahoo.com To: solr-user@lucene.apache.org Sent: Thursday, March 14, 2013 11:23:47 Subject: Re: Question about email search Hi, Since you have a word delimiter filter in your analysis chain, I am not sure if e-mail addresses are recognised. You can check that on the Solr admin UI, analysis page. If e-mail addresses are kept as one token, I would use a leading wildcard query.
q=*@gmail.com There was a similar question recently: http://search-lucene.com/m/XF2ejnM6Vi2 --- On Thu, 3/14/13, Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote: From: Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu Subject: Question about email search To: solr-user@lucene.apache.org Date: Thursday, March 14, 2013, 5:11 PM I'm using Solr 3.6.2 to crawl some data using Nutch. In my schema I have one field with all the content extracted from the page, which could possibly include email addresses. This is the configuration of my schema:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" languange="Spanish"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

The thing is that I'm trying to search against a field of this type (text) with a value like @gmail.com, and I intend to get documents containing that text. Any advice? slds -- It is only in the mysterious equation of love that any logical reasons can be found. Good programmers often confuse halloween (31 OCT) with christmas (25 DEC)
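Putting Alexandre's suggestion together, the schema additions might look like this (a sketch, not from the original message; the source field name "content" is an assumption, and address_email is the type defined in the reply above):

```xml
<!-- sketch: copy the free-text field into an email-only field; "content" is an assumed source field name -->
<field name="emails" type="address_email" indexed="true" stored="false" multiValued="true"/>
<copyField source="content" dest="emails"/>
```

Queries against the emails field then match whole addresses, since the UAX29URLEmailTokenizer keeps each address as a single token instead of letting the word delimiter filter split it apart.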
Re: Handling a closed IndexWriter in SOLR 4.0
Hi Scott, Not sure why IW would be closed, but: * consider not (hard) committing after each doc, but just periodically, every N minutes * soft committing instead * using 4.2 Otis -- Solr ElasticSearch Support http://sematext.com/ On Thu, Mar 14, 2013 at 11:55 AM, Danzig, Scott scott.dan...@nymag.comwrote: Hey all, We're using a Solr 4 core to handle our article data. When someone in our CMS publishes an article, we have a listener that indexes it straight to solr. We use the previously instantiated HttpSolrServer, build the solr document, add it with server.add(doc) .. then do a server.commit() right away. For some reason, sometimes this exception is thrown, which I suspect is related to a simultaneous data import done from another client which sometimes errors: Feb 26, 2013 5:07:51 PM org.apache.solr.common.SolrException log SEVERE: null:org.apache.solr.common.SolrException: Error opening new searcher at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1310) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1422) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1200) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:560) at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:87) at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64) at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1007) at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157) at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699) at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:999) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:565) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:309) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:679) Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:550) at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:563) at org.apache.lucene.index.IndexWriter.nrtIsCurrent(IndexWriter.java:4196) at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:266) at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:245) at 
org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:235) at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:169) at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1256) ... 28 more I'm not sure if the error is causing the IndexWriter to close, or why an IndexWriter would be shared across clients, but usually I can get around this by creating a new HttpSolrServer and trying again. But it doesn't always work, perhaps due to frequency… I don't like the idea of an infinite loop of creating connections until it works. I'd rather understand what's going on. What's the proper way to fix this? I see I can add a doc with a commitWithinMs of 0, and maybe this couples the add tightly with the commit and would prevent interference. But am I totally off the mark here as to the problem? Suggestions? Posted this on java-user before, but then realized solr-user existed, so please forgive the
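Otis's suggestion of committing periodically instead of after every add can be configured once in solrconfig.xml rather than done from the client (a sketch; the 60-second interval is illustrative):

```xml
<!-- sketch: hard commit at most once a minute, without opening a new searcher -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
```

This goes inside the updateHandler section; a companion autoSoftCommit entry with a shorter maxTime gives near-real-time visibility without the cost of frequent hard commits.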
Re: Blog Post: Integration Testing SOLR Index with Maven
Wow! That's great. And it's a lot of work, especially getting it all keyboard-complete. Thank you. On 03/14/2013 01:29 AM, Chantal Ackermann wrote: Hi all, this is not a question. I just wanted to announce that I've written a blog post on how to set up Maven for packaging and automatic testing of a SOLR index configuration. http://blog.it-agenten.com/2013/03/integration-testing-your-solr-index-with-maven/ Feedback or comments appreciated! And again, thanks for that great piece of software. Chantal
Re: Advice: solrCloud + DIH
3 docs/s is slow. In my tests with 4 nodes, SolrCloud indexes more than 1000 docs/s with 4 KB documents, and every leader has a replica. I am tuning to improve that to 3000 docs/s. 3 docs/s is too slow. Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Advice-solrCloud-DIH-tp4047339p4047559.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Embedded Solr
Here is some code you can use to test embedded Solr:

import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;
import java.util.ArrayList;
import java.util.Collection;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.core.CoreContainer;

public class EmbededSolrTest {

    private static int commitNum = 5000;
    private static String path = "/home/solr/Rollin/solr-4.1.0/embeddedExample";

    /**
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
        if (args != null) {
            if (args.length > 0) {
                path = args[0].trim();
            }
            if (args.length > 1) {
                commitNum = Integer.parseInt(args[1].trim());
            }
        }
        // path = "D:\\program\\solr\\41embededtest";
        System.setProperty("solr.solr.home", path);
        CoreContainer.Initializer initializer = new CoreContainer.Initializer();
        CoreContainer coreContainer = initializer.initialize();
        EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "");
        addIndex(server);
        // query(server);
        // deleteAllDoc(server);
    }

    public static void query(SolrServer server) throws Exception {
        try {
            SolrQuery q = new SolrQuery();
            q.setQuery("*:*");
            q.setStart(0);
            q.setRows(20);
            SolrDocumentList list = server.query(q).getResults();
            System.out.println(list.getNumFound());
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            server.shutdown();
        }
    }

    public static void deleteAllDoc(SolrServer server) throws Exception {
        try {
            server.deleteByQuery("*:*");
            server.commit();
            query(server);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            server.shutdown();
        }
    }

    public static void addIndex(SolrServer solrServer) throws IOException, ParseException {
        String path = "index";
        Analyzer analyzer = new SimpleAnalyzer(Version.LUCENE_35);
        // Analyzer analyzer = new SimpleAnalyzer();
        Directory directonry = FSDirectory.open(new File(path));
        IndexReader ireader = IndexReader.open(directonry);
        IndexSearcher isearcher = new IndexSearcher(ireader);
        QueryParser parser = new QueryParser(Version.LUCENE_35, "", analyzer);
        Query query = parser.parse("*:*");
        TopDocs hits = isearcher.search(query, null, 100);
        System.out.println("find size:" + hits.totalHits);
        java.net.InetAddress addr = java.net.InetAddress.getLocalHost();
        String computerName = addr.getHostName();
        // insert2Solr(solrServer, isearcher, hits);
        long beginTime = System.currentTimeMillis();
        long totalTime = 0;
        System.out.println("begin time:" + beginTime);
        try {
            Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
            for (int i = 0; i < hits.scoreDocs.length; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                Document hitDoc = isearcher.doc(hits.scoreDocs[i].doc);
                doc.addField("id", i + "a" + computerName + Thread.currentThread().getId());
                doc.addField("text", hitDoc.get("text"));
                docs.add(doc);
Re: discovery-based core enumeration with embedded solr
H, could you raise a JIRA and assign it to me? Please be sure and emphasize that it's embedded because I'm pretty sure this is fine for the regular case. But I have to admit that the embedded case completely slipped under the radar. Even better if you could make a test case, but that might not be straightforward... Thanks, Erick On Wed, Mar 13, 2013 at 5:28 PM, Michael Sokolov msoko...@safaribooksonline.com wrote: Has the new core enumeration strategy been implemented in the CoreContainer.Initializer.initialize() code path? It doesn't seem like it has. I get this exception: Caused by: org.apache.solr.common.SolrException: Could not load config for solrconfig.xml at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:991) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051) ... 10 more Caused by: java.io.IOException: Can't find resource 'solrconfig.xml' in classpath or 'solr-multi/collection1/conf/', cwd=/proj/lux at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:318) at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:283) at org.apache.solr.core.Config.init(Config.java:103) at org.apache.solr.core.Config.init(Config.java:73) at org.apache.solr.core.SolrConfig.init(SolrConfig.java:117) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:989) ... 11 more even though I have a solr.properties file in solr-multi (which is my solr.home), and core.properties in some subdirectories of that -- Michael Sokolov Senior Architect Safari Books Online
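For reference, the discovery-based layout expects a core.properties file in each core directory under solr.home; a minimal sketch for the setup described in the message (directory and core names assumed from the message):

```properties
# sketch: solr-multi/collection1/core.properties
name=collection1
```

An empty core.properties is also enough for discovery in later 4.x versions, with the core name defaulting to the directory name; the issue in this thread is that the embedded CoreContainer.Initializer code path does not perform the discovery scan at all.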
Re: Can we manipulate termfreq to count as 1 for multiple matches?
Hi! Take a look at http://wiki.apache.org/solr/SchemaXml#Common_field_options and the omitTermFreqAndPositions parameter, or you can use a custom similarity class that overrides the term frequency and returns one for only that field: http://wiki.apache.org/solr/SchemaXml#Similarity

<fieldType name="text_dfr" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
  <similarity class="solr.MyCustomSimiliratyWithoutTermFreq"/>
</fieldType>

Best, On Wed, Mar 13, 2013 at 8:43 PM, roz dev rozde...@gmail.com wrote: Hi All I am wondering if there is a way to alter the term frequency of a certain field to 1, even if there are multiple matches in that document? Use case: let's say that I have a document with 2 fields - Name - Description And there is a document with data like this: Document_1 Name = Blue Jeans Description = This jeans is very soft. Jeans is pretty nice. Now, if I search for Jeans, then Jeans is found in 2 places in the Description field, so the term frequency for Description is 2. I want Solr to count the term frequency for Description as 1 even if Jeans is found multiple times in this field. For all other fields, I do want to get the term frequency as it is. Is this doable in Solr with any of the functions? Any inputs are welcome. Thanks Saroj -- Felipe Lahti Consultant Developer - ThoughtWorks Porto Alegre
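The simpler schema-only route is the omitTermFreqAndPositions field option itself (a sketch; the field and type names are assumptions based on the question):

```xml
<!-- sketch: disable term frequencies (and positions) for one field only -->
<field name="description" type="text" indexed="true" stored="true" omitTermFreqAndPositions="true"/>
```

One caveat with this option: because positions are dropped along with frequencies, phrase queries no longer work against that field, whereas the custom-similarity approach keeps positions and only flattens the tf contribution to scoring.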
SOLR Num Docs vs NumFound
On my Solr 4 setup, a *:* query returns a higher numFound value than the Num Docs value reported on the statistics page of collection1. Why is that? My data is split across 3 data import handlers, where each handler has the same type of data but the ids are guaranteed to be different. Are some of my documents not hard committed? If so, how do I hard commit? Otherwise, why are these numbers different? -- CTO Zenlok株式会社
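On the "how do I hard commit" part: an explicit hard commit can be triggered by posting a commit command to the update handler (a sketch; the core name in the URL is assumed):

```xml
<!-- POST this to http://localhost:8983/solr/collection1/update -->
<commit waitSearcher="true"/>
```

The same effect can be had by appending commit=true to any update request; after the commit, the statistics page and query results should reflect the same committed index.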
Re: Solr Replication
Hi, I have a multi-core setup with continuous updates going on in each core. Hence I would rather avoid a backup, as it would either cause downtime, or, if there is write activity during the backup, the backup would be corrupted. Can you please suggest a cleaner way to handle this? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Replication-tp4047266p4047591.html Sent from the Solr - User mailing list archive at Nabble.com.
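One option, following the earlier suggestion in this thread, is to let the replication handler take backups automatically after a commit or optimize; a sketch of a master-side solrconfig.xml entry (per core):

```xml
<!-- sketch: master-side replication handler with automatic backups -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="backupAfter">optimize</str>
  </lst>
</requestHandler>
```

Backups taken this way operate on a point-in-time snapshot of the index files, so concurrent writes neither require downtime nor corrupt the backup; on-demand backups via /replication?command=backup behave the same way.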