Re: httpclient.ProtocolException using Solrj
Thanks Shalin and Paul. I'm not using MultipartRequest. I do share the same SolrServer between two threads. I'm not using MultiThreadedHttpConnectionManager. I'm simply using CommonsHttpSolrServer to create the SolrServer. I've also tried StreamingUpdateSolrServer, which works much faster, but does throw a connection reset exception once in a while. Do I need to use MultiThreadedHttpConnectionManager? I couldn't find anything on it on the wiki. I was also thinking of using EmbeddedSolrServer - in what case would I be able to use it? Do my application and the Solr web app need to run in the same JVM for this to work? How would I use the EmbeddedSolrServer? Thanks, -vivek On Wed, Apr 8, 2009 at 10:46 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: Vivek, do you share the same SolrServer instance between your two threads? If so, are you using the MultiThreadedHttpConnectionManager when creating the HttpClient instance? On Wed, Apr 8, 2009 at 10:13 PM, vivek sar vivex...@gmail.com wrote: With a single thread everything works fine. Two threads are fine too for a while, and then all of a sudden the problem starts happening.
I tried indexing using REST services as well (instead of Solrj), but with that too I get the following error after a while:

2009-04-08 10:04:08,126 ERROR [indexerThreadPool-5] Indexer - indexData()- Failed to index
java.net.SocketException: Broken pipe
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
        at org.apache.commons.httpclient.methods.StringRequestEntity.writeRequest(StringRequestEntity.java:145)
        at org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:499)
        at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114)
        at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
        at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
        at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)

Note, I'm using the simple lock type. I'd tried the single type before; that once caused index corruption, so I switched to simple. Thanks, -vivek 2009/4/8 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com: do you see the same problem when you use a single thread? what is the version of SolrJ that you use? On Wed, Apr 8, 2009 at 1:19 PM, vivek sar vivex...@gmail.com wrote: Hi, Any ideas on this issue? I ran into this again - once it starts happening it keeps happening. One of the threads keeps failing.
Here are my SolrServer settings:

int socketTO = 0;
int connectionTO = 100;
int maxConnectionPerHost = 10;
int maxTotalConnection = 50;
boolean followRedirects = false;
boolean allowCompression = true;
int maxRetries = 1;

Note, I'm using two threads to simultaneously write to the same index.

org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:470)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242)
        at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)
        at org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:57)

Thanks, -vivek On Sat, Apr 4, 2009 at 1:07 AM, vivek sar vivex...@gmail.com wrote: Hi, I'm sending 15K records at once using Solrj (server.addBeans(...)) and have two threads writing to the same index. One thread goes fine, but the second thread always fails with:

org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:470)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242)
        at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)
        at
Re: Searching on multi-core Solr
Hi, I've gone through the mailing archive and have read contradicting remarks on this issue. Can someone please clear this up, as I'm not able to run distributed search on multiple cores? Is there any document on how I can search across multiple cores which share the same schema? Here are the various comments I've read on this mailing list:

1) http://www.nabble.com/multi-core-vs-multi-app-td15803781.html#a15803781 Don't think you can search against multiple cores automatically - i.e. got to make multiple queries, one for each core, and combine results yourself. Yes, this will slow things down. - Otis

2) http://www.nabble.com/Search-in-SOLR-multi-cores-in-a-single-request-td20356173.html#a20356173 The idea behind multicore is that you will use them if you have completely different types of documents (basically multiple schemas). - Shalin

3) http://www.nabble.com/Distributed-search-td22036229.html#a22036229 That should work, yes, though it may not be a wise thing to do performance-wise, if the number of CPU cores that Solr server has is lower than the number of Solr cores. - Otis

My only motivation behind using multi-core is to keep the index size in limit. All my cores use the same schema. My index grows to over 30G within a day and I need to keep up to a year of data. I couldn't find any other way of scaling using Solr. I've noticed that once the index grows above 10G the indexing process starts slowing down, the commit takes much longer and optimize is hard to finish. So, I'm trying to create a new core after every 10 million documents (equal to 10G in my case). I don't want to start a new Solr instance every 10G - that won't scale over a year's time. I'm going to use 3-4 servers to hold all these cores. Now if someone could please tell me whether this is a wrong scaling architecture, I could re-think. I want fast indexing and, at the same time, fast enough search. If I have to search each core separately and merge the results myself, the search performance is going to be awful.
Is Solr the right tool for managing billions of records (I can get up to 100 million records every day - with 1Kb per record, that's 100GB of index a day)? Most of the field values are pretty distinct (like 10 million email addresses), so the index size would be huge too. I would think it's a common problem to scale a huge index while keeping both indexing and search time acceptable. I'm not sure if this can be managed on just 4 servers - we don't have 100s of boxes for this project. Is there any other tool that might be more appropriate for this kind of case - like Katta or Lucene on Hadoop, or simply using Lucene with parallel search and partitioning the indexes on size? Thanks, -vivek On Wed, Apr 8, 2009 at 11:07 AM, vivek sar vivex...@gmail.com wrote: Any help on this issue? Would distributed search on multi-core on the same Solr instance even work? Does it have to be different Solr instances altogether (separate shards)? I'm kind of stuck at this point right now. I keep getting one of the two errors (when running distributed search - single searches work fine) as mentioned in this thread earlier. Thanks, -vivek On Wed, Apr 8, 2009 at 1:57 AM, vivek sar vivex...@gmail.com wrote: Thanks Fergus. I'm still having problems with multicore search. I tried the following with two cores (they both share the same schema and solrconfig.xml) on the same box in the same Solr instance:

1) http://10.4.x.x:8080/solr/core0/admin/ - works fine, shows all the cores in the admin interface
2) http://10.4.x.x:8080/solr/admin/cores - works fine, see all the cores in xml
3) http://10.4.x.x:8080/solr/20090407_2/select?q=japan - works fine, gives me the top 10 records
4) http://10.4.x.x:8080/solr/20090408_3/select?q=japan - works fine, gives me the top 10 records
5) http://10.4.x.x:8080/solr/20090407_2/select?shards=10.4.x.x:8080/solr/20090407_2,10.4.x.x:8080/solr/20090408_3&indent=true&q=japan - this FAILS. I've seen two problems with this.
a) When indexes are being committed I see:

SEVERE: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at
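The failing request in step 5 above is mostly a matter of joining the query parameters correctly. A minimal sketch of building that distributed-search URL in plain Java, with explicit `&` separators and an encoded `q` parameter (host and core names are the ones from the post; this is illustrative string-building, not a SolrJ API):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class ShardsUrl {
    // Build a /select URL with a shards parameter listing every core to query.
    static String buildSelectUrl(String host, String core, String[] shards, String q) {
        try {
            StringBuilder sb = new StringBuilder();
            sb.append("http://").append(host).append("/solr/").append(core).append("/select");
            sb.append("?shards=").append(String.join(",", shards)); // comma-separated core list
            sb.append("&indent=true");
            sb.append("&q=").append(URLEncoder.encode(q, "UTF-8")); // encode the user query
            return sb.toString();
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String[] shards = {"10.4.x.x:8080/solr/20090407_2", "10.4.x.x:8080/solr/20090408_3"};
        System.out.println(buildSelectUrl("10.4.x.x:8080", "20090407_2", shards, "japan"));
        // -> http://10.4.x.x:8080/solr/20090407_2/select?shards=10.4.x.x:8080/solr/20090407_2,10.4.x.x:8080/solr/20090408_3&indent=true&q=japan
    }
}
```

Note that the shard entries themselves carry no scheme prefix; Solr's shards parameter expects host:port/path entries.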
Re: httpclient.ProtocolException using Solrj
how many documents are you inserting? Maybe you can create multiple instances of CommonsHttpSolrServer and upload in parallel. On Thu, Apr 9, 2009 at 11:58 AM, vivek sar vivex...@gmail.com wrote: Thanks Shalin and Paul. I'm not using MultipartRequest. I do share the same SolrServer between two threads. [snip]
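The suggestion of one CommonsHttpSolrServer per uploader thread can be sketched with a ThreadLocal, so no client instance is ever shared across concurrent update calls. `SolrClientStub` below is a hypothetical stand-in for CommonsHttpSolrServer (used so the pattern can be shown without the SolrJ jars); the URL is the usual local default, not taken from the post:

```java
public class PerThreadClient {
    // Hypothetical stand-in for org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.
    static class SolrClientStub {
        final String baseUrl;
        SolrClientStub(String baseUrl) { this.baseUrl = baseUrl; }
    }

    // Each thread lazily constructs its own client on first use and reuses it afterwards.
    static final ThreadLocal<SolrClientStub> CLIENT =
        ThreadLocal.withInitial(() -> new SolrClientStub("http://localhost:8080/solr"));

    public static SolrClientStub client() { return CLIENT.get(); }

    public static void main(String[] args) throws InterruptedException {
        final SolrClientStub[] got = new SolrClientStub[2];
        Thread t1 = new Thread(() -> got[0] = client());
        Thread t2 = new Thread(() -> got[1] = client());
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Two distinct threads necessarily get two distinct client instances.
        System.out.println("distinct instances: " + (got[0] != got[1]));
        // -> distinct instances: true
    }
}
```

The alternative, as Shalin notes, is one shared client built on a MultiThreadedHttpConnectionManager-backed HttpClient; either way, the point is that concurrent updates must not race on a single connection.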
Re: solr 1.4 memory jvm
Hi Noble, Yes, exactly that - I would like to know how people handle this during a replication. Do they turn off servers, or put a high autowarmCount, which takes the slave offline for a while - in my case around 10 min to bring back the new index, and then autowarming maybe 10 minutes more? Otherwise, I tried to put a large mergeFactor, but I guess I have too many updates every 30 min - something like 2000 docs - and almost all segments are modified. What would you reckon? :( :) Thanks a lot Noble Noble Paul നോബിള് नोब्ळ् wrote: So what I decipher from the numbers is that w/o queries Solr replication is not performing too badly. The queries are inherently slow and you wish to optimize the query performance itself. Am I correct? On Tue, Apr 7, 2009 at 7:50 PM, sunnyfr johanna...@gmail.com wrote: Hi, So I did two tests on two servers. First server: with just replication every 20 min, as you can notice: http://www.nabble.com/file/p22930179/cpu_without_request.png cpu_without_request.png http://www.nabble.com/file/p22930179/cpu2_without_request.jpg cpu2_without_request.jpg Second server: with one first replication and a second one during the query test: between 15:32 and 15:41, during replication (checked on .../admin/replication/index.jsp), my query response time at the end was around 5000 ms. After the replication, I guess during the commit, I couldn't get an answer to my query for a long time; I refreshed my page a few minutes later. http://www.nabble.com/file/p22930179/cpu_with_request.png cpu_with_request.png http://www.nabble.com/file/p22930179/cpu2_with_request.jpg cpu2_with_request.jpg Now without replication I kept querying the second server, and I can't get better than 1000 ms response time and 11 requests/second.
http://www.nabble.com/file/p22930179/cpu_.jpg cpu_.jpg This is my request:

select?fl=id&fq=status_published:1+AND+status_moderated:0+AND+status_personal:0+AND+status_private:0+AND+status_deleted:0+AND+status_error:0+AND+status_ready_web:1&json.nl=map&wt=json&start=0&version=1.2&bq=status_official:1^1.5+OR+status_creative:1^1+OR+language:en^0.5&bf=recip(rord(created),1,10,10)^3+pow(stat_views,0.1)^15+pow(stat_comments,0.1)^15&rows=100&qt=dismax&qf=title_en^0.8+title^0.2+description_en^0.3+description^0.2+tags^1+owner_login^0.5

Do you have advice? Thanks Noble -- View this message in context: http://www.nabble.com/solr-1.4-memory-jvm-tp22913742p22930179.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul -- View this message in context: http://www.nabble.com/solr-1.4-memory-jvm-tp22913742p22966630.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr 1.4 facet boost field according to another field
Do you have an idea? sunnyfr wrote: Hi, I have title, description and tag fields... Depending on where the searched word is found, I would like to boost other fields like nb_views or rating differently: if the word is found in title, then nb_views^10 and rating^10; if the word is found in description, then nb_views^2 and rating^2. Thanks a lot for your help, -- View this message in context: http://www.nabble.com/solr-1.4-facet-boost-field-according-to-another-field-tp22913642p2294.html Sent from the Solr - User mailing list archive at Nabble.com.
different scoring for different types of found documents
Hi, We have a quite complex requirement concerning scoring logic customization, but I guess it's quite useful and probably something like it has been done already. So we're searching through a product catalog. Products have types (i.e. Electronics, Apparel, Furniture, etc). What we need is to customize the scoring of the results so that the top results contain products of all the different types which match the query. So after finding all the products matching the query, we want to group results by product type. Then, for every product type, take the corresponding sub-set of results and assign scores in each sub-set with the following logic: assign score 5 to the first 20% of results, then assign score 4 to the next 15% of results, and so on. The particular percent values are configured by the end user. How could we achieve this using Solr? Is it possible at all? Maybe we should implement some custom ValueSource and use it in a function query? -- Andrew Klochkov
Re: It's urgent! Please help with schema.xml - appending one field to another
On Apr 8, 2009, at 9:50 PM, Udaya wrote: Hi, Need your help. I would like to know how we could append or add one field value to another field in schema.xml. My schema is as follows (only the field part is given):

schema.xml fields:

<field name="topics_id" type="integer" indexed="true" stored="true" required="true" />
<field name="topics_subject" type="text" indexed="true" stored="true" required="true"/>
<field name="post_text" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="url" type="string" stored="true" default="http://comp.com/portals/ForumWindow?action=1v=tp=topics_id#topics_id" />
<field name="all_text" type="text" indexed="true" stored="true" multiValued="true"/>

Here, for the field with name topics_id, we get the id from a table. I want this topics_id value to be appended into the default value attribute of the field with name url. For example: suppose we get a topics_id value of 512 during a search, then the value of the url should become http://comp.com/portals/JBossForumWindow?action=1v=tp=512#512 Is this possible? Please give me some suggestions. If you're using DIH to index your table, you could aggregate using the template transformer during indexing. If you're indexing a different way, why not let the searching client (UI) do the aggregation of an id into a URL? Erik
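Erik's second option - letting the searching client build the URL from the stored topics_id - can be sketched as a simple template substitution. Note the query-string separators in the original default URL were garbled in the archive, so the `action=1&v=t&p=` separators below are a guess, as is the `TopicUrl` helper itself:

```java
public class TopicUrl {
    // Hypothetical URL template; "%1$s" is replaced by topics_id in both the
    // query string and the fragment. The exact parameter separators are assumed.
    static final String TEMPLATE =
        "http://comp.com/portals/ForumWindow?action=1&v=t&p=%1$s#%1$s";

    static String urlFor(long topicsId) {
        return String.format(TEMPLATE, topicsId);
    }

    public static void main(String[] args) {
        System.out.println(urlFor(512));
        // -> http://comp.com/portals/ForumWindow?action=1&v=t&p=512#512
    }
}
```

This keeps the index free of redundant derived data: only topics_id is stored, and the URL is reconstructed at display time.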
Re: Searching on multi-core Solr
On Apr 9, 2009, at 3:00 AM, vivek sar wrote: Can someone please clear this up as I'm not able to run distributed search on multi-cores. What error or problem are you encountering when trying this? How are you trying it? Erik
Re: solr 1.4 facet boost field according to another field
I don't think conditional boosting is possible. You can boost the same field on which the match was found, but you cannot boost a different field. On Thu, Apr 9, 2009 at 2:05 PM, sunnyfr johanna...@gmail.com wrote: Do you have an idea? sunnyfr wrote: Hi, I have title, description and tag fields... Depending on where the searched word is found, I would like to boost other fields like nb_views or rating differently: if the word is found in title, then nb_views^10 and rating^10; if the word is found in description, then nb_views^2 and rating^2. Thanks a lot for your help, -- View this message in context: http://www.nabble.com/solr-1.4-facet-boost-field-according-to-another-field-tp22913642p2294.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar.
Re: different scoring for different types of found documents
On Thu, Apr 9, 2009 at 2:17 PM, Andrey Klochkov akloch...@griddynamics.com wrote: So we're searching through the product catalog. Products have types (i.e. Electronics, Apparel, Furniture etc). What we need is to customize scoring of the results so that top results should contain products of all different types which match the query. So after finding all the products matching the query we want to group results by product type. This is something similar to Field Collapsing. It is not committed to trunk but there are a few patches. https://issues.apache.org/jira/browse/SOLR-236 Then for every product type take the corresponding sub-set of results and in every of the sub-sets assign scores with the following logic. Assign score 5 to the first 20% of results, then assign score 4 to the next 15% of results, and so on. Particular percent values are configured by the end user. How could we achieve it using Solr? Is it possible at all? Maybe we should implement some custom ValueSource and use it in function queries? This kind of scoring is not possible out of the box. You need to assign scores according to where the document lies in the final list of results (after all filters are applied), therefore you may not be able to operate on the DocList directly or in the value source. I *think* a good place to start looking would be the QueryValueSource in trunk, as it has access to the scorer. But I do not know much about these things. -- Regards, Shalin Shekhar Mangar.
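The tiered-percentile scoring described above is easy to express as plain post-processing over one product type's ordered result list: the first 20% get score 5, the next 15% get score 4, and the remainder a default. This sketch assumes the per-type sub-lists have already been collected (e.g. via field collapsing or one query per type); it is not a Solr ValueSource:

```java
import java.util.Arrays;

public class TieredScores {
    // Assign scores[t] to round(fractions[t] * n) results per tier, in order;
    // anything past the configured tiers keeps defaultScore.
    static int[] assign(int n, double[] fractions, int[] scores, int defaultScore) {
        int[] out = new int[n];
        Arrays.fill(out, defaultScore);
        int start = 0;
        for (int t = 0; t < fractions.length && start < n; t++) {
            int count = (int) Math.round(fractions[t] * n);
            for (int i = start; i < Math.min(start + count, n); i++) {
                out[i] = scores[t];
            }
            start += count;
        }
        return out;
    }

    public static void main(String[] args) {
        // 10 results per type: first 20% -> 5, next 15% -> 4, rest -> 3
        System.out.println(Arrays.toString(
            assign(10, new double[]{0.20, 0.15}, new int[]{5, 4}, 3)));
        // -> [5, 5, 4, 4, 3, 3, 3, 3, 3, 3]
    }
}
```

Since the fractions come from end-user configuration, only the fractions/scores arrays change at runtime; the rank order within each sub-list is whatever the underlying query returned.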
Re: Searching on multi-core Solr
Any help on this issue? Would distributed search on multi-core on the same Solr instance even work? Does it have to be different Solr instances altogether (separate shards)? As best I can tell this works fine for me. Multiple cores on the one machine. Very different schema and solrconfig.xml for each of the cores. Distributed searching using shards works fine. But I am using the trunk version. Perhaps you should post your solr.xml file. I'm kind of stuck at this point right now. I keep getting one of the two errors (when running distributed search - single searches work fine) as mentioned in this thread earlier. Thanks, -vivek On Wed, Apr 8, 2009 at 1:57 AM, vivek sar vivex...@gmail.com wrote: Thanks Fergus. I'm still having problems with multicore search. I tried the following with two cores (they both share the same schema and solrconfig.xml) on the same box in the same Solr instance:

1) http://10.4.x.x:8080/solr/core0/admin/ - works fine, shows all the cores in the admin interface
2) http://10.4.x.x:8080/solr/admin/cores - works fine, see all the cores in xml
3) http://10.4.x.x:8080/solr/20090407_2/select?q=japan - works fine, gives me the top 10 records
4) http://10.4.x.x:8080/solr/20090408_3/select?q=japan - works fine, gives me the top 10 records
5) http://10.4.x.x:8080/solr/20090407_2/select?shards=10.4.x.x:8080/solr/20090407_2,10.4.x.x:8080/solr/20090408_3&indent=true&q=japan - this FAILS. I've seen two problems with this.
a) When indexes are being committed I see:

SEVERE: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
        at java.lang.Thread.run(Thread.java:637)

b) Other times I see this:

SEVERE: java.lang.NullPointerException
        at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432)
        at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
Multi-language support
Hi, To reframe my earlier question: Some languages have only analyzers but no stemmer from Snowball/Porter; does the analyzer take care of stemming as well? Some languages have only the stemmer from Snowball but no analyzer, and some have both. Can we say then that Solr supports all of the above languages? Will search behave the same across all of the above cases? thanks revas
Re: Using constants with DataImportHandler and MySQL ?
Here's the solution:

<entity name="ci_project" query="select pr_id, pr_name, pr_comment, 'dataci_project' from ci_project WHERE pr_id = 1">
  <field column="dataci_project" name="definition" />
</entity>

Just insert a dummy SQL field 'dataci_project' into your select statement. Glen Newton wrote: In MySQL at least, you can achieve what I think you want by manipulating the SQL, like this:

mysql> select 'foo' as Constant1, id from Article limit 10;
+-----------+----+
| Constant1 | id |
+-----------+----+
| foo       |  1 |
| foo       |  2 |
| foo       |  3 |
| foo       |  4 |
| foo       |  5 |
| foo       |  6 |
| foo       |  7 |
| foo       |  8 |
| foo       |  9 |
| foo       | 10 |
+-----------+----+
10 rows in set (0.00 sec)

mysql> select 435 as Constant2, id from Article limit 10;
+-----------+----+
| Constant2 | id |
+-----------+----+
|       435 |  1 |
|       435 |  2 |
|       435 |  3 |
|       435 |  4 |
|       435 |  5 |
|       435 |  6 |
|       435 |  7 |
|       435 |  8 |
|       435 |  9 |
|       435 | 10 |
+-----------+----+
10 rows in set (0.00 sec)

2009/4/8 Shalin Shekhar Mangar shalinman...@gmail.com: On Wed, Apr 8, 2009 at 10:23 PM, gateway0 reiterwo...@yahoo.de wrote: The problem as you see is the line: <field name="definition">Projects</field> I want to set a constant value for every row in the SQL table but it doesn't work that way, any ideas? That is not valid syntax. There are two ways to do this: 1. In your schema.xml provide the 'default' attribute 2. Use TemplateTransformer - see http://wiki.apache.org/solr/DataImportHandlerFaq -- Regards, Shalin Shekhar Mangar. -- - -- View this message in context: http://www.nabble.com/Using-constants-with-DataImportHandler-and-MySQL---tp22954954p22969123.html Sent from the Solr - User mailing list archive at Nabble.com.
Analyzers and stemmer
Hi, With respect to language support in Solr, we have analyzers for some languages and stemmers for certain languages. Do we say that Solr supports a particular language only if we have both an analyzer and a stemmer for the language, or also when we have an analyzer but no stemmer? Regards Sujatha
Dataimporthandler + MySQL = Datetime offset by 2 hours ?
Hi, I'm fetching entries from my MySQL database and indexing them with the DataImportHandler. MySQL table entry (for example): pr_timedate : 2009-04-14 11:00:00 Entry in data-config.xml to index the MySQL field:

<field column="pr_timedate" name="completion" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />

Result in the Solr index:

<date>2009-04-14T09:00:00Z</date>

It says 09:00:00 instead of 11:00:00 as it's supposed to. I've searched for hours already; why is that? best wishes, Sebastian -- View this message in context: http://www.nabble.com/Dataimporthandler-%2B-MySQL-%3D-Datetime-offset-by-2-hours---tp22970250p22970250.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Dataimporthandler + MySQL = Datetime offset by 2 hours ?
On Thu, Apr 9, 2009 at 6:18 PM, gateway0 reiterwo...@yahoo.de wrote: Hi, I'm fetching entries from my MySQL database and indexing them with the DataImportHandler. MySQL table entry (for example): pr_timedate : 2009-04-14 11:00:00 Entry in data-config.xml to index the MySQL field: <field column="pr_timedate" name="completion" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" /> Result in the Solr index: <date>2009-04-14T09:00:00Z</date> It says 09:00:00 instead of 11:00:00 as it's supposed to. I've searched for hours already; why is that? I think that may be because date/time in Solr is supposed to be in UTC. See the note on DateField in the schema.xml -- Regards, Shalin Shekhar Mangar.
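Shalin's UTC explanation accounts for exactly a two-hour shift if the JVM's default time zone is UTC+2. A minimal sketch with plain JDK classes, assuming (this is the assumption, not stated in the post) the poster's machine runs in a Central European zone such as Europe/Berlin, which was on summer time (UTC+2) on 2009-04-14:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class UtcDemo {
    // Parse a local "yyyy-MM-dd HH:mm:ss" timestamp in the given zone and
    // render it the way Solr stores dates: as UTC with a trailing 'Z'.
    static String toUtc(String local, String zone) {
        try {
            SimpleDateFormat in = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            in.setTimeZone(TimeZone.getTimeZone(zone));
            Date d = in.parse(local);
            SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
            out.setTimeZone(TimeZone.getTimeZone("UTC"));
            return out.format(d);
        } catch (ParseException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toUtc("2009-04-14 11:00:00", "Europe/Berlin"));
        // -> 2009-04-14T09:00:00Z  (11:00 CEST is 09:00 UTC)
    }
}
```

So the stored value is not wrong, just expressed in UTC; clients should convert back to local time at display time.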
Access HTTP headers from custom request handler
Hello all, we are writing a custom request handler and we need to implement some business logic according to some HTTP headers. I see there is no easy way to access HTTP headers from the request handler. Moreover, it seems to me that the HttpServlet-ness is lost well before the custom request handler comes into play. Is there any way to access HTTP headers from within the request handler? Thanks, Giovanni
Re: Snapinstaller vs Solr Restart
Hi Otis, OK about that, but still, when it merges segments it changes the file names, and then I have no choice but to replicate all the segments, which is bad for replication and CPU. Thanks Otis Gospodnetic wrote: Lower your mergeFactor and Lucene will merge segments (i.e. fewer index files) and purge deletes more often for you, at the expense of somewhat slower indexing. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: wojtekpia wojte...@hotmail.com To: solr-user@lucene.apache.org Sent: Tuesday, January 6, 2009 5:18:26 PM Subject: Re: Snapinstaller vs Solr Restart I'm optimizing because I thought I should. I'll be updating my index somewhere between every 15 minutes and every 2 hours. That means between 12 and 96 updates per day. That seems like a lot of index files (and it scared me a little), so that's my second reason for wanting to optimize nightly. I haven't benchmarked the performance hit for not optimizing. That'll be my next step. If the hit isn't too bad, I'll look into optimizing less frequently (weekly, ...). Thanks Otis! Otis Gospodnetic wrote: OK, so that question/answer seems to have hit the nail on the head. :) When you optimize your index, all index files get rewritten. This means that everything the OS cached up to that point goes out the window, and the OS has to slowly re-cache the hot parts of the index. If you don't optimize, this won't happen. Do you really need to optimize? Or maybe a more direct question: why are you optimizing? Regarding autowarming, with such a high fq hit rate, I'd make good use of fq autowarming. The result cache rate is lower, but still decent. I wouldn't turn off autowarming the way you have. -- View this message in context: http://www.nabble.com/Snapinstaller-vs-Solr-Restart-tp21315273p21320334.html Sent from the Solr - User mailing list archive at Nabble.com.
-- View this message in context: http://www.nabble.com/Snapinstaller-vs-Solr-Restart-tp21315273p22972780.html Sent from the Solr - User mailing list archive at Nabble.com.
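Otis's mergeFactor suggestion above is a solrconfig.xml change. A minimal sketch for Solr 1.x (the values here are illustrative, not a recommendation for any particular workload):

```xml
<!-- solrconfig.xml: a lower mergeFactor makes Lucene merge more often,
     keeping fewer segments (i.e. fewer index files) at the expense of
     somewhat slower indexing. The default mergeFactor is 10. -->
<mainIndex>
  <mergeFactor>5</mergeFactor>
  <maxBufferedDocs>1000</maxBufferedDocs>
</mainIndex>
```

Fewer, larger segments also mean fewer files to transfer on snapshot-based replication, which is the trade-off being discussed in this thread.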
Re: Any tips for indexing large amounts of data?
Hi Otis, How did you manage that? I have an 8-core machine with 8GB of RAM and an 11GB index for 14M docs, with 5 updates every 30 minutes, but my replication kills everything. My segments are merged too often, so the full index is replicated and the caches are lost, and I have no idea what I can do now. Some help would be brilliant; btw I'm using Solr 1.4. Thanks, Otis Gospodnetic wrote: Mike is right about the occasional slow-down, which appears as a pause and is due to large Lucene index segment merging. This should go away with newer versions of Lucene where this is happening in the background. That said, we just indexed about 20MM documents on a single 8-core machine with 8 GB of RAM, resulting in a nearly 20 GB index. The whole process took a little less than 10 hours - that's over 550 docs/second. The vanilla approach before some of our changes apparently required several days to index the same amount of data. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Mike Klaas mike.kl...@gmail.com To: solr-user@lucene.apache.org Sent: Monday, November 19, 2007 5:50:19 PM Subject: Re: Any tips for indexing large amounts of data? There should be some slowdown in larger indices as occasionally large segment merge operations must occur. However, this shouldn't really affect overall speed too much. You haven't really given us enough data to tell you anything useful. I would recommend trying to do the indexing via a webapp to eliminate all your code as a possible factor. Then, look for signs of what is happening when indexing slows. For instance, is Solr high in CPU, is the computer thrashing, etc? -Mike On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote: Hi, Thanks for answering this question a while back. I have made some of the suggestions you mentioned, i.e. not committing until I've finished indexing. What I am seeing though is that as the index gets larger (around 1GB), indexing is taking a lot longer. In fact it slows down to a crawl. 
Have you got any pointers as to what I might be doing wrong? Also, I was looking at using MultiCore solr. Could this help in some way? Thank you Brendan On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote: : I would think you would see better performance by allowing auto commit : to handle the commit size instead of reopening the connection all the : time. if your goal is fast indexing, don't use autoCommit at all ... just index everything, and don't commit until you are completely done. autoCommitting will slow your indexing down (the benefit being that more results will be visible to searchers as you proceed) -Hoss -- View this message in context: http://www.nabble.com/Any-tips-for-indexing-large-amounts-of-data--tp13510670p22973205.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Any tips for indexing large amounts of data?
For Solr / Lucene: - use -XX:+AggressiveOpts - If available, huge pages can help. See http://zzzoot.blogspot.com/2009/02/java-mysql-increased-performance-with.html I haven't yet followed up with my Lucene performance numbers using huge pages: it is a 10-15% gain for large indexing jobs. For Lucene: - multi-thread using java.util.concurrent.ThreadPoolExecutor (http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks.html: 6.4 million full-text articles + metadata indexed, resulting in an 83GB index; these are old numbers: things are down to ~10 hours now) - while multithreading on multicore is particularly good, it also improves performance on single core, for small (<6, YMMV) numbers of threads and good I/O (test for your particular configuration) - Use multiple indexes and merge at the end - As per http://developers.sun.com/learning/javaoneonline/2008/pdf/TS-5515.pdf, use a separate ThreadPoolExecutor per index in the previous point, reducing queue contention. This is giving me an additional ~10%. I will blog about this in the near future... -glen 2009/4/9 sunnyfr johanna...@gmail.com: Hi Otis, How did you manage that? I've 8 core machine with 8GB of ram and 11GB index for 14M docs and 5 update every 30mn but my replication kill everything. My segments are merged too often sor full index replicate and cache lost and I've no idea what can I do now? Some help would be brilliant, btw im using Solr 1.4. Thanks, Otis Gospodnetic wrote: Mike is right about the occasional slow-down, which appears as a pause and is due to large Lucene index segment merging. This should go away with newer versions of Lucene where this is happening in the background. That said, we just indexed about 20MM documents on a single 8-core machine with 8 GB of RAM, resulting in nearly 20 GB index. The whole process took a little less than 10 hours - that's over 550 docs/second. The vanilla approach before some of our changes apparently required several days to index the same amount of data. 
Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Mike Klaas mike.kl...@gmail.com To: solr-user@lucene.apache.org Sent: Monday, November 19, 2007 5:50:19 PM Subject: Re: Any tips for indexing large amounts of data? There should be some slowdown in larger indices as occasionally large segment merge operations must occur. However, this shouldn't really affect overall speed too much. You haven't really given us enough data to tell you anything useful. I would recommend trying to do the indexing via a webapp to eliminate all your code as a possible factor. Then, look for signs to what is happening when indexing slows. For instance, is Solr high in cpu, is the computer thrashing, etc? -Mike On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote: Hi, Thanks for answering this question a while back. I have made some of the suggestions you mentioned. ie not committing until I've finished indexing. What I am seeing though, is as the index get larger (around 1Gb), indexing is taking a lot longer. In fact it slows down to a crawl. Have you got any pointers as to what I might be doing wrong? Also, I was looking at using MultiCore solr. Could this help in some way? Thank you Brendan On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote: : I would think you would see better performance by allowing auto commit : to handle the commit size instead of reopening the connection all the : time. if your goal is fast indexing, don't use autoCommit at all ... just index everything, and don't commit until you are completely done. autoCommitting will slow your indexing down (the benefit being that more results will be visible to searchers as you proceed) -Hoss -- View this message in context: http://www.nabble.com/Any-tips-for-indexing-large-amounts-of-data--tp13510670p22973205.html Sent from the Solr - User mailing list archive at Nabble.com. -- -
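Glen's one-ThreadPoolExecutor-per-index tip above can be sketched with nothing but java.util.concurrent. The addDocument() work is simulated here with a counter; the pool sizes and the per-index partitioning are assumptions you would tune for your own hardware and IndexWriter setup:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PerIndexExecutors {
    // One ThreadPoolExecutor per index, so fine-grained document tasks for
    // different indexes never contend for the same task queue (the contention
    // described in the TS-5515 slides). After all pools drain, the separate
    // indexes would be merged into one (merge step not shown).
    static int indexAll(int numIndexes, int docsPerIndex) throws InterruptedException {
        ExecutorService[] pools = new ExecutorService[numIndexes];
        AtomicInteger indexed = new AtomicInteger();
        for (int i = 0; i < numIndexes; i++) {
            pools[i] = new ThreadPoolExecutor(
                    2, 2, 0L, TimeUnit.MILLISECONDS,   // 2 threads per index: an assumption
                    new LinkedBlockingQueue<>());       // each pool has its own queue
        }
        for (ExecutorService pool : pools) {
            for (int d = 0; d < docsPerIndex; d++) {
                // Stand-in for IndexWriter.addDocument() / a SolrJ add call.
                pool.submit(() -> { indexed.incrementAndGet(); });
            }
        }
        for (ExecutorService pool : pools) {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
        return indexed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(indexAll(4, 1000)); // 4 indexes x 1000 docs
    }
}
```

The same skeleton works with a single shared pool; the per-index split only pays off once task granularity is fine enough that queue contention shows up in profiling.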
Re: Any tips for indexing large amounts of data?
- As per http://developers.sun.com/learning/javaoneonline/2008/pdf/TS-5515.pdf Sorry, the presentation covers a lot of ground: see slide #20: Standard thread pools can have high contention for task queue and other data structures when used with fine-grained tasks [I haven't yet implemented work stealing] -glen 2009/4/9 Glen Newton glen.new...@gmail.com: For Solr / Lucene: - use -XX:+AggressiveOpts - If available, huge pages can help. See http://zzzoot.blogspot.com/2009/02/java-mysql-increased-performance-with.html I haven't yet followed-up with my Lucene performance numbers using huge pages: it is 10-15% for large indexing jobs. For Lucene: - multi-thread using java.util.concurrent.ThreadPoolExecutor (http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks.html 6.4 million full-text article + metadata indexed resulting in 83GB index; these are old number: things are down to ~10hours now) - while multithreading on multicore is particularly good, it also improves performance on single core, for small (6 YMMV) numbers of threads good I/O (test for your particular configuration) - Use multiple indexes merge at the end - As per http://developers.sun.com/learning/javaoneonline/2008/pdf/TS-5515.pdf use separate ThreadPoolExecutor per index in previous, reducing queue contention. This is giving me an additional ~10%. I will blog about this in the near future... -glen 2009/4/9 sunnyfr johanna...@gmail.com: Hi Otis, How did you manage that? I've 8 core machine with 8GB of ram and 11GB index for 14M docs and 5 update every 30mn but my replication kill everything. My segments are merged too often sor full index replicate and cache lost and I've no idea what can I do now? Some help would be brilliant, btw im using Solr 1.4. Thanks, Otis Gospodnetic wrote: Mike is right about the occasional slow-down, which appears as a pause and is due to large Lucene index segment merging. 
This should go away with newer versions of Lucene where this is happening in the background. That said, we just indexed about 20MM documents on a single 8-core machine with 8 GB of RAM, resulting in nearly 20 GB index. The whole process took a little less than 10 hours - that's over 550 docs/second. The vanilla approach before some of our changes apparently required several days to index the same amount of data. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Mike Klaas mike.kl...@gmail.com To: solr-user@lucene.apache.org Sent: Monday, November 19, 2007 5:50:19 PM Subject: Re: Any tips for indexing large amounts of data? There should be some slowdown in larger indices as occasionally large segment merge operations must occur. However, this shouldn't really affect overall speed too much. You haven't really given us enough data to tell you anything useful. I would recommend trying to do the indexing via a webapp to eliminate all your code as a possible factor. Then, look for signs to what is happening when indexing slows. For instance, is Solr high in cpu, is the computer thrashing, etc? -Mike On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote: Hi, Thanks for answering this question a while back. I have made some of the suggestions you mentioned. ie not committing until I've finished indexing. What I am seeing though, is as the index get larger (around 1Gb), indexing is taking a lot longer. In fact it slows down to a crawl. Have you got any pointers as to what I might be doing wrong? Also, I was looking at using MultiCore solr. Could this help in some way? Thank you Brendan On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote: : I would think you would see better performance by allowing auto commit : to handle the commit size instead of reopening the connection all the : time. if your goal is fast indexing, don't use autoCommit at all ... just index everything, and don't commit until you are completely done. 
autoCommitting will slow your indexing down (the benefit being that more results will be visible to searchers as you proceed) -Hoss -- View this message in context: http://www.nabble.com/Any-tips-for-indexing-large-amounts-of-data--tp13510670p22973205.html Sent from the Solr - User mailing list archive at Nabble.com. -- - -- -
Custom DIH: FileDataSource with additional business logic?
Hello, here I am with another question. I am using DIH to index a DB. Additionally I also have to index some files containing Java serialized objects (and I cannot change this... :-( ). I currently have implemented a standalone Java app with the following features: 1) read all files from a given folder 2) deserialize the files into lists of items 3) convert the list of items into lists of SolrInputDocument(s) 4) post the lists of SolrInputDocument(s) to Solr All this is done using SolrJ. So far so good. I would like to use a DIH with a FileDataSource to do 1) and 4), and I would like to squeeze in my implementation for 2) and 3). Is this possible? Any hint? Thank you all in advance. Cheers, Giovanni
Re: Searching on multi-core Solr
Erik, Here is what I'd posted in this thread earlier, I tried the following with two cores (they both share the same schema and solrconfig.xml) on the same box in the same Solr instance, 1) http://10.4.x.x:8080/solr/core0/admin/ - works fine, shows all the cores in admin interface 2) http://10.4.x.x:8080/solr/admin/cores - works fine, see all the cores in xml 3) http://10.4.x.x:8080/solr/20090407_2/select?q=japan - works fine, gives me top 10 records 4) http://10.4.x.x:8080/solr/20090408_3/select?q=japan - works fine, gives me top 10 records 5) http://10.4.x.x:8080/solr/20090407_2/select?shards=10.4.x.x:8080/solr/20090407_2,10.4.x.x:8080/solr/20090408_3&indent=true&q=japan - this FAILS. I've seen two problems with this. a) This is the error most of the time, SEVERE: java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432) at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:637) b) When index are being committed I see this during search, SEVERE: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:637) Any tips on how I can search on multicore in the same Solr instance? Thanks, -vivek On Thu, Apr 9, 2009 at 2:56 AM, Erik Hatcher e...@ehatchersolutions.com wrote: On Apr 9, 2009, at 3:00 AM, vivek sar wrote: Can someone please clear this up as I'm not able to run distributed search on multi-cores. What error or problem are you encountering when trying this? How are you trying it? Erik
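For reference, the shards parameter in 5) above is just a comma-separated list of host:port/core entries (with no http:// prefix) joined to the other query parameters with '&' separators. A small sketch of composing such a URL; the helper name is mine, not a Solr API, and the hosts mirror the ones in the thread:

```java
import java.util.List;

public class ShardsParam {
    // Build a distributed-search URL against one core, fanning the query
    // out to the listed shard cores. Shard entries carry no "http://" prefix;
    // query parameters are separated with '&'.
    static String buildUrl(String baseCore, List<String> shards, String query) {
        String shardList = String.join(",", shards);
        return "http://" + baseCore + "/select"
                + "?shards=" + shardList
                + "&indent=true"
                + "&q=" + query;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl(
                "10.4.x.x:8080/solr/20090407_2",
                List.of("10.4.x.x:8080/solr/20090407_2",
                        "10.4.x.x:8080/solr/20090408_3"),
                "japan"));
    }
}
```

In a real client the query string should also be URL-encoded (e.g. with java.net.URLEncoder) before being appended.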
Re: Searching on multi-core Solr
Attached is the solr.xml - note, the schema and solrconfig are located in core0 and all other cores point to the same core0 instance for the schema. Searches on individual cores work fine so I'm assuming the solr.xml is correct - I also get their status correctly. From the NullPointerException it seems it fails at, for (int i=resultSize-1; i>=0; i--) { ShardDoc shardDoc = (ShardDoc)queue.pop(); shardDoc.positionInResponse = i; // Need the toString() for correlation with other lists that must // be strings (like keys in highlighting, explain, etc) resultIds.put(shardDoc.id.toString(), shardDoc); } I've a unique field (required) in my documents so I'm not sure whether that can be null - could the doc itself be null - how? The same search on the same cores individually works fine. Not sure if there is a way to debug this. I'm not sure when I would get the Connection reset exception - would it be if indexing is happening at the same time at a high rate - would that cause problems? Thanks, -vivek On Thu, Apr 9, 2009 at 4:07 AM, Fergus McMenemie fer...@twig.me.uk wrote: Any help on this issue? Would distributed search on multi-core on the same Solr instance even work? Does it have to be different Solr instances altogether (separate shards)? As best I can tell this works fine for me. Multiple cores on the one machine. Very different schema and solrconfig.xml for each of the cores. Distributed searching using shards works fine. But I am using the trunk version. Perhaps you should post your solr.xml file. I'm kind of stuck at this point right now. Keep getting one of the two errors (when running distributed search - single searches work fine) as mentioned in this thread earlier. Thanks, -vivek On Wed, Apr 8, 2009 at 1:57 AM, vivek sar vivex...@gmail.com wrote: Thanks Fergus. I'm still having problems with multicore search. 
I tried the following with two cores (they both share the same schema and solrconfig.xml) on the same box in the same Solr instance, 1) http://10.4.x.x:8080/solr/core0/admin/ - works fine, shows all the cores in admin interface 2) http://10.4.x.x:8080/solr/admin/cores - works fine, see all the cores in xml 3) http://10.4.x.x:8080/solr/20090407_2/select?q=japan - works fine, gives me top 10 records 4) http://10.4.x.x:8080/solr/20090408_3/select?q=japan - works fine, gives me top 10 records 5) http://10.4.x.x:8080/solr/20090407_2/select?shards=10.4.x.x:8080/solr/20090407_2,10.4.x.x:8080/solr/20090408_3&indent=true&q=japan - this FAILS. I've seen two problems with this. a) When indexes are being committed I see, SEVERE: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:637) b) Other times I see this, SEVERE: java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432) at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at
Re: httpclient.ProtocolException using Solrj
I'm inserting 10K documents in a batch (using the addBeans method). I read somewhere in the wiki that it's better to use the same instance of SolrServer for better performance. Would MultiThreadedHttpConnectionManager help? How do I use it? I also wanted to know how I can use EmbeddedSolrServer - does my app need to be running in the same JVM as the Solr webapp? Thanks, -vivek 2009/4/9 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com: how many documents are you inserting? maybe you can create multiple instances of CommonsHttpSolrServer and upload in parallel On Thu, Apr 9, 2009 at 11:58 AM, vivek sar vivex...@gmail.com wrote: Thanks Shalin and Paul. I'm not using MultipartRequest. I do share the same SolrServer between two threads. I'm not using MultiThreadedHttpConnectionManager. I'm simply using CommonsHttpSolrServer to create the SolrServer. I've also tried StreamingUpdateSolrServer, which works much faster, but does throw a connection reset exception once in a while. Do I need to use MultiThreadedHttpConnectionManager? I couldn't find anything on it on the Wiki. I was also thinking of using EmbeddedSolrServer - in what case would I be able to use it? Does my application and the Solr web app need to run in the same JVM for this to work? How would I use the EmbeddedSolrServer? Thanks, -vivek On Wed, Apr 8, 2009 at 10:46 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: Vivek, do you share the same SolrServer instance between your two threads? If so, are you using the MultiThreadedHttpConnectionManager when creating the HttpClient instance? On Wed, Apr 8, 2009 at 10:13 PM, vivek sar vivex...@gmail.com wrote: single thread everything works fine. Two threads are fine too for a while and then all of a sudden the problem starts happening. 
I tried indexing using REST services as well (instead of Solrj), but with that too I get following error after a while, 2009-04-08 10:04:08,126 ERROR [indexerThreadPool-5] Indexer - indexData()- Failed to index java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) at java.net.SocketOutputStream.write(SocketOutputStream.java:136) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) at java.io.FilterOutputStream.write(FilterOutputStream.java:80) at org.apache.commons.httpclient.methods.StringRequestEntity.writeRequest(StringRequestEntity.java:145) at org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:499) at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114) at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096) at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398) at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) Note, I'm using simple lock type. I'd tried single type before that once caused index corruption so I switched to simple. Thanks, -vivek 2009/4/8 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com: do you see the same problem when you use a single thread? what is the version of SolrJ that you use? On Wed, Apr 8, 2009 at 1:19 PM, vivek sar vivex...@gmail.com wrote: Hi, Any ideas on this issue? I ran into this again - once it starts happening it keeps happening. One of the thread keeps failing. 
Here are my SolrServer settings, int socketTO = 0; int connectionTO = 100; int maxConnectionPerHost = 10; int maxTotalConnection = 50; boolean followRedirects = false; boolean allowCompression = true; int maxRetries = 1; Note, I'm using two threads to simultaneously write to the same index. org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated. at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:470) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242) at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48) at org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:57) Thanks, -vivek On Sat, Apr 4, 2009 at 1:07 AM, vivek sar vivex...@gmail.com wrote: Hi, I'm sending 15K records at once using Solrj (server.addBeans(...)) and have two threads writing to same index. One thread goes fine, but the second
Re: Custom DIH: FileDataSource with additional business logic?
FileDataSource is of type Reader, meaning getData() returns a java.io.Reader. That is not very suitable for you. Your best bet is to write a simple DataSource which returns an Iterator<Map<String,Object>> after reading the serialized objects. This is what JdbcDataSource does. Then you can use it with SqlEntityProcessor. On Thu, Apr 9, 2009 at 9:42 PM, Giovanni De Stefano giovanni.destef...@gmail.com wrote: Hello, here I am with another question. I am using DIH to index a DB. Additionally I also have to index some files containing Java serialized objects (and I cannot change this... :-( ). I currently have implemented a standalone Java app with the following features: 1) read all files from a given folder 2) deserialize the files into lists of items 3) convert the list of items into lists of SolrInputDocument(s) 4) post the lists of SolrInputDocument(s) to Solr All this is done using SolrJ. So far so good. I would like to use a DIH with a FileDataSource to do 1) and 4), and I would like to squeeze in my implementation for 2) and 3). Is this possible? Any hint? Thank you all in advance. Cheers, Giovanni -- --Noble Paul
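The reading-and-deserializing half of the approach Noble describes can be sketched with only java.io/java.nio. This is not the real DIH DataSource interface (extending org.apache.solr.handler.dataimport.DataSource and the SqlEntityProcessor wiring are omitted), and the "*.ser holds a serialized List of Maps" file layout is an assumption standing in for Giovanni's actual item type:

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class SerializedObjectSource {
    // Read every serialized file in a folder and expose the items as an
    // Iterator<Map<String,Object>>, the shape an entity processor consumes
    // row by row (mirroring what JdbcDataSource returns for SQL rows).
    @SuppressWarnings("unchecked")
    static Iterator<Map<String, Object>> getData(Path folder)
            throws IOException, ClassNotFoundException {
        List<Map<String, Object>> rows = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(folder, "*.ser")) {
            for (Path file : files) {
                try (ObjectInputStream in =
                        new ObjectInputStream(Files.newInputStream(file))) {
                    // Assumption: each file holds a serialized List of Maps;
                    // replace this cast with a conversion from your real item type.
                    rows.addAll((List<Map<String, Object>>) in.readObject());
                }
            }
        }
        return rows.iterator();
    }
}
```

With something like this as the data source, DIH would handle the posting to Solr (step 4), leaving only the item-to-field mapping in configuration.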
Re: httpclient.ProtocolException using Solrj
using a single request is the fastest http://wiki.apache.org/solr/Solrj#head-2046bbaba3759b6efd0e33e93f5502038c01ac65 I could index at the rate of 10,000 docs/sec using this and BinaryRequestWriter On Thu, Apr 9, 2009 at 10:36 PM, vivek sar vivex...@gmail.com wrote: I'm inserting 10K in a batch (using addBeans method). I read somewhere in the wiki that it's better to use the same instance of SolrServer for better performance. Would MultiThreadedConnectionManager help? How do I use it? I also wanted to know how can use EmbeddedSolrServer - does my app needs to be running in the same jvm with Solr webapp? Thanks, -vivek 2009/4/9 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com: how many documents are you inserting ? may be you can create multiple instances of CommonshttpSolrServer and upload in parallel On Thu, Apr 9, 2009 at 11:58 AM, vivek sar vivex...@gmail.com wrote: Thanks Shalin and Paul. I'm not using MultipartRequest. I do share the same SolrServer between two threads. I'm not using MultiThreadedHttpConnectionManager. I'm simply using CommonsHttpSolrServer to create the SolrServer. I've also tried StreamingUpdateSolrServer, which works much faster, but does throws connection reset exception once in a while. Do I need to use MultiThreadedHttpConnectionManager? I couldn't find anything on it on Wiki. I was also thinking of using EmbeddedSolrServer - in what case would I be able to use it? Does my application and the Solr web app need to run into the same JVM for this to work? How would I use the EmbeddedSolrServer? Thanks, -vivek On Wed, Apr 8, 2009 at 10:46 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: Vivek, do you share the same SolrServer instance between your two threads? If so, are you using the MultiThreadedHttpConnectionManager when creating the HttpClient instance? On Wed, Apr 8, 2009 at 10:13 PM, vivek sar vivex...@gmail.com wrote: single thread everything works fine. 
Two threads are fine too for a while and all the sudden problem starts happening. I tried indexing using REST services as well (instead of Solrj), but with that too I get following error after a while, 2009-04-08 10:04:08,126 ERROR [indexerThreadPool-5] Indexer - indexData()- Failed to index java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) at java.net.SocketOutputStream.write(SocketOutputStream.java:136) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) at java.io.FilterOutputStream.write(FilterOutputStream.java:80) at org.apache.commons.httpclient.methods.StringRequestEntity.writeRequest(StringRequestEntity.java:145) at org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:499) at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114) at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096) at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398) at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) Note, I'm using simple lock type. I'd tried single type before that once caused index corruption so I switched to simple. Thanks, -vivek 2009/4/8 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com: do you see the same problem when you use a single thread? what is the version of SolrJ that you use? On Wed, Apr 8, 2009 at 1:19 PM, vivek sar vivex...@gmail.com wrote: Hi, Any ideas on this issue? I ran into this again - once it starts happening it keeps happening. One of the thread keeps failing. 
Here are my SolrServer settings, int socketTO = 0; int connectionTO = 100; int maxConnectionPerHost = 10; int maxTotalConnection = 50; boolean followRedirects = false; boolean allowCompression = true; int maxRetries = 1; Note, I'm using two threads to simultaneously write to the same index. org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated. at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:470) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242) at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48) at
Re: httpclient.ProtocolException using Solrj
On Thu, Apr 9, 2009 at 10:36 PM, vivek sar vivex...@gmail.com wrote: I'm inserting 10K in a batch (using addBeans method). I read somewhere in the wiki that it's better to use the same instance of SolrServer for better performance. Would MultiThreadedConnectionManager help? How do I use it? If you are not passing your own HttpClient to the CommonsHttpSolrServer constructor then you do not need to worry about this. The default is the MultiThreadedConnectionManager. I also wanted to know how can use EmbeddedSolrServer - does my app needs to be running in the same jvm with Solr webapp? Actually with EmbeddedSolrServer, there is no Solr webapp. You add it as another jar in your own webapp. -- Regards, Shalin Shekhar Mangar.
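To make Shalin's EmbeddedSolrServer answer concrete, a sketch of the SolrJ 1.3/1.4-era setup; the paths are placeholders, the Solr and SolrJ jars must be on your classpath, and you should double-check the CoreContainer calls against your SolrJ version's documentation:

```java
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;

public class EmbeddedExample {
    public static void main(String[] args) throws Exception {
        // Point solr.solr.home at a directory containing solr.xml and conf/;
        // no servlet container is involved -- Solr runs inside this JVM.
        System.setProperty("solr.solr.home", "/path/to/solr/home");
        CoreContainer.Initializer initializer = new CoreContainer.Initializer();
        CoreContainer container = initializer.initialize();
        SolrServer server = new EmbeddedSolrServer(container, "core0"); // core name from solr.xml
        // server.addBeans(...), server.commit(), etc. work as with CommonsHttpSolrServer.
        container.shutdown();
    }
}
```

The trade-off: the embedded server avoids HTTP overhead entirely, but the index can then only be updated from that one JVM.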
Re: Any tips for indexing large amounts of data?
On Thu, Apr 9, 2009 at 8:51 PM, sunnyfr johanna...@gmail.com wrote: Hi Otis, How did you manage that? I've an 8-core machine with 8GB of RAM and an 11GB index for 14M docs, with 5 updates every 30 min, but my replication kills everything. My segments are merged too often, so I get full index replication and the caches are lost, and I've no idea what I can do now. Some help would be brilliant; btw, I'm using Solr 1.4. sunnyfr, whether the replication is full or delta, the caches are lost completely. You can think of partitioning the index into separate Solrs, updating one partition at a time, and performing distributed search. Thanks, Otis Gospodnetic wrote: Mike is right about the occasional slow-down, which appears as a pause and is due to large Lucene index segment merging. This should go away with newer versions of Lucene where this is happening in the background. That said, we just indexed about 20MM documents on a single 8-core machine with 8 GB of RAM, resulting in a nearly 20 GB index. The whole process took a little less than 10 hours - that's over 550 docs/second. The vanilla approach before some of our changes apparently required several days to index the same amount of data. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Mike Klaas mike.kl...@gmail.com To: solr-user@lucene.apache.org Sent: Monday, November 19, 2007 5:50:19 PM Subject: Re: Any tips for indexing large amounts of data? There should be some slowdown in larger indices as occasionally large segment merge operations must occur. However, this shouldn't really affect overall speed too much. You haven't really given us enough data to tell you anything useful. I would recommend trying to do the indexing via a webapp to eliminate all your code as a possible factor. Then, look for signs of what is happening when indexing slows. For instance, is Solr high on CPU, is the computer thrashing, etc? 
-Mike On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote: Hi, Thanks for answering this question a while back. I have made some of the suggestions you mentioned, i.e. not committing until I've finished indexing. What I am seeing, though, is that as the index gets larger (around 1GB), indexing is taking a lot longer. In fact it slows down to a crawl. Have you got any pointers as to what I might be doing wrong? Also, I was looking at using MultiCore Solr. Could this help in some way? Thank you Brendan On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote: : I would think you would see better performance by allowing auto commit : to handle the commit size instead of reopening the connection all the : time. if your goal is fast indexing, don't use autoCommit at all ... just index everything, and don't commit until you are completely done. autoCommitting will slow your indexing down (the benefit being that more results will be visible to searchers as you proceed) -Hoss -- View this message in context: http://www.nabble.com/Any-tips-for-indexing-large-amounts-of-data--tp13510670p22973205.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul
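Hoss's "index everything, commit once at the end" advice can be sketched in SolrJ like this (field names, batch size, and the row format are made up for illustration; requires the SolrJ jars):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
    public static void index(SolrServer server, Iterable<String[]> rows) throws Exception {
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (String[] row : rows) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", row[0]);    // hypothetical fields
            doc.addField("text", row[1]);
            batch.add(doc);
            if (batch.size() == 10000) {   // send in batches, but do NOT commit per batch
                server.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) server.add(batch);
        server.commit();                   // single commit when completely done
    }
}
```

With autoCommit disabled in solrconfig.xml, the only commit cost is paid once at the end of the run.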
Dictionary lookup possibilities
Hello, I'm struggling with some ideas; maybe somebody can help me with past experiences or tips. I have loaded a dictionary into a Solr index, using stemming and some stopwords in the analysis part of the schema. Each record holds a term from the dictionary, which can consist of multiple words. For some data analysis work, I want to send pieces of text (sentences, actually) to Solr to retrieve all possible dictionary terms that could occur. Ideally, I want to construct a query that only returns those Solr records for which all individual words in that record are matched. For instance, my dictionary holds the following terms: 1 - a b c d 2 - c d e 3 - a b 4 - a e f g h If I put the sentence [a b c d f g h] in as a query, I want to receive dictionary items 1 (matching all words a b c d) and 3 (matching words a b) as matches. I have been puzzling about how to do this. The only way I found so far was to construct an OR query with all words of the sentence in it. In this case, that would result in all dictionary items being returned. This would then require some code to go over the search results and analyse each of them (e.g. by using the highlight function) to kick out 'false' matches, but I am looking for a more efficient way. Is there a way to do this with Solr functionality, or do I need to start looking into the Lucene API? Any help would be much appreciated as usual! Thanks, bye, Jaco.
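The post-filtering step Jaco describes (run the broad OR query, then keep only dictionary terms whose every word occurs in the sentence) can be written in plain Java; class and method names here are illustrative only, and this ignores stemming for simplicity:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DictionaryMatcher {
    // Keep only dictionary terms whose every word appears in the sentence
    public static List<String> matches(List<String> dictTerms, String sentence) {
        Set<String> words = new HashSet<String>(
            Arrays.asList(sentence.toLowerCase().split("\\s+")));
        List<String> hits = new ArrayList<String>();
        for (String term : dictTerms) {
            boolean all = true;
            for (String w : term.toLowerCase().split("\\s+")) {
                if (!words.contains(w)) { all = false; break; }
            }
            if (all) hits.add(term);
        }
        return hits;
    }
}
```

For the example above, matching the sentence "a b c d f g h" against the four dictionary terms keeps items 1 and 3, as desired.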
logging
We built our own webapp that uses the Solr JARs. We used Apache Commons/log4j logging and just put log4j.properties in the Resin conf directory. The commons-logging and log4j jars were put in the Resin lib directory. Everything worked great and we got log files for our code only. So, I upgraded to Solr 1.4 and I no longer get my log file. I assume it has something to do with Solr 1.4 using SLF4J instead of JDK logging, but it seems like my code would be independent of that. Any ideas?
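One likely culprit, stated as an assumption rather than a confirmed diagnosis: Solr 1.4 bundles SLF4J with the JDK-logging binding, and if an slf4j-jdk14 binding jar ends up on the container classpath it can pull logging away from log4j. Swapping in the slf4j-log4j12 binding jar (and keeping the existing log4j.properties, for example something like the sketch below with hypothetical paths) usually routes everything back through log4j:

```properties
# Hypothetical log4j.properties sketch; file paths and jar names are assumptions.
# With Solr 1.4, ensure slf4j-log4j12-*.jar (not slf4j-jdk14-*.jar) is on the classpath
# so SLF4J routes Solr's and your own logging through log4j.
log4j.rootLogger=INFO, FILE
log4j.appender.FILE=org.apache.log4j.RollingFileAppender
log4j.appender.FILE.File=${resin.home}/log/myapp.log
log4j.appender.FILE.MaxFileSize=10MB
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d %-5p %c - %m%n
```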
Re: httpclient.ProtocolException using Solrj
Here is what I'm doing: SolrServer server = new StreamingUpdateSolrServer(url, 1000, 5); server.addBeans(dataList); // where dataList is a List<some_obj> with 10K elements. I run two threads, each using the same server object, and each calls server.addBeans(...). I'm able to get 50K/sec inserted that way, but the commit after that (after 100K records) takes 70 sec - which messes up the avg time. There are two problems here: 1) Once in a while I get a connection reset error, Caused by: java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:168) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78) at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106) Note: if I use CommonsHttpSolrServer I get the buffer error. 2) The commit takes way too long for every 100K (I may commit more often if this cannot be improved). I'm trying to fix the error problem, which happens only if I run two threads both calling addBeans (10K at a time). One thread works fine. I'm not sure how I can use the MultiThreadedHttpConnectionManager to create a StreamingUpdateSolrServer, or whether it would help. Thanks, -vivek 2009/4/9 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com: using a single request is the fastest http://wiki.apache.org/solr/Solrj#head-2046bbaba3759b6efd0e33e93f5502038c01ac65 I could index at the rate of 10,000 docs/sec using this and BinaryRequestWriter On Thu, Apr 9, 2009 at 10:36 PM, vivek sar vivex...@gmail.com wrote: I'm inserting 10K in a batch (using addBeans method). I read somewhere in the wiki that it's better to use the same instance of SolrServer for better performance. Would MultiThreadedConnectionManager help? How do I use it? I also wanted to know how can use EmbeddedSolrServer - does my app needs to be running in the same jvm with Solr webapp? 
Thanks, -vivek 2009/4/9 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com: how many documents are you inserting ? may be you can create multiple instances of CommonshttpSolrServer and upload in parallel On Thu, Apr 9, 2009 at 11:58 AM, vivek sar vivex...@gmail.com wrote: Thanks Shalin and Paul. I'm not using MultipartRequest. I do share the same SolrServer between two threads. I'm not using MultiThreadedHttpConnectionManager. I'm simply using CommonsHttpSolrServer to create the SolrServer. I've also tried StreamingUpdateSolrServer, which works much faster, but does throws connection reset exception once in a while. Do I need to use MultiThreadedHttpConnectionManager? I couldn't find anything on it on Wiki. I was also thinking of using EmbeddedSolrServer - in what case would I be able to use it? Does my application and the Solr web app need to run into the same JVM for this to work? How would I use the EmbeddedSolrServer? Thanks, -vivek On Wed, Apr 8, 2009 at 10:46 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: Vivek, do you share the same SolrServer instance between your two threads? If so, are you using the MultiThreadedHttpConnectionManager when creating the HttpClient instance? On Wed, Apr 8, 2009 at 10:13 PM, vivek sar vivex...@gmail.com wrote: single thread everything works fine. Two threads are fine too for a while and all the sudden problem starts happening. 
I tried indexing using REST services as well (instead of Solrj), but with that too I get following error after a while, 2009-04-08 10:04:08,126 ERROR [indexerThreadPool-5] Indexer - indexData()- Failed to index java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) at java.net.SocketOutputStream.write(SocketOutputStream.java:136) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) at java.io.FilterOutputStream.write(FilterOutputStream.java:80) at org.apache.commons.httpclient.methods.StringRequestEntity.writeRequest(StringRequestEntity.java:145) at org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:499) at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114) at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096) at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398) at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) Note, I'm using simple lock type. I'd tried single type before that
Re: How to get the solrhome location dynamically
: Subject: How to get the solrhome location dynamically Do you really want the Solr Home Dir, or do you want the instanceDir for a specific SolrCore? If you're using a solr.xml file (ie: one or many cores), you can get the instanceDir for each core from the CoreAdminHandler -- but it doesn't expose the actual SolrHomeDir where the solr.xml file was found. If you aren't using a solr.xml file (ie: you definitely only have one core) you can get the instance dir from the SystemInfoRequestHandler (/admin/system in the example configs) ... and since you aren't using a solr.xml file, the instance dir is the same as the Solr Home Dir. (Hmm... I suppose the CoreAdminHandler should probably expose metadata about the CoreContainer ... anyone want to work up a patch?) -Hoss
Question on Solr Distributed Search
Hi, I've another thread on multi-core distributed search, but just wanted to put a simple question here on distributed search to get some response. I've a search query, http://etsx19.co.com:8080/solr/20090409_9/select?q=usa - this returns 10 results. Now if I add the shards parameter to it, http://etsx19.co.com:8080/solr/20090409_9/select?shards=etsx19.co.com:8080/solr/20090409_9&q=usa - this fails with org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282) at .. at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:637) Caused by: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:473) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242) at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:422) .. 
Caused by: java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:168) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78) at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106) at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116) at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413) at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973) at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735) Attached is my solrconfig.xml. Do I need a special RequestHandler for sharding? I haven't been able to make any distributed search successfully. Any help is appreciated. Note: I'm indexing using Solrj - not sure if that makes any difference to the search part. Thanks, -vivek

<?xml version="1.0" ?>
<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->
<config>
  <!-- Used to specify an alternate directory to hold all index data other than the default ./data under the Solr home. If replication is in use, this should match the replication configuration. -->
  <!-- <dataDir>./solr/data</dataDir> -->
  <indexDefaults>
    <!-- Values here affect all index writers and act as a default unless overridden. -->
    <useCompoundFile>true</useCompoundFile>
    <mergeFactor>100</mergeFactor>
    <!-- <maxBufferedDocs>1</maxBufferedDocs> -->
    <ramBufferSizeMB>64</ramBufferSizeMB>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>1</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>1</commitLockTimeout>
    <lockType>single</lockType>
  </indexDefaults>
  <mainIndex>
    <!-- options specific to the main on-disk lucene index -->
    <useCompoundFile>true</useCompoundFile>
    <mergeFactor>100</mergeFactor>
    <!-- <maxBufferedDocs>1000</maxBufferedDocs> -->
    <!-- Tell Lucene when to flush documents to disk. Giving Lucene more memory for indexing means faster indexing at the cost of more RAM. If both ramBufferSizeMB and maxBufferedDocs is set, then Lucene will flush based on whichever limit is hit first. -->
    <ramBufferSizeMB>64</ramBufferSizeMB>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>1</maxFieldLength>
    <!-- If true, unlock any held write or commit locks on startup. This defeats the locking mechanism that allows multiple processes to safely access a lucene index, and should be used with care. -->
    <unlockOnStartup>true</unlockOnStartup>
    <lockType>single</lockType>
  </mainIndex>
  <!-- the
Re: Querying for multi-word synonyms
: Unfortunately, I have to use SynonymFilter at query time due to the nature : of the data I'm indexing. At index time, all I have are keywords but at : query time I will have some semantic markup which allows me to expand into : synonyms. I am wondering if any progress has been made into making query : time synonym searching work correctly. If not, does anyone have some ideas : for alternatives to using SynonymFilter? The only thing I can think of is to : simply create a custom BooleanQuery for the search and feed the synonyms in : manually, but then I am missing out on all the functionality of the dismax : query parser. Any ideas are appreciated, thanks very much. Fundamentally, the problem with multi-word query time synonyms is that the Analyzer only has a limited mechanism of conveying structure back to the caller (ie: the QueryParser) ... that mechanism being the term position -- you can indicate that terms can occupy the same single position, but not that sequences of terms can occupy the same position. You could write a query parser that used nested SpanNearQueries to create a directed acyclic graph of terms that you want to match in a sequence, where some branches of the graph contain more nodes than others, but you would need to do the synonym recognition while building up the query (and working with the DAG) ... but the current SynonymFilter works as part of the TokenStream. -Hoss
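Hoss's nested-SpanQuery idea can be sketched roughly like this against the Lucene 2.x-era API (the field name "body" and the synonym pair "big apple" / "nyc" are made-up examples; this needs the Lucene jars to compile):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanOrQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class SynonymSpanExample {
    // Match ("big apple" OR "nyc") followed by "pizza", in order:
    // one slot of the outer SpanNearQuery holds a branch with two nodes,
    // the other branch has one - the DAG shape described above.
    public static SpanQuery build() {
        SpanQuery bigApple = new SpanNearQuery(new SpanQuery[] {
            new SpanTermQuery(new Term("body", "big")),
            new SpanTermQuery(new Term("body", "apple"))
        }, 0, true);
        SpanQuery nyc = new SpanTermQuery(new Term("body", "nyc"));
        SpanQuery cityOr = new SpanOrQuery(new SpanQuery[] { bigApple, nyc });
        return new SpanNearQuery(new SpanQuery[] {
            cityOr,
            new SpanTermQuery(new Term("body", "pizza"))
        }, 0, true);
    }
}
```

The synonym recognition itself would still have to happen in the query parser while assembling these clauses, as the reply notes.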
Re: Question on Solr Distributed Search
I think I've found the reason behind the connection reset. Looking at the code, it points to QueryComponent.mergeIds(): resultIds.put(shardDoc.id.toString(), shardDoc); It looks like the doc unique id is returning null. I'm not sure how that is possible, as it's a required field. Right now my unique id is not stored (only indexed) - does it have to be stored for distributed search? HTTP Status 500 - null java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432) at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:637) On 
Thu, Apr 9, 2009 at 5:01 PM, vivek sar vivex...@gmail.com wrote: Hi, I've another thread on multi-core distributed search, but just wanted to put a simple question here on distributed search to get some response. I've a search query, http://etsx19.co.com:8080/solr/20090409_9/select?q=usa - returns with 10 result now if I add shards parameter to it, http://etsx19.co.com:8080/solr/20090409_9/select?shards=etsx19.co.com:8080/solr/20090409_9q=usa - this fails with org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282) at .. at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:637) Caused by: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:473) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242) at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:422) .. 
Caused by: java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:168) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78) at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106) at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116) at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413) at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973) at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735) Attached is my solrconfig.xml. Do I need a special RequestHandler for sharding? I haven't been able to make any distributed search successfully. Any help is appreciated. Note: I'm indexing using Solrj - not sure if that makes any difference to the search part. Thanks, -vivek
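The NullPointerException in QueryComponent.mergeIds() is consistent with the unique key field not being stored: in distributed search the coordinating node reads the unique key back from each shard's response to merge results. A minimal schema.xml fragment (the field name "id" is just an example) would look like:

```xml
<!-- Sketch: for distributed search the uniqueKey field should be stored,
     since the coordinator reads it back from each shard to merge results. -->
<field name="id" type="string" indexed="true" stored="true" required="true" />
<uniqueKey>id</uniqueKey>
```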
Re: Question on Solr Distributed Search
Just an update. I changed the schema to store the unique id field, but I still get the connection reset exception. I did notice that if there is no data in the core it returns 0 results (no exception), but if there is data and you search using the shards parameter, I get the connection reset exception. Can anyone provide some tips on where I can look for this problem? Apr 10, 2009 3:16:04 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:637) Caused by: 
org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:473) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242) at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:422) at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:395) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) ... 1 more Caused by: java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:168) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78) at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106) at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116) at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413) at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973) at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735) at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098) at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398) On Thu, Apr 9, 2009 at 6:51 PM, vivek 
sar vivex...@gmail.com wrote: I think the reason behind the connection reset is. Looking at the code it points to QueryComponent.mergeIds() resultIds.put(shardDoc.id.toString(), shardDoc); looks like the doc unique id is returning null. I'm not sure how is it possible as its a required field. Right my unique id is not stored (only indexed) - does it has to be stored for distributed search? HTTP Status 500 - null java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432) at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276) at
multiple tokenizers needed
I want to analyze text based on the pattern ';', then separate on whitespace, and since it is Japanese text also use the CJKAnalyzer + tokenizer. In short, I want to do: <analyzer class="org.apache.lucene.analysis.cjk.CJKAnalyzer"> <tokenizer class="solr.PatternTokenizerFactory" pattern=";" /> <tokenizer class="solr.WhitespaceTokenizerFactory" /> <tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer" /> </analyzer> Can anyone please tell me how to achieve this? The above syntax is not possible at all. -- View this message in context: http://www.nabble.com/multiple-tokenizers-needed-tp22982382p22982382.html Sent from the Solr - User mailing list archive at Nabble.com.
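For context: a Solr analyzer chain allows exactly one tokenizer; any further splitting has to happen in CharFilters or TokenFilters. One possible starting point (a sketch only, which does not reproduce CJK bigramming - that would need a single tokenizer or a custom TokenFilter that does all the work) is:

```xml
<!-- Sketch: one tokenizer splits on ';', then a filter breaks tokens further
     on whitespace and other delimiters. This does NOT give CJKTokenizer's
     bigram behaviour; treat it as a starting point, not a full solution. -->
<analyzer>
  <tokenizer class="solr.PatternTokenizerFactory" pattern=";" />
  <filter class="solr.WordDelimiterFilterFactory" />
</analyzer>
```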