Re: sorlj search
Tevfik Kiziloren wrote:
> Hi. I'm a newbie. I need to develop a JSF-based search application using
> Solr. I found nothing about the SolrJ implementation except the simple
> example on the Solr wiki. When I tried a console program similar to that
> example, I got the exception below. Where can I find extensive
> documentation about SolrJ?
>
> Thanks in advance.
> Tevfik Kızılören.
>
>     try {
>         String url = "http://localhost:8080/solr";
>         SolrServer server = new CommonsHttpSolrServer(url);
>
>         SolrQuery query = new SolrQuery();
>         query.setQuery("solr");
>         System.out.println(query.toString());
>         QueryResponse rsp = server.query(query);
>         System.out.println(rsp.getResults().toString());
>     } catch (IOException ex) {
>         Logger.getLogger(SolrclientView.class.getName()).log(Level.SEVERE, null, ex);
>     } catch (SolrServerException ex) {
>         Logger.getLogger(SolrclientView.class.getName()).log(Level.SEVERE, null, ex);
>     }
>
> ---
> solrclient.SolrclientView jButton1ActionPerformed
> SEVERE: null
> org.apache.solr.client.solrj.SolrServerException: Error executing query
>     at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
>     at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:96)
>     at solrclient.SolrclientView.jButton1ActionPerformed(SolrclientView.java:229)
>     at solrclient.SolrclientView.access$800(SolrclientView.java:32)
>     at solrclient.SolrclientView$4.actionPerformed(SolrclientView.java:135)
>     at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:1995)
>     at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2318)
>     at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:387)
>     at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:242)
>     at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(BasicButtonListener.java:236)
>     at java.awt.Component.processMouseEvent(Component.java:6038)
>     at javax.swing.JComponent.processMouseEvent(JComponent.java:3265)
>     at java.awt.Component.processEvent(Component.java:5803)
>     at java.awt.Container.processEvent(Container.java:2058)
>     at java.awt.Component.dispatchEventImpl(Component.java:4410)
>     at java.awt.Container.dispatchEventImpl(Container.java:2116)
>     at java.awt.Component.dispatchEvent(Component.java:4240)
>     at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4322)
>     at java.awt.LightweightDispatcher.processMouseEvent(Container.java:3986)
>     at java.awt.LightweightDispatcher.dispatchEvent(Container.java:3916)
>     at java.awt.Container.dispatchEventImpl(Container.java:2102)
>     at java.awt.Window.dispatchEventImpl(Window.java:2429)
>     at java.awt.Component.dispatchEvent(Component.java:4240)
>     at java.awt.EventQueue.dispatchEvent(EventQueue.java:599)
>     at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:273)
>     at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:183)
>     at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:173)
>     at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:168)
>     at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:160)
>     at java.awt.EventDispatchThread.run(EventDispatchThread.java:121)
> Caused by: org.apache.solr.common.SolrException: parsing error
>     at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:138)
>     at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:99)
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:317)
>     at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:84)
>     ... 29 more
> Caused by: java.lang.RuntimeException: this must be known type! not: int
>     at org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:217)
>     at org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:235)
>     at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:123)

Hi. Maybe your query string contains illegal values, or the problem may be on the server side - make sure that your Solr instance really is running at localhost:8080.
multiple tokenizers needed
I want to analyze text by splitting on the pattern ";" as well as on whitespace, and because it is Japanese text I also want to use the CJKAnalyzer/tokenizer. In short, I want to do: [the intended fieldType snippet was stripped by the archive]. Can anyone please tell me how to achieve this? The syntax above does not seem to be possible at all.
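A Solr analyzer chain allows only a single tokenizer, which is why the combination asked for above is not directly possible. One common workaround is to do all the splitting with one pattern-based tokenizer. The sketch below is hypothetical (field-type name and pattern are illustrative, not from the original mail):

```xml
<!-- hypothetical fieldType: one tokenizer splitting on both ";" and whitespace -->
<fieldType name="text_ja_split" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[;\s]+"/>
  </analyzer>
</fieldType>
```

CJK-aware tokenization (e.g. solr.CJKTokenizerFactory) cannot be chained after a second tokenizer in the same analyzer; a common alternative is two separate fields, one per analysis, populated via copyField.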
Re: Question on Solr Distributed Search
Just an update. I changed the schema to store the unique id field, but I still get the connection-reset exception. I did notice that if there is no data in the core then it returns 0 results (no exception), but if there is data and you search using the "shards" parameter I get the connection-reset exception. Can anyone give me some tips on where to look for this problem?

Apr 10, 2009 3:16:04 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:637)
Caused by: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:473)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242)
    at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:422)
    at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:395)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
    ... 1 more
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:168)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
    at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
    at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
    at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
    at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
    at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)

On Thu, Apr 9, 2009 at 6:51 PM, vivek sar wrote:
> I think I've found the reason behind the "connection reset". Looking at the
> code it points to QueryComponent.mergeIds():
>
>     resultIds.put(shardDoc.id.toString(), shardDoc);
>
> It looks like the doc unique id is returning null. I'm not sure how that is
> possible as it's a required field. Right now my unique id is not stored
> (only indexed) - does it have to be stored for distributed search?
>
> HTTP Status 500 - null java.lang.NullPointerException
>     at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432)
>     at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276)
>     at org.apache.s
Re: Question on Solr Distributed Search
I think I've found the reason behind the "connection reset". Looking at the code it points to QueryComponent.mergeIds():

    resultIds.put(shardDoc.id.toString(), shardDoc);

It looks like the doc unique id is returning null. I'm not sure how that is possible as it's a required field. Right now my unique id is not stored (only indexed) - does it have to be stored for distributed search?

HTTP Status 500 - null java.lang.NullPointerException
    at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432)
    at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:637)

On Thu, Apr 9, 2009 at 5:01 PM, vivek sar wrote:
> Hi,
>
> I have another thread on multi-core distributed search, but just wanted to
> put a simple question here on distributed search to get some response.
> I have a search query,
>
> http://etsx19.co.com:8080/solr/20090409_9/select?q=usa
> - returns 10 results
>
> Now if I add the "shards" parameter to it,
>
> http://etsx19.co.com:8080/solr/20090409_9/select?shards=etsx19.co.com:8080/solr/20090409_9&q=usa
> - this fails with
>
> org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset
> org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset
>     at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282)
>     ..
>     at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>     at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>     at java.lang.Thread.run(Thread.java:637)
> Caused by: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:473)
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242)
>     at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:422)
>     ..
> Caused by: java.net.SocketException: Connection reset
>     at java.net.SocketInputStream.read(SocketInputStream.java:168)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>     at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
>     at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
>     at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
>     at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
>     at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
>     at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
>
> Attached is my solrconfig.xml. Do I need a special RequestHandler for
> sharding? I haven't been able to make any distributed search work
> successfully. Any help is appreciated.
>
> Note: I'm indexing using Solrj - not sure if that makes any difference
> to the search part.
>
> Thanks,
> -vivek
Re: Additive filter queries
: Right now a document looks like this:
:
:   1598548
:   12545
:   Adidas
:   1, 2, 3, 4, 5, 6, 7
:   AA, A, B, W, WW
:   Brown
:
: If we went down a level, it could look like...
:
:   1598548
:   12545
:   654641654684
:   Adidas
:   1
:   AA
:   Brown

If you want results at the "product" level then you don't have to have one *doc* per legal size+width pair ... you just need one *term* per valid size+width pair:

  1, 2, 3, 4, 5, 6, 7
  AA, A, B, W, WW
  1_W 2_W 3_B 3_W 4_AA 4_A 4_B 4_W 4_WW 5_W 5_ 6_ 7_

A search for size 4 clogs would look like...

  q=clogs&fq=size:4&facet.field=opts&f.opts.facet.prefix=4_

...and the facet counts for "opts" would tell me what widths were available (and how many). For completeness you typically want to index the pairs in both directions (1_W and W_1 ... typically in separate fields) so the user can filter by either option first ... for something like size+color this makes sense, but I'm guessing with shoes no one expects to narrow by "width" until they've narrowed by size first.

-Hoss
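The pair-term trick Hoss describes can be sketched in plain Java. This is a hypothetical helper, not from the thread: it crosses every size with every width for illustration, whereas a real indexer would emit only the pairs actually in stock for the product.

```java
import java.util.ArrayList;
import java.util.List;

public class OptionPairs {

    // Build one indexable "opts" term per size+width pair, e.g. "4_AA".
    // A real indexer would only emit pairs that are valid for the product.
    public static List<String> pairTerms(List<String> sizes, List<String> widths) {
        List<String> terms = new ArrayList<String>();
        for (String size : sizes) {
            for (String width : widths) {
                // size-first, so f.opts.facet.prefix=4_ finds the widths for size 4
                terms.add(size + "_" + width);
            }
        }
        return terms;
    }
}
```

Indexing the reversed form (W_1 alongside 1_W) into a second field, as suggested above, is the same loop with the concatenation swapped.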
Re: Querying for multi-word synonyms
: Unfortunately, I have to use SynonymFilter at query time due to the nature
: of the data I'm indexing. At index time, all I have are keywords but at
: query time I will have some semantic markup which allows me to expand into
: synonyms. I am wondering if any progress has been made into making query
: time synonym searching work correctly. If not, does anyone have some ideas
: for alternatives to using SynonymFilter? The only thing I can think of is to
: simply create a custom BooleanQuery for the search and feed the synonyms in
: manually, but then I am missing out on all the functionality of the dismax
: query parser. Any ideas are appreciated, thanks very much.

Fundamentally the problem with multi-word query-time synonyms is that the Analyzer only has a limited mechanism for conveying "structure" back to the caller (ie: the QueryParser) ... that mechanism being the "term position" -- you can indicate that terms can occupy the same single position, but not that sequences of terms can occupy the same position.

You could write a query parser that used nested SpanNearQueries to create a directed acyclic graph of terms that you want to match in a sequence, where some "branches" of the graph contain more nodes than others, but you would need to do the synonym recognition while building up the query (and working with the DAG) ... whereas the current SynonymFilter works as part of the TokenStream.

-Hoss
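To make the "do the synonym recognition while building up the query" idea concrete, here is a hypothetical, stdlib-only sketch of just the expansion step (the synonym map and its contents are invented for illustration). A real parser would then turn each variant into a phrase query, or into the nested SpanNearQueries described above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SynonymExpander {

    // Expand a whitespace-separated query phrase into every variant produced
    // by substituting (possibly multi-word) synonyms for individual tokens.
    public static List<String> expand(String phrase, Map<String, List<String>> synonyms) {
        List<String> variants = new ArrayList<String>();
        variants.add("");
        for (String token : phrase.split("\\s+")) {
            List<String> alts = new ArrayList<String>();
            alts.add(token);
            if (synonyms.containsKey(token)) {
                alts.addAll(synonyms.get(token));
            }
            // cross every variant built so far with every alternative for this token
            List<String> next = new ArrayList<String>();
            for (String prefix : variants) {
                for (String alt : alts) {
                    next.add(prefix.length() == 0 ? alt : prefix + " " + alt);
                }
            }
            variants = next;
        }
        return variants;
    }
}
```

Because the expansion happens while the query is being built, multi-word substitutions stay intact instead of being flattened into single term positions.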
Question on Solr Distributed Search
Hi,

I have another thread on multi-core distributed search, but just wanted to put a simple question here on distributed search to get some response. I have a search query,

  http://etsx19.co.com:8080/solr/20090409_9/select?q=usa
  - returns 10 results

Now if I add the "shards" parameter to it,

  http://etsx19.co.com:8080/solr/20090409_9/select?shards=etsx19.co.com:8080/solr/20090409_9&q=usa
  - this fails with

org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset
org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282)
    ..
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:637)
Caused by: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:473)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242)
    at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:422)
    ..
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:168)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
    at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
    at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
    at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
    at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)

Attached is my solrconfig.xml. Do I need a special RequestHandler for sharding? I haven't been able to make any distributed search work successfully. Any help is appreciated.

Note: I'm indexing using Solrj - not sure if that makes any difference to the search part.

Thanks,
-vivek

[attached solrconfig.xml: index settings plus a dismax request handler with field boosts (text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4); the XML markup was lost in the archive]
Re: How to get the solrhome location dynamically
: Subject: How to get the solrhome location dynamically

Do you really want the Solr home dir, or do you want the instanceDir for a specific SolrCore?

If you're using a solr.xml file (ie: one or many cores), you can get the instanceDir for each core from the CoreAdminHandler -- but it doesn't expose the actual Solr home dir where the solr.xml file was found.

If you aren't using a solr.xml file (ie: you definitely only have one core) you can get the instance dir from the SystemInfoRequestHandler (/admin/system in the example configs) ... and since you aren't using a solr.xml file, the instance dir is the same as the Solr home dir.

(Hmmm... I suppose the CoreAdminHandler should probably expose metadata about the CoreContainer ... anyone want to work up a patch?)

-Hoss
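For reference, the CoreAdminHandler status request mentioned above is a plain HTTP call; the host, port, and handler path below are the defaults from the example setup and may differ in a real deployment:

```shell
# STATUS lists every core along with its instanceDir (hypothetical default URL)
curl "http://localhost:8983/solr/admin/cores?action=STATUS"
```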
Re: httpclient.ProtocolException using Solrj
Here is what I'm doing,

    SolrServer server = new StreamingUpdateSolrServer(url, 1000, 5);
    server.addBeans(dataList); // where dataList is a List with 10K elements

I run two threads, each using the same server object, and each calls server.addBeans(...). I'm able to get 50K/sec inserted that way, but the commit after that (after 100K records) takes 70 sec - which messes up the average time. There are two problems here:

1) Once in a while I get a "connection reset" error,

Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:168)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
    at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)

Note: if I use CommonsHttpSolrServer I get the buffer error.

2) The commit takes way too long for every 100K (I may commit more often if this cannot be improved).

I'm trying to fix the error problem, which happens only if I run two threads both calling addBeans (10K at a time). One thread works fine. I'm not sure how I can use the MultiThreadedConnectionManager to create a StreamingUpdateSolrServer, or whether it would help.

Thanks,
-vivek

2009/4/9 Noble Paul നോബിള്‍ नोब्ळ्:
> using a single request is the fastest
>
> http://wiki.apache.org/solr/Solrj#head-2046bbaba3759b6efd0e33e93f5502038c01ac65
>
> I could index at the rate of 10,000 docs/sec using this and BinaryRequestWriter
>
> On Thu, Apr 9, 2009 at 10:36 PM, vivek sar wrote:
>> I'm inserting 10K in a batch (using the addBeans method). I read somewhere
>> in the wiki that it's better to use the same instance of SolrServer
>> for better performance. Would MultiThreadedConnectionManager help? How
>> do I use it?
>>
>> I also wanted to know how I can use EmbeddedSolrServer - does my app
>> need to be running in the same JVM with the Solr webapp?
>>
>> Thanks,
>> -vivek
>>
>> 2009/4/9 Noble Paul നോബിള്‍ नोब्ळ्:
>>> how many documents are you inserting?
>>> maybe you can create multiple instances of CommonsHttpSolrServer and
>>> upload in parallel
>>>
>>> On Thu, Apr 9, 2009 at 11:58 AM, vivek sar wrote:
>>>> Thanks Shalin and Paul.
>>>>
>>>> I'm not using MultipartRequest. I do share the same SolrServer between
>>>> two threads. I'm not using MultiThreadedHttpConnectionManager. I'm
>>>> simply using CommonsHttpSolrServer to create the SolrServer. I've also
>>>> tried StreamingUpdateSolrServer, which works much faster, but does
>>>> throw a "connection reset" exception once in a while.
>>>>
>>>> Do I need to use MultiThreadedHttpConnectionManager? I couldn't find
>>>> anything on it on the wiki.
>>>>
>>>> I was also thinking of using EmbeddedSolrServer - in what case would I
>>>> be able to use it? Does my application and the Solr web app need to
>>>> run in the same JVM for this to work? How would I use the
>>>> EmbeddedSolrServer?
>>>>
>>>> Thanks,
>>>> -vivek
>>>>
>>>> On Wed, Apr 8, 2009 at 10:46 PM, Shalin Shekhar Mangar wrote:
>>>>> Vivek, do you share the same SolrServer instance between your two threads?
>>>>> If so, are you using the MultiThreadedHttpConnectionManager when creating
>>>>> the HttpClient instance?
>>>>>
>>>>> On Wed, Apr 8, 2009 at 10:13 PM, vivek sar wrote:
>>>>>> single thread everything works fine. Two threads are fine too for a
>>>>>> while and all of a sudden the problem starts happening.
>>>>>>
>>>>>> I tried indexing using REST services as well (instead of Solrj), but
>>>>>> with that too I get the following error after a while,
>>>>>>
>>>>>> 2009-04-08 10:04:08,126 ERROR [indexerThreadPool-5] Indexer -
>>>>>> indexData() -> Failed to index
>>>>>> java.net.SocketException: Broken pipe
>>>>>>     at java.net.SocketOutputStream.socketWrite0(Native Method)
>>>>>>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>>>>>>     at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>>>>>>     at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>>>>>>     at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
>>>>>>     at org.apache.commons.httpclient.methods.StringRequestEntity.writeRequest(StringRequestEntity.java:145)
>>>>>>     at org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:499)
>>>>>>     at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114)
>>>>>>     at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
>>>>>>     at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
>>>>>>     at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
>>>>>>     at org.apache.
logging
We built our own webapp that uses the Solr JARs. We used Apache Commons Logging/log4j and just put log4j.properties in the Resin conf directory. The commons-logging and log4j JARs were put in the Resin lib directory. Everything worked great and we got log files for our code only. Then I upgraded to Solr 1.4 and I no longer get my log file. I assume it has something to do with Solr 1.4 using SLF4J instead of JDK logging, but it seems like my code should be independent of that. Any ideas?
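For reference, the kind of log4j.properties described above might look like the following; the appender name, file path, and pattern are illustrative, not taken from the original message:

```properties
# hypothetical minimal log4j.properties for the webapp's own logging
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=logs/webapp.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d %-5p %c - %m%n
```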
Re: Using ExtractingRequestHandler to index a large PDF ~solved
On Apr 6, 2009, at 10:16 AM, Fergus McMenemie wrote:

> Hmmm. Not sure how this all hangs together, but editing my solrconfig.xml
> as follows sorted the problem:
>
>     multipartUploadLimitInKB="2048" />
> to
>     multipartUploadLimitInKB="20048" />

We should document this on the wiki or in the config, if it isn't already.

> Also, my initial report of the issue was misled by the log messages. The
> mention of "oceania.pdf" refers to a previous successful Tika extract.
> There is no mention in the logs of the filename that was rejected, or any
> information that would help me identify it!

We should fix this so it at least spits out a meaningful message. Can you open a JIRA?

> Regards Fergus.
>
>> Sorry if this is a FAQ; I suspect it could be. But how do I work around
>> the following:
>>
>> INFO: [] webapp=/apache-solr-1.4-dev path=/update/extract params={ext.def.fl=text&ext.literal.id=factbook/reference_maps/pdf/oceania.pdf} status=0 QTime=318
>> Apr 2, 2009 11:17:46 AM org.apache.solr.common.SolrException log
>> SEVERE: org.apache.commons.fileupload.FileUploadBase$SizeLimitExceededException: the request was rejected because its size (4585774) exceeds the configured maximum (2097152)
>>     at org.apache.commons.fileupload.FileUploadBase$FileItemIteratorImpl.<init>(FileUploadBase.java:914)
>>     at org.apache.commons.fileupload.FileUploadBase.getItemIterator(FileUploadBase.java:331)
>>     at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:349)
>>     at org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)
>>     at org.apache.solr.servlet.MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:343)
>>     at org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:396)
>>     at org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:114)
>>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
>>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
>>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
>>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
>>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
>>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
>>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
>>
>> Although the PDF is big, it contains very little text; it is a map.
>> "java -jar solr/lib/tika-0.3.jar -g" appears to have no bother with it.
>>
>> Fergus...

--
===============================================================
Fergus McMenemie               Email:fer...@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021
Unix/Mac/Intranets             Analyst Programmer
===============================================================

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
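The attribute being edited in the message above (its enclosing XML element was stripped by the archive) lives on the requestParsers element inside requestDispatcher in solrconfig.xml. Roughly, and with enableRemoteStreaming shown only as a placeholder that may differ in your config:

```xml
<requestDispatcher handleSelect="true">
  <!-- multipartUploadLimitInKB caps the size of uploaded files, e.g. PDFs
       sent to /update/extract; raise it to accept larger documents -->
  <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="20048" />
</requestDispatcher>
```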
Dictionary lookup possibilities
Hello,

I'm struggling with some ideas; maybe somebody can help me with past experiences or tips. I have loaded a dictionary into a Solr index, using stemming and some stopwords in the analysis part of the schema. Each record holds a term from the dictionary, which can consist of multiple words.

For some data-analysis work, I want to send pieces of text (sentences, actually) to Solr to retrieve all dictionary terms that could occur in them. Ideally, I want to construct a query that only returns those Solr records for which all individual words in the record are matched. For instance, my dictionary holds the following terms:

  1 - a b c d
  2 - c d e
  3 - a b
  4 - a e f g h

If I put the sentence [a b c d f g h] in as a query, I want to receive dictionary items 1 (matching all words a b c d) and 3 (matching words a b) as matches.

I have been puzzling over how to do this. The only way I have found so far is to construct an OR query with all words of the sentence in it. In this case, that would result in all dictionary items being returned. This would then require some code to go over the search results and analyse each of them (i.e. by using the highlight function) to kick out 'false' matches, but I am looking for a more efficient way. Is there a way to do this with Solr functionality, or do I need to start looking into the Lucene API?

Any help would be much appreciated as usual! Thanks, bye,

Jaco.
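Outside of Solr, the desired "all words of the record must be matched" rule is just a subset test, which the following hypothetical, stdlib-only sketch makes explicit (a real solution would still need Solr/Lucene to do this efficiently at scale, and to apply the same stemming/stopword analysis to both sides):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DictionaryMatcher {

    // Return every dictionary term whose words ALL occur in the sentence.
    public static List<String> match(List<String> dictionary, String sentence) {
        Set<String> sentenceWords = new HashSet<String>(Arrays.asList(sentence.split("\\s+")));
        List<String> hits = new ArrayList<String>();
        for (String term : dictionary) {
            // subset test: each word of the term must appear in the sentence
            if (sentenceWords.containsAll(Arrays.asList(term.split("\\s+")))) {
                hits.add(term);
            }
        }
        return hits;
    }
}
```

With the dictionary from the mail and the sentence [a b c d f g h], this returns terms 1 ("a b c d") and 3 ("a b"), matching the expected result.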
Re: Any tips for indexing large amounts of data?
On Thu, Apr 9, 2009 at 8:51 PM, sunnyfr wrote:
> Hi Otis,
> How did you manage that? I have an 8-core machine with 8GB of RAM and an
> 11GB index for 14M docs, with 5 updates every 30 min, but my replication
> kills everything. My segments are merged too often, so a full index is
> replicated and the caches are lost, and I have no idea what I can do now.
> Some help would be brilliant.
> Btw, I'm using Solr 1.4.

sunnyfr, whether the replication is full or delta, the caches are lost completely. You can think of partitioning the index into separate Solrs, updating one partition at a time, and performing distributed search.

> Thanks,
>
> Otis Gospodnetic wrote:
>>
>> Mike is right about the occasional slow-down, which appears as a pause and
>> is due to large Lucene index segment merging. This should go away with
>> newer versions of Lucene where this happens in the background.
>>
>> That said, we just indexed about 20MM documents on a single 8-core machine
>> with 8 GB of RAM, resulting in a nearly 20 GB index. The whole process took
>> a little less than 10 hours - that's over 550 docs/second. The vanilla
>> approach before some of our changes apparently required several days to
>> index the same amount of data.
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>> ----- Original Message ----
>> From: Mike Klaas
>> To: solr-user@lucene.apache.org
>> Sent: Monday, November 19, 2007 5:50:19 PM
>> Subject: Re: Any tips for indexing large amounts of data?
>>
>> There should be some slowdown in larger indices as occasionally large
>> segment merge operations must occur. However, this shouldn't really
>> affect overall speed too much.
>>
>> You haven't really given us enough data to tell you anything useful.
>> I would recommend trying to do the indexing via a webapp to eliminate
>> all your code as a possible factor. Then, look for signs of what is
>> happening when indexing slows. For instance, is Solr high in CPU, is
>> the computer thrashing, etc?
>>
>> -Mike
>>
>> On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote:
>>
>>> Hi,
>>>
>>> Thanks for answering this question a while back. I have made some
>>> of the suggestions you mentioned, i.e. not committing until I've
>>> finished indexing. What I am seeing though, is that as the index gets
>>> larger (around 1GB), indexing is taking a lot longer. In fact it
>>> slows down to a crawl. Have you got any pointers as to what I might
>>> be doing wrong?
>>>
>>> Also, I was looking at using MultiCore Solr. Could this help in
>>> some way?
>>>
>>> Thank you
>>> Brendan
>>>
>>> On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:
>>>
>>>> : I would think you would see better performance by allowing auto commit
>>>> : to handle the commit size instead of reopening the connection all the
>>>> : time.
>>>>
>>>> if your goal is "fast" indexing, don't use autoCommit at all ...
>>>> just index everything, and don't commit until you are completely done.
>>>>
>>>> autoCommitting will slow your indexing down (the benefit being that
>>>> more results will be visible to searchers as you proceed)
>>>>
>>>> -Hoss

--
--Noble Paul
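Hoss's advice above (no autoCommit, fewer flushes and merges during a bulk load) maps onto a few solrconfig.xml knobs. The values below are illustrative only, not recommendations from the thread:

```xml
<indexDefaults>
  <!-- larger RAM buffer = fewer segment flushes during bulk indexing -->
  <ramBufferSizeMB>256</ramBufferSizeMB>
  <!-- higher mergeFactor = fewer, later merges (at the cost of more segments) -->
  <mergeFactor>25</mergeFactor>
</indexDefaults>
<!-- leave <autoCommit> out entirely and commit once at the end of the run -->
```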
Re: httpclient.ProtocolException using Solrj
On Thu, Apr 9, 2009 at 10:36 PM, vivek sar wrote: > I'm inserting 10K in a batch (using the addBeans method). I read somewhere > in the wiki that it's better to use the same instance of SolrServer > for better performance. Would MultiThreadedConnectionManager help? How > do I use it? > If you are not passing your own HttpClient to the CommonsHttpSolrServer constructor then you do not need to worry about this. The default is the MultiThreadedHttpConnectionManager. > > I also wanted to know how I can use EmbeddedSolrServer - does my app > need to be running in the same JVM as the Solr webapp? > Actually, with EmbeddedSolrServer there is no Solr webapp. You add it as another jar in your own webapp. -- Regards, Shalin Shekhar Mangar.
Re: httpclient.ProtocolException using Solrj
using a single request is the fastest http://wiki.apache.org/solr/Solrj#head-2046bbaba3759b6efd0e33e93f5502038c01ac65 I could index at the rate of 10,000 docs/sec using this and BinaryRequestWriter On Thu, Apr 9, 2009 at 10:36 PM, vivek sar wrote: > I'm inserting 10K in a batch (using the addBeans method). I read somewhere > in the wiki that it's better to use the same instance of SolrServer > for better performance. Would MultiThreadedConnectionManager help? How > do I use it? > > I also wanted to know how I can use EmbeddedSolrServer - does my app > need to be running in the same JVM as the Solr webapp? > > Thanks, > -vivek > > 2009/4/9 Noble Paul നോബിള് नोब्ळ् : >> how many documents are you inserting? >> maybe you can create multiple instances of CommonsHttpSolrServer and >> upload in parallel >> >> >> On Thu, Apr 9, 2009 at 11:58 AM, vivek sar wrote: >>> Thanks Shalin and Paul. >>> >>> I'm not using MultipartRequest. I do share the same SolrServer between >>> two threads. I'm not using MultiThreadedHttpConnectionManager. I'm >>> simply using CommonsHttpSolrServer to create the SolrServer. I've also >>> tried StreamingUpdateSolrServer, which works much faster, but does >>> throw a "connection reset" exception once in a while. >>> >>> Do I need to use MultiThreadedHttpConnectionManager? I couldn't find >>> anything on it on the Wiki. >>> >>> I was also thinking of using EmbeddedSolrServer - in what case would I >>> be able to use it? Does my application and the Solr web app need to >>> run in the same JVM for this to work? How would I use the >>> EmbeddedSolrServer? >>> >>> Thanks, >>> -vivek >>> >>> >>> On Wed, Apr 8, 2009 at 10:46 PM, Shalin Shekhar Mangar >>> wrote: Vivek, do you share the same SolrServer instance between your two threads? If so, are you using the MultiThreadedHttpConnectionManager when creating the HttpClient instance? On Wed, Apr 8, 2009 at 10:13 PM, vivek sar wrote: > single thread everything works fine. 
Two threads are fine too for a > while and all the sudden problem starts happening. > > I tried indexing using REST services as well (instead of Solrj), but > with that too I get following error after a while, > > 2009-04-08 10:04:08,126 ERROR [indexerThreadPool-5] Indexer - > indexData()-> Failed to index > java.net.SocketException: Broken pipe > at java.net.SocketOutputStream.socketWrite0(Native Method) > at > java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) > at java.net.SocketOutputStream.write(SocketOutputStream.java:136) > at > java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) > at java.io.FilterOutputStream.write(FilterOutputStream.java:80) > at > org.apache.commons.httpclient.methods.StringRequestEntity.writeRequest(StringRequestEntity.java:145) > at > org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:499) > at > org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114) > at > org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096) > at > org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398) > at > org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) > at > org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) > at > org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) > > > Note, I'm using "simple" lock type. I'd tried "single" type before > that once caused index corruption so I switched to "simple". > > Thanks, > -vivek > > 2009/4/8 Noble Paul നോബിള് नोब्ळ् : > > do you see the same problem when you use a single thread? > > > > what is the version of SolrJ that you use? > > > > > > > > On Wed, Apr 8, 2009 at 1:19 PM, vivek sar wrote: > >> Hi, > >> > >> Any ideas on this issue? I ran into this again - once it starts > >> happening it keeps happening. One of the thread keeps failing. 
Here > >> are my SolrServer settings, > >> > >> int socketTO = 0; > >> int connectionTO = 100; > >> int maxConnectionPerHost = 10; > >> int maxTotalConnection = 50; > >> boolean followRedirects = false; > >> boolean allowCompression = true; > >> int maxRetries = 1; > >> > >> Note, I'm using two threads to simultaneously write to the same index. > >> > >> org.apache.solr.client.solrj.SolrServerException: > >> org.apache.commons.httpclient.ProtocolException: Unbuffered entity > >> enclosing request can not be repeated. > >> at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.re
Re: Custom DIH: FileDataSource with additional business logic?
FileDataSource is of type Reader, meaning getData() returns a java.io.Reader. That is not very suitable for you. Your best bet is to write a simple DataSource which returns an Iterator<Map<String, Object>> after reading the serialized objects. This is what JdbcDataSource does. Then you can use it with SqlEntityProcessor. On Thu, Apr 9, 2009 at 9:42 PM, Giovanni De Stefano wrote: > Hello, > > here I am with another question. > > I am using DIH to index a DB. Additionally I also have to index some files > containing Java serialized objects (and I cannot change this... :-( ). > > I currently have implemented a standalone Java app with the following > features: > > 1) read all files from a given folder > 2) deserialize the files into lists of items > 3) convert the lists of items into lists of SolrInputDocument(s) > 4) post the lists of SolrInputDocument(s) to Solr > > All this is done using SolrJ. So far so good. > > I would like to use a DIH with a FileDataSource to do 1) and 4), and I would > like to "squeeze" in my implementation for 2) and 3). > > Is this possible? Any hint? > > Thank you all in advance. > > Cheers, > Giovanni > -- --Noble Paul
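The deserialization step Noble describes can be sketched with stdlib Java alone. This is a minimal sketch, not DIH code: a real custom DataSource would extend org.apache.solr.handler.dataimport.DataSource and return this iterator from getData(); the Item class and the "id"/"title" field names below are hypothetical stand-ins for whatever was actually serialized.

```java
import java.io.*;
import java.nio.file.Files;
import java.util.*;

// Minimal sketch: deserialize every file in a folder and expose the items
// as an Iterator<Map<String, Object>> -- the same row shape JdbcDataSource
// hands to its entity processor. Item and its fields are hypothetical.
public class SerializedRowReader {

    // Hypothetical serialized item; the real class must match whatever was
    // originally written with ObjectOutputStream.
    public static class Item implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;
        final String title;
        Item(String id, String title) { this.id = id; this.title = title; }
    }

    /** Read all files in dir and flatten their items into DIH-style rows. */
    static Iterator<Map<String, Object>> readRows(File dir) throws Exception {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (File f : Objects.requireNonNull(dir.listFiles())) {
            try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(f))) {
                @SuppressWarnings("unchecked")
                List<Item> items = (List<Item>) in.readObject();
                for (Item item : items) {
                    Map<String, Object> row = new HashMap<>();
                    row.put("id", item.id);
                    row.put("title", item.title);
                    rows.add(row);
                }
            }
        }
        return rows.iterator();
    }

    public static void main(String[] args) throws Exception {
        // Round-trip demo: serialize one batch to a temp dir, read it back as rows.
        File dir = Files.createTempDirectory("dih-demo").toFile();
        List<Item> batch = new ArrayList<>();
        batch.add(new Item("1", "hello"));
        try (ObjectOutputStream out = new ObjectOutputStream(
                new FileOutputStream(new File(dir, "batch1.ser")))) {
            out.writeObject(batch);
        }
        Iterator<Map<String, Object>> rows = readRows(dir);
        while (rows.hasNext()) {
            System.out.println(rows.next().get("title"));
        }
    }
}
```

With the rows in that shape, an SqlEntityProcessor-style mapping of row keys to schema fields should apply unchanged.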
Re: Access HTTP headers from custom request handler
Well, unfortunately, no. Solr cannot assume that the request will always come over HTTP (think of EmbeddedSolrServer), so it assumes that there are only parameters. Your best bet is to modify SolrDispatchFilter to read the params and set them in the SolrRequest object, or you can just write a Filter before SolrDispatchFilter and set the current HTTP request object into a ThreadLocal. On Thu, Apr 9, 2009 at 6:27 PM, Giovanni De Stefano wrote: > Hello all, > > we are writing a custom request handler and we need to implement some > business logic according to some HTTP headers. > > I see there is no easy way to access HTTP headers from the request handler. > > Moreover it seems to me that the HTTPServletness is lost way before the > custom request handler comes into the game. > > Is there any way to access HTTP headers from within the request handler? > > Thanks, > Giovanni > -- --Noble Paul
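The second option Noble suggests (a Filter ahead of SolrDispatchFilter that parks the request in a ThreadLocal) boils down to the pattern below. This is a stdlib-only sketch: the javax.servlet.Filter wiring is omitted, a plain Map of headers stands in for HttpServletRequest, and all names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the ThreadLocal hand-off: a Filter registered before
// SolrDispatchFilter would call set(...) with the live request, the custom
// request handler would call get() on the same thread, and the filter's
// finally block would call clear() so pooled threads don't leak state.
public class RequestHolder {
    private static final ThreadLocal<Map<String, String>> CURRENT = new ThreadLocal<>();

    // Filter side: stash the current request's headers before chain.doFilter().
    static void set(Map<String, String> headers) { CURRENT.set(headers); }

    // Handler side: read headers of the request being served on this thread.
    static Map<String, String> get() { return CURRENT.get(); }

    // Filter side, in finally: remove the entry to avoid cross-request leaks.
    static void clear() { CURRENT.remove(); }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("X-User", "giovanni");
        try {
            set(headers);                             // what the filter would do
            System.out.println(get().get("X-User"));  // what the handler would do
        } finally {
            clear();
        }
    }
}
```

This works because the servlet container dispatches the filter chain and the request handler on the same thread; the clear() in finally is essential with thread pools.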
Re: httpclient.ProtocolException using Solrj
I'm inserting 10K in a batch (using addBeans method). I read somewhere in the wiki that it's better to use the same instance of SolrServer for better performance. Would MultiThreadedConnectionManager help? How do I use it? I also wanted to know how can use EmbeddedSolrServer - does my app needs to be running in the same jvm with Solr webapp? Thanks, -vivek 2009/4/9 Noble Paul നോബിള് नोब्ळ् : > how many documents are you inserting ? > may be you can create multiple instances of CommonshttpSolrServer and > upload in parallel > > > On Thu, Apr 9, 2009 at 11:58 AM, vivek sar wrote: >> Thanks Shalin and Paul. >> >> I'm not using MultipartRequest. I do share the same SolrServer between >> two threads. I'm not using MultiThreadedHttpConnectionManager. I'm >> simply using CommonsHttpSolrServer to create the SolrServer. I've also >> tried StreamingUpdateSolrServer, which works much faster, but does >> throws "connection reset" exception once in a while. >> >> Do I need to use MultiThreadedHttpConnectionManager? I couldn't find >> anything on it on Wiki. >> >> I was also thinking of using EmbeddedSolrServer - in what case would I >> be able to use it? Does my application and the Solr web app need to >> run into the same JVM for this to work? How would I use the >> EmbeddedSolrServer? >> >> Thanks, >> -vivek >> >> >> On Wed, Apr 8, 2009 at 10:46 PM, Shalin Shekhar Mangar >> wrote: >>> Vivek, do you share the same SolrServer instance between your two threads? >>> If so, are you using the MultiThreadedHttpConnectionManager when creating >>> the HttpClient instance? >>> >>> On Wed, Apr 8, 2009 at 10:13 PM, vivek sar wrote: >>> single thread everything works fine. Two threads are fine too for a while and all the sudden problem starts happening. 
I tried indexing using REST services as well (instead of Solrj), but with that too I get following error after a while, 2009-04-08 10:04:08,126 ERROR [indexerThreadPool-5] Indexer - indexData()-> Failed to index java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) at java.net.SocketOutputStream.write(SocketOutputStream.java:136) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) at java.io.FilterOutputStream.write(FilterOutputStream.java:80) at org.apache.commons.httpclient.methods.StringRequestEntity.writeRequest(StringRequestEntity.java:145) at org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:499) at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114) at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096) at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398) at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) Note, I'm using "simple" lock type. I'd tried "single" type before that once caused index corruption so I switched to "simple". Thanks, -vivek 2009/4/8 Noble Paul നോബിള് नोब्ळ् : > do you see the same problem when you use a single thread? > > what is the version of SolrJ that you use? > > > > On Wed, Apr 8, 2009 at 1:19 PM, vivek sar wrote: >> Hi, >> >> Any ideas on this issue? I ran into this again - once it starts >> happening it keeps happening. One of the thread keeps failing. 
Here >> are my SolrServer settings, >> >> int socketTO = 0; >> int connectionTO = 100; >> int maxConnectionPerHost = 10; >> int maxTotalConnection = 50; >> boolean followRedirects = false; >> boolean allowCompression = true; >> int maxRetries = 1; >> >> Note, I'm using two threads to simultaneously write to the same index. >> >> org.apache.solr.client.solrj.SolrServerException: >> org.apache.commons.httpclient.ProtocolException: Unbuffered entity >> enclosing request can not be repeated. >> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:470) >> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242) >> at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259) >> at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48) >> at org.apac
Re: Searching on multi-core Solr
Attached is the solr.xml - note, the schema and solrconfig are located in core0 and all other cores point to the same core0 instance for the schema. Searches on individual cores work fine, so I assume the solr.xml is correct - I also get their status correctly. From the "NullPointerException" it seems it fails at, for (int i=resultSize-1; i>=0; i--) { ShardDoc shardDoc = (ShardDoc)queue.pop(); shardDoc.positionInResponse = i; // Need the toString() for correlation with other lists that must // be strings (like keys in highlighting, explain, etc) resultIds.put(shardDoc.id.toString(), shardDoc); } I've a unique field (required) in my documents so I'm not sure whether that can be null - could the doc itself be null - how? The same search on the same cores individually works fine. Not sure if there is a way to debug this. I'm also not sure when I would get the "Connection reset" exception - would it be if indexing is happening at the same time at a high rate - would that cause problems? Thanks, -vivek On Thu, Apr 9, 2009 at 4:07 AM, Fergus McMenemie wrote: >>Any help on this issue? Would distributed search on multi-core on the same >>Solr instance even work? Does it have to be different Solr instances >>altogether (separate shards)? > > As best I can tell this works fine for me. Multiple cores on the one > machine. Very different schema and solrconfig.xml for each of the > cores. Distributed searching using shards works fine. But I am using > the trunk version. > > Perhaps you should post your solr.xml file. > >>I'm kind of stuck at this point right now. Keep getting one of the two >>errors (when running distributed search - single searches work fine) >>as mentioned in this thread earlier. >> >>Thanks, >>-vivek >> >>On Wed, Apr 8, 2009 at 1:57 AM, vivek sar wrote: >>> Thanks Fergus. I'm still having problems with multicore search. 
>>> >>> I tried the following with two cores (they both share the same schema >>> and solrconfig.xml) on the same box on same solr instance, >>> >>> 1) http://10.4.x.x:8080/solr/core0/admin/ - works fine, shows all the >>> cores in admin interface >>> 2) http://10.4.x.x:8080/solr/admin/cores - works fine, see all the cores >>> in xml >>> 3) http://10.4.x.x:8080/solr/20090407_2/select?q=japan - works fine, >>> gives me top 10 records >>> 4) http://10.4.x.x:8080/solr/20090408_3/select?q=japan - works fine, >>> gives me top 10 records >>> 5) >>> http://10.4.x.x:8080/solr/20090407_2/select?shards=10.4.x.x:8080/solr/20090407_2,10.4.x.x:8080/solr/20090408_3&indent=true&q=japan >>> - this FAILS. I've seen two problems with this. >>> >>> a) When index are being committed I see, >>> >>> SEVERE: org.apache.solr.common.SolrException: >>> org.apache.solr.client.solrj.SolrServerException: >>> java.net.SocketException: Connection reset >>> at >>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282) >>> at >>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) >>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) >>> at >>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) >>> at >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) >>> at >>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) >>> at >>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) >>> at >>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) >>> at >>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) >>> at >>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) >>> at >>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) >>> at >>> 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) >>> at >>> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) >>> at >>> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) >>> at >>> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) >>> at >>> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) >>> at java.lang.Thread.run(Thread.java:637) >>> >>> b) Other times I see this, >>> >>> SEVERE: java.lang.NullPointerException >>> at >>> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432) >>> at >>> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276) >>> at >>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290) >>> at >>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.
Re: Searching on multi-core Solr
Erik, Here is what I'd posted in this thread earlier, I tried the following with two cores (they both share the same schema and solrconfig.xml) on the same box on same solr instance, 1) http://10.4.x.x:8080/solr/core0/admin/ - works fine, shows all the cores in admin interface 2) http://10.4.x.x:8080/solr/admin/cores - works fine, see all the cores in xml 3) http://10.4.x.x:8080/solr/20090407_2/select?q=japan - works fine, gives me top 10 records 4) http://10.4.x.x:8080/solr/20090408_3/select?q=japan - works fine, gives me top 10 records 5) http://10.4.x.x:8080/solr/20090407_2/select?shards=10.4.x.x:8080/solr/20090407_2,10.4.x.x:8080/solr/20090408_3&indent=true&q=japan - this FAILS. I've seen two problems with this. a) This is the error most of the times, SEVERE: java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432) at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:637) b) When index are being committed I see this during search, SEVERE: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:637) Any tips on how can I search on multicore on same solr instance? Thanks, -vivek On Thu, Apr 9, 2009 at 2:56 AM, Erik Hatcher wrote: > > On Apr 9, 2009, at 3:00 AM, vivek sar wrote: >> >> Can someone please clear this up as I'm not >> able to run distributed search on multi-cores. > > What error or problem are you encountering when trying this? How are you > trying it? > > Erik > >
Custom DIH: FileDataSource with additional business logic?
Hello, here I am with another question. I am using DIH to index a DB. Additionally I also have to index some files containing Java serialized objects (and I cannot change this... :-( ). I currently have implemented a standalone Java app with the following features: 1) read all files from a given folder 2) deserialize the files into lists of items 3) convert the list of items into lists of SolrInputDocument(s) 4) post the lists of SolrInputDocument(s) to Solr All this is done using SolrJ. So far so good. I would like to use a DIH with a FileDataSource to do 1) and 4), and I would like to "squeeze" in my implementation for 2) and 3). Is this possible? Any hint? Thank you all in advance. Cheers, Giovanni
Re: Any tips for indexing large amounts of data?
> - As per > http://developers.sun.com/learning/javaoneonline/2008/pdf/TS-5515.pdf Sorry, the presentation covers a lot of ground: see slide #20: "Standard thread pools can have high contention for task queue and other data structures when used with fine-grained tasks" [I haven't yet implemented work stealing] -glen 2009/4/9 Glen Newton : > For Solr / Lucene: > - use -XX:+AggressiveOpts > - If available, huge pages can help. See > http://zzzoot.blogspot.com/2009/02/java-mysql-increased-performance-with.html > I haven't yet followed-up with my Lucene performance numbers using > huge pages: it is 10-15% for large indexing jobs. > > For Lucene: > - multi-thread using java.util.concurrent.ThreadPoolExecutor > (http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks.html > 6.4 million full-text article + metadata indexed resulting in 83GB > index; these are old number: things are down to ~10hours now) > - while multithreading on multicore is particularly good, it also > improves performance on single core, for small (<6 YMMV) numbers of > threads & good I/O (test for your particular configuration) > - Use multiple indexes & merge at the end > - As per http://developers.sun.com/learning/javaoneonline/2008/pdf/TS-5515.pdf > use separate ThreadPoolExecutor per index in previous, reducing queue > contention. This is giving me an additional ~10%. I will blog about > this in the near future... > > -glen > > 2009/4/9 sunnyfr : >> >> Hi Otis, >> How did you manage that? I've 8 core machine with 8GB of ram and 11GB index >> for 14M docs and 5 update every 30mn but my replication kill everything. >> My segments are merged too often sor full index replicate and cache lost and >> I've no idea what can I do now? >> Some help would be brilliant, >> btw im using Solr 1.4. >> >> Thanks, >> >> >> Otis Gospodnetic wrote: >>> >>> Mike is right about the occasional slow-down, which appears as a pause and >>> is due to large Lucene index segment merging. 
This should go away with >>> newer versions of Lucene where this is happening in the background. >>> >>> That said, we just indexed about 20MM documents on a single 8-core machine >>> with 8 GB of RAM, resulting in nearly 20 GB index. The whole process took >>> a little less than 10 hours - that's over 550 docs/second. The vanilla >>> approach before some of our changes apparently required several days to >>> index the same amount of data. >>> >>> Otis >>> -- >>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch >>> >>> - Original Message >>> From: Mike Klaas >>> To: solr-user@lucene.apache.org >>> Sent: Monday, November 19, 2007 5:50:19 PM >>> Subject: Re: Any tips for indexing large amounts of data? >>> >>> There should be some slowdown in larger indices as occasionally large >>> segment merge operations must occur. However, this shouldn't really >>> affect overall speed too much. >>> >>> You haven't really given us enough data to tell you anything useful. >>> I would recommend trying to do the indexing via a webapp to eliminate >>> all your code as a possible factor. Then, look for signs to what is >>> happening when indexing slows. For instance, is Solr high in cpu, is >>> the computer thrashing, etc? >>> >>> -Mike >>> >>> On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote: >>> Hi, Thanks for answering this question a while back. I have made some of the suggestions you mentioned. ie not committing until I've finished indexing. What I am seeing though, is as the index get larger (around 1Gb), indexing is taking a lot longer. In fact it slows down to a crawl. Have you got any pointers as to what I might be doing wrong? Also, I was looking at using MultiCore solr. Could this help in some way? Thank you Brendan On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote: > > : I would think you would see better performance by allowing auto > commit > : to handle the commit size instead of reopening the connection > all the > : time. 
> > if your goal is "fast" indexing, don't use autoCommit at all ... >>> just > index everything, and don't commit until you are completely done. > > autoCommitting will slow your indexing down (the benefit being > that more > results will be visible to searchers as you proceed) > > > > > -Hoss > >>> >>> >>> >>> >>> >>> >> >> -- >> View this message in context: >> http://www.nabble.com/Any-tips-for-indexing-large-amounts-of-data--tp13510670p22973205.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > > -- > > - > -- -
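The multi-threaded indexing pattern Glen describes (a ThreadPoolExecutor per index, fed batches of documents) can be sketched with nothing but java.util.concurrent. This is a sketch under assumptions: indexBatch is a placeholder for the real per-document IndexWriter or SolrServer work, and the batch count, batch size, and queue capacity are arbitrary.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a fixed-size indexing pool. In the per-index variant Glen
// mentions, you would create one such pool per index to reduce contention
// on the shared task queue.
public class ParallelIndexer {
    static final AtomicInteger indexed = new AtomicInteger();

    // Placeholder for real work: in practice this would call
    // IndexWriter.addDocument (or post to Solr) for each doc in the batch.
    static void indexBatch(int batchSize) {
        indexed.addAndGet(batchSize);
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(100),              // bounded task queue
                new ThreadPoolExecutor.CallerRunsPolicy());  // back-pressure when full
        for (int i = 0; i < 50; i++) {
            pool.submit(() -> indexBatch(1000));             // 50 batches of 1000 docs
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("docs indexed: " + indexed.get());
    }
}
```

The bounded queue plus CallerRunsPolicy keeps the document producer from racing ahead of the indexing threads, which matters when batches are generated faster than they can be indexed.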
Re: Any tips for indexing large amounts of data?
For Solr / Lucene: - use -XX:+AggressiveOpts - If available, huge pages can help. See http://zzzoot.blogspot.com/2009/02/java-mysql-increased-performance-with.html I haven't yet followed up with my Lucene performance numbers using huge pages: it is 10-15% for large indexing jobs. For Lucene: - multi-thread using java.util.concurrent.ThreadPoolExecutor (http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks.html 6.4 million full-text articles + metadata indexed resulting in an 83GB index; these are old numbers: things are down to ~10 hours now) - while multithreading on multicore is particularly good, it also improves performance on single core, for small (<6 YMMV) numbers of threads & good I/O (test for your particular configuration) - Use multiple indexes & merge at the end - As per http://developers.sun.com/learning/javaoneonline/2008/pdf/TS-5515.pdf use a separate ThreadPoolExecutor per index in the previous, reducing queue contention. This is giving me an additional ~10%. I will blog about this in the near future... -glen 2009/4/9 sunnyfr : > > Hi Otis, > How did you manage that? I've an 8-core machine with 8GB of RAM and an 11GB index > for 14M docs and 5 updates every 30 min, but my replication kills everything. > My segments are merged too often, so the full index is replicated and the caches are lost, and > I've no idea what I can do now. > Some help would be brilliant, > btw I'm using Solr 1.4. > > Thanks, > > > Otis Gospodnetic wrote: >> >> Mike is right about the occasional slow-down, which appears as a pause and >> is due to large Lucene index segment merging. This should go away with >> newer versions of Lucene where this is happening in the background. >> >> That said, we just indexed about 20MM documents on a single 8-core machine >> with 8 GB of RAM, resulting in a nearly 20 GB index. The whole process took >> a little less than 10 hours - that's over 550 docs/second. 
The vanilla >> approach before some of our changes apparently required several days to >> index the same amount of data. >> >> Otis >> -- >> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch >> >> - Original Message >> From: Mike Klaas >> To: solr-user@lucene.apache.org >> Sent: Monday, November 19, 2007 5:50:19 PM >> Subject: Re: Any tips for indexing large amounts of data? >> >> There should be some slowdown in larger indices as occasionally large >> segment merge operations must occur. However, this shouldn't really >> affect overall speed too much. >> >> You haven't really given us enough data to tell you anything useful. >> I would recommend trying to do the indexing via a webapp to eliminate >> all your code as a possible factor. Then, look for signs to what is >> happening when indexing slows. For instance, is Solr high in cpu, is >> the computer thrashing, etc? >> >> -Mike >> >> On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote: >> >>> Hi, >>> >>> Thanks for answering this question a while back. I have made some >>> of the suggestions you mentioned. ie not committing until I've >>> finished indexing. What I am seeing though, is as the index get >>> larger (around 1Gb), indexing is taking a lot longer. In fact it >>> slows down to a crawl. Have you got any pointers as to what I might >>> be doing wrong? >>> >>> Also, I was looking at using MultiCore solr. Could this help in >>> some way? >>> >>> Thank you >>> Brendan >>> >>> On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote: >>> : I would think you would see better performance by allowing auto commit : to handle the commit size instead of reopening the connection all the : time. if your goal is "fast" indexing, don't use autoCommit at all ... >> just index everything, and don't commit until you are completely done. 
autoCommitting will slow your indexing down (the benefit being that more results will be visible to searchers as you proceed) -Hoss >>> >> >> >> >> >> >> > > -- > View this message in context: > http://www.nabble.com/Any-tips-for-indexing-large-amounts-of-data--tp13510670p22973205.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- -
Re: Any tips for indexing large amounts of data?
Hi Otis, How did you manage that? I've an 8-core machine with 8GB of RAM and an 11GB index for 14M docs and 5 updates every 30 min, but my replication kills everything. My segments are merged too often, so the full index is replicated and the caches are lost, and I've no idea what I can do now. Some help would be brilliant, btw I'm using Solr 1.4. Thanks, Otis Gospodnetic wrote: > > Mike is right about the occasional slow-down, which appears as a pause and > is due to large Lucene index segment merging. This should go away with > newer versions of Lucene where this is happening in the background. > > That said, we just indexed about 20MM documents on a single 8-core machine > with 8 GB of RAM, resulting in a nearly 20 GB index. The whole process took > a little less than 10 hours - that's over 550 docs/second. The vanilla > approach before some of our changes apparently required several days to > index the same amount of data. > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > - Original Message > From: Mike Klaas > To: solr-user@lucene.apache.org > Sent: Monday, November 19, 2007 5:50:19 PM > Subject: Re: Any tips for indexing large amounts of data? > > There should be some slowdown in larger indices as occasionally large > segment merge operations must occur. However, this shouldn't really > affect overall speed too much. > > You haven't really given us enough data to tell you anything useful. > I would recommend trying to do the indexing via a webapp to eliminate > all your code as a possible factor. Then, look for signs of what is > happening when indexing slows. For instance, is Solr high in CPU, is > the computer thrashing, etc? > > -Mike > > On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote: > >> Hi, >> >> Thanks for answering this question a while back. I have made some >> of the suggestions you mentioned. ie not committing until I've >> finished indexing. What I am seeing though, is as the index gets >> larger (around 1Gb), indexing is taking a lot longer. 
In fact it >> slows down to a crawl. Have you got any pointers as to what I might >> be doing wrong? >> >> Also, I was looking at using MultiCore solr. Could this help in >> some way? >> >> Thank you >> Brendan >> >> On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote: >> >>> >>> : I would think you would see better performance by allowing auto >>> commit >>> : to handle the commit size instead of reopening the connection >>> all the >>> : time. >>> >>> if your goal is "fast" indexing, don't use autoCommit at all ... > just >>> index everything, and don't commit until you are completely done. >>> >>> autoCommitting will slow your indexing down (the benefit being >>> that more >>> results will be visible to searchers as you proceed) >>> >>> >>> >>> >>> -Hoss >>> >> > > > > > > -- View this message in context: http://www.nabble.com/Any-tips-for-indexing-large-amounts-of-data--tp13510670p22973205.html Sent from the Solr - User mailing list archive at Nabble.com.
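Hoss's advice in the quoted thread — skip autoCommit entirely for a bulk load and send a single explicit commit at the end — would look roughly like this in solrconfig.xml. This is a sketch using the Solr 1.x updateHandler layout; the maxDocs/maxTime values are only placeholders:

```xml
<!-- solrconfig.xml (Solr 1.x layout assumed): for a bulk load, leave
     autoCommit disabled and send one explicit <commit/> when done. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <!--
  <autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>60000</maxTime>
  </autoCommit>
  -->
</updateHandler>
```

With autoCommit off, new documents only become searchable when the client commits, which is exactly the trade-off Hoss describes.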
Re: Snapinstaller vs Solr Restart
Hi Otis, OK about that, but still, when it merges segments the file names change and I have no choice but to replicate all of the segments, which is bad for replication and CPU. Is that right? Thanks Otis Gospodnetic wrote: > > Lower your mergeFactor and Lucene will merge segments (i.e. fewer index > files) and purge deletes more often for you at the expense of somewhat > slower indexing. > > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: wojtekpia >> To: solr-user@lucene.apache.org >> Sent: Tuesday, January 6, 2009 5:18:26 PM >> Subject: Re: Snapinstaller vs Solr Restart >> >> >> I'm optimizing because I thought I should. I'll be updating my index >> somewhere between every 15 minutes, and every 2 hours. That means between >> 12 >> and 96 updates per day. That seems like a lot of index files (and it >> scared >> me a little), so that's my second reason for wanting to optimize nightly. >> >> I haven't benchmarked the performance hit for not optimizing. That'll be >> my >> next step. If the hit isn't too bad, I'll look into optimizing less >> frequently (weekly, ...). >> >> Thanks Otis! >> >> >> Otis Gospodnetic wrote: >> > >> > OK, so that question/answer seems to have hit the nail on the head. :) >> > >> > When you optimize your index, all index files get rewritten. This >> means >> > that everything that the OS cached up to that point goes out the window >> > and the OS has to slowly re-cache the hot parts of the index. If you >> > don't optimize, this won't happen. Do you really need to optimize? Or >> > maybe a more direct question: why are you optimizing? >> > >> > >> > Regarding autowarming, with such high fq hit rate, I'd make good use of >> fq >> > autowarming. The result cache rate is lower, but still decent. I >> > wouldn't turn off autowarming the way you have.
>> > >> > >> >> -- >> View this message in context: >> http://www.nabble.com/Snapinstaller-vs-Solr-Restart-tp21315273p21320334.html >> Sent from the Solr - User mailing list archive at Nabble.com. > > > -- View this message in context: http://www.nabble.com/Snapinstaller-vs-Solr-Restart-tp21315273p22972780.html Sent from the Solr - User mailing list archive at Nabble.com.
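Otis's mergeFactor suggestion lives in the index settings of solrconfig.xml. A sketch of the `<mainIndex>` section; 10 is the Lucene default, and 4 here is only an illustrative value, not a recommendation:

```xml
<!-- solrconfig.xml, <mainIndex> section: a lower mergeFactor keeps the
     segment count down (fewer files change between replications) at the
     cost of somewhat slower indexing. -->
<mainIndex>
  <mergeFactor>4</mergeFactor>
</mainIndex>
```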
Re: Exception while solr commit
This is a spooky exception. Committing after every update will give very poor performance, but should be "fine" (ie, not cause exceptions like this). What filesystem are you on? Is there any possibility that two writers are open against the same index? Is this easily reproduced? Mike On Wed, Apr 8, 2009 at 2:13 PM, Narayanan, Karthikeyan wrote: > > Hello, > I am calling commit for every record (document) added/updated > to the index. Our number of records size is < 50k. Getting the > following exception during commit. Is it correct approach > to call commit for every insert/update?. > > Apr 7, 2009 4:41:23 PM org.apache.solr.handler.dataimport.SolrWriter > commit > SEVERE: Exception while solr commit. > java.lang.RuntimeException: after flush: fdx size mismatch: 20096 docs > vs 65536 length in bytes of _6.fdx > at > org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWri > ter.java:94) > at > org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumer > s.java:83) > at > org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcesso > r.java:47) > at > org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.ja > va:367) > at > org.apache.lucene.index.IndexWriter.flushDocStores(IndexWriter.java:1774 > ) > at > org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3600) > at > org.apache.lucene.index.IndexWriter._mergeInit(IndexWriter.java:4151) > at > org.apache.lucene.index.IndexWriter.mergeInit(IndexWriter.java:4031) > at > org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeSc > heduler.java:176) > at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2485) > at > org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2332) > at > org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2280) > at > org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2. 
> java:355) > at > org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpd > ateProcessorFactory.java:77) > at > org.apache.solr.handler.dataimport.SolrWriter.commit(SolrWriter.java:180 > ) > at > org.apache.solr.handler.dataimport.DocBuilder.commit(DocBuilder.java:168 > ) > at > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:15 > 2) > at > org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporte > r.java:334) > at > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java > :386) > at > org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java: > 377) > Apr 7, 2009 4:41:23 PM org.apache.solr.handler.dataimport.DocBuilder > execute > > > > Thanks. > > Karthik >
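As a rough illustration of Mike's point about commit frequency, here is a minimal, Solr-free sketch of the usual batching pattern. `BatchedIndexer` and its `flush()` are hypothetical stand-ins for a SolrJ `server.add(batch)` followed by a single `server.commit()`; the point is that 2500 adds turn into 3 commits, not 2500:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of batched commits: buffer documents and flush every BATCH_SIZE
// instead of committing after every single add. flush() is a hypothetical
// stand-in for SolrJ's server.add(batch) followed by server.commit().
public class BatchedIndexer {
    static final int BATCH_SIZE = 1000;
    private final List<String> buffer = new ArrayList<String>();
    int flushes = 0; // counts how often we "commit" in this sketch

    void add(String doc) {
        buffer.add(doc);
        if (buffer.size() >= BATCH_SIZE) {
            flush();
        }
    }

    void flush() {
        if (buffer.isEmpty()) return;
        // here a real indexer would do: server.add(toSolrDocs(buffer)); server.commit();
        buffer.clear();
        flushes++;
    }

    public static void main(String[] args) {
        BatchedIndexer idx = new BatchedIndexer();
        for (int i = 0; i < 2500; i++) idx.add("doc" + i);
        idx.flush(); // final commit for the tail of the batch
        System.out.println(idx.flushes); // prints 3
    }
}
```

Committing once per batch (or once at the very end) avoids both the performance cost and the pressure on the index writer that per-document commits create.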
Access HTTP headers from custom request handler
Hello all, we are writing a custom request handler and we need to implement some business logic according to some HTTP headers. I see there is no easy way to access HTTP headers from the request handler. Moreover it seems to me that the HTTPServletness is lost way before the custom request handler comes in the game. Is there any way to access HTTP headers from within the request handler? Thanks, Giovanni
Re: Dataimporthandler + MySQL = Datetime offset by 2 hours ?
On Thu, Apr 9, 2009 at 6:18 PM, gateway0 wrote: > > Hi, > > I'm fetching entries from my mysql database and indexing them with the > Dataimporthandler: > > MySQL Table entry: (for example) > pr_timedate : 2009-04-14 11:00:00 > > entry in data-config.xml to index the mysql field: > dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" /> > > result in solr index: > 2009-04-14T09:00:00Z > > it says 09:00:00 instead of 11:00:00 as it's supposed to. > > I've searched for hours already, why is that? > I think that may be because date/time in Solr is supposed to be in UTC. See the note on DateField in the schema.xml -- Regards, Shalin Shekhar Mangar.
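To see where the two hours go, here is a small self-contained Java sketch of what Shalin describes: a local timestamp is parsed in the JVM's zone and then printed in UTC, which is how Solr stores it. Europe/Berlin is an assumption — any UTC+2 zone (DST active in April) reproduces the offset from the question:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Solr stores dates in UTC, so a local "2009-04-14 11:00:00" read in a
// UTC+2 zone comes back as 2009-04-14T09:00:00Z. Not a bug, just UTC.
public class UtcDemo {
    static String toSolrUtc(String localValue, String zoneId) throws Exception {
        // parse the MySQL value in the given local zone
        SimpleDateFormat local = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        local.setTimeZone(TimeZone.getTimeZone(zoneId));
        Date d = local.parse(localValue);

        // format the same instant the way Solr's DateField renders it
        SimpleDateFormat utc = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        utc.setTimeZone(TimeZone.getTimeZone("UTC"));
        return utc.format(d);
    }

    public static void main(String[] args) throws Exception {
        // 11:00 CEST (UTC+2 in April) is 09:00 UTC
        System.out.println(toSolrUtc("2009-04-14 11:00:00", "Europe/Berlin"));
        // prints 2009-04-14T09:00:00Z
    }
}
```

The stored value is the same instant in time; only the rendering zone differs, so converting back to local time at display time recovers 11:00.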
Dataimporthandler + MySQL = Datetime offset by 2 hours ?
Hi, I'm fetching entries from my mysql database and indexing them with the Dataimporthandler: MySQL Table entry: (for example) pr_timedate : 2009-04-14 11:00:00 entry in data-config.xml to index the mysql field: dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" /> result in solr index: 2009-04-14T09:00:00Z it says 09:00:00 instead of 11:00:00 as it's supposed to. I've searched for hours already, why is that? best wishes, Sebastian -- View this message in context: http://www.nabble.com/Dataimporthandler-%2B-MySQL-%3D-Datetime-offset-by-2-hours---tp22970250p22970250.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Using constants with DataImportHandler and MySQL ?
Here's the solution: just insert a dummy sql field 'dataci_project' in your select statement. Glen Newton wrote: > > In MySql at least, you can achieve what I think you want by > manipulating the SQL, like this: > > mysql> select "foo" as Constant1, id from Article limit 10; > select "foo" as Constant1, id from Article limit 10; > +-----------+----+ > | Constant1 | id | > +-----------+----+ > | foo | 1 | > | foo | 2 | > | foo | 3 | > | foo | 4 | > | foo | 5 | > | foo | 6 | > | foo | 7 | > | foo | 8 | > | foo | 9 | > | foo | 10 | > +-----------+----+ > 10 rows in set (0.00 sec) > > mysql> select 435 as Constant2, id from Article limit 10; > select 435 as Constant2, id from Article limit 10; > +-----------+----+ > | Constant2 | id | > +-----------+----+ > | 435 | 1 | > | 435 | 2 | > | 435 | 3 | > | 435 | 4 | > | 435 | 5 | > | 435 | 6 | > | 435 | 7 | > | 435 | 8 | > | 435 | 9 | > | 435 | 10 | > +-----------+----+ > 10 rows in set (0.00 sec) > > mysql> > > 2009/4/8 Shalin Shekhar Mangar : >> On Wed, Apr 8, 2009 at 10:23 PM, gateway0 wrote: >> >>> >>> The problem as you see is the line: >>> "Projects" >>> >>> I want to set a constant value for every row in the SQL table but it >>> doesn't >>> work that way, any ideas? >>> >> >> That is not a valid syntax. >> >> There are two ways to do this: >> 1. In your schema.xml provide the 'default' attribute >> 2. Use TemplateTransformer - see >> http://wiki.apache.org/solr/DataImportHandlerFaq >> >> -- >> Regards, >> Shalin Shekhar Mangar. >> > > > > -- > > - > > -- View this message in context: http://www.nabble.com/Using-constants-with-DataImportHandler-and-MySQL---tp22954954p22969123.html Sent from the Solr - User mailing list archive at Nabble.com.
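For reference, the SQL-alias trick expressed as a DIH entity might look like this in data-config.xml. The table and column names below are made up for illustration; the point is that the aliased constant column maps to a Solr field like any real column:

```xml
<!-- DIH data-config.xml sketch: the constant comes from a SQL alias. -->
<entity name="article"
        query="SELECT 'Projects' AS dataci_project, id, title FROM article">
  <field column="dataci_project" name="project_type"/>
</entity>
```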
Multi-language support
Hi, To reframe my earlier question: some languages have only analyzers but no stemmer from Snowball/Porter; in that case does the analyzer take care of stemming as well? Some languages have only the Snowball stemmer but no analyzer, and some have both. Can we say that Solr supports all of the above languages? Will search behave the same across all of the above cases? thanks revas
Re: Searching on mulit-core Solr
>Any help on this issue? Would distributed search on multi-core on same >Solr instance even work? Does it has to be different Solr instances >altogether (separate shards)? As best I can tell this works fine for me. Multiple cores on the one machine. Very different schema and solrconfig.xml for each of the cores. Distributed searching using shards works fine. But I am using the trunk version. Perhaps you should post your solr.xml file. >I'm kind of stuck at this point right now. Keep getting one of the two >errors (when running distributed search - single searches work fine) >as mentioned in this thread earlier. > >Thanks, >-vivek > >On Wed, Apr 8, 2009 at 1:57 AM, vivek sar wrote: >> Thanks Fergus. I'm still having problem with multicore search. >> >> I tried the following with two cores (they both share the same schema >> and solrconfig.xml) on the same box on same solr instance, >> >> 1) http://10.4.x.x:8080/solr/core0/admin/ - works fine, shows all the >> cores in admin interface >> 2) http://10.4.x.x:8080/solr/admin/cores - works fine, see all the cores in >> xml >> 3) http://10.4.x.x:8080/solr/20090407_2/select?q=japan - works fine, >> gives me top 10 records >> 4) http://10.4.x.x:8080/solr/20090408_3/select?q=japan - works fine, >> gives me top 10 records >> 5) >> http://10.4.x.x:8080/solr/20090407_2/select?shards=10.4.x.x:8080/solr/20090407_2,10.4.x.x:8080/solr/20090408_3&indent=true&q=japan >> - this FAILS. I've seen two problems with this. 
>> >> a) When index are being committed I see, >> >> SEVERE: org.apache.solr.common.SolrException: >> org.apache.solr.client.solrj.SolrServerException: >> java.net.SocketException: Connection reset >> at >> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282) >> at >> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) >> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) >> at >> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) >> at >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) >> at >> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) >> at >> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) >> at >> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) >> at >> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) >> at >> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) >> at >> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) >> at >> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) >> at >> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) >> at >> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) >> at >> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) >> at >> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) >> at java.lang.Thread.run(Thread.java:637) >> >> b) Other times I see this, >> >> SEVERE: java.lang.NullPointerException >> at >> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432) >> at >> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276) >> at >> 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290) >> at >> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) >> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) >> at >> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) >> at >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) >> at >> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) >> at >> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) >> at >> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) >> at >> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) >> at >> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) >> at >> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) >> at >> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) >> at >> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) >> at >> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) >> at >> org.apac
Re: different scoring for different types of found documents
On Thu, Apr 9, 2009 at 2:17 PM, Andrey Klochkov wrote: > > So we're searching through the product catalog. Product have types (i.e. > "Electronics", "Apparel", "Furniture" etc). What we need is to customize > scoring of the results so that top results should contain products of all > different types which match the query. So after finding all the products > matching the query we want to group results by product type. This is something similar to Field Collapsing. It is not committed to trunk but there are a few patches. https://issues.apache.org/jira/browse/SOLR-236 > Then for every > product type take corresponding sub-set of results and in every of the > sub-sets assign scores with the following logic. Assign score 5 to the > first > 20% of results, then assign score 4 to the next 15% of results, and so on. > Particular percent values are configured by the end user. How could we > achive it using Solr? Is it possible at all? Maybe we should implement some > custom ValueSource and use it in a function queries? > Such kind of scoring is not possible out of the box. You need to assign scores according to where the document lies in the final list of results (after all filters are applied), therefore you may not be able to operate on the DocList directly or in the value source. I *think* a good place to start looking would be the QueryValueSource in trunk as it has access to the scorer. But I do not know much about these things. -- Regards, Shalin Shekhar Mangar.
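Shalin's caveats aside, the percentile-tier logic itself is simple to express as a post-processing step over an already-ranked result list. Here is a Solr-free sketch; the tier widths (20%, 15%, ...) are the user-configurable percentages from the question, and the Solr integration point (custom ValueSource vs. re-ranking in the client) is deliberately left open:

```java
// Sketch of tiered scoring over a ranked result list: the first
// percents[0] fraction of results gets score 5, the next percents[1]
// fraction gets score 4, and so on; the remainder continues downward.
public class TieredScores {
    static int[] assignScores(int numResults, double[] percents) {
        int[] scores = new int[numResults];
        int pos = 0;
        int score = 5;
        for (double p : percents) {
            int width = (int) Math.round(numResults * p);
            for (int i = 0; i < width && pos < numResults; i++) {
                scores[pos++] = score;
            }
            score--;
        }
        while (pos < numResults) scores[pos++] = score; // remainder tier
        return scores;
    }

    public static void main(String[] args) {
        // 20 results: first 4 (20%) get 5, next 3 (15%) get 4, rest get 3
        int[] s = assignScores(20, new double[]{0.20, 0.15});
        System.out.println(s[0] + " " + s[4] + " " + s[7]); // prints 5 4 3
    }
}
```

Running this per product-type sub-set, as the question describes, would just mean calling `assignScores` once for each group of the collapsed results.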
Re: solr 1.4 facet boost field according to another field
I don't think conditional boosting is possible. You can boost the same field on which the match was found. But you cannot boost a different field. On Thu, Apr 9, 2009 at 2:05 PM, sunnyfr wrote: > > Do you have an idea ? > > > > sunnyfr wrote: > > > > Hi, > > > > I've title description and tag field ... According to where I find the > > word searched, I would like to boost differently other field like > nb_views > > or rating. > > > > if word is find in title then nb_views^10 and rating^10 > > if word is find in description then nb_views^2 and rating^2 > > > > Thanks a lot for your help, > > > > -- > View this message in context: > http://www.nabble.com/solr-1.4-facet-boost-field-according-to-another-field-tp22913642p2294.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Regards, Shalin Shekhar Mangar.
Re: Searching on mulit-core Solr
On Apr 9, 2009, at 3:00 AM, vivek sar wrote: Can someone please clear this up as I'm not able to run distributed search on multi-cores. What error or problem are you encountering when trying this? How are you trying it? Erik
Re: Its urgent! plz help in schema.xml- appending one field to another
On Apr 8, 2009, at 9:50 PM, Udaya wrote: Hi, Need your help, I would like to know how we could append or add one field value to another field in schema.xml My schema is as follows (only the field part is given): schema.xml stored="true" required="true"/> default="http://comp.com/portals/ForumWindow? action=1&v=t&p="topics_id"#"topics_id"" /> Here for the field with name "topics_id" we get the id from a table. I want this topics_id value to be appended into the default value attribute of the field with name "url". E.g.: suppose we get a topics_id value of 512 during a search, then the value of the url should be appended as http://comp.com/portals/JBossForumWindow?action=1&v=t&p=512#512 Is this possible? Please give me some suggestions. If you're using DIH to index your table, you could aggregate using the template transformer during indexing. If you're indexing a different way, why not let the searching client (UI) do the aggregation of an id into a URL? Erik
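Erik's first suggestion — aggregating with TemplateTransformer during a DIH import — might look like this in data-config.xml. The entity name, query, and exact URL pattern are assumptions reconstructed from the post:

```xml
<!-- DIH data-config.xml sketch: TemplateTransformer builds the url field
     from the topics_id column at index time. -->
<entity name="topic" transformer="TemplateTransformer"
        query="SELECT topics_id FROM topics">
  <field column="url"
         template="http://comp.com/portals/ForumWindow?action=1&amp;v=t&amp;p=${topic.topics_id}#${topic.topics_id}"/>
</entity>
```

The alternative Erik mentions — building the URL in the UI from the stored topics_id — avoids storing a redundant field at all.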
Analyzers and stemmer
Hi, With respect to language support in Solr: we have analyzers for some languages and stemmers for certain languages. Do we say that Solr supports a particular language only if we have both an analyzer and a stemmer for the language, or also when we have an analyzer but no stemmer? Regards Sujatha
different scoring for different types of found documents
Hi, We have a quite complex requirement concerning scoring logic customization, but I guess it's quite useful and probably something like it has been done already. So we're searching through the product catalog. Products have types (i.e. "Electronics", "Apparel", "Furniture" etc.). What we need is to customize scoring of the results so that the top results contain products of all the different types which match the query. So after finding all the products matching the query we want to group the results by product type. Then for every product type take the corresponding sub-set of results, and in every one of the sub-sets assign scores with the following logic: assign score 5 to the first 20% of results, then assign score 4 to the next 15% of results, and so on. The particular percent values are configured by the end user. How could we achieve this using Solr? Is it possible at all? Maybe we should implement a custom ValueSource and use it in a function query? -- Andrew Klochkov
Re: solr 1.4 facet boost field according to another field
Do you have an idea ? sunnyfr wrote: > > Hi, > > I've title description and tag field ... According to where I find the > word searched, I would like to boost differently other field like nb_views > or rating. > > if word is find in title then nb_views^10 and rating^10 > if word is find in description then nb_views^2 and rating^2 > > Thanks a lot for your help, > -- View this message in context: http://www.nabble.com/solr-1.4-facet-boost-field-according-to-another-field-tp22913642p2294.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr 1.4 memory jvm
Hi Noble, Yes, exactly that. I would like to know how people handle a replication. Do they turn off servers and set a high autowarmCount, which takes the slave offline for a while? In my case it takes 10 minutes to bring back the new index, and then autowarming takes maybe 10 minutes more. Otherwise, I tried a large mergeFactor, but I guess I have too many updates every 30 minutes (something like 2000 docs) and almost all segments get modified. What would you reckon? :( :) Thanks a lot Noble Noble Paul നോബിള്‍ नोब्ळ् wrote: > > So what I decipher from the numbers is w/o queries Solr replication is > not performing too badly. The queries are inherently slow and you wish > to optimize the query performance itself. > am I correct? > > On Tue, Apr 7, 2009 at 7:50 PM, sunnyfr wrote: >> >> Hi, >> >> So I did two tests on two servers; >> >> First server : with just replication every 20mn like you can notice: >> http://www.nabble.com/file/p22930179/cpu_without_request.png >> cpu_without_request.png >> http://www.nabble.com/file/p22930179/cpu2_without_request.jpg >> cpu2_without_request.jpg >> >> Second server : with one first replication and a second one during query >> test: between 15:32pm and 15h41 >> during replication (checked on .../admin/replication/index.jsp) my >> response >> time query at the end was around 5000msec >> after the replication I guess during commitment I couldn't get answer of >> my >> query for a long time, I refreshed my page few minutes after. >> http://www.nabble.com/file/p22930179/cpu_with_request.png >> cpu_with_request.png >> http://www.nabble.com/file/p22930179/cpu2_with_request.jpg >> cpu2_with_request.jpg >> >> Now without replication I kept going query on the second server, and I >> can't >> get better than >> 1000msec response time and 11request/second.
>> http://www.nabble.com/file/p22930179/cpu_.jpg cpu_.jpg >> >> This is my request : >> select?fl=id&fq=status_published:1+AND+status_moderated:0+AND+status_personal:0+AND+status_private:0+AND+status_deleted:0+AND+status_error:0+AND+status_ready_web:1&json.nl=map&wt=json&start=0&version=1.2&bq=status_official:1^1.5+OR+status_creative:1^1+OR+language:en^0.5&bf=recip(rord(created),1,10,10)^3+pow(stat_views,0.1)^15+pow(stat_comments,0.1)^15&rows=100&qt=dismax&qf=title_en^0.8+title^0.2+description_en^0.3+description^0.2+tags^1+owner_login^0.5 >> >> Do you have advice ? >> >> Thanks Noble >> >> >> -- >> View this message in context: >> http://www.nabble.com/solr-1.4-memory-jvm-tp22913742p22930179.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > > -- > --Noble Paul > > -- View this message in context: http://www.nabble.com/solr-1.4-memory-jvm-tp22913742p22966630.html Sent from the Solr - User mailing list archive at Nabble.com.
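On the autowarming side of sunnyfr's question, warming is configured per cache in solrconfig.xml. A sketch with illustrative sizes only: autowarmCount entries are replayed from the old searcher into the new one after a commit or replication, so a bigger count means a better-primed searcher but a longer warm-up window before the new index serves traffic:

```xml
<!-- solrconfig.xml cache sketch; sizes and counts are illustrative. -->
<filterCache class="solr.LRUCache" size="16384" autowarmCount="4096"/>
<queryResultCache class="solr.LRUCache" size="16384" autowarmCount="1024"/>
```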
Re: httpclient.ProtocolException using Solrj
How many documents are you inserting? Maybe you can create multiple instances of CommonsHttpSolrServer and upload in parallel. On Thu, Apr 9, 2009 at 11:58 AM, vivek sar wrote: > Thanks Shalin and Paul. > > I'm not using MultipartRequest. I do share the same SolrServer between > two threads. I'm not using MultiThreadedHttpConnectionManager. I'm > simply using CommonsHttpSolrServer to create the SolrServer. I've also > tried StreamingUpdateSolrServer, which works much faster, but does > throw "connection reset" exceptions once in a while. > > Do I need to use MultiThreadedHttpConnectionManager? I couldn't find > anything on it on the Wiki. > > I was also thinking of using EmbeddedSolrServer - in what case would I > be able to use it? Do my application and the Solr web app need to > run in the same JVM for this to work? How would I use the > EmbeddedSolrServer? > > Thanks, > -vivek > > > On Wed, Apr 8, 2009 at 10:46 PM, Shalin Shekhar Mangar > wrote: >> Vivek, do you share the same SolrServer instance between your two threads? >> If so, are you using the MultiThreadedHttpConnectionManager when creating >> the HttpClient instance? >> >> On Wed, Apr 8, 2009 at 10:13 PM, vivek sar wrote: >> >>> single thread everything works fine. Two threads are fine too for a >>> while and all of a sudden the problem starts happening.
>>> >>> I tried indexing using REST services as well (instead of Solrj), but >>> with that too I get following error after a while, >>> >>> 2009-04-08 10:04:08,126 ERROR [indexerThreadPool-5] Indexer - >>> indexData()-> Failed to index >>> java.net.SocketException: Broken pipe >>> at java.net.SocketOutputStream.socketWrite0(Native Method) >>> at >>> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) >>> at java.net.SocketOutputStream.write(SocketOutputStream.java:136) >>> at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) >>> at java.io.FilterOutputStream.write(FilterOutputStream.java:80) >>> at >>> org.apache.commons.httpclient.methods.StringRequestEntity.writeRequest(StringRequestEntity.java:145) >>> at >>> org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:499) >>> at >>> org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114) >>> at >>> org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096) >>> at >>> org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398) >>> at >>> org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) >>> at >>> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) >>> at >>> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) >>> >>> >>> Note, I'm using "simple" lock type. I'd tried "single" type before >>> that once caused index corruption so I switched to "simple". >>> >>> Thanks, >>> -vivek >>> >>> 2009/4/8 Noble Paul നോബിള് नोब्ळ् : >>> > do you see the same problem when you use a single thread? >>> > >>> > what is the version of SolrJ that you use? >>> > >>> > >>> > >>> > On Wed, Apr 8, 2009 at 1:19 PM, vivek sar wrote: >>> >> Hi, >>> >> >>> >> Any ideas on this issue? I ran into this again - once it starts >>> >> happening it keeps happening. One of the thread keeps failing. 
Here >>> >> are my SolrServer settings, >>> >> >>> >> int socketTO = 0; >>> >> int connectionTO = 100; >>> >> int maxConnectionPerHost = 10; >>> >> int maxTotalConnection = 50; >>> >> boolean followRedirects = false; >>> >> boolean allowCompression = true; >>> >> int maxRetries = 1; >>> >> >>> >> Note, I'm using two threads to simultaneously write to the same index. >>> >> >>> >> org.apache.solr.client.solrj.SolrServerException: >>> >> org.apache.commons.httpclient.ProtocolException: Unbuffered entity >>> >> enclosing request can not be repeated. >>> >> at >>> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:470) >>> >> at >>> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242) >>> >> at >>> org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259) >>> >> at >>> org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48) >>> >> at >>> org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:57) >>> >> >>> >> Thanks, >>> >> -vivek >>> >> >>> >> On Sat, Apr 4, 2009 at 1:07 AM, vivek sar wrote: >>> >>> Hi, >>> >>> >>> >>> I'm sending 15K records at once using Solrj (server.addBeans(...)) >>> >>> and have two threads writing to same index. One thread goes fine, but >>> >>> the second thread always fails with, >>> >>> >>> >>> >>> >>> org.apache.solr.client.solrj.SolrServerException: >>> >>> org.apache.commons.httpclient.ProtocolException: Unb
Re: Searching on mulit-core Solr
Hi, I've gone through the mailing archive and have read contradictory remarks on this issue. Can someone please clear this up, as I'm not able to run distributed search on multi-cores? Is there any document on how I can search across multiple cores which share the same schema? Here are the various comments I've read on this mailing list, 1) http://www.nabble.com/multi-core-vs-multi-app-td15803781.html#a15803781 Don't think you can search against multiple cores "automatically" - i.e. got to make multiple queries, one for each core and combine results yourself. Yes, this will slow things down. - Otis 2) http://www.nabble.com/Search-in-SOLR-multi-cores-in-a-single-request-td20356173.html#a20356173 The idea behind multicore is that you will use them if you have completely different type of documents (basically multiple schemas). - Shalin 3) http://www.nabble.com/Distributed-search-td22036229.html#a22036229 That should work, yes, though it may not be a wise thing to do performance-wise, if the number of CPU cores that solr server has is lower than the number of Solr cores. - Otis My only motivation behind using multi-core is to keep the index size within limits. All my cores are using the same schema. My index grows to over 30G within a day and I need to keep up to a year of data. I couldn't find any other way of scaling using Solr. I've noticed that once the index grows above 10G the indexing process starts slowing down, the commit takes much longer, and optimize is hard to finish. So, I'm trying to create a new core after every 10 million documents (equal to 10G in my case). I don't want to start a new Solr instance every 10G - that won't scale for a year's worth of data. I'm going to use 3-4 servers to hold all these cores. Now if someone could please tell me if this is a wrong scaling architecture I could re-think. I want fast indexing and, at the same time, fast enough search. If I have to search each core separately and merge the results myself, the search performance is going to be awful.
Is Solr the right tool for managing billions of records (I can get up to 100million records every day - with 1Kb per record - 100GB of index a day)? Most of the field values are pretty distinct (like 10 million email addresses) so the index size would be huge too. I would think it's a common problem to scale huge size index keeping both indexing and search time acceptable. I'm not sure if this can be managed on just 4 servers - we don't have 100s of boxes for this project. Any other tool that might be more appropriate for this kind of case - like Katta or Lucene on Hadoop, or simply use Lucene using Parallel Search and partition the indexes on size? Thanks, -vivek On Wed, Apr 8, 2009 at 11:07 AM, vivek sar wrote: > Any help on this issue? Would distributed search on multi-core on same > Solr instance even work? Does it has to be different Solr instances > altogether (separate shards)? > > I'm kind of stuck at this point right now. Keep getting one of the two > errors (when running distributed search - single searches work fine) > as mentioned in this thread earlier. > > Thanks, > -vivek > > On Wed, Apr 8, 2009 at 1:57 AM, vivek sar wrote: >> Thanks Fergus. I'm still having problem with multicore search. >> >> I tried the following with two cores (they both share the same schema >> and solrconfig.xml) on the same box on same solr instance, >> >> 1) http://10.4.x.x:8080/solr/core0/admin/ - works fine, shows all the >> cores in admin interface >> 2) http://10.4.x.x:8080/solr/admin/cores - works fine, see all the cores in >> xml >> 3) http://10.4.x.x:8080/solr/20090407_2/select?q=japan - works fine, >> gives me top 10 records >> 4) http://10.4.x.x:8080/solr/20090408_3/select?q=japan - works fine, >> gives me top 10 records >> 5) >> http://10.4.x.x:8080/solr/20090407_2/select?shards=10.4.x.x:8080/solr/20090407_2,10.4.x.x:8080/solr/20090408_3&indent=true&q=japan >> - this FAILS. I've seen two problems with this. 
>> >> a) When index are being committed I see, >> >> SEVERE: org.apache.solr.common.SolrException: >> org.apache.solr.client.solrj.SolrServerException: >> java.net.SocketException: Connection reset >> at >> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282) >> at >> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) >> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) >> at >> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) >> at >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) >> at >> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) >> at >> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) >> at >> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) >> at >> org.apache.catalina.core.StandardContextVa