Re: SolrJ 4.0 Beta maxConnectionsPerHost
There are other updates that happen on the server that do not fail, so the answer to your question is yes. On Wed, Oct 10, 2012 at 8:12 AM, Sami Siren ssi...@gmail.com wrote: On Wed, Oct 10, 2012 at 12:02 AM, Briggs Thompson w.briggs.thomp...@gmail.com wrote: *Sami* The client IS instantiated only once and not for every request. I was curious if this was part of the problem. Do I need to re-instantiate the object for each request made? No, it is expensive if you instantiate the client every time. When the client seems to be hanging, can you still access the Solr instance normally and execute updates/searches from other clients? -- Sami Siren
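Sami's point above is that the SolrJ client is meant to be created once and shared by all threads, not rebuilt per request. A dependency-free sketch of that share-one-instance pattern (class and field names are hypothetical; in real code the `Object` would be an `HttpSolrServer` constructed with the Solr URL):

```java
// Hypothetical holder that lazily creates one client and hands the
// same instance to every caller. A plain Object stands in for the
// real HttpSolrServer so the sketch has no external dependencies.
class SolrClientHolder {
    private static volatile Object instance;

    static Object get() {
        if (instance == null) {
            synchronized (SolrClientHolder.class) {
                if (instance == null) {
                    // In real code: instance = new HttpSolrServer("http://host:8983/solr/core");
                    instance = new Object();
                }
            }
        }
        return instance;
    }
}
```

Every worker then calls `SolrClientHolder.get()` and reuses the same connection pool instead of paying connection-setup cost on each request.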
Re: SolrJ 4.0 Beta maxConnectionsPerHost
They are both SolrJ. What is happening is I have a batch indexer application that does a full re-index once per day. I also have an incremental indexer that takes items off a queue when they are updated. The problem only happens when both are running at the same time - they also run from the same machine. I am going to dig into this today and see what I find - I didn't get around to it yesterday. Question: I don't seem to see a StreamingUpdateSolrServer object on the 4.0 beta. I did see the ConcurrentUpdateSolrServer - this seems like a similar choice. Is this correct? On Wed, Oct 10, 2012 at 9:43 AM, Sami Siren ssi...@gmail.com wrote: On Wed, Oct 10, 2012 at 5:36 PM, Briggs Thompson w.briggs.thomp...@gmail.com wrote: There are other updates that happen on the server that do not fail, so the answer to your question is yes. The other updates are using solrj or something else? It would be helpful if you could prepare a simple java program that uses solrj to demonstrate the problem. Based on the available information it is really difficult try to guess what's happening. -- Sami Siren
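ConcurrentUpdateSolrServer is indeed the 4.0 replacement for StreamingUpdateSolrServer: conceptually it is a bounded queue of documents drained in batches by background threads. A rough, dependency-free sketch of that mechanism (all names here are hypothetical stand-ins, and the `sent` list stands in for an HTTP POST to Solr):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of the queue-plus-drainer pattern behind ConcurrentUpdateSolrServer:
// add() enqueues into a bounded queue; a background thread drains batches.
class BatchingIndexer {
    private final BlockingQueue<String> queue;
    private final List<String> sent = new ArrayList<>();
    private final Thread drainer;
    private volatile boolean running = true;

    BatchingIndexer(int queueSize) {
        queue = new LinkedBlockingQueue<>(queueSize);
        drainer = new Thread(() -> {
            while (running || !queue.isEmpty()) {
                List<String> batch = new ArrayList<>();
                queue.drainTo(batch, 10);                     // batch up to 10 docs per "request"
                if (!batch.isEmpty()) {
                    synchronized (sent) { sent.addAll(batch); } // stand-in for the HTTP POST
                } else {
                    try { Thread.sleep(5); } catch (InterruptedException e) { return; }
                }
            }
        });
        drainer.start();
    }

    void add(String doc) {
        try {
            queue.put(doc);   // blocks when the queue is full, applying backpressure
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    List<String> shutdown() {
        running = false;
        try { drainer.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return sent;
    }
}
```

The blocking `put` is why two producers (the batch indexer and the incremental indexer) can appear to hang when the drainer side stalls.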
Re: SolrJ 4.0 Beta maxConnectionsPerHost
Thanks for the heads up. I just tested this and you are right. I am making a call to addBeans and it succeeds without any issue even when the server is down. That sucks. A big part of this process is reliant on knowing exactly what has made it into the index and what has not, so this is a difficult problem to solve when you can't catch exceptions. I was thinking I could execute a ping request first to determine if the Solr server is still operational, but that doesn't help if the updateRequestHandler fails. On Wed, Oct 10, 2012 at 1:48 PM, Shawn Heisey s...@elyograg.org wrote: On 10/9/2012 3:02 PM, Briggs Thompson wrote: *Otis* - jstack is a great suggestion, thanks! The problem didn't happen this morning but next time it does I will certainly get the dump to see exactly where the app is swimming around. I haven't used StreamingUpdateSolrServer but I will see if that makes a difference. Are there any major drawbacks of going this route? One caveat -- when using the Streaming/Concurrent object, your application will not be notified when there is a problem indexing. I've been told there is a way to override a method in the object to allow trapping errors, but I have not seen sample code and haven't figured out how to do it. I've filed an issue and a patch to fix this. It's received some comments, but so far nobody has decided to commit it. https://issues.apache.org/jira/browse/SOLR-3284 Thanks, Shawn
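The "override a method to trap errors" idea Shawn mentions refers to subclassing the fire-and-forget client and overriding its error hook (in SolrJ this would be ConcurrentUpdateSolrServer's protected handleError; see SOLR-3284). A dependency-free sketch of the pattern, with hypothetical class names and a simulated failing send:

```java
import java.util.ArrayList;
import java.util.List;

// Base class swallows indexing errors, like Streaming/ConcurrentUpdateSolrServer.
class FireAndForgetIndexer {
    final List<String> failed = new ArrayList<>();

    void add(String doc) {
        try {
            send(doc);
        } catch (RuntimeException e) {
            handleError(doc, e);   // default: drop the error on the floor
        }
    }

    // Simulated transport: documents starting with "bad" fail.
    void send(String doc) {
        if (doc.startsWith("bad")) throw new RuntimeException("server down");
    }

    // Hook to override, analogous to ConcurrentUpdateSolrServer.handleError(Throwable).
    void handleError(String doc, Throwable t) { }
}

// Subclass traps failures so the application can log or re-queue them.
class TrackingIndexer extends FireAndForgetIndexer {
    @Override
    void handleError(String doc, Throwable t) {
        failed.add(doc);
    }
}
```

With the failed list in hand, the application knows exactly which documents did not make it into the index, which is the guarantee Briggs is after.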
Re: SolrJ 4.0 Beta maxConnectionsPerHost
Thanks all for your responses. For some reason the emails were getting filtered out of my inbox. *Otis* - jstack is a great suggestion, thanks! The problem didn't happen this morning but next time it does I will certainly get the dump to see exactly where the app is swimming around. I haven't used StreamingUpdateSolrServer but I will see if that makes a difference. Are there any major drawbacks of going this route? *Sami* - if you are referring to config: maxConnections=200&maxConnectionsPerHost=8, it showed up in the Solr logs, not the SolrJ logs. The client IS instantiated only once and not for every request. I was curious if this was part of the problem. Do I need to re-instantiate the object for each request made? I figured there would be more overhead if I am re-creating the connection several times when I never really need to shut it down, but at this point the overhead would be minimal, so I will try that. *Hoss* - The reason it seemed the client was creating the log was because the indexer (solr *server*) was more or less dormant for several hours, then I booted up my indexing *client* and the maxConnectionsPerHost tidbit was spit out right away. I was looking for something in the solrconfig and online but didn't find anything. I didn't look for very long so will check it out again. Some very good suggestions here. I appreciate everyone's feedback. I will follow up after some experimentation. Thanks, Briggs Thompson On Tue, Oct 9, 2012 at 11:19 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : I did some digging and experimentation and found something interesting. : When starting up the application, I see the following in Solr logs: : Creating new http client, config: maxConnections=200&maxConnectionsPerHost=8 ... : It seems as though the maxConnections and maxConnectionsPerHost are not : actually getting set. Anyone seen this problem or have an idea how to : resolve? To elaborate on sami's comment...
If you are seeing this in the logs from your solr *server*, it is unlikely that it has anything to do with the settings you are making on your solr *client*. This is probably related to the http client created inside Solr for communicating with other Solr nodes (ie: replication, SolrCloud distributed updates, SolrCloud peersync, etc...), which is different from the properties you set on the http client in your Solr client application. I believe there is a way to configure the defaults for the internally used http clients via solrconfig.xml, but off the top of my head I don't remember what that is. -Hoss
Re: SolrJ - IOException
I have also just run into this a few times over the weekend in a newly deployed system. We are running Solr 4.0 Beta (not using SolrCloud) and it is hosted on AWS. I have a RabbitMQ consumer that reads updates from a queue and posts updates to Solr via SolrJ. There is quite a bit of error handling around the indexing request, and even if Solr is not live the consumer application successfully logs the exception and attempts to move along in the queue. There are two consumer applications running at once, and at times they process 400 requests per minute. The high-volume times are not necessarily when this problem occurs, though. This exception is causing the entire application to hang - which is surprising considering all SolrJ logic is wrapped with try/catches. Has anyone found out more information regarding the possible keep-alive bug? Any insight is much appreciated. Thanks, Briggs Thompson

Oct 8, 2012 7:25:48 AM org.apache.http.impl.client.DefaultRequestDirector tryExecute
INFO: I/O exception (java.net.SocketException) caught when processing request: Broken pipe
Oct 8, 2012 7:25:48 AM org.apache.http.impl.client.DefaultRequestDirector tryExecute
INFO: Retrying request
Oct 8, 2012 7:25:48 AM com..rabbitmq.worker.SolrWriter work
SEVERE: {id:4049703,datetime:2012-10-08 07:22:05} IOException occured when talking to server at: http://ec2-50-18-73-42.us-west-1.compute.amazonaws.com:8983/solr/coupon server
org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://ec2-50-18-73-42.us-west-1.compute.amazonaws.com:8983/solr/coupon server
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:362)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:69)
    at org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:96)
    at org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:79)
    at com..solr.SolrIndexService.IndexCoupon(SolrIndexService.java:57)
    at com..solr.SolrIndexService.Index(SolrIndexService.java:36)
    at com..rabbitmq.worker.SolrWriter.work(SolrWriter.java:47)
    at com..rabbitmq.job.Runner.run(Runner.java:84)
    at com..rabbitmq.job.SolrConsumer.main(SolrConsumer.java:10)
Caused by: org.apache.http.client.ClientProtocolException
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:306)
    ... 10 more
Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity. The cause lists the reason the original request failed.
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:686)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:517)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
    ... 13 more
Caused by: java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at org.apache.http.impl.io.AbstractSessionOutputBuffer.flushBuffer(AbstractSessionOutputBuffer.java:147)
    at org.apache.http.impl.io.AbstractSessionOutputBuffer.flush(AbstractSessionOutputBuffer.java:154)
    at org.apache.http.impl.conn.LoggingSessionOutputBuffer.flush(LoggingSessionOutputBuffer.java:95)
    at org.apache.http.impl.io.ChunkedOutputStream.flush(ChunkedOutputStream.java:178)
    at org.apache.http.entity.mime.content.InputStreamBody.writeTo(InputStreamBody.java:72)
    at org.apache.http.entity.mime.HttpMultipart.doWriteTo(HttpMultipart.java:206)
    at org.apache.http.entity.mime.HttpMultipart.writeTo(HttpMultipart.java:224)
    at org.apache.http.entity.mime.MultipartEntity.writeTo(MultipartEntity.java:183)
    at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
    at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
    at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
    at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
    at org.apache.http.impl.conn.AbstractClientConnAdapter.sendRequestEntity(AbstractClientConnAdapter.java:227)
    at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute
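The NonRepeatableRequestException above means HttpClient hit a broken pipe mid-upload and then tried to retry a streamed request body that cannot be replayed. One defensive pattern for the consumer described here is to catch the failure at the worker level and put the message back on the queue rather than let the worker hang or die. A dependency-free sketch (all names hypothetical; the flaky `send()` simulates a transient socket error on the first attempt):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Requeue-on-failure worker: a message whose update throws goes to the
// back of the queue for a later retry instead of killing the consumer.
class RequeueWorker {
    private final Deque<String> queue = new ArrayDeque<>();
    private int sendAttempts = 0;

    void enqueue(String msg) { queue.addLast(msg); }

    // Drain the queue; returns the number of successfully delivered messages.
    int drain(int maxPasses) {
        int delivered = 0;
        for (int pass = 0; pass < maxPasses && !queue.isEmpty(); pass++) {
            String msg = queue.pollFirst();
            try {
                send(msg);
                delivered++;
            } catch (RuntimeException e) {
                queue.addLast(msg);   // retry later rather than hang the worker
            }
        }
        return delivered;
    }

    // Simulated transport: the first attempt fails like a broken pipe.
    private void send(String msg) {
        if (++sendAttempts == 1) throw new RuntimeException("Broken pipe");
    }
}
```

This keeps "exactly what made it into the index" knowledge in the application: anything still in the queue was not delivered.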
Re: SolrJ - IOException
Also note there were no exceptions in the actual Solr log, only on the SolrJ side. Thanks, Briggs On Mon, Oct 8, 2012 at 10:45 AM, Briggs Thompson w.briggs.thomp...@gmail.com wrote: I have also just run into this a few times over the weekend in a newly deployed system. We are running Solr 4.0 Beta (not using SolrCloud) and it is hosted on AWS. I have a RabbitMQ consumer that reads updates from a queue and posts updates to Solr via SolrJ. There is quite a bit of error handling around the indexing request, and even if Solr is not live the consumer application successfully logs the exception and attempts to move along in the queue. There are two consumer applications running at once, and at times they process 400 requests per minute. The high-volume times are not necessarily when this problem occurs, though. This exception is causing the entire application to hang - which is surprising considering all SolrJ logic is wrapped with try/catches. Has anyone found out more information regarding the possible keep-alive bug? Any insight is much appreciated.
Thanks, Briggs Thompson
4.0 Strange Commit/Replication Issue
Hello all, I am running 4.0 alpha and have encountered something I am unable to explain. I am indexing content to a master server, and the data is replicating to a slave. The odd part is that when searching through the UI, no documents show up on the master with a standard *:* query. All cache types are set to zero. I know indexing is working because I am watching the logs and I can see documents getting added, not to mention the data is written to the filesystem. I have autocommit maxTime set to 60000 (1 minute) so it isn't a commit issue. The very strange part is that the slave is correctly replicating the data, and it is searchable in the UI on the slave (but not the master). I don't understand how/why the data is visible on the slave and not visible on the master. Does anyone have any thoughts on this or seen it before? Thanks in advance! Briggs
Re: 4.0 Strange Commit/Replication Issue
That is the problem. I wasn't aware of that new feature in 4.0. Thanks for the quick response Tomás. -Briggs On Wed, Aug 1, 2012 at 3:08 PM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote: Could your autocommit in the master be using openSearcher=false? If you go to the Master admin, do you see that the searcher has all the segments that you see in the filesystem? On Wed, Aug 1, 2012 at 4:24 PM, Briggs Thompson w.briggs.thomp...@gmail.com wrote: Hello all, I am running 4.0 alpha and have encountered something I am unable to explain. I am indexing content to a master server, and the data is replicating to a slave. The odd part is that when searching through the UI, no documents show up on the master with a standard *:* query. All cache types are set to zero. I know indexing is working because I am watching the logs and I can see documents getting added, not to mention the data is written to the filesystem. I have autocommit maxTime set to 60000 (1 minute) so it isn't a commit issue. The very strange part is that the slave is correctly replicating the data, and it is searchable in the UI on the slave (but not the master). I don't understand how/why the data is visible on the slave and not visible on the master. Does anyone have any thoughts on this or seen it before? Thanks in advance! Briggs
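For reference, the behavior Tomás describes is controlled by the autoCommit block in solrconfig.xml. A sketch of the relevant fragment (values illustrative): with openSearcher=false, commits make documents durable on disk but do not open a new searcher, so they stay invisible to queries on the master while still replicating to the slave, whose own commit after replication does open a searcher.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>           <!-- hard commit every minute -->
    <openSearcher>false</openSearcher> <!-- flush segments, but do not expose them to searches -->
  </autoCommit>
</updateHandler>
```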
Re: Solr 4 Alpha SolrJ Indexing Issue
This is unrelated for the most part, but the javabin update request handler does not seem to be working properly when calling the SolrJ method HttpSolrServer.deleteById(List<String> ids). A single id gets deleted from the index as opposed to the full list. It appears properly in the logs - it shows deletes for all ids sent - although all but one remain in the index. I confirmed that the default update request handler deletes the list properly, so this appears to be a problem with the BinaryUpdateRequestHandler. Not an issue for me, just spreading the word. Thanks, Briggs On Thu, Jul 19, 2012 at 9:00 AM, Mark Miller markrmil...@gmail.com wrote: we really need to resolve that issue soon... On Jul 19, 2012, at 12:08 AM, Briggs Thompson wrote: Yury, Thank you so much! That was it. Man, I spent a good long while troubleshooting this. Probably would have spent quite a bit more time. I appreciate your help!! -Briggs On Wed, Jul 18, 2012 at 9:35 PM, Yury Kats yuryk...@yahoo.com wrote: On 7/18/2012 7:11 PM, Briggs Thompson wrote: I have realized this is not specific to SolrJ but to my instance of Solr. Using curl to delete by query is not working either. Can be this: https://issues.apache.org/jira/browse/SOLR-3432 - Mark Miller lucidimagination.com
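Until the multi-id bug is fixed, an obvious workaround is to issue one deleteById call per id instead of one call with the whole list. A dependency-free sketch of that loop (the class, the Set standing in for the index, and the varargs helper are all hypothetical; in real SolrJ code each iteration would call solrServer.deleteById(id), followed by a single commit at the end):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Stand-in "index" preloaded with four documents; deleteAll removes ids
// one at a time, mimicking one delete request per id.
class SingleDeleteWorkaround {
    private final Set<String> index =
            new HashSet<>(Arrays.asList("1", "2", "3", "4"));

    // In real code: solrServer.deleteById(id);
    void deleteById(String id) { index.remove(id); }

    // One request per id sidesteps the list-handling bug; returns the
    // number of documents left in the stand-in index afterwards.
    int deleteAll(String... ids) {
        for (String id : ids) {
            deleteById(id);
        }
        return index.size();
    }
}
```

The cost is one HTTP round trip per id, which is acceptable for small delete batches.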
Re: Solr 4 Alpha SolrJ Indexing Issue
Thanks Mark! On Thu, Jul 19, 2012 at 4:07 PM, Mark Miller markrmil...@gmail.com wrote: https://issues.apache.org/jira/browse/SOLR-3649 On Thu, Jul 19, 2012 at 3:34 PM, Briggs Thompson w.briggs.thomp...@gmail.com wrote: This is unrelated for the most part, but the javabin update request handler does not seem to be working properly when calling the SolrJ method HttpSolrServer.deleteById(List<String> ids). A single id gets deleted from the index as opposed to the full list. It appears properly in the logs - it shows deletes for all ids sent - although all but one remain in the index. I confirmed that the default update request handler deletes the list properly, so this appears to be a problem with the BinaryUpdateRequestHandler. Not an issue for me, just spreading the word. Thanks, Briggs On Thu, Jul 19, 2012 at 9:00 AM, Mark Miller markrmil...@gmail.com wrote: we really need to resolve that issue soon... On Jul 19, 2012, at 12:08 AM, Briggs Thompson wrote: Yury, Thank you so much! That was it. Man, I spent a good long while troubleshooting this. Probably would have spent quite a bit more time. I appreciate your help!! -Briggs On Wed, Jul 18, 2012 at 9:35 PM, Yury Kats yuryk...@yahoo.com wrote: On 7/18/2012 7:11 PM, Briggs Thompson wrote: I have realized this is not specific to SolrJ but to my instance of Solr. Using curl to delete by query is not working either. Can be this: https://issues.apache.org/jira/browse/SOLR-3432 - Mark Miller lucidimagination.com -- - Mark http://www.lucidimagination.com
Re: Solr 4 Alpha SolrJ Indexing Issue
I have realized this is not specific to SolrJ but to my instance of Solr. Using curl to delete by query is not working either. Running

curl http://localhost:8983/solr/coupon/update -H 'Content-Type: text/xml' --data-binary '<delete><query>*:*</query></delete>'

Yields this in the logs:

INFO: [coupon] webapp=/solr path=/update params={stream.body=<delete><query>*:*</query></delete>} {deleteByQuery=*:*} 0 0

But the corpus of documents in the core does not change. My solrconfig is pretty barebones at this point, but I attached it in case anyone sees something strange. Anyone have any idea why documents aren't getting deleted? Thanks in advance, Briggs Thompson On Wed, Jul 18, 2012 at 12:54 PM, Briggs Thompson w.briggs.thomp...@gmail.com wrote: Hello All, I am using 4.0 Alpha and running into an issue with indexing using HttpSolrServer (SolrJ). Relevant Java code:

HttpSolrServer solrServer = new HttpSolrServer(MY_SERVER);
solrServer.setRequestWriter(new BinaryRequestWriter());

Relevant solrconfig.xml content:

<requestHandler name="/update" class="solr.UpdateRequestHandler" />
<requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler" />

Indexing documents works perfectly fine (using addBeans()), however, when trying to do deletes I am seeing issues. I tried to do a solrServer.deleteByQuery("*:*") followed by a commit and optimize, and nothing is deleted. The response from the delete request is a success, and even in the Solr logs I see the following:

INFO: [coupon] webapp=/solr path=/update/javabin params={wt=javabin&version=2} {deleteByQuery=*:*} 0 1
Jul 18, 2012 11:15:34 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit{flags=0,version=0,optimize=true,openSearcher=true,waitSearcher=false,expungeDeletes=false,softCommit=false}

I tried removing the BinaryRequestWriter and having the request sent in the default format, and I get the following error.
SEVERE: org.apache.solr.common.SolrException: Unsupported ContentType: application/octet-stream Not in: [application/xml, text/csv, text/json, application/csv, application/javabin, text/xml, application/json]
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:86)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1561)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1001)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
I thought that an optimize does the same thing as expungeDeletes, but in the log I see expungeDeletes=false. Is there a way to force that using SolrJ? Thanks in advance, Briggs
Re: Solr 4 Alpha SolrJ Indexing Issue
Yury, Thank you so much! That was it. Man, I spent a good long while troubleshooting this. Probably would have spent quite a bit more time. I appreciate your help!! -Briggs On Wed, Jul 18, 2012 at 9:35 PM, Yury Kats yuryk...@yahoo.com wrote: On 7/18/2012 7:11 PM, Briggs Thompson wrote: I have realized this is not specific to SolrJ but to my instance of Solr. Using curl to delete by query is not working either. Can be this: https://issues.apache.org/jira/browse/SOLR-3432
Re: Trunk error in Tomcat
Thanks Erik. If anyone else has any ideas about the NoSuchFieldError issue please let me know. Thanks! -Briggs On Mon, Jul 2, 2012 at 6:27 PM, Erik Hatcher erik.hatc...@gmail.com wrote: Interestingly, I just logged the issue of it not showing the right error in the UI here: https://issues.apache.org/jira/browse/SOLR-3591 As for your specific issue, not sure, but the error should at least also show in the admin view. Erik On Jul 2, 2012, at 18:59 , Briggs Thompson wrote: Hi All, I just grabbed the latest version of trunk and am having a hard time getting it running properly in tomcat. It does work fine in Jetty. The admin screen gives the following error: This interface requires that you activate the admin request handlers, add the following configuration to your Solrconfig.xml I am pretty certain the front end error has nothing to do with the actual error. I have seen some other folks on the distro with the same problem, but none of the threads have a solution (that I could find). Below is the stack trace. I also tried with different versions of Lucene but none worked. Note: my index is EMPTY and I am not migrating over an index build with a previous version of lucene. I think I ran into this a while ago with an earlier version of trunk, but I don't recall doing anything to fix it. Anyhow, if anyone has an idea with this one, please let me know. Thanks! 
Briggs Thompson
SEVERE: null:java.lang.NoSuchFieldError: LUCENE_50
    at org.apache.solr.analysis.SynonymFilterFactory$1.createComponents(SynonymFilterFactory.java:83)
    at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:83)
    at org.apache.lucene.analysis.synonym.SynonymMap$Builder.analyze(SynonymMap.java:120)
    at org.apache.lucene.analysis.synonym.SolrSynonymParser.addInternal(SolrSynonymParser.java:99)
    at org.apache.lucene.analysis.synonym.SolrSynonymParser.add(SolrSynonymParser.java:70)
    at org.apache.solr.analysis.SynonymFilterFactory.loadSolrSynonyms(SynonymFilterFactory.java:131)
    at org.apache.solr.analysis.SynonymFilterFactory.inform(SynonymFilterFactory.java:93)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:584)
    at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:112)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:812)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:510)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:333)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:282)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:101)
    at org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:277)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
    at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:103)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4649)
    at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5305)
    at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:899)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:875)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:618)
    at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:963)
    at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1600)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:680)
Re: Trunk error in Tomcat
Also, I forgot to include this before, but there is a client-side error which is a failed 404 request to the below URL. http://localhost:8983/solr/null/admin/system?wt=json On Tue, Jul 3, 2012 at 8:45 AM, Briggs Thompson w.briggs.thomp...@gmail.com wrote: Thanks Erik. If anyone else has any ideas about the NoSuchFieldError issue please let me know. Thanks! -Briggs On Mon, Jul 2, 2012 at 6:27 PM, Erik Hatcher erik.hatc...@gmail.com wrote: Interestingly, I just logged the issue of it not showing the right error in the UI here: https://issues.apache.org/jira/browse/SOLR-3591 As for your specific issue, not sure, but the error should at least also show in the admin view. Erik On Jul 2, 2012, at 18:59 , Briggs Thompson wrote: Hi All, I just grabbed the latest version of trunk and am having a hard time getting it running properly in Tomcat. It does work fine in Jetty. The admin screen gives the following error: This interface requires that you activate the admin request handlers, add the following configuration to your Solrconfig.xml I am pretty certain the front end error has nothing to do with the actual error. I have seen some other folks on the distro with the same problem, but none of the threads have a solution (that I could find). Below is the stack trace. I also tried with different versions of Lucene but none worked. Note: my index is EMPTY and I am not migrating over an index built with a previous version of Lucene. I think I ran into this a while ago with an earlier version of trunk, but I don't recall doing anything to fix it. Anyhow, if anyone has an idea with this one, please let me know. Thanks!
Briggs Thompson
Re: Trunk error in Tomcat
Wow! I didn't know 4.0 alpha was released today. I think I will just get that going. Woo!! On Tue, Jul 3, 2012 at 9:00 AM, Vadim Kisselmann v.kisselm...@gmail.com wrote: same problem here: https://mail.google.com/mail/u/0/?ui=2view=btopver=18zqbez0n5t35q=tomcat%20v.kisselmannqs=truesearch=queryth=13615cfb9a5064bdqt=kisselmann.1.tomcat.1.tomcat's.1.v.1cvid=3 https://issues.apache.org/jira/browse/SOLR-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13230056#comment-13230056 i use an older solr-trunk version from february/march, it works. with newer versions from trunk i get the same error: This interface requires that you activate the admin request handlers... regards vadim
DataImportHandler w/ multivalued fields
Hello Solr Community! I am implementing a data connection to Solr through the Data Import Handler, and non-multivalued fields are working correctly, but multivalued fields are not getting indexed properly. I am new to DataImportHandler, but from what I could find, a nested entity is the way to go for a multivalued field. The weird thing is that data is being indexed for only one row, meaning only the first raw_tag gets populated. Anyone have any ideas? Thanks, Briggs

This is the relevant part of the schema:

<field name="raw_tag" type="text_en_lessAggressive" indexed="true" stored="false" multivalued="true"/>
<field name="raw_tag_string" type="string" indexed="false" stored="true" multivalued="true"/>
<copyField source="raw_tag" dest="raw_tag_string"/>

And the relevant part of data-import.xml:

<document name="merchant">
  <entity name="site" query="select * from site">
    <field column="siteId" name="siteId"/>
    <field column="domain" name="domain"/>
    <field column="aliasFor" name="aliasFor"/>
    <field column="title" name="title"/>
    <field column="description" name="description"/>
    <field column="requests" name="requests"/>
    <field column="requiresModeration" name="requiresModeration"/>
    <field column="blocked" name="blocked"/>
    <field column="affiliateLink" name="affiliateLink"/>
    <field column="affiliateTracker" name="affiliateTracker"/>
    <field column="affiliateNetwork" name="affiliateNetwork"/>
    <field column="cjMerchantId" name="cjMerchantId"/>
    <field column="thumbNail" name="thumbNail"/>
    <field column="updateRankings" name="updateRankings"/>
    <field column="couponCount" name="couponCount"/>
    <field column="category" name="category"/>
    <field column="adult" name="adult"/>
    <field column="rank" name="rank"/>
    <field column="redirectsTo" name="redirectsTo"/>
    <field column="wwwRequired" name="wwwRequired"/>
    <field column="avgSavings" name="avgSavings"/>
    <field column="products" name="products"/>
    <field column="nameChecked" name="nameChecked"/>
    <field column="tempFlag" name="tempFlag"/>
    <field column="created" name="created"/>
    <field column="enableSplitTesting" name="enableSplitTesting"/>
    <field column="affiliateLinklock" name="affiliateLinklock"/>
    <field column="hasMobileSite" name="hasMobileSite"/>
    <field column="blockSite" name="blockSite"/>
    <entity name="merchant_tags" pk="siteId" query="select raw_tag, freetags.id, freetagged_objects.object_id as siteId from freetags inner join freetagged_objects on freetags.id=freetagged_objects.tag_id where freetagged_objects.object_id='${site.siteId}'">
      <field column="raw_tag" name="raw_tag"/>
    </entity>
  </entity>
</document>
Re: DataImportHandler w/ multivalued fields
In addition, I tried a query like the one below and changed the column definition to

<field column="raw_tag" name="raw_tag" splitBy=", "/>

and still no luck. It is indexing the full content now, but not as multivalued. It seems like the splitBy isn't working properly.

select group_concat(freetags.raw_tag separator ', ') as raw_tag, site.* from site left outer join (freetags inner join freetagged_objects) on (freetags.id = freetagged_objects.tag_id and site.siteId = freetagged_objects.object_id) group by site.siteId

Am I doing something wrong? Thanks, Briggs Thompson

On Thu, Dec 1, 2011 at 11:46 AM, Briggs Thompson w.briggs.thomp...@gmail.com wrote: Hello Solr Community! I am implementing a data connection to Solr through the Data Import Handler and non-multivalued fields are working correctly, but multivalued fields are not getting indexed properly. I am new to DataImportHandler, but from what I could find, the entity is the way to go for a multivalued field. The weird thing is that data is being indexed for one row, meaning first raw_tag gets populated. Anyone have any ideas? Thanks, Briggs [...]
Re: DataImportHandler w/ multivalued fields
Hey Rahul, Thanks for the response. I actually just figured it out, thankfully :). To answer your question, the raw_tag field is indexed and not stored (tokenized), and there is a copyField from raw_tag to raw_tag_string which would be used for facets. That *should have* been displayed in the results. The silly mistake I made was not camel-casing multiValued, which is clearly the source of the problem. The second email I sent, changing the query and using the split for the multivalued field, had an error in it in the form of a missing attribute: transformer="RegexTransformer" in the entity declaration. Anyhow, thanks for the quick response! Briggs

On Thu, Dec 1, 2011 at 12:57 PM, Rahul Warawdekar rahul.warawde...@gmail.com wrote: Hi Briggs, By saying multivalued fields are not getting indexed properly, do you mean to say that you are not able to search on those fields? Have you tried actually searching your Solr index for those multivalued terms to make sure it returns the search results? One possibility could be that the multivalued fields are getting indexed correctly and are searchable. However, since your schema.xml has a raw_tag field whose stored attribute is set to false, you may not be able to see those fields. On Thu, Dec 1, 2011 at 1:43 PM, Briggs Thompson w.briggs.thomp...@gmail.com wrote: In addition, I tried a query like the one below and changed the column definition to field column=raw_tag name=raw_tag splitBy=, / and still no luck. It is indexing the full content now but not as multivalued. It seems like the splitBy isn't working properly. [...]
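For readers hitting the same problem, the two fixes Briggs describes (camel-casing multiValued in schema.xml, and adding the RegexTransformer declaration to the DIH entity so that splitBy takes effect) combine into something like the following sketch. The query and field names come from the thread; the exact attribute placement is an assumption:

```xml
<!-- schema.xml: multiValued must be camel-cased; the lowercase "multivalued"
     used earlier in the thread is not recognized, so the field behaves as
     single-valued and only the first value survives -->
<field name="raw_tag" type="text_en_lessAggressive" indexed="true" stored="false" multiValued="true"/>

<!-- data-import.xml: splitBy only takes effect when the entity declares the
     RegexTransformer; the ", " separator matches the group_concat query -->
<entity name="site" transformer="RegexTransformer"
        query="select group_concat(freetags.raw_tag separator ', ') as raw_tag, site.*
               from site left outer join (freetags inner join freetagged_objects)
               on (freetags.id = freetagged_objects.tag_id and site.siteId = freetagged_objects.object_id)
               group by site.siteId">
  <field column="raw_tag" name="raw_tag" splitBy=", "/>
</entity>
```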
Re: difference between shard and core in solr
I think everything you said is correct for static schemas, but a single core does not necessarily have a unique schema, since you can have dynamic fields. With dynamic fields, you can have multiple types of documents in the same index (core), and multiple types of indexed fields specific to individual document types - all in the same core. Briggs Thompson On Mon, Jul 18, 2011 at 2:22 AM, pravesh suyalprav...@yahoo.com wrote: a single core is an index with the same schema, is this what a core really is? YES. A single core is an independent index with its own unique schema. You go with a new core for cases where your schema/analysis/search requirements are completely different from your existing core(s). can a single core contain two separate indexes with different schemas in it? NO (for the same reason as explained above). Does a shard refer to a collection of indexes on a single physical machine? Can a single core be present in different shards? You can think of a shard as a big index distributed across a cluster of machines. So all shards belonging to a single core share the same schema/analysis/search requirements. You go with sharding when the index is not scalable on a single machine, or when your index grows really big in size. Thanx Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/difference-between-shard-and-core-in-solr-tp3178214p3178249.html Sent from the Solr - User mailing list archive at Nabble.com.
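As a concrete illustration of the dynamic-field point above, a schema can declare suffix-based dynamic fields so documents of different types coexist in one core; the suffixes and type names below are conventional examples, not from the thread:

```xml
<!-- Any field ending in _s, _i, or _txt is accepted at index time without
     being declared individually, so different document types can each bring
     their own fields to the same core under one schema -->
<dynamicField name="*_s"   type="string" indexed="true" stored="true"/>
<dynamicField name="*_i"   type="int"    indexed="true" stored="true"/>
<dynamicField name="*_txt" type="text"   indexed="true" stored="true" multiValued="true"/>
```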
Re: Need help with troublesome wildcard query
Hey Chris, Removing the ORs in each query might help narrow down the problem, but I suggest you run this through the query analyzer in order to see where it is dropping out. It is a great tool for troubleshooting issues like these. I see a few things here:
- For leading wildcard queries, you should include the ReversedWildcardFilterFactory. Check out the documentation here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ReversedWildcardFilterFactory
- Your result might get dropped because you are trying to do wildcard searches on a stemmed field. Wildcard searches on a stemmed field are counter-intuitive: if you index "computers", it may stem to "comput", in which case the wildcard query computer* would not match.
- If you want to support both stemming and wildcard searches, I suggest creating a copy field with an un-stemmed field type definition. Don't forget that if you modify your field type definition, you need to re-index.

In response to your question about text_ws, this is just a different field type definition that essentially splits on whitespace. You should use it if that is the desired search logic, but it probably isn't. Check out the documentation on each of the tokenizers and filter factories in your text field type and see what you need and what you don't to satisfy your use cases. Hope that helps, Briggs Thompson On Fri, Jul 8, 2011 at 9:03 AM, Christopher Cato christopher.c...@minimedia.se wrote: Hi Briggs. Thanks for taking the time.
I have the query nearly working now; currently this is how it looks when it matches on the title Super Technocrane 30 and others with similar names:

INFO: [] webapp=/solr path=/select/ params={qf=title^40.0&hl.fl=title&wt=json&rows=10&fl=*,score&start=0&q=(title:*super*+AND+*technocran*)+OR+(title:*super*+AND+*technocran)&qt=standard&fq=type:product+AND+language:sv} hits=3 status=0 QTime=1

Adding another letter stops it matching:

INFO: [] webapp=/solr path=/select/ params={qf=title^40.0&hl.fl=title&wt=json&rows=10&fl=*,score&start=0&q=(title:*super*+AND+*technocrane*)+OR+(title:*super*+AND+*technocrane)&qt=standard&fq=type:product+AND+language:sv} hits=0 status=0 QTime=0

The field type definitions are as follows:

<field name="title" type="text" indexed="true" stored="true" termVectors="true" omitNorms="true"/>

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal. add enablePositionIncrements="true" in both
         the index and query analyzers to leave a 'gap' for more accurate phrase queries. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

There is also a type definition called text_ws; should I use that instead and change text to text_ws in the field definition for title?

<!-- A text field that only splits on whitespace for exact matching of words -->
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer
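A sketch of the copy-field approach suggested earlier in this thread - pairing the stemmed title field with an un-stemmed sibling so wildcard queries have literal tokens to match. The title_wildcard and text_unstemmed names are assumptions, not from the thread:

```xml
<!-- Un-stemmed sibling of title: lowercased whitespace tokens only, so a
     query like technocrane* can match the literal token "technocrane" -->
<field name="title_wildcard" type="text_unstemmed" indexed="true" stored="false"/>
<copyField source="title" dest="title_wildcard"/>

<fieldType name="text_unstemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index-time only: also indexes reversed copies of tokens so that
         leading wildcards (e.g. *crane) stay cheap to evaluate -->
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```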
Re: Need help with troublesome wildcard query
Hello Christopher, Can you provide the exact query sent to Solr for the one-word query and also the two-word query? The field type definition for your title field would be useful too. From what I understand, Solr should be able to handle your use case. I am guessing it is a problem with how the field is defined, assuming the query is correct. Briggs Thompson On Thu, Jul 7, 2011 at 12:22 PM, Christopher Cato christopher.c...@minimedia.se wrote: Hi, I'm running Solr 3.2 with edismax under Tomcat 6 via Drupal. I'm having some problems writing a query that matches a specific field on several words. I have implemented an AJAX search that basically takes whatever is in a form field and attempts to match documents. I'm not having much luck though. The first word always matches correctly, but as soon as I enter the second word I'm losing matches, and the third word doesn't give any matches at all. The title field that I'm searching contains a product name that may or may not have several words. The requirement is that the search should be progressive, i.e. as the user inputs words I should always return results that contain all of the words entered. I also have to correct bad input like an erroneous space in the product name, ex. "product name" instead of "productname". I'm wondering if there isn't an easier way to query Solr? Ideally I'd want to say "give me all docs that have the following text in their titles". Is that possible? I'd really appreciate any help! Regards, Christopher Cato
Hit Rate
Hello all, Is there a good way to get the hit count of a search? Example query: textField:solr AND documentId:1000. Say the document with id = 1000 has "solr" 13 times in the document. Is there any way to extract that number [13] in the response? I know we can return the score, which is loosely related to hit counts via tf-idf, but for this case I need the actual hit counts. I believe you can get this information from the logs, but that is less useful if the use case is on the presentation layer. I tried faceting on the query, but it seems like that returns the number of documents the query matches rather than the hit count.

http://localhost:8080/solr/ExampleCore/select/?q=textField%3Asolr+AND+documentId%3A1246727&version=2.2&start=0&rows=10&indent=on&facet=true&facet.field=textField:solr&facet.query=textField:solr

I was thinking that highlighting essentially returns the hit count if you supply an unlimited amount of snippets, but I imagine there must be a more elegant solution. Thanks in advance, Briggs
Re: Hit Rate
Yes indeed, that is what I was missing. Thanks Ahmet! On Tue, Jul 5, 2011 at 12:48 PM, Ahmet Arslan iori...@yahoo.com wrote: Is there a good way to get the hit count of a search? Example query: textField:solr AND documentId:1000 Say document with Id = 1000 has solr 13 times in the document. Any way to extract that number [13] in the response? Looks like you are looking for term frequency info: Two separate solutions: http://wiki.apache.org/solr/TermVectorComponent http://wiki.apache.org/solr/FunctionQuery#tf
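For anyone wiring this up, the TermVectorComponent route from the wiki link above boils down to enabling term vectors on the field plus a solrconfig.xml registration; the handler and component names follow the wiki's convention but are otherwise assumptions:

```xml
<!-- schema.xml: term vectors must be stored at index time (re-index required) -->
<field name="textField" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>

<!-- solrconfig.xml: register the component and a handler that runs it;
     querying the handler with tv=true returns per-document term frequencies -->
<searchComponent name="tvComponent" class="solr.TermVectorComponent"/>
<requestHandler name="/tvrh" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>
```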
Dynamic Fields vs. Multicore
Hi All, I was searching around for documentation on the performance differences between a sharded, single-schema, dynamic-field setup and a multi-core, static multi-schema setup (which I currently have), but I have not had much luck finding what I am looking for. I understand commits and optimizes will be more intensive in a single core since there is more data (though I would offset that by sharding heavily), but I am particularly curious about the search performance implications. I am interested in moving to the dynamic-field setup in order to implement a better global search, but I want to make sure I understand the drawbacks of hitting those datasets individually and globally after they are merged. (NOTE: I would have a global field signifying the dataset type, which could then be added to the filter query in order to create the subset for individual dataset queries.) Some background about the data: it is extremely variable. Some documents contain only 2 or 3 sentences, and some are 20-page extracted PDFs. There would probably only be about 100-150 unique fields. Any input is greatly appreciated! Thanks, Briggs Thompson
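The global dataset-type field described in the note above, combined with a filter query for per-dataset searches, might look like the following sketch; dataset_type and the example value are hypothetical names:

```xml
<!-- schema.xml: every document carries its dataset type; the string type keeps
     it un-tokenized, so filter queries match exactly and cache well -->
<field name="dataset_type" type="string" indexed="true" stored="true" required="true"/>

<!-- A global search queries the whole core; a per-dataset search adds a
     cached filter query, e.g.:
       /select?q=solr&fq=dataset_type:pdf_extract
-->
```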