Re: AutoSoftcommit option solr 4.0
Hi Shaveta, simple: index a doc and search for it ;) A soft commit stands for Near Real Time Search. It could take a couple of seconds for the doc to show up, but it should be there. Best regards Vadim

2012/11/26 Shaveta_Chawla shaveta.cha...@knimbus.com: I have migrated from Solr 3.6 to Solr 4.0. I have enabled Solr 4.0's auto commit option by adding these lines to solrconfig.xml:

<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>
<autoCommit>
  <maxTime>6</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

I am making these changes on my local machine. I know what the autoSoftCommit feature does, but how can I check that the autocommit feature is working correctly? -- View this message in context: http://lucene.472066.n3.nabble.com/AutoSoftcommit-option-solr-4-0-tp4022302.html Sent from the Solr - User mailing list archive at Nabble.com.
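For reference, the Solr 4.x autocommit setup in solrconfig.xml normally looks like the block below. This is a sketch: the 60-second hard-commit interval is illustrative, not Shaveta's exact value.

```xml
<!-- Hard commit: flushes documents to stable storage without opening a new searcher -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commit: makes new documents visible to searchers (Near Real Time) -->
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>
```

With settings like these, a newly indexed document should turn up in search results within about a second, without an explicit commit from the client — which is exactly the check Vadim suggests.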
Re: Out Of Memory =( Too many cores on one server?
Hi, your JVM needs more RAM. My setup works well with 10 cores and 300 million docs, Xmx8GB Xms8GB, 16GB for the OS. But as Bernd mentioned, the memory consumption depends on the number of fields and the fieldCache. Best Regards Vadim

2012/11/16 Bernd Fehling bernd.fehl...@uni-bielefeld.de: I guess you should give the JVM more memory. When starting to find a good value for -Xmx I oversized and set it to Xmx20G and Xms20G. Then I monitored the system and saw that the JVM stays between 5G and 10G (Java 7 with G1 GC). Now it is finally set to Xmx11G and Xms11G for my system with 1 core and 38 million docs. But JVM memory depends pretty much on the number of fields in schema.xml and the fieldCache (sortable fields). Regards Bernd

On 16.11.2012 09:29, stockii wrote: Hello. If my server runs for a while I get some OOM problems. I think the problem is that I am running too many cores on one server with too many documents. This is my server concept: 14 cores:
- 1 with 30 million docs
- 1 with 22 million docs
- 1 with a growing 25 million docs
- 1 with 67 million docs
and the other cores are under 1 million docs. All these cores run fine in one Jetty, searching is very fast, and we are satisfied with this. Yesterday we got an OOM. Do you think that we should move the big cores onto another virtual instance of the server, so that the JVMs don't share the memory and go OOM? Starting with: MEMORY_OPTIONS=-Xmx6g -Xms2G -Xmn1G
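A heap sizing like the one Vadim describes is set in the servlet container's start script; a minimal sketch, assuming a dedicated box where 8GB goes to the JVM and the rest stays with the OS for the filesystem cache (variable name and values are illustrative):

```
# Illustrative JVM sizing for a dedicated Solr box;
# equal -Xms/-Xmx avoids heap-resize pauses under load
JAVA_OPTS="$JAVA_OPTS -Xms8g -Xmx8g"
```

Leaving a large share of RAM to the OS matters because Lucene relies on the OS page cache for fast index reads.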
Re: Re: how solr4.0 and zookeeper run on weblogic
Hi, what does your update/add command look like? Regards Vadim

2012/10/18 rayvicky zongwei...@gmail.com: I made it work on WebLogic, but when I add or update the index, it errors:

2012-10-17 03:47:03 PM CST Error HTTP Session BEA-100060 An unexpected error occurred while retrieving the session for Web application: weblogic.servlet.internal.WebAppServletContext@425eab87 - appName: 'solr', name: 'solr', context-path: '/solr', spec-version: '2.5'.
weblogic.utils.NestedRuntimeException: Cannot parse POST parameters of request: '/solr/collection1/update'
at weblogic.servlet.internal.ServletRequestImpl$RequestParameters.mergePostParams(ServletRequestImpl.java:2021)
at weblogic.servlet.internal.ServletRequestImpl$RequestParameters.parseQueryParams(ServletRequestImpl.java:1901)
at weblogic.servlet.internal.ServletRequestImpl$RequestParameters.peekParameter(ServletRequestImpl.java:2047)
at weblogic.servlet.internal.ServletRequestImpl$SessionHelper.initSessionInfoWithContext(ServletRequestImpl.java:2602)
at weblogic.servlet.internal.ServletRequestImpl$SessionHelper.initSessionInfo(ServletRequestImpl.java:2506)
Truncated. see log file for complete stacktrace
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at weblogic.servlet.internal.PostInputStream.read(PostInputStream.java:142)
at weblogic.utils.http.HttpChunkInputStream.readChunkSize(HttpChunkInputStream.java:109)
at weblogic.utils.http.HttpChunkInputStream.initChunk(HttpChunkInputStream.java:71)
Truncated. see log file for complete stacktrace
2012-10-17 03:47:03 PM CST
CST Error HTTP BEA-101020 [weblogic.servlet.internal.WebAppServletContext@425eab87 - appName: 'solr', name: 'solr', context-path: '/solr', spec-version: '2.5'] Servlet failed with Exception
java.lang.IllegalStateException: Failed to retrieve session: Cannot parse POST parameters of request: '/solr/collection1/update'
at weblogic.servlet.security.internal.SecurityModule.getUserSession(SecurityModule.java:486)
at weblogic.servlet.security.internal.ServletSecurityManager.checkAccess(ServletSecurityManager.java:81)
at weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2116)
at weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2086)
at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1406)
Truncated. see log file for complete stacktrace
2012-10-17 03:47:03 PM CST Error Kernel BEA-000802 ExecuteRequest failed weblogic.utils.NestedRuntimeException: Cannot parse POST parameters of request: '/solr/collection1/update'.
weblogic.utils.NestedRuntimeException: Cannot parse POST parameters of request: '/solr/collection1/update'
at weblogic.servlet.internal.ServletRequestImpl$RequestParameters.mergePostParams(ServletRequestImpl.java:2021)
at weblogic.servlet.internal.ServletRequestImpl$RequestParameters.parseQueryParams(ServletRequestImpl.java:1901)
at weblogic.servlet.internal.ServletRequestImpl$RequestParameters.peekParameter(ServletRequestImpl.java:2047)
at weblogic.servlet.internal.ServletRequestImpl$SessionHelper.initSessionInfoWithContext(ServletRequestImpl.java:2602)
at weblogic.servlet.internal.ServletRequestImpl$SessionHelper.initSessionInfo(ServletRequestImpl.java:2506)
Truncated.
see log file for complete stacktrace
java.io.IOException: Malformed chunk
at weblogic.utils.http.HttpChunkInputStream.initChunk(HttpChunkInputStream.java:67)
at weblogic.utils.http.HttpChunkInputStream.read(HttpChunkInputStream.java:142)
at weblogic.utils.http.HttpChunkInputStream.read(HttpChunkInputStream.java:182)
at weblogic.servlet.internal.ServletInputStreamImpl.read(ServletInputStreamImpl.java:222)
at weblogic.servlet.internal.ServletRequestImpl$RequestParameters.mergePostParams(ServletRequestImpl.java:1995)
Truncated. see log file for complete stacktrace

How do I handle it? Thanks, ray. 2012-10-18 zongweilei

From: Jan_Høydahl_/_Cominvent_[via_Lucene] Sent: 2012-10-17 23:13:10 To: rayvicky Cc: Subject: Re: how solr4.0 and zookeeper run on weblogic

Did it work for you? You probably also have to set -Djetty.port=8080 in order for local ZK not to be started on port 9983. It's confusing, but you can also edit solr.xml to achieve the same. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com

On 17 Oct 2012 at 10:06, rayvicky [hidden email] wrote: thanks -- View this message in context: http://lucene.472066.n3.nabble.com/how-solr4-0-and-zookeeper-run-on-weblogic-tp4013882p4014167.html
Re: how solr4.0 and zookeeper run on weblogic
Hi, these are JAVA_OPTS params; you can find and set this stuff in the startManagedWebLogic script. Best regards Vadim

2012/10/16 rayvicky zongwei...@gmail.com: Who can help me? Where do I set -DzkRun -Dbootstrap_conf=true -DzkHost=localhost:9080 -DnumShards=2 in WebLogic? -- View this message in context: http://lucene.472066.n3.nabble.com/how-solr4-0-and-zookeeper-run-on-weblogic-tp4013882.html Sent from the Solr - User mailing list archive at Nabble.com.
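In WebLogic these flags go into the managed-server start script; a sketch, assuming the standard $DOMAIN_HOME/bin layout (WebLogic start scripts read JAVA_OPTIONS; the property values are taken from the question above):

```
# In $DOMAIN_HOME/bin/startManagedWebLogic.sh, before the server is started
JAVA_OPTIONS="$JAVA_OPTIONS -DzkRun -Dbootstrap_conf=true -DzkHost=localhost:9080 -DnumShards=2"
export JAVA_OPTIONS
```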
Re: Multicore setup is ignored when deploying solr.war on Tomcat 5/6/7
Hi Rogerio, I can imagine what it is. Tomcat extracts the war files in /var/lib/tomcatXX/webapps. If you already ran an older Solr version on your server, the old extracted Solr war could still be there (keyword: Tomcat cache). Delete the /var/lib/tomcatXX/webapps/solr folder and restart Tomcat; then Tomcat should deploy your new war file. Best regards Vadim

2012/10/14 Rogerio Pereira rogerio.ara...@gmail.com: I'll try to be more specific, Jack. I just downloaded apache-solr-4.0.0.zip; from this archive I took the core1 and core2 folders from the multicore example and renamed them to collection1 and collection2. I also made all the necessary changes in solr.xml, solrconfig.xml and schema.xml in these two cores to reflect the new names. After this step I just tried to deploy the war file on Tomcat, pointing to the directory (solr/home) where these two cores are located; solr.xml is there, with collection1 and collection2 properly configured. The question is, no matter what is contained in solr.xml, this file isn't read at Tomcat startup. I tried to cause a parser error in solr.xml by removing closing tags, but even with this change I can't get so much as a parser error. I hope to be clear now.

2012/10/14 Jack Krupansky j...@basetechnology.com: I can't quite parse "the same multicore deployment as we have on apache solr 4.0 distribution archive". Could you rephrase and be more specific? What archive? Were you already using 4.0-ALPHA or BETA (or some snapshot of 4.0), or are you moving from pre-4.0 to 4.0? The directory structure did change in 4.0. Look at the example/solr directory.
-- Jack Krupansky -Original Message- From: Rogerio Pereira Sent: Sunday, October 14, 2012 10:01 AM To: solr-user@lucene.apache.org Subject: Multicore setup is ignored when deploying solr.war on Tomcat 5/6/7

Hi, I tried to perform the same multicore deployment as we have in the apache solr 4.0 distribution archive. I created a directory for solr/home with solr.xml inside and two subdirectories, collection1 and collection2; these two cores are properly configured with a conf folder and solrconfig.xml and schema.xml. On Tomcat I set the system property pointing to the solr/home path; unfortunately, when I start Tomcat, solr.xml is ignored and only the default collection1 is loaded. As a test, I made changes to solr.xml to cause parser errors, and guess what? These errors aren't reported on Tomcat startup. The same thing doesn't happen with the multicore example that comes in the distribution archive, so now I'm trying to figure out what black magic is happening. Let me do the same kind of deployment on Windows and Mac OS X; if it persists, I'll update this thread. Regards, Rogério

-- Regards, Rogério Pereira Araújo Blogs: http://faces.eti.br, http://ararog.blogspot.com Twitter: http://twitter.com/ararog Skype: rogerio.araujo MSN: ara...@hotmail.com Gtalk/FaceTime: rogerio.ara...@gmail.com (0xx62) 8240 7212 (0xx62) 3920 2666
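For reference, a pre-4.4-style solr.xml for the two cores described above would look roughly like this (the instanceDir paths are illustrative; each core needs its own conf/ directory with solrconfig.xml and schema.xml):

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="collection1" instanceDir="collection1" />
    <core name="collection2" instanceDir="collection2" />
  </cores>
</solr>
```

If a file like this sits in solr/home and is still silently ignored, that is a strong hint the container is looking at a different solr/home than intended — which is consistent with Vadim's stale-webapps-directory diagnosis.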
Re: Proximity(tilde) combined with wildcard, AutomatonQuery ?
Hi Ahmet, thank you, it sounds great :) I will test it in the next days and give feedback. Best regards Vadim

2012/10/5 Ahmet Arslan iori...@yahoo.com: Hi Vadim, I attached a zip (Solr plugin) file to SOLR-1604. This is not a patch; it is supposed to work with Solr 4.0. Some tests fail, but it should work with pol* tel*~5 types of queries. Ahmet

--- On Thu, 9/27/12, Vadim Kisselmann v.kisselm...@gmail.com wrote: From: Vadim Kisselmann v.kisselm...@gmail.com Subject: Re: Proximity(tilde) combined with wildcard, AutomatonQuery ? To: solr-user@lucene.apache.org Date: Thursday, September 27, 2012, 10:38 AM Hi Ahmet, thanks for your reply :) I see that it does not come with the 4.0 release, because the given patches do not work with this version. Right? Best regards Vadim

2012/9/26 Ahmet Arslan iori...@yahoo.com: assume I have a simple query like this with wildcard and tilde: japa* fukushima~10 instead of japan fukushima~10 OR japanese fukushima~10, etc. Do we have a solution in Solr 4.0 to work with these kinds of queries? Vadim, two open Jira issues: https://issues.apache.org/jira/browse/SOLR-1604 https://issues.apache.org/jira/browse/LUCENE-1486
Re: Proximity(tilde) combined with wildcard, AutomatonQuery ?
Hi Ahmet, thanks for your reply :) I see that it does not come with the 4.0 release, because the given patches do not work with this version. Right? Best regards Vadim

2012/9/26 Ahmet Arslan iori...@yahoo.com: assume I have a simple query like this with wildcard and tilde: japa* fukushima~10 instead of japan fukushima~10 OR japanese fukushima~10, etc. Do we have a solution in Solr 4.0 to work with these kinds of queries? Vadim, two open Jira issues: https://issues.apache.org/jira/browse/SOLR-1604 https://issues.apache.org/jira/browse/LUCENE-1486
Re: How to run Solr Cloud using Tomcat?
Hi Roy, yep, it works with Tomcat 6 and an external ZooKeeper. I will publish a blog post about it tomorrow on sentric.ch. My blog post is ready, but I had no time to publish it in the last couple of days :) Best regards Vadim

2012/9/27 Markus Jelsma markus.jel...@openindex.io: Hi - on Debian systems there's an /etc/default/tomcat properties file you can use to set your flags.

-Original message- From: Benjamin, Roy rbenja...@ebay.com Sent: Thu 27-Sep-2012 19:57 To: solr-user@lucene.apache.org Subject: How to run Solr Cloud using Tomcat? I've gone through the guide on running Solr Cloud using Jetty, but it's not practical to use JAVA_OPTS etc. on real cloud deployments. I don't see how to extend these instructions to running on Tomcat. Has anyone run Solr Cloud under Tomcat successfully? Did they document how? Thanks Roy
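On Debian/Ubuntu the Tomcat init script sources /etc/default/tomcat6 (or tomcat7), so the SolrCloud system properties can live there instead of on a command line; a sketch with illustrative heap size and ZooKeeper host names:

```
# /etc/default/tomcat6 — sourced by the Tomcat init script at startup
JAVA_OPTS="$JAVA_OPTS -Xmx4g -DzkHost=zk1:2181,zk2:2181,zk3:2181 -DnumShards=2"
```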
Proximity(tilde) combined with wildcard, AutomatonQuery ?
Hi guys, assume I have a simple query like this with wildcard and tilde: japa* fukushima~10 instead of japan fukushima~10 OR japanese fukushima~10, etc. Do we have a solution in Solr 4.0 to work with these kinds of queries? Does AutomatonQuery/Filter cover this case? Best regards Vadim
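As a historical footnote to this thread: later Solr releases (4.8 and up) ship the ComplexPhraseQParserPlugin (the successor to the SOLR-1604 work discussed below), which accepts wildcards inside a proximity phrase. A query along these lines (the field defaults are whatever the request handler is configured with):

```
q={!complexphrase}"japa* fukushima"~10
```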
Re: Problem to start solr-4.0.0-BETA with tomcat-6.0.20
Hi Claudio, great to hear that it works. Everyone can edit the wiki; you only need to log in. Regards Vadim

2012/8/27 Claudio Ranieri claudio.rani...@estadao.com: I solved the problem. I added the parameter sharedLib="lib" in $SOLR_HOME/solr.xml (<solr persistent="true" sharedLib="lib">) and moved all the jars from $TOMCAT_HOME/webapps/solr/WEB-INF/lib to $SOLR_HOME/lib. This information could be included in the Solr/Tomcat wiki. Claudio Ranieri | Especialista Sistemas de Busca | S.A O Estado de S.Paulo Av. Eng. Caetano Álvares, 55 - Limão - São Paulo - SP - 02598-900 + 55 11 3856-5790 | + 55 11 9344-2674

-Original Message- From: Claudio Ranieri [mailto:claudio.rani...@estadao.com] Sent: Monday, August 27, 2012 10:34 To: solr-user@lucene.apache.org Subject: RE: Problem to start solr-4.0.0-BETA with tomcat-6.0.20 Can anyone help me?

-Original Message- From: Claudio Ranieri [mailto:claudio.rani...@estadao.com] Sent: Friday, August 24, 2012 11:40 To: solr-user@lucene.apache.org Subject: RE: Problem to start solr-4.0.0-BETA with tomcat-6.0.20 Hi Vadim, no, I used the entire apache-solr-4.0.0-BETA\example\solr (schema.xml, solrconfig.xml ...)

-Original Message- From: Vadim Kisselmann [mailto:v.kisselm...@gmail.com] Sent: Friday, August 24, 2012 07:26 To: solr-user@lucene.apache.org Subject: Re: Problem to start solr-4.0.0-BETA with tomcat-6.0.20 A presumption: do you use your old solrconfig.xml files from older installations? If yes, compare the default config and yours.

2012/8/23 Claudio Ranieri claudio.rani...@estadao.com: I made this installation on a new Tomcat. With Solr 3.4.*, 3.5.*, 3.6.* it works with the jars in $TOMCAT_HOME/webapps/solr/WEB-INF/lib, but with Solr 4.0 beta it doesn't work. I needed to add the jars to $TOMCAT_HOME/lib. The problem with the cast seems to be in the source code.
-Original Message- From: Karthick Duraisamy Soundararaj [mailto:karthick.soundara...@gmail.com] Sent: Thursday, August 23, 2012 09:22 To: solr-user@lucene.apache.org Subject: Re: Problem to start solr-4.0.0-BETA with tomcat-6.0.20 Not sure if this can help, but once I had a similar problem with Solr 3.6.0 where Tomcat refused to find one of the classes that existed. I deleted Tomcat's webapp directory and then it worked fine.

On Thu, Aug 23, 2012 at 8:19 AM, Erick Erickson erickerick...@gmail.com wrote: First, I'm no Tomcat expert; here's the Tomcat Solr page, but you've probably already seen it: http://wiki.apache.org/solr/SolrTomcat But I'm guessing that you may have old jars around somewhere and things are getting confused. I'd blow away the whole thing and start over; whenever I start copying jars around I always lose track of what's where. Have you successfully had any other Solr operate under Tomcat? Sorry I can't be more help. Erick

On Wed, Aug 22, 2012 at 9:47 AM, Claudio Ranieri claudio.rani...@estadao.com wrote: Hi, I tried to start solr-4.0.0-BETA with tomcat-6.0.20 but it does not work. I copied apache-solr-4.0.0-BETA.war to $TOMCAT_HOME/webapps. Then I copied the directory apache-solr-4.0.0-BETA\example\solr to C:\home\solr-4.0-beta and adjusted the file $TOMCAT_HOME\conf\Catalina\localhost\apache-solr-4.0.0-BETA.xml to point solr/home to C:/home/solr-4.0-beta.
With this configuration, when I start Tomcat I get:

SEVERE: org.apache.solr.common.SolrException: Invalid luceneMatchVersion 'LUCENE_40', valid values are: [LUCENE_20, LUCENE_21, LUCENE_22, LUCENE_23, LUCENE_24, LUCENE_29, LUCENE_30, LUCENE_31, LUCENE_32, LUCENE_33, LUCENE_34, LUCENE_35, LUCENE_36, LUCENE_CURRENT] or a string in format 'VV'

So I changed the line in solrconfig.xml from <luceneMatchVersion>LUCENE_40</luceneMatchVersion> to <luceneMatchVersion>LUCENE_CURRENT</luceneMatchVersion>. Then I got a new error:

Caused by: java.lang.ClassNotFoundException: solr.NRTCachingDirectoryFactory

This class is in apache-solr-core-4.0.0-BETA.jar, but for some reason the classloader does not load it. I then moved all jars in $TOMCAT_HOME\webapps\apache-solr-4.0.0-BETA\WEB-INF\lib to $TOMCAT_HOME\lib. After this setup, I got a new error:

SEVERE: java.lang.ClassCastException: org.apache.solr.core.NRTCachingDirectoryFactory cannot be cast to org.apache.solr.core.DirectoryFactory

So I changed the line in solrconfig.xml from <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/> to <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NIOFSDirectoryFactory}"/>. Then I got a new error:

Caused by: java.lang.ClassCastException: org.apache.solr.spelling.DirectSolrSpellChecker cannot be cast to org.apache.solr.spelling.SolrSpellChecker

How can I resolve the problem of the classloader? How can I
Re: flush (delete all document) solr 4 Beta
Your docs are marked as deleted. You should optimize after the commit; then they will really be deleted. It's easier and faster to stop your Jetty/Tomcat, drop your index directory and start your servlet container again... when that's not possible, then optimize. Regards Vadim

2012/8/27 Jamel ESSOUSSI jamel.essou...@gmail.com: Hi, I need to flush Solr (delete all existing documents). For doing this, I have the following code:

HttpSolrServer server = new HttpSolrServer(url);
server.setSoTimeout(1000);
server.setConnectionTimeout(100);
server.setDefaultMaxConnectionsPerHost(100);
server.setMaxTotalConnections(100);
server.setFollowRedirects(false);
server.setAllowCompression(true);
server.setMaxRetries(1);
server.setParser(new XMLResponseParser());
UpdateResponse ur = server.deleteByQuery("*:*");
server.commit(true, true);

As a result, I still have all the documents; ur.getStatus() is 0 and the Solr documents were not deleted. I haven't any server or client errors. Can you explain why it did not work? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/flush-delete-all-document-solr-4-Beta-tp4003434.html Sent from the Solr - User mailing list archive at Nabble.com.
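For debugging, the same flush can be reproduced without SolrJ; a sketch against Solr's XML update endpoint (host, port and core name are illustrative):

```
# Delete everything and hard-commit in one request
curl 'http://localhost:8983/solr/collection1/update?commit=true' \
  -H 'Content-Type: text/xml' \
  --data-binary '<delete><query>*:*</query></delete>'
```

Note that commit=true is a hard commit: the documents stop matching queries immediately, but as Vadim points out, they only disappear physically from the index files after a merge or an explicit optimize.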
Re: Problem to start solr-4.0.0-BETA with tomcat-6.0.20
A presumption: do you use your old solrconfig.xml files from older installations? If yes, compare the default config and yours.

2012/8/23 Claudio Ranieri claudio.rani...@estadao.com: I made this installation on a new Tomcat. With Solr 3.4.*, 3.5.*, 3.6.* it works with the jars in $TOMCAT_HOME/webapps/solr/WEB-INF/lib, but with Solr 4.0 beta it doesn't work. I needed to add the jars to $TOMCAT_HOME/lib. The problem with the cast seems to be in the source code.

-Original Message- From: Karthick Duraisamy Soundararaj [mailto:karthick.soundara...@gmail.com] Sent: Thursday, August 23, 2012 09:22 To: solr-user@lucene.apache.org Subject: Re: Problem to start solr-4.0.0-BETA with tomcat-6.0.20 Not sure if this can help, but once I had a similar problem with Solr 3.6.0 where Tomcat refused to find one of the classes that existed. I deleted Tomcat's webapp directory and then it worked fine.

On Thu, Aug 23, 2012 at 8:19 AM, Erick Erickson erickerick...@gmail.com wrote: First, I'm no Tomcat expert; here's the Tomcat Solr page, but you've probably already seen it: http://wiki.apache.org/solr/SolrTomcat But I'm guessing that you may have old jars around somewhere and things are getting confused. I'd blow away the whole thing and start over; whenever I start copying jars around I always lose track of what's where. Have you successfully had any other Solr operate under Tomcat? Sorry I can't be more help. Erick

On Wed, Aug 22, 2012 at 9:47 AM, Claudio Ranieri claudio.rani...@estadao.com wrote: Hi, I tried to start solr-4.0.0-BETA with tomcat-6.0.20 but it does not work. I copied apache-solr-4.0.0-BETA.war to $TOMCAT_HOME/webapps. Then I copied the directory apache-solr-4.0.0-BETA\example\solr to C:\home\solr-4.0-beta and adjusted the file $TOMCAT_HOME\conf\Catalina\localhost\apache-solr-4.0.0-BETA.xml to point solr/home to C:/home/solr-4.0-beta.
With this configuration, when I start Tomcat I get:

SEVERE: org.apache.solr.common.SolrException: Invalid luceneMatchVersion 'LUCENE_40', valid values are: [LUCENE_20, LUCENE_21, LUCENE_22, LUCENE_23, LUCENE_24, LUCENE_29, LUCENE_30, LUCENE_31, LUCENE_32, LUCENE_33, LUCENE_34, LUCENE_35, LUCENE_36, LUCENE_CURRENT] or a string in format 'VV'

So I changed the line in solrconfig.xml from <luceneMatchVersion>LUCENE_40</luceneMatchVersion> to <luceneMatchVersion>LUCENE_CURRENT</luceneMatchVersion>. Then I got a new error:

Caused by: java.lang.ClassNotFoundException: solr.NRTCachingDirectoryFactory

This class is in apache-solr-core-4.0.0-BETA.jar, but for some reason the classloader does not load it. I then moved all jars in $TOMCAT_HOME\webapps\apache-solr-4.0.0-BETA\WEB-INF\lib to $TOMCAT_HOME\lib. After this setup, I got a new error:

SEVERE: java.lang.ClassCastException: org.apache.solr.core.NRTCachingDirectoryFactory cannot be cast to org.apache.solr.core.DirectoryFactory

So I changed the line in solrconfig.xml from <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/> to <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NIOFSDirectoryFactory}"/>. Then I got a new error:

Caused by: java.lang.ClassCastException: org.apache.solr.spelling.DirectSolrSpellChecker cannot be cast to org.apache.solr.spelling.SolrSpellChecker

How can I resolve the problem of the classloader? How can I resolve the cast problems with NRTCachingDirectoryFactory and DirectSolrSpellChecker? I cannot start Solr 4.0 beta with Tomcat. Thanks,

-- -- Karthick D S Master's in Computer Engineering (Software Track) Syracuse University Syracuse - 13210 New York United States of America
Does SolrEntityProcessor fulfill my requirements?
Hi folks, I have this case: I want to update my Solr 4.0 from trunk to Solr 4.0 alpha. The index structure has changed, so I can't replicate. 10 cores are in use, each with 30 million docs. We assume that all fields are stored and indexed. What is the best way to export the docs from all cores on one machine with Solr 4.0 trunk to same-named cores on another machine with Solr 4.0 alpha? SolrEntityProcessor could be one solution, but does it work with this size of data? I want to reindex all docs at once and not in small parts. I find no examples of bigger reindex attempts with SolrEntityProcessor. XSLT as option two? What would be the best solution, what do you think? Best Regards Vadim
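For reference, a data-import configuration using SolrEntityProcessor to pull everything from a source core looks roughly like this (the URL is illustrative; fields that are not stored on the source cannot be exported this way, and rows controls the page size of the fetch, so a full run over 30 million docs is really many paged requests under the hood):

```xml
<dataConfig>
  <document>
    <entity name="source" processor="SolrEntityProcessor"
            url="http://oldhost:8983/solr/core0"
            query="*:*"
            rows="1000"
            wt="javabin"/>
  </document>
</dataConfig>
```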
Re: Pb installation Solr/Tomcat6
Same problem, but here tomcat6 needs the right to read and write your index. Regards Vadim

2012/7/14 Bruno Mannina bmann...@free.fr: I found the problem, I think. It was a permission problem on schema.xml: schema.xml was only readable by the solr user. Now I have the same problem with the Solr index directory.

On 14/07/2012 14:00, Bruno Mannina wrote: Dear Solr users, I try to run Solr with Tomcat but I always get this error: Can't find resource 'schema.xml' in classpath or '/home/solr/apache-solr-3.6.0/example/solr/./conf/', cwd='/var/lib/tomcat6' — but schema.xml is inside the directory '/home/solr/apache-solr-3.6.0/example/solr/./conf/'. http://localhost:8080/manager/html works fine; I see Applications /solr, functional: True, but when I click on solr/ (http://localhost:8080/solr/) I get this error. Could you help me solve this problem? It makes me crazy. Thanks a lot, Bruno Tomcat6 Ubuntu 12.04 Solr 3.6
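On a setup like this, granting the Tomcat user access would look something like the following (the user/group name and path match the thread's Ubuntu layout; adjust to yours — in particular the data/ index directory must be writable, while conf/ only needs to be readable):

```
# Let the tomcat6 user read the config and read/write the index
sudo chown -R tomcat6:tomcat6 /home/solr/apache-solr-3.6.0/example/solr
```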
Re: Trunk error in Tomcat
It works, with a few changes :) I think we don't need a new issue in Jira. Solr 4.0 is no longer the Solr 4.0 of late February; there have been some changes in solrconfig.xml since then. I migrated my Solr 4.0 trunk config, which worked until late February, into a new config from 4.0 alpha. A couple of changes I noticed:
- abortOnConfigurationError:true is gone
- luceneMatchVersion was changed to LUCENE_50
- a couple of new jars included for velocity and lang
- new directory factory: solr.directoryFactory:solr.NRTCachingDirectoryFactory
- indexDefaults replaced by indexConfig
- updateLog added
- replication handler for SolrCloud added
- names for handlers were changed, like /select for search
- new handler added: <requestHandler name="/get" class="solr.RealTimeGetHandler">
and so on...

This AdminHandler exception is still there when I use the clusteringComponent, see here: SCHWERWIEGEND: null:org.apache.solr.common.SolrException: Error loading class 'solr.clustering.ClusteringComponent' But if I comment it out, Solr starts without errors. The path to the clustering jar in ../contrib/clustering/lib/ is correct and the needed jars are there; perhaps we need new jar files? Best regards Vadim

2012/7/5 Stefan Matheis matheis.ste...@googlemail.com: Great, thanks Vadim

On Thursday, July 5, 2012 at 9:34 AM, Vadim Kisselmann wrote: Hi Stefan, ok, I will test the latest version from trunk with Tomcat in the next days and open a new issue :) regards Vadim

2012/7/3 Stefan Matheis matheis.ste...@googlemail.com (mailto:matheis.ste...@googlemail.com): On Tuesday, July 3, 2012 at 8:10 PM, Vadim Kisselmann wrote: sorry, I overlooked your latest comment with the new issue in SOLR-3238 ;) Should I open a new issue? NP Vadim, yes a new issue would help .. all available information too :)
Re: Solr 4.0 IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
Hi Simon, I checked my log files one more time to get the error timestamps. I get the first error at 14:37:

06.07.2012 14:37:52 org.apache.solr.common.SolrException log
SCHWERWIEGEND: null:ClientAbortException: java.net.SocketException: Broken pipe
at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:358)
at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:323)

The next one, and the first Java heap space error, at 17:35:

06.07.2012 17:35:36 org.apache.solr.common.SolrException log
SCHWERWIEGEND: null:java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.init(FreqProxTermsWriterPerField.java:248)
at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.newInstance(FreqProxTermsWriterPerField.java:269)
at org.apache.lucene.index.ParallelPostingsArray.grow(ParallelPostingsArray.java:48)
at org.apache.lucene.index.TermsHashPerField$PostingsBytesStartArray.grow(TermsHashPerField.java:307)
at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:330)

Commit failure a couple of seconds later:

06.07.2012 17:35:38 org.apache.solr.common.SolrException log
SCHWERWIEGEND: auto commit error...:java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2650)

followed by 10 Java heap space exceptions, and one minute later at 17:36 the first auto-warming exception:

06.07.2012 17:36:26 org.apache.solr.common.SolrException log
SCHWERWIEGEND: Error during auto-warming of key:pubDate:[1340971496000 TO 1341576296000]:java.lang.OutOfMemoryError: Java heap space
06.07.2012 17:36:28 org.apache.solr.common.SolrException log
SCHWERWIEGEND: Error during auto-warming of key:pubDate:[1340971495000 TO 1341576295000]:java.lang.OutOfMemoryError: Java heap space

it really seems that you are hitting an OOM during auto warming. can this be the case for your failure.
Can you raise the JVM memory and see if you still hit the spike and go OOM? This is very unlikely an IndexWriter problem. I'd rather look at your warmup queries, i.e. fieldCache, fieldValueCache usage. Are you sorting / faceting on anything?

The auto-warming problems began one minute after the Java heap exceptions, so I think these are subsequent problems. I configured very small caches (max. sizes between 512 and 2048) for my use case. Warming queries look like this, with sorting but without faceting:

<lst>
  <str name="q">ag</str>
  <str name="fq">pubDate:[NOW-1DAY TO *]</str>
  <str name="sort">pubDate desc</str>
</lst>

Do you think that 8GB for the JVM is not enough? Raising the JVM memory could solve the problem... As mentioned, this server ran a long time with the same config without problems; I am surprised that this problem appeared at some point without heavy usage... now it's running smoothly again after a restart yesterday, so I don't know when the problem will appear again. I will try to update to 4.0 alpha today, run it with Tomcat, and report :) Best regards Vadim

2012/7/10 Simon Willnauer simon.willna...@gmail.com: it really seems that you are hitting an OOM during auto warming. can this be the case for your failure. Can you raise the JVM memory and see if you still hit the spike and go OOM? this is very unlikely a IndexWriter problem. I'd rather look at your warmup queries ie. fieldcache, FieldValueCache usage. Are you sorting / facet on anything? simon

On Tue, Jul 10, 2012 at 4:49 PM, Vadim Kisselmann v.kisselm...@gmail.com wrote: Hi Robert, "Can you run Lucene's checkIndex tool on your index?" No, unfortunately not. This Solr should run without stoppage; a Tomcat restart is OK, but not more :) I tested newer trunk versions a couple of months ago, but they all fail with Tomcat. I will test 4.0-alpha in the next days with Tomcat and open a Jira issue if it doesn't work with it. "do you have another exception in your logs?"
"To my knowledge, in all cases that IndexWriter throws an OutOfMemoryError, the original OutOfMemoryError is also rethrown (not just this IllegalStateException noting that at some point, it hit OOM.)" Hmm, I checked older logs and found something new, which I had not seen in VisualVM: Java heap space problems just before the OOM. My JVM has 8GB -Xmx/-Xms, 16GB for the OS, nothing else on this machine. These errors pop up during a normal run according to the logs; no optimizes, no high load (max. 30 queries per minute) or anything special at this time.

SCHWERWIEGEND: null:ClientAbortException: java.net.SocketException: Broken pipe
SCHWERWIEGEND: null:java.lang.OutOfMemoryError: Java heap space
SCHWERWIEGEND: auto commit error...:java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
SCHWERWIEGEND: Error during auto-warming of key:org.apache.solr.search.QueryResultKey@7cba935e:java.lang.OutOfMemoryError: Java
Solr 4.0 IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
Hi folks, my test server with Solr 4.0 from trunk (version 1292064 from late February) throws this exception...
auto commit error...:java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
  at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2650)
  at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2804)
  at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2786)
  at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:391)
  at org.apache.solr.update.CommitTracker.run(CommitTracker.java:197)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
My server has 24GB RAM, 8GB for the JVM. I index roughly 20 docs per second; my index is small, with 10 million docs. It ran for a couple of weeks, and then suddenly I got these errors. I can't see any problems with my GC in VisualVM. Everything is OK: memory consumption is about 6GB, no swapping, no I/O problems... it's all green :) What's going on on this machine? :) My uncommitted docs are gone, right? Best regards Vadim
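For context, IndexWriter buffering and automatic hard commits are configured in solrconfig.xml. A minimal sketch with illustrative values, not tuned recommendations (in Solr 4.x these live under <indexConfig>; older configs used <indexDefaults>/<mainIndex>):

```xml
<!-- solrconfig.xml: illustrative values only -->
<indexConfig>
  <!-- flush the in-memory indexing buffer to disk once it reaches this size -->
  <ramBufferSizeMB>128</ramBufferSizeMB>
</indexConfig>

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit periodically, so a later OOM or crash loses at most ~60s of docs -->
  <autoCommit>
    <maxTime>60000</maxTime>
  </autoCommit>
</updateHandler>
```

This also answers the "my uncommitted docs are gone, right?" question: documents indexed since the last successful commit are lost on such a failure, and a shorter autoCommit window bounds that loss.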
Re: Solr 4.0 IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
Hi Robert, Can you run Lucene's CheckIndex tool on your index? No, unfortunately not. This Solr should run without stoppage; a Tomcat restart is OK, but not more :) I tested newer trunk versions a couple of months ago, but they all fail with Tomcat. I will test 4.0-ALPHA in the next days with Tomcat and open a JIRA issue if it doesn't work with it. Do you have another exception in your logs? To my knowledge, in all cases where IndexWriter throws an OutOfMemoryError, the original OutOfMemoryError is also rethrown (not just this IllegalStateException noting that at some point it hit OOM). Hmm, I checked older logs and found something new that I had not seen in VisualVM: Java heap space problems just before the OOM. My JVM has 8GB -Xmx/-Xms, 16GB for the OS, nothing else on this machine. These errors pop up during normal operation according to the logs; no optimizes, high load (max. 30 queries per minute) or anything special at this time.
SCHWERWIEGEND: null:ClientAbortException: java.net.SocketException: Broken pipe
SCHWERWIEGEND: null:java.lang.OutOfMemoryError: Java heap space
SCHWERWIEGEND: auto commit error...:java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
SCHWERWIEGEND: Error during auto-warming of key:org.apache.solr.search.QueryResultKey@7cba935e:java.lang.OutOfMemoryError: Java heap space
SCHWERWIEGEND: org.apache.solr.common.SolrException: Internal Server Error
SCHWERWIEGEND: null:org.apache.solr.common.SolrException: Internal Server Error
I knew these failures from working on virtual machines with Solr 1.4, big indexes and ridiculously small -Xmx sizes. But on real hardware, with enough RAM and fast disks/CPUs, it's new to me :) Best regards Vadim
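The auto-warming discussed in this thread is driven by listener config in solrconfig.xml. A sketch of that shape — the q/fq/sort values mirror the warming query quoted earlier in the thread and are purely illustrative:

```xml
<!-- solrconfig.xml: run warming queries against each newly opened searcher -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">ag</str>
      <str name="fq">pubDate:[NOW-1DAY TO *]</str>
      <str name="sort">pubDate desc</str>
    </lst>
  </arr>
</listener>
```

Each sort field warmed this way populates the Lucene FieldCache, which lives on the heap — one reason warming is often where an undersized -Xmx first shows up as OOM.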
Re: Trunk error in Tomcat
Hi Stefan, ok, I will test the latest version from trunk with Tomcat in the next days and open a new issue :) regards Vadim 2012/7/3 Stefan Matheis matheis.ste...@googlemail.com: On Tuesday, July 3, 2012 at 8:10 PM, Vadim Kisselmann wrote: sorry, I overlooked your latest comment with the new issue in SOLR-3238 ;) Should I open a new issue? NP Vadim, yes a new issue would help .. all available information too :)
Re: Trunk error in Tomcat
same problem here: https://mail.google.com/mail/u/0/?ui=2&view=btop&ver=18zqbez0n5t35&q=tomcat%20v.kisselmann&qs=true&search=query&th=13615cfb9a5064bd&qt=kisselmann.1.tomcat.1.tomcat's.1.v.1&cvid=3 https://issues.apache.org/jira/browse/SOLR-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230056#comment-13230056 I use an older Solr trunk version from February/March; it works. With newer versions from trunk I get the same error: "This interface requires that you activate the admin request handlers..." regards vadim 2012/7/3 Briggs Thompson w.briggs.thomp...@gmail.com: Also, I forgot to include this before, but there is a client-side error: a failed 404 request to the URL below. http://localhost:8983/solr/null/admin/system?wt=json On Tue, Jul 3, 2012 at 8:45 AM, Briggs Thompson w.briggs.thomp...@gmail.com wrote: Thanks Erik. If anyone else has any ideas about the NoSuchFieldError issue please let me know. Thanks! -Briggs On Mon, Jul 2, 2012 at 6:27 PM, Erik Hatcher erik.hatc...@gmail.com wrote: Interestingly, I just logged the issue of it not showing the right error in the UI here: https://issues.apache.org/jira/browse/SOLR-3591 As for your specific issue, not sure, but the error should at least also show in the admin view. Erik On Jul 2, 2012, at 18:59, Briggs Thompson wrote: Hi All, I just grabbed the latest version of trunk and am having a hard time getting it running properly in Tomcat. It does work fine in Jetty. The admin screen gives the following error: "This interface requires that you activate the admin request handlers, add the following configuration to your solrconfig.xml" I am pretty certain the front-end error has nothing to do with the actual error. I have seen some other folks on the list with the same problem, but none of the threads have a solution (that I could find). Below is the stack trace. I also tried with different versions of Lucene, but none worked.
Note: my index is EMPTY and I am not migrating an index built with a previous version of Lucene. I think I ran into this a while ago with an earlier version of trunk, but I don't recall doing anything to fix it. Anyhow, if anyone has an idea on this one, please let me know. Thanks! Briggs Thompson
SEVERE: null:java.lang.NoSuchFieldError: LUCENE_50
  at org.apache.solr.analysis.SynonymFilterFactory$1.createComponents(SynonymFilterFactory.java:83)
  at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:83)
  at org.apache.lucene.analysis.synonym.SynonymMap$Builder.analyze(SynonymMap.java:120)
  at org.apache.lucene.analysis.synonym.SolrSynonymParser.addInternal(SolrSynonymParser.java:99)
  at org.apache.lucene.analysis.synonym.SolrSynonymParser.add(SolrSynonymParser.java:70)
  at org.apache.solr.analysis.SynonymFilterFactory.loadSolrSynonyms(SynonymFilterFactory.java:131)
  at org.apache.solr.analysis.SynonymFilterFactory.inform(SynonymFilterFactory.java:93)
  at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:584)
  at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:112)
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:812)
  at org.apache.solr.core.CoreContainer.load(CoreContainer.java:510)
  at org.apache.solr.core.CoreContainer.load(CoreContainer.java:333)
  at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:282)
  at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:101)
  at org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:277)
  at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258)
  at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
  at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:103)
  at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4649)
  at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5305)
  at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
  at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:899)
  at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:875)
  at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:618)
  at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:963)
  at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1600)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at
Re: Trunk error in Tomcat
Hi Stefan, sorry, I overlooked your latest comment with the new issue in SOLR-3238 ;) Should I open a new issue? I haven't tested newer trunk versions for a couple of months, because SolrCloud with an external ZK and Tomcat fails too, but I can do it and post all the errors I find in my log files. Regards Vadim 2012/7/3 Stefan Matheis matheis.ste...@googlemail.com: Hey Vadim Right now JIRA is down for maintenance, but afaik there was another comment asking for more information. I'll check Erik's issue today or tomorrow and see how we can handle (and hopefully fix) that. Regards Stefan On Tuesday, July 3, 2012 at 4:00 PM, Vadim Kisselmann wrote: same problem here: https://mail.google.com/mail/u/0/?ui=2&view=btop&ver=18zqbez0n5t35&q=tomcat%20v.kisselmann&qs=true&search=query&th=13615cfb9a5064bd&qt=kisselmann.1.tomcat.1.tomcat's.1.v.1&cvid=3 https://issues.apache.org/jira/browse/SOLR-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230056#comment-13230056 I use an older Solr trunk version from February/March; it works. With newer versions from trunk I get the same error: "This interface requires that you activate the admin request handlers..." regards vadim
Re: Dismax Question
In your schema.xml you can set the default query parser operator, in your case <solrQueryParser defaultOperator="AND"/>, but it's deprecated. When you use edismax, read this: http://drupal.org/node/1559394 . The mm param is the answer here. Best regards Vadim 2012/7/2 Steve Fatula compconsult...@yahoo.com: Let's say a user types in: DualHead2Go The way Solr is working, it splits this into: Dual Head 2 Go and searches the index for various fields, finding records where any ONE of them matches. Now, if I simply type the search terms Dual Head 2 Go, it finds records where ALL of them match. This is because we set q.op to AND. Recently, we went from Solr 3.4 to 3.6, and 3.4 used to work OK; 3.6 seems to behave differently, or perhaps we mucked something up. So, my question is how do we get Solr search to work with AND when it is splitting words? The splitting part is good; the bad part is that it is searching for any one of those split words. Steve
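For reference, with (e)dismax it is the mm (minimum-should-match) parameter that restores AND-like behavior for terms produced by splitting. A sketch of handler defaults in solrconfig.xml — the handler name and qf fields are assumptions, not taken from this setup:

```xml
<!-- solrconfig.xml: illustrative edismax defaults -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title text</str>
    <!-- require all generated optional clauses to match, i.e. AND semantics -->
    <str name="mm">100%</str>
  </lst>
</requestHandler>
```

With mm=100%, a query that gets split into Dual Head 2 Go only matches documents containing all four terms; lower percentages or absolute values relax this per the min-should-match syntax.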
Solr 1.4, slaves hang after replication from a just-optimized master
Hi folks, I have to look after an old live system with Solr 1.4. When I optimize a bigger index of roughly 200GB (after optimize and cut, 100GB) and my slaves replicate the newest version after(!) the optimize, they all hang at 100% in replication and suddenly have index sizes of circa 300GB. After a couple of seconds I have to restart my Tomcat, because the slaves are no longer able to respond to queries... Ironically, they have the same number of segments as the master, I can't see errors in my logfile, and the server load is normal. What's wrong here? :) Normal HTTP replication is used; these params are set on the master:
<str name="replicateAfter">commit</str>
<str name="replicateAfter">startup</str>
<str name="replicateAfter">optimize</str>
Any ideas? Best regards Vadim
Re: Solr 1.4, slaves hang after replication from a just-optimized master
Forgot to mention: after the Tomcat restart, the slaves still have an index of 300GB. After a manual replication command in the UI, it is 100GB like the master within a couple of seconds, and all is OK. 2012/6/19 Vadim Kisselmann v.kisselm...@googlemail.com: Hi folks, I have to look after an old live system with Solr 1.4. When I optimize a bigger index of roughly 200GB (after optimize and cut, 100GB) and my slaves replicate the newest version after(!) the optimize, they all hang at 100% in replication and suddenly have index sizes of circa 300GB. After a couple of seconds I have to restart my Tomcat, because the slaves are no longer able to respond to queries... Ironically, they have the same number of segments as the master, I can't see errors in my logfile, and the server load is normal. What's wrong here? :) Normal HTTP replication is used; these params are set on the master:
<str name="replicateAfter">commit</str>
<str name="replicateAfter">startup</str>
<str name="replicateAfter">optimize</str>
Any ideas? Best regards Vadim
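The replicateAfter settings quoted in this thread live inside the ReplicationHandler configuration. A sketch of the usual Solr 1.4 master/slave setup — the master URL and poll interval are illustrative, not taken from this system:

```xml
<!-- solrconfig.xml on the master -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">optimize</str>
  </lst>
</requestHandler>

<!-- solrconfig.xml on a slave -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8080/solr/replication</str>
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>
```

Note that right after an optimize the slave downloads a full new copy of the index while still holding the old one, so a transient doubling of disk usage during replication is expected; the old index should be cleaned up once the new searcher is in place.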
Re: Poll: What do you use for Solr performance monitoring?
Hi Otis, done :) So far we use Graphite, Ganglia and Zabbix, and JStatsD for JVM monitoring. Best regards Vadim 2012/5/31 Otis Gospodnetic otis_gospodne...@yahoo.com: Hi, Super quick poll: What do you use for Solr performance monitoring? Vote here: http://blog.sematext.com/2012/05/30/poll-what-do-you-use-for-solr-performance-monitoring/ I'm collecting data for my Berlin Buzzwords talk that will touch on Solr, so your votes will be greatly appreciated! Thanks, Otis
Re: Weird query results with edismax and boolean operator +
Hi Jan, thanks for your response! My qf parameter for edismax is: title. My defaultSearchField is text in schema.xml. In my app I generate a query with qf=title,text, so I think the default parameters in config/schema should be overridden, right? I eventually found 2 possible reasons for this behavior. 1. The mm parameter in solrconfig.xml for edismax is 0; 0 stands for OR, but it should be AND = 100%. 2. I suppose that my app does not override my default qf. I will test it today and report, with my parsed query and all params. Best regards Vadim 2012/4/29 Jan Høydahl jan@cominvent.com: Hi, What is your qf parameter? Can you run the three queries with debugQuery=true&echoParams=all and attach the parsed query and all params? It will probably explain what is happening. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 27. apr. 2012, at 11:21, Vadim Kisselmann wrote: Hi folks, I use Solr 4.0 from trunk, and edismax as the standard query handler. In my schema I defined this: <solrQueryParser defaultOperator="AND"/> I have this simple problem:
nascar +author:serg* (3500 matches)
+nascar +author:serg* (1 match)
nascar author:serg* (5200 matches)
nascar AND author:serg* (1 match)
I think I understand the query syntax, but this behavior confuses me. Why these differences in matches? By the way, in all matches I get at least one of my terms, but not always both. Best regards Vadim
Re: Weird query results with edismax and boolean operator +
I tested it. With the default qf=title text in solrconfig and mm=100% I get the same result (1) for nascar AND author:serg* and +nascar +author:serg*, great. With nascar +author:serg* I get 3500 matches; in this case the mm parameter seems not to work. Here are my debug params for nascar AND author:serg*:
<str name="querystring">nascar AND author:serg*</str>
<str name="parsedquery">(+(+DisjunctionMaxQuery((text:nascar | title:nascar)~0.01) +author:serg*))/no_coord</str>
<str name="parsedquery_toString">+(+(text:nascar | title:nascar)~0.01 +author:serg*)</str>
<lst name="explain">
<str name="com.bostonherald/news/international/europe/view/20120409russia_allows_anti-putin_demonstration_in_red_square">
8.235954 = (MATCH) sum of:
  8.10929 = (MATCH) max plus 0.01 times others of:
    8.031613 = (MATCH) weight(text:nascar in 0) [DefaultSimilarity], result of:
      8.031613 = score(doc=0,freq=2.0 = termFreq=2.0), product of:
        0.84814763 = queryWeight, product of:
          6.6960144 = idf(docFreq=27, maxDocs=8335)
          0.12666455 = queryNorm
        9.469594 = fieldWeight in 0, product of:
          1.4142135 = tf(freq=2.0), with freq of: 2.0 = termFreq=2.0
          6.6960144 = idf(docFreq=27, maxDocs=8335)
          1.0 = fieldNorm(doc=0)
    7.7676363 = (MATCH) weight(title:nascar in 0) [DefaultSimilarity], result of:
      7.7676363 = score(doc=0,freq=1.0 = termFreq=1.0), product of:
        0.9919093 = queryWeight, product of:
          7.830994 = idf(docFreq=8, maxDocs=8335)
          0.12666455 = queryNorm
        7.830994 = fieldWeight in 0, product of:
          1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0
          7.830994 = idf(docFreq=8, maxDocs=8335)
          1.0 = fieldNorm(doc=0)
  0.12666455 = (MATCH) ConstantScore(author:serg*), product of:
    1.0 = boost
    0.12666455 = queryNorm
</str>
</lst>
And here for nascar +author:serg*:
<str name="querystring">nascar +author:serg*</str>
<str name="parsedquery">(+(DisjunctionMaxQuery((text:nascar | title:nascar)~0.01) +author:serg*))/no_coord</str>
<str name="parsedquery_toString">+((text:nascar | title:nascar)~0.01 +author:serg*)</str>
<lst name="explain">
<str name="com.bostonherald/news/international/europe/view/20120409russia_allows_anti-putin_demonstration_in_red_square">
8.235954 = (MATCH) sum of:
  8.10929 = (MATCH) max plus 0.01 times others of:
    8.031613 = (MATCH) weight(text:nascar in 0) [DefaultSimilarity], result of:
      8.031613 = score(doc=0,freq=2.0 = termFreq=2.0), product of:
        0.84814763 = queryWeight, product of:
          6.6960144 = idf(docFreq=27, maxDocs=8335)
          0.12666455 = queryNorm
        9.469594 = fieldWeight in 0, product of:
          1.4142135 = tf(freq=2.0), with freq of: 2.0 = termFreq=2.0
          6.6960144 = idf(docFreq=27, maxDocs=8335)
          1.0 = fieldNorm(doc=0)
    7.7676363 = (MATCH) weight(title:nascar in 0) [DefaultSimilarity], result of:
      7.7676363 = score(doc=0,freq=1.0 = termFreq=1.0), product of:
        0.9919093 = queryWeight, product of:
          7.830994 = idf(docFreq=8, maxDocs=8335)
          0.12666455 = queryNorm
        7.830994 = fieldWeight in 0, product of:
          1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0
          7.830994 = idf(docFreq=8, maxDocs=8335)
          1.0 = fieldNorm(doc=0)
  0.12666455 = (MATCH) ConstantScore(author:serg*), product of:
    1.0 = boost
    0.12666455 = queryNorm
</str>
<str name="mx.com.elsiglodetorreon/noticia/727525.sacerdotas.html">
0.063332275 = (MATCH) product of:
  0.12666455 = (MATCH) sum of:
    0.12666455 = (MATCH) ConstantScore(author:serg*), product of:
      1.0 = boost
      0.12666455 = queryNorm
  0.5 = coord(1/2)
</str>
</lst>
You can see that for the first doc in nascar +author:serg* all query parts match, but for the second doc only ConstantScore(author:serg*) does. With mm=100%, however, all query parts should match. http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/ http://lucene.apache.org/solr/api/org/apache/solr/util/doc-files/min-should-match.html Best regards Vadim 2012/4/30 Vadim Kisselmann v.kisselm...@googlemail.com: Hi Jan, thanks for your response! My qf parameter for edismax is: title. My defaultSearchField is text in schema.xml. In my app I generate a query with qf=title,text, so I think the default parameters in config/schema should be overridden, right?
I eventually found 2 possible reasons for this behavior. 1. The mm parameter in solrconfig.xml for edismax is 0; 0 stands for OR, but it should be AND = 100%. 2. I suppose that my app does not override my default qf. I will test it today and report, with my parsed query and all params. Best regards Vadim 2012/4/29 Jan Høydahl jan@cominvent.com: Hi, What is your qf parameter? Can you run the three queries with debugQuery=true&echoParams=all and attach the parsed query and all params? It will probably explain what is happening. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr
Weird query results with edismax and boolean operator +
Hi folks, I use Solr 4.0 from trunk, and edismax as the standard query handler. In my schema I defined this: <solrQueryParser defaultOperator="AND"/> I have this simple problem:
nascar +author:serg* (3500 matches)
+nascar +author:serg* (1 match)
nascar author:serg* (5200 matches)
nascar AND author:serg* (1 match)
I think I understand the query syntax, but this behavior confuses me. Why these differences in matches? By the way, in all matches I get at least one of my terms, but not always both. Best regards Vadim
Re: Master config
hi, when only the slaves are used for search, why not: more RAM for the OS. I keep my default settings on my master, because when my slaves are busy with client queries, I can test a few things on my master. best regards vadim 2012/4/27 Jamel ESSOUSSI jamel.essou...@gmail.com: Hi, I use two Solr slaves and one Solr master. Is it a good idea to disable all the caches on the master? Best Regards -- Jamel ESSOUSSI -- View this message in context: http://lucene.472066.n3.nabble.com/Master-config-tp3943648p3943648.html Sent from the Solr - User mailing list archive at Nabble.com.
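If the master really only indexes and never serves queries, its searcher caches can effectively be disabled in solrconfig.xml. A sketch using the stock cache elements, with illustrative sizes:

```xml
<!-- solrconfig.xml on an index-only master: size-0 caches, no autowarming -->
<query>
  <filterCache      class="solr.FastLRUCache" size="0" initialSize="0" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache"     size="0" initialSize="0" autowarmCount="0"/>
  <documentCache    class="solr.LRUCache"     size="0" initialSize="0" autowarmCount="0"/>
</query>
```

Note that the Lucene FieldCache (used for sorting and some faceting) is not configured here and cannot be disabled this way; it is only populated if queries actually hit the master.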
Re: Localize the largest fields (content) in index
Hi Erick, thanks :) The admin UI gives me the counts, so I can identify fields with big bulks of unique terms. I know this wiki page, but I read it one more time. List of my file extensions with size (index size ~150GB):
tvf 90GB
fdt 30GB
tim 18GB
prx 15GB
frq 12GB
tip 200MB
tvx 150MB
tvf is my biggest file extension. Wiki: "This file contains, for each field that has a term vector stored, a list of the terms, their frequencies and, optionally, position and offset information." Hmm, I use termVectors on my biggest fields because of MLT and highlighting. But I think I should test my performance without termVectors. Good idea? :) What do you think about my file extension sizes? Best regards Vadim 2012/3/29 Erick Erickson erickerick...@gmail.com: The admin UI (schema browser) will give you the counts of unique terms in your fields, which is where I'd start. I suspect you've already seen this page, but if not: http://lucene.apache.org/java/3_5_0/fileformats.html#file-names the .fdt and .fdx file extensions are where data goes when you set 'stored=true'. These files don't affect search speed; they just contain the verbatim copy of the data. The relative sizes of the various files above should give you a hint as to what's using the most space, but it'll be a bit of a hunt for you to pinpoint what's actually up. TermVectors and norms are often sources of using up space. Best Erick On Wed, Mar 28, 2012 at 10:55 AM, Vadim Kisselmann v.kisselm...@googlemail.com wrote: Hello folks, I work with Solr 4.0 r1292064 from trunk. My index grows fast; with 10 million docs I get an index size of 150GB (25% stored, 75% indexed). I want to find out which fields (content) are too large, to consider measures. How can I localize/discover the largest fields in my index? Luke (latest from trunk) doesn't work with my Solr version. I built Lucene/Solr .jars and tried to feed Luke with these, but I get many errors and can't build it. What other options do I have? Thanks and best regards Vadim
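The term vectors behind the large .tvf file are enabled per field in schema.xml. A sketch — the field name and type here are assumptions, not taken from this schema:

```xml
<!-- schema.xml: the termVectors* attributes feed the .tvf/.tvx files -->
<field name="text" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

Dropping the termVectors* attributes and re-indexing removes that data from the index; highlighting and MLT can still work on a stored field, but then re-analyze the stored text at query time instead of reading the precomputed vectors.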
Re: Localize the largest fields (content) in index
Yes, I think so, too :) MLT doesn't really need termVectors, but it's faster with them. I found out that MLT works better on the title field in my case, instead of big text fields. Sharding is planned, but my setup with SolrCloud, ZK and Tomcat doesn't work, see here: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201203.mbox/%3CCA+GXEZE3LCTtgXFzn9uEdRxMymGF=z0ujb9s8b0qkipafn6...@mail.gmail.com%3E I split my huge index (the 150GB index in this case is my test index) and want to use SolrCloud, but it's not runnable with Tomcat at this time. Best regards Vadim 2012/3/29 Erick Erickson erickerick...@gmail.com: Yeah, it's worth a try. The term vectors aren't entirely necessary for highlighting, although they do make things more efficient. As far as MLT, does MLT really need such a big field? But you may be on your way to sharding your index if you remove this info and testing shows problems. Best Erick On Thu, Mar 29, 2012 at 9:32 AM, Vadim Kisselmann v.kisselm...@googlemail.com wrote: Hi Erick, thanks :) The admin UI gives me the counts, so I can identify fields with big bulks of unique terms. I know this wiki page, but I read it one more time. List of my file extensions with size (index size ~150GB):
tvf 90GB
fdt 30GB
tim 18GB
prx 15GB
frq 12GB
tip 200MB
tvx 150MB
tvf is my biggest file extension. Wiki: "This file contains, for each field that has a term vector stored, a list of the terms, their frequencies and, optionally, position and offset information." Hmm, I use termVectors on my biggest fields because of MLT and highlighting. But I think I should test my performance without termVectors. Good idea? :) What do you think about my file extension sizes? Best regards Vadim 2012/3/29 Erick Erickson erickerick...@gmail.com: The admin UI (schema browser) will give you the counts of unique terms in your fields, which is where I'd start.
I suspect you've already seen this page, but if not: http://lucene.apache.org/java/3_5_0/fileformats.html#file-names the .fdt and .fdx file extensions are where data goes when you set 'stored=true'. These files don't affect search speed; they just contain the verbatim copy of the data. The relative sizes of the various files above should give you a hint as to what's using the most space, but it'll be a bit of a hunt for you to pinpoint what's actually up. TermVectors and norms are often sources of using up space. Best Erick On Wed, Mar 28, 2012 at 10:55 AM, Vadim Kisselmann v.kisselm...@googlemail.com wrote: Hello folks, I work with Solr 4.0 r1292064 from trunk. My index grows fast; with 10 million docs I get an index size of 150GB (25% stored, 75% indexed). I want to find out which fields (content) are too large, to consider measures. How can I localize/discover the largest fields in my index? Luke (latest from trunk) doesn't work with my Solr version. I built Lucene/Solr .jars and tried to feed Luke with these, but I get many errors and can't build it. What other options do I have? Thanks and best regards Vadim
Localize the largest fields (content) in index
Hello folks, I work with Solr 4.0 r1292064 from trunk. My index grows fast; with 10 million docs I get an index size of 150GB (25% stored, 75% indexed). I want to find out which fields (content) are too large, to consider measures. How can I localize/discover the largest fields in my index? Luke (latest from trunk) doesn't work with my Solr version. I built Lucene/Solr .jars and tried to feed Luke with these, but I get many errors and can't build it. What other options do I have? Thanks and best regards Vadim
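Besides the standalone Luke tool, Solr ships a LukeRequestHandler that exposes per-field statistics over HTTP. A sketch of registering it in solrconfig.xml (host/port in the example URL are assumptions):

```xml
<!-- solrconfig.xml: expose index and per-field statistics -->
<requestHandler name="/admin/luke"
                class="org.apache.solr.handler.admin.LukeRequestHandler" />
```

A request such as `http://localhost:8983/solr/admin/luke?fl=*&numTerms=10` then returns, per field, the number of distinct terms and the top terms, which helps spot the fields dominating index size without building Luke against a matching Lucene version.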
Re: SolrCloud with Tomcat and external Zookeeper, does it work?
Hi Jerry, thanks for your response :) This thread ("SolrCloud new...") is new to me, thanks! How far along are you with your setup? Which problems/errors do you have? Best regards Vadim 2012/3/27 jerry.min...@gmail.com jerry.min...@gmail.com: Hi Vadim, I too am experimenting with SolrCloud and need help setting it up using Tomcat as the Java servlet container. While searching for help on this question, I found another thread on the solr mailing list that is helpful. In case you haven't seen this thread, please search the solr mailing list for: "SolrCloud new". You can also view it at Nabble using this link: http://lucene.472066.n3.nabble.com/SolrCloud-new-td1528872.html Best, Jerry M. On Wed, Mar 21, 2012 at 5:51 AM, Vadim Kisselmann v.kisselm...@googlemail.com wrote: Hello folks, I read the SolrCloud wiki and Bruno Dumon's blog entry with his first exploration of SolrCloud. The examples and a first setup with embedded Jetty and ZK WORK without problems. I tried to set up my own configuration with Tomcat and an external Zookeeper (my master ZK), but it doesn't really work. My setup: - latest Solr version from trunk - Tomcat 6 - external ZK - Target: 1 server, 1 Tomcat, 1 Solr instance, 2 collections with different config/schema What I tried: -- 1. After checkout I build Solr (ant run-example); it works. --- 2. I send my config/schema files to the external ZK with Jetty: java -Djetty.port=8080 -Dbootstrap_confdir=/root/solrCloud/conf/ -Dcollection.configName=conf1 -DzkHost=master-zk:2181 -jar start.jar It works, too. --- 3. I create my (empty, without cores) solr.xml, like Bruno: http://www.ngdata.com/site/blog/57-ng.html#disqus_thread --- 4. I started my Tomcat and got the first error in the UI: This interface requires that you activate the admin request handlers, add the following configuration to your solrconfig.xml: <!-- Admin Handlers - This will register all the standard admin RequestHandlers. --> <requestHandler name="/admin/" class="solr.admin.AdminHandlers" /> Admin request handlers are definitely activated in my solrconfig. I get this error only with the latest trunk versions, not with r1292064 from February. Sometimes it works with the new version, sometimes not and I get this error. -- 5. OK, if it works after a few restarts, I changed my JAVA_OPTS for Tomcat and added this: -DzkHost=master-zk:2181 Next error: The web application [/solr2] appears to have started a thread named [main-SendThread(master-zk:2181)] but has failed to stop it. This is very likely to create a memory leak.
Exception in thread Thread-2 java.lang.NullPointerException
  at org.apache.solr.cloud.Overseer$CloudStateUpdater.amILeader(Overseer.java:179)
  at org.apache.solr.cloud.Overseer$CloudStateUpdater.run(Overseer.java:104)
  at java.lang.Thread.run(Thread.java:662)
15.03.2012 13:25:17 org.apache.catalina.loader.WebappClassLoader loadClass INFO: Illegal access: this web application instance has been stopped already. Could not load org.apache.zookeeper.server.ZooTrace. The eventual following stack trace is caused by an error thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access, and has no functional impact.
java.lang.IllegalStateException
  at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1531)
  at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1491)
  at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1196)
15.03.2012 13:25:17 org.apache.coyote.http11.Http11Protocol destroy
- 6. OK, we assume that the first steps work, and I want to create new cores and my 2 collections. My requests with the CoreAdminHandler are OK; my solr.xml looks like this:
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores" zkClientTimeout="1" hostPort="8080" hostContext="solr">
    <core name="shard1_data" collection="col1" shard="shard1" instanceDir="xxx/" />
    <core name="shard2_data" collection="col2" shard="shard2" instanceDir="xx2/" />
  </cores>
</solr>
Now I get the following exception: ...couldn't find conf name for collection1... I don't have a collection1. Why this exception? --- You can see, there are too many exceptions and possibly configuration problems with Tomcat and an external ZK. Has anyone set up an identical configuration, and does it work? Does anyone spot mistakes in my configuration steps? Best regards Vadim
SolrCloud with Tomcat and external Zookeeper, does it work?
Hello folks, I read the SolrCloud wiki and Bruno Dumon's blog entry with his first exploration of SolrCloud. The examples and a first setup with embedded Jetty and ZK WORK without problems. I tried to set up my own configuration with Tomcat and an external Zookeeper (my master ZK), but it doesn't really work. My setup: - latest Solr version from trunk - Tomcat 6 - external ZK - Target: 1 server, 1 Tomcat, 1 Solr instance, 2 collections with different config/schema What I tried: -- 1. After checkout I build Solr (ant run-example); it works. --- 2. I send my config/schema files to the external ZK with Jetty: java -Djetty.port=8080 -Dbootstrap_confdir=/root/solrCloud/conf/ -Dcollection.configName=conf1 -DzkHost=master-zk:2181 -jar start.jar It works, too. --- 3. I create my (empty, without cores) solr.xml, like Bruno: http://www.ngdata.com/site/blog/57-ng.html#disqus_thread --- 4. I started my Tomcat and got the first error in the UI: This interface requires that you activate the admin request handlers, add the following configuration to your solrconfig.xml: <!-- Admin Handlers - This will register all the standard admin RequestHandlers. --> <requestHandler name="/admin/" class="solr.admin.AdminHandlers" /> Admin request handlers are definitely activated in my solrconfig. I get this error only with the latest trunk versions, not with r1292064 from February. Sometimes it works with the new version, sometimes not and I get this error. -- 5. OK, if it works after a few restarts, I changed my JAVA_OPTS for Tomcat and added this: -DzkHost=master-zk:2181 Next error: The web application [/solr2] appears to have started a thread named [main-SendThread(master-zk:2181)] but has failed to stop it. This is very likely to create a memory leak.
Exception in thread "Thread-2" java.lang.NullPointerException
  at org.apache.solr.cloud.Overseer$CloudStateUpdater.amILeader(Overseer.java:179)
  at org.apache.solr.cloud.Overseer$CloudStateUpdater.run(Overseer.java:104)
  at java.lang.Thread.run(Thread.java:662)
15.03.2012 13:25:17 org.apache.catalina.loader.WebappClassLoader loadClass
INFO: Illegal access: this web application instance has been stopped already. Could not load org.apache.zookeeper.server.ZooTrace. The eventual following stack trace is caused by an error thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access, and has no functional impact.
java.lang.IllegalStateException
  at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1531)
  at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1491)
  at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1196)
15.03.2012 13:25:17 org.apache.coyote.http11.Http11Protocol destroy
6. OK, let's assume the first steps work, and I want to create new cores and my 2 collections. My requests with the CoreAdminHandler are OK; my solr.xml looks like this:
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores" zkClientTimeout="1" hostPort="8080" hostContext="solr">
    <core name="shard1_data" collection="col1" shard="shard1" instanceDir="xxx/" />
    <core name="shard2_data" collection="col2" shard="shard2" instanceDir="xx2/" />
  </cores>
</solr>
Now I get the following exception: "...couldn't find conf name for collection1..." I don't have a collection1. Why this exception?
You can see there are too many exceptions and possibly configuration problems with Tomcat and an external ZK. Has anyone set up an identical configuration, and does it work? Does anyone spot mistakes in my configuration steps? Best regards Vadim
Re: whethere solr 3.3 index file is compatable with solr 4.0
You have to re-index your data. Best regards Vadim 2012/3/21 syed kather in.ab...@gmail.com: Team, I have indexed my data with Solr version 3.3. As I need to use the hierarchical facets feature from Solr 4.0: can I use the existing data with Solr 4.0, or do I need to re-index the data with the new version? Thanks and Regards, S SYED ABDUL KATHER
Solr 4.0 and tomcat, error in new admin UI
Hi folks, I commented on this issue: https://issues.apache.org/jira/browse/SOLR-3238 , but I want to ask here if anyone has the same problem. I use Solr 4.0 from trunk (latest) with Tomcat 6. I get an error in the new admin UI: "This interface requires that you activate the admin request handlers, add the following configuration to your solrconfig.xml: <!-- Admin Handlers - This will register all the standard admin RequestHandlers. --> <requestHandler name="/admin/" class="solr.admin.AdminHandlers" />". Admin request handlers are definitely activated in my solrconfig. A problem with Tomcat? It works with embedded Jetty, but I have to use Tomcat. Best Regards Vadim
Re: Apache Lucene Eurocon 2012
Hi Chris, thanks for your response. OK, we will wait :) Best Regards Vadim 2012/3/8 Chris Hostetter hossman_luc...@fucit.org : where and when is the next Eurocon scheduled? : I read something about Denmark and autumn 2012 (I don't know where *g*). I do not know where, but sometime in the fall is probably the correct time frame. I believe the details will be announced at Lucene Revolution... http://lucenerevolution.org/ (that's what happened last year) -Hoss
Apache Lucene Eurocon 2012
Hi folks, where and when is the next Eurocon scheduled? I read something about Denmark and autumn 2012 (I don't know where *g*). Best regards and thanks Vadim
Re: maxClauseCount Exception
Set maxBooleanClauses in your solrconfig.xml higher; the default is 1024. Your query blasts through this limit. Regards Vadim 2012/2/22 Darren Govoni dar...@ontrenet.com Hi, I am suddenly getting a maxClauseCount exception for no reason. I am using Solr 3.5. I have only 206 documents in my index. Any ideas? This is weird.
QUERY PARAMS: [hl, hl.snippets, hl.simple.pre, hl.simple.post, fl, hl.mergeContiguous, hl.usePhraseHighlighter, hl.requireFieldMatch, echoParams, hl.fl, q, rows, start]|#]
[#|2012-02-22T13:40:13.129-0500|INFO|glassfish3.1.1|org.apache.solr.core.SolrCore|_ThreadID=22;_ThreadName=Thread-2;|[] webapp=/solr3 path=/select params={hl=true&hl.snippets=4&hl.simple.pre=<b>&hl.simple.post=</b>&fl=*,score&hl.mergeContiguous=true&hl.usePhraseHighlighter=true&hl.requireFieldMatch=true&echoParams=all&hl.fl=text_t&q={!lucene+q.op%3DOR+df%3Dtext_t}+(+kind_s:doc+OR+kind_s:xml)+AND+(type_s:[*+TO+*])+AND+(usergroup_sm:admin)&rows=20&start=0&wt=javabin&version=2} hits=204 status=500 QTime=166 |#]
[#|2012-02-22T13:40:13.131-0500|SEVERE|glassfish3.1.1|org.apache.solr.servlet.SolrDispatchFilter|_ThreadID=22;_ThreadName=Thread-2;|org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024
  at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:136)
  at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:127)
  at org.apache.lucene.search.ScoringRewrite$1.addClause(ScoringRewrite.java:51)
  at org.apache.lucene.search.ScoringRewrite$1.addClause(ScoringRewrite.java:41)
  at org.apache.lucene.search.ScoringRewrite$3.collect(ScoringRewrite.java:95)
  at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:38)
  at org.apache.lucene.search.ScoringRewrite.rewrite(ScoringRewrite.java:93)
  at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:304)
  at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:158)
  at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:98)
  at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:385)
  at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:217)
  at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:185)
  at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:205)
  at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:490)
  at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
  at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:131)
  at org.apache.so
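For reference, the maxBooleanClauses setting mentioned above lives in the query section of solrconfig.xml; a hedged sketch (4096 is only an example value, not a recommendation — raise it only as far as your largest rewritten query needs):

```xml
<!-- solrconfig.xml: hard upper limit on clauses in a rewritten
     BooleanQuery. Default is 1024; wildcard, fuzzy and range query
     rewrites (as in the highlighter trace above) can exceed it. -->
<maxBooleanClauses>4096</maxBooleanClauses>
```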
Custom Query Component: parameters are not appended to query
Hello folks, I built a simple custom component for the "hl.q" query. My case was to inject hl.q params on the fly, because filter params like fields that were in my standard query were being highlighted — Solr/Lucene has no way of interpreting an extended q clause and saying "this part is a query and should be highlighted, and this part isn't". If it works, the community can have it :)
Facts: q=roomba AND irobot AND language:de
My component extends SearchComponent. I use the ResponseBuilder to get all needed params like field names from the schema, q params, etc. My component is called first (that works — verified via debugging and debugQuery) from my SearchHandler:
<arr name="first-components"><str>highlightQuery</str></arr>
Important clippings from the source code:
public class HighlightQueryComponent extends SearchComponent {
  ...
  public void process(ResponseBuilder rb) throws IOException {
    if (rb.doHighlights) {
      List<String> terms = new ArrayList<String>(0);
      SolrQueryRequest req = rb.req;
      IndexSchema schema = req.getSchema();
      Map<String, SchemaField> fields = schema.getFields();
      SolrParams params = req.getParams();
      ... magic ...
      Query hlq = new TermQuery(new Term("text", hlQuery.toString()));
      rb.setHighlightQuery(hlq); // hlq = text:(roomba AND irobot)
Problem: In the last step my query is adjusted (the hlq from debugging is "text:(roomba AND irobot)"). It looks fine; the magic in process() works. But nothing happens. If I continue to debug, the next components are called, but my query is the same, without changes. Either setHighlightQuery doesn't work, or my params are overridden in the following components. What can it be? Best Regards Vadim
Re: How to reindex about 10Mio. docs
Hi Otis, thanks for your response :) We found a solution yesterday. It works with a Ruby script, curl and Saxon/XSLT. The performance is great. We moved all the docs in 5-batches to prevent an overload of our machines. Best regards Vadim 2012/2/8 Otis Gospodnetic otis_gospodne...@yahoo.com: Vadim, would using the xslt output help? Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html From: Vadim Kisselmann v.kisselm...@googlemail.com To: solr-user@lucene.apache.org Sent: Wednesday, February 8, 2012 7:09 AM Subject: Re: How to reindex about 10Mio. docs Another problem appeared ;) How can I export my docs in CSV format? In Solr 3.1+ I can use the query param wt=csv, but in Solr 1.4.1? Best Regards Vadim 2012/2/8 Vadim Kisselmann v.kisselm...@googlemail.com: Hi Ahmet, thanks for the quick response :) I've already thought the same... And it will be a pain to export and import this huge doc set as CSV. Do I have another solution? Regards Vadim 2012/2/8 Ahmet Arslan iori...@yahoo.com: I want to reindex about 10 mio. docs from one Solr (1.4.1) to another Solr (1.4.1). I changed my schema.xml (field types sint to slong), so standard replication would fail. What is the fastest and smartest way to manage this? This here sounds great (EntityProcessor): http://www.searchworkings.org/blog/-/blogs/importing-data-from-another-solr But would it work with Solr 1.4.1? SolrEntityProcessor is not available in 1.4.1. I would dump the stored fields into a comma-separated file and use http://wiki.apache.org/solr/UpdateCSV to feed it into the new Solr instance.
How to reindex about 10Mio. docs
Hello folks, I want to reindex about 10 mio. docs from one Solr (1.4.1) to another Solr (1.4.1). I changed my schema.xml (field types sint to slong), so standard replication would fail. What is the fastest and smartest way to manage this? This here sounds great (EntityProcessor): http://www.searchworkings.org/blog/-/blogs/importing-data-from-another-solr But would it work with Solr 1.4.1? Best Regards Vadim
Re: How to reindex about 10Mio. docs
Hi Ahmet, thanks for the quick response :) I've already thought the same... And it will be a pain to export and import this huge doc set as CSV. Do I have another solution? Regards Vadim 2012/2/8 Ahmet Arslan iori...@yahoo.com: I want to reindex about 10 mio. docs from one Solr (1.4.1) to another Solr (1.4.1). I changed my schema.xml (field types sint to slong), so standard replication would fail. What is the fastest and smartest way to manage this? This here sounds great (EntityProcessor): http://www.searchworkings.org/blog/-/blogs/importing-data-from-another-solr But would it work with Solr 1.4.1? SolrEntityProcessor is not available in 1.4.1. I would dump the stored fields into a comma-separated file and use http://wiki.apache.org/solr/UpdateCSV to feed it into the new Solr instance.
Re: How to reindex about 10Mio. docs
Another problem appeared ;) How can I export my docs in CSV format? In Solr 3.1+ I can use the query param wt=csv, but in Solr 1.4.1? Best Regards Vadim 2012/2/8 Vadim Kisselmann v.kisselm...@googlemail.com: Hi Ahmet, thanks for the quick response :) I've already thought the same... And it will be a pain to export and import this huge doc set as CSV. Do I have another solution? Regards Vadim 2012/2/8 Ahmet Arslan iori...@yahoo.com: I want to reindex about 10 mio. docs from one Solr (1.4.1) to another Solr (1.4.1). I changed my schema.xml (field types sint to slong), so standard replication would fail. What is the fastest and smartest way to manage this? This here sounds great (EntityProcessor): http://www.searchworkings.org/blog/-/blogs/importing-data-from-another-solr But would it work with Solr 1.4.1? SolrEntityProcessor is not available in 1.4.1. I would dump the stored fields into a comma-separated file and use http://wiki.apache.org/solr/UpdateCSV to feed it into the new Solr instance.
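Since Solr 1.4.1 has no wt=csv response writer, one workaround is to fetch pages of wt=xml results and flatten the stored fields to CSV for UpdateCSV yourself (this thread's actual solution used Ruby, curl and Saxon/XSLT). A minimal sketch of just the transform step, assuming single-valued stored fields; the sample response and field names are invented:

```python
import csv
import io
import xml.etree.ElementTree as ET

def solr_xml_to_csv(response_xml, fields):
    """Flatten the <doc> elements of a Solr XML response into CSV rows.

    Only single-valued stored fields are handled here; multi-valued
    fields would need an extra join step (see UpdateCSV's f.<field>.split).
    """
    root = ET.fromstring(response_xml)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(fields)  # header row, as expected by UpdateCSV
    for doc in root.iter("doc"):
        # each child looks like <str name="id">1</str>
        row = {child.get("name"): child.text for child in doc}
        writer.writerow([row.get(f, "") for f in fields])
    return out.getvalue()

# Invented two-field sample of a Solr XML response:
sample = """<response><result numFound="1" start="0">
<doc><str name="id">1</str><str name="title">hello</str></doc>
</result></response>"""

print(solr_xml_to_csv(sample, ["id", "title"]))
```

In practice you would loop over start/rows pages of the old index and POST each CSV chunk to /update/csv on the new one.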
Re: Edismax, Filter Query and Highlighting
hl.q works :) But I have to attach the hl.q to my standard query. In bigger queries it would be a pain to find out which terms I need in my hl.q. My plan: an own query parser in Solr, which loops through q, identifies filter terms (in my case language:de), and appends the remaining terms as hl.q to the standard query. Sounds like a plan? :) Best Regards Vadim 2012/2/1 Koji Sekiguchi k...@r.email.ne.jp: (12/02/01 4:28), Vadim Kisselmann wrote: Hmm, I don't know, but I can test it tomorrow at work. I'm not sure about the right syntax with hl.q (?) but I'll report :) hl.q can accept the same syntax as q, including local params. koji -- http://www.rondhuit.com/en/
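For reference, the request shape this plan boils down to, using the roomba example from this thread (the params are illustrative):

```
q=(roomba OR irobot) AND language:de
hl=true
hl.fl=text,title,url
hl.q=roomba OR irobot
```

hl.q keeps the highlighter on the real search terms, while the language:de clause still restricts the result set but no longer gets highlighted.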
Edismax, Filter Query and Highlighting
Hi, I have problems with edismax, filter queries and highlighting. First of all: can edismax deal with filter queries? My case: edismax is my default requestHandler. My query in the Solr admin GUI: (roomba OR irobot) AND language:de. You can see that my q is "roomba OR irobot" and my fq is language:de (language is a field in schema.xml). With these params I turn highlighting on: hl=true&hl.fl=text,title,url. In my shown result you can see that highlighting matched on <em>de</em> in the url (last arr):
<lst name="de.blog-gedanken/produkte/erste-erfahrung-mit-unserem-roomba-roboter-staubsauger">
  <arr name="title"><str>Erste Erfahrung mit unserem <em>Roomba</em> Roboter Staubsauger</str></arr>
  <arr name="text"><str>Erste Erfahrung mit unserem <em>Roomba</em> Roboter Staubsauger Tags: Haushaltshilfe, Roboter</str></arr>
  <arr name="url"><str>http://www.blog-gedanken.<em>de</em>/produkte/erste-erfahrung-mit-unserem-<em>roomba</em>-roboter-staubsauger/</str></arr>
</lst>
In catalina.out I can see the following query: path=/select/ params={hl=true&version=2.2&indent=on&rows=10&start=0&q=(roomba+OR+irobot)+AND+language:de} hits=1 status=0 QTime=65. language:de is a filter and shouldn't be highlighted. Do I have a thinking error, or is my query wrong? Or is it an edismax problem? Best Regards Vadim
Re: Edismax, Filter Query and Highlighting
Hi Ahmet, thanks for the quick response :) I've also discovered this failure. I wonder why the query itself works. For example: query = language:de. I get results which only have language:de. The fq also works, and I get only the de results in my field language. I can't understand the behavior. It seems like the fq works, but in the end my fq params get converted to q params. Regards Vadim 2012/1/31 Ahmet Arslan iori...@yahoo.com: in catalina.out I can see the following query: path=/select/ params={hl=true&version=2.2&indent=on&rows=10&start=0&q=(roomba+OR+irobot)+AND+language:de} hits=1 status=0 QTime=65. language:de is a filter and shouldn't be highlighted. Do I have a thinking error, or is my query wrong? Or is it an edismax problem? In your example, language:de is a part of the query. Use fq= instead: q=(roomba OR irobot)&fq=language:de
Re: Edismax, Filter Query and Highlighting
...
<lst name="prepare"><double name="time">0.0</double>
  <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
  <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
  <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
  <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
  <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
  <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
</lst>
<lst name="process"><double name="time">15.0</double>
  <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
  <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
  <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
  <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">8.0</double></lst>
  <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
  <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">7.0</double></lst>
</lst>
I hope you can read it :) Best Regards Vadim 2012/1/31 Erick Erickson erickerick...@gmail.com: Seeing the results with debugQuery=on would help. No, fq does NOT get translated into q params; it's a completely separate mechanism, so I'm not quite sure what you're seeing. Best Erick On Tue, Jan 31, 2012 at 8:40 AM, Vadim Kisselmann v.kisselm...@googlemail.com wrote: Hi Ahmet, thanks for the quick response :) I've also discovered this failure. I wonder why the query itself works. For example: query = language:de. I get results which only have language:de. The fq also works, and I get only the de results in my field language. I can't understand the behavior. It seems like the fq works, but in the end my fq params get converted to q params. Regards Vadim 2012/1/31 Ahmet Arslan iori...@yahoo.com: in catalina.out I can see the following query: path=/select/ params={hl=true&version=2.2&indent=on&rows=10&start=0&q=(roomba+OR+irobot)+AND+language:de} hits=1 status=0 QTime=65. language:de is a filter and shouldn't be highlighted. Do I have a thinking error, or is my query wrong? Or is it an edismax problem? In your example, language:de is a part of the query. Use fq= instead: q=(roomba OR irobot)&fq=language:de
Re: Edismax, Filter Query and Highlighting
Hi Erick, "I didn't read your first post carefully enough, I was keying on the words 'filter query'. Your query does not have any filter queries! I thought you were talking about fq=language:de type clauses, which is what I was responding to." No problem, I understand :) "Solr/Lucene have no way of interpreting an extended q clause and saying this part is a query and should be highlighted and this part isn't. Try the fq option maybe?" I thought so, unfortunately. fq will be the only option. I should rebuild my application :) Best Regards Vadim
Re: Edismax, Filter Query and Highlighting
Hmm, I don't know, but I can test it tomorrow at work. I'm not sure about the right syntax with hl.q (?) but I'll report :) 2012/1/31 Ahmet Arslan iori...@yahoo.com: Try the fq option maybe? I thought so, unfortunately. fq will be the only option. I should rebuild my application :) Could hl.q help? http://wiki.apache.org/solr/HighlightingParameters#hl.q
Re: Solr 3.5.0 can't find Carrot classes
Hi Christopher, if all needed jars are included, the only thing left can be wrong paths in your solrconfig.xml. Regards Vadim 2012/1/26 Stanislaw Osinski stanislaw.osin...@carrotsearch.com: Hi, can you paste the logs from the second run? Thanks, Staszek On Wed, Jan 25, 2012 at 00:12, Christopher J. Bottaro cjbott...@onespot.com wrote: On Tuesday, January 24, 2012 at 3:07 PM, Christopher J. Bottaro wrote:
SEVERE: java.lang.NoClassDefFoundError: org/carrot2/core/ControllerFactory
  at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.init(CarrotClusteringEngine.java:102)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
  at java.lang.reflect.Constructor.newInstance(Unknown Source)
  at java.lang.Class.newInstance0(Unknown Source)
  at java.lang.Class.newInstance(Unknown Source)
  …
I'm starting Solr with -Dsolr.clustering.enabled=true and I can see that the Carrot jars in contrib are getting loaded. Full log file is here: http://onespot-development.s3.amazonaws.com/solr.log Any ideas? Thanks for the help. Ok, got a little further. Seems that Solr doesn't like it if you include jars more than once (I had a lib dir and also lib directives in the solrconfig, which ended up loading the same jars twice). But now I'm getting these errors: java.lang.NoClassDefFoundError: org/apache/solr/handler/clustering/SearchClusteringEngine Any help? Thanks.
decreasing of maxFieldLength in solrconfig.xml doesn't work
Hello folks, I want to decrease the max number of terms for my fields to 500. I thought the maxFieldLength parameter in solrconfig.xml was intended for this. In my case it doesn't work. Half of my text fields include longer text. With 100 docs in my index I had a segment size of 1140KB for indexed data and 270KB for stored data (.fdx, .fdt). After a change from the default <maxFieldLength>10000</maxFieldLength> to <maxFieldLength>500</maxFieldLength>, deleting the index folder, restarting Tomcat and reindexing, I see the same segment sizes (1140KB for indexed and 270KB for stored data). Please tell me if I made an error in reasoning. Regards Vadim
Re: decreasing of maxFieldLength in solrconfig.xml doesn't work
P.S.: I use Solr 4.0 from trunk. Is maxFieldLength deprecated in Solr 4.0? If so, do I have an alternative to decrease the number of terms during indexing? Regards Vadim 2012/1/26 Vadim Kisselmann v.kisselm...@googlemail.com: Hello folks, I want to decrease the max number of terms for my fields to 500. I thought the maxFieldLength parameter in solrconfig.xml was intended for this. In my case it doesn't work. Half of my text fields include longer text. With 100 docs in my index I had a segment size of 1140KB for indexed data and 270KB for stored data (.fdx, .fdt). After a change from the default <maxFieldLength>10000</maxFieldLength> to <maxFieldLength>500</maxFieldLength>, deleting the index folder, restarting Tomcat and reindexing, I see the same segment sizes (1140KB for indexed and 270KB for stored data). Please tell me if I made an error in reasoning. Regards Vadim
Re: decreasing of maxFieldLength in solrconfig.xml doesn't work
Sean, Ahmet, thanks for the response :) I use Solr 4.0 from trunk. In my solrconfig.xml there is only one maxFieldLength param. I think it is deprecated in Solr versions 3.5+... But LimitTokenCountFilterFactory works in my case :) Thanks! Regards Vadim 2012/1/26 Ahmet Arslan iori...@yahoo.com: I want to decrease the max number of terms for my fields to 500. I thought the maxFieldLength parameter in solrconfig.xml was intended for this. In my case it doesn't work. Half of my text fields include longer text. With 100 docs in my index I had a segment size of 1140KB for indexed data and 270KB for stored data (.fdx, .fdt). After a change from the default <maxFieldLength>10000</maxFieldLength> to <maxFieldLength>500</maxFieldLength>, deleting the index folder, restarting Tomcat and reindexing, I see the same segment sizes (1140KB for indexed and 270KB for stored data). Please tell me if I made an error in reasoning. What version of Solr are you using? Could it be http://lucene.apache.org/solr/api/org/apache/solr/analysis/LimitTokenCountFilterFactory.html? http://lucene.apache.org/java/3_5_0/api/core/org/apache/lucene/analysis/LimitTokenCountFilter.html
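For anyone landing in this thread later, a hedged sketch of the analyzer-based replacement discussed above (the fieldType name and tokenizer choice are invented for illustration; the factory is the one Ahmet linked):

```xml
<!-- schema.xml: cap each field value at 500 tokens at index time,
     replacing the deprecated maxFieldLength from solrconfig.xml. -->
<fieldType name="text_limited" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="500"/>
  </analyzer>
</fieldType>
```

Note this only limits the indexed terms; stored field values are still written in full.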
Re: Size of index to use shard
Hi, it depends on your hardware. Read this: http://www.derivante.com/2009/05/05/solr-performance-benchmarks-single-vs-multi-core-index-shards/ Think about your cache config (few updates, big caches) and a good HW infrastructure. In my case I can handle a 250GB index with 100 mio. docs on an i7 machine with RAID10 and 24GB RAM => q-times under 1 sec. Regards Vadim 2012/1/24 Anderson vasconcelos anderson.v...@gmail.com: Hi, is there some size of index (or number of docs) at which it is necessary to break the index into shards? I have an index with 100GB of size. This index increases 10GB per year (I don't have information on how many docs it has) and the docs will never be deleted. Thinking in 30 years, the index will be 400GB in size. I think it is not required to break it into shards, because I do not consider this a large index. Am I correct? What is a real large index? Thanks
Re: Size of index to use shard
@Erick thanks :) I share your opinion; my load tests show the same. @Dmitry my docs are small too, I think about 3-15KB per doc. I update my index all the time and I have an average of 20-50 requests per minute (20% facet queries, 80% large boolean queries with wildcard/fuzzy). How many docs at a time? It depends on the chosen filters: from 10 to all 100 mio. I work with very small caches (strangely, if my index is under 100GB I need larger caches, over 100GB smaller caches...). My JVM has 6GB, 18GB for I/O. With few updates a day I would configure very big caches, like Tom Burton-West (see HathiTrust's blog). Regards Vadim 2012/1/24 Anderson vasconcelos anderson.v...@gmail.com: Thanks for the explanation Erick :) 2012/1/24, Erick Erickson erickerick...@gmail.com: Talking about index size can be very misleading. Take a look at http://lucene.apache.org/java/3_5_0/fileformats.html#file-names. Note that the *.fdt and *.fdx files are used for stored fields, i.e. the verbatim copy of data put in the index when you specify stored=true. These files have virtually no impact on search speed. So, if your *.fdx and *.fdt files are 90G out of a 100G index it is a much different thing than if these files are 10G out of a 100G index. And this doesn't even mention the peculiarities of your query mix. Nor does it say a thing about whether your cheapest alternative is to add more memory. Anderson's method is about the only reliable one; you just have to test with your index and real queries. At some point you'll find your tipping point, typically when you come under memory pressure. And it's a balancing act between how much memory you allocate to the JVM and how much you leave for the op system. Bottom line: no hard and fast numbers. And you should periodically re-test the empirical numbers you *do* arrive at...
Best Erick On Tue, Jan 24, 2012 at 5:31 AM, Anderson vasconcelos anderson.v...@gmail.com wrote: Apparently it is not so easy to determine when to break the content into pieces. I'll investigate further the amount of documents, the size of each document and what kind of search is being used. It seems I will have to do a load test to identify the cutoff point at which to begin using the shards strategy. Thanks 2012/1/24, Dmitry Kan dmitry@gmail.com: Hi, the article you gave mentions 13GB of index size. That is quite a small index from our perspective. We have noticed that at least Solr 3.4 has some sort of choking point with respect to growing index size. It just becomes substantially slower than what we need (a query on avg taking more than 3-4 seconds) once the index size crosses a magic level (about 80GB following our practical observations). We try to keep our indices at around 60-70GB for fast searches and above 100GB for slow ones. We also route the majority of user queries to the fast indices. Yes, caching may help, but we cannot necessarily afford adding more RAM for bigger indices. BTW, our documents are very small, thus in a 100GB index we can have around 200 mio. documents. It would be interesting to see how you manage to ensure q-times under 1 sec with an index of 250GB. How many documents / facets do you ask for max. at a time? FYI, we ask for a thousand facets in one go. Regards, Dmitry On Tue, Jan 24, 2012 at 10:30 AM, Vadim Kisselmann v.kisselm...@googlemail.com wrote: Hi, it depends on your hardware. Read this: http://www.derivante.com/2009/05/05/solr-performance-benchmarks-single-vs-multi-core-index-shards/ Think about your cache config (few updates, big caches) and a good HW infrastructure. In my case I can handle a 250GB index with 100 mio. docs on an i7 machine with RAID10 and 24GB RAM => q-times under 1 sec.
Regards Vadim 2012/1/24 Anderson vasconcelos anderson.v...@gmail.com: Hi, is there some size of index (or number of docs) at which it is necessary to break the index into shards? I have an index with 100GB of size. This index increases 10GB per year (I don't have information on how many docs it has) and the docs will never be deleted. Thinking in 30 years, the index will be 400GB in size. I think it is not required to break it into shards, because I do not consider this a large index. Am I correct? What is a real large index? Thanks
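The "few updates, big caches" advice in this thread maps to the cache entries in solrconfig.xml; a hedged sketch, with sizes as placeholders to tune against your own hit rates rather than recommendations:

```xml
<!-- solrconfig.xml, <query> section: caches are invalidated on every
     commit, so large sizes only pay off when commits are rare. -->
<filterCache class="solr.FastLRUCache" size="16384" initialSize="4096" autowarmCount="1024"/>
<queryResultCache class="solr.LRUCache" size="8192" initialSize="2048" autowarmCount="512"/>
<documentCache class="solr.LRUCache" size="16384" initialSize="4096"/>
```

High autowarmCount values smooth latency after a commit but lengthen the warm-up of each new searcher, which is why frequently updated indices (as described upthread) often do better with small caches.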
Size of fields from one document (monitoring, debugging)
Hello folks, is it possible to find out the size (in KB) of specific fields of one document? Perhaps with Luke or Lucid Gaze? My case: docs in my old index (Solr 1.4) have sizes of 3-4KB each. In my new index (Solr 4.0 trunk) it's about 15KB per doc. I changed only 2 things in my schema.xml: I added the ReversedWildcardFilterFactory (indexing) and one field (LatLonType, stored and indexed). My content is more or less the same. I would like to debug this to refactor my schema.xml. The newest Luke version (3.5) doesn't work with Solr 4.0 from trunk, so I can't test it. Cheers Vadim
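Short of a Luke build that understands the trunk index format, one crude cross-check is to measure the stored values themselves, e.g. from a wt=json response. A toy sketch (the document below is invented):

```python
def field_sizes_kb(doc):
    """Rough per-field size of a document's stored values, in KB."""
    sizes = {}
    for name, value in doc.items():
        if isinstance(value, list):  # multi-valued field
            nbytes = sum(len(str(v).encode("utf-8")) for v in value)
        else:
            nbytes = len(str(value).encode("utf-8"))
        sizes[name] = nbytes / 1024.0
    return sizes

# Invented document resembling the fields discussed in this thread:
doc = {"id": "42", "text": "x" * 2048, "latlon": "52.52,13.40"}
sizes = field_sizes_kb(doc)
print(sorted(sizes, key=sizes.get, reverse=True))  # biggest fields first
```

Note this only covers stored data; the ReversedWildcardFilterFactory inflates the indexed side (it adds reversed variants of terms), which a stored-size comparison like this will not show.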
Re: Weird docs-id clustering output in Solr 1.4.1
Hi Stanislaw, did you already have time to create a patch? If not, can you tell me please which lines in which class in the source code are relevant? Thanks and regards Vadim Kisselmann 2011/11/29 Vadim Kisselmann v.kisselm...@googlemail.com Hi, the quick and dirty way sounds good :) It would be great if you can send me a patch for 1.4.1. By the way, I tested Solr 3.5 with my 1.4.1 test index. I can search and optimize, but clustering doesn't work (java.lang.Integer cannot be cast to java.lang.String). The uniqueKey for my docs is the id (sint). This was the error message:
Problem accessing /solr/select/. Reason: Carrot2 clustering failed
org.apache.solr.common.SolrException: Carrot2 clustering failed
  at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.cluster(CarrotClusteringEngine.java:217)
  at org.apache.solr.handler.clustering.ClusteringComponent.process(ClusteringComponent.java:91)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
  at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
  at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
  at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
  at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
  at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
  at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
  at org.mortbay.jetty.Server.handle(Server.java:326)
  at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
  at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
  at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
  at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
  at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String
  at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.getDocuments(CarrotClusteringEngine.java:364)
  at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.cluster(CarrotClusteringEngine.java:201)
  ... 23 more
In this case it's better for me to upgrade/patch the 1.4.1 version. Best regards Vadim 2011/11/29 Stanislaw Osinski stanislaw.osin...@carrotsearch.com But my actual live system works on Solr 1.4.1; I can only change my solrconfig.xml and integrate new packages... I'll check the possibility to upgrade from 1.4.1 to 3.5 with the same index (without reindexing) with luceneMatchVersion 2.9. I hope it works... Another option would be to check out the Solr 1.4.1 source code, fix the issue and recompile the clustering component. The quick and dirty way would be to convert all identifiers to strings in the clustering component before they are returned for serialization (I can send you a patch that does this). The proper way would be to fix the root cause of the problem, but I'd need to dig deeper into the code to find this. Staszek
Re: Error in New Solr version
Hi, comment out the lines with the collapse component in your solrconfig.xml if you don't need it. Otherwise, you're missing the right jars for this component, or the paths to these jars in your solrconfig.xml are wrong. regards vadim 2011/12/1 Pawan Darira pawan.dar...@gmail.com Hi I am migrating from Solr 1.4 to Solr 3.2. I am getting the below error in my logs: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.component.CollapseComponent'. Could not find a satisfactory solution on Google. please help thanks Pawan
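For context: as far as I know, CollapseComponent never shipped with stock Solr 3.2 (it came from the SOLR-236 patch era), so the fix is exactly what the reply describes. Illustrative solrconfig.xml fragments (the lib path is an assumption, not from the thread):

```xml
<!-- Either comment out the component registration... -->
<!--
<searchComponent name="collapse"
                 class="org.apache.solr.handler.component.CollapseComponent"/>
-->
<!-- ...or make sure the jar that provides the class is actually loaded
     (directory below is a placeholder for wherever the jar lives): -->
<lib dir="../../contrib/collapse/lib" regex=".*\.jar"/>
```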
Re: Weird docs-id clustering output in Solr 1.4.1
Hi Stanislaw, unfortunately it doesn't work. I changed line 216 with the new toString() part and rebuilt the source. Still the same behavior, without errors (because of the changes). Is there another line to change? Thanks and regards Vadim 2011/12/1 Stanislaw Osinski stanislaw.osin...@carrotsearch.com Hi Vadim, I've had limited connectivity, so I couldn't check out the complete 1.4.1 code and test the changes. Here's what you can try: In this file: http://svn.apache.org/viewvc/lucene/solr/tags/release-1.4.1/contrib/clustering/src/main/java/org/apache/solr/handler/clustering/carrot2/CarrotClusteringEngine.java?revision=957515&view=markup around line 216 you will see: for (Document doc : docs) { docList.add(doc.getField(solrId)); } You need to change this to: for (Document doc : docs) { docList.add(doc.getField(solrId).toString()); } Let me know if this did the trick. Cheers, S. On Thu, Dec 1, 2011 at 10:43, Vadim Kisselmann v.kisselm...@googlemail.com wrote: Hi Stanislaw, did you already have time to create a patch? If not, can you please tell me which lines in which class in the source code are relevant? Thanks and regards Vadim Kisselmann 2011/11/29 Vadim Kisselmann v.kisselm...@googlemail.com Hi, the quick and dirty way sounds good:) It would be great if you could send me a patch for 1.4.1. By the way, i tested Solr 3.5 with my 1.4.1 test index. I can search and optimize, but clustering doesn't work (java.lang.Integer cannot be cast to java.lang.String). The uniqueKey for my docs is the id (sint). This was the error message: Problem accessing /solr/select/.
Reason: Carrot2 clustering failed org.apache.solr.common.SolrException: Carrot2 clustering failed at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.cluster(CarrotClusteringEngine.java:217) at org.apache.solr.handler.clustering.ClusteringComponent.process(ClusteringComponent.java:91) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.getDocuments(CarrotClusteringEngine.java:364) at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.cluster(CarrotClusteringEngine.java:201) ... 23 more In this case it's better for me to upgrade/patch the 1.4.1 version. Best regards Vadim 2011/11/29 Stanislaw Osinski stanislaw.osin...@carrotsearch.com But my actual live system works on solr 1.4.1. i can only change my solrconfig.xml and integrate new packages... i check the possibility to upgrade from 1.4.1 to 3.5 with the same index (without reindex) with luceneMatchVersion 2.9. i hope it works... Another option would be to check out the Solr 1.4.1 source code, fix the issue and recompile the clustering component. The quick and dirty way would be to convert all identifiers to strings
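For readers following the thread: the essence of the quick fix discussed above is converting the unique-key value to a String regardless of its runtime type before it is serialized. A standalone sketch of that idea in plain Java (Solr/Carrot2 classes omitted; the class and method names here are illustrative, not the actual Solr code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IdFix {
    // Mimics the loop around line 216 of CarrotClusteringEngine: an sint
    // uniqueKey comes back as an Integer, so calling toString() instead of
    // casting to String avoids the ClassCastException from the stack trace.
    static List<String> collectIds(List<Object> fieldValues) {
        List<String> docList = new ArrayList<String>();
        for (Object value : fieldValues) {
            docList.add(value.toString());
        }
        return docList;
    }

    public static void main(String[] args) {
        // Mixed Integer and String ids both serialize safely.
        System.out.println(collectIds(Arrays.<Object>asList(17, 42, "abc"))); // prints [17, 42, abc]
    }
}
```

The proper fix, as Staszek notes, would be in the serialization of the clustering response itself; this only illustrates why toString() sidesteps the cast.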
Weird docs-id clustering output in Solr 1.4.1
Hi folks, i've installed the clustering component in solr 1.4.1 and it works, but not really:) You can see that the doc ids are corrupt: <arr name="clusters"><lst><arr name="labels"><str>Euro-Krise</str></arr><arr name="docs"><str>½Íџ</str><str>¾ͽ</str><str>¿)ై</str><str></str></arr></lst> my fields: <field name="id" type="sint" indexed="true" stored="true" required="true"/> <field name="url" type="string" indexed="false" stored="true" required="true"/> <field name="title" type="customtext" indexed="false" stored="true" required="true"/> <field name="text" type="customtext" indexed="false" stored="true" multiValued="true" compressed="true"/> and my config snippets: <str name="carrot.title">title</str> <str name="carrot.url">id</str> <!-- The field to cluster on --> <str name="carrot.snippet">text</str> i changed my config snippets (carrot.url=id, url, title..) but the result is the same. anyone an idea? best regards and thanks vadim
Re: Weird docs-id clustering output in Solr 1.4.1
Hello Staszek, thanks for testing:) i think the same (a serialization issue, int to string). This config works fine with solr 4.0 in my test cluster, i think with 3.5 too, without problems. But my actual live system works on solr 1.4.1. i can only change my solrconfig.xml and integrate new packages... i check the possibility to upgrade from 1.4.1 to 3.5 with the same index (without reindex) with luceneMatchVersion 2.9. i hope it works... Thanks and regards Vadim 2011/11/29 Stanislaw Osinski stanis...@osinski.name Hi, It looks like some serialization issue related to writing integer ids to the output. I've just tried a similar configuration on Solr 3.5 and the integer identifiers looked fine. Can you try the same configuration on Solr 3.5? Thanks, Staszek On Tue, Nov 29, 2011 at 12:03, Vadim Kisselmann v.kisselm...@googlemail.com wrote: Hi folks, i've installed the clustering component in solr 1.4.1 and it works, but not really:) You can see that the doc ids are corrupt: <arr name="clusters"><lst><arr name="labels"><str>Euro-Krise</str></arr><arr name="docs"><str>½Íџ</str><str>¾ͽ</str><str>¿)ై</str><str>ˆ</str></arr></lst> my fields: <field name="id" type="sint" indexed="true" stored="true" required="true"/> <field name="url" type="string" indexed="false" stored="true" required="true"/> <field name="title" type="customtext" indexed="false" stored="true" required="true"/> <field name="text" type="customtext" indexed="false" stored="true" multiValued="true" compressed="true"/> and my config snippets: <str name="carrot.title">title</str> <str name="carrot.url">id</str> <!-- The field to cluster on --> <str name="carrot.snippet">text</str> i changed my config snippets (carrot.url=id, url, title..) but the result is the same. anyone an idea? best regards and thanks vadim
Re: Weird docs-id clustering output in Solr 1.4.1
Hi, the quick and dirty way sounds good:) It would be great if you could send me a patch for 1.4.1. By the way, i tested Solr 3.5 with my 1.4.1 test index. I can search and optimize, but clustering doesn't work (java.lang.Integer cannot be cast to java.lang.String). The uniqueKey for my docs is the id (sint). This was the error message: Problem accessing /solr/select/. Reason: Carrot2 clustering failed org.apache.solr.common.SolrException: Carrot2 clustering failed at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.cluster(CarrotClusteringEngine.java:217) at org.apache.solr.handler.clustering.ClusteringComponent.process(ClusteringComponent.java:91) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.getDocuments(CarrotClusteringEngine.java:364) at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.cluster(CarrotClusteringEngine.java:201) ... 23 more In this case it's better for me to upgrade/patch the 1.4.1 version. Best regards Vadim 2011/11/29 Stanislaw Osinski stanislaw.osin...@carrotsearch.com But my actual live system works on solr 1.4.1. i can only change my solrconfig.xml and integrate new packages... i check the possibility to upgrade from 1.4.1 to 3.5 with the same index (without reindex) with luceneMatchVersion 2.9. i hope it works... Another option would be to check out the Solr 1.4.1 source code, fix the issue and recompile the clustering component. The quick and dirty way would be to convert all identifiers to strings in the clustering component, before they are returned for serialization (I can send you a patch that does this). The proper way would be to fix the root cause of the problem, but I'd need to dig deeper into the code to find this. Staszek
Re: how to : multicore setup with same config files
Hi, yes, see http://wiki.apache.org/solr/DistributedSearch Regards Vadim 2011/11/2 Val Minyaylo vminya...@centraldesktop.com Have you tried to query multiple cores at the same time? On 10/31/2011 8:30 AM, Vadim Kisselmann wrote: it works. it was a misplaced backslash in my config;) sharing the config/schema files is not a problem. regards vadim 2011/10/31 Vadim Kisselmann v.kisselm...@googlemail.com Hi folks, i have a small blockade in the configuration of a multicore setup. i use the latest solr version (4.0) from trunk and the example (with jetty). single core is running without problems. We assume that i have this structure: /solr-trunk/solr/example/multicore/ solr.xml core0/ core1/ /solr-data/ /conf/ schema.xml solrconfig.xml /data/ core0/ index core1/ index I want to share the config files (same instanceDir but different dataDir). How can i configure this so that it works (solrconfig.xml, solr.xml)? Do i need the directories for core0/core1 in solr-trunk/...? I found issues in Jira with old patches which unfortunately don't work. Thanks and Regards Vadim
Re: InvalidTokenOffsetsException when using MappingCharFilterFactory, DictionaryCompoundWordTokenFilterFactory and Highlighting
Hi Edwin, Chris, it's an old bug. I have big problems too with OffsetExceptions when i use highlighting or Carrot. It looks like a problem with HTMLStripCharFilter. The patch doesn't work. https://issues.apache.org/jira/browse/LUCENE-2208 Regards Vadim 2011/11/11 Edwin Steiner edwin.stei...@gmail.com I just entered a bug: https://issues.apache.org/jira/browse/SOLR-2891 Thanks regards, Edwin On Nov 7, 2011, at 8:47 PM, Chris Hostetter wrote: : finally I want to use Solr highlighting. But there seems to be a problem : if I combine the char filter and the compound word filter in combination : with highlighting (an : org.apache.lucene.search.highlight.InvalidTokenOffsetsException is : raised). Definitely sounds like a bug somewhere in dealing with the offsets. can you please file a Jira, and include all of the data you have provided here? it would also be helpful to know what the analysis tool says about the various attributes of your tokens at each stage of the analysis. : SEVERE: org.apache.solr.common.SolrException: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token fall exceeds length of provided text sized 12 : at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:469) : at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:378) : at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:116) : at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194) : at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) : at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360) : at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) : at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) : at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) : at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) : at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224) : at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175) : at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:462) : at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164) : at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100) : at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:851) : at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) : at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:405) : at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:278) : at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:515) : at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:302) : at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) : at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) : at java.lang.Thread.run(Thread.java:680) : Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token fall exceeds length of provided text sized 12 : at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:228) : at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:462) : ... 23 more -Hoss
Similar documents and advantages / disadvantages of MLT / Deduplication
Hello folks, i have questions about MLT and Deduplication and what would be the best choice in my case. Case: I index 1000 docs, 5 of them are 95% the same (for example: copy-pasted blog articles from different sources, with slight changes (author name, etc.)). But they have differences. *Now i would like to see 1 doc in my result set and the other 4 should be marked as similar.* With *MLT*: <lst> <str name="mlt.fl">text</str> <int name="mlt.minwl">5</int> <int name="mlt.maxwl">50</int> <int name="mlt.maxqt">3</int> <int name="mlt.maxntp">5000</int> <bool name="mlt.boost">true</bool> <str name="mlt.qf">text</str> </lst> With this config i get about 500 similar docs for this 1 doc, unfortunately too much. *Deduplication*: I index these docs now with a signature, using TextProfileSignature: <updateRequestProcessorChain name="dedupe"> <processor class="solr.processor.SignatureUpdateProcessorFactory"> <bool name="enabled">true</bool> <str name="signatureField">signature_t</str> <bool name="overwriteDupes">false</bool> <str name="fields">text</str> <str name="signatureClass">solr.processor.TextProfileSignature</str> </processor> <processor class="solr.LogUpdateProcessorFactory"/> <processor class="solr.RunUpdateProcessorFactory"/> </updateRequestProcessorChain> How can i compare the created signatures? I only want to see the 5 similar docs, nothing else. Which of these two options is relevant for me? Can i tune MLT for my requirement? Or should i use Dedupe? Thanks and Regards Vadim
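On the "how can I compare the created signatures" question: one approach (my suggestion, not something from the thread) is to facet on the signature field; any signature value with a count above 1 marks a group of near-duplicates, and filtering on that value then returns exactly those docs:

```
http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=signature_t&facet.mincount=2
```

Note that TextProfileSignature produces identical signatures only for sufficiently similar texts, so whether the 5 near-identical articles collapse into one signature depends on how aggressive its quantization is for your data.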
shard indexing
Hello folks, i have a problem with shard indexing. with a single core i use this update command: http://localhost:8983/solr/update . now i have 2 shards, we can call them core0 / core1: http://localhost:8983/solr/core0/update . can i adjust anything to index in the same way as with a single core, without the core name? thanks and regards vadim
Re: shard indexing
Hello Jan, thanks for your quick response. It's quite difficult to explain: We want to create new shards on the fly every month and switch the default shard to the newest one. We always want to index to the newest shard with the same update query, like http://localhost:8983/solr/update (content stream). Is our idea possible to implement? Thanks in advance. Regards Vadim 2011/11/2 Jan Høydahl jan@cominvent.com Hi, The only difference is the core name in the URL, which should be easy enough to handle from your indexing client code. I don't really understand the reason behind your request. How would you control which core to index your document to if you did not specify it in the URL? You could name ONE of your cores as ., meaning it would be the default core living at /solr/update, perhaps that is what you're looking for? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 2. nov. 2011, at 10:00, Vadim Kisselmann wrote: Hello folks, i have a problem with shard indexing. with a single core i use this update command: http://localhost:8983/solr/update . now i have 2 shards, we can call them core0 / core1: http://localhost:8983/solr/core0/update . can i adjust anything to index in the same way as with a single core, without the core name? thanks and regards vadim
Re: shard indexing
Hello Yury, thanks for your response. This is exactly my plan. But defaultCoreName is buggy. When i use it (defaultCoreName=core_november), the default core will be deleted. I think this was the issue: https://issues.apache.org/jira/browse/SOLR-2127 Do you use this feature and did it work? Thanks and Regards Vadim 2011/11/2 Yury Kats yuryk...@yahoo.com There's a defaultCoreName parameter in solr.xml that lets you specify which core should be used when none is specified in the URL. You can change that every time you create a new core. From: Vadim Kisselmann v.kisselm...@googlemail.com To: solr-user@lucene.apache.org Sent: Wednesday, November 2, 2011 6:16 AM Subject: Re: shard indexing Hello Jan, thanks for your quick response. It's quite difficult to explain: We want to create new shards on the fly every month and switch the default shard to the newest one. We always want to index to the newest shard with the same update query, like http://localhost:8983/solr/update (content stream). Is our idea possible to implement? Thanks in advance. Regards Vadim 2011/11/2 Jan Høydahl jan@cominvent.com Hi, The only difference is the core name in the URL, which should be easy enough to handle from your indexing client code. I don't really understand the reason behind your request. How would you control which core to index your document to if you did not specify it in the URL? You could name ONE of your cores as ., meaning it would be the default core living at /solr/update, perhaps that is what you're looking for? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 2. nov. 2011, at 10:00, Vadim Kisselmann wrote: Hello folks, i have a problem with shard indexing. with a single core i use this update command: http://localhost:8983/solr/update . now i have 2 shards, we can call them core0 / core1: http://localhost:8983/solr/core0/update . can i adjust anything to index in the same way as with a single core, without the core name? thanks and regards vadim
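For reference, the attribute discussed here is defaultCoreName and sits on the <cores> element of solr.xml. A sketch with the core name from this thread (the surrounding layout is an assumption):

```xml
<solr persistent="true">
  <!-- Requests to /solr/update (no core name in the URL) go to core_november. -->
  <cores adminPath="/admin/cores" defaultCoreName="core_november">
    <core name="core_october"  instanceDir="core_october"/>
    <core name="core_november" instanceDir="core_november"/>
  </cores>
</solr>
```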
Re: shard indexing
Hello Jan, i personally think the same (switch the URL in my indexing code), but my requirement is to use the same query. Thanks for your suggestion with this trick. Great idea which could work in my case, i'll test it. Regards Vadim 2011/11/2 Jan Høydahl jan@cominvent.com Personally I think it is better to be explicit about where you index, so that when you create a new shard december, you also switch the URL for your indexing code. I suppose one trick you could use is to have a core called current, which now would be for november, and once you get to december, you create a november core, and do a SWAP between current and november. Then your new core would now be current and you don't need to change URLs on the index client side. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 2. nov. 2011, at 11:16, Vadim Kisselmann wrote: Hello Jan, thanks for your quick response. It's quite difficult to explain: We want to create new shards on the fly every month and switch the default shard to the newest one. We always want to index to the newest shard with the same update query, like http://localhost:8983/solr/update (content stream). Is our idea possible to implement? Thanks in advance. Regards Vadim 2011/11/2 Jan Høydahl jan@cominvent.com Hi, The only difference is the core name in the URL, which should be easy enough to handle from your indexing client code. I don't really understand the reason behind your request. How would you control which core to index your document to if you did not specify it in the URL? You could name ONE of your cores as ., meaning it would be the default core living at /solr/update, perhaps that is what you're looking for? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 2. nov. 2011, at 10:00, Vadim Kisselmann wrote: Hello folks, i have a problem with shard indexing. with a single core i use this update command: http://localhost:8983/solr/update . now i have 2 shards, we can call them core0 / core1: http://localhost:8983/solr/core0/update . can i adjust anything to index in the same way as with a single core, without the core name? thanks and regards vadim
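Jan's SWAP trick corresponds to a CoreAdmin request along these lines (host, port, and core names assumed from the thread):

```
http://localhost:8983/solr/admin/cores?action=SWAP&core=current&other=november
```

After the swap, the core named "november" holds the index that "current" had been writing to, and the indexing client can keep pointing at /solr/current/update unchanged.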
how to : multicore setup with same config files
Hi folks, i have a small blockade in the configuration of a multicore setup. i use the latest solr version (4.0) from trunk and the example (with jetty). single core is running without problems. We assume that i have this structure: /solr-trunk/solr/example/multicore/ solr.xml core0/ core1/ /solr-data/ /conf/ schema.xml solrconfig.xml /data/ core0/ index core1/ index I want to share the config files (same instanceDir but different dataDir). How can i configure this so that it works (solrconfig.xml, solr.xml)? Do i need the directories for core0/core1 in solr-trunk/...? I found issues in Jira with old patches which unfortunately don't work. Thanks and Regards Vadim
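A sketch of a solr.xml matching the layout in the question: both cores share one instanceDir (and therefore one conf/ with schema.xml and solrconfig.xml), while each core gets its own dataDir. Paths are taken from the question; treat the exact attribute usage as an assumption:

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- Same instanceDir: both cores read /solr-data/conf/ -->
    <core name="core0" instanceDir="/solr-data/" dataDir="/solr-data/data/core0"/>
    <core name="core1" instanceDir="/solr-data/" dataDir="/solr-data/data/core1"/>
  </cores>
</solr>
```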
Re: how to : multicore setup with same config files
it works. it was a misplaced backslash in my config;) sharing the config/schema files is not a problem. regards vadim 2011/10/31 Vadim Kisselmann v.kisselm...@googlemail.com Hi folks, i have a small blockade in the configuration of a multicore setup. i use the latest solr version (4.0) from trunk and the example (with jetty). single core is running without problems. We assume that i have this structure: /solr-trunk/solr/example/multicore/ solr.xml core0/ core1/ /solr-data/ /conf/ schema.xml solrconfig.xml /data/ core0/ index core1/ index I want to share the config files (same instanceDir but different dataDir). How can i configure this so that it works (solrconfig.xml, solr.xml)? Do i need the directories for core0/core1 in solr-trunk/...? I found issues in Jira with old patches which unfortunately don't work. Thanks and Regards Vadim
Re: LUCENE-2208 (SOLR-1883) Bug with HTMLStripCharFilter, given patch in next nightly build?
UPDATE: i checked out the latest trunk version and patched it with the patch from LUCENE-2208. This patch seems not to work, or i did something wrong. My old log snippets: Http - 500 Internal Server Error Error: Carrot2 clustering failed And this was caused by: Http - 500 Internal Server Error Error: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token the exceeds length of provided text sized 41 Best Regards Vadim 2011/10/20 Vadim Kisselmann v.kisselm...@googlemail.com Hello folks, i have big problems with InvalidTokenOffsetsExceptions with highlighting. Looks like a bug in HTMLStripCharFilter. H.Wang added a patch in LUCENE-2208, but nobody has had time to look at it. Could someone of the committers please take a look at this patch and commit it, or is this problem more complicated than i think? :) Thanks guys... Best Regards Vadim
LUCENE-2208 (SOLR-1883) Bug with HTMLStripCharFilter, given patch in next nightly build?
Hello folks, i have big problems with InvalidTokenOffsetsExceptions with highlighting. Looks like a bug in HTMLStripCharFilter. H.Wang added a patch in LUCENE-2208, but nobody has had time to look at it. Could someone of the committers please take a look at this patch and commit it, or is this problem more complicated than i think? :) Thanks guys... Best Regards Vadim
Re: millions of records problem
Hi, a number of relevant questions has already been asked. i have another one: which type of docs do you have? Do you add new docs every day? Or is it a stable number of docs (500 million)? What about replication? Regards Vadim 2011/10/17 Otis Gospodnetic otis_gospodne...@yahoo.com Hi Jesús, Others have already asked a number of relevant questions. If I had to guess, I'd guess this is simply a disk IO issue, but of course there may be room for improvement without getting more RAM or SSDs, so tell us more about your queries, about the disk IO you are seeing, etc. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: Jesús Martín García jmar...@cesca.cat To: solr-user@lucene.apache.org Sent: Monday, October 17, 2011 6:19 AM Subject: millions of records problem Hi, I've got 500 million documents in solr, each with the same number of fields and similar width. The version of solr which I use is 1.4.1 with lucene 2.9.3. I don't have the option to use shards, so the whole index has to be on one machine... The size of the index is about 50GB and the RAM is 8GB. Everything is working, but the searches are very slow, although I tried different configurations of the solrconfig.xml, such as: - configuring a firstSearcher with the most used searches - configuring the caches (query, filter and document) with large numbers... but everything is still slow, so do you have any ideas to boost the searches without the penalty of using much more RAM? Thanks in advance, Jesús -- Jesús Martín García, Tècnic de Projectes, CESCA - Centre de Serveis Científics i Acadèmics de Catalunya, Gran Capità, 2-4 (Edifici Nexus) · 08034 Barcelona, T. 93 551 6213 · F. 93 205 6979 · jmar...@cesca.cat
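The cache and firstSearcher tuning mentioned above lives in solrconfig.xml; a minimal sketch for orientation (sizes and the warm-up query are placeholders, not recommendations for a 500-million-doc index):

```xml
<query>
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="64"/>
  <documentCache class="solr.LRUCache" size="1024" initialSize="1024"/>
  <!-- Warm the most-used searches when a new searcher is opened on startup -->
  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">your most-used query here</str></lst>
    </arr>
  </listener>
</query>
```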
Morelikethis understanding question
Hello folks, i have a question about the MLT. For example my query: localhost:8983/solr/mlt/?q=gefechtseinsatz+AND+dna&mlt=true&mlt.fl=text&mlt.count=0&mlt.boost=true&mlt.mindf=5&mlt.mintf=5&mlt.minwl=4 *I have 1 query result and 13 MLT docs. The MLT result corresponds to half of my index.* In my case i want *just those docs which have at least half of the words from my query-result document*; they should be very similar. How should i set my parameters to achieve this? Thanks and Regards Vadim
Re: strange performance issue with many shards on one server
Hi Fred, analyze the queries which take longer. We observe our queries and see problems with q-time for queries which are complex, with phrase queries, or queries which contain numbers or special characters. if you don't know it: http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance Regards Vadim 2011/9/28 Frederik Kraus frederik.kr...@gmail.com Hi, I am experiencing a strange issue doing some load tests. Our setup: - 2 servers with 24 cpu cores each, 130GB of RAM - 10 shards per server (needed for response times) running in a single tomcat instance - each query queries all 20 shards (distributed search) - each shard holds about 1.5 mio documents (small shards are needed due to rather complex queries) - all caches are warmed / high cache hit rates (99%) etc. Now for some reason we cannot seem to fully utilize all CPU power (no disk IO), i.e. beyond a certain point increasing concurrent users doesn't increase CPU load, decreases throughput and increases the response times of the individual queries. Also 1-2% of the queries take significantly longer: avg somewhere at 100ms while 1-2% take 1.5s or longer. Any ideas are greatly appreciated :) Fred.
Re: strange performance issue with many shards on one server
Hi Fred, ok, it's a strange behavior with the same queries. A few more questions: - which solr version? - do you index during your load test? (because of index rebuild) - do you replicate your index? Regards Vadim 2011/9/28 Frederik Kraus frederik.kr...@gmail.com Hi Vadim, the thing is that those exact same queries, that take longer during a load test, perform just fine when executed at a slower request rate, and are also random, i.e. there is no pattern in the bad/slow queries. My first thought was some kind of contention and/or connection starvation for the internal shard communication? Fred. On Wednesday, 28 September 2011 at 13:18, Vadim Kisselmann wrote: Hi Fred, analyze the queries which take longer. We observe our queries and see problems with q-time for queries which are complex, with phrase queries, or queries which contain numbers or special characters. if you don't know it: http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance Regards Vadim 2011/9/28 Frederik Kraus frederik.kr...@gmail.com Hi, I am experiencing a strange issue doing some load tests. Our setup: - 2 servers with 24 cpu cores each, 130GB of RAM - 10 shards per server (needed for response times) running in a single tomcat instance - each query queries all 20 shards (distributed search) - each shard holds about 1.5 mio documents (small shards are needed due to rather complex queries) - all caches are warmed / high cache hit rates (99%) etc. Now for some reason we cannot seem to fully utilize all CPU power (no disk IO), i.e. beyond a certain point increasing concurrent users doesn't increase CPU load, decreases throughput and increases the response times of the individual queries. Also 1-2% of the queries take significantly longer: avg somewhere at 100ms while 1-2% take 1.5s or longer. Any ideas are greatly appreciated :) Fred.
Re: Still too many files after running solr optimization
Why should the optimization reduce the number of files? That happens only when you index docs with the same unique key. Do numDocs and maxDocs differ after the optimize? If yes: what is your optimize command?

Regards
Vadim

2011/9/28 Manish Bafna manish.bafna...@gmail.com

Try to do the optimize twice. The 2nd one will be quick and will delete a lot of files.

On Wed, Sep 28, 2011 at 5:26 PM, Kissue Kissue kissue...@gmail.com wrote:

Hi, I am using Solr 3.3. I noticed that after indexing about 700,000 records and running an optimization at the end, I still have about 91 files in my index directory. I thought that optimization was supposed to reduce the number of files. My settings are the defaults that came with Solr (mergeFactor, etc.). Any ideas what I could be doing wrong?
Re: Still too many files after running solr optimization
If numDocs and maxDocs have the same number of docs, nothing will be deleted on optimize. You only rebuild your index.

Regards
Vadim

2011/9/28 Kissue Kissue kissue...@gmail.com

numDocs and maxDocs are the same size. I was worried because when I used to use plain Lucene for the same indexing, there were many files before optimization, but after optimization I always ended up with just 3 files in my index folder. Just want to find out if this is ok. Thanks
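The numDocs/maxDocs check Vadim describes is a subtraction: maxDocs also counts deleted-but-not-yet-merged docs, so the difference is what an optimize could actually purge. A trivial sketch — the numbers are placeholders; on a live core they come from the admin statistics page of that Solr generation:

```shell
#!/bin/sh
# Sketch: maxDocs - numDocs = documents flagged as deleted but still
# sitting in the index segments until a merge/optimize reclaims them.

deleted_docs() {
    num_docs=$1
    max_docs=$2
    echo $((max_docs - num_docs))
}

deleted_docs 700000 700000   # same values: nothing for optimize to purge
deleted_docs 680000 700000   # 20000 deleted docs would be reclaimed
```

If the result is 0, as in Kissue's case, an optimize only merges segments; it does not shrink the document count.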
Re: strange performance issue with many shards on one server
) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:554)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)

Am Mittwoch, 28. September 2011 um 13:53 schrieb Frederik Kraus:

Am Mittwoch, 28. September 2011 um 13:41 schrieb Vadim Kisselmann:

Hi Fred, ok, it's strange behavior with the same queries. More questions:

- which Solr version?
3.3 (might the NIOFSDirectory from 3.4 help?)

- are you indexing during your load test? (because of index rebuilds)
nope

- do you replicate your index?
nope

Regards
Vadim
Re: Still too many files after running solr optimization
2011/9/28 Manish Bafna manish.bafna...@gmail.com

Will it not merge the index?

yes

While merging on Windows, the old index files don't get deleted. (Windows has an issue where a file opened for reading cannot be deleted.) So, if you call optimize again, it will delete the older index files.

no. During an optimize you only delete docs which are flagged as deleted, no matter how old they are. If your numDocs and maxDocs have the same number of docs, you only rebuild and merge your index, but you delete nothing.

Regards
Re: Still too many files after running solr optimization
We had an understanding problem :) Docs are the docs in the index; files are the files in the index directory (the index segments). During the optimization you don't delete docs unless they are flagged as deleted. But you merge your index and delete the files in your index directory, that's right: after a second optimize, the files that were still open for reading get deleted.

Regards

2011/9/28 Manish Bafna manish.bafna...@gmail.com

We tested it many times. The 1st time we optimize, the new index file is created (the merged one), but the existing index files are not deleted (because they might still be open for reading). The 2nd time we optimize, everything other than the new index file gets deleted. This happens specifically on Windows.
Re: NRT and commit behavior
Tirthankar, are you indexing 1. smaller docs or 2. books?

If 1: your caches are too big for your memory, as Erick already said. Try to allocate 10GB for the JVM, leave 14GB for your HDD cache, and make your caches smaller.

If 2: read the blog posts on hathitrust.org: http://www.hathitrust.org/blogs/large-scale-search

Regards
Vadim

2011/9/24 Erick Erickson erickerick...@gmail.com

No <g>. The problem is that the number of documents isn't a reliable indicator of resource consumption. Consider the difference between indexing a Twitter message and a book: I can put a LOT more docs of 140 chars on a single machine of size X than I can books. Unfortunately, the only way I know of is to test. Use something like jMeter or SolrMeter to fire enough queries at your machine to determine when you're over-straining resources, and shard at that point (or get a bigger machine <g>).

Best
Erick

On Wed, Sep 21, 2011 at 8:24 PM, Tirthankar Chatterjee tchatter...@commvault.com wrote:

Okay, but is there any number — for the index size, the total docs in the index, or the size of physical memory — at which sharding should be considered? I am trying to find the winning combination. Tirthankar

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, September 16, 2011 7:46 AM
To: solr-user@lucene.apache.org
Subject: Re: NRT and commit behavior

Uhm, you're putting a lot of index into not very much memory. I really think you're going to have to shard your index across several machines to get past this problem. Simply increasing the size of your caches is still limited by the physical memory you're working with. You really have to put a profiler on the system to see what's going on.
At that size there are too many things that it *could* be to definitively answer it with e-mails.

Best
Erick

On Wed, Sep 14, 2011 at 7:35 AM, Tirthankar Chatterjee tchatter...@commvault.com wrote:

Erick, also, we have tried increasing the caches in our solrconfig. Setting the autowarm count below to 0 helps the commit call return within a second, but that slows down our searches:

<filterCache class="solr.FastLRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>

<!-- Cache used to hold field values that are quickly accessible by document id. The fieldValueCache is created by default even if not configured here.
<fieldValueCache class="solr.FastLRUCache" size="512" autowarmCount="128" showItems="32"/>
-->

<!-- queryResultCache caches results of searches - ordered lists of document ids (DocList) based on a query, a sort, and the range of documents requested. -->
<queryResultCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>

<!-- documentCache caches Lucene Document objects (the stored fields for each document). Since Lucene internal document ids are transient, this cache will not be autowarmed. -->
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="512"/>

-----Original Message-----
From: Tirthankar Chatterjee [mailto:tchatter...@commvault.com]
Sent: Wednesday, September 14, 2011 7:31 AM
To: solr-user@lucene.apache.org
Subject: RE: NRT and commit behavior

Erick, here are the answers to your questions:
- Our index is 267 GB
- We are not optimizing
- No, we have not profiled yet to check the bottleneck, but the logs indicate that opening the searchers is taking time
- Nothing except Solr runs on the machine
- Total memory is 16GB; Tomcat has 8GB allocated
- Everything is 64-bit: OS, JVM and Tomcat

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Sunday, September 11, 2011 11:37 AM
To: solr-user@lucene.apache.org
Subject: Re: NRT and commit behavior

Hmm, OK. You might want to look at the non-cached filter query stuff, it's quite recent.
The point here is that it is a filter that is applied only after all of the less expensive filter queries are run. One of its uses is exactly ACL calculations: rather than calculate the ACL for the entire doc set, it only calculates access for docs that have made it past all the other elements of the query. See SOLR-2429, and note that it is 3.4 (currently being released) only.

As to why your commits are taking so long, I have no idea, given that you really haven't given us much to work with. How big is your index? Are you optimizing? Have you profiled the application to see what the bottleneck is (I/O, CPU, etc.)? What else is running on your machine? It's quite surprising that it takes that long. How much memory are you giving the JVM? etc. You might want to review: http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Fri, Sep 9, 2011 at 9:41 AM,
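A hypothetical middle ground for the trade-off in this thread (autowarmCount=4096 makes commits slow, 0 makes searches cold after each commit) would be a small non-zero autowarm count. The 256 below is an arbitrary illustration for tuning experiments, not a value recommended anywhere in this thread:

```xml
<!-- Sketch: warm only a few hundred of the hottest entries so a new
     searcher opens quickly while the most frequent queries stay cached.
     The counts are illustrative and should be tuned per workload. -->
<filterCache class="solr.FastLRUCache"
             size="16384"
             initialSize="4096"
             autowarmCount="256"/>

<queryResultCache class="solr.LRUCache"
                  size="16384"
                  initialSize="4096"
                  autowarmCount="256"/>
```

The commit time scales with the autowarm work, so halving/doubling this count while watching commit latency and cache hit rate is a cheap experiment.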
Last successful build of Solr 4.0 and Near Realtime Search
Hi folks,

I'm writing here again (besides Jira: SOLR-2565), in case anyone can help: I tested the nightly build #1595 with a new patch (2565), but NRT doesn't work in my case. I index 10 docs/sec, and it takes 1-30 sec. to see the results; same behavior when I update an existing document. My addedDate is a timestamp (default=NOW). In the worst case I can tell that a document I indexed has already been in my index for more than 30 seconds, but I can't see it. My settings:

<autoCommit>
  <maxDocs>1000</maxDocs>
  <maxTime>6</maxTime>
</autoCommit>
<autoSoftCommit>
  <maxDocs>1</maxDocs>
  <maxTime>1000</maxTime>
</autoSoftCommit>

Are my settings wrong, or do you need more details? Should I use the coldSearcher (default=false)? Or set maxWarmingSearchers higher than 2?

UPDATE: If I only use autoSoftCommit and comment out autoCommit, it works. But I should use the hard autoCommit, right? Mark said yes, because only with hard commits are my docs in stable storage: http://www.lucidimagination.com/blog/2011/07/11/benchmarking-the-new-solr-‘near-realtime’-improvements/

Regards
Vadim
Re: Unbuffered entity enclosing request can not be repeated Invalid chunk header
Hi Markus,

thanks for your answer. I'm using Solr 4.0 and Jetty now and will observe the behavior and my error logs next week. Tomcat can be a reason, we will see; I'll report back. I'm indexing WITHOUT batches, one doc after another, but I would try out batch indexing as well as retrying faulty docs. If you index one batch and one doc in the batch is corrupt, what happens to the other 249 docs (250/batch in total)? Are they indexed and updated when you retry the batch, or does the complete batch fail?

Regards
Vadim

2011/8/11 Markus Jelsma markus.jel...@openindex.io

Hi, we see these errors too once in a while, but there is no real answer on the mailing list here, except one user suspecting Tomcat is responsible (connection timeouts). Another user proposed to limit the number of documents per batch, but that, of course, increases the number of connections made. We do only 250 docs/batch to limit RAM usage on the client and start to see these errors very occasionally. There may be a coincidence... or not. Anyway, it's really hard to reproduce, if not impossible. It happens when connecting directly as well as when connecting through a proxy. What you can do is simply retry the batch, and it usually works out fine. At least you don't lose a batch in the process. We retry all failures at least a couple of times before giving up an indexing job. Cheers,

Hello folks, I use Solr 1.4.1 and every 2 to 6 hours I have indexing errors in my log files.

On the client side:
2011-08-04 12:01:18,966 ERROR [Worker-242] IndexServiceImpl - Indexing failed with SolrServerException. Details: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
Stacktrace: org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:469)
.
.
On the server side:
INFO: [] webapp=/solr path=/update params={wt=javabin&version=1} status=0 QTime=3
04.08.2011 12:01:18 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 0
04.08.2011 12:01:18 org.apache.solr.common.SolrException log
SCHWERWIEGEND: org.apache.solr.common.SolrException: java.io.IOException: Invalid chunk header
.
.
.

I'm indexing ONE document per call, 15-20 documents per second, 24/7. What may be the problem?

Best regards
Vadim
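Markus's "simply retry the batch" advice can be wrapped in a small helper. A hedged sketch: `retry` and the fake `index_batch` below are hypothetical names; in real use the retried command would be whatever posts one batch of documents to /update (a curl call, a SolrJ client invocation, etc.):

```shell
#!/bin/sh
# Sketch: run a command up to N times before giving up, mirroring the
# "retry all failures a couple of times before giving up" strategy.

retry() {
    tries=$1; shift
    n=1
    while ! "$@"; do
        [ "$n" -ge "$tries" ] && return 1
        n=$((n + 1))
    done
    return 0
}

# Demo: a fake batch poster that fails twice, then succeeds on the 3rd try.
attempts_file=$(mktemp)
echo 0 > "$attempts_file"
index_batch() {
    a=$(($(cat "$attempts_file") + 1))
    echo "$a" > "$attempts_file"
    [ "$a" -ge 3 ]
}

retry 5 index_batch && echo "batch indexed after $(cat "$attempts_file") attempts"
# prints: batch indexed after 3 attempts
rm -f "$attempts_file"
```

Between attempts a real version would usually sleep with backoff; that is omitted here to keep the sketch short.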
Unbuffered entity enclosing request can not be repeated Invalid chunk header
Hello folks,

I use Solr 1.4.1 and every 2 to 6 hours I have indexing errors in my log files.

On the client side:
2011-08-04 12:01:18,966 ERROR [Worker-242] IndexServiceImpl - Indexing failed with SolrServerException. Details: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
Stacktrace: org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:469)
.
.

On the server side:
INFO: [] webapp=/solr path=/update params={wt=javabin&version=1} status=0 QTime=3
04.08.2011 12:01:18 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 0
04.08.2011 12:01:18 org.apache.solr.common.SolrException log
SCHWERWIEGEND: org.apache.solr.common.SolrException: java.io.IOException: Invalid chunk header
.
.
.

I'm indexing ONE document per call, 15-20 documents per second, 24/7. What may be the problem?

Best regards
Vadim
Re: Replication slows down massively during high load
Hello Shawn,

Primary assumption: You have a 64-bit OS and a 64-bit JVM.

Yep, it's running a 64-bit Linux with a 64-bit JVM.

It sounds to me like you're I/O bound, because your machine cannot keep enough of your index in RAM. Relative to your 100GB index, you only have a maximum of 14GB of RAM available to the OS disk cache, since Java's heap size is 10GB.

The load test seems to be more CPU-bound than I/O-bound. All cores are fully busy, and iostat says there isn't much more disk I/O going on than without the load test. The index is on a RAID10 array with four disks.

How much disk space do all of the index files that end in "x" take up? I would venture a guess that it's significantly more than 14GB. On Linux, you could do this command to tally it quickly:

# du -hc *x
27G total
# du -hc `ls | egrep -v 'tvf|fdt'`
51G total

If you installed enough RAM so the disk cache can be much larger than the total size of those files ending in "x", you'd probably stop having these performance issues. Alternatively, you could take steps to reduce the size of your index, or perhaps add more machines and go distributed.

Unfortunately, this doesn't seem to be the problem. The queries themselves are running fine. The problem is that replication crawls when there are many queries going on, and the replication speed stays low even after the load is gone.

Cheers
Vadim
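Shawn's sizing argument is simple arithmetic: the RAM left for the OS disk cache is total RAM minus the JVM heap, and that should comfortably exceed the hot index files. A sketch using the numbers from this thread (24GB RAM, 10GB heap, 27GB of *x files from the du output):

```shell
#!/bin/sh
# Sketch: compare the OS disk cache budget against the size of the index
# files you would like to keep cached. All values in whole GB.

total_ram_gb=24
jvm_heap_gb=10
index_x_files_gb=27   # the "du -hc *x" total from above

cache_gb=$((total_ram_gb - jvm_heap_gb))
echo "OS disk cache: ${cache_gb}GB"
if [ "$index_x_files_gb" -gt "$cache_gb" ]; then
    echo "shortfall: $((index_x_files_gb - cache_gb))GB of hot index files won't fit"
fi
# prints: OS disk cache: 14GB
# prints: shortfall: 13GB of hot index files won't fit
```

By this reasoning the machine in the thread is 13GB short of caching even the "x" files, which is why Shawn suspected I/O; Vadim's iostat observations point elsewhere, but the arithmetic is worth rechecking whenever heap or index size changes.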
Re: Replication slows down massively during high load
On Mar 17, 2011, at 3:19 PM, Shawn Heisey wrote:

On 3/17/2011 3:43 AM, Vadim Kisselmann wrote:
Unfortunately, this doesn't seem to be the problem. The queries themselves are running fine. The problem is that replication crawls when there are many queries going on, and the replication speed stays low even after the load is gone.

If you run "iostat 5", what are typical values on each iteration for the various CPU states while you're doing load testing and replication at the same time? In particular, %iowait is important.

CPU stats from top (iostat doesn't seem to show CPU load correctly):
90.1%us, 4.5%sy, 0.0%ni, 5.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Seems like I/O is not the bottleneck here. Another interesting thing: when Solr starts its replication under heavy load, it tries to download the whole index from the master. From /solr/admin/replication/index.jsp:

Current Replication Status
Start Time: Thu Mar 17 15:57:20 CET 2011
Files Downloaded: 9 / 163
Downloaded: 83,04 MB / 97,75 GB [0.0%]
Downloading File: _d5x.nrm, Downloaded: 86,82 KB / 86,82 KB [100.0%]
Time Elapsed: 419s, Estimated Time Remaining: 504635s, Speed: 202,94 KB/s
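The top(1) CPU line quoted above can be checked mechanically. A sketch that pulls %us and %wa out of such a line and applies a rough rule of thumb; the 80%/20% thresholds are arbitrary illustrations, not values from this thread:

```shell
#!/bin/sh
# Sketch: classify a top(1) CPU-state line as CPU-bound or I/O-bound.
# Assumes the "NN.N%us, ... NN.N%wa, ..." field layout quoted above.

classify_cpu() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /%us,?$/) us = $i + 0   # user CPU percentage
            if ($i ~ /%wa,?$/) wa = $i + 0   # iowait percentage
        }
        if (wa > 20)      print "I/O-bound (high iowait)"
        else if (us > 80) print "CPU-bound"
        else              print "neither clearly CPU- nor I/O-bound"
    }'
}

echo "90.1%us, 4.5%sy, 0.0%ni, 5.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st" | classify_cpu
# prints: CPU-bound
```

Applied to the numbers in this thread (90.1%us, 0.0%wa), the box is clearly CPU-bound, which matches Vadim's conclusion that I/O is not the bottleneck.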
Re: Replication slows down massively during high load
Hi Bill,

You could always rsync the index dir and reload (old scripts).

I used them previously but had problems with them.

The application querying Solr doesn't cause enough load on it to trigger the issue. Yet. But this is still something we should investigate.

Indeed :-)

See if the NIC is configured right? Routing? Speed of transfer?

The network doesn't seem to be the problem. Testing with iperf from slave to master yields a full gigabit, even while Solrmeter is hammering the server.

Bill Bell

Vadim
Replication slows down massively during high load
Hi everyone,

I have Solr running on one master and two slaves (load balanced) via Solr 1.4.1 native replication. If the load is low, both slaves replicate at around 100MB/s from the master. But when I use Solrmeter (100-400 queries/min) for load tests (over the load balancer), the replication slows down to an unacceptable speed, around 100KB/s (at least that's what the replication page on /solr/admin says). Going to a slave directly without the load balancer yields the same result for the slave under test: Slave 1 gets hammered with Solrmeter and its replication slows down to 100KB/s. At the same time, Slave 2, with only 20-50 queries/min and no load test, has no problems: it replicates at 100MB/s and its index version is 5-10 versions ahead of Slave 1. The replication stays in the 100KB/s range even after the load test is over, until the application server is restarted. The same issue comes up under both Tomcat and Jetty.

The setup looks like this:
- Same hardware for all servers: physical machines with quad-core CPUs and 24GB RAM (the JVM starts up with -XX:+UseConcMarkSweepGC -Xms10G -Xmx10G)
- Index size is about 100GB with 40M docs
- Master commits every 10 min / 10k docs
- Slaves poll every minute

I checked this:
- Changed the network interface; same behavior
- Increased the thread pool size from 200 to 500 and the queue size from 100 to 500 in Tomcat; same behavior
- Neither disk nor network I/O is bottlenecked. Disk I/O went down to almost zero after every query in the load test got cached. The network isn't doing much and can put through almost a GBit/s with iperf (a network throughput tester) while Solrmeter is running.

Any ideas what could be wrong?

Best Regards
Vadim
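For reference, the Solr 1.4 native replication described in this setup is wired up in solrconfig.xml roughly as below. Hostnames and confFiles are illustrative, not the poster's actual configuration; only the one-minute poll interval comes from the message above:

```xml
<!-- master solrconfig.xml: publish the index after each commit -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- slave solrconfig.xml: poll the master every minute, as in this setup -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master:8983/solr/replication</str>
    <str name="pollInterval">00:01:00</str>
  </lst>
</requestHandler>
```

The replication status page the poster quotes (/solr/admin/replication/index.jsp) is served by this same handler on the slave.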