Re: SOLR not starting after restart 2 node cloud setup
Dear Erick,

Thanks for your thoughts; they helped me a lot. On my instances, no Solr logs were being appended to catalina.out. I have now put a log4j.properties file in place, so Solr logs are captured in solr.log, and with its help I found the cause of the issue. I was starting Tomcat with the option -Dbootstrap_conf=true, which made Solr look for the core configuration files in the wrong directory. After removing that option, Solr started without any issues. I also commented out the suggester component, which made Solr load faster.

Thanks,
Doss.

On Thu, Nov 20, 2014 at 9:47 PM, Erick Erickson erickerick...@gmail.com wrote:

Doss:

Tomcat often puts things in catalina.out, so you might check there; I've often seen logging information from Solr go there by default.

Without having some idea what kinds of problems Solr is reporting when you see this situation, it's really hard to say. Some things I'd check first though, in order of what I _guess_ is most likely.

There have been anecdotal reports (in fact, I'm trying to understand the why of it right now) of the suggester taking a long time to initialize, even if you don't use it! So if you're not using the suggest component, try commenting out those sections in solrconfig.xml for the cores in question. I like this explanation since it fits your symptoms, but I don't like it since the index you are using isn't all that big, so it's something of a shot in the dark. I expect that the core will _eventually_ come up, but I've seen reports of 10-15 minutes being required, far beyond my patience! That said, this would also explain why deleting the index works.

OutOfMemory errors. You might be able to attach JConsole (part of the standard JDK) to the process and monitor the memory usage. If it's being pushed near the 5 GB limit, that's the first thing I'd suspect.

If you're using the default setup, the ZooKeeper timeout may be too low. I think the default (I'm not sure whether it changed in 4.9) is 15 seconds; 30-60 seconds is usually much better.

Best,
Erick
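The fix reported in this thread was removing -Dbootstrap_conf=true from the Tomcat startup options. As a rough sketch of how one might check for that flag, the Python snippet below scans an options string for it. The sample options string (including -Xmx5g and -DzkHost) and the helper name find_bootstrap_flags are assumptions made for illustration; they are not taken from the actual setup.

```python
import shlex


def find_bootstrap_flags(java_opts):
    """Return the options in java_opts that set bootstrap_conf to true."""
    flagged = []
    for token in shlex.split(java_opts):
        if token.startswith("-Dbootstrap_conf="):
            value = token.split("=", 1)[1]
            if value.lower() == "true":
                flagged.append(token)
    return flagged


# Hypothetical options string; a real one would come from the Tomcat
# startup configuration (for example, wherever JAVA_OPTS is defined).
opts = "-Xmx5g -DzkHost=10.236.149.28:2181 -Dbootstrap_conf=true"

for flag in find_bootstrap_flags(opts):
    print("Consider removing:", flag)
```

An empty result would suggest the flag is not being passed, at least not through the string that was checked.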
Re: SOLR not starting after restart 2 node cloud setup
Glad you found a solution!

Best,
Erick

On Tue, Dec 2, 2014 at 4:30 AM, Doss itsmed...@gmail.com wrote:

Dear Erick,

Thanks for your thoughts; they helped me a lot. On my instances, no Solr logs were being appended to catalina.out. I have now put a log4j.properties file in place, so Solr logs are captured in solr.log, and with its help I found the cause of the issue. I was starting Tomcat with the option -Dbootstrap_conf=true, which made Solr look for the core configuration files in the wrong directory. After removing that option, Solr started without any issues. I also commented out the suggester component, which made Solr load faster.

Thanks,
Doss.
Re: SOLR not starting after restart 2 node cloud setup
Dear Erick,

Forgive my ignorance. Please find below some of the details you asked for.

*Have you looked at the Solr logs?*
Sorry, I haven't set up the log4j.properties file, so I don't have Solr logs. Since that requires a Tomcat restart, I am planning to do it at the next restart. But I found the following in the Tomcat log:

18-Nov-2014 11:27:29.028 WARNING [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoader.clearReferencesThreads The web application [/mima] appears to have started a thread named [localhost-startStop-1-SendThread(10.236.149.28:2181)] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
 sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
 sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
 sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
 sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
 sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
 org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:349)
 org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)

*How big are the cores?*
We have 16 cores, of which only 5 are big. The total size of all 16 cores is 10+ GB.

*How many docs in the cores when the problem happens?*
1 core with 163 fields and 33,00,000 documents (index size 2+ GB)
4 cores with 3 fields and approximately 150,00,000 documents (1.2 to 1.5 GB)
The remaining cores have 1,00,000 to 40,00,000 documents

*How much memory are you allocating the JVM?*
5 GB for the JVM; total RAM available on the systems is 30 GB.

*Can you restart Tomcat without a problem?*
This problem is occurring in production, so I have never tried.

Thanks,
Doss.

On Wed, Nov 19, 2014 at 7:55 PM, Erick Erickson erickerick...@gmail.com wrote:

You've really got to provide details for us to say much of anything. There are about a zillion things that it could be. In particular, have you looked at the Solr logs? Are there any interesting things in them? How big are the cores? How much memory are you allocating the JVM? How many docs in the cores when the problem happens? Before the nodes stop responding, can you restart Tomcat without a problem?

You might review: http://wiki.apache.org/solr/UsingMailingLists

Best,
Erick
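The document counts in this message use Indian digit grouping (for example, 33,00,000 is 3.3 million). For readers more used to thousands grouping, a minimal Python sketch of the conversion follows; the helper name parse_grouped_count is hypothetical, and the sample values are the ones quoted in the message.

```python
def parse_grouped_count(text):
    """Convert a comma-grouped count such as '33,00,000' to an int."""
    return int(text.replace(",", ""))


# Counts as written in the message, re-printed with thousands grouping.
for written in ["33,00,000", "150,00,000", "1,00,000", "40,00,000"]:
    print(written, "=", f"{parse_grouped_count(written):,}")
```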
Re: SOLR not starting after restart 2 node cloud setup
Doss:

Tomcat often puts things in catalina.out, so you might check there; I've often seen logging information from Solr go there by default.

Without having some idea what kinds of problems Solr is reporting when you see this situation, it's really hard to say. Some things I'd check first though, in order of what I _guess_ is most likely.

There have been anecdotal reports (in fact, I'm trying to understand the why of it right now) of the suggester taking a long time to initialize, even if you don't use it! So if you're not using the suggest component, try commenting out those sections in solrconfig.xml for the cores in question. I like this explanation since it fits your symptoms, but I don't like it since the index you are using isn't all that big, so it's something of a shot in the dark. I expect that the core will _eventually_ come up, but I've seen reports of 10-15 minutes being required, far beyond my patience! That said, this would also explain why deleting the index works.

OutOfMemory errors. You might be able to attach JConsole (part of the standard JDK) to the process and monitor the memory usage. If it's being pushed near the 5 GB limit, that's the first thing I'd suspect.

If you're using the default setup, the ZooKeeper timeout may be too low. I think the default (I'm not sure whether it changed in 4.9) is 15 seconds; 30-60 seconds is usually much better.

Best,
Erick

On Thu, Nov 20, 2014 at 3:47 AM, Doss itsmed...@gmail.com wrote:

Dear Erick,

Forgive my ignorance. Please find below some of the details you asked for.

*Have you looked at the Solr logs?*
Sorry, I haven't set up the log4j.properties file, so I don't have Solr logs. Since that requires a Tomcat restart, I am planning to do it at the next restart. But I found the following in the Tomcat log:

18-Nov-2014 11:27:29.028 WARNING [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoader.clearReferencesThreads The web application [/mima] appears to have started a thread named [localhost-startStop-1-SendThread(10.236.149.28:2181)] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
 sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
 sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
 sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
 sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
 sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
 org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:349)
 org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)

*How big are the cores?*
We have 16 cores, of which only 5 are big. The total size of all 16 cores is 10+ GB.

*How many docs in the cores when the problem happens?*
1 core with 163 fields and 33,00,000 documents (index size 2+ GB)
4 cores with 3 fields and approximately 150,00,000 documents (1.2 to 1.5 GB)
The remaining cores have 1,00,000 to 40,00,000 documents

*How much memory are you allocating the JVM?*
5 GB for the JVM; total RAM available on the systems is 30 GB.

*Can you restart Tomcat without a problem?*
This problem is occurring in production, so I have never tried.

Thanks,
Doss.
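On the ZooKeeper timeout suggestion above: the client timeout is normally configured through the zkClientTimeout setting in solr.xml. The Python sketch below reads that value from a solr.xml document and compares it with the 30-second lower bound Erick mentions. The embedded sample assumes the newer solr.xml layout with a solrcloud section and a 15000 ms default; the actual file on a given node may use the legacy layout or a different value, and the helper name zk_client_timeout_ms is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumed sample in the newer solr.xml style; not copied from the thread.
SAMPLE_SOLR_XML = """<solr>
  <solrcloud>
    <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
  </solrcloud>
</solr>
"""


def zk_client_timeout_ms(solr_xml_text):
    """Return the configured (or default) zkClientTimeout in ms, or None."""
    root = ET.fromstring(solr_xml_text)
    for node in root.iter("int"):
        if node.get("name") != "zkClientTimeout":
            continue
        text = (node.text or "").strip()
        # Handle the ${property:default} form as well as a literal number.
        match = re.fullmatch(r"\$\{[^:}]+:(\d+)\}", text)
        if match:
            return int(match.group(1))
        if text.isdigit():
            return int(text)
    return None


timeout = zk_client_timeout_ms(SAMPLE_SOLR_XML)
if timeout is not None and timeout < 30000:
    print(f"zkClientTimeout is {timeout} ms; 30000-60000 ms may be safer")
```

Note that the sketch only reports the default written in the file; a -DzkClientTimeout system property passed at startup would override it.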
SOLR not starting after restart 2 node cloud setup
I have a two-node Solr (4.9.0) cloud with Tomcat (8) and ZooKeeper. At times Solr on Node 1 stops responding. To fix the issue I restart Tomcat on Node 1, but Solr does not start up. If I remove the Solr cores on both nodes and restart, it starts working, but then I have to reindex all the data again. We are using this setup in production, and because of this issue we are having 1 hour to 1 hour 30 minutes of service downtime. Any suggestions would be greatly appreciated.

Thanks,
Doss.
Re: SOLR not starting after restart 2 node cloud setup
You've really got to provide details for us to say much of anything. There are about a zillion things that it could be. In particular, have you looked at the Solr logs? Are there any interesting things in them? How big are the cores? How much memory are you allocating the JVM? How many docs in the cores when the problem happens? Before the nodes stop responding, can you restart Tomcat without a problem?

You might review: http://wiki.apache.org/solr/UsingMailingLists

Best,
Erick

On Wed, Nov 19, 2014 at 1:04 AM, Doss itsmed...@gmail.com wrote:

I have a two-node Solr (4.9.0) cloud with Tomcat (8) and ZooKeeper. At times Solr on Node 1 stops responding. To fix the issue I restart Tomcat on Node 1, but Solr does not start up. If I remove the Solr cores on both nodes and restart, it starts working, but then I have to reindex all the data again. We are using this setup in production, and because of this issue we are having 1 hour to 1 hour 30 minutes of service downtime. Any suggestions would be greatly appreciated.

Thanks,
Doss.