Re: master/slave failure scenario
Indexing is usually much more expensive than replication, so it won't scale well as you add more servers. Also, what would a client do if it was able to send the update to only some of the servers because others were down (for maintenance, etc.)?

-Bryan

On May 21, 2009, at 6:04 AM, nk 11 wrote:

Just curious. What would be the disadvantages of a no-replication / multi-master (no slave) setup? The client code would have to do the updates for every master of course, but if one machine failed I could immediately continue the indexing process, and I could also query the index on any machine for a valid result. I might be missing something...

On Thu, May 14, 2009 at 4:19 PM, nk 11 nick.cass...@gmail.com wrote:

wow! that was just a couple of days old! thanks a lot!

2009/5/14 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com:

yeah, there is a hack: https://issues.apache.org/jira/browse/SOLR-1154?focusedCommentId=12708316&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12708316

On Thu, May 14, 2009 at 6:07 PM, nk 11 nick.cass...@gmail.com wrote:

sorry for the mail. I wanted to hit reply :(

On Thu, May 14, 2009 at 3:37 PM, nk 11 nick.cass...@gmail.com wrote:

oh, so the configuration must be manually changed? Can't something be passed at (re)start time?

2009/5/14 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com:

On Thu, May 14, 2009 at 4:07 PM, nk 11 nick.cass...@gmail.com wrote:

Ok, so the VIP will point to the new master. But what makes a slave promoted to a master? Only the fact that it will receive add/update requests? And I suppose that this hot promotion is possible only if the slave is configured as master also...

right. By default you can set up all slaves to be masters as well. It does not cost anything if the node is not serving any requests. So, if you have such a setting, you will have to disable that slave from being a slave, restart it, and make the VIP point to this new slave as master.

so hot promotion is still not possible.

2009/5/14 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com:

ideally, we don't do that. You can just keep the master host behind a VIP, so if you wish to change the master, make the VIP point to the new host.

On Wed, May 13, 2009 at 10:52 PM, nk 11 nick.cass.1...@gmail.com wrote:

This is more interesting. Such a procedure would involve taking down and reconfiguring the slave?

On Wed, May 13, 2009 at 7:55 PM, Bryan Talbot btal...@aeriagames.com wrote:

Or ...
1. Promote existing slave to new master.
2. Add new slave to cluster.

-Bryan

On May 13, 2009, at 9:48 AM, Jay Hill wrote:

- Migrate configuration files from old master (or backup) to new master.
- Replicate from a slave to the new master.
- Resume indexing to new master.

-Jay

On Wed, May 13, 2009 at 4:26 AM, nk 11 nick.cass...@gmail.com wrote:

Nice. What if the master fails permanently (like a disk crash...) and the new master is a clean machine?

2009/5/13 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com:

On Wed, May 13, 2009 at 12:10 PM, nk 11 nick.cass...@gmail.com wrote:

Hello. I'm kind of new to Solr and I've read about replication, and the fact that a node can act as both master and slave. If a replica fails and then comes back online, I suppose that it resyncs with the master.

right

But what happens if the master fails? A slave that is configured as master will kick in? What if that slave is not yet fully synced with the failed master and has old data?

if the master fails you can't index the data, but the slaves will continue serving the requests with the last index.
You can bring the master back up and resume indexing.

What happens when the original master comes back online? Will it remain a slave because there is another node with the master role? Thank you!

--
Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Howto? Applying a filter across schema fields using state information
I needed to do something like this recently as well. I needed to copy a date field (with full precision to the millisecond) to a string field of just MMDD. I didn't see a way to do it in Solr core, so I ended up doing it in the DataImportHandler during import. I'd rather have code like that in the core someplace in case documents are added via some other mechanism.

-Bryan

On May 18, 2009, at 1:44 AM, Yatir wrote:

Hi, I need to write a filter that extracts information from the content of one field (say the Body field) and then applies some transformation, based on this content, to a *different* field (say the Title field). Is this possible? Example: I will find certain keywords in the body and then locate them and transform them in the title.

--
View this message in context: http://www.nabble.com/Howto--Applying-a-filter-across-schema-fileds-using-state-information-tp23593424p23593424.html
Sent from the Solr - User mailing list archive at Nabble.com.
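For what it's worth, the DIH-level workaround can be a custom Transformer that copies a date column into a new string column. This is a sketch of the general approach rather than the code actually used in the thread; the class name, column names, and the yyyyMMdd target format are all assumptions.

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

// Referenced from data-config.xml with something like:
//   <entity transformer="com.example.DateToDayStringTransformer" ...>
public class DateToDayStringTransformer extends Transformer {
  @Override
  public Object transformRow(Map<String, Object> row, Context context) {
    Object created = row.get("created_at");  // hypothetical date column
    if (created instanceof Date) {
      // SimpleDateFormat is not thread-safe, so create one per call
      SimpleDateFormat day = new SimpleDateFormat("yyyyMMdd");
      // maps to a plain string field in the schema
      row.put("created_day", day.format((Date) created));
    }
    return row;
  }
}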
Re: replication of lucene-write.lock file
https://issues.apache.org/jira/browse/SOLR-1170

-Bryan

On May 15, 2009, at 12:24 AM, Noble Paul നോബിള് नोब्ळ् wrote:

The replication relies on the Lucene API to know what files are associated with an index version. If it returns the lock file, that is replicated too. I guess we must ignore the .lock file if it is returned in the list of files. You can raise an issue and we can fix it.

--Noble

On Fri, May 15, 2009 at 12:38 AM, Bryan Talbot btal...@aeriagames.com wrote:

When using Solr 1.4 replication, I see that the lucene-write.lock file is being replicated to slaves. I'm importing data from a db every 5 minutes using cron to trigger a DIH delta-import. Replication polls every 60 seconds and the master is configured to take a snapshot (replicateAfter) on commit. Why should the lock file be replicated to slaves? The lock file isn't stale on the master and is absent unless the delta-import is in progress. I've not tried it yet, but with the lock file replicated, it seems like promotion of a slave to a master in a failure recovery scenario requires the manual removal of the lock file.

-Bryan

--
Noble Paul | Principal Engineer | AOL | http://aol.com
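The eventual fix lives in SOLR-1170; purely to illustrate the idea Noble describes (this is not the actual patch, and the class and method names are made up), filtering the lock file out of the list of replicable files could look something like:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class LockFileFilter {
  // Drop Lucene lock files (e.g. lucene-write.lock) from the list of
  // index files before they are advertised to slaves for replication.
  public static List<String> withoutLockFiles(List<String> indexFiles) {
    List<String> replicable = new ArrayList<String>(indexFiles);
    for (Iterator<String> it = replicable.iterator(); it.hasNext();) {
      if (it.next().endsWith(".lock")) {
        it.remove();
      }
    }
    return replicable;
  }
}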
Re: Replication master+slave
https://issues.apache.org/jira/browse/SOLR-1167

-Bryan

On May 13, 2009, at 7:20 PM, Otis Gospodnetic wrote:

Bryan, maybe it's time to stick this in JIRA? http://wiki.apache.org/solr/HowToContribute

Thanks, Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message -
From: Bryan Talbot btal...@aeriagames.com
To: solr-user@lucene.apache.org
Sent: Wednesday, May 13, 2009 10:11:21 PM
Subject: Re: Replication master+slave

I think the patch I included earlier covers Solr core, but it looks like at least some other extensions (DIH) create and use their own XML parser. So, if this functionality is to extend to all XML files, those will need similar patches. Here's one for DIH:

--- src/main/java/org/apache/solr/handler/dataimport/DataImporter.java (revision 774137)
+++ src/main/java/org/apache/solr/handler/dataimport/DataImporter.java (working copy)
@@ -148,8 +148,10 @@
   void loadDataConfig(String configFile) {
     try {
-      DocumentBuilder builder = DocumentBuilderFactory.newInstance()
-          .newDocumentBuilder();
+      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
+      dbf.setNamespaceAware(true);
+      dbf.setXIncludeAware(true);
+      DocumentBuilder builder = dbf.newDocumentBuilder();
       Document document = builder.parse(new InputSource(new StringReader(
           configFile)));

The only downside I can see to this is it doesn't offer very expressive conditional inclusion: the file is included if it's present; otherwise fallback inclusions can be used. It's also specific to XML files and obviously won't work for other types of configuration files. However, it is simple and effective.

-Bryan

On May 13, 2009, at 6:36 PM, Otis Gospodnetic wrote:

Coincidentally, from http://www.cloudera.com/blog/2009/05/07/what%E2%80%99s-new-in-hadoop-core-020/ : Hadoop configuration files now support XInclude elements for including portions of another configuration file (HADOOP-4944). This mechanism allows you to make configuration files more modular and reusable. So others are doing it, too.

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message -
From: Bryan Talbot
To: solr-user@lucene.apache.org
Sent: Wednesday, May 13, 2009 11:26:41 AM
Subject: Re: Replication master+slave

I see that Noble's final comment in SOLR-1154 is that config files need to be able to include snippets from external files. In my limited testing, a simple patch to enable XInclude support seems to work.

--- src/java/org/apache/solr/core/Config.java (revision 774137)
+++ src/java/org/apache/solr/core/Config.java (working copy)
@@ -100,8 +100,10 @@
       if (lis == null) {
         lis = loader.openConfig(name);
       }
-      javax.xml.parsers.DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
-      doc = builder.parse(lis);
+      javax.xml.parsers.DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
+      dbf.setNamespaceAware(true);
+      dbf.setXIncludeAware(true);
+      doc = dbf.newDocumentBuilder().parse(lis);
       DOMUtil.substituteProperties(doc, loader.getCoreProperties());
     } catch (ParserConfigurationException e) {

This allows a clause like this to include the contents of replication.xml if it exists. If it's not found, an exception will be thrown.

<xi:include href="http://localhost:8983/solr/corename/admin/file/?file=replication.xml"
    xmlns:xi="http://www.w3.org/2001/XInclude">
</xi:include>

If the file is optional and no exception should be thrown if the file is missing, simply include a fallback action; in this case the fallback is empty and does nothing.

<xi:include href="http://localhost:8983/solr/forum_en/admin/file/?file=replication.xml"
    xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:fallback/>
</xi:include>

-Bryan

On May 12, 2009, at 8:05 PM, Jian Han Guo wrote:

I was looking at the same problem, and had a discussion with Noble. You can use a hack to achieve what you want, see https://issues.apache.org/jira/browse/SOLR-1154

Thanks, Jianhan

On Tue, May 12, 2009 at 5:13 PM, Bryan Talbot wrote:

So how are people managing solrconfig.xml files which are largely the same other than differences for replication? I don't think it's a good thing to maintain two copies of the same file, and I'd like to avoid that. Maybe enabling the XInclude feature in DocumentBuilders would make it possible to modularize configuration files to make this possible? http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/parsers/DocumentBuilderFactory.html#setXIncludeAware(boolean)

-Bryan

On May 12, 2009, at 11:43 AM, Shalin Shekhar Mangar wrote:

On Tue, May 12, 2009 at 10:42 PM, Bryan Talbot wrote:

For replication in 1.4, the wiki at http://wiki.apache.org/solr/SolrReplication says that a node can be both the master and a slave: A node can act as both master and slave. In that case both the master and slave configuration lists need to be present inside the ReplicationHandler requestHandler in the solrconfig.xml.
replication of lucene-write.lock file
When using Solr 1.4 replication, I see that the lucene-write.lock file is being replicated to slaves. I'm importing data from a db every 5 minutes using cron to trigger a DIH delta-import. Replication polls every 60 seconds and the master is configured to take a snapshot (replicateAfter) on commit.

Why should the lock file be replicated to slaves? The lock file isn't stale on the master and is absent unless the delta-import is in progress. I've not tried it yet, but with the lock file replicated, it seems like promotion of a slave to a master in a failure recovery scenario requires the manual removal of the lock file.

-Bryan
Re: Replication master+slave
I see that Noble's final comment in SOLR-1154 is that config files need to be able to include snippets from external files. In my limited testing, a simple patch to enable XInclude support seems to work.

--- src/java/org/apache/solr/core/Config.java (revision 774137)
+++ src/java/org/apache/solr/core/Config.java (working copy)
@@ -100,8 +100,10 @@
       if (lis == null) {
         lis = loader.openConfig(name);
       }
-      javax.xml.parsers.DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
-      doc = builder.parse(lis);
+      javax.xml.parsers.DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
+      dbf.setNamespaceAware(true);
+      dbf.setXIncludeAware(true);
+      doc = dbf.newDocumentBuilder().parse(lis);
       DOMUtil.substituteProperties(doc, loader.getCoreProperties());
     } catch (ParserConfigurationException e) {

This allows a clause like this to include the contents of replication.xml if it exists. If it's not found, an exception will be thrown.

<!-- include external file to define replication configuration -->
<xi:include href="http://localhost:8983/solr/corename/admin/file/?file=replication.xml"
    xmlns:xi="http://www.w3.org/2001/XInclude">
</xi:include>

If the file is optional and no exception should be thrown if the file is missing, simply include a fallback action; in this case the fallback is empty and does nothing.

<!-- include external file to define replication configuration -->
<xi:include href="http://localhost:8983/solr/forum_en/admin/file/?file=replication.xml"
    xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:fallback/>
</xi:include>

-Bryan

On May 12, 2009, at 8:05 PM, Jian Han Guo wrote:

I was looking at the same problem, and had a discussion with Noble. You can use a hack to achieve what you want, see https://issues.apache.org/jira/browse/SOLR-1154

Thanks, Jianhan

On Tue, May 12, 2009 at 5:13 PM, Bryan Talbot btal...@aeriagames.com wrote:

So how are people managing solrconfig.xml files which are largely the same other than differences for replication? I don't think it's a good thing to maintain two copies of the same file, and I'd like to avoid that. Maybe enabling the XInclude feature in DocumentBuilders would make it possible to modularize configuration files to make this possible? http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/parsers/DocumentBuilderFactory.html#setXIncludeAware(boolean)

-Bryan

On May 12, 2009, at 11:43 AM, Shalin Shekhar Mangar wrote:

On Tue, May 12, 2009 at 10:42 PM, Bryan Talbot btal...@aeriagames.com wrote:

For replication in 1.4, the wiki at http://wiki.apache.org/solr/SolrReplication says that a node can be both the master and a slave: A node can act as both master and slave. In that case both the master and slave configuration lists need to be present inside the ReplicationHandler requestHandler in the solrconfig.xml. What does this mean? Does the core then poll itself for updates?

No. This type of configuration is meant for repeaters. Suppose there are slaves in multiple data centers (say data center A and B). There is always a single master (say in A). One of the slaves in B is used as a master for the other slaves in B. Therefore, this one slave in B is both a master as well as a slave.

I'd like to have a single set of configuration files that are shared by masters and slaves and avoid duplicating configuration details in multiple files (one for master and one for slave) to ease management and failover. Is this possible?

You wouldn't want the master to be a slave, so I guess you'd need to have a separate file. Also, it needs to be a separate file so that the slave does not become a master when the solrconfig.xml is replicated.

When I attempt to set up a multi-server master-slave configuration and include both master and slave replication configuration options, I run into some problems. I'm running a nightly build from May 7.

Not sure what happened. Is that the URL for this Solr (meaning the same Solr URL is master and slave of itself)? If yes, that is not a valid configuration.

--
Regards, Shalin Shekhar Mangar.
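For reference, the repeater arrangement Shalin describes is configured by giving one node both sections of the ReplicationHandler. A sketch (hostnames and core names are assumptions) of what that repeater's solrconfig.xml entry might look like:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <!-- acts as a master for the local data center's slaves -->
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
  <!-- and as a slave of the single real master in data center A -->
  <lst name="slave">
    <!-- must point at a different host, never at this node itself -->
    <str name="masterUrl">http://master-dc-a:8983/solr/core01/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>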
Re: master/slave failure scenario
Or ...
1. Promote existing slave to new master.
2. Add new slave to cluster.

-Bryan

On May 13, 2009, at 9:48 AM, Jay Hill wrote:

- Migrate configuration files from old master (or backup) to new master.
- Replicate from a slave to the new master.
- Resume indexing to new master.

-Jay

On Wed, May 13, 2009 at 4:26 AM, nk 11 nick.cass...@gmail.com wrote:

Nice. What if the master fails permanently (like a disk crash...) and the new master is a clean machine?

2009/5/13 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com:

On Wed, May 13, 2009 at 12:10 PM, nk 11 nick.cass...@gmail.com wrote:

Hello. I'm kind of new to Solr and I've read about replication, and the fact that a node can act as both master and slave. If a replica fails and then comes back online, I suppose that it resyncs with the master.

right

But what happens if the master fails? A slave that is configured as master will kick in? What if that slave is not yet fully synced with the failed master and has old data?

if the master fails you can't index the data, but the slaves will continue serving the requests with the last index. You can bring the master back up and resume indexing.

What happens when the original master comes back online? Will it remain a slave because there is another node with the master role? Thank you!

--
Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Replication master+slave
I think the patch I included earlier covers Solr core, but it looks like at least some other extensions (DIH) create and use their own XML parser. So, if this functionality is to extend to all XML files, those will need similar patches. Here's one for DIH:

--- src/main/java/org/apache/solr/handler/dataimport/DataImporter.java (revision 774137)
+++ src/main/java/org/apache/solr/handler/dataimport/DataImporter.java (working copy)
@@ -148,8 +148,10 @@
   void loadDataConfig(String configFile) {
     try {
-      DocumentBuilder builder = DocumentBuilderFactory.newInstance()
-          .newDocumentBuilder();
+      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
+      dbf.setNamespaceAware(true);
+      dbf.setXIncludeAware(true);
+      DocumentBuilder builder = dbf.newDocumentBuilder();
       Document document = builder.parse(new InputSource(new StringReader(
           configFile)));

The only downside I can see to this is it doesn't offer very expressive conditional inclusion: the file is included if it's present; otherwise fallback inclusions can be used. It's also specific to XML files and obviously won't work for other types of configuration files. However, it is simple and effective.

-Bryan

On May 13, 2009, at 6:36 PM, Otis Gospodnetic wrote:

Coincidentally, from http://www.cloudera.com/blog/2009/05/07/what%E2%80%99s-new-in-hadoop-core-020/ : Hadoop configuration files now support XInclude elements for including portions of another configuration file (HADOOP-4944). This mechanism allows you to make configuration files more modular and reusable. So others are doing it, too.

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message -
From: Bryan Talbot btal...@aeriagames.com
To: solr-user@lucene.apache.org
Sent: Wednesday, May 13, 2009 11:26:41 AM
Subject: Re: Replication master+slave

I see that Noble's final comment in SOLR-1154 is that config files need to be able to include snippets from external files. In my limited testing, a simple patch to enable XInclude support seems to work.

--- src/java/org/apache/solr/core/Config.java (revision 774137)
+++ src/java/org/apache/solr/core/Config.java (working copy)
@@ -100,8 +100,10 @@
       if (lis == null) {
         lis = loader.openConfig(name);
       }
-      javax.xml.parsers.DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
-      doc = builder.parse(lis);
+      javax.xml.parsers.DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
+      dbf.setNamespaceAware(true);
+      dbf.setXIncludeAware(true);
+      doc = dbf.newDocumentBuilder().parse(lis);
       DOMUtil.substituteProperties(doc, loader.getCoreProperties());
     } catch (ParserConfigurationException e) {

This allows a clause like this to include the contents of replication.xml if it exists. If it's not found, an exception will be thrown.

<xi:include href="http://localhost:8983/solr/corename/admin/file/?file=replication.xml"
    xmlns:xi="http://www.w3.org/2001/XInclude">
</xi:include>

If the file is optional and no exception should be thrown if the file is missing, simply include a fallback action; in this case the fallback is empty and does nothing.

<xi:include href="http://localhost:8983/solr/forum_en/admin/file/?file=replication.xml"
    xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:fallback/>
</xi:include>

-Bryan

On May 12, 2009, at 8:05 PM, Jian Han Guo wrote:

I was looking at the same problem, and had a discussion with Noble. You can use a hack to achieve what you want, see https://issues.apache.org/jira/browse/SOLR-1154

Thanks, Jianhan

On Tue, May 12, 2009 at 5:13 PM, Bryan Talbot wrote:

So how are people managing solrconfig.xml files which are largely the same other than differences for replication? I don't think it's a good thing to maintain two copies of the same file, and I'd like to avoid that. Maybe enabling the XInclude feature in DocumentBuilders would make it possible to modularize configuration files to make this possible? http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/parsers/DocumentBuilderFactory.html#setXIncludeAware(boolean)

-Bryan

On May 12, 2009, at 11:43 AM, Shalin Shekhar Mangar wrote:

On Tue, May 12, 2009 at 10:42 PM, Bryan Talbot wrote:

For replication in 1.4, the wiki at http://wiki.apache.org/solr/SolrReplication says that a node can be both the master and a slave: A node can act as both master and slave. In that case both the master and slave configuration lists need to be present inside the ReplicationHandler requestHandler in the solrconfig.xml. What does this mean? Does the core then poll itself for updates?

No. This type of configuration is meant for repeaters. Suppose there are slaves in multiple data centers (say data center A and B). There is always a single master (say in A). One of the slaves in B is used as a master for the other slaves in B.
Replication master+slave
For replication in 1.4, the wiki at http://wiki.apache.org/solr/SolrReplication says that a node can be both the master and a slave: A node can act as both master and slave. In that case both the master and slave configuration lists need to be present inside the ReplicationHandler requestHandler in the solrconfig.xml.

What does this mean? Does the core then poll itself for updates?

I'd like to have a single set of configuration files that are shared by masters and slaves and avoid duplicating configuration details in multiple files (one for master and one for slave) to ease management and failover. Is this possible?

When I attempt to set up a multi-server master-slave configuration and include both master and slave replication configuration options, I run into some problems. I'm running a nightly build from May 7.

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
  <lst name="slave">
    <str name="masterUrl">http://master_core01:8983/solr/core01/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>

When the replication admin page (http://master_core01:8983/solr/core01/admin/replication/index.jsp) is visited, the severe error shown below appears in the Solr log. The server is otherwise idle, so there is no reason all threads should be busy unless the replication code is getting itself into a loop. What's the right way to do this?

May 11, 2009 8:01:22 PM org.apache.tomcat.util.threads.ThreadPool logFull
SEVERE: All threads (150) are currently busy, waiting. Increase maxThreads (150) or check the servlet status

May 11, 2009 8:01:41 PM org.apache.solr.handler.ReplicationHandler getReplicationDetails
WARNING: Exception while invoking a 'details' method on master
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
    at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
    at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
    at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
    at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
    at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
    at org.apache.solr.handler.SnapPuller.getNamedListResponse(SnapPuller.java:183)
    at org.apache.solr.handler.SnapPuller.getCommandResponse(SnapPuller.java:178)
    at org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:555)
    at org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:147)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330)
    at org.apache.jsp.admin.replication.index_jsp.executeCommand(index_jsp.java:34)
    at org.apache.jsp.admin.replication.index_jsp._jspService(index_jsp.java:208)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:98)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
    at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:331)
    at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:329)
    at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:265)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:269)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
    at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:679)
    at org.apache.catalina
Re: Garbage Collectors
If you're using Java 5 or 6, jmap is a useful tool in tracking down memory leaks: http://java.sun.com/javase/6/docs/technotes/tools/share/jmap.html

jmap -histo:live <pid> will print a histogram of all live objects in the heap. Start at the top and work your way down until you find something suspicious -- the trick is in knowing what is suspicious, of course.

-Bryan

On Apr 16, 2009, at 3:40 PM, David Baker wrote:

Otis Gospodnetic wrote: Personally, I'd start from scratch: -Xmx -Xms... -server is not even needed any more. If you are not using Java 1.6, I suggest you do. Next, I'd try to investigate why objects are not being cleaned up - this should not be happening in the first place. Is Solr the only webapp running?

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message -
From: David Baker dav...@mate1inc.com
To: solr-user@lucene.apache.org
Sent: Thursday, April 16, 2009 3:33:18 PM
Subject: Garbage Collectors

I have an issue with garbage collection on our Solr servers. We have an issue where the old generation never gets cleaned up on one of our servers. This server has a little over 2 million records which are updated every hour or so. I have tried the parallel GC and the concurrent GC. The parallel seems more stable for us, but both end up running out of memory. I have increased the memory allocated to the servers, but this just seems to delay the problem. My question is, what are the suggested options for using the parallel GC? Currently we are using something of this nature:

-server -Xmx4096m -Xms512m -XX:+UseAdaptiveSizePolicy -XX:+UseParallelOldGC -XX:GCTimeRatio=19 -XX:NewSize=128m -XX:SurvivorRatio=2 -Dsolr.solr.home=/usr/local/solr-tomcat-fi/solr

I am new to Solr and GC tuning, so any advice is appreciated.

Thanks for the reply. Yes, Solr is the only app running under this Tomcat server. I will remove -server and the other options except the heap allocation options and see how it performs. Any suggestions on how to go about finding out why objects are not being cleaned up if these changes don't work?
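A leak usually shows up as a class whose live-instance count only ever grows. One way to make that visible (a sketch; the paths and the five-minute wait are arbitrary choices) is to capture two histograms some time apart and compare them:

jmap -histo:live <pid> > /tmp/histo-1.txt
sleep 300   # let the app do some work, ideally across a few full GCs
jmap -histo:live <pid> > /tmp/histo-2.txt
# classes whose counts keep climbing between snapshots are leak suspects
diff /tmp/histo-1.txt /tmp/histo-2.txt | less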
Re: DataImporter : Java heap space
I think there is a bug in the 1.4 daily builds of the data import handler which is causing the batchSize parameter to be ignored. This was probably introduced with more recent patches to resolve variables. The affected code is in JdbcDataSource.java:

String bsz = initProps.getProperty("batchSize");
if (bsz != null) {
  bsz = (String) context.getVariableResolver().resolve(bsz);
  try {
    batchSize = Integer.parseInt(bsz);
    if (batchSize == -1)
      batchSize = Integer.MIN_VALUE;
  } catch (NumberFormatException e) {
    LOG.warn("Invalid batch size: " + bsz);
  }
}

The call to context.getVariableResolver().resolve(bsz) is returning null, leading to a NumberFormatException and the batchSize never being set to Integer.MIN_VALUE. MySQL won't use streaming result sets in this case, which can lead to the OOM we're seeing. If your log file contains an entry like mine does, you're being affected by this bug too:

Apr 15, 2009 1:21:58 PM org.apache.solr.handler.dataimport.JdbcDataSource init
WARNING: Invalid batch size: null

-Bryan

On Apr 13, 2009, at 11:48 PM, Noble Paul നോബിള് नोब्ळ् wrote:

DIH streams 1 row at a time. DIH is just a component in Solr; Solr indexing also takes a lot of memory.

On Tue, Apr 14, 2009 at 12:02 PM, Mani Kumar manikumarchau...@gmail.com wrote:

Yes, it's throwing the same OOM error and from the same place... yes, I will try increasing the size... Just curious: how does this dataimport work? Does it load the whole table into memory? Is there any estimate of how much memory it needs to create an index for 1GB of data?

thx mani

On Tue, Apr 14, 2009 at 11:48 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

On Tue, Apr 14, 2009 at 11:36 AM, Mani Kumar manikumarchau...@gmail.com wrote:

Hi Shalin: yes, I tried with the batchSize=-1 parameter as well. Here is the config I tried with:

<dataConfig>
  <dataSource type="JdbcDataSource" batchSize="-1" name="sp"
      driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/mydb_development"
      user="root" password="**" />
  ...
</dataConfig>

I hope I have used the batchSize parameter in the right place.

Yes, that is correct. Did it still throw OOM from the same place? I'd suggest you increase the heap and see what works for you. Also try -server on the JVM.

--
Regards, Shalin Shekhar Mangar.

--
--Noble Paul
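Assuming the diagnosis above is right, one possible fix (a sketch, not an official patch) is to fall back to the literal property value when the variable resolver returns null:

String bsz = initProps.getProperty("batchSize");
if (bsz != null) {
  // resolve() handles values like "${dataimporter.request.batchSize}";
  // a plain literal such as "-1" may come back null, so keep the original
  Object resolved = context.getVariableResolver().resolve(bsz);
  if (resolved != null) {
    bsz = resolved.toString();
  }
  try {
    batchSize = Integer.parseInt(bsz);
    if (batchSize == -1)
      batchSize = Integer.MIN_VALUE; // tells MySQL Connector/J to stream rows
  } catch (NumberFormatException e) {
    LOG.warn("Invalid batch size: " + bsz);
  }
}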
Re: Changing to multicore
I would think that using a servlet filter to rewrite the URL should be pretty straightforward. You could write your own or use a tool like http://tuckey.org/urlrewrite/ and just configure that. Using something like this, I think the upgrade procedure could be:

- install a rewrite filter to rewrite the multi-core URL to the non-multi-core URL for all Solr instances
- upgrade the app to use the multi-core URL
- upgrade the Solr instances to multi-core when convenient and remove the rewrite filter

-Bryan

On Jan 28, 2009, at 7:17 AM, Jeff Newburn wrote:

We are moving from single core to multicore. We have a few servers that we want to migrate one at a time to ensure that each one functions. This process is proving difficult as there is no default core to allow the application to talk to the Solr servers uniformly (i.e. without a core name during conversion). Would it be possible to re-add the default core as a configuration setting in solr.xml to allow for a smoother conversion? Am I missing a setting that would help with this process?

-Jeff
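With the Tuckey UrlRewriteFilter, the first step might look something like the rule below in WEB-INF/urlrewrite.xml. The core name core0 is an assumption; the paths are relative to the Solr webapp context, so a multi-core style request to /solr/core0/select is forwarded to the old single-core /solr/select until that instance is upgraded:

<urlrewrite>
  <rule>
    <!-- strip the core name so a multi-core style URL like
         /solr/core0/select hits the old single-core /solr/select -->
    <from>^/core0/(.*)$</from>
    <to>/$1</to>
  </rule>
</urlrewrite>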
Re: Help with Solr 1.3 lockups?
I think it's pretty easy to check if Solr is alive. Even from a shell script, a simple command like

curl -iIs --url "http://solrhost/solr/select?start=0&rows=0" | grep -c "HTTP/1.1 200 OK"

will return 1 if the response is an HTTP 200. If the return is not 1, then there is a problem. A load balancer or other tool can probably internalize the check and not need to fork processes like a shell script would, but the check can be the same. This simply requests an HTTP HEAD (doesn't return any content) for a fast-executing query. In this case, the query with no q= specified seems to default to *:* when using dismax, which is my default handler.

-Bryan

On Jan 15, 2009, at 2:13 PM, Stephen Weiss wrote:

I've been wondering about this one myself - most of the services we have installed work this way: if they crash out for whatever reason, they restart automatically (Apache, MySQL, even the OS itself). Failures are detected and corrected by the load balancers and also in some cases by the machine itself (like with kernel panics). But not Solr, and I'm not quite sure what to do to get it there. We use Jetty, but it's the same story. It's not like it fails out all that often, but when it does it will still respond to HTTP requests (because Jetty itself is still working), which makes it a lot harder to detect a failure.

I've tried writing something for Nagios, but the problem is that most responses Solr would give to a request vary depending on index updates, so it's not like I can just take a checksum and compare it - and even then, it would only really alert us to the problem; we'd still have to go in and restart everything (personally, I don't enjoy restarting servers from my BlackBerry nearly as much as I should). I'd have to come up with something that can intelligently interpret the response and decide if the server's still working properly or not, and the processing time on that alone might make it too inefficient to run every few seconds, but at least with that we'd be able to tell the cluster "don't send anything to this server for now". Is there some really obvious way to track if a particular servlet is still running properly (in either Tomcat or Jetty, because if Tomcat has this I'd switch) and restart the container if it's not?

Thanks!!

-- Steve

On Jan 15, 2009, at 1:57 PM, Jerome L Quinn wrote:

An even bigger problem is the fact that once Solr is wedged, it stays that way until a human notices and restarts things. The Tomcat stays running and there's no automatic detection that will either restart Solr or restart the Tomcat container. Any suggestions on either front?

Thanks, Jerry Quinn
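Building on Bryan's check, a cron-driven watchdog along these lines (the hostname, init script, and schedule are all assumptions) could cover the restart-it-automatically part Stephen and Jerry are asking about:

#!/bin/sh
# Hypothetical watchdog sketch: restart the container when the Solr
# health check fails. Run from cron every minute or two.
OK=$(curl -iIs --url "http://solrhost/solr/select?start=0&rows=0" \
     | grep -c "HTTP/1.1 200 OK")
if [ "$OK" -ne 1 ]; then
  logger "solr-watchdog: health check failed, restarting tomcat"
  /etc/init.d/tomcat restart
fi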
Re: Solr - DataImportHandler - Large Dataset results ?
It only supports streaming if properly enabled, which is completely lame: http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html

By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate, and due to the design of the MySQL network protocol it is easier to implement. If you are working with ResultSets that have a large number of rows or large values, and can not allocate heap space in your JVM for the memory required, you can tell the driver to stream the results back one row at a time. To enable this functionality, you need to create a Statement instance in the following manner:

stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
    java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);

The combination of a forward-only, read-only result set, with a fetch size of Integer.MIN_VALUE serves as a signal to the driver to stream result sets row-by-row. After this, any result sets created with the statement will be retrieved row-by-row.

-Bryan

On Dec 12, 2008, at 2:15 PM, Kay Kay wrote:

I am using MySQL. I believe MySQL (since version 5) supports streaming. More about streaming - can we assume that when the database driver supports streaming, the result set iterator is a forward-only iterator? If, say, the streaming size is 10K records and we are trying to retrieve a total of 100K records - what exactly happens when the threshold is reached (say, the first 10K records were retrieved)? Are the previous set of records thrown away and replaced in memory by the new batch of records?

--- On Fri, 12/12/08, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

From: Shalin Shekhar Mangar shalinman...@gmail.com
Subject: Re: Solr - DataImportHandler - Large Dataset results ?
To: solr-user@lucene.apache.org
Date: Friday, December 12, 2008, 9:41 PM

DataImportHandler is designed to stream rows one by one to create Solr documents. As long as your database driver supports streaming, you should be fine. Which database are you using?

On Sat, Dec 13, 2008 at 2:20 AM, Kay Kay kaykay.uni...@yahoo.com wrote:

As per the example in the wiki - http://wiki.apache.org/solr/DataImportHandler - I am seeing the following fragment:

<dataConfig>
  <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:/temp/example/ex" user="sa" />
  <document name="products">
    <entity name="item" query="select * from item">
      <field column="ID" name="id" />
      <field column="NAME" name="name" />
      ..
    </entity>
  </document>
</dataConfig>

My scaled-down application looks very similar along these lines, but my result set is so big that it cannot fit within main memory by any chance. So I was planning to split this single query into multiple subqueries - with another conditional based on the id (id > 0 and id < 100, say). I am curious if there is any way to specify another conditional clause (<splitData column="id" batch="1" />, say, where the column is supposed to be an integer value) - and internally, the implementation could actually generate the subqueries:

i) get the min and max of the numeric column, and send queries to the database based on the batch size
ii) add documents for each batch and close the result set

This might end up putting more load on the database (but at least the dataset would fit in main memory). Let me know if anyone else has run into similar issues and how this was handled.

--
Regards, Shalin Shekhar Mangar.
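To make the Connector/J recipe above concrete, here is a minimal, self-contained sketch (the connection URL, credentials, and query are placeholders) of iterating a large table row by row with streaming enabled:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StreamingQuery {
  public static void main(String[] args) throws SQLException {
    Connection conn = DriverManager.getConnection(
        "jdbc:mysql://localhost/mydb", "user", "password");
    // Forward-only + read-only + fetchSize=Integer.MIN_VALUE is the
    // Connector/J signal to stream rows instead of buffering them all.
    Statement stmt = conn.createStatement(
        ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
    stmt.setFetchSize(Integer.MIN_VALUE);
    ResultSet rs = stmt.executeQuery("SELECT id, name FROM item");
    try {
      while (rs.next()) {
        // each row arrives from the server as needed; rows already read
        // are discarded, so the whole result never has to fit in the heap
        System.out.println(rs.getLong("id") + " " + rs.getString("name"));
      }
    } finally {
      rs.close();
      stmt.close();
      conn.close();
    }
  }
}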