Re: Opinions wanted about a new Solr logo (SOLR-58)
I like the version without the 'swoosh'. Simplicity is king in my book. -S On 12/18/06, Bertrand Delacretaz <[EMAIL PROTECTED]> wrote: On 12/18/06, Linda Tan <[EMAIL PROTECTED]> wrote: > I just learned no attachments are allowed on this list. I've put the > image in the JIRA. Thanks, it looks good indeed! -Bertrand
Re: Top Searches
That's a great idea, thanks Yonik. -Sangraal On 12/11/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: On 12/11/06, sangraal aiken <[EMAIL PROTECTED]> wrote: > I'm looking into creating something to track the top 10 - 20 searches that > run through Solr for a given period. For offline processing, using log files is the simplest thing... the code remains separated, you can do historical processing if you keep the logs, and it doesn't affect live queries. It depends on how fresh the info needs to be and how it will be used. -Yonik
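Yonik's offline log-processing suggestion could be sketched roughly like this; the access-log line format and the `q=` parameter extraction below are assumptions for illustration, not Solr's actual log layout:

```java
import java.util.*;
import java.util.regex.*;
import java.util.stream.*;

public class TopQueries {
    // Extracts the q= parameter from each log line and returns the top-N
    // queries by count, most frequent first.
    static List<Map.Entry<String, Long>> topN(List<String> logLines, int n) {
        Pattern q = Pattern.compile("q=([^&\\s]+)");
        Map<String, Long> counts = new HashMap<>();
        for (String line : logLines) {
            Matcher m = q.matcher(line);
            if (m.find()) counts.merge(m.group(1), 1L, Long::sum);
        }
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "GET /select?q=solr&rows=10", "GET /select?q=lucene", "GET /select?q=solr");
        System.out.println(topN(lines, 2)); // "solr" counted twice, "lucene" once
    }
}
```

Running this over a day's (or hour's) worth of rotated logs gives per-period top searches without touching the live query path, which is the advantage Yonik describes.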
Top Searches
I'm looking into creating something to track the top 10 - 20 searches that run through Solr for a given period. I could just create a counter object with an internal TreeMap or something that just keeps count of the various terms, but it could grow very large very fast and I'm not yet sure what implications this would have on memory usage. Also, storing it in memory means it would be wiped out during a restart, so it's not ideal. Other ideas I had were storing them in a database table, or in a separate Solr instance. Each method has its own advantages and drawbacks. Has anyone looked into or had any experience doing something like this? Any info or advice would be appreciated. -Sangraal A.
Re: Replacing a nightly build
That would be helpful, as long as it's conveyed clearly. -S On 11/7/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: On 11/7/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: > comparing the CHANGES.txt files of any two nightly builds is the best way > to see what things have changed that you might want to test before > deploying the new version to a "production" environment. I'm just wondering if we might need to highlight backward incompatible changes somehow (a separate section in CHANGES.txt, or an all caps keyword that stands out). -Yonik
Re: Default XML Output Schema
Thanks for the great explanation Yonik, I passed it on to my colleagues for reference... I knew there was a good reason. -Sangraal On 9/21/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: On 9/21/06, sangraal aiken <[EMAIL PROTECTED]> wrote: > Perhaps a silly question, but I'm wondering if anyone can tell me why Solr > outputs XML like this: During the initial development of Solr (2004), I remember throwing up both options, and most developers preferred to have a limited number of well defined tags. It allows you to have rather arbitrary field names, which you couldn't have if you used the field name as the tag. It also allows consistency with custom data. For example, here is the representation of an array of integers: <arr><int>1</int><int>2</int></arr> If field names were used as tags, we would have to either make up a dummy-name, or we wouldn't be able to use the same style. > > 201038 > 31 > 2006-09-15T21:36:39.000Z > > > rather than like this: > > > 201038 > 31 > 2006-09-15T21:36:39.000Z > > > A front-end PHP developer I know is having trouble parsing the default Solr > output because of that format and mentioned it would be much easier in the > former format... so I was curious if there was a reason it is the way it is. There are a number of options for you. You could write your own QueryResponseWriter to output XML just as you like it, or use an XSLT stylesheet in conjunction with http://issues.apache.org/jira/browse/SOLR-49 or use another format such as JSON. -Yonik
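Yonik's point about a fixed tag set shows up directly in client code: one generic XPath handles any field name, because the name travels in an attribute rather than in the tag. A rough sketch (the field names "id", "count", and "timestamp" in the sample XML are made up for the demo, not actual schema fields from the thread):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import java.io.ByteArrayInputStream;
import org.w3c.dom.Document;

public class ParseSolrDoc {
    // Pulls one field out of a <doc> in Solr's default style: a small fixed
    // tag set (<int>, <str>, <date>, ...) with the field name carried in the
    // name attribute, so one generic XPath works for any field name.
    static String field(String xml, String name) throws Exception {
        Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        XPath xp = XPathFactory.newInstance().newXPath();
        return xp.evaluate("/doc/*[@name='" + name + "']", d);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc>"
                + "<int name=\"id\">201038</int>"
                + "<int name=\"count\">31</int>"
                + "<date name=\"timestamp\">2006-09-15T21:36:39.000Z</date>"
                + "</doc>";
        System.out.println(field(xml, "id")); // prints 201038
    }
}
```

With field-names-as-tags, the equivalent client would need a new XPath (or new generated bindings) for every schema change.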
Default XML Output Schema
Perhaps a silly question, but I'm wondering if anyone can tell me why Solr outputs XML like this: 201038 31 2006-09-15T21:36:39.000Z rather than like this: 201038 31 2006-09-15T21:36:39.000Z A front-end PHP developer I know is having trouble parsing the default Solr output because of that format and mentioned it would be much easier in the former format... so I was curious if there was a reason it is the way it is. -Sangraal
Re: Doc add limit, im experiencing it too
I sent out an email about this a while back, but basically this limit appears only on Tomcat and only when Solr attempts to write to the response. You can work around it by splitting up your posts so that you're posting fewer than 5000 (or whatever your limit seems to be) at a time. You DO NOT have to commit after each post. I recently indexed a 38 million document database with this problem and although it took about 8-9 hours it did work... I only committed every 100,000 or so. -Sangraal On 9/6/06, Michael Imbeault <[EMAIL PROTECTED]> wrote: Old issue (see http://www.mail-archive.com/solr-user@lucene.apache.org/msg00651.html), but I'm experiencing the exact same thing on Windows XP, latest Tomcat. I noticed that the Tomcat process gobbles memory (10 megs a second maybe) and then jams at 125 megs. Can't find a fix yet. I'm using a PHP interface and curl to post my XML, one document at a time, and commit every 100 documents. Indexing 3 docs, it hangs at maybe 5000. Anyone got an idea on this one? It would be helpful. I may try to switch to Jetty tomorrow if nothing works :( -- Michael Imbeault CHUL Research Center (CHUQ) 2705 boul. Laurier Ste-Foy, QC, Canada, G1V 4G2 Tel: (418) 654-2705, Fax: (418) 654-2212
Re: Double Solr Installation on Single Tomcat (or Double Index)
I've set up 2 separate Solr indexes on one Tomcat instance. I basically created two separate Solr webapps. I have one webapp that is the client to both Solr instances as well. So the whole setup is 3 webapps. I have one set of Solr source classes and an ant task to build a jar file and copy it into the lib directory of both Solr webapps. This way if you customize your Solr installs you only have to do it once. Each Solr webapp obviously needs its own solr config and data directories, which is configurable through solrConfig. Both indexes are completely separate and configurable independently through these config files. If you need more detail let me know, I'll try to help you out. -S On 9/6/06, Tom Weber <[EMAIL PROTECTED]> wrote: Hello, I need to have a second separate index (separate data) on the same server. Is there a possibility to do this in a single Solr install on a Tomcat server, or do I need to have a second instance in the same Tomcat install? If either one is possible, does somebody have some advice on how to set this up, and how to be sure that both indexes do not interact? Many thanks for any help, Best Greetings, Tom
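For reference, one way this two-webapp layout is commonly wired up on Tomcat is a per-context JNDI entry pointing each webapp at its own Solr home. Whether a given Solr build actually reads solr/home via JNDI depends on its version (the poster above instead pointed each webapp's solrconfig at its own data directory), and all paths and names below are hypothetical:

```xml
<!-- conf/Catalina/localhost/solr1.xml  (a second file, solr2.xml, would
     point at its own war and its own home directory) -->
<Context docBase="/opt/webapps/solr1.war">
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/home1" override="true"/>
</Context>
```

Keeping the two home directories (conf/ and data/) fully disjoint is what guarantees the indexes cannot interact.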
Add doc limit - Follow Up
Hey guys, You might remember a bunch of emails going back and forth between me and the very helpful Solr folks a few weeks back. I just wanted to let you know what I've learned about the problem in the last week or so. The problem was that I would run into a hard limit on how many documents I could add to Solr in a single post on Tomcat. It was usually around 5000 - 6000 documents per add before the system would hang indefinitely. I tried the Java Solr client and the problem was exactly the same. I'm still unsure of what exactly causes the problem... but I have now been able to work around it. The problem only occurs when adding docs that contain CDATA tags in the body of a field tag. The problem also only seems to cause an add limit on an individual post. I limited the size of my HTTP posts to 5000 documents per post, and the problem never showed up. You do not need to do a commit after each batch as I previously thought. So, like I said, I'm still unsure of what causes this problem. It does seem to only happen on Tomcat. I've verified that the doc limit does not show up when running on Jetty. It seems to be some sort of a problem when Solr attempts to write to the response, but doesn't seem to be an issue with Solr itself. Again, it only occurs if you have CDATA tags in your XML as well. Strange one indeed, but I hope if any of you run into this problem this will help you out. -Sangraal
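The batching workaround described above can be sketched as follows. `BATCH_SIZE` and the document XML are illustrative, and the actual HTTP posting (one POST per returned `<add>` string, one commit at the very end) is left out:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BatchedPoster {
    static final int BATCH_SIZE = 5000; // stay under the observed Tomcat limit

    // Splits the documents into <add> posts of at most BATCH_SIZE docs each.
    // No commit is needed between posts; commit once after the last batch.
    static List<String> buildPosts(List<String> docXml) {
        List<String> posts = new ArrayList<>();
        for (int i = 0; i < docXml.size(); i += BATCH_SIZE) {
            List<String> batch = docXml.subList(i, Math.min(i + BATCH_SIZE, docXml.size()));
            posts.add("<add>" + String.join("", batch) + "</add>");
        }
        return posts;
    }

    public static void main(String[] args) {
        List<String> docs = Collections.nCopies(12000, "<doc><field name=\"id\">x</field></doc>");
        System.out.println(buildPosts(docs).size()); // 12000 docs -> 3 posts
    }
}
```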
Re: Doc add limit
Just an update, I changed my doUpdate method to use the HTTPClient API and have the exact same problem... the update hangs at exactly the same point... 6,144.

private String doUpdate(String sw) {
  StringBuffer updateResult = new StringBuffer();
  try {
    // open connection
    log.info("Connecting to and preparing to post to SolrUpdate servlet.");
    URL url = new URL("http://localhost:8080/update");
    HTTPConnection con = new HTTPConnection(url);
    HTTPResponse resp = con.Post(url.getFile(), sw);
    if (resp.getStatusCode() >= 300) {
      System.err.println("Received Error: " + resp.getReasonLine());
      System.err.println(resp.getText());
    } else {
      updateResult.append(new String(resp.getData()));
    }
    log.info("Done updating Solr for site " + updateSite);
  } catch (IOException ioe) {
    System.err.println(ioe.toString());
  } catch (ModuleException me) {
    System.err.println("Error handling request: " + me.getMessage());
  } catch (Exception e) {
    System.err.println("Unknown Error: " + e.getMessage());
  }
  return updateResult.toString();
}

-S On 7/31/06, sangraal aiken <[EMAIL PROTECTED]> wrote: Very interesting... thanks Thom. I haven't given HttpClient a shot yet, but will be soon. -S On 7/31/06, Thom Nelson <[EMAIL PROTECTED]> wrote: > > I had a similar problem and was able to fix it in Solr by manually > buffering the responses to a StringWriter before sending it to Tomcat. > Essentially, Tomcat's buffer will only hold so much and at that point > it blocks (thus it always hangs at a constant number of documents). > However, a better solution (to be implemented) is to use more > intelligent code on the client to read the response at the same time > that it is sending input -- not too difficult to do, though best to do > with two threads (i.e. fire off a thread to read the response before > you send any data). Seeing as the HttpClient code probably does this > already, I'll most likely end up using that. > > On 7/31/06, sangraal aiken <[EMAIL PROTECTED]> wrote: > > Those are some great ideas Chris... I'm going to try some of them out. I'll > > post the results when I get a chance to do more testing. Thanks. > > > > At this point I can work around the problem by ignoring Solr's response but > > this is obviously not ideal. I would feel better knowing what is causing the > > issue as well. > > > > -Sangraal > > > > On 7/29/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: > > > > > > : Sure, the method that does all the work updating Solr is the > > > doUpdate(String > > > : s) method in the GanjaUpdate class I'm pasting below. It's hanging when > > > I > > > : try to read the response... the last output I receive in my log is Got > > > : Reader... > > > > > > I don't have the means to try out this code right now ... but I can't see > > > any obvious problems with it (there may be somewhere that you are opening > > > a stream or reader and not closing it, but I didn't see one) ... I notice > > > you are running this client on the same machine as Solr (hence the > > > localhost URLs) did you by any chance try running the client on a separate > > > machine to see if the number of updates before it hangs changes? > > > > > > my money is still on a filehandle resource limit somewhere ... if you are > > > running on a system that has "lsof" (on some Unix/Linux installations you > > > need sudo/su root permissions to run it) you can use "lsof -p <pid>" to > > > look up what files/network connections are open for a given process. You > > > can try running that on both the client pid and the Solr server pid once > > > it's hung -- You'll probably see a lot of Jar files in use for both, but > > > if you see more than a few XML files open by the client, or more than > > > 1 TCP connection open by either the client or the server, there's your > > > culprit. > > > > > > I'm not sure what Windows equivalent of lsof may exist. > > > > > > Wait ... I just had another thought > > > > > > You are using InputStreamReader to deal with the InputStreams of your > > > remote XML files -- but you aren't specifying a charset, so it's using > > > your system default which may be different from the charset of the > > > original XML files you a
Re: Doc add limit
Very interesting... thanks Thom. I haven't given HttpClient a shot yet, but will be soon. -S On 7/31/06, Thom Nelson <[EMAIL PROTECTED]> wrote: I had a similar problem and was able to fix it in Solr by manually buffering the responses to a StringWriter before sending it to Tomcat. Essentially, Tomcat's buffer will only hold so much and at that point it blocks (thus it always hangs at a constant number of documents). However, a better solution (to be implemented) is to use more intelligent code on the client to read the response at the same time that it is sending input -- not too difficult to do, though best to do with two threads (i.e. fire off a thread to read the response before you send any data). Seeing as the HttpClient code probably does this already, I'll most likely end up using that. On 7/31/06, sangraal aiken <[EMAIL PROTECTED]> wrote: > Those are some great ideas Chris... I'm going to try some of them out. I'll > post the results when I get a chance to do more testing. Thanks. > > At this point I can work around the problem by ignoring Solr's response but > this is obviously not ideal. I would feel better knowing what is causing the > issue as well. > > -Sangraal > > On 7/29/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: > > > > : Sure, the method that does all the work updating Solr is the > > doUpdate(String > > : s) method in the GanjaUpdate class I'm pasting below. It's hanging when > > I > > : try to read the response... the last output I receive in my log is Got > > : Reader... > > > > I don't have the means to try out this code right now ... but I can't see > > any obvious problems with it (there may be somewhere that you are opening > > a stream or reader and not closing it, but I didn't see one) ... I notice > > you are running this client on the same machine as Solr (hence the > > localhost URLs) did you by any chance try running the client on a separate > > machine to see if the number of updates before it hangs changes? > > > > my money is still on a filehandle resource limit somewhere ... if you are > > running on a system that has "lsof" (on some Unix/Linux installations you > > need sudo/su root permissions to run it) you can use "lsof -p <pid>" to > > look up what files/network connections are open for a given process. You > > can try running that on both the client pid and the Solr server pid once > > it's hung -- You'll probably see a lot of Jar files in use for both, but > > if you see more than a few XML files open by the client, or more than > > 1 TCP connection open by either the client or the server, there's your > > culprit. > > > > I'm not sure what Windows equivalent of lsof may exist. > > > > Wait ... I just had another thought > > > > You are using InputStreamReader to deal with the InputStreams of your > > remote XML files -- but you aren't specifying a charset, so it's using > > your system default which may be different from the charset of the > > original XML files you are pulling from the URL -- which (I *think*) > > means that your InputStreamReader may in some cases fail to read all of > > the bytes of the stream, which might leave some dangling filehandles (I'm just > > guessing on that part ... I'm not actually sure what happens in that > > case). > > > > What if you simplify your code (for the purposes of testing) and just put > > the post-transform version ganja-full.xml in a big ass String variable in > > your java app and just call GanjaUpdate.doUpdate(bigAssString) over and > > over again ... does that cause the same problem?
> >
> > :
> > : --
> > :
> > : package com.iceninetech.solr.update;
> > :
> > : import com.iceninetech.xml.XMLTransformer;
> > :
> > : import java.io.*;
> > : import java.net.HttpURLConnection;
> > : import java.net.URL;
> > : import java.util.logging.Logger;
> > :
> > : public class GanjaUpdate {
> > :
> > :   private String updateSite = "";
> > :   private String XSL_URL = "http://localhost:8080/xsl/ganja.xsl";
> > :
> > :   private static final File xmlStorageDir = new
> > :     File("/source/solr/xml-dls/");
> > :
> > :   final Logger log = Logger.getLogger(GanjaUpdate.class.getName());
> > :
> > :   public GanjaUpdate(String siteName) {
> > :     this.updateSite = siteName;
> > :     log.info("GanjaUpdate is primed and ready to update " + siteName);
> > :   }
> >
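Thom's read-while-writing point can be demonstrated in miniature with a bounded pipe standing in for Tomcat's response buffer. This is a toy illustration, not Solr client code: without the reader thread, `write()` would block as soon as the small buffer fills, which is the same stall pattern as posting a huge add and only reading Solr's response afterwards.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class PipeDemo {
    // Writes chunks into a small bounded pipe while a second thread drains it,
    // started BEFORE any data is sent (the order Thom recommends).
    // Returns the total number of bytes the reader consumed.
    static long pump(int chunks, int chunkSize) throws Exception {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out, 1024); // tiny buffer
        final long[] total = {0};
        Thread reader = new Thread(() -> {
            try {
                byte[] buf = new byte[256];
                int n;
                while ((n = in.read(buf)) != -1) total[0] += n;
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        reader.start(); // drain concurrently so the writer never blocks for good

        byte[] chunk = new byte[chunkSize];
        for (int i = 0; i < chunks; i++) out.write(chunk); // far exceeds 1 KB
        out.close();
        reader.join();
        return total[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(pump(100, 512)); // 51200 bytes moved, no deadlock
    }
}
```

Commenting out `reader.start()` makes `pump` hang at a constant byte count, just as the updates here hang at a constant document count.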
Re: Doc add limit
Chris, my response is below each of your paragraphs...

I don't have the means to try out this code right now ... but I can't see any obvious problems with it (there may be somewhere that you are opening a stream or reader and not closing it, but I didn't see one) ... I notice you are running this client on the same machine as Solr (hence the localhost URLs) did you by any chance try running the client on a separate machine to see if the number of updates before it hangs changes?

When I run the client locally and the Solr server on a slower and separate development box, the maximum number of updates drops to 3,219. So it's almost as if it's related to some sort of timeout problem, because the maximum number of updates drops considerably on a slower machine, but it's weird how consistent the number is. 6,144 locally, 5,000-something when I run it on the external server, and 3,219 when the client is separate from the server.

my money is still on a filehandle resource limit somewhere ... if you are running on a system that has "lsof" (on some Unix/Linux installations you need sudo/su root permissions to run it) you can use "lsof -p <pid>" to look up what files/network connections are open for a given process. You can try running that on both the client pid and the Solr server pid once it's hung -- You'll probably see a lot of Jar files in use for both, but if you see more than a few XML files open by the client, or more than 1 TCP connection open by either the client or the server, there's your culprit.

The only output I get from 'lsof -p' that pertains to TCP connections is the following... I'm not too sure how to interpret it though:

java 4104 sangraal 261u IPv6 0x5b060f0 0t0 TCP *:8009 (LISTEN)
java 4104 sangraal 262u IPv6 0x55d59e8 0t0 TCP [::127.0.0.1]:8005 (LISTEN)
java 4104 sangraal 263u IPv6 0x53cc0e0 0t0 TCP [::127.0.0.1]:http-alt->[::127.0.0.1]:51039 (ESTABLISHED)
java 4104 sangraal 264u IPv6 0x5b059d0 0t0 TCP [::127.0.0.1]:51045->[::127.0.0.1]:http-alt (ESTABLISHED)
java 4104 sangraal 265u IPv6 0x53cc9c8 0t0 TCP [::127.0.0.1]:http-alt->[::127.0.0.1]:51045 (ESTABLISHED)
java 4104 sangraal 11u IPv6 0x5b04f20 0t0 TCP *:http-alt (LISTEN)
java 4104 sangraal 12u IPv6 0x5b06d68 0t0 TCP localhost:51037->localhost:51036 (TIME_WAIT)

I'm not sure what Windows equivalent of lsof may exist.

Wait ... I just had another thought

You are using InputStreamReader to deal with the InputStreams of your remote XML files -- but you aren't specifying a charset, so it's using your system default which may be different from the charset of the original XML files you are pulling from the URL -- which (I *think*) means that your InputStreamReader may in some cases fail to read all of the bytes of the stream, which might leave some dangling filehandles (I'm just guessing on that part ... I'm not actually sure what happens in that case).

What if you simplify your code (for the purposes of testing) and just put the post-transform version ganja-full.xml in a big ass String variable in your java app and just call GanjaUpdate.doUpdate(bigAssString) over and over again ... does that cause the same problem?

In the code, I read the XML with a StringReader and then pass it to GanjaUpdate as a string anyway. I've output the String object and verified that it is in fact all there. -Sangraal
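Chris's charset concern has a minimal fix: name the encoding explicitly instead of relying on the platform default. A small sketch (assuming the downloaded feeds are UTF-8, which for this code is an assumption):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public class CharsetSafeRead {
    // Decodes a byte stream as UTF-8 explicitly; the platform default charset
    // may differ from the XML file's actual encoding and silently mangle
    // (or mis-count) characters. Appends '\n' after each line, matching the
    // line-preserving loop in the GanjaUpdate code above.
    static String readAll(InputStream in) throws IOException {
        BufferedReader rd = new BufferedReader(new InputStreamReader(in, "UTF-8"));
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = rd.readLine()) != null) sb.append(line).append('\n');
        rd.close();
        return sb.toString();
    }
}
```

Better still is to honor the charset the HTTP Content-Type header (or the XML declaration) reports, but hardcoding UTF-8 already removes the platform-default surprise.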
Re: Doc add limit
Those are some great ideas Chris... I'm going to try some of them out. I'll post the results when I get a chance to do more testing. Thanks. At this point I can work around the problem by ignoring Solr's response but this is obviously not ideal. I would feel better knowing what is causing the issue as well. -Sangraal On 7/29/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: : Sure, the method that does all the work updating Solr is the doUpdate(String : s) method in the GanjaUpdate class I'm pasting below. It's hanging when I : try to read the response... the last output I receive in my log is Got : Reader... I don't have the means to try out this code right now ... but I can't see any obvious problems with it (there may be somewhere that you are opening a stream or reader and not closing it, but I didn't see one) ... I notice you are running this client on the same machine as Solr (hence the localhost URLs) did you by any chance try running the client on a separate machine to see if the number of updates before it hangs changes? my money is still on a filehandle resource limit somewhere ... if you are running on a system that has "lsof" (on some Unix/Linux installations you need sudo/su root permissions to run it) you can use "lsof -p <pid>" to look up what files/network connections are open for a given process. You can try running that on both the client pid and the Solr server pid once it's hung -- You'll probably see a lot of Jar files in use for both, but if you see more than a few XML files open by the client, or more than 1 TCP connection open by either the client or the server, there's your culprit. I'm not sure what Windows equivalent of lsof may exist. Wait ... I just had another thought You are using InputStreamReader to deal with the InputStreams of your remote XML files -- but you aren't specifying a charset, so it's using your system default which may be different from the charset of the original XML files you are pulling from the URL -- which (I *think*) means that your InputStreamReader may in some cases fail to read all of the bytes of the stream, which might leave some dangling filehandles (I'm just guessing on that part ... I'm not actually sure what happens in that case). What if you simplify your code (for the purposes of testing) and just put the post-transform version ganja-full.xml in a big ass String variable in your java app and just call GanjaUpdate.doUpdate(bigAssString) over and over again ... does that cause the same problem?
:
: --
:
: package com.iceninetech.solr.update;
:
: import com.iceninetech.xml.XMLTransformer;
:
: import java.io.*;
: import java.net.HttpURLConnection;
: import java.net.URL;
: import java.util.logging.Logger;
:
: public class GanjaUpdate {
:
:   private String updateSite = "";
:   private String XSL_URL = "http://localhost:8080/xsl/ganja.xsl";
:
:   private static final File xmlStorageDir = new
:     File("/source/solr/xml-dls/");
:
:   final Logger log = Logger.getLogger(GanjaUpdate.class.getName());
:
:   public GanjaUpdate(String siteName) {
:     this.updateSite = siteName;
:     log.info("GanjaUpdate is primed and ready to update " + siteName);
:   }
:
:   public void update() {
:     StringWriter sw = new StringWriter();
:
:     try {
:       // transform gawkerInput XML to SOLR update XML
:       XMLTransformer transform = new XMLTransformer();
:       log.info("About to transform ganjaInput XML to Solr Update XML");
:       transform.transform(getXML(), sw, getXSL());
:       log.info("Completed ganjaInput/SolrUpdate XML transform");
:
:       // Write transformed XML to Disk.
:       File transformedXML = new File(xmlStorageDir, updateSite+".sml");
:       FileWriter fw = new FileWriter(transformedXML);
:       fw.write(sw.toString());
:       fw.close();
:
:       // post to Solr
:       log.info("About to update Solr for site " + updateSite);
:       String result = this.doUpdate(sw.toString());
:       log.info("Solr says: " + result);
:       sw.close();
:     } catch (Exception e) {
:       e.printStackTrace();
:     }
:   }
:
:   public File getXML() {
:     String XML_URL = "http://localhost:8080/" + updateSite + "/ganja-full.xml";
:
:     // check for file
:     File localXML = new File(xmlStorageDir, updateSite + ".xml");
:
:     try {
:       if (localXML.createNewFile() && localXML.canWrite()) {
:         // open connection
:         log.info("Downloading: " + XML_URL);
:         URL url = new URL(XML_URL);
:         HttpURLConnection conn = (HttpURLConnection) url.openConnection();
:         conn.setRequestMethod("GET");
:
:         // Read response to File
:         log.info("Storing XML to File" + localXML.getCanonicalPath());
:         FileOutputStream fos = new FileOutputStream(new File(xmlStorageDir,
:           updateSite + ".xml"));
:
:         BufferedReader rd = new BufferedReader(new InputStreamReader(
:           conn.getInputStream()));
:         String line;
:         while ((line = rd.readLine()) != null) {
:
Re: Doc add limit
Yeah that code is pretty bare bones... I'm still in the initial testing stage. You're right it definitely needs some more thorough work. I did try removing all the conn.disconnect(); statements and there was no change. I'm going to give the Java Client code you sent me yesterday a shot and see what happens with that. I'm kind of out of ideas for what could be causing the hang... it really seems to just get locked in some sort of loop, but there are absolutely no exceptions being thrown either on the Solr side or the Client side... it just stops processing. -Sangraal On 7/28/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: It may be some sort of weird interaction with persistent connections and timeouts (both client and server have connection timeouts I assume). Does anything change if you remove your .disconnect() call (it shouldn't be needed). Do you ever see any exceptions on the client side? The code you show probably needs more error handling (finally blocks with closes), but if you don't see any stack traces from your e.printStackTrace() then it doesn't have anything to do with this problem. Getting all the little details of connection handling correct can be tough... it's probably a good idea if we work toward common client libraries so everyone doesn't have to reinvent them. -Yonik On 7/28/06, sangraal aiken <[EMAIL PROTECTED]> wrote: > Sure, the method that does all the work updating Solr is the doUpdate(String > s) method in the GanjaUpdate class I'm pasting below. It's hanging when I > try to read the response... the last output I receive in my log is Got > Reader... 
> > -- > > package com.iceninetech.solr.update; > > import com.iceninetech.xml.XMLTransformer; > > import java.io.*; > import java.net.HttpURLConnection; > import java.net.URL; > import java.util.logging.Logger; > > public class GanjaUpdate { > > private String updateSite = ""; > private String XSL_URL = "http://localhost:8080/xsl/ganja.xsl";; > > private static final File xmlStorageDir = new > File("/source/solr/xml-dls/"); > > final Logger log = Logger.getLogger(GanjaUpdate.class.getName()); > > public GanjaUpdate(String siteName) { > this.updateSite = siteName; > log.info("GanjaUpdate is primed and ready to update " + siteName); > } > > public void update() { > StringWriter sw = new StringWriter(); > > try { > // transform gawkerInput XML to SOLR update XML > XMLTransformer transform = new XMLTransformer(); > log.info("About to transform ganjaInput XML to Solr Update XML"); > transform.transform(getXML(), sw, getXSL()); > log.info("Completed ganjaInput/SolrUpdate XML transform"); > > // Write transformed XML to Disk. 
> File transformedXML = new File(xmlStorageDir, updateSite+".sml"); > FileWriter fw = new FileWriter(transformedXML); > fw.write(sw.toString()); > fw.close(); > > // post to Solr > log.info("About to update Solr for site " + updateSite); > String result = this.doUpdate(sw.toString()); > log.info("Solr says: " + result); > sw.close(); > } catch (Exception e) { > e.printStackTrace(); > } > } > > public File getXML() { > String XML_URL = "http://localhost:8080/"; + updateSite + "/ganja- > full.xml"; > > // check for file > File localXML = new File(xmlStorageDir, updateSite + ".xml"); > > try { > if (localXML.createNewFile() && localXML.canWrite()) { > // open connection > log.info("Downloading: " + XML_URL); > URL url = new URL(XML_URL); > HttpURLConnection conn = (HttpURLConnection) url.openConnection (); > conn.setRequestMethod("GET"); > > // Read response to File > log.info("Storing XML to File" + localXML.getCanonicalPath()); > FileOutputStream fos = new FileOutputStream(new File(xmlStorageDir, > updateSite + ".xml")); > > BufferedReader rd = new BufferedReader(new InputStreamReader( > conn.getInputStream())); > String line; > while ((line = rd.readLine()) != null) { > line = line + '\n'; // add break after each line. It preserves > formatting. > fos.write(line.getBytes("UTF8")); > } > > // close connections > rd.close(); > fos.close(); > conn.disconnect(); > log.info("Got the XML... File saved."); > } > } catch (Exception e) { > e.
Re: Doc add limit
Sure, the method that does all the work updating Solr is the doUpdate(String s) method in the GanjaUpdate class I'm pasting below. It's hanging when I try to read the response... the last output I receive in my log is Got Reader...

--

package com.iceninetech.solr.update;

import com.iceninetech.xml.XMLTransformer;

import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.logging.Logger;

public class GanjaUpdate {

  private String updateSite = "";
  private String XSL_URL = "http://localhost:8080/xsl/ganja.xsl";

  private static final File xmlStorageDir = new File("/source/solr/xml-dls/");

  final Logger log = Logger.getLogger(GanjaUpdate.class.getName());

  public GanjaUpdate(String siteName) {
    this.updateSite = siteName;
    log.info("GanjaUpdate is primed and ready to update " + siteName);
  }

  public void update() {
    StringWriter sw = new StringWriter();

    try {
      // transform gawkerInput XML to SOLR update XML
      XMLTransformer transform = new XMLTransformer();
      log.info("About to transform ganjaInput XML to Solr Update XML");
      transform.transform(getXML(), sw, getXSL());
      log.info("Completed ganjaInput/SolrUpdate XML transform");

      // Write transformed XML to Disk.
      File transformedXML = new File(xmlStorageDir, updateSite + ".sml");
      FileWriter fw = new FileWriter(transformedXML);
      fw.write(sw.toString());
      fw.close();

      // post to Solr
      log.info("About to update Solr for site " + updateSite);
      String result = this.doUpdate(sw.toString());
      log.info("Solr says: " + result);
      sw.close();
    } catch (Exception e) {
      e.printStackTrace();
    }
  }

  public File getXML() {
    String XML_URL = "http://localhost:8080/" + updateSite + "/ganja-full.xml";

    // check for file
    File localXML = new File(xmlStorageDir, updateSite + ".xml");

    try {
      if (localXML.createNewFile() && localXML.canWrite()) {
        // open connection
        log.info("Downloading: " + XML_URL);
        URL url = new URL(XML_URL);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        // Read response to File
        log.info("Storing XML to File" + localXML.getCanonicalPath());
        FileOutputStream fos = new FileOutputStream(new File(xmlStorageDir, updateSite + ".xml"));

        BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        String line;
        while ((line = rd.readLine()) != null) {
          line = line + '\n'; // add break after each line. It preserves formatting.
          fos.write(line.getBytes("UTF8"));
        }

        // close connections
        rd.close();
        fos.close();
        conn.disconnect();
        log.info("Got the XML... File saved.");
      }
    } catch (Exception e) {
      e.printStackTrace();
    }
    return localXML;
  }

  public File getXSL() {
    StringBuffer retVal = new StringBuffer();

    // check for file
    File localXSL = new File(xmlStorageDir, "ganja.xsl");

    try {
      if (localXSL.createNewFile() && localXSL.canWrite()) {
        // open connection
        log.info("Downloading: " + XSL_URL);
        URL url = new URL(XSL_URL);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        // Read response
        BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        String line;
        while ((line = rd.readLine()) != null) {
          line = line + '\n';
          retVal.append(line);
        }

        // close connections
        rd.close();
        conn.disconnect();
        log.info("Got the XSLT.");

        // output file
        log.info("Storing XSL to File" + localXSL.getCanonicalPath());
        FileOutputStream fos = new FileOutputStream(new File(xmlStorageDir, "ganja.xsl"));
        fos.write(retVal.toString().getBytes());
        fos.close();
        log.info("File saved.");
      }
    } catch (Exception e) {
      e.printStackTrace();
    }
    return localXSL;
  }

  private String doUpdate(String sw) {
    StringBuffer updateResult = new StringBuffer();
    try {
      // open connection
      log.info("Connecting to and preparing to post to SolrUpdate servlet.");
      URL url = new URL("http://localhost:8080/update");
      HttpURLConnection conn = (HttpURLConnection) url.openConnection();
      conn.setRequestMethod("POST");
      conn.setRequestProperty("Content-Type", "application/octet-stream");
      conn.setDoOutput(true);
      conn.setDoInput(true);
      conn.setUseCaches(false);

      // Write to server
      log.info("About to post to SolrUpdate servlet.");
      DataOutputStream output = new DataOutputStream(conn.getOutputStream());
      output.writeBytes(sw);
      output.flush();
      output.close();
      log.info("Finished posting to SolrUpdate servlet.");

      // Read response
      log.info("Ready to read response.");
      BufferedReader rd = new BufferedReader(new InputStream
Re: Doc add limit
I'm sure... it seems like Solr is having trouble writing to a Tomcat response that's been inactive for a bit. It's only 30 seconds though, so I'm not entirely sure why that would happen. I use the same client code for downloading XSL sheets from external servers and it works fine, but in those instances the server responds much faster to the request. This is an elusive bug for sure. -S On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: On 7/27/06, sangraal aiken <[EMAIL PROTECTED]> wrote: > Commenting out the following line in SolrCore fixes my problem... but of > course I don't get the result status info... but this isn't a problem for me > really. > > -Sangraal > > writer.write(""); While it's possible you hit a Tomcat bug, I think it's more likely a client problem. -Yonik
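If the server really is slow to start writing its response, one client-side safeguard is to set explicit timeouts on the connection so a stalled read fails with a SocketTimeoutException instead of blocking the thread indefinitely. A hypothetical sketch, not code from this thread (the URL, helper name, and timeout values are illustrative):

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class TimeoutDemo {
    // Hypothetical helper: configure explicit timeouts so a slow or stalled
    // server produces a SocketTimeoutException instead of an indefinite hang.
    static HttpURLConnection openWithTimeouts(String address) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setConnectTimeout(5000);  // give up if the TCP connect stalls
        conn.setReadTimeout(30000);    // give up if the server stops sending mid-response
        return conn;
    }

    public static void main(String[] args) throws Exception {
        // openConnection() performs no network I/O yet, so this runs without a server.
        HttpURLConnection conn = openWithTimeouts("http://localhost:8080/update");
        System.out.println(conn.getReadTimeout()); // prints 30000
    }
}
```

Both setters have been available since Java 5, so this works with the JVMs mentioned later in the thread.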
Re: Doc add limit
I'll give that a shot... Thanks again for all your help. -S On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: You might also try the Java update client here: http://issues.apache.org/jira/browse/SOLR-20 -Yonik
Re: Doc add limit
Commenting out the following line in SolrCore fixes my problem... but of course I don't get the result status info... but this isn't a problem for me really. -Sangraal writer.write(""); On 7/27/06, sangraal aiken <[EMAIL PROTECTED]> wrote: I'm running on Tomcat... and I've verified that the complete post is making it through the SolrUpdate servlet and into the SolrCore object... thanks for the info though. -- So the code is hanging on this call in SolrCore.java writer.write(""); The thread dump: "http-8080-Processor24" Id=32 in RUNNABLE (running in native) total cpu time= 40698.0440ms user time=38646.1680ms at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java :92) at java.net.SocketOutputStream.write (SocketOutputStream.java:136) at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes( InternalOutputBuffer.java:746) at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java :433) at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:348) at org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite (InternalOutputBuffer.java:769) at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite ( ChunkedOutputFilter.java:125) at org.apache.coyote.http11.InternalOutputBuffer.doWrite( InternalOutputBuffer.java:579) at org.apache.coyote.Response.doWrite(Response.java:559) at org.apache.catalina.connector.OutputBuffer.realWriteBytes ( OutputBuffer.java:361) at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:324) at org.apache.tomcat.util.buf.IntermediateOutputStream.write( C2BConverter.java:235) at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes (StreamEncoder.java :336) at sun.nio.cs.StreamEncoder$CharsetSE.implFlushBuffer( StreamEncoder.java:404) at sun.nio.cs.StreamEncoder$CharsetSE.implFlush(StreamEncoder.java :408) at sun.nio.cs.StreamEncoder.flush (StreamEncoder.java:152) at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:213) at 
org.apache.tomcat.util.buf.WriteConvertor.flush(C2BConverter.java :184) at org.apache.tomcat.util.buf.C2BConverter.flushBuffer ( C2BConverter.java:127) at org.apache.catalina.connector.OutputBuffer.realWriteChars( OutputBuffer.java:536) at org.apache.tomcat.util.buf.CharChunk.flushBuffer(CharChunk.java :439) at org.apache.tomcat.util.buf.CharChunk.append (CharChunk.java:370) at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java :491) at org.apache.catalina.connector.CoyoteWriter.write(CoyoteWriter.java :161) at org.apache.catalina.connector.CoyoteWriter.write ( CoyoteWriter.java:170) at org.apache.solr.core.SolrCore.update(SolrCore.java:695) at org.apache.solr.servlet.SolrUpdateServlet.doPost( SolrUpdateServlet.java:52) at javax.servlet.http.HttpServlet.service(HttpServlet.java:709) at javax.servlet.http.HttpServlet.service(HttpServlet.java:802) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter( ApplicationFilterChain.java :252) at org.apache.catalina.core.ApplicationFilterChain.doFilter( ApplicationFilterChain.java:173) at org.apache.catalina.core.StandardWrapperValve.invoke( StandardWrapperValve.java:213) at org.apache.catalina.core.StandardContextValve.invoke ( StandardContextValve.java:178) at org.apache.catalina.core.StandardHostValve.invoke( StandardHostValve.java:126) at org.apache.catalina.valves.ErrorReportValve.invoke( ErrorReportValve.java:105) at org.apache.catalina.core.StandardEngineValve.invoke( StandardEngineValve.java:107) at org.apache.catalina.connector.CoyoteAdapter.service( CoyoteAdapter.java:148) at org.apache.coyote.http11.Http11Processor.process ( Http11Processor.java:869) at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection (Http11BaseProtocol.java:664) at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket( PoolTcpEndpoint.java:527) at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt( LeaderFollowerWorkerThread.java:80) at 
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run( ThreadPool.java :684) at java.lang.Thread.run(Thread.java:613) On 7/27/06, Otis Gospodnetic <[EMAIL PROTECTED] > wrote: > > I haven't been following the thread, but > Not sure if you are using Tomcat or Jetty, but Jetty has a POST size > limit (set somewhere in its configs) that may be the source of the problem. > > Otis > P.S. > Just occurred to me. > Tomcat. Jetty. Tom & Jerry. Jetty guys should have called their thing > Jerry or Jerrymouse. > > - Original Message > From: Mike Klaas < [EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent: Thursday, July 27, 2006 6:33:16 PM &
Re: Doc add limit
I'm running on Tomcat... and I've verified that the complete post is making it through the SolrUpdate servlet and into the SolrCore object... thanks for the info though. -- So the code is hanging on this call in SolrCore.java writer.write(""); The thread dump: "http-8080-Processor24" Id=32 in RUNNABLE (running in native) total cpu time=40698.0440ms user time=38646.1680ms at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) at java.net.SocketOutputStream.write(SocketOutputStream.java:136) at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes( InternalOutputBuffer.java:746) at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:433) at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:348) at org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite (InternalOutputBuffer.java:769) at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite( ChunkedOutputFilter.java:125) at org.apache.coyote.http11.InternalOutputBuffer.doWrite( InternalOutputBuffer.java:579) at org.apache.coyote.Response.doWrite(Response.java:559) at org.apache.catalina.connector.OutputBuffer.realWriteBytes( OutputBuffer.java:361) at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:324) at org.apache.tomcat.util.buf.IntermediateOutputStream.write( C2BConverter.java:235) at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(StreamEncoder.java :336) at sun.nio.cs.StreamEncoder$CharsetSE.implFlushBuffer( StreamEncoder.java:404) at sun.nio.cs.StreamEncoder$CharsetSE.implFlush(StreamEncoder.java:408) at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:152) at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:213) at org.apache.tomcat.util.buf.WriteConvertor.flush(C2BConverter.java :184) at org.apache.tomcat.util.buf.C2BConverter.flushBuffer( C2BConverter.java:127) at org.apache.catalina.connector.OutputBuffer.realWriteChars( OutputBuffer.java:536) at 
org.apache.tomcat.util.buf.CharChunk.flushBuffer(CharChunk.java:439) at org.apache.tomcat.util.buf.CharChunk.append(CharChunk.java:370) at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java :491) at org.apache.catalina.connector.CoyoteWriter.write(CoyoteWriter.java :161) at org.apache.catalina.connector.CoyoteWriter.write(CoyoteWriter.java :170) at org.apache.solr.core.SolrCore.update(SolrCore.java:695) at org.apache.solr.servlet.SolrUpdateServlet.doPost( SolrUpdateServlet.java:52) at javax.servlet.http.HttpServlet.service(HttpServlet.java:709) at javax.servlet.http.HttpServlet.service(HttpServlet.java:802) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter( ApplicationFilterChain.java:252) at org.apache.catalina.core.ApplicationFilterChain.doFilter( ApplicationFilterChain.java:173) at org.apache.catalina.core.StandardWrapperValve.invoke( StandardWrapperValve.java:213) at org.apache.catalina.core.StandardContextValve.invoke( StandardContextValve.java:178) at org.apache.catalina.core.StandardHostValve.invoke( StandardHostValve.java:126) at org.apache.catalina.valves.ErrorReportValve.invoke( ErrorReportValve.java:105) at org.apache.catalina.core.StandardEngineValve.invoke( StandardEngineValve.java:107) at org.apache.catalina.connector.CoyoteAdapter.service( CoyoteAdapter.java:148) at org.apache.coyote.http11.Http11Processor.process( Http11Processor.java:869) at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection (Http11BaseProtocol.java:664) at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket( PoolTcpEndpoint.java:527) at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt( LeaderFollowerWorkerThread.java:80) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run( ThreadPool.java:684) at java.lang.Thread.run(Thread.java:613) On 7/27/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: I haven't been following the thread, but Not sure if you are using Tomcat or Jetty, but Jetty has 
a POST size limit (set somewhere in its configs) that may be the source of the problem. Otis P.S. Just occurred to me. Tomcat. Jetty. Tom & Jerry. Jetty guys should have called their thing Jerry or Jerrymouse. - Original Message From: Mike Klaas <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Thursday, July 27, 2006 6:33:16 PM Subject: Re: Doc add limit Hi Sangraal: Sorry--I tried not to imply that this might affect your issue. You may have to crank up the solr logging to determine where it is freezing (and what might be happening). It is certainly worth investigating why this occurs, but I wonder about the advantages of using such huge batches. Assuming a few hundred bytes per document, 6100 docs produces a POST over 1MB
Re: Doc add limit
Yeah, I'm closing them. Here's the method: -

private String doUpdate(String sw) {
    StringBuffer updateResult = new StringBuffer();
    try {
        // open connection
        log.info("Connecting to and preparing to post to SolrUpdate servlet.");
        URL url = new URL("http://localhost:8080/update");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/octet-stream");
        conn.setDoOutput(true);
        conn.setDoInput(true);
        conn.setUseCaches(false);
        // Write to server
        log.info("About to post to SolrUpdate servlet.");
        DataOutputStream output = new DataOutputStream(conn.getOutputStream());
        output.writeBytes(sw);
        output.flush();
        output.close();
        log.info("Finished posting to SolrUpdate servlet.");
        // Read response
        log.info("Ready to read response.");
        BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        log.info("Got reader");
        String line;
        while ((line = rd.readLine()) != null) {
            log.info("Writing to result...");
            updateResult.append(line);
        }
        rd.close();
        // close connections
        conn.disconnect();
        log.info("Done updating Solr for site " + updateSite);
    } catch (Exception e) {
        e.printStackTrace();
    }
    return updateResult.toString();
}
}

-Sangraal On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: Are you reading the response and closing the connection? If not, you are probably running out of socket connections. -Yonik On 7/27/06, sangraal aiken <[EMAIL PROTECTED]> wrote: > Yonik, > It looks like the problem is with the way I'm posting to the SolrUpdate > servlet. I am able to use curl to post the data to my Tomcat instance > without a problem. It only fails when I try to handle the HTTP POST from > Java... 
my code is below: > > URL url = new URL("http://localhost:8983/solr/update"); > HttpURLConnection conn = (HttpURLConnection) url.openConnection(); > conn.setRequestMethod("POST"); > conn.setRequestProperty("Content-Type", "application/octet-stream"); > conn.setDoOutput(true); > conn.setDoInput(true); > conn.setUseCaches(false); > > // Write to server > log.info("About to post to SolrUpdate servlet."); > DataOutputStream output = new DataOutputStream(conn.getOutputStream > ()); > output.writeBytes(sw); > output.flush(); > log.info("Finished posting to SolrUpdate servlet."); > > -Sangraal > > On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: > > > > On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote: > > > I removed everything from the Add xml so the docs looked like this: > > > > > > <doc><field name="id">187880</field></doc> > > > > > > <doc><field name="id">187852</field></doc> > > > > > > and it still hung at 6,144... > > > > Maybe you can try the following simple Python client to try and rule > > out some kind of different client interactions... the attached script > > adds 10,000 documents and works fine for me in WinXP w/ Tomcat 5.5.17 > > and Jetty > > > > -Yonik > > > > > > solr.py -- > > import httplib > > import socket > > > > class SolrConnection: > > def __init__(self, host='localhost:8983', solrBase='/solr'): > > self.host = host > > self.solrBase = solrBase > > #a connection to the server is not opened at this point. > > self.conn = httplib.HTTPConnection(self.host) > > #self.conn.set_debuglevel(100) > > self.postheaders = {"Connection":"close"} > > > > def doUpdateXML(self, request): > > try: > > self.conn.request('POST', self.solrBase+'/update', request, > > self.postheaders) > > except (socket.error,httplib.CannotSendRequest) : > > #reconnect in case the connection was broken from the server going > > down, > > #the server timing out our persistent connection, or another > > #network failure. 
> > #Also catch httplib.CannotSendRequest because the HTTPConnection > > object > > #can get in a bad state. > > self.conn.close() > > self.conn.connect() > > self.conn.request('POST', self.solrBase+'/update', request, > > self.postheaders) > > > > rsp = self.conn.getresponse() > > #print rsp.status, rsp.reason > > data = rsp.read() > > #print "data=",data > > self.conn.close() > > > > def delete(self, id): > > xstr = '<delete><id>'+id+'</id></delete>' > > self.doUpdateXML(xstr) > > > > def add(self, **fields): > > #todo: XML escaping > > flist=['<field name="%s">%s</field>' % f for f in fields.items() ] > > flist.insert(0,'<add><doc>') > > flist.append('</doc></add>') > > xstr = ''.join(flist) > > self.doUpdateXML(xstr) > > > > c = SolrConnection() > > #for i in range(10000): > > # c.delete(str(i)) > > for i in range(10000): > > c.add(id=i)
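Yonik's question above points at the usual cause of leaked connections: the response must be read to the end and the reader closed, even when an exception occurs partway through. A minimal sketch of that pattern (hypothetical helper name; demonstrated against a file:// URL so it runs without a Solr server):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadAndClose {
    // Hypothetical helper: drain a URLConnection's response to the end and
    // always close the reader, so the underlying socket can be reused or freed.
    static String readFully(URLConnection conn) throws Exception {
        StringBuilder sb = new StringBuilder();
        BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
        try {
            String line;
            while ((line = rd.readLine()) != null) sb.append(line);
        } finally {
            rd.close(); // releases the connection even if reading fails mid-stream
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Demonstrate on a local file so the example is self-contained.
        Path tmp = Files.createTempFile("resp", ".txt");
        Files.write(tmp, "ok".getBytes("UTF-8"));
        String body = readFully(tmp.toUri().toURL().openConnection());
        System.out.println(body); // prints "ok"
    }
}
```

With HttpURLConnection specifically, draining the stream fully before closing is what lets the JVM return the socket to its keep-alive pool instead of leaving it half-open.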
Re: Doc add limit
I think you're right... I will probably work on splitting the batches up into smaller pieces at some point in the future. I think I will need the capability to do large batches at some point though, so I want to make sure the system can handle it. I also want to make sure this problem doesn't pop up and bite me later. -Sangraal On 7/27/06, Mike Klaas <[EMAIL PROTECTED]> wrote: Hi Sangraal: Sorry--I tried not to imply that this might affect your issue. You may have to crank up the solr logging to determine where it is freezing (and what might be happening). It is certainly worth investigating why this occurs, but I wonder about the advantages of using such huge batches. Assuming a few hundred bytes per document, 6100 docs produces a POST over 1MB in size. -Mike On 7/27/06, sangraal aiken <[EMAIL PROTECTED]> wrote: > Mike, > I've been posting with the content type set like this: > conn.setRequestProperty("Content-Type", "application/octet-stream"); > > I tried your suggestion though, and unfortunately there was no change. > conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8"); > > -Sangraal > > > On 7/27/06, Mike Klaas <[EMAIL PROTECTED]> wrote: > > > > On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: > > > > > class SolrConnection: > > > def __init__(self, host='localhost:8983', solrBase='/solr'): > > > self.host = host > > > self.solrBase = solrBase > > > #a connection to the server is not opened at this point. 
> > > self.conn = httplib.HTTPConnection(self.host) > > > #self.conn.set_debuglevel(100) > > > self.postheaders = {"Connection":"close"} > > > > > > def doUpdateXML(self, request): > > > try: > > > self.conn.request('POST', self.solrBase+'/update', request, > > > self.postheaders) > > > > Digressive note: I'm not sure if it is necessary with Tomcat, but in > > my experience driving Solr with Python using Jetty, it was necessary > > to specify the content-type when posting utf-8 data: > > > > self.postheaders.update({'Content-Type': 'text/xml; charset=utf-8'}) > > > > -Mike > > > >
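The batch-splitting discussed above can be as simple as slicing the document list before building each POST body. A sketch (hypothetical helper; the batch size of 1,000 is arbitrary, chosen only to stay well under the 6,144 mark seen in this thread):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSplitter {
    // Hypothetical helper: split a large list of documents into fixed-size
    // batches so each POST to /update stays small.
    static <T> List<List<T>> split(List<T> docs, int batchSize) {
        List<List<T>> batches = new ArrayList<List<T>>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            // copy the subList view so each batch is independent of the source list
            batches.add(new ArrayList<T>(docs.subList(i, Math.min(i + batchSize, docs.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<Integer>();
        for (int i = 0; i < 6144; i++) ids.add(i);
        List<List<Integer>> batches = split(ids, 1000);
        System.out.println(batches.size() + " batches, last has "
                + batches.get(batches.size() - 1).size()); // prints "7 batches, last has 144"
    }
}
```

Each batch would then be serialized into its own `<add>` document and posted separately, optionally with a `<commit/>` only after the final batch.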
Re: Doc add limit
Mike, I've been posting with the content type set like this: conn.setRequestProperty("Content-Type", "application/octet-stream"); I tried your suggestion though, and unfortunately there was no change. conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8"); -Sangraal On 7/27/06, Mike Klaas <[EMAIL PROTECTED]> wrote: On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: > class SolrConnection: > def __init__(self, host='localhost:8983', solrBase='/solr'): > self.host = host > self.solrBase = solrBase > #a connection to the server is not opened at this point. > self.conn = httplib.HTTPConnection(self.host) > #self.conn.set_debuglevel(100) > self.postheaders = {"Connection":"close"} > > def doUpdateXML(self, request): > try: > self.conn.request('POST', self.solrBase+'/update', request, > self.postheaders) Digressive note: I'm not sure if it is necessary with Tomcat, but in my experience driving Solr with Python using Jetty, it was necessary to specify the content-type when posting utf-8 data: self.postheaders.update({'Content-Type': 'text/xml; charset=utf-8'}) -Mike
Re: Doc add limit
Yonik, It looks like the problem is with the way I'm posting to the SolrUpdate servlet. I am able to use curl to post the data to my Tomcat instance without a problem. It only fails when I try to handle the HTTP POST from Java... my code is below:

URL url = new URL("http://localhost:8983/solr/update");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("POST");
conn.setRequestProperty("Content-Type", "application/octet-stream");
conn.setDoOutput(true);
conn.setDoInput(true);
conn.setUseCaches(false);

// Write to server
log.info("About to post to SolrUpdate servlet.");
DataOutputStream output = new DataOutputStream(conn.getOutputStream());
output.writeBytes(sw);
output.flush();
log.info("Finished posting to SolrUpdate servlet.");

-Sangraal On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote: > I removed everything from the Add xml so the docs looked like this: > > <doc><field name="id">187880</field></doc> > <doc><field name="id">187852</field></doc> > > and it still hung at 6,144... Maybe you can try the following simple Python client to try and rule out some kind of different client interactions... the attached script adds 10,000 documents and works fine for me in WinXP w/ Tomcat 5.5.17 and Jetty -Yonik

solr.py --

import httplib
import socket

class SolrConnection:
    def __init__(self, host='localhost:8983', solrBase='/solr'):
        self.host = host
        self.solrBase = solrBase
        #a connection to the server is not opened at this point.
        self.conn = httplib.HTTPConnection(self.host)
        #self.conn.set_debuglevel(100)
        self.postheaders = {"Connection":"close"}

    def doUpdateXML(self, request):
        try:
            self.conn.request('POST', self.solrBase+'/update', request, self.postheaders)
        except (socket.error,httplib.CannotSendRequest) :
            #reconnect in case the connection was broken from the server going down,
            #the server timing out our persistent connection, or another
            #network failure.
            #Also catch httplib.CannotSendRequest because the HTTPConnection object
            #can get in a bad state.
            self.conn.close()
            self.conn.connect()
            self.conn.request('POST', self.solrBase+'/update', request, self.postheaders)

        rsp = self.conn.getresponse()
        #print rsp.status, rsp.reason
        data = rsp.read()
        #print "data=",data
        self.conn.close()

    def delete(self, id):
        xstr = '<delete><id>'+id+'</id></delete>'
        self.doUpdateXML(xstr)

    def add(self, **fields):
        #todo: XML escaping
        flist=['<field name="%s">%s</field>' % f for f in fields.items() ]
        flist.insert(0,'<add><doc>')
        flist.append('</doc></add>')
        xstr = ''.join(flist)
        self.doUpdateXML(xstr)

c = SolrConnection()
#for i in range(10000):
#    c.delete(str(i))
for i in range(10000):
    c.add(id=i)
Re: Doc add limit
I removed everything from the Add xml so the docs looked like this: <doc><field name="id">187880</field></doc> <doc><field name="id">187852</field></doc> and it still hung at 6,144... -S On 7/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: If you narrow the docs down to just the "id" field, does it still happen at the same place? -Yonik On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote: > I see the problem on Mac OS X/JDK: 1.5.0_06 and Debian/JDK: 1.5.0_07. > > I don't think it's a socket problem, because I can initiate additional > updates while the server is hung... weird I know. > > Thanks for all your help, I'll send a post if/when I find a solution. > > -S > > On 7/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: > > > > Tomcat problem, or a Solr problem that is only manifesting on your > > platform, or a JVM or libc problem, or even a client update problem... > > (possibly you might be exhausting the number of sockets in the server > > by using persistent connections with a long timeout and never reusing > > them?) > > > > What is your OS/JVM? > > > > -Yonik > > > > On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote: > > > Right now the heap is set to 512M but I've increased it up to 2GB and > > yet it > > > still hangs at the same number 6,144... > > > > > > Here's something interesting... I pushed this code over to a different > > > server and tried an update. On that server it's hanging on #5,267. Then > > > Tomcat seems to try to reload the webapp... indefinitely. > > > > > > So I guess this is looking more like a Tomcat problem more than a > > > Lucene/Solr problem huh? > > > > > > -Sangraal > > > > > > On 7/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: > > > > > > > > So it looks like your client is hanging trying to send something over > > > > the socket to the server and blocking... probably because Tomcat isn't > > > > reading anything from the socket because it's busy trying to restart > > > > the webapp. > > > > > > > > What is the heap size of the server? try increasing it... 
maybe tomcat > > > > could have detected low memory and tried to reload the webapp. > > > > > > > > -Yonik > > > > > > > > On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote: > > > > > Thanks for you help Yonik, I've responded to your questions below: > > > > > > > > > > On 7/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: > > > > > > > > > > > > It's possible it's not hanging, but just takes a long time on a > > > > > > specific add. This is because Lucene will occasionally merge > > > > > > segments. When very large segments are merged, it can take a long > > > > > > time. > > > > > > > > > > > > > > > I've left it running (hung) for up to a half hour at a time and I've > > > > > verified that my cpu idles during the hang. I have witnessed much > > > > shorter > > > > > hangs on the ramp up to my 6,144 limit but they have been more like > > 2 - > > > > 10 > > > > > seconds in length. Perhaps this is the Lucene merging you mentioned. > > > > > > > > > > In the log file, add commands are followed by the number of > > > > > > milliseconds the operation took. Next time Solr hangs, wait for a > > > > > > number of minutes until you see the operation logged and note how > > long > > > > > > it took. > > > > > > > > > > > > > > > Here are the last 5 log entries before the hang the last one is doc > > > > #6,144. > > > > > Also it looks like Tomcat is trying to redeploy the webapp those > > last > > > > tomcat > > > > > entries repeat indefinitely every 10 seconds or so. Perhaps this is > > a > > > > Tomcat > > > > > problem? > > > > > > > > > > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update > > > > > INFO: add (id=110705) 0 36596 > > > > > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update > > > > > INFO: add (id=110700) 0 36600 > > > > > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update > > > > > INFO: add (id=110688) 0 36603 > > > > > Jul 26, 2006 1:25:28 PM
Re: Doc add limit
I see the problem on Mac OS X/JDK: 1.5.0_06 and Debian/JDK: 1.5.0_07. I don't think it's a socket problem, because I can initiate additional updates while the server is hung... weird I know. Thanks for all your help, I'll send a post if/when I find a solution. -S On 7/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: Tomcat problem, or a Solr problem that is only manifesting on your platform, or a JVM or libc problem, or even a client update problem... (possibly you might be exhausting the number of sockets in the server by using persistent connections with a long timeout and never reusing them?) What is your OS/JVM? -Yonik On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote: > Right now the heap is set to 512M but I've increased it up to 2GB and yet it > still hangs at the same number 6,144... > > Here's something interesting... I pushed this code over to a different > server and tried an update. On that server it's hanging on #5,267. Then > tomcat seems to try to reload the webapp... indefinitely. > > So I guess this is looking more like a tomcat problem more than a > lucene/solr problem huh? > > -Sangraal > > On 7/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: > > > > So it looks like your client is hanging trying to send somethig over > > the socket to the server and blocking... probably because Tomcat isn't > > reading anything from the socket because it's busy trying to restart > > the webapp. > > > > What is the heap size of the server? try increasing it... maybe tomcat > > could have detected low memory and tried to reload the webapp. > > > > -Yonik > > > > On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote: > > > Thanks for you help Yonik, I've responded to your questions below: > > > > > > On 7/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: > > > > > > > > It's possible it's not hanging, but just takes a long time on a > > > > specific add. This is because Lucene will occasionally merge > > > > segments. 
When very large segments are merged, it can take a long > > > > time. > > > > > > > > > I've left it running (hung) for up to a half hour at a time and I've > > > verified that my cpu idles during the hang. I have witnessed much > > shorter > > > hangs on the ramp up to my 6,144 limit but they have been more like 2 - > > 10 > > > seconds in length. Perhaps this is the Lucene merging you mentioned. > > > > > > In the log file, add commands are followed by the number of > > > > milliseconds the operation took. Next time Solr hangs, wait for a > > > > number of minutes until you see the operation logged and note how long > > > > it took. > > > > > > > > > Here are the last 5 log entries before the hang the last one is doc > > #6,144. > > > Also it looks like Tomcat is trying to redeploy the webapp those last > > tomcat > > > entries repeat indefinitely every 10 seconds or so. Perhaps this is a > > Tomcat > > > problem? > > > > > > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update > > > INFO: add (id=110705) 0 36596 > > > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update > > > INFO: add (id=110700) 0 36600 > > > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update > > > INFO: add (id=110688) 0 36603 > > > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update > > > INFO: add (id=110690) 0 36608 > > > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update > > > INFO: add (id=110686) 0 36611 > > > Jul 26, 2006 1:25:36 PM > > org.apache.catalina.startup.HostConfigcheckResources > > > FINE: Checking context[] redeploy resource /source/solr/apache- > > tomcat-5.5.17 > > > /webapps/ROOT > > > Jul 26, 2006 1:25:36 PM > > org.apache.catalina.startup.HostConfigcheckResources > > > FINE: Checking context[] redeploy resource /source/solr/apache- > > tomcat-5.5.17 > > > /webapps/ROOT/META-INF/context.xml > > > Jul 26, 2006 1:25:36 PM > > org.apache.catalina.startup.HostConfigcheckResources > > > FINE: Checking context[] reload resource 
/source/solr/apache- > > tomcat-5.5.17 > > > /webapps/ROOT/WEB-INF/web.xml > > > Jul 26, 2006 1:25:36 PM > > org.apache.catalina.startup.HostConfigcheckResources > > > FINE: Checking context[] reload resource /source/solr/apache- > > tomcat-5.5.17 > > > /webapps/ROOT/META-INF/context.xml > >
Re: Doc add limit
Right now the heap is set to 512M but I've increased it up to 2GB and yet it still hangs at the same number 6,144... Here's something interesting... I pushed this code over to a different server and tried an update. On that server it's hanging on #5,267. Then Tomcat seems to try to reload the webapp... indefinitely. So I guess this is looking more like a Tomcat problem more than a Lucene/Solr problem huh? -Sangraal On 7/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: So it looks like your client is hanging trying to send something over the socket to the server and blocking... probably because Tomcat isn't reading anything from the socket because it's busy trying to restart the webapp. What is the heap size of the server? try increasing it... maybe Tomcat could have detected low memory and tried to reload the webapp. -Yonik On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote: > Thanks for your help Yonik, I've responded to your questions below: > > On 7/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: > > > > It's possible it's not hanging, but just takes a long time on a > > specific add. This is because Lucene will occasionally merge > > segments. When very large segments are merged, it can take a long > > time. > > > I've left it running (hung) for up to a half hour at a time and I've > verified that my cpu idles during the hang. I have witnessed much shorter > hangs on the ramp up to my 6,144 limit but they have been more like 2 - 10 > seconds in length. Perhaps this is the Lucene merging you mentioned. > > In the log file, add commands are followed by the number of > > milliseconds the operation took. Next time Solr hangs, wait for a > > number of minutes until you see the operation logged and note how long > > it took. > > > Here are the last 5 log entries before the hang; the last one is doc #6,144. > Also it looks like Tomcat is trying to redeploy the webapp; those last Tomcat > entries repeat indefinitely every 10 seconds or so. 
Perhaps this is a Tomcat > problem? > > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update > INFO: add (id=110705) 0 36596 > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update > INFO: add (id=110700) 0 36600 > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update > INFO: add (id=110688) 0 36603 > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update > INFO: add (id=110690) 0 36608 > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update > INFO: add (id=110686) 0 36611 > Jul 26, 2006 1:25:36 PM org.apache.catalina.startup.HostConfigcheckResources > FINE: Checking context[] redeploy resource /source/solr/apache- tomcat-5.5.17 > /webapps/ROOT > Jul 26, 2006 1:25:36 PM org.apache.catalina.startup.HostConfigcheckResources > FINE: Checking context[] redeploy resource /source/solr/apache- tomcat-5.5.17 > /webapps/ROOT/META-INF/context.xml > Jul 26, 2006 1:25:36 PM org.apache.catalina.startup.HostConfigcheckResources > FINE: Checking context[] reload resource /source/solr/apache- tomcat-5.5.17 > /webapps/ROOT/WEB-INF/web.xml > Jul 26, 2006 1:25:36 PM org.apache.catalina.startup.HostConfigcheckResources > FINE: Checking context[] reload resource /source/solr/apache- tomcat-5.5.17 > /webapps/ROOT/META-INF/context.xml > Jul 26, 2006 1:25:36 PM org.apache.catalina.startup.HostConfigcheckResources > FINE: Checking context[] reload resource /source/solr/apache- tomcat-5.5.17 > /conf/context.xml > > How many documents are in the index before you do a batch that causes > > a hang? Does it happen on the first batch? If so, you might be > > seeing some other bug. What appserver are you using? Do the admin > > pages respond when you see this hang? If so, what does a stack trace > > look like? > > > I actually don't think I had the problem on the first batch, in fact my > first batch contained very close to 6,144 documents so perhaps there is a > relation there. Right now, I'm adding to an index with close to 90,000 > documents in it. 
> I'm running Tomcat 5.5.17 and the admin pages respond just fine when it's
> hung... I did a thread dump and this is the trace of my update:
>
> "http-8080-Processor25" Id=33 in RUNNABLE (running in native)
> total cpu time=6330.7360ms user time=5769.5920ms
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
> at java.io.PrintStream.write(PrintStream.java:412)
> at java.io.ByteArrayOutputStream.writeTo(Byte
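Yonik's suggestion above, watching the per-add timings in the log, is easy to script. A minimal sketch, assuming the `INFO: add (id=...) <status> <ms>` line shape visible in the log excerpt and that the trailing number is the time in milliseconds (the threshold value is an arbitrary illustration):

```python
import re

# Matches log lines like: "INFO: add (id=110705) 0 36596"
ADD_RE = re.compile(r"add \(id=(\d+)\) \d+ (\d+)")

def slow_adds(lines, threshold_ms=5000):
    """Return (doc_id, ms) pairs for add operations at or above threshold_ms."""
    out = []
    for line in lines:
        m = ADD_RE.search(line)
        if m:
            doc_id, ms = m.group(1), int(m.group(2))
            if ms >= threshold_ms:
                out.append((doc_id, ms))
    return out
```

Run over a tailed log file, this would flag adds that coincide with large segment merges versus a genuine hang.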
Re: Doc add limit
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
at java.lang.Thread.run(Thread.java:613)

-Yonik

On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> Hey there... I'm having an issue with large doc updates on my Solr
> installation. I'm adding batches of between 2 and 20,000 docs at a time, and I've
> noticed Solr seems to hang at 6,144 docs every time. Breaking the adds into
> smaller batches works just fine, but I was wondering if anyone knew why this
> would happen. I've tried doubling memory as well as tweaking various config
> options, but nothing seems to let me break the 6,144 barrier.
>
> This is the output from the Solr admin. Any help would be greatly appreciated.
>
> name: updateHandler
> class: org.apache.solr.update.DirectUpdateHandler2
> version: 1.0
> description: Update handler that efficiently directly updates the on-disk main lucene index
> stats:
> commits : 0
> optimizes : 0
> docsPending : 6144
> deletesPending : 6144
> adds : 6144
> deletesById : 0
> deletesByQuery : 0
> errors : 0
> cumulative_adds : 6144
> cumulative_deletesById : 0
> cumulative_deletesByQuery : 0
> cumulative_errors : 0
> docsDeleted : 0
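For anyone wanting to capture a thread dump like the ones in this thread: a hedged sketch of two common ways to get one from a running Tomcat JVM. It assumes a JDK 5+ install with `jstack` on the PATH, and that `pgrep -f catalina` matches only the Tomcat java process (adjust the pattern for your setup):

```shell
# Find the Tomcat JVM's pid (assumption: a single matching process).
PID=$(pgrep -f catalina)

# Option 1: jstack prints the stack traces of all threads to stdout.
jstack "$PID" > threads.txt

# Option 2: SIGQUIT asks the JVM to dump all thread stacks to its stdout
# (usually logs/catalina.out for Tomcat) without stopping the process.
kill -QUIT "$PID"
```

Taking two dumps a few seconds apart and comparing them is a quick way to tell a genuinely stuck thread (same frame, e.g. `socketWrite0`) from one that is merely slow.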
Doc add limit
Hey there... I'm having an issue with large doc updates on my Solr installation. I'm adding batches of between 2 and 20,000 docs at a time, and I've noticed Solr seems to hang at 6,144 docs every time. Breaking the adds into smaller batches works just fine, but I was wondering if anyone knew why this would happen. I've tried doubling memory as well as tweaking various config options, but nothing seems to let me break the 6,144 barrier.

This is the output from the Solr admin. Any help would be greatly appreciated.

name: updateHandler
class: org.apache.solr.update.DirectUpdateHandler2
version: 1.0
description: Update handler that efficiently directly updates the on-disk main lucene index
stats:
commits : 0
optimizes : 0
docsPending : 6144
deletesPending : 6144
adds : 6144
deletesById : 0
deletesByQuery : 0
errors : 0
cumulative_adds : 6144
cumulative_deletesById : 0
cumulative_deletesByQuery : 0
cumulative_errors : 0
docsDeleted : 0
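Since breaking the adds into smaller batches works, one practical workaround is to chunk the document stream before posting. A minimal sketch; the single `id` field, the batch size of 1000, and the `http://localhost:8983/solr/update` endpoint are illustrative assumptions (real documents would also need XML-escaping):

```python
import itertools

def batches(docs, size):
    """Yield successive chunks of at most `size` items from docs."""
    it = iter(docs)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk

def add_xml(doc_ids):
    """Render a Solr <add> body for one batch (id field only, for brevity)."""
    docs = "".join(
        '<doc><field name="id">%s</field></doc>' % i for i in doc_ids
    )
    return "<add>%s</add>" % docs

# Posting each batch (hypothetical endpoint) would then look like:
#
#   import urllib.request
#   for chunk in batches(all_ids, 1000):
#       req = urllib.request.Request(
#           "http://localhost:8983/solr/update",
#           data=add_xml(chunk).encode("utf-8"),
#           headers={"Content-Type": "text/xml"},
#       )
#       urllib.request.urlopen(req).read()
```

Keeping each POST small also limits how much data the client has buffered in the socket if the server side stalls, which is exactly the blocked `socketWrite0` failure mode discussed earlier in the thread.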