Hi all,

When using post.jar to post very large XML files to Solr, for example

    java -Xmx1g -Durl=http://localhost/solr/update -jar post.jar sample.xml

it uses a lot of Java heap space. In our case we post XML files larger than 1 GB.

The UpdateHandler in Solr does a lot of work: it uses XMLStreamReader to parse the XML file, constructs documents, and adds them to the index. SimplePostTool, in contrast, simply reads the file and sends it with an HTTP POST. When UpdateHandler is busy, it does not read data from the system input buffer, so SimplePostTool can't write data to the system output buffer. As a result, SimplePostTool holds the data in its own Java heap and wastes memory. (When no streaming mode is set, HttpURLConnection buffers the request body in memory in any case, so that it can compute the Content-Length header.) In a C-like language on Linux, we could control the speed of the sender and receiver directly.

After some testing, we improved SimplePostTool so that it uses little memory even when posting large files. The key statement is:

    urlc.setFixedLengthStreamingMode(len);

It is added after these lines:

    urlc.setAllowUserInteraction(false);
    urlc.setRequestProperty("Content-type", "text/xml; charset=" + POST_ENCODING);
where len is the length of the XML file, obtained with new File("xml file").length(). Because the file is encoded in UTF-8 and the POST encoding is also UTF-8, the file size in bytes is the same as the size of the request body, so we can set the HTTP Content-Length header without reading the whole file into memory to calculate its size.
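For reference, the change can be sketched as a small standalone program. This is only an illustration under some assumptions, not the actual SimplePostTool source: the class name StreamingPostSketch and the methods pipe and postFile are made up for this example, and it assumes Java 7 or later (for try-with-resources and the long overload of setFixedLengthStreamingMode).

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

// Hypothetical sketch; not the real SimplePostTool.
class StreamingPostSketch {
    static final String POST_ENCODING = "UTF-8";

    /** Copies in to out through a small fixed buffer and returns the byte count. */
    public static long pipe(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    /** Posts the file to the URL without buffering the whole body in the heap. */
    public static int postFile(File file, URL url) throws IOException {
        HttpURLConnection urlc = (HttpURLConnection) url.openConnection();
        urlc.setRequestMethod("POST");
        urlc.setDoOutput(true);
        urlc.setUseCaches(false);
        urlc.setAllowUserInteraction(false);
        urlc.setRequestProperty("Content-type", "text/xml; charset=" + POST_ENCODING);
        // The key statement: announce the body length up front so the
        // connection streams the bytes instead of collecting them in memory.
        // Assumes the file is not modified while it is being posted.
        urlc.setFixedLengthStreamingMode(file.length());

        try (InputStream in = new FileInputStream(file);
             OutputStream out = urlc.getOutputStream()) {
            pipe(in, out);
        }
        int status = urlc.getResponseCode();
        urlc.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java StreamingPostSketch <update-url> <xml-file>");
            return;
        }
        int status = postFile(new File(args[1]), URI.create(args[0]).toURL());
        System.out.println("HTTP status: " + status);
    }
}
```

After compiling, it would be run as "java StreamingPostSketch http://localhost/solr/update sample.xml". On an older JVM that only has setFixedLengthStreamingMode(int), a file over 2 GB would not fit in the argument; setChunkedStreamingMode is a possible alternative there, provided the server accepts chunked requests.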