Hi all,
    When post.jar is used to post very large XML files to Solr, for
example
"java -Xmx1g -Durl=http://localhost/solr/update -jar post.jar
sample.xml",
it uses a lot of Java heap space. In our case, we post XML files
larger than 1GB.
    The reason is that the UpdateHandler in Solr does a lot of work:
it uses XMLStreamReader to parse the XML, constructs documents, and
adds them to the index. SimplePostTool, by contrast, simply reads the
file and sends it with an HTTP POST. When the UpdateHandler is busy,
it does not read from the system input buffer, so SimplePostTool
cannot write to the system output buffer. SimplePostTool then
accumulates the data in its own Java heap and wastes memory. In a
C-like language on Linux, we could control the speed of the sender
and receiver.
    After some testing, we improved SimplePostTool so that it uses
little memory even when posting large files.
    The key statement is:
      urlc.setFixedLengthStreamingMode(len);
    It is added after:
      urlc.setAllowUserInteraction(false);
      urlc.setRequestProperty("Content-type", "text/xml; charset=" + POST_ENCODING);

    where len is the length of the XML file in bytes, obtained from
new File("xml file").length().

    Because the file is encoded in UTF-8 and the post encoding is also
UTF-8, the size of the file on disk equals the length of the request
body. So we can set the HTTP Content-Length header without reading the
whole file into memory to calculate its size.
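
    For reference, here is a minimal sketch of the change as a
standalone class. The class name StreamingPost, the postFile method
and the 8 KB copy buffer are illustrative assumptions and not the
actual SimplePostTool source; only the three urlc calls quoted above
come from our change. As far as I know, HttpURLConnection buffers the
whole request body in memory when no streaming mode is set, so that it
can compute Content-Length itself, and declaring the length up front
avoids that. The sketch passes file.length() as a long, which assumes
the setFixedLengthStreamingMode(long) overload from Java 7 or later;
with the int overload the body is limited to about 2GB.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class StreamingPost {
    // Assumed to match the encoding of the file on disk.
    static final String POST_ENCODING = "UTF-8";

    /** Streams the file to the given URL and returns the HTTP status code. */
    public static int postFile(File file, URL url) throws IOException {
        HttpURLConnection urlc = (HttpURLConnection) url.openConnection();
        urlc.setRequestMethod("POST");
        urlc.setDoOutput(true);
        urlc.setUseCaches(false);
        urlc.setAllowUserInteraction(false);
        urlc.setRequestProperty("Content-type", "text/xml; charset=" + POST_ENCODING);
        // Declare the body length before opening the output stream so the
        // body is streamed to the socket instead of being buffered on the heap.
        urlc.setFixedLengthStreamingMode(file.length());

        byte[] buf = new byte[8192];
        try (InputStream in = new FileInputStream(file);
             OutputStream out = urlc.getOutputStream()) {
            int n;
            // Copy the file in small chunks; only buf is held in memory.
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        try {
            return urlc.getResponseCode();
        } finally {
            urlc.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java StreamingPost <file.xml> <update-url>");
            return;
        }
        int status = postFile(new File(args[0]), new URL(args[1]));
        System.out.println("HTTP status: " + status);
    }
}
```

    It could be run as
"java StreamingPost sample.xml http://localhost/solr/update".
Note that the number of bytes written must match the declared length
exactly, otherwise closing the output stream fails with an
IOException, so the file must not change while it is being posted.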
