Hi all,
When post.jar is used to post very large XML files to Solr, e.g.
"java -Xmx1g -Durl=http://localhost/solr/update -jar post.jar
sample.xml",
it uses a lot of Java heap space. In our case, we post XML
files larger than 1GB.
The UpdateHandler in Solr does a lot of work per request: it uses
XMLStreamReader to parse the XML file, constructs documents, and adds
them to the index. SimplePostTool, by contrast, simply reads the file
and sends it with an HTTP POST. While the UpdateHandler is busy, it
does not read from the socket's receive buffer, so the sender cannot
push data out any faster than the server consumes it. The data ends up
in SimplePostTool's own Java heap instead: unless a streaming mode is
set, HttpURLConnection buffers the whole request body in memory before
sending it, which wastes memory in proportion to the file size. In a
C-like language on Linux, we could control the pace of the sender and
receiver directly.
After some testing, we improved SimplePostTool so that it uses
little memory even when posting large files.
The key statement is:
urlc.setFixedLengthStreamingMode(len);
It is added after:
urlc.setAllowUserInteraction(false);
urlc.setRequestProperty("Content-type", "text/xml; charset=" +
POST_ENCODING);
where len is the length of the XML file, obtained from
new File("xml file").length().
Because the file is encoded in UTF-8 and the post encoding is also
UTF-8, the byte length on disk is exactly the length of the request
body, so we can set the HTTP Content-Length header without reading
the whole file into memory to calculate its size.
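
For anyone who wants to try this outside of post.jar, here is a
minimal sketch of the idea, not the actual SimplePostTool patch. The
class name StreamingPost, the method postFile and the 8 KB copy buffer
are my own choices for illustration. It assumes Java 7 or later, where
setFixedLengthStreamingMode has a long overload; the int overload
cannot express lengths above about 2GB.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;

public class StreamingPost {
    // Assumed to match the encoding of the file on disk.
    static final String POST_ENCODING = "UTF-8";

    /** Streams the file to the given URL and returns the HTTP status code. */
    static int postFile(URL url, File file) throws IOException {
        HttpURLConnection urlc = (HttpURLConnection) url.openConnection();
        urlc.setRequestMethod("POST");
        urlc.setDoOutput(true);
        urlc.setUseCaches(false);
        urlc.setAllowUserInteraction(false);
        urlc.setRequestProperty("Content-type", "text/xml; charset=" + POST_ENCODING);
        // The key statement: declare the body length up front so the
        // connection writes through to the socket instead of buffering
        // the whole body in the heap.
        urlc.setFixedLengthStreamingMode(file.length());

        try (InputStream in = Files.newInputStream(file.toPath());
             OutputStream out = urlc.getOutputStream()) {
            byte[] buf = new byte[8192]; // only this much file data is held at once
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        try {
            return urlc.getResponseCode();
        } finally {
            urlc.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java StreamingPost <update-url> <xml-file>");
            return;
        }
        int status = postFile(URI.create(args[0]).toURL(), new File(args[1]));
        System.out.println("HTTP status: " + status);
    }
}
```

It can be run as, for example,
"java StreamingPost http://localhost/solr/update sample.xml".
Note that the number of bytes written must match the declared length
exactly, otherwise the connection throws an IOException. That is why
this only works when the file is posted byte for byte, with no
re-encoding on the way.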