[ http://issues.apache.org/jira/browse/DERBY-302?page=all ]
Sunitha Kambhampati updated DERBY-302: -------------------------------------- Attachment: fixclobperf_0609.txt Currently in derby, for an insert on a clob using setCharacterStream what will happen is , the entire stream will be materialized into a char array and sent to store for the insert. ( we should not have to stream here. I will file another jira issue for this and put in all information I learnt) Given this is how inserts for large clobs are happening, the performance issue analysis is as follows: -- profiler run shows that most time is spent in SQLChar.readExternal which is where the materialization into a char array for the user's input stream happens. The growth of this array happens gradually till the entire stream is materialized into the array. Below code snippet shows by how much the array is grown each time when it realizes it has to read more bytes from the stream. The dvd hierarchy for clob is - SQLClob ( dvd) extends SQLVarChar extends SQLChar. So in SQLChar.readExternal ........ int growby = in.available(); if(growby < 64) growby = 64 and then an allocation and an arraycopy to the new allocated array. -- In the code snippet, 'in' is the wrapper around the user's stream which is ReaderToUTF8Stream . ReaderToUTF8Stream extends InputStream and does not override available() method . As per the spec, InputStream.available() returns 0. -- Thus each time, the array growth is by 64 bytes which is obviously not performant. so for a 500k clob insert, this would mean allocation & arraycopy steps happen ~8000 times. -- The ReaderToUTF8Stream that has the user's stream reads from the stream and does the utf8 conversion and puts it in a 4k array. I think it is reasonable to have a 32k buffer to store this information for clobs. Although I think there seems to be more possible optimizations in this area, I prefer the incremental approach too :) so this patch is a first step towards fixing the insert clob performance in the current system. Fix includes: -- enhanced the way the array was grown to keep the original 64 bytes for char ( seems reasonable given the upper limit for char) but override it to have 4k for varchar and clobs. -- override available() in ReaderToUTF8Stream to return a better estimate of how many bytes can be read. svn stat M java\engine\org\apache\derby\impl\jdbc\ReaderToUTF8Stream.java M java\engine\org\apache\derby\iapi\services\io\LimitReader.java M java\engine\org\apache\derby\iapi\types\SQLChar.java M java\engine\org\apache\derby\iapi\types\SQLVarchar.java -- ran derbyall ok with sun jvm. -- I can add a test and compare times but I realize that is probably not the best solution here. It should ideally be part of a performance regression suite. Numbers for clob inserts in seconds for one insert on my laptop - as per the jira issue. With fix , times are in seconds for 1 insert on a clob on my laptop (windows, 1G ram, 1.6Ghz Intel Pentium(M) ) FileSize ibm jvm 1.4.2 sun jvm 1.4.2 sun jvm 1.5 500k 0.9s 1.6s 1.7s 1M 2.1s 4s 5s 2M 3s 9s 11s 4M 7s 18s 22s Without the fix, 500k with sun jvm takes 3 mins and ibm jvm takes 20 seconds. I will add the test program along with the input files to jira issue. _________________________ Without this fix : As I already mentioned in the jira comment for derby302, I changed the program in the attached jira entry to use BufferedReader with the buffersize set to a bigger value than the default ( to 64k) brought down the times for sun jvm closer to ibm jvm. I noticed that in my test, if I ran the test multiple times and did multiple inserts the performance of sun jvm and ibm jvm for 500k clob was around 20 seconds - guess the jit kicks in , plus the OS cache may also be a factor.. ________________________ > Takes over 3 minutes to insert a 500kb String into CLOB > ------------------------------------------------------- > > Key: DERBY-302 > URL: http://issues.apache.org/jira/browse/DERBY-302 > Project: Derby > Type: Bug > Components: SQL > Environment: Windows XP Professional, Dell Pentium IV > Reporter: Glenn Orbon > Assignee: Sunitha Kambhampati > Attachments: fixclobperf_0609.txt > > I downloaded a Cloudescape Version 10 from here > http://www-106.ibm.com/developerworks/db2/library/techarticle/dm-0408cline/index.html. > Not sure exactly which version of 10 this is. > Would like to store up to a 10MB XML string in a CLOB. However, I noticed it > took over 3 minutes to insert a 500kb string. I am using the > PreparedStatement to get around the 37kb limitation encountered in a previous > issue. It takes less than 2 seconds to insert a 500kb string into a CLOB in > mySQl. > Here the snippet: > FileReader fr = new FileReader ("sample.txt"); > BufferedReader br = new BufferedReader(fr); > String record = null; > try { > while ( (record=br.readLine()) != null ) { > bufferStr.append( record ); > } > } catch (IOException e) { > // > // put your error-handling code here > // > System.out.println("Error reading file"); > } > System.out.println("Size of inputStr: "+bufferStr.length() ); > > PreparedStatement ps = mm.connection.prepareStatement("INSERT into > documents VALUES (?,?)" ); > ps.setInt(1, 13 ); > StringReader reader1 = new StringReader(bufferStr.toString()); > ps.setCharacterStream(2, reader1, bufferStr.length()); > System.out.println("Uploading string...."); > ps.execute(); > System.out.println("Done uploading string..."); > mm.connection.commit(); > Thanks for your the help, > Glenn O. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira