[ http://issues.apache.org/jira/browse/DERBY-302?page=all ]

Sunitha Kambhampati updated DERBY-302:
--------------------------------------

    Attachment: fixclobperf_0609.txt

Currently in Derby, an insert into a CLOB using setCharacterStream works as 
follows: the entire stream is materialized into a char array and sent to store 
for the insert. (We should not have to materialize the stream here. I will file 
another JIRA issue for this and include all the information I learned.)

Given that this is how inserts of large CLOBs happen, the performance analysis 
is as follows:
-- A profiler run shows that most of the time is spent in SQLChar.readExternal, 
which is where the user's input stream is materialized into a char array. The 
array grows gradually until the entire stream has been materialized. The code 
snippet below shows by how much the array is grown each time readExternal 
realizes it has to read more bytes from the stream.
 
The DataValueDescriptor (dvd) hierarchy for CLOB is: SQLClob extends SQLVarchar 
extends SQLChar.

So in SQLChar.readExternal:

    int growby = in.available();
    if (growby < 64)
        growby = 64;

followed by an allocation and an arraycopy into the newly allocated array.

-- In the code snippet, 'in' is ReaderToUTF8Stream, the wrapper around the 
user's stream. ReaderToUTF8Stream extends InputStream and does not override the 
available() method. Per the spec, the default InputStream.available() 
implementation returns 0.

-- Thus each time, the array grows by only 64 bytes, which is obviously not 
performant. For a 500k CLOB insert, this means the allocation and arraycopy 
steps happen ~8000 times.
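The growth pattern above can be sketched as follows (a minimal illustration, 
not Derby code; class and method names are mine). It counts the 
allocation+arraycopy steps needed when available() always reports 0 and the 
fallback increment is 64, versus a larger increment:

```java
// Hypothetical sketch of the growth cost described above: when the wrapper
// stream's available() returns 0, SQLChar.readExternal falls back to a
// minimum increment, and each growth step costs an allocation + arraycopy.
public class GrowthCost {
    // Number of allocation+arraycopy steps needed to materialize 'target'
    // chars when available() reports 0 and 'chunk' is the minimum increment.
    static int growthSteps(int target, int chunk) {
        int capacity = 0;
        int steps = 0;
        while (capacity < target) {
            int growby = 0;       // in.available() == 0 for the wrapper stream
            if (growby < chunk)
                growby = chunk;   // fall back to the minimum increment
            capacity += growby;   // allocation + arraycopy happens here
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        // 500k with the 64-byte minimum: ~8000 copy steps
        System.out.println(growthSteps(500 * 1024, 64));       // prints 8000
        // 500k with a 4k minimum instead: 125 copy steps
        System.out.println(growthSteps(500 * 1024, 4 * 1024)); // prints 125
    }
}
```

This is where the ~8000 figure above comes from: 500 * 1024 / 64 = 8000.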

-- ReaderToUTF8Stream reads from the user's stream, does the UTF-8 conversion, 
and puts the result into a 4k array. I think it is reasonable to use a 32k 
buffer for CLOBs instead.

Although there seem to be more possible optimizations in this area, I prefer 
the incremental approach too :) so this patch is a first step towards fixing 
CLOB insert performance in the current system.

Fix includes:
-- Enhanced the way the array is grown: keep the original 64-byte increment 
for SQLChar (which seems reasonable given the upper limit for CHAR), but 
override it to 4k for VARCHAR and CLOB.
-- Override available() in ReaderToUTF8Stream to return a better estimate of 
how many bytes can be read.
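The idea behind the available() override can be sketched like this (a 
hypothetical illustration with names of my own; the real ReaderToUTF8Stream in 
the attached patch differs): report the length the caller passed to 
setCharacterStream, so readExternal can size its array sensibly instead of 
growing by 64 bytes at a time.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;

// Hypothetical sketch (names are mine, not the patch's) of overriding
// available() in a Reader-to-InputStream wrapper.
class EstimatingStream extends InputStream {
    private final Reader reader; // the user's character stream
    private final int length;    // length passed to setCharacterStream

    EstimatingStream(Reader reader, int length) {
        this.reader = reader;
        this.length = length;
    }

    @Override
    public int read() throws IOException {
        // The real ReaderToUTF8Stream performs UTF-8 encoding here; this
        // sketch just passes characters through (correct only for ASCII).
        return reader.read();
    }

    // InputStream.available() returns 0 by default; return a lower-bound
    // estimate instead (each char encodes to at least one UTF-8 byte).
    @Override
    public int available() {
        return length;
    }
}
```

With this estimate, the growby computation in SQLChar.readExternal sees the 
full expected size on the first call rather than 0.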

svn stat
M      java\engine\org\apache\derby\impl\jdbc\ReaderToUTF8Stream.java
M      java\engine\org\apache\derby\iapi\services\io\LimitReader.java
M      java\engine\org\apache\derby\iapi\types\SQLChar.java
M      java\engine\org\apache\derby\iapi\types\SQLVarchar.java

-- Ran derbyall successfully with the Sun JVM.

-- I could add a test that compares times, but that is probably not the best 
solution here. It should ideally be part of a performance regression suite.

With the fix, times in seconds for one CLOB insert on my laptop (Windows, 
1G RAM, 1.6GHz Intel Pentium M), as per the JIRA issue:

FileSize    IBM JVM 1.4.2    Sun JVM 1.4.2    Sun JVM 1.5
500k        0.9s             1.6s             1.7s
1M          2.1s             4s               5s
2M          3s               9s               11s
4M          7s               18s              22s


Without the fix, the 500k insert takes over 3 minutes with the Sun JVM and 
about 20 seconds with the IBM JVM.
I will attach the test program along with the input files to the JIRA issue.
_________________________
Without this fix: as I already mentioned in the JIRA comments for DERBY-302, 
changing the program in the attached JIRA entry to use a BufferedReader with 
the buffer size set to a larger value than the default (64k) brought the Sun 
JVM times down close to the IBM JVM's. I also noticed that if I ran the test 
multiple times with multiple inserts, the 500k CLOB performance of both the 
Sun and IBM JVMs settled around 20 seconds - I guess the JIT kicks in, plus 
the OS cache may also be a factor.
________________________
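The client-side workaround described above can be sketched as follows (an 
illustrative fragment with names of my own, not the exact test program): wrap 
the user's Reader in a BufferedReader with a 64k buffer before passing it to 
setCharacterStream.

```java
import java.io.BufferedReader;
import java.io.Reader;

// Sketch of the BufferedReader workaround: supply a 64k buffer instead of
// relying on the default size, so reads from the underlying source happen
// in large chunks. Class and method names here are illustrative.
public class BufferedClobSource {
    static Reader buffered(Reader userReader) {
        return new BufferedReader(userReader, 64 * 1024); // 64k buffer
    }
}
```

The resulting Reader would then be passed to ps.setCharacterStream in place of 
the raw FileReader or StringReader.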


> Takes over 3 minutes to insert a 500kb String into CLOB
> -------------------------------------------------------
>
>          Key: DERBY-302
>          URL: http://issues.apache.org/jira/browse/DERBY-302
>      Project: Derby
>         Type: Bug
>   Components: SQL
>  Environment: Windows XP Professional, Dell Pentium IV
>     Reporter: Glenn Orbon
>     Assignee: Sunitha Kambhampati
>  Attachments: fixclobperf_0609.txt
>
> I downloaded Cloudscape Version 10 from 
> http://www-106.ibm.com/developerworks/db2/library/techarticle/dm-0408cline/index.html.
> Not sure exactly which version of 10 this is.
> I would like to store up to a 10MB XML string in a CLOB. However, I noticed 
> it took over 3 minutes to insert a 500kb string. I am using a 
> PreparedStatement to get around the 37kb limitation encountered in a 
> previous issue. It takes less than 2 seconds to insert a 500kb string into a 
> CLOB in MySQL.
> Here is the snippet:
>       FileReader fr = new FileReader("sample.txt");
>       BufferedReader br = new BufferedReader(fr);
>       StringBuffer bufferStr = new StringBuffer();
>       String record = null;
>       try {
>         while ((record = br.readLine()) != null) {
>           bufferStr.append(record);
>         }
>       } catch (IOException e) {
>         // put your error-handling code here
>         System.out.println("Error reading file");
>       }
>       System.out.println("Size of inputStr: " + bufferStr.length());
>
>       PreparedStatement ps = mm.connection.prepareStatement(
>           "INSERT INTO documents VALUES (?,?)");
>       ps.setInt(1, 13);
>       StringReader reader1 = new StringReader(bufferStr.toString());
>       ps.setCharacterStream(2, reader1, bufferStr.length());
>       System.out.println("Uploading string....");
>       ps.execute();
>       System.out.println("Done uploading string...");
>       mm.connection.commit();
> Thanks for the help,
> Glenn O.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
