[ 
https://issues.apache.org/jira/browse/DERBY-3907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12639082#action_12639082
 ] 

Kristian Waagan commented on DERBY-3907:
----------------------------------------

[Header format]

Mike wrote:
-----
What does the following mean? Will the changes apply to all sql which inserts 
clobs, or to only particular jdbc interfaces?
1) Clob modifications are done on a copy (i.e. TemporaryClob).
-----
By Clob modifications I mean updates to parts of an existing Clob. To get 
into this state, one must first do a select to fetch a Clob that is already 
stored in the database. I think updating parts of a Clob can only be done 
through the Clob interface. Is that correct?

The ResultSet.updateXXX methods can be seen as inserting a new Clob.
My current hope is that all insertions will go through ReaderToUTF8Stream, 
which seems like a good place to count characters (and bytes) and obtain 
bytes-per-char statistics.
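
To illustrate the kind of counting meant above, here is a minimal sketch. It 
is not Derby's ReaderToUTF8Stream; the class ClobEncodingStats and all of its 
members are hypothetical names, and it assumes the data is encoded as 
modified UTF-8 (the encoding described for java.io.DataInput), where each 
Java char takes one, two or three bytes.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only; not part of Derby.
final class ClobEncodingStats {
    long charCount;
    long byteCount;
    int minBytesPerChar = Integer.MAX_VALUE;
    int maxBytesPerChar = 0;

    // Bytes needed for one Java char in modified UTF-8 (assumed encoding).
    static int encodedLength(int c) {
        if (c >= 0x0001 && c <= 0x007F) {
            return 1;
        }
        if (c <= 0x07FF) {
            return 2; // includes U+0000, which modified UTF-8 stores in two bytes
        }
        return 3;
    }

    // Reads the whole stream once, counting chars and encoded bytes.
    static ClobEncodingStats scan(Reader in) {
        ClobEncodingStats stats = new ClobEncodingStats();
        try {
            int c;
            while ((c = in.read()) != -1) {
                int n = encodedLength(c);
                stats.charCount++;
                stats.byteCount += n;
                stats.minBytesPerChar = Math.min(stats.minBytesPerChar, n);
                stats.maxBytesPerChar = Math.max(stats.maxBytesPerChar, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return stats;
    }

    // True when every char seen so far used the same number of bytes.
    boolean isFixedWidth() {
        return charCount > 0 && minBytesPerChar == maxBytesPerChar;
    }
}
```

In a real insertion path the counting would happen while the bytes are being 
produced, so no extra pass over the data would be needed.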

There might be a slight complication as we allow using setString on Clob 
columns.

-----
What is the expected call sequence to store, and what is the goal performance 
characteristic?
-----
The expected call sequence is exactly as you describe it (see Mike's comment 
from 10/Oct/08 10:10 AM).
Depending on the information we need to obtain, the header can be written up 
front or as the last step of insertion. Even if we only store length 
information, we need to support the latter because of the lengthless JDBC 
methods.
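
The two ways of writing the header can be sketched as follows. The header 
format is still undecided, so everything here is an assumption made for 
illustration: the class LengthHeaderSketch, the 8-byte header holding only a 
character count, and the in-memory byte array standing in for the store. A 
negative knownLength models the lengthless JDBC methods.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical sketch; the real header format has not been decided.
final class LengthHeaderSketch {
    static final int HEADER_BYTES = 8; // assumed: one long holding the char count

    // Encodes the stream after a header. If knownLength is negative, the
    // header is written as a placeholder and patched as the last step.
    static byte[] store(Reader in, long knownLength) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] header = new byte[HEADER_BYTES];
        ByteBuffer.wrap(header).putLong(0, Math.max(knownLength, 0L));
        out.write(header, 0, HEADER_BYTES);

        long chars = 0;
        try {
            int c;
            while ((c = in.read()) != -1) {
                writeModifiedUtf8(out, c);
                chars++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        byte[] stored = out.toByteArray();
        if (knownLength < 0) {
            // Length was unknown up front: fill in the header last.
            ByteBuffer.wrap(stored).putLong(0, chars);
        }
        return stored;
    }

    // Finds the length by decoding only the header bytes.
    static long readLength(byte[] stored) {
        return ByteBuffer.wrap(stored).getLong(0);
    }

    // Writes one Java char as modified UTF-8 (assumed on-disk encoding).
    private static void writeModifiedUtf8(ByteArrayOutputStream out, int c) {
        if (c >= 0x0001 && c <= 0x007F) {
            out.write(c);
        } else if (c <= 0x07FF) {
            out.write(0xC0 | (c >> 6));
            out.write(0x80 | (c & 0x3F));
        } else {
            out.write(0xE0 | (c >> 12));
            out.write(0x80 | ((c >> 6) & 0x3F));
            out.write(0x80 | (c & 0x3F));
        }
    }
}
```

Either way, readLength touches only the first HEADER_BYTES bytes, no matter 
how large the value is.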

The goal performance characteristic for the length operation is that getting 
the length for the largest storable Clob should be as fast as for the shortest 
one (read first page and decode stream header bytes). This is not the case 
today, because the Clob data must be decoded to find the length. Apart from 
Clob.getLength, this also hurts us wherever other methods do argument 
checking using the Clob length.

Positioning can be expressed with costs like this:
[reset stream] + decode_chars + skip_bytes  
In certain cases, we can remove the decoding cost by knowing that all chars 
are represented by the same number of bytes (one, two or three). In these 
cases, the positioning cost should be the same as for Blob. This is the 
motivation for the bytes per char information.
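
As a rough sketch of the difference, the two positioning strategies could 
look like this. ClobPositioning and its methods are hypothetical names, 
positions are 1-based as in JDBC, headerBytes is whatever size the header 
ends up having, and the data is assumed to be modified UTF-8.

```java
// Hypothetical sketch for illustration only; not part of Derby.
final class ClobPositioning {

    // Fixed width: the byte offset is computed directly, as for a Blob.
    static long byteOffset(long charPos, int bytesPerChar, int headerBytes) {
        if (charPos < 1) {
            throw new IllegalArgumentException("positions are 1-based");
        }
        if (bytesPerChar < 1 || bytesPerChar > 3) {
            throw new IllegalArgumentException("expected 1, 2 or 3 bytes per char");
        }
        return headerBytes + (charPos - 1) * (long) bytesPerChar;
    }

    // Variable width: every preceding char must be decoded to find the offset.
    static long byteOffsetByDecoding(byte[] stored, long charPos, int headerBytes) {
        if (charPos < 1) {
            throw new IllegalArgumentException("positions are 1-based");
        }
        int offset = headerBytes;
        for (long skipped = 1; skipped < charPos; skipped++) {
            if (offset >= stored.length) {
                throw new IllegalArgumentException("position past end of data");
            }
            int lead = stored[offset] & 0xFF;
            if ((lead & 0x80) == 0) {
                offset += 1;          // 0xxxxxxx: one-byte char
            } else if ((lead & 0xE0) == 0xC0) {
                offset += 2;          // 110xxxxx: two-byte char
            } else {
                offset += 3;          // 1110xxxx: three-byte char
            }
        }
        return offset;
    }
}
```

The first method costs the same for any position, while the cost of the 
second grows with the number of chars that have to be skipped.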

> Save useful length information for Clobs in store
> -------------------------------------------------
>
>                 Key: DERBY-3907
>                 URL: https://issues.apache.org/jira/browse/DERBY-3907
>             Project: Derby
>          Issue Type: Improvement
>          Components: JDBC, Store
>    Affects Versions: 10.5.0.0
>            Reporter: Kristian Waagan
>            Assignee: Kristian Waagan
>
> The store should save useful length information for Clobs. This allows the 
> length to be found without decoding the whole data stream.
> The following thread raised the issue of what information to store, and also 
> contains some background information: 
> http://www.nabble.com/Storing-length-information-for-CLOB-on-disk-tp19197535p19197535.html
> The information to store, and the exact format of it, is still to be 
> discussed/determined.
> Currently two bytes are set aside for length information, which is inadequate.
