[ 
https://issues.apache.org/jira/browse/DERBY-3907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638538#action_12638538
 ] 

Kristian Waagan commented on DERBY-3907:
----------------------------------------

A few starting points for discussion follow (all about the meta-information 
for Clobs).
I have assumed the following prerequisites:
 1) Clob modifications are done on a copy (i.e. TemporaryClob).
 2) The meta-information has a fixed length and is located at the start of the 
data stream (first page), so that it can be updated after the data has been 
streamed to store.
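
To illustrate prerequisite 2, here is a minimal sketch (not Derby code) of
reserving a fixed-length header, streaming the data, and then filling in the
header afterwards. The class name, the 8-byte header size and the use of
standard UTF-8 in place of the store's own encoding are all assumptions made
for the example only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not part of Derby.
final class FixedHeaderSketch {
    // Arbitrary header size chosen for illustration; not a proposal.
    static final int HEADER_SIZE = 8;

    static ByteBuffer write(Iterable<String> chunks, int capacity) {
        ByteBuffer buf = ByteBuffer.allocate(capacity);
        // Reserve room for the header before any data is written.
        buf.position(HEADER_SIZE);
        long charCount = 0;
        for (String chunk : chunks) {
            // Standard UTF-8 stands in for the real on-disk encoding here.
            buf.put(chunk.getBytes(StandardCharsets.UTF_8));
            charCount += chunk.length();
        }
        // The length is only known now; because the header has a fixed
        // size, it can be written in place without moving the data.
        buf.putLong(0, charCount);
        buf.flip();
        return buf;
    }
}
```

A variable-length header would not allow this, since the data would have to
be shifted once the final header size was known.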

a) Format specification byte
    Shall we use a format specification ("magic number") byte?

b) Maximum Clob length (in characters)
    How many bits shall we use for the Clob length?
    Is representing today's maximum (2G-1) enough, or should we leave some 
headroom?
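
    For reference, the arithmetic behind this question can be checked with a
few lines of Java. The class and method names are hypothetical and only meant
as a sketch.

```java
// Hypothetical helper for the arithmetic only; not part of Derby.
final class LengthFieldBits {
    // Smallest number of bits that can represent maxValue (maxValue >= 1).
    static int bitsNeeded(long maxValue) {
        return 64 - Long.numberOfLeadingZeros(maxValue);
    }

    // Largest unsigned value that fits in the given number of bytes (1..7).
    static long maxUnsigned(int bytes) {
        return (1L << (8 * bytes)) - 1;
    }
}
```

    With these, bitsNeeded(Integer.MAX_VALUE) gives 31 for the 2G-1 maximum,
maxUnsigned(2) gives 65535 for a two-byte field, and maxUnsigned(4) gives
4294967295, which is roughly twice the current maximum.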

c) Storing byte length
    I mentioned storing the byte length as well, but haven't found any strong 
use cases.
    Opinions?

[Optimizations]

d) Bytes per character information
    Use a few bits to store bytes-per-character information, which can be used 
to optimize positioning.
    If the value is non-zero, one can calculate the byte position from 
the char position without decoding the stream.
    This information must be obtained by looking at all the bytes in the Clob, 
typically when inserting it.
    Example with 2 bits:
      0 = unknown/mixed
      1 = one byte per char
      2 = two bytes per char
      3 = three bytes per char
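
    A rough sketch of how the 2-bit value could be computed and used follows.
It is not Derby code: the names are hypothetical, and it assumes the data is
encoded as modified UTF-8 in the sense of java.io.DataOutput.writeUTF (one to
three bytes per Java char). Byte offsets are relative to the start of the
character data, i.e. after any header.

```java
// Hypothetical helper, not part of Derby.
final class BytesPerCharCode {
    static final int MIXED = 0; // also used for "unknown"

    // Bytes used by one Java char in modified UTF-8 (DataOutput.writeUTF).
    static int encodedLength(char c) {
        if (c >= 0x0001 && c <= 0x007F) return 1;
        if (c <= 0x07FF) return 2; // includes U+0000
        return 3;
    }

    // Looks at every char, as would be done while inserting the Clob.
    static int classify(CharSequence s) {
        int code = MIXED;
        for (int i = 0; i < s.length(); i++) {
            int len = encodedLength(s.charAt(i));
            if (i == 0) {
                code = len;
            } else if (len != code) {
                return MIXED;
            }
        }
        return code;
    }

    // Maps a 0-based char index to a byte offset into the data, or
    // returns -1 when the stream has to be decoded to find it.
    static long byteOffset(int code, long charIndex) {
        return code == MIXED ? -1 : code * charIndex;
    }
}
```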

e) Save "key positions" for the Clob
    For instance, save the char/byte positions for 25%, 50% and 75% of the Clob.
    This increases space overhead, but reduces the decoding/positioning costs 
for large Clobs.
    It also adds some complexity to the positioning logic in upper layer code 
(i.e. above store).
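
    The extra positioning logic could look roughly like the sketch below. It
is not Derby code, and the class and method names are hypothetical. The idea
is to pick the closest saved position at or before the requested char
position, skip directly to its byte offset, and decode only from there.

```java
// Hypothetical sketch, not part of Derby.
final class KeyPositions {
    private final long[] charPos; // ascending char positions
    private final long[] bytePos; // matching byte offsets

    KeyPositions(long[] charPos, long[] bytePos) {
        if (charPos.length != bytePos.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        this.charPos = charPos.clone();
        this.bytePos = bytePos.clone();
    }

    // Returns {charPosition, byteOffset} of the closest key position at or
    // before targetChar, or {0, 0} if the target precedes all of them.
    long[] startFor(long targetChar) {
        long[] best = {0L, 0L};
        for (int i = 0; i < charPos.length && charPos[i] <= targetChar; i++) {
            best[0] = charPos[i];
            best[1] = bytePos[i];
        }
        return best;
    }
}
```

    The caller would still have to decode (targetChar - charPosition) chars
starting at the returned byte offset.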


Please comment on these issues.
Information about the upgrade issue is also appreciated.

> Save useful length information for Clobs in store
> -------------------------------------------------
>
>                 Key: DERBY-3907
>                 URL: https://issues.apache.org/jira/browse/DERBY-3907
>             Project: Derby
>          Issue Type: Improvement
>          Components: JDBC, Store
>    Affects Versions: 10.5.0.0
>            Reporter: Kristian Waagan
>            Assignee: Kristian Waagan
>
> The store should save useful length information for Clobs. This allows the 
> length to be found without decoding the whole data stream.
> The following thread raised the issue of what information to store, and also 
> contains some background information: 
> http://www.nabble.com/Storing-length-information-for-CLOB-on-disk-tp19197535p19197535.html
> The information to store, and the exact format of it, is still to be 
> discussed/determined.
> Currently two bytes are set aside for length information, which is inadequate.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
