[
https://issues.apache.org/jira/browse/DERBY-3907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kristian Waagan updated DERBY-3907:
-----------------------------------
Attachment: derby-3907-alternative_approach.diff
I got stuck trying to implement the original solution, so I tried an
alternative approach.
It is a lot simpler, but people might not like it. Note, however, that it
follows roughly the same pattern as Blob.
The patch is a quick mash-up, and I want some feedback from the community.
The alternative approach is to make all classes that write Clob data to store
or read it back able to peek at the stored bytes and determine which format to
use when reading/writing the data.
Including my second format, we have these two byte formats:
- current: D1_D2_DATA
- new: D4_D3_M_D2_D1_DATA
M is a magic byte, used to detect the new format. It is an illegal UTF-8
encoding, so it should not be possible to confuse the new header with the
first format's length bytes and data.
I have set M to F0 (11110000), but I'm masking out the last four bits when
looking for the magic byte. This makes it possible to have arbitrarily many
formats, should that be necessary; the main point is to keep the four highest
bits set.
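To make the detection concrete, here is a minimal sketch in Java of how a
reader could peek at the first bytes and recognize the magic byte. The class
and method names are mine, and only the magic byte/mask check reflects the
description above; the patch itself may do this differently.

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClobHeaderSniffer {

    /** Mask for the four highest bits; M = 0xF0 per the description above. */
    private static final int MAGIC_MASK = 0xF0;

    /**
     * Peeks at the first three bytes of a stored Clob and reports whether
     * the new five-byte header (D4_D3_M_D2_D1) or the old two-byte header
     * (D1_D2) is in use. The stream must support mark/reset.
     */
    public static boolean hasNewHeader(InputStream in) throws IOException {
        in.mark(3);
        try {
            in.read();              // D4 (new) or D1 (old)
            in.read();              // D3 (new) or D2 (old)
            int third = in.read();  // M (new) or first UTF-8 data byte (old)
            // Per the description above, a data byte with all four high bits
            // set cannot occur in the old format, so the check is unambiguous.
            // third == -1 means the stream ended (old header, no data).
            return third != -1 && (third & MAGIC_MASK) == MAGIC_MASK;
        } finally {
            in.reset();             // leave the stream where we found it
        }
    }

    // Example: a stream starting with 0x00 0x00 0xF0 ... is detected as new.
    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                new byte[] {0x00, 0x00, (byte) 0xF0, 0x00, 0x05});
        System.out.println(hasNewHeader(in));   // prints "true"
    }
}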
With respect to data corruption (e.g. a single bit getting flipped), is this
approach safe enough?
So if we need to be able to store huge Clobs in the future, we could change M
and use another format:
- future: D6_D5_M_D4_D3_D2_D1_DATA
The same approach could be used to store other meta information.
The patch 'derby-3907-alternative_approach.diff' only changes behavior for
small Clobs. To enable a new format for a larger Clob, the streaming classes
have to be changed (ReaderToUTF8Stream, UTF8Reader).
It should be noted that these classes are used to write other character types
(CHAR, VARCHAR) as well, and I do not intend to change how those types are
represented. This means that I have to include enough context information to
be able to do the correct thing for each type.
While the format can be detected on read, an informed decision must be made on
write. Currently I'm consulting the data dictionary to check the database
version, and if it is less than 10.5 I use the old format. Is there a better
way?
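For illustration, here is a rough sketch of what the write-side decision could
look like. The pre-10.5 flag passed in and the byte ordering of the length
fields are assumptions made for the example; only the two-byte/five-byte
layouts and the 0xF0 magic byte come from the description above.

import java.io.IOException;
import java.io.OutputStream;

public class ClobHeaderWriter {

    private static final int MAGIC = 0xF0;

    /**
     * Writes either the old two-byte header or the new five-byte header,
     * depending on the on-disk version of the database.
     *
     * @param isPre10_5  true if the database is (soft-upgraded from) a
     *                   version older than 10.5 and must keep the old format
     * @param charLength the character length to record (the byte order of
     *                   the D1..D4 fields below is an assumption)
     */
    public static void writeHeader(OutputStream out, boolean isPre10_5,
                                   long charLength) throws IOException {
        if (isPre10_5) {
            // Old format: D1_D2_DATA (two length bytes only).
            out.write((int) (charLength >>> 8) & 0xFF);
            out.write((int) charLength & 0xFF);
        } else {
            // New format: D4_D3_M_D2_D1_DATA (four length bytes split
            // around the magic byte).
            out.write((int) (charLength >>> 24) & 0xFF);
            out.write((int) (charLength >>> 16) & 0xFF);
            out.write(MAGIC);
            out.write((int) (charLength >>> 8) & 0xFF);
            out.write((int) charLength & 0xFF);
        }
    }
}

The caller could obtain the pre-10.5 flag from the data dictionary's version
check (as described above) and hand it to the streaming classes, so that e.g.
ReaderToUTF8Stream could pick the right header before writing the UTF-8 data.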
Regarding the original approach, I got stuck because the upper layers of Derby
send NULL values of the data types down into store. The upper layer doesn't
have any context information and is unable to choose the correct
implementation. The system doesn't seem to be set up for having multiple
implementations of a single data type at this level.
I ended up with a series of hacks, for instance having store override the Clob
implementation type, but it just didn't work very well. At one point I had
normal, soft-upgraded and hard-upgraded databases working, but compress table
failed. I'm sure this isn't the only path that would fail.
I might pick up the work again later, but right now I want to wait for a while
and work on other issues.
> Save useful length information for Clobs in store
> -------------------------------------------------
>
> Key: DERBY-3907
> URL: https://issues.apache.org/jira/browse/DERBY-3907
> Project: Derby
> Issue Type: Improvement
> Components: JDBC, Store
> Affects Versions: 10.5.0.0
> Reporter: Kristian Waagan
> Assignee: Kristian Waagan
> Attachments: derby-3907-alternative_approach.diff
>
>
> The store should save useful length information for Clobs. This allows the
> length to be found without decoding the whole data stream.
> The following thread raised the issue on what information to store, and also
> contains some background information:
> http://www.nabble.com/Storing-length-information-for-CLOB-on-disk-tp19197535p19197535.html
> The information to store, and the exact format of it, is still to be
> discussed/determined.
> Currently two bytes are set aside for length information, which is inadequate.