[
https://issues.apache.org/jira/browse/DERBY-3907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kristian Waagan updated DERBY-3907:
-----------------------------------
Attachment: derby-3907-alternative_approach.diff
I got stuck trying to implement the original solution, so I tried an
alternative approach.
It is a lot simpler, but people might not like it. Note, however, that it
follows roughly the same pattern as Blob.
The patch is a quick mash-up, and I want some feedback from the community.
The alternative approach is to make all classes that write Clob data to store
or read it back able to peek at the stored bytes and determine which format to
use when reading/writing the data.
Including my second format, we have these two byte formats:
- current: D1_D2_DATA
- new: D4_D3_M_D2_D1_DATA
M is a magic byte, used to detect the new format. It is an illegal UTF-8
encoding, so it should not be possible to confuse the new header with the
first format's length bytes and data.
I have set M to F0 (11110000), but I'm masking out the last four bits when
looking for the magic byte. This makes it possible to have arbitrarily many
formats, should that be necessary; the main point is to keep the four highest
bits set.
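To make the detection concrete, here is a minimal sketch in Java of how a
reader could peek at the first bytes and recognize the magic byte. The class
and method names are mine, and only the magic byte/mask check reflects the
description above; the patch itself may do this differently.

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClobHeaderSniffer {

    /** Mask for the four highest bits; M = 0xF0 per the description above. */
    private static final int MAGIC_MASK = 0xF0;

    /**
     * Peeks at the first three bytes of a stored Clob and reports whether
     * the new five-byte header (D4_D3_M_D2_D1) or the old two-byte header
     * (D1_D2) is in use. The stream must support mark/reset.
     */
    public static boolean hasNewHeader(InputStream in) throws IOException {
        in.mark(3);
        try {
            in.read();              // D4 (new) or D1 (old)
            in.read();              // D3 (new) or D2 (old)
            int third = in.read();  // M (new) or first UTF-8 data byte (old)
            // Per the description above, a data byte with all four high bits
            // set cannot occur in the old format, so the check is unambiguous.
            // third == -1 means the stream ended (old header, no data).
            return third != -1 && (third & MAGIC_MASK) == MAGIC_MASK;
        } finally {
            in.reset();             // leave the stream where we found it
        }
    }

    // Example: a stream starting with 0x00 0x00 0xF0 ... is detected as new.
    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                new byte[] {0x00, 0x00, (byte) 0xF0, 0x00, 0x05});
        System.out.println(hasNewHeader(in));   // prints "true"
    }
}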
With respect to data corruption (e.g. a single bit getting flipped), is this
approach safe enough?
So if we need to be able to store huge Clobs in the future, we could change M
and use another format:
- future: D6_D5_M_D4_D3_D2_D1_DATA
The same approach could be used to store other meta information.
The patch 'derby-3907-alternative_approach.diff' only changes behavior for
small Clobs. To enable a new format for a larger Clob, the streaming classes
have to be changed (ReaderToUTF8Stream, UTF8Reader).
It should be noted that these classes are used to write other character types
(CHAR, VARCHAR) as well, and I do not intend to change how those types are
represented. This means that I have to include enough context information to
be able to do the correct thing for each type.
While the format can be detected on read, an informed decision must be made on
write. Currently I'm consulting the data dictionary to check the database
version, and if it is less than 10.5 I use the old format. Is there a better
way?
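For illustration, here is a rough sketch of what the write-side decision could
look like. The pre-10.5 flag passed in and the byte ordering of the length
fields are assumptions made for the example; only the two-byte/five-byte
layouts and the 0xF0 magic byte come from the description above.

import java.io.IOException;
import java.io.OutputStream;

public class ClobHeaderWriter {

    private static final int MAGIC = 0xF0;

    /**
     * Writes either the old two-byte header or the new five-byte header,
     * depending on the on-disk version of the database.
     *
     * @param isPre10_5  true if the database is (soft-upgraded from) a
     *                   version older than 10.5 and must keep the old format
     * @param charLength the character length to record (the byte order of
     *                   the D1..D4 fields below is an assumption)
     */
    public static void writeHeader(OutputStream out, boolean isPre10_5,
                                   long charLength) throws IOException {
        if (isPre10_5) {
            // Old format: D1_D2_DATA (two length bytes only).
            out.write((int) (charLength >>> 8) & 0xFF);
            out.write((int) charLength & 0xFF);
        } else {
            // New format: D4_D3_M_D2_D1_DATA (four length bytes split
            // around the magic byte).
            out.write((int) (charLength >>> 24) & 0xFF);
            out.write((int) (charLength >>> 16) & 0xFF);
            out.write(MAGIC);
            out.write((int) (charLength >>> 8) & 0xFF);
            out.write((int) charLength & 0xFF);
        }
    }
}

The caller could obtain the pre-10.5 flag from the data dictionary's version
check (as described above) and hand it to the streaming classes, so that e.g.
ReaderToUTF8Stream could pick the right header before writing the UTF-8 data.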
Regarding the original approach, I got stuck because the upper layers of Derby
send NULL values of the data types down into store. The upper layer doesn't
have any context information and is unable to choose the correct
implementation. The system doesn't seem to be set up for having multiple
implementations of a single data type at this level.
I ended up with a series of hacks, for instance having store override the Clob
implementation type, but it just didn't work very well. At one point I had
normal, soft-upgraded and hard-upgraded databases working, but compress table
failed. I'm sure this isn't the only path that would fail.
I might pick up the work again later, but right now I want to wait for a while
and work on other issues.
> Save useful length information for Clobs in store
> -------------------------------------------------
>
> Key: DERBY-3907
> URL: https://issues.apache.org/jira/browse/DERBY-3907
> Project: Derby
> Issue Type: Improvement
> Components: JDBC, Store
> Affects Versions: 10.5.0.0
> Reporter: Kristian Waagan
> Assignee: Kristian Waagan
> Attachments: derby-3907-alternative_approach.diff
>
>
> The store should save useful length information for Clobs. This allows the
> length to be found without decoding the whole data stream.
> The following thread raised the issue on what information to store, and also
> contains some background information:
> http://www.nabble.com/Storing-length-information-for-CLOB-on-disk-tp19197535p19197535.html
> The information to store, and the exact format of it, is still to be
> discussed/determined.
> Currently two bytes are set aside for length information, which is inadequate.