[jira] Updated: (DERBY-3907) Save useful length information for Clobs in store

Kristian Waagan (JIRA) Wed, 21 Jan 2009 06:28:27 -0800

     [ https://issues.apache.org/jira/browse/DERBY-3907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Kristian Waagan updated DERBY-3907:
-----------------------------------

    Attachment: derby-3907-7a-write_new_header_format.stat
                derby-3907-7a-write_new_header_format.diff

The patch 'derby-3907-7a-write_new_header_format.diff' is the second attempt at 
implementing the handling of the new stream header format.

Hopefully, the performance of reading/writing CHAR, VARCHAR and LONG VARCHAR 
hasn't suffered, but I'll run some performance tests to confirm it. The 
performance for reading the old header format for Clobs (i.e. accessing old 
databases) may suffer a little bit, because in some situations too many bytes 
are read and the stream has to be reset before reading it again. I need to test 
this as well.
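
The read-too-far-then-reset situation described above can be illustrated with a minimal sketch. This is not the code in the patch: the class name, the marker value, and the position of the marker byte are assumptions made for illustration only. The header sizes (two and five bytes) are the ones given in the description of the changes.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class HeaderPeekSketch {
    // Header sizes from the patch description: old = 2 bytes, new = 5 bytes.
    static final int OLD_HEADER_LEN = 2;
    static final int NEW_HEADER_LEN = 5;
    // Assumption: a marker byte at index 2 identifies the new format.
    // Neither the value nor the position is claimed to be Derby's layout.
    static final int NEW_FORMAT_MARKER = 0xF0;

    /**
     * Consumes the stream header and returns its length. The stream is
     * left positioned at the first byte of user data.
     */
    static int skipHeader(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(NEW_HEADER_LEN);
        byte[] buf = new byte[NEW_HEADER_LEN];
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // short stream, cannot hold a new-style header
            }
            total += n;
        }
        if (total == NEW_HEADER_LEN && (buf[2] & 0xFF) == NEW_FORMAT_MARKER) {
            return NEW_HEADER_LEN;
        }
        // Old format: three bytes too many were read, so rewind and
        // consume only the two header bytes. This is the extra cost.
        in.reset();
        for (int i = 0; i < OLD_HEADER_LEN; i++) {
            if (in.read() < 0) {
                throw new EOFException("truncated header");
            }
        }
        return OLD_HEADER_LEN;
    }
}
```

In this sketch the old-format path pays for one reset and one re-read of the header bytes. The marker value 0xF0 is convenient here because Java's modified UTF-8 never produces that byte, so it cannot be mistaken for the first data byte of an old-format value.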

Description of the changes:
 * EmbedResultSet
   Adjusted the usage of the ReaderToUTF8Stream constructor. Note the usage of 
'setSoftUpgradeMode', which is required because the stream can be read and 
written when the context stack hasn't been set up. This is true for updatable 
result sets. When there is no context, the context service fails to obtain the 
DatabaseContext used by the generator to get to the data dictionary.

 * EmbedPreparedStatement
   Adjusted the usage of the ReaderToUTF8Stream constructor.

 * ArrayInputStream
   Added an argument to 'readDerbyUTF'. The stream header is now read outside 
of this method.

 * StreamHeaderGenerator
   Added an interface for generating stream headers.

 * CharStreamHeaderGenerator
   New class generating old-style headers (two bytes long). Always used for 
CHAR, VARCHAR and LONG VARCHAR. In addition, it is also used for CLOB in 
pre-10.5 databases (i.e. soft upgrade mode).

 * ClobStreamHeaderGenerator
   New class generating new-style headers (five bytes long). Used only for 
CLOBs written into a 10.5 database. If an old-style header is needed, the work 
is delegated to CharStreamHeaderGenerator.

 * ReaderToUTF8Stream
   Added a StreamHeaderGenerator to the constructors, and updated the header 
writing logic to use it. Also added a constant to distinguish the first 
invocation of 'fillBuffer'. The header is generated on the first invocation, 
and possibly updated again in 'checkSufficientData'.

 * StringDataValue
   Replaced method 'generateStreamHeader(long)' with 
'getStreamHeaderGenerator()'. Added method 'setSoftUpgradeMode'. The latter is 
used in situations where the generator itself is unable to determine whether the 
database being written into is in soft upgrade mode.

 * SQLChar
   Factored out code to write the modified UTF-8 format (see 'writeUTF'). 
Updated 'writeExternal', which will now only be invoked for non-CLOB data 
values. Added method 'writeClobUTF', which is used to write CLOB data values. 
It is kept in SQLChar to avoid having to make more of the internal state 
available to the subclasses. Added a second version of 'readExternal', which is 
the one doing the actual work. It takes both a byte count and a char count, 
where both can be unknown. Implemented the new method in StringDataValue.

 * SQLClob
   Added variable 'inSoftUpgradeMode', which tells whether the DVD is used in a 
database that is in soft upgrade mode. This must be known to generate the 
correct header format.  Note that this may be unknown, in which case the header 
generator itself will try to determine the mode. Implemented 'getLength', which 
will obtain the length from the stream header, delegate the work to 
'SQLChar.getLength' if the value is not a stream, or decode the stream data if 
the length is not stored in the header. The data value is not materialized. 
Added support to read both stream header formats in 'getStreamWithDescriptor'. 
Implemented 'investigateStream' to decode the header.  Added 'writeExternal', 
'readExternal' and 'readExternalFromArray'. In general, some preparation steps 
are taken and then the work is delegated to SQLChar. Added utility class 
HeaderInfo.

 * StreamHeaderHolder
   Deleted the class.

 * UTF8UtilTest
   Updated code to use the new generator class.
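
The relationship between the two generators described above (old-style two-byte headers, new-style five-byte headers, and delegation when the database is in soft upgrade mode) can be sketched as follows. This is a hedged illustration, not the code in the patch: all type names, the method signature, the byte layout, the marker value, and the 'modeLookup' callback standing in for the data dictionary check are assumptions. The unit of the length (bytes or characters) is deliberately left open.

```java
import java.util.function.BooleanSupplier;

final class HeaderGeneratorSketch {
    /** Hypothetical stand-in for the StreamHeaderGenerator interface. */
    interface HeaderGenerator {
        /** Writes a header into buf at offset, returns the header length. */
        int generateInto(byte[] buf, int offset, long valueLength);
    }

    /** Old-style header: two bytes. Assumed layout: big-endian length. */
    static final class CharHeaderGenerator implements HeaderGenerator {
        public int generateInto(byte[] buf, int offset, long valueLength) {
            // Lengths that are unknown (negative) or do not fit are
            // written as zero in this sketch.
            int len = (valueLength >= 0 && valueLength <= 0xFFFF)
                    ? (int) valueLength : 0;
            buf[offset] = (byte) (len >>> 8);
            buf[offset + 1] = (byte) len;
            return 2;
        }
    }

    /** New-style header: five bytes. Layout and marker are assumptions. */
    static final class ClobHeaderGenerator implements HeaderGenerator {
        static final byte MARKER = (byte) 0xF0;
        private static final HeaderGenerator OLD = new CharHeaderGenerator();
        // null means "unknown": the mode is looked up on first use.
        private Boolean softUpgrade;
        private final BooleanSupplier modeLookup;

        ClobHeaderGenerator(Boolean softUpgrade, BooleanSupplier modeLookup) {
            this.softUpgrade = softUpgrade;
            this.modeLookup = modeLookup;
        }

        public int generateInto(byte[] buf, int offset, long valueLength) {
            if (softUpgrade == null) {
                softUpgrade = modeLookup.getAsBoolean();
            }
            if (softUpgrade) {
                // Soft upgrade: the database must stay readable by the
                // older version, so write the old format.
                return OLD.generateInto(buf, offset, valueLength);
            }
            long len = (valueLength >= 0 && valueLength <= Integer.MAX_VALUE)
                    ? valueLength : 0; // zero stands for "unknown" here
            buf[offset] = (byte) (len >>> 24);
            buf[offset + 1] = (byte) (len >>> 16);
            buf[offset + 2] = MARKER;
            buf[offset + 3] = (byte) (len >>> 8);
            buf[offset + 4] = (byte) len;
            return 5;
        }
    }
}
```

Passing an explicit Boolean corresponds to the case where the caller has set the mode up front (as with 'setSoftUpgradeMode'), while passing null leaves it to the generator to find out on first use. Encoding "unknown" as zero makes an empty value indistinguishable from an unknown length in this sketch; a real format has to deal with that.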

Patch ready for review.
I have run the regression tests without failures, but due to a small last 
minute change I will run them again tonight.
I have also tried reading and writing Clob from a 10.4 database manually, both 
in soft and hard upgrade mode.

Based on Mike's suggestion, I was hoping that a table compress would update the 
old headers to the new header format after a hard upgrade. This is in principle 
correct, but the new header is written with "unknown length" encoded. I haven't 
investigated how to best solve this problem.
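
To make the cost of an "unknown length" header concrete: a reader that finds no usable length in the header has to decode the data to count characters. The following is a minimal sketch under stated assumptions, not the code in the patch. It assumes the header has already been consumed, that the remaining bytes are Java's modified UTF-8 (the format mentioned for 'writeUTF' above), and that the data simply ends at end-of-stream; any end-of-stream marker handling is omitted. The class name and the UNKNOWN constant are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;

final class ClobLengthSketch {
    /** Hypothetical sentinel for "the header carried no length". */
    static final long UNKNOWN = -1;

    /**
     * Returns the character length, using the header value when present
     * and scanning the data otherwise. The value is never materialized
     * as a String.
     */
    static long charLength(long lengthFromHeader, InputStream data)
            throws IOException {
        if (lengthFromHeader != UNKNOWN) {
            return lengthFromHeader; // cheap path: no decoding needed
        }
        long chars = 0;
        byte[] buf = new byte[8192];
        int n;
        while ((n = data.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                // In modified UTF-8 every char is one lead byte plus zero
                // to two continuation bytes of the form 10xxxxxx, so
                // counting the non-continuation bytes counts the chars.
                if ((buf[i] & 0xC0) != 0x80) {
                    chars++;
                }
            }
        }
        return chars;
    }
}
```

With a known length the call returns immediately, whereas with an unknown length the whole value is read once. Under these assumptions, headers rewritten with "unknown length" would still give correct results, just without the performance benefit.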

> Save useful length information for Clobs in store
> -------------------------------------------------
>
>                 Key: DERBY-3907
>                 URL: https://issues.apache.org/jira/browse/DERBY-3907
>             Project: Derby
>          Issue Type: Improvement
>          Components: JDBC, Store
>    Affects Versions: 10.5.0.0
>            Reporter: Kristian Waagan
>            Assignee: Kristian Waagan
>             Fix For: 10.5.0.0
>
>         Attachments: derby-3907-1a-alternative_approach.diff, 
> derby-3907-2b-header_write_preparation.diff, 
> derby-3907-2b-header_write_preparation.diff, 
> derby-3907-2b-header_write_preparation.stat, 
> derby-3907-2c-header_write_preparation-PREVIEW.diff, 
> derby-3907-2c-header_write_preparation-PREVIEW.stat, 
> derby-3907-2c-header_write_preparation.diff, 
> derby-3907-2c-header_write_preparation.diff, 
> derby-3907-2c-header_write_preparation.stat, 
> derby-3907-3a-readertoutf8stream_cleanup.diff, 
> derby-3907-3a-readertoutf8stream_cleanup.diff, 
> derby-3907-3a-readertoutf8stream_cleanup.stat, 
> derby-3907-3b-readertoutf8stream_cleanup.diff, 
> derby-3907-4a-add_getStreamWithDescriptor.diff, 
> derby-3907-4a-add_getStreamWithDescriptor.stat, 
> derby-3907-5a-use_getStreamWithDescriptor.diff, 
> derby-3907-5a-use_getStreamWithDescriptor.stat, 
> derby-3907-6a-SQLClob_stream_descriptor_sync.diff, 
> derby-3907-7a-write_new_header_format-PREVIEW.diff, 
> derby-3907-7a-write_new_header_format.diff, 
> derby-3907-7a-write_new_header_format.stat
>
>
> The store should save useful length information for Clobs. This allows the 
> length to be found without decoding the whole data stream.
> The following thread raised the issue on what information to store, and also 
> contains some background information: 
> http://www.nabble.com/Storing-length-information-for-CLOB-on-disk-tp19197535p19197535.html
> The information to store, and the exact format of it, is still to be 
> discussed/determined.
> Currently two bytes are set aside for length information, which is inadequate.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
