[ https://issues.apache.org/jira/browse/DERBY-3907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kristian Waagan updated DERBY-3907:
-----------------------------------
Attachment: derby-3907-7a-write_new_header_format.stat
derby-3907-7a-write_new_header_format.diff
The patch 'derby-3907-7a-write_new_header_format.diff' is the second attempt at
implementing the handling of the new stream header format.
Hopefully, the performance of reading/writing CHAR, VARCHAR and LONG VARCHAR
hasn't suffered, but I'll run some performance tests to confirm it. The
performance for reading the old header format for Clobs (i.e. accessing old
databases) may suffer a little bit, because in some situations too many bytes
are read and the stream has to be reset before reading it again. I need to test
this as well.
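
The "read too many bytes, then reset" situation mentioned above can be
illustrated with a minimal sketch. It is not the code in the patch. The header
sizes (two and five bytes) come from the description below, but the class name
HeaderPeek, the helper names and the rule for recognizing the new format (a
first byte equal to 0xF0) are assumptions made only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class HeaderPeek {
    static final int OLD_HEADER_LEN = 2; // size stated in the patch description
    static final int NEW_HEADER_LEN = 5; // size stated in the patch description
    // ASSUMPTION: placeholder marker; the real detection rule may differ.
    static final int NEW_FORMAT_MARKER = 0xF0;

    /** Reads until buf is full or EOF is hit; returns the number of bytes read. */
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        return total;
    }

    /** Positions the stream just after the header and returns the header size. */
    static int skipHeader(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(NEW_HEADER_LEN);
        byte[] buf = new byte[NEW_HEADER_LEN];
        int read = readFully(in, buf);
        if (read == NEW_HEADER_LEN && (buf[0] & 0xFF) == NEW_FORMAT_MARKER) {
            return NEW_HEADER_LEN;
        }
        if (read < OLD_HEADER_LEN) {
            throw new EOFException("stream is shorter than the old header");
        }
        // Old format: up to three bytes too many were consumed, so rewind
        // and skip only the two header bytes.
        in.reset();
        readFully(in, new byte[OLD_HEADER_LEN]);
        return OLD_HEADER_LEN;
    }

    /** Convenience wrapper: number of data bytes left after the header. */
    static int remainingAfterHeader(byte[] stored) {
        try (InputStream in = new ByteArrayInputStream(stored)) {
            skipHeader(in);
            return in.available();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] oldStyle = { 0x00, 0x03, 'a', 'b', 'c' };
        System.out.println(remainingAfterHeader(oldStyle));
    }
}
```

The extra cost for old databases is the reset plus the second read, which is
why it only shows up when the old header format is encountered.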
Description of the changes:
* EmbedResultSet
Adjusted the usage of the ReaderToUTF8Stream constructor. Note the usage of
'setSoftUpgradeMode', which is required because the stream can be read and
written when the context stack hasn't been set up. This is true for updatable
result sets. When there is no context, the context service fails to obtain the
DatabaseContext used by the generator to get to the data dictionary.
* EmbedPreparedStatement
Adjusted the usage of the ReaderToUTF8Stream constructor.
* ArrayInputStream
Added an argument to 'readDerbyUTF'. The stream header is now read outside
of this method.
* StreamHeaderGenerator
Added an interface for generating stream headers.
* CharStreamHeaderGenerator
New class generating old-style headers (two bytes long). Always used for
CHAR, VARCHAR and LONG VARCHAR. It is also used for CLOB in pre-10.5
databases (i.e. soft upgrade mode).
* ClobStreamHeaderGenerator
New class generating new-style headers (five bytes long). Used only for
CLOBs written into a 10.5 database. If an old-style header is needed, the work
is delegated to CharStreamHeaderGenerator.
* ReaderToUTF8Stream
Added a StreamHeaderGenerator to the constructors, and updated the header
writing logic to use it. Also added a constant to distinguish the first
invocation of 'fillBuffer'. The header is generated on the first invocation,
and possibly updated again in 'checkSufficientData'.
* StringDataValue
Replaced method 'generateStreamHeader(long)' with
'getStreamHeaderGenerator()'. Added method 'setSoftUpgradeMode'. The latter is
used in situations where the generator itself is unable to determine whether
the database being written into is in soft upgrade mode.
* SQLChar
Factored out code to write the modified UTF-8 format (see 'writeUTF').
Updated 'writeExternal', which will now only be invoked for non-CLOB data
values. Added method 'writeClobUTF', which is used to write CLOB data values.
It is kept in SQLChar to avoid having to make more of the internal state
available to the subclasses. Added a second version of 'readExternal', which is
the one doing the actual work. It takes both a byte count and a char count,
both of which can be unknown. Implemented the new method in StringDataValue.
* SQLClob
Added variable 'inSoftUpgradeMode', which tells whether the DVD is used in a
database that is in soft upgrade mode. This must be known to generate the
correct header format. Note that this may be unknown, in which case the header
generator itself will try to determine the mode. Implemented 'getLength', which
will obtain the length from the stream header, delegate the work to
'SQLChar.getLength' if the value is not a stream, or decode the stream data if
the length is not stored in the header. The data value is not materialized.
Added support to read both stream header formats in 'getStreamWithDescriptor'.
Implemented 'investigateStream' to decode the header. Added 'writeExternal',
'readExternal' and 'readExternalFromArray'. In general, some preparation steps
are taken and then the work is delegated to SQLChar. Added utility class
HeaderInfo.
* StreamHeaderHolder
Deleted the class.
* UTF8UtilTest
Updated code to use the new generator class.
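
To make the relationship between the two generator classes easier to follow,
here is a minimal sketch of the delegation described above. It is not the code
in the patch: the names HeaderGenerator, CharHeaderSketch and ClobHeaderSketch
are hypothetical, and the byte layouts (a big-endian length, zero meaning
"unknown", and a 0xF0 marker byte in the five-byte header) are assumptions.
Only the header sizes and the delegation in soft upgrade mode are taken from
the description. The unit of the length (bytes or characters) is left open.

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-in for the new header generator interface. */
interface HeaderGenerator {
    /** Returns the header for a value of the given length (negative = unknown). */
    byte[] generate(long length);
}

/** Old-style header: two bytes. */
class CharHeaderSketch implements HeaderGenerator {
    @Override
    public byte[] generate(long length) {
        // ASSUMPTION: lengths that do not fit in 16 bits are written as 0.
        int v = (length >= 0 && length <= 0xFFFF) ? (int) length : 0;
        return new byte[] { (byte) (v >>> 8), (byte) v };
    }
}

/** New-style header: five bytes, unless an old-style header is required. */
class ClobHeaderSketch implements HeaderGenerator {
    private final HeaderGenerator oldStyle = new CharHeaderSketch();
    private final boolean softUpgradeMode;

    ClobHeaderSketch(boolean softUpgradeMode) {
        this.softUpgradeMode = softUpgradeMode;
    }

    @Override
    public byte[] generate(long length) {
        if (softUpgradeMode) {
            // Pre-10.5 database: delegate to the old-style generator.
            return oldStyle.generate(length);
        }
        // ASSUMPTION: one marker byte followed by a four-byte length.
        int v = (length >= 0 && length <= Integer.MAX_VALUE) ? (int) length : 0;
        return ByteBuffer.allocate(5).put((byte) 0xF0).putInt(v).array();
    }
}

class HeaderGeneratorDemo {
    public static void main(String[] args) {
        System.out.println(new ClobHeaderSketch(false).generate(10).length);
        System.out.println(new ClobHeaderSketch(true).generate(10).length);
    }
}
```

Passing the mode in through the constructor corresponds to the case where the
caller already knows it, for instance via 'setSoftUpgradeMode'. The case where
the generator has to look the mode up itself is not covered by this sketch.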
Patch ready for review.
I have run the regression tests without failures, but due to a small last
minute change I will run them again tonight.
I have also tried reading and writing Clob from a 10.4 database manually, both
in soft and hard upgrade mode.
Based on Mike's suggestion, I was hoping that a table compress would update the
old headers to the new header format after a hard upgrade. This is in principle
correct, but the new header is written with "unknown length" encoded. I haven't
investigated how to best solve this problem.
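
When the header carries "unknown length", a reader has to fall back to decoding
the stream data to find the character count, as noted for 'SQLClob.getLength'
above. A minimal sketch of that fallback follows, assuming the data is plain
modified UTF-8 (as written by 'java.io.DataOutputStream.writeUTF') with no
header and no end-of-stream marker in the bytes being counted. The class name
ModifiedUtf8Length is hypothetical, and this is not the code in the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ModifiedUtf8Length {
    /**
     * Counts Java chars in a modified UTF-8 stream without building a String.
     * Every char is encoded as one lead byte plus zero, one or two
     * continuation bytes of the form 10xxxxxx, so counting the bytes that
     * are not continuation bytes gives the char count.
     */
    static long countChars(InputStream in) throws IOException {
        long chars = 0;
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                if ((buf[i] & 0xC0) != 0x80) {
                    chars++;
                }
            }
        }
        return chars;
    }

    /** Convenience wrapper for data already held in memory. */
    static long countChars(byte[] data) {
        try {
            return countChars(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 'a' (1 byte), U+00E9 (2 bytes), U+20AC (3 bytes)
        byte[] data = { 0x61, (byte) 0xC3, (byte) 0xA9,
                        (byte) 0xE2, (byte) 0x82, (byte) 0xAC };
        System.out.println(countChars(data)); // prints 3
    }
}
```

The whole value still has to be read from store this way, which is the cost a
known length in the header is meant to avoid.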
> Save useful length information for Clobs in store
> -------------------------------------------------
>
> Key: DERBY-3907
> URL: https://issues.apache.org/jira/browse/DERBY-3907
> Project: Derby
> Issue Type: Improvement
> Components: JDBC, Store
> Affects Versions: 10.5.0.0
> Reporter: Kristian Waagan
> Assignee: Kristian Waagan
> Fix For: 10.5.0.0
>
> Attachments: derby-3907-1a-alternative_approach.diff,
> derby-3907-2b-header_write_preparation.diff,
> derby-3907-2b-header_write_preparation.diff,
> derby-3907-2b-header_write_preparation.stat,
> derby-3907-2c-header_write_preparation-PREVIEW.diff,
> derby-3907-2c-header_write_preparation-PREVIEW.stat,
> derby-3907-2c-header_write_preparation.diff,
> derby-3907-2c-header_write_preparation.diff,
> derby-3907-2c-header_write_preparation.stat,
> derby-3907-3a-readertoutf8stream_cleanup.diff,
> derby-3907-3a-readertoutf8stream_cleanup.diff,
> derby-3907-3a-readertoutf8stream_cleanup.stat,
> derby-3907-3b-readertoutf8stream_cleanup.diff,
> derby-3907-4a-add_getStreamWithDescriptor.diff,
> derby-3907-4a-add_getStreamWithDescriptor.stat,
> derby-3907-5a-use_getStreamWithDescriptor.diff,
> derby-3907-5a-use_getStreamWithDescriptor.stat,
> derby-3907-6a-SQLClob_stream_descriptor_sync.diff,
> derby-3907-7a-write_new_header_format-PREVIEW.diff,
> derby-3907-7a-write_new_header_format.diff,
> derby-3907-7a-write_new_header_format.stat
>
>
> The store should save useful length information for Clobs. This allows the
> length to be found without decoding the whole data stream.
> The following thread raised the issue on what information to store, and also
> contains some background information:
> http://www.nabble.com/Storing-length-information-for-CLOB-on-disk-tp19197535p19197535.html
> The information to store, and the exact format of it, is still to be
> discussed/determined.
> Currently two bytes are set aside for length information, which is inadequate.