Daniel John Debrunner wrote:
Kristian Waagan wrote:

Hello,

I'm working on DERBY-1417; adding new lengthless overloads to the
streaming API.  So far, I have only been looking at implementing this in
the embedded driver. Based on some comments in the code, I have a few
questions and observations regarding truncation of trailing blanks in
the various character data types.

Type            Trailing blank truncation    Where
====================================================================
CHAR            allowed                      SQLChar.normalize
VARCHAR         allowed                      SQLVarchar.normalize
LONG VARCHAR    disallowed                   SQLLongVarchar.normalize
CLOB            allowed                      streaming or
                                             SQLVarchar.normalize,
                                             depending on source.

As can be seen, only CLOB data has trailing blanks truncated at the
streaming level. We must still read all the data, or at least enough of
it to know that the insertion will fail, but we don't have to hold it
all in memory.
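
To illustrate, the rule itself boils down to something like the sketch
below (just an illustration, not Derby's actual normalize code; the
class and method names are made up):

import java.sql.SQLException;

// Sketch of the trailing-blank rule, not Derby's actual normalize code.
// Characters beyond the declared width may only be dropped if they are
// all blanks and the type permits it (CHAR, VARCHAR, CLOB; not LONG VARCHAR).
public class TrailingBlankRule {
    static String normalize(String value, int maxWidth, boolean allowBlankTruncation)
            throws SQLException {
        if (value.length() <= maxWidth) {
            return value;
        }
        for (int i = maxWidth; i < value.length(); i++) {
            if (!allowBlankTruncation || value.charAt(i) != ' ') {
                // SQLState 22001: string data, right truncation
                throw new SQLException("Data would be truncated", "22001");
            }
        }
        return value.substring(0, maxWidth);
    }
}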

Truncation of trailing blanks is not allowed at all for LONG VARCHAR
(according to code comments and bug 5592; I haven't found where this is
stated in the specs).

My question is, should we do the truncation check for CHAR and VARCHAR
at the streaming level as well?
If we don't add this feature, inserting a 10GB file into a VARCHAR
column by mistake will cause all 10GB to be loaded into memory even
though the maximum size allowed is ~32K, possibly causing an
out-of-memory error. The error could be raised at an earlier stage
(possibly after reading ~32K + 1 bytes).
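
Roughly, what I have in mind is to buffer no more than the declared
width and then just scan the rest of the stream, something like this
(sketch only, not proposed code; names are made up):

import java.io.IOException;
import java.io.Reader;
import java.sql.SQLException;

// Sketch only: keep at most maxWidth characters in memory, then scan the
// rest of the stream for non-blank characters without storing it.
public class BoundedCharStreamCheck {
    static String readWithLimit(Reader in, int maxWidth)
            throws IOException, SQLException {
        StringBuilder buf = new StringBuilder();
        int c;
        while (buf.length() < maxWidth && (c = in.read()) != -1) {
            buf.append((char) c);
        }
        // Anything beyond maxWidth must be blanks; otherwise fail early
        // with 22001 instead of buffering the whole (possibly huge) stream.
        while ((c = in.read()) != -1) {
            if (c != ' ') {
                throw new SQLException("Data would be truncated", "22001");
            }
        }
        return buf.toString();
    }
}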

I would say it's a separate issue to the one you are addressing.
Applications most likely won't be inserting 10GB values into
CHAR/VARCHAR columns using streams as it's not going to work.
Maybe enter a bug, but it doesn't seem like it has to be fixed as part
of this issue.

Handling this as a separate issue is better. The code changes would most likely go into the same class as my changes for DERBY-1417. It seems like a simple enough change, so I will enter a Jira improvement issue. I don't like the prospect of getting an out-of-memory error (heap exhaustion).

As far as I can tell, adding this feature is a matter of modifying the
'ReaderToUTF8' class and the
'EmbedPreparedStatement.setCharacterStreamInternal' method.
One could also optimize the reading of data into LONG VARCHAR, aborting
the read as soon as possible instead of pulling it all into memory
first. This would require some special case handling in the mentioned
locations.
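
Since trailing-blank truncation is disallowed for LONG VARCHAR, the read
can be aborted the moment one character too many arrives, along these
lines (again just a sketch with made-up names):

import java.io.IOException;
import java.io.Reader;
import java.sql.SQLException;

// Sketch of the LONG VARCHAR special case: no trailing-blank truncation
// is allowed, so one character over the limit is enough to reject the
// insert without reading the rest of the stream.
public class LongVarcharLimitCheck {
    static final int LONG_VARCHAR_MAX = 32700;   // Derby's documented max length

    static String readLongVarchar(Reader in)
            throws IOException, SQLException {
        char[] chunk = new char[8192];
        StringBuilder buf = new StringBuilder();
        int n;
        while ((n = in.read(chunk)) != -1) {
            if (buf.length() + n > LONG_VARCHAR_MAX) {
                throw new SQLException("Data would be truncated", "22001");
            }
            buf.append(chunk, 0, n);
        }
        return buf.toString();
    }
}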


Another matter is that streams will not be checked for an exact length
match when using the lengthless overloads, as we don't have a specified
length to compare against.
I have a preliminary implementation for setAsciiStream,
setCharacterStream and setClob (all without length specifications) in
EmbedPreparedStatement.
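
In other words, with the lengthless overloads the driver can only read
until end-of-stream; there is no declared length to verify against. For
example (sketch; table, column and file names are made up):

import java.io.FileReader;
import java.sql.Connection;
import java.sql.PreparedStatement;

// Illustrative only; table, column and file names are made up.
public class LengthlessStreamExample {
    static void insertBody(Connection conn, int charLength) throws Exception {
        PreparedStatement ps =
                conn.prepareStatement("INSERT INTO docs(body) VALUES (?)");

        // Length-specified overload: the driver can verify that the stream
        // delivers exactly 'charLength' characters and fail on a mismatch.
        ps.setCharacterStream(1, new FileReader("doc.txt"), charLength);
        ps.executeUpdate();

        // Lengthless overload (one of the new JDBC methods): nothing to
        // compare against, so the driver reads until end-of-stream.
        ps.setCharacterStream(1, new FileReader("doc.txt"));
        ps.executeUpdate();

        ps.close();
    }
}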

What's the issue with setClob()? The current method doesn't take a
length. Has java.sql.Clob been changed to not always have a length?

There are/will be 3 setClob methods in PreparedStatement:
 1) setClob(int,Clob)
 2) setClob(int,Reader,long)
 3) setClob(int,Reader)

1) and 2) are already implemented; 3) is a new method added to JDBC. The same goes for the other setXXXStream and updateXXXStream methods, which are located in PreparedStatement, CallableStatement and ResultSet.
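
For reference, the three variants would be used roughly like this
(sketch; table and column names are made up):

import java.io.StringReader;
import java.sql.Clob;
import java.sql.Connection;
import java.sql.PreparedStatement;

// Illustrative only; table and column names are made up.
public class SetClobVariants {
    static void insertAll(Connection conn, Clob existingClob) throws Exception {
        PreparedStatement ps =
                conn.prepareStatement("INSERT INTO docs(body) VALUES (?)");

        ps.setClob(1, existingClob);                        // 1) existing java.sql.Clob
        ps.executeUpdate();

        ps.setClob(1, new StringReader("some text"), 9L);   // 2) Reader plus length
        ps.executeUpdate();

        ps.setClob(1, new StringReader("some text"));       // 3) new lengthless overload
        ps.executeUpdate();

        ps.close();
    }
}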



--
Kristian

Dan.

