Re: Tuncation of trailing blanks and lengthless streaming overloads

2006-06-22 Thread Kristian Waagan

Daniel John Debrunner wrote:

Kristian Waagan wrote:

  

Hello,

I'm working on DERBY-1417; adding new lengthless overloads to the
streaming API.  So far, I have only been looking at implementing this in
the embedded driver. Based on some comments in the code, I have a few
questions and observations regarding truncation of trailing blanks in
the various character data types.

TypeTrail. blank trunc. Where

CHARallowed SQLChar.normalize
VARCHAR allowed SQLVarchar.normalize
LONG VARCHAR   disallowed   SQLLongVarchar.normalize
CLOBallowed streaming or
SQLVarchar.normalize, depending
on source.

As can be seen, only data for CLOB is truncated for trailing blanks in
the streaming class. We must still read all the data, or so much as we
need to know the insertion will fail, but we don't have to store it all
in memory.

Truncation of trailing blanks is not allowed at all for LONG VARCHAR
(according to code comments and bug 5592 - haven't found the place this
is stated in the specs).

My question is, should we do the truncate check for CHAR and VARCHAR on
the streaming level as well?
If we don't add this feature, inserting a
10GB file into a VARCHAR field by mistake will cause 10GB to be loaded
into memory even though the max size allowed is ~32K, possibly causing
out-of-memory errors. The error could be generated at an earlier stage
(possibly after reading ~32K +1 bytes).



I would say its a separate issue to the one you are addressing.
Applications most likely won't be inserting 10Gb values into
CHAR/VARCHAR columns using streams as it's not going to work.
Maybe enter a bug, but doesn't seem it has to be fixed as part of this
issue.
  


Handling this as a separate issue is better. The code changes would most 
likely go into the same class as my changes for DERBY-1417. It seems 
like a simple enough change, so I will enter a Jira improvement issue. I 
don't like getting an out-of-memory exception (heap).


  

As far as I can tell, adding this feature is a matter of modifying the
'ReaderToUTF8' class and the
'EmbedPrearedStatement.setCharacterStreamInternal' method.
One could also optimize the reading of data into LONG VARCHAR, where one
would abort the reading as soon as you can instead of taking it all into
memory first. This would require some special case handling in the
mentioned locations.


Another matter is that streams will not be checked for exact length
match when using the lengthless overloads, as we don't have a specified
length to compare against.
I have a preliminary implementation for setAsciiStream,
setCharacterStream  and setClob (all without length specifications) in
EmbedPreparedStatement.



What's the issue with setClob()? The current method doesn't take a
length. Has java.sql.Clob been changed to not always have a length?
  


There are/will be 3 setClob methods in PreparedStatement:
 1) setClob(int,Clob)
 2) setClob(int,Reader,long)
 3) setClob(int,Reader)

1) and 2) are already implemented, 3) is a new method added to JDBC. The 
same story goes for the other setXXXStream methods and updateXXXStream 
methods. These are located in PreparedStatement, CallableStatement and 
ResultSet.




--
Kristian


Dan.

  




Re: Tuncation of trailing blanks and lengthless streaming overloads

2006-06-22 Thread Daniel John Debrunner
Kristian Waagan wrote:

> Hello,
> 
> I'm working on DERBY-1417; adding new lengthless overloads to the
> streaming API.  So far, I have only been looking at implementing this in
> the embedded driver. Based on some comments in the code, I have a few
> questions and observations regarding truncation of trailing blanks in
> the various character data types.
> 
> TypeTrail. blank trunc. Where
> 
> CHARallowed SQLChar.normalize
> VARCHAR allowed SQLVarchar.normalize
> LONG VARCHAR   disallowed   SQLLongVarchar.normalize
> CLOBallowed streaming or
> SQLVarchar.normalize, depending
> on source.
> 
> As can be seen, only data for CLOB is truncated for trailing blanks in
> the streaming class. We must still read all the data, or so much as we
> need to know the insertion will fail, but we don't have to store it all
> in memory.
> 
> Truncation of trailing blanks is not allowed at all for LONG VARCHAR
> (according to code comments and bug 5592 - haven't found the place this
> is stated in the specs).
> 
> My question is, should we do the truncate check for CHAR and VARCHAR on
> the streaming level as well?
> If we don't add this feature, inserting a
> 10GB file into a VARCHAR field by mistake will cause 10GB to be loaded
> into memory even though the max size allowed is ~32K, possibly causing
> out-of-memory errors. The error could be generated at an earlier stage
> (possibly after reading ~32K +1 bytes).

I would say its a separate issue to the one you are addressing.
Applications most likely won't be inserting 10Gb values into
CHAR/VARCHAR columns using streams as it's not going to work.
Maybe enter a bug, but doesn't seem it has to be fixed as part of this
issue.

> As far as I can tell, adding this feature is a matter of modifying the
> 'ReaderToUTF8' class and the
> 'EmbedPrearedStatement.setCharacterStreamInternal' method.
> One could also optimize the reading of data into LONG VARCHAR, where one
> would abort the reading as soon as you can instead of taking it all into
> memory first. This would require some special case handling in the
> mentioned locations.
> 
> 
> Another matter is that streams will not be checked for exact length
> match when using the lengthless overloads, as we don't have a specified
> length to compare against.
> I have a preliminary implementation for setAsciiStream,
> setCharacterStream  and setClob (all without length specifications) in
> EmbedPreparedStatement.

What's the issue with setClob()? The current method doesn't take a
length. Has java.sql.Clob been changed to not always have a length?

Dan.



Tuncation of trailing blanks and lengthless streaming overloads

2006-06-22 Thread Kristian Waagan

Hello,

I'm working on DERBY-1417; adding new lengthless overloads to the
streaming API.  So far, I have only been looking at implementing this in
the embedded driver.  Based on some comments in the code, I have a few
questions and observations regarding truncation of trailing blanks in
the various character data types.

TypeTrail. blank trunc. Where

CHARallowed SQLChar.normalize
VARCHAR allowed SQLVarchar.normalize
LONG VARCHAR   disallowed   SQLLongVarchar.normalize
CLOBallowed streaming or
SQLVarchar.normalize, depending
on source.

As can be seen, only data for CLOB is truncated for trailing blanks in
the streaming class. We must still read all the data, or so much as we
need to know the insertion will fail, but we don't have to store it all
in memory.

Truncation of trailing blanks is not allowed at all for LONG VARCHAR
(according to code comments and bug 5592 - haven't found the place this
is stated in the specs).

My question is, should we do the truncate check for CHAR and VARCHAR on
the streaming level as well?
If we don't add this feature, inserting a
10GB file into a VARCHAR field by mistake will cause 10GB to be loaded
into memory even though the max size allowed is ~32K, possibly causing
out-of-memory errors. The error could be generated at an earlier stage
(possibly after reading ~32K +1 bytes).

As far as I can tell, adding this feature is a matter of modifying the
'ReaderToUTF8' class and the
'EmbedPrearedStatement.setCharacterStreamInternal' method.
One could also optimize the reading of data into LONG VARCHAR, where one
would abort the reading as soon as you can instead of taking it all into
memory first. This would require some special case handling in the
mentioned locations.


Another matter is that streams will not be checked for exact length
match when using the lengthless overloads, as we don't have a specified
length to compare against.
I have a preliminary implementation for setAsciiStream,
setCharacterStream  and setClob (all without length specifications) in
EmbedPreparedStatement.
I will continue my work by adding methods throwing
not-implemented-exceptions and implement the methods where appropriate.


Thoughts and feedback appreciated :)



--
Kristian