Re: Truncation of trailing blanks and lengthless streaming overloads
Daniel John Debrunner wrote:
> Kristian Waagan wrote:
>> Hello,
>>
>> I'm working on DERBY-1417; adding new lengthless overloads to the
>> streaming API. So far, I have only been looking at implementing this
>> in the embedded driver. Based on some comments in the code, I have a
>> few questions and observations regarding truncation of trailing
>> blanks in the various character data types.
>>
>> Type          Trail. blank trunc.  Where
>> CHAR          allowed              SQLChar.normalize
>> VARCHAR       allowed              SQLVarchar.normalize
>> LONG VARCHAR  disallowed           SQLLongVarchar.normalize
>> CLOB          allowed              streaming or SQLVarchar.normalize,
>>                                    depending on source
>>
>> As can be seen, only data for CLOB is truncated for trailing blanks
>> in the streaming class. We must still read all the data, or as much
>> as we need to know the insertion will fail, but we don't have to
>> store it all in memory.
>>
>> Truncation of trailing blanks is not allowed at all for LONG VARCHAR
>> (according to code comments and bug 5592 - I haven't found the place
>> this is stated in the specs).
>>
>> My question is, should we do the truncation check for CHAR and
>> VARCHAR at the streaming level as well? If we don't add this feature,
>> inserting a 10GB file into a VARCHAR field by mistake will cause 10GB
>> to be loaded into memory even though the max size allowed is ~32K,
>> possibly causing out-of-memory errors. The error could be generated
>> at an earlier stage (possibly after reading ~32K + 1 bytes).
>
> I would say it's a separate issue to the one you are addressing.
> Applications most likely won't be inserting 10GB values into
> CHAR/VARCHAR columns using streams as it's not going to work. Maybe
> enter a bug, but it doesn't seem it has to be fixed as part of this
> issue.

Handling this as a separate issue is better. The code changes would most
likely go into the same class as my changes for DERBY-1417. It seems like
a simple enough change, so I will enter a Jira improvement issue. I don't
like getting an out-of-memory exception (heap).
>> As far as I can tell, adding this feature is a matter of modifying
>> the 'ReaderToUTF8' class and the
>> 'EmbedPreparedStatement.setCharacterStreamInternal' method.
>>
>> One could also optimize the reading of data into LONG VARCHAR, where
>> one would abort the reading as soon as possible instead of taking it
>> all into memory first. This would require some special-case handling
>> in the mentioned locations.
>>
>> Another matter is that streams will not be checked for an exact
>> length match when using the lengthless overloads, as we don't have a
>> specified length to compare against.
>> I have a preliminary implementation for setAsciiStream,
>> setCharacterStream and setClob (all without length specifications) in
>> EmbedPreparedStatement.
>
> What's the issue with setClob()? The current method doesn't take a
> length. Has java.sql.Clob been changed to not always have a length?
>
> Dan.

There are/will be three setClob methods in PreparedStatement:

1) setClob(int, Clob)
2) setClob(int, Reader, long)
3) setClob(int, Reader)

1) and 2) are already implemented; 3) is a new method added to JDBC. The
same story goes for the other setXXXStream and updateXXXStream methods.
These are located in PreparedStatement, CallableStatement and ResultSet.

-- 
Kristian
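[Editorial aside: the three overloads listed above can be confirmed against any JDBC 4.0-or-later JDK by reflection. This is a hypothetical check, not Derby code; the class name SetClobOverloads is made up for the example.]

```java
import java.io.Reader;
import java.sql.Clob;
import java.sql.PreparedStatement;

// Check via reflection whether java.sql.PreparedStatement declares a
// given setClob overload; returns false instead of throwing when absent.
public class SetClobOverloads {
    static boolean hasSetClob(Class<?>... paramTypes) {
        try {
            PreparedStatement.class.getMethod("setClob", paramTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasSetClob(int.class, Clob.class));               // 1) original
        System.out.println(hasSetClob(int.class, Reader.class, long.class)); // 2) with length
        System.out.println(hasSetClob(int.class, Reader.class));             // 3) lengthless
    }
}
```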
Re: Truncation of trailing blanks and lengthless streaming overloads
Kristian Waagan wrote:
> Hello,
>
> I'm working on DERBY-1417; adding new lengthless overloads to the
> streaming API. So far, I have only been looking at implementing this in
> the embedded driver. Based on some comments in the code, I have a few
> questions and observations regarding truncation of trailing blanks in
> the various character data types.
>
> Type          Trail. blank trunc.  Where
> CHAR          allowed              SQLChar.normalize
> VARCHAR       allowed              SQLVarchar.normalize
> LONG VARCHAR  disallowed           SQLLongVarchar.normalize
> CLOB          allowed              streaming or SQLVarchar.normalize,
>                                    depending on source
>
> As can be seen, only data for CLOB is truncated for trailing blanks in
> the streaming class. We must still read all the data, or as much as we
> need to know the insertion will fail, but we don't have to store it all
> in memory.
>
> Truncation of trailing blanks is not allowed at all for LONG VARCHAR
> (according to code comments and bug 5592 - I haven't found the place
> this is stated in the specs).
>
> My question is, should we do the truncation check for CHAR and VARCHAR
> at the streaming level as well? If we don't add this feature, inserting
> a 10GB file into a VARCHAR field by mistake will cause 10GB to be
> loaded into memory even though the max size allowed is ~32K, possibly
> causing out-of-memory errors. The error could be generated at an
> earlier stage (possibly after reading ~32K + 1 bytes).

I would say it's a separate issue to the one you are addressing.
Applications most likely won't be inserting 10GB values into
CHAR/VARCHAR columns using streams as it's not going to work. Maybe
enter a bug, but it doesn't seem it has to be fixed as part of this
issue.

> As far as I can tell, adding this feature is a matter of modifying the
> 'ReaderToUTF8' class and the
> 'EmbedPreparedStatement.setCharacterStreamInternal' method.
> One could also optimize the reading of data into LONG VARCHAR, where
> one would abort the reading as soon as possible instead of taking it
> all into memory first. This would require some special-case handling
> in the mentioned locations.
>
> Another matter is that streams will not be checked for an exact length
> match when using the lengthless overloads, as we don't have a
> specified length to compare against.
> I have a preliminary implementation for setAsciiStream,
> setCharacterStream and setClob (all without length specifications) in
> EmbedPreparedStatement.

What's the issue with setClob()? The current method doesn't take a
length. Has java.sql.Clob been changed to not always have a length?

Dan.
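[Editorial aside: the trailing-blank rule in the table quoted above can be sketched with a small hypothetical helper. This is not Derby's actual SQLChar/SQLVarchar.normalize code; the class and method names are made up. The rule illustrated: characters beyond the declared width may be silently dropped only when every one of them is a blank; any non-blank excess is a truncation error.]

```java
// Hypothetical illustration of trailing-blank truncation for CHAR/VARCHAR.
public class TrailingBlanks {
    static String normalize(String value, int maxWidth) {
        if (value.length() <= maxWidth) {
            return value;
        }
        // Truncation is legal only if all excess characters are blanks.
        for (int i = maxWidth; i < value.length(); i++) {
            if (value.charAt(i) != ' ') {
                throw new IllegalArgumentException(
                        "non-blank truncation for width " + maxWidth);
            }
        }
        return value.substring(0, maxWidth);
    }
}
```

For LONG VARCHAR, per the table, even the all-blank case would have to be rejected.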
Truncation of trailing blanks and lengthless streaming overloads
Hello,

I'm working on DERBY-1417; adding new lengthless overloads to the
streaming API. So far, I have only been looking at implementing this in
the embedded driver. Based on some comments in the code, I have a few
questions and observations regarding truncation of trailing blanks in
the various character data types.

Type          Trail. blank trunc.  Where
CHAR          allowed              SQLChar.normalize
VARCHAR       allowed              SQLVarchar.normalize
LONG VARCHAR  disallowed           SQLLongVarchar.normalize
CLOB          allowed              streaming or SQLVarchar.normalize,
                                   depending on source

As can be seen, only data for CLOB is truncated for trailing blanks in
the streaming class. We must still read all the data, or as much as we
need to know the insertion will fail, but we don't have to store it all
in memory.

Truncation of trailing blanks is not allowed at all for LONG VARCHAR
(according to code comments and bug 5592 - I haven't found the place
this is stated in the specs).

My question is, should we do the truncation check for CHAR and VARCHAR
at the streaming level as well? If we don't add this feature, inserting
a 10GB file into a VARCHAR field by mistake will cause 10GB to be loaded
into memory even though the max size allowed is ~32K, possibly causing
out-of-memory errors. The error could be generated at an earlier stage
(possibly after reading ~32K + 1 bytes).

As far as I can tell, adding this feature is a matter of modifying the
'ReaderToUTF8' class and the
'EmbedPreparedStatement.setCharacterStreamInternal' method.

One could also optimize the reading of data into LONG VARCHAR, where one
would abort the reading as soon as possible instead of taking it all
into memory first. This would require some special-case handling in the
mentioned locations.

Another matter is that streams will not be checked for an exact length
match when using the lengthless overloads, as we don't have a specified
length to compare against.
I have a preliminary implementation for setAsciiStream,
setCharacterStream and setClob (all without length specifications) in
EmbedPreparedStatement. I will continue my work by adding methods that
throw not-implemented exceptions and implementing the methods where
appropriate.

Thoughts and feedback appreciated :)

-- 
Kristian
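[Editorial aside: the early-error idea above - failing after roughly maxLength + 1 characters instead of buffering a 10GB stream - can be sketched as a Reader wrapper. The class name BoundedReader and the plain IOException are hypothetical; Derby's real check would live in ReaderToUTF8 / setCharacterStreamInternal, and for CHAR/VARCHAR it would additionally have to tolerate excess trailing blanks, which this simplified sketch ignores.]

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

// Fails as soon as the underlying stream exceeds a maximum character
// count, instead of reading the whole stream into memory first.
class BoundedReader extends FilterReader {
    private final long max; // e.g. ~32K for a VARCHAR column
    private long count;     // characters read so far

    BoundedReader(Reader in, long max) {
        super(in);
        this.max = max;
    }

    @Override
    public int read() throws IOException {
        int c = in.read();
        if (c != -1 && ++count > max) {
            throw new IOException("stream exceeds maximum length " + max);
        }
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n > 0) {
            count += n;
            if (count > max) {
                throw new IOException("stream exceeds maximum length " + max);
            }
        }
        return n;
    }
}
```

With this approach only a bounded prefix of the stream is ever consumed before the insert is rejected, avoiding the heap out-of-memory scenario described above.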