Thank you, Knut, for your reply. Your point #1 is correct. As for points #2 and #3, just a small correction: any database name containing characters outside the *US-ASCII* range will get a maximum length lower than 255 characters, since anything other than ASCII requires more than 1 byte to encode in UTF-8. I'm fairly sure that at this point we do not support ISO-8859-1 through the client driver, as the extended characters (áéó etc.) fall outside US-ASCII. So hopefully this won't break anything, as we didn't support these characters previously.
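To make the byte-versus-character distinction concrete, here is a minimal Java sketch. It is not Derby code: the class DbNameLength, its methods, and the constant are hypothetical names used only for illustration, and it assumes a 255-byte cap on the UTF-8 encoded name.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names exist in Derby.
final class DbNameLength {
    // Assumed cap on the encoded database name, in bytes.
    static final int MAX_NAME_BYTES = 255;

    // Number of bytes the name occupies once encoded as UTF-8.
    static int utf8Length(String name) {
        return name.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the encoded name fits within the assumed byte cap.
    static boolean fits(String name) {
        return utf8Length(name) <= MAX_NAME_BYTES;
    }

    // Small helper so the sketch also compiles on Java versions before 11.
    static String repeat(String s, int count) {
        StringBuilder sb = new StringBuilder(s.length() * count);
        for (int i = 0; i < count; i++) {
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ascii = repeat("a", 255);        // 1 byte per character
        String accented = repeat("\u00e9", 255); // 'é', 2 bytes per character
        System.out.println(utf8Length(ascii) + " " + fits(ascii));       // 255 true
        System.out.println(utf8Length(accented) + " " + fits(accented)); // 510 false
    }
}
```

Under these assumptions, a name made only of two-byte characters such as é runs out of room after 127 characters, while a pure ASCII name still gets the full 255.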
As for your suggestion of increasing the length of the field, I'm not sure that's an option. This length limitation is imposed by the DRDA specification and the ACR unfortunately didn't change this. On the ACR it reads "As of DDM Level 7, the RDBNAM can accommodate an RDB name of up to 255 bytes in length, and its format will vary depending on the length of the RDB name". So essentially, we could easily support a much larger RDB name on Derby but the specification forbids it.

You're right about the current discrepancy in lengths... so this means it should be fairly ok to have it on a different level. I think this is definitely something that should be properly documented though as it will be an odd behavior from an end-user's point of view, who might be oblivious to the byte length limitation and character encoding.

Thanks,
Tiago

----- Original Message ----
From: Knut Anders Hatlen <[email protected]>
To: [email protected]
Sent: Mon, 13 September, 2010 11:05:59
Subject: Re: Database name length

Tiago Espinha <[email protected]> writes:

> Is this an okay behavior? Or would it be preferable to impose a more strict
> limit where we assume that all characters take 4 bytes (worst case scenario
> in UTF-8) and **always** cap the dbname length at 63 characters (255 bytes /
> 4 bytes)? This would mean more work for my implementation and possibly an
> exclusion from 10.7. On the other hand, if we have this variable-length limit
> depending on the type of characters used, we should probably have some sort
> of release note alerting people about this fact.

Hi Tiago,

Let me see if I've understood this problem. Please correct me if I've got it wrong.

Currently, the network protocol supports database names that take up to 255 bytes when encoded in EBCDIC. We don't allow any characters not supported by EBCDIC. Since EBCDIC supports mostly the same set of characters as ISO-8859-1, it means that we allow database names up to 255 characters from the Unicode range 0x00-0xff.
With the change to UTF-8, we get the following situation:

1) Database names which only contain US-ASCII characters (Unicode range 0x00-0x7f) still have a maximum length of 255 characters.

2) Database names which only contain ISO-8859-1 characters, some of which are not in the US-ASCII range, get a maximum length lower than 255 (exact limit depends on the number of non-ASCII characters), because UTF-8 encodes the non-ASCII characters in two bytes.

3) Database names which contain characters outside of the ISO-8859-1 range will be supported, but with a lower maximum length than 255 characters (exact limit depends on the characters used).

(1) is not a change from previous versions, so that should be fine.

Since we didn't allow any characters outside of ISO-8859-1 before, the change in (3) is an improvement, so I think it's fine too.

The problematic issue is (2), since existing applications that rely on the ability to create long database names using characters from the entire ISO-8859-1 range may now be unable to connect to the database using the client driver. This will be a functional regression, so we will need a release note that explains how to work around this issue.

Does the above description sound about right?

The pragmatic approach would be to increase the maximum length. I see that the writeScalarString() method that we use to write the RDBNAM token uses two bytes for the length:

    // now write the length.  We have the string byte length plus
    // 4 bytes, 2 for length and 2 for codepoint.
    int totalLength = stringByteLength + 4;
    bytes_[lengthOffset] = (byte) ((totalLength >>> 8) & 0xff);
    bytes_[lengthOffset + 1] = (byte) ((totalLength) & 0xff);

So it seems to me we have enough length bits to allow database names up to 2^16-4 == 65532 characters. I cannot think of any problems that such a change would cause. And I believe it would have a much smaller risk of affecting existing applications than the suggestion to limit all database names to 63 characters.
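The two-byte length arithmetic in the quoted snippet can be tried in isolation with the sketch below. It is a standalone illustration, not the actual writeScalarString() code: the class ScalarLength and its two methods are hypothetical, and it only shows how the total length (string bytes plus 4) is split into a big-endian byte pair and read back. It says nothing about which lengths the DRDA specification permits.

```java
// Hypothetical illustration only; these names do not exist in Derby.
final class ScalarLength {
    // Writes (stringByteLength + 4) as two big-endian bytes at the given
    // offset, mirroring the arithmetic in the snippet quoted in the thread.
    static void writeLength(byte[] buf, int offset, int stringByteLength) {
        int totalLength = stringByteLength + 4; // 2 for length, 2 for codepoint
        buf[offset] = (byte) ((totalLength >>> 8) & 0xff);
        buf[offset + 1] = (byte) (totalLength & 0xff);
    }

    // Reads the two bytes back as an unsigned 16-bit value.
    static int readLength(byte[] buf, int offset) {
        return ((buf[offset] & 0xff) << 8) | (buf[offset + 1] & 0xff);
    }

    public static void main(String[] args) {
        byte[] buf = new byte[2];
        writeLength(buf, 0, 255);               // a 255-byte name
        System.out.println(readLength(buf, 0)); // 259
    }
}
```

For a 255-byte name the total is 259, which is stored as the bytes 0x01 and 0x03, so the current limit already needs both length bytes.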
As to the possibility for a discrepancy between the maximum length in client mode and embedded mode, I think we already have such a discrepancy. The file system limit that prevents use of more than 255 characters in a database name in embedded mode, applies to each component of the path name. The total length of the path in the URL may exceed 255 characters if none of the directory names in the path exceed 255 characters. The 255 characters limit in the network client, on the other hand, applies to the entire path in the URL, not to each component of the path.

Also, the network client will take any connection attributes (like create=true) as part of the database name, whereas the embedded driver will not. Increasing the maximum length accepted by the network client should make it less likely that someone gets bitten by this difference between the drivers.

-- 
Knut Anders
