Storing a SHA1 checksum
sql, table I'm storing a SHA1 checksum as varchar(20) binary in my application. After running a test, it seems MySQL will strip trailing spaces from a varchar column, even if it is binary! That means if the last character of my SHA1 checksum happens to be a space, MySQL will corrupt it. What should I do? It seems I can: 1. Use blob instead of varchar. Problem: blob type is slower. 2. Make my application pad the checksum out to 20 spaces. Problem: Increases my code complexity a bit. 3. Wait for MySQL to fix the strip trailing spaces bug. Problem: That doesn't provide an immediate solution. - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Storing a SHA1 checksum
On Sun 2002-12-29 at 05:28:57 -0500, [EMAIL PROTECTED] wrote: sql, table I'm storing a SHA1 checksum as varchar(20) binary in my application. After running a test, it seems MySQL will strip trailing spaces from a varchar column, even if it is binary! Yes, the BINARY keyword only influences how comparisons are done (mainly case-sensivity, but also umlauts, etc...). Stripping space from VARCHAR is a known deficiency: http://www.mysql.com/doc/en/Bugs.html It also mentions, that the TEXT/BLOB types are save from it. That means if the last character of my SHA1 checksum happens to be a space, MySQL will corrupt it. What should I do? It seems I can: 1. Use blob instead of varchar. Problem: blob type is slower. Is that really a problem? Did you measure it? If so, I would be intersted in the results. Advantage: Other application programmers do not need to be aware of the hack. After MySQL is fixed, the source doesn't contain redundant code. 2. Make my application pad the checksum out to 20 spaces. Problem: Increases my code complexity a bit. Advantage: Doesn't affect performance (noticeably). The DBA doesn't need to be aware of the hack. 3. Wait for MySQL to fix the strip trailing spaces bug. Problem: That doesn't provide an immediate solution. 4. Append a non-space at the end, and ignore it on retrieval Problem: Same as 2. Although 2. looks like the prettier solution, 4. makes easier to spot the problem, if the additional handling is forgotten in new code. Well, what you should do? It depends on what you need. It's a trade-off and no one except you can answer what your priorities are. If, for example, you have many applications / programmers who access this stuff, 1. is least intrusive. OTOH, if it is used only in one place, perhaps in a well-encapsulated object, 2. is the least intrusive change. And someone (that includes yourself in 1 year) looking at your SQL dump wouldn't know why you have chosen a BLOB, while you can have a neat comment in the source about it. Since any of the solutions involves only minor changes, I would not bother to waste time on the decision. Simply go with one and rewrite if it really turns out to become a problem later (which I don't believe). HTH, Benjamin. -- [EMAIL PROTECTED] - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Storing a SHA1 checksum
Benjamin, Philip, 1. Use blob instead of varchar. Problem: blob type is slower. Is that really a problem? Did you measure it? If so, I would be intersted in the results. As the manual states: Note that each BLOB or TEXT value is represented internally by a separately allocated object. This is in contrast to all other column types, for which storage is allocated once per column when the table is opened. (http://www.mysql.com/doc/en/BLOB.html) Reading between the lines, I would assume using BLOB or TEXT will affect performance. To cite once more: There is no trailing-space removal for BLOB and TEXT columns when values are stored, as there is for VARCHAR columns. So, a solution would be to use TINYTEXT which can take up to 255 bytes. To not only cite :) I performed this little test with MySQL 4.0.3 on a slow Win2K box with InnoDB tables: mysql DESCRIBE text_tinytext; DESCRIBE text_varchar; +---+--+--+-+-+---+ | Field | Type | Null | Key | Default | Extra | +---+--+--+-+-+---+ | col | tinytext | YES | | NULL| | +---+--+--+-+-+---+ 1 row in set (0.00 sec) +---+-+--+-+-+---+ | Field | Type| Null | Key | Default | Extra | +---+-+--+-+-+---+ | col | varchar(50) | YES | | NULL| | +---+-+--+-+-+---+ 1 row in set (0.00 sec) mysql SELECT COUNT(*) FROM text_tinytext; SELECT COUNT(*) FROM text_varchar; +--+ | COUNT(*) | +--+ |33990 | +--+ 1 row in set (0.16 sec) +--+ | COUNT(*) | +--+ |33990 | +--+ 1 row in set (0.20 sec) Seems like TINYTEXT can be even faster than VARCHAR. Regards, -- Stefan Hinz [EMAIL PROTECTED] Geschäftsführer / CEO iConnect GmbH http://iConnect.de Heesestr. 6, 12169 Berlin (Germany) Tel: +49 30 7970948-0 Fax: +49 30 7970948-3 - Original Message - From: Benjamin Pflugmann [EMAIL PROTECTED] To: Philip Mak [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Sunday, December 29, 2002 12:34 PM Subject: Re: Storing a SHA1 checksum On Sun 2002-12-29 at 05:28:57 -0500, [EMAIL PROTECTED] wrote: sql, table I'm storing a SHA1 checksum as varchar(20) binary in my application. After running a test, it seems MySQL will strip trailing spaces from a varchar column, even if it is binary! Yes, the BINARY keyword only influences how comparisons are done (mainly case-sensivity, but also umlauts, etc...). Stripping space from VARCHAR is a known deficiency: http://www.mysql.com/doc/en/Bugs.html It also mentions, that the TEXT/BLOB types are save from it. That means if the last character of my SHA1 checksum happens to be a space, MySQL will corrupt it. What should I do? It seems I can: 1. Use blob instead of varchar. Problem: blob type is slower. Is that really a problem? Did you measure it? If so, I would be intersted in the results. Advantage: Other application programmers do not need to be aware of the hack. After MySQL is fixed, the source doesn't contain redundant code. 2. Make my application pad the checksum out to 20 spaces. Problem: Increases my code complexity a bit. Advantage: Doesn't affect performance (noticeably). The DBA doesn't need to be aware of the hack. 3. Wait for MySQL to fix the strip trailing spaces bug. Problem: That doesn't provide an immediate solution. 4. Append a non-space at the end, and ignore it on retrieval Problem: Same as 2. Although 2. looks like the prettier solution, 4. makes easier to spot the problem, if the additional handling is forgotten in new code. Well, what you should do? It depends on what you need. It's a trade-off and no one except you can answer what your priorities are. If, for example, you have many applications / programmers who access this stuff, 1. is least intrusive. OTOH, if it is used only in one place, perhaps in a well-encapsulated object, 2. is the least intrusive change. And someone (that includes yourself in 1 year) looking at your SQL dump wouldn't know why you have chosen a BLOB, while you can have a neat comment in the source about it. Since any of the solutions involves only minor changes, I would not bother to waste time on the decision. Simply go with one and rewrite if it really turns out to become a problem later (which I don't believe). HTH, Benjamin. -- [EMAIL PROTECTED] - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
RE: Storing a SHA1 checksum
I would try storing the SHA1() result as a hexadecimal string. Adolfo -Original Message- From: Benjamin Pflugmann [mailto:[EMAIL PROTECTED]] Sent: Sunday, December 29, 2002 7:35 AM To: Philip Mak Cc: [EMAIL PROTECTED] Subject: Re: Storing a SHA1 checksum On Sun 2002-12-29 at 05:28:57 -0500, [EMAIL PROTECTED] wrote: sql, table I'm storing a SHA1 checksum as varchar(20) binary in my application. After running a test, it seems MySQL will strip trailing spaces from a varchar column, even if it is binary! Yes, the BINARY keyword only influences how comparisons are done (mainly case-sensivity, but also umlauts, etc...). Stripping space from VARCHAR is a known deficiency: http://www.mysql.com/doc/en/Bugs.html It also mentions, that the TEXT/BLOB types are save from it. That means if the last character of my SHA1 checksum happens to be a space, MySQL will corrupt it. What should I do? It seems I can: 1. Use blob instead of varchar. Problem: blob type is slower. Is that really a problem? Did you measure it? If so, I would be intersted in the results. Advantage: Other application programmers do not need to be aware of the hack. After MySQL is fixed, the source doesn't contain redundant code. 2. Make my application pad the checksum out to 20 spaces. Problem: Increases my code complexity a bit. Advantage: Doesn't affect performance (noticeably). The DBA doesn't need to be aware of the hack. 3. Wait for MySQL to fix the strip trailing spaces bug. Problem: That doesn't provide an immediate solution. 4. Append a non-space at the end, and ignore it on retrieval Problem: Same as 2. Although 2. looks like the prettier solution, 4. makes easier to spot the problem, if the additional handling is forgotten in new code. Well, what you should do? It depends on what you need. It's a trade-off and no one except you can answer what your priorities are. If, for example, you have many applications / programmers who access this stuff, 1. is least intrusive. OTOH, if it is used only in one place, perhaps in a well-encapsulated object, 2. is the least intrusive change. And someone (that includes yourself in 1 year) looking at your SQL dump wouldn't know why you have chosen a BLOB, while you can have a neat comment in the source about it. Since any of the solutions involves only minor changes, I would not bother to waste time on the decision. Simply go with one and rewrite if it really turns out to become a problem later (which I don't believe). HTH, Benjamin. -- [EMAIL PROTECTED] - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Storing a SHA1 checksum
At 5:28 -0500 12/29/02, Philip Mak wrote: sql, table I'm storing a SHA1 checksum as varchar(20) binary in my application. Other people have addressed other aspects of your message, but I'm curious why you're using a VARCHAR(20), when SHA1() returns a 40-byte string. After running a test, it seems MySQL will strip trailing spaces from a varchar column, even if it is binary! That means if the last character of my SHA1 checksum happens to be a space, MySQL will corrupt it. SHA1() returns a string of 40 hexadecimal digits. There won't be trailing spaces. Are you converting the hex string to some other representation before storing it? What should I do? It seems I can: 1. Use blob instead of varchar. Problem: blob type is slower. 2. Make my application pad the checksum out to 20 spaces. Problem: Increases my code complexity a bit. 3. Wait for MySQL to fix the strip trailing spaces bug. Problem: That doesn't provide an immediate solution. - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Storing a SHA1 checksum
On Sun, Dec 29, 2002 at 11:09:47PM -0600, Paul DuBois wrote: At 5:28 -0500 12/29/02, Philip Mak wrote: sql, table I'm storing a SHA1 checksum as varchar(20) binary in my application. Other people have addressed other aspects of your message, but I'm curious why you're using a VARCHAR(20), when SHA1() returns a 40-byte string. After running a test, it seems MySQL will strip trailing spaces from a varchar column, even if it is binary! That means if the last character of my SHA1 checksum happens to be a space, MySQL will corrupt it. SHA1() returns a string of 40 hexadecimal digits. There won't be trailing spaces. Are you converting the hex string to some other representation before storing it? Yes, I'm converting it to a binary representation first, so that it only takes 20 bytes instead of 40 bytes. I ended up using a TINYBLOB to store my checksum, since performance shouldn't be overly critical in my application. - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Storing a SHA1 checksum
At 0:14 -0500 12/30/02, Philip Mak wrote: On Sun, Dec 29, 2002 at 11:09:47PM -0600, Paul DuBois wrote: At 5:28 -0500 12/29/02, Philip Mak wrote: sql, table I'm storing a SHA1 checksum as varchar(20) binary in my application. Other people have addressed other aspects of your message, but I'm curious why you're using a VARCHAR(20), when SHA1() returns a 40-byte string. After running a test, it seems MySQL will strip trailing spaces from a varchar column, even if it is binary! That means if the last character of my SHA1 checksum happens to be a space, MySQL will corrupt it. SHA1() returns a string of 40 hexadecimal digits. There won't be trailing spaces. Are you converting the hex string to some other representation before storing it? Yes, I'm converting it to a binary representation first, so that it only takes 20 bytes instead of 40 bytes. Okay. I see then why you might end up with trailing 'spaces'. I ended up using a TINYBLOB to store my checksum, since performance shouldn't be overly critical in my application. Sounds like a good choice. - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php