Storing a SHA1 checksum

2002-12-29 Thread Philip Mak
sql, table

I'm storing a SHA1 checksum as varchar(20) binary in my application.

After running a test, it seems MySQL will strip trailing spaces from a
varchar column, even if it is binary! That means if the last character
of my SHA1 checksum happens to be a space, MySQL will corrupt it.

What should I do? It seems I can:

1. Use blob instead of varchar.
   Problem: blob type is slower.

2. Make my application pad the checksum out to 20 spaces.
   Problem: Increases my code complexity a bit.

3. Wait for MySQL to fix the strip trailing spaces bug.
   Problem: That doesn't provide an immediate solution.

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Storing a SHA1 checksum

2002-12-29 Thread Benjamin Pflugmann
On Sun 2002-12-29 at 05:28:57 -0500, [EMAIL PROTECTED] wrote:
 sql, table
 
 I'm storing a SHA1 checksum as varchar(20) binary in my application.
 
 After running a test, it seems MySQL will strip trailing spaces from a
 varchar column, even if it is binary!

Yes, the BINARY keyword only influences how comparisons are done
(mainly case-sensivity, but also umlauts, etc...).

Stripping space from VARCHAR is a known deficiency:

  http://www.mysql.com/doc/en/Bugs.html

It also mentions, that the TEXT/BLOB types are save from it.

 That means if the last character of my SHA1 checksum happens to be a
 space, MySQL will corrupt it.
 
 What should I do? It seems I can:
 
 1. Use blob instead of varchar.
Problem: blob type is slower.

Is that really a problem? Did you measure it? If so, I would be
intersted in the results.

 Advantage: Other application programmers do not need to be aware
 of the hack. After MySQL is fixed, the source doesn't contain
 redundant code.

 2. Make my application pad the checksum out to 20 spaces.
Problem: Increases my code complexity a bit.
 Advantage: Doesn't affect performance (noticeably). The DBA
 doesn't need to be aware of the hack.

 3. Wait for MySQL to fix the strip trailing spaces bug.
Problem: That doesn't provide an immediate solution.

4. Append a non-space at the end, and ignore it on retrieval
   Problem: Same as 2.
   Although 2. looks like the prettier solution, 4. makes easier to
   spot the problem, if the additional handling is forgotten in new
   code.

Well, what you should do? It depends on what you need. It's a
trade-off and no one except you can answer what your priorities are.

If, for example, you have many applications / programmers who access
this stuff, 1. is least intrusive. OTOH, if it is used only in one
place, perhaps in a well-encapsulated object, 2. is the least
intrusive change. And someone (that includes yourself in 1 year)
looking at your SQL dump wouldn't know why you have chosen a BLOB,
while you can have a neat comment in the source about it.

Since any of the solutions involves only minor changes, I would not
bother to waste time on the decision. Simply go with one and rewrite
if it really turns out to become a problem later (which I don't
believe).

HTH,

Benjamin.

-- 
[EMAIL PROTECTED]

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Storing a SHA1 checksum

2002-12-29 Thread Stefan Hinz, iConnect \(Berlin\)
Benjamin, Philip,

  1. Use blob instead of varchar.
 Problem: blob type is slower.

 Is that really a problem? Did you measure it? If so, I would be
 intersted in the results.

As the manual states: Note that each BLOB or TEXT value is represented
internally by a separately allocated object. This is in contrast to all
other column types, for which storage is allocated once per column when
the table is opened. (http://www.mysql.com/doc/en/BLOB.html) Reading
between the lines, I would assume using BLOB or TEXT will affect
performance.

To cite once more: There is no trailing-space removal for BLOB and TEXT
columns when values are stored, as there is for VARCHAR columns. So, a
solution would be to use TINYTEXT which can take up to 255 bytes.

To not only cite :) I performed this little test with MySQL 4.0.3 on a
slow Win2K box with InnoDB tables:

mysql DESCRIBE text_tinytext; DESCRIBE text_varchar;
+---+--+--+-+-+---+
| Field | Type | Null | Key | Default | Extra |
+---+--+--+-+-+---+
| col   | tinytext | YES  | | NULL|   |
+---+--+--+-+-+---+
1 row in set (0.00 sec)

+---+-+--+-+-+---+
| Field | Type| Null | Key | Default | Extra |
+---+-+--+-+-+---+
| col   | varchar(50) | YES  | | NULL|   |
+---+-+--+-+-+---+
1 row in set (0.00 sec)

mysql SELECT COUNT(*) FROM text_tinytext; SELECT COUNT(*) FROM
text_varchar;
+--+
| COUNT(*) |
+--+
|33990 |
+--+
1 row in set (0.16 sec)

+--+
| COUNT(*) |
+--+
|33990 |
+--+
1 row in set (0.20 sec)

Seems like TINYTEXT can be even faster than VARCHAR.

Regards,
--
  Stefan Hinz [EMAIL PROTECTED]
  Geschäftsführer / CEO iConnect GmbH http://iConnect.de
  Heesestr. 6, 12169 Berlin (Germany)
  Tel: +49 30 7970948-0  Fax: +49 30 7970948-3

- Original Message -
From: Benjamin Pflugmann [EMAIL PROTECTED]
To: Philip Mak [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Sunday, December 29, 2002 12:34 PM
Subject: Re: Storing a SHA1 checksum


 On Sun 2002-12-29 at 05:28:57 -0500, [EMAIL PROTECTED] wrote:
  sql, table
 
  I'm storing a SHA1 checksum as varchar(20) binary in my
application.
 
  After running a test, it seems MySQL will strip trailing spaces from
a
  varchar column, even if it is binary!

 Yes, the BINARY keyword only influences how comparisons are done
 (mainly case-sensivity, but also umlauts, etc...).

 Stripping space from VARCHAR is a known deficiency:

   http://www.mysql.com/doc/en/Bugs.html

 It also mentions, that the TEXT/BLOB types are save from it.

  That means if the last character of my SHA1 checksum happens to be a
  space, MySQL will corrupt it.
 
  What should I do? It seems I can:
 
  1. Use blob instead of varchar.
 Problem: blob type is slower.

 Is that really a problem? Did you measure it? If so, I would be
 intersted in the results.

  Advantage: Other application programmers do not need to be aware
  of the hack. After MySQL is fixed, the source doesn't contain
  redundant code.

  2. Make my application pad the checksum out to 20 spaces.
 Problem: Increases my code complexity a bit.
  Advantage: Doesn't affect performance (noticeably). The DBA
  doesn't need to be aware of the hack.

  3. Wait for MySQL to fix the strip trailing spaces bug.
 Problem: That doesn't provide an immediate solution.

 4. Append a non-space at the end, and ignore it on retrieval
Problem: Same as 2.
Although 2. looks like the prettier solution, 4. makes easier to
spot the problem, if the additional handling is forgotten in new
code.

 Well, what you should do? It depends on what you need. It's a
 trade-off and no one except you can answer what your priorities are.

 If, for example, you have many applications / programmers who access
 this stuff, 1. is least intrusive. OTOH, if it is used only in one
 place, perhaps in a well-encapsulated object, 2. is the least
 intrusive change. And someone (that includes yourself in 1 year)
 looking at your SQL dump wouldn't know why you have chosen a BLOB,
 while you can have a neat comment in the source about it.

 Since any of the solutions involves only minor changes, I would not
 bother to waste time on the decision. Simply go with one and rewrite
 if it really turns out to become a problem later (which I don't
 believe).

 HTH,

 Benjamin.

 --
 [EMAIL PROTECTED]

 -
 Before posting, please check:
http://www.mysql.com/manual.php   (the manual)
http://lists.mysql.com/   (the list archive)

 To request this thread, e-mail [EMAIL PROTECTED]
 To unsubscribe, e-mail
[EMAIL PROTECTED]
 Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

RE: Storing a SHA1 checksum

2002-12-29 Thread Adolfo Bello
I would try storing the SHA1() result as a hexadecimal string.

Adolfo

 -Original Message-
 From: Benjamin Pflugmann [mailto:[EMAIL PROTECTED]] 
 Sent: Sunday, December 29, 2002 7:35 AM
 To: Philip Mak
 Cc: [EMAIL PROTECTED]
 Subject: Re: Storing a SHA1 checksum
 
 
 On Sun 2002-12-29 at 05:28:57 -0500, [EMAIL PROTECTED] wrote:
  sql, table
  
  I'm storing a SHA1 checksum as varchar(20) binary in my 
 application.
  
  After running a test, it seems MySQL will strip trailing 
 spaces from a 
  varchar column, even if it is binary!
 
 Yes, the BINARY keyword only influences how comparisons are 
 done (mainly case-sensivity, but also umlauts, etc...).
 
 Stripping space from VARCHAR is a known deficiency:
 
   http://www.mysql.com/doc/en/Bugs.html
 
 It also mentions, that the TEXT/BLOB types are save from it.
 
  That means if the last character of my SHA1 checksum 
 happens to be a 
  space, MySQL will corrupt it.
  
  What should I do? It seems I can:
  
  1. Use blob instead of varchar.
 Problem: blob type is slower.
 
 Is that really a problem? Did you measure it? If so, I would 
 be intersted in the results.
 
  Advantage: Other application programmers do not need to be aware
  of the hack. After MySQL is fixed, the source doesn't contain
  redundant code.
 
  2. Make my application pad the checksum out to 20 spaces.
 Problem: Increases my code complexity a bit.
  Advantage: Doesn't affect performance (noticeably). The DBA
  doesn't need to be aware of the hack.
 
  3. Wait for MySQL to fix the strip trailing spaces bug.
 Problem: That doesn't provide an immediate solution.
 
 4. Append a non-space at the end, and ignore it on retrieval
Problem: Same as 2.
Although 2. looks like the prettier solution, 4. makes easier to
spot the problem, if the additional handling is forgotten in new
code.
 
 Well, what you should do? It depends on what you need. It's a 
 trade-off and no one except you can answer what your priorities are.
 
 If, for example, you have many applications / programmers who 
 access this stuff, 1. is least intrusive. OTOH, if it is used 
 only in one place, perhaps in a well-encapsulated object, 2. 
 is the least intrusive change. And someone (that includes 
 yourself in 1 year) looking at your SQL dump wouldn't know 
 why you have chosen a BLOB, while you can have a neat comment 
 in the source about it.
 
 Since any of the solutions involves only minor changes, I 
 would not bother to waste time on the decision. Simply go 
 with one and rewrite if it really turns out to become a 
 problem later (which I don't believe).
 
 HTH,
 
   Benjamin.
 
 -- 
 [EMAIL PROTECTED]
 
 -
 Before posting, please check:
http://www.mysql.com/manual.php   (the manual)
http://lists.mysql.com/   (the list archive)
 
 To request this thread, e-mail [EMAIL PROTECTED]
 To unsubscribe, e-mail 
 [EMAIL PROTECTED]
 Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
 
 
 
 


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Storing a SHA1 checksum

2002-12-29 Thread Paul DuBois
At 5:28 -0500 12/29/02, Philip Mak wrote:

sql, table

I'm storing a SHA1 checksum as varchar(20) binary in my application.


Other people have addressed other aspects of your message, but I'm
curious why you're using a VARCHAR(20), when SHA1() returns a 40-byte
string.



After running a test, it seems MySQL will strip trailing spaces from a
varchar column, even if it is binary! That means if the last character
of my SHA1 checksum happens to be a space, MySQL will corrupt it.


SHA1() returns a string of 40 hexadecimal digits.  There won't be trailing
spaces.

Are you converting the hex string to some other representation before
storing it?



What should I do? It seems I can:

1. Use blob instead of varchar.
   Problem: blob type is slower.

2. Make my application pad the checksum out to 20 spaces.
   Problem: Increases my code complexity a bit.

3. Wait for MySQL to fix the strip trailing spaces bug.
   Problem: That doesn't provide an immediate solution.



-
Before posting, please check:
  http://www.mysql.com/manual.php   (the manual)
  http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Storing a SHA1 checksum

2002-12-29 Thread Philip Mak
On Sun, Dec 29, 2002 at 11:09:47PM -0600, Paul DuBois wrote:
 At 5:28 -0500 12/29/02, Philip Mak wrote:
 sql, table
 
 I'm storing a SHA1 checksum as varchar(20) binary in my application.
 
 Other people have addressed other aspects of your message, but I'm
 curious why you're using a VARCHAR(20), when SHA1() returns a 40-byte
 string.
 
 After running a test, it seems MySQL will strip trailing spaces from a
 varchar column, even if it is binary! That means if the last character
 of my SHA1 checksum happens to be a space, MySQL will corrupt it.
 
 SHA1() returns a string of 40 hexadecimal digits.  There won't be trailing
 spaces.
 
 Are you converting the hex string to some other representation before
 storing it?

Yes, I'm converting it to a binary representation first, so that it
only takes 20 bytes instead of 40 bytes.

I ended up using a TINYBLOB to store my checksum, since performance
shouldn't be overly critical in my application.

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Storing a SHA1 checksum

2002-12-29 Thread Paul DuBois
At 0:14 -0500 12/30/02, Philip Mak wrote:

On Sun, Dec 29, 2002 at 11:09:47PM -0600, Paul DuBois wrote:

 At 5:28 -0500 12/29/02, Philip Mak wrote:
 sql, table
 
 I'm storing a SHA1 checksum as varchar(20) binary in my application.

 Other people have addressed other aspects of your message, but I'm
 curious why you're using a VARCHAR(20), when SHA1() returns a 40-byte
 string.
 
 After running a test, it seems MySQL will strip trailing spaces from a
 varchar column, even if it is binary! That means if the last character
 of my SHA1 checksum happens to be a space, MySQL will corrupt it.

 SHA1() returns a string of 40 hexadecimal digits.  There won't be trailing
 spaces.

 Are you converting the hex string to some other representation before
 storing it?


Yes, I'm converting it to a binary representation first, so that it
only takes 20 bytes instead of 40 bytes.


Okay.  I see then why you might end up with trailing 'spaces'.



I ended up using a TINYBLOB to store my checksum, since performance
shouldn't be overly critical in my application.


Sounds like a good choice.

-
Before posting, please check:
  http://www.mysql.com/manual.php   (the manual)
  http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php