Re: blob data types
On 26.05.2014 11:40, geetanjali mehra wrote: I want to know where MyISAM and InnoDB store their BLOB data: inside the table or outside the table. I tried to understand BLOBs using the MySQL online docs but failed. Inside the table; it's just a field type.
Re: blob data types
Is it possible to move the BLOB values out of the table and keep them on a separate page, while the BLOB column remains part of the table? Geetanjali Mehra, Oracle and MySQL DBA, Corporate Trainer
Re: blob data types
Hello, MyISAM stores a BLOB column in the same space as the other columns, but InnoDB does not always (if you mean the same .ibd file, that part is true). See http://dev.mysql.com/doc/refman/5.6/en/innodb-row-format-overview.html: with ROW_FORMAT=COMPACT, up to the first 768 bytes of a BLOB column are stored in the row and the rest goes to overflow pages; with ROW_FORMAT=DYNAMIC or COMPRESSED, only a 20-byte pointer is stored in the row. See also http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-compression-internals-storage-blobs.html
Re: blob data types
Just don't store large binary data in tables. Save the files somewhere in the application and keep only references to the files in the database.
Re: blob data types
Didn't InnoDB store them out of line (but still inside the datafile, of course)? I might be remembering wrong, though. You could design your application to store blobs in files instead, but then you preferably also need redundant/shared storage. It's a common enough design, but it needs thought and understanding :-) As a tangential thought, there was also a third-party plugin engine at some point that streamed blobs outside of the normal MySQL protocol; I can't remember the name. -- Unhappiness is discouraged and will be corrected with kitten pictures. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql
RE: BLOB data gets encoded as utf8!
Thank you for your reply, Janusz. I appreciate your help. I have tried making that call before the INSERT statement, but to no avail. The table collation is set to utf8 (the default collation), and all columns are set to "Table default". I am thinking that this problem might be due to me sending the BLOB data in a Unicode string to the server, i.e.: INSERT INTO TableName (Rawfield) VALUES ('%s'); where the %s string is a Unicode string. That string is encoded by escaping MySQL's special characters according to http://dev.mysql.com/doc/refman/5.0/en/string-syntax.html (0x00, 0x22, 0x27, 0x5C). Then each byte (and each escaping byte) is followed by a 0x00 before being sent to the server, just so that I can send it as a Unicode string via ODBC. I.e., the binary data FA 5C 93 A4 would be expanded into FA 00 5C 00 5C 00 93 00 A4 00. Otherwise I can't send the data in a Unicode INSERT statement via ODBC from C++ / MFC. Do you think that could be the issue? If so, do you have a suggestion on how to do it better? Thank you again for your help! Sincerely, Andreas Iwanowski -----Original Message----- From: Janusz Paśkiewicz Sent: Wednesday, February 09, 2011 7:45 AM To: Andreas Iwanowski Subject: BLOB data gets encoded as utf8! (Anyone?) Before inserting BLOB data use: mysql_query("SET CHARACTER SET latin1"); afterwards you can set it back with: mysql_query("SET CHARACTER SET utf8"); This is due to http://dev.mysql.com/doc/refman/5.1/en/charset-conversion.html: if the column has a binary data type (BINARY, VARBINARY, BLOB), all the values that it contains must be encoded using a single character set (the character set you're converting the column to).
If you use a binary column to store information in multiple character sets, MySQL has no way to know which values use which character set and cannot convert the data properly. Kind Regards, Janusz Paskiewicz www.freelancer4u.net
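The byte expansion Andreas describes can be sketched as follows; this is a hypothetical Python illustration of the scheme, not his actual MFC/ODBC code. Each of the four bytes MySQL requires escaped gets a backslash, and every byte is then padded with 0x00 so the buffer can travel as a UTF-16 string:

```python
# MySQL escape pairs: \0, \", \', \\ (the value is the byte that follows the backslash)
ESCAPES = {0x00: 0x30, 0x22: 0x22, 0x27: 0x27, 0x5C: 0x5C}

def escape_and_widen(data):
    """Escape the four special bytes, then follow every byte with 0x00."""
    out = bytearray()
    for b in data:
        if b in ESCAPES:
            out += bytes([0x5C, 0x00])           # backslash, NUL-padded
            out += bytes([ESCAPES[b], 0x00])     # escaped byte, NUL-padded
        else:
            out += bytes([b, 0x00])
    return bytes(out)

print(escape_and_widen(b'\xFA\x5C\x93\xA4').hex(' '))
# fa 00 5c 00 5c 00 93 00 a4 00
```

This reproduces his FA 5C 93 A4 example exactly, which suggests the expansion itself is consistent; the corruption happens later, when the server reinterprets those bytes as character data.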
Re: BLOB data gets encoded as utf8!
I can't help but wonder: if you send a string, does that mean you're putting text in a BLOB? Blobs are binary, and thus don't get encoded in the sense of UTF-8 vs. Unicode. For a string, you may want a TEXT type column. On the other hand, if you're indeed trying to insert binary data, it is not the best of ideas to embed it in an SQL string, where it will be subject to UTF-8 or whatever interpretation. For large objects, and generally with repeating statements too, it's best to use bind variables to send your data. That also removes the need to worry about escaping, SQL injection and the like.
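The bind-variable approach recommended above can be sketched like this; Python's stdlib sqlite3 stands in for the MySQL driver (the MySQL call would be analogous), but the principle is the same: the bytes never pass through the SQL text, so no escaping and no charset conversion applies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, raw BLOB)")

# Binary payload including the "dangerous" bytes: NUL, backslash, 0xFF.
payload = bytes([0xFA, 0x5C, 0x93, 0xA4, 0xFF, 0x00])

# The ? placeholder sends the blob out of band; the SQL string never
# contains the binary data, so nothing gets UTF-8 mangled.
conn.execute("INSERT INTO t (raw) VALUES (?)", (payload,))
(stored,) = conn.execute("SELECT raw FROM t").fetchone()
assert stored == payload  # round-trips byte for byte
```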
Re: BLOB data gets encoded as utf8! (Anyone?)
I think the mistake is probably in the way you are inserting the blob. If I understand right, you are inserting the blob with a traditional INSERT statement? In my experience I always use prepared statements to insert blobs; that way the data goes in exactly as the source, without any UTF-8 conversion. Besides this, using a plain INSERT can truncate your blob or statement, because some drivers/connectors limit the size of the SQL statement (I don't know if this happens with the MySQL driver, or what the limit is), so if you are uploading a huge file the statement will not work. I strongly recommend using prepared statements for blobs with any database. Hope this helps. Carlos Proal On 8/28/2010 1:04 AM, Andreas Iwanowski wrote: Has no one encountered this before? There has got to be a solution, and I still haven't found it... So if you have any input please let me know! -----Original Message----- From: Andreas Iwanowski Sent: Tuesday, August 24, 2010 2:48 PM To: mysql@lists.mysql.com Subject: BLOB data gets encoded as utf8! Hello everyone! I am using an MFC Unicode project that uses ODBC to access a MySQL 5.1.50 database via the MySQL ODBC 5.1.6 driver. character_set_connection is set to utf8 (which I believe is the default for the driver). One of the tables contains two LONGBLOB columns, and the table default charset is utf8 (since the application is Unicode). However, when inserting into the LONGBLOB columns via an INSERT statement, the data gets corrupted/modified because it is incorrectly UTF-8 encoded.
My INSERT statement (simplified) does this: INSERT INTO _table_ (Desc, Data) VALUES ('...', '_blobdata_'); I have also tried the _binary introducer (http://dev.mysql.com/doc/refman/5.1/en/string-syntax.html): INSERT INTO _table_ (Desc, Data) VALUES ('...', _binary'_blobdata_'); I'm escaping 0x00, 0x22, 0x27, and 0x5C, which are the only four that MySQL requires to be escaped for BLOB fields per the documentation, and according to the MySQL docs, columns of any BLOB type use no character set. Here is the problem: I found that, for example, 0xFF gets modified to 0xC3BF, which is in fact the UTF-8 encoding of 0xFF. Needless to say, the data becomes useless. I did, however, determine that the data is already corrupted in the myodbc.log file that the driver outputs if the option is set, so there is likely a problem with the driver settings or with the statement itself. I cannot be the first one to encounter this issue, so if you have an idea (or a solution, which would be even better! :) please let me know.
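The 0xFF to 0xC3BF change is exactly what happens when some layer treats the raw bytes as latin1 text and re-encodes them as UTF-8. A one-line Python check confirms the arithmetic:

```python
# A charset-converting layer sees the raw byte 0xFF, interprets it as the
# latin1 character U+00FF, and re-encodes that code point as UTF-8,
# producing the two bytes C3 BF.
raw = b'\xff'
corrupted = raw.decode('latin1').encode('utf-8')
print(corrupted.hex())  # c3bf
```

That matches the corruption Andreas observed, pointing at a charset conversion between the client buffer and the server rather than at the escaping.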
Re: BLOB data gets encoded as utf8! (Anyone?)
Is the code you use to get the data out in the same charset as the code you use to put the data in? Both should ideally also match your database setting. Have you tried explicitly setting the connection to UTF-8? Just swinging in the dark here, really.
RE: Blob data
I gave up on putting large blobs in MySQL -- too many limits around 16MB. Instead I broke blobs into pieces, inserting them with a sequence number. Added benefit: it does not clog up replication while a huge single insert is being copied over the network and re-executed on the slaves.
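Rick's chunking scheme can be sketched roughly like this; the chunk size and the (seq, piece) row layout are hypothetical choices, not details from his implementation:

```python
CHUNK = 256 * 1024  # hypothetical; in practice tune to max_allowed_packet

def chunks(data, size=CHUNK):
    """Split a blob into (seq, piece) pairs, one row per INSERT."""
    for seq, off in enumerate(range(0, len(data), size)):
        yield seq, data[off:off + size]

def reassemble(rows):
    """Concatenate pieces back in sequence order."""
    return b''.join(piece for _, piece in sorted(rows))

pieces = list(chunks(b'x' * 1000, size=300))
# four rows: 300 + 300 + 300 + 100 bytes, seq 0..3
```

Each piece goes in as its own small INSERT, so no single statement approaches the packet limit and replication ships many small events instead of one huge one.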
Re: Blob data
Rick James wrote: Instead I broke blobs into pieces, inserting them with a sequence number. Understanding the underlying problem, that still seems like an unnatural way to store pictures and documents. Added benefit: Does not clog up replication while huge single-insert is being copied over network and reexecuted on slaves. The design of blobs that Jim did at DEC included the ability to send them across the network in chunks of a client-specified size; in 1982 it was quite common to have blobs that were larger than physical memory. What he did more recently was add a blob repository, separate from the active tablespace, that allowed the backup function to skip unchanged blobs while backing up active data. It also allows replicas to share a single copy of blobs, if appropriate. There are lots of ways of making large blobs work better in relational databases. Regards, Ann
RE: Blob data
Rick is dead-on correct; I call it chunking blob data. There is an article here on a simple implementation: http://www.dreamwerx.net/phpforum/?id=1 I've had hundreds of thousands of files in this type of storage with no issues.
RE: Blob data
And while you are at it, you may as well compress the chunks. Your machine can probably compress/uncompress faster than it can write/read disk. I use Perl's Compress::Zlib or PHP's equivalent instead of MySQL's function for two reasons: the network traffic is compressed, and MySQL puts an unnecessary extra byte on the end of the string (OK, this is totally insignificant). And definitely compress each chunk separately. It seems that those library routines slow down (excessive memory realloc?) after about 50K; that is, you can probably compress twenty 50K chunks faster than one 1M chunk. My implementation did File -> Database -- the huge blob was never instantiated completely in RAM, only one chunk at a time. (Imagine trying to store a 50GB file.)
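Per-chunk compression, as suggested above, might look like this sketch using Python's zlib; the 50K chunk size follows Rick's observation about the libraries slowing down on larger buffers, but is otherwise an assumption to tune:

```python
import zlib

def compress_chunks(data, size=50_000):
    """Compress each chunk independently so a reader can decompress
    one piece at a time and no buffer ever exceeds the chunk size."""
    return [zlib.compress(data[off:off + size])
            for off in range(0, len(data), size)]

def decompress_chunks(pieces):
    """Reassemble the original blob from independently compressed chunks."""
    return b''.join(zlib.decompress(p) for p in pieces)
```

Because each chunk is self-contained, the pieces can also be streamed from disk to database one at a time, which is what keeps a 50GB file from ever being held fully in RAM.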
RE: Blob data
Interesting, I never tried compressing the data; sounds like it might be a nice add-on. Do you have any performance numbers you can share? I posted some performance numbers on one of my implementations some time ago; I found the thread here: http://lists.mysql.com/mysql/206337
Re: Blob data
Hi Ann, Currently, the thoughts on how to make the BLOB references secure go like this: The BLOB reference consists of 2 components: The first component is basically an index used to find the BLOB on the server. The second component is a random number generated when the BLOB is created. The random number acts as an authorization code, and is checked when the BLOB is requested. So if the authorization code supplied in the BLOB reference does not match the code stored by the server for that BLOB, then the BLOB is not returned. If the authorization code is a 4-byte number, then the chances of getting the correct code for any particular BLOB is 1 in 4 billion. This makes it practically impossible to discover a BLOB by generating BLOB references and requesting them from the server. However, it does mean that once you have a valid BLOB reference it remains valid until the BLOB is deleted. So you can pass it around to your friends, or post it on the internet if you like. In order to prevent this (it will depend on the site, as to whether this is required), it would be possible to add a dynamic component to the BLOB reference which has a certain lifetime (for example, it expires after a certain amount of time, or when a database session is closed). Such a component would have to be added to the BLOB reference URL by the storage engine on the fly. So, as the SELECT result is being generated, the dynamic component is added to the BLOB references returned in the rowset. Security of the BLOB streaming stuff is one of the major issues, so further comments, questions and ideas are welcome! Best regards, Paul On Jun 26, 2007, at 4:36 PM, Ann W. Harrison wrote: Paul McCullagh wrote: It will also be possible to store the BLOBs out-of-row. In this case, only a BLOB reference is stored in the row. The reference is basically a URL which can be used to retrieve the data. 
So when you do an SQL SELECT which includes a BLOB column, the resulting rowset does not contain the data, just the BLOB reference (URL). How does this work with access privileges? Can you just send random numbers in the URL until you start seeing blob data? Best regards, Ann
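Paul's proposed scheme above (an index to locate the BLOB, plus a random 4-byte authorization code that is checked on every retrieval) can be sketched roughly as follows. The index and 4-byte code come from his description; the class, reference format, and everything else are illustrative assumptions, not the actual engine's implementation:

```python
import os

class BlobStore:
    """Toy in-memory store illustrating index + authorization-code BLOB references."""

    def __init__(self):
        self._blobs = {}   # index -> (auth_code, data)
        self._next = 0

    def put(self, data: bytes) -> str:
        index = self._next
        self._next += 1
        # 4-byte random authorization code: ~1 in 4 billion chance of guessing.
        auth = int.from_bytes(os.urandom(4), "big")
        self._blobs[index] = (auth, data)
        # The reference (hypothetical "index-authcode" format) would form the
        # tail of the retrieval URL.
        return f"{index}-{auth:08x}"

    def get(self, ref: str) -> bytes:
        index_s, auth_s = ref.split("-")
        stored = self._blobs.get(int(index_s))
        # Reject the request unless the supplied code matches the stored one.
        if stored is None or stored[0] != int(auth_s, 16):
            raise PermissionError("unknown BLOB or bad authorization code")
        return stored[1]

store = BlobStore()
ref = store.put(b"hello blob")
assert store.get(ref) == b"hello blob"
```

As Paul notes, a reference like this stays valid until the BLOB is deleted; a dynamic, expiring component would have to be layered on top of the static code above.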
Re: Blob data
This one time, at band camp, Warren Young [EMAIL PROTECTED] wrote: Storing arbitrary data in a BLOB column is tantamount to trying to turn the database into a file system. If you want a file system, use a file system. What is a file system, if not a database? Kevin
Re: Blob data
Hi Steve, Some very good points. We are working on a solution for BLOBs in a MySQL database at www.blobstreaming.org. Any good solution will have to take your comments into account...

On Jun 22, 2007, at 9:24 AM, Steve Edberg wrote: [quoted text snipped; Steve's full post appears later in this thread]
Re: Blob data
Hi Konstantin,

On Jun 26, 2007, at 1:37 PM, Konstantin Osipov wrote: * Paul McCullagh [EMAIL PROTECTED] [07/06/26 15:35]: I am about to release the first version of the BLOB Streaming engine.

We are not that far yet (obviously), but our aim is to create an open platform which will be extendable. So adding various image and full-text searching should be possible. Also the inclusion of compression and encryption, as you mentioned below.

Our client/server protocol and SQL interpreter do not have support for blob streaming. We always copy the entire column to and from the engine and, in the protocol, to the client. So it would be actually very interesting to see how you plan to deliver data between applications and the storage engine.

The BLOB Streaming engine allows you to stream BLOB data directly from a MySQL table using HTTP. To store a BLOB in the database you use the HTTP PUT method to upload the data into a table row, and to retrieve a BLOB, you use the HTTP GET method to download the data from a table row. The URL specifies the database, table, column and search condition, in order to identify the BLOB location.

It will also be possible to store the BLOBs out-of-row. In this case, only a BLOB reference is stored in the row. The reference is basically a URL which can be used to retrieve the data. So when you do an SQL SELECT which includes a BLOB column, the resulting rowset does not contain the data, just the BLOB reference (URL). The BLOB reference URL can be placed directly into an HTML page, or the data can be retrieved by the application using HTTP when it is needed.

In order to store a BLOB out-of-row, you upload the BLOB to MySQL without specifying a target row. The PUT command returns the BLOB reference. Then you do a normal INSERT, placing the BLOB reference in the BLOB column. When the INSERT is committed, the uploaded BLOB is committed to the database as well.

Best regards, Paul

P.S.
Thanks for your reply to "MySQL forces engine shutdown, and hangs". Saved me quite a bit of time! :) -- Konstantin Osipov Software Developer, Moscow, Russia -- MySQL AB, www.mysql.com The best DATABASE COMPANY in the GALAXY
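The out-of-row flow Paul describes (PUT an unattached BLOB, get back a reference, then INSERT that reference into the BLOB column) might look like this from a client's point of view. The URL layout, the reference format, and the function names below are hypothetical assumptions for illustration only, not the actual BLOB Streaming engine API:

```python
import zlib

def blob_url(host, database, table, column, condition):
    """Build a streaming URL naming the BLOB by db/table/column/row.
    The path layout is an assumed example, not the engine's real scheme."""
    return f"http://{host}/{database}/{table}/{column}?{condition}"

def upload_blob(host, database, data):
    """Stand-in for an HTTP PUT of an unattached BLOB.

    A real client would PUT `data` to the server and read the BLOB
    reference from the response; here we fabricate a placeholder
    reference (length + CRC) so the flow can be shown offline.
    """
    return f"~*{len(data)}*{zlib.crc32(data):08x}"

# Step 1: upload the BLOB without naming a target row.
ref = upload_blob("db.example.com:8080", "forum", b"attachment bytes")

# Step 2: a normal INSERT stores only the reference in the BLOB column;
# when the INSERT commits, the uploaded BLOB is committed as well.
insert_sql = f"INSERT INTO messages (body, attachment) VALUES ('hi', '{ref}')"

# A later SELECT returns the reference, from which the data can be
# fetched with an HTTP GET against a URL such as:
url = blob_url("db.example.com:8080", "forum", "messages", "attachment", "id=1")
```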
Re: Blob data
Kevin Waterson wrote: This one time, at band camp, Warren Young [EMAIL PROTECTED] wrote: Storing arbitrary data in a BLOB column is tantamount to trying to turn the database into a file system. If you want a file system, use a file system. What is a file system, if not a database? That's the sort of logic that leads one to believe that the only data structure anyone needs is a hash table (Javascript), or that because bc is Turing complete, it is a good general-purpose programming language.
Re: Blob data
Ratheesh, If you still want to keep your blob data in the database, separate the blob storage out into its own table. This will help alleviate some of your performance and maintenance problems. ~Alex

On 6/22/07, Steve Edberg [EMAIL PROTECTED] wrote: [quoted text snipped; Steve's full post appears below]
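Alex's suggestion, splitting the BLOB column into its own table so that scans of the main table never touch the large values, can be sketched with the following schema. The table and column names are illustrative, and SQLite is used only to keep the demo self-contained; the same split applies to MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Narrow, frequently-scanned table: no BLOBs here.
    CREATE TABLE messages (
        id      INTEGER PRIMARY KEY,
        author  TEXT,
        subject TEXT
    );
    -- Separate table holding only the attachments, joined on demand.
    CREATE TABLE message_attachments (
        message_id INTEGER PRIMARY KEY REFERENCES messages(id),
        filename   TEXT,
        body       BLOB
    );
""")
conn.execute("INSERT INTO messages VALUES (1, 'ratheesh', 'forum post')")
conn.execute("INSERT INTO message_attachments VALUES (1, 'pic.png', ?)",
             (b"\x89PNG...",))

# Listing messages never drags BLOB pages through the cache:
rows = conn.execute("SELECT id, subject FROM messages").fetchall()

# The attachment is read only when actually requested:
blob = conn.execute(
    "SELECT body FROM message_attachments WHERE message_id = 1"
).fetchone()[0]
```

The win is that routine queries, backups of the hot table, and table maintenance operate on a few bytes per row instead of megabytes.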
Re: Blob data
Ratheesh K J wrote: I want to know whether this is the right approach. Or should we actually store the attachments in directories and just store the attachment path in the database. Databases are designed to handle arbitrarily large numbers of rows of structured data, where each datum is small and roughly the same size as all others of its kind. Storing arbitrary data in a BLOB column is tantamount to trying to turn the database into a file system. If you want a file system, use a file system. Microsoft's been promising a database-backed file system for something like 15 years now. Maybe it's because they don't write software all that well up there in Redmond. Or maybe it's because this is one of those ideas that sounds good on paper but doesn't work out so well in practice. I'm betting on the latter.
Re: Blob data
At 12:11 PM +0530 6/22/07, Ratheesh K J wrote: Hello All, I want a clarification. We run a forum wherein people send messages with/without attachments. Attachments may contain images, documents etc. We are actually storing the attachment in a blob column. Sometimes the attachments are big, and today the table size has grown to 40 GB. This has created a headache for any maintenance task: backup, restoration, etc. I want to know whether this is the right approach. Or should we actually store the attachments in directories and just store the attachment path in the database. Kindly suggest the best approach so that I can reduce the database size. Thanks in advance

Yes, storing files - especially non-textual files - in the file system instead of the database is generally considered the best practice. At one point I had created a document management system that stored everything in the database as you are doing; my rationale was that it allowed me to manage permissions using the existing database permissions, and to back up the whole database using mysqldump, vs mysqldump + doing a tar of the files. However, I abandoned this approach for the following reasons:

(1) Storing non-plaintext items (e.g., pictures) in the database makes it bigger and slower without added value - you can't (at least not yet, or in the foreseeable future) do a meaningful search on a blob.

(2) It becomes more difficult to split storage out onto multiple filesystems; e.g., leaving the database files in /var/database, putting the documents themselves into /home/docmanager, etc.

(3) It makes queries on the command line unwieldy; if you have a blob field, doing a select * to check a record's contents can dump a lot of garbage on the screen.
(4) It can make doing incremental backups more difficult; if the documents themselves are relatively static, but the document metadata stored in the database is very dynamic, it becomes simple to do a compact daily database dump + a weekly document directory backup (for example) if the files are not in the database.

What I do is create a unique SHA1 hash when a file is uploaded (e.g., sha1(rand())). The original filename and the 40-character hash are stored in the database, and the document is stored in the filesystem using the hash as the filename. I can optionally compress and encrypt the document as well, storing the encryption key in the database. This gives (for me) adequate document security.

An additional advantage is that you can take advantage of the filesystem tree if you have a large number of documents. For example, if a document hash is 'f0118e9bd2c4fb29c64ee03abce698b8', you can store the file in the directory tree f0/11/8e/f0118e9bd2c4fb29c64ee03abce698b8 (extending to as many levels as you feel necessary). By keeping the number of files per directory fairly small, file retrieval becomes relatively fast. As the hashes approximate a random distribution, you should always have a close-to-balanced tree.

Lastly, I store a hash of the document itself in the database as well. This allows me to detect if duplicate files are uploaded, and to determine if a previously-uploaded file has been corrupted in some way.

steve -- +--- my people are the people of the dessert, ---+ | Steve Edberg http://pgfsun.ucdavis.edu/ | | UC Davis Genome Center [EMAIL PROTECTED] | | Bioinformatics programming/database/sysadmin (530) 754-9127 | + said t e lawrence, picking up his fork +
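Steve's layout (a random 40-character SHA1 name for storage, fan-out directories taken from the leading hex pairs, plus a hash of the document content for duplicate and corruption detection) could be implemented along these lines. The function names and the three-level fan-out are illustrative assumptions:

```python
import hashlib
import os
import tempfile

def fanout_path(root, name, levels=3):
    """Split the leading hex pairs into directories: f0/11/8e/f0118e9b..."""
    parts = [name[i * 2 : i * 2 + 2] for i in range(levels)]
    return os.path.join(root, *parts, name)

def store_document(root, data):
    """Store under a random SHA1 name; return (storage_name, content_hash)."""
    # Random storage name, in the spirit of Steve's sha1(rand()).
    storage_name = hashlib.sha1(os.urandom(20)).hexdigest()
    # Hash of the document itself, for dedupe and corruption checks.
    content_hash = hashlib.sha1(data).hexdigest()
    path = fanout_path(root, storage_name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    # The caller records (original filename, storage_name, content_hash)
    # in the database; only the path scheme lives in code.
    return storage_name, content_hash

root = tempfile.mkdtemp()
name, digest = store_document(root, b"attachment bytes")
assert open(fanout_path(root, name), "rb").read() == b"attachment bytes"
```

Because the names are uniformly distributed hex strings, each level of fan-out divides the expected files per directory by 256, which keeps lookups fast without any balancing work.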