Thank you all, it all makes sense (at least to me) . The "row" object contains a pointer to a location elsewhere in the SAME FILE where the BLOB itself is kept. That pointer (and maybe the size of the BLOB itself) add the 5 to 9 bytes to the row as discussed in the docs. The actual BLOB data is included with the other "row" data in the same file but not actually stored as part of the row.
This brings up a tangent question. As related in another thread (I can't find it right now) another DBA suggested that for greater performance, that BLOB columns should be split from the non-BLOB columns into their own table. His reasoning was that in order to process a WHERE clause that the entire row (including the BLOB columns) was retrieved. He said he was able to improve his query performance by only including the BLOB data when it was actually needed for the output. Could his performance differences be because the BLOB objects are interspersed _between_ the "ROW" objects and is forcing his disks to seek more to get from row to row? Or, does anyone know if his initial theory (that the BLOBs were loaded even if not needed) is right? humbly impressed and freshly educated, Shawn Green Database Administrator Unimin Corporation - Spruce Pine [EMAIL PROTECTED] om To: <[EMAIL PROTECTED]>, <[EMAIL PROTECTED]> cc: <[EMAIL PROTECTED]>, <[EMAIL PROTECTED]> 06/17/2004 11:12 Fax to: AM Subject: RE: Where are BLOBs / TEXTs stored? Here's an excerpt from the MySQL internal doc, the format is better in html but not appropriate for this forum. The last paragraph sums it up: Hexadecimal Display of Table1.MYD file F1 61 62 63 00 F5 64 00 66 00 ... .abc..d e. Here's how to read this hexadecimal-dump display: The hexadecimal numbers F1 61 62 63 00 F5 64 20 66 00 are byte values and the column on the right is an attempt to show the same bytes in ASCII. The F1 byte means that there are no null fields in the first row. The F5 byte means that the second column of the second row is NULL. (It's probably easier to understand the flag setting if you restate F5 as 11110101 binary, and (a) notice that the third flag bit from the right is on, and (b) remember that the first flag bit is the X bit.) There are complications -- the record header is more complex if there are variable-length fields -- but the simple display shown in the example is exactly what you'd see if you looked at the MySQL Data file with a debugger or a hexadecimal file dumper. So much for the fixed format. Now, let's discuss the dynamic format. The dynamic file format is necessary if rows can vary in size. That will be the case if there are BLOB columns, or "true" VARCHAR columns. (Remember that MySQL may treat VARCHAR columns as if they're CHAR columns, in which case the fixed format is used.) A dynamic row has more fields in the header. The important ones are "the actual length", "the unused length", and "the overflow pointer". The actual length is the total number of bytes in all the columns. The unused length is the total number of bytes between one physical record and the next one. The overflow pointer is the location of the rest of the record if there are multiple parts. For example, here is a dynamic row: 03, 00 start of header 04 actual length 0c unused length 01, fc flags + overflow pointer **** data in the row ************ unused bytes <-- next row starts here) In the example, the actual length and the unused length are short (one byte each) because the table definition says that the columns are short -- if the columns were potentially large, then the actual length and the unused length could be two bytes each, three bytes each, and so on. In this case, actual length plus unused length is 10 hexadecimal (sixteen decimal), which is a minimum. As for the third format -- packed -- we will only say briefly that: Numeric values are stored in a form that depends on the range (start/end values) for the data type. All columns are packed using either Huffman or enum coding. For details, see the source files /myisam/mi_statrec.c (for fixed format), /myisam/mi_dynrec.c (for dynamic format), and /myisam/mi_packrec.c (for packed format). Note: Internally, MySQL uses a format much like the fixed format which it uses for disk storage. The main differences are: BLOBs have a length and a memory pointer rather than being stored inline. "True VARCHAR" (a column storage which will be fully implemented in version 5.0) will have a 16-bit length plus the data. All integer or floating-point numbers are stored with the low byte first. Point (3) does not apply for ISAM storage or internals. Ed ------------------------------------------------------------------------ ---- -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Thursday, June 17, 2004 9:00 AM To: [EMAIL PROTECTED] Cc: emierzwa; [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: RE: Where are BLOBs / TEXTs stored? [EMAIL PROTECTED] wrote on 17/06/2004 15:35:36: > > I am curious about this, too. However, I don't think that you answer the > original question. > > Are BLOBs stored as separate files, one file per object? Are they combined > into a single large BLOB file? are they aggregated into several medium > sized files? Answering "where are they stored on the disk" may be a better > response as the docs state that they are not stored in the MyISAM table > itself but as "separate objects" (which means what, exactly?) This is not how I read the section of the manual. Normally, a database row is a single "Object" within the MyISAM file, whcih contains many "Objects". Successive numeric fields will be stored in adjacent words of the MyISAM file in exactly the order you see them when you do a "SELECT *". If you want to access this record, then only one disk seek is needed to fetch it. However, because large BLOBs are rarely involved in searches, rather than creating a single huge record with the BLOB embedded in it, the BLOB is stored elsewhere *in the same .myd file*, with only a pointer to the position of the blob within the file. The upside of this is that for searches not involving the BLOB field, and after the indexes have been exausted, only the relatively small non-BLOB needs to be read and checked. The downside is that if the search involves the BLOB field, or if the BLOB field needsw to be fetched, then a second disk access is required, reducing performance. That is how I understand it: if anybody knows better, feel free to correct me - one learns by ones mistakes. Alec -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]