Thank you all, it all makes sense (at least to me) .

The "row" object contains a pointer to a location elsewhere in the SAME
FILE where the BLOB itself is kept. That pointer (and maybe the size of the
BLOB itself) add the 5 to 9 bytes to the row as discussed in the docs. The
actual BLOB data is included with the other "row" data in the same file but
not actually stored as part of the row.

This brings up a tangent question. As related in another thread (I can't
find it right now) another DBA suggested that for greater performance, that
BLOB columns should be split from the non-BLOB columns into their own
table. His reasoning was that in order to process a WHERE  clause that the
entire row (including the BLOB columns) was retrieved. He said he was able
to improve his query performance by only including the BLOB data when it
was actually needed for the output.

Could his performance differences be because the BLOB objects are
interspersed _between_ the "ROW" objects and is forcing his disks to seek
more to get from row to row? Or, does anyone know if his initial theory
(that the BLOBs were loaded even if not needed) is right?

humbly impressed and freshly educated,

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine



                                                                                       
                                 
                      [EMAIL PROTECTED]                                                
                                 
                      om                       To:       <[EMAIL PROTECTED]>, <[EMAIL 
PROTECTED]>                 
                                               cc:       <[EMAIL PROTECTED]>, <[EMAIL 
PROTECTED]>         
                      06/17/2004 11:12         Fax to:                                 
                                 
                      AM                       Subject:  RE: Where are BLOBs / TEXTs 
stored?                            
                                                                                       
                                 
                                                                                       
                                 




Here's an excerpt from the MySQL internal doc, the format is better in
html but not appropriate for this forum. The last paragraph sums it up:


Hexadecimal Display of Table1.MYD file
F1 61 62 63 00 F5 64 00 66 00 ... .abc..d e.


Here's how to read this hexadecimal-dump display:


The hexadecimal numbers F1 61 62 63 00 F5 64 20 66 00 are byte values
and the column on the right is an attempt to show the same bytes in
ASCII.
The F1 byte means that there are no null fields in the first row.
The F5 byte means that the second column of the second row is NULL.
(It's probably easier to understand the flag setting if you restate F5
as 11110101 binary, and (a) notice that the third flag bit from the
right is on, and (b) remember that the first flag bit is the X bit.)


There are complications -- the record header is more complex if there
are variable-length fields -- but the simple display shown in the
example is exactly what you'd see if you looked at the MySQL Data file
with a debugger or a hexadecimal file dumper.


So much for the fixed format. Now, let's discuss the dynamic format.


The dynamic file format is necessary if rows can vary in size. That will
be the case if there are BLOB columns, or "true" VARCHAR columns.
(Remember that MySQL may treat VARCHAR columns as if they're CHAR
columns, in which case the fixed format is used.) A dynamic row has more
fields in the header. The important ones are "the actual length", "the
unused length", and "the overflow pointer". The actual length is the
total number of bytes in all the columns. The unused length is the total
number of bytes between one physical record and the next one. The
overflow pointer is the location of the rest of the record if there are
multiple parts.


For example, here is a dynamic row:
  03, 00             start of header
04                 actual length
0c                 unused length
01, fc             flags + overflow pointer
****               data in the row
************       unused bytes
                   <-- next row starts here)




In the example, the actual length and the unused length are short (one
byte each) because the table definition says that the columns are short
-- if the columns were potentially large, then the actual length and the
unused length could be two bytes each, three bytes each, and so on. In
this case, actual length plus unused length is 10 hexadecimal (sixteen
decimal), which is a minimum.

As for the third format -- packed -- we will only say briefly that:

Numeric values are stored in a form that depends on the range (start/end
values) for the data type.
All columns are packed using either Huffman or enum coding.
For details, see the source files /myisam/mi_statrec.c (for fixed
format), /myisam/mi_dynrec.c (for dynamic format), and
/myisam/mi_packrec.c (for packed format).

Note: Internally, MySQL uses a format much like the fixed format which
it uses for disk storage. The main differences are:

BLOBs have a length and a memory pointer rather than being stored
inline.
"True VARCHAR" (a column storage which will be fully implemented in
version 5.0) will have a 16-bit length plus the data.
All integer or floating-point numbers are stored with the low byte
first. Point (3) does not apply for ISAM storage or internals.

Ed
------------------------------------------------------------------------
----
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 17, 2004 9:00 AM
To: [EMAIL PROTECTED]
Cc: emierzwa; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: RE: Where are BLOBs / TEXTs stored?


[EMAIL PROTECTED] wrote on 17/06/2004 15:35:36:

>
> I am curious about this, too. However, I don't think that you answer
the
> original question.
>
> Are BLOBs stored as separate files, one file per object? Are they
combined
> into a single large BLOB file? are they aggregated into several medium
> sized files? Answering  "where are they stored on the disk" may be a
better
> response as the docs state that they are not stored in the MyISAM
table
> itself but as "separate objects" (which means what, exactly?)

This is not how I read the section of the manual. Normally, a database
row
is a single "Object" within the MyISAM file, whcih contains many
"Objects". Successive numeric fields will be stored in adjacent words of

the MyISAM file in exactly the order you see them when you do a "SELECT
*". If you want to access this record, then only one disk seek is needed

to fetch it. However, because large BLOBs are rarely involved in
searches,
rather than creating a single huge record with the BLOB embedded in it,
the BLOB is stored elsewhere *in the same .myd file*, with only a
pointer
to the position of the blob within the file.

The upside of this is that for searches not involving the BLOB field,
and
after the indexes have been exausted, only the relatively small non-BLOB

needs to be read and checked. The downside is that if the search
involves
the BLOB field, or if the BLOB field needsw to be fetched, then a second

disk access is required, reducing performance.

That is how I understand it: if anybody knows better, feel free to
correct
me - one learns by ones mistakes.

        Alec






-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to