Re: blob versus file
Am Mittwoch, 3. Juli 2002 20:43 schrieb central: More specific: Can I efficiently read the bytes x to y from any BLOB stored in a MySQL database? Why not just add another column, Char(3), that contains the file extension? That would fix this particular case, but my thought were more along a general comparison of BLOB and file APIs. While ext2 open is dominated by directory lookup times when opening files, reiserfs and MySQL perform logarithmically here. On the other hand, during read and write phases ext2 and reiserfs as well as all other file systems have an API that allows them partial object reads and writes as well as seeking within that object. The question is, what does the MySQL BLOB API provide and how does it relate to performance of applications making use of the MySQL BLOB API? Kristian - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
RE: blob versus file
Grabbing a half-gig video segment out of any database I'm sure you're absolutely right about not putting half gig videos in a database! I was thinking more of an application like an access control system, where there might be tens of thousands of photographs of people, each a jpeg of a small number of K, or a catalogue, again with thousands of tiny photos. The experiment I did with 500,000 operating system files was to see whether smallish blobs (bits of text, in this case, which I didn't need to search or index) were better stored as disk files or in an SQL Server database. The database won hands down on all counts. Tim Ward Brett Ward Limited - www.brettward.co.uk --- On August 5th ip.access will be moving to: CPC1 Telephone: +44 (0) 1223 219000 Capital Park Fax: +44 (0) 1223 219099 Fulbourn Cambridge CB1 5XE United Kingdom --- This e-mail and the information it contains are confidential and may be privileged. If you have received this e-mail in error please notify us immediately. You should not copy it for any purpose, or disclose its contents to any other person. All emails to and from ip.access may be monitored and stored for audit and other legal reasons. --- - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
RE: blob versus file
At 09:42 AM 7/3/02 +0100, Tim Ward wrote: I was thinking more of an application like an access control system, where there might be tens of thousands of photographs of people, each a jpeg of a small number of K, or a catalogue, again with thousands of tiny photos. The experiment I did with 500,000 operating system files was to see whether smallish blobs (bits of text, in this case, which I didn't need to search or index) were better stored as disk files or in an SQL Server database. The database won hands down on all counts. Not meaning to put down MySQL, but have you tried this also with a ReiserFS filesystem? I had a similar number of files, about 70 GByte worth on an ext2 filesystem. Moved them to a ReiserFS filesystem and found I only needed 51 GByte. And got a much faster system... The main gain of MySQL is probably that MySQL packs the data tighter together, which in the end causes fewer hard disk head movents and less data to be transferred. And if you use indexes, your directory searches are binary chops rather than linear searches. ReiserFS does that also, but at a file system level. If you need to port your application to the outside world where you don't have control over your file systems, MySQL would be the way to go for this application. If you _do_ have control over the filesystem being used and you can use ReiserFS and you don't need MySQL specific features, then I wouldn't be so sure. Just my 2 eurocents worth... Liz - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: blob versus file
Hello. On Wed 2002-07-03 at 09:42:52 +0100, [EMAIL PROTECTED] wrote: [...] I was thinking more of an application like an access control system, where there might be tens of thousands of photographs of people, each a jpeg of a small number of K, or a catalogue, again with thousands of tiny photos. The experiment I did with 500,000 operating system files was to see whether smallish blobs (bits of text, in this case, which I didn't need to search or index) were better stored as disk files or in an SQL Server database. The database won hands down on all counts. Well, you did *not* put all 500.000 files in one directory, did you? I had a similar setting for some years (about 12GB of data in about 1.2 Mio. files at the end) stored in a directory hierachy and had no particular performance problems (at least once I started to avoid having more than 5.000 files in one dir ;-). About 50-100 random accesses/sec sustained at high time (including all the rest: web server, database server, etc...). Btw, this was still under Linux 2.0, Alpha 300MHz? SCSI (which is beaten performance-wise by any desktop PC nowadays). I mean not to say, that a SQL database would not have been able to handle this. But a filesystem is an specialized hierachical database, and I have yet to see an (more general) SQL database wins hands down in the preferred domain of the file system - of course, a reasonable implementation presumed (which ext2 IMHO is). Bye, Benjamin. -- [EMAIL PROTECTED] - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: blob versus file
Am Mittwoch, 3. Juli 2002 10:58 schrieb Elizabeth Mattijsen: Not meaning to put down MySQL, but have you tried this also with a ReiserFS filesystem? I had a similar number of files, about 70 GByte worth on an ext2 filesystem. Moved them to a ReiserFS filesystem and found I only needed 51 GByte. And got a much faster system... ext2 searches directories linearly. With 500.000 files in a directory where each file is subsequently accessed in a benchmark, you get quadratic access times. reiserfs uses a tree structure for directories, giving you almost linear (O(n log n)) access. It also packs tails of different files into singular blocks, saving much space as well. Question: Can the MySQL BLOB API access and transfer partial blobs. That is, if you want to do the equivalent of a file * to a BLOB table, the first 10 bytes or so of each BLOB must be read in order to guess the type of the BLOB. Is it possible to implement this efficiently using the MySQL API? More specific: Can I efficiently read the bytes x to y from any BLOB stored in a MySQL database? Kristian - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: blob versus file
Kristian, Question: Can the MySQL BLOB API access and transfer partial blobs. That is, if you want to do the equivalent of a file * to a BLOB table, the first 10 bytes or so of each BLOB must be read in order to guess the type of the BLOB. Is it possible to implement this efficiently using the MySQL API? More specific: Can I efficiently read the bytes x to y from any BLOB stored in a MySQL database? Why not just add another column, Char(3), that contains the file extension? Then you wouldn't need to read the blob data. Or you can use ENUM and define the possible blob types. By having a second column you can sort and filter on it. Another column could be the blob size (in bytes) so you know how much blob data is being stored. Mike - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: blob versus file
Op maandag 1 juli 2002 14:18, schreef andy: Hi there, I am wondering if anybody has experiance in saving images to blob in mysql. I do save images with 1 K and 4 KB to blob fields while I used to save them to file. It seams to me that this is much slower accessing the files. The images take a bit (really short but absolutly noticable) to show up on the site. Is there a way to improve the performance, and why is this happening? I thought the performance might even boost after storing them to blobs. Thank you for any comments on that, Andy From the faq on http://www.bitbybit.dk/mysqlfaq/comments.html#ch7_2_0 Storing images and other binary data Many people ask how to store and retrieve images and other binary data in the database. The question usually pops up along with a statement a la how do I retrrieve and show an image on a web page? There are several reasons why you should consider not storing binary data in your database: * The whole point of storing stuff in a SQL database, is to put some kind of ordering and structure on your data, as well as being able to search on these data. But how do you search in the binary data of your picture? * For large data sets, storing binary data will quickly run up the size of your database files, making it harder to control the size of the database. * In order to store binary data in the database, you must continually escape and unescape the data to ensure that nothing breaks. You probably have several reasons why you would want to store your images in your database, despite all the statements above. Others have, before you. And they have all returned to the same solution: Store a link (e.g. a file path) to the image file in the database. Whenever you need the image, use the link in whatever program you use to retrieve the file containing the image. Besides keeping your sanity, you will often find that things work faster this way: You have less data to transfer between your application and MySQL, you don't need to worry about escaping funny characters. And especially for serving HTML pages, you're helping your server decide which data to keep cached for faster client retrieval. There is one good reason why you might want to store the binary data in the database: Replication. If you need to ensure that the binary data is available on several MySQL hosts, using replication might save your day. That said, you should always keep the statements above in mind. HTH Ray - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
RE: blob versus file
There are several reasons why you should consider not storing binary data in your database: [snip] You probably have several reasons why you would want to store your images in your database, despite all the statements above. Others have, before you. And they have all returned to the same solution: Store a link (e.g. a file path) to the image file in the database. Whenever you need the image, use the link in whatever program you use to retrieve the file containing the image. Yes, but, this advice does *not* go on to describe how you cope with the deletion problem. If you store data in records in the database a DELETE will delete *all* the data for the set of rows. If some of the data is lying around in disk files these obviously don't get deleted by DELETE. So it seems to me that you need an image file garbage collection process, which every now and then scans the disk, scans the database and deletes disk files which no longer have a pointer to them from any reference field in any row in any table in the database. And of course as the file store and the database could both be changing under your feet whilst you're doing this it seems to be a good idea to shut down, or lock, the database server and the processes you've got adding disk files whilst you're doing the garbage collection. Doesn't sound to clever to me. Yes of course you could make your business logic layer clever enough to have a go at deleting the files when it deletes the database rows. But as you don't have transactions covering database operations *and* file system operations things will get out of step sooner or later so you'll still need the garbage collector. And some hints as to the viability of, say, hundreds of thousands of operating system files on the various platforms supported by MySQL would be helpful. For example many filing systems will lose *lots* of space through fragmentation like this, which one might hope would not happen with blobs. And what about the performance implications of lots of files in the same directory? - NT4 for example *will* handle half a million files in the same directory, 'cos I've tried it, but it's not terribly practical as some operations start taking tens of minutes instead of tens of milliseconds. Really one of the reasons for using a DBMS is to handle all this storage management crap for you, no? Tim Ward Brett Ward Limited - www.brettward.co.uk --- On August 5th ip.access will be moving to: CPC1 Telephone: +44 (0) 1223 219000 Capital Park Fax: +44 (0) 1223 219099 Fulbourn Cambridge CB1 5XE United Kingdom --- This e-mail and the information it contains are confidential and may be privileged. If you have received this e-mail in error please notify us immediately. You should not copy it for any purpose, or disclose its contents to any other person. All emails to and from ip.access may be monitored and stored for audit and other legal reasons. --- - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: blob versus file
Yes, but, this advice does *not* go on to describe how you cope with the deletion problem. If you store data in records in the database a DELETE will delete *all* the data for the set of rows. If some of the data is lying around in disk files these obviously don't get deleted by DELETE. So it seems to me that you need an image file garbage collection process, which every now and then scans the disk, scans the database and deletes disk files which no longer have a pointer to them from any reference field in any row in any table in the database. And of course as the file store and the database could both be changing under your feet whilst you're doing this it seems to be a good idea to shut down, or lock, the database server and the processes you've got adding disk files whilst you're doing the garbage collection. Doesn't sound to clever to me. Yes of course you could make your business logic layer clever enough to have a go at deleting the files when it deletes the database rows. But as you don't have transactions covering database operations *and* file system operations things will get out of step sooner or later so you'll still need the garbage collector. And some hints as to the viability of, say, hundreds of thousands of operating system files on the various platforms supported by MySQL would be helpful. For example many filing systems will lose *lots* of space through fragmentation like this, which one might hope would not happen with blobs. And what about the performance implications of lots of files in the same directory? - NT4 for example *will* handle half a million files in the same directory, 'cos I've tried it, but it's not terribly practical as some operations start taking tens of minutes instead of tens of milliseconds. Really one of the reasons for using a DBMS is to handle all this storage management crap for you, no? Well, sort of, yes, but only when the RDBMS can actually *help*. Storing large amounts of data is the job of a filesystem, not a database manager. RDBMS products work when : * Each item of data is relatively small * That data needs to be broken into pieces, some of which should be individually identifiable * Similar fragments from different items need to be 'related' together Bulk storage isn't really part of the metaphor. I'm building a multi-media database which references anything from names and addresses to 500 Mb video segments. The database stores the smaller items internally (names and addresses as said, small chunks of text and URLs primarily), and references everything else via a unique id. The main table relates the id, item type and location, where for the videos, the location is an external file. There are two issues in particular, one (as you said above) is the 'how do I delete something', and the other is 'how do I add a new item'. Deletion is more than the usual RDBMS delete. I mark a record to be deleted as I don't want this any more, and an external garbage collector looks for rows with this marker set and removes any external data, then deletes the row itself. You don't have relational integrity for the whole transaction, but this method handles that well (in my experience anyway) by having the delete process as a multi-stage event (the '(T)' shows the areas where transactions guarantee atomicity) : * mark all associated database entries as to be deleted (T) * delete external data referenced by database rows waiting to be deleted * delete all related database rows (T) Insertion involves looking for new files inside a recognised directory tree, the structure of which is controlled by either the user or other software modules. Anything found that the database doesn't currently know about goes into a Waiting table. A human then tells the system about the newly found file(s). For smaller items where there's no external data, the standard insert into table values (.) works. You're quite right about the dangers of huge numbers of files in a directory - most OSs these days will cope, but few cope well when the number exceeds a couple of thousand. UNIX visibly degrades at about that point. Performance is an issue with large blocks of data where ever it comes from. I'd argue that these days filesystems are better designed to cope with bulk data transfer than (most) RDBMSs. Grabbing a half-gig video segment out of any database would put a strain on the servers ability to respond well to other queries while the transfer is in progress - deleting it could prove intensive too (depends on what's indexed of course). After all, database servers also have garbage collection functions even if they are invisible. Hope this helps some (or invokes more discussion !). Regards, Paul Wilson Chime Communications - Before posting, please check: http://www.mysql.com/manual.php (the
Re: blob versus file
Andy, File Systems are made to store and retrieve files in an efficient way. You may not expect better performance when you put a database in between. When speed is an issue you should use MySQL to store and retrieve filenames and something like ReiserFS, RAID and good hardware to store images. Thomas Spahni On Mon, 1 Jul 2002, andy wrote: Hi there, I am wondering if anybody has experiance in saving images to blob in mysql. I do save images with 1 K and 4 KB to blob fields while I used to save them to file. It seams to me that this is much slower accessing the files. The images take a bit (really short but absolutly noticable) to show up on the site. Is there a way to improve the performance, and why is this happening? I thought the performance might even boost after storing them to blobs. Thank you for any comments on that, Andy PS: I was planing to store all the other images (20 - 50 KB) to blobs as well (thousands) but after this first experiance I am a bit confused if the performance would even more suffer. Would you guys recommend this? sql, query - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php