RE: Storing huge amount of binary data
Sometimes the easiest way to do this is to use the file system of the Linux machine to store the files and keep references to them in the DB -- storing no binary data in the DB and getting rid of all your possible problems.

Thanks,
Matt Babineau
Criticalcode
858.733.0160
[EMAIL PROTECTED]
http://www.criticalcode.com

-----Original Message-----
From: Cabbar Duzayak [mailto:[EMAIL PROTECTED]
Sent: Saturday, July 09, 2005 12:01 AM
To: mysql@lists.mysql.com
Subject: Storing huge amount of binary data

Hi Everyone,

I will be starting a project for which I will need to store about 1 million image files, all of which are about 50-100K in size. I will be using Linux for this project.

The database won't be very busy: there will be batch image uploads 1-2 times a day, and around 100-200 concurrent users at most, most of whom will be reading from the db and writing only session-type information, etc. And I don't really need transaction support (InnoDB).

Adding this up, the image data size will be around 50-100 GB, and I will need to store a total of 1-2 GB of text information (1K for each image) along with each of these images.

First of all, I heard that MySQL does not perform very well when table size goes above 1 GB. Is this a myth?

The image table is not a big deal, since I can partition/distribute it to ~100-200 tables, i.e. by table_(id % 100). However, the text information needs to stay in a single table (since I need to do queries on it for information), and there will be multiple indexes over this information. And, as you can imagine, I am not sure if MySQL can handle something like this, and was wondering if you can provide some feedback.

So my questions are:

1. The main question is: do you guys have any experience with this much binary and regular data? Do you think MySQL can handle this much data in a reliable manner (without corrupting data and/or degrading/terrible performance)?

2. Can I implement this using regular SCSI disks with regular MySQL? Or do I need advanced solutions such as clustering, replication, etc.?

3. Again, as you can understand, I want to minimize the cost here. If you don't think I can use MySQL, do you think Microsoft SQL Server is good enough for this task?

Thanks...

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
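The file-system approach Matt describes can be sketched roughly as below. This is a minimal illustration, not anything from the thread: the root directory, file naming, and helper names are all assumptions. The idea is to hash image ids into subdirectories so a million files never land in one directory, and to store only the resulting path in the DB row.

```python
import os

STORE_ROOT = "/var/data/images"  # hypothetical mount point, not from the thread

def image_path(image_id: int) -> str:
    """Map an image id to a file path, spreading ~1M files over 100
    subdirectories so no single directory has to hold them all."""
    bucket = image_id % 100
    return os.path.join(STORE_ROOT, f"{bucket:02d}", f"{image_id}.jpg")

def store_image(image_id: int, data: bytes) -> str:
    """Write the blob to disk; the returned path is what goes in the DB,
    in place of the binary data itself."""
    path = image_path(image_id)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

The DB then holds only a varchar path column per image, keeping tables small while the Linux file system does what it is good at.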
Re: Storing huge amount of binary data
Cabbar Duzayak wrote:

> So my questions are:
> 1. The main question is, do you guys have any experience with this
> much binary and regular data? Do you think Mysql can handle this much
> data in a reliable manner (without corrupting data and/or
> degrading/terrible performance)?

I would say so, yes. One of my biggest databases holds 50-60 million rows and takes up about 5 GB of disk space. I don't think MySQL will have any problems running what you describe.

> 2. Can I implement this using regular SCSI disks with regular mysql?
> Or do I need advanced solutions such as clustering, replication, etc.?

Clustering and replication are more to do with data availability. You'll probably benefit from using RAID in some form - it depends on whether you need reliability or speed.

> 3. Again, as you can understand, I want to minimize the cost here. If
> you don't think I can use mysql, do you think Microsoft SQL server is
> good enough for this task?

I don't think so, no.

/Per Jessen, Zürich

-- 
http://www.spamchek.com/freetrial - managed anti-spam and anti-virus solution. Sign up for your free 30-day trial now!
Re: Storing huge amount of binary data
> First of all, I heard that Mysql does not perform very well when
> table size goes above 1 GB. Is this a myth?

I don't have much experience with individual tables being that large, though I have used MySQL databases with 100 GB of total data with no problems at all. Based on my experience, it's not the table size which can degrade performance, but the index size - if the index is too large or clumsy, then MySQL cannot search quickly enough. If the index is well-structured, the data is found quickly and overall performance is great regardless of the quantity of data.

> Image table is not a big deal, since I can partition/distribute it to
> ~ 100-200 tables, i.e. by table_(id % 100).

This is a good way to help keep your indexes small and thus keep search/seek time to a minimum :)

> However, text information needs to stay in a single table (since I
> need to do queries on it for information) and there will be multiple
> indexes over this information.

A suggestion to help keep your indexes on this table small: if you have large text fields that you need to index, you may want to use partial indexes. For example, if you have a column filename varchar(200), create an index on filename(50), or whatever is appropriate. This will cut the size of the index file a lot, and even though it may result in some duplicate entries in the index, it will probably increase overall performance.

> And, as you can imagine, I am not sure if mysql can handle something
> like this, and was wondering if you can provide some feedback.
> So my questions are:
> 1. The main question is, do you guys have any experience with this
> much binary and regular data? Do you think Mysql can handle this much
> data in a reliable manner (without corrupting data and/or
> degrading/terrible performance)?

Based on my experience with 100 GB InnoDB databases, with the right indexes, MySQL can easily handle 3,000 reads/sec on text/numerical data. (This is on dual Xeon 3 GHz, 4 GB RAM, SCSI RAID-5 disks.)

I've never had any problem with data corruption, but I use primarily the InnoDB engine, which is not prone to corruption in the same way as MyISAM. I don't have experience storing large amounts of binary data, so I can't say anything about MySQL's performance in that area specifically.

> 2. Can I implement this using regular SCSI disks with regular mysql?

Probably.

> Or do I need advanced solutions such as clustering, replication, etc.?

No need for clustered or distributed databases, from what you've described. Cluster would be useful if you need 100% availability, even in the event of hardware failures. Replication, such as single master - multiple slaves, is useful if you have massive reads and minimal writes, and _may_ be something you will need. I would recommend using the command SHOW PROCESSLIST, or a tool like MyTop, to see what state the client connections spend the most time in. (Searching the index, or sending the data over the network? If it's the latter, then you would benefit from distributing the read load to multiple slave servers.)

Regards,
Devananda vdv
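The two suggestions above - splitting the image table by table_(id % 100) and indexing only a prefix of a long varchar - can be sketched as follows. The table and column names here are made up for illustration; only the id % 100 scheme and the filename(50) prefix come from the thread.

```python
def image_table(image_id: int, shards: int = 100) -> str:
    """Route an image id to one of the shard tables table_0 .. table_99,
    following the table_(id % 100) scheme described above."""
    return f"table_{image_id % shards}"

def select_image_sql(image_id: int) -> str:
    """Build a parameterized SELECT against the correct shard table
    (column names are hypothetical)."""
    return f"SELECT data FROM {image_table(image_id)} WHERE id = %s"

# Partial (prefix) index as suggested above: index only the first 50
# characters of a varchar(200) column, shrinking the index file.
PARTIAL_INDEX_DDL = "CREATE INDEX idx_filename ON image_info (filename(50))"
```

Note that with MyISAM-era MySQL there is no built-in partitioning, so the application has to do this routing itself; the trade-off is that cross-shard queries become awkward, which is exactly why the text metadata stays in one table.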
Re: Storing huge amount of binary data
Per Jessen wrote:

> > 3. Again, as you can understand, I want to minimize the cost here.
> > If you don't think I can use mysql, do you think Microsoft SQL
> > server is good enough for this task?
>
> I don't think so, no.

What are you basing this on? SQL Server is a truly great database package; don't let some foolish bias blind you to the fact that it is professional-grade software. I like MySQL as much as the next guy, but its being open source does not mean you should ignore the facts.