Re: [sqlite] sqlite versus MySQL automatic defragmentation on SSDs?
Since you have a development team that's built complex software on top of sqlite, why not instrument it to see how many seeks it's doing per query operation? This can be done relatively easily by writing a custom vfs shim that can collect stats for the read/write seeks. This stat collected over a period can give you an idea of fragmentation if the seeks are increasing as the db gets older. If you are on newer Linux kernels, you can use blktrace to diagnose latency issues. I work on a very large scale distributed datastore that uses sqlite as the on-disk format and we store huge amounts of data in it. But we try to keep the size of each individual db file less than a few tens of GBs for ease of management. But we have multiple dbs per server that are attached and accessed together. Thanks, Vinay On Tue, Oct 30, 2012 at 10:45 PM, David Barrett dbarr...@expensify.com wrote: On Mon, Oct 29, 2012 at 8:28 PM, Richard Hipp d...@sqlite.org wrote: On Mon, Oct 29, 2012 at 5:58 AM, David Barrett dbarr...@expensify.com wrote: So what specifically do you feel is the problem with sqlite at scale? And to be clear, I'd like to pre-empt well it doesn't do X, which you'll probably also want to do -- all those X's are already covered and working great. Generally the argument in favor of client/server databases versus SQLite comes down to (1) Concurrency, (2) Replication, and (3) Network access. The size of the database file shouldn't really be a factor. Yes, thank you, I'm sorry I didn't make it clear -- concurrency, replication, and network access are *not* problems for me. Specifically: 1) Our server is single-threaded (all writes are serialized), so there are no concurrency issues. 2) We have our own replication layer (which is better than MySQL and Postgres replication) 3) We provide our own network access (on top of the replication layer) 4) The backup API works fine 5) Everything is on the local disk So I agree entirely -- if you need something sqlite doesn't provide, then sqlite isn't the right choice. I'm just saying sqlite provides everything I need, and does it incredibly well. Similarly, I'm encouraged to hear that there aren't any particular size restrictions or concerns. We're currently at 40GB and it's working great. But back to the original question -- can anybody point me in a direction to learn more about MySQL's handling of fragmentation and how it differs from sqlite? This isn't an obsession, it doesn't even really affect us in light of our moving to SSDs. It's just a gap in my knowledge I'm looking to fill. Thanks! -david ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] sqlite versus MySQL automatic defragmentation on SSDs?
In my scenario there is just one master and most transactions are non-overlapping, but I can foresee a need for resolution of the occasional collision. You mention other states. Can you explain further? Chris On Wed, 2012-10-31 at 11:40 +0700, David Barrett wrote: Ah, to clarify, there is only one master at any point in time. So this isn't a multi-master scenario where each node keeps committing locally and then somehow merging the results later. Rather, each node knows if it's the master or slave (or a variety of other states). If it's a master, it organizes the two-phase distributed commit. If it's a slave, it escalates to the master. And if it's something else, then it just holds on to the request and waits until it's either a slave or a master. -david On Wed, Oct 31, 2012 at 2:09 AM, Chris Peachment ch...@ononbb.com wrote: On Wed, 2012-10-31 at 00:49 +0700, David Barrett wrote: Thanks Alek! Yes, we're definitely planning on it, just trying to find the right time. We don't want to go through the work to open source it only to be greeted with silence. Might you be interested in using it in an actual deployed environment, or just studying it? Your proposal to open source the replication method used by Expensify has me interested. My application of interest is much smaller than yours, just a handful of remote clients that risk loss of connectivity but wish to continue with database updates during the downtime. Aside from the details of protocol usage and statement packaging, the concern for collisions during merge is a particular issue of interest. Chris ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] sqlite versus MySQL automatic defragmentation on SSDs?
On Mon, Oct 29, 2012 at 8:28 PM, Richard Hipp d...@sqlite.org wrote: On Mon, Oct 29, 2012 at 5:58 AM, David Barrett dbarr...@expensify.com wrote: So what specifically do you feel is the problem with sqlite at scale? And to be clear, I'd like to pre-empt well it doesn't do X, which you'll probably also want to do -- all those X's are already covered and working great. Generally the argument in favor of client/server databases versus SQLite comes down to (1) Concurrency, (2) Replication, and (3) Network access. The size of the database file shouldn't really be a factor. Yes, thank you, I'm sorry I didn't make it clear -- concurrency, replication, and network access are *not* problems for me. Specifically: 1) Our server is single-threaded (all writes are serialized), so there are no concurrency issues. 2) We have our own replication layer (which is better than MySQL and Postgres replication) 3) We provide our own network access (on top of the replication layer) 4) The backup API works fine 5) Everything is on the local disk So I agree entirely -- if you need something sqlite doesn't provide, then sqlite isn't the right choice. I'm just saying sqlite provides everything I need, and does it incredibly well. Similarly, I'm encouraged to hear that there aren't any particular size restrictions or concerns. We're currently at 40GB and it's working great. But back to the original question -- can anybody point me in a direction to learn more about MySQL's handling of fragmentation and how it differs from sqlite? This isn't an obsession, it doesn't even really affect us in light of our moving to SSDs. It's just a gap in my knowledge I'm looking to fill. Thanks! -david ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] sqlite versus MySQL automatic defragmentation on SSDs?
On Tue, Oct 30, 2012 at 1:00 AM, Alek Paunov a...@declera.com wrote: On 29.10.2012 11:58, David Barrett wrote: Because in practice, as someone actually doing it (as opposed to theorizing about it), it works great. The MySQL portions of our service are always in a semi-constant state of emergency, while our sqlite portions just hum along And given that we're switching to SSDs, I expect they will hum even better. What problems would you expect me to be seeing that I can happily report I'm not, or what problems have I not yet encountered but will -- at 100GB, or 1TB? In your previous thread (2012-02), you have mentioned that you are about to open-source your replication method based on SQL statement distribution. Probably your work would be of interest for a huge number of sites managing data volumes around or bellow your current level, even if you switch to PostgreSQL at this point. IMHO, there might be a future for your replication model, because I think that SQLite, can more easily (relative to other proven DB technologies e.g. PostgreSQL) be turned to DB engine for more query languages than SQL (thanks to his clever VM design). Furthermore, AFAIK, PostgreSQL replicates at WAL distribution level, most NoSQL databases at keys distribution level, whereas your method seems more efficient as bandwidth. Thanks Alek! Yes, we're definitely planning on it, just trying to find the right time. We don't want to go through the work to open source it only to be greeted with silence. Might you be interested in using it in an actual deployed environment, or just studying it? As for the size this works up to, I should emphasize that Expensify uses this for our main database -- and we have over a *million* users on it. That's not to say a million users is the biggest thing ever, but it's a lot bigger than most websites (with far more complicated data structures), and it works great. Furthermore, we're in the process of upgrading all our hardware and we feel that alone will get us at *least* an order of magnitude improvement in capacity -- wiithout any algorithmic changes. And we've got plenty of ideas how to improve the basic technology and/or restructure our database to get even more capacity, should we need it. The upshot is I don't see a specific reason why it couldn't scale up to a 5M, 10M, or larger service. And if it starts to break down after that? Well that's a problem we should all love to have. Additionally, I think people get so excited about big data that they overlook the importance of *available* data. With this technology, everything is replicated offsite in realtime, ensuring that service can continue uninterrupted even when a whole datacenter goes underwater (as is happening to many datacenters at this very moment in NYC) or falls off the map (as happens to various AWS zones with surprising regularity). Our technology seamlessly fails over when any node (even the master) disappears (or reappears), without dropping a single transaction -- the web layer doesn't even know if it's talking to a master or slave, or it was a slave that became master mid-transaction. This total confidence in the data layer is what allows us to sleep soundly even when servers crash: similar to how Google only fixes broken servers every quarter, any business in this day and age that stresses out when a server dies is doing it wrong. Indeed, i'm writing this from a hotel in Bangkok because every year we take the whole company overseas for a month to work from the beach -- something that would be inconceivable to an organization that puts all its eggs in one datacenter. As for SQL versus binary replication, it has its pros and cons -- it's generally (though not always) more bandwidth efficient, but at a higher CPU cost: slaves need to redo all the work as the master. But it's fantastically simple, and I feel a simple design brings the most important efficiency of all: easy to understand, easy to debug, easy to verify. As for Postgre, MySQL, or any other database back end -- yes, it'd designed to be a layer above the database. We're in the midst of making it optionally backed by a MySQL store, but yes, it should be easy to put anything behind it. Finally, that's interesting about using this to replicate non-SQL languages -- yes, it's definitely language agnostic. Anything that has the notion of an atomic transaction with ROLLBACK and COMMIT should work fine with it. Thanks for the interest! -david ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] sqlite versus MySQL automatic defragmentation on SSDs?
On 30 Oct 2012, at 5:15pm, David Barrett dbarr...@expensify.com wrote: I'm encouraged to hear that there aren't any particular size restrictions or concerns. http://www.sqlite.org/limits.html Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] sqlite versus MySQL automatic defragmentation on SSDs?
On Wed, 2012-10-31 at 00:49 +0700, David Barrett wrote: Thanks Alek! Yes, we're definitely planning on it, just trying to find the right time. We don't want to go through the work to open source it only to be greeted with silence. Might you be interested in using it in an actual deployed environment, or just studying it? Your proposal to open source the replication method used by Expensify has me interested. My application of interest is much smaller than yours, just a handful of remote clients that risk loss of connectivity but wish to continue with database updates during the downtime. Aside from the details of protocol usage and statement packaging, the concern for collisions during merge is a particular issue of interest. Chris ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] sqlite versus MySQL automatic defragmentation on SSDs?
Ah, to clarify, there is only one master at any point in time. So this isn't a multi-master scenario where each node keeps committing locally and then somehow merging the results later. Rather, each node knows if it's the master or slave (or a variety of other states). If it's a master, it organizes the two-phase distributed commit. If it's a slave, it escalates to the master. And if it's something else, then it just holds on to the request and waits until it's either a slave or a master. -david On Wed, Oct 31, 2012 at 2:09 AM, Chris Peachment ch...@ononbb.com wrote: On Wed, 2012-10-31 at 00:49 +0700, David Barrett wrote: Thanks Alek! Yes, we're definitely planning on it, just trying to find the right time. We don't want to go through the work to open source it only to be greeted with silence. Might you be interested in using it in an actual deployed environment, or just studying it? Your proposal to open source the replication method used by Expensify has me interested. My application of interest is much smaller than yours, just a handful of remote clients that risk loss of connectivity but wish to continue with database updates during the downtime. Aside from the details of protocol usage and statement packaging, the concern for collisions during merge is a particular issue of interest. Chris ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] sqlite versus MySQL automatic defragmentation on SSDs?
On Mon, Oct 29, 2012 at 2:16 AM, Richard Hipp d...@sqlite.org wrote: It sounds like you are pushing SQLite well beyond what it was intended to do. Remember the motto: SQLite is not intended to replace Oracle, it is intended to replace fopen(). SQLite does a great job for roles such as data storage for a desktop application, or for databases in cellphones or other gadgets, or as a local cache to a enterprise network database. But SQLite was never designed or intended to replace an enterprise-level RDBMS. And I think you probably need an enterprise-level RDBMS at this point. So what specifically do you feel is the problem with sqlite at scale? And to be clear, I'd like to pre-empt well it doesn't do X, which you'll probably also want to do -- all those X's are already covered and working great. I just mean, what in particular do you feel about sqlite works great for a 50MB database, but doesn't work at a 50GB database? I'm very open to being convinced -- and you'd be the person to convince me. But I don't like to make decisions based on vague fears. Best practices often aren't. Because in practice, as someone actually doing it (as opposed to theorizing about it), it works great. The MySQL portions of our service are always in a semi-constant state of emergency, while our sqlite portions just hum along And given that we're switching to SSDs, I expect they will hum even better. What problems would you expect me to be seeing that I can happily report I'm not, or what problems have I not yet encountered but will -- at 100GB, or 1TB? Thanks! -david PS: Also, if anybody does know anything about MySQL fragmentation, I'd still love some pointers. I'm not sure how my casual request became interpreted as an obsession, but either way, I'd still love the benefit of your knowledge. MySQL is a good choice. But here is another data point to consider: When we were writing the SqlLogicTest test suite for SQLite, we ran the test vectors on a wide variety of server-class database engines in addition to SQLite. And in every case (including SQLite) we found cases that would crash the server. Every case, that is, except one. We were never able to crash PostgreSQL, nor find a case where PostgreSQL gave the wrong answer. Furthermore, whenever there is a question about what the behavior of some obscure SQL construct ought to be and whether or not SQLite is doing it right, usually the first thing we check is how PostgreSQL responds to the same query. When in doubt, we try to get SQLite to do the same thing as PostgreSQL. Far be it from me to recommend one client/server database engine over another. But in my experience. well, you can fill in the rest, probably... On Sun, Oct 28, 2012 at 10:48 AM, David Barrett dbarr...@expensify.com wrote: Wow, I didn't realize this was such a controversial question. I'm a huge sqlite fan. Expensify is built on sqlite. We have a 40GB database, replicated using our custom distributed transaction layer across 5 severs in three different datacenters.[1] It's been powering all of Expensify (including our direct deposit reimbursement engine and credit card import layer -- both of which contain incredibly sensitive information, with mistakes causing millions of dollars to move in the wrong direction). On the back of sqlite, we've grown to over million users, processing millions of dollars in expense reports every day. However, we're starting to see problems. There is so much activity on some servers that there is never a chance for our checkpointing thread to do its thing, so our WAL file often ballons up to 30GB or more. This makes query times plummet. We regularly checkpoint manually, and often vacuum, all in an effort to keep queries moving quick. We also do things to trick out our indexes in order to ensure proper disk ordering, pay particular attention to block and cache amounts, etc. This isn't premature optimization for the sake of having fun, these are in response to real performance problems affecting our product. In light of that, there is a contingent pushing to drop sqlite in favor of MySQL. There are a wide range of reasons -- it has its own replication, better write concurrency, clustered indexes, and better edge-case data integrity (because we use 2 DBs and WAL, ATTACH doesn't provide atomic commit advantages). And for each I have a corresponding answer -- MySQL's replication isn't as good as ours, concurrency doesn't matter because we serialize writes and have a single threaded server anyway, clustered indexes would be nice but we can get close enough with custom ROWIDs, and the extremely rare situation where there's a cross-database integrity problem, we can detect and recover from any of the other slaves. And I also add in that sqlite can never crash because it's built into the server; its performance is fantastic because it runs in
Re: [sqlite] sqlite versus MySQL automatic defragmentation on SSDs?
On 29 Oct 2012, at 9:58am, David Barrett dbarr...@expensify.com wrote: So what specifically do you feel is the problem with sqlite at scale? I think it might help if you asked that question with particular reference to the points in http://www.sqlite.org/whentouse.html particularly the points in the section _Situations Where Another RDBMS May Work Better_. Or perhaps something on that page might answer your question. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] sqlite versus MySQL automatic defragmentation on SSDs?
On Mon, Oct 29, 2012 at 5:58 AM, David Barrett dbarr...@expensify.comwrote: On Mon, Oct 29, 2012 at 2:16 AM, Richard Hipp d...@sqlite.org wrote: It sounds like you are pushing SQLite well beyond what it was intended to do. Remember the motto: SQLite is not intended to replace Oracle, it is intended to replace fopen(). SQLite does a great job for roles such as data storage for a desktop application, or for databases in cellphones or other gadgets, or as a local cache to a enterprise network database. But SQLite was never designed or intended to replace an enterprise-level RDBMS. And I think you probably need an enterprise-level RDBMS at this point. So what specifically do you feel is the problem with sqlite at scale? And to be clear, I'd like to pre-empt well it doesn't do X, which you'll probably also want to do -- all those X's are already covered and working great. I just mean, what in particular do you feel about sqlite works great for a 50MB database, but doesn't work at a 50GB database? I'm very open to being convinced -- and you'd be the person to convince me. But I don't like to make decisions based on vague fears. Best practices often aren't. Because in practice, as someone actually doing it (as opposed to theorizing about it), it works great. The MySQL portions of our service are always in a semi-constant state of emergency, while our sqlite portions just hum along And given that we're switching to SSDs, I expect they will hum even better. What problems would you expect me to be seeing that I can happily report I'm not, or what problems have I not yet encountered but will -- at 100GB, or 1TB? Generally the argument in favor of client/server databases versus SQLite comes down to (1) Concurrency, (2) Replication, and (3) Network access. The size of the database file shouldn't really be a factor. Or it least it hasn't as far as we know. On the other hand, we haven't heard from people pushing terabyte databases into SQLite before SQLite supports some concurrency (multiple readers, one writer) but not as much as typical client/server databases. SQLite is optimized more for the single-user case. SQLite has its backup API, which provides some basic replication capabilities. But it isn't the same thing has having a full-blown real-time replicator like you find in most client/server installations. And finally, SQLite, really really prefers to have its content on the local disk. Any application that uses a database can be conceptually divided into three pieces: (A) the application (B) the database engine and (C) the content on disk. If you have to cross a network to go between A and C (if the content is on a different machine from where the application is running) then it is better to cross that network at the A-B junction rather than at the B-C junction because the A-B junction requires less bandwidth. Client/server database engines use the A-B junction whereas SQLite on a network filesystem uses the B-C junction. Thanks! -david PS: Also, if anybody does know anything about MySQL fragmentation, I'd still love some pointers. I'm not sure how my casual request became interpreted as an obsession, but either way, I'd still love the benefit of your knowledge. MySQL is a good choice. But here is another data point to consider: When we were writing the SqlLogicTest test suite for SQLite, we ran the test vectors on a wide variety of server-class database engines in addition to SQLite. And in every case (including SQLite) we found cases that would crash the server. Every case, that is, except one. We were never able to crash PostgreSQL, nor find a case where PostgreSQL gave the wrong answer. Furthermore, whenever there is a question about what the behavior of some obscure SQL construct ought to be and whether or not SQLite is doing it right, usually the first thing we check is how PostgreSQL responds to the same query. When in doubt, we try to get SQLite to do the same thing as PostgreSQL. Far be it from me to recommend one client/server database engine over another. But in my experience. well, you can fill in the rest, probably... On Sun, Oct 28, 2012 at 10:48 AM, David Barrett dbarr...@expensify.com wrote: Wow, I didn't realize this was such a controversial question. I'm a huge sqlite fan. Expensify is built on sqlite. We have a 40GB database, replicated using our custom distributed transaction layer across 5 severs in three different datacenters.[1] It's been powering all of Expensify (including our direct deposit reimbursement engine and credit card import layer -- both of which contain incredibly sensitive information, with mistakes causing millions of dollars to move in the wrong direction). On the back of sqlite, we've grown to over million users, processing millions of dollars in expense reports every day. However, we're
Re: [sqlite] sqlite versus MySQL automatic defragmentation on SSDs?
Hi David, On 29.10.2012 11:58, David Barrett wrote: Because in practice, as someone actually doing it (as opposed to theorizing about it), it works great. The MySQL portions of our service are always in a semi-constant state of emergency, while our sqlite portions just hum along And given that we're switching to SSDs, I expect they will hum even better. What problems would you expect me to be seeing that I can happily report I'm not, or what problems have I not yet encountered but will -- at 100GB, or 1TB? In your previous thread (2012-02), you have mentioned that you are about to open-source your replication method based on SQL statement distribution. Probably your work would be of interest for a huge number of sites managing data volumes around or bellow your current level, even if you switch to PostgreSQL at this point. IMHO, there might be a future for your replication model, because I think that SQLite, can more easily (relative to other proven DB technologies e.g. PostgreSQL) be turned to DB engine for more query languages than SQL (thanks to his clever VM design). Furthermore, AFAIK, PostgreSQL replicates at WAL distribution level, most NoSQL databases at keys distribution level, whereas your method seems more efficient as bandwidth. Kind Regards, Alek ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] sqlite versus MySQL automatic defragmentation on SSDs?
I completely understand the wide and varied differences. I'm just *also* interested in this very specific issue. -david On Sun, Oct 28, 2012 at 12:38 PM, Pavel Ivanov paiva...@gmail.com wrote: That said, I'd still welcome any quick summary of the differences between sqlite and mysql when it comes to fragmentation. This is far from main differences between sqlite and mysql that you should consider if you want to choose between them unless of course your question is just about an academic interest. As you are talking about employees I guess you are not in some purely academic exercise. In this case think more about in-process code vs pumping through TCP/IP stack, designed mostly to be accessible from machine-local processes only vs accessible to anyone on the network, plain access to everything vs versatile and complicated authorization and authentication mechanisms, and so on and so forth. Database format is never a part of the decision which DBMS you want to use. Pavel On Sat, Oct 27, 2012 at 9:32 PM, David Barrett dbarr...@expensify.com wrote: Thanks, this is really helpful! (And I lecture my employees about the evils of premature optimization all the time. In fact, I'll lecture anyone in earshot, so frequently that it's the butt of jokes.) That said, I'd still welcome any quick summary of the differences between sqlite and mysql when it comes to fragmentation. I often get in debates about sqlite versus other datbases, and I'm always eager to be informed. Thanks! -david On Sat, Oct 27, 2012 at 11:42 PM, Simon Slavin slav...@bigfraud.org wrote: On 27 Oct 2012, at 11:38am, David Barrett dbarr...@expensify.com wrote: I'm trying to learn more about MySQL versus sqllite when it comes to vacuuming and fragmentation, especially as it relates to SSD storage. Rather than answer your questions point-by-point, I'm going to give you the current state of play. Your understanding of how the various DBMSes work is right, but your excellent question in a world with SSDs and a tremendous amount of RAM, does vacuuming matter nearly as much as on a spinning disk with constrained RAM? cuts to the heart of all your other points. The following involves a little simplification and handwaving because otherwise it would be two chapters long and you'd have to do homework. Fragmentation made a big difference to apps running on Windows, but very little on any other platform. This is because Windows does something called 'read-ahead caching' which assumes that if you read block B, you're soon going to want to read block B+1, so at quite a low level it helpfully pre-reads it for you. Other operating systems don't make this assumption. This is why Windows users talk about defragmentation so much, but Unix users don't care about it. SSDs negate the whole point of defragmentation. On a rotational disk it's faster to read blocks B, B+1, B+2, B+4, B+5 than it is to read five random blocks from the disk, because the read heads are already positioned in the right place, and the disk is going to rotate to show those five blocks in order. SSDs are just like RAM: they're Random Access Memory. Reading any five blocks in any order takes roughly the same time. So nobody cares about fragmentation on an SSD. Read whatever blocks you want in whatever order you want. As to clever management of disk block alignment with respect to rows and columns, this is rarely worth attention these days. The amount of programming and debugging time it takes to get this right, and the amount of extra processing and disk access you need to do, give you less return on investment than if you spent the same money on buying a faster hard disk. It's premature optimization (look up the term) except for two cases: overnight runs and realtime 3D graphics. If your overnight run takes more than one night, you have a problem. If you're programming realtime 3D graphics and they're jerky, your users won't enjoy your simulation. But you wouldn't be using a SQL engine for 3D graphics anyway. The matters you mentioned were all worth attention back in the 1980s when storage and bandwidth were expensive. As you pointed out near the end of your post, these things matter less now. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org
Re: [sqlite] sqlite versus MySQL automatic defragmentation on SSDs?
OK. Curiosity is a good thing in certain situations. But could you kindly tell me what will you do with this information (assuming it's possible to obtain it of course)? Pavel On Sat, Oct 27, 2012 at 11:54 PM, David Barrett dbarr...@expensify.com wrote: I completely understand the wide and varied differences. I'm just *also* interested in this very specific issue. -david On Sun, Oct 28, 2012 at 12:38 PM, Pavel Ivanov paiva...@gmail.com wrote: That said, I'd still welcome any quick summary of the differences between sqlite and mysql when it comes to fragmentation. This is far from main differences between sqlite and mysql that you should consider if you want to choose between them unless of course your question is just about an academic interest. As you are talking about employees I guess you are not in some purely academic exercise. In this case think more about in-process code vs pumping through TCP/IP stack, designed mostly to be accessible from machine-local processes only vs accessible to anyone on the network, plain access to everything vs versatile and complicated authorization and authentication mechanisms, and so on and so forth. Database format is never a part of the decision which DBMS you want to use. Pavel On Sat, Oct 27, 2012 at 9:32 PM, David Barrett dbarr...@expensify.com wrote: Thanks, this is really helpful! (And I lecture my employees about the evils of premature optimization all the time. In fact, I'll lecture anyone in earshot, so frequently that it's the butt of jokes.) That said, I'd still welcome any quick summary of the differences between sqlite and mysql when it comes to fragmentation. I often get in debates about sqlite versus other datbases, and I'm always eager to be informed. Thanks! -david On Sat, Oct 27, 2012 at 11:42 PM, Simon Slavin slav...@bigfraud.org wrote: On 27 Oct 2012, at 11:38am, David Barrett dbarr...@expensify.com wrote: I'm trying to learn more about MySQL versus sqllite when it comes to vacuuming and fragmentation, especially as it relates to SSD storage. Rather than answer your questions point-by-point, I'm going to give you the current state of play. Your understanding of how the various DBMSes work is right, but your excellent question in a world with SSDs and a tremendous amount of RAM, does vacuuming matter nearly as much as on a spinning disk with constrained RAM? cuts to the heart of all your other points. The following involves a little simplification and handwaving because otherwise it would be two chapters long and you'd have to do homework. Fragmentation made a big difference to apps running on Windows, but very little on any other platform. This is because Windows does something called 'read-ahead caching' which assumes that if you read block B, you're soon going to want to read block B+1, so at quite a low level it helpfully pre-reads it for you. Other operating systems don't make this assumption. This is why Windows users talk about defragmentation so much, but Unix users don't care about it. SSDs negate the whole point of defragmentation. On a rotational disk it's faster to read blocks B, B+1, B+2, B+4, B+5 than it is to read five random blocks from the disk, because the read heads are already positioned in the right place, and the disk is going to rotate to show those five blocks in order. SSDs are just like RAM: they're Random Access Memory. Reading any five blocks in any order takes roughly the same time. So nobody cares about fragmentation on an SSD. Read whatever blocks you want in whatever order you want. As to clever management of disk block alignment with respect to rows and columns, this is rarely worth attention these days. The amount of programming and debugging time it takes to get this right, and the amount of extra processing and disk access you need to do, give you less return on investment than if you spent the same money on buying a faster hard disk. It's premature optimization (look up the term) except for two cases: overnight runs and realtime 3D graphics. If your overnight run takes more than one night, you have a problem. If you're programming realtime 3D graphics and they're jerky, your users won't enjoy your simulation. But you wouldn't be using a SQL engine for 3D graphics anyway. The matters you mentioned were all worth attention back in the 1980s when storage and bandwidth were expensive. As you pointed out near the end of your post, these things matter less now. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] sqlite versus MySQL automatic defragmentation on SSDs?
Wow, I didn't realize this was such a controversial question. I'm a huge sqlite fan. Expensify is built on sqlite. We have a 40GB database, replicated using our custom distributed transaction layer across 5 severs in three different datacenters.[1] It's been powering all of Expensify (including our direct deposit reimbursement engine and credit card import layer -- both of which contain incredibly sensitive information, with mistakes causing millions of dollars to move in the wrong direction). On the back of sqlite, we've grown to over million users, processing millions of dollars in expense reports every day. However, we're starting to see problems. There is so much activity on some servers that there is never a chance for our checkpointing thread to do its thing, so our WAL file often ballons up to 30GB or more. This makes query times plummet. We regularly checkpoint manually, and often vacuum, all in an effort to keep queries moving quick. We also do things to trick out our indexes in order to ensure proper disk ordering, pay particular attention to block and cache amounts, etc. This isn't premature optimization for the sake of having fun, these are in response to real performance problems affecting our product. In light of that, there is a contingent pushing to drop sqlite in favor of MySQL. There are a wide range of reasons -- it has its own replication, better write concurrency, clustered indexes, and better edge-case data integrity (because we use 2 DBs and WAL, ATTACH doesn't provide atomic commit advantages). And for each I have a corresponding answer -- MySQL's replication isn't as good as ours, concurrency doesn't matter because we serialize writes and have a single threaded server anyway, clustered indexes would be nice but we can get close enough with custom ROWIDs, and the extremely rare situation where there's a cross-database integrity problem, we can detect and recover from any of the other slaves. And I also add in that sqlite can never crash because it's built into the server; its performance is fantastic because it runs in the same process; in years of operation we've never once seen it corrupt data; it's so easy to use; etc etc. But there's an argument I've heard come up to which I don't have a response: MySQL handles fragmentation better, and by extension would give us better performance on the same hardware. I'd like to know more about it, which is why I've asked. Thanks! -david [1] http://permalink.gmane.org/gmane.comp.db.sqlite.general/71868 On Sun, Oct 28, 2012 at 2:12 PM, Pavel Ivanov paiva...@gmail.com wrote: OK. Curiosity is a good thing in certain situations. But could you kindly tell me what will you do with this information (assuming it's possible to obtain it of course)? Pavel On Sat, Oct 27, 2012 at 11:54 PM, David Barrett dbarr...@expensify.com wrote: I completely understand the wide and varied differences. I'm just *also* interested in this very specific issue. -david On Sun, Oct 28, 2012 at 12:38 PM, Pavel Ivanov paiva...@gmail.com wrote: That said, I'd still welcome any quick summary of the differences between sqlite and mysql when it comes to fragmentation. This is far from main differences between sqlite and mysql that you should consider if you want to choose between them unless of course your question is just about an academic interest. As you are talking about employees I guess you are not in some purely academic exercise. In this case think more about in-process code vs pumping through TCP/IP stack, designed mostly to be accessible from machine-local processes only vs accessible to anyone on the network, plain access to everything vs versatile and complicated authorization and authentication mechanisms, and so on and so forth. Database format is never a part of the decision which DBMS you want to use. Pavel On Sat, Oct 27, 2012 at 9:32 PM, David Barrett dbarr...@expensify.com wrote: Thanks, this is really helpful! (And I lecture my employees about the evils of premature optimization all the time. In fact, I'll lecture anyone in earshot, so frequently that it's the butt of jokes.) That said, I'd still welcome any quick summary of the differences between sqlite and mysql when it comes to fragmentation. I often get in debates about sqlite versus other datbases, and I'm always eager to be informed. Thanks! -david On Sat, Oct 27, 2012 at 11:42 PM, Simon Slavin slav...@bigfraud.org wrote: On 27 Oct 2012, at 11:38am, David Barrett dbarr...@expensify.com wrote: I'm trying to learn more about MySQL versus sqllite when it comes to vacuuming and fragmentation, especially as it relates to SSD storage. Rather than answer your questions point-by-point, I'm going to give you the current state of play. Your understanding of how the various DBMSes work is right, but your excellent question in a world
Re: [sqlite] sqlite versus MySQL automatic defragmentation on SSDs?
On 28 Oct 2012, at 2:48pm, David Barrett dbarr...@expensify.com wrote: Wow, I didn't realize this was such a controversial question. Not particularly controversial, just complicated, and not subject to a good explanation other than reading lots of documentation about both engines. Your description of your setup suggests two thing: first you're obsessed with fragmentation when it has only a minor part to play in your problems, and second that you should move to a database engine with server/client organisation rather than trying to use SQLite in multi-user mode. Any description which includes 'often vacuum' suggests you're using the wrong tool for the job. Whether you'd be best with MySQL or ProgreSQL is another matter. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] sqlite versus MySQL automatic defragmentation on SSDs?
On 10/28/12 10:58 AM, Simon Slavin wrote: On 28 Oct 2012, at 2:48pm, David Barrett dbarr...@expensify.com wrote: Wow, I didn't realize this was such a controversial question. Not particularly controversial, just complicated, and not subject to a good explanation other than reading lots of documentation about both engines. Your description of your setup suggests two thing: first you're obsessed with fragmentation when it has only a minor part to play in your problems, and second that you should move to a database engine with server/client organisation rather than trying to use SQLite in multi-user mode. Any description which includes 'often vacuum' suggests you're using the wrong tool for the job. Whether you'd be best with MySQL or ProgreSQL is another matter. Simon. I agree with Simon. I don't see that fragmentation is the issue here. Bob ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] sqlite versus MySQL automatic defragmentation on SSDs?
Thank you. This is what I wanted to hear. And as you already saw from responses, fragmentation is far from your main problem. I'd like to point to one particular issue: However, we're starting to see problems. There is so much activity on some servers that there is never a chance for our checkpointing thread to do its thing, so our WAL file often ballons up to 30GB or more. This makes query times plummet. Looking at this problem alone I'd say SQLite is not the right tool for you. At least at the scale you are working now. And I don't know all your arguments but I hope you are arguing not just because you are a fan of SQLite and don't want to move away from it. Pavel On Sun, Oct 28, 2012 at 7:48 AM, David Barrett dbarr...@expensify.com wrote: Wow, I didn't realize this was such a controversial question. I'm a huge sqlite fan. Expensify is built on sqlite. We have a 40GB database, replicated using our custom distributed transaction layer across 5 severs in three different datacenters.[1] It's been powering all of Expensify (including our direct deposit reimbursement engine and credit card import layer -- both of which contain incredibly sensitive information, with mistakes causing millions of dollars to move in the wrong direction). On the back of sqlite, we've grown to over million users, processing millions of dollars in expense reports every day. However, we're starting to see problems. There is so much activity on some servers that there is never a chance for our checkpointing thread to do its thing, so our WAL file often ballons up to 30GB or more. This makes query times plummet. We regularly checkpoint manually, and often vacuum, all in an effort to keep queries moving quick. We also do things to trick out our indexes in order to ensure proper disk ordering, pay particular attention to block and cache amounts, etc. This isn't premature optimization for the sake of having fun, these are in response to real performance problems affecting our product. In light of that, there is a contingent pushing to drop sqlite in favor of MySQL. There are a wide range of reasons -- it has its own replication, better write concurrency, clustered indexes, and better edge-case data integrity (because we use 2 DBs and WAL, ATTACH doesn't provide atomic commit advantages). And for each I have a corresponding answer -- MySQL's replication isn't as good as ours, concurrency doesn't matter because we serialize writes and have a single threaded server anyway, clustered indexes would be nice but we can get close enough with custom ROWIDs, and the extremely rare situation where there's a cross-database integrity problem, we can detect and recover from any of the other slaves. And I also add in that sqlite can never crash because it's built into the server; its performance is fantastic because it runs in the same process; in years of operation we've never once seen it corrupt data; it's so easy to use; etc etc. But there's an argument I've heard come up to which I don't have a response: MySQL handles fragmentation better, and by extension would give us better performance on the same hardware. I'd like to know more about it, which is why I've asked. Thanks! -david [1] http://permalink.gmane.org/gmane.comp.db.sqlite.general/71868 On Sun, Oct 28, 2012 at 2:12 PM, Pavel Ivanov paiva...@gmail.com wrote: OK. Curiosity is a good thing in certain situations. But could you kindly tell me what will you do with this information (assuming it's possible to obtain it of course)? Pavel On Sat, Oct 27, 2012 at 11:54 PM, David Barrett dbarr...@expensify.com wrote: I completely understand the wide and varied differences. I'm just *also* interested in this very specific issue. -david On Sun, Oct 28, 2012 at 12:38 PM, Pavel Ivanov paiva...@gmail.com wrote: That said, I'd still welcome any quick summary of the differences between sqlite and mysql when it comes to fragmentation. This is far from main differences between sqlite and mysql that you should consider if you want to choose between them unless of course your question is just about an academic interest. As you are talking about employees I guess you are not in some purely academic exercise. In this case think more about in-process code vs pumping through TCP/IP stack, designed mostly to be accessible from machine-local processes only vs accessible to anyone on the network, plain access to everything vs versatile and complicated authorization and authentication mechanisms, and so on and so forth. Database format is never a part of the decision which DBMS you want to use. Pavel On Sat, Oct 27, 2012 at 9:32 PM, David Barrett dbarr...@expensify.com wrote: Thanks, this is really helpful! (And I lecture my employees about the evils of premature optimization all the time. In fact, I'll lecture anyone in earshot, so frequently that it's the butt of
Re: [sqlite] sqlite versus MySQL automatic defragmentation on SSDs?
It sounds like you are pushing SQLite well beyond what it was intended to do. Remember the motto: SQLite is not intended to replace Oracle, it is intended to replace fopen(). SQLite does a great job for roles such as data storage for a desktop application, or for databases in cellphones or other gadgets, or as a local cache to a enterprise network database. But SQLite was never designed or intended to replace an enterprise-level RDBMS. And I think you probably need an enterprise-level RDBMS at this point. MySQL is a good choice. But here is another data point to consider: When we were writing the SqlLogicTest test suite for SQLite, we ran the test vectors on a wide variety of server-class database engines in addition to SQLite. And in every case (including SQLite) we found cases that would crash the server. Every case, that is, except one. We were never able to crash PostgreSQL, nor find a case where PostgreSQL gave the wrong answer. Furthermore, whenever there is a question about what the behavior of some obscure SQL construct ought to be and whether or not SQLite is doing it right, usually the first thing we check is how PostgreSQL responds to the same query. When in doubt, we try to get SQLite to do the same thing as PostgreSQL. Far be it from me to recommend one client/server database engine over another. But in my experience. well, you can fill in the rest, probably... On Sun, Oct 28, 2012 at 10:48 AM, David Barrett dbarr...@expensify.comwrote: Wow, I didn't realize this was such a controversial question. I'm a huge sqlite fan. Expensify is built on sqlite. We have a 40GB database, replicated using our custom distributed transaction layer across 5 severs in three different datacenters.[1] It's been powering all of Expensify (including our direct deposit reimbursement engine and credit card import layer -- both of which contain incredibly sensitive information, with mistakes causing millions of dollars to move in the wrong direction). On the back of sqlite, we've grown to over million users, processing millions of dollars in expense reports every day. However, we're starting to see problems. There is so much activity on some servers that there is never a chance for our checkpointing thread to do its thing, so our WAL file often ballons up to 30GB or more. This makes query times plummet. We regularly checkpoint manually, and often vacuum, all in an effort to keep queries moving quick. We also do things to trick out our indexes in order to ensure proper disk ordering, pay particular attention to block and cache amounts, etc. This isn't premature optimization for the sake of having fun, these are in response to real performance problems affecting our product. In light of that, there is a contingent pushing to drop sqlite in favor of MySQL. There are a wide range of reasons -- it has its own replication, better write concurrency, clustered indexes, and better edge-case data integrity (because we use 2 DBs and WAL, ATTACH doesn't provide atomic commit advantages). And for each I have a corresponding answer -- MySQL's replication isn't as good as ours, concurrency doesn't matter because we serialize writes and have a single threaded server anyway, clustered indexes would be nice but we can get close enough with custom ROWIDs, and the extremely rare situation where there's a cross-database integrity problem, we can detect and recover from any of the other slaves. And I also add in that sqlite can never crash because it's built into the server; its performance is fantastic because it runs in the same process; in years of operation we've never once seen it corrupt data; it's so easy to use; etc etc. But there's an argument I've heard come up to which I don't have a response: MySQL handles fragmentation better, and by extension would give us better performance on the same hardware. I'd like to know more about it, which is why I've asked. Thanks! -david [1] http://permalink.gmane.org/gmane.comp.db.sqlite.general/71868 On Sun, Oct 28, 2012 at 2:12 PM, Pavel Ivanov paiva...@gmail.com wrote: OK. Curiosity is a good thing in certain situations. But could you kindly tell me what will you do with this information (assuming it's possible to obtain it of course)? Pavel On Sat, Oct 27, 2012 at 11:54 PM, David Barrett dbarr...@expensify.com wrote: I completely understand the wide and varied differences. I'm just *also* interested in this very specific issue. -david On Sun, Oct 28, 2012 at 12:38 PM, Pavel Ivanov paiva...@gmail.com wrote: That said, I'd still welcome any quick summary of the differences between sqlite and mysql when it comes to fragmentation. This is far from main differences between sqlite and mysql that you should consider if you want to choose between them unless of course your question is just about an academic interest. As you are talking about employees
[sqlite] sqlite versus MySQL automatic defragmentation on SSDs?
I'm trying to learn more about MySQL versus sqllite when it comes to vacuuming and fragmentation, especially as it relates to SSD storage. Can anybody summarize for me the differences? 1) My understanding is sqlite, in general, has no automatic defragmentation: there is no process to gradually and incrementally reposition data on disk in index order. (Even auto-VACUUM just reclaims deleted data by inserting new transactions in the holes left behind by previous deletes, which the documentation says can actually worsen fragmentation.) The only defragmentation option is a full VACUUM, which rebuilds the entire database from scratch in index order. During this period, the whole database is locked and cannot be used. Is this roughly right? 2) My understanding of MySQL is that the equivalent function is to OPTIMIZE TABLE. Unlike sqlite, which works on a full database, this works on a single table, but is otherwise the same -- rebuilds the entire table (and thus needs 2x the disk space), locks the table during the operation, is not incremental (either works fully or not at all). Is this about right? 3) I've heard mention that in some configurations (perhaps some database engines?) MySQL doesn't need vacuuming for some reason -- somehow it just doesn't fragment, or resolves fragmentation as it goes -- but I can't track down an explanation of why. Any ideas? 4) Does MySQL (or sqlite?) have any fancy defragmentation capabilities, such as (for example) using a clustered primary index where no two secondary rows are put into the same block? (For example, imagine a table containing historical credit card transactions, where transactions are grouped first by cardID and then each card has a sequential list of rows identified by transactionID. Every night, new transactions are added to each card, meaning new data would ideally be inserted throughout the entire table, as opposed to just adding at the very end. I could imagine a clustered index ordering all transactions for a single card back-to-back on disk, except skipping to the next full database block before adding transactions for the next card. This would intentionally leave space free in the database for additional transactions to be added to each card in disk-sequential-order, explicitly avoiding transactions from multiple cards being stored in the same block. This wouldn't be a complete solution -- the database blocks for a particular card wouldn't necessarily be sequential -- but the transactions inside each block *would* be sequential, and no two blocks would contain transactions from different cards. Does this make sens? Does anything like this exist?) 4) My understanding is disk order matters a lot for spinning disks, as it reduces seek times when dealing with data in neighboring primary keys. However, am I correct in assuming that the necessity for this is dramatically reduced by SSDs, which allow random access? Granted, vacuuming should always help to a degree: if your rows are smaller than the block size, then it's better to have neighboring rows in the same block (so you needn't read multiple blocks to get the same data). And it's always good to ensure your database block size is the same as the disk block size (whether SSD or spinning) so you only need to read a single disk block per database block. But in a world with SSDs and a tremendous amount of RAM, does vacuuming matter nearly as much as on a spinning disk with constrained RAM? Thanks, I really appreciate your thoughts and links to reading material! -david ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] sqlite versus MySQL automatic defragmentation on SSDs?
On 27 Oct 2012, at 11:38am, David Barrett dbarr...@expensify.com wrote: I'm trying to learn more about MySQL versus sqllite when it comes to vacuuming and fragmentation, especially as it relates to SSD storage. Rather than answer your questions point-by-point, I'm going to give you the current state of play. Your understanding of how the various DBMSes work is right, but your excellent question in a world with SSDs and a tremendous amount of RAM, does vacuuming matter nearly as much as on a spinning disk with constrained RAM? cuts to the heart of all your other points. The following involves a little simplification and handwaving because otherwise it would be two chapters long and you'd have to do homework. Fragmentation made a big difference to apps running on Windows, but very little on any other platform. This is because Windows does something called 'read-ahead caching' which assumes that if you read block B, you're soon going to want to read block B+1, so at quite a low level it helpfully pre-reads it for you. Other operating systems don't make this assumption. This is why Windows users talk about defragmentation so much, but Unix users don't care about it. SSDs negate the whole point of defragmentation. On a rotational disk it's faster to read blocks B, B+1, B+2, B+4, B+5 than it is to read five random blocks from the disk, because the read heads are already positioned in the right place, and the disk is going to rotate to show those five blocks in order. SSDs are just like RAM: they're Random Access Memory. Reading any five blocks in any order takes roughly the same time. So nobody cares about fragmentation on an SSD. Read whatever blocks you want in whatever order you want. As to clever management of disk block alignment with respect to rows and columns, this is rarely worth attention these days. The amount of programming and debugging time it takes to get this right, and the amount of extra processing and disk access you need to do, give you less return on investment than if you spent the same money on buying a faster hard disk. It's premature optimization (look up the term) except for two cases: overnight runs and realtime 3D graphics. If your overnight run takes more than one night, you have a problem. If you're programming realtime 3D graphics and they're jerky, your users won't enjoy your simulation. But you wouldn't be using a SQL engine for 3D graphics anyway. The matters you mentioned were all worth attention back in the 1980s when storage and bandwidth were expensive. As you pointed out near the end of your post, these things matter less now. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] sqlite versus MySQL automatic defragmentation on SSDs?
On Sat, Oct 27, 2012 at 6:38 AM, David Barrett dbarr...@expensify.comwrote: I'm trying to learn more about MySQL versus sqllite when it comes to vacuuming and fragmentation, especially as it relates to SSD storage. Can anybody summarize for me the differences? 1) My understanding is sqlite, in general, has no automatic defragmentation: there is no process to gradually and incrementally reposition data on disk in index order. (Even auto-VACUUM just reclaims deleted data by inserting new transactions in the holes left behind by previous deletes, which the documentation says can actually worsen fragmentation.) The only defragmentation option is a full VACUUM, which rebuilds the entire database from scratch in index order. During this period, the whole database is locked and cannot be used. Is this roughly right? Roughly. SQLite does make some effort to keep things in index order as changes are made. If SQLite needs a new database page because of new data being inserted, it looks for a free page in approximately the right spot. But SQLite never moves pages around in order to put them all in the right order (apart from VACUUM) since that involves extra I/O and slows down the INSERT. In practice, though, fragmentation can still occur. Depending on your workload. In WAL mode, a VACUUM can be ongoing while there are other readers. This will lead to a *-wal file that is as big or bigger than the original database, however. -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] sqlite versus MySQL automatic defragmentation on SSDs?
Thanks, this is really helpful! (And I lecture my employees about the evils of premature optimization all the time. In fact, I'll lecture anyone in earshot, so frequently that it's the butt of jokes.) That said, I'd still welcome any quick summary of the differences between sqlite and mysql when it comes to fragmentation. I often get in debates about sqlite versus other datbases, and I'm always eager to be informed. Thanks! -david On Sat, Oct 27, 2012 at 11:42 PM, Simon Slavin slav...@bigfraud.org wrote: On 27 Oct 2012, at 11:38am, David Barrett dbarr...@expensify.com wrote: I'm trying to learn more about MySQL versus sqllite when it comes to vacuuming and fragmentation, especially as it relates to SSD storage. Rather than answer your questions point-by-point, I'm going to give you the current state of play. Your understanding of how the various DBMSes work is right, but your excellent question in a world with SSDs and a tremendous amount of RAM, does vacuuming matter nearly as much as on a spinning disk with constrained RAM? cuts to the heart of all your other points. The following involves a little simplification and handwaving because otherwise it would be two chapters long and you'd have to do homework. Fragmentation made a big difference to apps running on Windows, but very little on any other platform. This is because Windows does something called 'read-ahead caching' which assumes that if you read block B, you're soon going to want to read block B+1, so at quite a low level it helpfully pre-reads it for you. Other operating systems don't make this assumption. This is why Windows users talk about defragmentation so much, but Unix users don't care about it. SSDs negate the whole point of defragmentation. On a rotational disk it's faster to read blocks B, B+1, B+2, B+4, B+5 than it is to read five random blocks from the disk, because the read heads are already positioned in the right place, and the disk is going to rotate to show those five blocks in order. SSDs are just like RAM: they're Random Access Memory. Reading any five blocks in any order takes roughly the same time. So nobody cares about fragmentation on an SSD. Read whatever blocks you want in whatever order you want. As to clever management of disk block alignment with respect to rows and columns, this is rarely worth attention these days. The amount of programming and debugging time it takes to get this right, and the amount of extra processing and disk access you need to do, give you less return on investment than if you spent the same money on buying a faster hard disk. It's premature optimization (look up the term) except for two cases: overnight runs and realtime 3D graphics. If your overnight run takes more than one night, you have a problem. If you're programming realtime 3D graphics and they're jerky, your users won't enjoy your simulation. But you wouldn't be using a SQL engine for 3D graphics anyway. The matters you mentioned were all worth attention back in the 1980s when storage and bandwidth were expensive. As you pointed out near the end of your post, these things matter less now. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] sqlite versus MySQL automatic defragmentation on SSDs?
That said, I'd still welcome any quick summary of the differences between sqlite and mysql when it comes to fragmentation. This is far from main differences between sqlite and mysql that you should consider if you want to choose between them unless of course your question is just about an academic interest. As you are talking about employees I guess you are not in some purely academic exercise. In this case think more about in-process code vs pumping through TCP/IP stack, designed mostly to be accessible from machine-local processes only vs accessible to anyone on the network, plain access to everything vs versatile and complicated authorization and authentication mechanisms, and so on and so forth. Database format is never a part of the decision which DBMS you want to use. Pavel On Sat, Oct 27, 2012 at 9:32 PM, David Barrett dbarr...@expensify.com wrote: Thanks, this is really helpful! (And I lecture my employees about the evils of premature optimization all the time. In fact, I'll lecture anyone in earshot, so frequently that it's the butt of jokes.) That said, I'd still welcome any quick summary of the differences between sqlite and mysql when it comes to fragmentation. I often get in debates about sqlite versus other datbases, and I'm always eager to be informed. Thanks! -david On Sat, Oct 27, 2012 at 11:42 PM, Simon Slavin slav...@bigfraud.org wrote: On 27 Oct 2012, at 11:38am, David Barrett dbarr...@expensify.com wrote: I'm trying to learn more about MySQL versus sqllite when it comes to vacuuming and fragmentation, especially as it relates to SSD storage. Rather than answer your questions point-by-point, I'm going to give you the current state of play. Your understanding of how the various DBMSes work is right, but your excellent question in a world with SSDs and a tremendous amount of RAM, does vacuuming matter nearly as much as on a spinning disk with constrained RAM? cuts to the heart of all your other points. The following involves a little simplification and handwaving because otherwise it would be two chapters long and you'd have to do homework. Fragmentation made a big difference to apps running on Windows, but very little on any other platform. This is because Windows does something called 'read-ahead caching' which assumes that if you read block B, you're soon going to want to read block B+1, so at quite a low level it helpfully pre-reads it for you. Other operating systems don't make this assumption. This is why Windows users talk about defragmentation so much, but Unix users don't care about it. SSDs negate the whole point of defragmentation. On a rotational disk it's faster to read blocks B, B+1, B+2, B+4, B+5 than it is to read five random blocks from the disk, because the read heads are already positioned in the right place, and the disk is going to rotate to show those five blocks in order. SSDs are just like RAM: they're Random Access Memory. Reading any five blocks in any order takes roughly the same time. So nobody cares about fragmentation on an SSD. Read whatever blocks you want in whatever order you want. As to clever management of disk block alignment with respect to rows and columns, this is rarely worth attention these days. The amount of programming and debugging time it takes to get this right, and the amount of extra processing and disk access you need to do, give you less return on investment than if you spent the same money on buying a faster hard disk. It's premature optimization (look up the term) except for two cases: overnight runs and realtime 3D graphics. If your overnight run takes more than one night, you have a problem. If you're programming realtime 3D graphics and they're jerky, your users won't enjoy your simulation. But you wouldn't be using a SQL engine for 3D graphics anyway. The matters you mentioned were all worth attention back in the 1980s when storage and bandwidth were expensive. As you pointed out near the end of your post, these things matter less now. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users