On Apr 11, 2011, at 6:45 PM, Arjen Lentz wrote:

> Hi Tim,
>
> On 12/04/2011, at 8:36 AM, Tim Soderstrom wrote:
>> On Apr 11, 2011, at 3:32 PM, Patrick Crews wrote:
>>
>>> We removed PBXT, CSV, blackhole, filesystem_engine, and maybe
>>> archive...lack of interest and support
>>
>> Monty mentioned the reason for PBXT. I would be sad to see ARCHIVE go since
>> its INSERT performance is so fast - even if its use-case is rather tiny (I
>> use it for a few things). To be fair, I haven't benchmarked ARCHIVE's INSERT
>> performance in some time and have never compared it to INSERT performance of
>> other engines under Drizzle.
>
> The insert performance of ARCHIVE is a nonsense, and always has been.
>
> You can verify this for yourself by
> a) noting that you max out the CPU, not the disk I/O. From this you can
> conclude that the overhead is in the parser and other handling, not the
> reduction in disk I/O that Archive delivers.
> b) achieving similar performance with other engines, by properly optimising
> the path between the app and the db: multi-row inserts, proper buffer
> settings. Doing this, in MySQL I was able to get >340k row inserts/sec on a
> single thread on MyISAM. InnoDB was lower, in the 80-120k row/sec range, but
> that could've used a bit more tuning.
>
> With the MyISAM example noted above, I was again maxing out the CPU, not the
> I/O, so I could even start a second insert thread on the same table (which
> introduces contention on MyISAM!) and get even more performance out of the
> box before saturating the I/O path.
>
> In short, ARCHIVE "solved" a problem (resolving I/O saturation by using extra
> CPU power) that didn't exist in the first place.
> Multi-row inserts rock, and tuning is important, as is understanding how the
> server works.
>
> There may be a valid case for a new bulk-load system in MySQL and/or Drizzle.
> However, that's only relevant for bulk stuff, not for sustained inserts. I tend
> to batch the latter so the multi-row inserts can be used again.
>
> I also found that PBXT has very good sustained insert performance, whereas
> InnoDB degrades because, after an initial period, it has to do more I/O. That's
> not news really, as this attribute is clear from PBXT's design. So if you need
> very high sustained insert performance, you actually want PBXT.
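Arjen's advice to batch sustained inserts so they can go out as multi-row INSERTs can be sketched roughly as below. This is a minimal illustration, not a benchmark: it uses Python's built-in sqlite3 module as a stand-in for a MySQL or Drizzle client, and the helper name batched_insert and its default batch size are hypothetical choices. A MySQL driver would typically use %s placeholders rather than ?, and the table and column names are interpolated directly into the SQL, so they must come from trusted code, never from user input.

```python
import sqlite3
from typing import Iterable, Sequence


def batched_insert(conn: sqlite3.Connection, table: str, columns: Sequence[str],
                   rows: Iterable[Sequence], batch_size: int = 100) -> int:
    """Insert rows using multi-row INSERT statements, batch_size rows per statement.

    Hypothetical helper for illustration. Returns the number of INSERT
    statements executed.
    """
    cols = ", ".join(columns)
    # One "(?, ?, ...)" group per row; repeated once per row in the batch.
    one_row = "(" + ", ".join("?" for _ in columns) + ")"
    statements = 0
    batch = []

    def flush() -> None:
        nonlocal statements
        if not batch:
            return
        # e.g. INSERT INTO t (a, b) VALUES (?, ?), (?, ?), ...
        sql = f"INSERT INTO {table} ({cols}) VALUES " + ", ".join([one_row] * len(batch))
        # Flatten the batch into a single parameter list in row order.
        params = [value for row in batch for value in row]
        conn.execute(sql, params)
        statements += 1
        batch.clear()

    for row in rows:
        batch.append(tuple(row))
        if len(batch) >= batch_size:
            flush()
    flush()  # send any partial final batch
    conn.commit()
    return statements


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, msg TEXT)")
    n = batched_insert(conn, "events", ["id", "msg"],
                       ((i, f"row {i}") for i in range(250)), batch_size=100)
    print(n)  # 3 statements: 100 + 100 + 50 rows
    print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 250
```

The batch size trades round trips against statement size; servers and drivers cap both (for example, MySQL's max_allowed_packet and SQLite's bound-parameter limit), so the right value depends on row width and configuration.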
Hmmm, I'll have to take a look at that, since I ended up getting very, very fast INSERTs from ARCHIVE (though I suspect most of that was the larger insert buffer before compression). It's entirely possible that I was Doing It Wrong, though :) I'll take another look at some point.

I agree, though, that PBXT seems to be a good place for that, plus it's transactional, so that's nice. Unfortunately, if it won't be in Drizzle, I'd like to stick to engines that are tangible (at least if we're talking about Drizzle here). BlitzDB might be another option too, no?

Tim

_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help : https://help.launchpad.net/ListHelp

