Sorry, that was not meant to be sent to the list...
On Tue, Apr 28, 2009 at 3:27 PM, tim robertson <timrobertson...@gmail.com> wrote:
> Hi,
>
> [Ankur]: How can I make sure this happens?
> -- SHOW PROCESSLIST is how we spot it... literally it takes hours on
> our setup, so it is easy to find.
>
> So we ended up with 2 DBs:
> - DB1, which we insert into, prepare, and run batch processing on
> - DB2, which serves the read-only web app
>
> Periodically we dump DB1, point the read-only web app at DB1, load
> the dump into DB2, and then DB2 becomes the write DB... an endless
> cycle. We can afford data latency in our system though.
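> Roughly, the swap looks like the following. This is a minimal sketch
> only -- the host names, database name, and dump path are made up,
> and repoint_webapp() stands in for however you retarget your app:
>
>     #!/usr/bin/env python
>     # Rotate two MySQL instances: dump the current write DB, point
>     # the read-only web app at it, reload the other DB from the
>     # dump, and swap roles for the next cycle.
>     import subprocess
>
>     def repoint_webapp(host):
>         # Hypothetical hook: update the app config so that reads
>         # go to `host`, then reload the app.
>         pass
>
>     def dump_and_swap(write_db, read_db, dumpfile="/tmp/dump.sql"):
>         # 1. Dump the DB we have been inserting into.
>         subprocess.check_call(
>             "mysqldump -h %s mydb > %s" % (write_db, dumpfile),
>             shell=True)
>         # 2. Serve reads from the freshly prepared data.
>         repoint_webapp(write_db)
>         # 3. Reload the old read DB; it becomes the new write DB.
>         subprocess.check_call(
>             "mysql -h %s mydb < %s" % (read_db, dumpfile),
>             shell=True)
>         return read_db, write_db  # roles swapped for the next cycle
>
>     write_db, read_db = "db1.example.org", "db2.example.org"
>     write_db, read_db = dump_and_swap(write_db, read_db)
>
> Nothing clever, but it keeps the heavy inserts and the web app's
> reads from ever fighting over the same table.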
> Why did we end up with this? Because of locking on writes that kills
> reads, as you say... basically you can't insert while a read is
> happening on MyISAM, as it locks the whole table. InnoDB has
> row-level locking to get around this, but in our experience (at the
> time we had 130 million records) it just didn't work either. We
> spent €10,000 on the supposed "European expert" on MySQL from their
> professional services and were unfortunately very disappointed. It
> seems such large tables are just problematic with MySQL. We are now
> very much looking into Lucene technologies for search and Hadoop for
> reporting and data-mining type operations. SOLR does a lot of what
> our DB does for us.
>
> So with MyISAM... here is what we learnt:
>
> Only the very latest MySQL versions (still beta, I think) support
> more than 4G of memory for indexes (you really, really need the
> index in memory and, where possible, the FKs for joins in the index
> too). MySQL has differing join strategies between InnoDB and MyISAM,
> so be aware. An undocumented feature of MyISAM is that you can
> create memory buffers for single indexes.
>
> In the my.cnf:
>
>   taxon_concept_cache.key_buffer_size = 3990M
>
> (for some reason you have to drop a little under 4G), then in the DB
> run:
>
>   cache index taxon_concept in taxon_concept_cache;
>   load index into cache taxon_concept;
>
> This lets you make sure a given index really does end up in memory.
>
> And here are our main my.cnf changes:
>
>   table_cache = 512
>   sort_buffer_size = 128M
>   read_buffer_size = 64M
>   read_rnd_buffer_size = 300M
>   thread_cache_size = 8
>   query_cache_type = 0
>   query_cache_size = 0
>   thread_concurrency = 8
>   # these three are used for repair by sorting, which is far faster
>   # than the keycache method; the sort buffer is limited to 4G
>   myisam_max_sort_file_size = 200G
>   myisam_max_extra_sort_file_size = 200G
>   myisam_sort_buffer_size = 3990M
>   # the maximum value for this is 4G for now, but see the note above
>   # on dedicated buffers for large indexes
>   key_buffer_size = 3990M
>   tmp_table_size = 512M
>   max_heap_table_size = 512M
>
> I guess just archive this mail off for if you ever do try MyISAM for
> something... these params are the result of a lot of trial and error
> and days of waiting!
>
> Our DB has about 44G of indexes, which we can keep in memory, but we
> are hitting limits - hence the Hadoop train.
>
> Cheers,
> Tim
>
> On Tue, Apr 28, 2009 at 2:49 PM, Wang Zhong <wangzhong....@gmail.com> wrote:
>> Hi,
>>
>> That's really cool. It seems that Hadoop can work with SQL DBs like
>> MySQL in bearable time. I thought that when inserting data into
>> MySQL, the cost of communication was always a big problem; you have
>> found a method to reduce that cost.
>>
>> Using a distributed database like HBase is another good choice. It
>> is built on Hadoop and HDFS, so communication cost is not a problem.
>>
>> On Tue, Apr 28, 2009 at 6:46 PM, Ankur Goel <ankur.g...@corp.aol.com> wrote:
>>>
>>> hello hadoop users,
>>> Recently I had a chance to lead a team building a log-processing
>>> system that uses Hadoop and MySQL. The system's goal was to process
>>> the incoming information as quickly as possible (real time or near
>>> real time) and make it available for querying in MySQL. I thought
>>> it would be good to share the experience and the challenges with
>>> the community. Couldn't think of a better place than these mailing
>>> lists, as I am not much of a blogger :-)
>>>
>>> The information flow in the system looks something like:
>>>
>>> [Apache-Servers] -> [Hadoop] -> [MySQL-shards] -> [Query-Tools]
>>>
>>> Transferring from the Apache servers to Hadoop was quite easy, as
>>> we just had to organize the data into time-based buckets
>>> (directories). Once that was running smoothly, we had to make sure
>>> that map-reduce jobs were fired at regular intervals and picked up
>>> the right data. The jobs would then process/aggregate the data and
>>> dump the results into MySQL shards from the reducers [we have our
>>> own DB partitioning setup]. This is where we hit major bottlenecks
>>> [any surprises? :-)]
>>>
>>> The table engine used was InnoDB, as there was a need for fast
>>> replication and writes but only moderate reads (it should
>>> eventually support high read rates). The data would take quite a
>>> while to load completely, far from being near real time. And so our
>>> optimization journey began.
>>>
>>> 1. We tried to optimize/tune InnoDB parameters, like increasing the
>>> buffer pool size to 75% of available RAM. This helped, but only
>>> while the DBs were lightly loaded, i.e. while InnoDB had enough
>>> buffer pool to hold the data and indexes.
>>>
>>> 2. We also realized that InnoDB has considerable locking overhead,
>>> which makes write concurrency really bad when you have a large
>>> number of concurrent threads doing writes. The default thread
>>> concurrency for us was set to no_of_cpu * 2 = 8, which is what the
>>> official documentation advises as the optimal limit. So we limited
>>> the number of reduce tasks, and consequently the number of
>>> concurrent writes, and boy, the performance improved 4x. We were
>>> almost there :-)
>>>
>>> 3. The next thing we tried was standard DB optimization techniques,
>>> like denormalizing the schema and dropping constraints. This gave
>>> only a minor performance improvement, nothing earth-shattering.
>>> Note that we were already caching connections to each MySQL shard
>>> in the reducers, and the partitioning logic was embedded in the
>>> reducers.
>>>
>>> 4. Still falling short of our performance objectives, we finally
>>> decided to get rid of JDBC writes from the reducers and work on an
>>> alternative that uses MySQL's LOAD utility (sketched after this
>>> list):
>>> - The processing partitions the data into MySQL-shard-specific
>>> files resident in HDFS.
>>> - A script then spawns processes via ssh on different physical
>>> machines to download this data.
>>> - Each spawned process downloads only the data for the shard it
>>> should upload to.
>>> - All the processes then start uploading data in parallel into
>>> their respective MySQL shards using LOAD DATA INFILE.
>>>
>>> This proved to be the fastest approach, even in the wake of
>>> increasing data loads. The entire processing/loading completes in
>>> less than 6 minutes.
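>>> Schematically, the loader script does something like the following.
>>> This is a minimal sketch only -- the shard hosts, HDFS path, DB,
>>> and table names are invented, and the real script adds retries and
>>> logging:
>>>
>>>     #!/usr/bin/env python
>>>     # Fan out one loader per MySQL shard over ssh: each remote
>>>     # command pulls its shard's file out of HDFS and bulk loads
>>>     # it with LOAD DATA INFILE.
>>>     import subprocess
>>>
>>>     SHARDS = {  # shard id -> host running that MySQL shard
>>>         0: "shard0.example.com",
>>>         1: "shard1.example.com",
>>>     }
>>>
>>>     procs = []
>>>     for shard, host in SHARDS.items():
>>>         remote = (
>>>             "hadoop fs -get /logs/out/shard-%d.tsv /tmp/shard.tsv"
>>>             " && mysql logs_db -e \"LOAD DATA INFILE"
>>>             " '/tmp/shard.tsv' INTO TABLE hits\"" % shard)
>>>         procs.append(subprocess.Popen(["ssh", host, remote]))
>>>
>>>     # Wait for every shard to finish before declaring the batch
>>>     # loaded.
>>>     for p in procs:
>>>         p.wait()
>>>
>>> The win over row-by-row JDBC inserts is that each shard gets a
>>> single bulk load instead of millions of individual statements.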
The >>> system has been holding up quite well so far, even though we've had to >>> limit the number of days for which we keep the data or else the MySQLs get >>> overwhelmed. >>> >>> Hope this is helpful to people. >>> >>> Regards >>> -Ankur >>> >> >> >> >> -- >> Wang Zhong >> >