Yes, the rows are in primary key order. However, each row contains a
specific integer primary key; I'm not inserting NULLs into a table
whose primary key is auto-increment, so I don't see why concurrent
inserts would fight over similar spots (although I'm admittedly no
MySQL hotshot, so my assumption rests on a *hunch* only).
I'm not sure (yet) whether a single-threaded operation would run into an
I/O bottleneck. I haven't run mysqlimport with --use-threads=1 just
yet (I will if I have the time), but when I ran it with
--use-threads=4 the import (of a ~500 MB dump) took more time than
running four different processes (I split my tab-delimited dumps
with split into four even pieces and imported those in four different
sessions).
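For reference, the four-session approach can be sketched roughly as below. This is only an illustration, not the exact commands I ran: the part0/ ... part3/ directories and the table name are hypothetical, and the sketch assumes each piece keeps the table's file name, since mysqlimport derives the table name from the file name.

```python
import subprocess
from typing import List, Sequence


def build_commands(database: str, files: Sequence[str]) -> List[List[str]]:
    """Build one mysqlimport command line per input file.

    Assumption: each file is named after its target table (for example
    part0/mytable.txt), because mysqlimport strips the directory and the
    extension from the file name to find the table.
    """
    return [["mysqlimport", "--local", database, path] for path in files]


def run_parallel(commands: Sequence[Sequence[str]]) -> List[int]:
    """Start every command at once and return the exit codes in order."""
    procs = [subprocess.Popen(list(cmd)) for cmd in commands]
    return [proc.wait() for proc in procs]
```

Calling run_parallel(build_commands("db", ["part0/mytable.txt", "part1/mytable.txt"])) would start one client process per piece, and each should then show up as its own connection in the processlist.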
Anyway, it seems that a simple import (from a dump that isn't
tab-delimited but contains complete or extended inserts) takes the
same amount of time as a mysqlimport run with --use-threads=4. As it
turns out, splitting my tab-delimited dump is too complex to
handle gracefully, because my data contains newline characters all
over the place, so I've dropped the whole mysqlimport idea for now.
(I'll try the method of migrating an InnoDB database to
an NDBCluster described here[1] instead.)
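To illustrate the newline problem: a plain line-based split can cut a row in half. A minimal sketch of a record-aware split follows. It assumes the default mysqldump --tab format, where a newline inside a value is written as a backslash followed by a real newline and a literal backslash is doubled; I have not verified this against my own dumps. The function names are made up for the example.

```python
import io
from typing import BinaryIO, Iterable, Iterator, Sequence


def iter_records(lines: Iterable[bytes]) -> Iterator[bytes]:
    """Yield whole rows, joining physical lines whose newline is escaped.

    Assumption: a physical line that ends in an odd number of backslashes
    has an escaped newline, so the row continues on the next line.
    """
    pending = b""
    for line in lines:
        pending += line
        body = line[:-1] if line.endswith(b"\n") else line
        trailing = len(body) - len(body.rstrip(b"\\"))
        if trailing % 2 == 0:  # newline is a real row terminator
            yield pending
            pending = b""
    if pending:  # unterminated last row, passed through unchanged
        yield pending


def split_dump(src: BinaryIO, outputs: Sequence[BinaryIO]) -> int:
    """Distribute rows round-robin over the outputs; return the row count."""
    count = 0
    for count, record in enumerate(iter_records(src), start=1):
        outputs[(count - 1) % len(outputs)].write(record)
    return count
```

The source and the outputs would be files opened in binary mode. Round-robin keeps every piece in primary key order, but interleaves the key ranges across the pieces.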
If I have the time I'll write up a bug report or a documentation
enhancement request for this.
Thanks for the input!
Regards,
Kohányi Róbert
[1]: http://johanandersson.blogspot.se/2012/04/mysql-cluster-how-to-load-it-with-data.html
On Wed, Jul 25, 2012 at 6:49 PM, Rick James rja...@yahoo-inc.com wrote:
I'm skeptical that use-threads can ever be very effective.
What order are the rows in? They are probably in PRIMARY KEY order, which
means that the INSERTing threads will be fighting over similar spots in the
table.
Is it I/O bound when it is single-threaded? If so, then there can't be any
improvement with use-threads.
etc.
Suggest you file a bug with bugs.mysql.com. If nothing else, the
documentation should say more than it does.
-Original Message-
From: Róbert Kohányi [mailto:kohanyi.rob...@gmail.com]
Sent: Tuesday, July 24, 2012 10:52 AM
To: mysql@lists.mysql.com
Subject: mysqlimport --use-threads / mysqladmin processlist
I'm in the middle of migrating an InnoDB database to an NDBCluster. I
use mysqldump to first create two dumps, the first one contains only
the database schema, the second one contains only tab delimited data
(via mysqldump --tab). I edit my InnoDB schema here and there
(ENGINE=InnoDB to ENGINE=NDB, etc.) import it and after this I import
the InnoDB data *as is* using mysqlimport.
I use it like this:
mysqlimport --local --use-threads=4 db dir/*.txt
(dir of course contains the tab-delimited data I dumped before.)
The import starts, and I check its progress via mysqladmin, like this:
mysqladmin --sleep=1 processlist
this is what I see: http://pastebin.com/raw.php?i=M23fWVjc
Only a single process seems to be loading my data. Is this what I
*should* see, or, in my case using 4 threads, should I see four
processes? I'm not asking which one will be faster, I'm just simply
confused because I don't know what to expect. If I start four different
mysqlimport processes, each one importing different files, then I can
see four different processes in the mysql processlist.
If it matters, here is my server version (I use the official
binaries).
Server version: 5.5.25a-ndb-7.2.7-gpl MySQL Cluster Community Server
(GPL)
Regards,
Kohányi Róbert
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql