On Thursday 18 October 2001 12:31, Bill Adams wrote:
> Kyle Hayes wrote:
> > > I found yesterday (at the advice of this list) that adding an
> > > occasional call to "FLUSH TABLES" fixed my corruption problems.  I
> > > would do that right before the disconnect or program exit.
> >
> > What kernel are you using?  Some of the 2.4 series have... odd...
> > behavior with regards to caching.
>
> Linux host 2.2.19 #6 SMP Wed Jul 11 10:55:03 PDT 2001 i686 unknown
> 2GB Memory, 4 CPUs.
> (It happened on other systems with different kernel versions too.)

Hmm, 2.2 doesn't do SMP really well.  However, its drawbacks are limited to 
underuse of the CPUs rather than any kind of corruption or other issue.  You 
would get much better performance with 2.4, but 2.2 is probably a little more 
stable.

> > Are you using SCSI or IDE.  We've run many tests with both and not had
> > any corruption problems unless we did something whacked like pull the
> > power for the machine while it was running the test.
>
> SCSI.  (Had problem with different controllers on different systems)
>
> Three dual channel controllers, all the same:
>
> [bill@host ~/dev]$ cat /proc/scsi/aic7xxx/0
> Adaptec AIC7xxx driver version: 5.1.33/3.2.4
> Compile Options:
>   TCQ Enabled By Default : Disabled
>   AIC7XXX_PROC_STATS     : Disabled
>   AIC7XXX_RESET_DELAY    : 5
>
> Adapter Configuration:
>            SCSI Adapter: Adaptec AIC-7899 Ultra 160/m SCSI host adapter
>                            Ultra-160/m LVD/SE Wide Controller Channel A at

Is this a DAC960 or something similar?  If so, make sure you have the 
absolute latest drivers.  We have some dual processor machines with those 
controllers (or something closely related) and had to do many driver updates 
before it stabilized.  And, we're still not totally convinced.  If this is a 
big SCSI RAID card, I would definitely check the drivers and make sure that 
there isn't something newer/more stable out there.

> Statistics:
>
> (scsi0:0:0:0)
>   Device using Wide/Sync transfers at 80.0 MByte/sec, offset 31
>   Transinfo settings: current(10/31/1/0), goal(10/127/1/0), user(9/127/1/2)
>   Total transfers 36738885 (18761976 reads and 17976909 writes)

Waiter!  I'll have two of what that gentleman over there is having.

> > What filesystem are you running?
>
> ext2. At least that is what linux sees.  The disks are actually hardware
> raid0 winchester flashdisks.

Flash?  I.e. these are solid state disks?  If that is true, then maybe that 
is part of the problem.  Flash is different from "normal" disk.

Can these disks correct for bad sectors?  If so, the usual method to force 
remapping of bad sectors is to use dd:

        dd if=/dev/zero of=/dev/XXX bs=1M count=YYY

Where XXX is the RAID device and YYY is the number of megabytes of storage.

Please make a backup of your data first :-)

On a "normal" disk, this causes a write to every sector on the drive.  That 
in turn causes the firmware on the drive to remap any bad sectors it finds 
along the way.  If your disks support this, you might be pleasantly surprised 
how many problems go away afterward.  Most newer drives remap bad sectors 
automatically on write, but by then the sector may already hold valid data 
that gets trashed.  Running the dd line above forces the issue before you 
have valid data on the disk.
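
A quick way to get YYY right is to compute it from the device size rather 
than guessing.  A sketch, demonstrated here on a scratch file instead of a 
real device (for the real thing you would substitute your RAID device path 
and use `blockdev --getsize64` to get the byte count -- and back up first):

```shell
# Demonstrated on a scratch file; for a real device you would set
# DEV=/dev/XXX and get the size with: blockdev --getsize64 "$DEV"
DEV=/tmp/fakedisk
truncate -s 8M "$DEV"          # stand-in for the RAID device

# Size in whole 1 MiB blocks (stat -c%s reports the size in bytes)
MB=$(( $(stat -c%s "$DEV") / 1048576 ))

# Zero the whole thing, one megabyte at a time
dd if=/dev/zero of="$DEV" bs=1M count="$MB" 2>/dev/null

echo "$MB"                     # number of megabytes written
```

On a real device this must run with the filesystem unmounted, and it 
destroys everything on the disk, hence the backup warning above.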

> > Just running FLUSH TABLES sounds like it is only going to make the
> > problem less common, not fix it.  Something is corrupting your
> > indexes/data.
>
> I loaded three big tables last night with no problems (after adding the
> occasional $dbh->do( "FLUSH TABLES" ).  Before it would happen at least
> once when doing a large (re)load of data.

We've done 7M rows in a single input file (just a hair under the 2GB limit 
for the older ext2 filesystem on that particular machine) with no problems at 
all.  That was with MySQL 3.23.26 or something close to it.  We've run tests 
much larger than this, driven either through Perl and DBI or from a flat 
file.

> > Is the data getting mangled or the index?  If myisamchk can fix the
> > problem,
>
> That is the funny thing, I had to do a mysqldump > file; mysql <file to fix
> the table.  myisamchk would report the table was bad, I would try to repair
> with -o (and just about every other level).  then myisamchk would report it
> was good (even with -e).  When I continued to load the data, it would
> quickly become corrupted again.  Even rebuilding all of the indexes would
> not fix it.  Running the mysqldump, mysql fixed it much better.

There was a bug in myisamchk for a while that would cause data loss in 
certain circumstances unless you used the -v flag with -r.  This should have 
been fixed a while ago (over a year?).
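
For the archives, the invocation in question looks like this (the datadir 
path and table name below are placeholders for your setup; myisamchk 
operates on the .MYI index file, and mysqld must not have the table open 
while it runs):

```shell
# Placeholder paths; -r = recover, -v = verbose (the flag pair that
# mattered for the old data-loss bug mentioned above).
TABLE=/var/lib/mysql/mydb/pcm_test_site_200109.MYI
REPAIR="myisamchk -r -v $TABLE"
echo "$REPAIR"   # run with mysqld stopped, or after FLUSH TABLES + a lock
```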

If MySQL handled the input file resulting from the original dump then it was 
probably happy with the data itself.   You might want to dump out a flat file 
(tab delimited data) with mysqldump.  Then, go into the MySQL command line 
and run LOAD DATA INFILE ... to load the data in.  This will tell you how 
many warnings you get.  For instance, if there is a date value it doesn't 
like, it will silently set the field to 0000-00-00.
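
Something along these lines (database and directory names are placeholders; 
-T makes mysqldump write a tab-delimited .txt per table, and the warning 
count shows up in the client's status line after the load):

```shell
# Placeholder names throughout.  Note that with -T the *server* writes
# the .txt/.sql files, so the directory must be writable by mysqld:
#
#   mysqldump -T /tmp/dump mydb pcm_test_site_200109
#
# Then reload from the mysql command line and watch the status line,
# which reports the count ("Query OK, ... Warnings: N"):
SQL="LOAD DATA INFILE '/tmp/dump/pcm_test_site_200109.txt'
     INTO TABLE pcm_test_site_200109"
echo "$SQL"
```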

Can you check the integrity of the data in some manner?  I.e. do you have 
some code that will go through the table(s) and make sure that everything is 
OK with the data itself?  mysqldump will happily write out a clean-looking 
file that can be used for input, so the fact that the dump/reload cycle 
worked might not be telling you much.
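
Failing custom checking code, the server's own CHECK TABLE (available for 
MyISAM tables in the 3.23 series) will at least do a consistency pass 
without taking mysqld down.  Table name is a placeholder again:

```shell
# EXTENDED does a full row-by-row check -- slow on a 7M-row table,
# but thorough.  Feed the statement to the client with mysql -e.
CHECK_SQL="CHECK TABLE pcm_test_site_200109 EXTENDED"
echo "$CHECK_SQL"   # e.g.: mysql mydb -e "$CHECK_SQL"
```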

> > it is likely that the index is the problem.  MySQL will cache the index
> > in memory, but not the data.  Thus, if you see data mangling problems and
> > possibly index problems, I would look at the kernel, disk etc.  If you
> > are only seeing index problems, but the data looks OK, then the version
> > of MySQL might be a problem or maybe you have a bad build.  MySQL builds
> > MySQL might be a problem or maybe you have a bad build.  MySQL builds
> > more cleanly
>
> It happened with 3.23.41.

Did you use MySQL AB's prebuilt binary or build your own?  Their binary is 
probably the most stable.  We often build our own, but we are fairly careful 
to use the most vanilla configuration we can.

> > than most OSS projects, but it is a big complex beastie and can build
> > incorrectly without obvious errors sometimes in our experience.  Bad
> > library versions can also be a factor.
>
> I did build/run this on a RH6.2 system.

Hmm, that isn't the newest version of Red Hat...  Of course, we still have 
the odd RH 6.0 server around here :-/

> > We've run tests with 1000 hits per second on a database on a cheesy IDE
> > drive without a problem.  We've run those tests for hours at a time with
> > no problems.  SCSI definitely works better than IDE, but the newer IDE
> > drives aren't that bad anymore.  They still use a lot of CPU.
>
> It is not the selects that cause the problems, it is lots of inserts. 

This was a mix of SELECT, INSERT, UPDATE and DELETE that was designed to 
model our application load.  There were almost as many UPDATEs as SELECTs.  
We regularly generate test data in very large batches and that is all INSERTs 
or flat file input.

> Again, it only seems to happen on large loads.  I have three main tables
> and a large load means:
> mysql> select count(*) from pcm_test_header_200109;
> +----------+
> | count(*) |
> +----------+
> |     5844 |
> +----------+
> 1 row in set (0.07 sec)
>
> mysql> select count(*) from pcm_test_summary_200109;
> +----------+
> | count(*) |
> +----------+
> |   840413 |
> +----------+
> 1 row in set (0.04 sec)
>
> mysql> select count(*) from pcm_test_site_200109;
> +----------+
> | count(*) |
> +----------+
> |  7248366 |
> +----------+
> 1 row in set (0.02 sec)
>
> mysql>

Have you been able to see the problem when you load only the smaller tables?  
What I am trying to see here is if loading the large table causes disk 
corruption or does something savage and unnatural with the index cache in 
memory.  If the problems occur with small tables, then the load size may not 
have anything to do with it.  Are the rows very long?

> Any of the three tables can have problems but it is usually the site table.
>
> > If your drives do write caching, that can be a problem if you have a
> > power drop.  Most IDE drives (all?) will cache writes to allow the disk
> > firmware to
>
> This is not a power or crash problem.  It happens WHILE the loader is
> running.
>
> It could be a DBI/DBD bug.  I [try to] insert all of the above records with
> a single database handle (connection).

We have processes that use a single DBI connection for weeks at a time.  DBI 
has bugs (e.g. trouble getting a 64-bit auto-increment value back), but this 
kind of bug seems rare.

Best,
Kyle


