Spoiler: You may be right about the bad libs...

Kyle Hayes wrote:

> On Thursday 18 October 2001 12:31, Bill Adams wrote:
> Hmm, 2.2 doesn't do SMP really well.  However, its drawbacks are limited to
> underuse of the CPUs rather than any kind of corruption or other issue.  You
> would get much better performance with 2.4, but 2.2 is probably a little more
> stable.

2.4 is not an option for me because:
o Right now I use Informix as my production database.  Until they officially
support 2.4 or I 'upgrade' to MySQL I am stuck in the 2.2.x series.

o Until the VM crap is worked out, I am not installing the 2.5, er, 2.4 kernels
on any production machines unless they come with the distribution.


> Is this a DAC960 or something similar?  If so, make sure you have the
> absolute latest drivers.  We have some dual processor machines with those
> controllers (or something closely related) and had to do many driver updates
> before it stabilized.  And, we're still not totally convinced.  If this is a
> big SCSI RAID card, I would definitely check the drivers and make sure that
> there isn't something newer/more stable out there.

I have a Mylex DAC1164P for the /, /home, etc. using RAID5.  All of the MySQL
tables are on an "Adaptec AIC-7899 Ultra 160/m SCSI host adapter" which is a
dual channel UW controller.


> > Statistics:
> >
> > (scsi0:0:0:0)
> >   Device using Wide/Sync transfers at 80.0 MByte/sec, offset 31
> >   Transinfo settings: current(10/31/1/0), goal(10/127/1/0), user(9/127/1/2)
> >   Total transfers 36738885 (18761976 reads and 17976909 writes)
>
> Waiter!  I'll have two of what that gentleman over there is having.

:)


> > > What filesystem are you running?
> >
> > ext2. At least that is what linux sees.  The disks are actually hardware
> > raid0 winchester flashdisks.
>
> Flash?  I.e. these are solid state disks?  If that is true, then maybe that
> is part of the problem.  Flash is different from "normal" disk.

No, that is the product name. http://www.winsys.com/products/  Basically, it is
a box with 12 drives in it and a dual channel scsi controller (in my model).  As
far as Linux is concerned, each box appears as two very large, very fast drives
on two channels.  You can partition in different ways and get them with one
channel, etc.


> Can these disks correct for bad sectors?  If so, the usual method to force
> remapping of bad sectors is to use dd:

AFAIK, the FlashDisk controller corrects for that.  But then again I am running
RAID0, and Winchester Systems does not officially support that level (they do
0+1, 5, others) because part of what they are selling besides blazingly fast
RAID boxes is data security and integrity.  Obviously you do not get that with
RAID0.  For my application that is not an issue.  I care only about speed and
volume: my raw data is backed up elsewhere.  But I digress....


>         dd if=/dev/zero of=/dev/XXX bs=1M count=YYY
>
> Where XXX is the RAID device and YYY is the number of megabytes of storage.
>
> Please make a backup of your data first :-)
>
> On a "normal" disk, this causes a write to each sector on the whole drive.
> That in turn causes the firmware on the drive to remap any bad sectors found
> this way.  If your disks support this, you might be unpleasantly surprised
> how many problems go away after this.  Most newer drives do this
> automatically, but it can still trash your data.  By doing the line above,
> you force the issue before you have valid data on the disk.

In my case I just log in to the controller on the FlashDisk to get statistics
such as bad sector counts.

Not to sound too much like an advertisement for Winchester Systems, but these
people have been around for a long time, and so have the controllers I use,
which are well tested and used by many other companies/people with much more
critical needs than mine.  I also do not have problems with the Informix
tables on the same disks using the same dataloader under the same conditions.
And the MySQL corruption happens on different enclosures/disks/etc.


> We've done 7M rows in one single input file (just a hair under the 2GB limit
> for the older ext2 filesystem we have on that particular machine).  No
> problems at all.  That was with MySQL 3.23.26 or something close to that.
> We've done tests much larger than this that were either driven via Perl and
> DBI, or from a flat file.

Well, I am running an ancient version of DBI. I will upgrade to a more modern
version of DBI and msql-mysql-modules; reload data; and report back.


> > > Is the data getting mangled or the index?  If myisamchk can fix the
> > > problem,
> >
> > That is the funny thing, I had to do a mysqldump > file; mysql <file to fix
> > the table.  myisamchk would report the table was bad, I would try to repair
> > with -o (and just about every other level).  Then myisamchk would report it
> > was good (even with -e).  When I continued to load the data, it would
> > quickly become corrupted again.  Even rebuilding all of the indexes would
> > not fix it.  Running the mysqldump, mysql fixed it much better.
>
> There was a bug in myisamchk for a while that would cause data loss in
> certain circumstances unless you used the -v flag with -r.  This should have
> been fixed a while ago (over a year?).

No data loss.  The repair just would not stick.


> Can you check the integrity of the data in some manner?  I.e. do you have
> some code that will go through the table(s) and make sure that everything is
> OK with the data itself?  Mysqldump will forcibly write a clean file that can
> be used for input.  Thus, this might not be telling you much.

I can update my loader to check data.
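
A minimal sketch of what such a check might look like, written in Python for
illustration rather than in the loader's own language.  It assumes each row
has a unique key in its first column and that the rows read back from the
table can be compared field by field with the rows that were sent.  The names
row_digest and compare_rows are hypothetical, and how the rows are fetched is
left out.

```python
import hashlib

def row_digest(row):
    """Return a SHA-256 digest of one row (a sequence of field values)."""
    h = hashlib.sha256()
    for field in row:
        # NULLs get a marker so they do not collide with the empty string.
        data = b"\x00NULL" if field is None else str(field).encode("utf-8")
        # Length-prefix each field so ("ab", "c") differs from ("a", "bc").
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.hexdigest()

def compare_rows(source_rows, loaded_rows, key_index=0):
    """Compare two row sets by key.

    Returns three sorted key lists: rows missing from the table, rows
    present only in the table, and rows whose contents differ.
    """
    src = {row[key_index]: row_digest(row) for row in source_rows}
    dst = {row[key_index]: row_digest(row) for row in loaded_rows}
    missing = sorted(k for k in src if k not in dst)
    extra = sorted(k for k in dst if k not in src)
    mangled = sorted(k for k in src if k in dst and src[k] != dst[k])
    return missing, extra, mangled

if __name__ == "__main__":
    sent = [(1, "a", 3.5), (2, "b", None), (3, "c", 1)]
    read_back = [(1, "a", 3.5), (2, "B", None), (4, "d", 0)]
    print(compare_rows(sent, read_back))
```

One caveat: the comparison is on the string form of each field, so values the
database reformats on the way back (for example 1.50 returned as 1.5) would
need normalizing first, or they will show up as false mismatches.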


> > > it is likely that the index is the problem.  MySQL will cache the index
> > > in memory, but not the data.  Thus, if you see data mangling problems and
> > > possibly index problems, I would look at the kernel, disk etc.  If you
> > > are only seeing index problems, but the data looks OK, then the version of
> > > MySQL might be a problem or maybe you have a bad build.  MySQL builds
> > > more cleanly
> >
> > It happened with 3.23.41.
>
> Did you use MySQL AB's prebuilt binary or build your own?  Their binary is
> probably the most stable.  We often build our own, but we are fairly careful
> to use the most vanilla configuration we can.

My own.  But I always configure it with just './configure
--prefix=/usr/local/mysql'.



> > > than most OSS projects, but it is a big complex beastie and can build
> > > incorrectly without obvious errors sometimes in our experience.  Bad
> > > library versions can also be a factor.
> >
> > I did build/run this on a RH6.2 system.
>
> Hmm, that isn't the newest version of Red Hat...  Of course, we still have
> the odd RH 6.0 server around here :-/

If it works (most of the time) don't touch it. ;-)  I have another system that
is more recent that I can test this on after I do the DBD upgrade test.  Aside
from the system libs, I usually build and upgrade everything by hand (MySQL,
Apache, etc.).

*** OMG ***
Ha, I cannot believe this: I was just looking at the libraries linked by
mysqld with ldd, and it is using the Informix libpthread.so.  Hmm, crap. *me
slaps head*
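
A rough sketch of automating that ldd check, in Python for illustration.  It
parses text in the usual "name => path (address)" form that ldd prints and
flags any library that resolves outside a list of expected directories.  The
directory list, the function names, and the sample output (including the
/opt/informix path) are assumptions made up for the example, not taken from
the actual machine.

```python
import os

# Assumed "normal" library directories; adjust for the system being checked.
SYSTEM_LIB_DIRS = ("/lib", "/usr/lib")

def parse_ldd(output):
    """Return (name, path) pairs from the text printed by `ldd <binary>`."""
    libs = []
    for line in output.splitlines():
        if "=>" not in line:
            continue  # lines without a mapping carry nothing to check
        name, _, rest = line.partition("=>")
        fields = rest.split()
        # "not found" (or an empty target) has no usable path.
        path = fields[0] if fields and fields[0].startswith("/") else None
        libs.append((name.strip(), path))
    return libs

def in_allowed_dir(path, allowed):
    """True if path lives in, or below, one of the allowed directories."""
    directory = os.path.dirname(path)
    return any(directory == d or directory.startswith(d + "/") for d in allowed)

def unexpected_libs(output, allowed=SYSTEM_LIB_DIRS):
    """List libraries that are unresolved or resolve outside `allowed`."""
    return [
        (name, path)
        for name, path in parse_ldd(output)
        if path is None or not in_allowed_dir(path, allowed)
    ]

# Hypothetical ldd output, for illustration only.
SAMPLE = """\
    libpthread.so.0 => /opt/informix/lib/libpthread.so.0 (0x40018000)
    libz.so.1 => /usr/lib/libz.so.1 (0x4002d000)
    libc.so.6 => /lib/libc.so.6 (0x40100000)
    /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
"""

if __name__ == "__main__":
    for name, path in unexpected_libs(SAMPLE):
        print(name, "->", path)
```

Feeding it the real output of ldd on the mysqld binary would list anything
picked up from a non-system directory, which is usually a sign of a stray
library path setting at build or run time.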



> Have you been able to see the problem when you load only the smaller tables?
> What I am trying to see here is if loading the large table causes disk
> corruption or does something savage and unnatural with the index cache in
> memory.  If the problems occur with small tables, then the load size may not
> have anything to do with it.  Are the rows very long?

It seems to happen on any of the tables, but it is more likely on the larger ones.

header == 77 fields
summary == 56 fields
site == 4 fields

[bill@zeus /var/flashdisk/3p0/mysql/pcm_test]$ ls -l *200109*
-rw-rw----    1 mysql    mysql      4412220 Oct 17 22:28 pcm_test_header_200109.MYD
-rw-rw----    1 mysql    mysql      1585152 Oct 18 10:06 pcm_test_header_200109.MYI
-rw-rw----    1 mysql    mysql        11474 Oct 17 15:36 pcm_test_header_200109.frm
-rw-rw----    1 mysql    mysql    108725490 Oct 17 22:28 pcm_test_site_200109.MYD
-rw-rw----    1 mysql    mysql     96129024 Oct 18 10:08 pcm_test_site_200109.MYI
-rw-rw----    1 mysql    mysql         8654 Oct 17 15:36 pcm_test_site_200109.frm
-rw-rw----    1 mysql    mysql      1193493 Oct 17 22:28 pcm_test_site_coor_200109.MYD
-rw-rw----    1 mysql    mysql       583680 Oct 18 10:08 pcm_test_site_coor_200109.MYI
-rw-rw----    1 mysql    mysql         8718 Oct 17 15:36 pcm_test_site_coor_200109.frm
-rw-rw----    1 mysql    mysql    240358118 Oct 17 22:28 pcm_test_summary_200109.MYD
-rw-rw----    1 mysql    mysql     48247808 Oct 18 10:08 pcm_test_summary_200109.MYI
-rw-rw----    1 mysql    mysql        10848 Oct 17 15:36 pcm_test_summary_200109.frm


Well, the first order of business will be to remove the "flush tables" to ensure
that I still really get corruption.  Then I will fix my stupid pthread problem
and see what the result of that is.  Then I will upgrade (regardless) my DBI and
related modules.

This will probably take a couple of days....


--Bill


