Dear MySQL experts!

I'm at something of a loss here. I'm testing MySQL on a new hardware platform. Previously, we had it running on Tru64 Alpha boxes. We're now moving it onto Itanium2 boxes running Debian Linux. Each machine has 4 CPUs and 16 GB RAM. Kernel version is 2.6.5.

We couldn't get the MySQL binary distribution to run at all; it dumped core immediately with SEGV. We compiled it ourselves using the Intel compiler, and got the same result. I then compiled it with gcc, and we have a binary that does at least run without crashing instantly, and appears to work correctly.

The instance is replicated from one Itanium2 machine to a second identical machine.

The clients of these databases are computational jobs running on a cluster of approximately 1000 x86 Linux boxes. The jobs query the database for the data they are to work on, and upload their results to it when they finish. They also update a status table in the database as they work, so that a master control script can periodically poll the database and, among other things, resubmit jobs that have failed.

This setup works fine with MySQL 4.0.18 running on AlphaServer ES45 machines. But on the Itanium2 Linux machines, the vast majority of clients are seeing aborted connections:

ia64c> show status like 'Aborted_%';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| Aborted_clients  | 2177  |
| Aborted_connects | 0     |
+------------------+-------+
2 rows in set (0.00 sec)

Looking at:

http://dev.mysql.com/doc/mysql/en/Communication_errors.html

for possible reasons, I see the usual suspects of timeout variables, but those are fine on this instance:

ia64c> show global variables like '%_timeout';
+--------------------------+---------+
| Variable_name            | Value   |
+--------------------------+---------+
| connect_timeout          | 5       |
| delayed_insert_timeout   | 300     |
| innodb_lock_wait_timeout | 50      |
| interactive_timeout      | 2678200 |
| net_read_timeout         | 30      |
| net_write_timeout        | 60      |
| slave_net_timeout        | 3600    |
| wait_timeout             | 2678200 |
+--------------------------+---------+
8 rows in set (0.00 sec)

These are the same settings we use on the Alphas, where they work fine.
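
Rather than comparing the two instances by eye, the outputs can be diffed mechanically. What follows is only a minimal sketch in Python, assuming the table text has been pasted from the mysql client; `parse_mysql_table` and `diff_settings` are hypothetical helper names, not part of any MySQL tool. The table is the Itanium2 one shown above, and the second set of values is a made-up stand-in to show what a mismatch would look like.

```python
def parse_mysql_table(text):
    """Parse the two-column ASCII table printed by the mysql client into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, prompts, and the row-count footer
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2 or cells[0] == "Variable_name":
            continue  # skip the header row and anything unexpected
        settings[cells[0]] = cells[1]
    return settings


def diff_settings(a, b):
    """Return {name: (value_in_a, value_in_b)} for every variable that differs."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Output of "show global variables like '%_timeout'" on the Itanium2 instance.
ITANIUM_OUTPUT = """
+--------------------------+---------+
| Variable_name            | Value   |
+--------------------------+---------+
| connect_timeout          | 5       |
| delayed_insert_timeout   | 300     |
| innodb_lock_wait_timeout | 50      |
| interactive_timeout      | 2678200 |
| net_read_timeout         | 30      |
| net_write_timeout        | 60      |
| slave_net_timeout        | 3600    |
| wait_timeout             | 2678200 |
+--------------------------+---------+
8 rows in set (0.00 sec)
"""

itanium = parse_mysql_table(ITANIUM_OUTPUT)

# Hypothetical second instance: one value changed purely for illustration.
other = dict(itanium, net_read_timeout="60")

print(diff_settings(itanium, itanium))  # -> {}
print(diff_settings(itanium, other))    # -> {'net_read_timeout': ('30', '60')}
```

In practice the second dict would come from parsing the same command's output on the Alpha instance; an empty result confirms the settings really are identical.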

The other possibility is max_allowed_packet, but we've got that set quite large (certainly large enough for these queries):

ia64c> show global variables like '%_packet';
+--------------------+----------+
| Variable_name      | Value    |
+--------------------+----------+
| max_allowed_packet | 16776192 |
+--------------------+----------+
1 row in set (0.00 sec)

So I don't think it's any of these settings.

As for the Linux-specific problems mentioned on that page:

1) We don't think it's an Ethernet duplex mismatch; these are gigabit Ethernet links.

2) TCP/IP seems to be correctly configured in all other respects.

3) The switches are all fine, as far as we know.

The only base I can't cover is the statement:

"Some problem with the thread library that causes interrupts on reads."

Are there known problems of this sort on certain Linux versions? Is there any code around to test whether this machine has this problem?
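
To make the question concrete, the kind of check meant here could look roughly like the following. It is only a minimal sketch in Python, not a known diagnostic: it starts a threaded echo server on the loopback interface, has many client threads do blocking reads against it, and records any read that fails or comes back short. All names (`echo_server`, `probe`, `run_probe`) are hypothetical. It exercises Python's threads on top of the system thread library rather than mysqld itself, so a clean run would not rule the problem out.

```python
import socket
import threading


def echo_server(listener):
    """Accept connections and echo each one back on its own thread."""
    def serve(conn):
        with conn:
            while True:
                data = conn.recv(4096)
                if not data:
                    return  # client closed the connection
                conn.sendall(data)

    while True:
        try:
            conn, _ = listener.accept()
        except OSError:
            return  # listener was closed
        threading.Thread(target=serve, args=(conn,), daemon=True).start()


def probe(host, port, rounds, payload, errors):
    """Send payload `rounds` times and record any failed or mismatched read."""
    try:
        with socket.create_connection((host, port), timeout=10) as sock:
            for _ in range(rounds):
                sock.sendall(payload)
                received = b""
                while len(received) < len(payload):
                    chunk = sock.recv(len(payload) - len(received))
                    if not chunk:
                        raise ConnectionError("peer closed mid-read")
                    received += chunk
                if received != payload:
                    raise ValueError("echoed data does not match")
    except Exception as exc:
        errors.append(repr(exc))


def run_probe(clients=50, rounds=200, payload=b"x" * 1024):
    """Run `clients` concurrent probes over loopback; return a list of errors."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(128)
    host, port = listener.getsockname()
    threading.Thread(target=echo_server, args=(listener,), daemon=True).start()

    errors = []
    threads = [
        threading.Thread(target=probe, args=(host, port, rounds, payload, errors))
        for _ in range(clients)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    listener.close()
    return errors


if __name__ == "__main__":
    problems = run_probe()
    print("failed clients:", len(problems))
    for problem in problems:
        print(" ", problem)
```

Any non-empty error list on the Itanium2 box, with an empty one on a known-good machine, would at least suggest the fault lies below MySQL.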

Many thanks for any help you gurus can offer...

Regards,

Tim


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]


