Just for others to know, it was the memory problem. I re-set the memory
parameters for ndbmtd (two nodes) to minimum. Then I could run the backup
successfully.

Thanks
BA


On Wed, Oct 24, 2012 at 11:57 AM, Bheemsen Aitha <pgb...@motorola.com>wrote:

> Hi,
>
> After following the steps at the following website, I tried to do an
> online backup of the cluster.
>
>
> http://dev.mysql.com/doc/refman/5.0/en/mysql-cluster-backup-using-management-client.html
>
> It is a plain vanilla command which is below. The cluster is almost an
> empty database, but backup is crashing at least one data node and was never
> successful.
>
> ndb_mgm -e "START BACKUP WAIT COMPLETED"
>
> Did anyone have this kind of error before? I  tried searching on web but
> could not find a solution.
>
> Here is the error I received.
>
> Connected to Management Server at: localhost:1186
> Waiting for completed, this may take several minutes
> Backup failed
> *  3001: Could not start backup
> *        Backup aborted due to node failure: Permanent error: Internal
> error
>
> ALERT    -- Node 2: Backup 2 started from 49 has been aborted. Error: 1326
>
>
> Here is little background about our setup.
>
> OS: Redhat Linux 5.8
> Cluster: MySQL 5.5, NDB 7.2.7
> Cluster was installed and set up on two hosts using MCM, one host hosting
> mysqld, ndb_mgmd and the other hosting
> ndbmtd1 and ndbmtd2.
>
> I even tried by setting up the following parameters, but got the same
> error again.
>
>
> set BackupDataBufferSize:ndbmtd=256M attcluster;
> set BackupLogBufferSize:ndbmtd=256M attcluster;
> set BackupMemory:ndbmtd=512M attcluster;
> set BackupWriteSize:ndbmtd=32M attcluster;
> set BackupMaxWriteSize:ndbmtd=128M attcluster;
>
> Here are some links I found on web similar to my error.
>
>
> http://forums.mysql.com/read.php?25,230891,230959#msg-230959
> http://grokbase.com/t/mysql/cluster/0578z8cj71/backup-error
> http://bugs.mysql.com/bug.php?id=66104
>
>
> mcm>  show status -r attcluster;
> +--------+----------+-----------------+---------+-----------+---------+
> | NodeId | Process  | Host            | Status  | Nodegroup | Package |
> +--------+----------+-----------------+---------+-----------+---------+
> | 49     | ndb_mgmd | ut06sandboxdb01 | running |           | 7.2.7   |
> | 50     | mysqld   | ut06sandboxdb01 | running |           | 7.2.7   |
> | 1      | ndbmtd   | ut06sandboxdb02 | failed  | 0         | 7.2.7   |
> | 2      | ndbmtd   | ut06sandboxdb02 | running | 0         | 7.2.7   |
> +--------+----------+-----------------+---------+-----------+---------+
> 4 rows in set (0.07 sec)
>
> mcm>
>
> I see the core dump in DataDir of node 1.
>
> [root@ut06sandboxdb02 data]# ls -ltr
> /opt/app/mcm-1.1.6_64-linux-rhel5-x86/mcm_data/clusters/attcluster/1/data
> total 16949760
> -rw-rw-r-- 1 mysql mysql          0 Oct 19 12:23 ndb_1_out.err
> -rw-rw-r-- 1 mysql mysql          1 Oct 21 04:02 ndb_1_trace.log.next
> -rw-rw-r-- 1 mysql mysql        568 Oct 21 04:02 ndb_1_error.log
> -rw-rw-r-- 1 mysql mysql      12202 Oct 21 04:02 ndb_1_trace.log.1_t4
> -rw-rw-r-- 1 mysql mysql     923467 Oct 21 04:02 ndb_1_trace.log.1_t3
> -rw-rw-r-- 1 mysql mysql     923489 Oct 21 04:02 ndb_1_trace.log.1_t2
> -rw-rw-r-- 1 mysql mysql     934663 Oct 21 04:02 ndb_1_trace.log.1_t1
> -rw-rw-r-- 1 mysql mysql     948989 Oct 21 04:02 ndb_1_trace.log.1
> -rw------- 1 mysql mysql 4104044544 Oct 23 11:04 core.21529
> -rw------- 1 mysql mysql 5880332288 Oct 23 18:22 core.8108
> -rw------- 1 mysql mysql 4538155008 Oct 23 23:56 core.1124
> -rw------- 1 mysql mysql 2924789760 Oct 24 00:32 core.9176
> -rw-rw-r-- 1 mysql mysql     460826 Oct 24 00:33 ndb_1_out.log
>
> Here is more info from mcmd.log. I really appreciate any help on this.
>
>
> 2012-10-23 18:09:58.193: (message) [T0x19add970 RECFG]:
> [1,ndb_mgmd,0]: 2012-10-23 18:09:58 [MgmtSrvr] WARNING  -- Node 2: Failed
> to memlock pages, error: 12 (Cannot allocate memory)
> [1,ndb_mgmd,0]: 2012-10-23 18:09:58 [MgmtSrvr] INFO     -- Node 2: Waiting
> 30 sec for nodes 1 to connect, nodes [ all: 1 and 2 connected: 2 no-wait:  ]
>
> 2012-10-23 18:09:58.193: (message) [T0x19add970 RECFG]:
> [1,ndb_mgmd,0]: 2012-10-23 18:09:58 [MgmtSrvr] INFO     -- Node 2: Waiting
> 30 sec for nodes 1 to connect, nodes [ all: 1 and 2 connected: 2 no-wait:  ]
>
> 2012-10-23 18:09:58.286: (message) [T0x19add970 RECFG]:
> [1,ndb_mgmd,0]: 2012-10-23 18:09:58 [MgmtSrvr] INFO     -- Node 2:
> Communication to Node 1 opened
> [1,ndb_mgmd,0]: 2012-10-23 18:09:58 [MgmtSrvr] INFO     -- Node 2: Node 1
> Connected
>
> 2012-10-23 18:09:58.347: (message) last message repeated 1 times
> 2012-10-23 18:09:58.347: (message) [T0x19add970 RECFG]:
> [1,ndb_mgmd,0]: 2012-10-23 18:09:58 [MgmtSrvr] INFO     -- Node 1: Node 2
> Connected
> [1,ndb_mgmd,0]: 2012-10-23 18:09:58 [MgmtSrvr] INFO     -- Node 1: Node 2:
> API mysql-5.5.25 ndb-7.2.7
> [1,ndb_mgmd,0]: 2012-10-23 18:09:58 [MgmtSrvr] INFO     -- Node 2:
> CM_REGCONF president = 1, own Node = 2, our dynamic id = 0/13
> [1,ndb_mgmd,0]: 2012-10-23 18:09:58 [MgmtSrvr] INFO     -- Node 2: Node 1:
> API mysql-5.5.25 ndb-7.2.7
> [1,ndb_mgmd,0]: 2012-10-23 18:09:58 [MgmtSrvr] INFO     -- Node 2: Start
> phase 1 completed
>
> 2012-10-23 18:09:58.437: (message) last message repeated 1 times
> 2012-10-23 18:09:58.437: (message) [T0x19add970 RECFG]:
> [1,ndb_mgmd,0]: 2012-10-23 18:09:58 [MgmtSrvr] INFO     -- Node 2: Start
> phase 2 completed (node restart)
>
> 2012-10-23 18:09:58.467: (message) last message repeated 1 times
> 2012-10-23 18:09:58.467: (message) [T0x19add970 RECFG]:
> [1,ndb_mgmd,0]: 2012-10-23 18:09:58 [MgmtSrvr] INFO     -- Node 1: DICT:
> locked by node 2 for NodeRestart
> [1,ndb_mgmd,0]: 2012-10-23 18:09:58 [MgmtSrvr] INFO     -- Node 2: Start
> phase 3 completed (node restart)
> [1,ndb_mgmd,0]: 2012-10-23 18:09:58 [MgmtSrvr] INFO     -- Node 2: Start
> phase 4 completed (node restart)
>
> 2012-10-23 18:09:59.029: (message) last message repeated 1 times
> 2012-10-23 18:09:59.029: (message) [T0x19add970 RECFG]:
> [1,ndb_mgmd,0]: 2012-10-23 18:09:59 [MgmtSrvr] INFO     -- Node 2: Receive
> arbitrator node 49 [ticket=1fac001848b1b289]
>
> 2012-10-23 18:09:59.711: (message) last message repeated 1 times
> 2012-10-23 18:09:59.711: (message) [T0x19add970 RECFG]:
> [1,ndb_mgmd,0]: 2012-10-23 18:09:59 [MgmtSrvr] INFO     -- Node 2:
> Starting to restore schema
>
> 2012-10-23 18:10:00.185: (message) last message repeated 1 times
> 2012-10-23 18:10:00.185: (message) [T0x19add970 RECFG]:
> [1,ndb_mgmd,0]: 2012-10-23 18:10:00 [MgmtSrvr] INFO     -- Node 2: Restore
> of schema complete
>
> 2012-10-23 18:10:00.216: (message) last message repeated 1 times
> 2012-10-23 18:10:00.216: (message) [T0x19add970 RECFG]:
> [1,ndb_mgmd,0]: 2012-10-23 18:10:00 [MgmtSrvr] INFO     -- Node 1: Node
> restart completed copy of dictionary information
> [1,ndb_mgmd,0]: 2012-10-23 18:10:00 [MgmtSrvr] INFO     -- Node 2: DICT:
> activate index 6 done (sys/def/5/ndb_index_stat_sample_x1)
>
> 2012-10-23 18:10:00.227: (message) last message repeated 1 times
> 2012-10-23 18:10:00.227: (message) [T0x19add970 RECFG]:
> [1,ndb_mgmd,0]: 2012-10-23 18:10:00 [MgmtSrvr] INFO     -- Node 2: GCP
> Monitor: unlimited lags allowed
> [1,ndb_mgmd,0]: 2012-10-23 18:10:00 [MgmtSrvr] INFO     -- Node 2: Node
> restart starting to copy the fragments to Node 2
> [1,ndb_mgmd,0]: 2012-10-23 18:10:00 [MgmtSrvr] INFO     -- Node 2: Node: 2
> StartLog: [GCI Keep: 124385 LastCompleted: 124437 NewestRestorable: 124454]
>
> 2012-10-23 18:10:02.187: (message) last message repeated 1 times
> 2012-10-23 18:10:02.187: (message) [T0x19add970 RECFG]:
> [1,ndb_mgmd,0]: 2012-10-23 18:10:02 [MgmtSrvr] INFO     -- Node 2: LQH:
> Starting to rebuild ordered indexes
> [1,ndb_mgmd,0]: 2012-10-23 18:10:02 [MgmtSrvr] INFO     -- Node 2: LQH:
> index 6 rebuild done
> [1,ndb_mgmd,0]: 2012-10-23 18:10:02 [MgmtSrvr] INFO     -- Node 2: LQH:
> Rebuild ordered indexes complete
>
> 2012-10-23 18:10:02.296: (message) last message repeated 3 times
> 2012-10-23 18:10:02.296: (message) [T0x19add970 RECFG]:
> [1,ndb_mgmd,0]: 2012-10-23 18:10:02 [MgmtSrvr] INFO     -- Node 2: Node
> restart completed copying the fragments to Node 2
>
> 2012-10-23 18:10:03.431: (message) last message repeated 1 times
> 2012-10-23 18:10:03.431: (message) [T0x19add970 RECFG]:
> [1,ndb_mgmd,0]: 2012-10-23 18:10:03 [MgmtSrvr] INFO     -- Node 1: Local
> checkpoint 87 started. Keep GCI = 124428 oldest restorable GCI = 124426
>
>
>
>
> Thanks
>

Reply via email to