[Lustre-discuss] problems restoring from MDT backup (test file system)
Hi List,

on my test file system I'm currently trying to verify once more that I can restore the MDT from my backups, but I'm running into some problems. The MDS is running RHEL 5.3 and Lustre 1.6.7.2.

I use the following procedure to back up the MDT (sketched as a script after this message):

- create an LVM snapshot of the MDT device
- mount the LVM snapshot as ldiskfs
- extract the EAs with getfattr
- tar up the whole MDT tree from ldiskfs using
  '/bin/tar czSf /tmp/${BACKUP_FILE} --acls --numeric-owner .'

In previous tests the restore appeared to work fine, even though I was not quite sure about the ACLs as I had not recorded them before destroying the MDT at that time.

This time I have some problems with the tar file when I want to verify the list of files before destroying the MDT:

<snip>
tar tizf test_MDT_Backup.tar.gz
./ROOT/tmp/frederik/cs04r-sc-com02-04/
./ROOT/tmp/frederik/cs04r-sc-com02-04/iozone.DUMMY.47
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
</snip>

Doing the same with older backup files, or with backups from our production file system, doesn't show this error, but on the current test file system I can reproduce it easily with any new backup file that I create. The list of files that I see when creating a new tar file (adding -v to the tar options) from the test file system does include many files after the last one in the output above, but the new file has exactly the same problem at the same place.

Has anyone seen something like this before? What could we try to recover the data from the old backup? Or is this most likely impossible? Could this indicate a problem on the file system? I've not tried to run fsck on the MDT, as I'd like to extract the files from the tar files, if possible, as an exercise independent of fixing the existing file system.

Any suggestions are welcome.

Frederik
--
Frederik Ferner
Computer Systems Administrator        phone: +44 1235 77 8624
Diamond Light Source Ltd.             mob:   +44 7917 08 5110
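A minimal sketch of the backup procedure Frederik describes, for reference. The volume group and LV names (vg_mds/lv_mdt), the snapshot mount point /mnt/mdt_snap and the EA dump file /tmp/ea.bak are hypothetical placeholders; only the tar invocation is taken verbatim from the message.

  #!/bin/bash
  set -e
  BACKUP_FILE=test_MDT_Backup.tar.gz

  # 1. create an LVM snapshot of the MDT device (names are placeholders)
  lvcreate -L 10G -s -n mdt_snap /dev/vg_mds/lv_mdt

  # 2. mount the snapshot as ldiskfs, read-only is enough for a backup
  mkdir -p /mnt/mdt_snap
  mount -t ldiskfs -o ro /dev/vg_mds/mdt_snap /mnt/mdt_snap

  # 3. save the extended attributes (the striping information lives in EAs)
  cd /mnt/mdt_snap
  getfattr -R -d -m '.*' -e hex -P . > /tmp/ea.bak

  # 4. tar up the whole MDT tree, keeping sparse files, ACLs and numeric owners
  /bin/tar czSf /tmp/${BACKUP_FILE} --acls --numeric-owner .

  # 5. clean up
  cd /
  umount /mnt/mdt_snap
  lvremove -f /dev/vg_mds/mdt_snap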
Re: [Lustre-discuss] problems restoring from MDT backup (test file system)
On Thu, 2010-03-04 at 11:21 +0000, Frederik Ferner wrote:
> tar tizf test_MDT_Backup.tar.gz
> ./ROOT/tmp/frederik/cs04r-sc-com02-04/
> ./ROOT/tmp/frederik/cs04r-sc-com02-04/iozone.DUMMY.47
> tar: Unexpected EOF in archive
> tar: Error is not recoverable: exiting now
> </snip>

Looks to me like either your tar executable is broken or your archive is broken. A typical process of elimination should help you discover which is the case.

b.
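One way to run that process of elimination, sketched below. The archive name matches the one in the thread; the alternate tar path /usr/local/bin/tar is just a placeholder for any independently built tar binary.

  # Check the gzip layer first: if gzip itself reports a truncated or corrupt
  # stream, the problem is in the archive file, not in tar.
  gzip -t test_MDT_Backup.tar.gz && echo "gzip layer OK"

  # Compare checksums of the same archive on two machines to rule out a copy
  # or storage problem.
  md5sum test_MDT_Backup.tar.gz

  # List the archive with a different tar build (path is a placeholder);
  # if an independent tar fails at the same point, the archive itself is broken.
  /usr/local/bin/tar tzf test_MDT_Backup.tar.gz > /dev/null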
Re: [Lustre-discuss] LUN reassignment DDN and OSS
Hi Syed,

If I'm not mistaken, Lustre uses a 4k block size anyway, so I don't think there is even a space penalty for going to 4k blocks. Given the physical 512b sector size of the disks, 4k blocks also mean that the S2A never has to do any unnecessary reads to complete a write. So all around, 4k is the way to go.

You might want to rezone the LUNs just to make sure, and also try reloading the FC driver or rebooting so that the HBA rescans the bus (a sysfs rescan sketch follows this message).

Thanks,
Kit

syed haider wrote:
> All,
>
> I recently raised a question about unbalanced OSTs and received the right answer - increase the size of the OSTs. So I set forth to do this on our DDN controllers, and rather than having 32 1TB LUNs I decided to go with 4 8TB LUNs instead. In doing this I learned our LUNs were created with the default 512 block size, and from reading the manual it appears it would improve performance for our workload to go with 4096. Since most jobs run on Lustre create larger (sequential) files, we're not concerned about losing space from smaller files taking up a 4k block. Is there any other concern I should have about going with the larger block size?
>
> Second question for the DDN experts: we have 4 OSSs connected to dual DDN 9550 controllers via fibre. With the older configuration we had one 1TB LUN per tier, so the lun output looked something like this:
>
> Logical Unit Status
>
>                                Capacity  Block
>  LUN  Label   Owner  Status    (Mbytes)  Size   Tiers  Tier list
>  ---------------------------------------------------------------
>   0   lun 0     1    Ready     1120098    512     1    1
>   1   lun 1     1    Ready     1120098    512     1    2
>   2   lun 2     1    Ready     1120098    512     1    3
>   3   lun 3     1    Ready     1120098    512     1    4
>   4   lun 4     1    Ready     1120098    512     1    5
>   5   lun 5     1    Ready     1120098    512     1    6
>   6   lun 6     1    Ready     1120098    512     1    7
>   7   lun 7     1    Ready     1120098    512     1    8
>   8   lun 8     2    Ready     1120098    512     1    9
>   9   lun 9     2    Ready     1120098    512     1    10
>
> After making the change to only 4 LUNs, two of my OSSs don't see any SCSI devices when I run fdisk. By deleting and recreating the LUNs, did I somehow cause a zoning issue?
>
> Thanks in advance.
>
> Syed

--
---
Kit Westneat
kwestn...@datadirectnet.com
812-484-8485
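Forcing the HBAs to rescan without a full reboot can usually be done through sysfs; a rough sketch follows. The host entries and the qla2xxx/lpfc driver names are assumptions that depend on the FC hardware actually installed in the OSSs.

  # Ask every FC host adapter to issue a LIP and rescan its targets.
  for host in /sys/class/fc_host/host*; do
      echo 1 > "$host/issue_lip"
  done
  for host in /sys/class/scsi_host/host*; do
      echo "- - -" > "$host/scan"
  done

  # If the LUNs still do not appear, reload the FC driver
  # (qla2xxx shown as an example; use lpfc for Emulex HBAs).
  modprobe -r qla2xxx && modprobe qla2xxx

  # Check what the OSS now sees.
  lsscsi || cat /proc/scsi/scsi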
Re: [Lustre-discuss] problems restoring from MDT backup (test file system)
Brian,

thanks for your reply.

Brian J. Murrell wrote:
> On Thu, 2010-03-04 at 11:21 +0000, Frederik Ferner wrote:
>> tar tizf test_MDT_Backup.tar.gz
>> ./ROOT/tmp/frederik/cs04r-sc-com02-04/
>> ./ROOT/tmp/frederik/cs04r-sc-com02-04/iozone.DUMMY.47
>> tar: Unexpected EOF in archive
>> tar: Error is not recoverable: exiting now
>> </snip>
>
> Looks to me like either your tar executable is broken or your archive is broken. A typical process of elimination should help you discover which is the case.

It certainly looks like it's the tar archive that is broken; I get the same error when I copy it over to a different machine. Unless it is the tar executable that is broken in such a way that it creates broken archives, as every new archive I create seems to be broken at the same place. Other tar files created on the same machine don't have that problem, but I'll try creating a new archive with a new executable.

Thanks,
Frederik
--
Frederik Ferner
Computer Systems Administrator        phone: +44 1235 77 8624
Diamond Light Source Ltd.             mob:   +44 7917 08 5110
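For completeness, the restore side of the procedure discussed in this thread looks roughly like the sketch below, following the file-level MDT restore steps described in the Lustre manual of that era. The device and mount point names are hypothetical, and /tmp/ea.bak matches the EA dump assumed in the backup sketch earlier in the thread.

  # Mount the (newly formatted or repaired) MDT as ldiskfs.
  mount -t ldiskfs /dev/vg_mds/lv_mdt /mnt/mdt

  # Unpack the archive, preserving permissions, ACLs and sparse files.
  cd /mnt/mdt
  tar xzpSf /tmp/test_MDT_Backup.tar.gz --acls --numeric-owner

  # Restore the extended attributes saved at backup time
  # (they carry the per-file striping/LOV information).
  setfattr --restore=/tmp/ea.bak

  # Remove stale llog state before mounting as Lustre again.
  rm -f OBJECTS/* CATALOGS

  cd /
  umount /mnt/mdt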
Re: [Lustre-discuss] One or two OSS, no difference?
Hi Oleg,

I just noticed that the sequential performance is OK, but the random I/O (which is what I am measuring) is not. Is there any way to increase random I/O performance on Lustre? We have LUNs that can provide around 250,000 random 4kB read IOPS, but we are only seeing 3,000 to 10,000 on Lustre.

jab

-----Original Message-----
From: oleg.dro...@sun.com [mailto:oleg.dro...@sun.com]
Sent: Thursday, March 04, 2010 12:49 PM
To: Jeffrey Bennett
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] One or two OSS, no difference?

Hello!

This is pretty strange. Are there any differences in network topology that could explain this? If you remove the first client, does the second one show performance at the level of the first, but as soon as you start the load on the first again, the second client's performance drops?

Bye,
Oleg

On Mar 4, 2010, at 1:45 PM, Jeffrey Bennett wrote:

> Hi Oleg, thanks for your reply.
>
> I was actually testing with only one client. When adding a second client using a different file, one client gets all the performance and the other one gets very low performance. Any recommendation?
>
> Thanks in advance,
> jab
>
> -----Original Message-----
> From: oleg.dro...@sun.com [mailto:oleg.dro...@sun.com]
> Sent: Wednesday, March 03, 2010 5:20 PM
> To: Jeffrey Bennett
> Cc: lustre-discuss@lists.lustre.org
> Subject: Re: [Lustre-discuss] One or two OSS, no difference?
>
> Hello!
>
> On Mar 3, 2010, at 6:35 PM, Jeffrey Bennett wrote:
>> We are building a very small Lustre cluster with 32 clients (patchless) and two OSS servers. Each OSS server has 1 OST with 1 TB of solid state drives. Everything is connected using dual-port DDR IB. For testing purposes, I am enabling/disabling one of the OSS/OSTs by using the lfs setstripe command. I am running XDD and vdbench benchmarks. Does anybody have an idea why there is no difference in MB/s or random IOPS when using one OSS or two? A quick test with dd also shows the same MB/s when using one or two OSTs.
>
> I wonder if you just don't saturate even one OST (both the backend SSD and the IB interconnect) with this number of clients? Does the total throughput decrease as you reduce the number of active clients and increase as you increase it even further? Increasing the maximum number of in-flight RPCs might help in that case. Also, are all of your clients writing to the same file, or does each client do I/O to a separate file (I hope)?
>
> Bye,
> Oleg
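Oleg's suggestion about in-flight RPCs maps onto client-side tunables that can be queried and set with lctl. A hedged sketch follows; the values are purely illustrative rather than recommendations, and the exact parameter set varies between Lustre versions.

  # On a client: see the current RPC concurrency per OST.
  lctl get_param osc.*.max_rpcs_in_flight

  # Allow more concurrent RPCs per OST (the long-standing default was 8;
  # 32 is an illustrative value, not a recommendation).
  lctl set_param osc.*.max_rpcs_in_flight=32

  # More dirty cache per OSC can help keep the pipe full for writes.
  lctl set_param osc.*.max_dirty_mb=64

  # For small random reads, aggressive read-ahead can hurt; check the setting.
  lctl get_param llite.*.max_read_ahead_mb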
[Lustre-discuss] ext3 and over 8Tb OSTs
Bug 17569 states that the ldiskfs.ko built in release 1.8.1 will not allow >8 TB OSTs to be mounted because of possible corruption. This was addressed by adding a force_over_8tb option.

Is anyone out there running >8 TB OSTs on ext3? Any problems? What type of corruption are they talking about?
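For context, force_over_8tb is applied as an ldiskfs mount option. Below is a rough sketch of how it might be passed, either by hand at mount time or persistently via tunefs.lustre; the device names are placeholders, and the release notes for your exact Lustre version should be checked before relying on this.

  # One-off: pass the option when mounting the OST by hand.
  mount -t lustre -o force_over_8tb /dev/sdb /mnt/ost0

  # Persistent: store it in the mount options used at normal startup.
  # Note that --mountfsoptions replaces the defaults, so keep the ones you need.
  tunefs.lustre --mountfsoptions="errors=remount-ro,force_over_8tb" /dev/sdb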