[Lustre-discuss] how to add force_over_8tb to MDS
I configured a Lustre file system on a collection of storage servers that have 12TB raw devices. I configured a combined MGS/MDS with the default configuration; on the OSTs, however, I added force_over_8tb to the mountfsoptions. Two-part question: 1- do I need to set that parameter on the MGS/MDS server as well? 2- if yes, how do I properly add this parameter on this running Lustre file system (100TB on 9 storage servers)? I can't resolve the ambiguity in the documentation, as I can't find a good explanation of the configuration log mechanism that is referenced in the man pages. Given that the doc for --writeconf states "This is very dangerous", I am hesitant to pull the trigger: there is 60TB of data on this file system that I'd rather not lose.
___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] how to add force_over_8tb to MDS
On Jul 14, 2011, at 1:15 PM, Theodore Omtzigt wrote:
> Two part question: 1- do I need to set that parameter on the MGS/MDS server as well
No, they are different filesystems. You shouldn't need to do this on the OSTs either. You must be using an older Lustre release.
> 2- if yes, how do I properly add this parameter on this running Lustre file system (100TB on 9 storage servers)
Covered above.
> I can't resolve the ambiguity in the documentation [...] there is 60TB of data on this file system that I rather not lose.
I've had no issues with writeconf. It's nice because it shows you the old and new parameters. Make sure that the changes you made are what you want, and that the old parameters you want to keep are still intact. I don't remember the exact circumstances, but I've found settings were lost when doing a writeconf, and I had to explicitly put those settings in the tunefs.lustre command to preserve them. -mb -- +--- | Michael Barnes | | Thomas Jefferson National Accelerator Facility | Scientific Computing Group | 12000 Jefferson Ave. | Newport News, VA 23606 | (757) 269-7634 +---
Re: [Lustre-discuss] how to add force_over_8tb to MDS
If you are seeing this problem it means you are using the ext3-based ldiskfs. Go back to the download site and get the lustre-ldiskfs and lustre-modules RPMs with ext4 in the name. That is the code that was tested with LUNs over 8TB. We kept these separate for some time to reduce risk for users that did not need larger LUN sizes. This is the default for the recent Whamcloud 1.8.6 release. Cheers, Andreas
On 2011-07-14, at 11:15 AM, Theodore Omtzigt t...@stillwater-sc.com wrote:
> [...]
[Lustre-discuss] potential issue with data corruption
Hi, We are seeing a problem where some running jobs attempted to copy a file from local disk on a worker node to a Lustre file system. 14 of those files ended up empty or truncated. We have 7 OSSs with either 6 or 12 OSTs on each. All 14 files ended up on an OST on one of the two systems that have 12 OSTs; there are 12 different OSTs involved. So if I look at the messages file on one of those OSSs and specifically look for messages related to one of the OSTs that has a truncated or empty file, I see things like this:

Jul 7 07:10:08 cmsls6 kernel: Lustre: 15431:0:(ldlm_lib.c:575:target_handle_reconnect()) cmsprod1-OST002d: c03badd9-c242-1507-6824-3a9648c8b21f reconnecting
Jul 7 07:59:42 cmsls6 kernel: Lustre: 3272:0:(client.c:1463:ptlrpc_expire_one_request()) @@@ Request x1359120905647245 sent from cmsprod1-OST002d to NID 131.225.191.35@tcp 7s ago has timed out (7s prior to deadline).
Jul 7 07:59:42 cmsls6 kernel: LustreError: 138-a: cmsprod1-OST002d: A client on nid 131.225.191.35@tcp was evicted due to a lock completion callback to 131.225.191.35@tcp timed out: rc -107
Jul 7 09:26:58 cmsls6 kernel: Lustre: 15433:0:(ldlm_lib.c:575:target_handle_reconnect()) cmsprod1-OST002d: 9235f65e-ff71-2b1f-60fb-c049cbad5728 reconnecting
Jul 7 09:53:50 cmsls6 kernel: Lustre: 2663:0:(client.c:1463:ptlrpc_expire_one_request()) @@@ Request x1359120905668862 sent from cmsprod1-OST002d to NID 131.225.204.88@tcp 7s ago has timed out (7s prior to deadline).
Jul 7 09:53:50 cmsls6 kernel: LustreError: 138-a: cmsprod1-OST002d: A client on nid 131.225.204.88@tcp was evicted due to a lock blocking callback to 131.225.204.88@tcp timed out: rc -107
Jul 7 10:18:57 cmsls6 kernel: LustreError: 138-a: cmsprod1-OST002d: A client on nid 131.225.207.176@tcp was evicted due to a lock blocking callback to 131.225.207.176@tcp timed out: rc -107
Jul 7 10:23:01 cmsls6 kernel: Lustre: 15405:0:(client.c:1463:ptlrpc_expire_one_request()) @@@ Request x1359120905675944 sent from cmsprod1-OST002d to NID 131.225.204.118@tcp 7s ago has timed out (7s prior to deadline).
Jul 7 11:06:31 cmsls6 kernel: Lustre: 15341:0:(ldlm_lib.c:575:target_handle_reconnect()) cmsprod1-OST002d: e25b2761-680a-4d94-ed2c-10913403c0a3 reconnecting
Jul 7 12:26:17 cmsls6 kernel: Lustre: 15352:0:(client.c:1463:ptlrpc_expire_one_request()) @@@ Request x1359120905703492 sent from cmsprod1-OST002d to NID 131.225.190.151@tcp 7s ago has timed out (7s prior to deadline).
Jul 7 12:26:17 cmsls6 kernel: LustreError: 138-a: cmsprod1-OST002d: A client on nid 131.225.190.151@tcp was evicted due to a lock blocking callback to 131.225.190.151@tcp timed out: rc -107
Jul 7 12:26:17 cmsls6 kernel: LustreError: 15352:0:(ldlm_lockd.c:1167:ldlm_handle_enqueue()) ### lock on destroyed export 810c3926f400 ns: filter-cmsprod1-OST002d_UUID lock: 8109c7f21a00/0xf22d54118e04e04d lrc: 3/0,0 mode: --/PW res: 337742/0 rrc: 2 type: EXT [0-1048575] (req 0-1048575) flags: 0x0 remote: 0x6c03f21f59f6b4e6 expref: 19 pid: 15352 timeout 0
Jul 7 12:26:17 cmsls6 kernel: Lustre: 2740:0:(ost_handler.c:1219:ost_brw_write()) cmsprod1-OST002d: ignoring bulk IO comm error with f81d3629-7e6a-1b5d-810e-ad73d7f5c90d@NET_0x283e1be97_UUID id 12345-131.225.190.151@tcp - client will retry
Jul 7 12:26:19 cmsls6 kernel: Lustre: 2742:0:(ost_handler.c:1219:ost_brw_write()) cmsprod1-OST002d: ignoring bulk IO comm error with f81d3629-7e6a-1b5d-810e-ad73d7f5c90d@NET_0x283e1be97_UUID id 12345-131.225.190.151@tcp - client will retry

Some of these errors seem really bad, like the bulk IO comm error or the eviction due to a lock callback. What should I be looking for here? I have determined that some of the messages saying a client has been evicted because the OSS thinks it's dead are not due to the system actually being down. So what makes the OSS think the client is dead? Also, is there any way to determine what files are involved in these errors? lisa
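With this much syslog noise, it can help to first tally the evictions per client NID before digging into individual messages. A minimal sketch (hypothetical helper; the regex is only tuned to the eviction message shape quoted above):

```python
import re
from collections import Counter

# Matches server-side eviction messages of the form:
#   LustreError: 138-a: <target>: A client on nid <nid> was evicted
#   due to a lock completion|blocking callback ...
EVICT_RE = re.compile(
    r"LustreError: 138-a: (?P<target>\S+): A client on nid (?P<nid>\S+) "
    r"was evicted due to a lock (?:completion|blocking) callback"
)

def tally_evictions(lines):
    """Count evictions per (OST target, client NID) pair from syslog lines."""
    counts = Counter()
    for line in lines:
        m = EVICT_RE.search(line)
        if m:
            counts[(m.group("target"), m.group("nid"))] += 1
    return counts
```

Run over /var/log/messages on each OSS, this would show at a glance whether the evictions cluster on a few clients or are spread across the whole farm.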
Re: [Lustre-discuss] how to add force_over_8tb to MDS
Andreas: Thanks for taking a look at this. Unfortunately, I don't quite understand the guidance you present: "If you are seeing 'this' problem". I haven't seen 'any' problems pertaining to 8TB yet, so I cannot place your guidance in the context of the question I posted. My question was whether or not I need this parameter on the MDS and, if so, how to apply it retroactively. The Lustre environment I installed was the 1.8.5 set. Any insight into the issues would be appreciated. Theo
On 7/14/2011 1:41 PM, Andreas Dilger wrote:
> If you are seeing this problem it means you are using the ext3-based ldiskfs. Go back to the download site and get the lustre-ldiskfs and lustre-modules RPMs with ext4 in the name. [...]
[Lustre-discuss] Packaged kerberized VM client image Re: Migrating virtual machines over Lustre using Proxmox
Hi Paul, I wanted to express our interest in your project, as we have something similar and related. As part of the OSG ExTENCI project, we've set up a kerberized Lustre fs that uses virtual (VM) Lustre clients at remote sites. With proper network tuning/route analysis, we observe that it is possible to saturate the full IO bandwidth even for remote VM Lustre clients and obtain good IO rates. So far, we've made available a kerberized XEN VM Lustre image (ftp://ftp.psc.edu/pub/jwan/Lustre-2.1/2.0.62/vm-images/) that ExTENCI Tier3 remote sites can download and just boot up after being given the proper kerberos principals. We will also provide the kerberized images for KVM (Proxmox) and VMware. Currently, we use Lustre 2.1 (2.0.62) with 2.0.63 for clients. PSC locally runs the same setup on a separate kerberos realm. We invite collaboration with other parties who might be interested in trying packaged kerberized Lustre VM clients at their sites. Regards, josephine
On Sat, 9 Jul 2011, Paul Gray wrote: Like most of the readers on the list, my background with Lustre originates from cluster environments. But as virtualization trends seem to be here to stay, the question of using Lustre to support large-scale distributed virtualization naturally arises. Being able to leverage Lustre benefits in a VM cloud would seem to have quite a few advantages. As a test case, at UNI we extended the Proxmox Virtualization Environment to support *live* Virtual Machine migration across separate physical (bare-metal) hosts of the Proxmox virtualization cluster, supported by a distributed Lustre filesystem. If you aren't familiar with Proxmox and live migration support over Lustre, what we deployed at UNI is akin to being able to do VMware's VMotion over Lustre (without the associated license costs).
We put together two screencasts showing the prototype deployment and wanted to share the proof-of-concept results with the community:
*) A small demonstration of live migration with a small Debian VM whose root filesystem is supported over a distributed Lustre implementation can be found here: http://dragon.cs.uni.edu/flash/proxmoxlustre.html
*) A short screencast showing live migration over Lustre using the Proxmox GUI can be viewed here: http://dragon.cs.uni.edu/flash/gui-migration.html
Our immediate interests are in the performance of large (in terms of quantity), dynamic, live migrations that would leverage our high-throughput IB-based Lustre subsystem from our clusters. We'd welcome your comments, feedback, questions or requests for specific benchmarks to explore. ADVthanksANCE -- Paul Gray -o) 314 East Gym, Dept. of Computer Science /\\ University of Northern Iowa _\_V Message void if penguin violated ... Don't mess with the penguin No one says, Hey, I can't read that ASCII attachment ya sent me.
Re: [Lustre-discuss] how to add force_over_8tb to MDS
Michael: The reason I had to do it on the OSTs is that when issuing the mkfs.lustre command to build the OST, it would error out with the message that I should use the force_over_8tb mount option. I was not able to create an OST on that device without the force_over_8tb option. Your insights on the writeconf are excellent: good to know that writeconf is solid. Thank you. Theo
On 7/14/2011 1:29 PM, Michael Barnes wrote:
> [...]
Re: [Lustre-discuss] potential issue with data corruption
I am running 1.8.3 on servers and clients. lisa
On 7/14/11 12:59 PM, Lisa Giacchetti wrote:
> [...]
Re: [Lustre-discuss] how to add force_over_8tb to MDS
--writeconf will erase parameters set via lctl conf_param, and will erase pool definitions. It will also allow you to set rather silly parameters that can prevent your filesystem from starting, such as incorrect server NIDs or incorrect failover NIDs. For this reason (and from a history of customer support) we caveat its use in the manual. The --writeconf option never touches data, only server configs, so it will not mess up your data. So, given sensible precautions as mentioned above, it's safe to do. cliffw
On Thu, Jul 14, 2011 at 11:03 AM, Theodore Omtzigt t...@stillwater-sc.com wrote:
> [...]
-- cliffw Support Guy WhamCloud, Inc. www.whamcloud.com
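For the archives, the regeneration sequence that --writeconf implies follows a fixed order. A rough sketch, with placeholder device paths and mount points (the operations manual for your release is the authority on the exact steps):

```
# Sketch only -- device paths and mount points are placeholders.
# 1. Stop the filesystem completely: unmount clients, then the MDT,
#    then every OST.
umount /mnt/mdt
# 2. Regenerate the configuration logs, MDT first, then each OST:
tunefs.lustre --writeconf /dev/sdX    # on the MDS
tunefs.lustre --writeconf /dev/sdY    # on each OSS, per OST
# 3. Remount in order: MDT first, then the OSTs, then clients.
mount -t lustre /dev/sdX /mnt/mdt
```

As noted above, tunefs.lustre prints the old and new parameters, so this is the point to verify that conf_param settings and pool definitions you care about survive.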
Re: [Lustre-discuss] how to add force_over_8tb to MDS
This error message you are seeing is what Andreas was talking about: you must use the ext4-based version, and then you will not need any option with your size LUNs. The 'must use force_over_8tb' error is the key here; you most certainly want/need the *.ext4.rpm versions of things. cliffw
On Thu, Jul 14, 2011 at 11:10 AM, Theodore Omtzigt t...@stillwater-sc.com wrote:
> [...]
-- cliffw Support Guy WhamCloud, Inc. www.whamcloud.com
[Lustre-discuss] LNET o2ib networking and MTU
Just need some clarification on this: we use the o2ib driver for Lustre IB communication, and we also use IPoIB to define IP addresses for the IB interfaces on the network. Does the MTU configuration parameter impact Lustre in any way? My understanding is that LNET uses IPoIB only for address resolution when using o2ib. --- Yemi
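For context, the setup being described (native o2ib for LNET traffic, with IPoIB addresses on the same ports used to resolve NIDs) would typically be configured along these lines; the interface name is a placeholder:

```
# /etc/modprobe.d/lustre.conf (illustrative)
# LNET data rides IB verbs via the o2ib LND; the IPoIB address on ib0
# is what the NID (e.g. 10.0.0.1@o2ib0) is derived from.
options lnet networks="o2ib0(ib0)"
```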
Re: [Lustre-discuss] potential issue with data corruption
Hello! On Jul 14, 2011, at 1:59 PM, Lisa Giacchetti wrote:
> [...]
> Some of these errors seem really bad - like the bulk IO comm error or the eviction due to a locking call back. What should I be looking for here? [...] So what makes the OSS think the client is dead?
Well, the clients become unresponsive for some reason; you really need to look at the client-side logs for some clues on that.
> Also is there any way to determine what files are involved in these errors?
Well, the lock blocking callback message will provide you with the OST number and object index, which you might be able to backreference to a file. All that said, 1.8.3 is quite old and I think it would be a much better idea to try 1.8.6 and see if it improves things. Bye, Oleg -- Oleg Drokin Senior Software Engineer Whamcloud, Inc.
Re: [Lustre-discuss] potential issue with data corruption
Oleg, thanks for your response. See my responses inline. lisa

On 7/14/11 2:47 PM, Oleg Drokin wrote:
Hello!

On Jul 14, 2011, at 1:59 PM, Lisa Giacchetti wrote:

Jul 7 07:10:08 cmsls6 kernel: Lustre: 15431:0:(ldlm_lib.c:575:target_handle_reconnect()) cmsprod1-OST002d: c03badd9-c242-1507-6824-3a9648c8b21f reconnecting
Jul 7 07:59:42 cmsls6 kernel: Lustre: 3272:0:(client.c:1463:ptlrpc_expire_one_request()) @@@ Request x1359120905647245 sent from cmsprod1-OST002d to NID 131.225.191.35@tcp 7s ago has timed out (7s prior to deadline).
Jul 7 07:59:42 cmsls6 kernel: LustreError: 138-a: cmsprod1-OST002d: A client on nid 131.225.191.35@tcp was evicted due to a lock completion callback to 131.225.191.35@tcp timed out: rc -107
Jul 7 09:26:58 cmsls6 kernel: Lustre: 15433:0:(ldlm_lib.c:575:target_handle_reconnect()) cmsprod1-OST002d: 9235f65e-ff71-2b1f-60fb-c049cbad5728 reconnecting
Jul 7 09:53:50 cmsls6 kernel: Lustre: 2663:0:(client.c:1463:ptlrpc_expire_one_request()) @@@ Request x1359120905668862 sent from cmsprod1-OST002d to NID 131.225.204.88@tcp 7s ago has timed out (7s prior to deadline).
Jul 7 09:53:50 cmsls6 kernel: LustreError: 138-a: cmsprod1-OST002d: A client on nid 131.225.204.88@tcp was evicted due to a lock blocking callback to 131.225.204.88@tcp timed out: rc -107
Jul 7 10:18:57 cmsls6 kernel: LustreError: 138-a: cmsprod1-OST002d: A client on nid 131.225.207.176@tcp was evicted due to a lock blocking callback to 131.225.207.176@tcp timed out: rc -107
Jul 7 10:23:01 cmsls6 kernel: Lustre: 15405:0:(client.c:1463:ptlrpc_expire_one_request()) @@@ Request x1359120905675944 sent from cmsprod1-OST002d to NID 131.225.204.118@tcp 7s ago has timed out (7s prior to deadline).
Jul 7 11:06:31 cmsls6 kernel: Lustre: 15341:0:(ldlm_lib.c:575:target_handle_reconnect()) cmsprod1-OST002d: e25b2761-680a-4d94-ed2c-10913403c0a3 reconnecting
Jul 7 12:26:17 cmsls6 kernel: Lustre: 15352:0:(client.c:1463:ptlrpc_expire_one_request()) @@@ Request x1359120905703492 sent from cmsprod1-OST002d to NID 131.225.190.151@tcp 7s ago has timed out (7s prior to deadline).
Jul 7 12:26:17 cmsls6 kernel: LustreError: 138-a: cmsprod1-OST002d: A client on nid 131.225.190.151@tcp was evicted due to a lock blocking callback to 131.225.190.151@tcp timed out: rc -107
Jul 7 12:26:17 cmsls6 kernel: LustreError: 15352:0:(ldlm_lockd.c:1167:ldlm_handle_enqueue()) ### lock on destroyed export 810c3926f400 ns: filter-cmsprod1-OST002d_UUID lock: 8109c7f21a00/0xf22d54118e04e04d lrc: 3/0,0 mode: --/PW res: 337742/0 rrc: 2 type: EXT [0-1048575] (req 0-1048575) flags: 0x0 remote: 0x6c03f21f59f6b4e6 expref: 19 pid: 15352 timeout 0
Jul 7 12:26:17 cmsls6 kernel: Lustre: 2740:0:(ost_handler.c:1219:ost_brw_write()) cmsprod1-OST002d: ignoring bulk IO comm error with f81d3629-7e6a-1b5d-810e-ad73d7f5c90d@NET_0x283e1be97_UUID id 12345-131.225.190.151@tcp - client will retry
Jul 7 12:26:19 cmsls6 kernel: Lustre: 2742:0:(ost_handler.c:1219:ost_brw_write()) cmsprod1-OST002d: ignoring bulk IO comm error with f81d3629-7e6a-1b5d-810e-ad73d7f5c90d@NET_0x283e1be97_UUID id 12345-131.225.190.151@tcp - client will retry

Some of these errors seem really bad, like the bulk IO comm error or the eviction due to a lock blocking callback. What should I be looking for here? I have determined that some of the messages saying a client has been evicted because the OSS thinks it is dead are not due to the system actually being down. So what makes the OSS think the client is dead?

Well, the clients become unresponsive for some reason; you really need to look at the client-side logs for some clues on that.

I have been doing this while waiting for a reply, and going through the manual and the lustre-discuss archives. Here is an example of one of the client's logs during the appropriate time frame:

Jul 7 11:55:33 cmswn1526 kernel: LustreError: 11-0: an error occurred while communicating with 131.225.191.164@tcp. The obd_ping operation failed with -107
Jul 7 11:55:33 cmswn1526 kernel: Lustre: cmsprod1-OST0033-osc-810617966400: Connection to service cmsprod1-OST0033 via nid 131.225.191.164@tcp was lost; in progress operations using this service will wait for recovery to complete.
Jul 7 11:55:33 cmswn1526 kernel: LustreError: 11-0: an error occurred while communicating with 131.225.191.164@tcp. The ost_write operation failed with -107
Jul 7 11:55:35 cmswn1526 kernel: LustreError: 167-0: This client was evicted by cmsprod1-OST0033; in progress operations using this service will fail.
Jul 7 11:55:35 cmswn1526 kernel: LustreError: 3750:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@81031d414400 x1373265269802511/t0 o4-cmsprod1-OST0033_UUID@131.225.191.164@tcp:6/4 lens 448/608 e 0 to 1 dl 0 ref 2 fl Rpc:/0/0 rc 0/0
Jul 7 11:55:35 cmswn1526 kernel: Lustre: cmsprod1-OST0033-osc-810617966400: Connection restored to service cmsprod1-OST0033 using nid 131.225.191.164@tcp.

Also is there any way to determine what files are involved in these errors?
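A quick way to inventory which client NIDs were evicted, and how often, is to tally the eviction lines with awk. This is a sketch: the heredoc holds two lines copied from the OSS log above; against a real system you would point the same awk program at /var/log/messages on the server.

```shell
# Tally evictions per client NID from a syslog excerpt.
# The heredoc contains two sample lines taken from the OSS log above.
evictions=$(awk '/was evicted due to/ {
    for (i = 1; i <= NF; i++)
        if ($i == "nid") nids[$(i+1)]++
} END {
    for (n in nids) print n, nids[n]
}' <<'EOF'
Jul 7 07:59:42 cmsls6 kernel: LustreError: 138-a: cmsprod1-OST002d: A client on nid 131.225.191.35@tcp was evicted due to a lock completion callback to 131.225.191.35@tcp timed out: rc -107
Jul 7 09:53:50 cmsls6 kernel: LustreError: 138-a: cmsprod1-OST002d: A client on nid 131.225.204.88@tcp was evicted due to a lock blocking callback to 131.225.204.88@tcp timed out: rc -107
EOF
)
echo "$evictions"
```

Each output line is a NID followed by its eviction count, which makes it easy to spot whether one client or many are being evicted.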
Re: [Lustre-discuss] potential issue with data corruption
Hello!

On Jul 14, 2011, at 3:55 PM, Lisa Giacchetti wrote:

Jul 7 07:10:08 cmsls6 kernel: Lustre: 15431:0:(ldlm_lib.c:575:target_handle_reconnect()) cmsprod1-OST002d: c03badd9-c242-1507-6824-3a9648c8b21f reconnecting

Some of these errors seem really bad - like the bulk IO comm error or the eviction due to a lock blocking callback. What should I be looking for here? I have determined that some of the messages saying a client has been evicted because the OSS thinks it is dead are not due to the system actually being down. So what makes the OSS think the client is dead?

Well, the clients become unresponsive for some reason, you really need to look at the client side logs for some clues on that.

I have been doing this as I was waiting for a reply and going through the manual and lustre-discuss archives. Here is an example of one of the client's logs during the appropriate time frame:

Jul 7 11:55:33 cmswn1526 kernel: LustreError: 11-0: an error occurred while communicating with 131.225.191.164@tcp. The obd_ping operation failed with -107
Jul 7 11:55:33 cmswn1526 kernel: Lustre: cmsprod1-OST0033-osc-810617966400: Connection to service cmsprod1-OST0033 via nid 131.225.191.164@tcp was lost; in progress operations using this service will wait for recovery to complete.

This is way too late in the game; here the server has already evicted the client. Was there anything before then?

Also is there any way to determine what files are involved in these errors?

Well, the lock blocking callback messages will give you the OST number and object index, which you might be able to backreference to a file.

I know there is a way to do this from the /proc file system (at least I think it's /proc) but I can't find any reference to it in the book I got from class or in the manual. Can someone refresh my memory?

Actually I think you can do it with a combination of lfs find and lfs getattr.

All that said, 1.8.3 is quite old and I think it would be a much better idea to try 1.8.6 and see if it improves things.
downtimes are few and far between for us so this may take a while to get scheduled. If there is anything that can be done in the meantime I'd like to try it.

I suspect there might have been several bugs fixed since 1.8.3 that could have manifested as slowness to reply to lock callback requests, and you'll end up having downtime to upgrade the clients one way or the other.

Bye,
Oleg
--
Oleg Drokin
Senior Software Engineer
Whamcloud, Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] potential issue with data corruption
On 7/14/11 3:05 PM, Oleg Drokin wrote:
Hello!

On Jul 14, 2011, at 3:55 PM, Lisa Giacchetti wrote:

Jul 7 07:10:08 cmsls6 kernel: Lustre: 15431:0:(ldlm_lib.c:575:target_handle_reconnect()) cmsprod1-OST002d: c03badd9-c242-1507-6824-3a9648c8b21f reconnecting

Some of these errors seem really bad - like the bulk IO comm error or the eviction due to a lock blocking callback. What should I be looking for here? I have determined that some of the messages saying a client has been evicted because the OSS thinks it is dead are not due to the system actually being down. So what makes the OSS think the client is dead?

Well, the clients become unresponsive for some reason, you really need to look at the client side logs for some clues on that.

I have been doing this as I was waiting for a reply and going through the manual and lustre-discuss archives. Here is an example of one of the client's logs during the appropriate time frame:

Jul 7 11:55:33 cmswn1526 kernel: LustreError: 11-0: an error occurred while communicating with 131.225.191.164@tcp. The obd_ping operation failed with -107
Jul 7 11:55:33 cmswn1526 kernel: Lustre: cmsprod1-OST0033-osc-810617966400: Connection to service cmsprod1-OST0033 via nid 131.225.191.164@tcp was lost; in progress operations using this service will wait for recovery to complete.

This is way too late in the game; here the server has already evicted the client. Was there anything before then?

No, there is nothing before then.

Also is there any way to determine what files are involved in these errors?

Well, the lock blocking callback messages will give you the OST number and object index, which you might be able to backreference to a file.

I know there is a way to do this from the /proc file system (at least I think it's /proc) but I can't find any reference to it in the book I got from class or in the manual. Can someone refresh my memory?

Actually I think you can do it with a combination of lfs find and lfs getattr.

Hmm. Ok, let me try that.

All that said, 1.8.3 is quite old and I think it would be a much better idea to try 1.8.6 and see if it improves things.

downtimes are few and far between for us so this may take a while to get scheduled. If there is anything that can be done in the meantime I'd like to try it.

I suspect there might have been several bugs fixed since 1.8.3 that could have manifested as slowness to reply to lock callback requests, and you'll end up having downtime to upgrade the clients one way or the other.

Bye,
Oleg
--
Oleg Drokin
Senior Software Engineer
Whamcloud, Inc.
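Following up on the lfs find / lfs getattr suggestion above, here is a hedged sketch of mapping an OST object back to a file path. The object id 337742 comes from the "res: 337742/0" field of the "lock on destroyed export" message on cmsprod1-OST002d earlier in the thread; the mount point /mnt/cmsprod1 is hypothetical, and on a large 1.8 filesystem this scan can take a long time.

```shell
# Hypothetical mount point; OST UUID and object id come from the log above.
MNT=/mnt/cmsprod1
OST=cmsprod1-OST002d_UUID
OBJID=337742   # from "res: 337742/0" in the lock-on-destroyed-export message

# Restrict the search to files with a stripe on the affected OST, then
# check each candidate's stripe map for the object id. This assumes the
# usual getstripe layout where the second column is the decimal object id.
lfs find "$MNT" --obd "$OST" | while read -r f; do
    if lfs getstripe "$f" | awk -v id="$OBJID" '$2 == id { found = 1 } END { exit !found }'; then
        echo "object $OBJID on $OST belongs to: $f"
    fi
done
```

This is only a sketch of the approach Oleg describes, not a tested procedure; verify the getstripe output format on your release before relying on the column match.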
Re: [Lustre-discuss] how to add force_over_8tb to MDS
With one other note: you should have used --mkfsoptions='-t ext4' when doing mkfs.lustre, and NOT the force option. Given that it is already formatted and you don't want to lose data, at least use the ext4-based Lustre RPMs. Pretty sure you don't need a --writeconf -- you would either run as-is with ext4-based ldiskfs or reformat. The MDT device should be limited to 8TB; I don't think anyone has tested a larger MDT.

Kevin

Cliff White wrote:
The error message you are seeing is what Andreas was talking about - you must use the ext4-based version, and then you will not need any option for LUNs of your size. The 'must use force_over_8tb' error is the key here; you most certainly want/need the *.ext4.rpm versions of things.
cliffw

On Thu, Jul 14, 2011 at 11:10 AM, Theodore Omtzigt t...@stillwater-sc.com wrote:
Michael: The reason I had to do it on the OSTs is that when issuing the mkfs.lustre command to build the OST, it would error out with the message that I should use the force_over_8tb mount option. I was not able to create an OST on that device without the force_over_8tb option.

Your insights on the writeconf are excellent: good to know that writeconf is solid. Thank you.
Theo

On 7/14/2011 1:29 PM, Michael Barnes wrote:
On Jul 14, 2011, at 1:15 PM, Theodore Omtzigt wrote:
Two part question: 1- do I need to set that parameter on the MGS/MDS server as well

No, they are different filesystems. You shouldn't need to do this on the OSTs either. You must be using an older lustre release.

2- if yes, how do I properly add this parameter on this running Lustre file system (100TB on 9 storage servers)

covered

I can't resolve the ambiguity in the documentation as I can't find a good explanation of the configuration log mechanism that is referenced in the man pages. The fact that the doc for --writeconf states "This is very dangerous" makes me hesitant to pull the trigger, as there is 60TB of data on this file system that I would rather not lose.

I've had no issues with writeconf. It's nice because it shows you the old and new parameters. Make sure that the changes you made are what you want, and that the old parameters you want to keep are still intact. I don't remember the exact circumstances, but I've found that settings were lost when doing a writeconf, and I had to explicitly put those settings in the tunefs.lustre command to preserve them.

-mb
--
+---
| Michael Barnes
|
| Thomas Jefferson National Accelerator Facility
| Scientific Computing Group
| 12000 Jefferson Ave.
| Newport News, VA 23606
| (757) 269-7634
+---

--
cliffw
Support Guy
WhamCloud, Inc.
www.whamcloud.com
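To make the advice in this thread concrete, here is a hedged sketch; the device name, fsname, and NIDs are hypothetical, and you should check the manual for your release before running either command on a live filesystem.

```shell
# With the ext4-based ldiskfs RPMs installed, a >8TB OST is formatted by
# selecting ext4 explicitly rather than forcing ext3 past its 8TB limit:
mkfs.lustre --ost --fsname=lustre1 --mgsnode=10.0.0.1@tcp0 \
    --mkfsoptions='-t ext4' /dev/sdb

# rather than the force option on an ext3-based ldiskfs:
#   mkfs.lustre --ost --fsname=lustre1 --mgsnode=10.0.0.1@tcp0 \
#       --mountfsoptions='force_over_8tb' /dev/sdb

# If a writeconf is ever needed, per Michael's caution about settings being
# lost: re-state any parameters you cannot afford to lose on the same
# command line, e.g. (hypothetical failover NID):
tunefs.lustre --writeconf \
    --param="failover.node=10.0.0.2@tcp0" /dev/sdb
```

The tunefs.lustre step regenerates the configuration logs on the next mount, which is why parameters that only lived in the old logs can disappear unless they are respecified.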
Re: [Lustre-discuss] New wc-discuss Lustre Mailing List
On Tue, Jul 12, 2011 at 02:12:38PM -0700, Peter Jones wrote:
Isaac
If you (or anyone else for that matter) are having trouble joining the group, let me know privately at pjo...@whamcloud.com which email address you would like to use and I will add you manually.

Thanks Peter. I got an invitation from a subscriber and joined without any problem; no Google account was required at all.

- Isaac

__
This email may contain privileged or confidential information, which should only be used for the purpose for which it was sent by Xyratex. No further rights or licenses are granted to use such information. If you are not the intended recipient of this message, please notify the sender by return and delete it. You may not use, copy, disclose or rely on the information contained in it. Internet email is susceptible to data corruption, interception and unauthorised amendment for which Xyratex does not accept liability. While we have taken reasonable precautions to ensure that this email is free of viruses, Xyratex does not accept liability for the presence of any computer viruses in this email, nor for any losses caused as a result of viruses. Xyratex Technology Limited (03134912), Registered in England Wales, Registered Office, Langstone Road, Havant, Hampshire, PO9 1SA. The Xyratex group of companies also includes, Xyratex Ltd, registered in Bermuda, Xyratex International Inc, registered in California, Xyratex (Malaysia) Sdn Bhd registered in Malaysia, Xyratex Technology (Wuxi) Co Ltd registered in The People's Republic of China and Xyratex Japan Limited registered in Japan.
__
Re: [Lustre-discuss] LNET o2ib networking and MTU
On Thu, Jul 14, 2011 at 12:43:32PM -0700, Adesanya, Adeyemi wrote:
Just need some clarification on this: We use the o2ib driver for Lustre IB communication. We also use IPoIB to define IP addresses for the IB interfaces in the network. Does the MTU configuration parameter impact Lustre in any way? My understanding is that LNET is only using IPoIB for address resolution when using o2ib.

IPoIB MTU settings don't affect Lustre/LNet in any way if the o2ib driver is used. With recent Lustre releases, you can find out the path MTU of each IB connection used by LNet with, for example: lctl conn_list --net o2ib0

- Isaac
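A sketch of the checks described above. The lctl invocation is quoted from the reply and requires a recent Lustre release; the exact option spelling may differ across versions, and ib0 is an assumed IPoIB interface name.

```shell
# IPoIB is used only for address resolution when the o2ib LND is in use,
# so the IPoIB interface MTU does not affect LNet traffic itself.

# IB path MTU actually negotiated per LNet connection (recent releases):
lctl conn_list --net o2ib0

# The IPoIB MTU, by contrast, is just a property of the ib0 interface:
ip link show ib0 | grep -o 'mtu [0-9]*'
```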