On 2011-02-15, at 12:20, Cliff White wrote:
> The client situation depends on where you deactivated the OST - if you
> deactivate it on the MDS only, clients should still be able to read.
>
> What is best to do when an OST fills up really depends on what else you are
> doing at the time, and how much control you have over what the clients are
> doing, among other things. If you can solve the space issue with a quick
> "rm -rf", it is best to leave the OST online; likewise, if all your clients
> are banging on it and failing, it is best to turn it off. YMMV.
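For reference, the deactivate/reactivate sequence Cliff is describing is roughly the following, run on the MDS. This is only a sketch: the grep pattern uses the OST index from the logs below (OST0005), and the device number is a placeholder you would read from the lctl dl output on your own system.

    # On the MDS: find the device number of the OSC that points at the full OST
    lctl dl | grep osc | grep OST0005

    # Stop new object allocations on that OST (clients can still read existing data)
    lctl --device <devno> deactivate

    # Once space has been freed, allow allocations again
    lctl --device <devno> activate

Deactivating only on the MDS stops new objects from being placed on the OST; deactivating the OSC on a client instead would also block that client's reads, which is the distinction Cliff draws above.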
In theory, with 1.8 the full OST should be skipped for new object allocations,
but this is not robust in the face of, e.g., a single very large file being
written to the OST that takes it from "average" usage to being full.

> On Tue, Feb 15, 2011 at 10:57 AM, Jagga Soorma <jagg...@gmail.com> wrote:
> Hi Guys,
>
> One of my clients got a hung Lustre mount this morning and I saw the
> following errors in my logs:
>
> --
> ..snip..
> Feb 15 09:38:07 reshpc116 kernel: LustreError: 11-0: an error occurred while communicating with 10.0.250.47@o2ib3. The ost_write operation failed with -28
> Feb 15 09:38:07 reshpc116 kernel: LustreError: Skipped 4755836 previous similar messages
> Feb 15 09:48:07 reshpc116 kernel: LustreError: 11-0: an error occurred while communicating with 10.0.250.47@o2ib3. The ost_write operation failed with -28
> Feb 15 09:48:07 reshpc116 kernel: LustreError: Skipped 4649141 previous similar messages
> Feb 15 10:16:54 reshpc116 kernel: Lustre: 6254:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1360125198261945 sent from reshpcfs-OST0005-osc-ffff8830175c8400 to NID 10.0.250.47@o2ib3 1344s ago has timed out (1344s prior to deadline).
> Feb 15 10:16:54 reshpc116 kernel: Lustre: reshpcfs-OST0005-osc-ffff8830175c8400: Connection to service reshpcfs-OST0005 via nid 10.0.250.47@o2ib3 was lost; in progress operations using this service will wait for recovery to complete.
> Feb 15 10:16:54 reshpc116 kernel: LustreError: 11-0: an error occurred while communicating with 10.0.250.47@o2ib3. The ost_connect operation failed with -16
> Feb 15 10:16:54 reshpc116 kernel: LustreError: Skipped 2888779 previous similar messages
> Feb 15 10:16:55 reshpc116 kernel: Lustre: 6254:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1360125198261947 sent from reshpcfs-OST0005-osc-ffff8830175c8400 to NID 10.0.250.47@o2ib3 1344s ago has timed out (1344s prior to deadline).
> Feb 15 10:18:11 reshpc116 kernel: LustreError: 11-0: an error occurred while communicating with 10.0.250.47@o2ib3. The ost_connect operation failed with -16
> Feb 15 10:18:11 reshpc116 kernel: LustreError: Skipped 10 previous similar messages
> Feb 15 10:20:45 reshpc116 kernel: LustreError: 11-0: an error occurred while communicating with 10.0.250.47@o2ib3. The ost_connect operation failed with -16
> Feb 15 10:20:45 reshpc116 kernel: LustreError: Skipped 21 previous similar messages
> Feb 15 10:25:46 reshpc116 kernel: LustreError: 11-0: an error occurred while communicating with 10.0.250.47@o2ib3. The ost_connect operation failed with -16
> Feb 15 10:25:46 reshpc116 kernel: LustreError: Skipped 42 previous similar messages
> Feb 15 10:31:43 reshpc116 kernel: Lustre: reshpcfs-OST0005-osc-ffff8830175c8400: Connection restored to service reshpcfs-OST0005 using nid 10.0.250.47@o2ib3.
> --
>
> Due to disk space issues on my Lustre filesystem, one of the OSTs was full,
> and I deactivated that OST this morning. I thought that operation just puts
> it into a read-only state and that clients can still access the data on that
> OST. After I activated this OST again, the client reconnected and was fine.
> How else would you deal with an OST that is close to 100% full? Is it okay
> to leave the OST active, and will the clients know not to write data to
> that OST?
> Thanks,
> -J

Cheers, Andreas
--
Andreas Dilger
Principal Engineer
Whamcloud, Inc.

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
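For the question above about an OST that is close to 100% full: a minimal sketch of how one might confirm which OST is full and move existing files off it, run from a client. The mount point /mnt/reshpcfs and the file path are hypothetical; the OST UUID follows the standard <fsname>-OSTxxxx_UUID form for the OST seen in the logs.

    # Check per-OST space usage
    lfs df -h /mnt/reshpcfs

    # List files that have objects on the full OST
    lfs find --obd reshpcfs-OST0005_UUID /mnt/reshpcfs

    # Free space on that OST by moving a large file: copy, then rename over
    # the original so it is re-striped onto the remaining OSTs
    cp -a /mnt/reshpcfs/path/to/bigfile /mnt/reshpcfs/path/to/bigfile.tmp
    mv /mnt/reshpcfs/path/to/bigfile.tmp /mnt/reshpcfs/path/to/bigfile

The copy-and-rename step is what the lfs_migrate helper script automates in later 1.8 releases; either way, combining it with the MDS-side deactivate shown earlier keeps new writes away from the full OST while its data stays readable.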