Hi Andreas,

Sorry if I did not make my question clear in the first place.
I am testing the overstriping feature and have observed a decent performance improvement. Enabling overstriping on only a subset of OSTs is just an experiment of mine. I am thinking that for medium-sized applications it may be better to use only a subset of the OSTs rather than all of them. This is based on the complexity of network communication between the compute nodes and the OSS nodes. For example, Perlmutter at NERSC has a total of 370 OSTs. If an application runs on, say, 100 compute nodes with 128 MPI processes per node, I would guess that 100 OSTs is a good number, and that overstriping them with 3 stripes per OST performs better than using 300 OSTs with no overstriping. Would this be the case? I will also run some experiments there to see.

Wei-keng

On Dec 22, 2025, at 1:29 PM, Andreas Dilger <[email protected]> wrote:

Your first email was not clear that you are trying to overstripe the file on a subset of OSTs.

When the MDS is selecting the OSTs for a file, it will always try to put each stripe on a different OST if possible (subject to limitations of the OST pool and free space on OSTs) before overstriping. There isn't any benefit to overstriping a file when there are unused OSTs available, except for synthetic test workloads. In your previous email thread you mentioned the filesystem has 160 OSTs, so an 8-stripe file will always prefer to use 8 different OSTs.

Overstriping is no different from regular striping, in that you either need to use an OST pool or specify the OST indexes to limit the allocation to a subset of OSTs.

In your example, the "-C 8" is not more than the number of OSTs, so the overstriping flag is cleared from the layout, since each of the 8 stripes is on a different OST. This is true whether you use "lfs setstripe" or "llapi_layout_*()" calls. Using "-c 4 -C 8" is no different from just "-C 8", since the first stripe count is overwritten by the second stripe count.
If this is just for testing bandwidth or similar, then it should be enough to specify "-o M-N,M-N[,...]" for your tests. If there is a good *production* reason to overstripe when there are more OSTs available, then I would be interested to hear what that is.

Cheers, Andreas

On Dec 22, 2025, at 10:59, Wei-Keng Liao <[email protected]> wrote:

Hi Andreas,

The lfs-setstripe man page for option '-C' indicates that only negative values can be used, and that the file will be striped over all available OSTs. However, my wish is to stripe a file over only a subset of the available OSTs. Is it possible to achieve that?

I just now tried the two commands below without the '-o' option. My intent is to create a file with a stripe count of 8 over 4 OSTs, but both ended up with the same result of no overstriping.

% lfs setstripe -c 4 -C 8 $SCRATCH/dummy
% lfs setstripe -C 8 $SCRATCH/dummy
% lfs getstripe $SCRATCH/dummy
/pscratch/sd/w/wkliao/dummy
lmm_stripe_count:  8
lmm_stripe_size:   1048576
lmm_pattern:       raid0
lmm_layout_gen:    0
lmm_stripe_offset: 168
lmm_pool:          original
    obdidx       objid       objid          group
       168    19587711   0x12ae27f   0x368000041f
       169    19224808   0x12558e8   0x36c0000428
       170    19783691   0x12de00b   0x3700000413
       171    20429006   0x137b8ce   0x3740000419
       172    19633677   0x12b960d   0x3780000421
       173    20027491   0x1319863   0x37c0000402
       174    19912786   0x12fd852   0x3800000401
       175    20862151   0x13e54c7   0x3840000418

As for using the llapi_layout APIs, I am doing the following. It seems I am missing some API call to set the number of overstripes or the number of stripes per OST, as these calls alone do not produce an overstriped layout.

    struct llapi_layout *layout = llapi_layout_alloc();
    err = llapi_layout_pattern_set(layout, LLAPI_LAYOUT_OVERSTRIPING);
    err = llapi_layout_stripe_count_set(layout, 8);
    fd = llapi_layout_file_create(path, O_CREAT|O_RDWR, 0660, layout);

The only way I have found to achieve overstriping is to call

    err = llapi_layout_ost_index_set(layout, stripe_number, ost_index);

However, I must then pick the values for the argument 'ost_index' myself.
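The by-eye comparison of the "lmm_pattern" line in the two `lfs getstripe` outputs in this thread ("raid0" here, "raid0,overstriped" in the original question) could be scripted roughly as below. This is only a sketch: `is_overstriped` is a hypothetical helper name, not a Lustre tool, and it assumes the "lmm_pattern:" line format shown in this thread. The sample input is fed with printf so the snippet runs without a Lustre client.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): succeed if the "lfs getstripe"
# output read from stdin reports an overstriped layout in lmm_pattern.
is_overstriped() {
    grep -q '^ *lmm_pattern:.*overstriped'
}

# Sample lines copied from the getstripe outputs quoted in this thread.
printf 'lmm_stripe_count: 8\nlmm_pattern: raid0,overstriped\n' \
    | is_overstriped && echo "overstriped"
printf 'lmm_stripe_count: 8\nlmm_pattern: raid0\n' \
    | is_overstriped || echo "not overstriped"
```

On a real client the same check would presumably be `lfs getstripe $SCRATCH/dummy | is_overstriped`.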
Wei-keng

On Dec 22, 2025, at 4:32 AM, Andreas Dilger <[email protected]> wrote:

You should be able to use "-C N" to overstripe a file without specifying the OST indexes with "-o ...".

For handling this via llapi_layout commands, I believe it is necessary to set the LLAPI_LAYOUT_OVERSTRIPING flag on the component with llapi_layout_pattern_set(), and then specify a stripe count > OSTCOUNT. I see this isn't documented in the llapi_layout_pattern_set(3) man page (along with LLAPI_LAYOUT_FOREIGN), so please file a Jira ticket for this (and ideally also submit a patch to the man page).

The flag will be cleared if the stripe count <= OSTCOUNT, for improved compatibility with older clients that do not understand overstriping (though that is unlikely these days).

The patch https://review.whamcloud.com/54192 ("LU-16938 utils: setstripe overstripe multiple OST count"), along with a few follow-on fixes in Lustre 2.16+, also allows specifying:

    lfs setstripe -C -N ... FILE|DIR

(or the llapi equivalent) to create 'N' stripes per OST for the file, instead of having to know the exact OST count, if that is more convenient.

Cheers, Andreas

On Dec 20, 2025, at 18:52, Wei-Keng Liao via lustre-discuss <[email protected]> wrote:

When setting up overstriping for a new file, is it possible to let the MDS choose the OST indices?

I was able to use the lfs command to set overstriping for a new file.
For example, to overstripe a file over 4 OSTs with 2 stripes per OST, I am using this command:

% lfs setstripe -c 4 -C 8 -o 10-13,10-13 $SCRATCH/dummy
% lfs getstripe $SCRATCH/dummy | grep lmm
lmm_stripe_count:  8
lmm_stripe_size:   1048576
lmm_pattern:       raid0,overstriped
lmm_layout_gen:    0
lmm_stripe_offset: 10
lmm_pool:          original

My understanding is that without overstriping, the default is for the Lustre MDS to select the OSTs based on some policy (maybe OST usage). I wonder if this can also apply to overstriping, i.e. using the lfs command options '-c' and '-C' without the option '-o'.

I am also wondering how this can be achieved using the Lustre user C APIs without having to make calls to llapi_layout_ost_index_set().

Wei-keng

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

Andreas Dilger
Principal Lustre Architect
[email protected]
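For tests that pin the layout to explicit OSTs, the "-o M-N,M-N[,...]" argument and the matching stripe count can be derived from the OST range and the number of stripes per OST. The following is a minimal sketch, not an official tool: `ost_list` and `stripe_count` are hypothetical helper names, and the script only prints the resulting command instead of running it. It passes "-C" alone, on the assumption (from Andreas's reply) that an earlier "-c" would be overwritten anyway.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): build the value for
# "lfs setstripe -o" that repeats the OST range FIRST-LAST once per
# stripe wanted on each OST.
ost_list() {
    first=$1 last=$2 per_ost=$3
    list=""
    i=0
    while [ "$i" -lt "$per_ost" ]; do
        # Append "FIRST-LAST", comma-separated after the first entry.
        list="${list:+$list,}${first}-${last}"
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

# Hypothetical helper: total stripe count for the same layout,
# i.e. (number of OSTs in the range) * (stripes per OST).
stripe_count() {
    first=$1 last=$2 per_ost=$3
    echo $(( (last - first + 1) * per_ost ))
}

# Print (do not run) the command for OSTs 10-13 with 2 stripes each.
echo "lfs setstripe -C $(stripe_count 10 13 2) -o $(ost_list 10 13 2) \$SCRATCH/dummy"
# prints: lfs setstripe -C 8 -o 10-13,10-13 $SCRATCH/dummy
```

For the 100-OST, 3-stripes-per-OST scenario discussed in this thread, `ost_list 0 99 3` gives "0-99,0-99,0-99" and `stripe_count 0 99 3` gives 300; the range 0-99 is only an illustrative choice of OST indexes.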
