On 12/11/2018 12.58, Timofey Titovets wrote:
> From: Timofey Titovets <nefelim...@gmail.com>
> 
> Currently the btrfs raid1/10 balancer balances requests to mirrors
> based on pid % num of mirrors.
[...]
>   v6 -> v7:
>     - Fixes based on Nikolay Borisov review:
>       * Assume num == 2
>       * Remove "for" loop based on that assumption, where possible
>       * No functional changes
> 
> Signed-off-by: Timofey Titovets <nefelim...@gmail.com>
> Tested-by: Dmitrii Tcvetkov <demfl...@demfloro.ru>
> Reviewed-by: Dmitrii Tcvetkov <demfl...@demfloro.ru>
> ---
[...]
> +/**
> + * guess_optimal - return guessed optimal mirror
> + *
> + * Optimal is expected to be pid % num_stripes
> + *
> + * That's generally OK for spreading load
> + * Add some balancing based on the queue length of each device
> + *
> + * Basic ideas:
> + *  - Sequential reads generate a low number of requests,
> + *    so if the load on the drives is equal, use pid % num_stripes balancing
> + *  - For mixed rotational/non-rotational mirrors, pick the non-rotational
> + *    dev as optimal, and repick only if the other dev has a "significantly"
> + *    shorter queue
> + *  - Repick optimal if the queue length of the other mirror is shorter
> + */
> +static int guess_optimal(struct map_lookup *map, int num, int optimal)
> +{
> +     int i;
> +     int round_down = 8;
> +     /* Init for missing bdevs */
> +     int qlen[2] = { INT_MAX, INT_MAX };
> +     bool is_nonrot[2] = { false, false };
> +     bool all_bdev_nonrot = true;
> +     bool all_bdev_rotate = true;
> +     struct block_device *bdev;
> +
> +     ASSERT(num == 2);
> +
> +     /* Check accessible bdevs */
> +     for (i = 0; i < 2; i++) {

From your function comment, it is not clear why you are comparing "num" to
"2". Note that there are patches floating around which implement RAID
profiles with higher redundancy (IIRC up to 4 copies). I suggest putting your
assumption in the comment as well ("...this function works for up to 2
mirrors...") and, better, adding a define like

#define BTRFS_MAX_RAID1_RAID10_MIRRORS 2

and replacing the hard-coded "2"s with BTRFS_MAX_RAID1_RAID10_MIRRORS, e.g.:
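
Something like this (just a sketch of the suggestion; the define name is
mine, the rest is your v7 code with the constant swapped in):

#define BTRFS_MAX_RAID1_RAID10_MIRRORS 2

static int guess_optimal(struct map_lookup *map, int num, int optimal)
{
	/* Init for missing bdevs */
	int qlen[BTRFS_MAX_RAID1_RAID10_MIRRORS] = { INT_MAX, INT_MAX };
	bool is_nonrot[BTRFS_MAX_RAID1_RAID10_MIRRORS] = { false, false };

	/* The 2-mirror assumption is now explicit and grep-able */
	ASSERT(num == BTRFS_MAX_RAID1_RAID10_MIRRORS);

	/* ... rest of the function unchanged ... */
}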



> +             bdev = map->stripes[i].dev->bdev;
> +             if (bdev) {
> +                     qlen[i] = 0;
> +                     is_nonrot[i] = blk_queue_nonrot(bdev_get_queue(bdev));
> +                     if (is_nonrot[i])
> +                             all_bdev_rotate = false;
> +                     else
> +                             all_bdev_nonrot = false;
> +             }
> +     }
> +
> +     /*
> +      * Don't bother with the computation
> +      * if only one of the two bdevs is accessible
> +      */
> +     if (qlen[0] == INT_MAX)
> +             return 1;
> +     if (qlen[1] == INT_MAX)
> +             return 0;
> +
> +     if (all_bdev_nonrot)
> +             round_down = 2;
> +
> +     for (i = 0; i < 2; i++) {
> +             bdev = map->stripes[i].dev->bdev;
> +             qlen[i] = bdev_get_queue_len(bdev, round_down);
> +     }
> +
> +     /* For the mixed case, pick the non-rotational dev as optimal */
> +     if (all_bdev_rotate == all_bdev_nonrot) {
> +             if (is_nonrot[0])
> +                     optimal = 0;
> +             else
> +                     optimal = 1;
> +     }
> +
> +     if (qlen[optimal] > qlen[(optimal + 1) % 2])
> +             optimal = (optimal + 1) % 2;
> +
> +     return optimal;
> +}
> +
>  static int find_live_mirror(struct btrfs_fs_info *fs_info,
>                           struct map_lookup *map, int first,
>                           int dev_replace_is_ongoing)
> @@ -5177,7 +5274,8 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
>       else
>               num_stripes = map->num_stripes;
>  
> -     preferred_mirror = first + current->pid % num_stripes;
> +     preferred_mirror = first + guess_optimal(map, num_stripes,
> +                                              current->pid % num_stripes);
>  
>       if (dev_replace_is_ongoing &&
>           fs_info->dev_replace.cont_reading_from_srcdev_mode ==
> 
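
FWIW, to convince myself that the heuristic does what the comment above
guess_optimal() says, I reduced it to a small userspace model. Everything
btrfs- and block-layer-specific is mocked out: qlen_raw[] stands in for the
per-device in-flight counts that bdev_get_queue_len() would return, and the
rounding to round_down granularity is my reading of its intent, not a copy
of the kernel helper:

#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Userspace model of guess_optimal(); two mirrors, as in the patch */
static int guess_optimal(const int qlen_raw[2], const bool is_nonrot[2],
			 int optimal)
{
	int round_down = 8;
	int qlen[2];
	int i;

	/* A missing bdev is modelled as qlen_raw[i] == INT_MAX */
	if (qlen_raw[0] == INT_MAX)
		return 1;
	if (qlen_raw[1] == INT_MAX)
		return 0;

	/* All-SSD mirrors react to smaller queue differences */
	if (is_nonrot[0] && is_nonrot[1])
		round_down = 2;

	/* Round queue lengths down so small differences don't cause flapping */
	for (i = 0; i < 2; i++)
		qlen[i] = qlen_raw[i] / round_down * round_down;

	/* For the mixed case, prefer the non-rotational mirror */
	if (is_nonrot[0] != is_nonrot[1])
		optimal = is_nonrot[0] ? 0 : 1;

	/* ...unless the other mirror's queue is significantly shorter */
	if (qlen[optimal] > qlen[(optimal + 1) % 2])
		optimal = (optimal + 1) % 2;

	return optimal;
}

int main(void)
{
	const int qlen[2] = { 17, 3 };             /* mirror 0 is busier */
	const bool is_nonrot[2] = { false, true }; /* HDD + SSD pair */
	int preferred = getpid() % 2;              /* the old pid-based pick */

	printf("pid-based mirror: %d, guessed optimal: %d\n",
	       preferred, guess_optimal(qlen, is_nonrot, preferred));
	return 0;
}

With an HDD+SSD pair and a busy HDD this always ends up on the SSD, and
with same-type mirrors and equal (rounded) queues it falls back to the old
pid % 2 pick, which matches the comment's intent.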


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
