Re: [PATCH] raid6: fix the input of raid6 algorithm

2016-08-22 Thread H. Peter Anvin
On August 22, 2016 8:22:57 PM PDT, liuzhengy...@kylinos.cn wrote:
>
>To test the raid6 algorithms and choose the best one, a disk count
>and disk data must be supplied. Currently both inputs depend on the
>page size and the gfmul table. As a result, the disk count drops
>below 4 when the page size is 64KB or larger. This
>patch supports arbitrary page sizes by defining a macro
>for the disk count and filling the disk data with random numbers.
>
>Signed-off-by: ZhengYuan Liu 
>---
> lib/raid6/algos.c | 36 ++--
> 1 file changed, 22 insertions(+), 14 deletions(-)
>
>diff --git a/lib/raid6/algos.c b/lib/raid6/algos.c
>index 975c6e0..f15a4d2 100644
>--- a/lib/raid6/algos.c
>+++ b/lib/raid6/algos.c
>@@ -23,6 +23,7 @@
> #else
> #include 
> #include 
>+#include 
> #if !RAID6_USE_EMPTY_ZERO_PAGE
> /* In .bss so it's zeroed */
> const char raid6_empty_zero_page[PAGE_SIZE] __attribute__((aligned(256)));
>@@ -30,6 +31,8 @@ EXPORT_SYMBOL(raid6_empty_zero_page);
> #endif
> #endif
> 
>+#define RAID6_DISKS   8
>+
> struct raid6_calls raid6_call;
> EXPORT_SYMBOL_GPL(raid6_call);
> 
>@@ -129,7 +132,7 @@ static inline const struct raid6_recov_calls *raid6_choose_recov(void)
> }
> 
> static inline const struct raid6_calls *raid6_choose_gen(
>-  void *(*const dptrs)[(65536/PAGE_SIZE)+2], const int disks)
>+  void *(*const dptrs)[RAID6_DISKS], const int disks)
> {
>   unsigned long perf, bestgenperf, bestxorperf, j0, j1;
>   int start = (disks>>1)-1, stop = disks-3;   /* work on the second half of the disks */
>@@ -206,27 +209,32 @@ static inline const struct raid6_calls *raid6_choose_gen(
> 
> int __init raid6_select_algo(void)
> {
>-  const int disks = (65536/PAGE_SIZE)+2;
>+  const int disks = RAID6_DISKS;
> 
>   const struct raid6_calls *gen_best;
>   const struct raid6_recov_calls *rec_best;
>-  char *syndromes;
>-  void *dptrs[(65536/PAGE_SIZE)+2];
>-  int i;
>-
>-  for (i = 0; i < disks-2; i++)
>-  dptrs[i] = ((char *)raid6_gfmul) + PAGE_SIZE*i;
>+  char *disk_ptr;
>+  void *dptrs[RAID6_DISKS];
>+  int i, j;
> 
>-  /* Normal code - use a 2-page allocation to avoid D$ conflict */
>-  syndromes = (void *) __get_free_pages(GFP_KERNEL, 1);
>+  /* use a 8-page allocation, The first 6 pages for disks
>+ and the last 2 pages for syndromes */
>+  disk_ptr = (void *) __get_free_pages(GFP_KERNEL, 3);
> 
>-  if (!syndromes) {
>+  if (!disk_ptr) {
>   pr_err("raid6: Yikes!  No memory available.\n");
>   return -ENOMEM;
>   }
> 
>-  dptrs[disks-2] = syndromes;
>-  dptrs[disks-1] = syndromes + PAGE_SIZE;
>+  /* Fix-me: may should use get_random_bytes_arch() instead of get_random_bytes() */
>+  for (i = 0; i < disks-2; i++) {
>+  dptrs[i] = disk_ptr + PAGE_SIZE*i;
>+  for (j = 0; j < PAGE_SIZE; j++)
>+  get_random_bytes(dptrs[i]+j, 1);
>+  }
>+
>+  dptrs[disks-2] = disk_ptr + PAGE_SIZE*(disks-2);
>+  dptrs[disks-1] = disk_ptr + PAGE_SIZE*(disks-1);
> 
>   /* select raid gen_syndrome function */
>   gen_best = raid6_choose_gen(&dptrs, disks);
>@@ -234,7 +242,7 @@ int __init raid6_select_algo(void)
>   /* select raid recover functions */
>   rec_best = raid6_choose_recov();
> 
>-  free_pages((unsigned long)syndromes, 1);
>+  free_pages((unsigned long)disk_ptr, 3);
> 
>   return gen_best && rec_best ? 0 : -EINVAL;
> }
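
The disk-count arithmetic described in the patch can be checked in userspace. The sketch below is not part of the patch: it only reproduces the expressions visible in the quoted diff, and the helper names (old_disk_count, bench_start, bench_stop, show) are made up for illustration.

```c
#include <stdio.h>

/* Disk count the pre-patch code derives from the page size:
 * (65536/PAGE_SIZE)+2, as in the removed lines of the diff. */
static int old_disk_count(unsigned long page_size)
{
    return (int)(65536 / page_size) + 2;
}

/* start/stop as computed in raid6_choose_gen() in the quoted diff
 * ("work on the second half of the disks"). */
static int bench_start(int disks) { return (disks >> 1) - 1; }
static int bench_stop(int disks)  { return disks - 3; }

/* Hypothetical helper: print the derived values for one page size. */
static void show(unsigned long page_size)
{
    int disks = old_disk_count(page_size);

    printf("PAGE_SIZE=%lu disks=%d start=%d stop=%d\n",
           page_size, disks, bench_start(disks), bench_stop(disks));
}
```

Calling show(4096) and show(65536) gives 18 disks with 4KB pages but only 3 disks with 64KB pages, that is, a single data page plus the two syndrome pages. With the fixed RAID6_DISKS of 8 proposed in the patch, start is 3 and stop is 5 regardless of page size.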

Do you have any idea how long this takes to run?  People are already 
complaining about the boot time penalty.  get_random_*() is quite expensive and 
is overkill...
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.


Re: [PATCH] raid6: fix the input of raid6 algorithm

2016-08-24 Thread liuzhengyuan
Oh, get_random_*() really is expensive. Thanks for the tip. The boot log from 
my aarch64 machine, shown below, indicates that it took about 0.6 seconds to 
fill the disk data.
  
  [0.172831] DMA: preallocated 256 KiB pool for atomic allocations
  [0.788664] raid6: int64x1  gen()   121 MB/s
  [0.856613] raid6: int64x1  xor()    74 MB/s
  [0.924665] raid6: int64x2  gen()   166 MB/s
  [0.992846] raid6: int64x2  xor()    95 MB/s
  [1.060681] raid6: int64x4  gen()   290 MB/s
  [1.128774] raid6: int64x4  xor()   160 MB/s
  [1.196933] raid6: int64x8  gen()   238 MB/s
  [1.264937] raid6: int64x8  xor()   148 MB/s
  [1.332878] raid6: neonx1   gen()   256 MB/s
  [1.400975] raid6: neonx1   xor()   130 MB/s
  [1.468951] raid6: neonx2   gen()   333 MB/s
  [1.537085] raid6: neonx2   xor()   181 MB/s
  [1.605042] raid6: neonx4   gen()   451 MB/s
  [1.673121] raid6: neonx4   xor()   289 MB/s
  [1.741143] raid6: neonx8   gen()   452 MB/s
  [1.809151] raid6: neonx8   xor()   277 MB/s
  [1.809154] raid6: using algorithm neonx8 gen() 452 MB/s
  [1.809157] raid6:  xor() 277 MB/s, rmw enabled
  [1.809160] raid6: using intx1 recovery algorithm

I replaced get_random_*() with a local PRNG based on the well-known 
linear congruential method. The patch looks like this:

  +/* use the linear congruential bit. */
  +static int32_t get_random_number_by_lcb(void)
  +{
  +static int32_t seed = 1;
  +int32_t ret = 0;
  +ret = ((seed * 1103515245) + 12345) & 0x7fff;
  +seed = ret;
  +return ret;
  +}
 
   /* Try to pick the best algorithm */
   /* This code uses the gfmul table as convenient data set to abuse */
  @@ -229,8 +238,8 @@ int __init raid6_select_algo(void)
  for (i = 0; i < disks-2; i++) {
  dptrs[i] = disk_ptr + PAGE_SIZE*i;
  -   for (j = 0; j < PAGE_SIZE; j++)
  -   get_random_bytes(dptrs[i]+j, 1);
  +   for (j = 0; j < PAGE_SIZE; j = j + 4)
  +   *(int32_t *)(dptrs[i]+j) = get_random_number_by_lcb();
  }
   
  dptrs[disks-2] = disk_ptr + PAGE_SIZE*(disks-2);
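
One property of the helper as posted: the result is masked with 0x7fff and stored back into seed, so the generator keeps only 15 bits of state and every 32-bit store leaves the upper 17 bits zero. The multiplication is also done on a signed int32_t, where overflow is undefined in standard C. The userspace sketch below is not part of the patch. It shows one possible variant that keeps the full 32-bit state with unsigned arithmetic and uses the top byte of each step; the names lcg_next and lcg_fill are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One linear congruential step with the same multiplier and increment
 * as the posted helper. Unsigned arithmetic wraps modulo 2^32, which is
 * well defined in C. */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1103515245u + 12345u;
    return *state;
}

/* Fill a buffer with pseudo-random bytes. Only the top byte of each
 * step is used, because the low bits of a power-of-two-modulus LCG
 * repeat with short periods. */
static void lcg_fill(void *buf, size_t len, uint32_t *state)
{
    uint8_t *p = buf;
    size_t i;

    for (i = 0; i < len; i++)
        p[i] = (uint8_t)(lcg_next(state) >> 24);
}
```

A caller would keep a uint32_t state (for example initialized to 1) and call lcg_fill(dptrs[i], PAGE_SIZE, &state) once per data page. This takes one LCG step per byte, so it trades some speed for better-distributed data; whether that matters for the benchmark is an open question here.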

With the PRNG patch applied, the boot log is shown below; filling the disk data took about 0.08 seconds.

  [0.172858] DMA: preallocated 256 KiB pool for atomic allocations
  [0.256673] raid6: int64x1  gen()   121 MB/s
  [0.324484] raid6: int64x1  xor()    73 MB/s
  [0.392606] raid6: int64x2  gen()   166 MB/s
  [0.460309] raid6: int64x2  xor()    92 MB/s
  [0.528368] raid6: int64x4  gen()   290 MB/s
  [0.596401] raid6: int64x4  xor()   156 MB/s
  [0.664601] raid6: int64x8  gen()   238 MB/s
  [0.732609] raid6: int64x8  xor()   148 MB/s
  [0.800523] raid6: neonx1   gen()   256 MB/s
  [0.868730] raid6: neonx1   xor()   129 MB/s
  [0.936741] raid6: neonx2   gen()   334 MB/s
  [1.004717] raid6: neonx2   xor()   202 MB/s
  [1.072692] raid6: neonx4   gen()   451 MB/s
  [1.140763] raid6: neonx4   xor()   260 MB/s
  [1.208842] raid6: neonx8   gen()   452 MB/s
  [1.276887] raid6: neonx8   xor()   277 MB/s
  [1.276890] raid6: using algorithm neonx8 gen() 452 MB/s
  [1.276894] raid6:  xor() 277 MB/s, rmw enabled
  [1.276897] raid6: using intx1 recovery algorithm
  [1.276941] ACPI: Interpreter disabled.

I'm not familiar with spurious D$ conflicts or CPU cache behavior. What do you 
think of this PRNG, and is there anything else I need to do?
