[lng-odp] [Bug 3774] Shmem validation test runs indefinitely with 1GB huge pages

2018-10-04 Thread bugzilla-daemon
https://bugs.linaro.org/show_bug.cgi?id=3774

Bill Fischofer  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #15 from Bill Fischofer  ---
Fix has been merged.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

[lng-odp] [Bug 3774] Shmem validation test runs indefinitely with 1GB huge pages

2018-09-12 Thread bugzilla-daemon
https://bugs.linaro.org/show_bug.cgi?id=3774

--- Comment #14 from Maxim Uvarov  ---
https://github.com/Linaro/odp/commit/c46f54d8c708d6335b0288ff4a5aad3a3b93e41c
refs/heads/master
2018-09-12T17:36:54+03:00
Josep Puigdemont josep.puigdem...@linaro.org
linux-gen: ishm: implement huge page cache

With this patch, ODP will pre-allocate several huge pages at init
time. When memory is to be mapped into a huge page, one that was
pre-allocated will be used, if available. This way ODP won't have to
trap into the kernel to allocate huge pages.

The idea with this implementation is to trick ishm into thinking that
a file descriptor in which to map the memory was provided, so that it
won't try to allocate one itself. This file descriptor is one of
those previously allocated at init time. When the system is done with
this file descriptor, instead of closing it, it is put back into the
list of available huge pages, ready to be reused.

A collateral effect of this patch is that memory is not zeroed out
when it is reused.

WARNING: This patch will not work when using process mode threads.
For several reasons, this may not work when using ODP_ISHM_SINGLE_VA
either, so when this flag is set, the list of pre-allocated files is
not used.

By default ODP will not reserve any huge pages. To tell ODP to do that,
update the ODP configuration file with something like this:
shm: {
    num_cached_hp = 32
}

Example usage:

$ cat odp.conf
odp_implementation = "linux-generic"
config_file_version = "0.0.1"
shm: {
    num_cached_hp = 32
}

$ ODP_CONFIG_FILE=odp.conf ./test/validation/api/shmem/shmem_main

This patch solves bug #3774:
https://bugs.linaro.org/show_bug.cgi?id=3774
Signed-off-by: Josep Puigdemont 
Reviewed-and-tested-by: Matias Elo 
Signed-off-by: Maxim Uvarov 
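
For readers following the thread, the cache described in the commit message can be pictured as a small free list of file descriptors. The following is only a minimal sketch under assumptions, not the code from the commit: `struct hp_cache`, `hp_cache_get()`, `hp_cache_put()` and `HP_CACHE_MAX` are hypothetical names, the descriptors are plain integers, and the locking a real implementation would need is left out.

```c
#include <stddef.h>

#define HP_CACHE_MAX 32 /* hypothetical limit, mirrors num_cached_hp = 32 */

/* Hypothetical cache of pre-allocated huge page file descriptors. */
struct hp_cache {
	int fds[HP_CACHE_MAX];
	size_t count; /* number of descriptors currently available */
};

/* Take a cached descriptor, or return -1 so that the caller falls back
 * to allocating a new huge page itself. */
static int hp_cache_get(struct hp_cache *c)
{
	if (c->count == 0)
		return -1;

	return c->fds[--c->count];
}

/* Put a descriptor back into the cache instead of closing it.
 * Returns 0 on success, -1 if the cache is full (caller should close it). */
static int hp_cache_put(struct hp_cache *c, int fd)
{
	if (c->count == HP_CACHE_MAX)
		return -1;

	c->fds[c->count++] = fd;
	return 0;
}
```

A caller would try hp_cache_get() first and allocate a new huge page only when it returns -1. On free it would call hp_cache_put() and close the descriptor only if the cache is full. As the commit message notes, memory handed out this way is not zeroed again.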


[lng-odp] [Bug 3774] Shmem validation test runs indefinitely with 1GB huge pages

2018-08-28 Thread bugzilla-daemon
https://bugs.linaro.org/show_bug.cgi?id=3774

--- Comment #13 from Josep Puigdemont  ---
A patch that fixes/mitigates this issue can be found here:
https://github.com/joseppc/odp/tree/fix/cache_huge_pages

This patch will pre-allocate huge pages at init time and never release them to
the kernel until the application finishes. Instead, they are kept in a list,
ready to be reused, which avoids the time spent in the kernel zeroing out the
memory.


[lng-odp] [Bug 3774] Shmem validation test runs indefinitely with 1GB huge pages

2018-08-21 Thread bugzilla-daemon
https://bugs.linaro.org/show_bug.cgi?id=3774

--- Comment #11 from Josep Puigdemont  ---
The issue here seems to be the same as for bug 3867. When allocating a huge
page, the kernel zeroes it out before handing it over to user space, and this
is one of the causes of the delays. Another cause is that a single lock is
taken on entry to any of the shared memory module functions and released only
on exit (roughly speaking), causing lock contention and high CPU usage in the
other threads.

Due to the nature of this test application, which has many threads allocating
and freeing shared memory in rather rapid succession, all of the above becomes
a problem that results in the delays observed. Maybe this is not an issue for
real world applications, which probably allocate memory once at start-up or, at
least, not so often.

One initial idea to mitigate this problem was to keep the huge pages that were
to be freed in a list of "available" pages rather than closing the file
descriptor. However, this approach would create problems with "process mode"
threads in ODP, as we would need to find a way to know when all threads have
freed a given huge page (it might be possible to implement this functionality
in fdserver, but it doesn't feel right).
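
The CPU usage attributed to lock contention above is typical of spinlocks, where waiting threads busy-wait instead of sleeping. A minimal test-and-set spinlock, sketched here with C11 atomics, shows why: every waiter keeps executing the loop in spin_lock() until the holder releases the lock. This is an illustration only. `spinlock_t`, `spin_trylock()`, `spin_lock()` and `spin_unlock()` are hypothetical names, and this is not ODP's spinlock implementation.

```c
#include <stdatomic.h>

/* Hypothetical test-and-set spinlock, for illustration only. */
typedef struct {
	atomic_flag locked;
} spinlock_t;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once; returns 1 if the lock was taken, 0 if it was already held. */
static int spin_trylock(spinlock_t *l)
{
	return !atomic_flag_test_and_set_explicit(&l->locked,
						  memory_order_acquire);
}

/* Busy-wait until the lock is free; returns the number of failed attempts. */
static unsigned long spin_lock(spinlock_t *l)
{
	unsigned long spins = 0;

	while (!spin_trylock(l))
		spins++; /* burns CPU for as long as the holder keeps the lock */

	return spins;
}

static void spin_unlock(spinlock_t *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

If the lock holder is inside the kernel waiting for a 1GB page to be zeroed, every other thread calling into the module spins in this loop for the whole duration, which would be consistent with the perf profile in comment #10.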

-- 
You are receiving this mail because:
You are on the CC list for the bug.

[lng-odp] [Bug 3774] Shmem validation test runs indefinitely with 1GB huge pages

2018-07-19 Thread bugzilla-daemon
https://bugs.linaro.org/show_bug.cgi?id=3774

--- Comment #10 from Matias Elo  ---
I timed shmem test runs with both 2MB and 1GB pages (28 thread system):

2MB pages
---------
Run Summary:    Type    Total      Ran   Passed Failed Inactive
              suites        1        1      n/a      0        0
               tests        5        5        5      0        0
             asserts  1812146  1812146  1812146      0      n/a

Elapsed time =  105.336 seconds
101.08user 4.85system 0:05.31elapsed 1995%CPU (0avgtext+0avgdata 15968maxresident)k
0inputs+0outputs (0major+17910minor)pagefaults 0swaps


1GB pages
---------
Run Summary:    Type    Total      Ran   Passed Failed Inactive
              suites        1        1      n/a      0        0
               tests        5        5        5      0        0
             asserts  1807525  1807525  1807525      0      n/a

Elapsed time = 16599.636 seconds
15850.52user 751.34system 12:31.74elapsed 2208%CPU (0avgtext+0avgdata 17880maxresident)k
5480inputs+0outputs (20major+46992minor)pagefaults 0swaps
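
To put the two runs side by side, the time(1) figures above can be reduced to a slowdown factor. The helpers below are a hypothetical sketch (`elapsed_seconds()` and `slowdown()` are not part of ODP or its test suite) that convert the m:ss.ss wall-clock field to seconds and divide one run by the other.

```c
/* Convert the "m:ss.ss" wall-clock field of time(1) output to seconds. */
static double elapsed_seconds(unsigned int minutes, double seconds)
{
	return minutes * 60.0 + seconds;
}

/* How many times longer the slow run took compared to the fast one. */
static double slowdown(double slow, double fast)
{
	return slow / fast;
}
```

With the values above, elapsed_seconds(12, 31.74) gives 751.74 s against 5.31 s for 2MB pages, a wall-clock slowdown of roughly 140 times. The summed user and system CPU time (16601.86 s against 105.93 s) grows by roughly 157 times.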


Perf cycles with 1GB pages:

 97.37%  shmem_main  [.] odp_spinlock_lock
  2.44%  [kernel]    [k] clear_page_erms
  0.08%  [kernel]    [k] clear_huge_page


[lng-odp] [Bug 3774] Shmem validation test runs indefinitely with 1GB huge pages

2018-06-01 Thread bugzilla-daemon
https://bugs.linaro.org/show_bug.cgi?id=3774

--- Comment #9 from Josep Puigdemont  ---
(In reply to Brian Brooks from comment #7)
> I cannot reproduce this issue. Instead I see it hanging on this test:
> 
>   make[3]: Entering directory '/home/brian/odp/example/l2fwd_simple'
>   make[4]: Entering directory '/home/brian/odp/example/l2fwd_simple'

For this issue I opened bug #3879.


[lng-odp] [Bug 3774] Shmem validation test runs indefinitely with 1GB huge pages

2018-06-01 Thread bugzilla-daemon
https://bugs.linaro.org/show_bug.cgi?id=3774

--- Comment #8 from Bill Fischofer  ---
Thanks, Brian. Were you testing on Arm? You may have noticed that Josep posted
https://github.com/Linaro/odp/pull/609 as a fix for this. Can you verify that
it works on Arm as well?


[lng-odp] [Bug 3774] Shmem validation test runs indefinitely with 1GB huge pages

2018-06-01 Thread bugzilla-daemon
https://bugs.linaro.org/show_bug.cgi?id=3774

--- Comment #7 from Brian Brooks  ---
I cannot reproduce this issue. Instead I see it hanging on this test:

  make[3]: Entering directory '/home/brian/odp/example/l2fwd_simple'
  make[4]: Entering directory '/home/brian/odp/example/l2fwd_simple'

The perf output Matias shared is clearly related to the x86 Linux kernel.


[lng-odp] [Bug 3774] Shmem validation test runs indefinitely with 1GB huge pages

2018-05-29 Thread bugzilla-daemon
https://bugs.linaro.org/show_bug.cgi?id=3774

Josep Puigdemont  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |josep.puigdem...@linaro.org

--- Comment #6 from Josep Puigdemont  ---
On my laptop, with just six 1GB huge pages (what I had at hand), the issue is
not reproducible but, like Matias, I did see the timer test failing due to the
first timer being delayed. Maybe we should open another bug report for that:

Test: timer_test_plain_queue ...odp_ishmphy.c:151:_odp_ishmphy_map():mmap
failed:Cannot allocate memory
odp_ishmphy.c:151:_odp_ishmphy_map():mmap failed:Cannot allocate memory
timer.c:261:timer_test_queue_type():
Timer pool parameters:
timer.c:262:timer_test_queue_type():  res_ns  2000
timer.c:263:timer_test_queue_type():  min_tmo 1
timer.c:264:timer_test_queue_type():  max_tmo 1
timer.c:288:timer_test_queue_type():  period_ns 4
timer.c:289:timer_test_queue_type():  period_tick 20

timer.c:308:timer_test_queue_type():abs timer tick 20
timer.c:308:timer_test_queue_type():abs timer tick 40
timer.c:308:timer_test_queue_type():abs timer tick 60
timer.c:308:timer_test_queue_type():abs timer tick 80
timer.c:308:timer_test_queue_type():abs timer tick 100
timer.c:308:timer_test_queue_type():abs timer tick 120
timer.c:308:timer_test_queue_type():abs timer tick 140
timer.c:308:timer_test_queue_type():abs timer tick 160
timer.c:308:timer_test_queue_type():abs timer tick 180
timer.c:308:timer_test_queue_type():abs timer tick 200
odp_timer.c:883:timer_notify():
3 ticks overrun on timer pool "timer_pool", timer resolution too high
timer.c:342:timer_test_queue_type():timeout tick 20, timeout period 488145691
timer.c:342:timer_test_queue_type():timeout tick 40, timeout period 371834630
timer.c:342:timer_test_queue_type():timeout tick 60, timeout period 36114
timer.c:342:timer_test_queue_type():timeout tick 80, timeout period 38216
timer.c:342:timer_test_queue_type():timeout tick 100, timeout period 35336
timer.c:342:timer_test_queue_type():timeout tick 120, timeout period 36924
timer.c:342:timer_test_queue_type():timeout tick 140, timeout period 36870
timer.c:342:timer_test_queue_type():timeout tick 160, timeout period 36823
timer.c:342:timer_test_queue_type():timeout tick 180, timeout period 36815
timer.c:342:timer_test_queue_type():timeout tick 200, timeout period 36778
timer.c:352:timer_test_queue_type():test period 4059954202
FAILED
1. timer.c:338  - diff_period < (period_ns + (4 * res_ns))


[lng-odp] [Bug 3774] Shmem validation test runs indefinitely with 1GB huge pages

2018-05-29 Thread bugzilla-daemon
https://bugs.linaro.org/show_bug.cgi?id=3774

--- Comment #5 from Matias Elo  ---
I ran the same test on odp-dpdk with 1GB pages without problems, so the issue
is restricted to odp-linux.


[lng-odp] [Bug 3774] Shmem validation test runs indefinitely with 1GB huge pages

2018-05-24 Thread bugzilla-daemon
https://bugs.linaro.org/show_bug.cgi?id=3774

Bill Fischofer  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|christophe.mil...@linaro.org|brian.bro...@linaro.org
                 CC|                            |bill.fischo...@linaro.org

--- Comment #4 from Bill Fischofer  ---
Brian will investigate. Matias will double check on odp-dpdk to see if the
issue is restricted to odp-linux or has wider scope.


[lng-odp] [Bug 3774] Shmem validation test runs indefinitely with 1GB huge pages

2018-05-03 Thread bugzilla-daemon
https://bugs.linaro.org/show_bug.cgi?id=3774

--- Comment #3 from Matias Elo  ---
shmem_test_stress actually passes, it just takes a really long time. Perf shows
that almost all of the time is spent in the kernel.

  94.17%  [kernel]  [k] clear_page_erms
   2.55%  [kernel]  [k] clear_huge_page
   2.15%  [kernel]  [k] _cond_resched
   0.88%  [kernel]  [k] rcu_all_qs
   0.01%  [kernel]  [k] _raw_spin_lock
   0.01%  [kernel]  [k] update_load_avg


[lng-odp] [Bug 3774] Shmem validation test runs indefinitely with 1GB huge pages

2018-05-03 Thread bugzilla-daemon
https://bugs.linaro.org/show_bug.cgi?id=3774

--- Comment #2 from Matias Elo  ---
This may be related. With 1GB huge pages the first timer is always delayed,
which causes the timer validation test to fail:

  Test: timer_test_sched_queue ...timer.c:261:timer_test_queue_type():
Timer pool parameters:
timer.c:262:timer_test_queue_type():  res_ns  2000
timer.c:263:timer_test_queue_type():  min_tmo 1
timer.c:264:timer_test_queue_type():  max_tmo 1
timer.c:288:timer_test_queue_type():  period_ns 4
timer.c:289:timer_test_queue_type():  period_tick 20

timer.c:308:timer_test_queue_type():abs timer tick 20
timer.c:308:timer_test_queue_type():abs timer tick 40
timer.c:308:timer_test_queue_type():abs timer tick 60
timer.c:308:timer_test_queue_type():abs timer tick 80
timer.c:308:timer_test_queue_type():abs timer tick 100
timer.c:308:timer_test_queue_type():abs timer tick 120
timer.c:308:timer_test_queue_type():abs timer tick 140
timer.c:308:timer_test_queue_type():abs timer tick 160
timer.c:308:timer_test_queue_type():abs timer tick 180
timer.c:308:timer_test_queue_type():abs timer tick 200
odp_timer.c:880:timer_notify():
9 ticks overrun on timer pool "timer_pool", timer resolution too high
timer.c:342:timer_test_queue_type():timeout tick 20, timeout period 604149774
timer.c:342:timer_test_queue_type():timeout tick 40, timeout period 375843228
timer.c:342:timer_test_queue_type():timeout tick 60, timeout period 38013
timer.c:342:timer_test_queue_type():timeout tick 80, timeout period 39128
timer.c:342:timer_test_queue_type():timeout tick 100, timeout period 38088
timer.c:342:timer_test_queue_type():timeout tick 120, timeout period 40630
timer.c:342:timer_test_queue_type():timeout tick 140, timeout period 38801
timer.c:342:timer_test_queue_type():timeout tick 160, timeout period 40252
timer.c:342:timer_test_queue_type():timeout tick 180, timeout period 37224
timer.c:342:timer_test_queue_type():timeout tick 200, timeout period 39459
timer.c:352:timer_test_queue_type():test period 4179984604
FAILED
1. timer.c:338  - diff_period < (period_ns + (4 * res_ns))


[lng-odp] [Bug 3774] Shmem validation test runs indefinitely with 1GB huge pages

2018-05-03 Thread bugzilla-daemon
https://bugs.linaro.org/show_bug.cgi?id=3774

--- Comment #1 from Matias Elo  ---
Log from shmem validation test getting stuck (shmem_test_stress):
https://pastebin.com/Pq2ieuwW
