Re: SegFault in Crawler Part

2021-06-01 Thread Qingchen Dang
Thank you very much! Yes, your guess is correct: I forgot the possibility of 
evicting a crawler item :(

I also have a problem similar to the one in this 
post: https://github.com/memcached/memcached/issues/467
I gave Memcached a very small memory limit to test eviction, and it 
produces a similar error.
When I use Memtier_Benchmark, the error looks like:

[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
error: response parsing failed.
error: response parsing failed.
server 127.0.0.1:11211 handle error response: SERVER_ERROR out of memory storing object
error: response parsing failed.
server 127.0.0.1:11211 handle error response: SERVER_ERROR out of memory storing object
error: response parsing failed.
[RUN #1 17%,   0 secs]  1 threads:   87137 ops,   87213 (avg:   87213) ops/sec, 65.66MB/sec (avg: 65.66MB/sec
[RUN #1 36%,   1 secs]  1 threads:  179012 ops,   91864 (avg:   89540) ops/sec, 69.87MB/sec (avg: 67.76MB/sec
[RUN #1 56%,   2 secs]  1 threads:  279971 ops,  100947 (avg:   93343) ops/sec, 76.76MB/sec (avg: 70.76MB/sec
[RUN #1 75%,   3 secs]  1 threads:  375715 ops,   95732 (avg:   93941) ops/sec, 72.87MB/sec (avg: 71.29MB/sec
[RUN #1 92%,   4 secs]  1 threads:  462054 ops,   93910 (avg:   93935) ops/sec, 71.41MB/sec (avg: 71.31MB/sec
[RUN #1 92%,   4 secs]  1 threads:  462054 ops,   0 (avg:   92431) ops/sec, 0.00KB/sec (avg: 70.17MB/sec)
[RUN #1 92%,   5 secs]  1 threads:  462054 ops,   0 (avg:   90975) ops/sec, 0.00KB/sec (avg: 69.06MB/sec)
[RUN #1 92%,   5 secs]  1 threads:  462054 ops,   0 (avg:   89564) ops/sec, 0.00KB/sec (avg: 67.99MB/sec)

When I use Memaslap, it looks like:

set proportion: set_prop=0.10
get proportion: get_prop=0.90
<12 SERVER_ERROR out of memory storing object
<10 SERVER_ERROR out of memory storing object
<12 SERVER_ERROR out of memory storing object
<7 SERVER_ERROR out of memory storing object

The unmodified Memcached gives these errors less frequently than Memcached with 
my eviction framework (especially under Memtier_Benchmark), and I wonder why. 
I read your reply in the issue linked above, but I am still confused 
about why limiting memory affects Memcached's behavior this way. Could you give 
a more detailed explanation? If I have to run with limited memory, is there a 
way to avoid this issue?
Thank you very much for helping!

Best,
Qingchen
On Tuesday, June 1, 2021 at 2:36:09 AM UTC-4 Dormando wrote:

> try '-o no_lru_crawler' ? That definitely works.
>
> I don't know what you're doing since no code has been provided. The locks
> around managing LRU tails are pretty strict; so make sure you are actually
> using them correctly.
>
> The LRU crawler works by injecting a fake item into the LRU, then using
> that to keep its position and walk. If I had to guess I bet you've
> "evicted" the LRU crawler, which then immediately dies when it tries to
> continue crawling.
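
A minimal sketch of the kind of check the custom eviction path was missing, in case it helps anyone who lands on this thread. The struct below is a simplified stand-in, not memcached's real `struct _stritem`, and `is_crawler` and `pick_victim` are hypothetical names. It assumes the crawler's placeholder can be recognized by having no key and no data, which matches the `it->nkey` of 0 that gdb printed for the crawler item in the original report; check the real source for the exact test. The garbage `prev`/`next` values in that gdb output also decode as printable ASCII, which suggests those fields were overwritten with key or value bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for an LRU item header; only the fields used here. */
typedef struct item {
    struct item *next;   /* toward the tail */
    struct item *prev;   /* toward the head */
    unsigned char nkey;  /* key length; assumed 0 for a crawler placeholder */
    int nbytes;          /* data length; assumed 0 for a crawler placeholder */
} item;

/* Assumption: a crawler placeholder has neither a key nor data. */
static int is_crawler(const item *it) {
    return it->nkey == 0 && it->nbytes == 0;
}

/* Walk from the tail toward the head and return the first real item,
 * skipping crawler placeholders. Returns NULL if no real item is found
 * within max_tries steps. The caller is assumed to hold the LRU lock. */
static item *pick_victim(item *tail, int max_tries) {
    assert(max_tries > 0);
    item *search = tail;
    for (int i = 0; search != NULL && i < max_tries; i++, search = search->prev) {
        if (is_crawler(search))
            continue;    /* never evict or relink the crawler's placeholder */
        return search;
    }
    return NULL;
}
```

The same test would apply when reinserting tail items at the head of the COLD queue: leave placeholders where they are, and do all of it under the LRU lock for that queue.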

Re: SegFault in Crawler Part

2021-05-31 Thread Qingchen Dang
Furthermore, I tried to disable the crawler with the '- no_lru_crawler' 
command-line parameter, and it gives the same error. I wonder why it does not 
disable the LRU crawler as it is supposed to.

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to memcached+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/memcached/1398d377-06b8-4a43-8811-f299d044d055n%40googlegroups.com.


SegFault in Crawler Part

2021-05-30 Thread Qingchen Dang
Hi,

I am implementing a framework based on Memcached, and there is a problem that 
has confused me a lot. The framework changes the eviction policy: 
when it needs to evict an item, it might not evict the tail item of the COLD 
LRU. Instead it looks for a "more suitable" item to evict and 
reinserts the tail items at the head of the COLD queue.

It mostly works fine, but sometimes it causes a SegFault when reinsertion 
happens very frequently (for example, on almost every eviction). The SegFault is 
triggered in the crawler code. As the gdb messages below show, when the crawler 
walks the item queue it reaches an invalid memory address. The bug 
happens after around 5000~1000 GET/SET (9:1) operations. I used 
Memaslap for testing.

Could anyone suggest what might cause this error?

Here are the gdb messages:

Thread 8 "memcached" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x74d6c700 (LWP 36414)]
do_item_crawl_q (it=it@entry=0x5579e7e0 )
at items.c:2015
2015 it->prev->next = it->next;
(gdb) print it->prev
$5 = (struct _stritem *) 0x4f4d6355616d5471
(gdb) print it->prev->next
Cannot access memory at address 0x4f4d6355616d5479
(gdb) print it->next
$6 = (struct _stritem *) 0x7a59324376753351
(gdb) print it->next->prev
Cannot access memory at address 0x7a59324376753361
(gdb) print it->nkey
$7 = 0 '\000'
(gdb)

Here is the part that triggers the error:

2012 assert(it->next != it);
2013 if (it->next) {
2014     assert(it->prev->next == it);
2015     it->prev->next = it->next;
2016     it->next->prev = it->prev;
2017 } else {
2018     /* Tail. Move this above? */
2019     it->prev->next = 0;
2020 }

(I'm also confused about why the assert on line 2014 does not fail.)

Thank you very much for helping!

Best,

Qingchen
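
On the assert question, one likely explanation is that the binary was compiled with `NDEBUG` defined, in which case `assert()` expands to nothing and its condition is never evaluated. I believe memcached's regular build does this and only the `memcached-debug` binary keeps asserts, but treat that as an assumption and check the build flags. It would also fit the trace: with asserts active, evaluating `it->prev->next` on line 2014 would already have faulted, yet gdb reports line 2015. The standalone sketch below (all names hypothetical) shows the difference by counting how often the condition runs:

```c
/* Counts how many times an assert condition is actually evaluated. */
static int checks_evaluated = 0;

static int counted(int cond) {
    checks_evaluated++;
    return cond;
}

#define NDEBUG             /* what a release build effectively does */
#include <assert.h>
static void release_style(int ok) {
    (void)ok;              /* avoid an unused-parameter warning */
    assert(counted(ok));   /* compiled out: counted() is never called */
}

#undef NDEBUG              /* what a debug build effectively does */
#include <assert.h>        /* re-including redefines assert for the new state */
static void debug_style(int ok) {
    assert(counted(ok));   /* evaluated; aborts if ok is 0 */
}
```

Calling `release_style(0)` returns normally and leaves the counter at 0, while `debug_style(0)` would abort the process.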



Re: Memcached Testing Evictions

2021-04-29 Thread Qingchen Dang
Oh I see. Yes that is the case. Thank you very much for helping me out!!

On Thu, Apr 29, 2021 at 2:07 AM dormando wrote:

> Hey,
>
> I don't know what you're doing but I can say there's nothing buffering
> prints. Maybe you're starting memcached as a daemon? Prints won't work in
> that case.
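
To make that concrete, a small sketch of a debug helper, assuming memcached is run in the foreground (that is, without `-d`) so that stderr is still attached to the terminal. `debug_log` is a hypothetical name, not something in the memcached source. It flushes after every call so output is not held in a stdio buffer when the stream is redirected to a file or pipe:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: printf-style write to `out`, flushed immediately.
 * Returns the number of characters written, or a negative value on error. */
static int debug_log(FILE *out, const char *fmt, ...) {
    assert(out != NULL && fmt != NULL);
    va_list ap;
    va_start(ap, fmt);
    int n = vfprintf(out, fmt, ap);
    va_end(ap);
    fflush(out);           /* do not wait for the buffer to fill */
    return n;
}
```

A call such as `debug_log(stderr, "evicting from class %d\n", id);` (where `id` is whatever variable is in scope) can then be dropped into the code path being debugged.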

Re: Memcached Testing Evictions

2021-04-28 Thread Qingchen Dang
Thank you for helping! Yes, I figured out that I was using the wrong 
parameters to load the data, which caused the overwrites. 

But I do have one more question: how can I add print statements to the memcached 
source code? When I use ‘make test’, printf works as usual. But in an actual 
run, the output seems to be buffered or redirected somewhere else, which makes it 
very hard to debug. Is there a way to print to stdout?

Thank you very much!
QD

> On Apr 28, 2021, at 10:33 PM, dormando wrote:
> 
> How're you loading the data?
> 
> From the stats it looks like you're probably overwriting the same values
> over and over (high total_items but low curr_items and no get_expired)


Memcached Testing Evictions

2021-04-28 Thread Qingchen Dang
Hi,

I am trying to test my optimization of Memcached eviction, but even though 
I get a lot of misses, the evictions stat is always 0. 
Here are the stats for the original Memcached without any changes. What 
memcached or benchmark configuration would make evictions happen?

STAT pointer_size 64
STAT rusage_user 110.668826
STAT rusage_system 769.767632
STAT max_connections 1024
STAT curr_connections 2
STAT total_connections 8004
STAT rejected_connections 0
STAT connection_structures 402
STAT response_obj_oom 0
STAT response_obj_count 1
STAT response_obj_bytes 131072
STAT read_buf_count 16
STAT read_buf_bytes 262144
STAT read_buf_bytes_free 114688
STAT read_buf_oom 0
STAT reserved_fds 40
STAT cmd_get 7272
STAT cmd_set 728
STAT cmd_flush 0
STAT get_hits 626512
STAT get_misses 72093488
STAT get_expired 0
STAT get_flushed 0
STAT delete_misses 0
STAT delete_hits 0
STAT incr_misses 0
STAT incr_hits 0
STAT decr_misses 0
STAT decr_hits 0
STAT bytes_read 1809568092
STAT bytes_written 459401348
STAT limit_maxbytes 2097152
STAT accepting_conns 1
STAT listen_disabled_num 0
STAT time_in_listen_disabled_us 0
STAT threads 8
STAT conn_yields 0
STAT hash_power_level 16
STAT hash_bytes 524288
STAT hash_is_expanding 0
STAT slab_reassign_rescues 0
STAT slab_reassign_chunk_rescues 0
STAT slab_reassign_evictions_nomem 0
STAT slab_reassign_inline_reclaim 0
STAT slab_reassign_busy_items 0
STAT slab_reassign_busy_deletes 0
STAT slab_reassign_running 0
STAT slabs_moved 0
STAT lru_crawler_running 0
STAT lru_crawler_starts 4
STAT lru_maintainer_juggles 101778
STAT malloc_fails 0
STAT bytes 94437
STAT curr_items 909
STAT total_items 728
STAT slab_global_page_pool 0
STAT expired_unfetched 0
STAT evicted_unfetched 0
STAT evicted_active 0
STAT evictions 0
STAT reclaimed 0
STAT crawler_reclaimed 0
STAT crawler_items_checked 2732
STAT lrutail_reflocked 12
STAT moves_to_cold 116840
STAT moves_to_warm 3185
STAT moves_within_lru 538
STAT direct_reclaims 0
STAT lru_bumps_dropped 0


Thanks!

QD
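
For a rough sense of scale, the dump above shows `bytes 94437` against `limit_maxbytes 2097152`, so the cache is under 5% full, and roughly speaking nothing needs to be evicted until the stored data approaches the limit. Below is a back-of-the-envelope sketch, using only those posted counters, of how many distinct items of the same average size it would take to get there. `items_to_fill` is a hypothetical helper, and the estimate ignores slab-class rounding and other per-item overhead, so it is an order-of-magnitude figure only:

```c
#include <assert.h>
#include <stdint.h>

/* Rough number of distinct items needed to reach the memory limit, given
 * the average stored item size observed so far (bytes / curr_items).
 * Returns 0 when there is no data to estimate from. */
static uint64_t items_to_fill(uint64_t limit_maxbytes, uint64_t bytes,
                              uint64_t curr_items) {
    if (bytes == 0 || curr_items == 0)
        return 0;
    /* Guard the multiplication below against 64-bit overflow. */
    assert(limit_maxbytes <= UINT64_MAX / curr_items);
    /* limit / (bytes / curr_items), rearranged to stay in integers. */
    return limit_maxbytes * curr_items / bytes;
}
```

With the posted numbers, `items_to_fill(2097152, 94437, 909)` comes to about 20,000 distinct keys, compared with the 909 items currently stored, so a benchmark whose key space is much smaller than that will mostly overwrite existing items instead of forcing evictions.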
