Re: SegFault in Crawler Part

2021-06-01 Thread dormando
You can't evict memory that's being used to load data from the network.
So if you have a low amount of memory and run a benchmark doing a bunch of
parallel writes you're going to be sad.

Re: SegFault in Crawler Part

2021-06-01 Thread Qingchen Dang
Thank you very much! Yes, your guess is correct; I forgot about the
possibility of evicting a crawler item :(

Furthermore, I have a problem similar to the one in this post:
https://github.com/memcached/memcached/issues/467
I gave Memcached a very limited amount of memory to test eviction, and it
causes a similar error.
When I use Memtier_Benchmark, the error looks like:

[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
error: response parsing failed.
error: response parsing failed.
server 127.0.0.1:11211 handle error response: SERVER_ERROR out of memory storing object
error: response parsing failed.
server 127.0.0.1:11211 handle error response: SERVER_ERROR out of memory storing object
error: response parsing failed.
[RUN #1 17%,   0 secs]  1 threads:   87137 ops,   87213 (avg:   87213) ops/sec, 65.66MB/sec (avg: 65.66MB/sec
[RUN #1 36%,   1 secs]  1 threads:  179012 ops,   91864 (avg:   89540) ops/sec, 69.87MB/sec (avg: 67.76MB/sec
[RUN #1 56%,   2 secs]  1 threads:  279971 ops,  100947 (avg:   93343) ops/sec, 76.76MB/sec (avg: 70.76MB/sec
[RUN #1 75%,   3 secs]  1 threads:  375715 ops,   95732 (avg:   93941) ops/sec, 72.87MB/sec (avg: 71.29MB/sec
[RUN #1 92%,   4 secs]  1 threads:  462054 ops,   93910 (avg:   93935) ops/sec, 71.41MB/sec (avg: 71.31MB/sec
[RUN #1 92%,   4 secs]  1 threads:  462054 ops,   0 (avg:   92431) ops/sec, 0.00KB/sec (avg: 70.17MB/sec)
[RUN #1 92%,   5 secs]  1 threads:  462054 ops,   0 (avg:   90975) ops/sec, 0.00KB/sec (avg: 69.06MB/sec)
[RUN #1 92%,   5 secs]  1 threads:  462054 ops,   0 (avg:   89564) ops/sec, 0.00KB/sec (avg: 67.99MB/sec)

When I use Memaslap, it looks like:

set proportion: set_prop=0.10
get proportion: get_prop=0.90
<12 SERVER_ERROR out of memory storing object
<10 SERVER_ERROR out of memory storing object
<12 SERVER_ERROR out of memory storing object
<7 SERVER_ERROR out of memory storing object

The unmodified Memcached gives these errors less frequently than Memcached
with my eviction framework (especially with Memtier_Benchmark), so I wonder
what the reason is. I read your message in the post linked above, but I am
still confused about why a memory limit affects Memcached's usage this way.
Could you give a more detailed explanation? If I have to run with limited
memory, is there a way to avoid this issue?
Thank you very much for helping!

Best,
Qingchen

Re: SegFault in Crawler Part

2021-06-01 Thread dormando
try '-o no_lru_crawler' ? That definitely works.

I don't know what you're doing since no code has been provided. The locks
around managing LRU tails are pretty strict, so make sure you are actually
using them correctly.

The LRU crawler works by injecting a fake item into the LRU, then using
that to keep its position and walk. If I had to guess I bet you've
"evicted" the LRU crawler, which then immediately dies when it tries to
continue crawling.

On Mon, 31 May 2021, Qingchen Dang wrote:

> Furthermore, I tried to disable the crawler with the '- no_lru_crawler'
> command parameter, and it gives the same error. I wonder why it does not
> disable the LRU crawler as it is supposed to.
>
> On Monday, May 31, 2021 at 1:02:38 AM UTC-4 Qingchen Dang wrote:
> Hi,
> I am implementing a framework based on Memcached. There's a problem that
> has confused me a lot. The framework basically changes the eviction policy,
> so when it is called to evict an item, it might not evict the tail item of
> the COLD LRU; instead it will look for a "more suitable" item to evict and
> reinsert the tail items at the head of the COLD queue.
>
> It mostly works fine, but sometimes it causes a SegFault when reinsertion
> happens very frequently (on almost every eviction). The SegFault is
> triggered in the crawler part. As attached, it seems that when the crawler
> loops through the item queue, it reaches an invalid memory address. The bug
> happens after around 5000~1000 GET/SET (9:1) operations. I used
> Memaslap for testing.
>
> Could anyone suggest what might cause such an error?
>
> Here are the gdb messages:
>
> Thread 8 "memcached" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x74d6c700 (LWP 36414)]
> do_item_crawl_q (it=it@entry=0x5579e7e0 )
>     at items.c:2015
> 2015             it->prev->next = it->next;
> (gdb) print it->prev
> $5 = (struct _stritem *) 0x4f4d6355616d5471
> (gdb) print it->prev->next
> Cannot access memory at address 0x4f4d6355616d5479
> (gdb) print it->next
> $6 = (struct _stritem *) 0x7a59324376753351
> (gdb) print it->next->prev
> Cannot access memory at address 0x7a59324376753361
> (gdb) print it->nkey
> $7 = 0 '\000'
> (gdb)
>
> Here is the part that triggers the error:
>
> 2012         assert(it->next != it);
> 2013         if (it->next) {
> 2014             assert(it->prev->next == it);
> 2015             it->prev->next = it->next;
> 2016             it->next->prev = it->prev;
> 2017         } else {
> 2018             /* Tail. Move this above? */
> 2019             it->prev->next = 0;
> 2020         }
>
> (I'm also confused about why the assert on line 2014 does not raise an error.)
>
> Thank you very much for helping!
>
> Best,
>
> Qingchen
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups 
> "memcached" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to memcached+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/memcached/1398d377-06b8-4a43-8811-f299d044d055n%40googlegroups.com.
>
>
