Re: evicted_time

2016-11-14 Thread dormando
On Mon, 14 Nov 2016, Bill Moseley wrote:

> Thanks for the response.
>
> Sorry for all the questions. I'm trying to fully understand before upgrading.
> On Sun, Nov 13, 2016 at 11:52 PM, dormando <dorma...@rydia.net> wrote:
>
>   >  *  If all items in the slab are reclaimed then "stats items" no longer
>   >     shows that slab. I wonder if that is correct because we can no longer
>   >     see stats on evicted items for that slab.
>
>   It's always been that way... just never fixed/changed it I guess?
>
>
> I guess in older versions I never saw it disappear since we always did a 
> "set" which would then evict/replace an item and the slab would never be 
> empty.   With rebalancing it's
> possible for all items to be removed from the slab.
> (But the counters like "evicted" are still maintained.)
>
>
> It doesn't appear that rebalancing works within a slab, correct?
>
> That is, if I run with "-m 5 -o modern" and fill a 100k-item slab it will say 
> "total_pages 5".  If all the items expire in 60 seconds then after 
> rebalancing it will then show
> "total_pages 2" and "slab_global_page_pool 3".
>
> If I do the same thing again, but this time I alternate the items' expirations
> between 60 and 0 and wait 60 seconds for the rebalance to remove items, it
> still shows "total_pages 5" and "slab_global_page_pool 0".
>
> The rebalancing won't shuffle items from one page to another page in a given 
> slab to fully free up a page, correct?
>
> In other words, if there's a single item with an expiration of zero (never
> expires) stored in a page, then that page won't ever be freed back to the
> slab_global_page_pool (unless that item is forcibly evicted).
>
> Am I understanding that correctly?

No, it does make a best-effort attempt to shuffle memory so it can be
freed. You probably need to test with more than 5 slabs though. It will
take free chunks that exist outside of the slab page it currently wants to
free, and move valid items into different areas of memory. It'll shuffle
for a bit and then free up that slab page if it can.

It just might require a bit more memory in active use (and then freed up via
expiry) before it's able to do that dance safely.
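
To make that dance concrete, here is a toy, self-contained sketch (not
memcached's actual code; every structure and number in it is made up): live
items in the page being reclaimed get copied into free chunks that live in
other pages of the same class, and only then is the emptied page handed back
to the global pool.

    /* Toy simulation of best-effort page reclaim; all names and sizes are
     * invented for illustration. */
    #include <stdio.h>
    #include <string.h>

    #define PAGES           4
    #define CHUNKS_PER_PAGE 8

    struct chunk { int used; char data[32]; };
    struct page  { struct chunk chunks[CHUNKS_PER_PAGE]; };

    static struct page pages[PAGES];
    static int global_pool = 0;             /* pages returned for reassignment */

    /* find a free chunk anywhere except the victim page */
    static struct chunk *free_chunk_outside(int victim) {
        for (int p = 0; p < PAGES; p++) {
            if (p == victim)
                continue;
            for (int c = 0; c < CHUNKS_PER_PAGE; c++)
                if (!pages[p].chunks[c].used)
                    return &pages[p].chunks[c];
        }
        return NULL;
    }

    /* best-effort: move every live item out of the victim page, then free it */
    static int try_free_page(int victim) {
        for (int c = 0; c < CHUNKS_PER_PAGE; c++) {
            struct chunk *src = &pages[victim].chunks[c];
            if (!src->used)
                continue;                    /* free/expired chunk: nothing to move */
            struct chunk *dst = free_chunk_outside(victim);
            if (dst == NULL)
                return 0;                    /* not enough free memory elsewhere */
            memcpy(dst->data, src->data, sizeof(src->data));
            dst->used = 1;
            src->used = 0;                   /* real code would also fix hash/LRU links */
        }
        global_pool++;                       /* page is empty; hand it back */
        return 1;
    }

    int main(void) {
        /* half-fill every page so there is room to shuffle into */
        for (int p = 0; p < PAGES; p++)
            for (int c = 0; c < CHUNKS_PER_PAGE; c += 2)
                pages[p].chunks[c].used = 1;

        printf("freed page 0: %s, global pool now %d\n",
               try_free_page(0) ? "yes" : "no", global_pool);
        return 0;
    }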

>
> And yes, we do sometimes store items > 1mb.  Currently we are not using -I 
> and do the chunking into parts (and maybe gzipping) in the client before 
> storing.  So, we store one
> or more 1mb items and then the remainder in another chunk based on its size.
>
> If we use the new large item support, all of the split-up parts of the item we
> are storing end up in the same chunk. Does that mean a single overflow
> (remainder) byte would use up another full chunk size? (524288 bytes?)

Yup. It's on average more efficient than the old method, by quite a lot.
Once I get some time to refactor the first pass feature it should be able
to "cap" an item with a chunk from a smaller slab class. (so 512k + 100
bytes or something)

My goal then would be to lower the chunk max to somewhere in the 16-64k range,
which should compact the smaller slab classes closer together, and
generally raise memory efficiency. No timeline for this, sadly.
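
For a rough sense of the difference, here is some illustrative arithmetic
(assumed numbers only, not memcached structures) comparing today's behavior
with the planned cap chunk:

    /* Illustrative arithmetic only: with a 512KB chunk max, an item one
     * hundred bytes past a chunk boundary currently consumes a whole extra
     * max-size chunk; a "cap" chunk from a smaller class would trim the tail. */
    #include <stdio.h>

    int main(void) {
        const unsigned int chunk_max = 512 * 1024;        /* largest chunk size       */
        const unsigned int item_size = chunk_max + 100;   /* 512KB + 100 bytes        */
        const unsigned int cap_class = 128;               /* hypothetical small class */

        unsigned int full_chunks = item_size / chunk_max;
        unsigned int remainder   = item_size % chunk_max;

        /* today: the 100-byte remainder still occupies a full 512KB chunk */
        unsigned int used_now    = (full_chunks + (remainder ? 1 : 0)) * chunk_max;

        /* with a cap chunk: the remainder fits in a chunk from a smaller class */
        unsigned int used_capped = full_chunks * chunk_max + (remainder ? cap_class : 0);

        printf("current: %u bytes, with cap chunk: %u bytes\n", used_now, used_capped);
        return 0;
    }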

>
> Thanks,
>
> --
> Bill Moseley
> mose...@hank.org
>



Re: evicted_time

2016-11-13 Thread dormando
Hey,

Sorry; in the next release that should be in -h at least. The easiest way
is to just start it with and without the option and diff the output of `stats
settings` between the two.

On Sun, 13 Nov 2016, Bill Moseley wrote:

> ok, I found the "modern" option by looking at the code.
> In a few simple tests I can see slab memory is reclaimed.   From what I see:
>  *  Memcached will reclaim memory in the slab, but will leave two allocated 
> pages in the slab.
>
>  *  That freed memory can then be used for pages in other slabs.
>
>  *  If all items in the slab are reclaimed then "stats items" no longer shows 
> that slab. I wonder if that is correct because we can no longer see stats on 
> evicted items for
> that slab.

It's always been that way... just never fixed/changed it I guess?

>  *  If more memory than specified with the -m command line argument was
> malloced, then memory seems to be given back to the system after a reclaim
> (down to the -m setting).

There's a "cache_memlimit" command that allows you to increase or decrease
the memory limit at runtime. that's how the command works.
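
A minimal sketch of driving that from code, assuming a local server on the
default port and that the argument is in megabytes (check doc/protocol.txt for
the exact syntax; a successful change should come back as "OK"):

    /* Send "cache_memlimit 2048" over the text protocol; host, port, and the
     * 2048 MB value are assumptions for the example. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in sa;
        memset(&sa, 0, sizeof(sa));
        sa.sin_family = AF_INET;
        sa.sin_port   = htons(11211);                   /* assumed default port */
        inet_pton(AF_INET, "127.0.0.1", &sa.sin_addr);  /* assumed local server */

        if (fd < 0 || connect(fd, (struct sockaddr *)&sa, sizeof(sa)) != 0) {
            perror("connect");
            return 1;
        }

        const char *cmd = "cache_memlimit 2048\r\n";    /* raise the limit to 2048 MB */
        if (write(fd, cmd, strlen(cmd)) < 0)
            perror("write");

        char buf[64];
        ssize_t n = read(fd, buf, sizeof(buf) - 1);     /* expect "OK" on success */
        if (n > 0) {
            buf[n] = '\0';
            printf("%s", buf);
        }
        close(fd);
        return 0;
    }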

>  *  Is there a good way to see how much memory has been reclaimed (and thus 
> available to other slabs)?

| slab_global_page_pool | 32u | Slab pages returned to global pool for |
|                       |     | reassignment to other slab classes.    |

> Is the best way to subtract "bytes" from "total_malloced"?   i.e. to 
> check how much "free" memory there is available for the cache.
> Does the reclaim algorithm move items within the slab?  That is, will it 
> reshuffle items from one page to another page so that the page can be 
> reclaimed?  Or do all items in a
> given page first need to expire before it can be reclaimed?

See above.

To answer your earlier questions: -I 4m should work a lot better with the
new code. It doesn't screw up the smaller slab classes.

>
> On Fri, Nov 11, 2016 at 10:44 PM, Bill Moseley <mose...@hank.org> wrote:
>
>
>   On Thu, Oct 13, 2016 at 11:32 PM, dormando <dorma...@rydia.net> wrote:
> I think this is fixed in 1.4.32. Been broken for a very long time
> unfortunately.
>
>
> Finally getting back to this.  I'm testing with 1.4.33.  Looks good, thanks.
>
>
> I'm trying to understand all the new options.  What does -o modern include?
>
> - modern: Enables 'modern' defaults. See release notes (highly recommended!).
>
> Which release notes should I be looking at?
>
> Or rather, can you provide guidance on what options to use?  We are using
> Memcached as a general cache so our value sizes and set frequency vary
> quite a bit.  I suspect being able to reallocate slab space would be helpful
> in our case.
>
> We also have a need to store some larger (>1MB) items, too. (for example, -I 
> 4m)
>
> Thanks,
>
>  
>
>   On Sun, 25 Sep 2016, Bill Moseley wrote:
>
>   > If I understand the documentation correctly, evicted_time should show
> the seconds since the last access for the most recently evicted item.
>   >
>   > I'm seeing evicted_time for our production servers, but I'm not able 
> to have a value show up when testing.
>   >
>   > I have a slab class that can hold 10 items of the value I'm writing.
>   >
>   > I assume I would see an evicted_time doing this:
>   >  1. write a key and read it back
>   >  2. sleep 5 seconds.
>   >  3. write and read 10 random keys into the slab (quickly)
>   >  4. Notice that there has now been one eviction
>   >  5. read from the original key and get NOT FOUND
>   > I then see:
>   >
>   >       STAT items:32:evicted 1
>   > STAT items:32:evicted_nonzero 0
>   > STAT items:32:evicted_time 0
>   >
>   >
>   > Shouldn't I see evicted_time of 5 seconds since the item evicted was 
> last accessed about 5 seconds before it got evicted?
>   >
>   > I get the same result if I use no expires times or If I set it to, 
> say, 30 seconds.
>   >
>   > VERSION 1.4.25
>   >
>   >
>   > --
>   > Bill Moseley
>   > mose...@hank.org
>   >

Re: evicted_time

2016-10-14 Thread dormando
I think this is fixed in 1.4.32. Been broken for a very long time
unfortunately.

On Sun, 25 Sep 2016, Bill Moseley wrote:

> If I understand the documentation correctly, evicted_time should show the
> seconds since the last access for the most recently evicted item.
>
> I'm seeing evicted_time for our production servers, but I'm not able to have 
> a value show up when testing.
>
> I have a slab class that can hold 10 items of the value I'm writing.
>
> I assume I would see an evicted_time doing this:
>  1. write a key and read it back
>  2. sleep 5 seconds.
>  3. write and read 10 random keys into the slab (quickly)
>  4. Notice that there has now been one eviction
>  5. read from the original key and get NOT FOUND
> I then see:
>
>   STAT items:32:evicted 1
> STAT items:32:evicted_nonzero 0
> STAT items:32:evicted_time 0
>
>
> Shouldn't I see evicted_time of 5 seconds since the item evicted was last 
> accessed about 5 seconds before it got evicted?
>
> I get the same result if I use no expires times or If I set it to, say, 30 
> seconds.
>
> VERSION 1.4.25
>
>
> --
> Bill Moseley
> mose...@hank.org
>



Re: Question on item_chunk

2016-10-14 Thread dormando
>
>
> On Monday, October 3, 2016 at 9:17:00 PM UTC-4, Dormando wrote:
>
>
>   On Mon, 3 Oct 2016, yuantao peng wrote:
>
>   > Hi,  --    I am reading memcached source code and got a question on 
> this function:  do_slabs_alloc_chunked,   it is called by do_slabs_alloc if 
> the request size
>   is larger than
>   > the slabclass size.  I am curious why we don't just move to a 
> slabclass with larger size instead?  Also, I am not sure I understand the way 
> it calculated the
>   number of chunks
>   > needed:
>
>   That's only used if it's already at the highest slab class. It chains
>   multiple chunks of the highest possible class together.
>
>   >     int csize = p->size - sizeof(item_chunk);
>   >     unsigned int chunks_req = size / csize;
>   >     if (size % csize != 0)
>   >         chunks_req++;
>   >
>   >
>   > later on, we store the first chunk in do_item_alloc as follows:
>   >
>   > if (it->it_flags & ITEM_CHUNKED) {
>   >         item_chunk *chunk = (item_chunk *) ITEM_data(it);
>   >
>   >         chunk->next = (item_chunk *) it->h_next;
>   >         chunk->prev = 0;
>   >         chunk->head = it;
>   >         /* Need to chain back into the head's chunk */
>   >         chunk->next->prev = chunk;
>   >         chunk->size = chunk->next->size - ((char *)chunk - (char 
> *)it);
>   >         chunk->used = 0;
>   >         assert(chunk->size > 0);
>   >     }
>   >
>   > That means the first item has an item_chunk header next to the item
>   > header, so the csize calculation is wrong, isn't it?  Say if chunks_req
>   > equals 2, we will actually need 3 chunks, because the first chunk will
>   > have more metadata (item+item_chunk). Am I missing something here?
>
>   Wish I didn't have to do that, it is pretty confusing. I can double 
> check
>   but the size of the header is added and removed in the appropriate 
> places
>   and it's gotten quite a lot of testing.
>
>   That particular section you're looking at is setting the initial chunk
>   size to the potential size of the chunk minus the offset of the current
>   chunk into the overall item. So that should take into account the entire
>   rendered metadata.
>
>
>  It is indeed quite confusing. :)   Say the highest slab class has a size of
> 2000 bytes, and sizeof(item) and sizeof(item_chunk) are both 40 bytes. If I
> call slab_alloc with ntotal=3920 bytes, then since
> csize = p->size - sizeof(item_chunk), csize would be 2000-40=1960, and
> chunks_req=3920/1960=2.
>
>  the following code in do_item_alloc will set the first chunk size to 
> (2000-40)=1960,  
>    chunk->size = chunk->next->size - ((char *)chunk - (char *)it);
>
> but the first chunk is set to start at the location ITEM_data(it) as follows,
> which is after the struct item metadata. Shouldn't the chunk size here
> actually be smaller than 1960, because the item_chunk metadata of the first
> chunk will also consume some space?
>
> item_chunk *chunk = (item_chunk *) ITEM_data(it);
>   

Did you have a chance to play with the tests yourself? I didn't have
enough time for the last bugfix release to look into it myself.



1.4.32 bugfix release

2016-10-12 Thread dormando
https://github.com/memcached/memcached/wiki/ReleaseNotes1432

didn't quite get everything I wanted in there... but the LRU fix is pretty
significant.

-Dormando



Re: Question on item_chunk

2016-10-05 Thread dormando
>
>
> On Tuesday, October 4, 2016 at 12:31:52 AM UTC-4, Dormando wrote:
>   > >
>   >
>   > I'll need to check more carefully. If that's true, the tests should 
> show
>   > data corruption (and they did a few times during development). Take a 
> look
>   > at the tests for the chunked item support?
>   >
>   > IE: If I allocate a new page to a slab class, the sequential bytes get
>   > chopped up into a linked list. The first item chunk of a fresh boot 
> will
>   > naturally get linked to the next contiguous chunk of memory.
>   >
>   > So if you boot up a new server, write a random pattern, and the first
>   > chunk is offset, it should overwrite the header of the next one if 
> what
> you said is true. That should lead to a crash, or incorrect results, or
> or
>   > etc. A few bytes of the chunk can be shifted due to alignment but 
> being
>   > off by an entire header is tougher.
>   >
>   > I also ran the code in a 12 hour torture test
>   > setting/unsetting/overwriting while moving slab classes at the same 
> time.
>   >
>   > but yes, it's written as a layer violation. my intent was to come 
> back the
>   > week after and refactor it more cleanly but I haven't done that yet. 
> I'll
>   > try to look at this soon but I have a few pressing bugs to cut a 
> release
>   > for.
>
>   That all said; are you looking into a particular bug or weirdness or
>   anything? What's gotten you into this?
>
>  Oh, I was just reading memcached for fun :)   I have been using memcached
> for multiple projects, but never really got time to take a look at the
> implementation.

Ah cool. Sorry I didn't go back and clean that up yet :( Been trying to be
good about not leaving things that way.

Anyway; hopefully I can double check that soon, but feel free to play with
the tests. Or change the code and re-run the tests/benchmarks/etc.



Re: Question on item_chunk

2016-10-03 Thread dormando
> >
>
> I'll need to check more carefully. If that's true, the tests should show
> data corruption (and they did a few times during development). Take a look
> at the tests for the chunked item support?
>
> IE: If I allocate a new page to a slab class, the sequential bytes get
> chopped up into a linked list. The first item chunk of a fresh boot will
> naturally get linked to the next contiguous chunk of memory.
>
> So if you boot up a new server, write a random pattern, and the first
> chunk is offset, it should overwrite the header of the next one if what
you said is true. That should lead to a crash, or incorrect results, or
> etc. A few bytes of the chunk can be shifted due to alignment but being
> off by an entire header is tougher.
>
> I also ran the code in a 12 hour torture test
> setting/unsetting/overwriting while moving slab classes at the same time.
>
> but yes, it's written as a layer violation. my intent was to come back the
> week after and refactor it more cleanly but I haven't done that yet. I'll
> try to look at this soon but I have a few pressing bugs to cut a release
> for.

That all said; are you looking into a particular bug or weirdness or
anything? What's gotten you into this?

-Dormando



Re: Question on item_chunk

2016-10-03 Thread dormando


On Mon, 3 Oct 2016, yuantao peng wrote:

> Hi,  --    I am reading memcached source code and got a question on this 
> function:  do_slabs_alloc_chunked,   it is called by do_slabs_alloc if the 
> request size is larger than
> the slabclass size.  I am curious why we don't just move to a slabclass with 
> larger size instead?  Also, I am not sure I understand the way it calculated 
> the number of chunks
> needed:

That's only used if it's already at the highest slab class. It chains
multiple chunks of the highest possible class together.
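
As a quick sanity check of the arithmetic in the snippet quoted below, here is
the same calculation with assumed sizes (a 2000-byte largest class and a
40-byte item_chunk header, matching the numbers used later in this thread):

    /* Worked example of the chunk-count math; the sizes are assumptions, not
     * the real struct sizes. */
    #include <stdio.h>

    int main(void) {
        unsigned int p_size = 2000;              /* assumed largest slab class size */
        unsigned int hdr    = 40;                /* assumed sizeof(item_chunk)      */
        unsigned int size   = 3920;              /* requested allocation            */

        unsigned int csize = p_size - hdr;       /* usable bytes per chunk: 1960    */
        unsigned int chunks_req = size / csize;  /* 3920 / 1960 = 2                 */
        if (size % csize != 0)
            chunks_req++;                        /* no remainder here, stays at 2   */

        printf("csize=%u chunks_req=%u\n", csize, chunks_req);
        return 0;
    }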

>     int csize = p->size - sizeof(item_chunk);
>     unsigned int chunks_req = size / csize;
>     if (size % csize != 0)
>         chunks_req++;
>
>
> later on, we store the first chunk in do_item_alloc as follows:
>
> if (it->it_flags & ITEM_CHUNKED) {
>         item_chunk *chunk = (item_chunk *) ITEM_data(it);
>
>         chunk->next = (item_chunk *) it->h_next;
>         chunk->prev = 0;
>         chunk->head = it;
>         /* Need to chain back into the head's chunk */
>         chunk->next->prev = chunk;
>         chunk->size = chunk->next->size - ((char *)chunk - (char *)it);
>         chunk->used = 0;
>         assert(chunk->size > 0);
>     }
>
> That means the first item has an item_chunk header next to the item header,
> so the csize calculation is wrong, isn't it?  Say if chunks_req equals 2, we
> will actually need 3
> chunks, because the first chunk will have more metadata (item+item_chunk).
> Am I missing something here?

Wish I didn't have to do that, it is pretty confusing. I can double check
but the size of the header is added and removed in the appropriate places
and it's gotten quite a lot of testing.

That particular section you're looking at is setting the initial chunk
size to the potential size of the chunk minus the offset of the current chunk
into the overall item. So that should take
into account the entire rendered metadata.
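
A hedged numeric sketch of what that expression works out to, using assumed
offsets rather than the real struct layout:

    /* Evaluate chunk->size = chunk->next->size - ((char *)chunk - (char *)it)
     * with made-up numbers: a 1960-byte next chunk and a 48-byte gap between
     * the item header and ITEM_data(it). */
    #include <stdio.h>

    int main(void) {
        unsigned int next_chunk_size = 1960;   /* assumed chunk->next->size          */
        unsigned int data_offset     = 48;     /* assumed (char *)chunk - (char *)it */

        unsigned int first_chunk_size = next_chunk_size - data_offset;

        printf("first chunk usable size: %u bytes\n", first_chunk_size);  /* 1912 */
        return 0;
    }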

> thanks
> yuantao
>



Re: Memcached per Item Maximum size.

2016-09-22 Thread dormando
Hi,

Unfortunately there is no supported version of memcached that runs on
Windows.

On Wed, 21 Sep 2016, Ajinkya Aher wrote:

> Hii,
>
> I am using Memcached for Windows. Can anyone tell me the maximum size of an
> item that can be stored in Memcached? When I use the "-I 129M" option it
> throws an error saying the maximum size that can be used per item is 128MB.
>



Re: libmemcached: Retrieving array of key-value pairs using wildcards

2016-08-31 Thread dormando
You cannot, no. You can only fetch exact keys.

On Tue, 30 Aug 2016, Sonia wrote:

> Hello,
>
> Is there any functionality to retrieve an array of key-value pairs in
> libmemcached using wildcard characters? I have read about the memcached_get()
> function, but there is no mention of whether we can use wildcards or not.
>



Re: Clarification on Hardware Requirements and Performance

2016-08-27 Thread dormando
Thanks!

You can reallocate pages in newer versions. As I said before, `-o
modern` on the latest ones does that automatically so long as you have
some free space for it to work with (the lru_crawler and using sane TTLs
help there). Otherwise you can use the manual controls listed in
doc/protocol.txt and your own external process.
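
A toy sketch of what such an external process might decide (not a working
rebalancer; the sample stats are invented, and the exact command syntax should
be checked against doc/protocol.txt):

    /* Pick the class with the most evictions as the destination and the class
     * with the most free chunks as the source, then print the "slabs reassign"
     * command that would be sent over the text protocol. */
    #include <stdio.h>

    int main(void) {
        /* pretend these came from "stats items" / "stats slabs" */
        const int  class_id[]    = {     5,   9,   21,   32 };
        const long evictions[]   = {     0,  12,    0, 9001 };
        const long free_chunks[] = { 80000,  10,   50,    3 };
        const int  n = 4;

        int src = 0, dst = 0;
        for (int i = 1; i < n; i++) {
            if (evictions[i]   > evictions[dst])   dst = i;  /* starving class   */
            if (free_chunks[i] > free_chunks[src]) src = i;  /* over-provisioned */
        }

        if (src != dst)
            printf("slabs reassign %d %d\n", class_id[src], class_id[dst]);
        return 0;
    }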

On Sat, 27 Aug 2016, Joseph Grasser wrote:

> Hey Guys, thank you so much for everything
> This community is awesome; the product is awesome; everyone here is awesome; 
> and most of all everyone on this thread is awesome! 
>
> On Sat, Aug 27, 2016 at 3:05 PM, Joseph Grasser <jgrasser@gmail.com> 
> wrote:
>   So, when I compare the total pages with the unfetched evictions I do 
> notice skew. We should probably reallocate the pages to better fit our usage 
> pattern. 
>
> On Sat, Aug 27, 2016 at 2:46 PM, dormando <dorma...@rydia.net> wrote:
>   Probably.
>
>   Look through `stats slabs` and `stats items` to see if evictions skew
>   toward slabs without enough pages in it, that sort of thing. That's all
>   fixed (or improved) in more recent versions (with enough feature flags
>   enabled)
>
>   you can also telnet in and run `watch evictions` to get a stream of 
> what's
>   being evicted. Look for patterns or bug developers about it.
>
>   On Sat, 27 Aug 2016, Joseph Grasser wrote:
>
>   > echo "stats" shows the following : cmd_set 3,000,000,000
>   > evicted_unfetched 2,800,000,000
>   > evictions 2,900,000,000
>   >
>   > This looks super abusive to me. What, is that 6% utilization of data 
> in cache? 
>   >
>   > On Sat, Aug 27, 2016 at 1:35 PM, dormando <dorma...@rydia.net> wrote:
>   >       You could comb through stats looking for things like 
> evicted_unfetched,
>   >       unbalanced slab classes, etc.
>   >
>   >       1.4.31 with `-o modern` can either make a huge improvement in 
> memory
>   >       efficiency or a marginal one. I'm unaware of it being worse.
>   >
>   >       Just something to consider if cost is your concern.
>   >
>   >       On Sat, 27 Aug 2016, Joseph Grasser wrote:
>   >
>   >       > We are running 1.4.13 on wheezy. 
>   >       > In the environment I am looking at there is positive 
> correlation between gets and puts. The ratio is something like 10 Gets : 15 
> Puts. The eviction
>   spikes are
>   >       also occurring
>   >       > at peak put times ( which kind of makes sense with the mem 
> pressure ). I think the application is some kind of report generation tool - 
> it's hard
>   to say, my
>   >       visibility into
>   >       > the team stuff is pretty low right now as I am a new hire. 
>   >       >
>   >       > On Sat, Aug 27, 2016 at 12:34 PM, dormando 
> <dorma...@rydia.net> wrote:
>   >       >       What version are you on and what're your startup 
> options, out of
>   >       >       curiosity?
>   >       >
>   >       >       A lot of the more recent features can help with memory 
> efficiency, for
>   >       >       what it's worth.
>   >       >
>   >       >       On Sat, 27 Aug 2016, Joseph Grasser wrote:
>   >       >
>   >       >       >
>   >       >       > No problem, I'm trying cut down on cost. We're 
> currently using a dedicated model which works for us on a technical level but 
> is expensive
>   (within budget
>   >       but still
>   >       >       expensive).
>   >       >       >
>   >       >       > We are experiencing weird spikes in evictions but I 
> think that is the result of developers abusing the service.
>   >       >       >
>   >       >       > Tbh I don't know what to make of the evictions yet. 
> I'm going to dig into it on Monday though.
>   >       >       >
>   >       >       >
>   >       >       > On Aug 27, 2016 1:55 AM, "Ripduman Sohan" 
> <ripduman.so...@gmail.com> wrote:
>   >       >       >
>   >       >       >             On Aug 27, 2016 1:46 AM, "dormando" 
> <dorma...@rydia.net> wrote:
>   >       >       >                   >
>   >       >       >                   > Thank you for the tips guys!
>   >       >       >                   >
>   >       >       >                   >

Re: Clarification on Hardware Requirements and Performance

2016-08-27 Thread dormando
Probably.

Look through `stats slabs` and `stats items` to see if evictions skew
toward slabs without enough pages in it, that sort of thing. That's all
fixed (or improved) in more recent versions (with enough feature flags
enabled)

You can also telnet in and run `watch evictions` to get a stream of what's
being evicted. Look for patterns or bug developers about it.

On Sat, 27 Aug 2016, Joseph Grasser wrote:

> echo "stats" shows the following : cmd_set 3,000,000,000
> evicted_unfetched 2,800,000,000
> evictions 2,900,000,000
>
> This looks super abusive to me. What, is that 6% utilization of data in 
> cache? 
>
> On Sat, Aug 27, 2016 at 1:35 PM, dormando <dorma...@rydia.net> wrote:
>   You could comb through stats looking for things like evicted_unfetched,
>   unbalanced slab classes, etc.
>
>   1.4.31 with `-o modern` can either make a huge improvement in memory
>   efficiency or a marginal one. I'm unaware of it being worse.
>
>   Just something to consider if cost is your concern.
>
>   On Sat, 27 Aug 2016, Joseph Grasser wrote:
>
>   > We are running 1.4.13 on wheezy. 
>   > In the environment I am looking at there is positive correlation 
> between gets and puts. The ratio is something like 10 Gets : 15 Puts. The 
> eviction spikes are
>   also occurring
>   > at peak put times ( which kind of makes sense with the mem pressure 
> ). I think the application is some kind of report generation tool - it's hard 
> to say, my
>   visibility into
>   > the team stuff is pretty low right now as I am a new hire. 
>   >
>   > On Sat, Aug 27, 2016 at 12:34 PM, dormando <dorma...@rydia.net> wrote:
>   >       What version are you on and what're your startup options, out of
>   >       curiosity?
>   >
>   >       A lot of the more recent features can help with memory 
> efficiency, for
>   >       what it's worth.
>   >
>   >       On Sat, 27 Aug 2016, Joseph Grasser wrote:
>   >
>   >       >
>   >       > No problem, I'm trying cut down on cost. We're currently 
> using a dedicated model which works for us on a technical level but is 
> expensive (within budget
>   but still
>   >       expensive).
>   >       >
>   >       > We are experiencing weird spikes in evictions but I think 
> that is the result of developers abusing the service.
>   >       >
>   >       > Tbh I don't know what to make of the evictions yet. I'm going 
> to dig into it on Monday though.
>   >       >
>   >       >
>   >       > On Aug 27, 2016 1:55 AM, "Ripduman Sohan" 
> <ripduman.so...@gmail.com> wrote:
>   >       >
>   >       >             On Aug 27, 2016 1:46 AM, "dormando" 
> <dorma...@rydia.net> wrote:
>   >       >                   >
>   >       >                   > Thank you for the tips guys!
>   >       >                   >
>   >       >                   > The limiting factor for us is actually 
> memory utilization. We are using the default configuration on sizable ec2 
> nodes and pulling
>   only
>   >       >                   like 20k qps per node. Which is fine
>   >       >                   > because we need to shard the key set over 
> x servers to handle the mem req (30G) per server.
>   >       >                   >
>   >       >                   > I should have looked into that before 
> posting.
>   >       >                   >
>   >       >                   > I am really curious about network 
> saturation though. 200k gets at 1mb per get is a lot of traffic... how can 
> you hit that mark without
>   >       >                   saturation?
>   >       >
>   >       >                   Most people's keys are a lot smaller. In 
> multiget tests with 40 byte keys
>   >       >                   I can pull 20 million+ keys/sec out of the 
> server. probably less than
>   >       >                   10gbps at that rate too. Tends to cap 
> between 600k and 800k/s if you need
>   >       >                   to do a full roundtrip per key fetch. 
> limited by the NIC. Lots of tuning
>   >       >                   required to get around that.
>   >       >
>   >       >
>   >       > I think (but may be wrong) the 200K TPS result is ba

Re: Clarification on Hardware Requirements and Performance

2016-08-27 Thread dormando
You could comb through stats looking for things like evicted_unfetched,
unbalanced slab classes, etc.

1.4.31 with `-o modern` can either make a huge improvement in memory
efficiency or a marginal one. I'm unaware of it being worse.

Just something to consider if cost is your concern.

On Sat, 27 Aug 2016, Joseph Grasser wrote:

> We are running 1.4.13 on wheezy. 
> In the environment I am looking at there is positive correlation between gets 
> and puts. The ratio is something like 10 Gets : 15 Puts. The eviction spikes 
> are also occurring
> at peak put times ( which kind of makes sense with the mem pressure ). I 
> think the application is some kind of report generation tool - it's hard to 
> say, my visibility into
> the team stuff is pretty low right now as I am a new hire. 
>
> On Sat, Aug 27, 2016 at 12:34 PM, dormando <dorma...@rydia.net> wrote:
>   What version are you on and what're your startup options, out of
>   curiosity?
>
>   A lot of the more recent features can help with memory efficiency, for
>   what it's worth.
>
>   On Sat, 27 Aug 2016, Joseph Grasser wrote:
>
>   >
>   > No problem, I'm trying cut down on cost. We're currently using a 
> dedicated model which works for us on a technical level but is expensive 
> (within budget but still
>   expensive).
>   >
>   > We are experiencing weird spikes in evictions but I think that is the 
> result of developers abusing the service.
>   >
>   > Tbh I don't know what to make of the evictions yet. I'm going to dig 
> into it on Monday though.
>   >
>   >
>   > On Aug 27, 2016 1:55 AM, "Ripduman Sohan" <ripduman.so...@gmail.com> 
> wrote:
>   >
>   >             On Aug 27, 2016 1:46 AM, "dormando" <dorma...@rydia.net> 
> wrote:
>   >                   >
>   >                   > Thank you for the tips guys!
>   >                   >
>   >                   > The limiting factor for us is actually memory 
> utilization. We are using the default configuration on sizable ec2 nodes and 
> pulling only
>   >                   like 20k qps per node. Which is fine
>   >                   > because we need to shard the key set over x 
> servers to handle the mem req (30G) per server.
>   >                   >
>   >                   > I should have looked into that before posting.
>   >                   >
>   >                   > I am really curious about network saturation 
> though. 200k gets at 1mb per get is a lot of traffic... how can you hit that 
> mark without
>   >                   saturation?
>   >
>   >                   Most people's keys are a lot smaller. In multiget 
> tests with 40 byte keys
>   >                   I can pull 20 million+ keys/sec out of the server. 
> probably less than
>   >                   10gbps at that rate too. Tends to cap between 600k 
> and 800k/s if you need
>   >                   to do a full roundtrip per key fetch. limited by 
> the NIC. Lots of tuning
>   >                   required to get around that.
>   >
>   >
>   > I think (but may be wrong) the 200K TPS result is based on 1K values. 
>  Dormando should be able to correct me. 
>   >
>   > 20K TPS does seem a little low though.  If you're bound by memory set 
> size have you thought of the cost/tradeoff benefits of using dedicated 
> servers for your
>   memcache?  
>   > I'm quite interested to find out more about what you're trying to 
> optimise.  Is it minimising number of servers, maximising query rate, both, 
> none, etc? 
>   >
>   > Feel free to reach out directly if you can't share this publicly. 
>   >  
>   >

Re: Clarification on Hardware Requirements and Performance

2016-08-27 Thread dormando
What version are you on and what're your startup options, out of
curiosity?

A lot of the more recent features can help with memory efficiency, for
what it's worth.

On Sat, 27 Aug 2016, Joseph Grasser wrote:

>
> No problem, I'm trying cut down on cost. We're currently using a dedicated 
> model which works for us on a technical level but is expensive (within budget 
> but still expensive).
>
> We are experiencing weird spikes in evictions but I think that is the result 
> of developers abusing the service.
>
> Tbh I don't know what to make of the evictions yet. I'm going to dig into it 
> on Monday though.
>
>
> On Aug 27, 2016 1:55 AM, "Ripduman Sohan" <ripduman.so...@gmail.com> wrote:
>
> On Aug 27, 2016 1:46 AM, "dormando" <dorma...@rydia.net> wrote:
>   >
>   > Thank you for the tips guys!
>   >
>   > The limiting factor for us is actually memory 
> utilization. We are using the default configuration on sizable ec2 nodes and 
> pulling only
>   like 20k qps per node. Which is fine
>   > because we need to shard the key set over x servers to 
> handle the mem req (30G) per server.
>   >
>   > I should have looked into that before posting.
>   >
>   > I am really curious about network saturation though. 200k 
> gets at 1mb per get is a lot of traffic... how can you hit that mark without
>   saturation?
>
>   Most people's keys are a lot smaller. In multiget tests 
> with 40 byte keys
>   I can pull 20 million+ keys/sec out of the server. probably 
> less than
>   10gbps at that rate too. Tends to cap between 600k and 
> 800k/s if you need
>   to do a full roundtrip per key fetch. limited by the NIC. 
> Lots of tuning
>   required to get around that.
>
>
> I think (but may be wrong) the 200K TPS result is based on 1K values.  
> Dormando should be able to correct me. 
>
> 20K TPS does seem a little low though.  If you're bound by memory set size 
> have you thought of the cost/tradeoff benefits of using dedicated servers for 
> your memcache?  
> I'm quite interested to find out more about what you're trying to optimise.  
> Is it minimising number of servers, maximising query rate, both, none, etc? 
>
> Feel free to reach out directly if you can't share this publicly. 
>  
>



Re: Clarification on Hardware Requirements and Performance

2016-08-27 Thread dormando
>
> Thank you for the tips guys!
>
> The limiting factor for us is actually memory utilization. We are using the 
> default configuration on sizable ec2 nodes and pulling only like 20k qps per 
> node. Which is fine
> because we need to shard the key set over x servers to handle the mem req 
> (30G) per server.
>
> I should have looked into that before posting.
>
> I am really curious about network saturation though. 200k gets at 1mb per get 
> is a lot of traffic... how can you hit that mark without saturation?

Most people's keys are a lot smaller. In multiget tests with 40 byte keys
I can pull 20 million+ keys/sec out of the server. probably less than
10gbps at that rate too. Tends to cap between 600k and 800k/s if you need
to do a full roundtrip per key fetch. limited by the NIC. Lots of tuning
required to get around that.

In pure throughput over localhost I've gotten it past 78 gigabits.. using
16k keys.

it's pretty snappy.

>
> On Aug 26, 2016 3:41 PM, "Ripduman Sohan" <ripduman.so...@gmail.com> wrote:
>   I believe it's standard memcached compared on kernel and OpenOnload TCP 
> stacks.  I have had no involvement with this though so it's just conjecture 
> on my part.  I
>   guess sa...@solarflare.com knows more, I can find out if it helps. 
>
>
> On 26 August 2016 at 23:37, dormando <dorma...@rydia.net> wrote:
>   Is that still using a modified codebase?
>
>   On Fri, 26 Aug 2016, Ripduman Sohan wrote:
>
>   > Some more 
> numbers:https://www.solarflare.com/Media/Default/PDFs/Solutions/Solarflare-Accelerating-Memcached-Using-Flareon-Ultra-server-IO-adapter.pdf
>   >
>   > On 26 August 2016 at 07:08, Henrik Schröder <skro...@gmail.com> wrote:
>   >       Anecdotal datapoint: I have a machine with 2xE5520 (Xeon server 
> processor from 2009) which does ~300k requests/s, and handles ~400Mbps of 
> network
>   traffic, but only
>   >       using ~5% of the CPU.
>   >
>   > It's been my experience that you will saturate your network way 
> before you'll saturate your CPU on pretty much any current hardware.
>   >
>   >
>   >
>   > On Thu, Aug 25, 2016 at 10:12 PM, Joseph Grasser 
> <jgrasser@gmail.com> wrote:
>   >       It is written in the docs that "On a fast machine with very 
> high speed networking, memcached can easily handle 200,000+ requests per 
> second." How fast
>   does a
>   >       machine have to be in order to serve that load easily? What 
> are the hardware requirements for such a server?
>   >
>   > https://github.com/memcached/memcached/wiki/Performance
>   >
>   > --
>   > --rip
>   >
> --
> --rip
>

Re: Clarification on Hardware Requirements and Performance

2016-08-26 Thread dormando
Is that still using a modified codebase?

On Fri, 26 Aug 2016, Ripduman Sohan wrote:

> Some more 
> numbers:https://www.solarflare.com/Media/Default/PDFs/Solutions/Solarflare-Accelerating-Memcached-Using-Flareon-Ultra-server-IO-adapter.pdf
>
> On 26 August 2016 at 07:08, Henrik Schröder  wrote:
>   Anecdotal datapoint: I have a machine with 2xE5520 (Xeon server 
> processor from 2009) which does ~300k requests/s, and handles ~400Mbps of 
> network traffic, but only
>   using ~5% of the CPU.
>
> It's been my experience that you will saturate your network way before you'll 
> saturate your CPU on pretty much any current hardware.
>
>
>
> On Thu, Aug 25, 2016 at 10:12 PM, Joseph Grasser  
> wrote:
>   It is written in the docs that "On a fast machine with very high speed 
> networking, memcached can easily handle 200,000+ requests per second." How 
> fast does a
>   machine have to be in order to serve that load easily? What are the 
> hardware requirements for such a server?
>
> https://github.com/memcached/memcached/wiki/Performance
>
> --
> --rip
>



Re: 1.4.29 (large item support)

2016-08-13 Thread dormando
It's looking like slab_chunk_max of 512k will sit better as a default
until stitching is done...

It doesn't create slab classes of say (770k) that still take 1MB of space
due to the slab mover needing consistent page sizes.

It doesn't have the low end efficiency hole between 16k and ~80k slab
sizes that the 16k default had.

With -I 2m and old code + page mover, anything above 1M uses 2M of space.
For -I 20m that ends up being 10->20m. With 512k, at least a 15m item with
an -I 20m would actually take 15m of space... and as mentioned above the
low end isn't damaged. This is with the factor default of 1.25.
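
Back-of-the-envelope arithmetic for those figures (assumed values, not
measurements):

    /* A 15MB item under -I 20m: one max-size chunk under the old behaviour
     * versus chained 512KB chunks. */
    #include <stdio.h>

    int main(void) {
        const unsigned long long MB    = 1024ULL * 1024;
        const unsigned long long item  = 15 * MB;       /* value size     */
        const unsigned long long chunk = 512 * 1024;    /* slab_chunk_max */

        /* old behaviour per the paragraph above: a >10MB item lands in the 20MB class */
        unsigned long long old_use = 20 * MB;

        /* chained chunks: round the item up to whole 512KB chunks */
        unsigned long long n_chunks = (item + chunk - 1) / chunk;
        unsigned long long new_use  = n_chunks * chunk;

        printf("old: %lluMB, chained 512k chunks: %lluMB\n",
               old_use / MB, new_use / MB);              /* 20MB vs 15MB */
        return 0;
    }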

Anyone have any thoughts? I'm going to hold this change for Monday since
there's a chance I could roll it with the new crawler changes.

I've fixed the recommendation for slab_chunk_max in the release notes to be
512k instead of 1m. Hopefully that tides things over.

Sorry, again. Some simple math would've avoided this situation. This is a
complicated change to do on one's own.

On Sat, 13 Aug 2016, Dormando wrote:

> what about without the slab_chunk_max change? (just bare modern) is usage 
> better?
>
> could I get a stats snapshot from the one that filled?
>
> On Aug 13, 2016, at 9:35 AM, andr...@vimeo.com wrote:
>
>   The "STAT bytes" leveled out at 8.1GB for the 1.4.30 instance (with -C 
> -m 10240 -I 20m -c 4096 -o modern,slab_chunk_max=1048576 -f 1.25), vs. 9.4GB 
> for 1.4.25, and
>   STAT curr_items is 120k vs. 136k. So it still seems to be making worse 
> use of memory, but it's far better than any of the previous tries with 
> .29/.30.
>
>   On Friday, August 12, 2016 at 8:46:41 PM UTC-4, Dormando wrote:
> still running ok?
>
> > On Aug 12, 2016, at 1:10 PM, dormando <dorm...@rydia.net> wrote:
> >
> > Ok. So I think I can narrow the change to explicitly set -f 
> 1.08 if the
> > slab_chunk_max is actually 16k... instead of just if `-o 
> modern` is on...
> > I was careful about filling out a lot of the new values after 
> all of the
> > parsing is done but missed some spots.
> >
> > Thanks for trying it out. I'll wait a few hours in case you 
> find anything
> > else.. or I think of anything else.
> >
> > Much appreciated.
> >
> >> On Fri, 12 Aug 2016, and...@vimeo.com wrote:
> >>
> >> That one seems to work okay ― again, I've gotten past 2GB and 
> the hit-rate is within a few points of where it belongs. I don't have numbers 
> for the
>     same situation on .29 but
> >> IIRC it was very bad. So I guess .30 is an improvement there.
> >>
> >> On Friday, August 12, 2016 at 3:34:00 PM UTC-4, Dormando wrote:
> >>      Also, just for completeness:
> >>
> >>      Does:
> >>
> >>      `-C -m 10240 -I 20m -c 4096 -o modern`
> >>
> >>      also fail under .30? (without the slab_chunk_max change)
> >>
> >>>      On Fri, 12 Aug 2016, dormando wrote:
> >>>
> >>> FML.
> >>>
> >>> Please let me know how it goes. I'm going to take a hard look 
> at this and
> >>> see about another bugfix release... there're a couple things 
> I forgot from
> >>> .30 anyway.
> >>>
> >>> Your information will be very helpful though. Thanks again 
> for testing it.
> >>> All of my testing recently was with explicit configuration 
> options, so I
> >>> didn't notice the glitch with -o modern :(
> >>>
> >>>> On Fri, 12 Aug 2016, and...@vimeo.com wrote:
> >>>>
> >>>> It will take a while to fill up entirely, but I passed 2GB 
> with 0 evictions, so it looks like that probably does the job.
> >>>>
> >>>> On Friday, August 12, 2016 at 3:02:47 PM UTC-4, Dormando 
> wrote:
> >>>>       A crap, I think I see it.
> >>>>
> >>>>       Can you add: `-f 1.25` *after* the -o stuff?
> >>>>
> >>>>       like this:
> >>>>
> >>>>       `-C -m 10240 -I 20m -c 4096 -o 
> modern,slab_chunk_max=104

Re: 1.4.29 (large item support)

2016-08-13 Thread Dormando
What about without the slab_chunk_max change? (just bare modern) Is usage
better?

Could I get a stats snapshot from the one that filled?

> On Aug 13, 2016, at 9:35 AM, andr...@vimeo.com wrote:
> 
> The "STAT bytes" leveled out at 8.1GB for the 1.4.30 instance (with -C -m 
> 10240 -I 20m -c 4096 -o modern,slab_chunk_max=1048576 -f 1.25), vs. 9.4GB for 
> 1.4.25, and STAT curr_items is 120k vs. 136k. So it still seems to be making 
> worse use of memory, but it's far better than any of the previous tries with 
> .29/.30.
> 
>> On Friday, August 12, 2016 at 8:46:41 PM UTC-4, Dormando wrote:
>> still running ok? 
>> 
>> > On Aug 12, 2016, at 1:10 PM, dormando <dorm...@rydia.net> wrote: 
>> > 
>> > Ok. So I think I can narrow the change to explicitly set -f 1.08 if the 
>> > slab_chunk_max is actually 16k... instead of just if `-o modern` is on... 
>> > I was careful about filling out a lot of the new values after all of the 
>> > parsing is done but missed some spots. 
>> > 
>> > Thanks for trying it out. I'll wait a few hours in case you find anything 
>> > else.. or I think of anything else. 
>> > 
>> > Much appreciated. 
>> > 
>> >> On Fri, 12 Aug 2016, and...@vimeo.com wrote: 
>> >> 
>> >> That one seems to work okay ― again, I've gotten past 2GB and the 
>> >> hit-rate is within a few points of where it belongs. I don't have numbers 
>> >> for the same situation on .29 but 
>> >> IIRC it was very bad. So I guess .30 is an improvement there. 
>> >> 
>> >> On Friday, August 12, 2016 at 3:34:00 PM UTC-4, Dormando wrote: 
>> >>  Also, just for completeness: 
>> >> 
>> >>  Does: 
>> >> 
>> >>  `-C -m 10240 -I 20m -c 4096 -o modern` 
>> >> 
>> >>  also fail under .30? (without the slab_chunk_max change) 
>> >> 
>> >>>  On Fri, 12 Aug 2016, dormando wrote: 
>> >>> 
>> >>> FML. 
>> >>> 
>> >>> Please let me know how it goes. I'm going to take a hard look at this 
>> >>> and 
>> >>> see about another bugfix release... there're a couple things I forgot 
>> >>> from 
>> >>> .30 anyway. 
>> >>> 
>> >>> Your information will be very helpful though. Thanks again for testing 
>> >>> it. 
>> >>> All of my testing recently was with explicit configuration options, so I 
>> >>> didn't notice the glitch with -o modern :( 
>> >>> 
>> >>>> On Fri, 12 Aug 2016, and...@vimeo.com wrote: 
>> >>>> 
>> >>>> It will take a while to fill up entirely, but I passed 2GB with 0 
>> >>>> evictions, so it looks like that probably does the job. 
>> >>>> 
>> >>>> On Friday, August 12, 2016 at 3:02:47 PM UTC-4, Dormando wrote: 
>> >>>>   A crap, I think I see it. 
>> >>>> 
>> >>>>   Can you add: `-f 1.25` *after* the -o stuff? 
>> >>>> 
>> >>>>   like this: 
>> >>>> 
>> >>>>   `-C -m 10240 -I 20m -c 4096 -o modern,slab_chunk_max=1048576 -f 
>> >>>> 1.25` 
>> >>>> 
>> >>>>   And test that out, please? I might have to back out some 
>> >>>> over-aggressive 
>> >>>>   switches... and I keep thinking of making this particular problem 
>> >>>> (which 
>> >>>>   I'll talk about if confirmed) a startup error :( 
>> >>>> 
>> >>>>   On Fri, 12 Aug 2016, and...@vimeo.com wrote: 
>> >>>> 
>> >>>>   > Here you go. 
>> >>>>   > Yes, 1.4.25 is running with `-C -m 10240 -I 20m -c 4096 -o 
>> >>  
>> >> maxconns_fast,hash_algorithm=murmur3,lru_maintainer,lru_crawler,slab_reassign,slab_automove`.
>> >>  
>> >>>>   > 1.4.30 is running with `-C -m 10240 -I 20m -c 4096 -o 
>> >>>> modern,slab_chunk_max=1048576`. 
>> >>>>   > 
>> >>>>   > 
>> >>>>   > On Friday, August 12, 2016 at 2:32:59 PM UTC-4, Dormando wrote: 
>> >>>>   >   Hey, 
>> >>>>   > 
>> >>>>   >   any chance I could see `stats slabs` output as well? a 
>> >>>> 

Re: 1.4.29 (large item support)

2016-08-12 Thread Dormando
still running ok?

> On Aug 12, 2016, at 1:10 PM, dormando <dorma...@rydia.net> wrote:
> 
> Ok. So I think I can narrow the change to explicitly set -f 1.08 if the
> slab_chunk_max is actually 16k... instead of just if `-o modern` is on...
> I was careful about filling out a lot of the new values after all of the
> parsing is done but missed some spots.
> 
> Thanks for trying it out. I'll wait a few hours in case you find anything
> else.. or I think of anything else.
> 
> Much appreciated.
> 
>> On Fri, 12 Aug 2016, andr...@vimeo.com wrote:
>> 
>> That one seems to work okay ― again, I've gotten past 2GB and the hit-rate 
>> is within a few points of where it belongs. I don't have numbers for the 
>> same situation on .29 but
>> IIRC it was very bad. So I guess .30 is an improvement there.
>> 
>> On Friday, August 12, 2016 at 3:34:00 PM UTC-4, Dormando wrote:
>>  Also, just for completeness:
>> 
>>  Does:
>> 
>>  `-C -m 10240 -I 20m -c 4096 -o modern`
>> 
>>  also fail under .30? (without the slab_chunk_max change)
>> 
>>>  On Fri, 12 Aug 2016, dormando wrote:
>>> 
>>> FML.
>>> 
>>> Please let me know how it goes. I'm going to take a hard look at this and
>>> see about another bugfix release... there're a couple things I forgot from
>>> .30 anyway.
>>> 
>>> Your information will be very helpful though. Thanks again for testing it.
>>> All of my testing recently was with explicit configuration options, so I
>>> didn't notice the glitch with -o modern :(
>>> 
>>>> On Fri, 12 Aug 2016, and...@vimeo.com wrote:
>>>> 
>>>> It will take a while to fill up entirely, but I passed 2GB with 0 
>>>> evictions, so it looks like that probably does the job.
>>>> 
>>>> On Friday, August 12, 2016 at 3:02:47 PM UTC-4, Dormando wrote:
>>>>   A crap, I think I see it.
>>>> 
>>>>   Can you add: `-f 1.25` *after* the -o stuff?
>>>> 
>>>>   like this:
>>>> 
>>>>   `-C -m 10240 -I 20m -c 4096 -o modern,slab_chunk_max=1048576 -f 1.25`
>>>> 
>>>>   And test that out, please? I might have to back out some 
>>>> over-aggressive
>>>>   switches... and I keep thinking of making this particular problem 
>>>> (which
>>>>   I'll talk about if confirmed) a startup error :(
>>>> 
>>>>   On Fri, 12 Aug 2016, and...@vimeo.com wrote:
>>>> 
>>>>   > Here you go.
>>>>   > Yes, 1.4.25 is running with `-C -m 10240 -I 20m -c 4096 -o
>>  
>> maxconns_fast,hash_algorithm=murmur3,lru_maintainer,lru_crawler,slab_reassign,slab_automove`.
>>>>   > 1.4.30 is running with `-C -m 10240 -I 20m -c 4096 -o 
>>>> modern,slab_chunk_max=1048576`.
>>>>   >
>>>>   >
>>>>   > On Friday, August 12, 2016 at 2:32:59 PM UTC-4, Dormando wrote:
>>>>   >   Hey,
>>>>   >
>>>>   >   any chance I could see `stats slabs` output as well? a lot 
>>>> of the data's
>>>>   >   in there. Need all three: stats, stats items, stats slabs
>>>>   >
>>>>   >   Also, did you try 1.4.30 with `-o slab_chunk_max=1048576` as 
>>>> well?
>>>>   >
>>>>   >   thanks
>>>>   >
>>>>   >   On Fri, 12 Aug 2016, and...@vimeo.com wrote:
>>>>   >
>>>>   >   > Thanks! That's an improvement. It's still worse than older 
>>>> versions, but it's better than 1.4.29. This time it made it up to about 
>>>> 1.75GB/10GB
>>  used
>>>>   before it
>>>>   >   started evicting;
>>>>   >   > I left it running for another 8 hours and it got up to 
>>>> 2GB, but no higher.
>>>>   >   > Here's some stats output from the old and new versions, in 
>>>> case you can puzzle anything out of it.
>>>>   >   >
>>>>   >   > Thanks,
>>>>   >   >
>>>>   >   > Andrew
>>>>   >   >
>>>>   >   >
>>>>   >   > On Thursday, August 11, 2016 at 6:14:26 PM UTC-4, Dormando 
>>>> wrote:
>>>>   >   >  

Re: 1.4.29 (large item support)

2016-08-12 Thread dormando
Ok. So I think I can narrow the change to explicitly set -f 1.08 if the
slab_chunk_max is actually 16k... instead of just if `-o modern` is on...
I was careful about filling out a lot of the new values after all of the
parsing is done but missed some spots.

Thanks for trying it out. I'll wait a few hours in case you find anything
else.. or I think of anything else.

Much appreciated.

On Fri, 12 Aug 2016, andr...@vimeo.com wrote:

> That one seems to work okay — again, I've gotten past 2GB and the hit-rate is 
> within a few points of where it belongs. I don't have numbers for the same 
> situation on .29 but
> IIRC it was very bad. So I guess .30 is an improvement there.
>
> On Friday, August 12, 2016 at 3:34:00 PM UTC-4, Dormando wrote:
>   Also, just for completeness:
>
>   Does:
>
>   `-C -m 10240 -I 20m -c 4096 -o modern`
>
>   also fail under .30? (without the slab_chunk_max change)
>
>   On Fri, 12 Aug 2016, dormando wrote:
>
>   > FML.
>   >
>   > Please let me know how it goes. I'm going to take a hard look at this 
> and
>   > see about another bugfix release... there're a couple things I forgot 
> from
>   > .30 anyway.
>   >
>   > Your information will be very helpful though. Thanks again for 
> testing it.
>   > All of my testing recently was with explicit configuration options, 
> so I
>   > didn't notice the glitch with -o modern :(
>   >
>   > On Fri, 12 Aug 2016, and...@vimeo.com wrote:
>   >
>   > > It will take a while to fill up entirely, but I passed 2GB with 0 
> evictions, so it looks like that probably does the job.
>   > >
>   > > On Friday, August 12, 2016 at 3:02:47 PM UTC-4, Dormando wrote:
>   > >       A crap, I think I see it.
>   > >
>   > >       Can you add: `-f 1.25` *after* the -o stuff?
>   > >
>   > >       like this:
>   > >
>   > >       `-C -m 10240 -I 20m -c 4096 -o modern,slab_chunk_max=1048576 
> -f 1.25`
>   > >
>   > >       And test that out, please? I might have to back out some 
> over-aggressive
>   > >       switches... and I keep thinking of making this particular 
> problem (which
>   > >       I'll talk about if confirmed) a startup error :(
>   > >
>   > >       On Fri, 12 Aug 2016, and...@vimeo.com wrote:
>   > >
>   > >       > Here you go.
>   > >       > Yes, 1.4.25 is running with `-C -m 10240 -I 20m -c 4096 -o
>   
> maxconns_fast,hash_algorithm=murmur3,lru_maintainer,lru_crawler,slab_reassign,slab_automove`.
>   > >       > 1.4.30 is running with `-C -m 10240 -I 20m -c 4096 -o 
> modern,slab_chunk_max=1048576`.
>   > >       >
>   > >       >
>   > >       > On Friday, August 12, 2016 at 2:32:59 PM UTC-4, Dormando 
> wrote:
>   > >       >       Hey,
>   > >       >
>   > >       >       any chance I could see `stats slabs` output as well? 
> a lot of the data's
>   > >       >       in there. Need all three: stats, stats items, stats 
> slabs
>   > >       >
>   > >       >       Also, did you try 1.4.30 with `-o 
> slab_chunk_max=1048576` as well?
>   > >       >
>   > >       >       thanks
>   > >       >
>   > >       >       On Fri, 12 Aug 2016, and...@vimeo.com wrote:
>   > >       >
>   > >       >       > Thanks! That's an improvement. It's still worse 
> than older versions, but it's better than 1.4.29. This time it made it up to 
> about 1.75GB/10GB
>   used
>   > >       before it
>   > >       >       started evicting;
>   > >       >       > I left it running for another 8 hours and it got up 
> to 2GB, but no higher.
>   > >       >       > Here's some stats output from the old and new 
> versions, in case you can puzzle anything out of it.
>   > >       >       >
>   > >       >       > Thanks,
>   > >       >       >
>   > >       >       > Andrew
>   > >       >       >
>   > >       >       >
>   > >       >       > On Thursday, August 11, 2016 at 6:14:26 PM UTC-4, 
> Dormando wrote:
>   > >       >       >       Hi,
>   > >       >       >
>   > >       >       >       
> ht

Re: 1.4.29 (large item support)

2016-08-12 Thread dormando
Also, just for completeness:

Does:

`-C -m 10240 -I 20m -c 4096 -o modern`

also fail under .30? (without the slab_chunk_max change)

On Fri, 12 Aug 2016, dormando wrote:

> FML.
>
> Please let me know how it goes. I'm going to take a hard look at this and
> see about another bugfix release... there're a couple things I forgot from
> .30 anyway.
>
> Your information will be very helpful though. Thanks again for testing it.
> All of my testing recently was with explicit configuration options, so I
> didn't notice the glitch with -o modern :(
>
> On Fri, 12 Aug 2016, andr...@vimeo.com wrote:
>
> > It will take a while to fill up entirely, but I passed 2GB with 0 
> > evictions, so it looks like that probably does the job.
> >
> > On Friday, August 12, 2016 at 3:02:47 PM UTC-4, Dormando wrote:
> >   A crap, I think I see it.
> >
> >   Can you add: `-f 1.25` *after* the -o stuff?
> >
> >   like this:
> >
> >   `-C -m 10240 -I 20m -c 4096 -o modern,slab_chunk_max=1048576 -f 1.25`
> >
> >   And test that out, please? I might have to back out some 
> > over-aggressive
> >   switches... and I keep thinking of making this particular problem 
> > (which
> >   I'll talk about if confirmed) a startup error :(
> >
> >   On Fri, 12 Aug 2016, and...@vimeo.com wrote:
> >
> >   > Here you go.
> >   > Yes, 1.4.25 is running with `-C -m 10240 -I 20m -c 4096 -o 
> > maxconns_fast,hash_algorithm=murmur3,lru_maintainer,lru_crawler,slab_reassign,slab_automove`.
> >   > 1.4.30 is running with `-C -m 10240 -I 20m -c 4096 -o 
> > modern,slab_chunk_max=1048576`.
> >   >
> >   >
> >   > On Friday, August 12, 2016 at 2:32:59 PM UTC-4, Dormando wrote:
> >   >       Hey,
> >   >
> >   >       any chance I could see `stats slabs` output as well? a lot of 
> > the data's
> >   >       in there. Need all three: stats, stats items, stats slabs
> >   >
> >   >       Also, did you try 1.4.30 with `-o slab_chunk_max=1048576` as 
> > well?
> >   >
> >   >       thanks
> >   >
> >   >       On Fri, 12 Aug 2016, and...@vimeo.com wrote:
> >   >
> >   >       > Thanks! That's an improvement. It's still worse than older 
> > versions, but it's better than 1.4.29. This time it made it up to about 
> > 1.75GB/10GB used
> >   before it
> >   >       started evicting;
> >   >       > I left it running for another 8 hours and it got up to 2GB, 
> > but no higher.
> >   >       > Here's some stats output from the old and new versions, in 
> > case you can puzzle anything out of it.
> >   >       >
> >   >       > Thanks,
> >   >       >
> >   >       > Andrew
> >   >       >
> >   >       >
> >   >       > On Thursday, August 11, 2016 at 6:14:26 PM UTC-4, Dormando 
> > wrote:
> >   >       >       Hi,
> >   >       >
> >   >       >       
> > https://github.com/memcached/memcached/wiki/ReleaseNotes1430
> >   >       >
> >   >       >       Can you please try this? And let me know how it goes 
> > either way :)
> >   >       >
> >   >       >       On Wed, 10 Aug 2016, dormando wrote:
> >   >       >
> >   >       >       > Hey,
> >   >       >       >
> >   >       >       > Thanks and sorry about that. I just found a bug 
> > this week where the new
> >   >       >       > code is over-allocating (though 30MB out of 10G 
> > limit seems odd?)
> >   >       >       >
> >   >       >       > ie: with -I 2m, it would allocate 2 megabytes of 
> > memory and then only use
> >   >       >       > up to 1mb of it. A one-line fix for a missed 
> > variable conversion.
> >   >       >       >
> >   >       >       > Will likely do a bugfix release later tonight with 
> > that and a few other
> >   >       >       > things.
> >   >       >       >
> >   >       >       > Will take a look at your data in hopes it's the 
> > same issue at least,
> >   >       >       > thanks!
> >   >       >       >
> >   >       >       > On Wed, 10 Aug 2016, and...@vimeo.com wrote:
> 

Re: 1.4.29 (large item support)

2016-08-12 Thread dormando
FML.

Please let me know how it goes. I'm going to take a hard look at this and
see about another bugfix release... there're a couple things I forgot from
.30 anyway.

Your information will be very helpful though. Thanks again for testing it.
All of my testing recently was with explicit configuration options, so I
didn't notice the glitch with -o modern :(

On Fri, 12 Aug 2016, andr...@vimeo.com wrote:

> It will take a while to fill up entirely, but I passed 2GB with 0 evictions, 
> so it looks like that probably does the job.
>
> On Friday, August 12, 2016 at 3:02:47 PM UTC-4, Dormando wrote:
>   A crap, I think I see it.
>
>   Can you add: `-f 1.25` *after* the -o stuff?
>
>   like this:
>
>   `-C -m 10240 -I 20m -c 4096 -o modern,slab_chunk_max=1048576 -f 1.25`
>
>   And test that out, please? I might have to back out some over-aggressive
>   switches... and I keep thinking of making this particular problem (which
>   I'll talk about if confirmed) a startup error :(
>
>   On Fri, 12 Aug 2016, and...@vimeo.com wrote:
>
>   > Here you go.
>   > Yes, 1.4.25 is running with `-C -m 10240 -I 20m -c 4096 -o 
> maxconns_fast,hash_algorithm=murmur3,lru_maintainer,lru_crawler,slab_reassign,slab_automove`.
>   > 1.4.30 is running with `-C -m 10240 -I 20m -c 4096 -o 
> modern,slab_chunk_max=1048576`.
>   >
>   >
>   > On Friday, August 12, 2016 at 2:32:59 PM UTC-4, Dormando wrote:
>   >       Hey,
>   >
>   >       any chance I could see `stats slabs` output as well? a lot of 
> the data's
>   >       in there. Need all three: stats, stats items, stats slabs
>   >
>   >       Also, did you try 1.4.30 with `-o slab_chunk_max=1048576` as 
> well?
>   >
>   >       thanks
>   >
>   >       On Fri, 12 Aug 2016, and...@vimeo.com wrote:
>   >
>   >       > Thanks! That's an improvement. It's still worse than older 
> versions, but it's better than 1.4.29. This time it made it up to about 
> 1.75GB/10GB used
>   before it
>   >       started evicting;
>   >       > I left it running for another 8 hours and it got up to 2GB, 
> but no higher.
>   >       > Here's some stats output from the old and new versions, in 
> case you can puzzle anything out of it.
>   >       >
>   >       > Thanks,
>   >       >
>   >       > Andrew
>   >       >
>   >       >
>   >       > On Thursday, August 11, 2016 at 6:14:26 PM UTC-4, Dormando 
> wrote:
>   >       >       Hi,
>   >       >
>   >       >       
> https://github.com/memcached/memcached/wiki/ReleaseNotes1430
>   >       >
>   >       >       Can you please try this? And let me know how it goes 
> either way :)
>   >       >
>   >       >       On Wed, 10 Aug 2016, dormando wrote:
>   >       >
>   >       >       > Hey,
>   >       >       >
>   >       >       > Thanks and sorry about that. I just found a bug this 
> week where the new
>   >       >       > code is over-allocating (though 30MB out of 10G limit 
> seems odd?)
>   >       >       >
>   >       >       > ie: with -I 2m, it would allocate 2 megabytes of 
> memory and then only use
>   >       >       > up to 1mb of it. A one-line fix for a missed variable 
> conversion.
>   >       >       >
>   >       >       > Will likely do a bugfix release later tonight with 
> that and a few other
>   >       >       > things.
>   >       >       >
>   >       >       > Will take a look at your data in hopes it's the same 
> issue at least,
>   >       >       > thanks!
>   >       >       >
>   >       >       > On Wed, 10 Aug 2016, and...@vimeo.com wrote:
>   >       >       >
>   >       >       > > I decided to give this a try on a production setup 
> that has a very bimodal size distribution (about a 50/50 split of 10k-100k 
> values and 1M-10M
>   values)
>   >       and
>   >       >       lots of writes,
>   >       >       > > where we've been running with "-I 10m -m 10240" for 
> a while. It didn't go so great. Almost immediately there were lots and lots 
> of evictions,
>   even
>   >       though the
>   >       >       used memory was
>   >       >       > > only about 

Re: 1.4.29 (large item support)

2016-08-12 Thread dormando
Ah crap, I think I see it.

Can you add: `-f 1.25` *after* the -o stuff?

like this:

`-C -m 10240 -I 20m -c 4096 -o modern,slab_chunk_max=1048576 -f 1.25`

And test that out, please? I might have to back out some over-aggressive
switches... and I keep thinking of making this particular problem (which
I'll talk about if confirmed) a startup error :(

On Fri, 12 Aug 2016, andr...@vimeo.com wrote:

> Here you go.
> Yes, 1.4.25 is running with `-C -m 10240 -I 20m -c 4096 -o 
> maxconns_fast,hash_algorithm=murmur3,lru_maintainer,lru_crawler,slab_reassign,slab_automove`.
> 1.4.30 is running with `-C -m 10240 -I 20m -c 4096 -o 
> modern,slab_chunk_max=1048576`.
>
>
> On Friday, August 12, 2016 at 2:32:59 PM UTC-4, Dormando wrote:
>   Hey,
>
>   any chance I could see `stats slabs` output as well? a lot of the data's
>   in there. Need all three: stats, stats items, stats slabs
>
>   Also, did you try 1.4.30 with `-o slab_chunk_max=1048576` as well?
>
>   thanks
>
>   On Fri, 12 Aug 2016, and...@vimeo.com wrote:
>
>   > Thanks! That's an improvement. It's still worse than older versions, 
> but it's better than 1.4.29. This time it made it up to about 1.75GB/10GB 
> used before it
>   started evicting;
>   > I left it running for another 8 hours and it got up to 2GB, but no 
> higher.
>   > Here's some stats output from the old and new versions, in case you 
> can puzzle anything out of it.
>   >
>   > Thanks,
>   >
>   > Andrew
>   >
>   >
>   > On Thursday, August 11, 2016 at 6:14:26 PM UTC-4, Dormando wrote:
>   >       Hi,
>   >
>   >       https://github.com/memcached/memcached/wiki/ReleaseNotes1430
>   >
>   >       Can you please try this? And let me know how it goes either way 
> :)
>   >
>   >       On Wed, 10 Aug 2016, dormando wrote:
>   >
>   >       > Hey,
>   >       >
>   >       > Thanks and sorry about that. I just found a bug this week 
> where the new
>   >       > code is over-allocating (though 30MB out of 10G limit seems 
> odd?)
>   >       >
>   >       > ie: with -I 2m, it would allocate 2 megabytes of memory and 
> then only use
>   >       > up to 1mb of it. A one-line fix for a missed variable 
> conversion.
>   >       >
>   >       > Will likely do a bugfix release later tonight with that and a 
> few other
>   >       > things.
>   >       >
>   >       > Will take a look at your data in hopes it's the same issue at 
> least,
>   >       > thanks!
>   >       >
>   >       > On Wed, 10 Aug 2016, and...@vimeo.com wrote:
>   >       >
>   >       > > I decided to give this a try on a production setup that has 
> a very bimodal size distribution (about a 50/50 split of 10k-100k values and 
> 1M-10M values)
>   and
>   >       lots of writes,
>   >       > > where we've been running with "-I 10m -m 10240" for a 
> while. It didn't go so great. Almost immediately there were lots and lots of 
> evictions, even
>   though the
>   >       used memory was
>   >       > > only about 30MB of the 10GB limit, and the number of active 
> keys grew very slowly. "-o slab_chunk_max=1048576" may have had some effect, 
> but it didn't
>   really
>   >       seem like it.
>   >       > > Setting "slabs automove 2" (usually 1) reduced evictions 
> about 50% but it still wasn't enough to get acceptable performance.
>   >       > > I've rolled back to 1.4.25 for the moment, but I'm 
> attaching a log with "stats" and "stats items" from yesterday. "stats sizes" 
> wasn't available due to
>   -C, and
>   >       the log isn't
>   >       > > from as long after startup as I would like, but it's what I 
> got, sorry.
>   >       > >
>   >       > > Let me know if there's anything else I can do to help.
>   >       > >
>   >       > > Thanks,
>   >       > >
>   >       > > Andrew
>   >       > >
>   >       > > On Wednesday, July 13, 2016 at 8:08:49 PM UTC-4, Dormando 
> wrote:
>   >       > >       
> https://github.com/memcached/memcached/wiki/ReleaseNotes1429
>   >       > >
>   >       > >       enjoy.
>   >       > >
>   >       > > --
>  

Re: 1.4.29 (large item support)

2016-08-12 Thread dormando
Hey,

Any chance I could see the `stats slabs` output as well? A lot of the data's
in there. I need all three: stats, stats items, stats slabs.

Also, did you try 1.4.30 with `-o slab_chunk_max=1048576` as well?

thanks

On Fri, 12 Aug 2016, andr...@vimeo.com wrote:

> Thanks! That's an improvement. It's still worse than older versions, but it's 
> better than 1.4.29. This time it made it up to about 1.75GB/10GB used before 
> it started evicting;
> I left it running for another 8 hours and it got up to 2GB, but no higher.
> Here's some stats output from the old and new versions, in case you can 
> puzzle anything out of it.
>
> Thanks,
>
> Andrew
>
>
> On Thursday, August 11, 2016 at 6:14:26 PM UTC-4, Dormando wrote:
>   Hi,
>
>   https://github.com/memcached/memcached/wiki/ReleaseNotes1430
>
>   Can you please try this? And let me know how it goes either way :)
>
>   On Wed, 10 Aug 2016, dormando wrote:
>
>   > Hey,
>   >
>   > Thanks and sorry about that. I just found a bug this week where the 
> new
>   > code is over-allocating (though 30MB out of 10G limit seems odd?)
>   >
>   > ie: with -I 2m, it would allocate 2 megabytes of memory and then only 
> use
>   > up to 1mb of it. A one-line fix for a missed variable conversion.
>   >
>   > Will likely do a bugfix release later tonight with that and a few 
> other
>   > things.
>   >
>   > Will take a look at your data in hopes it's the same issue at least,
>   > thanks!
>   >
>   > On Wed, 10 Aug 2016, and...@vimeo.com wrote:
>   >
>   > > I decided to give this a try on a production setup that has a very 
> bimodal size distribution (about a 50/50 split of 10k-100k values and 1M-10M 
> values) and
>   lots of writes,
>   > > where we've been running with "-I 10m -m 10240" for a while. It 
> didn't go so great. Almost immediately there were lots and lots of evictions, 
> even though the
>   used memory was
>   > > only about 30MB of the 10GB limit, and the number of active keys 
> grew very slowly. "-o slab_chunk_max=1048576" may have had some effect, but 
> it didn't really
>   seem like it.
>   > > Setting "slabs automove 2" (usually 1) reduced evictions about 50% 
> but it still wasn't enough to get acceptable performance.
>   > > I've rolled back to 1.4.25 for the moment, but I'm attaching a log 
> with "stats" and "stats items" from yesterday. "stats sizes" wasn't available 
> due to -C, and
>   the log isn't
>   > > from as long after startup as I would like, but it's what I got, 
> sorry.
>   > >
>   > > Let me know if there's anything else I can do to help.
>   > >
>   > > Thanks,
>   > >
>   > > Andrew
>   > >
>   > > On Wednesday, July 13, 2016 at 8:08:49 PM UTC-4, Dormando wrote:
>   > >       https://github.com/memcached/memcached/wiki/ReleaseNotes1429
>   > >
>   > >       enjoy.
>   > >
>   > > --
>   > >
>   > > ---
>   > > You received this message because you are subscribed to the Google 
> Groups "memcached" group.
>   > > To unsubscribe from this group and stop receiving emails from it, 
> send an email to memcached+...@googlegroups.com.
>   > > For more options, visit https://groups.google.com/d/optout.
>   > >
>   > >
>   >
>   > --
>   >
>   > ---
>   > You received this message because you are subscribed to the Google 
> Groups "memcached" group.
>   > To unsubscribe from this group and stop receiving emails from it, 
> send an email to memcached+...@googlegroups.com.
>   > For more options, visit https://groups.google.com/d/optout.
>   >
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups 
> "memcached" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to memcached+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
>

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to memcached+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: 1.4.29 (large item support)

2016-08-11 Thread dormando
Hi,

https://github.com/memcached/memcached/wiki/ReleaseNotes1430

Can you please try this? And let me know how it goes either way :)

On Wed, 10 Aug 2016, dormando wrote:

> Hey,
>
> Thanks and sorry about that. I just found a bug this week where the new
> code is over-allocating (though 30MB out of 10G limit seems odd?)
>
> ie: with -I 2m, it would allocate 2 megabytes of memory and then only use
> up to 1mb of it. A one-line fix for a missed variable conversion.
>
> Will likely do a bugfix release later tonight with that and a few other
> things.
>
> Will take a look at your data in hopes it's the same issue at least,
> thanks!
>
> On Wed, 10 Aug 2016, andr...@vimeo.com wrote:
>
> > I decided to give this a try on a production setup that has a very bimodal 
> > size distribution (about a 50/50 split of 10k-100k values and 1M-10M 
> > values) and lots of writes,
> > where we've been running with "-I 10m -m 10240" for a while. It didn't go 
> > so great. Almost immediately there were lots and lots of evictions, even 
> > though the used memory was
> > only about 30MB of the 10GB limit, and the number of active keys grew very 
> > slowly. "-o slab_chunk_max=1048576" may have had some effect, but it didn't 
> > really seem like it.
> > Setting "slabs automove 2" (usually 1) reduced evictions about 50% but it 
> > still wasn't enough to get acceptable performance.
> > I've rolled back to 1.4.25 for the moment, but I'm attaching a log with 
> > "stats" and "stats items" from yesterday. "stats sizes" wasn't available 
> > due to -C, and the log isn't
> > from as long after startup as I would like, but it's what I got, sorry.
> >
> > Let me know if there's anything else I can do to help.
> >
> > Thanks,
> >
> > Andrew
> >
> > On Wednesday, July 13, 2016 at 8:08:49 PM UTC-4, Dormando wrote:
> >   https://github.com/memcached/memcached/wiki/ReleaseNotes1429
> >
> >   enjoy.
> >
> > --
> >
> > ---
> > You received this message because you are subscribed to the Google Groups 
> > "memcached" group.
> > To unsubscribe from this group and stop receiving emails from it, send an 
> > email to memcached+unsubscr...@googlegroups.com.
> > For more options, visit https://groups.google.com/d/optout.
> >
> >
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups 
> "memcached" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to memcached+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to memcached+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: 1.4.29 (large item support)

2016-08-10 Thread dormando
Hey,

Thanks, and sorry about that. I just found a bug this week where the new
code is over-allocating (though 30MB out of a 10G limit seems odd?)

i.e.: with -I 2m, it would allocate 2 megabytes of memory and then only use
up to 1MB of it. It's a one-line fix for a missed variable conversion.

Will likely do a bugfix release later tonight with that and a few other
things.

Will take a look at your data in hopes it's the same issue at least,
thanks!

On Wed, 10 Aug 2016, andr...@vimeo.com wrote:

> I decided to give this a try on a production setup that has a very bimodal 
> size distribution (about a 50/50 split of 10k-100k values and 1M-10M values) 
> and lots of writes,
> where we've been running with "-I 10m -m 10240" for a while. It didn't go so 
> great. Almost immediately there were lots and lots of evictions, even though 
> the used memory was
> only about 30MB of the 10GB limit, and the number of active keys grew very 
> slowly. "-o slab_chunk_max=1048576" may have had some effect, but it didn't 
> really seem like it.
> Setting "slabs automove 2" (usually 1) reduced evictions about 50% but it 
> still wasn't enough to get acceptable performance.
> I've rolled back to 1.4.25 for the moment, but I'm attaching a log with 
> "stats" and "stats items" from yesterday. "stats sizes" wasn't available due 
> to -C, and the log isn't
> from as long after startup as I would like, but it's what I got, sorry.
>
> Let me know if there's anything else I can do to help.
>
> Thanks,
>
> Andrew
>
> On Wednesday, July 13, 2016 at 8:08:49 PM UTC-4, Dormando wrote:
>   https://github.com/memcached/memcached/wiki/ReleaseNotes1429
>
>   enjoy.
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups 
> "memcached" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to memcached+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
>

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to memcached+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: After deleting the key successfully again key existing after some time.

2016-07-25 Thread dormando
Your app is putting the key back. Make it stop doing that?
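
One way to confirm that from telnet (a sketch with a placeholder key and
made-up CAS values): run `gets`, which returns the item's CAS value as the
last field, before and after the delete. If the key comes back later with a
new CAS value, something issued a fresh set for it:

  gets yourkey
  VALUE yourkey 0 5 101
  hello
  END
  delete yourkey
  DELETED
  ... some time later ...
  gets yourkey
  VALUE yourkey 0 5 157
  hello
  END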

On Mon, 25 Jul 2016, Babu G wrote:

> Hi,
>
>    After deleting the key successfully again key existing after some time. is 
> there any way to delete key permanently.
>
>  syntax : delete 
> builder_cache_store:views//aa/dd/kk-dd/a=p_7182_451006ec2f0328a2287aa18878086dff41a40fa0
> response: DELETED
> But after some time when I try to get the key I am able to get the value
> Syntax: get 
> builder_cache_store:views//aa/dd/kk-dd/a=p_7182_451006ec2f0328a2287aa18878086dff41a40fa0
>
> NOTE: I used telnet  
>
> Thanks in advance.
>
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups 
> "memcached" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to memcached+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
>

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to memcached+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Inserting large values into memcached

2016-07-18 Thread dormando
The keys being set look like: sprintf(key, "bob:%d", keyMin);

i.e.: "bob:1"

The keys being fetched look like:

sprintf(key, "%d", keyMin);

i.e.: "1". Not sure how this is supposed to work?

On Mon, 18 Jul 2016, Sonia wrote:

>
> To execute the program, you will need a txt file containing a list of all the 
> servers. You can make changes to this file accordingly.
> If you execute the code with a -h option you will get details of the command 
> line args.
>
> On Thursday, July 14, 2016 at 2:29:06 PM UTC-5, Dormando wrote:
>   Hey,
>
>   In order to understand I'd have to see the details firsthand at this
>   point: Output of `stats` and `stats settings` commands, as well as any
>   example code you can share.
>
>   On Thu, 14 Jul 2016, Sonia wrote:
>
>   > Everything seems in order, but cant seem to find a reason for this 
> odd behaviour.
>   >
>   > On Wednesday, July 13, 2016 at 11:22:07 PM UTC-5, Dormando wrote:
>   >       I'm not sure why. you can validate the settings via the `stats 
> settings`
>   >       command.
>   >
>   >       On Wed, 13 Jul 2016, Sonia wrote:
>   >
>   >       > I tried inserting 10 values of size 100 bytes. I am able 
> to insert all 10 values but I guess only the last 3 are present in cache 
> since I am getting
>   cache
>   >       misses for the
>   >       > first 7 key-value pairs.Is there a flag that we have to set 
> in the memcached configuration file (I currently have the '-m' option set to 
> 2048)
>   >       >
>   >       > On Wednesday, July 13, 2016 at 7:07:44 PM UTC-5, Dormando 
> wrote:
>   >       >       Hi,
>   >       >
>   >       >       You're trying to store exactly 1024*1024 bytes, but an 
> value in memcached
>   >       >       encompasses the key and the datastructure behind it. 
> Try (1024*1024 -
>   >       >       4096) and see if that stores.
>   >       >
>   >       >       On Wed, 13 Jul 2016, Sonia wrote:
>   >       >
>   >       >       > For 100,000 1MB values I guess the memory I have 
> allocated is insufficient. However, I tried inserting 10 1MB values into 
> memcached but this too
>   fails
>   >       and
>   >       >       memcached_strerror()
>   >       >       > returns "ITEM TOO BIG" (The value I have is a random 
> alpha-numeric char array of size 1048756 bytes).I am currently using a 
> libmemcached client.
>   Also
>   >       please find
>   >       >       the output of
>   >       >       > the stats command in the attached file.
>   >       >       > I really appreciate the help. Thank you.
>   >       >       >
>   >       >       > On Wednesday, July 13, 2016 at 2:39:43 PM UTC-5, 
> Dormando wrote:
>   >       >       >       Can you give more detail as to what exactly is 
> failing? what error message
>   >       >       >       are you getting, what client are you using, 
> what is the `stats` output
>   >       >       >       from some of your memcached instances, etc?
>   >       >       >
>   >       >       >       100,000 1 meg values are going to take at least 
> 100 gigabytes of RAM. if
>   >       >       >       you have 16 2G servers, you only have 32G of 
> RAM available. I can't really
>   >       >       >       help until knowing what your real error message 
> is but that math seems a
>   >       >       >       little odd.
>   >       >       >
>   >       >       >       On Wed, 13 Jul 2016, Sonia wrote:
>   >       >       >
>   >       >       >       > I have just started working with memcached 
> and I am working on a test program where I want to insert 100,000 values of 
> size 1 MB into
>   memcached.I
>   >       >       currently have
>   >       >       >       16 servers
>   >       >       >       > setup and I have setup the memory limit in 
> the memcached configuration file as 2 GB but for some reason my code is still 
> failing.
>   >       >       >       > Has anybody faced a similar situation?
>   >       >       >       >
>   >       >       >       > --
>   >       >       >       >
>   >       >       >       > ---
>   >       > 

Re: slab_chunk_max get value ?

2016-07-15 Thread dormando
Well, knew I forgot something :) Just pushed a fix for that for the next
release.

You'll want to avoid setting slab_chunk_max on your own. Only adjust with
-I, and you can see the state of -I via the "item_size_max" stat in there.

Also, for what it's worth: your `-m 8` may use more than 8 megabytes of
RAM if you use more than one slab class. There is a minimum of 1MB of
assignable memory per slab class.
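
For example, the current item size limit (which is what -I controls) can be
checked with the same stats command you ran below; on a default build it
should show 1048576 unless -I was changed:

  echo stats settings | nc 127.0.0.1 11211 | grep item_size_max
  STAT item_size_max 1048576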

-Dormando

On Fri, 15 Jul 2016, Centmin Mod George Liu wrote:

> with /usr/local/bin/memcached -d -m 8 -l 127.0.0.1 -p 11211 -c 2048 -b 2048 
> -R 200 -t 4 -n 72 -f 1.25 -u nobody -o 
> slab_reassign,slab_automove,slab_chunk_max=16384 -P
> /var/run/memcached/memached1.pid
> stats output as
>
> echo stats settings | nc 127.0.0.1 11211      
> STAT maxbytes 8388608
> STAT maxconns 2048
> STAT tcpport 11211
> STAT udpport 11211
> STAT inter 127.0.0.1
> STAT verbosity 0
> STAT oldest 0
> STAT evictions on
> STAT domain_socket NULL
> STAT umask 700
> STAT growth_factor 1.25
> STAT chunk_size 72
> STAT num_threads 4
> STAT num_threads_per_udp 4
> STAT stat_key_prefix :
> STAT detail_enabled no
> STAT reqs_per_event 200
> STAT cas_enabled yes
> STAT tcp_backlog 2048
> STAT binding_protocol auto-negotiate
> STAT auth_enabled_sasl no
> STAT item_size_max 1048576
> STAT maxconns_fast no
> STAT hashpower_init 0
> STAT slab_reassign yes
> STAT slab_automove 1
> STAT lru_crawler no
> STAT lru_crawler_sleep 100
> STAT lru_crawler_tocrawl 0
> STAT tail_repair_time 0
> STAT flush_enabled yes
> STAT hash_algorithm jenkins
> STAT lru_maintainer_thread no
> STAT hot_lru_pct 32
> STAT warm_lru_pct 32
> STAT expirezero_does_not_evict no
> STAT idle_timeout 0
> STAT watcher_logbuf_size 262144
> STAT worker_logbuf_size 65536
> STAT track_sizes no
> END
>
> On Friday, July 15, 2016 at 11:56:03 PM UTC+10, Centmin Mod George Liu wrote:
>   any tools or command line available to get the slab_chunk_max size in 
> 1.4.29 ?
> memcached-tools doesn't see any in settings probably needs an update ?
>
> memcached-tool 127.0.0.1:11211 stats
> #127.0.0.1:11211   Field       Value
>          accepting_conns           1
>                auth_cmds           0
>              auth_errors           0
>                    bytes     1380656
>               bytes_read     1904536
>            bytes_written     4211631
>               cas_badval           0
>                 cas_hits           0
>               cas_misses           0
>                cmd_flush           0
>                  cmd_get        1649
>                  cmd_set        1306
>                cmd_touch           0
>              conn_yields           0
>    connection_structures          13
>    crawler_items_checked        1536
>        crawler_reclaimed           0
>         curr_connections          12
>               curr_items         532
>                decr_hits           0
>              decr_misses           0
>              delete_hits           0
>            delete_misses           0
>          direct_reclaims           0
>        evicted_unfetched           0
>                evictions           0
>        expired_unfetched           0
>              get_expired           0
>              get_flushed           0
>                 get_hits        1108
>               get_misses         541
>               hash_bytes      524288
>        hash_is_expanding           0
>         hash_power_level          16
>                incr_hits           0
>              incr_misses           0
>                 libevent 2.0.22-stable
>           limit_maxbytes   268435456
>      listen_disabled_num           0
>         log_watcher_sent           0
>      log_watcher_skipped           0
>       log_worker_dropped           0
>       log_worker_written           0
>      lru_crawler_running           0
>       lru_crawler_starts         366
>   lru_maintainer_juggles        9230
>        lrutail_reflocked           0
>             malloc_fails           0
>            moves_to_cold         478
>            moves_to_warm          15
>         moves_within_lru         281
>                      pid       21995
>             pointer_size          64
>                reclaimed           0
>     rejected_connections           0
>             reserved_fds          20
>            rusage_system    0.45
>              rusage_user    1.09
>    slab_global_page_pool           0
> slab_reassign_busy_items           0
> slab_reassign_chunk_rescues           0
> slab_reassign_evictions_nomem           0
> slab_reassign_inline_reclaim           0
>    slab_reassign_rescues           0
>    slab_reassign

Re: 1.4.29 (large item support)

2016-07-15 Thread dormando
Hi,

I updated the release notes to be a little clearer. Use the -I option;
don't touch slab_chunk_max at all unless you really know what you're
doing.

All you have to do is:

-I 2m

ie:

-I 2m -o modern

... and you have a modern startup option with a 2m item limit.

On Fri, 15 Jul 2016, Centmin Mod George Liu wrote:

> ah units in KB so
> -o slab_chunk_max=2048
>
> ?
>
> how is it passed on command line with modern flag too ?
>
> -o modern,slab_chunk_max=2048
>
> ??
>
> On Friday, July 15, 2016 at 11:06:35 PM UTC+10, Centmin Mod George Liu wrote:
>   so to clarify if i want to raise max item size to 2MB i'd set -o 
> slab_chunk_max=2097152 ?
>   On Thursday, July 14, 2016 at 10:08:49 AM UTC+10, Dormando wrote:
> https://github.com/memcached/memcached/wiki/ReleaseNotes1429
>
> enjoy.
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups 
> "memcached" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to memcached+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
>

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to memcached+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Memcache `delete_misses` are very high compared to `delete_hits` while `evictions` are 0

2016-07-14 Thread dormando
Hey,

The delete_misses and delete_hits counters only tick when a delete command
is run. You'll need to temporarily enable logging, either in your app or in
memcached, to see where those delete commands are coming from.
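
On the memcached side, one low-tech option (a sketch; the port and memory
limit are placeholders) is to point a single app worker at a throwaway
instance started with verbosity turned up, which prints every command it
receives, deletes included:

  memcached -p 11311 -m 64 -vv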

On Thu, 14 Jul 2016, Utkarsh Awasthi wrote:

> To be more specific Memcached is being used for caching html pages, If page 
> is not found in memcached, it's been generated and saved against a key. Also 
> no explicit `delete`
> command is issued.
>
> On Wednesday, July 13, 2016 at 5:59:41 PM UTC+5:30, Utkarsh Awasthi wrote:
>   Following are the stats of Memcached:
>   STAT pid 18323
> STAT uptime 384753
> STAT time 1468390067
> STAT version 1.4.27
> STAT libevent 1.4.13-stable
> STAT pointer_size 64
> STAT rusage_user 75.178571
> STAT rusage_system 31.052279
> STAT curr_connections 10
> STAT total_connections 9517
> STAT connection_structures 25
> STAT reserved_fds 20
> STAT cmd_get 9410
> STAT cmd_set 991
> STAT cmd_flush 0
> STAT cmd_touch 0
> STAT get_hits 7788
> STAT get_misses 1622
> STAT get_expired 265
> STAT delete_misses 18439
> STAT delete_hits 117
> STAT incr_misses 0
> STAT incr_hits 0
> STAT decr_misses 0
> STAT decr_hits 0
> STAT cas_misses 0
> STAT cas_hits 0
> STAT cas_badval 0
> STAT touch_hits 0
> STAT touch_misses 0
> STAT auth_cmds 0
> STAT auth_errors 0
> STAT bytes_read 45007488
> STAT bytes_written 321441436
> STAT limit_maxbytes 1073741824
> STAT accepting_conns 1
> STAT listen_disabled_num 0
> STAT time_in_listen_disabled_us 0
> STAT threads 4
> STAT conn_yields 0
> STAT hash_power_level 16
> STAT hash_bytes 524288
> STAT hash_is_expanding 0
> STAT malloc_fails 0
> STAT log_worker_dropped 0
> STAT log_worker_written 0
> STAT log_watcher_skipped 0
> STAT log_watcher_sent 0
> STAT bytes 12134672
> STAT curr_items 266
> STAT total_items 991
> STAT expired_unfetched 188
> STAT evicted_unfetched 0
> STAT evictions 0
> STAT reclaimed 340
> STAT crawler_reclaimed 0
> STAT crawler_items_checked 0
> STAT lrutail_reflocked 0
> END
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups 
> "memcached" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to memcached+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
>

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to memcached+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Inserting large values into memcached

2016-07-14 Thread dormando
Hey,

In order to understand I'd have to see the details firsthand at this
point: Output of `stats` and `stats settings` commands, as well as any
example code you can share.

On Thu, 14 Jul 2016, Sonia wrote:

> Everything seems in order, but cant seem to find a reason for this odd 
> behaviour.
>
> On Wednesday, July 13, 2016 at 11:22:07 PM UTC-5, Dormando wrote:
>   I'm not sure why. you can validate the settings via the `stats settings`
>   command.
>
>   On Wed, 13 Jul 2016, Sonia wrote:
>
>   > I tried inserting 10 values of size 100 bytes. I am able to 
> insert all 10 values but I guess only the last 3 are present in cache since I 
> am getting cache
>   misses for the
>   > first 7 key-value pairs.Is there a flag that we have to set in the 
> memcached configuration file (I currently have the '-m' option set to 2048)
>   >
>   > On Wednesday, July 13, 2016 at 7:07:44 PM UTC-5, Dormando wrote:
>   >       Hi,
>   >
>   >       You're trying to store exactly 1024*1024 bytes, but an value in 
> memcached
>   >       encompasses the key and the datastructure behind it. Try 
> (1024*1024 -
>   >       4096) and see if that stores.
>   >
>   >       On Wed, 13 Jul 2016, Sonia wrote:
>   >
>   >       > For 100,000 1MB values I guess the memory I have allocated is 
> insufficient. However, I tried inserting 10 1MB values into memcached but 
> this too fails
>   and
>   >       memcached_strerror()
>   >       > returns "ITEM TOO BIG" (The value I have is a random 
> alpha-numeric char array of size 1048756 bytes).I am currently using a 
> libmemcached client. Also
>   please find
>   >       the output of
>   >       > the stats command in the attached file.
>   >       > I really appreciate the help. Thank you.
>   >       >
>   >       > On Wednesday, July 13, 2016 at 2:39:43 PM UTC-5, Dormando 
> wrote:
>   >       >       Can you give more detail as to what exactly is failing? 
> what error message
>   >       >       are you getting, what client are you using, what is the 
> `stats` output
>   >       >       from some of your memcached instances, etc?
>   >       >
>   >       >       100,000 1 meg values are going to take at least 100 
> gigabytes of RAM. if
>   >       >       you have 16 2G servers, you only have 32G of RAM 
> available. I can't really
>   >       >       help until knowing what your real error message is but 
> that math seems a
>   >       >       little odd.
>   >       >
>   >       >       On Wed, 13 Jul 2016, Sonia wrote:
>   >       >
>   >       >       > I have just started working with memcached and I am 
> working on a test program where I want to insert 100,000 values of size 1 MB 
> into memcached.I
>   >       currently have
>   >       >       16 servers
>   >       >       > setup and I have setup the memory limit in the 
> memcached configuration file as 2 GB but for some reason my code is still 
> failing.
>   >       >       > Has anybody faced a similar situation?
>   >       >       >
>   >       >       > --
>   >       >       >
>   >       >       > ---
>   >       >       > You received this message because you are subscribed 
> to the Google Groups "memcached" group.
>   >       >       > To unsubscribe from this group and stop receiving 
> emails from it, send an email to memcached+...@googlegroups.com.
>   >       >       > For more options, visit 
> https://groups.google.com/d/optout.
>   >       >       >
>   >       >       >
>   >       >
>   >       > --
>   >       >
>   >       > ---
>   >       > You received this message because you are subscribed to the 
> Google Groups "memcached" group.
>   >       > To unsubscribe from this group and stop receiving emails from 
> it, send an email to memcached+...@googlegroups.com.
>   >       > For more options, visit https://groups.google.com/d/optout.
>   >       >
>   >       >
>   >
>   > --
>   >
>   > ---
>   > You received this message because you are subscribed to the Google 
> Groups "memcached" group.
>   > To unsubscribe from this group and stop receiving emails f

Re: Inserting large values into memcached

2016-07-13 Thread dormando
I'm not sure why. You can validate the settings via the `stats settings`
command.

On Wed, 13 Jul 2016, Sonia wrote:

> I tried inserting 10 values of size 100 bytes. I am able to insert all 10 
> values but I guess only the last 3 are present in cache since I am getting 
> cache misses for the
> first 7 key-value pairs.Is there a flag that we have to set in the memcached 
> configuration file (I currently have the '-m' option set to 2048)
>
> On Wednesday, July 13, 2016 at 7:07:44 PM UTC-5, Dormando wrote:
>   Hi,
>
>   You're trying to store exactly 1024*1024 bytes, but an value in 
> memcached
>   encompasses the key and the datastructure behind it. Try (1024*1024 -
>   4096) and see if that stores.
>
>   On Wed, 13 Jul 2016, Sonia wrote:
>
>   > For 100,000 1MB values I guess the memory I have allocated is 
> insufficient. However, I tried inserting 10 1MB values into memcached but 
> this too fails and
>   memcached_strerror()
>   > returns "ITEM TOO BIG" (The value I have is a random alpha-numeric 
> char array of size 1048756 bytes).I am currently using a libmemcached client. 
> Also please find
>   the output of
>   > the stats command in the attached file.
>   > I really appreciate the help. Thank you.
>   >
>   > On Wednesday, July 13, 2016 at 2:39:43 PM UTC-5, Dormando wrote:
>   >       Can you give more detail as to what exactly is failing? what 
> error message
>   >       are you getting, what client are you using, what is the `stats` 
> output
>   >       from some of your memcached instances, etc?
>   >
>   >       100,000 1 meg values are going to take at least 100 gigabytes 
> of RAM. if
>   >       you have 16 2G servers, you only have 32G of RAM available. I 
> can't really
>   >       help until knowing what your real error message is but that 
> math seems a
>   >       little odd.
>   >
>   >       On Wed, 13 Jul 2016, Sonia wrote:
>   >
>   >       > I have just started working with memcached and I am working 
> on a test program where I want to insert 100,000 values of size 1 MB into 
> memcached.I
>   currently have
>   >       16 servers
>   >       > setup and I have setup the memory limit in the memcached 
> configuration file as 2 GB but for some reason my code is still failing.
>   >       > Has anybody faced a similar situation?
>   >       >
>   >       > --
>   >       >
>   >       > ---
>   >       > You received this message because you are subscribed to the 
> Google Groups "memcached" group.
>   >       > To unsubscribe from this group and stop receiving emails from 
> it, send an email to memcached+...@googlegroups.com.
>   >       > For more options, visit https://groups.google.com/d/optout.
>   >       >
>   >       >
>   >
>   > --
>   >
>   > ---
>   > You received this message because you are subscribed to the Google 
> Groups "memcached" group.
>   > To unsubscribe from this group and stop receiving emails from it, 
> send an email to memcached+...@googlegroups.com.
>   > For more options, visit https://groups.google.com/d/optout.
>   >
>   >
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups 
> "memcached" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to memcached+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
>

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to memcached+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Memcache `delete_misses` are very high compared to `delete_hits` while `evictions` are 0

2016-07-13 Thread dormando
Do you have a specific question?

Given the subject, possibly your app is issuing deletes for stuff that
isn't in the cache for some reason?

On Wed, 13 Jul 2016, Utkarsh Awasthi wrote:

> Following are the stats of Memcached:
> STAT pid 18323
> STAT uptime 384753
> STAT time 1468390067
> STAT version 1.4.27
> STAT libevent 1.4.13-stable
> STAT pointer_size 64
> STAT rusage_user 75.178571
> STAT rusage_system 31.052279
> STAT curr_connections 10
> STAT total_connections 9517
> STAT connection_structures 25
> STAT reserved_fds 20
> STAT cmd_get 9410
> STAT cmd_set 991
> STAT cmd_flush 0
> STAT cmd_touch 0
> STAT get_hits 7788
> STAT get_misses 1622
> STAT get_expired 265
> STAT delete_misses 18439
> STAT delete_hits 117
> STAT incr_misses 0
> STAT incr_hits 0
> STAT decr_misses 0
> STAT decr_hits 0
> STAT cas_misses 0
> STAT cas_hits 0
> STAT cas_badval 0
> STAT touch_hits 0
> STAT touch_misses 0
> STAT auth_cmds 0
> STAT auth_errors 0
> STAT bytes_read 45007488
> STAT bytes_written 321441436
> STAT limit_maxbytes 1073741824
> STAT accepting_conns 1
> STAT listen_disabled_num 0
> STAT time_in_listen_disabled_us 0
> STAT threads 4
> STAT conn_yields 0
> STAT hash_power_level 16
> STAT hash_bytes 524288
> STAT hash_is_expanding 0
> STAT malloc_fails 0
> STAT log_worker_dropped 0
> STAT log_worker_written 0
> STAT log_watcher_skipped 0
> STAT log_watcher_sent 0
> STAT bytes 12134672
> STAT curr_items 266
> STAT total_items 991
> STAT expired_unfetched 188
> STAT evicted_unfetched 0
> STAT evictions 0
> STAT reclaimed 340
> STAT crawler_reclaimed 0
> STAT crawler_items_checked 0
> STAT lrutail_reflocked 0
> END
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups 
> "memcached" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to memcached+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
>

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to memcached+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


1.4.29 (large item support)

2016-07-13 Thread dormando
https://github.com/memcached/memcached/wiki/ReleaseNotes1429

enjoy.

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to memcached+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Inserting large values into memcached

2016-07-13 Thread dormando
Hi,

You're trying to store exactly 1024*1024 bytes, but a value in memcached
encompasses the key and the data structure behind it. Try (1024*1024 -
4096) and see if that stores.
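
A minimal libmemcached sketch of that suggestion (assuming a local server on
the default port; the key name is made up):

  #include <libmemcached/memcached.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  int main(void)
  {
      /* Leave headroom under the 1MB item limit for the key and item header. */
      const size_t value_len = 1024 * 1024 - 4096;
      char *value = malloc(value_len);
      memset(value, 'x', value_len);

      memcached_st *memc = memcached_create(NULL);
      memcached_server_add(memc, "localhost", 11211);

      memcached_return_t rc = memcached_set(memc, "bigkey", strlen("bigkey"),
                                            value, value_len, 0, 0);
      printf("set: %s\n", memcached_strerror(memc, rc));

      memcached_free(memc);
      free(value);
      return 0;
  }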

On Wed, 13 Jul 2016, Sonia wrote:

> For 100,000 1MB values I guess the memory I have allocated is insufficient. 
> However, I tried inserting 10 1MB values into memcached but this too fails 
> and memcached_strerror()
> returns "ITEM TOO BIG" (The value I have is a random alpha-numeric char array 
> of size 1048756 bytes).I am currently using a libmemcached client. Also 
> please find the output of
> the stats command in the attached file.
> I really appreciate the help. Thank you.
>
> On Wednesday, July 13, 2016 at 2:39:43 PM UTC-5, Dormando wrote:
>   Can you give more detail as to what exactly is failing? what error 
> message
>   are you getting, what client are you using, what is the `stats` output
>   from some of your memcached instances, etc?
>
>   100,000 1 meg values are going to take at least 100 gigabytes of RAM. if
>   you have 16 2G servers, you only have 32G of RAM available. I can't 
> really
>   help until knowing what your real error message is but that math seems a
>   little odd.
>
>   On Wed, 13 Jul 2016, Sonia wrote:
>
>   > I have just started working with memcached and I am working on a test 
> program where I want to insert 100,000 values of size 1 MB into memcached.I 
> currently have
>   16 servers
>   > setup and I have setup the memory limit in the memcached 
> configuration file as 2 GB but for some reason my code is still failing.
>   > Has anybody faced a similar situation?
>   >
>   > --
>   >
>   > ---
>   > You received this message because you are subscribed to the Google 
> Groups "memcached" group.
>   > To unsubscribe from this group and stop receiving emails from it, 
> send an email to memcached+...@googlegroups.com.
>   > For more options, visit https://groups.google.com/d/optout.
>   >
>   >
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups 
> "memcached" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to memcached+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
>

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to memcached+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Inserting large values into memcached

2016-07-13 Thread dormando
Can you give more detail as to what exactly is failing? What error message
are you getting, what client are you using, what is the `stats` output
from some of your memcached instances, etc.?

100,000 1-meg values are going to take at least 100 gigabytes of RAM. If
you have 16 2G servers, you only have 32G of RAM available. I can't really
help until I know what your real error message is, but that math seems a
little odd.

On Wed, 13 Jul 2016, Sonia wrote:

> I have just started working with memcached and I am working on a test program 
> where I want to insert 100,000 values of size 1 MB into memcached.I currently 
> have 16 servers
> setup and I have setup the memory limit in the memcached configuration file 
> as 2 GB but for some reason my code is still failing.
> Has anybody faced a similar situation?
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups 
> "memcached" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to memcached+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
>

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to memcached+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


large item support

2016-07-07 Thread dormando
https://github.com/memcached/memcached/pull/181

proper, this time. hoping to be done by friday.

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to memcached+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


1.4.28

2016-07-01 Thread dormando
https://github.com/memcached/memcached/wiki/ReleaseNotes1428

Bugfixes. The latest feature I'm working on is proper "large item support".
It's a triple whammy: it allows more slab classes at the lower end, gives
better memory efficiency for largish objects, and allows raising the limit
above 1MB without spreading out the slab classes and killing efficiency.

It's coming along, but there's no PR to share just yet. It'll appear on
GitHub in a day or two.

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to memcached+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: replacing/migrating memcache servers

2016-06-27 Thread dormando
+1.

On Mon, 27 Jun 2016, 'Jay Grizzard' via memcached wrote:

> This is really heavily dependent on the exact hashing algorithm your library. 
> Some are (relatively) good about losing a server, others are complete trash 
> about it. I don’t
> know what the perl lib does off the top of my head.
> I’m assuming new-memcache01 is the mcrouter box? If so, you might consider 
> listing it multiple times (assuming the lib doesn’t try to prevent that).
>
> e.g. at first, do:
>
> memcache_servers => ['new-memcache01','memcache02', 'memcache03', 
> 'memcache04’]
>
> …and then…
>
> memcache_servers => ['new-memcache01’,’new-memcache01', 'memcache03', 
> 'memcache04’]
>
> …and then once all four (or howevermany) servers all all new-memcache01, you 
> can just collapse it to [ ‘new-memcache01’ ] and call it done. The keys are 
> all the same
> regardless of where they hash to, so this will let you get away from the old 
> servers without worrying about your keys getting rehashed strangely and 
> completely destroying your
> hit rate. 
>
> -j
>
> On Mon, Jun 27, 2016 at 2:53 PM, Geoff Galitz  
> wrote:
>   Hi...
> We're working on a project to migrate from one set of memcache servers to 
> newer largers ones which are behind a mcrouter.
>
> One option on the table is to take the current memcache_servers array in our 
> perl app and replace a single instance with the larger new memcache 
> server/cluster.  Once
> the new larger server(s) is warmed up we'd start popping servers off the end 
> of the array to shrink it.  Would this a problem?  My concern is that the 
> index would become
> invalid and we'd invalidate the entire memcache pool.
>
> As an example:
>
> Starting config: 
> memcache_servers => ['memcache01', 'memcache02', 'memcache03', 'memcache04']
>
> Next:
> memcache_servers => ['new-memcache01','memcache02', 'memcache03', 
> 'memcache04']
>
> Next: 
> memcache_servers => ['new-memcache01','memcache02', 'memcache03']
>
> and so on.   
>
> Any thoughts?
>
>


Re: replacing/migrating memcache servers

2016-06-27 Thread dormando
On Mon, 27 Jun 2016, Geoff Galitz wrote:

> Hi...
> We're working on a project to migrate from one set of memcache servers to 
> newer largers ones which are behind a mcrouter.
>
> One option on the table is to take the current memcache_servers array in our 
> perl app and replace a single instance with the larger new memcache 
> server/cluster.  Once the new
> larger server(s) is warmed up we'd start popping servers off the end of the 
> array to shrink it.  Would this a problem?  My concern is that the index 
> would become invalid and
> we'd invalidate the entire memcache pool.
>
> As an example:
>
> Starting config: 
> memcache_servers => ['memcache01', 'memcache02', 'memcache03', 'memcache04']
>
> Next:
> memcache_servers => ['new-memcache01','memcache02', 'memcache03', 
> 'memcache04']
>
> Next: 
> memcache_servers => ['new-memcache01','memcache02', 'memcache03']
>
> and so on.   
>
> Any thoughts?

If you're using consistent hashing, the churn will be relative to the
percentage of change, instead of invalidating everything. The other way
folks tend to do this (and I don't know if mcrouter has this feature) is
to overlay two clusters.

IE: You have your new-memcache array, and then the original array as
separate client instances. First fetch against new-memcache, if miss fetch
from old memcache. When writing new entries, write to the new one (and
optionally update or delete against the old one).

You end up with a big initial "hit" with a roundtrip, but much less
impactful than going to your backing store.

but that's complicated. if you can take a temporary 30% hit each time you
rotate in a new server you should stick with that. Run it during offpeak
or something.
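
In case it helps, the overlay pattern described above might look like this
in Python. This is a rough sketch only: new_pool and old_pool stand for two
separate client instances with get/set/delete in the python-memcache style,
and all the names are made up for illustration.

def overlay_get(new_pool, old_pool, key, ttl=3600):
    # Read from the new cluster first.
    value = new_pool.get(key)
    if value is not None:
        return value
    # Miss: fall back to the old cluster and backfill the new one.
    value = old_pool.get(key)
    if value is not None:
        new_pool.set(key, value, ttl)
    return value

def overlay_set(new_pool, old_pool, key, value, ttl=3600):
    # New writes go to the new cluster; deleting from the old one keeps
    # stale copies from being read back during the cutover.
    new_pool.set(key, value, ttl)
    old_pool.delete(key)

The extra roundtrip on a miss is the "hit" mentioned above, but it only
lasts until the new pool is warm.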



Re: Integrate Memcached with MarkLogic

2016-06-22 Thread dormando
Hey,

http://memcached.org/tutorial - memcached isn't something you integrate
with a database, so much as something your application uses alongside your
database.
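
The usual shape is cache-aside. A minimal sketch in Python, assuming mc is
any memcached client with get/set and load_from_db is your own MarkLogic
lookup; both names are placeholders:

def get_document(mc, doc_id, load_from_db, ttl=300):
    # Try the cache first; only hit the database on a miss.
    key = "doc:%s" % doc_id
    doc = mc.get(key)
    if doc is None:
        doc = load_from_db(doc_id)
        mc.set(key, doc, ttl)
    return doc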

On Tue, 21 Jun 2016, Kishan Ashra wrote:

> I want to use distributed caching with my MarkLogic Database. So, I want to 
> know how can I integrate memcached with MarkLogic.
>


online stats sizes, manual slab class specification

2016-06-21 Thread dormando
https://github.com/memcached/memcached/pull/169

in case anyone wants to review/follow along.



Re: get operation with expiration time

2016-06-20 Thread dormando
Hi,

Since you're talking about contributing or forking, would you mind talking
through your use case a bit more? There may be some solutions that fit
better, and if not, a decent path inward.

First, you seem to be using the binary protocol? Can you share what client
and what features you make use of? It's somewhat rare and would be nice to
know.

Next: you seem to rely on the TTL for when your data actually needs to be
updated? Do you ever evict, are you doing a periodic rebuild of all items,
etc? (sorry, I can't tell what co you're from in case I'm supposed to know
this already :P). It'd be useful to know if you can tolerate cache misses
or if this is some kind of in-memory database situation.

What happens on a cache miss, exactly?

If you'd have the patience, it'd be nice to walk through a few scenarios
and why they do or don't work for you:

1) The fastest way to fill a new cluster of cache nodes is typically to
first-fetch against the new group, and on miss fetch-then-fill from the
old group. Most places I've seen get up to a normal hit ratio within 10
minutes. An hour at most, doing just that. You end up losing the long tail
in a cutover, and it doesn't give you the remaining TTL; but I'd like to
know if this pattern works or if you still have to iterate everything.

2) I wrote the LRU crawler a few years back. It's wired close to being
able to execute nearly arbitrary code while it rolls through each slab
class. I don't want to let too many cats out: but with the new logger code
being able to migrate sockets between threads (one way, have some
experimental code for two way), it may be possible to replace cachedump
with an LRU crawler extension. This would allow you to more trivially dump
valid items and their full headers without impacting the server.

Would that help or alleviate the need of the extra command? It sounds like
you need to iterate all items, fetch the header info, then fetch the item
again and dump it into the new server... If using a crawler you could
stream the headers for valid items in one go, then iterate them to fetch
the data back.

3) A really old common safety net is to embed the TTL again inside the
item you're storing. This allows people to "serve while stale" if a
preferred TTL is surpassed but the underlying item lives on. It also
allows you to schedule a background job or otherwise elect a client to
refresh an item once it nears the TTL, while fast serving the current item
to other clients. You can't move the embedded TTL with GAT anymore; you'd have to
swap the item via CAS or similar. This is generally a best of all worlds
and has the added safety net effect. If doing this, combined with 1) and
tolerance for long-tail cache misses you can bring a new cluster up to
speed in a few minutes without modifying the daemon at all.
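
To illustrate 3), here is a sketch in Python of embedding a soft TTL inside
the value while giving memcached a longer hard TTL. The client interface
(get/set/add/delete in the python-memcache style) and the JSON wrapping are
assumptions for the example, not a prescription:

import json, time

def set_soft(mc, key, value, soft_ttl, hard_ttl):
    # hard_ttl is what memcached enforces; soft_ttl is our own deadline.
    mc.set(key, json.dumps({"exp": time.time() + soft_ttl, "val": value}), hard_ttl)

def get_soft(mc, key, refresh, soft_ttl, hard_ttl):
    raw = mc.get(key)
    if raw is None:
        value = refresh()                       # true miss: rebuild now
        set_soft(mc, key, value, soft_ttl, hard_ttl)
        return value
    wrapped = json.loads(raw)
    if time.time() > wrapped["exp"]:
        # Stale: let one client win a short lock and rebuild, while
        # everyone else keeps serving the stale copy.
        if mc.add(key + "_refresh", 1, 60):
            value = refresh()
            set_soft(mc, key, value, soft_ttl, hard_ttl)
            mc.delete(key + "_refresh")
            return value
    return wrapped["val"]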

4) I have been thinking about new commands which return extended details
about the item... unfortunately the protocol code is all dumped into the
same file and is in need of a cleanup. I'm also trying to rethink the main
protocols a little more to make something better in the future. This means
if you fork now it could be pretty hard to maintain until after the
refactoring at least. Also have to keep the binary protocol and text
protocol in parity where possible.

Sorry for the wall of text. You have a few options before having to modify
the thing. Let's be absolutely sure it's what you need, and that you can
get there with the most minimal change.

thanks,
-Dormando

On Mon, 20 Jun 2016, 'Vu Tuan Nguyen' via memcached wrote:

> We'd like to get the expiration time with the value on a get operation.   
> We'd use this new operation mainly for an administrative task--cache warming 
> a new group of servers.
> At times, we want to deploy a new server group to replace the previous one 
> seemlessly--doing so in a way that the client apps don't suffer a significant 
> drop in hit rate.  We
> currently do that by deploying the new server group where the remote client 
> processes are dynamically notified that the new group is in write-only mode.  
> We wait for the
> duration of the normal app TTL when the new server group is sufficiently 
> full, then make the new server group readable--removing the previous group 
> shortly afterward.
> Since we have a lot of caches that have different TTL's and we manage the 
> caches separately from the client apps that read/write to them, we'd like to 
> make this cache warm-up
> process quicker (and easier operationally).  We want to dynamically warm-up 
> the new servers so that we don't need to wait for the full TTL before 
> enabling the new servers.  We
> already can get the keys from elsewhere.  We do have the TTL at the time of 
> the write operation too.  However, using the TTL from this is a bit more 
> complex than we'd like,
> and we also don't get the latest expiration if a get-and-touch operation is 
> used.
>
> Can a new opera

Re: LRU-chain update question

2016-06-20 Thread dormando
Hey,

Take a look at the doc/new_lru.txt file included in the source tarball.

(https://github.com/memcached/memcached/blob/master/doc/new_lru.txt for
the lazy)

The LRU-ish algorithm was updated a few versions ago. This documents its
behavior thoroughly.

On Sun, 19 Jun 2016, Hong Yeol Lim wrote:

> Hello
>
> I am testing for the LRU-chain update in Memcached 1.4.26 version at Ubuntu 
> 12.04.
> I have identified LRU-chain update after SET command for a new cache item.
> Memcached updates the LRU-chain after GET command as well?
>
> Please let me know, if I missed something.
>
> Thank you for help.
>


1.4.26

2016-06-17 Thread dormando
https://github.com/memcached/memcached/wiki/ReleaseNotes1426

Seems like some folks are waking up (and I should've left the logging
stuff in 'next' for a little bit). Please give it a shot, test some more,
and if there're some minor compilation/cleanup issues I'll cut another
release soon.

I've done a lot of testing so far and things are going well. It's a really
handy feature and I hope people get good use out of it.

-Dormando



Re: ready for release

2016-06-17 Thread dormando
On Fri, 17 Jun 2016, 'Jay Grizzard' via memcached wrote:

> Also, OSX consistently gives:
>   t/issue_70.t . 1/4
> #   Failed test at t/issue_70.t line 22.
> #          got: 'SERVER_ERROR object too large for cache
> # '
> #     expected: 'CLIENT_ERROR bad command line format
> # '
> # Looks like you failed 1 test of 4.
> t/issue_70.t . Dubious, test returned 1 (wstat 256, 0x100)
> Failed 1/4 subtests
>
>
> (I haven’t really gotten a chance to look deeper)

Any chance you could bisect that one? would much appreciate, I don't have
access to OS X.

> I’ll do some testing on RHEL 5 6 & 7 a little later, just to sanity check 
> there.

Thanks!

> No idle kicker patch this release? :(

Release got too large. I'm going to do the next round of PR's and issues
next week. Should be a short release cycle for a little while at least :)

> -j
>
> On Fri, Jun 17, 2016 at 12:50 PM, dormando <dorma...@rydia.net> wrote:
>
>
>   On Fri, 17 Jun 2016, Dagobert Michelsen wrote:
>
>   > Hi Dormando,
>   >
>   > Am 17.06.2016 um 10:39 schrieb dormando <dorma...@rydia.net>:
>   > > https://github.com/memcached/memcached/pull/127 is now "done", as 
> much as
>   > > it'll be done for this release. More work in future releases.
>   > >
>   > > I've already pulled the next branch into master. I'll be cutting 
> this in
>   > > the morning unless someone has a major problem with it between now 
> and
>   > > then :)
>   >
>   > Not a major problem, but the testsuite on Solaris x86 on 32 and 64 
> bit has issues:
>   >   https://buildfarm.opencsw.org/buildbot/waterfall?category=memcached
>   >
>   > > #   Failed test 'canary ==
>   > > BBB...
>   > > #   at 
> /export/home/buildbot/slave/memcached-solaris10-i386/build/t/lib/MemcachedTest.pm
>  line 59.
>   > > #          got: 'END
>   > > # '
>   > > #     expected: 'VALUE canary 0 66560
>   > > # BBB...
>   > > # END
>   > > # '
>   > >
>   > > # Looks like you failed 1 test of 224.
>
>   Thanks! I saw that bouncing around while I was filling up 'next'. 
> there's
>   a flaky test I've punted to the next release to work on. I was able to 
> get
>   it to fail rarely locally. it does seem to fail more often on your
>   buildbot.
>


Re: ready for release

2016-06-17 Thread dormando


On Fri, 17 Jun 2016, Dagobert Michelsen wrote:

> Hi Dormando,
>
> Am 17.06.2016 um 10:39 schrieb dormando <dorma...@rydia.net>:
> > https://github.com/memcached/memcached/pull/127 is now "done", as much as
> > it'll be done for this release. More work in future releases.
> >
> > I've already pulled the next branch into master. I'll be cutting this in
> > the morning unless someone has a major problem with it between now and
> > then :)
>
> Not a major problem, but the testsuite on Solaris x86 on 32 and 64 bit has 
> issues:
>   https://buildfarm.opencsw.org/buildbot/waterfall?category=memcached
>
> > #   Failed test 'canary ==
> > BBB...
> > #   at 
> > /export/home/buildbot/slave/memcached-solaris10-i386/build/t/lib/MemcachedTest.pm
> >  line 59.
> > #  got: 'END
> > # '
> > # expected: 'VALUE canary 0 66560
> > # BBB...
> > # END
> > # '
> >
> > # Looks like you failed 1 test of 224.

Thanks! I saw that bouncing around while I was filling up 'next'. there's
a flaky test I've punted to the next release to work on. I was able to get
it to fail rarely locally. it does seem to fail more often on your
buildbot.



ready for release

2016-06-17 Thread dormando
Hey,

https://github.com/memcached/memcached/pull/127 is now "done", as much as
it'll be done for this release. More work in future releases.

I've already pulled the next branch into master. I'll be cutting this in
the morning unless someone has a major problem with it between now and
then :)

It's been tough getting feedback on these things. Hard to tell if I'm
doing it well enough or nobody has the time to really look. Hopefully the
former and people get some good use out of this new thing.

thanks,
-Dormando



Logging branch (for 1.4.26)

2016-06-12 Thread dormando
Yo,

I started this thing back in november:
https://github.com/memcached/memcached/pull/127

Picked it up this weekend and got through most of the cleanup.

With some code cleanup and additional endpoints left, thought I'd start
poking folks for opinions/review. This is a much needed feature which can
provide people with a huge amount of insight into what their server is
doing. It does this without having to manage STDOUT/STDERR piped into
anything.

Thanks,
-Dormando



Re: Reviewing a script to demo memcache get-set race conditions problems if it is used for distributed locking.

2016-06-08 Thread dormando
t "Being processed by %s" % user
>         else:
>            # But lost the race to find that.
>            print "Processed by another user."
>
>
> def process(source):
>     shutil.move(source, destination)
>     # filename => source
>     memcache.delete(filename)
>
>
>
>
> On Sunday, June 5, 2016 at 1:30:37 AM UTC+5:30, Dormando wrote:
>   The pattern is identical between the one in the wiki and yours. Simply
>   move the delete of the key until you're done using the lock, which would
>   be in a separate request.
>
>   In your case, you would probably set the contents of the key to be the
>   name of the user who has it locked.
>
>   In the original pseudocode:
>   key  = "expensive_frontpage_item"
>   item = memcli:get(key)
>   if (! defined item) {
>       # Oh crap, we have to recache it!
>       # Give us 60 seconds to recache the item.
>       if (memcli:add(key . "_lock", 60)) {
>           item = fetch_expensive_thing_from_database
>           memcli:add(key, item, 86400)
>           memcli:delete(key . "_lock")
>       } else {
>           # Lost the race. We can do any number of things:
>           # - short sleep, then re-fetch.
>           # - try the above a few times, then slow-fetch and return the 
> item
>           # - show the user a page without this expensive content
>           # - show some less expensive content
>           # - throw an error
>       }
>   }
>   return item
>
>   In yours:
>   key = filename
>   item = memcli:get(key)
>   if (! defined item) {
>     if (memcli:add(key . "_lock", lock_timeout_time, my_admin_username)) {
>        [etc]
>     } else {
>        # lost the race, handle how you want
>     }
>   } else if (item.value == my_admin_username) {
>     # good to go for that future request
>   }
>
>   Then when you're done holding the lock, delete the key.
>
>   On Sat, 4 Jun 2016, Nishant Varma wrote:
>
>   > I am reading 
> https://github.com/memcached/memcached/wiki/ProgrammingTricks#ghetto-central-locking,
>  it seems to deal with a slightly different lock scenario of
>   getting some
>   > expensive item from Database to avoid "Stampeding"
>   > In my case its slightly different lock that I need. I show regular 
> files from a folder in a web application to many users. So, to "lock" a file 
> using
>   memcache isn't this
>   > simple API sufficient or I still need that pattern :-)?
>   > def access(filename):
>   >      if memcache.add(filename, timestamp):
>   >         return "Access Granted. Lock Obtained" # Normally this 
> results in checking HTML checkbox against the filename so User can do actions 
> with that/
>   >      else:
>   >         return "Access Denied" # Normally this leads to an alert 
> saying that someone else is working on this.
>   >
>   > Isn't this simple API using add good enough in my case? I am sorry if 
> I am repeating this, but I could not really relate the "fetching expensive 
> item from
>   Database" to my
>   > scenario which is why I even wrote a simple script to test the 
> validity of the claim etc.
>   >
>   > Can you please let me know?
>   >
>   >
>   > On Saturday, June 4, 2016 at 6:42:35 PM UTC+5:30, Nishant Varma wrote:
>   >       Excellent I rely on you. I guess this is the reason you say I 
> am over-engineering this problem. Makes sense :-) I will again check the link 
> you gave me. I
>   will go
>   >       through the documentation this weekend.
>   >
>   >       On Saturday, June 4, 2016 at 1:33:04 PM UTC+5:30, Dormando 
> wrote:
>   >             Hey,
>   >
>   >             You really don't need to test this: I'm telling you 
> flatly, as an author
>   >             of this software and all of the documentation for it, 
> that you should
>   >             absolutely not rely on that pattern. I'm trying to save 
> you some time.
>   >
>   >             The pattern that is slightly better is written explicitly 
> in pseudocode in
>   >             the link I gave you several times in the issue. Please 
> use it.
>   >
>   >             Thanks,
>   >             -Dormando
>   >
>   >             On Fri, 3 Jun 2016, Nishant Varma wrote:
>  

Re: Monitoring specific key patterns in Memcache server

2016-06-04 Thread dormando
Hey,

Memcached can't do that easily right now. You can use the STDOUT logging
but that requires reading everything the server is doing directly.

I started a branch for a better logging situation a few months ago, and am
picking it up to finish over the next few weeks
(https://github.com/memcached/memcached/pull/127). That won't do you any
good in the short term though.

Printing via your overrides is probably the best way of getting the
localized data you need, however I insist *again* that you're
overengineering this.
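
For what a client-side override can look like, here's a small Python sketch
that wraps an existing client's get/set and logs only the keys in the
namespace of interest. The prefix and the client's interface are assumptions:

import functools, logging

def trace_namespace(client, prefix):
    # Wrap client.get/client.set so calls on matching keys get logged.
    def wrap(method):
        @functools.wraps(method)
        def traced(key, *args, **kwargs):
            if key.startswith(prefix):
                logging.info("%s %s", method.__name__, key)
            return method(key, *args, **kwargs)
        return traced
    client.get = wrap(client.get)
    client.set = wrap(client.set)
    return client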

On Sat, 4 Jun 2016, Nishant Varma wrote:

> Real time of offline solutions would be helpful. If I can profile in 
> background and query it later that is one option. However the only concern 
> with profiling is that I don't
> need to profile everything. Or does memcache do this by default? Can anyone 
> guide me?
>
> On Saturday, June 4, 2016 at 12:13:32 PM UTC+5:30, Nishant Varma wrote:
>   I am trying to troubleshoot an issue which could happen because of 
> get-set race condition. I can monitor the entire memcache operations but I 
> guess it is going to
>   be huge because its a small percentage of the DB itself, so I need to 
> filter only the keys I am interested in. We have a namespacing convention to 
> distinguish our
>   memcache entries so I would like to monitor the get and set that 
> happens to a specific namespace to track the get-set race condition.
>
>
>   I have a client side solution which is to over-ride (decorator in 
> Python) memcache.get and memcache.set to print the arguments if the key 
> matches our desired
>   pattern.
>
>
>   However can this be done in memcache server? We have so many clients 
> and we would have collect this information from all nodes and morover this 
> feels like suited
>   for server. Is there something that we could in memcached like using 
> debug module that would help us?
>
>


Re: Reviewing a script to demo memcache get-set race conditions problems if it is used for distributed locking.

2016-06-04 Thread dormando
Hey,

You really don't need to test this: I'm telling you flatly, as an author
of this software and all of the documentation for it, that you should
absolutely not rely on that pattern. I'm trying to save you some time.

The pattern that is slightly better is written explicitly in pseudocode in
the link I gave you several times in the issue. Please use it.
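
For reference, that pseudocode as a small Python sketch. It assumes a
python-memcache style client where add() only succeeds if the key doesn't
already exist; the key names are illustrative:

def acquire(mc, filename, user, lock_timeout=60):
    # add() is atomic: it fails if the lock key already exists, which is
    # what makes it safe where get-then-set is not.
    if mc.add(filename + "_lock", user, lock_timeout):
        return True
    # Lost the race; it's only "ours" if we already hold it.
    return mc.get(filename + "_lock") == user

def release(mc, filename):
    # Delete the lock key only after the protected work is finished.
    mc.delete(filename + "_lock")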

Thanks,
-Dormando

On Fri, 3 Jun 2016, Nishant Varma wrote:

> Can anyone help me peer review this script 
> https://gist.github.com/varmanishant/0129286d41038cc21471652a6460a5ff that 
> demonstrate potential problems with get set if it is used
> to implement distributed locking. I was suggested to modify from get set to 
> add in this thread https://github.com/memcached/memcached/issues/163. However 
> I wanted a small
> simulation to demonstrate this.
>


Re: Cannot see Evictions in my Memcache stats?

2016-04-13 Thread dormando
Hey,

Try telnetting to the port and running 'stats'. I don't know what program
you're using, but it looks like it's not printing all of the counters.

you're also on a very old version, which won't have nearly as many
statistics as a newer one.
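
If telnet isn't handy, a tiny Python sketch that speaks the text protocol
directly will show the full counter list; host and port are assumptions:

import socket

def dump_stats(host="127.0.0.1", port=11211):
    s = socket.create_connection((host, port))
    s.sendall(b"stats\r\n")
    data = b""
    while not data.endswith(b"END\r\n"):
        chunk = s.recv(4096)
        if not chunk:
            break
        data += chunk
    s.close()
    for line in data.decode().splitlines():
        if line.startswith("STAT"):
            print(line)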

On Wed, 13 Apr 2016, Ross Peetoom wrote:

> When I print my memcached stats I never see a value for evictions in the 
> results.  That may mean that I have enough memory that nothing ever gets 
> evicted but even then I might
> expect to see evictions with a value of 0, right?  This makes me think there 
> may be somethign wrong with my setup?
>     [pid] => 18830
>     [uptime] => 289546
>     [time] => 1460564747
>     [version] => 1.4.5
>     [pointer_size] => 64
>     [rusage_user] => 2915.408790
>     [rusage_system] => 9298.708381
>     [curr_connections] => 11
>     [total_connections] => 681390
>     [connection_structures] => 215
>     [cmd_get] => 127900677
>     [cmd_set] => 73289571
>     [cmd_flush] => 0
>     [get_hits] => 54628125
>     [get_misses] => 73272552
>     [delete_misses] => 13527498
>     [delete_hits] => 21493686
>     [incr_misses] => 0
>     [incr_hits] => 0
>     [decr_misses] => 0
>     [decr_hits] => 0
>     [cas_misses] => 0
>     [cas_hits] => 0
>     [cas_badval] => 0
>     [auth_cmds] => 0
>     [auth_errors] => 0
>     [bytes_read] => 73601288719
>     [bytes_written] => 135869180709
>     [limit_maxbytes] => 4294967296
>     [accepting_conns] => 1
>     [listen_disabled_num] => 0
>     [threads] => 4
>
> Any ideas, or is the evictions value not appearing a common thing?
>
> Thanks, Ross
>


Re: In AWS GetMisses and Evictions are high

2016-02-25 Thread dormando
Hey,

What version are you running? Is this the amazon hosted memcached cloud
thing or are you actually installing and managing memcached yourself?

On Wed, 24 Feb 2016, raysmithvic1...@gmail.com wrote:

> HiUsing memcached in AWS using to store our application web sessions. Several 
> days noticed that the GetMisses and Evictions are high. Thought to add new 
> nodes or the resize the existing nodes. But not sure whether I want to tweak 
> the max_item_size parameter?
>
> Any idea ?
>
> Thanks
> Ray
>


Re: Broken link on memcached home page

2016-02-03 Thread dormando
Thanks. I'm trying to find some time to deal with the wiki having gone
away.

On Wed, 3 Feb 2016, Dan Madere wrote:

> Just a heads up that on http://memcached.org/ there's a broken link when you 
> click "API" in the sentence "Its API is available for most popular languages."
>


Re: memcached-1.4.25.tar.gz sha1

2016-01-04 Thread dormando
$ cat memcached-1.4.25.tar.gz.sha1
7fd0ba9283c61204f196638ecf2e9295688b2314  ./memcached-1.4.25.tar.gz
$ sha1sum ./memcached-1.4.25.tar.gz
7fd0ba9283c61204f196638ecf2e9295688b2314  ./memcached-1.4.25.tar.gz

that's from the original copy on my local disk. Did your browser
decompress the .gz file while downloading? I noticed chrome started doing
that (sometimes?) recently but haven't looked into why.

On Mon, 4 Jan 2016, Wilson MacGyver wrote:

> Hi,
> I just downloaded the file from 
> http://www.memcached.org/files/memcached-1.4.25.tar.gz
>
>
> and then I ran sha1 on it. instead of the 
>
>
> 7fd0ba9283c61204f196638ecf2e9295688b2314
>
> I got 
>
>
>
> bf9e1fcf839dd6a15d6b68223308886bc8abae60
>
>
>
> Is the sha1 posted on the site still correct or did I somehow get a bad file?
>
>
> Thanks,
>

Re: Memcache 1.4.24 + stunnel crashes without any notifications

2015-12-31 Thread dormando
That's unknown territory unfortunately. You'll have to debug sasl a bit on
your own :( getting a backtrace in the way I said could be helpful still.

On Wed, 30 Dec 2015, Prasad Prabhu wrote:

> I looked in /var/log/messages and i see that there are segfaults at the exact 
> time we saw the crashes:
> Dec 30 14:41:08 ip-10-82-116-125 kernel: memcached[10838]: segfault at 10 ip 
> 7f23e26f689e sp 7f23defb5b60 error 4 in 
> libsasl2.so.2.0.23.#prelink#.LawU2D (deleted)[7f23e26e7000+19000]
>
>
> Dec 30 16:01:02 ip-10-82-116-125 kernel: memcached[14329]: segfault at 10 ip 
> 0037b4a0f89e sp 7fa7e6fb8b60 error 4 in 
> libsasl2.so.2.0.23[37b4a0+19000]
> Looks like its memcache + sasl issue.
>
> Regards,
> Prasad
>
> On Wed, Dec 30, 2015 at 4:43 PM, dormando <dorma...@rydia.net> wrote:
>   Hi,
>
>   Do you have a segfault listed in 'dmesg'?
>
>   You can also try: 1.4.25, which had some bugfixes.
>   Or: the memcached-debug binary from a .24 or .25 compile, which has
>   assert()'s in to give better crashes
>   Or: Attach your running process to gdb, wait for it to crash, then get 
> the
>   backtrace.
>
>   If you're doing the latter two options I'd still recommend using .25 if 
> at
>   all possible.
>
>   On Wed, 30 Dec 2015, Prasad Prabhu wrote:
>
>   > We're running memcache v 1.4.24 with stunnel and we're seeing the 
> process crash without any logs or notifications. We've performance tested it 
> with different profiles and are not able to get a consistent cause of 
> failure.It sometimes crashes during a perf test and at other times, dies on 
> the second or third test after passing the first test
>   with no
>   > problems. One odd thing we saw from the memcache stats is that we saw 
> current connections count staying at a high number (about 6000) even after we 
> shut down the services using memcache. 
>   > We dont see any specific pattern in the memory utilization, number of 
> objects or any other details in the stats. Is there any known issue or a 
> problem we are?
>   > We've run memcache with the -v option and didnt see any additional 
> data.
>   >
>   > Memcache configuration:
>   > PORT="1234"
>   > USER="memcached"
>   > MAXCONN="2"
>   > CACHESIZE="1"
>   > OPTIONS="-S -v >> /var/log/memcached 2>&1"
>   >
>   > Anyone have any idea why this might be happening?
>   >
>   > Regards,
>   > Prasad
>   >

Re: Evictions before expires.

2015-12-11 Thread dormando
Hey,

The protocol.txt is canonical for memcached proper. The MySQL
documentation is based on a fork.

The stats are pretty confusing, I apologize. Your problem is likely slab
calcification. You need to look at 'stats slabs' and not strictly 'stats
items'.

There're a few stats counters under 'stats slabs' that will tell you how
much memory has actually been allocated:
STAT active_slabs 0
STAT total_malloced 0

The 'bytes' value is the total actual bytes stored, which doesn't account
for overhead in the slab allocator (typically 10-20% loss).

This was improved significantly by the 1.4.25 release, which the release
notes (and associated PR) discuss:
https://github.com/memcached/memcached/wiki/ReleaseNotes1425#new-features

You'll see the evidence of this in long running instances if the size of
items changes over time. IE: lots of evictions against slab class 3, which
has 5 pages assigned. Meanwhile slab class 2 has 5000 pages and never
evicts.
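
If it helps, here's a small sketch of that check in Python, assuming you've
already parsed "stats slabs" and "stats items" into dicts keyed by the STAT
names (purely illustrative):

def report_calcification(slab_stats, item_stats):
    # slab_stats: e.g. {"3:total_pages": "5", "3:chunk_size": "152", ...}
    # item_stats: e.g. {"items:3:evicted": "123", ...}
    classes = sorted({int(k.split(":")[0]) for k in slab_stats if k[0].isdigit()})
    for c in classes:
        pages = int(slab_stats.get("%d:total_pages" % c, 0))
        evicted = int(item_stats.get("items:%d:evicted" % c, 0))
        print("class %2d: %6d pages, %10d evictions" % (c, pages, evicted))

A class with few pages but lots of evictions sitting next to a class with
many pages and none is the calcification pattern described above.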

1.4.4 isn't supported, so if you continue to have trouble I highly
recommend trying the new code (and start options) first.

-Dormando

On Wed, 9 Dec 2015, Bill Moseley wrote:

> Thank you Denis for the explanation.
>
> On Wed, Dec 9, 2015 at 9:15 PM, Denis Samoylov <samoi...@gmail.com> wrote:
>   if item expired it will not be counted as evicted.
>
>
> I'm not clear on how items with expires time are removed.  After their 
> expires time has past are they just considered as available space in that 
> slab?  
>
>
> I'm confused about "evicted_nonzero" as I see two descriptions:
>
> "Number of times an item which had an explicit expire time set had to be 
> evicted from the LRU before it expired."
> https://github.com/memcached/memcached/blob/master/doc/protocol.txt#L704
>
> Or
>
> "The time of the last evicted non-zero entry"
> https://dev.mysql.com/doc/mysql-ha-scalability/en/ha-memcached-stats-items.html
>
>
>
>  
>     So you need to look into "stats items" 
> (https://dev.mysql.com/doc/mysql-ha-scalability/en/ha-memcached-stats-items.html)
>  and look for evictions. If you have evictions this means that you do not 
> have enough memory (or memory was allocated in other slabs: memcached does 
> not free slabs by default, so if you for example allocated all memory for 4K
>   slab and do not use it anymore, e.g. everything is expired in 4K slab - 
> you wont be able to allocate memory for 1MB slab)
>
>
> Well, I have lots of evictions (see below), but the server has been up for a 
> long time.   
>
>
> But, doesn't this mean I have almost a GB of unallocated memory that can be 
> used to allocate new pages for slabs?
>
>   STAT limit_maxbytes 8,589,934,592
> STAT bytes 7,438,136,137
>
>
>
> If that's so, why would anything be evicted?    Wouldn't Memcached allocate 
> that memory as pages to full slab classes first?
>
> And once a page is allocated to a slab it is never released from that slab.
>
> Here's the non-zero values from "stats items".   The "evicted_nonzero" shows 
> I'm evicting a lot of keys before they expired -- although I don't know the 
> rate.   Seems like a cache flush would be good.
>
> STAT items:1:number 3778
> STAT items:1:age 11829556
> STAT items:2:number 47446517
> STAT items:2:age 11734712
> STAT items:2:evicted 3650
> STAT items:2:evicted_time 300048
> STAT items:3:number 2136100
> STAT items:3:age 11318274
> STAT items:3:evicted 
> STAT items:3:evicted_time 723804
> STAT items:4:number 633476
> STAT items:4:age 12003825
> STAT items:4:evicted 54471719
> STAT items:4:evicted_nonzero 54471719
> STAT items:4:evicted_time 28454
> STAT items:5:number 1267004
> STAT items:5:age 12001656
> STAT items:5:evicted 644320
> STAT items:5:evicted_nonzero 644319
> STAT items:5:evicted_time 30609
> STAT items:6:number 24140
> STAT items:6:age 10669748
> STAT items:6:evicted 2962192
> STAT items:6:evicted_nonzero 2949912
> STAT items:6:evicted_time 1362778
> STAT items:7:number 5459
> STAT items:7:age 12029692
> STAT items:7:evicted 6057558
> STAT items:7:evicted_nonzero 6057558
> STAT items:7:evicted_time 2589
> STAT items:8:number 342866
> STAT items:8:age 11184370
> STAT items:8:evicted 45767
> STAT items:8:evicted_time 847838
> STAT items:9:number 1746
> STAT items:9:age 12029668
> STAT items:9:evicted 2026223
> STAT items:9:evicted_nonzero 2026223
> STAT items:9:evicted_time 2612
> STAT items:10:number 5576
> STAT items:10:age 12029153
> STAT items:10:evicted 4268899
> STAT items:10:evicted_nonzero 4268899
> STAT items:10:evicted_time 3126
> STAT items:11:number 3328
> STAT items:11:age 12030195
> STAT items:11:evicted 4107140
> STAT items:11:evicted_non

Re: new version memcached-1.4.24 and php-pecl-memcached question

2015-11-26 Thread dormando
Hey,

What are your startup arguments for memcached?

In .24 and up the maximum slab class count is now 63 instead of 255. The
default number of slab classes is 42, so you'd have to set your scaling
factor pretty low to get it to go that high.

Would that cause problems for you?
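
For background, chunk sizes grow geometrically by the factor until they
approach the item size max, which is what bounds the class count. A rough
sketch of that progression in Python (it ignores the 8-byte alignment
memcached applies, so the counts are only approximate):

def approx_slab_classes(factor=1.25, first_chunk=96, item_size_max=1024 * 1024):
    sizes = []
    size = first_chunk
    while size <= item_size_max / factor:
        sizes.append(int(size))
        size *= factor
    return sizes   # plus one final class at item_size_max

print(len(approx_slab_classes(1.25)))   # roughly the ~42 default classes
print(len(approx_slab_classes(1.08)))   # a low factor blows well past 63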

On Thu, 26 Nov 2015, Cheng Ali wrote:

>
> I have installed memcached ,php-pecl-memcached and libmemcached.
>
>
> 1.memcached (1.4.24)
> 2.php5.5.30 , php-pecl-memcached (2.2.0)
> 3.libmemcached(1.0.18)
>
> I try with php script test.php
>  $mc = new Memcached();
> $mc->addServer('127.0.0.1',11211);
> $stats = $mc->getAllKeys();
> echo "Return code:", $mc->getResultCode()."\n";
> echo "Retucn Message:", $mc->getResultMessage () ."\n";
> print_r ($stats);
> ?>
> Return code:9
> Retucn Message:CLIENT ERROR
>
>
>
> I look the log on memcached server log
>
>
> <36 stats cachedump 57 0
> <36 stats cachedump 58 0
> <36 stats cachedump 59 0
> <36 stats cachedump 60 0
> <36 stats cachedump 61 0
> <36 stats cachedump 62 0
> <36 stats cachedump 63 0
>
> >36 CLIENT_ERROR Illegal slab id
> <36 quit
> <36 connection closed.
>
>
> I change to memcached version 1.4.15 ,and try php script test.php
>
>
> Return code:0
> Retucn Message:SUCCESS
> Array
> (
> [0] => qinyao
> [1] => andy
> )
>


Prepping for 1.4.25

2015-11-19 Thread dormando
Hey,

https://github.com/memcached/memcached - I've dumped what I hope are all
of the commits for 1.4.25 into master. If you're listening, would you mind
attempting to build and run the tests on your platform of choice? Hoping
to get a few different kinds since I still don't have a reliable buildbot
network.

Build process from the source tree is just:
"./autogen.sh && ./configure && make && make test"

Tons of minor things, and then my big slab rebalancer branch went in.
Closed out almost 20 pull requests and a number of old bug reports. Will
continue to clear out the remainder in future releases.

thanks!


memcached 1.4.25

2015-11-19 Thread dormando
https://github.com/memcached/memcached/wiki/ReleaseNotes1425

Much, much thanks to the netflix crew for their feedback during development
of the slab automove improvements. I apologize for the bizarre month delay
between its near completion and release to the public. Please enjoy.


Need help clearing out pull requests

2015-11-19 Thread dormando
Yo,

I'd be super thankful if anyone out there in the ether could help review
and test a few things:

1) https://github.com/memcached/memcached/pull/84
https://github.com/dormando/memcached/tree/fix_flags - is my branch doing
similar.

If anyone could give me feedback on what clients do when they run into
"large flag" situations with the existing codebase? I'm especially
concerned about popular clients (the C libmemcached client, the PHP libs,
etc). Will they work with this branch, or will I need to feature-flag it
in some way?

2) https://github.com/memcached/memcached/pull/95

Security stuff is fun but I'd like some wider testing on this branch. Do
tests pass? does it work for you on your X/Y/Z distros?

3) https://github.com/memcached/memcached/issues/120

Anyone use these and willing to give it a test and sign off? (save me a
few minutes of effort I could be spending porting wiki pages over :P)

4) https://github.com/memcached/memcached/issues/124

anyone with windows clients able to validate this?

Thanks!

I started porting over wiki pages from google code to github and
updating them for freshness and accuracy... but it's a little dull so I'll
do it slowly. The main website has had most of its links updated from
google code to github. Hopefully that makes the project look less dead.

Thanks,
-Dormando


Re: testapp: testapp.c:725: safe_recv: Assertion `nr != 0' failed.

2015-11-18 Thread dormando


On Tue, 10 Nov 2015, Terry Hu wrote:

> when iinstall memcached-1.4.23 
> make  make test it tell me
>
> testapp: testapp.c:725: safe_recv: Assertion `nr != 0' failed.   

You'll need to include more of the test output. it's unclear what part of
the tests that failed in.

>
>  anybody can help me 
>
>
> Another thing
>
> tar zxvf memcached-1.4.24.tar.gz 
>
> gzip: stdin: not in gzip format
> tar: Child returned status 1
> tar: Error is not recoverable: exiting now
>
> if or not This package is a problem

Try opening the .gz file in your editor. You probably downloaded a webpage
by accident? Go get the tarball from http://memcached.org/ - from the
front page.


Re: can't we delete a item at the link head?

2015-11-18 Thread dormando
That assert() is checking that the item you're about to link isn't already
in there as the head. It's a basic double-linking prevention check.

that code is adding the link, not removing it.

On Sat, 7 Nov 2015, Song Zhigang wrote:

> i was reading memcached source code these days. but i got confused at 
> function "do_item_link_q", there is an assert
>
> assert(it != *head);
>
> in this function, is this means that we can't remove a item from the head of 
> the link?
>
> Yours
>
> sidgwick
>


Re: HELP ! How Memcached objects are stored in the good SLAB

2015-11-16 Thread dormando
key + 1b
val + 2b
item struct 48b
[optional CAS] 8b
then the suffix trailer printf: please read the manpage for snprintf:

snprintf(suffix, 40, " %d %d\r\n", flags, nbytes - 2);

there are four characters in there. two spaces, and then \r and \n. the
two %d's change into: "0" for flags, and "90" for bytes (ignore the -2 in
there). that's 7b total.

In your no-cas test: 5 + 92 + 48 + 7 == 152. that's the exact size of slab
3. (152b).
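
The same accounting as a tiny Python helper, handy for sanity-checking which
class an item should land in. This is a sketch of the math above; the
48-byte item header and 8-byte CAS match the defaults discussed here:

def item_footprint(key, value_len, flags=0, use_cas=True, item_header=48):
    # key + terminator, value + "\r\n", the " <flags> <bytes>\r\n" suffix,
    # the item struct, and optionally the 8-byte CAS field.
    suffix = " %d %d\r\n" % (flags, value_len)
    total = len(key) + 1 + value_len + 2 + len(suffix) + item_header
    if use_cas:
        total += 8
    return total

print(item_footprint("test", 90, use_cas=False))   # 152, the no-cas test above
print(item_footprint("test", 90))                  # 160 with CAS enabled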

On Mon, 16 Nov 2015, Nicolas Martinez wrote:

> The test was :
> •Key : test
> •Data : 90 bytes
> •Flags : 0
>
> :(
>
> Le lundi 16 novembre 2015 19:01:50 UTC+1, Nicolas Martinez a écrit :
>   Ok... i think i have understood what you say:
>
>   Key (Characters Number) + 1 + Data (Characters Number) + Header + Chunk 
> Size + CAS Size
>
>   Header = Flags (Characters Number) + Key (Characters Numbers)
>
>       + 2 bytes ( \r\n ) 
>
>       + 4  bytes (2 spaces and 1 \r)
>
>   Chunk Size = 48 bytes (default)
>
>   CAS Size = 8 bytes (64 bits platform)
>
>
>   Seems to be good:
>
>   4 + 1 + 90 + (1+4+2+4) + 48 + 8 = 162 Bytes => Slab 4 (192 Bytes)
>
>
>   But, if i start Memcached with -C, it's wrong.
>
>   4 + 1 + 90 + (1+4+2+4) + 48 = 154 Bytes => Slab 3 (152 Bytes)
>
>
>   It must be in Slab4 (192 Bytes) No?
>
>
>   Le lundi 16 novembre 2015 15:37:59 UTC+1, Nicolas Martinez a écrit :
> Thank you very much for yours answers.Ok for CAS... i don't use 
> -C so i have to add 8 bytes
>
> i still don"t understand these lines :
> >> 2b appended to data_bytes for an extra "\r\n" + 
> So, \r\n == 2b ?
>
> >> + 4 bytes for spaces and \r\n
> which spaces? 
> What is this \r\n ? Isn't already counted before?
> There is 1 "\r\n" and 1 "\"
>
>   echo -e 'set 30 0 3600 30\r\n'$data'\r'| nc ${memcached_server} 11211
>
>
> With your example:
>  * Key : 2c 
>  * Val : 28 bytes 
>  * Flg : 0 (1bytes) 
>
> turns into: 
>    * Key : 3b 
>
> => key number characters + 1
>    * Val : 30b
>
>  => 28 bytes + 2 bytes  ("\r\n")
>    * Hdr : 4b + 3b == 7b 
>
> => What are 4b? 
> => 3b are: flags (1b) + ??
>    * Itm : 56b 
>
> => 48b + 8b (CAS)
>    => 96b. which is the cap for slab 1 in a default setup. 
>
>
>
> Thank you again.
>
> Le lundi 16 novembre 2015 00:31:07 UTC+1, Dormando a écrit :
>   Read carefully:
>
>   item_size_ok(const size_t nkey, const int flags, const int nbytes) {
>
>   passes:
>
>       size_t ntotal = item_make_header(nkey + 1, flags, nbytes,
>                                        prefix, &nsuffix);
>
>   Then conditionally:
>
>       if (settings.use_cas) {
>           ntotal += sizeof(uint64_t);
>       }
>
>   item_make_header is doing:
>   nsuffix = (uint8_t) snprintf(suffix, 40, " %d %d\r\n", flags, nbytes - 
> 2);
>
>   Then:
>
>   return sizeof(item) + nkey + *nsuffix + nbytes;
>
>   It's convoluted but short.
>
>   the lengths are:
>   key +
>   1 +
>   data_bytes +
>   2b appended to data_bytes for an extra "\r\n" +
>   stringified rep of the flags + data length
>   + 4 bytes for spaces and \r\n (these are carriage returns, one byte 
> each)
>   + 8b for CAS if enabled
>
>   CAS can be turned off via the -C starttime arg. it takes up 8 bytes.
>
>   Example:
>    * Key : 2c
>    * Val : 28b
>    * Flg : 0 (1b)
>
>   turns into:
>    * Key : 3b
>    * Val : 30b
>    * Hdr : 4b + 3b == 7b
>    * Itm : 56b
>    => 96b. which is the cap for slab 1 in a default setup.
>
>   It's tough to get it exact for small chunks due to the way the header is
>   added. You should ballpark or tune the -f value to align with your
>   observed data.
>
>   On Sun, 15 Nov 2015, Nicolas Martinez wrote:
>
>   > Hi,
>   > Is CAS always used?
>   > If yes, we have to always add 56 bytes to the KEY and VALUE ?
>   > you don't count FLAGS characters??
>   >
>   > I've found that  Flags's size (number of characters) impact the 
> storage.
>   >
>   > Example:
>   >  *  Key : 2 characters = 2 bytes
>   >  *  Value : 28 characters  = 28 bytes
>   >  *  FLAGS : 1 characters = 1 bytes
>   > => 31 bytes
>   >
>   > seems to take the same storage as
>   >  *  Key : 1 characters = 1 bytes
>   &

Re: HELP ! How Memcached objects are stored in the good SLAB

2015-11-15 Thread dormando
Read carefully:

item_size_ok(const size_t nkey, const int flags, const int nbytes) {

passes:

size_t ntotal = item_make_header(nkey + 1, flags, nbytes,
                                     prefix, &nsuffix);

Then conditionally:

if (settings.use_cas) {
ntotal += sizeof(uint64_t);
}

item_make_header is doing:
*nsuffix = (uint8_t) snprintf(suffix, 40, " %d %d\r\n", flags, nbytes - 2);

Then:

return sizeof(item) + nkey + *nsuffix + nbytes;

It's convoluted but short.

the lengths are:
key +
1 +
data_bytes +
2b appended to data_bytes for an extra "\r\n" +
stringified rep of the flags + data length
+ 4 bytes for spaces and \r\n (these are carriage returns, one byte each)
+ 8b for CAS if enabled

CAS can be turned off via the -C starttime arg. it takes up 8 bytes.

Example:
 * Key : 2c
 * Val : 28b
 * Flg : 0 (1b)

turns into:
 * Key : 3b
 * Val : 30b
 * Hdr : 4b + 3b == 7b
 * Itm : 56b
 => 96b. which is the cap for slab 1 in a default setup.

It's tough to get it exact for small chunks due to the way the header is
added. You should ballpark or tune the -f value to align with your
observed data.

On Sun, 15 Nov 2015, Nicolas Martinez wrote:

> Hi,
> Is CAS always used?
> If yes, we have to always add 56 bytes to the KEY and VALUE ?
> you don't count FLAGS characters??
>
> I've found that  Flags's size (number of characters) impact the storage.
>
> Example:
>  *  Key : 2 characters = 2 bytes
>  *  Value : 28 characters  = 28 bytes
>  *  FLAGS : 1 characters = 1 bytes
> => 31 bytes
>
> seems to take the same storage as
>  *  Key : 1 characters = 1 bytes
>  *  Value : 28 characters  = 28 bytes
>  *  FLAGS : 2 characters = 2 bytes
> => 31 bytes ... wich is the limit to be stored in Slab1
>
> ok for the /r/n ... should take 4 bytes no?
>
> So, if we count 56 bytes for CAS : 56(cas)+31(key+value+flags)+4(/r/n)= 91
>
> Not good... :(
>
> where I'm wrong ??
>
> Le samedi 14 novembre 2015 23:55:12 UTC+1, Dormando a écrit :
>   The mysql docs don't speak for the main tree... that's their own thing.
>
>   the "sizes" binary that comes with the source tree tells you how many
>   bytes an item will use (though I intend to add this output to the 
> 'stats'
>   output somewhere). With CAS this is 56 bytes.
>
>   56 + 2 + 30 == 88. Class 1 by default (in 1.4.24) is 96 bytes, but the
>   item still ends up in class 2.
>
>   Why is this? (unfortunately?) because memcached pre-renders part of the
>   text protocol into the item header:
>
>   *nsuffix = (uint8_t) snprintf(suffix, 40, " %d %d\r\n", flags, nbytes -
>   2);
>   return sizeof(item) + nkey + *nsuffix + nbytes;
>
>   so the flags + length are getting flattened + \r\n added to the end.
>   Together that's just enough to push it over the edge. It'd also be nice 
> to
>   add a highly optimized numerics printf so I could twiddle options to 
> save
>   a few bytes of memory in objects, but don't get your hopes up for that
>   happening soon :)
>
>   On Sat, 14 Nov 2015, Nicolas Martinez wrote:
>
>   > Add: Memcached version : 1.4.4 (RedHat)
>   >
>   > Le samedi 14 novembre 2015 14:49:37 UTC+1, Nicolas Martinez a écrit :
>   >       Hi, few days i'm reading Memcached documentation and blogs... 
> and i don't understand how objects are stored.
>   >
>   > My test
>   >
>   >       3 slabs : 
>   >
>   >  *  96.0 Bytes
>   >  *  120.0 Bytes
>   >  *  240.0 Bytes
>   > Everywhere, it's told :
>   >  *  if data is < 96 Bytes, it will be stored in Slabs1 (96B)
>   >  *  if data > 96B and < 120B, it will be stored in Slabs2 (120B)
>   >  *  if data > 120B, it will be stored in Slabs3 (240B)
>   >  *  etc.
>   > BUT, for example, when i'm creating an 30B object, it's stored in 
> Slab2 (120B), and NOT in Slab1 (96B).
>   >
>   > External sources:
>   >       For example, the default size for the smallest block is 88 
> bytes (40 bytes of value, and the default 48 bytes for the key and flag 
> data). If the size of the first item you store into the cache is less than 40 
> bytes, then a slab with a block size of 88 bytes is created and the value 
> stored.
>   >       => 
> https://dev.mysql.com/doc/mysql-ha-scalability/en/ha-memcached-using-memory.html
>   >
>   >
>   > WRONG
>   >
>   >       A slab class is a collection of pages divided into same sized 
> chunks. Each slab class is referenced to by its chunk size. So we’ll have 
> Slab class 80kb, Slab class 100kb and so on. Whe

Re: HELP ! How Memcached objects are stored in the good SLAB

2015-11-14 Thread dormando
The mysql docs don't speak for the main tree... that's their own thing.

the "sizes" binary that comes with the source tree tells you how many
bytes an item will use (though I intend to add this output to the 'stats'
output somewhere). With CAS this is 56 bytes.

56 + 2 + 30 == 88. Class 1 by default (in 1.4.24) is 96 bytes, but the
item still ends up in class 2.

Why is this? (unfortunately?) because memcached pre-renders part of the
text protocol into the item header:

*nsuffix = (uint8_t) snprintf(suffix, 40, " %d %d\r\n", flags, nbytes -
2);
return sizeof(item) + nkey + *nsuffix + nbytes;

so the flags + length are getting flattened + \r\n added to the end.
Together that's just enough to push it over the edge. It'd also be nice to
add a highly optimized numerics printf so I could twiddle options to save
a few bytes of memory in objects, but don't get your hopes up for that
happening soon :)

On Sat, 14 Nov 2015, Nicolas Martinez wrote:

> Add: Memcached version : 1.4.4 (RedHat)
>
> Le samedi 14 novembre 2015 14:49:37 UTC+1, Nicolas Martinez a écrit :
>   Hi, few days i'm reading Memcached documentation and blogs... and i 
> don't understand how objects are stored.
>
> My test
>
>   3 slabs : 
>
>  *  96.0 Bytes
>  *  120.0 Bytes
>  *  240.0 Bytes
> Everywhere, it's told :
>  *  if data is < 96 Bytes, it will be stored in Slabs1 (96B)
>  *  if data > 96B and < 120B, it will be stored in Slabs2 (120B)
>  *  if data > 120B, it will be stored in Slabs3 (240B)
>  *  etc.
> BUT, for example, when i'm creating an 30B object, it's stored in Slab2 
> (120B), and NOT in Slab1 (96B).
>
> External sources:
>   For example, the default size for the smallest block is 88 bytes (40 
> bytes of value, and the default 48 bytes for the key and flag data). If the 
> size of the first item you store into the cache is less than 40 bytes, then a 
> slab with a block size of 88 bytes is created and the value stored.
>   => 
> https://dev.mysql.com/doc/mysql-ha-scalability/en/ha-memcached-using-memory.html
>
>
> WRONG
>
>   A slab class is a collection of pages divided into same sized chunks. 
> Each slab class is referenced to by its chunk size. So we’ll have Slab class 
> 80kb, Slab class 100kb and so on. When an object needs to be stored, its size 
> determines where it gets stored. So if the object is larger than 80kb but 
> less than 100kb, it gets stored into Slab
>   class 100kb. 
>   => 
> http://returnfoo.com/2012/02/memcached-memory-allocation-and-optimization-2/
>
>
> WRONG
>
> How I create an object:
>
>   data=$(pwgen 30 -c 1)
>   echo -e 'set 30 0 3600 30\r\n'$data'\r'| nc ${memcached_server} 11211
>
>
> So, when a 30B object is created:
>  *  key name: "30" = 2 bytes
>  *  value: 30 characters = 30 bytes
>  *  tags (flags): 0 = 1 byte
> => Total = 33 bytes
> If I add the default 48B as explained on the MySQL website: 33 + 48 = 81B ... so
> < Slab 1 (96B)... but it is always stored in Slab 2 (120B)
>
> So, the size used to place an object in the right slab is not:
>  *  the object value size
>  *  the sum of KEY, VALUE and TAGS in bytes
> KEY size: 1 character = 1 B
> VALUE size: 1 character = 1 B
> TAGS size: 1 character = 1 B
>
> ... as is written everywhere
>
> So, it seems that (SUM of KEY+VALUE+TAGS):
>  *  For slab1 96.0 Bytes, data stored if <= 31 B (SUM of 2+28+1 )
>  *  For slab2 120.0 Bytes, data stored if <= 55 B (SUM of 2+52+1 )
>  *  For slab3 152.0 Bytes, data stored if <= 87 B (SUM of 2+84+1 )
>  *  For slab4 192.0 Bytes, data stored if <= 126 B (SUM of 3+122+1 )
>  *  For slab5 240.0 Bytes, data stored if <= 174 B (SUM of 3+170+1 )
>  *  etc.
>
> My configuration:
>  *  Chunk Size: 48
>  *  Chunk Growth Factor: 1.25
>  *  Max Bytes: 64.0 MBytes
>
> So, could someone explain to me how the data ends up in the right slab?
> How do I calculate it?
>
> Thank you
>
>

Re: Where are the current issue tracker and docs?

2015-11-11 Thread dormando
memcached.org is still right. the "source and development repos" link on
the main page links to a wiki page which links to github, though the text
is out of date.

I'll be migrating things when I have time. apologies for the scary
"archived" stuff everywhere.

On Wed, 11 Nov 2015, Teo Tei wrote:

> Hi,
> Is memcached still maintained? Is memcached.org the current official homepage 
> of the project?
> If so, where have the code, issue tracker and documentation wiki been moved?
> Because all the links at memcached.org still point to the project at 
> code.google.com, which is archived since code.google.com is dead.
>
> Thanks
> teo
>


Re: Fedora 23 Server and memcached role

2015-10-28 Thread dormando
not super sure what the quote would be for? you're looking for someone who
uses this roles thing?

On Tue, 27 Oct 2015, Matthew Miller wrote:

> Hey all. The Fedora Server operating system has a feature called "roles", 
> which allow push-button deployment of selected configs. This can be 
> controlled through an API, and in turn through tools like the Cockpit web 
> GUI. For the Fedora 23 release (one week from today), we're adding a 
> Memcached role. It'd be awesome to have a quote from the project for our
> press release. Would anyone be interested in providing one?
>
> Thanks!
>


Re: Fedora 23 Server and memcached role

2015-10-28 Thread dormando
I mean maybe? do you have any links or am I supposed to google to figure
out what this is and how it works and how it helps people? :P what
versions are included, how often does it update, etc?

On Wed, 28 Oct 2015, Matthew Miller wrote:

>
> No (although that would be separately useful). More like something from the 
> memcached project saying that it's cool/useful/whatever that Fedora is 
> providing memcached in this way.
>
> On Oct 28, 2015 3:31 AM, "dormando" <dorma...@rydia.net> wrote:
>   not super sure what the quote would be for? you're looking for someone 
> who
>   uses this roles thing?
>
>   On Tue, 27 Oct 2015, Matthew Miller wrote:
>
>   > Hey all. The Fedora Server operating system has a feature called 
> "roles", which allow push-button deployment of selected configs. This can be 
> controlled through an API, and in turn through tools like the Cockpit web 
> GUI. For the Fedora 23 release (one week from today), we're adding a 
> Memcached role. It'd be awesome to have a quote from the
>   project for our
>   > press release. Would anyone be interested in providing one?
>   >
>   > Thanks!
>   >
>   > --
>   >
>   > ---
>   > You received this message because you are subscribed to the Google 
> Groups "memcached" group.
>   > To unsubscribe from this group and stop receiving emails from it, 
> send an email to memcached+unsubscr...@googlegroups.com.
>   > For more options, visit https://groups.google.com/d/optout.
>   >
>   >
>


Re: Check for orphaned items in lru crawler thread

2015-10-07 Thread Dormando
any luck?

> On Oct 6, 2015, at 12:23 AM, Dormando <dorma...@rydia.net> wrote:
> 
> ah. I pushed two more changes earlier. should fix mem_requested. just 
> cosmetic stuff though
> 
>> On Oct 6, 2015, at 12:13 AM, Scott Mansfield <smansfi...@netflix.com> wrote:
>> 
>> Oops, looks like the latest code didn't get into production today. I'm 
>> building it again, same plan as before.
>> 
>>> On Monday, October 5, 2015 at 4:38:00 PM UTC-7, Dormando wrote:
>>> Looking forward to the results. Thanks for getting on this so quickly. 
>>> 
>>> I think there's still a bug in tracking requested memory, and I want to 
>>> move the stats counters to a rollup at the end of a page move. 
>>> Otherwise I think this branch is complete pending any further stability 
>>> issues or feedback. 
>>> 
>>> On Mon, 5 Oct 2015, Scott Mansfield wrote: 
>>> 
>>> > I just put the newest code into production. I'm going to monitor it for a 
>>> > bit to see how it behaves. As long as there's no obvious issues I'll 
>>> > enable reads in a few hours, which are an order of magnitude more 
>>> > traffic. I'll let you know what I find. 
>>> > 
>>> > On Monday, October 5, 2015 at 1:29:03 AM UTC-7, Dormando wrote: 
>>> >   It took a day of running torture tests which took 30-90 minutes to 
>>> > fail, 
>>> >   but along with a bunch of house chores I believe I've found the 
>>> > problem: 
>>> > 
>>> >   https://github.com/dormando/memcached/tree/slab_rebal_next - has a 
>>> > new 
>>> >   commit, specifically this: 
>>> >   
>>> > https://github.com/dormando/memcached/commit/1c32e5eeff5bd2a8cc9b652a2ed808157e4929bb
>>> >  
>>> > 
>>> >   It's somewhat relieving that when I brained this super hard back in 
>>> >   january I may have actually gotten the complex set of interactions 
>>> >   correct, I simply failed to keep typing when converting the 
>>> > comments to 
>>> >   code. 
>>> > 
>>> >   So this has been broken since 1.4.24, but hardly anyone uses the 
>>> > page 
>>> >   mover apparently. It's survived a 5 hour torture test (that I wrote 
>>> > in 
>>> >   2011!) once fixed (previously dying after 30-90 minutes). So please 
>>> > give 
>>> >   this one a try and let me know how it goes. 
>>> > 
>>> >   If it goes well I can merge up some other fixes from PR list and 
>>> > cut a 
>>> >   release, unless someone has feedback for something to change. 
>>> > 
>>> >   thanks! 
>>> > 
>>> >   On Thu, 1 Oct 2015, dormando wrote: 
>>> > 
>>> >   > I've seen items.c:1183 reported elsewhere in 1.4.24... so 
>>> > probably the bug 
>>> >   > was introduced when I rewrote the page mover for that. 
>>> >   > 
>>> >   > I didn't mean to send me a core file: I mean if you dump the core 
>>> > you can 
>>> >   > load it in gdb and get the backtrace (bt + thread apply all bt) 
>>> >   > 
>>> >   > Don't have a handler for convenient attaching :( 
>>> >   > 
>>> >   > didn't get a chance to poke at this today... I'll need another 
>>> > day to try 
>>> >   > it out. 
>>> >   > 
>>> >   > On Thu, 1 Oct 2015, Scott Mansfield wrote: 
>>> >   > 
>>> >   > > Sorry for the data dumps here, but I want to give you 
>>> > everything I have. I found 3 more addresses that showed up in the dmesg 
>>> > logs: 
>>> >   > > 
>>> >   > > $ for addr in 40e013 40eff4 40f7c4; do addr2line -e memcached 
>>> > $addr; done 
>>> >   > > 
>>> >   > > .../build/memcached-1.4.24-slab-rebal-next/slabs.c:265 
>>> > (discriminator 1) 
>>> >   > > 
>>> >   > > .../build/memcached-1.4.24-slab-rebal-next/items.c:312 
>>> > (discriminator 1) 
>>> >   > > 
>>> >   > > .../build/memcached-1.4.24-slab-rebal-next/items.c:1183 
>>> >   > > 
>>> >   > > 
>>> >   > > I still haven't tried to attach a debu

Re: Memcached appears to fail when LAN disconnects from the Internet

2015-10-06 Thread dormando
are you using DNS to resolve the IPs of the clients?

On Tue, 6 Oct 2015, Jim Horning wrote:

> I have two conditions that seem to make memcached fail.  I have a memcached 
> server and a few clients all on the same subnet: 192.168.1.X, where these 
> devices use NAT to get to the Internet.
> 1.  If I start up the server/clients AND the Internet connection is not
> available (i.e. I disconnect the LAN from my router) then memcached does not
> appear to work.  My server starts, initializing some of my memcached
> variables, and it's VERY slow between each set() variable call.  And the
> clients appear to be unable to read the server.
>
> 2.  Or, similarly, if I start up WITH an Internet connection (and all is
> working well) and then I disconnect from the Internet, then memcached fails.
>
> Note: I have a "default" gateway with genmask of 0.0.0.0 assigned to eth0, 
> and 192.168.1.0 with genmask of 255.255.255.0 assigned to eth0.
>
> If I delete the "default" route when I've lost the Internet connection then 
> all of a sudden memcached works just fine.
>
> Any ideas?
>

Re: Check for orphaned items in lru crawler thread

2015-10-06 Thread Dormando
ah. I pushed two more changes earlier. should fix mem_requested. just cosmetic 
stuff though

> On Oct 6, 2015, at 12:13 AM, Scott Mansfield <smansfi...@netflix.com> wrote:
> 
> Oops, looks like the latest code didn't get into production today. I'm 
> building it again, same plan as before.
> 
>> On Monday, October 5, 2015 at 4:38:00 PM UTC-7, Dormando wrote:
>> Looking forward to the results. Thanks for getting on this so quickly. 
>> 
>> I think there's still a bug in tracking requested memory, and I want to 
>> move the stats counters to a rollup at the end of a page move. 
>> Otherwise I think this branch is complete pending any further stability 
>> issues or feedback. 
>> 
>> On Mon, 5 Oct 2015, Scott Mansfield wrote: 
>> 
>> > I just put the newest code into production. I'm going to monitor it for a 
>> > bit to see how it behaves. As long as there's no obvious issues I'll 
>> > enable reads in a few hours, which are an order of magnitude more traffic. 
>> > I'll let you know what I find. 
>> > 
>> > On Monday, October 5, 2015 at 1:29:03 AM UTC-7, Dormando wrote: 
>> >   It took a day of running torture tests which took 30-90 minutes to 
>> > fail, 
>> >   but along with a bunch of house chores I believe I've found the 
>> > problem: 
>> > 
>> >   https://github.com/dormando/memcached/tree/slab_rebal_next - has a 
>> > new 
>> >   commit, specifically this: 
>> >   
>> > https://github.com/dormando/memcached/commit/1c32e5eeff5bd2a8cc9b652a2ed808157e4929bb
>> >  
>> > 
>> >   It's somewhat relieving that when I brained this super hard back in 
>> >   january I may have actually gotten the complex set of interactions 
>> >   correct, I simply failed to keep typing when converting the comments 
>> > to 
>> >   code. 
>> > 
>> >   So this has been broken since 1.4.24, but hardly anyone uses the 
>> > page 
>> >   mover apparently. It's survived a 5 hour torture test (that I wrote 
>> > in 
>> >   2011!) once fixed (previously dying after 30-90 minutes). So please 
>> > give 
>> >   this one a try and let me know how it goes. 
>> > 
>> >   If it goes well I can merge up some other fixes from PR list and cut 
>> > a 
>> >   release, unless someone has feedback for something to change. 
>> > 
>> >   thanks! 
>> > 
>> >   On Thu, 1 Oct 2015, dormando wrote: 
>> > 
>> >   > I've seen items.c:1183 reported elsewhere in 1.4.24... so probably 
>> > the bug 
>> >   > was introduced when I rewrote the page mover for that. 
>> >   > 
>> >   > I didn't mean to send me a core file: I mean if you dump the core 
>> > you can 
>> >   > load it in gdb and get the backtrace (bt + thread apply all bt) 
>> >   > 
>> >   > Don't have a handler for convenient attaching :( 
>> >   > 
>> >   > didn't get a chance to poke at this today... I'll need another day 
>> > to try 
>> >   > it out. 
>> >   > 
>> >   > On Thu, 1 Oct 2015, Scott Mansfield wrote: 
>> >   > 
>> >   > > Sorry for the data dumps here, but I want to give you everything 
>> > I have. I found 3 more addresses that showed up in the dmesg logs: 
>> >   > > 
>> >   > > $ for addr in 40e013 40eff4 40f7c4; do addr2line -e memcached 
>> > $addr; done 
>> >   > > 
>> >   > > .../build/memcached-1.4.24-slab-rebal-next/slabs.c:265 
>> > (discriminator 1) 
>> >   > > 
>> >   > > .../build/memcached-1.4.24-slab-rebal-next/items.c:312 
>> > (discriminator 1) 
>> >   > > 
>> >   > > .../build/memcached-1.4.24-slab-rebal-next/items.c:1183 
>> >   > > 
>> >   > > 
>> >   > > I still haven't tried to attach a debugger, since the frequency 
>> > of the error would make it hard to catch it. Is there a handler that I 
>> > could add in to dump the stack trace when it segfaults? I'd get a core 
>> > dump, but they would be HUGE and contain confidential information. 
>> >   > > 
>> >   > > 
>> >   > > Below are the full dmesg logs. Out of 205 servers, 35 had dmesg 
>> > logs after a memcached crash, a

Re: Check for orphaned items in lru crawler thread

2015-10-05 Thread dormando
It took a day of running torture tests which took 30-90 minutes to fail,
but along with a bunch of house chores I believe I've found the problem:

https://github.com/dormando/memcached/tree/slab_rebal_next - has a new
commit, specifically this:
https://github.com/dormando/memcached/commit/1c32e5eeff5bd2a8cc9b652a2ed808157e4929bb

It's somewhat relieving that when I brained this super hard back in
January I may have actually gotten the complex set of interactions
correct; I simply failed to keep typing when converting the comments to
code.

So this has been broken since 1.4.24, but hardly anyone uses the page
mover apparently. It's survived a 5 hour torture test (that I wrote in
2011!) once fixed (previously dying after 30-90 minutes). So please give
this one a try and let me know how it goes.

If it goes well I can merge up some other fixes from PR list and cut a
release, unless someone has feedback for something to change.

thanks!

On Thu, 1 Oct 2015, dormando wrote:

> I've seen items.c:1183 reported elsewhere in 1.4.24... so probably the bug
> was introduced when I rewrote the page mover for that.
>
> I didn't mean to send me a core file: I mean if you dump the core you can
> load it in gdb and get the backtrace (bt + thread apply all bt)
>
> Don't have a handler for convenient attaching :(
>
> didn't get a chance to poke at this today... I'll need another day to try
> it out.
>
> On Thu, 1 Oct 2015, Scott Mansfield wrote:
>
> > Sorry for the data dumps here, but I want to give you everything I have. I 
> > found 3 more addresses that showed up in the dmesg logs:
> >
> > $ for addr in 40e013 40eff4 40f7c4; do addr2line -e memcached $addr; done
> >
> > .../build/memcached-1.4.24-slab-rebal-next/slabs.c:265 (discriminator 1)
> >
> > .../build/memcached-1.4.24-slab-rebal-next/items.c:312 (discriminator 1)
> >
> > .../build/memcached-1.4.24-slab-rebal-next/items.c:1183
> >
> >
> > I still haven't tried to attach a debugger, since the frequency of the 
> > error would make it hard to catch it. Is there a handler that I could add 
> > in to dump the stack trace when it segfaults? I'd get a core dump, but they 
> > would be HUGE and contain confidential information.
> >
> >
> > Below are the full dmesg logs. Out of 205 servers, 35 had dmesg logs after 
> > a memcached crash, and only one crashed twice, both times on the original 
> > segfault. Below is the full unified set of dmesg logs, from which you can 
> > get a sense of frequency.
> >
> >
> > [47992.109269] memcached[2798]: segfault at 0 ip 0040e007 sp 
> > 7f4d20d25eb0 error 4 in memcached[40+1d000]
> >
> > [48960.851278] memcached[2805]: segfault at 0 ip 0040e007 sp 
> > 7f3c30d15eb0 error 4 in memcached[40+1d000]
> >
> > [46421.604609] memcached[2784]: segfault at 0 ip 0040e007 sp 
> > 7fdb94612eb0 error 4 in memcached[40+1d000]
> >
> > [48429.671534] traps: memcached[2768] general protection ip:40e013 
> > sp:7f1c32676be0 error:0 in memcached[40+1d000]
> >
> > [71838.979269] memcached[2792]: segfault at 0 ip 0040e007 sp 
> > 7f0162feeeb0 error 4 in memcached[40+1d000]
> >
> > [66763.091475] memcached[2804]: segfault at 0 ip 0040e007 sp 
> > 7f8240170eb0 error 4 in memcached[40+1d000]
> >
> > [102544.376092] traps: memcached[2792] general protection ip:40eff4 
> > sp:7fa58095be18 error:0 in memcached[40+1d000]
> >
> > [49932.757825] memcached[2777]: segfault at 0 ip 0040e007 sp 
> > 7f1ff2131eb0 error 4 in memcached[40+1d000]
> >
> > [50400.415878] memcached[2794]: segfault at 0 ip 0040e007 sp 
> > 7f11a26daeb0 error 4 in memcached[40+1d000]
> >
> > [48986.340345] memcached[2786]: segfault at 0 ip 0040e007 sp 
> > 7f9235279eb0 error 4 in memcached[40+1d000]
> >
> > [44742.175894] memcached[2796]: segfault at 0 ip 0040e007 sp 
> > 7eff3a0cceb0 error 4 in memcached[40+1d000]
> >
> > [49030.431879] memcached[2776]: segfault at 0 ip 0040e007 sp 
> > 7fdef27cfbe0 error 4 in memcached[40+1d000]
> >
> > [50211.611439] traps: memcached[2782] general protection ip:40e013 
> > sp:7f9ee1723be0 error:0 in memcached[40+1d000]
> >
> > [62534.892817] memcached[2783]: segfault at 0 ip 0040e007 sp 
> > 7f37f2d4beb0 error 4 in memcached[40+1d000]
> >
> > [78697.201195] memcached[2801]: segfault at 0 ip 0040e007 sp 
> > 7f696ef1feb0 error 4 in memcached[40+1d000]
> >
> > [48922.246712] memcached[2804]: segfault 

Re: Check for orphaned items in lru crawler thread

2015-10-05 Thread dormando
Looking forward to the results. Thanks for getting on this so quickly.

I think there's still a bug in tracking requested memory, and I want to
move the stats counters to a rollup at the end of a page move.
Otherwise I think this branch is complete pending any further stability
issues or feedback.

On Mon, 5 Oct 2015, Scott Mansfield wrote:

> I just put the newest code into production. I'm going to monitor it for a bit 
> to see how it behaves. As long as there's no obvious issues I'll enable reads 
> in a few hours, which are an order of magnitude more traffic. I'll let you 
> know what I find.
>
> On Monday, October 5, 2015 at 1:29:03 AM UTC-7, Dormando wrote:
>   It took a day of running torture tests which took 30-90 minutes to fail,
>   but along with a bunch of house chores I believe I've found the problem:
>
>   https://github.com/dormando/memcached/tree/slab_rebal_next - has a new
>   commit, specifically this:
>   
> https://github.com/dormando/memcached/commit/1c32e5eeff5bd2a8cc9b652a2ed808157e4929bb
>
>   It's somewhat relieving that when I brained this super hard back in
>   january I may have actually gotten the complex set of interactions
>   correct, I simply failed to keep typing when converting the comments to
>   code.
>
>   So this has been broken since 1.4.24, but hardly anyone uses the page
>   mover apparently. It's survived a 5 hour torture test (that I wrote in
>   2011!) once fixed (previously dying after 30-90 minutes). So please give
>   this one a try and let me know how it goes.
>
>   If it goes well I can merge up some other fixes from PR list and cut a
>   release, unless someone has feedback for something to change.
>
>   thanks!
>
>   On Thu, 1 Oct 2015, dormando wrote:
>
>   > I've seen items.c:1183 reported elsewhere in 1.4.24... so probably 
> the bug
>   > was introduced when I rewrote the page mover for that.
>   >
>   > I didn't mean to send me a core file: I mean if you dump the core you 
> can
>   > load it in gdb and get the backtrace (bt + thread apply all bt)
>   >
>   > Don't have a handler for convenient attaching :(
>   >
>   > didn't get a chance to poke at this today... I'll need another day to 
> try
>   > it out.
>   >
>   > On Thu, 1 Oct 2015, Scott Mansfield wrote:
>   >
>   > > Sorry for the data dumps here, but I want to give you everything I 
> have. I found 3 more addresses that showed up in the dmesg logs:
>   > >
>   > > $ for addr in 40e013 40eff4 40f7c4; do addr2line -e memcached 
> $addr; done
>   > >
>   > > .../build/memcached-1.4.24-slab-rebal-next/slabs.c:265 
> (discriminator 1)
>   > >
>   > > .../build/memcached-1.4.24-slab-rebal-next/items.c:312 
> (discriminator 1)
>   > >
>   > > .../build/memcached-1.4.24-slab-rebal-next/items.c:1183
>   > >
>   > >
>   > > I still haven't tried to attach a debugger, since the frequency of 
> the error would make it hard to catch it. Is there a handler that I could add 
> in to dump the stack trace when it segfaults? I'd get a core dump, but they 
> would be HUGE and contain confidential information.
>   > >
>   > >
>   > > Below are the full dmesg logs. Out of 205 servers, 35 had dmesg 
> logs after a memcached crash, and only one crashed twice, both times on the 
> original segfault. Below is the full unified set of dmesg logs, from which 
> you can get a sense of frequency.
>   > >
>   > >
>   > > [47992.109269] memcached[2798]: segfault at 0 ip 0040e007 
> sp 7f4d20d25eb0 error 4 in memcached[40+1d000]
>   > >
>   > > [48960.851278] memcached[2805]: segfault at 0 ip 0040e007 
> sp 7f3c30d15eb0 error 4 in memcached[40+1d000]
>   > >
>   > > [46421.604609] memcached[2784]: segfault at 0 ip 0040e007 
> sp 7fdb94612eb0 error 4 in memcached[40+1d000]
>   > >
>   > > [48429.671534] traps: memcached[2768] general protection ip:40e013 
> sp:7f1c32676be0 error:0 in memcached[40+1d000]
>   > >
>   > > [71838.979269] memcached[2792]: segfault at 0 ip 0040e007 
> sp 7f0162feeeb0 error 4 in memcached[40+1d000]
>   > >
>   > > [66763.091475] memcached[2804]: segfault at 0 ip 0040e007 
> sp 7f8240170eb0 error 4 in memcached[40+1d000]
>   > >
>   > > [102544.376092] traps: memcached[2792] general protec

Re: Check for orphaned items in lru crawler thread

2015-10-02 Thread dormando
07fd74a1d0eb0 error 4 in memcached[40+1d000]
>
> [73640.102621] memcached[2802]: segfault at 0 ip 0040e007 sp 
> 7f7bb30bfeb0 error 4 in memcached[40+1d000]
>
> [67690.640196] memcached[2787]: segfault at 0 ip 0040e007 sp 
> 7f299580feb0 error 4 in memcached[40+1d000]
>
> [57729.895442] memcached[2786]: segfault at 0 ip 0040e007 sp 
> 7f204073deb0 error 4 in memcached[40+1d000]
>
> [48009.284226] memcached[2801]: segfault at 0 ip 0040e007 sp 
> 7f7b30876eb0 error 4 in memcached[40+1d000]
>
> [48198.211826] memcached[2811]: segfault at 0 ip 0040e007 sp 
> 7fd496d79eb0 error 4 in memcached[40+1d000]
>
> [84057.439927] traps: memcached[2804] general protection ip:40f7c4 
> sp:7fbe75fffeb0 error:0 in memcached[40+1d000]
>
> [50215.489124] memcached[2784]: segfault at 0 ip 0040e007 sp 
> 7f3234b73eb0 error 4 in memcached[40+1d000]
>
> [46545.316351] memcached[2789]: segfault at 0 ip 0040e007 sp 
> 00007f362ceedeb0 error 4 in memcached[40+1d000]
>
> [102076.523474] memcached[29833]: segfault at 0 ip 0040e007 sp 
> 7f3c89b9ebe0 error 4 in memcached[40+1d000]
>
> [55537.568254] memcached[2780]: segfault at 0 ip 0040e007 sp 
> 7fc1f6005eb0 error 4 in memcached[40+1d000]
>
>
>
>
> On Thursday, October 1, 2015 at 5:40:35 PM UTC-7, Dormando wrote:
>   got it. that might be a decent hint actually... I had added a bugfix to
>   the branch to not miscount the mem_requested counter, but it's not 
> working
>   or I missed a spot.
>
>   On Thu, 1 Oct 2015, Scott Mansfield wrote:
>
>   > The number now, after maybe 90 minutes of writes, is 1,446. I think 
> after disabling a lot of the data TTL'd out. I have to disable it for now, 
> again (for unrelated reasons, again). The page that I screenshotted gives 
> real time data, so the numbers were from right then. Last night, it should 
> have shown better numbers in terms of "total_pages",
>   but I didn't
>   > get a screenshot. That number is directly from the stats slabs output.
>   >
>   >
>   >
>   > On Thursday, October 1, 2015 at 4:21:42 PM UTC-7, Dormando wrote:
>   >       ok... slab class 12 claims to have 2 in "total_pages", yet 14g 
> in
>   >       mem_requested. is this stat wrong?
>   >
>   >       On Thu, 1 Oct 2015, Scott Mansfield wrote:
>   >
>   >       > The ones that crashed (new code cluster) were set to only be 
> written to from the client applications. The data is an index key and a 
> series of data keys that are all written one after another. Each key might be 
> hashed to a different server, though, so not all of them are written to the 
> same server. I can give you a snapshot of one of
>   the
>   >       clusters that
>   >       > didn't crash (attached file). I can give more detail offline 
> if you need it.
>   >       >
>   >       >
>   >       > On Thursday, October 1, 2015 at 2:32:53 PM UTC-7, Dormando 
> wrote:
>   >       >       Any chance you could describe (perhaps privately?) in 
> very broad strokes
>   >       >       what the write load looks like? (they're getting only 
> writes, too?).
>   >       >       otherwise I'll have to devise arbitrary torture tests. 
> I'm sure the bug's
>   >       >       in there but it's not obvious yet
>   >       >
>   >       >       On Thu, 1 Oct 2015, dormando wrote:
>   >       >
>   >       >       > perfect, thanks! I have $dayjob as well but will look 
> into this as soon as
>   >       >       > I can. my torture test machines are in a box but I'll 
> try to borrow one
>   >       >       >
>   >       >       > On Thu, 1 Oct 2015, Scott Mansfield wrote:
>   >       >       >
>   >       >       > > Yes. Exact args:
>   >       >       > > -p 11211 -u  -l 0.0.0.0 -c 10 -o 
> slab_reassign -o lru_maintainer,lru_crawler,hash_algorithm=murmur3 -I 4m -m 
> 56253
>   >       >       > >
>   >       >       > > On Thursday, October 1, 2015 at 12:41:06 PM UTC-7, 
> Dormando wrote:
>   >       >       > >       Were lru_maintainer/lru_crawler/etc enabled 
> though? even if slab mover is
>   >       >       > >       off, those two were the big changes in .24
>   >       >       > >
>   >       >       > >       On 

Re: Check for orphaned items in lru crawler thread

2015-10-01 Thread dormando
How many servers were you running it on? I hope it wasn't more than a
handful. I'd recommend starting with one :P

can you do an addr2line? what were your startup args, and what was the
commit sha1 for the branch you pulled?

sorry about that :/

On Thu, 1 Oct 2015, Scott Mansfield wrote:

> A few different servers (5 / 205) experienced a segfault all within an hour 
> or so. Unfortunately at this point I'm a bit out of my depth. I have the 
> dmesg output, which is identical for all 5 boxes:
>
> [46545.316351] memcached[2789]: segfault at 0 ip 0040e007 sp 
> 7f362ceedeb0 error 4 in memcached[40+1d000]
>
>
> I can possibly supply the binary file if needed, though we didn't do anything 
> besides the standard setup and compile.
>
>
>
> On Tuesday, September 29, 2015 at 10:27:59 PM UTC-7, Dormando wrote:
>   If you look at the new branch there's a commit explaining the new stats.
>
>   You can watch slab_reassign_evictions vs slab_reassign_saves. you can 
> also
>   test automove=1 vs automove=2 (please also turn on the lru_maintainer 
> and
>   lru_crawler).
>
>   The initial branch you were running didn't add any new stats. It just
>   restored an old feature.
>
>   On Tue, 29 Sep 2015, Scott Mansfield wrote:
>
>   > An unrelated prod problem meant I had to stop after about an hour. 
> I'm turning it on again tomorrow morning.
>   > Are there any new metrics I should be looking at? Anything new in the 
> stats output? I'm about to take a look at the diffs as well.
>   >
>   > On Tuesday, September 29, 2015 at 12:37:45 PM UTC-7, Dormando wrote:
>   >       excellent. if automove=2 is too aggressive you'll see that come 
> in in a
>   >       hit ratio reduction.
>   >
>   >       the new branch works with automove=2 as well, but it will 
> attempt to
>   >       rescue valid items in the old slab if possible. I'll still be 
> working on
>   >       it for another few hours today though. I'll mail again when I'm 
> done.
>   >
>   >       On Tue, 29 Sep 2015, Scott Mansfield wrote:
>   >
>   >       > I have the first commit (slab_automove=2) running in prod 
> right now. Later today will be a full load production test of the latest 
> code. I'll just let it run for a few days unless I spot any problems. We have 
> good metrics for latency et. al. from the client side, though network 
> normally dwarfs memcached time.
>   >       >
>   >       > On Tuesday, September 29, 2015 at 3:10:03 AM UTC-7, Dormando 
> wrote:
>   >       >       That's unfortunate.
>   >       >
>   >       >       I've done some more work on the branch:
>   >       >       https://github.com/memcached/memcached/pull/112
>   >       >
>   >       >       It's not completely likely you would see enough of an 
> improvement from the
>   >       >       new default mode. However if your item sizes change 
> gradually, items are
>   >       >       reclaimed during expiration, or get overwritten (and 
> thus freed in the old
>   >       >       class), it should work just fine. I have another patch 
> coming which should
>   >       >       help though.
>   >       >
>   >       >       Open to feedback from any interested party.
>   >       >
>   >       >       On Fri, 25 Sep 2015, Scott Mansfield wrote:
>   >       >
>   >       >       > I have it running internally, and it runs fine under 
> normal load. It's difficult to put it into the line of fire for a production 
> workload because of social reasons... As well it's a degenerate case that we 
> normally don't run in to (and actively try to avoid). I'm going to run some 
> heavier load tests on it today. 
>   >       >       >
>   >       >       > On Wednesday, September 9, 2015 at 10:23:32 AM UTC-7, 
> Scott Mansfield wrote:
>   >       >       >       I'm working on getting a test going internally. 
> I'll let you know how it goes. 
>   >       >       >
>   >       >       >
>   >       >       > Scott Mansfield
>   >       >       > On Mon, Sep 7, 2015 at 2:33 PM, dormando wrote:
>   >       >       >       Yo,
>   >       >       >
>   >       >       >       
> https://github.com/dormando/memcached/commits/slab_rebal_next - would you
>   >       >       >       mind playing around with the branch here? You 
> can see the sta

Re: Check for orphaned items in lru crawler thread

2015-10-01 Thread dormando
Just before I sit in and try to narrow this down: have you run any host on
1.4.24 mainline with those same start options? just in case the crash is
older

On Thu, 1 Oct 2015, Scott Mansfield wrote:

> Another message for you:
> [78098.528606] traps: memcached[2757] general protection ip:412b9d 
> sp:7fc0700dbdd0 error:0 in memcached[40+1d000]
>
>
> addr2line shows:
>
> $ addr2line -e memcached 412b9d
>
> /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/assoc.c:119
>
>
>
> On Thursday, October 1, 2015 at 1:41:44 AM UTC-7, Dormando wrote:
>   Ok, thanks!
>
>   I'll noodle this a bit... unfortunately a backtrace might be more 
> helpful.
>   will ask you to attempt to get one if I don't figure anything out in 
> time.
>
>   (allow it to core dump or attach a GDB session and set an ignore handler
>   for sigpipe/int/etc and run "continue")
>
>   what were your full startup args, though?
>
>   On Thu, 1 Oct 2015, Scott Mansfield wrote:
>
>   > The commit was the latest in slab_rebal_next at the time:
>   > 
> https://github.com/dormando/memcached/commit/bdd688b4f20120ad844c8a4803e08c6e03cb061a
>   >
>   > addr2line gave me this output:
>   >
>   > $ addr2line -e memcached 0x40e007
>   >
>   > 
> /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/slabs.c:264
>   >
>   >
>   > As well, this was running with production writes, but not reads. Even 
> if we had reads on with the few servers crashing, we're ok architecturally. 
> That's why I can get it out there without worrying too much. For now, I'm 
> going to turn it off. I had a metrics issue anyway that needs to get fixed. 
> Tomorrow I'm planning to test again with more
>   metrics, but I
>   > can get any new code in pretty quick.
>   >
>   >
>   > On Thursday, October 1, 2015 at 1:01:36 AM UTC-7, Dormando wrote:
>   >       How many servers were you running it on? I hope it wasn't more 
> than a
>   >       handful. I'd recommend starting with one :P
>   >
>   >       can you do an addr2line? what were your startup args, and what 
> was the
>   >       commit sha1 for the branch you pulled?
>   >
>   >       sorry about that :/
>   >
>   >       On Thu, 1 Oct 2015, Scott Mansfield wrote:
>   >
>   >       > A few different servers (5 / 205) experienced a segfault all 
> within an hour or so. Unfortunately at this point I'm a bit out of my depth. 
> I have the dmesg output, which is identical for all 5 boxes:
>   >       >
>   >       > [46545.316351] memcached[2789]: segfault at 0 ip 
> 0040e007 sp 7f362ceedeb0 error 4 in memcached[40+1d000]
>   >       >
>   >       >
>   >       > I can possibly supply the binary file if needed, though we 
> didn't do anything besides the standard setup and compile.
>   >       >
>   >       >
>   >       >
>   >       > On Tuesday, September 29, 2015 at 10:27:59 PM UTC-7, Dormando 
> wrote:
>   >       >       If you look at the new branch there's a commit 
> explaining the new stats.
>   >       >
>   >       >       You can watch slab_reassign_evictions vs 
> slab_reassign_saves. you can also
>   >       >       test automove=1 vs automove=2 (please also turn on the 
> lru_maintainer and
>   >       >       lru_crawler).
>   >       >
>   >       >       The initial branch you were running didn't add any new 
> stats. It just
>   >       >       restored an old feature.
>   >       >
>   >       >       On Tue, 29 Sep 2015, Scott Mansfield wrote:
>   >       >
>   >       >       > An unrelated prod problem meant I had to stop after 
> about an hour. I'm turning it on again tomorrow morning.
>   >       >       > Are there any new metrics I should be looking at? 
> Anything new in the stats output? I'm about to take a look at the diffs as 
> well.
>   >       >       >
>   >       >       > On Tuesday, September 29, 2015 at 12:37:45 PM UTC-7, 
> Dormando wrote:
>   >       >       >       excellent. if automove=2 is too aggressive 
> you'll see that come in in a
>   >       >       >       hit ratio reduction.
>   >       >       >
>   >       >       >       the n

Re: Check for orphaned items in lru crawler thread

2015-10-01 Thread dormando
perfect, thanks! I have $dayjob as well but will look into this as soon as
I can. my torture test machines are in a box but I'll try to borrow one

On Thu, 1 Oct 2015, Scott Mansfield wrote:

> Yes. Exact args:
> -p 11211 -u  -l 0.0.0.0 -c 10 -o slab_reassign -o 
> lru_maintainer,lru_crawler,hash_algorithm=murmur3 -I 4m -m 56253
>
> On Thursday, October 1, 2015 at 12:41:06 PM UTC-7, Dormando wrote:
>   Were lru_maintainer/lru_crawler/etc enabled though? even if slab mover 
> is
>   off, those two were the big changes in .24
>
>   On Thu, 1 Oct 2015, Scott Mansfield wrote:
>
>   > The same cluster has > 400 servers happily running 1.4.24. It's been 
> our standard deployment for a while now, and we haven't seen any crashes. The 
> servers in the same cluster running 1.4.24 (with the same write load the new 
> build was taking) have been up for 29 days. The start options do not contain 
> the slab_automove option because it wasn't
>   effective for
>   > us before. The memory given is possibly slightly different per 
> server, as we calculate on startup how much we give. It's in the same 
> ballpark, though (~56 gigs).
>   >
>   > On Thursday, October 1, 2015 at 12:11:35 PM UTC-7, Dormando wrote:
>   >       Just before I sit in and try to narrow this down: have you run 
> any host on
>   >       1.4.24 mainline with those same start options? just in case the 
> crash is
>   >       older
>   >
>   >       On Thu, 1 Oct 2015, Scott Mansfield wrote:
>   >
>   >       > Another message for you:
>   >       > [78098.528606] traps: memcached[2757] general protection 
> ip:412b9d sp:7fc0700dbdd0 error:0 in memcached[40+1d000]
>   >       >
>   >       >
>   >       > addr2line shows:
>   >       >
>   >       > $ addr2line -e memcached 412b9d
>   >       >
>   >       > 
> /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/assoc.c:119
>   >       >
>   >       >
>   >       >
>   >       > On Thursday, October 1, 2015 at 1:41:44 AM UTC-7, Dormando 
> wrote:
>   >       >       Ok, thanks!
>   >       >
>   >       >       I'll noodle this a bit... unfortunately a backtrace 
> might be more helpful.
>   >       >       will ask you to attempt to get one if I don't figure 
> anything out in time.
>   >       >
>   >       >       (allow it to core dump or attach a GDB session and set 
> an ignore handler
>   >       >       for sigpipe/int/etc and run "continue")
>   >       >
>   >       >       what were your full startup args, though?
>   >       >
>   >       >       On Thu, 1 Oct 2015, Scott Mansfield wrote:
>   >       >
>   >       >       > The commit was the latest in slab_rebal_next at the 
> time:
>   >       >       > 
> https://github.com/dormando/memcached/commit/bdd688b4f20120ad844c8a4803e08c6e03cb061a
>   >       >       >
>   >       >       > addr2line gave me this output:
>   >       >       >
>   >       >       > $ addr2line -e memcached 0x40e007
>   >       >       >
>   >       >       > 
> /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/slabs.c:264
>   >       >       >
>   >       >       >
>   >       >       > As well, this was running with production writes, but 
> not reads. Even if we had reads on with the few servers crashing, we're ok 
> architecturally. That's why I can get it out there without worrying too much. 
> For now, I'm going to turn it off. I had a metrics issue anyway that needs to 
> get fixed. Tomorrow I'm planning to test
>   again with
>   >       more
>   >       >       metrics, but I
>   >       >       > can get any new code in pretty quick.
>   >       >       >
>   >       >       >
>   >       >       > On Thursday, October 1, 2015 at 1:01:36 AM UTC-7, 
> Dormando wrote:
>   >       >       >       How many servers were you running it on? I hope 
> it wasn't more than a
>   >       >       >       handful. I'd recommend starting with one :P
>   >       >       >
>   >       >       >       can you do an addr2line? what were your startup 
> args, and what was the
&g

Re: Check for orphaned items in lru crawler thread

2015-10-01 Thread dormando
Were lru_maintainer/lru_crawler/etc enabled though? even if slab mover is
off, those two were the big changes in .24

On Thu, 1 Oct 2015, Scott Mansfield wrote:

> The same cluster has > 400 servers happily running 1.4.24. It's been our 
> standard deployment for a while now, and we haven't seen any crashes. The 
> servers in the same cluster running 1.4.24 (with the same write load the new 
> build was taking) have been up for 29 days. The start options do not contain 
> the slab_automove option because it wasn't effective for
> us before. The memory given is possibly slightly different per server, as we 
> calculate on startup how much we give. It's in the same ballpark, though (~56 
> gigs).
>
> On Thursday, October 1, 2015 at 12:11:35 PM UTC-7, Dormando wrote:
>   Just before I sit in and try to narrow this down: have you run any host 
> on
>   1.4.24 mainline with those same start options? just in case the crash is
>   older
>
>   On Thu, 1 Oct 2015, Scott Mansfield wrote:
>
>   > Another message for you:
>   > [78098.528606] traps: memcached[2757] general protection ip:412b9d 
> sp:7fc0700dbdd0 error:0 in memcached[40+1d000]
>   >
>   >
>   > addr2line shows:
>   >
>   > $ addr2line -e memcached 412b9d
>   >
>   > 
> /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/assoc.c:119
>   >
>   >
>   >
>   > On Thursday, October 1, 2015 at 1:41:44 AM UTC-7, Dormando wrote:
>   >       Ok, thanks!
>   >
>   >       I'll noodle this a bit... unfortunately a backtrace might be 
> more helpful.
>   >       will ask you to attempt to get one if I don't figure anything 
> out in time.
>   >
>   >       (allow it to core dump or attach a GDB session and set an 
> ignore handler
>   >       for sigpipe/int/etc and run "continue")
>   >
>   >       what were your full startup args, though?
>   >
>   >       On Thu, 1 Oct 2015, Scott Mansfield wrote:
>   >
>   >       > The commit was the latest in slab_rebal_next at the time:
>   >       > 
> https://github.com/dormando/memcached/commit/bdd688b4f20120ad844c8a4803e08c6e03cb061a
>   >       >
>   >       > addr2line gave me this output:
>   >       >
>   >       > $ addr2line -e memcached 0x40e007
>   >       >
>   >       > 
> /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/slabs.c:264
>   >       >
>   >       >
>   >       > As well, this was running with production writes, but not 
> reads. Even if we had reads on with the few servers crashing, we're ok 
> architecturally. That's why I can get it out there without worrying too much. 
> For now, I'm going to turn it off. I had a metrics issue anyway that needs to 
> get fixed. Tomorrow I'm planning to test again with
>   more
>   >       metrics, but I
>   >       > can get any new code in pretty quick.
>   >       >
>   >       >
>   >       > On Thursday, October 1, 2015 at 1:01:36 AM UTC-7, Dormando 
> wrote:
>   >       >       How many servers were you running it on? I hope it 
> wasn't more than a
>   >       >       handful. I'd recommend starting with one :P
>   >       >
>   >       >       can you do an addr2line? what were your startup args, 
> and what was the
>   >       >       commit sha1 for the branch you pulled?
>   >       >
>   >       >       sorry about that :/
>   >       >
>   >       >       On Thu, 1 Oct 2015, Scott Mansfield wrote:
>   >       >
>   >       >       > A few different servers (5 / 205) experienced a 
> segfault all within an hour or so. Unfortunately at this point I'm a bit out 
> of my depth. I have the dmesg output, which is identical for all 5 boxes:
>   >       >       >
>   >       >       > [46545.316351] memcached[2789]: segfault at 0 ip 
> 0040e007 sp 7f362ceedeb0 error 4 in memcached[40+1d000]
>   >       >       >
>   >       >       >
>   >       >       > I can possibly supply the binary file if needed, 
> though we didn't do anything besides the standard setup and compile.
>   >       >       >
>   >       >       >
>   >       >       >
>   >  

Re: Check for orphaned items in lru crawler thread

2015-10-01 Thread dormando
Any chance you could describe (perhaps privately?) in very broad strokes
what the write load looks like? (they're getting only writes, too?).
otherwise I'll have to devise arbitrary torture tests. I'm sure the bug's
in there but it's not obvious yet

On Thu, 1 Oct 2015, dormando wrote:

> perfect, thanks! I have $dayjob as well but will look into this as soon as
> I can. my torture test machines are in a box but I'll try to borrow one
>
> On Thu, 1 Oct 2015, Scott Mansfield wrote:
>
> > Yes. Exact args:
> > -p 11211 -u  -l 0.0.0.0 -c 10 -o slab_reassign -o 
> > lru_maintainer,lru_crawler,hash_algorithm=murmur3 -I 4m -m 56253
> >
> > On Thursday, October 1, 2015 at 12:41:06 PM UTC-7, Dormando wrote:
> >   Were lru_maintainer/lru_crawler/etc enabled though? even if slab 
> > mover is
> >   off, those two were the big changes in .24
> >
> >   On Thu, 1 Oct 2015, Scott Mansfield wrote:
> >
> >   > The same cluster has > 400 servers happily running 1.4.24. It's 
> > been our standard deployment for a while now, and we haven't seen any 
> > crashes. The servers in the same cluster running 1.4.24 (with the same 
> > write load the new build was taking) have been up for 29 days. The start 
> > options do not contain the slab_automove option because it wasn't
> >   effective for
> >   > us before. The memory given is possibly slightly different per 
> > server, as we calculate on startup how much we give. It's in the same 
> > ballpark, though (~56 gigs).
> >   >
> >   > On Thursday, October 1, 2015 at 12:11:35 PM UTC-7, Dormando wrote:
> >   >       Just before I sit in and try to narrow this down: have you 
> > run any host on
> >   >       1.4.24 mainline with those same start options? just in case 
> > the crash is
> >   >       older
> >   >
> >   >       On Thu, 1 Oct 2015, Scott Mansfield wrote:
> >   >
> >   >       > Another message for you:
> >   >       > [78098.528606] traps: memcached[2757] general protection 
> > ip:412b9d sp:7fc0700dbdd0 error:0 in memcached[40+1d000]
> >   >       >
> >   >       >
> >   >       > addr2line shows:
> >   >       >
> >   >       > $ addr2line -e memcached 412b9d
> >   >       >
> >   >       > 
> > /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/assoc.c:119
> >   >       >
> >   >       >
> >   >       >
> >   >       > On Thursday, October 1, 2015 at 1:41:44 AM UTC-7, Dormando 
> > wrote:
> >   >       >       Ok, thanks!
> >   >       >
> >   >       >       I'll noodle this a bit... unfortunately a backtrace 
> > might be more helpful.
> >   >       >       will ask you to attempt to get one if I don't figure 
> > anything out in time.
> >   >       >
> >   >       >       (allow it to core dump or attach a GDB session and 
> > set an ignore handler
> >   >       >       for sigpipe/int/etc and run "continue")
> >   >       >
> >   >       >       what were your full startup args, though?
> >   >       >
> >   >       >       On Thu, 1 Oct 2015, Scott Mansfield wrote:
> >   >       >
> >   >       >       > The commit was the latest in slab_rebal_next at the 
> > time:
> >   >       >       > 
> > https://github.com/dormando/memcached/commit/bdd688b4f20120ad844c8a4803e08c6e03cb061a
> >   >       >       >
> >   >       >       > addr2line gave me this output:
> >   >       >       >
> >   >       >       > $ addr2line -e memcached 0x40e007
> >   >       >       >
> >   >       >       > 
> > /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/slabs.c:264
> >   >       >       >
> >   >       >       >
> >   >       >       > As well, this was running with production writes, 
> > but not reads. Even if we had reads on with the few servers crashing, we're 
> > ok architecturally. That's why I can get it out there without worrying too 
> > much. For now, I'm going to turn it off. I had a metrics issue anyway that 
> > needs to get fixed. Tomorrow I'm planning to t

Re: Check for orphaned items in lru crawler thread

2015-10-01 Thread dormando
ok... slab class 12 claims to have 2 in "total_pages", yet 14g in
mem_requested. is this stat wrong?

On Thu, 1 Oct 2015, Scott Mansfield wrote:

> The ones that crashed (new code cluster) were set to only be written to from 
> the client applications. The data is an index key and a series of data keys 
> that are all written one after another. Each key might be hashed to a 
> different server, though, so not all of them are written to the same server. 
> I can give you a snapshot of one of the clusters that
> didn't crash (attached file). I can give more detail offline if you need it.
>
>
> On Thursday, October 1, 2015 at 2:32:53 PM UTC-7, Dormando wrote:
>   Any chance you could describe (perhaps privately?) in very broad strokes
>   what the write load looks like? (they're getting only writes, too?).
>   otherwise I'll have to devise arbitrary torture tests. I'm sure the 
> bug's
>   in there but it's not obvious yet
>
>   On Thu, 1 Oct 2015, dormando wrote:
>
>   > perfect, thanks! I have $dayjob as well but will look into this as 
> soon as
>   > I can. my torture test machines are in a box but I'll try to borrow 
> one
>   >
>   > On Thu, 1 Oct 2015, Scott Mansfield wrote:
>   >
>   > > Yes. Exact args:
>   > > -p 11211 -u  -l 0.0.0.0 -c 10 -o slab_reassign -o 
> lru_maintainer,lru_crawler,hash_algorithm=murmur3 -I 4m -m 56253
>   > >
>   > > On Thursday, October 1, 2015 at 12:41:06 PM UTC-7, Dormando wrote:
>   > >       Were lru_maintainer/lru_crawler/etc enabled though? even if 
> slab mover is
>   > >       off, those two were the big changes in .24
>   > >
>   > >       On Thu, 1 Oct 2015, Scott Mansfield wrote:
>   > >
>   > >       > The same cluster has > 400 servers happily running 1.4.24. 
> It's been our standard deployment for a while now, and we haven't seen any 
> crashes. The servers in the same cluster running 1.4.24 (with the same write 
> load the new build was taking) have been up for 29 days. The start options do 
> not contain the slab_automove option because
>   it wasn't
>   > >       effective for
>   > >       > us before. The memory given is possibly slightly different 
> per server, as we calculate on startup how much we give. It's in the same 
> ballpark, though (~56 gigs).
>   > >       >
>   > >       > On Thursday, October 1, 2015 at 12:11:35 PM UTC-7, Dormando 
> wrote:
>   > >       >       Just before I sit in and try to narrow this down: 
> have you run any host on
>   > >       >       1.4.24 mainline with those same start options? just 
> in case the crash is
>   > >       >       older
>   > >       >
>   > >       >       On Thu, 1 Oct 2015, Scott Mansfield wrote:
>   > >       >
>   > >       >       > Another message for you:
>   > >       >       > [78098.528606] traps: memcached[2757] general 
> protection ip:412b9d sp:7fc0700dbdd0 error:0 in memcached[40+1d000]
>   > >       >       >
>   > >       >       >
>   > >       >       > addr2line shows:
>   > >       >       >
>   > >       >       > $ addr2line -e memcached 412b9d
>   > >       >       >
>   > >       >       > 
> /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/assoc.c:119
>   > >       >       >
>   > >       >       >
>   > >       >       >
>   > >       >       > On Thursday, October 1, 2015 at 1:41:44 AM UTC-7, 
> Dormando wrote:
>   > >       >       >       Ok, thanks!
>   > >       >       >
>   > >       >       >       I'll noodle this a bit... unfortunately a 
> backtrace might be more helpful.
>   > >       >       >       will ask you to attempt to get one if I don't 
> figure anything out in time.
>   > >       >       >
>   > >       >       >       (allow it to core dump or attach a GDB 
> session and set an ignore handler
>   > >       >       >       for sigpipe/int/etc and run "continue")
>   > >       >       >
>   > >       >       >       what were your full startup args, though?
>   > >       >       >
>

Re: Check for orphaned items in lru crawler thread

2015-10-01 Thread dormando
got it. that might be a decent hint actually... I had added a bugfix to
the branch to not miscount the mem_requested counter, but it's not working
or I missed a spot.

On Thu, 1 Oct 2015, Scott Mansfield wrote:

> The number now, after maybe 90 minutes of writes, is 1,446. I think after 
> disabling a lot of the data TTL'd out. I have to disable it for now, again 
> (for unrelated reasons, again). The page that I screenshotted gives real time 
> data, so the numbers were from right then. Last night, it should have shown 
> better numbers in terms of "total_pages", but I didn't
> get a screenshot. That number is directly from the stats slabs output.
>
>
>
> On Thursday, October 1, 2015 at 4:21:42 PM UTC-7, Dormando wrote:
>   ok... slab class 12 claims to have 2 in "total_pages", yet 14g in
>   mem_requested. is this stat wrong?
>
>   On Thu, 1 Oct 2015, Scott Mansfield wrote:
>
>   > The ones that crashed (new code cluster) were set to only be written 
> to from the client applications. The data is an index key and a series of 
> data keys that are all written one after another. Each key might be hashed to 
> a different server, though, so not all of them are written to the same 
> server. I can give you a snapshot of one of the
>   clusters that
>   > didn't crash (attached file). I can give more detail offline if you 
> need it.
>   >
>   >
>   > On Thursday, October 1, 2015 at 2:32:53 PM UTC-7, Dormando wrote:
>   >       Any chance you could describe (perhaps privately?) in very 
> broad strokes
>   >       what the write load looks like? (they're getting only writes, 
> too?).
>   >       otherwise I'll have to devise arbitrary torture tests. I'm sure 
> the bug's
>   >       in there but it's not obvious yet
>   >
>   >       On Thu, 1 Oct 2015, dormando wrote:
>   >
>   >       > perfect, thanks! I have $dayjob as well but will look into 
> this as soon as
>   >       > I can. my torture test machines are in a box but I'll try to 
> borrow one
>   >       >
>   >       > On Thu, 1 Oct 2015, Scott Mansfield wrote:
>   >       >
>   >       > > Yes. Exact args:
>   >       > > -p 11211 -u  -l 0.0.0.0 -c 10 -o slab_reassign 
> -o lru_maintainer,lru_crawler,hash_algorithm=murmur3 -I 4m -m 56253
>   >       > >
>   >       > > On Thursday, October 1, 2015 at 12:41:06 PM UTC-7, Dormando 
> wrote:
>   >       > >       Were lru_maintainer/lru_crawler/etc enabled though? 
> even if slab mover is
>   >       > >       off, those two were the big changes in .24
>   >       > >
>   >       > >       On Thu, 1 Oct 2015, Scott Mansfield wrote:
>   >       > >
>   >       > >       > The same cluster has > 400 servers happily running 
> 1.4.24. It's been our standard deployment for a while now, and we haven't 
> seen any crashes. The servers in the same cluster running 1.4.24 (with the 
> same write load the new build was taking) have been up for 29 days. The start 
> options do not contain the slab_automove option
>   because
>   >       it wasn't
>   >       > >       effective for
>   >       > >       > us before. The memory given is possibly slightly 
> different per server, as we calculate on startup how much we give. It's in 
> the same ballpark, though (~56 gigs).
>   >       > >       >
>   >       > >       > On Thursday, October 1, 2015 at 12:11:35 PM UTC-7, 
> Dormando wrote:
>   >       > >       >       Just before I sit in and try to narrow this 
> down: have you run any host on
>   >       > >       >       1.4.24 mainline with those same start 
> options? just in case the crash is
>   >       > >       >       older
>   >       > >       >
>   >       > >       >       On Thu, 1 Oct 2015, Scott Mansfield wrote:
>   >       > >       >
>   >       > >       >       > Another message for you:
>   >       > >       >       > [78098.528606] traps: memcached[2757] 
> general protection ip:412b9d sp:7fc0700dbdd0 error:0 in 
> memcached[40+1d000]
>   >       > >       >       >
>   >       > >       >       >
>   >       > >       >       > addr2line shows:
>   >       > >       >       >
>   >       > &g

Re: Check for orphaned items in lru crawler thread

2015-10-01 Thread dormando
Ok, thanks!

I'll noodle this a bit... unfortunately a backtrace might be more helpful.
I'll ask you to attempt to get one if I don't figure anything out in time.

(allow it to core dump or attach a GDB session and set an ignore handler
for sigpipe/int/etc and run "continue")
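
For example, one way to do that (a rough sketch; adjust the pid and paths for
your setup):

$ ulimit -c unlimited            # before starting memcached, so it can dump core
$ gdb -p $(pidof memcached)      # or attach to the already-running instance
(gdb) handle SIGPIPE nostop noprint pass
(gdb) continue
  ... wait for the crash ...
(gdb) thread apply all bt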

what were your full startup args, though?

On Thu, 1 Oct 2015, Scott Mansfield wrote:

> The commit was the latest in slab_rebal_next at the time:
> https://github.com/dormando/memcached/commit/bdd688b4f20120ad844c8a4803e08c6e03cb061a
>
> addr2line gave me this output:
>
> $ addr2line -e memcached 0x40e007
>
> /mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/slabs.c:264
>
>
> As well, this was running with production writes, but not reads. Even if we 
> had reads on with the few servers crashing, we're ok architecturally. That's 
> why I can get it out there without worrying too much. For now, I'm going to 
> turn it off. I had a metrics issue anyway that needs to get fixed. Tomorrow 
> I'm planning to test again with more metrics, but I
> can get any new code in pretty quick.
>
>
> On Thursday, October 1, 2015 at 1:01:36 AM UTC-7, Dormando wrote:
>   How many servers were you running it on? I hope it wasn't more than a
>   handful. I'd recommend starting with one :P
>
>   can you do an addr2line? what were your startup args, and what was the
>   commit sha1 for the branch you pulled?
>
>   sorry about that :/
>
>   On Thu, 1 Oct 2015, Scott Mansfield wrote:
>
>   > A few different servers (5 / 205) experienced a segfault all within 
> an hour or so. Unfortunately at this point I'm a bit out of my depth. I have 
> the dmesg output, which is identical for all 5 boxes:
>   >
>   > [46545.316351] memcached[2789]: segfault at 0 ip 0040e007 sp 
> 7f362ceedeb0 error 4 in memcached[40+1d000]
>   >
>   >
>   > I can possibly supply the binary file if needed, though we didn't do 
> anything besides the standard setup and compile.
>   >
>   >
>   >
>   > On Tuesday, September 29, 2015 at 10:27:59 PM UTC-7, Dormando wrote:
>   >       If you look at the new branch there's a commit explaining the 
> new stats.
>   >
>   >       You can watch slab_reassign_evictions vs slab_reassign_saves. 
> you can also
>   >       test automove=1 vs automove=2 (please also turn on the 
> lru_maintainer and
>   >       lru_crawler).
>   >
>   >       The initial branch you were running didn't add any new stats. 
> It just
>   >       restored an old feature.
>   >
>   >       On Tue, 29 Sep 2015, Scott Mansfield wrote:
>   >
>   >       > An unrelated prod problem meant I had to stop after about an 
> hour. I'm turning it on again tomorrow morning.
>   >       > Are there any new metrics I should be looking at? Anything 
> new in the stats output? I'm about to take a look at the diffs as well.
>   >       >
>   >       > On Tuesday, September 29, 2015 at 12:37:45 PM UTC-7, Dormando 
> wrote:
>   >       >       excellent. if automove=2 is too aggressive you'll see 
> that come in in a
>   >       >       hit ratio reduction.
>   >       >
>   >       >       the new branch works with automove=2 as well, but it 
> will attempt to
>   >       >       rescue valid items in the old slab if possible. I'll 
> still be working on
>   >       >       it for another few hours today though. I'll mail again 
> when I'm done.
>   >       >
>   >       >       On Tue, 29 Sep 2015, Scott Mansfield wrote:
>   >       >
>   >       >       > I have the first commit (slab_automove=2) running in 
> prod right now. Later today will be a full load production test of the latest 
> code. I'll just let it run for a few days unless I spot any problems. We have 
> good metrics for latency et. al. from the client side, though network 
> normally dwarfs memcached time.
>   >       >       >
>   >       >       > On Tuesday, September 29, 2015 at 3:10:03 AM UTC-7, 
> Dormando wrote:
>   >       >       >       That's unfortunate.
>   >       >       >
>   >       >       >       I've done some more work on the branch:
>   >       >       >       https://github.com/memcached/memcached/pull/112
>   >       >       >
>   >       >       >       It's not completely likely you would see en

Re: Check for orphaned items in lru crawler thread

2015-09-29 Thread dormando
That's unfortunate.

I've done some more work on the branch:
https://github.com/memcached/memcached/pull/112

It's not completely likely you would see enough of an improvement from the
new default mode. However if your item sizes change gradually, items are
reclaimed during expiration, or get overwritten (and thus freed in the old
class), it should work just fine. I have another patch coming which should
help though.

Open to feedback from any interested party.

On Fri, 25 Sep 2015, Scott Mansfield wrote:

> I have it running internally, and it runs fine under normal load. It's 
> difficult to put it into the line of fire for a production workload because 
> of social reasons... As well it's a degenerate case that we normally don't 
> run in to (and actively try to avoid). I'm going to run some heavier load 
> tests on it today. 
>
> On Wednesday, September 9, 2015 at 10:23:32 AM UTC-7, Scott Mansfield wrote:
>   I'm working on getting a test going internally. I'll let you know how 
> it goes. 
>
>
> Scott Mansfield
> On Mon, Sep 7, 2015 at 2:33 PM, dormando wrote:
>   Yo,
>
>   https://github.com/dormando/memcached/commits/slab_rebal_next - would 
> you
>   mind playing around with the branch here? You can see the start options 
> in
>   the test.
>
>   This is a dead simple modification (a restoration of a feature that was
>   already there...). The test very aggressively writes and is able to 
> shunt
>   memory around appropriately.
>
>   The work I'm exploring right now will allow savings of items being
>   rebalanced from, and increasing the aggression of page moving without
>   being so brain damaged about it.
>
>   But while I'm poking around with that, I'd be interested in knowing if
>   this simple branch is an improvement, and if so how much.
>
>   I'll push more code to the branch, but the changes should be gated 
> behind
>   a feature flag.
>
>   On Tue, 18 Aug 2015, 'Scott Mansfield' via memcached wrote:
>
>   >
>   > No worries man, you're doing us a favor. Let me know if there's 
> anything you need from us, and I promise I'll be quicker this time :)
>   >
>   > On Aug 18, 2015 12:01 AM, "dormando" <dorma...@rydia.net> wrote:
>   >       Hey,
>   >
>   >       I'm still really interested in working on this. I'll be taking 
> a careful
>   >       look soon I hope.
>   >
>   >       On Mon, 3 Aug 2015, Scott Mansfield wrote:
>   >
>   >       > I've tweaked the program slightly, so I'm adding a new 
> version. It prints more stats as it goes and runs a bit faster.
>   >       >
>   >       > On Monday, August 3, 2015 at 1:20:37 AM UTC-7, Scott 
> Mansfield wrote:
>   >       >       Total brain fart on my part. Apparently I had memcached 
> 1.4.13 on my path (who knows how...) Using the actual one that I've built 
> works. Sorry for the confusion... can't believe I didn't realize that before. 
> I'm testing against the compiled one now to see how it behaves.
>   >       >       On Monday, August 3, 2015 at 1:15:06 AM UTC-7, Dormando 
> wrote:
>   >       >             You sure that's 1.4.24? None of those fail for me 
> :(
>   >       >
>   >       >             On Mon, 3 Aug 2015, Scott Mansfield wrote:
>   >       >
>   >       >             > The command line I've used that will start is:
>   >       >             >
>   >       >             > memcached -m 64 -o slab_reassign,slab_automove
>   >       >             >
>   >       >             >
>   >       >             > the ones that fail are:
>   >       >             >
>   >       >             >
>   >       >             > memcached -m 64 -o 
> slab_reassign,slab_automove,lru_crawler,lru_maintainer
>   >       >             >
>   >       >             > memcached -o lru_crawler
>   >       >             >
>   >       >             >
>   >       >             > I'm sure I've missed something during compile, 
> though I just used ./configure and make.
>   >       >             >
>   >       >             >
>   >       >             > On Monday, August 3, 2015 at 12:22:33 AM UTC-7, 
> Scott Mansfield wrote:
>   >       >             >       I've attached a pretty simple program to 
> connect, fill a s

Re: Check for orphaned items in lru crawler thread

2015-09-29 Thread dormando
If you look at the new branch there's a commit explaining the new stats.

You can watch slab_reassign_evictions vs slab_reassign_saves. you can also
test automove=1 vs automove=2 (please also turn on the lru_maintainer and
lru_crawler).

The initial branch you were running didn't add any new stats. It just
restored an old feature.
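
For reference, a start line exercising those options (an illustrative example,
not Scott's actual config) could look like:

memcached -m 1024 -o slab_reassign,slab_automove=2,lru_crawler,lru_maintainer

and then slab_reassign_evictions / slab_reassign_saves can be watched in the
stats output while the test runs.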

On Tue, 29 Sep 2015, Scott Mansfield wrote:

> An unrelated prod problem meant I had to stop after about an hour. I'm 
> turning it on again tomorrow morning.
> Are there any new metrics I should be looking at? Anything new in the stats 
> output? I'm about to take a look at the diffs as well.
>
> On Tuesday, September 29, 2015 at 12:37:45 PM UTC-7, Dormando wrote:
>   excellent. if automove=2 is too aggressive you'll see that come in in a
>   hit ratio reduction.
>
>   the new branch works with automove=2 as well, but it will attempt to
>   rescue valid items in the old slab if possible. I'll still be working on
>   it for another few hours today though. I'll mail again when I'm done.
>
>   On Tue, 29 Sep 2015, Scott Mansfield wrote:
>
>   > I have the first commit (slab_automove=2) running in prod right now. 
> Later today will be a full load production test of the latest code. I'll just 
> let it run for a few days unless I spot any problems. We have good metrics 
> for latency et. al. from the client side, though network normally dwarfs 
> memcached time.
>   >
>   > On Tuesday, September 29, 2015 at 3:10:03 AM UTC-7, Dormando wrote:
>   >       That's unfortunate.
>   >
>   >       I've done some more work on the branch:
>   >       https://github.com/memcached/memcached/pull/112
>   >
>   >       It's not completely likely you would see enough of an 
> improvement from the
>   >       new default mode. However if your item sizes change gradually, 
> items are
>   >       reclaimed during expiration, or get overwritten (and thus freed 
> in the old
>   >       class), it should work just fine. I have another patch coming 
> which should
>   >       help though.
>   >
>   >       Open to feedback from any interested party.
>   >
>   >       On Fri, 25 Sep 2015, Scott Mansfield wrote:
>   >
>   >       > I have it running internally, and it runs fine under normal 
> load. It's difficult to put it into the line of fire for a production 
> workload because of social reasons... As well it's a degenerate case that we 
> normally don't run in to (and actively try to avoid). I'm going to run some 
> heavier load tests on it today. 
>   >       >
>   >       > On Wednesday, September 9, 2015 at 10:23:32 AM UTC-7, Scott 
> Mansfield wrote:
>   >       >       I'm working on getting a test going internally. I'll 
> let you know how it goes. 
>   >       >
>   >       >
>   >       > Scott Mansfield
>   >       > On Mon, Sep 7, 2015 at 2:33 PM, dormando wrote:
>   >       >       Yo,
>   >       >
>   >       >       
> https://github.com/dormando/memcached/commits/slab_rebal_next - would you
>   >       >       mind playing around with the branch here? You can see 
> the start options in
>   >       >       the test.
>   >       >
>   >       >       This is a dead simple modification (a restoration of a 
> feature that was
>   >       >       already there...). The test very aggressively writes 
> and is able to shunt
>   >       >       memory around appropriately.
>   >       >
>   >       >       The work I'm exploring right now will allow savings of 
> items being
>   >       >       rebalanced from, and increasing the aggression of page 
> moving without
>   >       >       being so brain damaged about it.
>   >       >
>   >       >       But while I'm poking around with that, I'd be 
> interested in knowing if
>   >       >       this simple branch is an improvement, and if so how 
> much.
>   >       >
>   >       >       I'll push more code to the branch, but the changes 
> should be gated behind
>   >       >       a feature flag.
>   >       >
>   >       >       On Tue, 18 Aug 2015, 'Scott Mansfield' via memcached 
> wrote:
>   >       >
>   >       >       >
>   >       >       > No worries man, you're doing us a favor. Let me know 
> if there's anything you need from us, and I promise I'll be quicker this time 
> :)
>   >  

Re: Check for orphaned items in lru crawler thread

2015-09-29 Thread dormando
excellent. if automove=2 is too aggressive you'll see that show up as a
hit ratio reduction.

the new branch works with automove=2 as well, but it will attempt to
rescue valid items in the old slab if possible. I'll still be working on
it for another few hours today though. I'll mail again when I'm done.

On Tue, 29 Sep 2015, Scott Mansfield wrote:

> I have the first commit (slab_automove=2) running in prod right now. Later 
> today will be a full load production test of the latest code. I'll just let 
> it run for a few days unless I spot any problems. We have good metrics for 
> latency et. al. from the client side, though network normally dwarfs 
> memcached time.
>
> On Tuesday, September 29, 2015 at 3:10:03 AM UTC-7, Dormando wrote:
>   That's unfortunate.
>
>   I've done some more work on the branch:
>   https://github.com/memcached/memcached/pull/112
>
>   It's not completely likely you would see enough of an improvement from 
> the
>   new default mode. However if your item sizes change gradually, items are
>   reclaimed during expiration, or get overwritten (and thus freed in the 
> old
>   class), it should work just fine. I have another patch coming which 
> should
>   help though.
>
>   Open to feedback from any interested party.
>
>   On Fri, 25 Sep 2015, Scott Mansfield wrote:
>
>   > I have it running internally, and it runs fine under normal load. 
> It's difficult to put it into the line of fire for a production workload 
> because of social reasons... As well it's a degenerate case that we normally 
> don't run in to (and actively try to avoid). I'm going to run some heavier 
> load tests on it today. 
>   >
>   > On Wednesday, September 9, 2015 at 10:23:32 AM UTC-7, Scott Mansfield 
> wrote:
>   >       I'm working on getting a test going internally. I'll let you 
> know how it goes. 
>   >
>   >
>   > Scott Mansfield
>   > On Mon, Sep 7, 2015 at 2:33 PM, dormando wrote:
>   >       Yo,
>   >
>   >       https://github.com/dormando/memcached/commits/slab_rebal_next - 
> would you
>   >       mind playing around with the branch here? You can see the start 
> options in
>   >       the test.
>   >
>   >       This is a dead simple modification (a restoration of a feature 
> that was
>   >       already there...). The test very aggressively writes and is 
> able to shunt
>   >       memory around appropriately.
>   >
>   >       The work I'm exploring right now will allow savings of items 
> being
>   >       rebalanced from, and increasing the aggression of page moving 
> without
>   >       being so brain damaged about it.
>   >
>   >       But while I'm poking around with that, I'd be interested in 
> knowing if
>   >       this simple branch is an improvement, and if so how much.
>   >
>   >       I'll push more code to the branch, but the changes should be 
> gated behind
>   >       a feature flag.
>   >
>   >       On Tue, 18 Aug 2015, 'Scott Mansfield' via memcached wrote:
>   >
>   >       >
>   >       > No worries man, you're doing us a favor. Let me know if 
> there's anything you need from us, and I promise I'll be quicker this time :)
>   >       >
>   >       > On Aug 18, 2015 12:01 AM, "dormando" <dorm...@rydia.net> 
> wrote:
>   >       >       Hey,
>   >       >
>   >       >       I'm still really interested in working on this. I'll be 
> taking a careful
>   >       >       look soon I hope.
>   >       >
>   >       >       On Mon, 3 Aug 2015, Scott Mansfield wrote:
>   >       >
>   >       >       > I've tweaked the program slightly, so I'm adding a 
> new version. It prints more stats as it goes and runs a bit faster.
>   >       >       >
>   >       >       > On Monday, August 3, 2015 at 1:20:37 AM UTC-7, Scott 
> Mansfield wrote:
>   >       >       >       Total brain fart on my part. Apparently I had 
> memcached 1.4.13 on my path (who knows how...) Using the actual one that I've 
> built works. Sorry for the confusion... can't believe I didn't realize that 
> before. I'm testing against the compiled one now to see how it behaves.
>   >       >       >       On Monday, August 3, 2015 at 1:15:06 AM UTC-7, 
> Dormando wrote:
>   >       >       >             You sure that's 1.4.24? None of thos

Re: Check for orphaned items in lru crawler thread

2015-09-07 Thread dormando
Yo,

https://github.com/dormando/memcached/commits/slab_rebal_next - would you
mind playing around with the branch here? You can see the start options in
the test.

This is a dead simple modification (a restoration of a feature that was
already there...). The test very aggressively writes and is able to shunt
memory around appropriately.

The work I'm exploring right now will allow savings of items being
rebalanced from, and increasing the aggression of page moving without
being so brain damaged about it.

But while I'm poking around with that, I'd be interested in knowing if
this simple branch is an improvement, and if so how much.

I'll push more code to the branch, but the changes should be gated behind
a feature flag.

On Tue, 18 Aug 2015, 'Scott Mansfield' via memcached wrote:

>
> No worries man, you're doing us a favor. Let me know if there's anything you 
> need from us, and I promise I'll be quicker this time :)
>
> On Aug 18, 2015 12:01 AM, "dormando" <dorma...@rydia.net> wrote:
>   Hey,
>
>   I'm still really interested in working on this. I'll be taking a careful
>   look soon I hope.
>
>   On Mon, 3 Aug 2015, Scott Mansfield wrote:
>
>   > I've tweaked the program slightly, so I'm adding a new version. It 
> prints more stats as it goes and runs a bit faster.
>   >
>   > On Monday, August 3, 2015 at 1:20:37 AM UTC-7, Scott Mansfield wrote:
>   >       Total brain fart on my part. Apparently I had memcached 1.4.13 
> on my path (who knows how...) Using the actual one that I've built works. 
> Sorry for the confusion... can't believe I didn't realize that before. I'm 
> testing against the compiled one now to see how it behaves.
>   >       On Monday, August 3, 2015 at 1:15:06 AM UTC-7, Dormando wrote:
>   >             You sure that's 1.4.24? None of those fail for me :(
>   >
>   >             On Mon, 3 Aug 2015, Scott Mansfield wrote:
>   >
>   >             > The command line I've used that will start is:
>   >             >
>   >             > memcached -m 64 -o slab_reassign,slab_automove
>   >             >
>   >             >
>   >             > the ones that fail are:
>   >             >
>   >             >
>   >             > memcached -m 64 -o 
> slab_reassign,slab_automove,lru_crawler,lru_maintainer
>   >             >
>   >             > memcached -o lru_crawler
>   >             >
>   >             >
>   >             > I'm sure I've missed something during compile, though I 
> just used ./configure and make.
>   >             >
>   >             >
>   >             > On Monday, August 3, 2015 at 12:22:33 AM UTC-7, Scott 
> Mansfield wrote:
>   >             >       I've attached a pretty simple program to connect, 
> fill a slab with data, and then fill another slab slowly with data of a 
> different size. I've been trying to get memcached to run with the lru_crawler 
> and lru_maintainer flags, but I get '
>   >             >
>   >             >       Illegal suboption "(null)"' every time I try to 
> start with either in any configuration.
>   >             >
>   >             >
>   >             >       I haven't seen it start to move slabs 
> automatically with a freshly installed 1.2.24.
>   >             >
>   >             >
>   >             >       On Tuesday, July 21, 2015 at 4:55:17 PM UTC-7, 
> Scott Mansfield wrote:
>   >             >             I realize I've not given you the tests to 
> reproduce the behavior. I should be able to soon. Sorry about the delay here.
>   >             > In the mean time, I wanted to bring up a possible 
> secondary use of the same logic to move items on slab rebalancing. I think 
> the system might benefit from using the same logic to crawl the pages in a 
> slab and compact the data in the background. In the case where we have memory 
> that is assigned to the slab but not being used
>   because
>   >             of replaced
>   >             > or TTL'd out data, returning the memory to a pool of 
> free memory will allow a slab to grow with that memory first instead of 
> waiting for an event where memory is needed at that instant.
>   >             >
>   >             > It's a change in approach, from reactive to proactive. 
> What do you think?
>   >             >
>   >             > On Monday, July 13, 2015 at 5:54:

Re: Check for orphaned items in lru crawler thread

2015-08-18 Thread dormando
Hey,

I'm still really interested in working on this. I'll be taking a careful
look soon I hope.

On Mon, 3 Aug 2015, Scott Mansfield wrote:

 I've tweaked the program slightly, so I'm adding a new version. It prints 
 more stats as it goes and runs a bit faster.

 On Monday, August 3, 2015 at 1:20:37 AM UTC-7, Scott Mansfield wrote:
   Total brain fart on my part. Apparently I had memcached 1.4.13 on my 
 path (who knows how...) Using the actual one that I've built works. Sorry for 
 the confusion... can't believe I didn't realize that before. I'm testing 
 against the compiled one now to see how it behaves.
   On Monday, August 3, 2015 at 1:15:06 AM UTC-7, Dormando wrote:
 You sure that's 1.4.24? None of those fail for me :(

 On Mon, 3 Aug 2015, Scott Mansfield wrote:

  The command line I've used that will start is:
 
  memcached -m 64 -o slab_reassign,slab_automove
 
 
  the ones that fail are:
 
 
  memcached -m 64 -o 
 slab_reassign,slab_automove,lru_crawler,lru_maintainer
 
  memcached -o lru_crawler
 
 
  I'm sure I've missed something during compile, though I just 
 used ./configure and make.
 
 
  On Monday, August 3, 2015 at 12:22:33 AM UTC-7, Scott Mansfield 
 wrote:
        I've attached a pretty simple program to connect, fill a 
 slab with data, and then fill another slab slowly with data of a different 
 size. I've been trying to get memcached to run with the lru_crawler and 
 lru_maintainer flags, but I get '
 
        Illegal suboption (null)' every time I try to start 
 with either in any configuration.
 
 
        I haven't seen it start to move slabs automatically with 
 a freshly installed 1.2.24.
 
 
        On Tuesday, July 21, 2015 at 4:55:17 PM UTC-7, Scott 
 Mansfield wrote:
              I realize I've not given you the tests to reproduce 
 the behavior. I should be able to soon. Sorry about the delay here.
  In the mean time, I wanted to bring up a possible secondary use 
 of the same logic to move items on slab rebalancing. I think the system might 
 benefit from using the same logic to crawl the pages in a slab and compact 
 the data in the background. In the case where we have memory that is assigned 
 to the slab but not being used because
 of replaced
  or TTL'd out data, returning the memory to a pool of free 
 memory will allow a slab to grow with that memory first instead of waiting 
 for an event where memory is needed at that instant.
 
  It's a change in approach, from reactive to proactive. What do 
 you think?
 
  On Monday, July 13, 2015 at 5:54:11 PM UTC-7, Dormando wrote:
         First, more detail for you:
        
         We are running 1.4.24 in production and haven't noticed 
 any bugs as of yet. The new LRUs seem to be working well, though we nearly 
 always run memcached scaled to hold all data without evictions. Those with 
 evictions are behaving well. Those without evictions haven't seen crashing or 
 any other noticeable bad behavior.
 
        Neat.
 
        
         OK, I think I see an area where I was speculating on 
 functionality. If you have a key in slab 21 and then the same key is written 
 again at a larger size in slab 23 I assumed that the space in 21 was not 
 freed on the second write. With that assumption, the LRU crawler would not 
 free up that space. Also just by observation in
 the
        macro, the space is not freed
         fast enough to be effective, in our use case, to accept 
 the writes that are happening. Think in the hundreds of millions of 
 overwrites in a 6 - 10 hour period across a cluster.
 
        Internally, items (a key/value pair) are generally 
 immutable. The only
        time when it's not is for INCR/DECR, and it still becomes 
 immutable if two
        INCR/DECR's collide.
 
        What this means, is that the new item is staged in a 
 piece of free memory
        while the upload stage of the SET happens. When 
 memcached has all of the
        data in memory to replace the item, it does an internal 
 swap under a lock.
        The old item is removed from the hash table and LRU, and 
 the new item gets
        put in its place (at the head of the LRU).
 
        Since items are refcounted, this means that if other 
 users are downloading
        an item which just got replaced

Re: Check for orphaned items in lru crawler thread

2015-08-03 Thread dormando
What are your startup args?

On Mon, 3 Aug 2015, Scott Mansfield wrote:

 I've attached a pretty simple program to connect, fill a slab with data, and 
 then fill another slab slowly with data of a different size. I've been trying 
 to get memcached to run with the lru_crawler and lru_maintainer flags, but I 
 get '

 Illegal suboption (null)' every time I try to start with either in any 
 configuration.


 I haven't seen it start to move slabs automatically with a freshly installed 
 1.2.24.


 On Tuesday, July 21, 2015 at 4:55:17 PM UTC-7, Scott Mansfield wrote:
   I realize I've not given you the tests to reproduce the behavior. I 
 should be able to soon. Sorry about the delay here.
 In the mean time, I wanted to bring up a possible secondary use of the same 
 logic to move items on slab rebalancing. I think the system might benefit 
 from using the same logic to crawl the pages in a slab and compact the data 
 in the background. In the case where we have memory that is assigned to the 
 slab but not being used because of replaced or
 TTL'd out data, returning the memory to a pool of free memory will allow a 
 slab to grow with that memory first instead of waiting for an event where 
 memory is needed at that instant.

 It's a change in approach, from reactive to proactive. What do you think?

 On Monday, July 13, 2015 at 5:54:11 PM UTC-7, Dormando wrote:
First, more detail for you:
   
We are running 1.4.24 in production and haven't noticed any bugs as 
 of yet. The new LRUs seem to be working well, though we nearly always run 
 memcached scaled to hold all data without evictions. Those with evictions are 
 behaving well. Those without evictions haven't seen crashing or any other 
 noticeable bad behavior.

   Neat.

   
OK, I think I see an area where I was speculating on functionality. 
 If you have a key in slab 21 and then the same key is written again at a 
 larger size in slab 23 I assumed that the space in 21 was not freed on the 
 second write. With that assumption, the LRU crawler would not free up that 
 space. Also just by observation in the macro,
   the space is not freed
fast enough to be effective, in our use case, to accept the writes 
 that are happening. Think in the hundreds of millions of overwrites in a 6 
 - 10 hour period across a cluster.

   Internally, items (a key/value pair) are generally immutable. The only
   time when it's not is for INCR/DECR, and it still becomes immutable if 
 two
   INCR/DECR's collide.

   What this means, is that the new item is staged in a piece of free 
 memory
   while the upload stage of the SET happens. When memcached has all of 
 the
   data in memory to replace the item, it does an internal swap under a 
 lock.
   The old item is removed from the hash table and LRU, and the new item 
 gets
   put in its place (at the head of the LRU).

   Since items are refcounted, this means that if other users are 
 downloading
   an item which just got replaced, their memory doesn't get corrupted by 
 the
   item changing out from underneath them. They can continue to read the 
 old
   item until they're done. When the refcount reaches zero the old memory 
 is
   reclaimed.

   Most of the time, the item replacement happens then the old memory is
   immediately removed.

   However, this does mean that you need *one* piece of free memory to
   replace the old one. Then the old memory gets freed after that set.

   So if you take a memcached instance with 0 free chunks, and do a rolling
   replacement of all items (within the same slab class as before), the 
 first
   one would cause an eviction from the tail of the LRU to get a free 
 chunk.
   Every SET after that would use the chunk freed from the replacement of 
 the
   previous memory.

After that last sentence I realized I also may not have explained 
 well enough the access pattern. The keys are all overwritten every day, but 
 it takes some time to write them all (obviously). We see a huge increase in 
 the bytes metric as if the new data for the old keys was being written for 
 the first time. Since the old slab for the
   same key doesn't
proactively release memory, it starts to fill up the cache and then 
 start evicting data in the new slab. Once that happens, we see evictions in 
 the old slab because of the algorithm you mentioned (random picking / freeing 
 of memory). Typically we don't see any use for upgrading an item as the new 
 data would be entirely new and should
   wholesale replace the
old data for that key. More specifically, the operation is always 
 set, with different data each day.

   Right. Most of your problems will come from two areas. One being that
   writing data aggressively into the new slab class (unless you set the
   rebalancer to always-replace mode), the mover will make memory

Re: Check for orphaned items in lru crawler thread

2015-08-03 Thread dormando
You sure that's 1.4.24? None of those fail for me :(

On Mon, 3 Aug 2015, Scott Mansfield wrote:

 The command line I've used that will start is:

 memcached -m 64 -o slab_reassign,slab_automove


 the ones that fail are:


 memcached -m 64 -o slab_reassign,slab_automove,lru_crawler,lru_maintainer

 memcached -o lru_crawler


 I'm sure I've missed something during compile, though I just used ./configure 
 and make.


 On Monday, August 3, 2015 at 12:22:33 AM UTC-7, Scott Mansfield wrote:
   I've attached a pretty simple program to connect, fill a slab with 
 data, and then fill another slab slowly with data of a different size. I've 
 been trying to get memcached to run with the lru_crawler and lru_maintainer 
 flags, but I get '

   Illegal suboption (null)' every time I try to start with either in 
 any configuration.


   I haven't seen it start to move slabs automatically with a freshly 
 installed 1.2.24.


   On Tuesday, July 21, 2015 at 4:55:17 PM UTC-7, Scott Mansfield wrote:
 I realize I've not given you the tests to reproduce the behavior. 
 I should be able to soon. Sorry about the delay here.
 In the mean time, I wanted to bring up a possible secondary use of the same 
 logic to move items on slab rebalancing. I think the system might benefit 
 from using the same logic to crawl the pages in a slab and compact the data 
 in the background. In the case where we have memory that is assigned to the 
 slab but not being used because of replaced
 or TTL'd out data, returning the memory to a pool of free memory will allow a 
 slab to grow with that memory first instead of waiting for an event where 
 memory is needed at that instant.

 It's a change in approach, from reactive to proactive. What do you think?

 On Monday, July 13, 2015 at 5:54:11 PM UTC-7, Dormando wrote:
First, more detail for you:
   
We are running 1.4.24 in production and haven't noticed any bugs as 
 of yet. The new LRUs seem to be working well, though we nearly always run 
 memcached scaled to hold all data without evictions. Those with evictions are 
 behaving well. Those without evictions haven't seen crashing or any other 
 noticeable bad behavior.

   Neat.

   
OK, I think I see an area where I was speculating on functionality. 
 If you have a key in slab 21 and then the same key is written again at a 
 larger size in slab 23 I assumed that the space in 21 was not freed on the 
 second write. With that assumption, the LRU crawler would not free up that 
 space. Also just by observation in the
   macro, the space is not freed
fast enough to be effective, in our use case, to accept the writes 
 that are happening. Think in the hundreds of millions of overwrites in a 6 
 - 10 hour period across a cluster.

   Internally, items (a key/value pair) are generally immutable. The only
   time when it's not is for INCR/DECR, and it still becomes immutable if 
 two
   INCR/DECR's collide.

   What this means, is that the new item is staged in a piece of free 
 memory
   while the upload stage of the SET happens. When memcached has all of 
 the
   data in memory to replace the item, it does an internal swap under a 
 lock.
   The old item is removed from the hash table and LRU, and the new item 
 gets
   put in its place (at the head of the LRU).

   Since items are refcounted, this means that if other users are 
 downloading
   an item which just got replaced, their memory doesn't get corrupted by 
 the
   item changing out from underneath them. They can continue to read the 
 old
   item until they're done. When the refcount reaches zero the old memory 
 is
   reclaimed.

   Most of the time, the item replacement happens then the old memory is
   immediately removed.

   However, this does mean that you need *one* piece of free memory to
   replace the old one. Then the old memory gets freed after that set.

   So if you take a memcached instance with 0 free chunks, and do a rolling
   replacement of all items (within the same slab class as before), the 
 first
   one would cause an eviction from the tail of the LRU to get a free 
 chunk.
   Every SET after that would use the chunk freed from the replacement of 
 the
   previous memory.

After that last sentence I realized I also may not have explained 
 well enough the access pattern. The keys are all overwritten every day, but 
 it takes some time to write them all (obviously). We see a huge increase in 
 the bytes metric as if the new data for the old keys was being written for 
 the first time. Since the old slab for
   the same key doesn't
proactively release memory, it starts to fill up the cache and then 
 start evicting data in the new slab. Once that happens, we see evictions in 
 the old slab because of the algorithm you mentioned (random picking / freeing 
 of memory). Typically we don't see

Re: Does memcached releases free pages?

2015-07-17 Thread Dormando
correct, it does not free the pages

 On Jul 17, 2015, at 1:48 AM, Denis Tataurov sineed...@gmail.com wrote:
 
 Hi! I use memcached in my project and I want to know whether memcached will 
 free its pages or not.
 A page is 1 MB in size. Here are the statistics for the biggest slab in my 
 memcached:
 
 STAT 5:chunk_size 240
 STAT 5:chunks_per_page 4369
 STAT 5:total_pages 730
 STAT 5:total_chunks 3189370
 STAT 5:used_chunks 1992035
 STAT 5:free_chunks 1197335
 STAT 5:free_chunks_end 0
 STAT 5:mem_requested 456248432
 STAT 5:get_hits 21520936
 STAT 5:cmd_set 4623788
 STAT 5:delete_hits 0
 STAT 5:incr_hits 0
 STAT 5:decr_hits 0
 STAT 5:cas_hits 0
 STAT 5:cas_badval 0
 STAT 5:touch_hits 0
 
 In this slab I see that about 35% of the chunks are free. I would expect that 
 some pages contain no used chunks at all, so they are totally free.
 The question is: will memcached release these pages so that other slab classes 
 can use them when they run short of free chunks?
 I suppose that it doesn't release pages, but I want to make sure that's 
 true.
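
As a rough upper bound from the numbers above (my arithmetic, not part of the
stats output):

    free_chunks / chunks_per_page = 1197335 / 4369 ~= 274 pages' worth of free chunks

so even in the best case only ~274 of the 730 pages could be completely empty,
and since the free chunks are scattered across pages the number of pages with
no used chunks at all will be smaller still.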

Re: Check for orphaned items in lru crawler thread

2015-07-13 Thread dormando
, Dormando wrote:
   Hey,

   On Fri, 10 Jul 2015, Scott Mansfield wrote:

We've seen issues recently where we run a cluster that typically has 
 the majority of items overwritten in the same slab every day and a sudden 
 change in data size evicts a ton of data, affecting downstream systems. To be 
 clear that is our problem, but I think there's a tweak in memcached that 
 might be useful and another possible feature that
   would be even
better.
The data that is written to this cache is overwritten every day, 
 though the TTL is 7 days. One slab takes up the majority of the space in the 
 cache. The application wrote e.g. 10KB (slab 21) every day for each key 
 consistently. One day, a change occurred where it started writing 15KB (slab 
 23), causing a migration of data from one slab to
   another. We had -o
slab_reassign,slab_automove=1 set on the server, causing large 
 numbers of evictions on the initial slab. Let's say the cache could hold the 
 data at 15KB per key, but the old data was not technically TTL'd out in its 
 old slab. This means that memory was not being freed by the lru crawler 
 thread (I think) because its expiry had not come
   around. 
   
lines 1199 and 1200 in items.c:
if ((search->exptime != 0 && search->exptime < current_time) || 
 is_flushed(search)) {
   
If there was a check to see if this data was orphaned, i.e. that 
 the key, if accessed, would map to a different slab than the current one, 
 then these orphans could be reclaimed as free memory. I am working on a patch 
 to do this, though I have reservations about performing a hash on the key on 
 the lru crawler thread (if the hash is not
   already available).
I have very little experience in the memcached codebase so I don't 
 know the most efficient way to do this. Any help would be appreciated.

   There seems to be a misconception about how the slab classes work. A 
 key,
   if already existing in a slab, will always map to the slab class it
   currently fits into. The slab classes always exist, but the amount of
   memory reserved for each of them will shift with the slab_reassign. ie: 
 10
   pages in slab class 21, then memory pressure on 23 causes it to move 
 over.

   So if you examine a key that still exists in slab class 21, it has no
   reason to move up or down the slab classes.

Alternatively, and possibly more beneficial is compaction of data in 
 a slab using the same set of criteria as lru crawling. Understandably, 
 compaction is a very difficult problem to solve since moving the data would 
 be a pain in the ass. I saw a couple of discussions about this in the mailing 
 list, though I didn't see any firm thoughts about
   it. I think it
can probably be done in O(1) like the lru crawler by limiting the 
 number of items it touches each time. Writing and reading are doable in O(1) 
 so moving should be as well. Has anyone given more thought on compaction?

   I'd be interested in hacking this up for you folks if you can provide me
   testing and some data to work with. With all of the LRU work I did in
   1.4.24, the next things I wanted to do is a big improvement on the slab
   reassignment code.

   Currently it picks essentially a random slab page, empties it, and moves
   the slab page into the class under pressure.

   One thing we can do is first examine for free memory in the existing 
 slab,
   IE:

   - Take a page from slab 21
   - Scan the page for valid items which need to be moved
   - Pull free memory from slab 21, migrate the item (moderately 
 complicated)
   - When the page is empty, move it (or give up if you run out of free
   chunks).

   The next step is to pull from the LRU on slab 21:

   - Take page from slab 21
   - Scan page for valid items
   - Pull free memory from slab 21, migrate the item
     - If no memory free, evict tail of slab 21. use that chunk.
   - When the page is empty, move it.

   Then, when you hit this condition your least-recently-used data gets
   culled as new data migrates your page class. This should match a natural
   occurrance if you would already be evicting valid (but old) items to 
 make
   room for new items.

   A bonus to using the free memory trick, is that I can use the amount of
   free space in a slab class as a heuristic to more quickly move slab 
 pages
   around.

   If it's still necessary from there, we can explore upgrading items to 
 a
   new slab class, but that is much much more complicated since the item 
 has
   to shift LRU's. Do you put it at the head, the tail, the middle, etc? It
   might be impossible to make a good generic decision there.

   What version are you currently on? If 1.4.24, have you seen any
   instability? I'm currently torn between fighting a few

Re: Check for orphaned items in lru crawler thread

2015-07-11 Thread dormando
Hey,

On Fri, 10 Jul 2015, Scott Mansfield wrote:

 We've seen issues recently where we run a cluster that typically has the 
 majority of items overwritten in the same slab every day and a sudden change 
 in data size evicts a ton of data, affecting downstream systems. To be clear 
 that is our problem, but I think there's a tweak in memcached that might be 
 useful and another possible feature that would be even
 better.
 The data that is written to this cache is overwritten every day, though the 
 TTL is 7 days. One slab takes up the majority of the space in the cache. The 
 application wrote e.g. 10KB (slab 21) every day for each key consistently. 
 One day, a change occurred where it started writing 15KB (slab 23), causing a 
 migration of data from one slab to another. We had -o
 slab_reassign,slab_automove=1 set on the server, causing large numbers of 
 evictions on the initial slab. Let's say the cache could hold the data at 
 15KB per key, but the old data was not technically TTL'd out in its old 
 slab. This means that memory was not being freed by the lru crawler thread (I 
 think) because its expiry had not come around. 

 lines 1199 and 1200 in items.c:
 if ((search->exptime != 0 && search->exptime < current_time) || 
 is_flushed(search)) {

 If there was a check to see if this data was orphaned, i.e. that the key, 
 if accessed, would map to a different slab than the current one, then these 
 orphans could be reclaimed as free memory. I am working on a patch to do 
 this, though I have reservations about performing a hash on the key on the 
 lru crawler thread (if the hash is not already available).
 I have very little experience in the memcached codebase so I don't know the 
 most efficient way to do this. Any help would be appreciated.

There seems to be a misconception about how the slab classes work. A key,
if already existing in a slab, will always map to the slab class it
currently fits into. The slab classes always exist, but the amount of
memory reserved for each of them will shift with the slab_reassign. ie: 10
pages in slab class 21, then memory pressure on 23 causes it to move over.

So if you examine a key that still exists in slab class 21, it has no
reason to move up or down the slab classes.
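
To make that concrete, here is a minimal sketch (my own simplification, not
the actual slabs.c code) of how an item's total size selects its slab class:
the per-class chunk sizes are fixed at startup, growing by a factor, and an
item always lands in the smallest class whose chunk can hold it. Rewriting a
key with a larger value therefore just allocates from a larger class; it never
changes the class the old copy lived in.

/* sketch only: simplified, not memcached source */
#include <stddef.h>

#define MAX_CLASSES 64

static size_t chunk_size[MAX_CLASSES];
static int num_classes;

/* Build chunk sizes: start at `base` bytes and grow by `factor` up to chunk_max. */
static void init_classes(size_t base, double factor, size_t chunk_max) {
    size_t sz = base;
    num_classes = 0;
    while (sz <= chunk_max && num_classes < MAX_CLASSES) {
        chunk_size[num_classes++] = sz;
        sz = (size_t)(sz * factor);
    }
}

/* Smallest class whose chunks can hold the item, or -1 if it is too large. */
static int size_to_class(size_t total_item_size) {
    for (int i = 0; i < num_classes; i++)
        if (total_item_size <= chunk_size[i])
            return i;
    return -1;
}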

 Alternatively, and possibly more beneficial is compaction of data in a slab 
 using the same set of criteria as lru crawling. Understandably, compaction is 
 a very difficult problem to solve since moving the data would be a pain in 
 the ass. I saw a couple of discussions about this in the mailing list, though 
 I didn't see any firm thoughts about it. I think it
 can probably be done in O(1) like the lru crawler by limiting the number of 
 items it touches each time. Writing and reading are doable in O(1) so moving 
 should be as well. Has anyone given more thought on compaction?

I'd be interested in hacking this up for you folks if you can provide me
testing and some data to work with. With all of the LRU work I did in
1.4.24, the next things I wanted to do is a big improvement on the slab
reassignment code.

Currently it picks essentially a random slab page, empties it, and moves
the slab page into the class under pressure.

One thing we can do is first examine for free memory in the existing slab,
IE:

- Take a page from slab 21
- Scan the page for valid items which need to be moved
- Pull free memory from slab 21, migrate the item (moderately complicated)
- When the page is empty, move it (or give up if you run out of free
chunks).

The next step is to pull from the LRU on slab 21:

- Take page from slab 21
- Scan page for valid items
- Pull free memory from slab 21, migrate the item
  - If no memory free, evict tail of slab 21. use that chunk.
- When the page is empty, move it.

Then, when you hit this condition your least-recently-used data gets
culled as new data migrates your page class. This should match a natural
occurrance if you would already be evicting valid (but old) items to make
room for new items.
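
Putting those two lists together, a minimal sketch (not memcached's code; the
helpers below are hypothetical) of the page-drain loop looks roughly like this:

#include <stdbool.h>
#include <stddef.h>

struct item { bool live; /* key, value and LRU links omitted */ };
struct page { struct item *chunks; size_t nchunks; };

/* Hypothetical helpers assumed for this sketch. A real implementation must
 * also make sure the destination chunk is not inside the page being drained. */
struct item *alloc_free_chunk(int clsid);          /* NULL if the class has no free chunk */
struct item *evict_lru_tail(int clsid);            /* evict the tail item, return its chunk */
void relocate(struct item *dst, struct item *src); /* copy data, relink hash table and LRU */

/* Try to empty one page of class clsid; true means the page can now be
 * handed to the slab class under memory pressure. */
static bool drain_page(struct page *pg, int clsid) {
    for (size_t i = 0; i < pg->nchunks; i++) {
        struct item *it = &pg->chunks[i];
        if (!it->live)
            continue;                     /* chunk already free */
        struct item *dst = alloc_free_chunk(clsid);
        if (dst == NULL)
            dst = evict_lru_tail(clsid);  /* the "evict the tail" variant above */
        if (dst == NULL)
            return false;                 /* out of options: give up on this page */
        relocate(dst, it);
        it->live = false;
    }
    return true;
}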

A bonus to using the free memory trick, is that I can use the amount of
free space in a slab class as a heuristic to more quickly move slab pages
around.

If it's still necessary from there, we can explore upgrading items to a
new slab class, but that is much much more complicated since the item has
to shift LRU's. Do you put it at the head, the tail, the middle, etc? It
might be impossible to make a good generic decision there.

What version are you currently on? If 1.4.24, have you seen any
instability? I'm currently torn between fighting a few bugs and start on
improving the slab rebalancer.

-Dormando

Re: what could change between .13 and .21 that caused perf degradation?

2015-07-07 Thread dormando
What happens with more than one connection? A lot of things changed to
increase the scalability of it but single thread might be meh.

What're the memcached start args? What exactly is that test doing?
gets/sets? Is .24 any better?
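
For example (hypothetical numbers, only to illustrate the point), something like

src/mcperf --linger=0 --call-rate=0 --num-calls=1000 --conn-rate=0 --num-conns=100 --sizes=d1

spreads a comparable request count over 100 connections instead of pushing
everything down a single one, which exercises the multi-threaded paths much
better.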

On Tue, 7 Jul 2015, Denis Samoylov wrote:

 hi,
 Does anybody know what can change between .13 ( STAT version 1.4.13, STAT 
 libevent 1.4.13-stable) and  .21 (STAT version 1.4.21 STAT libevent 
 2.0.21-stable)?

 memcached became about 15-20% slower.  We discovered this in production but 
 there are too many moving parts. So I used twemperf 
 (https://github.com/twitter/twemperf) to measure on dev server


 result for .13 (repeated many times)

 [dsamoylov.dev twemperf (master)]$ src/mcperf --linger=0 --call-rate=0 
 --num-calls=10 --conn-rate=0 --num-conns=1 --sizes=d1

 Total: connections 1 requests 10 responses 10 test-duration 9.731 s 
 (10.160 s, 9.981 s)

 Connection rate: 0.1 conn/s (9731.1 ms/conn = 1 concurrent connections)
 Connection time [ms]: avg 9731.1 min 9731.1 max 9731.1 stddev 0.00
 Connect time [ms]: avg 0.2 min 0.2 max 0.2 stddev 0.00

 Request rate: 10276.3 req/s (0.1 ms/req)
 Request size [B]: avg 28.0 min 28.0 max 28.0 stddev 0.00

 Response rate: 10276.3 rsp/s (0.1 ms/rsp)
 Response size [B]: avg 8.0 min 8.0 max 8.0 stddev 0.00
 Response time [ms]: avg 0.1 min 0.0 max 18.0 stddev 0.00
 Response time [ms]: p25 1.0 p50 1.0 p75 1.0
 Response time [ms]: p95 1.0 p99 1.0 p999 4.0
 Response type: stored 10 not_stored 0 exists 0 not_found 0
 Response type: num 0 deleted 0 end 0 value 0
 Response type: error 0 client_error 0 server_error 0

 Errors: total 0 client-timo 0 socket-timo 0 connrefused 0 connreset 0
 Errors: fd-unavail 0 ftab-full 0 addrunavail 0 other 0

 CPU time [s]: user 1.18 system 3.50 (user 12.2% system 36.0% total 48.2%)
 Net I/O: bytes 3.4 MB rate 361.3 KB/s (3.0*10^6 bps)


 result for .21 (repeated many times)
 [dsamoylov.dev twemperf (master)]$ src/mcperf --linger=0 --call-rate=0 
 --num-calls=10 --conn-rate=0 --num-conns=1 --sizes=d1

 Total: connections 1 requests 10 responses 10 test-duration 12.328 s 
 (11.230 s, 12.713 s)

 Connection rate: 0.1 conn/s (12328.4 ms/conn = 1 concurrent connections)
 Connection time [ms]: avg 12328.4 min 12328.4 max 12328.4 stddev 0.00
 Connect time [ms]: avg 0.3 min 0.3 max 0.3 stddev 0.00

 Request rate: 8111.4 req/s (0.1 ms/req)
 Request size [B]: avg 28.0 min 28.0 max 28.0 stddev 0.00

 Response rate: 8111.4 rsp/s (0.1 ms/rsp)
 Response size [B]: avg 8.0 min 8.0 max 8.0 stddev 0.00
 Response time [ms]: avg 0.1 min 0.0 max 28.3 stddev 0.00
 Response time [ms]: p25 1.0 p50 1.0 p75 1.0
 Response time [ms]: p95 1.0 p99 1.0 p999 5.0
 Response type: stored 10 not_stored 0 exists 0 not_found 0
 Response type: num 0 deleted 0 end 0 value 0
 Response type: error 0 client_error 0 server_error 0

 Errors: total 0 client-timo 0 socket-timo 0 connrefused 0 connreset 0
 Errors: fd-unavail 0 ftab-full 0 addrunavail 0 other 0

 CPU time [s]: user 3.38 system 7.44 (user 27.4% system 60.3% total 87.7%)
 Net I/O: bytes 3.4 MB rate 285.2 KB/s (2.3*10^6 bps)

 same server is used, clean just restarted memcached daemon.




Re: memcached flush_all failure

2015-05-13 Thread dormando
What version of memcached are you using?

On Wed, 13 May 2015, Miaolong Zhang wrote:

 Something went wrong when I used the memcached command flush_all.
 I use memcached to store CSV files. Every time I changed a CSV file I then 
 flushed memcached using flush_all, but it failed: 
 although memcached replied OK, the new content could not be found in memcached 
 (of course, after flushing memcached the new file had been loaded again).
 When I deleted the key and loaded the CSV file again, bingo... the new 
 content could be found. 

 I don't know why. Or is this a bug?

 thanks






release 1.4.23

2015-04-20 Thread dormando
https://code.google.com/p/memcached/wiki/ReleaseNotes1423

After a long delay I fixed the last known bugs in the lru_rework branch
and decided to release it. If you run a large cluster, this could have a
huge positive impact on your hit ratio. Hit ratio is life. All for the hit
ratio!

-Dormando


Re: Memcached implementation analysis and comparison with Redis

2015-02-16 Thread dormando

 Hello Dormando, thanks for the detailed and constructive reply.

 On Tuesday, 17 February 2015 01:39:02 UTC+7, Dormando wrote:
   Yo,

   On Mon, 16 Feb 2015, Roman Leventov wrote:

Hello Memcached users and developers,
I've analyzed the Memcached implementation and suggested some improvements
here: 
 http://key-value-stories.blogspot.com/2015/02/memcached-internals-design.html
   
If I'm wrong in some details or conclusions, please correct me 
 instead of blaming me; my only concern is making the article as relevant as
   possible.

   The basic overview of internals seems largely correct, but as soon as 
 you
   start recommending things it goes weird. I'll answer points below, but
   I'll counter some unsolicited advice with some of my own: you should
   really verify ideas with simple tests rather than simply write them out
   without backing them up.

 In this case preparation of a single post would take a month instead of a week. 
 I would throw away some hypotheses which haven't been proven in my
 synthetic tests; however, this could be because I built the wrong workload model 
 or applied a slightly wrong approach. Just pronouncing many ideas (even
 controversial and imprecise ones) seems to be better: it generates discussions, 
 from which some really valuable ideas or conclusions could emerge.

This isn't generally true. In my long years of maintaining memcached,
every post like yours builds up technical debt that I have to prove
against when I make my own statements. People will blindly accept what
you've written as truth. It's not really your fault... it's just the way
it is. People still cite blog posts and papers from 6+ years ago when
comparing memcached to things.

Discussions on mailing lists/etc tend to go just fine though.

    There is a surprising text area " flags length\r\n" (the value length 
 is
   meant) between the stored key and value. Apparently, it is added for
   debugging,

   It's not for debugging. This is part of the ASCII protocol:

 Ok, updated the post. I have no one to ask about Memcached (and any other 
 database), so use another approach:
 make assumption and wait until someone prove/refute it. 

Why couldn't you have asked here first? You also pinged me on twitter so
you could've just asked :P

    Even though the whole memory dedicated for entry allocation could 
 be
    malloc()-ed at the start of the memcached process (it is done so if 
 memcached
    is configured to use large pages), entries of a size class are
    established in chunks of the maximum entry size (i. e. 1 MB). This chunk
    itself is also called a slab, which is a bit confusing.

   It originally did that, but so many people couldn't make large 
 allocations
   at start time it was changed to as-needed.

 Hm. https://github.com/memcached/memcached/blob/master/memcached.c#L5245

I'm not talking about largepages (which aren't even implemented for linux
right now). It's assumed that if you intend to use large/hugepages you're
equipped enough to make sure there's enough free contiguous memory when
you start the daemon.

   Mutex locks are slow, because they cause two context switches. Context
   switch cost estimates vary in a wide range, depending on if this is 
 just a
   in-process thread switch, or an OS-level process switch (requiring much
   more kernel operations and TLB cache flush), CPU model, workload, if the
   server is dedicated for Memcached or have other intensively-working
   processes, and so on.

   This is all baseless speculation. It was spinlocking on a few mutexes
   (without a time limit, I guess). I was never able to get it to be an
   improvement in performance, and as the number of worker threads grows 
 the
   performance would worsen. Removing all of them produced significantly
   higher performance and better latency averages. The locks are held for
   generally consistent amounts of times (no syscalls/etc) so spinlocking
   waiting for a lock just burns cpu for no reason.

   Spinlocks don't seem to be much help outside of the kernel. I don't 
 really
   understand why people are so obsessed with them. If two worker threads
   block on the same mutex, the OS can tell which one to wake up sooner. In
   the spinlock case you could spend an entire timeslice (or two) on the
   wrong thread. Unless there's a POSIX primitive for spinlocking which
   introduces dependencies...

   Even within the linux kernel they're pretty obnoxious. I speak from 
 having
   to peel them out constantly at $dayjob.

   Also, don't forget futexes. For the per-thread statistics lock futexes
   make it an agreeable situation. Assuming it's terrible is how we got
   twemcache's release announcement to claim they've significantly 
 improved
   performance despite simple benchmarking proving otherwise.

   In the future I

Re: Memcached implementation analysis and comparison with Redis

2015-02-16 Thread dormando
 On Tuesday, 17 February 2015 03:38:55 UTC+7, Dormando wrote:

   Again, in actual benchmarks I've not been able to prove them to be a
   problem. In one of the gist links I provided before I show an all miss
   case, which acquires/releases a bucket lock but does not have the 
 overhead
   of processing the value. In those cases it was able to process over 50
   million keys per second. The internals don't tend to be the slow part
   anymore.

 Seems that that was with 32 threads. Hence throughput per thread is 1.5 million 
 keys/sec, very roughly.
 That means 600-700 ns/op latency. (Or about 300 ns, if that was with 16 
 threads.)
 Maybe it's not the major part of the thousands of ns an average memcached op
 takes now, but it is a considerable amount to optimize.
 An uncontended spin lock should take only dozens of ns.
 Also, if it just queries an empty table, the cache is uncontended and the memory of 
 the mutex structures is not evicted
 as quickly as under normal conditions. So such a test tends to show faster 
 mutex ops than they actually are.

I don't care at all about optimizing up from 1.5m keys/sec per thread.
When fetching real values it got up to 20m keys/sec for 32 threads. That's
more than you can possibly need for a few more years until hardware gets
cheaper.

There's honestly nobody even asking for better performance... it's just an
area of academic study for a lot of people (see one of the dozens of
papers on modifying memcached).


 What about optimizing snprintf()?
 - For a given slab class, we know the highest digit of the value length
 - hand-written itoa()s instead of snprintf()?
 - Optimize / precompute the most probable combinations of flags in decimal repr
 - If values are consistently sized (or is that rarely the case for memcached?) or 
 fall into one of several size classes,
   an ultra-thin hash table with precomputed value lengths should help

Optimizing the snprintf could help write speed a chunk. There are a few good
approaches to it. It's low priority for me (connection work and better
slab rebalancing would be more useful to an end user), but good patches
are welcome.
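
As a sketch of the hand-rolled itoa() idea (one possible shape, an assumption
rather than memcached's actual code), the " flags bytes\r\n" suffix could be
built without snprintf() like this:

#include <stddef.h>
#include <stdint.h>

/* Write v in decimal into buf; return the number of bytes written. */
static size_t u32toa(uint32_t v, char *buf) {
    char tmp[10];                 /* 2^32-1 needs at most 10 digits */
    size_t n = 0;
    do {
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    for (size_t i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];  /* digits come out backwards; reverse them */
    return n;
}

/* Build " <flags> <nbytes>\r\n"; buf needs at most 24 bytes. */
static size_t make_suffix(uint32_t flags, uint32_t nbytes, char *buf) {
    size_t p = 0;
    buf[p++] = ' ';
    p += u32toa(flags, buf + p);
    buf[p++] = ' ';
    p += u32toa(nbytes, buf + p);
    buf[p++] = '\r';
    buf[p++] = '\n';
    return p;
}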

The best thing anyone can do is help test the branches I'm actually
working on. If I can't release these then we don't move forward and
nothing happens at all :/

We're not sponsored like redis is. I've only ever lost money on this
venture.
