Hi,

I have only about six months of experience with memcached, so I am really 
not aware of its production-side issues. When we talked to our production 
team working on a social application, they said 80% of their queries are 
multi-gets.

Initially we had some apprehensions about doing processing on the server, 
as that would go against memcached's philosophy of serving data quickly, as 
you correctly said. But if a system has 80% multi-get traffic, then for 
this pattern doing the processing on the server really helps. We were able 
to get the performance normally obtained with 4 clients by using just 1 
client (client consolidation).

My answer to all your questions would be YES. I feel it depends on the use 
case which store you want to use for your social application. Our systems 
do use memcached as a distributed in-memory cache, and there is a 
disk-backed KV database (a distributed, reliable, scalable persistent data 
store) that is accessed on a cache miss.
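The read path described above (memcached first, then the persistent KV store on a miss) is the classic cache-aside pattern. Here is a toy sketch with both stores stubbed out as in-memory tables; the real code would of course talk to memcached and the database, and all names here are made up for illustration:

```c
/* Cache-aside read-path sketch. Both "stores" are stubbed with small
   fixed-size tables so the example is self-contained. */
#include <stdio.h>
#include <string.h>

#define SLOTS 8

typedef struct { char key[32]; char val[64]; int used; } entry_t;

static entry_t cache[SLOTS]; /* stand-in for memcached */
static entry_t db[SLOTS];    /* stand-in for the disk-backed KV store */

static const char *store_get(entry_t *t, const char *k) {
    for (int i = 0; i < SLOTS; i++)
        if (t[i].used && strcmp(t[i].key, k) == 0) return t[i].val;
    return NULL;
}

static void store_set(entry_t *t, const char *k, const char *v) {
    for (int i = 0; i < SLOTS; i++)
        if (!t[i].used) { /* sketch: first free slot, no update semantics */
            snprintf(t[i].key, sizeof t[i].key, "%s", k);
            snprintf(t[i].val, sizeof t[i].val, "%s", v);
            t[i].used = 1;
            return;
        }
}

/* Cache-aside read: serve from the cache when possible, otherwise read
   the persistent store and warm the cache for the next reader. */
const char *read_value(const char *key) {
    const char *v = store_get(cache, key);
    if (v) return v;                 /* cache hit */
    v = store_get(db, key);          /* cache miss: go to the KV store */
    if (v) store_set(cache, key, v); /* repopulate the cache */
    return v;
}
```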

But I would be really interested in knowing the use case that made you give 
up on memcached and come up with a different architecture.

Thanks,
Sunil

On Tuesday, 23 July 2013 10:57:19 UTC+5:30, nEosAg wrote:
>
> Hi Sunil,
>
> I have gone through a few posts, and I know you have done a significant 
> amount of work. But I would like to share a few opinions from my 
> experience, with no offence intended.
>
> 1. Do you think memcached is the perfect store for a "social" application?
> 2. Do you think the patches that you have made are "scalable"?
> 3. *If we provide a mechanism for updating data in-place/on-server then 
> this operation would become fast and there won't be any network 
> traffic/load.* Can you make this atomic? Even if yes, are you OK with 
> whatever you have to sacrifice for it?
>
> etc.
>
> Memcached is a KV store built with the philosophy of returning results 
> very quickly for a "known key". KV stores are not built for "social" 
> applications; they can be used to fetch known data sets, but they have to 
> be used in conjunction with something else.
>
> I made the same *mistake* by choosing the wrong store; I have been running 
> the same system for the last 4 years with millions of users having 
> thousands of friends and so on. But I know it is the wrong store. We have 
> faced many issues and done a lot of code patching, and we have now 
> proposed a different architecture.
>
> Please think about it..
>
> Regards,
>
>
> On Monday, 22 July 2013 18:02:39 UTC+5:30, Sunil Patil wrote:
>>
>> Interesting.
>>
>> Basically all our internal systems (in production) are tightly integrated 
>> with memcached (hard to change; I guess many other products also rely 
>> heavily on memcached), so we decided to provide this functionality in 
>> memcached.
>>
>> Just to mention, we found that the "one instance per core" approach (as 
>> in Redis) performs more slowly than a "single multi-threaded instance" as 
>> far as data filtering of multi-get queries is concerned (say, on a single 
>> server), as mentioned in the README. For a single multi-get query, a 
>> "single multi-threaded instance" returns only one set of filtered data, 
>> whereas with "one instance per core" each core does its filtering 
>> separately (assuming data is distributed randomly among cores) and 
>> returns its own set of filtered data for the same query, so more 
>> data/packets per query flow over the network.
>>
>> Thanks,
>> Sunil
>>  
>>
>>
>> On Mon, Jul 22, 2013 at 3:23 PM, Rohit Karlupia <iamr...@gmail.com>wrote:
>>
>>> Take a look at cacheismo. It supports memcached protocol and provides 
>>> fully scriptable server side runtime. 
>>>
>>> thanks, 
>>> rohitk
>>> On Jul 22, 2013 3:19 PM, "Sunil Patil" <sun...@gmail.com> wrote:
>>>
>>>>  Hi,
>>>>
>>>> All the changes ("memcached code with support for doing data filtering 
>>>> on the server for multi-get queries", somewhat similar to executing a 
>>>> Lua script on a Redis server but much more efficient) are now available 
>>>> at https://github.com/sunillp/sheep-memcached
>>>>
>>>> In addition, we have provided a sample filter library whose filtering 
>>>> functions are called to process/filter multi-get queries on the server.
>>>> We have also provided a "memcached client" which measures performance 
>>>> (throughput and latency) for multi-get queries. This client can be used 
>>>> to see the gains that can be achieved by doing data filtering on the 
>>>> server. Details of usage/experiments are given in the README file under 
>>>> the section "BUILDING/TESTING", available at 
>>>> https://github.com/sunillp/sheep-memcached
>>>>  
>>>> We plan to support many more features using this filter-library 
>>>> framework, basically operations that can be performed on the server 
>>>> itself without the need to read data back to the client and process it 
>>>> there. For example, pre-processing data before writing into the 
>>>> memcached server on SET. This is like a read-modify-update operation: 
>>>> data is read from the server to the client, updated/modified on the 
>>>> client, and then sent back and stored on the server. If we provide a 
>>>> mechanism for updating data in-place/on-server, this operation becomes 
>>>> fast and there is no extra network traffic/load.
>>>>
>>>> Let us know your feedback.
>>>>
>>>> Thanks,
>>>> Sunil
>>>>
>>>> On Saturday, 18 May 2013 15:24:36 UTC+5:30, Sunil Patil wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> We have made some changes in memcached for doing "data filtering at 
>>>>> the server". We would like to open source this and contribute it to 
>>>>> memcached. We can provide you the patch for review. We have developed 
>>>>> some tests (which people can try out) that show the benefits of data 
>>>>> filtering at the server.
>>>>>
>>>>> Please let me know your thoughts.
>>>>>
>>>>> Thanks,
>>>>> Sunil
>>>>>
>>>>> About Changes:
>>>>> - With these changes we can do "data filtering at the server". This is 
>>>>> good for multi-get queries, e.g. queries issued in social networking 
>>>>> applications where data related to all friends of a user is read, 
>>>>> processed, and the filtered data is returned to the user. The filtered 
>>>>> data is often a very small subset of the actual data that was read.
>>>>> - On a side note, not related to the memcached server, we also plan to 
>>>>> implement data colocation on the memcached client (all of a user's 
>>>>> friends' data will be stored on a single server, or very few servers), 
>>>>> so that very few servers are contacted during query processing. This 
>>>>> would further complement data filtering.
>>>>>
>>>>> Changes:
>>>>> 1. Added two new options to the memcached server (-x and -y):
>>>>> # ./memcached -h
>>>>> ...
>>>>> -x <num> -y <filter library path>
>>>>>               Enable data filtering at the server - helps multi-get
>>>>>               operations
>>>>>               <num> = 1 - Data filtering at the server enabled (no
>>>>>                           deserialized data). Data is deserialized at
>>>>>                           query-processing time.
>>>>>               <num> = 2 - Data filtering at the server enabled (with
>>>>>                           deserialized data). Uses more memory but gives
>>>>>                           better performance: avoids deserializing data
>>>>>                           at query-processing time and saves CPU cycles.
>>>>>               <filter library path> - path of the filter library
>>>>>                           'libfilter.so'. This library implements the
>>>>>                           filtering functions and the data
>>>>>                           serialization/deserialization functions.
>>>>>
>>>>> 2. With filtering enabled, on a "get" query we read the data for all 
>>>>> keys and pass it to a filtering function implemented in the 
>>>>> user-provided library "libfilter.so". The "dlopen"/"dlsym" framework is 
>>>>> used for opening the user-provided library and calling the 
>>>>> user-provided functions. The user has to define only three functions: 
>>>>> "deserialize()", "free_msg()" and "readfilter()". We plan to introduce 
>>>>> a new command "fget" (filtered get) for this functionality, wherein the 
>>>>> client could additionally pass arguments to the filter function and 
>>>>> could have multiple filtering functions (i.e. work with multiple filter 
>>>>> libraries).
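For illustration, here is a minimal sketch of what such a filter library might look like. The record layout, the function signatures, and the latest-N selection are all assumptions made up for this example; the actual interface is defined by the patch in the repository.

```c
/* Hypothetical sketch of the three libfilter.so entry points for the
   "latest N records by timestamp" workload described in this thread. */
#include <stdlib.h>
#include <string.h>

typedef struct { long user_id; long article_id; long ts; } record_t;
typedef struct { size_t n; record_t *recs; } msg_t;

/* deserialize(): turn one stored value (raw bytes) into records.
   Here the value is assumed to be a packed array of record_t. */
void *deserialize(const void *data, size_t len) {
    msg_t *m = malloc(sizeof *m);
    m->n = len / sizeof(record_t);
    m->recs = malloc(len);
    memcpy(m->recs, data, len);
    return m;
}

/* free_msg(): release whatever deserialize() allocated. */
void free_msg(void *msg) {
    msg_t *m = msg;
    free(m->recs);
    free(m);
}

static int by_ts_desc(const void *a, const void *b) {
    const record_t *x = a, *y = b;
    return (y->ts > x->ts) - (y->ts < x->ts);
}

/* readfilter(): merge all friends' records and keep the latest `top`.
   This is the server-side replacement for the client-side sort. */
size_t readfilter(void **msgs, size_t nmsgs, record_t *out, size_t top) {
    size_t total = 0;
    for (size_t i = 0; i < nmsgs; i++) total += ((msg_t *)msgs[i])->n;
    record_t *all = malloc(total * sizeof(record_t));
    size_t k = 0;
    for (size_t i = 0; i < nmsgs; i++) {
        msg_t *m = msgs[i];
        memcpy(all + k, m->recs, m->n * sizeof(record_t));
        k += m->n;
    }
    qsort(all, total, sizeof(record_t), by_ts_desc);
    size_t nout = total < top ? total : top;
    memcpy(out, all, nout * sizeof(record_t));
    free(all);
    return nout;
}
```

Such a file would be compiled with `-shared -fPIC` so that the server can load it via dlopen() and resolve the three symbols with dlsym().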
>>>>>
>>>>> Currently the changes are implemented for the Linux platform (tested 
>>>>> on RHEL 5.6), on top of memcached version "memcached-1.4.13". The 
>>>>> changes are made for the ASCII protocol only (not the binary protocol), 
>>>>> and there is no impact on the "gets" (get with CAS) functionality.
>>>>>
>>>>> Performance enhancement:
>>>>> Some of the advantages (for multi-get queries with the characteristics 
>>>>> mentioned above) are:
>>>>> - Better throughput and latency under normal query load, which can 
>>>>> result in client consolidation.
>>>>> - Since most data is filtered at the server, much less data flows over 
>>>>> the network (from server to client). This avoids the network congestion 
>>>>> (and the resulting latencies/delays) that can occur under high query 
>>>>> load with normal memcached.
>>>>> - Performance with these changes (for multi-get queries with the 
>>>>> characteristics mentioned above) is 3x to 7x better than normal 
>>>>> memcached, as shown below.
>>>>>
>>>>> Tests performed:
>>>>> - Setup details:
>>>>> 1 memcached server, RHEL 6.1, 64 bit, 16 core, 24 GB RAM, 1 Gb 
>>>>> ethernet card
>>>>> 1 memcached client, RHEL 6.1, 64 bit, 16 core, 24 GB RAM, 1 Gb 
>>>>> ethernet card
>>>>> - Test details:
>>>>> There are one million users (each represented by a unique key). Each 
>>>>> user has 100 friends. Each user has 30 records of the form (userId, 
>>>>> articleId, timestamp) stored as the value. On a READ query for a user, 
>>>>> all records associated with all friends of that user are read, sorted 
>>>>> by timestamp, and the latest 10 records across all friends are returned 
>>>>> as output. So on each READ query 100 keys (100*30 = 3000 records) are 
>>>>> read, the 3000 records are sorted, and the top 10 records are returned 
>>>>> as output.
>>>>>
>>>>> - With normal memcached, all these operations (reading 100 keys, 
>>>>> sorting 3000 records, and finding the top 10 records) are done on the 
>>>>> client.
>>>>> - With our changes (where the filtering/sorting happens on the server), 
>>>>> the server reads the 100 keys, the 3000 records are sorted locally by 
>>>>> the filtering function (implemented in the user-provided library; the 
>>>>> same processing is done on the server as would be done on the client), 
>>>>> and only 10 records are sent to the client.
>>>>>
>>>>> We created a multithreaded CLIENT application which issues READ queries 
>>>>> asynchronously (multiple threads are used for issuing and processing 
>>>>> READ queries). READ queries are issued for varying numbers of users, 
>>>>> from 1 user to 30000 users. The time taken to complete these queries is 
>>>>> used to compute throughput and latency. See the attachments for the 
>>>>> performance results.
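On the wire, each READ query in such a client boils down to a standard ASCII-protocol multi-get ("get key1 key2 ...\r\n"). Here is a toy sketch of composing one request line for a user's friends; the "user:<id>" key naming is a made-up convention for this example:

```c
/* Build a memcached ASCII multi-get request for a set of user IDs. */
#include <stdio.h>
#include <string.h>

/* Compose "get user:1 user:2 ...\r\n" into buf; returns the length. */
size_t build_multiget(char *buf, size_t cap, const long *ids, size_t n) {
    size_t off = (size_t)snprintf(buf, cap, "get");
    for (size_t i = 0; i < n && off < cap; i++)
        off += (size_t)snprintf(buf + off, cap - off, " user:%ld", ids[i]);
    off += (size_t)snprintf(buf + off, cap - off, "\r\n");
    return off;
}
```

With the server-side filter enabled, the response to such a request carries only the 10 filtered records instead of 3000, which is where the bandwidth saving comes from.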
>>>>>
>>>>  -- 
>>>>  
>>>> --- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "memcached" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to memcached+...@googlegroups.com.
>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>  
>>>>  
>>>>
>>>
>>
