These changes will be available at

A README file and a detailed design document are currently available.

On Saturday, 18 May 2013 15:24:36 UTC+5:30, Sunil Patil wrote:
> Hi,
> We have made some changes in memcached for doing "Data filtering at 
> server". We would like to open source this and contribute to memcached. We 
> can provide you the patch for review. We have developed some tests (which 
> people could try out) that show benefits of this i.e. "data filtering at 
> server".
> Please let me know your thoughts.
> Thanks,
> Sunil
> About Changes:
> - With these changes we can do "data filtering at server". This is good
> for multi-get queries, e.g. queries issued in social networking applications
> where data related to all friends of a user is read and processed, and only
> the filtered data is returned to the user. The filtered data is often a very
> small subset of the data that was read.
> - On a side note not related to the memcached server, we also plan to
> implement data colocation in the memcached client (all of a user's friends'
> data will be stored on a single server, or on very few servers), so that very
> few servers are contacted during query processing. This would further
> complement data filtering.
> Changes:
> 1. Added two new options to memcached server (-x and -y):
> # ./memcached -h
> …
> -x <num> -y <filter library path>
>               Enable data filtering at server - helps in multi-get operations
>               <num> = 1 - Data filtering at server enable (no deserialized data)
>                           Data deserialized at the time of query processing
>               <num> = 2 - Data filtering at server enable (with deserialized data)
>                           Uses more memory but gives better performance. Avoids data
>                           deserialization at the time of query processing and
>                           saves CPU cycles
>               <filter library path> - path of filter library ''
>                           This library implements filtering functions and data
>                           serialization/deserialization functions
> 2. When filtering is enabled, on a "get" query we read the data of all keys
> and pass it to a filtering function implemented in the user-provided library
> "". The "dlopen"/"dlsym" framework is used for opening the user-provided
> library and calling its functions. The user has to define only three
> functions: "deserialize()", "free_msg()" and "readfilter()". We plan to
> introduce a new command, "fget" (filter get), for this functionality, wherein
> the client could additionally pass arguments to the filter function and
> could use multiple filtering functions (i.e. work with multiple filter
> libraries).
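
To make the dlopen/dlsym step concrete, here is a rough sketch of how a server could load such a library and resolve the three functions. The function names come from the post, but their prototypes are not given there, so the typedefs below are guesses, and the struct and helper names are hypothetical as well.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical prototypes: the post names the functions, not their signatures. */
typedef void *(*deserialize_fn)(const char *data, size_t len);
typedef void (*free_msg_fn)(void *msg);
typedef int (*readfilter_fn)(void **msgs, size_t nmsgs,
                             char *out, size_t out_cap);

struct filter_lib {
    void *handle;
    deserialize_fn deserialize;
    free_msg_fn free_msg;
    readfilter_fn readfilter;
};

/* Releases the library handle, if any. */
static void filter_lib_close(struct filter_lib *lib)
{
    if (lib->handle != NULL) {
        dlclose(lib->handle);
        lib->handle = NULL;
    }
}

/* Loads the library at path and resolves all three symbols.
 * Returns 0 on success, -1 if the library or any symbol is missing. */
static int filter_lib_open(const char *path, struct filter_lib *lib)
{
    assert(path != NULL && lib != NULL);

    lib->deserialize = NULL;
    lib->free_msg = NULL;
    lib->readfilter = NULL;
    lib->handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (lib->handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    /* The *(void **) form is the POSIX-documented way to store a dlsym
     * result in a function pointer without a pedantic cast warning. */
    *(void **)(&lib->deserialize) = dlsym(lib->handle, "deserialize");
    *(void **)(&lib->free_msg) = dlsym(lib->handle, "free_msg");
    *(void **)(&lib->readfilter) = dlsym(lib->handle, "readfilter");

    if (lib->deserialize == NULL || lib->free_msg == NULL ||
        lib->readfilter == NULL) {
        fprintf(stderr, "filter library is missing a required function\n");
        filter_lib_close(lib);
        return -1;
    }
    return 0;
}
```

On older glibc versions this needs to be linked with -ldl. Resolving all three symbols at startup means a bad library is reported before any query is served.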
> Currently the changes are implemented for the Linux platform (tested on
> RHEL 5.6) and were made against memcached version "memcached-1.4.13".
> They cover the ASCII protocol only (not the binary protocol) and have no
> impact on "gets" (get with CAS) functionality.
> Performance enhancement:
> Some of the advantages of this are (for multi-get queries with 
> characteristics mentioned above):
> - Better throughput and latency under normal query-load conditions => can 
> result in client consolidation
> - Since most data is filtered at the server, very little data flows over
> the network (from server to client). This avoids the network congestion (and
> the resulting latencies/delays) which might happen under high query load
> with normal memcached.
> - Performance with these changes (for multi-get queries with the
> characteristics mentioned above) is 3x to 7x better than normal
> memcached as shown below.
> Tests performed:
> - Setup details:
> 1 memcached server, RHEL 6.1, 64 bit, 16 core, 24 GB RAM, 1 Gb ethernet 
> card
> 1 memcached client, RHEL 6.1, 64 bit, 16 core, 24 GB RAM, 1 Gb ethernet 
> card
> - Test details:
> There are one million users (each user represented by a unique key). Each 
> user has 100 friends. Each user has 30 records of type (userId, articleId, 
> timestamp) stored as value. On READ query for a user, all records 
> associated with all friends of that user are READ, sorted in increasing 
> order of timestamp, and top/latest 10 records across all friends are 
> returned as output. So basically on READ query 100 keys (100*30=3000 
> records) are read, 3000 records are sorted and top 10 records are returned 
> as output.
> - For normal memcached all these operations of READING 100 keys, sorting 
> 3000 records, and finding top 10 records are done on client.
> - With our changes (where filtering (sorting) happens on the server), 100
> keys are read on the server, 3000 records are sorted locally by the filtering
> function (implemented in the user-provided library; the processing is the
> same as what the client would otherwise do), and only 10 records are sent to
> the client.
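
The filtering step in this test can be sketched as below. This is an illustration of the described workload, not the code from the patch: the record layout and function names are assumptions, and it sorts newest first so that the latest records sit at the front, whereas the post describes sorting in increasing order and then taking the latest 10.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout; the post only says (userId, articleId, timestamp). */
struct record {
    uint64_t user_id;
    uint64_t article_id;
    int64_t timestamp;
};

/* qsort comparator that orders records newest first. */
static int cmp_newest_first(const void *a, const void *b)
{
    const struct record *ra = a;
    const struct record *rb = b;

    if (ra->timestamp > rb->timestamp)
        return -1;
    if (ra->timestamp < rb->timestamp)
        return 1;
    return 0;
}

/* Sorts recs in place and copies up to n of the latest records into out.
 * Returns the number of records copied. */
static size_t top_latest(struct record *recs, size_t count,
                         struct record *out, size_t n)
{
    size_t k = count < n ? count : n;

    assert(recs != NULL || count == 0);
    assert(out != NULL || k == 0);

    if (count > 0)
        qsort(recs, count, sizeof(recs[0]), cmp_newest_first);
    for (size_t i = 0; i < k; i++)
        out[i] = recs[i];
    return k;
}
```

For the workload above, recs would hold the 100 * 30 = 3000 records gathered from the friends' keys and n would be 10, so the reply carries 10 records instead of 3000, i.e. 300 times fewer.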
> We created a multithreaded CLIENT application which issues READ queries
> asynchronously (multiple threads are used for issuing and processing READ
> queries). READ queries are issued for a varying number of users, from
> 1 user to 30000 users. The time taken to complete these queries is used to
> compute throughput and latency. See the attachments for perf. results.

