For the source:
Do you plan to support enumerating all the keys via cachedump / lru_crawler
metadump / ...?
If there is an option which doesn't require enumerating the keys, how will
splitting be done (no splitting / splitting on slab ids / ...)?
Can the cache be read while it's still being modified? In other words, will a
snapshot effectively be taken (e.g. using a watcher), or is the cache
expected to be read-only, or possibly inconsistent, while it is being read?
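To make the metadump question concrete: as far as I know, `lru_crawler metadump all` prints one line per item, roughly `key=<url-encoded key> exp=... la=... cas=... fetch=... cls=<slab class> size=...`, followed by `END`. Below is a minimal sketch, assuming that line format, of parsing such a dump and grouping the keys by slab class, which is one possible way to split the read. `MetadumpSplitter` is a hypothetical name, and the socket I/O to the server is left out.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MetadumpSplitter {
  // Groups keys by slab class ("cls"). ASSUMED input: one "name=value ..." line
  // per item, terminated by an "END" line, as printed by `lru_crawler metadump`.
  static Map<Integer, List<String>> keysBySlabClass(List<String> lines) {
    Map<Integer, List<String>> splits = new TreeMap<>();
    for (String line : lines) {
      if (line.isEmpty() || line.equals("END")) {
        continue; // blank line or end-of-dump marker
      }
      String key = null;
      Integer cls = null;
      for (String token : line.trim().split("\\s+")) {
        int eq = token.indexOf('=');
        if (eq < 0) {
          continue; // not a name=value token; ignore it
        }
        String name = token.substring(0, eq);
        String value = token.substring(eq + 1);
        if (name.equals("key")) {
          // metadump URL-encodes keys, so decode them back
          key = URLDecoder.decode(value, StandardCharsets.UTF_8);
        } else if (name.equals("cls")) {
          cls = Integer.parseInt(value);
        }
      }
      if (key != null && cls != null) {
        // each slab class becomes one candidate split
        splits.computeIfAbsent(cls, c -> new ArrayList<>()).add(key);
      }
    }
    return splits;
  }

  public static void main(String[] args) {
    List<String> dump = List.of(
        "key=user%3A1 exp=-1 la=1499500000 cas=1 fetch=no cls=1 size=70",
        "key=user%3A2 exp=-1 la=1499500001 cas=2 fetch=no cls=1 size=71",
        "key=blob exp=0 la=1499500002 cas=3 fetch=no cls=5 size=900",
        "END");
    System.out.println(keysBySlabClass(dump)); // {1=[user:1, user:2], 5=[blob]}
  }
}
```

Splitting on slab class keeps each split's items of similar size, but the split count is bounded by the number of slab classes in use, so it may still need further subdivision.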

Also, as a usability point, all PTransforms are meant to be applied to
PCollections and not vice versa.
e.g.
PCollection<byte[]> keys = ...;
keys.apply(MemCacheIO.withConfig());

This lets people write:
PCollection<...> output =
input.apply(ptransform1).apply(ptransform2).apply(...);
It also allows a single PTransform to be applied to multiple
PCollections.
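To illustrate the point about apply(), here is a toy model in plain Java. `ToyCollection` and `ToyTransform` are hypothetical stand-ins, NOT Beam's `PCollection`/`PTransform`, and `lookup` is a hypothetical in-memory "read". Because the collection is the receiver and the transform is the argument, transforms chain, and the same transform instance can be applied to more than one collection.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ApplyDemo {
  // Hypothetical "read" transform: maps each key to its cached value.
  static ToyTransform<String, String> lookup(Map<String, String> cache) {
    return in -> new ToyCollection<>(in.elements.stream()
        .map(k -> cache.getOrDefault(k, "<miss>"))
        .collect(Collectors.toList()));
  }

  // Hypothetical follow-up transform: string lengths.
  static ToyTransform<String, Integer> lengths() {
    return in -> new ToyCollection<>(in.elements.stream()
        .map(String::length)
        .collect(Collectors.toList()));
  }

  public static void main(String[] args) {
    Map<String, String> cache = Map.of("a", "apple", "b", "banana");
    ToyTransform<String, String> read = lookup(cache);
    ToyCollection<String> keys1 = new ToyCollection<>(List.of("a", "b"));
    ToyCollection<String> keys2 = new ToyCollection<>(List.of("b", "zzz"));
    // Chaining: collection.apply(t1).apply(t2)
    System.out.println(keys1.apply(read).apply(lengths()).elements); // [5, 6]
    // Reuse: the same transform instance applied to a second collection
    System.out.println(keys2.apply(read).elements); // [banana, <miss>]
  }
}

// Toy stand-in for a transform: expands one collection into another.
interface ToyTransform<InT, OutT> {
  ToyCollection<OutT> expand(ToyCollection<InT> input);
}

// Toy stand-in for a collection; apply() takes the transform as its argument.
final class ToyCollection<T> {
  final List<T> elements;

  ToyCollection(List<T> elements) {
    this.elements = elements;
  }

  <OutT> ToyCollection<OutT> apply(ToyTransform<T, OutT> transform) {
    return transform.expand(this);
  }
}
```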

If you haven't already, I would also suggest taking a look at the
Pipeline I/O guide: https://beam.apache.org/documentation/io/io-toc/
It covers various usability points and how to write a good I/O connector.


On Sat, Jul 8, 2017 at 9:31 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> Hi,
>
> Great job!
>
> I'm looking forward to reviewing the PRs.
>
> Regards
> JB
>
>
> On 07/08/2017 09:50 AM, Madhusudan Borkar wrote:
>
>> Hi,
>> We are proposing to build a connector for memcache first and then use it
>> for Couchbase. The memcache connector will be built as an IO transform, so
>> it can then be used for other memcache implementations, including
>> Couchbase.
>>
>> 1. As Source
>>
>>     input will be a key (String / byte[]); output will be a KV<key, value>,
>>     where key is a String / byte[] and value is a String / byte[]
>>
>>     Spymemcached supports a multi-get operation that takes a set of keys
>> and retrieves the associated values. The input PCollection<key> can be
>> bundled into multiple batches, and each batch can be submitted via the
>> multi-get operation.
>>
>> PCollection<KV<byte[], byte[]>> values =
>>     MemCacheIO
>>         .withConfig()
>>         .read()
>>         .withKey(PCollection<byte[]>);
>>
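A minimal sketch of the batching described above, assuming String keys and a pluggable multi-get function. `BatchedMultiGet` and `readAll` are hypothetical; the `multiGet` argument stands in for a real client call (for example something built on Spymemcached's `MemcachedClient.getBulk`), and a fake in-memory map is used so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class BatchedMultiGet {
  // Splits keys into batches of at most batchSize and issues one multi-get per
  // batch. multiGet is a HYPOTHETICAL stand-in for the client call; keys that
  // miss are simply absent from its result map.
  static Map<String, String> readAll(List<String> keys, int batchSize,
                                     Function<List<String>, Map<String, String>> multiGet) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    Map<String, String> result = new LinkedHashMap<>();
    List<String> batch = new ArrayList<>(batchSize);
    for (String key : keys) {
      batch.add(key);
      if (batch.size() == batchSize) {
        result.putAll(multiGet.apply(batch)); // full batch: one round trip
        batch = new ArrayList<>(batchSize);
      }
    }
    if (!batch.isEmpty()) {
      result.putAll(multiGet.apply(batch)); // flush the final partial batch
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> fakeCache = Map.of("k1", "v1", "k2", "v2", "k4", "v4");
    List<Integer> batchSizes = new ArrayList<>();
    // Fake multi-get: records the batch size and returns only the hits.
    Function<List<String>, Map<String, String>> fakeMultiGet = batch -> {
      batchSizes.add(batch.size());
      Map<String, String> hits = new LinkedHashMap<>();
      for (String k : batch) {
        if (fakeCache.containsKey(k)) {
          hits.put(k, fakeCache.get(k));
        }
      }
      return hits;
    };
    Map<String, String> values =
        readAll(List.of("k1", "k2", "k3", "k4", "k5"), 2, fakeMultiGet);
    System.out.println(values);     // {k1=v1, k2=v2, k4=v4}
    System.out.println(batchSizes); // [2, 2, 1]
  }
}
```

Inside a Beam DoFn, this buffering would typically happen in the @ProcessElement method, with the last partial batch flushed in @FinishBundle; the batch size trades the number of round trips against the size of each request.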
>>
>> 2. As Sink
>>
>>     input will be a KV<key, value>; output will be none, or possibly a
>> boolean indicating the outcome of the operation
>>
>> // write
>>     MemCacheIO
>>         .withConfig()
>>         .write()
>>         .withEntries(PCollection<KV<byte[], byte[]>>);
>>
>>
>> Implementation plan
>>
>> 1. Develop the Memcache connector with 'set' and 'add' operations
>>
>> 2. Then develop the other operations
>>
>> 3. Use the Memcache connector for Couchbase
>>
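On the sink side, 'set' and 'add' differ in memcached: 'set' stores unconditionally, while 'add' only stores if the key is not already present. A minimal sketch of a write step that reports a per-key boolean outcome, as suggested above; `SetAddDemo`, `FakeMemcache` and `write` are hypothetical names, and `FakeMemcache` is an in-memory stand-in, not a real client (a real server can also reject a 'set', e.g. for an oversized value).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SetAddDemo {
  // HYPOTHETICAL in-memory stand-in for a memcached server, not a real client.
  static final class FakeMemcache {
    private final Map<String, String> store = new ConcurrentHashMap<>();

    // "set": store unconditionally (always succeeds in this fake).
    boolean set(String key, String value) {
      store.put(key, value);
      return true;
    }

    // "add": store only if the key is absent; false if it already exists.
    boolean add(String key, String value) {
      return store.putIfAbsent(key, value) == null;
    }

    String get(String key) {
      return store.get(key);
    }
  }

  enum Op { SET, ADD }

  // Writes each entry with the chosen operation and records a per-key outcome,
  // i.e. the boolean result a sink could emit downstream.
  static Map<String, Boolean> write(FakeMemcache cache,
                                    List<Map.Entry<String, String>> entries, Op op) {
    Map<String, Boolean> outcomes = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : entries) {
      boolean ok = (op == Op.SET)
          ? cache.set(e.getKey(), e.getValue())
          : cache.add(e.getKey(), e.getValue());
      outcomes.put(e.getKey(), ok);
    }
    return outcomes;
  }

  public static void main(String[] args) {
    FakeMemcache cache = new FakeMemcache();
    System.out.println(write(cache, List.of(Map.entry("a", "1"), Map.entry("b", "2")), Op.SET)); // {a=true, b=true}
    System.out.println(write(cache, List.of(Map.entry("a", "9"), Map.entry("c", "3")), Op.ADD)); // {a=false, c=true}
    System.out.println(cache.get("a")); // 1 (the failed 'add' left it unchanged)
  }
}
```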
>>
>> Thanks @Ismael for the help.
>>
>> Please let us know your views.
>>
>> Madhu Borkar
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
