Hello,

Thanks Lukasz for bringing up some of these subjects. I have briefly
discussed this with the people working on it; they are the same team
who did HCatalogIO (Hive).

We analyzed the different libraries that allow developing this
integration from Java and decided that the most complete
implementation was spymemcached. One thing I really didn’t like about
their API is that there is no abstraction for Mutation (as in
Bigtable/HBase), only a corresponding method for each operation, so to
make things easier we agreed to focus first on read/write.
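
A minimal sketch of what a Mutation-style abstraction could look like
on top of a per-operation client. Everything here is hypothetical and
for illustration only: CacheClient, Mutation and InMemoryCacheClient
are made-up names, not spymemcached classes, and the in-memory client
merely stands in for a real server.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical client interface with one method per operation.
// It only mimics the per-operation style; it is NOT the spymemcached API.
interface CacheClient {
    boolean set(String key, byte[] value); // always stores the value
    boolean add(String key, byte[] value); // stores only if the key is absent
}

// Hypothetical Mutation: a single value type that describes any write,
// so a sink can handle one element type instead of one code path per operation.
final class Mutation {
    enum Op { SET, ADD }

    final Op op;
    final String key;
    final byte[] value;

    Mutation(Op op, String key, byte[] value) {
        this.op = op;
        this.key = key;
        this.value = value;
    }

    // Dispatches to the matching per-operation method of the client.
    boolean applyTo(CacheClient client) {
        switch (op) {
            case SET:
                return client.set(key, value);
            case ADD:
                return client.add(key, value);
            default:
                throw new IllegalStateException("Unknown op: " + op);
        }
    }
}

// In-memory stand-in for a server, used only to make the sketch runnable.
final class InMemoryCacheClient implements CacheClient {
    final Map<String, byte[]> store = new HashMap<>();

    @Override
    public boolean set(String key, byte[] value) {
        store.put(key, value);
        return true;
    }

    @Override
    public boolean add(String key, byte[] value) {
        return store.putIfAbsent(key, value) == null;
    }
}
```

With something like this, the write side would only need to accept a
collection of Mutation elements, and supporting further operations
would mean adding an enum value and a case rather than a new transform.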

@Lukasz for the enumeration part, I am not sure I follow. We had only
discussed a naive approach of splitting by server: given that
Memcached is not a cluster but a server farm (‘which means every
server is its own’), we thought this would be the easiest way to
partition. Is there any technical issue that prevents this (creating a
BoundedSource and just reading from each server)? Or would
partitioning by slabs give us better performance? (Note that I am far
from an expert on Memcached.)
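
The naive per-server partitioning can be sketched as follows. This is
plain Java with no Beam or spymemcached dependency; ServerSplit and
ServerSplitter are hypothetical names, and the "host:port" input
format is an assumption. In a real connector each split would back one
sub-source returned when the BoundedSource is asked to split itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical description of one split: all entries held by one server.
final class ServerSplit {
    final String host;
    final int port;

    ServerSplit(String host, int port) {
        this.host = host;
        this.port = port;
    }
}

final class ServerSplitter {
    // Produces exactly one split per configured server.
    // Assumes each entry is written as "host:port".
    static List<ServerSplit> splitByServer(List<String> servers) {
        List<ServerSplit> splits = new ArrayList<>();
        for (String server : servers) {
            int colon = server.lastIndexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Expected host:port, got " + server);
            }
            String host = server.substring(0, colon);
            int port = Integer.parseInt(server.substring(colon + 1));
            splits.add(new ServerSplit(host, port));
        }
        return splits;
    }
}
```

The parallelism of such a read is bounded by the number of servers,
which is the main trade-off compared with a finer-grained split such
as one per slab.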

For the consistency part, I assumed reads would be inconsistent
because I didn’t know how to take a snapshot. If you can give us more
details on how to do this, and why it is worth the effort (vs. the
cost of the snapshot), it would be interesting to integrate.

Thanks,
Ismaël


On Sun, Jul 9, 2017 at 7:39 PM, Lukasz Cwik <lc...@google.com.invalid> wrote:
> For the source:
> Do you plan to support enumerating all the keys via cachedump / lru_crawler
> metadump / ...?
> If there is an option which doesn't require enumerating the keys, how will
> splitting be done (no splitting / splitting on slab ids / ...)?
> Can the cache be read while it's still being modified (will a snapshot
> effectively be made using a watcher, or is it expected that the cache will
> be read-only or inconsistent when reading)?
>
> Also, as a usability point, all PTransforms are meant to be applied to
> PCollections and not vice versa.
> e.g.
> PCollection<byte[]> keys = ...;
> keys.apply(MemCacheIO.withConfig());
>
> This makes it so that people can write:
> PCollection<...> output =
> input.apply(ptransform1).apply(ptransform2).apply(...);
> It also makes it so that a PTransform can be applied to multiple
> PCollections.
>
> If you haven't already, I would also suggest that you take a look at the
> Pipeline I/O guide: https://beam.apache.org/documentation/io/io-toc/
> Talks about various usability points and how to write a good I/O connector.
>
>
> On Sat, Jul 8, 2017 at 9:31 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>> Hi,
>>
>> Great job !
>>
>> I'm looking forward to reviewing the PRs.
>>
>> Regards
>> JB
>>
>>
>> On 07/08/2017 09:50 AM, Madhusudan Borkar wrote:
>>
>>> Hi,
>>> We are proposing to build connectors for memcache first and then use it
>>> for Couchbase. The connector for memcache will be built as an IOTransform
>>> and then it can be used for other memcache implementations including
>>> Couchbase.
>>>
>>> 1. As Source
>>>
>>>     input will be a key(String / byte[]), output will be a KV<key, value>
>>>
>>>     where key - String / byte[]
>>>
>>>     value - String / byte[]
>>>
>>>     Spymemcached supports a multi-get operation where it takes a bunch of
>>> keys and retrieves the associated values; the input PCollection<key> can
>>> be bundled into multiple batches and each batch can be submitted via the
>>> multi-get operation.
>>>
>>> PCollection<KV<byte[], byte[]>> values =
>>>
>>>     MemCacheIO
>>>
>>>     .withConfig()
>>>
>>>     .read()
>>>
>>>     .withKey(PCollection<byte[]>);
>>>
>>>
>>> 2. As Sink
>>>
>>>     input will be a KV<key, value>, output will be none or probably a
>>> boolean indicating the outcome of the operation
>>>
>>>
>>>
>>>
>>>
>>> //write
>>>
>>>     MemCacheIO
>>>
>>>     .withConfig()
>>>
>>>     .write()
>>>
>>>     .withEntries(PCollection<KV<byte[],byte[]>>);
>>>
>>>
>>> Implementation plan
>>>
>>> 1. Develop Memcache connector with 'set' and 'add' operation
>>>
>>> 2. Then develop other operations
>>>
>>> 3. Use Memcache connector for Couchbase
>>>
>>>
>>> Thanks @Ismael for help
>>>
>>> Please, let us know your views.
>>>
>>> Madhu Borkar
>>>
>>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
