For the source: Do you plan to support enumerating all the keys via cachedump / lru_crawler metadump / ...? If there is an option which doesn't require enumerating the keys, how will splitting be done (no splitting / splitting on slab ids / ...)? Can the cache be read while it's still being modified (will a snapshot effectively be taken using a watcher, or is the cache expected to be read-only, or are reads allowed to be inconsistent)?

Also, as a usability point: PTransforms are meant to be applied to PCollections, not the other way around, e.g.

    PCollection<byte[]> keys = ...;
    keys.apply(MemCacheIO.withConfig());

This lets people write:

    PCollection<...> output = input.apply(ptransform1).apply(ptransform2).apply(...);

It also means that the same PTransform can be applied to multiple PCollections.

If you haven't already, I would also suggest taking a look at the Pipeline I/O guide:
https://beam.apache.org/documentation/io/io-toc/
It covers various usability points and how to write a good I/O connector.

On Sat, Jul 8, 2017 at 9:31 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> Hi,
>
> Great job!
>
> I'm looking forward to the PR review.
>
> Regards
> JB
>
> On 07/08/2017 09:50 AM, Madhusudan Borkar wrote:
>
>> Hi,
>> We are proposing to build connectors for memcache first and then use them
>> for Couchbase. The connector for memcache will be built as an IOTransform
>> and then it can be used for other memcache implementations, including
>> Couchbase.
>>
>> 1. As Source
>>
>> input will be a key (String / byte[]), output will be a KV<key, value>
>>
>> where key - String / byte[]
>>
>> value - String / byte[]
>>
>> Spymemcached supports a multi-get operation which takes a bunch of keys
>> and retrieves the associated values. The input PCollection<key> can be
>> bundled into multiple batches, and each batch can be submitted via the
>> multi-get operation.
>>
>> PCollection<KV<byte[], byte[]>> values =
>>     MemCacheIO
>>         .withConfig()
>>         .read()
>>         .withKey(PCollection<byte[]>);
>>
>> 2. As Sink
>>
>> input will be a KV<key, value>, output will be none or probably a
>> boolean indicating the outcome of the operation
>>
>> //write
>> MemCacheIO
>>     .withConfig()
>>     .write()
>>     .withEntries(PCollection<KV<byte[],byte[]>>);
>>
>> Implementation plan
>>
>> 1. Develop Memcache connector with 'set' and 'add' operations
>> 2. Then develop other operations
>> 3. Use Memcache connector for Couchbase
>>
>> Thanks @Ismael for help
>>
>> Please let us know your views.
>>
>> Madhu Borkar
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
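
For what it's worth, here is a rough sketch of the batching idea from the quoted proposal (group the incoming keys into batches and issue one multi-get per batch). It is plain Java with no Beam or memcache dependency, so everything in it is an assumption for illustration: the class name `BatchedLookup`, the methods `partition` and `lookupAll`, and the `multiGet` function parameter, which only stands in for a client's bulk-get call. It uses String keys; byte[] keys would need a wrapper with proper equals/hashCode before they could be used as map keys.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class BatchedLookup {

    /** Splits keys into consecutive batches of at most batchSize elements. */
    public static <K> List<List<K>> partition(List<K> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<K>> batches = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += batchSize) {
            int end = Math.min(i + batchSize, keys.size());
            batches.add(new ArrayList<>(keys.subList(i, end)));
        }
        return batches;
    }

    /**
     * Looks up all keys, one multiGet call per batch.
     * multiGet is a hypothetical stand-in for a bulk-get call; it is expected
     * to return only the keys that were found (cache misses are omitted).
     */
    public static <K, V> Map<K, V> lookupAll(
            List<K> keys, int batchSize, Function<Collection<K>, Map<K, V>> multiGet) {
        Map<K, V> result = new LinkedHashMap<>();
        for (List<K> batch : partition(keys, batchSize)) {
            result.putAll(multiGet.apply(batch));
        }
        return result;
    }

    public static void main(String[] args) {
        // An in-memory map playing the role of the cache, for demonstration only.
        Map<String, String> fakeCache = Map.of("a", "1", "b", "2", "c", "3");
        Function<Collection<String>, Map<String, String>> multiGet = batch -> {
            Map<String, String> hits = new LinkedHashMap<>();
            for (String key : batch) {
                String value = fakeCache.get(key);
                if (value != null) {
                    hits.put(key, value);
                }
            }
            return hits;
        };

        List<String> keys = List.of("a", "b", "c", "d");
        System.out.println(partition(keys, 3));           // [[a, b, c], [d]]
        System.out.println(lookupAll(keys, 3, multiGet)); // {a=1, b=2, c=3}
    }
}
```

In the connector itself, the batches would presumably be formed per bundle rather than from an in-memory list, and the batch size would be a configuration option; the sketch only shows the grouping and the one-call-per-batch shape.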