Hello,

Thanks Lukasz for bringing up some of these subjects. I have briefly discussed this with the guys working on it; they are the same team that did HCatalogIO (Hive).
We analyzed the different Java libraries available for this integration and decided that spymemcached was the most complete implementation. One thing I really didn’t like about its API is that there is no abstraction for a mutation (like Mutation in Bigtable/HBase), only a separate method for each operation, so to make things easier we agreed to focus first on read/write.

@Lukasz, for the enumeration part, I am not sure I follow. We had only discussed a naive approach of splitting by server: given that Memcached is not a cluster but a server farm (‘which means every server is its own’), we thought this would be the easiest way to partition. Is there any technical issue that prevents this (creating a BoundedSource and just reading from each server)? Or would partitioning by slabs give us a better optimization? (Note that I am far from an expert on Memcached.)

For the consistency part, I assumed reads would be inconsistent, because I didn’t know how to take the snapshot. If you can give us more details on how to do this, and why it is worth the effort (vs. the cost of the snapshot), it would be something interesting to integrate.

Thanks,
Ismaël

On Sun, Jul 9, 2017 at 7:39 PM, Lukasz Cwik <lc...@google.com.invalid> wrote:

> For the source:
> Do you plan to support enumerating all the keys via cachedump / lru_crawler
> metadump / ...?
> If there is an option which doesn't require enumerating the keys, how will
> splitting be done (no splitting / splitting on slab ids / ...)?
> Can the cache be read while its still being modified (will effectively a
> snapshot be made using a watcher or is it expected that the cache will be
> read only or inconsistent when reading)?
>
> Also, as a usability point, all PTransforms are meant to be applied to
> PCollections and not vice versa.
> e.g.
> PCollection<byte[]> keys = ...;
> keys.apply(MemCacheIO.withConfig());
>
> This makes it so that people can write:
> PCollection<...> output =
> input.apply(ptransform1).apply(ptransform2).apply(...);
> It also makes it so that a PTransform can be applied to multiple
> PCollections.
>
> If you haven't already, I would also suggest that you take a look at the
> Pipeline I/O guide: https://beam.apache.org/documentation/io/io-toc/
> Talks about various usability points and how to write a good I/O connector.
>
>
> On Sat, Jul 8, 2017 at 9:31 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>> Hi,
>>
>> Great job !
>>
>> I'm looking forward for the PRs review.
>>
>> Regards
>> JB
>>
>>
>> On 07/08/2017 09:50 AM, Madhusudan Borkar wrote:
>>
>>> Hi,
>>> We are proposing to build connectors for memcache first and then use it for
>>> Couchbase. The connector for memcache will be build as a IOTransform and
>>> then it can be used for other memcache implementations including
>>> Couchbase.
>>>
>>> 1. As Source
>>>
>>> input will be a key(String / byte[]), output will be a KV<key, value>
>>>
>>> where key - String / byte[]
>>>
>>> value - String / byte[]
>>>
>>> Spymemcached supports a multi-get operation where it takes a bunch of
>>> keys and retrieves the associated values, the input PCollection<key> can be
>>> bundled into multiple batches and each batch can be submitted via the
>>> multi-get operation.
>>>
>>> PCollection<KV<byte[], byte[]>> values =
>>>     MemCacheIO
>>>         .withConfig()
>>>         .read()
>>>         .withKey(PCollection<byte[]>);
>>>
>>>
>>> 2. As Sink
>>>
>>> input will be a KV<key, value>, output will be none or probably a
>>> boolean indicating the outcome of the operation
>>>
>>> //write
>>> MemCacheIO
>>>     .withConfig()
>>>     .write()
>>>     .withEntries(PCollection<KV<byte[],byte[]>>);
>>>
>>>
>>> Implementation plan
>>>
>>> 1. Develop Memcache connector with 'set' and 'add' operation
>>>
>>> 2. Then develop other operations
>>>
>>> 3. Use Memcache connector for Couchbase
>>>
>>>
>>> Thanks @Ismael for help
>>>
>>> Please, let us know your views.
>>>
>>> Madhu Borkar
>>>
>>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
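
For reference, the batching step described in the original proposal (grouping the input keys so that each group can go out as one multi-get) could look roughly like the sketch below. This is plain Java with no Beam or spymemcached dependency; the class name KeyBatcher, the method batches, and the batch size of 2 are hypothetical and only there for illustration. In a real connector each batch would presumably be handed to the client's multi-get call rather than printed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Beam or spymemcached.
public class KeyBatcher {

    /** Splits keys into consecutive batches of at most batchSize elements. */
    public static <K> List<List<K>> batches(List<K> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<K>> result = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, keys.size());
            // Copy the sub-list so each batch is independent of the input list.
            result.add(new ArrayList<>(keys.subList(start, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("k1", "k2", "k3", "k4", "k5");
        // Assumption: a batch size of 2, chosen only to keep the demo small.
        for (List<String> batch : batches(keys, 2)) {
            // Placeholder for the multi-get call on the memcached client.
            System.out.println(batch);
        }
    }
}
```

The batch size would be a tuning knob: larger batches mean fewer round trips per server, smaller ones keep individual requests cheap.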