On Fri, Jun 23, 2017 at 12:06 PM, Stack <[email protected]> wrote:

> On Wed, Jun 21, 2017 at 9:31 AM, Sean Busbey <[email protected]> wrote:
>
>> 2) What Spark version(s) do we care about?
>> ...
>>
>> What version(s) do we want to handle and thus encourage our downstream
>> folks to use?
>>
> ..
>
>> Personally, I think I favor option b for simplicity, though I don't
>> care for more possible delay in getting stuff out in branch-1.
>> Probably option a would be best for our downstreamers.
>>
>
> Let's do option b.) well. If demand and contribs, let's consider adding
> 1.6 support.
>
This is the chorus I'm hearing. :)

>> 6) What about the SHC project?
>>
>> In case you didn’t see the excellent talk at HBaseCon from Weiqing
>> Yang, she’s been maintaining a high quality integration library
>> between HBase and Spark.
>>
>> HBaseCon West 2017 slides: https://s.apache.org/IQMA
>> Blog: https://s.apache.org/m1bc
>> Repo: https://github.com/hortonworks-spark/shc
>>
>> I’d love to see us encourage the SHC devs to fold their work into
>> participation in our wider community. Before approaching them about
>> that, I think we need to make sure we share goals and can give them
>> reasonable expectations about release cadence (which probably means
>> making it into branch-1).
>>
>
> I pinged Weiqing; my guess is she has an opinion on your swath here.
>

Been going for a few days here and no obvious points of contention; would
love to get her take on things.

>> Right now, I’d only consider the things that have made it to our docs
>> to be “done”. Here’s the relevant section of the ref guide:
>>
>> http://hbase.apache.org/book.html#spark
>>
>> Comparing our current offering and the above, I’d say the big gaps
>> between our offering and the SHC project are:
>>
>> * Avro serialization (we have this implemented but documentation is
>> limited to an example in the section on SparkSQL support)
>> * Composite keys (as mentioned above, we have a start to this)
>> * More robust handling of delegation tokens, i.e. in presence of
>> multiple secure clusters
>> * Handling of Phoenix encoded data
>>
>> Are these all things we’d want available to our downstream folks?
>>
>
> I don't know enough about the integration but is the 'handling of Phoenix
> encoded data' about mapping spark types to a serialization in hbase? If
> not, where is the need for seamless transforms between spark types and a
> natural hbase serialization listed? We need this IIRC.
>

It's a subtask, really. We already have a pluggable system for mapping
between spark types and a couple of serialization options (the docs need
improvement?). SHC has its own pluggable system with the addition of a
Phoenix encoding. The set seems like the most likely out-of-the-box
formats folks might have something in. (Maybe Kite? I think it's different
than the rest.) Or are you saying we can just map all of it to the
hbase-common "types" and then do the pluggable part under it?
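For folks following along, here's roughly what that pluggable layer looks
like from the SHC side, going off their README. A hedged sketch: the table
name and column layout are made up, and I'm assuming the Phoenix coder is
selected by spelling "tableCoder":"Phoenix" in the catalog JSON (their
published examples use "PrimitiveType").

```scala
// Sketch based on the SHC README (https://github.com/hortonworks-spark/shc).
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// Catalog JSON mapping Spark SQL columns onto HBase cells. The "tableCoder"
// field is where the pluggable encoding surfaces; swapping "PrimitiveType"
// for "Phoenix" is (as I understand it) how you read Phoenix-encoded data.
// "shcExampleTable" and the columns below are hypothetical.
val catalog = s"""{
  |"table":{"namespace":"default", "name":"shcExampleTable",
  |         "tableCoder":"Phoenix"},
  |"rowkey":"key",
  |"columns":{
    |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
    |"col1":{"cf":"cf1", "col":"col1", "type":"int"}
  |}
|}""".stripMargin

val spark = SparkSession.builder().appName("shc-sketch").getOrCreate()

// Reads come back as an ordinary DataFrame; the coder handles the
// byte[] <-> Spark type transforms underneath.
val df = spark.read
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()
```

The point being: the encoding choice is a one-field knob in the catalog
rather than anything baked into the read path, which is why I think folding
a Phoenix coder into our own pluggable system is a subtask rather than a
redesign.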
