[jira] [Commented] (HBASE-18405) Track scope for HBase-Spark module

stack (JIRA) Sun, 06 Aug 2017 16:53:54 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-18405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115969#comment-16115969
 ]


stack commented on HBASE-18405:
-------------------------------

Read update1. Excellent writeup [~busbey]

Non-important thoughts:

+ Composite row key is spark being able to grok a preexisting hbase row key; 
i.e. teaching spark how to extract subparts. How would this be done in the 
future do you think?
+ No typing then in the first pass. Ain't that going to be ugly perf-wise sir? 
Downsides? How do you see the project that does the smooth mapping of spark to 
hbase types (whether phoenix or hbase data types)?
+ What is this one about... "... multiple secure HBase deployments"? A spark 
job spanning hbase clusters?
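
To make the composite row key question concrete, here is a minimal sketch of what "extracting subparts" could look like. It assumes a hypothetical fixed layout (4-byte int bucket, 8-byte long timestamp, UTF-8 id suffix) and uses plain java.nio; the class name, method names and layout are all illustrative assumptions, not anything from the scope document or the hbase-spark module.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical layout, for illustration only: [int bucket][long timestamp][UTF-8 id]
public class CompositeKeySketch {

    // Build a row key by concatenating the three parts (ByteBuffer is big-endian by default).
    static byte[] encode(int bucket, long timestamp, String id) {
        byte[] idBytes = id.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Integer.BYTES + Long.BYTES + idBytes.length)
                .putInt(bucket)
                .putLong(timestamp)
                .put(idBytes)
                .array();
    }

    // Each extractor reads one subpart at its fixed offset.
    static int bucket(byte[] rowKey) {
        return ByteBuffer.wrap(rowKey).getInt(0);
    }

    static long timestamp(byte[] rowKey) {
        return ByteBuffer.wrap(rowKey).getLong(Integer.BYTES);
    }

    // The id is whatever follows the two fixed-width fields.
    static String id(byte[] rowKey) {
        int offset = Integer.BYTES + Long.BYTES;
        return new String(rowKey, offset, rowKey.length - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] key = encode(7, 1502063634000L, "user-42");
        System.out.println(bucket(key) + " " + timestamp(key) + " " + id(key));
    }
}
```

Teaching spark about such a key would amount to describing these offsets and types somewhere it can read them, which is the part the first pass apparently leaves out.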

Avro? Because? You can do schema apart from data?
Java Native is Bytes.toLong, etc.?
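
If "Java Native" does mean the Bytes.toBytes / Bytes.toLong style of encoding, a long is stored as 8 big-endian bytes. The sketch below uses java.nio.ByteBuffer as a stand-in for the Bytes class so it runs without HBase on the classpath; the class and method names are assumptions for illustration only.

```java
import java.nio.ByteBuffer;

public class NativeLongSketch {

    // Stand-in for Bytes.toBytes(long): 8 bytes, big-endian.
    static byte[] toBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Stand-in for Bytes.toLong(byte[]): reads the same 8 bytes back.
    static long toLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    // Unsigned lexicographic comparison, the order HBase uses for row keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        System.out.println(toLong(toBytes(42L)));
        // Negative longs start with a 0xff byte, so they compare greater than positives.
        System.out.println(compareUnsigned(toBytes(-1L), toBytes(1L)) > 0);
    }
}
```

One thing this makes visible: with this encoding, negative values sort after positive ones in byte order, which is the kind of detail a typed mapping layer would need to account for.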

Have you looked at providing a 'catalog'? Would it be tough? Could we ship a 
hard-coded table with type info in it with a facade that implements catalog to 
satisfy spark? Or would folks need to extend it with their own compound 
types, etc.?

bq. Where practical, we will avoid duplication of implementation source code..

Is this like our current hadoop-compatible for metrics, etc., with hadoop 2 and 
hadoop 1 implementations?

On unit tests, will we have to spin up clusters? Can we get away with things 
like the RegionAsTable doohickey (puts a Table interface on a Region...)? You 
know what I'm worried about ... You start the build when you leave on your 
two-week vacation and you hope it is done when you get back.

Spark in a new project like hbase-thirdparty?

Great writeup.




> Track scope for HBase-Spark module
> ----------------------------------
>
>                 Key: HBASE-18405
>                 URL: https://issues.apache.org/jira/browse/HBASE-18405
>             Project: HBase
>          Issue Type: Task
>          Components: spark
>            Reporter: Sean Busbey
>            Assignee: Sean Busbey
>             Fix For: 1.4.0, 2.0.0-beta-1
>
>         Attachments: Apache HBase - Apache Spark Integration Scope.pdf, 
> Apache HBase - Apache Spark Integration Scope - update 1.pdf
>
>
> Start with [\[DISCUSS\]  status of and plans for our hbase-spark integration 
> |https://lists.apache.org/thread.html/fd74ef9b9da77abf794664f06ea19c839fb3d543647fb29115081683@%3Cdev.hbase.apache.org%3E]
>  and formalize into a scope document for bringing this feature into a release.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
