[ 
https://issues.apache.org/jira/browse/SPARK-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605663#comment-14605663
 ] 

Ted Malaska commented on SPARK-2447:
------------------------------------

Yeah, I have talked a lot with TD (Spark), Job H (HBase), and Stacks (HBase)
about this.  None of them thinks HBase or Spark is the right project to put
it in.

Right now the code is in Cloudera Labs and on GitHub, and it works for CDH 5.3
and 5.4.  We have a number of clients on it.

There is talk of making it an Apache project.  It is Apache licensed, but it
would be nice to put it under Apache entirely.  The problem is that it is so
simple that it sometimes feels too small to be its own project.

The design is just to have an HBase connection in a static location in the
executor.

I know other NoSQL stores brag about local gets, but HBase already had that
even without SparkOnHBase.  The TableInputFormat already gives you local gets.

All SparkOnHBase gives you is an active connection that can be accessed inside
Spark's distributed functions, which is very important to some use cases, such
as Spark Streaming and complex graph local.
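
To make the idea concrete, here is a minimal sketch of a connection held in a
static location and reused by every distributed function in the same JVM.  It
is not the SparkOnHBase code: FakeConnection, ConnectionHolder, and writeAll
are hypothetical names, FakeConnection is a stand-in rather than the HBase
client API, and the plain loop over lists only imitates Spark calling a
function once per partition (for example through foreachPartition).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Driver class for the sketch. The loop in writeAll imitates Spark invoking
// a function once per partition inside a single executor JVM.
class StaticConnectionSketch {
    static void writeAll(List<List<String>> partitions) {
        for (List<String> partition : partitions) {
            // Each "task" asks the static holder for the shared connection
            // instead of opening its own.
            FakeConnection conn = ConnectionHolder.get();
            for (String rowKey : partition) {
                conn.put(rowKey);
            }
        }
    }

    public static void main(String[] args) {
        writeAll(Arrays.asList(
                Arrays.asList("row1", "row2"),
                Arrays.asList("row3")));
        System.out.println("connections opened: " + FakeConnection.OPENED.get());
        System.out.println("rows written: " + ConnectionHolder.get().size());
    }
}

// Hypothetical stand-in for an HBase connection (assumption, not a real API).
final class FakeConnection {
    // Counts how many connections were ever created in this JVM.
    static final AtomicInteger OPENED = new AtomicInteger();
    private final List<String> written = new ArrayList<>();

    FakeConnection() {
        OPENED.incrementAndGet();
    }

    synchronized void put(String rowKey) {
        written.add(rowKey);
    }

    synchronized int size() {
        return written.size();
    }
}

// The "static location": one lazily created connection per JVM.
final class ConnectionHolder {
    private static FakeConnection connection;

    private ConnectionHolder() {
    }

    static synchronized FakeConnection get() {
        if (connection == null) {
            connection = new FakeConnection();
        }
        return connection;
    }
}
```

Because the holder is static, the connection is created by the first task
that needs it and then shared by later tasks in that JVM.  That is what makes
it usable from code that runs repeatedly, such as streaming batches.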

Let me know.  We are open to ideas.

> Add common solution for sending upsert actions to HBase (put, deletes, and 
> increment)
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-2447
>                 URL: https://issues.apache.org/jira/browse/SPARK-2447
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core, Streaming
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>
> Going to review the design with Tdas today.
> But the first thought is to have an extension of VoidFunction that handles
> the connection to HBase and allows for options such as turning auto flush
> off for higher throughput.
> Need to answer the following questions first.
> - Can it be written in Java or should it be written in Scala?
> - What is the best way to add the HBase dependency? (will review how Flume 
> does this as the first option)
> - What is the best way to do testing? (will review how Flume does this as the 
> first option)
> - How to support Python? (Python may be a different JIRA; it is unknown at
> this time)
> Goals:
> - Simple to use
> - Stable
> - Supports high load
> - Documented (May be in a separate Jira need to ask Tdas)
> - Supports Java, Scala, and hopefully Python
> - Supports Streaming and normal Spark



