[ 
https://issues.apache.org/jira/browse/SPARK-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080166#comment-14080166
 ] 

Ted Malaska commented on SPARK-2447:
------------------------------------

Hey Matei,

Let's do a webex or something in the near future.  I would love to get more of 
your input.  

Here are my answers to your questions above:
1. Yes I can do Python
2. Yes I can do that.  So to be clear, the bulkGet and scan will return a fixed 
(Array[Byte], Array[(Array[Byte], Array[Byte], Array[Byte], Long)]) for 
(rowKey, Array[(columnFamily, column, value, timestamp)])
2.1 As for the bulkPut/Increment/Delete/CheckPut I think we need to give the 
user freedom to interact with the raw API.  I have no problem building a 
simpler interface for the 80% use case but I don't want to fail the 20%.
3. The lowest version is 0.96.  The reason is that there was a major API change 
from 0.94 to 0.96+.  So if we need to support 0.94 and below we need to make a 
different code base.
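
To make the shape in answer 2 concrete, here is a rough Scala sketch of what one 
returned element could look like and how a caller might unpack it.  The Cell and 
Row aliases, the helper names, and the sample values are hypothetical and only 
there for illustration; this is not the proposed API, and it uses no Spark or 
HBase classes.

```scala
object HBaseResultShape {
  // Hypothetical aliases, introduced only to name the tuple shape from answer 2.
  // Cell = (columnFamily, column, value, timestamp)
  type Cell = (Array[Byte], Array[Byte], Array[Byte], Long)
  // Row = (rowKey, cells)
  type Row = (Array[Byte], Array[Cell])

  private def utf8(s: String): Array[Byte] = s.getBytes("UTF-8")
  private def str(b: Array[Byte]): String = new String(b, "UTF-8")

  // Made-up sample row standing in for one element of a bulkGet or scan result.
  val sampleRow: Row = (
    utf8("row1"),
    Array(
      (utf8("cf"), utf8("name"), utf8("spark"), 1406764800000L),
      (utf8("cf"), utf8("count"), utf8("42"), 1406764800001L)
    )
  )

  // Unpack one row into a readable line per cell.
  def describe(row: Row): Seq[String] = {
    val (rowKey, cells) = row
    cells.toSeq.map { case (family, column, value, ts) =>
      s"${str(rowKey)} ${str(family)}:${str(column)}=${str(value)} @ $ts"
    }
  }
}
```

Because everything is raw bytes plus a Long, the caller decides how to decode 
each value; the sketch assumes UTF-8 strings purely for readability.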

Let me know if this answers your questions and let me know if there is anything 
else I can do.  I have learned so much from TD and I have grown so much from 
this process.

Ted Malaska

> Add common solution for sending upsert actions to HBase (put, deletes, and 
> increment)
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-2447
>                 URL: https://issues.apache.org/jira/browse/SPARK-2447
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core, Streaming
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>
> Going to review the design with Tdas today.  
> But my first thought is to have an extension of VoidFunction that handles the 
> connection to HBase and allows for options such as turning auto flush off for 
> higher throughput.
> Need to answer the following questions first.
> - Can it be written in Java or should it be written in Scala?
> - What is the best way to add the HBase dependency? (will review how Flume 
> does this as the first option)
> - What is the best way to do testing? (will review how Flume does this as the 
> first option)
> - How to support Python? (Python may be a different Jira; it is unknown at 
> this time)
> Goals:
> - Simple to use
> - Stable
> - Supports high load
> - Documented (May be in a separate Jira need to ask Tdas)
> - Supports Java, Scala, and hopefully Python
> - Supports Streaming and normal Spark



--
This message was sent by Atlassian JIRA
(v6.2#6252)
