Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/194#issuecomment-38392478
  
    I looked at this a bit more closely. I'm definitely +1 on having some 
utility functions like this for writing to HBase.
    
    It seems a bit brittle to me that we expect people to go through a textual 
representation of the records in order to save to HBase. I think a nicer way to 
do this would be to go through a SchemaRDD (a new feature recently merged into 
Spark), or even a Scala case class or Scala tuples, and then have an automatic 
conversion into HBase types based on the runtime type of the RDD. You'd just 
need to supply a mapping from the attribute names to the HBase column names 
(/cc @marmbrust).
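    
    For concreteness, here is a rough sketch of what that could look like (not 
the code in this PR, just an illustration). The record type (`PageView`), the 
`saveToHBase` helper, and the column mapping are all hypothetical; the HBase 
calls (`Put`, `TableOutputFormat`, `Bytes`) and Spark's 
`saveAsNewAPIHadoopDataset` are the standard APIs, using the newer 
`Put.addColumn` signature. A real implementation would derive the fields from 
the RDD's runtime type (or a SchemaRDD's schema) rather than hard-coding them:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.rdd.RDD

// Hypothetical record type; any case class with simple field types would do.
case class PageView(url: String, count: Long)

object HBaseSaveSketch {
  /** Save an RDD of records to an HBase table, given a mapping from attribute
    * name to (columnFamily, qualifier). Values are written with their native
    * types rather than via a textual representation. */
  def saveToHBase(rdd: RDD[PageView],
                  table: String,
                  columns: Map[String, (String, String)]): Unit = {
    val conf = HBaseConfiguration.create()
    conf.set(TableOutputFormat.OUTPUT_TABLE, table)
    val job = Job.getInstance(conf)
    job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])

    val puts = rdd.map { record =>
      val put = new Put(Bytes.toBytes(record.url))    // url field as the row key
      val (cf, qual) = columns("count")
      put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(qual),
        Bytes.toBytes(record.count))                  // typed long, not a string
      (new ImmutableBytesWritable, put)
    }
    puts.saveAsNewAPIHadoopDataset(job.getConfiguration)
  }
}
```

    With something like that in place, saving would just be 
`saveToHBase(views, "page_views", Map("count" -> ("stats", "count")))`, and the 
same pattern would generalize to any case class or SchemaRDD once the field 
extraction is done generically.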
    
    The approach here seems a bit ad hoc and like something we may not want to 
support long term. So it might make sense to slot this for Spark 1.1 and rework 
it to have more integrated support for schemas.


