For your first question, yes, it is possible.

You can find some examples in the hbase-spark module of HBase, which provides
HBase as a Spark DataSource.
e.g.

https://github.com/apache/hbase/blob/master/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseRDDFunctions.scala
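Outside of the hbase-spark module, a plain HBase client Get from within a Spark transformation is also workable. Below is a minimal sketch of that approach: the connection is opened once per partition (inside mapPartitions) rather than once per record. The table name "users", column family "cf", and qualifier "info" are placeholder assumptions, not from your schema.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD

// Hypothetical enrichment: look up each user id in HBase via point Gets.
def enrichWithHBase(userIds: RDD[String]): RDD[(String, Option[String])] =
  userIds.mapPartitions { ids =>
    // One connection per partition, not per record -- connection setup is costly.
    val conf  = HBaseConfiguration.create()
    val conn  = ConnectionFactory.createConnection(conf)
    val table = conn.getTable(TableName.valueOf("users")) // assumed table name

    // Materialize the results before closing the connection, since the
    // iterator returned by mapPartitions is consumed lazily.
    val results = ids.map { id =>
      val row = table.get(new Get(Bytes.toBytes(id)))
      val value = Option(row.getValue(Bytes.toBytes("cf"), Bytes.toBytes("info")))
        .map(Bytes.toString)
      (id, value)
    }.toList

    table.close()
    conn.close()
    results.iterator
  }
```

Since the driver only ships the small RDD of keys and each executor issues point Gets, no full table scan happens; this tends to win when the key RDD is much smaller than the table.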

Cheers

On Thu, Jan 14, 2016 at 5:04 AM, Kristoffer Sjögren <sto...@gmail.com>
wrote:

> Hi
>
> We have an RDD<UserId> that needs to be enriched with information from
> HBase, where the row key is exactly the user id.
>
> What are the different alternatives for doing this?
>
> - Is it possible to do HBase.get() requests from a map function in Spark?
> - Or should we join the RDD against a full HBase table scan?
>
> I ask because a full table scan feels inefficient, especially if the
> input RDD<UserId> is really small compared to the full table. But I
> realize that a full table scan may not be what actually happens?
>
> Cheers,
> -Kristoffer
>
>
