You should make HBase a data source (it seems we already have an HBase connector?), create a DataFrame from HBase, and do the join in Spark SQL.
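For example, with the Hortonworks shc connector (one of several HBase data sources), the per-row lookups become a plain join. Below is a minimal sketch, not a tested implementation: it assumes the HBase row key matches the Hive/Carbon `id` column and the value lives in a `cf:mvcc` column; the table name "t", the column family, and the alias `hbase_mvcc` are all placeholders to adjust for your schema.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

    val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

    // Hypothetical catalog: HBase table "t", row key exposed as `id`,
    // stored value exposed as `hbase_mvcc`. Adjust to the real schema.
    val catalog =
      """{
        |  "table":  {"namespace": "default", "name": "t"},
        |  "rowkey": "key",
        |  "columns": {
        |    "id":         {"cf": "rowkey", "col": "key",  "type": "string"},
        |    "hbase_mvcc": {"cf": "cf",     "col": "mvcc", "type": "string"}
        |  }
        |}""".stripMargin

    val hbaseDF = spark.read
      .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
      .format("org.apache.spark.sql.execution.datasources.hbase")
      .load()

    val hiveDF = spark.sql("sql....") // the Hive/Carbon query from step 1

    // Join on id and keep only rows whose mvcc matches the HBase value;
    // this replaces the per-row HBase gets inside mapPartitions.
    val result = hiveDF
      .join(hbaseDF, Seq("id"))
      .filter(col("mvcc") === col("hbase_mvcc"))

This lets Spark plan the HBase read and the filter as one job instead of adding the extra stage that mapPartitions introduces.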
> On 21 Jun 2017, at 10:17 AM, [email protected] wrote:
>
> Hello,
> My scenario is like this:
> 1. val df = hivecontext/carboncontext.sql("sql....")
> 2. Iterate over the rows, extracting two columns, id and mvcc, and use id
>    as the key to look up the corresponding value in HBase;
>    if mvcc == value, the row passes, else drop it.
> Is there a better way than dataframe.mapPartitions? It causes an
> extra stage and takes more time.
> I have attached two DAGs; please check!
>
> Thanks!!
> [email protected] <appendix.zip>
