So I would bucket it using id column.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 20 June 2016 at 05:01, Mohanraj Ragupathiraj <mohanaug...@gmail.com>
> wrote:
I am trying to join a DataFrame (say 100 records) with an ORC file with 500
million records through Spark (this can grow to 4-5 billion records, 25 bytes
each). I used the Spark HiveContext API.

*ORC File Creation Code*
// fsdtRdd is a JavaRDD<Row>, fsdtSchema is the StructType schema
DataFrame fsdtDf = hiveContext.createDataFrame(fsdtRdd, fsdtSchema);
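*Broadcast Join Sketch (illustrative)*
Since the lookup side is tiny (around 100 records) next to the 500 million ORC
rows, a broadcast (map-side) join usually avoids shuffling the large side. A
minimal sketch, assuming the ORC output above sits at a hypothetical path
orcPath, the small DataFrame is smallDf, and both sides join on an id column
(all three names are assumptions):

import static org.apache.spark.sql.functions.broadcast;

import org.apache.spark.sql.DataFrame;

// Read the large ORC dataset back (orcPath is an assumed location).
DataFrame orcDf = hiveContext.read().format("orc").load(orcPath);

// Broadcasting the ~100-row DataFrame keeps the 500 million ORC rows from
// being shuffled; the join is performed map-side on each executor.
DataFrame joined = orcDf.join(broadcast(smallDf),
    orcDf.col("id").equalTo(smallDf.col("id")));

joined.show();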
I have 100 million records to be inserted into an HBase table (PHOENIX) as the
result of a Spark job. I would like to know: if I convert the result to a
DataFrame and save it, will it do a bulk load, or is that not an efficient way
to write data to an HBase table?
--
Thanks and Regards
Mohan
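*Phoenix DataFrame Write Sketch (illustrative)*
For reference, a DataFrame can be saved through the phoenix-spark connector as
sketched below (assuming the connector jar is on the classpath and that
resultDf, tableName and the "zkUrl" value are placeholders for your job). As
far as I understand, this path issues Phoenix UPSERTs rather than writing
HFiles, so it is not a classic HBase bulk load; the MapReduce-based Phoenix
CsvBulkLoadTool is the usual route for a true bulk load.

import org.apache.spark.sql.SaveMode;

// Write the result DataFrame to the Phoenix table via UPSERTs.
// The connector requires SaveMode.Overwrite for DataFrame writes.
resultDf.write()
    .format("org.apache.phoenix.spark")
    .mode(SaveMode.Overwrite)
    .option("table", tableName)
    .option("zkUrl", "zkHost:2181")
    .save();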
I have created a DataFrame from an HBase table (PHOENIX) which has 500 million
rows. From the DataFrame I created an RDD of JavaBeans and used it for joining
with data from a file.

Map<String, String> phoenixInfoMap = new HashMap<>();
phoenixInfoMap.put("table", tableName);
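*Phoenix Read / JavaBean RDD Sketch (illustrative)*
Roughly how the DataFrame and the JavaBean RDD can be built from the options
map above, as a hedged sketch: the format string is the phoenix-spark
connector, while the "zkUrl" value and the PersonBean class with its two
columns are hypothetical stand-ins for illustration.

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;

// ZooKeeper quorum for the Phoenix connection (assumed value).
phoenixInfoMap.put("zkUrl", "zkHost:2181");

// Load the Phoenix table as a DataFrame using the options built above.
DataFrame phoenixDf = hiveContext.read()
    .format("org.apache.phoenix.spark")
    .options(phoenixInfoMap)
    .load();

// Convert each Row into a (hypothetical) PersonBean for the downstream join.
JavaRDD<PersonBean> beans = phoenixDf.javaRDD()
    .map((Row row) -> new PersonBean(row.getLong(0), row.getString(1)));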