Re: Spark - “min key = null, max key = null” while reading ORC file

2016-06-20 Thread Mohanraj Ragupathiraj
> ...bucket it using id column. HTH. Dr Mich Talebzadeh. LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw ...

Re: Spark - “min key = null, max key = null” while reading ORC file

2016-06-20 Thread Mohanraj Ragupathiraj
> So I would bucket it using id column. HTH. Dr Mich Talebzadeh. LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Spark - “min key = null, max key = null” while reading ORC file

2016-06-20 Thread Mohanraj Ragupathiraj
> ...https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw, http://talebzadehmich.wordpress.com. On 20 June 2016 at 05:01, Mohanraj Ragupathiraj <mohanaug...@gmail.com> w...

Spark - “min key = null, max key = null” while reading ORC file

2016-06-19 Thread Mohanraj Ragupathiraj
I am trying to join a DataFrame (say 100 records) with an ORC file of 500 million records through Spark (this can grow to 4-5 billion records, about 25 bytes each). I used the Spark hiveContext API. *ORC File Creation Code* // fsdtRdd is a JavaRDD, fsdtSchema is a StructType schema DataFrame fsdtDf = ...
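
A minimal Java sketch (Spark 1.x hiveContext API) of the ORC creation and join described above; the post does not show how fsdtRdd and fsdtSchema are built or what the join key is, so the schema, the small lookup DataFrame, the paths, and the "id" join column below are assumptions:

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.hive.HiveContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class OrcJoinSketch {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("orc-join"));
    HiveContext hiveContext = new HiveContext(sc.sc());

    // Stand-ins for the poster's fsdtRdd / fsdtSchema (their construction is not shown).
    StructType fsdtSchema = DataTypes.createStructType(Arrays.asList(
        DataTypes.createStructField("id", DataTypes.LongType, false),
        DataTypes.createStructField("value", DataTypes.StringType, true)));
    JavaRDD<Row> fsdtRdd = sc.parallelize(Arrays.asList(
        RowFactory.create(1L, "a"),
        RowFactory.create(2L, "b")));

    // ORC file creation, as in the post.
    DataFrame fsdtDf = hiveContext.createDataFrame(fsdtRdd, fsdtSchema);
    fsdtDf.write().format("orc").save("/tmp/fsdt_orc");   // path is illustrative

    // Read the (in practice much larger) ORC file back and join it with the small DataFrame.
    DataFrame bigOrc = hiveContext.read().format("orc").load("/tmp/fsdt_orc");
    DataFrame small = hiveContext.createDataFrame(
        sc.parallelize(Arrays.asList(RowFactory.create(1L, "x"))),
        DataTypes.createStructType(Arrays.asList(
            DataTypes.createStructField("id", DataTypes.LongType, false),
            DataTypes.createStructField("extra", DataTypes.StringType, true))));
    DataFrame joined = bigOrc.join(small, bigOrc.col("id").equalTo(small.col("id")));
    joined.show();

    sc.stop();
  }
}

Since the lookup side is only around 100 records, wrapping it in org.apache.spark.sql.functions.broadcast(...) before the join is one common way to keep the 500-million-row ORC side from being shuffled.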

SPARK - DataFrame for BulkLoad

2016-05-17 Thread Mohanraj Ragupathiraj
I have 100 million records to insert into an HBase table (Phoenix) as the result of a Spark job. I would like to know: if I convert it to a DataFrame and save it, will that do a bulk load, or is it not an efficient way to write data to an HBase table? -- Thanks and Regards, Mohan
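
For reference, a hedged sketch of the usual phoenix-spark save path from Java (the table name and zkUrl below are placeholders); note that this route issues batched UPSERTs through the Phoenix client rather than an HBase bulk load:

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SaveMode;

// Assumes a DataFrame `df` whose columns match the Phoenix table and the
// phoenix-spark connector on the classpath; MY_TABLE and zkhost:2181 are illustrative.
public class PhoenixSaveSketch {
  public static void saveToPhoenix(DataFrame df) {
    df.write()
      .format("org.apache.phoenix.spark")
      .mode(SaveMode.Overwrite)           // phoenix-spark supports only Overwrite; rows are UPSERTed
      .option("table", "MY_TABLE")
      .option("zkUrl", "zkhost:2181")
      .save();
  }
}

A true bulk load would instead generate HFiles (for example with Phoenix's CsvBulkLoadTool or HBase's HFileOutputFormat2) and hand them to LoadIncrementalHFiles.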

Load Table as DataFrame

2016-05-17 Thread Mohanraj Ragupathiraj
I have created a DataFrame from an HBase table (Phoenix) which has 500 million rows. From the DataFrame I created an RDD of JavaBeans and use it for joining with data from a file. Map phoenixInfoMap = new HashMap(); phoenixInfoMap.put("table", tableName); ...
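
A minimal sketch completing the truncated snippet above with the phoenix-spark read path; the zkUrl value is a placeholder, and the join against the file data is not reproduced because the original message is cut off:

import java.util.HashMap;
import java.util.Map;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class PhoenixLoadSketch {
  // Load the Phoenix table as a DataFrame via the phoenix-spark connector.
  // tableName comes from the post; "zkhost:2181" is illustrative.
  public static DataFrame loadPhoenixTable(SQLContext sqlContext, String tableName) {
    Map<String, String> phoenixInfoMap = new HashMap<String, String>();
    phoenixInfoMap.put("table", tableName);       // Phoenix table to read
    phoenixInfoMap.put("zkUrl", "zkhost:2181");   // ZooKeeper quorum of the HBase cluster
    return sqlContext.read()
        .format("org.apache.phoenix.spark")
        .options(phoenixInfoMap)
        .load();
  }
}

Keeping the join at the DataFrame level generally lets Spark push column pruning and simple filters down to Phoenix, which converting to an RDD of JavaBeans gives up.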
