Do you have the full source code? Why do you convert the DataFrame to an RDD? That step does
not make sense to me.
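
If the goal is just to join the Phoenix table with the file on the key column, you could stay in
the DataFrame API and let Spark plan the join itself. A rough sketch below, reusing the tableName,
zkURL and sqlContext variables from your snippet; the "ID" column name, the CSV format of the file,
the filePath variable and the spark-csv package are assumptions on my side, not something from your mail:

        Map<String, String> phoenixInfoMap = new HashMap<String, String>();
        phoenixInfoMap.put("table", tableName);
        phoenixInfoMap.put("zkUrl", zkURL);

        // Select only the columns needed for the join, so the Phoenix connector can prune the rest.
        DataFrame tableDf = sqlContext.read()
                .format("org.apache.phoenix.spark")
                .options(phoenixInfoMap)
                .load()
                .select("ID", "NAME");

        // Assuming the file is CSV with a header row and the spark-csv package is on the classpath.
        DataFrame fileDf = sqlContext.read()
                .format("com.databricks.spark.csv")
                .option("header", "true")
                .load(filePath);

        // Let Catalyst plan the join instead of building an intermediate pair RDD of the whole table.
        DataFrame joined = tableDf.join(fileDf, tableDf.col("ID").equalTo(fileDf.col("ID")));

That way only the projected columns flow through the join, and you also keep the option of filtering
the Phoenix side before joining instead of materialising all 500 million rows yourself.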

> On 18 May 2016, at 06:13, Mohanraj Ragupathiraj <mohanaug...@gmail.com> wrote:
> 
> I have created a DataFrame from an HBase table (Phoenix) which has 500 million 
> rows. From the DataFrame I created an RDD of a JavaBean and used it for joining 
> with data from a file.
> 
>       Map<String, String> phoenixInfoMap = new HashMap<String, String>();
>       phoenixInfoMap.put("table", tableName);
>       phoenixInfoMap.put("zkUrl", zkURL);
>       DataFrame df = sqlContext.read()
>               .format("org.apache.phoenix.spark")
>               .options(phoenixInfoMap)
>               .load();
>       JavaRDD<Row> tableRows = df.toJavaRDD();
>       JavaPairRDD<String, String> dbData = tableRows.mapToPair(
>               new PairFunction<Row, String, String>()
>               {
>                       @Override
>                       public Tuple2<String, String> call(Row row) throws Exception
>                       {
>                               return new Tuple2<String, String>(row.getAs("ID"), row.getAs("NAME"));
>                       }
>               });
>  
> Now my question: let's say the file has 2 million unique entries matching 
> the table. Will the entire table be loaded into memory as an RDD, or will 
> only the matching 2 million records from the table be loaded into memory as an RDD?
> 
> 
> http://stackoverflow.com/questions/37289849/phoenix-spark-load-table-as-dataframe
> 
> -- 
> Thanks and Regards
> Mohan
