Do you have the full source code? Why do you convert the DataFrame to an RDD? That does not make sense to me.
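
For reference, here is a minimal sketch of doing the join at the DataFrame level instead. It assumes the file side has already been loaded into a DataFrame called fileDf with an ID column (untested, Spark 1.6-era API as in your snippet):

    // Same Phoenix options as in your code; select() prunes columns at the source
    DataFrame phoenixDf = sqlContext.read()
        .format("org.apache.phoenix.spark")
        .options(phoenixInfoMap)
        .load()
        .select("ID", "NAME");

    // fileDf is assumed: the file contents loaded as a DataFrame with an ID column
    DataFrame joined = phoenixDf.join(fileDf,
        phoenixDf.col("ID").equalTo(fileDf.col("ID")));

Staying in DataFrames lets Catalyst plan the join and spares you the manual bean conversion, whereas toJavaRDD() forces every row through the mapToPair step before the join even starts.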
> On 18 May 2016, at 06:13, Mohanraj Ragupathiraj <mohanaug...@gmail.com> wrote:
>
> I have created a DataFrame from an HBase table (Phoenix) which has 500 million
> rows. From the DataFrame I created an RDD of JavaBeans and use it for joining
> with data from a file.
>
> Map<String, String> phoenixInfoMap = new HashMap<String, String>();
> phoenixInfoMap.put("table", tableName);
> phoenixInfoMap.put("zkUrl", zkURL);
> DataFrame df = sqlContext.read()
>     .format("org.apache.phoenix.spark")
>     .options(phoenixInfoMap)
>     .load();
> JavaRDD<Row> tableRows = df.toJavaRDD();
> JavaPairRDD<String, String> dbData = tableRows.mapToPair(
>     new PairFunction<Row, String, String>()
>     {
>         @Override
>         public Tuple2<String, String> call(Row row) throws Exception
>         {
>             return new Tuple2<String, String>(row.getAs("ID"),
>                 row.getAs("NAME"));
>         }
>     });
>
> Now my question - let's say the file has 2 million unique entries matching
> the table. Is the entire table loaded into memory as an RDD, or will only
> the matching 2 million records from the table be loaded into memory as an RDD?
>
> http://stackoverflow.com/questions/37289849/phoenix-spark-load-table-as-dataframe
>
> --
> Thanks and Regards
> Mohan