If all you’re doing is dumping tables from SQL Server to HDFS, have you 
looked at Sqoop?

Otherwise, if you need to run this in Spark, could you just use the existing 
JdbcRDD?
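
A rough sketch of why JdbcRDD may fit here: it takes a query with two `?` placeholders, a lower and an upper bound on a numeric key, and a partition count. Each partition then runs the query over its own slice of the key range and iterates over the result set, so no single task has to hold a whole table in memory. The plain-Java snippet below only illustrates that range splitting. As far as I recall it mirrors the arithmetic JdbcRDD uses internally, but treat that as an assumption. `KeyRangeSplitter` and the `id` column are hypothetical names made up for illustration.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark: shows how an inclusive key range
// can be cut into contiguous per-partition sub-ranges.
final class KeyRangeSplitter {

    /**
     * Splits the inclusive range [lower, upper] into numPartitions contiguous,
     * inclusive {start, end} pairs. If numPartitions exceeds the number of keys,
     * some pairs are empty (end == start - 1).
     */
    static List<long[]> split(long lower, long upper, int numPartitions) {
        if (numPartitions <= 0 || upper < lower) {
            throw new IllegalArgumentException("need numPartitions > 0 and lower <= upper");
        }
        // BigInteger avoids overflow when the bounds are close to Long.MAX_VALUE.
        BigInteger lo = BigInteger.valueOf(lower);
        BigInteger length = BigInteger.valueOf(upper).subtract(lo).add(BigInteger.ONE);
        BigInteger n = BigInteger.valueOf(numPartitions);

        List<long[]> ranges = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            BigInteger start = lo.add(BigInteger.valueOf(i).multiply(length).divide(n));
            BigInteger end = lo.add(BigInteger.valueOf(i + 1L).multiply(length).divide(n))
                               .subtract(BigInteger.ONE);
            ranges.add(new long[] {start.longValueExact(), end.longValueExact()});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Assumed query shape: one bounded query per partition on a numeric key column.
        String sql = "SELECT * FROM dbname.tablename WHERE id >= ? AND id <= ?";
        for (long[] r : split(1, 100, 4)) {
            System.out.println(sql + "  bound to (" + r[0] + ", " + r[1] + ")");
        }
    }
}
```

With Spark on the classpath, the same bounds and partition count would be passed to JdbcRDD (there is a Java-friendly `JdbcRDD.create`) together with a connection factory and a row-mapping function, once per table. This assumes each table has a reasonably evenly distributed numeric key to partition on.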


From: Shushant Arora
Date: Wednesday, July 1, 2015 at 10:19 AM
To: user
Subject: custom RDD in java

Hi

Is it possible to write a custom RDD in Java?

The requirement: I have a list of SQL Server tables that need to be dumped to 
HDFS.

So I have a
List<String> tables = Arrays.asList("dbname.tablename", "dbname.tablename2" /* ...... */);

then
JavaRDD<String> rdd = javasparkcontext.parallelize(tables);

// flatMap rather than map, so the result is a JavaRDD<String> of rows
JavaRDD<String> tablecontent = rdd.flatMap(new 
FlatMapFunction<String, String>() { /* fetch table and return populated iterable */ });

tablecontent.saveAsTextFile("hdfs path");


Inside that function I cannot keep the complete table contents in memory, so 
I want to create my own RDD to handle it.

Thanks
Shushant





