Re: custom RDD in java
If all you’re doing is dumping tables from SQL Server to HDFS, have you looked at Sqoop? Otherwise, if you need to run this in Spark, could you just use the existing JdbcRDD?

From: Shushant Arora
Date: Wednesday, July 1, 2015 at 10:19 AM
To: user
Subject: custom RDD in java

Hi,

Is it possible to write a custom RDD in Java? The requirement is: I have a list of SQL Server tables that need to be dumped into HDFS. So I have

    List<String> tables = Arrays.asList("dbname.tablename", "dbname.tablename2", ...);
    JavaRDD<String> rdd = javaSparkContext.parallelize(tables);
    JavaRDD<String> tablecontent = rdd.map(
        new Function<String, Iterable<String>>() {
            // fetch the table and return its rows as an Iterable
        });
    tablecontent.saveAsTextFile(hdfsPath);

Inside rdd.map(new Function<String, ...>) I cannot keep the complete table content in memory, so I want to create my own RDD to handle it.

Thanks,
Shushant
Re: custom RDD in java
The list of tables is not large. The RDD is created over the table list to parallelize the work of fetching the tables in multiple mappers at the same time; since the time taken to fetch one table is significant, I can't run the fetches sequentially. The content of a table fetched by a map task is large, so one option is to dump content to HDFS using the FileSystem API from inside the map function, every few rows fetched. I cannot keep a complete table in memory and then dump it to HDFS with a map function like the one below:

    JavaRDD<String> tablecontent = tablelistrdd.map(
        new Function<String, Iterable<String>>() {
            public Iterable<String> call(String tablename) {
                // make a JDBC connection, fetch the table data,
                // populate a list, and return it
            }
        });
    tablecontent.saveAsTextFile(hdfspath);

Here I wanted to create a custom RDD whose partitions would live in memory on multiple executors and hold parts of the table data, and I would have called saveAsTextFile on that custom RDD directly to save to HDFS.

On Thu, Jul 2, 2015 at 12:59 AM, Feynman Liang fli...@databricks.com wrote:

    On Wed, Jul 1, 2015 at 7:19 AM, Shushant Arora shushantaror...@gmail.com wrote:
        JavaRDD<String> rdd = javaSparkContext.parallelize(tables);

    You are already creating an RDD in Java here ;) However, it's not clear to me why you'd want to make this an RDD. Is the list of tables so large that it doesn't fit on a single machine? If not, you may be better off spinning up one Spark job for dumping each table in `tables` using a JDBC datasource: https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases
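The memory problem in the map function above has a simpler fix than a custom RDD: instead of materializing all rows into a List before returning, expose the rows as a lazy Iterator and flush them downstream in small batches. Below is a minimal plain-Java sketch of that pattern, with no Spark or Hadoop dependencies; `rows` stands in for a JDBC cursor and `batch.clear()` stands in for an HDFS write, so all names here are illustrative, not the actual Spark/Hadoop API:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class LazyTableDump {

    // Simulates a large table: rows are generated on demand,
    // never all held in memory at once (a JDBC ResultSet with a
    // fetch size behaves the same way).
    static Iterator<String> rows(String table, int rowCount) {
        return new Iterator<String>() {
            private int i = 0;
            public boolean hasNext() { return i < rowCount; }
            public String next() { return table + ",row-" + (i++); }
        };
    }

    // Consumes the iterator in fixed-size batches, flushing each
    // batch before reading the next -- the same shape as writing
    // every few rows to HDFS from inside a map function.
    // Returns the number of flushes performed.
    static int dumpInBatches(Iterator<String> rows, int batchSize) {
        List<String> batch = new ArrayList<>(batchSize);
        int flushes = 0;
        while (rows.hasNext()) {
            batch.add(rows.next());
            if (batch.size() == batchSize) {
                batch.clear();   // stand-in for writer.write(batch); writer.flush();
                flushes++;
            }
        }
        if (!batch.isEmpty()) {
            flushes++;           // flush the final partial batch
        }
        return flushes;
    }

    public static void main(String[] args) {
        // 1000 rows flushed 100 at a time: at most 100 rows in memory.
        System.out.println("flushes=" + dumpInBatches(rows("dbname.tablename", 1000), 100));
    }
}
```

Inside Spark the same shape appears as returning a lazy Iterable from flatMap, or as mapPartitions writing each batch through the Hadoop FileSystem API, so only batchSize rows are ever held per task.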
Re: custom RDD in java
Sure, you can create custom RDDs. Haven’t done so in Java, but in Scala, absolutely.
Re: custom RDD in java
AFAIK, RDDs can only be created on the driver, not on the executors. Also, `saveAsTextFile(...)` is an action and hence can also only be invoked from the driver. As Silvio already mentioned, Sqoop may be a good option.
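Feynman's JDBC-datasource suggestion addresses the other half of the problem: parallelizing the read of a single large table. The datasource (like JdbcRDD before it) does this by slicing a numeric column's [lowerBound, upperBound] range into numPartitions WHERE predicates, one per task. Here is a simplified plain-Java sketch of that idea; the stride arithmetic and predicate strings are illustrative, not Spark's exact internals:

```java
import java.util.ArrayList;
import java.util.List;

class JdbcPartitions {

    // Splits the closed range [lower, upper] of a numeric column into
    // numPartitions WHERE predicates; the last partition absorbs any
    // remainder so the whole range is covered exactly once.
    static List<String> predicates(String col, long lower, long upper, int numPartitions) {
        List<String> preds = new ArrayList<>();
        long stride = (upper - lower + 1) / numPartitions;
        long start = lower;
        for (int i = 0; i < numPartitions; i++) {
            long end = (i == numPartitions - 1) ? upper : start + stride - 1;
            preds.add(col + " >= " + start + " AND " + col + " <= " + end);
            start = end + 1;
        }
        return preds;
    }

    public static void main(String[] args) {
        // Four parallel reads over ids 1..100, one predicate per task.
        for (String p : predicates("id", 1, 100, 4)) {
            System.out.println(p);
        }
    }
}
```

In Spark's actual API this corresponds to the partitionColumn/lowerBound/upperBound/numPartitions options of the JDBC reader, which let each executor pull only its slice of the table instead of one task fetching everything.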
Re: custom RDD in java
OK, will evaluate these options. But is it possible to create a custom RDD in Java?