Re: custom RDD in java

2015-07-01 Thread Silvio Fiorito
If all you’re doing is just dumping tables from SQLServer to HDFS, have you 
looked at Sqoop?

Otherwise, if you need to run this in Spark, could you just use the existing
JdbcRDD?
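
For reference, a rough sketch of what the JdbcRDD route could look like from Java (untested; JdbcRDD.create is available from Spark 1.3 on, and the connection URL, query, id column, bounds and output path below are all placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.rdd.JdbcRDD;

    public class JdbcRddDump {
      public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local[*]", "jdbc-dump");

        // JdbcRDD splits the query on a numeric column via the two '?' placeholders,
        // so each partition fetches only its own slice of rows.
        JavaRDD<String> rows = JdbcRDD.create(
            sc,
            new JdbcRDD.ConnectionFactory() {
              public Connection getConnection() throws Exception {
                return DriverManager.getConnection(
                    "jdbc:sqlserver://host;databaseName=dbname"); // placeholder URL
              }
            },
            "SELECT * FROM dbname.tablename WHERE id >= ? AND id <= ?",
            1L, 1000000L,  // placeholder bounds of the partition column
            10,            // number of partitions
            new Function<ResultSet, String>() {
              public String call(ResultSet rs) throws Exception {
                return rs.getString(1); // format the row however you need
              }
            });

        rows.saveAsTextFile("hdfs:///dump/dbname.tablename"); // placeholder path
        sc.stop();
      }
    }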


From: Shushant Arora
Date: Wednesday, July 1, 2015 at 10:19 AM
To: user
Subject: custom RDD in java

Hi

Is it possible to write a custom RDD in Java?

Requirement is: I have a list of SQL Server tables that need to be dumped into
HDFS.

So I have a
List<String> tables = Arrays.asList("dbname.tablename", "dbname.tablename2", ...);

then

JavaRDD<String> rdd = javasparkcontext.parallelize(tables);

JavaRDD<Iterable<String>> tablecontent = rdd.map(new Function<String, Iterable<String>>() {
    public Iterable<String> call(String tablename) {
        // fetch the table and return the populated iterable
    }
});

tablecontent.saveAsTextFile(hdfspath);


Inside rdd.map(new Function<String, ...>) I cannot keep the complete table content in
memory, so I want to create my own RDD to handle it.

Thanks
Shushant








Re: custom RDD in java

2015-07-01 Thread Shushant Arora
The list of tables is not large; the RDD is created on the table list to parallelize
the work of fetching tables in multiple mappers at the same time. Since the time
taken to fetch a table is significant, I can't run that sequentially.


The content of a table fetched by a map task is large, so one option is to dump the
content to HDFS using the filesystem API from inside the map function, writing out
every few rows of the table as they are fetched.
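
A rough sketch of what that option could look like (not from the thread and untested; the DumpTable class name, JDBC URL and output directory are made up), writing each table to HDFS from inside the workers with the Hadoop FileSystem API so that only one row at a time is buffered:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.spark.api.java.function.VoidFunction;

    // on the driver: tablelistrdd.foreach(new DumpTable());
    public class DumpTable implements VoidFunction<String> {
      public void call(String tablename) throws Exception {
        Connection conn = DriverManager.getConnection(
            "jdbc:sqlserver://host;databaseName=dbname");        // placeholder URL
        // assumes the executors' Hadoop configuration points at the target HDFS
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataOutputStream out = fs.create(new Path("/dump/" + tablename)); // placeholder dir
        try {
          Statement stmt = conn.createStatement();
          ResultSet rs = stmt.executeQuery("SELECT * FROM " + tablename);
          int cols = rs.getMetaData().getColumnCount();
          StringBuilder line = new StringBuilder();
          while (rs.next()) {
            line.setLength(0);
            for (int i = 1; i <= cols; i++) {
              if (i > 1) line.append('\t');
              line.append(rs.getString(i));
            }
            line.append('\n');
            // stream each row straight to HDFS instead of collecting the table in memory
            out.write(line.toString().getBytes("UTF-8"));
          }
        } finally {
          out.close();
          conn.close();
        }
      }
    }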

I cannot keep the complete table in memory and then dump it to HDFS using the map
function below:

JavaRDD<Iterable<String>> tablecontent = tablelistrdd.map(
    new Function<String, Iterable<String>>() {
        public Iterable<String> call(String tablename) {
            // make a JDBC connection, fetch the table data into a list, and return it
        }
    });
tablecontent.saveAsTextFile(hdfspath);

Here I wanted to create a custom RDD whose partitions would be in memory on
multiple executors and would contain parts of the table data. I would then have
called saveAsTextFile on the custom RDD directly to save the data in HDFS.



On Thu, Jul 2, 2015 at 12:59 AM, Feynman Liang fli...@databricks.com
wrote:


 On Wed, Jul 1, 2015 at 7:19 AM, Shushant Arora shushantaror...@gmail.com
  wrote:

 JavaRDD<String> rdd = javasparkcontext.parallelize(tables);


 You are already creating an RDD in Java here ;)

 However, it's not clear to me why you'd want to make this an RDD. Is the
 list of tables so large that it doesn't fit on a single machine? If not,
 you may be better off spinning up one Spark job for dumping each table in
 tables using the JDBC data source
 (https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases).
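
A minimal Java sketch of that JDBC data source route (untested; assumes Spark 1.4+, and the URL, credentials, table names and output paths are placeholders):

    import java.util.Properties;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;

    public class JdbcDataSourceDump {
      public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("jdbc-dump"));
        SQLContext sqlContext = new SQLContext(sc);

        Properties props = new Properties();
        props.setProperty("user", "username");      // placeholders
        props.setProperty("password", "password");

        String[] tables = {"dbname.tablename", "dbname.tablename2"};
        for (String table : tables) {
          // Spark reads the table over JDBC; an overload of jdbc() also takes a
          // partition column and bounds if a single partition per table is too slow.
          DataFrame df = sqlContext.read().jdbc(
              "jdbc:sqlserver://host;databaseName=dbname", table, props);
          // Parquet keeps the schema; df.toJavaRDD() plus saveAsTextFile works for plain text.
          df.write().parquet("hdfs:///dump/" + table);
        }
        sc.stop();
      }
    }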




Re: custom RDD in java

2015-07-01 Thread Silvio Fiorito
Sure, you can create custom RDDs. Haven’t done so in Java, but in Scala 
absolutely.
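
For what it's worth, a very rough, untested sketch of what subclassing RDD directly from Java could look like in Spark 1.x (TableDumpRDD and TablePartition are made-up names); the ClassTag and Scala-iterator plumbing is the main reason this is usually done in Scala:

    import java.util.Arrays;

    import org.apache.spark.Dependency;
    import org.apache.spark.Partition;
    import org.apache.spark.SparkContext;
    import org.apache.spark.TaskContext;
    import org.apache.spark.rdd.RDD;
    import scala.collection.Iterator;
    import scala.collection.JavaConverters;
    import scala.collection.mutable.ArrayBuffer;
    import scala.reflect.ClassTag$;

    public class TableDumpRDD extends RDD<String> {

      // one partition per table name
      public static class TablePartition implements Partition {
        private final int idx;
        public final String table;
        public TablePartition(int idx, String table) { this.idx = idx; this.table = table; }
        public int index() { return idx; }
      }

      private final String[] tables;

      public TableDumpRDD(SparkContext sc, String[] tables) {
        // no parent RDDs, so an empty dependency list; the ClassTag is required
        // because RDD is a Scala class
        super(sc, new ArrayBuffer<Dependency<?>>(), ClassTag$.MODULE$.<String>apply(String.class));
        this.tables = tables;
      }

      @Override
      public Partition[] getPartitions() {
        Partition[] parts = new Partition[tables.length];
        for (int i = 0; i < tables.length; i++) {
          parts[i] = new TablePartition(i, tables[i]);
        }
        return parts;
      }

      @Override
      public Iterator<String> compute(Partition split, TaskContext context) {
        String table = ((TablePartition) split).table;
        // a real implementation would open a JDBC connection here and return an
        // iterator that streams rows; this sketch just returns a placeholder row
        java.util.Iterator<String> rows = Arrays.asList("rows of " + table).iterator();
        return JavaConverters.asScalaIteratorConverter(rows).asScala();
      }
    }

It could then be wrapped for the Java API with something like JavaRDD.fromRDD(new TableDumpRDD(jsc.sc(), tables), ClassTag$.MODULE$.<String>apply(String.class)), but given the boilerplate, JdbcRDD or Sqoop are likely the simpler path.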

From: Shushant Arora
Date: Wednesday, July 1, 2015 at 1:44 PM
To: Silvio Fiorito
Cc: user
Subject: Re: custom RDD in java

ok.. will evaluate these options, but is it possible to create an RDD in Java?











Re: custom RDD in java

2015-07-01 Thread Feynman Liang
AFAIK RDDs can only be created on the driver, not the executors. Also,
`saveAsTextFile(...)` is an action and hence can also only be invoked from
the driver.

As Silvio already mentioned, Sqoop may be a good option.
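
None of the replies spell out a concrete workaround, but as an illustration of the constraint above (the RDD of table names stays driver-created and saveAsTextFile stays a driver-side action), here is one untested sketch that avoids both a custom RDD and an in-memory list by streaming rows lazily out of a flatMap; the StreamTableRows name and the connection URL are made up:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    import org.apache.spark.api.java.function.FlatMapFunction;

    // on the driver (tablelistrdd is the JavaRDD<String> of table names):
    //   JavaRDD<String> rows = tablelistrdd.flatMap(new StreamTableRows());
    //   rows.saveAsTextFile("hdfs:///dump");   // the action is still invoked on the driver
    public class StreamTableRows implements FlatMapFunction<String, String> {

      public Iterable<String> call(final String tablename) {
        // return an Iterable whose iterator pulls rows from the JDBC ResultSet one at
        // a time, so an executor never materialises a whole table in memory
        return new Iterable<String>() {
          public java.util.Iterator<String> iterator() {
            try {
              final Connection conn = DriverManager.getConnection(
                  "jdbc:sqlserver://host;databaseName=dbname");   // placeholder URL
              Statement stmt = conn.createStatement();
              final ResultSet rs = stmt.executeQuery("SELECT * FROM " + tablename);
              return new java.util.Iterator<String>() {
                private String nextRow = advance();
                private String advance() {
                  try {
                    if (rs.next()) {
                      return rs.getString(1);   // format the row however you need
                    }
                    conn.close();
                    return null;
                  } catch (SQLException e) { throw new RuntimeException(e); }
                }
                public boolean hasNext() { return nextRow != null; }
                public String next() { String row = nextRow; nextRow = advance(); return row; }
                public void remove() { throw new UnsupportedOperationException(); }
              };
            } catch (SQLException e) {
              throw new RuntimeException(e);
            }
          }
        };
      }
    }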




Re: custom RDD in java

2015-07-01 Thread Shushant Arora
ok.. will evaluate these options, but is it possible to create an RDD in Java?

