I would really appreciate it if someone could help me with this. On Monday, June 15, 2015, Mohammad Tariq <donta...@gmail.com> wrote:
> Hello list,
>
> The method *insertIntoJDBC(url: String, table: String, overwrite: Boolean)* provided by Spark DataFrame allows us to copy a DataFrame into a JDBC DB table. Similar functionality is provided by the *createJDBCTable(url: String, table: String, allowExisting: Boolean)* method. But if you look at the docs, they say that *createJDBCTable* runs a *bunch of Insert statements* in order to copy the data, while the docs of *insertIntoJDBC* don't have any such statement.
>
> Could someone please shed some light on this? How exactly does data get inserted by the *insertIntoJDBC* method?
>
> And if it is the same as the *createJDBCTable* method, then what exactly does *bunch of Insert statements* mean? What is the criterion for deciding the number of *inserts per bunch*? How are these bunches generated?
>
> *An example* could be creating a DataFrame by reading all the files stored in a given directory. If I just do *DataFrame.save()*, it will create the same number of output files as there are input files. What will happen in the case of *DataFrame.insertIntoJDBC()*?
>
> I'm really sorry to be a pest with questions, but I could not get much help by Googling this.
>
> Thank you so much for your valuable time. I really appreciate it.
>
> Tariq, Mohammad
> about.me/mti
> <http://about.me/mti>

--
Tariq, Mohammad
about.me/mti
<http://about.me/mti>
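For what it's worth, a "bunch of Insert statements" in a JDBC writer usually means that each partition's rows are buffered and flushed to the database in batches rather than one statement per row. Below is a minimal conceptual sketch of that pattern in Python using sqlite3 as a stand-in for a JDBC connection; it is *not* Spark's actual implementation, and the `batch_size` knob and `save_table` helper are hypothetical names for illustration only.

```python
import sqlite3

def save_table(rows, conn, table, batch_size=1000):
    """Insert rows in batches, roughly how a JDBC-style writer
    groups inserts per partition. `batch_size` is a hypothetical
    knob for this sketch; Spark's batching is internal to its
    JDBC writer and not exposed by insertIntoJDBC itself."""
    cur = conn.cursor()
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            # Flush one "bunch" of inserts to the database.
            cur.executemany(f"INSERT INTO {table} VALUES (?, ?)", batch)
            batch = []
    if batch:
        # Flush the final partial bunch.
        cur.executemany(f"INSERT INTO {table} VALUES (?, ?)", batch)
    conn.commit()

# Demo: 2500 rows land in the table as three bunches (1000 + 1000 + 500).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT)")
rows = [(i, f"name{i}") for i in range(2500)]
save_table(rows, conn, "people", batch_size=1000)
count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
print(count)  # 2500
```

Under this reading, the number of "bunches" is driven by the row count per partition divided by the batch size, not by the number of input files, which is why the output behaves differently from *DataFrame.save()*.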