> In case a table has a few million records, it all goes through the driver.
This sounds right for JDBC mode: the driver fetches all the rows and then spreads the resulting RDD over the executors. I'd say most use cases run SQL to aggregate huge datasets and then retrieve a small number of rows to be transformed for ML tasks. In that case, going through JDBC offers the robustness of Hive to produce a small aggregated dataset inside Spark, whereas Spark SQL uses RDDs to produce the small result from the huge table.

What is still not clear to me is how Spark SQL deals with a huge Hive table. Does it load everything into memory and crash, or can that never happen?
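For what it's worth, here is a minimal sketch of the two paths being compared. The table name, schema, and HiveServer2 URL are made up for illustration; the point is only where the heavy data lives in each case:

```scala
import org.apache.spark.sql.SparkSession

object HiveVsJdbcSketch {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() lets Spark SQL read Hive tables directly:
    // executors read the underlying files in parallel, so the full
    // table never funnels through the driver.
    val spark = SparkSession.builder()
      .appName("hive-vs-jdbc-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Spark SQL path: the aggregation over the huge table is executed
    // on the executors; only the small aggregated result would ever be
    // collected to the driver.
    val viaSparkSql = spark.sql(
      """SELECT customer_id, sum(amount) AS total
        |FROM sales.transactions   -- hypothetical Hive table
        |GROUP BY customer_id""".stripMargin)

    // JDBC path: HiveServer2 runs the aggregation itself, and the
    // JDBC connection streams back only the (small) result set.
    val viaJdbc = spark.read
      .format("jdbc")
      .option("url", "jdbc:hive2://hiveserver:10000/sales") // hypothetical host
      .option("query",
        "SELECT customer_id, sum(amount) AS total FROM transactions GROUP BY customer_id")
      .load()

    viaSparkSql.show(10)
    viaJdbc.show(10)
    spark.stop()
  }
}
```

In the first path the "huge to small" reduction happens inside Spark; in the second it happens inside Hive, and Spark only ever sees the small result.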