Joining HDFS and JDBC data sources - benchmarks

Eran Medan Fri, 13 Nov 2015 08:57:48 -0800

Hi,

I'm looking for benchmarks on joining DataFrames where most of the data
is in HDFS (e.g. in Parquet) and some "reference" or "metadata" tables are
still in an RDBMS. I am only interested in the very first join, before any
caching happens. I assume some parallelism will be lost, because JDBCRDD
is probably bottlenecked by the maximum number of parallel connections the
database server can hold.


Has anyone run measurements or benchmarks of this kind of join?
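
To make the connection limit concrete, here is a minimal sketch in plain
Python (not Spark itself) of how a numeric key range could be cut into at
most as many WHERE clauses as the database allows connections. The helper
name split_range, the column name "id", and the bounds are all hypothetical.
As far as I understand, a list like this can be passed as the predicates
argument of DataFrameReader.jdbc, where each predicate defines one partition
of the resulting DataFrame, so the list length caps how many JDBC reads can
run at once.

```python
def split_range(lower, upper, max_connections, column="id"):
    """Split [lower, upper) into at most max_connections WHERE clauses.

    Hypothetical helper: `column` is assumed to be an integer key on the
    RDBMS table, and `max_connections` is whatever limit the server allows.
    """
    if max_connections < 1:
        raise ValueError("max_connections must be at least 1")
    if upper <= lower:
        raise ValueError("upper must be greater than lower")

    span = upper - lower
    parts = min(max_connections, span)   # never more clauses than key values
    stride = -(-span // parts)           # ceiling division

    predicates = []
    start = lower
    while start < upper:
        end = min(start + stride, upper)
        predicates.append(f"{column} >= {start} AND {column} < {end}")
        start = end
    return predicates


# Example: a server assumed to allow 4 parallel connections.
preds = split_range(0, 100, 4)

# Assumed usage with PySpark (needs a running cluster and database, so it
# is left as comments; url, table name, path and props are placeholders):
#   ref = spark.read.jdbc(url, "ref_table", predicates=preds, properties=props)
#   facts = spark.read.parquet(parquet_path)
#   joined = facts.join(ref, "id")
```

With this shape, the Parquet side can still be read with as many tasks as
there are splits, while the JDBC side is read with at most len(preds)
concurrent queries. That first-join cost is what I would like to see
measured.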
