Hi
I'm looking for benchmarks on joining DataFrames where most of the data is
in HDFS (e.g. in Parquet) and some "reference" or "metadata" tables are
still in an RDBMS. I'm only interested in the very first join, before any
caching happens. I assume there will be a loss of parallelization, because
JDBCRDD is probably bottlenecked by the maximum number of parallel
connections the database server can handle.
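
For context on that assumption: as far as I understand Spark's JDBC source
(the `partitionColumn`, `lowerBound`, `upperBound` and `numPartitions`
options), a partitioned read issues one range query per partition, each over
its own connection. Below is a simplified approximation of that splitting.
It is not Spark's actual code, and the function names are hypothetical.

```python
def jdbc_partition_predicates(column, lower_bound, upper_bound, num_partitions):
    """Approximate the WHERE clauses of a range-partitioned JDBC read.

    Hypothetical, simplified sketch: each predicate becomes one query and
    (typically) one database connection. The bounds only decide the stride;
    the first and last ranges are open-ended, so no rows are filtered out.
    """
    # Never create more partitions than there are values in the range.
    num_partitions = min(num_partitions, upper_bound - lower_bound)
    if num_partitions <= 1:
        return [None]  # a single, unpartitioned query
    stride = (upper_bound - lower_bound) // num_partitions
    predicates = []
    current = lower_bound
    for i in range(num_partitions):
        lo = current
        current += stride
        if i == 0:
            # First range also picks up NULLs in the partition column.
            predicates.append(f"{column} < {current} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Last range is unbounded above.
            predicates.append(f"{column} >= {lo}")
        else:
            predicates.append(f"{column} >= {lo} AND {column} < {current}")
    return predicates


def effective_parallelism(num_partitions, max_db_connections, executor_cores):
    # Hypothetical helper: concurrent JDBC queries are bounded by the
    # smallest of these three limits.
    return min(num_partitions, max_db_connections, executor_cores)
```

For example, `jdbc_partition_predicates("id", 0, 100, 4)` yields four range
queries, but if the database only accepts 20 concurrent connections, asking
for 64 partitions on 128 cores still gives at most 20 parallel reads.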

Has anyone done any measurements or benchmarks on this?
