Hello,
As is often the case, I have a big table sitting on a JDBC data source. I am
processing that data from Spark, but in order to speed up my analysis, I
first reduce the column encodings and minimize the overall size of the
data before the main processing.
Spark has been doing a great job of generating the workflows that do
that preprocessing for me, but it seems to schedule them for execution
on the Spark cluster. The issue with that is the large transfer cost
from the source is still incurred.
Is there any way to force Spark to push the preprocessing down to the
JDBC data source and return the prepared DataFrame instead?
Thanks,
Wanas
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org