I was hoping to use Spark in this instance to generate that
intermediate SQL as part of its workflow, as a database-independent
way of doing my preprocessing.
Is there any way to capture the generated SQL from Catalyst?
If so I would just use
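For reference, Catalyst does not expose the generated source SQL directly, but the plans it produces (including which filters were pushed down to the JDBC source) can be inspected on any DataFrame. A minimal sketch, assuming a SparkSession `spark` and illustrative connection details:

```scala
import org.apache.spark.sql.functions.col

// Hypothetical JDBC read; url and table name are assumptions.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://dbhost/mydb")
  .option("dbtable", "mytable")
  .load()
  .filter(col("status") === "active")

// Prints the parsed, analyzed, optimized, and physical plans; the
// JDBCRelation scan in the physical plan lists the pushed filters.
df.explain(true)

// The optimized logical plan is also available programmatically:
println(df.queryExecution.optimizedPlan)
```

This shows what Catalyst decided to push to the source, but not a complete SQL statement for the whole query.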
Could you create a view of the table on your JDBC data source and just query
that from Spark?
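If creating a view on the database side is not an option, a similar effect can be had by handing Spark a subquery as the table, since the JDBC `dbtable` option accepts a parenthesized SELECT. A sketch, with connection details and column names as assumptions:

```scala
// The subquery runs on the database itself, so the encoding happens
// before Spark ever sees the rows.
val encoded = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://dbhost/mydb")
  .option("dbtable",
    """(SELECT id,
      |        CASE columnName
      |          WHEN 'foobar'    THEN 0
      |          WHEN 'foobarbaz' THEN 1
      |        END AS encoded
      |   FROM mytable) AS t""".stripMargin)
  .load()
```

The database then does the preprocessing, at the cost of keeping the CASE expression in database-specific SQL rather than in Spark.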
Thanks,
Subhash
Sent from my iPhone
> On Mar 7, 2017, at 6:37 AM, El-Hassan Wanas wrote:
As an example, this is basically what I'm doing:

val myDF = originalDataFrame.select(
  when(col(columnName) === "foobar", 0)
    .when(col(columnName) === "foobarbaz", 1))

Except there are many more columns and many more conditionals. The
generated Spark workflow starts with an
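The same conditional encoding can also be written as a SQL expression via `selectExpr`, which keeps the logic in a form that is closer to what one would hand to a database directly. A sketch, assuming the same column names as above:

```scala
// Equivalent to the when/when chain, expressed as a SQL CASE.
// columnName is assumed to hold the name of the column being encoded.
val myDF = originalDataFrame.selectExpr(
  s"""CASE $columnName
     |  WHEN 'foobar'    THEN 0
     |  WHEN 'foobarbaz' THEN 1
     |END AS ${columnName}_encoded""".stripMargin)
```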
Can you provide some source code? I am not sure I understood the problem.
If you want to do preprocessing at the JDBC data source then you can write
your own data source. Additionally, you may want to modify the SQL statement to
extract the data in the right format and push some preprocessing
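For the custom data source route, the relevant hooks are `RelationProvider` and `PrunedFilteredScan`, through which Catalyst hands over the required columns and pushable filters. A skeleton sketch (class names are illustrative, and the actual JDBC fetch is left as a stub):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources._
import org.apache.spark.sql.types._

// A relation that receives column pruning and filter pushdown from Catalyst.
class MyJdbcRelation(override val sqlContext: SQLContext)
    extends BaseRelation with PrunedFilteredScan {

  override def schema: StructType =
    StructType(Seq(StructField("columnName", StringType)))

  // Catalyst supplies the columns it needs and the filters it could push
  // down; a real implementation would build the source-side SQL from these.
  override def buildScan(requiredColumns: Array[String],
                         filters: Array[Filter]): RDD[Row] =
    sqlContext.sparkContext.emptyRDD[Row]
}

class DefaultSource extends RelationProvider {
  override def createRelation(sqlContext: SQLContext,
                              parameters: Map[String, String]): BaseRelation =
    new MyJdbcRelation(sqlContext)
}
```

This gives full control over the SQL sent to the source, at the cost of maintaining the data source yourself.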
Hello,
There is, as usual, a big table lying on some JDBC data source. I am
doing some data processing on that data from Spark; however, in order to
speed up my analysis, I use reduced encodings and minimize the overall
size of the data before processing.
Spark has been doing a great job at