Re: Spark JDBC reads

2017-03-07 Thread El-Hassan Wanas
I was hoping to use Spark in this instance to generate that intermediate SQL as part of its workflow strategy, sort of as a database-independent way of doing my preprocessing. Is there any way to capture the generated SQL from Catalyst? If so, I would just use ...
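As far as I know there is no supported way to get a complete, dialect-specific SQL statement out of Catalyst for a whole job: the JDBC source only pushes column pruning and simple filters down to the database, and everything else (including when() chains) is evaluated inside Spark. What you can do is inspect what was actually pushed down. A minimal sketch, with placeholder URL, credentials, and table name:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.appName("plan-inspect").getOrCreate()

// Placeholder JDBC read; URL, user, and table are hypothetical.
val myDF = spark.read.format("jdbc")
  .option("url", "jdbc:postgresql://host:5432/db")
  .option("user", "...")
  .option("dbtable", "big_table")
  .load()
  .filter(col("id") > 100)

// The extended plan shows what reached the source: the JDBC scan node
// lists PushedFilters: [...] and the pruned column list.
myDF.explain(true)

// Programmatic access to the optimized logical plan.
println(myDF.queryExecution.optimizedPlan)
```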

Re: Spark JDBC reads

2017-03-07 Thread Subhash Sriram
Could you create a view of the table on your JDBC data source and just query that from Spark? Thanks, Subhash
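Along the same lines as the view suggestion: Spark's JDBC reader also accepts an arbitrary subquery in place of a table name, which pushes the whole recoding into the database without needing the rights to create a view. A minimal sketch, with placeholder URL, credentials, and column names:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("jdbc-subquery").getOrCreate()

// Any parenthesized, aliased SELECT is accepted as "dbtable"; the database
// then evaluates the CASE and Spark only ever sees the reduced encoding.
val encoded = spark.read.format("jdbc")
  .option("url", "jdbc:postgresql://host:5432/db") // placeholder
  .option("user", "...")                           // placeholder
  .option("dbtable",
    """(SELECT id,
      |        CASE col WHEN 'foobar'    THEN 0
      |                 WHEN 'foobarbaz' THEN 1
      |        END AS col_code
      |   FROM big_table) AS t""".stripMargin)
  .load()
```

Whether this helps depends on the database doing the CASE work faster than Spark would; it does cut the bytes moved over JDBC.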

Re: Spark JDBC reads

2017-03-07 Thread El-Hassan Wanas
As an example, this is basically what I'm doing:

val myDF = originalDataFrame.select(
  when(col(columnName) === "foobar", 0)
    .when(col(columnName) === "foobarbaz", 1))

except there are many more columns and many more conditionals. The generated Spark workflow starts with an ...
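A long when() chain like this is really just a lookup table, and once the number of conditionals grows it is often easier to keep the mapping as data (in Spark, for instance, by joining against a small mapping DataFrame rather than compiling a giant CASE expression). The mapping itself, sketched in plain Scala with hypothetical values and a -1 default code for unmatched inputs:

```scala
// Sketch: the when() chain expressed as a lookup table.
// The values and the -1 default code are hypothetical stand-ins.
object EncodingSketch {
  val codes: Map[String, Int] = Map(
    "foobar"    -> 0,
    "foobarbaz" -> 1
  )

  // Encode one value; anything not in the table gets the default code -1.
  def encode(value: String): Int = codes.getOrElse(value, -1)

  def main(args: Array[String]): Unit = {
    println(encode("foobar"))    // prints 0
    println(encode("something")) // prints -1
  }
}
```

Keeping the table as data also means new codes are a data change, not a code change.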

Re: Spark JDBC reads

2017-03-07 Thread Jörn Franke
Can you provide some source code? I am not sure I understood the problem. If you want to do preprocessing at the JDBC data source, you can write your own data source. Additionally, you may want to modify the SQL statement to extract the data in the right format and push some preprocessing ...

Spark JDBC reads

2017-03-07 Thread El-Hassan Wanas
Hello. There is, as usual, a big table lying on some JDBC data source. I am doing some data processing on that data from Spark; however, in order to speed up my analysis, I use reduced encodings and minimize the overall size of the data before processing. Spark has been doing a great job at ...
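For a big table like this, the read itself is often the bottleneck: Spark's DataFrameReader.jdbc can split the scan across executors by partitioning on a numeric column. A minimal sketch, with placeholder URL, credentials, and bounds:

```scala
import java.util.Properties
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("jdbc-read").getOrCreate()

val props = new Properties()
props.setProperty("user", "...") // placeholder credentials

// Partitioned read: Spark issues numPartitions parallel queries, each
// covering a slice of the id range between lowerBound and upperBound.
val big = spark.read.jdbc(
  url = "jdbc:postgresql://host:5432/db", // placeholder
  table = "big_table",                    // placeholder
  columnName = "id",
  lowerBound = 0L,
  upperBound = 1000000L,
  numPartitions = 8,
  connectionProperties = props)
```

The bounds only shape the partition ranges; rows outside them are still read, just all by the edge partitions.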