+1 on escaping column names. 
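
For what it's worth, below is a minimal sketch of what the quoting could look
like in JDBCRDD.columnList (the current code is quoted further down). This is
only a sketch: it assumes plain double-quote escaping, which is what Postgres
expects but would really need to be dialect-aware (MySQL uses backticks, for
example):

  private val columnList: String = {
    val sb = new StringBuilder()
    // Quote each name and double any embedded quote characters, so mixed-case
    // names and names containing spaces or '/' reach the database unchanged.
    columns.foreach { x =>
      sb.append(",").append("\"" + x.replace("\"", "\"\"") + "\"")
    }
    if (sb.length == 0) "1" else sb.substring(1)
  }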


> On Apr 1, 2015, at 5:50 AM, fergjo00 <johngfergu...@gmail.com> wrote:
> 
> Question:  
> -----------
> Is there a way to have JDBC DataFrames use quoted/escaped column names?
> Right now, it looks like Spark "sees" the names correctly in the schema it
> builds, but it does not escape them in the SQL it generates when they are not
> plain ANSI-compliant identifiers:
> 
> org.apache.spark.sql.jdbc.JDBCRDD
> ....
> private val columnList: String = {
>   val sb = new StringBuilder()
>   columns.foreach(x => sb.append(",").append(x))
>   if (sb.length == 0) "1" else sb.substring(1)
> }
> 
> 
> If you see value in this, I would take a shot at adding the quoting
> (escaping) of column names here. Without the quoting, some drivers, like
> postgresql's, will simply fold all unquoted names to lower case when parsing
> the query. As you can see in the TL;DR below, that means they won't match the
> schema I am given.
> 
> Thanks.
> 
> TL;DR:
> --------
> I am able to connect to a Postgres database in the shell (with driver
> referenced):
> 
>   val jdbcDf = sqlContext.jdbc(
>     "jdbc:postgresql://localhost/sparkdemo?user=dbuser", "sp500")
> 
> In fact when I run:
> 
>   jdbcDf.registerTempTable("sp500")
>   val avgEPSNamed =
>     sqlContext.sql("SELECT AVG(`Earnings/Share`) AS AvgEPS FROM sp500")
> 
> and 
> 
>   val avgEPSProg = jdbcDf.agg(avg(jdbcDf.col("Earnings/Share")))
> 
> The values come back as expected.  However, if I try:
> 
>   jdbcDf.show
> 
> Or if I try
> 
>   val all = sqlContext.sql("SELECT * FROM sp500")
>   all.show
> 
> I get errors about column names not being found; in fact, the error shows the
> column names all lower-cased. For now I will change my schema to be more
> restrictive. Right now it is, per a Stack Overflow poster, not ANSI compliant:
> it relies on things that are only allowed for double-quoted ("") identifiers
> in pgsql, MySQL and SQLServer. BTW, our users are giving us tables like this
> because the various tools they already use accept non-compliant names. In
> fact, this is mild compared to what we've had to support.
> 
> Currently the schema in question uses mixed case, quoted names with special
> characters and spaces:
> 
> CREATE TABLE sp500
> (
>   "Symbol" text,
>   "Name" text,
>   "Sector" text,
>   "Price" double precision,
>   "Dividend Yield" double precision,
>   "Price/Earnings" double precision,
>   "Earnings/Share" double precision,
>   "Book Value" double precision,
>   "52 week low" double precision,
>   "52 week high" double precision,
>   "Market Cap" double precision,
>   "EBITDA" double precision,
>   "Price/Sales" double precision,
>   "Price/Book" double precision,
>   "SEC Filings" text
> )
> 
> 
> 
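
In the meantime, one possible workaround (sketch only: the column list below is
abbreviated and the alias names are made up) is to push the quoting into a
subquery passed as the table argument. As far as I can tell the JDBC source
just interpolates that string into its FROM clause, so a parenthesized subquery
with an alias should let Spark see only simple lower-case names:

  val jdbcDf = sqlContext.jdbc(
    "jdbc:postgresql://localhost/sparkdemo?user=dbuser",
    """(SELECT "Symbol"          AS symbol,
               "Earnings/Share"  AS earnings_per_share,
               "Dividend Yield"  AS dividend_yield
        FROM sp500) AS sp500_clean""")

  // The column list Spark builds now only references the aliased names.
  jdbcDf.show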

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
