Question:  
-----------
Is there a way to have JDBC DataFrames use quoted/escaped column names?
Right now, it looks like it reads the names correctly into the schema it
creates, but it does not quote/escape them in the SQL it generates when they
are not compliant identifiers:

org.apache.spark.sql.jdbc.JDBCRDD
....
private val columnList: String = {
  val sb = new StringBuilder()
  columns.foreach(x => sb.append(",").append(x))
  if (sb.length == 0) "1" else sb.substring(1)
}
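
If it helps, here is a minimal sketch of the quoting I have in mind (my own
code, not Spark's actual implementation; it assumes ANSI double-quote
identifiers with embedded quotes doubled -- other dialects, e.g. MySQL, use
backticks instead):

```scala
// Hypothetical helper, not part of Spark: wrap an identifier in ANSI
// double quotes, doubling any embedded double quotes.
def quoteIdentifier(name: String): String =
  "\"" + name.replace("\"", "\"\"") + "\""

// Quoted version of columnList: "1" for an empty projection, otherwise
// a comma-separated list of quoted names.
def columnList(columns: Seq[String]): String =
  if (columns.isEmpty) "1"
  else columns.map(quoteIdentifier).mkString(",")
```

With that, columnList(Seq("Earnings/Share", "52 week low")) yields
"Earnings/Share","52 week low", which PostgreSQL parses without
lower-casing.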


If you see value in this, I would take a shot at adding the quoting
(escaping) of column names here.  Without it, some drivers, like
PostgreSQL's, will simply lower-case all unquoted identifiers when parsing
the query.  As you can see in the TL;DR below, that means they won't match
the schema I am given.

Thanks.

TL;DR:
--------
I am able to connect to a Postgres database in the shell (with driver
referenced):

   val jdbcDf = sqlContext.jdbc(
     "jdbc:postgresql://localhost/sparkdemo?user=dbuser", "sp500")

In fact when I run:

   jdbcDf.registerTempTable("sp500")
   val avgEPSNamed = sqlContext.sql(
     "SELECT AVG(`Earnings/Share`) AS AvgEPS FROM sp500")

and 

   val avgEPSProg = jdbcDf.agg(avg(jdbcDf.col("Earnings/Share")))

The values come back as expected.  However, if I try:

   jdbcDf.show

Or if I try
   
   val all = sqlContext.sql("SELECT * FROM sp500")
   all.show

I get errors about column names not being found.  In fact the error mentions
the column names all lower-cased.  For now I will change my schema to be
more restrictive.  Per a Stack Overflow poster, the current schema is not
ANSI compliant: it relies on things that double-quoted identifiers allow in
PostgreSQL, MySQL, and SQL Server.  BTW, our users are giving us tables like
this because various tools they already use support non-compliant names.  In
fact, this is mild compared to what we've had to support.
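
In the meantime, the workaround I am considering (a sketch only, assuming
the 1.3.0 sqlContext.jdbc call will accept any expression that is valid in a
FROM clause as its table argument) is to hand it a derived table that
aliases the quoted names to compliant ones:

```scala
// Sketch: alias non-compliant column names inside a derived table so the
// SQL Spark generates only ever references compliant, lower-case names.
// The columns here are illustrative, not the full sp500 schema.
val aliased =
  """(SELECT "Symbol" AS symbol, "Earnings/Share" AS earnings_per_share
    |   FROM sp500) AS sp500_aliased""".stripMargin
val jdbcDfAliased = sqlContext.jdbc(
  "jdbc:postgresql://localhost/sparkdemo?user=dbuser", aliased)
```

With that, jdbcDfAliased.show should no longer trip over the lower-casing,
at the cost of renaming every column by hand.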

Currently the schema in question uses mixed case, quoted names with special
characters and spaces:

CREATE TABLE sp500
(
  "Symbol" text,
  "Name" text,
  "Sector" text,
  "Price" double precision,
  "Dividend Yield" double precision,
  "Price/Earnings" double precision,
  "Earnings/Share" double precision,
  "Book Value" double precision,
  "52 week low" double precision,
  "52 week high" double precision,
  "Market Cap" double precision,
  "EBITDA" double precision,
  "Price/Sales" double precision,
  "Price/Book" double precision,
  "SEC Filings" text
)



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-3-0-DataFrame-and-Postgres-tp22338.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
