[ https://issues.apache.org/jira/browse/SPARK-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luciano Resende closed SPARK-6666.
----------------------------------
    Resolution: Cannot Reproduce

I have tried the scenarios above on Spark trunk using both Postgres and DB2, see: https://github.com/lresende/spark-sandbox/blob/master/src/main/scala/com/luck/sql/JDBCApplication.scala

The described issues no longer seem reproducible; all results are below:

root
 |-- Symbol: string (nullable = true)
 |-- Name: string (nullable = true)
 |-- Sector: string (nullable = true)
 |-- Price: double (nullable = true)
 |-- Dividend Yield: double (nullable = true)
 |-- Price/Earnings: double (nullable = true)
 |-- Earnings/Share: double (nullable = true)
 |-- Book Value: double (nullable = true)
 |-- 52 week low: double (nullable = true)
 |-- 52 week high: double (nullable = true)
 |-- Market Cap: double (nullable = true)
 |-- EBITDA: double (nullable = true)
 |-- Price/Sales: double (nullable = true)
 |-- Price/Book: double (nullable = true)
 |-- SEC Filings: string (nullable = true)

+------+------+------+-----+--------------+--------------+--------------+----------+-----------+------------+----------+------+-----------+----------+-----------+
|Symbol|  Name|Sector|Price|Dividend Yield|Price/Earnings|Earnings/Share|Book Value|52 week low|52 week high|Market Cap|EBITDA|Price/Sales|Price/Book|SEC Filings|
+------+------+------+-----+--------------+--------------+--------------+----------+-----------+------------+----------+------+-----------+----------+-----------+
|    S1|Name 1| Sec 1| 10.0|          10.0|          10.0|          10.0|      10.0|       10.0|        10.0|      10.0|  10.0|       10.0|      10.0|        100|
|    s2|Name 2| Sec 2| 20.0|          20.0|          20.0|          20.0|      20.0|       20.0|        20.0|      20.0|  20.0|       20.0|      20.0|        200|
+------+------+------+-----+--------------+--------------+--------------+----------+-----------+------------+----------+------+-----------+----------+-----------+

+------+
|AvgCPI|
+------+
|  15.0|
+------+
+------+------+------+-----+--------------+--------------+--------------+----------+-----------+------------+----------+------+-----------+----------+-----------+
|Symbol|  Name|Sector|Price|Dividend Yield|Price/Earnings|Earnings/Share|Book Value|52 week low|52 week high|Market Cap|EBITDA|Price/Sales|Price/Book|SEC Filings|
+------+------+------+-----+--------------+--------------+--------------+----------+-----------+------------+----------+------+-----------+----------+-----------+
|    S1|Name 1| Sec 1| 10.0|          10.0|          10.0|          10.0|      10.0|       10.0|        10.0|      10.0|  10.0|       10.0|      10.0|        100|
|    s2|Name 2| Sec 2| 20.0|          20.0|          20.0|          20.0|      20.0|       20.0|        20.0|      20.0|  20.0|       20.0|      20.0|        200|
+------+------+------+-----+--------------+--------------+--------------+----------+-----------+------------+----------+------+-----------+----------+-----------+

> org.apache.spark.sql.jdbc.JDBCRDD does not escape/quote column names
> ---------------------------------------------------------------------
>
>                 Key: SPARK-6666
>                 URL: https://issues.apache.org/jira/browse/SPARK-6666
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0
>        Environment: 
>            Reporter: John Ferguson
>            Priority: Critical
>
> Is there a way to have JDBC DataFrames use quoted/escaped column names? Right now, it looks like it "sees" the names correctly in the schema created, but does not escape them in the SQL it generates when they are not compliant:
>
> org.apache.spark.sql.jdbc.JDBCRDD
> ....
>   private val columnList: String = {
>     val sb = new StringBuilder()
>     columns.foreach(x => sb.append(",").append(x))
>     if (sb.length == 0) "1" else sb.substring(1)
>   }
>
> If you see value in this, I would take a shot at adding the quoting (escaping) of column names here. If you don't, some drivers, like postgresql's, will simply lowercase all names when parsing the query. As you can see in the TL;DR below, that means they won't match the schema I am given.
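The quoting the reporter proposes is easy to sketch against the snippet above. The following is a minimal, hypothetical reworking of `columnList` that quotes each identifier, assuming the target database accepts ANSI double-quoted identifiers (`ColumnListSketch` and `quoteIdentifier` are illustrative names, not Spark's actual API):

```scala
// Hypothetical sketch only: quote each column name before building the
// SELECT list, assuming ANSI double-quote identifier quoting.
object ColumnListSketch {
  // Escape embedded double quotes by doubling them, then wrap the name.
  def quoteIdentifier(name: String): String =
    "\"" + name.replace("\"", "\"\"") + "\""

  // Mirrors the shape of the snippet above: "1" for an empty projection,
  // otherwise a comma-separated list of quoted names.
  def columnList(columns: Seq[String]): String =
    if (columns.isEmpty) "1" else columns.map(quoteIdentifier).mkString(",")
}
```

For the sp500 schema this would emit `"Symbol","Earnings/Share"` and so on, so Postgres would match the mixed-case names instead of folding them to lowercase.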
> TL;DR:
> --------
> I am able to connect to a Postgres database in the shell (with the driver referenced):
>
>   val jdbcDf = sqlContext.jdbc("jdbc:postgresql://localhost/sparkdemo?user=dbuser", "sp500")
>
> In fact, when I run:
>
>   jdbcDf.registerTempTable("sp500")
>   val avgEPSNamed = sqlContext.sql("SELECT AVG(`Earnings/Share`) as AvgCPI FROM sp500")
>
> and
>
>   val avgEPSProg = jdbcDf.agg(avg(jdbcDf.col("Earnings/Share")))
>
> the values come back as expected. However, if I try:
>
>   jdbcDf.show
>
> or
>
>   val all = sqlContext.sql("SELECT * FROM sp500")
>   all.show
>
> I get errors about column names not being found; in fact, the error mentions the column names all lower-cased. For now I will change my schema to be more restrictive. Right now it is, per a Stack Overflow poster, not ANSI compliant: it does things that are only legal inside ""-quoted identifiers in pgsql, MySQL, and SQL Server. BTW, our users are giving us tables like this because various tools they already use support non-compliant names. In fact, this is mild compared to what we've had to support.
>
> Currently the schema in question uses mixed-case, quoted names with special characters and spaces:
>
>   CREATE TABLE sp500
>   (
>     "Symbol" text,
>     "Name" text,
>     "Sector" text,
>     "Price" double precision,
>     "Dividend Yield" double precision,
>     "Price/Earnings" double precision,
>     "Earnings/Share" double precision,
>     "Book Value" double precision,
>     "52 week low" double precision,
>     "52 week high" double precision,
>     "Market Cap" double precision,
>     "EBITDA" double precision,
>     "Price/Sales" double precision,
>     "Price/Book" double precision,
>     "SEC Filings" text
>   )

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
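To make the failure mode above concrete: Postgres folds unquoted identifiers to lowercase (and would even try to parse `Earnings/Share` as a division), so the SQL generated for a schema like this must quote the names. A small sketch contrasting the two query shapes, under that assumption; the object and value names here are illustrative, not Spark internals:

```scala
// Sketch: the unquoted SELECT list JDBCRDD emitted vs. the quoted form
// Postgres needs for a mixed-case schema with slashes and spaces.
object QueryShapes {
  val columns = Seq("Symbol", "Earnings/Share", "52 week low")

  // Unquoted: identifiers are case-folded, and names with "/" or spaces
  // do not survive parsing at all.
  val sp500SqlUnquoted: String = s"SELECT ${columns.mkString(",")} FROM sp500"

  // Quoted: identifiers keep their case, slashes, and spaces.
  val sp500SqlQuoted: String =
    s"""SELECT ${columns.map("\"" + _ + "\"").mkString(",")} FROM sp500"""
}
```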