Yup - but the reason we did the null handling that way was for Python, which also affects Scala.
On Thu, May 26, 2016 at 4:17 PM, Koert Kuipers <ko...@tresata.com> wrote:

> ok, thanks for creating the ticket.
>
> just to be clear: my example was in scala.
>
> On Thu, May 26, 2016 at 7:07 PM, Reynold Xin <r...@databricks.com> wrote:
>
>> This is unfortunately due to the way we handle default values in
>> Python. I agree it doesn't follow the principle of least astonishment.
>>
>> Maybe the best thing to do here is to put the actual default values in
>> the Python API for csv (and json, parquet, etc.), rather than using None in
>> Python. This would require us to duplicate the default values (once in the
>> data source options, and again in the Python API), but that's probably OK
>> given they shouldn't change often.
>>
>> Ticket: https://issues.apache.org/jira/browse/SPARK-15585
>>
>> On Thu, May 26, 2016 at 3:35 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> in spark 1.6.1 we used:
>>>
>>>   sqlContext.read
>>>     .format("com.databricks.spark.csv")
>>>     .option("delimiter", "~")
>>>     .option("quote", null)
>>>
>>> this effectively turned off quoting, which is a necessity for certain
>>> data formats where quoting is not supported and "\"" is a valid character
>>> in the data itself.
>>>
>>> in spark 2.0.0-SNAPSHOT we did the same thing:
>>>
>>>   sqlContext.read
>>>     .format("csv")
>>>     .option("delimiter", "~")
>>>     .option("quote", null)
>>>
>>> but this did not work: we got weird blowups where spark tried to parse
>>> thousands of lines as if they were one record. the reason was that a
>>> (valid) quote character ("\"") was present in the data, for example:
>>>
>>>   a~b"c~d
>>>
>>> as it turns out, setting quote to null no longer turns off quoting;
>>> instead it means "use the default quote character".
>>>
>>> does anyone know how to turn off quoting now?
>>>
>>> our current workaround is:
>>>
>>>   sqlContext.read
>>>     .format("csv")
>>>     .option("delimiter", "~")
>>>     .option("quote", "☃")
>>>
>>> (we assume there are no unicode snowmen in our data...)
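To make the surprise concrete, here is a minimal, self-contained Python sketch of the option-resolution logic being discussed. This is illustrative pseudologic with hypothetical names (`resolve_quote`, `DEFAULT_QUOTE`), not Spark's actual code: a `None`-valued option is treated as "unset" and falls back to the built-in default, so `null`/`None` can no longer mean "no quoting". The empty-string rule shown mirrors the convention later documented for Spark's csv reader (set `quote` to `""` to turn quoting off), but verify against your Spark version.

```python
# Hypothetical sketch of the behavior described in the thread, not
# Spark's actual implementation.

DEFAULT_QUOTE = '"'


def resolve_quote(options):
    """Resolve the effective quote character from a csv options dict.

    None (or absent) -> fall back to the default quote character.
    Empty string     -> quoting disabled (returns None).
    Anything else    -> first character of the given string.
    """
    value = options.get("quote")
    if value is None:
        # This is the surprise: null/unset means "use the default",
        # not "turn quoting off" as it did in spark-csv for 1.6.x.
        return DEFAULT_QUOTE
    if value == "":
        return None  # empty string disables quoting
    return value[0]


print(resolve_quote({"quote": None}))  # " (the default, the surprise)
print(resolve_quote({"quote": ""}))    # None (quoting off)
print(resolve_quote({"quote": "~"}))   # ~
```

Putting the real default values directly in the Python API signatures, as proposed above, would remove the ambiguity: `None` would never reach the option map in the first place.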