ok, thanks for creating the ticket. just to be clear: my example was in scala
On Thu, May 26, 2016 at 7:07 PM, Reynold Xin <r...@databricks.com> wrote:
> This is unfortunately due to the way we handle default values in Python.
> I agree it doesn't follow the principle of least astonishment.
>
> Maybe the best thing to do here is to put the actual default values in
> the Python API for csv (and json, parquet, etc.), rather than using None
> in Python. This would require us to duplicate default values twice (once
> in data source options, and another in the Python API), but that's
> probably OK given they shouldn't change all the time.
>
> Ticket: https://issues.apache.org/jira/browse/SPARK-15585
>
> On Thu, May 26, 2016 at 3:35 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> in spark 1.6.1 we used:
>>
>>   sqlContext.read
>>     .format("com.databricks.spark.csv")
>>     .option("delimiter", "~")
>>     .option("quote", null)
>>
>> this effectively turned off quoting, which is a necessity for certain
>> data formats where quoting is not supported and "\"" is a valid
>> character in the data itself.
>>
>> in spark 2.0.0-SNAPSHOT we did the same thing:
>>
>>   sqlContext.read
>>     .format("csv")
>>     .option("delimiter", "~")
>>     .option("quote", null)
>>
>> but this did not work: we got weird blowups where spark was trying to
>> parse thousands of lines as if they were one record. the reason was
>> that a (valid) quote character ("\"") was present in the data, for
>> example:
>>
>>   a~b"c~d
>>
>> as it turns out, setting quote to null no longer turns off quoting;
>> instead it means "use the default quote character".
>>
>> does anyone know how to turn off quoting now?
>>
>> our current workaround is:
>>
>>   sqlContext.read
>>     .format("csv")
>>     .option("delimiter", "~")
>>     .option("quote", "☃")
>>
>> (we assume there are no unicode snowmen in our data...)
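for anyone following along without a spark shell handy: the "thousands of
lines parsed as one record" failure mode can be sketched with python's
stdlib csv module, which has the same quoted-field semantics. this is just
an illustration of the quoting behavior, not spark's actual parser; the
sample data (`a~"b~c` followed by `d~e`) is made up:

```python
import csv
import io

# A stray quote character opens a quoted field, so the parser keeps
# consuming input (including newlines) looking for the closing quote.
data = 'a~"b~c\nd~e\n'

# Default quoting: the unterminated quote swallows the second line,
# collapsing both input lines into a single record.
with_quoting = list(csv.reader(io.StringIO(data), delimiter='~'))
print(with_quoting)   # one row; the second field spans both lines

# QUOTE_NONE: the '"' is treated as an ordinary character, so each
# input line parses as its own record.
no_quoting = list(csv.reader(io.StringIO(data), delimiter='~',
                             quoting=csv.QUOTE_NONE))
print(no_quoting)     # two rows, quote kept as literal data
```

this is exactly why the 1.6.x behavior of `.option("quote", null)` mattered: with no way to disable quoting, any literal quote character in unquoted data derails record boundaries.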