Ernst Sjöstrand created SPARK-15840: ---------------------------------------
Summary: New csv reader does not "determine the input schema"
Key: SPARK-15840
URL: https://issues.apache.org/jira/browse/SPARK-15840
Project: Spark
Issue Type: Bug
Components: PySpark, SQL
Affects Versions: 2.0.0
Reporter: Ernst Sjöstrand

When testing the new CSV reader, I found that it does not determine the input schema, although the documentation says it does. (I used this documentation: https://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/api/python/pyspark.sql.html#pyspark.sql.SQLContext ) So the bug is either in the implementation or in the documentation. It also seems to mean that options such as dateFormat are ignored.

Here's a quick test in pyspark (using Python 3):

    a = spark.read.csv("/home/ernst/test.csv")
    a.printSchema()
    print(a.dtypes)
    a.show()

Output:

    root
     |-- _c0: string (nullable = true)

    [('_c0', 'string')]

    +---+
    |_c0|
    +---+
    |  1|
    |  2|
    |  3|
    |  4|
    +---+
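To make "determine the input schema" concrete: with schema inference, the reader scans the values in each column and picks a type for it instead of reporting every column as string. In PySpark 2.0 the CSV reader exposes this through its `inferSchema` option, which is off by default (e.g. `spark.read.csv(path, inferSchema=True)`). The following is only a minimal pure-Python sketch of the idea, assuming a much simpler rule set than Spark's (try int, then double, else string). `infer_column_type` and `read_csv_schema` are hypothetical names for illustration, not Spark APIs; the `_c0`-style column names mirror the output above.

```python
import csv
import io


def infer_column_type(values):
    """Hypothetical, simplified type inference (not Spark's implementation):
    return "int" if every value parses as int, else "double" if every value
    parses as float, else "string"."""
    def fits(cast):
        for v in values:
            try:
                cast(v)
            except ValueError:
                return False
        return True

    if fits(int):
        return "int"
    if fits(float):
        return "double"
    return "string"


def read_csv_schema(text, infer_schema=False):
    """Hypothetical helper: return a list of (column_name, type) pairs for
    headerless CSV text. Without inference every column is "string"."""
    rows = list(csv.reader(io.StringIO(text)))
    n_cols = max(len(r) for r in rows) if rows else 0
    schema = []
    for i in range(n_cols):
        name = "_c%d" % i  # default column names, as in the output above
        if not infer_schema:
            schema.append((name, "string"))
        else:
            column = [r[i] for r in rows if i < len(r)]
            schema.append((name, infer_column_type(column)))
    return schema


sample = "1\n2\n3\n4\n"
print(read_csv_schema(sample))                     # [('_c0', 'string')]
print(read_csv_schema(sample, infer_schema=True))  # [('_c0', 'int')]
```

With inference disabled the sketch reproduces the all-string schema reported above; enabling it yields a numeric type for the same single-column input.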