Punit Shah created SPARK-32965:
----------------------------------

             Summary: pyspark reading csv files with utf_16le encoding
                 Key: SPARK-32965
                 URL: https://issues.apache.org/jira/browse/SPARK-32965
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.1, 3.0.0, 2.4.7
            Reporter: Punit Shah
If you have a file encoded in UTF-16LE or UTF-16BE and read it with spark.read.csv("<file_name>", encoding="utf_16le"), the resulting DataFrame is not rendered properly.

If you instead decode the file in Python first:

    prdd = spark_session._sc.binaryFiles(path_url).values().flatMap(lambda x: x.decode("utf_16le").splitlines())

and then call spark.read.csv(prdd), it works.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
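The workaround can be sketched as below. This is a minimal illustration, not a fix for the underlying reader bug. The helper names decode_utf16_lines and read_utf16_csv are hypothetical. The sketch uses the public spark.sparkContext instead of the private _sc attribute and relies on DataFrameReader.csv accepting an RDD of strings that hold CSV rows. The BOM stripping is an extra assumption that is not part of the original report: an explicit-endian decode such as "utf_16le" keeps a byte-order mark as U+FEFF. Only the pure-Python decoding step runs without a SparkSession.

```python
import csv
from typing import List


def decode_utf16_lines(raw: bytes, encoding: str = "utf_16le") -> List[str]:
    """Hypothetical helper: decode one whole file's bytes into text lines.

    Mirrors the flatMap lambda in the report: x.decode("utf_16le").splitlines().
    """
    text = raw.decode(encoding)
    # Assumption: an explicit-endian decode keeps a BOM as U+FEFF; strip it so
    # it does not leak into the first column name.
    text = text.lstrip("\ufeff")
    return text.splitlines()


def read_utf16_csv(spark, path: str, encoding: str = "utf_16le"):
    """Hypothetical Spark version of the workaround (needs a live SparkSession)."""
    lines = (
        spark.sparkContext.binaryFiles(path)  # RDD of (file path, file bytes)
        .values()                             # keep only the bytes
        .flatMap(lambda raw: decode_utf16_lines(raw, encoding))
    )
    # DataFrameReader.csv also accepts an RDD of strings storing CSV rows.
    return spark.read.csv(lines)


if __name__ == "__main__":
    # Local check of the decoding step only; no Spark involved.
    sample = "\ufeffname,city\nJosé,Zürich\n".encode("utf_16le")
    print(list(csv.reader(decode_utf16_lines(sample))))
    # [['name', 'city'], ['José', 'Zürich']]
```

Because binaryFiles loads each file as a single record, this approach suits small-to-medium files. Splitting on line breaks before parsing also breaks quoted CSV fields that contain embedded newlines.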