GitHub user jmchung opened a pull request: https://github.com/apache/spark/pull/19199
[SPARK-21610][SQL][FOLLOWUP] Corrupt records are not handled properly when creating a dataframe from a file

## What changes were proposed in this pull request?

When the `requiredSchema` only contains `_corrupt_record`, the derived `actualSchema` is empty and the `_corrupt_record` values are null for all rows. This PR detects that situation and raises an exception with a reasonable workaround message so that users know what happened and how to fix the query.

## How was this patch tested?

Added a unit test in `CSVSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jmchung/spark SPARK-21610-FOLLOWUP

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19199.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19199

----
commit e703fc8f33d1fde90d790057481f1d23f466f378
Author: Jen-Ming Chung <jenmingi...@gmail.com>
Date: 2017-09-12T06:48:33Z

    follow-up PR for CSV

----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
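For context, the query pattern the PR description refers to looks roughly like the following sketch. This is illustrative only (the file path is hypothetical, and `spark` is assumed to be an existing `SparkSession`); the exact exception type and message are whatever the PR implements.

```scala
// Hypothetical sketch of the pattern this follow-up guards against.
// When the query selects only the corrupt-record column, the CSV reader's
// pushed-down actualSchema (requiredSchema minus the corrupt-record column)
// is empty, so no fields are parsed and the column is null for every row.
import org.apache.spark.sql.types._

val schema = new StructType()
  .add("a", IntegerType)
  .add("_corrupt_record", StringType)

val df = spark.read
  .schema(schema)
  .csv("/path/to/file.csv") // illustrative path

// Before this change: an all-null _corrupt_record column rather than the
// raw malformed lines. After this change: an exception with a workaround
// message, per the PR description.
df.select("_corrupt_record").show()
```

A common workaround for this class of query is to materialize the full dataset first (e.g. via `df.cache()`) and then select `_corrupt_record`, so that parsing happens against the complete schema.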