GitHub user petervandenabeele commented on the pull request: https://github.com/apache/spark/pull/3517#issuecomment-65846515

More problematic (and sorry, I had not seen this before): there already _is_ an example file named `people.txt`, with a different format:

```
$ spark git:(pv-docs-note-on-jsonFile-format/01) cat examples/src/main/resources/people.txt
Michael, 29
Andy, 30
Justin, 19
```

In that case, I could rename the example file for `jsonFile` to `people.jsons`. It is a weird name, but it is _reasonably_ accurate (it follows the `xs` naming pattern from Scala, since the file is effectively a list of JSON objects). I would then also need to change the name in every other location that references `people.json` (confirming the list mentioned by @marmbrus):

```
spark git:(pv-docs-note-on-jsonFile-format/01) grep -r 'people\.json' * | grep -v Binary | grep -v _site
examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQL.java:    String path = "examples/src/main/resources/people.json";
examples/src/main/python/sql.py:    path = os.path.join(os.environ['SPARK_HOME'], "examples/src/main/resources/people.json")
```

On a more fundamental note: from the outside, I would have found it more in line with the "principle of least astonishment" (POLA) if this function required a standard, valid JSON file, formatted as an array of hashes with an identical "schema", e.g.:

```
[
  {"name": "Tom", "character": "cat"},
  {"name": "Jerry", "character": "mouse"}
]
```

This would have allowed us to simply import data generated from any other language with `array.to_json`.

I hear the proposal from @marmbrus to also improve the error message (that would also have helped us understand the issue more quickly), but I would suggest putting that in a separate JIRA issue, since it needs some real programming and testing work.

I look forward to directions on how best to fix at least the documentation, to avoid this confusion for others. Thanks.
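Until `jsonFile` accepts a top-level array, data produced with something like `array.to_json` can be reshaped so that each object sits on its own line. This is the layout `jsonFile` reads: each line must be a separate, self-contained JSON object. Below is a minimal Python sketch under that assumption. `json_array_to_lines` is a hypothetical helper, not part of Spark, and it assumes the whole array fits in memory.

```python
import json


def json_array_to_lines(array_text):
    """Convert a JSON array of objects into newline-delimited JSON.

    Hypothetical helper (not part of Spark): it only reshapes the text so that
    each record is a self-contained JSON object on its own line.
    """
    records = json.loads(array_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps without an indent argument emits each object on a single line.
    return "".join(json.dumps(record) + "\n" for record in records)


if __name__ == "__main__":
    source = """[
      {"name": "Tom", "character": "cat"},
      {"name": "Jerry", "character": "mouse"}
    ]"""
    # Prints one object per line, ready to be saved as a jsonFile input.
    print(json_array_to_lines(source), end="")
```

Writing the returned string to a file yields the one-object-per-line shape. For arrays too large to load at once, a streaming JSON parser would be needed instead of `json.loads`.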