GitHub user petervandenabeele commented on the pull request:

    https://github.com/apache/spark/pull/3517#issuecomment-65846515
  
    More problematic (and sorry I had not seen this before): there already
    _is_ an example file named `people.txt`, and it has a different format:
    
    ```
    $ spark git:(pv-docs-note-on-jsonFile-format/01) cat examples/src/main/resources/people.txt
    Michael, 29
    Andy, 30
    Justin, 19
    ```
    
    In that case, I could rename the example file for jsonFile to
    `people.jsons`. It is a weird name, but it's _reasonably_ accurate
    (following the `xs` pattern from Scala, since the file is like a list of
    JSON objects).
    
    I would then indeed also need to change the name in all other locations 
where a reference to `people.json` is made (confirming the list mentioned by 
@marmbrus): 
    
    ```
    spark git:(pv-docs-note-on-jsonFile-format/01) grep -r 'people\.json' * | grep -v Binary | grep -v _site
    examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQL.java:    String path = "examples/src/main/resources/people.json";
    examples/src/main/python/sql.py:    path = os.path.join(os.environ['SPARK_HOME'], "examples/src/main/resources/people.json")
    ```
    
    On a more fundamental note, from the outside, I would have found it more
    in line with the "principle of least astonishment" (POLA) if importing
    with this function required a standard, valid JSON file formatted as an
    array of hashes with an identical "schema", e.g.:
    
    ```
    [
      {"name": "Tom",
       "character":"cat"},
      {"name":"Jerry",
       "character":"mouse"}
    ]
    ```
    This would have allowed us to simply import data generated from any other 
language with `array.to_json`. 
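
As a stopgap for data that already exists as a standard JSON array, it can be converted to one object per line before loading. This is a minimal sketch in Python, assuming the loader wants each record as a self-contained JSON object on its own line; the helper name `array_to_json_lines` is hypothetical and not part of Spark.

```python
import json


def array_to_json_lines(array_text):
    """Convert the text of a JSON array into one JSON object per line.

    Hypothetical helper for illustration only; not part of Spark.
    """
    records = json.loads(array_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps without an indent argument keeps each record on a single line.
    return "".join(json.dumps(record, sort_keys=True) + "\n" for record in records)


array_text = """
[
  {"name": "Tom",
   "character": "cat"},
  {"name": "Jerry",
   "character": "mouse"}
]
"""

lines_text = array_to_json_lines(array_text)
print(lines_text, end="")
# {"character": "cat", "name": "Tom"}
# {"character": "mouse", "name": "Jerry"}
```

`sort_keys=True` is only there to make the output deterministic; it is not required for the conversion.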
    
    I hear the proposal from @marmbrus to also improve the error message (that
    would also have helped us understand the issue more quickly), but I would
    suggest putting that in a different JIRA issue (it needs some real
    programming and testing work).
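
To illustrate the kind of message that would have helped, here is a rough sketch of a pre-check in Python that reports the first line that is not a JSON object on its own. It is not how Spark reports errors; the function name `find_bad_line` is hypothetical, and skipping blank lines is an assumption of this sketch.

```python
import json


def find_bad_line(text):
    """Return (line_number, reason) for the first line that is not a
    standalone JSON object, or None if every non-blank line is one.

    Hypothetical pre-check for illustration only; not part of Spark.
    """
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored by this check
        try:
            value = json.loads(line)
        except ValueError as err:
            return number, "not valid JSON on its own: %s" % err
        if not isinstance(value, dict):
            return number, "valid JSON, but not an object"
    return None


pretty_printed = '[\n  {"name": "Tom",\n   "character": "cat"}\n]\n'
one_per_line = '{"name": "Tom", "character": "cat"}\n{"name": "Jerry", "character": "mouse"}\n'

print(find_bad_line(pretty_printed))  # flags line 1, the lone "["
print(find_bad_line(one_per_line))    # None
```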
    
    I look forward to directions on how to best fix at least the documentation 
to avoid this confusion for others.
    
    Thanks.

