Github user nchammas commented on the pull request:

    https://github.com/apache/spark/pull/2716#issuecomment-58579969
  
    > Most of code in JsonRDD are inferring types and converting objects into inferred types, these two can only be done in Python
    
    Ah yeah, I'm assuming that automatically converting objects to the inferred 
type is a good thing. When a user asks for the schema to be inferred, automatic 
type conversion should be expected behavior.
    
    I guess my perspective is that schema inference is most useful when the 
schema is inconsistent or not known in advance. It shouldn't just be a 
convenient way to get the schema for a dataset that already has a consistent 
schema, though it can serve that purpose too.
    
    > If user really want the behavior that JsonRDD provided, it's easy to call
    > `sqlContext.jsonRDD(rdd.map(json.dumps))`
    > Do these seem reasonable?
    
    Definitely. That's precisely the workaround I suggested in 
[SPARK-2870](https://issues.apache.org/jira/browse/SPARK-2870), though it has 
the obvious JSON ser/de performance cost and seems "hacky".
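
    For reference, here is a minimal PySpark sketch of that workaround (the 
SparkContext/SQLContext setup and the sample dicts are hypothetical; the extra 
`json.dumps` round-trip is exactly the ser/de cost mentioned above):

    ```python
    import json

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="jsonRDD-workaround")
    sqlContext = SQLContext(sc)

    # Hypothetical RDD of Python dicts whose schemas are not quite consistent.
    rdd = sc.parallelize([
        {"name": "a", "value": 1},
        {"name": "b", "value": 2, "extra": True},
    ])

    # Serialize each dict back to a JSON string so jsonRDD can infer a merged
    # schema and convert the values to the inferred types.
    srdd = sqlContext.jsonRDD(rdd.map(json.dumps))
    srdd.printSchema()
    ```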

