[GitHub] spark pull request: [WIP] SPARK-2360: CSV import to SchemaRDDs

yhuai Thu, 10 Jul 2014 19:27:07 -0700

Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/1351#issuecomment-48688064
  
    For `csvFile` with `header=true`, when we read all the files in a 
directory, I think we need to determine whether every file has a header row. 
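
    One way to picture the header check is the minimal sketch below. It is 
plain Python, not the Spark API; the helper names, the in-memory 
`contents_by_path` mapping and the idea that the expected header is already 
known are all assumptions made for illustration.

```python
import csv
import io


def parse_first_row(text):
    """Parse the first CSV record of a file's contents into a list of fields."""
    return next(csv.reader(io.StringIO(text)), [])


def files_without_header(contents_by_path, expected_header):
    """Return the paths whose first row does not equal expected_header.

    Hypothetical helper: it assumes the expected header is known up front,
    for example taken from the first file that was read.
    """
    return sorted(
        path
        for path, text in contents_by_path.items()
        if parse_first_row(text) != expected_header
    )


# Hypothetical directory contents: the second file starts with data, not a header.
sample = {
    "part-0.csv": "id,name\n1,a\n",
    "part-1.csv": "2,b\n3,c\n",
}
missing = files_without_header(sample, ["id", "name"])
```

    A caller could then decide whether to fail, warn, or treat the first row 
of the listed files as data.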
    
    Also, when we read multiple files, what should we do if those files have 
different schemas? With `header=true`, some fields may appear in some files 
but not in others; with `header=false`, two files may have different numbers 
of fields. I think it makes sense to union the `StructType`s generated from 
those files. It would also be helpful to let the user know that the files 
contain different schemas and that we have generated the most general schema.
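
    The union could look roughly like the sketch below. It is plain Python 
and only models a `StructType` as an ordered list of `(name, type)` pairs; 
`union_schemas`, the type names and the fallback to `"string"` when two files 
disagree on a field's type are assumptions for illustration, not anything the 
pull request defines.

```python
def union_schemas(schemas):
    """Union per-file schemas into the most general schema.

    Each schema is an ordered list of (field_name, type_name) pairs.
    Returns (merged_schema, differs), where differs is True when at least
    one input schema is not identical to the merged one, so the caller can
    tell the user that the files disagreed.
    """
    merged = {}  # dicts keep insertion order, so first-seen field order is preserved
    for schema in schemas:
        for name, dtype in schema:
            if name not in merged:
                merged[name] = dtype
            elif merged[name] != dtype:
                # Assumption: conflicting types widen to string.
                merged[name] = "string"
    result = list(merged.items())
    differs = any(list(schema) != result for schema in schemas)
    return result, differs


# Two hypothetical files read with header=true: "name" and "age" each
# appear in only one of them.
file_a = [("id", "int"), ("name", "string")]
file_b = [("id", "int"), ("age", "int")]

merged, differs = union_schemas([file_a, file_b])
if differs:
    print("Input files have different schemas; using the most general one:", merged)
```

    Under this model, a field that is absent from a file would presumably be 
null for every row read from that file.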



