[GitHub] [spark] HyukjinKwon edited a comment on pull request #29063: [SPARK-32270][SQL] Use TextFileFormat in CSV's schema inference with a different encoding

GitBox Sat, 11 Jul 2020 08:43:07 -0700


HyukjinKwon edited a comment on pull request #29063:
URL: https://github.com/apache/spark/pull/29063#issuecomment-657081547



   Ah, it does. It would be better to change spark-xml to use the SQL source 
instead of RDD APIs during schema inference, but I think that would be 
difficult: the record separator is a newline in CSV and JSON, whereas spark-xml 
depends on a custom Hadoop input format ...
   
   In most cases it wouldn't be a big deal, so I guess it's fine not to change 
it at this moment unless a major issue is found.
   I think we could solve this issue together once we migrate from DSv1 to DSv2 
in Spark XML, but I guess that's still a long way off.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



