Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22379
  
    > Out of curiosity, is this one related with an actual use case Maxim? or is this proposed for API consistency?
    
    This is an actual use case: users receive CSV content dumped from other 
DBs and stored as one of the columns (in Kafka, for example). When they read 
the data back with Spark, they need to parse the strings in that column 
somehow. Usually they do that manually with string column functions, which is 
error prone, especially when values are quoted. In general you can extract 
the column, convert it to `Dataset[String]`, and use `def csv(csvDataset: 
Dataset[String]): DataFrame` for parsing, but joining the resulting DataFrame 
with the original one is inconvenient and just slows down execution.
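
    Below is a minimal Scala sketch of that workaround, assuming a 
hypothetical DataFrame `df` with an `id` column and a string column `payload` 
holding the dumped CSV records; the names and sample data are illustrative 
and not part of this PR.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

val spark: SparkSession = SparkSession.builder()
  .appName("csv-column-workaround")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical input: `payload` holds CSV records, e.g. taken from Kafka.
val df: DataFrame = Seq(
  (1, "\"Hello, world\",42"),
  (2, "\"Bye\",7")
).toDF("id", "payload")

// Extract the string column as a Dataset[String] ...
val csvDataset: Dataset[String] = df.select($"payload").as[String]

// ... and parse it with the existing csv(Dataset[String]) API.
val parsed: DataFrame = spark.read.csv(csvDataset)

// `parsed` has only the parsed CSV fields; getting them back next to the
// other columns of `df` requires a join, which is the inconvenient and
// slow part that motivates parsing the column in place.
```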

