Maxim Gekk created SPARK-25393:
----------------------------------

             Summary: Parsing CSV strings in a column
                 Key: SPARK-25393
                 URL: https://issues.apache.org/jira/browse/SPARK-25393
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Maxim Gekk


There are use cases when content in CSV format is dumped into external 
storage as one of the columns. For example, CSV records are stored together 
with other meta-info in Kafka. The current Spark API doesn't allow parsing 
such columns directly. The existing method 
[csv()|https://github.com/apache/spark/blob/e754887182304ad0d622754e33192ebcdd515965/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L487] 
requires a dataset with a single string column, which is inconvenient for 
parsing a CSV column in a dataset with many columns. This ticket aims to add 
a new function, similar to 
[from_json()|https://github.com/apache/spark/blob/d749d034a80f528932f613ac97f13cfb99acd207/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3456], 
with the following signature in Scala:
{code:scala}
def from_csv(e: Column, schema: StructType, options: Map[String, String]): Column
{code}
and, for use from Python, R, and Java:
{code:scala}
def from_csv(e: Column, schema: String, options: java.util.Map[String, String]): Column
{code}
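As an illustration, here is a minimal sketch of how the proposed function could be 
used once implemented. The input column names ({{key}}, {{value}}) and the payload 
schema are made up for the example:
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_csv}
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// A dataset where one column carries a CSV payload next to other meta-info,
// e.g. records consumed from Kafka.
val df = Seq(("key1", "1,foo"), ("key2", "2,bar")).toDF("key", "value")

// Schema of the CSV payload (made up for the example).
val schema = new StructType().add("id", IntegerType).add("name", StringType)

// Parse the CSV column in place while keeping the other columns, instead of
// extracting it into a separate single-column dataset for the existing csv().
df.select(col("key"), from_csv(col("value"), schema, Map.empty[String, String]).as("parsed"))
  .select("key", "parsed.id", "parsed.name")
  .show()
{code}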


