[ 
https://issues.apache.org/jira/browse/SPARK-25937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16674665#comment-16674665
 ] 

Jungtaek Lim commented on SPARK-25937:
--------------------------------------

Another thought from my side: maybe we can classify the various formats into 
two groups: those applied to a whole file, and those applied to each 
line/record. Once classified, the formats applicable to Kafka fall into the 
latter group, and we could expose them as functions, as is already done for 
JSON (from_json / to_json). 

Once they are added as functions, they can be used widely and do not require 
the data source to be aware of the data format. (If we want to apply pushdown 
to the data source, we may want to let the data source be aware of the data 
format.)
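
To make the record-level idea concrete, here is a minimal plain-Python sketch. 
It is not Spark code: the names from_json_record, to_json_record and 
decode_all are hypothetical and only stand in for the role that from_json / 
to_json play. It assumes each Kafka message value is a self-contained UTF-8 
JSON document, and shows that the "source" only hands over raw bytes while the 
format is applied per record by a separate function.

```python
import json
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, object]


def from_json_record(value: bytes) -> Row:
    """Hypothetical record-level decoder: parses one message value on its own."""
    return json.loads(value.decode("utf-8"))


def to_json_record(row: Row) -> bytes:
    """Hypothetical record-level encoder: serializes one row to one message value."""
    return json.dumps(row, sort_keys=True).encode("utf-8")


def decode_all(values: Iterable[bytes],
               decoder: Callable[[bytes], Row]) -> Iterator[Row]:
    # The source side yields opaque bytes and never inspects the format;
    # any per-record decoder can be plugged in here.
    for value in values:
        yield decoder(value)


# Stand-in for the value column of a few Kafka records (assumed sample data).
messages = [b'{"id": 1, "name": "a"}', b'{"id": 2, "name": "b"}']
rows = list(decode_all(messages, from_json_record))
```

A whole-file format would not fit this shape, because a single record cannot 
be decoded without the surrounding file structure (header, footer, and so on). 
That is the distinction the classification above relies on.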

> Support user-defined schema in Kafka Source & Sink
> --------------------------------------------------
>
>                 Key: SPARK-25937
>                 URL: https://issues.apache.org/jira/browse/SPARK-25937
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.4.0
>            Reporter: Jackey Lee
>            Priority: Major
>
>     Kafka Source & Sink is widely used in Spark and is the most frequently 
> used in streaming production environments. But at present, both the Kafka 
> Source and Sink use a fixed schema, which forces users to do data conversion 
> when reading from and writing to Kafka. So why not use FileFormat to do this, 
> just like Hive?
>     Flink has implemented extended Kafka Source & Sink for JSON/CSV/Avro; we 
> can also support this in Spark.
> *Main Goals:*
> 1. Provide a Source and Sink that support a user-defined schema. Users can 
> read and write Kafka directly in the program without additional data 
> conversion.
> 2. Provide a read-write mechanism based on FileFormat. The user's data 
> conversion is similar to FileFormat's read and write process, so we can 
> provide a mechanism similar to FileFormat that offers common read-write 
> format conversion. It also allows users to customize format conversion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

