[ 
https://issues.apache.org/jira/browse/SPARK-47717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47717:
----------------------------------
    Fix Version/s:     (was: 3.3.2)
                       (was: 3.4.1)
                       (was: 3.5.1)

> Support Hive tables as a streaming source and sink
> --------------------------------------------------
>
>                 Key: SPARK-47717
>                 URL: https://issues.apache.org/jira/browse/SPARK-47717
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.3.2, 3.4.1, 3.5.1
>            Reporter: Adi Suresh
>            Priority: Major
>
> Many users keep data in Hive tables. These tables do not currently support 
> Spark streaming, so users have no native way to stream this data in Spark. 
> Existing workarounds rely on an intermediary that tracks which data has 
> already been read and periodically runs batch jobs. This use case should be 
> supported by Spark's built-in streaming mechanism.
>  
> From some research: Hive itself supports streaming ingest 
> [https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest+V2], 
> but Spark does not support streaming on tables in Hive format. Copying Hive 
> server-side code wholesale into Spark probably does not make sense, but we 
> could port only the relevant logic and wrap it in the DataSourceV2 APIs to 
> enable this feature. To avoid breaking backward compatibility, we would 
> probably want to gate this behind a new Spark property.
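
For context, the intermediary-based workaround described in the issue (tracking which data has been read and periodically running batch jobs) might look roughly like the Python sketch below. It is not taken from the issue or from Spark: `PartitionTracker`, `run_batch`, the JSON state file, and the choice of partition names as the unit of progress are all hypothetical, and the actual Spark batch read is abstracted behind a `process` callback.

```python
import json
from pathlib import Path


class PartitionTracker:
    """Hypothetical intermediary: remembers which partitions were already read."""

    def __init__(self, state_file):
        self.state_file = Path(state_file)
        if self.state_file.exists():
            # Restore progress recorded by earlier batch runs.
            self.seen = set(json.loads(self.state_file.read_text()))
        else:
            self.seen = set()

    def unread(self, current_partitions):
        """Return the partitions that no earlier run has processed, sorted."""
        return sorted(set(current_partitions) - self.seen)

    def commit(self, partitions):
        """Record the partitions as processed and persist the state to disk."""
        self.seen.update(partitions)
        self.state_file.write_text(json.dumps(sorted(self.seen)))


def run_batch(tracker, list_partitions, process):
    """One periodic batch run: process only unread partitions, then commit."""
    new_partitions = tracker.unread(list_partitions())
    for partition in new_partitions:
        # Assumption: in practice this would launch a Spark batch read
        # of the given partition.
        process(partition)
    tracker.commit(new_partitions)
    return new_partitions
```

A scheduler would call `run_batch` at a fixed interval. Because progress lives in an external state file rather than in Spark, every deployment has to build and operate this bookkeeping itself, which is the overhead a native streaming source would be expected to remove.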



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

