[ https://issues.apache.org/jira/browse/BEAM-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismaël Mejía updated BEAM-9451:
-------------------------------
    Description: 
The Spark Structured Streaming runner supports Datasets that already carry Schema
information, which Spark uses to optimize jobs (via Catalyst). This issue is to
implement optimized translations of the transforms for the runner so we can
benefit from the performance improvements done internally by Spark.

Note that we may also need to map Beam's core internal representations, such as
WindowedValue, so we can also have intermediate optimizations.

  was:Spark Structured Streaming runner supports Datasets that already have 
Schema information. This is used by Spark to optimize jobs (via Catalyst). This 
issue is to implement optimized transforms for the runner so we can benefit from 
the performance improvements internally done by Spark.


> Optimize translation when Schema information is available in Spark Structured 
> Streaming runner
> ----------------------------------------------------------------------------------------------
>
>                 Key: BEAM-9451
>                 URL: https://issues.apache.org/jira/browse/BEAM-9451
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-spark
>            Reporter: Ismaël Mejía
>            Priority: Major
>              Labels: structured-streaming
>
> The Spark Structured Streaming runner supports Datasets that already carry
> Schema information, which Spark uses to optimize jobs (via Catalyst). This
> issue is to implement optimized translations of the transforms for the runner
> so we can benefit from the performance improvements done internally by Spark.
>
> Note that we may also need to map Beam's core internal representations, such
> as WindowedValue, so we can also have intermediate optimizations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
