[ https://issues.apache.org/jira/browse/BEAM-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052270#comment-17052270 ]
Ismaël Mejía commented on BEAM-9451:
------------------------------------

Ongoing exploratory WIP, for those interested: https://github.com/iemejia/beam/tree/BEAM-9451-spark-structured-streaming-schema-translation

> Optimize translation when Schema information is available in Spark Structured Streaming runner
> ----------------------------------------------------------------------------------------------
>
>                 Key: BEAM-9451
>                 URL: https://issues.apache.org/jira/browse/BEAM-9451
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-spark
>            Reporter: Ismaël Mejía
>            Priority: Major
>              Labels: structured-streaming
>
> The Spark Structured Streaming runner supports Datasets that already carry
> Schema information, which Spark uses to optimize jobs (via Catalyst). This
> issue is to implement optimized translations of the transforms in the runner
> so that we can benefit from the performance improvements done internally by
> Spark. Note that we may also need to map Beam's core internal representations,
> such as WindowedValue, so that intermediate steps can be optimized as well.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)