Hi all,

I'm glad to announce that the new Spark runner, based on the Spark Structured Streaming framework, has been merged into master!

It is not based on the RDD/DStream API. See https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html

It is still experimental, and its coverage of the Beam model is partial:

- The runner passes 95% of the ValidatesRunner tests in batch mode.

- It does not support streaming yet (we are waiting for the Spark community to add multi-aggregation support to the Spark Structured Streaming framework).

- The runner can execute Nexmark; the perfkit dashboards are yet to come.

- Some things are not wired up yet:

    - Beam schemas are not wired up.

    - Optional features of the model are not implemented: State API, Timer API, splittable DoFn API, …

I will submit a PR to update the capability matrix in the coming days.

Best,

Etienne

