Hi all,
I'm glad to announce that the new Spark runner, based on the Spark
Structured Streaming framework, has been merged into master!
It is not based on the RDD/DStream API. See
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
It is still experimental, and its coverage of the Beam model is partial:
- The runner passes 95% of the ValidatesRunner tests in batch mode.
- It does not support streaming yet (waiting for multiple-aggregations
support in the Spark Structured Streaming framework from the Spark
community).
- The runner can execute Nexmark; Perfkit dashboards are yet to come.
- Some things are not wired up yet:
  - Beam Schemas are not wired up.
  - Optional features of the model are not implemented: State API, Timer
    API, splittable DoFn API, …
I will submit a PR to update the capability matrix in the coming days.
Best,
Etienne