Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22121#discussion_r210981586

    --- Diff: docs/avro-data-source-guide.md ---
    @@ -0,0 +1,267 @@
    +---
    +layout: global
    +title: Avro Data Source Guide
    +---
    +
    +Since Spark 2.4 release, [Spark SQL](https://spark.apache.org/docs/latest/sql-programming-guide.html) provides support for reading and writing Avro data.
    +
    +## Deploying
    +The <code>spark-avro</code> module is external and not included in `spark-submit` or `spark-shell` by default.
    +
    +As with any Spark applications, `spark-submit` is used to launch your application. `spark-avro_{{site.SCALA_BINARY_VERSION}}`
    +and its dependencies can be directly added to `spark-submit` using `--packages`, such as,
    --- End diff --

ok. When I see a "Deploying" section I would expect it to tell me what my options are, so perhaps just rephrase to indicate more clearly that `--packages` is one way to do it.

It would be nice to at least have a general statement saying that external modules aren't included with Spark by default and that the user must include the necessary jars themselves. The way to do this is deployment-specific; one way is via the `--packages` option.

Note I think the structured-streaming-kafka section should ideally be updated to something similar as well, and really any external module for that matter. It would be nice to tell users how they can include these without assuming they already know how to.
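A sketch of the kind of guidance the reviewer is asking for: two deployment-specific ways of making an external module such as `spark-avro` available to an application. The artifact coordinates, version, and application file names below are illustrative assumptions, not taken from the thread.

```shell
# Option 1: have spark-submit resolve the module and its dependencies
# from Maven Central at launch time (coordinates/version are illustrative).
./bin/spark-submit \
  --packages org.apache.spark:spark-avro_2.12:2.4.0 \
  my_app.py

# Option 2: supply a pre-downloaded jar yourself, e.g. in a deployment
# without network access to a Maven repository (path is illustrative).
./bin/spark-submit \
  --jars /path/to/spark-avro_2.12-2.4.0.jar \
  my_app.py
```

Both `--packages` and `--jars` are standard `spark-submit` flags; which one fits depends on whether the cluster can reach an artifact repository at submit time.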