Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22121#discussion_r210981586

    --- Diff: docs/avro-data-source-guide.md ---
    @@ -0,0 +1,267 @@
    +---
    +layout: global
    +title: Avro Data Source Guide
    +---
    +
    +Since Spark 2.4 release, [Spark SQL](https://spark.apache.org/docs/latest/sql-programming-guide.html) provides support for reading and writing Avro data.
    +
    +## Deploying
    +The <code>spark-avro</code> module is external and not included in `spark-submit` or `spark-shell` by default.
    +
    +As with any Spark applications, `spark-submit` is used to launch your application. `spark-avro_{{site.SCALA_BINARY_VERSION}}`
    +and its dependencies can be directly added to `spark-submit` using `--packages`, such as,
    --- End diff --

ok. When I see a "Deploying" section I would expect it to tell me what my options are, so perhaps just rephrase to indicate more clearly that `--packages` is one way to do it.

It would be nice to at least have a general statement saying that external modules aren't included with Spark by default and that the user must include the necessary jars themselves. The way to do this is deployment-specific; one way is via the `--packages` option.

Note I think the structured-streaming-kafka section should ideally be updated to something similar as well, and really any external module for that matter. It would be nice to tell users how they can include these without assuming they already know how to.
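A sketch of the kind of guidance the reviewer is asking for: two deployment-specific ways of making an external module such as `spark-avro` available to an application. The artifact coordinates, version, and application file names below are illustrative assumptions, not taken from the thread.

```shell
# Option 1: have spark-submit resolve the module and its dependencies
# from Maven Central at launch time (coordinates/version are illustrative).
./bin/spark-submit \
  --packages org.apache.spark:spark-avro_2.12:2.4.0 \
  my_app.py

# Option 2: supply a pre-downloaded jar yourself, e.g. in a deployment
# without network access to a Maven repository (path is illustrative).
./bin/spark-submit \
  --jars /path/to/spark-avro_2.12-2.4.0.jar \
  my_app.py
```

Both `--packages` and `--jars` are standard `spark-submit` flags; which one fits depends on whether the cluster can reach an artifact repository at submit time.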