[ 
https://issues.apache.org/jira/browse/SPARK-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496548#comment-14496548
 ] 

Shivaram Venkataraman commented on SPARK-6831:
----------------------------------------------

I think we should give an example of an external data source and how to use it 
in our programming guide, as not everybody writes a `Linking` section in their 
README. We could use Avro as an example and just describe how to pass it in 
with `--jars` (BTW, where do you get the JAR to pass in like this?) and say 
how to use `load`. Just those two things should be enough.
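
As a rough illustration of those two steps, here is a minimal Python sketch 
that only assembles the `bin/spark-submit` command line; it does not run Spark. 
The helper name `build_submit_command`, the script name `my_app.py`, and the 
JAR path are hypothetical placeholders. The `load` call in the comment assumes 
the Spark 1.3-era `sqlContext.load(path, source)` form and the 
`com.databricks.spark.avro` source name from the spark-avro README, so it 
should be checked against the Spark version in use.

```python
import shlex


def build_submit_command(app, jars=(), packages=()):
    """Assemble a bin/spark-submit argument list (hypothetical helper)."""
    cmd = ["bin/spark-submit"]
    if jars:
        # --jars takes a comma-separated list of JAR paths; spark-submit
        # ships these to the cluster along with the application.
        cmd += ["--jars", ",".join(jars)]
    if packages:
        # --packages takes comma-separated Maven coordinates instead of paths.
        cmd += ["--packages", ",".join(packages)]
    cmd.append(app)
    return cmd


# Placeholder path: wherever the downloaded spark-avro JAR actually lives.
cmd = build_submit_command("my_app.py", jars=["/path/to/spark-avro.jar"])
print(" ".join(shlex.quote(part) for part in cmd))

# Inside my_app.py the data source would then be used roughly like this
# (assumed Spark 1.3-style API, not executed here):
#   df = sqlContext.load("/path/to/file.avro", "com.databricks.spark.avro")
```

Since `--jars` distributes the listed JARs with the job, this kind of example 
would also speak to the question of whether anything has to be installed on 
every machine.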

While I know that the spark-packages page lists many connectors, it is often 
hard to figure out exactly which packages are SQL data sources (people most 
often ask me about Cassandra, for example). So we could also add a table 
somewhere (like, say, the LIBSVM table of language APIs at 
http://www.csie.ntu.edu.tw/~cjlin/libsvm/) with entries like `avro`, 
`https://github.com/databricks/spark-avro`, etc.

> Document how to use external data sources
> -----------------------------------------
>
>                 Key: SPARK-6831
>                 URL: https://issues.apache.org/jira/browse/SPARK-6831
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SparkR, SQL
>            Reporter: Shivaram Venkataraman
>            Priority: Critical
>
> We should include some instructions on how to use an external data source 
> for users who are beginners. Do they need to install it on all the machines, 
> or just the master? Are there any special flags they need to pass to 
> `bin/spark-submit`, etc.?
