[ https://issues.apache.org/jira/browse/SPARK-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496548#comment-14496548 ]
Shivaram Venkataraman commented on SPARK-6831:
----------------------------------------------

I think we should give an example of an external data source and how to use it in our programming guide, as not everybody writes a `Linking` section in their README. We could use Avro as an example and just describe how to pass it in with `--jars` (BTW, where do you get the JAR to pass in like this?) and say how to use `load`. Those two things should be enough.

While I know that the spark-packages page lists many connectors, it is often hard to figure out exactly which packages are SQL data sources (people most often ask me about Cassandra, for example). So we could also add a table somewhere (like, say, the LIBSVM table of language APIs at http://www.csie.ntu.edu.tw/~cjlin/libsvm/) with entries like `avro`, `https://github.com/databricks/spark-avro`, etc.

> Document how to use external data sources
> -----------------------------------------
>
>                 Key: SPARK-6831
>                 URL: https://issues.apache.org/jira/browse/SPARK-6831
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SparkR, SQL
>            Reporter: Shivaram Venkataraman
>            Priority: Critical
>
> We should include some instructions on how to use an external data source for
> users who are beginners. Do they need to install it on all the machines, or
> just the master? Are there any special flags they need to pass to
> `bin/spark-submit`, etc.?
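
As a rough illustration of the `--jars` step discussed in the comment above, the sketch below assembles a `bin/spark-submit` command line in Python. The helper `build_spark_submit`, the script name `my_job.py`, and the JAR path are hypothetical placeholders and do not come from the issue. The only Spark behavior assumed is that `--jars` accepts a comma-separated list of JAR paths.

```python
import shlex


def build_spark_submit(app_script, jars=()):
    """Return the argv list for a spark-submit call that ships extra JARs.

    `app_script` and `jars` are illustrative inputs; nothing here is
    specific to Avro or to any particular data source package.
    """
    cmd = ["bin/spark-submit"]
    if jars:
        # --jars takes one comma-separated list of JAR paths
        cmd += ["--jars", ",".join(jars)]
    cmd.append(app_script)
    return cmd


# Hypothetical JAR location and job script, for illustration only.
cmd = build_spark_submit("my_job.py", jars=["/path/to/spark-avro.jar"])
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The job script itself would then name the data source when calling `load`. That call is left out here because its exact form depends on the Spark version in use.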