[ https://issues.apache.org/jira/browse/SPARK-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212049#comment-14212049 ]

Andrew Ash commented on SPARK-748:
----------------------------------

I agree this would be valuable: almost a "Spark Cookbook" of how to read and 
write data from various other systems.  The first step is probably deciding 
which software to cover.

Tentatively, I propose:

Spark Core
- HDFS
- HBase
- Cassandra
- Elasticsearch
- JDBC, with examples for Postgres and MySQL
- General Hadoop InputFormat

Spark Streaming
- Kafka
- Flume
- Storm

As for where this should live, it could go in the documentation included in the 
git repo and published to the Spark website, or on the Spark project wiki.  I 
tend to prefer the former.  One possible location is the "External Datasets" 
section of the programming guide: 
http://spark.apache.org/docs/latest/programming-guide.html#external-datasets

> Add documentation page describing interoperability with other software (e.g. 
> HBase, JDBC, Kafka, etc.)
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-748
>                 URL: https://issues.apache.org/jira/browse/SPARK-748
>             Project: Spark
>          Issue Type: New Feature
>          Components: Documentation
>            Reporter: Josh Rosen
>
> Spark seems to be gaining a lot of data input / output features for 
> integrating with systems like HBase, Kafka, JDBC, Hadoop, etc.
> It might be a good idea to create a single documentation page that provides a 
> list of all of the data sources that Spark supports and links to the relevant 
> documentation / examples / {{spark-users}} threads.  This would help 
> prospective users to evaluate how easy it will be to integrate Spark with 
> their existing systems.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
