Hi hyqgod,

This is probably a better question for the Spark user list than the dev
list (cc'ing user and bcc'ing dev on this reply).

To answer your question, though:

Amazon's Public Datasets page is a nice place to start:
http://aws.amazon.com/datasets/. These work well with Spark because
they're often stored on S3 (which Spark can read from natively), and it's
very easy to spin up a Spark cluster on EC2 to begin experimenting with the
data.

There's also a pretty good list of (mostly big) datasets that Google has
released over the years here:
http://svonava.com/post/62186512058/datasets-released-by-google

- Evan

On Tue, Feb 25, 2014 at 6:33 PM, 黄远强 <hyq...@163.com> wrote:

> Hi all:
> I am new to the Spark community, and I dream of becoming an expert in the
> field of big data. But I have no idea where to start after going through
> the published documents on the Spark website and the examples in the Spark
> source code. I want to know if there are any public data sets on the
> internet that can be used to learn Spark and to test some of my new ideas
> based on Spark.
>       Thanks a lot.
>
>
> ---------------------------
> Best regards
> hyqgod
