Re: Spark on AWS

2016-04-28 Thread Fatma Ozcan
Thanks for the responses.
Fatma
On Apr 28, 2016 3:00 PM, "Renato Perini" <renato.per...@gmail.com> wrote:

> I have set up a small development cluster using t2.micro machines and an
> Amazon Linux AMI (CentOS 6.x).
> The whole setup has been done manually, without using the provided
> scripts. The whole setup is composed of a total of 5 instances: the first
> machine has an elastic IP and it is used as a bridge to access the other 4
> machines (they don't have elastic IPs). The second machine runs a
> standalone single node Spark cluster (1 master, 1 worker). The other 3
> machines are configured as an Apache Cassandra cluster. I have tuned the
> JVM and lots of parameters. I use neither S3 nor HDFS; I just write data
> using Spark Streaming (from an Apache Flume sink) to the 3 Cassandra nodes,
> which I use for data retrieval. Data is then processed through regular Spark
> jobs.
> The jobs are submitted to the cluster at scheduled intervals using LinkedIn
> Azkaban, which executes custom shell scripts that I wrote to wrap the
> submission process and handle any command line arguments. Results are
> written directly to other Cassandra tables or to a specific folder on the
> filesystem in regular CSV format.
> The system is completely autonomous and requires little to no manual
> administration.
>
> I'm quite satisfied with it, considering how small and limited the
> machines involved are. But it required lots of tuning work, because we are
> clearly under the recommended requirements. 4 of the 5 machines are
> switched off during the night; only the bridge machine stays up 24/7.
>
> $12 per month in total.
>
> Renato Perini.
>
>
> On 28/04/2016 23:39, Fatma Ozcan wrote:
>
>> What is your experience using Spark on AWS? Are you setting up your own
>> Spark cluster, and using HDFS? Or are you using Spark as a service from
>> AWS? In the latter case, what is your experience of using S3 directly,
>> without having HDFS in between?
>>
>> Thanks,
>> Fatma
>>
>
>
>
>
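
The reply above mentions custom wrapper scripts that hand command line arguments through to the submit step. The original scripts are shell and are not shown in the thread, so what follows is only a minimal sketch of that idea in Python, not Renato's implementation. The function name `build_spark_submit_command` and all paths, host names, and class names are hypothetical; `--master`, `--class`, and `--conf` are standard `spark-submit` options.

```python
import shlex
from typing import List, Mapping, Optional, Sequence


def build_spark_submit_command(
    app: str,
    master: str,
    main_class: Optional[str] = None,
    conf: Optional[Mapping[str, str]] = None,
    app_args: Sequence[str] = (),
) -> List[str]:
    """Assemble the argv list for one spark-submit invocation (hypothetical helper)."""
    cmd = ["spark-submit", "--master", master]
    if main_class:
        # --class is only needed for JVM applications packaged as a jar.
        cmd += ["--class", main_class]
    # Sort the settings so the generated command line is deterministic.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    # The application file comes after all spark-submit options,
    # followed by the arguments meant for the job itself.
    cmd.append(app)
    cmd.extend(app_args)
    return cmd


if __name__ == "__main__":
    # All values below are made up for illustration.
    cmd = build_spark_submit_command(
        app="/opt/jobs/daily-report.jar",
        master="spark://spark-master:7077",
        main_class="com.example.DailyReport",
        conf={"spark.executor.memory": "512m"},
        app_args=["--date", "2016-04-28"],
    )
    # Print a shell-safe version that a scheduler could execute.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Returning a list instead of a single string keeps arguments that contain spaces intact; a scheduler such as Azkaban could then run the printed command, or the list could be passed to `subprocess.run` directly.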


Spark on AWS

2016-04-28 Thread Fatma Ozcan
What is your experience using Spark on AWS? Are you setting up your own
Spark cluster, and using HDFS? Or are you using Spark as a service from
AWS? In the latter case, what is your experience of using S3 directly,
without having HDFS in between?

Thanks,
Fatma


SparkML pipelines and error recovery

2015-09-18 Thread Fatma Ozcan
I am trying to understand how Spark ML pipelines behave in case of failures.
If I have multiple transformers and one of them fails, will the lineage-based
recovery of RDDs automatically kick in?

Thanks,
Fatma


Querying JSON in Spark SQL

2015-03-16 Thread Fatma Ozcan
Is there any documentation that explains how to query JSON documents using
Spark SQL?

Thanks,
Fatma