Re: Standalone Cluster vs YARN

2015-11-25 Thread Andreas Fritzler
Hi Welly,

You will need ZooKeeper if you want to set up the standalone cluster in HA
mode.
http://spark.apache.org/docs/latest/spark-standalone.html#high-availability

In the YARN case you probably already have ZooKeeper in place if you are
running YARN in HA mode.
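
For a Flink standalone cluster, a minimal sketch of the HA-related entries in
flink-conf.yaml could look like the fragment below. The host names and the
storage path are placeholders, not values from this thread. The key names are
the ones used by more recent Flink releases; older releases used differently
named keys (recovery.mode and related settings), so please check the
documentation for your version.

```yaml
# flink-conf.yaml (sketch; key names depend on the Flink release)
# Enable ZooKeeper-based JobManager high availability.
high-availability: zookeeper
# ZooKeeper quorum; the host names below are hypothetical placeholders.
high-availability.zookeeper.quorum: zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
# Directory for JobManager recovery metadata; it must be on a file system
# that every node can reach. The path is a placeholder.
high-availability.storageDir: hdfs:///flink/recovery
```

Note that the storage directory has to live on a shared file system, so HA mode
needs some distributed storage even when the cluster itself runs standalone.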

Regards,
Andreas

On Wed, Nov 25, 2015 at 10:02 AM, Welly Tambunan  wrote:

> Hi Ufuk
>
> >In failure cases I find YARN more convenient, because it takes care of
> restarting failed task manager processes/containers for you.
>
> So does this mean that we don't need ZooKeeper?
>
>
> Cheers
>
> On Wed, Nov 25, 2015 at 3:46 PM, Ufuk Celebi  wrote:
>
>> > On 25 Nov 2015, at 02:35, Welly Tambunan  wrote:
>> >
>> > Hi All,
>> >
>> > I would like to know if there are any feature differences between using a
>> > standalone cluster and YARN.
>> >
>> > Until now we have been using a standalone cluster for our jobs.
>> > Is there any added value in using YARN?
>> >
>> > We don't have any Hadoop infrastructure in place right now, but we can
>> > provide that if there's some value in it.
>>
>> There are no features that work only on YARN or only in standalone clusters.
>> YARN mode essentially starts a standalone cluster in YARN containers.
>>
>> In failure cases I find YARN more convenient, because it takes care of
>> restarting failed task manager processes/containers for you.
>>
>> – Ufuk
>>
>>
>
>
> --
> Welly Tambunan
> Triplelands
>
> http://weltam.wordpress.com
> http://www.triplelands.com 
>


Re: Standalone Cluster vs YARN

2015-11-25 Thread Andreas Fritzler
Hi Welly,

If you want to use Cassandra, you might want to look into having a Mesos
cluster with frameworks for Spark [1] and Cassandra [2].

Regards,
Andreas

[1] http://spark.apache.org/docs/latest/running-on-mesos.html
[2] https://github.com/mesosphere/cassandra-mesos

On Wed, Nov 25, 2015 at 10:30 AM, Maximilian Michels  wrote:

> Hi Welly,
>
> > However, YARN is still tightly coupled to HDFS. Doesn't it seem wasteful to
> > use only YARN without Hadoop?
>
> I wouldn't say tightly coupled. You can use YARN without HDFS. To work
> with YARN properly, you would have to set up another distributed file
> system like XtreemFS, or use the one provided with AWS or Google
> Cloud Platform. You can tell Hadoop which file system to use by
> modifying "fs.default.name" in the Hadoop config.
>
> Cheers,
> Max
>
> On Wed, Nov 25, 2015 at 10:06 AM, Welly Tambunan wrote:
> > Hi Fabian,
> >
> > Interesting !
> >
> > However, YARN is still tightly coupled to HDFS. Doesn't it seem wasteful to
> > use only YARN without Hadoop?
> >
> > Currently we are using Cassandra and CFS (Cassandra File System).
> >
> >
> > Cheers
> >
> > On Wed, Nov 25, 2015 at 3:51 PM, Fabian Hueske wrote:
> >>
> >> A strong argument for YARN mode can be the isolation of multiple users and
> >> jobs. You can easily start a new Flink cluster for each job or user.
> >> However, this comes at the price of resource (memory) fragmentation. YARN
> >> mode does not use memory as effectively as standalone cluster mode.
> >>
> >> 2015-11-25 9:46 GMT+01:00 Ufuk Celebi :
> >>>
> >>> > On 25 Nov 2015, at 02:35, Welly Tambunan  wrote:
> >>> >
> >>> > Hi All,
> >>> >
> >>> > I would like to know if there are any feature differences between using
> >>> > a standalone cluster and YARN.
> >>> >
> >>> > Until now we have been using a standalone cluster for our jobs.
> >>> > Is there any added value in using YARN?
> >>> >
> >>> > We don't have any Hadoop infrastructure in place right now, but we can
> >>> > provide that if there's some value in it.
> >>>
> >>> There are no features that work only on YARN or only in standalone
> >>> clusters. YARN mode essentially starts a standalone cluster in YARN
> >>> containers.
> >>>
> >>> In failure cases I find YARN more convenient, because it takes care of
> >>> restarting failed task manager processes/containers for you.
> >>>
> >>> – Ufuk
> >>>
> >>
> >
> >
> >
> > --
> > Welly Tambunan
> > Triplelands
> >
> > http://weltam.wordpress.com
> > http://www.triplelands.com
>
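
As a sketch of the file system setting Max mentions above: the default file
system is configured in Hadoop's core-site.xml. In Hadoop 2.x the property is
named fs.defaultFS, and fs.default.name is its deprecated older name. The URI
below is a hypothetical placeholder, and the connector for the chosen file
system is assumed to be on the Hadoop classpath.

```xml
<!-- core-site.xml (sketch) -->
<configuration>
  <property>
    <!-- Default file system; "fs.default.name" is the deprecated alias. -->
    <name>fs.defaultFS</name>
    <!-- Hypothetical placeholder URI; replace with your own file system. -->
    <value>s3a://example-bucket</value>
  </property>
</configuration>
```

Paths without an explicit scheme are then resolved against this file system
instead of HDFS.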


Re: Flink test environment

2015-08-19 Thread Andreas Fritzler
Hi Hermann,

There is a docker-compose setup for Flink:
https://github.com/apache/flink/tree/master/flink-contrib/docker-flink

Regards,
Andreas

On Wed, Aug 19, 2015 at 3:11 PM, Hermann Azong hermann.az...@gmail.com
wrote:

 Hey Flinkers,

 For testing purposes on a cluster, I would like to know if there is a
 virtual machine where Flink already works in standalone mode or on YARN.
 Thank you in advance for your answers!

 Cheers,
 Hermann