Re: Standalone Cluster vs YARN

2015-11-25 Thread Andreas Fritzler
Hi Welly,

You will need ZooKeeper if you want to set up the standalone cluster in HA
mode:
http://spark.apache.org/docs/latest/spark-standalone.html#high-availability

In the YARN case you probably already have ZooKeeper in place if you are
running YARN in HA mode.

Regards,
Andreas


Re: Standalone Cluster vs YARN

2015-11-25 Thread Till Rohrmann
Hi Welly,

At the moment, Flink only supports HA via ZooKeeper. However, nothing
prevents the use of another system. The only requirement is that the system
lets multiple participants reach a consensus and lets you retrieve the
agreed decision. If this is possible, then it can be integrated into
Flink to serve as an alternative HA backend.
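
As a rough illustration of that requirement, here is a minimal sketch of
what such a backend contract could look like. It is not Flink's actual API:
the names HABackend, InMemoryBackend, propose_leader and current_leader are
made up for this example, and the in-memory class is only a single-process
stand-in for a real consensus service.

```python
import threading
from abc import ABC, abstractmethod
from typing import Optional


class HABackend(ABC):
    """Hypothetical contract for an HA backend (not part of Flink)."""

    @abstractmethod
    def propose_leader(self, candidate: str) -> str:
        """Take part in the election and return the agreed leader."""

    @abstractmethod
    def current_leader(self) -> Optional[str]:
        """Retrieve the decision, or None if nothing was agreed yet."""


class InMemoryBackend(HABackend):
    """Toy stand-in: the first proposal wins.

    A real backend would replace the local lock with a distributed
    consensus service such as ZooKeeper.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._leader: Optional[str] = None

    def propose_leader(self, candidate: str) -> str:
        with self._lock:
            if self._leader is None:
                self._leader = candidate
            return self._leader

    def current_leader(self) -> Optional[str]:
        with self._lock:
            return self._leader


if __name__ == "__main__":
    backend = InMemoryBackend()
    print(backend.propose_leader("jobmanager-1"))  # jobmanager-1
    print(backend.propose_leader("jobmanager-2"))  # jobmanager-1
    print(backend.current_leader())                # jobmanager-1
```

Any system that can offer these two operations reliably across machines
would, in principle, satisfy the requirement described above.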

Cheers,
Till

On Wed, Nov 25, 2015 at 10:30 AM, Maximilian Michels <m...@apache.org> wrote:

> Hi Welly,
>
> > However, YARN is still tightly coupled to HDFS. Doesn't it seem
> > wasteful to use only YARN without Hadoop?
>
> I wouldn't say tightly coupled. You can use YARN without HDFS. To work
> with YARN properly, you would have to set up another distributed file
> system such as XtreemFS, or use the one provided with AWS or Google
> Cloud Platform. You can tell Hadoop which file system to use by
> modifying "fs.default.name" in the Hadoop config.
>
> Cheers,
> Max


Re: Standalone Cluster vs YARN

2015-11-25 Thread Fabian Hueske
YARN is not a replacement for ZooKeeper. ZooKeeper is mandatory to run
Flink in high-availability mode and takes care of leader (JobManager)
election and metadata persistence.

With YARN, Flink can automatically start new TaskManagers (and JobManagers)
to compensate for failures. In cluster mode, you need standby TMs and JMs,
and you must manually make sure that these are "filled up" again after a
failure.


Re: Standalone Cluster vs YARN

2015-11-25 Thread Andreas Fritzler
Hi Welly,

If you want to use Cassandra, you might want to look into having a Mesos
cluster with frameworks for Cassandra [2] and Spark [1].

Regards,
Andreas

[1] http://spark.apache.org/docs/latest/running-on-mesos.html
[2] https://github.com/mesosphere/cassandra-mesos


Re: Standalone Cluster vs YARN

2015-11-25 Thread Welly Tambunan
Hi Ufuk

> In failure cases I find YARN more convenient, because it takes care of
> restarting failed task manager processes/containers for you.

So does this mean that we don't need ZooKeeper?


Cheers


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>


Re: Standalone Cluster vs YARN

2015-11-25 Thread Welly Tambunan
Hi Fabian,

Interesting!

However, YARN is still tightly coupled to HDFS. Doesn't it seem wasteful to
use only YARN without Hadoop?

We are currently using Cassandra and CFS (Cassandra File System).


Cheers


Re: Standalone Cluster vs YARN

2015-11-25 Thread Welly Tambunan
Hi Andreas,

Yes, it seems I can't avoid ZooKeeper right now. It would be really nice if
we could achieve HA via a gossip protocol, like Cassandra/Spark DSE does.

Is this possible?


Cheers


Re: Standalone Cluster vs YARN

2015-11-25 Thread Fabian Hueske
A strong argument for YARN mode can be the isolation of multiple users and
jobs. You can easily start a new Flink cluster for each job or user.
However, this comes at the price of resource (memory) fragmentation. YARN
mode does not use memory as effectively as cluster mode.
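
A back-of-the-envelope sketch of one way to read the fragmentation point:
if every job gets its own cluster with a fixed reservation, memory that one
job leaves idle cannot be borrowed by another, whereas a shared cluster can
pool it. All numbers and the function name idle_reserved_memory are
hypothetical and only serve the arithmetic.

```python
def idle_reserved_memory(reserved_mb, used_mb):
    """Memory reserved by per-job clusters but not currently in use."""
    return sum(r - u for r, u in zip(reserved_mb, used_mb))


# Hypothetical figures: three per-job clusters of 4096 MB each.
reserved = [4096, 4096, 4096]
used = [1024, 3072, 512]

if __name__ == "__main__":
    idle = idle_reserved_memory(reserved, used)
    print(idle, "MB reserved but idle")  # 7680 MB reserved but idle
```

In this made-up case 7680 of 12288 MB sit idle inside separate clusters,
which a single shared cluster could have handed to whichever job needed it.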


Re: Standalone Cluster vs YARN

2015-11-25 Thread Ufuk Celebi
There are no features that work only on YARN or only in standalone clusters.
YARN mode essentially starts a standalone cluster in YARN containers.

In failure cases I find YARN more convenient, because it takes care of
restarting failed task manager processes/containers for you.

– Ufuk



Standalone Cluster vs YARN

2015-11-24 Thread Welly Tambunan
Hi All,

I would like to know if there are any feature differences between using a
standalone cluster and YARN.

Until now we have been using a standalone cluster for our jobs.
Is there any added value in using YARN?

We don't have any Hadoop infrastructure in place right now, but we can
provide it if there is some value in that.


Cheers
