Re: [gentoo-user] Re: Recommendations for scheduler

J. Roeleveld Tue, 05 Aug 2014 13:44:31 -0700

On 5 August 2014 21:57:56 CEST, James <wirel...@tampabay.rr.com> wrote:
>Joost Roeleveld <joost <at> antarean.org> writes:
>
>
>> > Mesos looks promising for a variety of (Apache) reasons. Some key
>> > technologies folks may want to google that are related:
>> > 
>> > Quincy (fair scheduler)
>> > Chronos (scheduler)
>> > Hadoop (scheduler)
>> 
>> Hadoop is not a scheduler. It's a framework for a Big Data clustered
>> database.
>
>> > HDFS (clustered file system)
>> Unless it's changed recently, it is not suitable for anything other than
>> Hadoop and contains a single point of failure.
>
>I'm curious to learn more about this 'single point of failure'. Can
>you be more specific or provide links?
>
>On this resource: 
>
>http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html
>
>The JournalNode machines section talks about surviving faults:
>
>"increase the number of failures the system can tolerate, you should run an
>odd number of JNs, (i.e. 3, 5, 7, etc.). Note that when running with N
>JournalNodes, the system can tolerate at most (N - 1) / 2 failures and
>continue to function normally. "
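
As a quick illustration of the arithmetic in that quoted passage, here is a
minimal Python sketch. The function name tolerated_failures is made up for
this example and is not part of Hadoop; it only restates the (N - 1) / 2 rule
using integer division.

```python
def tolerated_failures(journal_nodes: int) -> int:
    """Number of JournalNode failures a quorum of this size can survive.

    Illustrative helper only (not a Hadoop API): it restates the
    (N - 1) / 2 rule from the quoted documentation, rounded down.
    """
    if journal_nodes < 1:
        raise ValueError("need at least one JournalNode")
    return (journal_nodes - 1) // 2


if __name__ == "__main__":
    # Compare a few quorum sizes, including an even one.
    for n in (3, 4, 5, 7):
        print(f"{n} JournalNodes -> tolerates {tolerated_failures(n)} failure(s)")
```

Note that 4 JournalNodes tolerate no more failures than 3 do, which is why the
documentation recommends an odd number.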


Just read that part. Looks like they have partly solved it since 2.2.
The problem lies with the NameNodes.
Prior to 2.2, you only had 1. If that one dies, you lose the entire cluster.
If that one is unrecoverable, you lose all the data.

After 2.2, you can configure a standby NameNode. However, unless automatic 
failover is also configured, switching over to it is still a manual step.

Considering that Hadoop is most often running on old machines, chances for 
hardware failure are higher when compared with clusters using newer hardware.

I'm not sure how other cluster FSs deal with this, but I consider it a design 
flaw if the disappearance of a single machine in a 100+ node cluster leaves 
the entire cluster in a broken state.
It's like running a single Raid5 with 100+ drives.
Anyone stupid enough to do that deserves to lose their data.
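
To put a rough number on the Raid5 comparison, here is a back-of-the-envelope
sketch in Python. It assumes drives fail independently with the same
probability p during some window (for example, the time it takes to notice a
failure and rebuild), and that a single-parity array is lost once two or more
drives fail in that window. The function name and the 1% figure are
assumptions picked purely for illustration, not measurements.

```python
def raid5_loss_probability(drives: int, p: float) -> float:
    """Probability that two or more of `drives` fail within one window.

    Simplified model (an assumption, not a measurement): each drive fails
    independently with probability `p`, and a single-parity array survives
    exactly one failure.
    """
    if drives < 1:
        raise ValueError("need at least one drive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    none_fail = (1.0 - p) ** drives
    exactly_one_fails = drives * p * (1.0 - p) ** (drives - 1)
    return 1.0 - none_fail - exactly_one_fails


if __name__ == "__main__":
    p = 0.01  # assumed per-drive failure probability, illustration only
    for n in (5, 100):
        print(f"{n} drives: P(array lost) = {raid5_loss_probability(n, p):.4f}")
```

Under this simple model the risk grows quickly with the number of drives
behind a single parity, which is the point of the comparison.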

>> > http://gpo.zugaina.org/sys-cluster/apache-hadoop-common
>> > 
>> > Zookeeper (Fault tolerance)
>> > SPARK (optimized for iterative jobs where a dataset is reused in many
>> > parallel operations; advanced math/science and many other apps)
>> > https://spark.apache.org/
>> > 
>> > Dryad  Torque   MPICH2 MPI
>> > Globus Toolkit
>> > 
>> > mesos_tech_report.pdf
>> > 
>> > It looks as though Amazon, google, facebook and many others
>> > large in the Cluster/Cloud arena are using Mesos......?
>> > 
>> > So let's all post what we find, particularly in overlays.
>> 
>> Unless you are dealing with Big Data projects, like Google, Facebook,
>> Amazon, big banks,... you don't have much use for those projects.
>
>Many scientific applications are using the cluster (cloud) or big data
>approach to all sorts of problems. Furthermore, as GPU and the new
>Arm systems with dozens and dozens of cpu cores inside one computer become
>readily available, the cluster-cloud (big data) approach will become much
>more pervasive in the next few years, imho.
>
>http://blog.rescale.com/reservoir-simulation-moves-to-the-cloud/
>
>There are thousands of small companies needing reservoir simulation, not to
>mention the millions of folks working on carbon sequestration.....
>Anything to do with Biological or Chemical Science is using or moving
>to the Cloud-Clustered world. For me, a Cluster is just a cloud internally
>managed, rather than outsourcing it to others; ymmv.

My apologies. I forgot about scientific research here, mostly because those 
fields have been dealing with really large datasets and correspondingly 
large compute clusters for decades.

The term Big Data is generally applied to financial and social media data.

>> Mesos looks like a nice project, just like Hadoop and related are also
>> nice. But for most people, they are as useful as using Exalytics.
>
>I'm not excited about an Oracle solution to anything. Many of the folks
>I know consult on moving technologies away from Oracle's sphere of influence,
>not limited to mysql; ymmv. I know of one very large communications company
>that went broke and had to merge because of those ridiculous Oracle fees.
>Caveat Emptor; long live PostgreSQL.

I'd be interested in the name of that company. Even offlist.

And I definitely agree. PostgreSQL is often a valid alternative. Unfortunately, 
it is rarely possible to use it as a back end to enterprise software as these 
are all designed to be used with databases from the usual suspects (Oracle, 
IBM, Microsoft, ....)

Same goes for OSS projects. The developers are often unable to properly code 
the SQL layer and end up simply using MySQL and its broken SQL implementation.

>> A scheduler should not have a large set of dependencies that you wouldn't
>> use otherwise. That makes Chronos a non-option to me.
>
>Those other technologies are often useful to folks who would be attracted to
>something like chronos.

If you already use Mesos, using Chronos makes sense.
If you're only interested in a scheduler, installing Mesos just to use Chronos 
doesn't make sense.

>> Martin's project looks promising, but doesn't store the schedules
>> internally. For repeating schedules, like what Alan was describing, you
>> need to put those into scripts and start those from an existing cron.
>> Of the 2, I think improving Martin's project is the most likely option
>> for me as it doesn't have additional dependencies and seems to be
>> easily implemented.
>> Joost
>
>Understood.
>Like others, I'll be curious to follow what develops out of Martin's work.

I believe Martin's scheduler will be very valuable. Even for me.
I am very likely going to start using this for some of my regular maintenance 
activities on the home network.

But as the rest of the thread shows, I wouldn't be able to use it as a 
scheduler for large projects where the schedules can get very complex very 
quickly.

The type of scheduler needed for these requires a different approach, which 
would be overkill for the home network environment where Martin's excels. 

>For me Chronos, Mesos and the other aforementioned technologies look to be
>more viable; particularly if one is preparing for a clustered world with
>CPUs, GPUs, SoCs and Arm machines distributed about the ethernet as
>resources to be scheduled and utilized in a variety of schema. It's the
>quest for one-infrastructure to solve many problems where scenarios
>compete.

I fully agree, see my comment above where I state Chronos makes sense when 
Mesos does as well.

>Big data is not the only reason for cloud-clusters. Theoretically,
>(Clustered) systems can have a far greater resource utilization of networked
>resources than traditional (distributed) approaches. I grant you that this
>is a work in progress, but I personally know of dozens of mathematically
>complex distributed systems that are migrating to the clustered approach
>rather than something custom or traditionally distributed.

I still remember running seti@home and similar programs in the past. Those were 
large clusters, but with a very badly designed network.

There is a use-case for large well integrated clusters, loosely coupled 
clusters and big machines.

This is the difference between horizontal (many machines) and vertical (1 
really big machine) clustering.
Vertical clustering only distributes work between different processes on 
that one machine.

>Granted, Cloud <--> Clustered <--> Distributed are all overlapping approaches
>to big problems. I do appreciate the candor of this thread.

They are. It started with distributed computing in a lab, then moved onto the 
internet.
Then people started to build a mini internet with a lot of old computers and 
Clusters were born.
Then that ended up back on the internet with clusters being made accessible 
online. And this is what is considered to be "The Cloud".
If you take the general definition of The Cloud, which is along the lines of 
"being able to access your data anywhere using any device", then running your 
own server and accessing the data on it from anywhere with internet access, 
using your laptop, smartphone, tablet,..., already means you are using the cloud.

If anyone is actually planning to implement Mesos and Chronos on Gentoo, I 
would be interested in joining the effort as it does sound like fun. I just 
don't have the time to do a lot of work on that at the moment.

--
Joost


-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
