J. Roeleveld <joost <at> antarean.org> writes:

> Out of curiosity, what do you want to simulate?

Subsurface flows in porous media, a.k.a. carbon sequestration
by injection wells. You know, providing proof that those who
remove hydrocarbons can actually put the CO2 back
and significantly mitigate the effects of their ventures.

It's like this. I have been struggling to teach responsibility to my
17-year-old "genius" son, who is a year away from entering medical
school. So I got him a hyperactive, highly intelligent (doberman-mix)
puppy to nurture, raise, train, love and be responsible for. It's one
genius pup teaching another pup about being responsible.

So goes the earl_bidness.......imho.



 
> > Many folks are recommending to skip Hadoop/HDFS all  together

> I agree, Hadoop/HDFS is for data analysis. Like building a profile 
> about people based on the information companies like Facebook,  
> Google, NSA, Walmart, Governments, Banks,.... collect about their 
> customers/users/citizens/slaves/....

> > and go straight to mesos/spark. RDD (in-memory)  cluster
> > calculations are at the heart of my needs. The opposite end of the
> > spectrum, loads of small files and small apps; I dunno about, but, I'm all
> > ears.
> > In the end, my (3) node scientific cluster will morph and support
> > the typical myriad  of networked applications, but I can take
> > a few years to figure that out, or just copy what smart guys like
> > you and joost do.....
>  
> Nope, I'm simply following what you do and provide suggestions where I can.
> Most of the clusters and distributed computing stuff I do is based on 
> adding machines to distribute the load. But the mechanisms for these are
> implemented in the applications I work with, not what I design underneath.

> The filesystems I am interested in are different to the ones you want.

Maybe. I do not know what I want yet. My vision is very lightweight
workstations running lxqt (small memory footprint) or such, and a bad_arse
cluster for the heavy lifting running on whatever heterogeneous resources I
have. From what I've read, the cluster and the file systems are all
redundant at the cluster level (mesos/spark anyway) regardless of what any
given processor/system is doing. All of Alan's fantasies (needs) can be
realized once the cluster stuff is mastered. (chronos, ansible etc etc).

> I need to provided access to software installation files to a VM server 
> and access to documentation which is created by the users. The 
> VM server is physically next to what I already mentioned as server A.  
> Access to the VM from the remote site will be using remote desktop   
> connections.  But to allow faster and easier access to the 
> documentation, I need a server B at the remote site which functions as 
> described.  AFS might be suitable, but I need to be able to layer Samba 
> on top of that to allow a seamless operation.
> I don't want the laptops to have their own cache and then having to 
> figure out how to solve the multiple different changes to documents 
> containing layouts. (MS Word and OpenDocument files).

Ok, so your customers (hyperactive problem users) interface to your cluster
to do their work. When finished, you write things out to other servers,
along with all of the VM servers. Lots of really cool tools are emerging
in the cluster space.

I think these folks have mesos + spark + samba + nfs all in one box. [1]
Build rather than purchase? We have to figure out what you and Alan need on
a cluster, because it is what most folks need/want. It's the admin_advantage
part of the cluster. (There are also the Big Science (me) and Web-centric
needs. Right now they are related projects, but things will coalesce, imho.)
There is even "Spark SQL" for postgres admins [2].

[1]
http://www.quantaqct.com/en/01_product/02_detail.php?mid=29&sid=162&id=163&qs=102

[2] https://spark.apache.org/sql/
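Since RDD-style in-memory calculations are the heart of what I need, here is a minimal stdlib-only Python sketch of the idea — not Spark itself (the `parallelize`/`rdd_map`/`rdd_reduce` names are my own), just the map/reduce-over-partitions pattern that Spark RDDs apply across cluster memory:

```python
# Sketch of the RDD idea in plain Python (stdlib only) -- the data
# is split into partitions, transformed partition-by-partition, then
# reduced locally and the partial results combined, all in RAM.
from functools import reduce

def parallelize(data, num_partitions=3):
    """Split a dataset into partitions (cf. Spark's parallelize)."""
    return [data[i::num_partitions] for i in range(num_partitions)]

def rdd_map(partitions, fn):
    """Apply fn to every element, partition by partition."""
    return [[fn(x) for x in part] for part in partitions]

def rdd_reduce(partitions, fn):
    """Reduce within each partition, then combine the partials."""
    partials = [reduce(fn, part) for part in partitions if part]
    return reduce(fn, partials)

# Example: sum of squares over a "cluster" of 3 partitions.
parts = parallelize(list(range(1, 11)))
squared = rdd_map(parts, lambda x: x * x)
total = rdd_reduce(squared, lambda a, b: a + b)
print(total)  # sum of squares 1..10 = 385
```

On a real mesos/spark cluster the partitions live in the RAM of different nodes and the partial reductions happen in parallel, which is exactly the "redundancy at the cluster level" point above.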


> > > We use Lustre for our high performance general storage. I don't 
> > > have any numbers, but I'm pretty sure it is *really* fast (10Gbit/s 
> > > over IB sounds familiar, but don't quote me on that).
> > 
> > AT Umich, you guys should test the FhGFS/btrfs combo. The folks
> > at UCI swear about it, although they are only publishing a wee bit.
> > (you know, water cooler gossip)...... Surely the Wolverines do not
> > want those californians getting up on them?

> > Are you guys planning a mesos/spark test?

> > > > Personally, I would read up on these and see how they work. Then,
> > > > based on that, decide if they are likely to assist in the specific
> > > > situation you are interested in.

> > It's a ton of reading. It's not apples-to-apple_cider type of reading.
> > My head hurts.....

> Take a walk outside. Clear air should help you with the headaches :P

Basketball, Boobs and Bourbon used to work quite well. Now it's mostly
basketball, but I'm working on someone "very cute"......

> > I'm leaning to  DFS/LFS
> > (2)  Luster/btrfs      and     FhGFS/btrfs

> I have insufficient knowledge to advise on either of these.
> One question, why BTRFS instead of ZFS?

I think btrfs has tremendous potential. I tried ZFS a few times,
but the installs are not part of Gentoo, so they got borked;
uEFI, GRUB-to-UUIDs, etc. etc. were also in the mix. That was almost
a year ago. For whatever reason, the clustering folks I have
read and communicated with are using ext4, xfs and btrfs. Prolly
mostly because those are what is used in their (systemd-inspired)
distros....?

 
> My current understanding is: - ZFS is production ready, but due to  
> licensing issues, not included in the kernel - BTRFS is included, but 
> not yet production ready with all planned features. 

Yep. The license issue with ZFS is a real killer for me. Besides,
as an old state-machine C hack, anything with a B-tree is fabulous.
Prejudices? Yep, but here, I'm sticking with my gut. Multi-port
RAM can do marvelous things with B-tree data structures. The
rest will become available/stable. Simply, I just trust btrfs, in
my gut.

  
> For me, Raid6-like functionality is an absolute requirement and latest I
> know is that that isn't implemented in BTRFS yet. Does anyone know when
> that will be implemented and reliable? Eg. what time-frame are we
> talking about?


Now we are "communicating"! We have different visions. I want cheap,
mirrored HD on small numbers of processors (less than 16 for now).
I want max RAM of the highest performance possible. I want my redundancy
in my cluster, with my cluster software deciding when/where/how-often
to write out to HD. If the max RAM is not enough, then SSD will
sit between the RAM and HD. Also, know this: the GPU will be assimilated
into the processors, just like the FPUs were, some decades ago. Remember
the i386 and the i387 math coprocessor chip? The good folks at opengl,
gcc (GNU) and others will soon (eventually?) give us compilers that
automagically use the GPU (and all of that blazingly fast RAM therein),
as slave to Alan's admin authority (some bullship like that).


So, my "Epiphany" is this. The bitches at systemd are to be renamed
"StripperD", as they will manage the boot cycle (how fast you can
go down (save power) and come back up (online)). The Cluster
will rule over your hardware, like a "Sheik" with "the ring that rules
them all", driving the garbage-collection processes. The cluster
will be like the "Knights of the Round Table": each node helping, and
standing in for those other nodes (nobles) that stumble, always with
extra resources, triple/quad redundancy, and solving problems
before that kernel-based "piece of" has a chance to do anything
other than "go down" or "come up" online.

We shall see just who the master is of my hardware!
The saddest thing for me is that when I railed about billion-dollar
companies corrupting the kernel development process, I did
not even have those {hat-wearing losers} in mind. They are
irrelevant. I was thinking about those semiconductor companies.
You know, the ones that accept billions of dollars from the NSA
and private spooks to embed hardware inside of hardware. The ones
that can use "white noise" as a communications channel. The ones
that can tap a fiber optic cable, with penetration. Those are
the ones to focus on. Not a bunch of "silly boyz"......

My new K_main{} has highlighted a path to neuter systemd.
But I do like how StripperD moves up and down, very quickly.

Cool huh? 
It's PARTY TIME!

> Joost
James





