J. Roeleveld <joost <at> antarean.org> writes:

> > Distributed File Systems (DFS):

> > Local (Device) File Systems LFS:

> Is my understanding correct that the top list all require one of 
> the bottom  list?
> Eg. the "clustering" FSs only ensure the files on the LFSs are 
> duplicated/spread over the various nodes?

> I would normally expect the clustering FS to be either the full layer 
> or a  clustered block-device where an FS can be placed on top.

I have not performed these installations yet. My research indicates
that first you put the local FS on the drive, just like any Linux
installation. Then you put the distributed FS on top of that. Some DFS
might not require an LFS, but FhGFS does, and so does HDFS. I won't
actually be able to answer your questions accurately until I start
building up the 3-system cluster (a week or 2 away is my best guess).
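As a sketch of that layering (a hypothetical setup fragment, not from a
real install; the device name, mount point, and ext4 choice are all
assumptions, though dfs.datanode.data.dir is a real HDFS property):

```shell
# Hypothetical layering sketch: local (device) FS first, DFS on top.
# /dev/sdb1 and /data/hdfs are made-up names for illustration.
mkfs.ext4 /dev/sdb1            # the LFS, just like any Linux install
mkdir -p /data/hdfs
mount /dev/sdb1 /data/hdfs     # local FS mounted as usual

# Then point the DFS at that local mount, e.g. in hdfs-site.xml:
#   <property>
#     <name>dfs.datanode.data.dir</name>
#     <value>file:///data/hdfs</value>
#   </property>
```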


> Otherwise it seems more like a network filesystem with caching 
> options (See  AFS).

OK, I'll add AFS. You may be correct on this one, or AFS might be both.

> I am also interested in these filesystems, but for a slightly different 
> scenario:

Ok, so as the "test-dummy-crash-victim" I'd be honored to have you,
Alan, Neil, Mic, etc. back-seat-drive on this adventure! (The more
I read, the more it's time for bourbon, bash, and a bit of cursing
to get started...)


> - 2 servers in remote locations (different offices)
> - 1 of these has all the files stored (server A) at the main office
> - The other (server B - remote office) needs to "offer" all files 
> from serverA  When server B needs to supply a file, it needs to 
> check if the local copy is still the "valid" version. 
> If yes, supply the local copy, otherwise download 
> from server A. When a file is changed, server A needs to be updated.
> While server B is sharing a file, the file needs to be locked on server A 
> preventing simultaneous updates.

OOch, file locking (experience tells me that is always tricky).
(psst, systemd is causing fits for the clustering geniuses;
some are espousing a variety of cgroup gymnastics for phantom kills)
Spark is fault tolerant regardless of node/memory/drive failures,
above whatever fault tolerance the file system configuration may supply.
In fact, lost files can be 'regenerated', but it is computationally
expensive. You have to get your file system(s) set up, then install
mesos-0.20.0 and then Spark. I have mesos mostly ready. I should
have Spark in alpha-beta this weekend. I'm fairly clueless on the
DFS/LFS issue, so a DFS that needs no LFS might be a good first choice
for testing the (3) system cluster.


> I prefer not to supply the same amount of storage at server B as 
> server A has. The remote location generally only needs access to 5% of 
> the total amount of files stored on server A. But not always the same 5%.
> Does anyone know of a filesystem that can handle this?

So in clustering, from what I have read, there are all kinds of files
passed around between the nodes and the master(s). Many are critical
files not part of the application or scientific calculations.
So in time, I think that in a clustering environment all you seek is
very possible, but that's a hunch, a gut feeling, not fact. I'd put
RAID mirrors underneath that system, if it makes sense, for now,
or just dd the stuff with a script or something kludgy (Alan is the
king of kludge....)
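For the kludgy-script route, here is a minimal sketch of the "serve the
local copy if it's still valid, otherwise fetch from server A" idea.
All names (fetch_if_stale, the directory layout) are hypothetical; a
checksum comparison stands in for a real validity check, and it does no
locking at all:

```python
# Hypothetical kludge: one-way freshness check between two directories
# standing in for server A (master) and server B (remote cache).
import hashlib
import shutil
from pathlib import Path

def file_digest(path: Path) -> str:
    """SHA-256 of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def fetch_if_stale(server_a: Path, server_b: Path, name: str) -> Path:
    """Serve server B's copy if it matches server A's,
    otherwise pull a fresh copy from server A first."""
    src = server_a / name
    dst = server_b / name
    if not dst.exists() or file_digest(dst) != file_digest(src):
        shutil.copy2(src, dst)   # the "download from server A" step
    return dst
```

A real deployment would replace the digest check with something cheaper
(mtime/size, or rsync's delta algorithm) and would still need the
locking on server A that Joost describes.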

On Gentoo planet, one of the devs has "Consul" in his overlays. Read
up on that for ideas that may be relevant to what you need.


> Joost

James