Re: [Gluster-users] Newbie questions on HPC cluster + gluster configuration

2016-01-31 Thread Pranith Kumar Karampuri

+Raghavendra, one of the maintainers of distribute xlator

Pranith
On 01/29/2016 03:28 PM, Fedele Stabile wrote:

Hello,
I would like to discuss with you the configuration I plan to set up on my
HPC cluster.
The cluster has 32 worker nodes (wn1 ... wn32), each with a local disk;
the interconnect is 40 Gb InfiniBand, and there is also a login server
(login-server).
I would create a glusterfs distributed volume using the worker-node disks
(32 disks of 1 TB in total) by running this command on login-server:

login-server# gluster volume create scratch wn1:/brick wn2:/brick ..
wn32:/brick

After this I would mount the volume on each node of the cluster so that,
for example, if I write file1 into scratch on node wn1, it is written to
the local disk of wn1.
The question is whether I can mount scratch on wn1 using this command:

wn1# mount -t glusterfs wn1:/scratch /scratch

Would this let me write file1 locally, without going over the network?

Thank you for your attention and your contribution.
Fedele Stabile



[Gluster-users] Newbie questions on HPC cluster + gluster configuration

2016-01-29 Thread Fedele Stabile
Hello, 
I would like to discuss with you the configuration I plan to set up on my
HPC cluster.
The cluster has 32 worker nodes (wn1 ... wn32), each with a local disk;
the interconnect is 40 Gb InfiniBand, and there is also a login server
(login-server).
I would create a glusterfs distributed volume using the worker-node disks
(32 disks of 1 TB in total) by running this command on login-server:

login-server# gluster volume create scratch wn1:/brick wn2:/brick ..
wn32:/brick

After this I would mount the volume on each node of the cluster so that,
for example, if I write file1 into scratch on node wn1, it is written to
the local disk of wn1.
The question is whether I can mount scratch on wn1 using this command:

wn1# mount -t glusterfs wn1:/scratch /scratch

Would this let me write file1 locally, without going over the network?
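
A minimal sketch of the full sequence, assuming the bricks already sit on a
local filesystem at /brick on every node and that all peers have been probed
from the login server. Note that the volume has to be started before it can
be mounted, and that a plain distributed volume normally places each file on
a brick chosen by hashing the file name, so a file written from wn1 is not
guaranteed to end up on wn1's own disk:

login-server# gluster peer probe wn2          # repeat for wn3 .. wn32
login-server# gluster volume create scratch wn1:/brick wn2:/brick .. wn32:/brick
login-server# gluster volume start scratch
login-server# gluster volume info scratch     # Status should read "Started"
wn1# mount -t glusterfs wn1:/scratch /scratch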

Thank you for your attention and your contribution.
Fedele Stabile



Re: [Gluster-users] newbie questions + rpc_client_ping_timer_expired error

2011-05-23 Thread Amar Tumballi
Looking at the first mail in the thread, you are trying to mount the volume
right after 'creating' it. Please run 'gluster volume start kvm' before
running mount. If you have already done that, check whether the brick
process (glusterfsd) is actually still running with 'ps ax | grep glusterfsd'
on the server. If not, please go through the brick log file for more
information.

I suspect the brick here may be on a read-only backend, which would have
killed the glusterfsd process even after you ran 'gluster volume start'.
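
A minimal sketch of that diagnostic sequence on one of the servers (the log
path shown is the usual default for the 3.x packages and may differ on your
build):

server# gluster volume start kvm
server# gluster volume info kvm               # Status should read "Started"
server# ps ax | grep glusterfsd               # one process per local brick
server# less /var/log/glusterfs/bricks/*.log  # look for why the brick exited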

Regards,
Amar

On Thu, May 12, 2011 at 5:46 AM, Chris Haumesser  wrote:

> Replying to my own thread ...
>
> After reading more mailing list archives and docs, I tried disabling
> stat-prefetch, to no avail.
>
> I next disabled all of the other performance-related features
> (write-behind, read-ahead, io-cache, quick-read), and now my debootstrap
> appears to be (albeit slowly) going about its business without issue.
>
> I also noticed that 42 seconds is the default value for
> network.ping-timeout, which corresponds to the error I was seeing in syslog.
>
>
> Which of the above options, now disabled, is most likely to have triggered
> the network ping timeouts that I was consistently seeing before? (I did not
> change anything on the network.)
>
> What other side-effects and performance hits will I incur with the above
> options disabled?
>
> Finally, I do not see descriptions of what the io-cache or quick-read
> options do in the 3.2 docs. Can someone elucidate? I would also love more
> thorough explanations of how write-behind and read-ahead work (the docs
> are pretty terse).
>
> Thanks everyone.
>
> Cheers,
>
>
> -C-
>
>


Re: [Gluster-users] newbie questions + rpc_client_ping_timer_expired error

2011-05-11 Thread Chris Haumesser
Replying to my own thread ...

After reading more mailing list archives and docs, I tried disabling
stat-prefetch, to no avail.

I next disabled all of the other performance-related features (write-behind,
read-ahead, io-cache, quick-read), and now my debootstrap appears to be
(albeit slowly) going about its business without issue.

I also noticed that 42 seconds is the default value for
network.ping-timeout, which corresponds to the error I was seeing in syslog.
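
For reference, a sketch of how those toggles are applied with the gluster CLI
(option names as they appear in the 3.x releases; substitute your own volume
name for kvm):

gluster volume set kvm performance.stat-prefetch off
gluster volume set kvm performance.write-behind off
gluster volume set kvm performance.read-ahead off
gluster volume set kvm performance.io-cache off
gluster volume set kvm performance.quick-read off
gluster volume set kvm network.ping-timeout 42    # seconds; the default mentioned above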


Which of the above options, now disabled, is most likely to have triggered
the network ping timeouts that I was consistently seeing before? (I did not
change anything on the network.)

What other side-effects and performance hits will I incur with the above
options disabled?

Finally, I do not see descriptions of what the io-cache or quick-read
options do in the 3.2 docs. Can someone elucidate? I would also love more
thorough explanations of how write-behind and read-ahead work (the docs
are pretty terse).

Thanks everyone.

Cheers,


-C-


[Gluster-users] newbie questions + rpc_client_ping_timer_expired error

2011-05-11 Thread Chris Haumesser
Greetings,

I'm trying to replace an NFS server, serving (currently) about a dozen
clients, with a gluster cluster.

Ultimately, I'd like to use gluster as read-only nfs to net-boot a number of
clients in my cluster, using something like openSIS. (Or better yet,
natively booting glusterfs, as I saw in the mailing list archives from last
year).

I'm having some trouble getting up and running.

First question: should I be using 3.1.4 or 3.2? I notice that 3.2 is listed
as the latest release, but the LATEST folder on the ftp server still points
to 3.1.4. Confused.

I have been testing with 3.2, both pre-compiled debs and my own build.
Particular to my nfsroot problem, I have created a gluster volume called
'kvm' and replicated it across two nodes, e.g.,

gluster volume create kvm replica 2 transport tcp util.office:/gluster/kvm1
admin.office:/gluster/kvm2

I then mount the volume on util.office:

mount -t glusterfs util.office:kvm /mnt/kvm

Then I attempt to use debootstrap to set up my nfs image at this mountpoint.
Debootstrap consistently fails at 'Installing core packages ... ' and if I
wait long enough, I am rewarded with this terse nugget in syslog:

GlusterFS[1223]: [2011-05-11 21:32:06.376514] C
[client-handshake.c:121:rpc_client_ping_timer_expired] 0-kvm-client-1:
server 10.11.12.44:24010 has not responded in the last 42 seconds,
disconnecting.

I get this result whether using the pre-compiled debs or my own build (just
in slightly different locations). I am using all default options on the
volume at this point. The output of 'gluster peer status' on each end
continues to show that the peers are connected, and all other network
communication between the hosts seems normal.

The volume definitions produced by the gluster cli are here:
http://pastebin.com/W7R1n4UD, but they're using all default options.

I'd appreciate any guidance on how to move forward. I have read mention of
other users net-booting from glusterfs, so I guess I must be missing or
misusing some configuration parameter(s), or perhaps using the wrong
release.


Thanks!


-C-


Re: [Gluster-users] newbie questions.

2011-04-19 Thread Jeff Darcy
On 04/19/2011 09:20 AM, Fyodor Ustinov wrote:
> 1. Is it possible to create a new volume with the 'gluster' command that is
> striped and replicated simultaneously?

This came up at my job recently too.  I was surprised to find that,
although the code and the volfile syntax both support this, the CLI
syntax has no way to express it.
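
A hedged sketch of what that looks like in a hand-written client volfile
(names are illustrative; brick-0 .. brick-3 are assumed to be defined earlier
in the same file as protocol/client volumes):

volume replicate-0
  type cluster/replicate
  subvolumes brick-0 brick-1
end-volume

volume replicate-1
  type cluster/replicate
  subvolumes brick-2 brick-3
end-volume

volume stripe-0
  type cluster/stripe
  subvolumes replicate-0 replicate-1
end-volume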

> 2. Is it true that a file stored on glusterfs cannot be larger than the brick?

Without striping, yes, the size of a file cannot be greater than the
(remaining) size of a brick - or the smallest of all replica bricks if
you're using replication.  IMO this is one of the main reasons to use
striping, since I've never seen it provide any performance benefit.

> 3. I have two bricks of 10G each, and 2 files of 4G each on one brick. Will
> glusterfs self-balance and migrate one file to the other brick, or must/can
> I do it "by hand"?

It's probably better to let GlusterFS do the rebalancing if possible,
but it might not work with very small numbers of files.


[Gluster-users] newbie questions.

2011-04-19 Thread Fyodor Ustinov

Hi!

1. Is it possible to create a new volume with the 'gluster' command that is
striped and replicated simultaneously?

2. Is it true that a file stored on glusterfs cannot be larger than the brick?

3. I have two bricks of 10G each, and 2 files of 4G each on one brick. Will
glusterfs self-balance and migrate one file to the other brick, or must/can
I do it "by hand"?


To begin, I think, that's enough. :)

WBR,
Fyodor.


[Gluster-users] Newbie questions

2010-08-20 Thread Kristofer Pettijohn
Hello,

I'm new to Gluster and am trying to understand it better before I roll it into 
production.  I looked at the FAQs and they didn't seem to answer my questions,
so please pardon my ignorance.

For now I have set up two servers, gluster1 and gluster2.  The clients are set 
up using the cluster/replicate translator:

volume mirror-0
type cluster/replicate
subvolumes gluster1-1 gluster2-1
end-volume

I have a few questions about this setup.

1. If one of the mirror nodes goes down (for minutes, hours, or even needs to 
be completely rebuilt), how is recovery/resync'ing handled?  Do I need to do 
"ls -laR" in the directory from a client to force it to check all of the files?

2. When growth happens, I would like to add servers in pairs.  I would like to 
add another mirror, and stripe across the mirrors.  I understand that gluster 
needs to be restarted to add storage nodes, but assuming that is done, is there
anything more to it than updating the client volume files and restarting gluster?  If a
new mirror volume is added and it can be striped, is it possible for gluster to 
rebalance the data across the stripe?
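
For readers on the later, CLI-managed releases (3.1 and up, where volumes are
defined with the gluster command rather than hand-edited volfiles), a rough
sketch of the equivalent growth step, adding another replica pair and then
spreading existing data across the pairs ("myvol" and the brick paths are
placeholders):

gluster volume add-brick myvol gluster3:/export/brick gluster4:/export/brick
gluster volume rebalance myvol start
gluster volume rebalance myvol status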

Thanks,
Kris



Re: [Gluster-users] Newbie questions

2010-05-04 Thread pkoelle

On 04.05.2010 14:34, Count Zero wrote:
> On May 4, 2010, at 3:25 PM, pkoelle wrote:
>
>> From our testing we found gluster with many small files to be rather slow
>> (GigE). Each open() will go over the network and will effectively kill read
>> performance (5-7 MB/sec). We tried to serve webapps with many small files
>> and startup time was not tolerable.
>
> How about performance in 'replicate' mode (AFR), where you set a preferred
> volume to be the local volume?
> Would you still get the same bad performance with that?
>
> It's just unclear to me in which configuration you experienced the
> sub-optimal performance.

Sorry, I should have provided more details. The version was GlusterFS 3.0.3
from a git checkout. We tried a 4-node setup (2 servers / 2 clients) with
favorite-child, and a 2-node setup (client and server on the same node) with
read-subvolume pointing to the local node, plus a boatload of variations with
translators.
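
For context, a sketch of how that read-subvolume hint looks in a 3.0.x client
volfile (the subvolume names here are illustrative and must match volumes
defined earlier in the file):

volume afr-0
  type cluster/replicate
  option read-subvolume local-brick
  subvolumes local-brick remote-brick
end-volume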


But as I said, and as you can gather from the list archives, reported
performance differs wildly, so there is no way around testing on your own
platform.


cheers
 Paul


Re: [Gluster-users] Newbie questions

2010-05-04 Thread Christopher Hawkins
I don't recall the numbers, but when I did that, reads were about as fast as
regular local reads. Also, if the post below refers to an older version of
glusterfs, then things might have changed a lot since then. The quick-read
translator combined with the 3.x releases was supposed to help a lot in this
situation, and it's a relatively recent addition to the project.

Chris

- "Count Zero"  wrote:

> On May 4, 2010, at 3:25 PM, pkoelle wrote:
> 
> > From our testing we found gluster with many small files to be rather
> slow (GigE). Each open() will go over the network and will effectively
> kill read performance (5-7 MB/sec). We tried to serve webapps with
> many small files and startup time was not tolerable.
> 
> 
> How about performance in 'replicate' mode (AFR), where you set a
> preferred volume to be the local volume?
> Would you still get the same bad performance with that?
> 
> It's just unclear to me in which configuration you experienced the
> sub-optimal performance.
> 


Re: [Gluster-users] Newbie questions

2010-05-04 Thread Count Zero

On May 4, 2010, at 3:25 PM, pkoelle wrote:

> From our testing we found gluster with many small files to be rather slow 
> (GigE). Each open() will go over the network and will effectively kill read 
> performance (5-7 MB/sec). We tried to serve webapps with many small files and 
> startup time was not tolerable.


How about performance in 'replicate' mode (AFR), where you set a preferred 
volume to be the local volume?
Would you still get the same bad performance with that?

It's just unclear to me in which configuration you experienced the sub-optimal
performance.



Re: [Gluster-users] Newbie questions

2010-05-04 Thread pkoelle

On 03.05.2010 21:50, Joshua Baker-LePain wrote:
[snip]
> I'm looking at Gluster for 2 purposes:
>
> 1) To host our "database" volume. This volume has copies of several
> protein and gene databases (PDB, UniProt, etc). The databases
> generally consist of tens of thousands of small (a few hundred KB at
> most) files. Users often start array jobs with hundreds or thousands
> of tasks, each task of which accesses many of these files.

From our testing we found gluster with many small files to be rather
slow (GigE). Each open() will go over the network and will effectively
kill read performance (5-7 MB/sec). We tried to serve webapps with many
small files and startup time was not tolerable.


Of course, you need to test yourself ;)

hth
 Paul



Re: [Gluster-users] Newbie questions

2010-05-04 Thread Daniel Maher

On 05/03/2010 09:50 PM, Joshua Baker-LePain wrote:


> For purpose 1, clearly I'm looking at a replicated volume. For purpose
> 2, I'm assuming that distributed is the way to go (rather than striped),
> although for reliability reasons I'd likely go replicated then
> distributed. For storage bricks, I'm looking at something like HP's


1. Yes.
2. Your call - both will work, but as you said, it's a question of in 
how many places you want the data to be. :)



> 2) Is it frowned upon to create 2 volumes out of the same physical set of
> disks? I'd like to maximize the spindle count in both volumes
> (especially the scratch volume), but will it overly degrade
> performance? Would it be better to simply create one replicated and
> distributed volume and use that for both of the above purposes?


I don't know about « frowned », but my knee-jerk response would be to
avoid that scenario.  That said, it really all comes down to usage
patterns; if you're only serving data out of one volume at a time, then
there's no problem, but if you're constantly using both...



> 3) Is it crazy to think of doing a distributed (or NUFA) volume with the
> scratch disks in the whole cluster? Especially given that we have
> nodes of many ages and see not infrequent node crashes due to bad
> memory/HDDs/user code?


Again, « crazy » is a little strong, but again, it might not hurt to
review your usage patterns before diving into the architecture.  Who
will access what, in what amounts, and at what speed, when?  Once this
has been established, you can make better informed decisions about where
to put the data, and how to let people access it (in fact, I would
submit that many of your questions will answer themselves :) ).



--
Daniel Maher 


Re: [Gluster-users] Newbie questions

2010-05-03 Thread Tejas N. Bhise
Jon,

Stripe should be used only if the workload consists of very few files, each of
them very large (many GBs in size).
Everything else can use distribute.

Regards,
Tejas.

- Original Message -
From: "Jon Tegner" 
To: "Joshua Baker-LePain" 
Cc: "gluster-users" 
Sent: Tuesday, May 4, 2010 11:00:57 AM
Subject: Re: [Gluster-users] Newbie questions

Hi, I'm also a newbie, and I'm looking forward to answers to your questions.

Just one question, why would distributed be preferable over striped (I'm 
probably the bigger newbie here)?

> For purpose 1, clearly I'm looking at a replicated volume.  For 
> purpose 2, I'm assuming that distributed is the way to go (rather than 
> striped), although for 

Regards,

/jon


Re: [Gluster-users] Newbie questions

2010-05-03 Thread Jon Tegner

Hi, I'm also a newbie, and I'm looking forward to answers to your questions.

Just one question, why would distributed be preferable over striped (I'm 
probably the bigger newbie here)?


> For purpose 1, clearly I'm looking at a replicated volume.  For
> purpose 2, I'm assuming that distributed is the way to go (rather than
> striped), although for


Regards,

/jon


[Gluster-users] Newbie questions

2010-05-03 Thread Joshua Baker-LePain
I'm a Gluster newbie trying to get myself up to speed.  I've been through 
the bulk of the website docs and I'm in the midst of some small (although 
increasing) scale test setups.  But I wanted to poll the list's collective 
wisdom on how best to fit Gluster into my setup.


As background, I currently have over 550 nodes with over 3000 cores in my 
(SGE scheduled) cluster, and we expand on a roughly biannual basis.  The 
cluster is all gigabit ethernet -- each rack has a switch, and these 
switches each have 4-port trunks to our central switch.  Despite the 
number of nodes in each rack, these trunks are not currently 
oversubscribed.  The cluster is shared among many research groups and the 
vast majority of the jobs are embarrassingly parallel.  Our current 
storage is an active-active pair of NetApp FAS3070s with a total of 8 
shelves of disks.  Unsurprisingly, it's fairly easy for any one user to 
flatten either head (or both) of the NetApp.


I'm looking at Gluster for 2 purposes:

1) To host our "database" volume.  This volume has copies of several
   protein and gene databases (PDB, UniProt, etc).  The databases
   generally consist of tens of thousands of small (a few hundred KB at
   most) files.  Users often start array jobs with hundreds or thousands
   of tasks, each task of which accesses many of these files.

2) To host a cluster-wide scratch space.  Users waste a lot of time (and
   bandwidth) copying (often temporary) results back and forth between the
   network storage and the nodes' scratch disks.  And scaling the NetApp
   is difficult, not least of which because it is rather difficult to
   convince PIs to spring for storage rather than more cores.

For purpose 1, clearly I'm looking at a replicated volume.  For purpose 2, 
I'm assuming that distributed is the way to go (rather than striped), 
although for reliability reasons I'd likely go replicated then 
distributed.  For storage bricks, I'm looking at something like HP's DL180 
G6, where I would have 25 internal SAS disks (or alternatively, I could 
put the same number in a SAS-attached external chassis).


In addition to any general advice folks could give, I have these specific 
questions:


1) My initial leaning would be to RAID10 the disks at the server level,
   and then use the RAID volumes as gluster exports.  But I could also see
   running the disks in JBOD mode and doing all the redundancy at the
   Gluster level.  The latter would seem to make management (and, e.g.,
   hot swap) more difficult, but is it preferred from a Gluster
   perspective?  How difficult would it make disk and/or brick
   maintenance?

2) Is it frowned upon to create 2 volumes out of the same physical set of
   disks?  I'd like to maximize the spindle count in both volumes
   (especially the scratch volume), but will it overly degrade
   performance?  Would it be better to simply create one replicated and
   distributed volume and use that for both of the above purposes?

3) Is it crazy to think of doing a distributed (or NUFA) volume with the
   scratch disks in the whole cluster?  Especially given that we have
   nodes of many ages and see not infrequent node crashes due to bad
   memory/HDDs/user code?

If you've made it this far, thanks very much for reading.  Any and all 
advice (and/or pointers at more documentation) would be much appreciated.


--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF


Re: [Gluster-users] Newbie questions :-)

2009-09-07 Thread Daniel Maher

Philipp Huber wrote:
> Daniel,
>
> Fantastic, thanks very much for your reply. We are very excited about
> GlusterFS and are working on a business case for a Cloud Storage product
> that would complement our Cloud Computing platform.
>
> One quick question re your #4 answer, does that mean you will have to take
> the volume down for a re-sync?
>
> Thanks for your reply,
> Phil


Please direct your replies to the list, mate. :)

As for question #4 :

> 4)  Is it correct to assume that after a failed 'brick' comes back
> online, the auto-heal functionality will take care of the re-sync'ing?

The volume doesn't need to be taken down, no, but replication won't 
happen by magic either.  Basically, for a node to realise that its copy 
of the file is no longer current (or that it shouldn't be there, or 
should be there, or whatever), the file has to be accessed.


On a webserver or something like that, the access might easily occur 
organically (a graphic or html page being served).  On file servers 
where there's less interactivity, running a simple script that will find 
and, say, stat the files in the exported tree (for example) will ensure 
coherency.
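
A minimal sketch of such a trigger, run against a client mountpoint
(/mnt/glusterfs is just an example path):

find /mnt/glusterfs -noleaf -print0 | xargs --null stat > /dev/null 2>&1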



--
Daniel Maher 


Re: [Gluster-users] Newbie questions :-)

2009-09-07 Thread Daniel Maher

Hello !

Philipp Huber wrote:


> 1)  Can I configure GlusterFS so it can withstand a complete 'brick'
> failure without users losing access to their data?

Yes.

> 2)  If Yes, can I configure how many redundant copies of the files are
> stored, e.g. 2x, 3x?

Yes.

> 3)  Can I control the amount of replication per user?

No.

> 4)  Is it correct to assume that after a failed 'brick' comes back
> online, the auto-heal functionality will take care of the re-sync'ing?

Yes (but not in the background...)

> 5)  As GlusterFS stores Metadata along with the normal data, what is the
> capacity overhead in %?

That's a good question. :)


--
Daniel Maher 


[Gluster-users] Newbie questions :-)

2009-09-04 Thread Philipp Huber
Hi guys,

 

My first post, so sorry if it is something that was covered before. I read
quite a bit of the documentation and archived posts, but couldn't find the
answers:

 

1)  Can I configure GlusterFS so it can withstand a complete 'brick'
failure without users losing access to their data?

2)  If Yes, can I configure how many redundant copies of the files are
stored, e.g. 2x, 3x?

3)  Can I control the amount of replication per user?

4)  Is it correct to assume that after a failed 'brick' comes back
online, the auto-heal functionality will take care of the re-sync'ing?

5)  As GlusterFS stores Metadata along with the normal data, what is the
capacity overhead in %?

 

Any feedback is hugely appreciated.

 

Phil Huber

 
