Re: [Gluster-users] Newbee Question: GlusterFS on Compute Cluster?

2013-05-10 Thread Fabricio Cannini

On 10-05-2013 19:38, Bradley, Randy wrote:


I've got a 24 node compute cluster. Each node has one extra terabyte
drive. It seemed reasonable to install Gluster on each of the compute
nodes and the head node. I created a volume from the head node:

gluster volume create gv1 rep 2 transport tcp compute000:/export/brick1
compute001:/export/brick1 compute002:/export/brick1
compute003:/export/brick1 compute004:/export/brick1
compute005:/export/brick1 compute006:/export/brick1
compute007:/export/brick1 compute008:/export/brick1
compute009:/export/brick1 compute010:/export/brick1
compute011:/export/brick1 compute012:/export/brick1
compute013:/export/brick1 compute014:/export/brick1
compute015:/export/brick1 compute016:/export/brick1
compute017:/export/brick1 compute018:/export/brick1
compute019:/export/brick1 compute020:/export/brick1
compute021:/export/brick1 compute022:/export/brick1
compute023:/export/brick1

And then I mounted the volume on the head node. So far, so good. Approx. 10
TB available.

Now I would like each compute node to be able to access files on this
volume. Would this be done by NFS mount from the head node to the
compute nodes or is there a better way?


Back in the days of 3.0.x (~3 years ago) I built a 'distributed scratch' in a
scenario just like yours, Randy. I remember using gluster's own protocol to
access the files, mounting the volume locally on each node.
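
With a volume created through the gluster CLI like yours, that just means
mounting gv1 with the native (FUSE) client on every compute node. A minimal
sketch, where the mount point /mnt/gv1 and the choice of compute000 as the
volfile server are only examples:

  # on each compute node
  mount -t glusterfs compute000:/gv1 /mnt/gv1

  # or persistently, via an /etc/fstab entry
  compute000:/gv1  /mnt/gv1  glusterfs  defaults,_netdev  0 0

Any server in the pool can be named there; it is only used to fetch the
volume layout at mount time.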



Re: [Gluster-users] Performance gluster 3.2.5 + QLogic Infiniband

2012-04-11 Thread Fabricio Cannini
Hi there

The only time I set up a gluster "distributed scratch" like Michael is doing
(3.0.5 Debian packages), I too chose IPoIB, simply because I could not get
rdma working at all.
Time was short and IPoIB "just worked" well enough for our demand at the time,
so I didn't look into the issue any further. Plus, being able to ping and ssh
into a node through the IB interface comes in handy when diagnosing and fixing
networking issues.
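
With the 3.1+ CLI (my 3.0.5 setup still used hand-written volfiles), the IPoIB
approach amounts to giving the IB interfaces IP addresses and then using the
plain tcp transport over them. A rough sketch, with hostnames as placeholders
that resolve to the IPoIB addresses:

  # volume built on the ordinary tcp transport, addressed via IPoIB names
  gluster volume create scratch transport tcp node01-ib:/export/brick1 node02-ib:/export/brick1
  gluster volume start scratch

  # clients mount over the same IPoIB network
  mount -t glusterfs node01-ib:/scratch /scratch

(For Sabuj's question below: as far as I know a volume can also be created
with 'transport tcp,rdma' so that one glusterfsd serves both, but I have not
tested it myself.)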

On Wednesday, 11 April 2012, Sabuj Pattanayek wrote:
> I wonder if it's possible to have both rdma and ipoib served by a
> single glusterfsd so I can test this? I guess so, since it's just a
> tcp mount?
>
> On Wed, Apr 11, 2012 at 1:43 PM, Harry Mangalam wrote:
>> On Tuesday 10 April 2012 15:47:08 Bryan Whitehead wrote:
>>
>>> with my infiniband setup I found my performance was much better by
>>> setting up a TCP network over infiniband and then using pure tcp as
>>> the transport with my gluster volume. For the life of me I couldn't
>>> get rdma to beat tcp.
>>
>> Thanks for that data point, Brian.
>>
>> Very interesting. Is this a common experience? The RDMA experience has not
>> been a very smooth one for me and doing everything with IPoIB would save a
>> lot of headaches, especially if it's also higher performance.
>>
>> hjm
>>
>> --
>> Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
>> [ZOT 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
>> 415 South Circle View Dr, Irvine, CA, 92697 [shipping]
>> MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
>> --


Re: [Gluster-users] Estimated date for release of Gluster 3.3

2012-03-14 Thread Fabricio Cannini
2012/3/14 Tim Bell :
>
> Is there an estimate of when Gluster 3.3 would be out of beta ?

Before Enlightenment 17, right guys? Right?


Re: [Gluster-users] FW: Upcoming August webinars

2011-08-17 Thread Fabricio Cannini
On Wednesday, 17 August 2011, at 15:26:04, Charles Williams wrote:
> Will we be able to get a copy of this at a later date? Would love to
> attend but am not able (allowed?) to stay up so late when I have to work.
> :(
> 
> thanks,
> chuck

And please, pretty please, no webex video format.


Re: [Gluster-users] adding new bricks for volume expansion with 3.0.x?

2011-06-03 Thread Fabricio Cannini
On Friday, 03 June 2011, at 16:42:37, mki-gluste...@mozone.net wrote:
> Hi
> 
> How does one go about expanding a volume that consists of a distribute
> -replicate set of machines in 3.0.6?  The setup consists of 4 pairs
> of machines, with 3 bricks per machine.  I need to add an additional 5
> pair of machines (15 bricks) to the volume, but I don't understand
> what's required per se.  There are currently 4 client machines mounting
> the volume using the ip.of.first.backend:/volume syntax in fstab where
> the first backend server provides the general volume to the clients.
> 
> Looking at some of the past mailing list chatter, it seems
> scale-n-defrag.sh is what I need, but it's unclear as to how to go about
> this without disruption to services on the other clients.  If I bring up
> a new client server, copy the vol file and mount that vol using that
> volfile temporarily to run the defrag, do I have to make all the existing
> clients also mount that exact same vol file while this is running?  Or can
> they keep running on their old volfile for the duration of the defrag? 
> The last time I tried modifying the vol file by even 1 byte, on the
> backend that was serving it up, the clients refused to mount it, so not
> sure how that's supposed to work in this case?
> 
> Can someone please shed some light into what the correct process to go
> about this is?
> 
> Thanks much.
> 
> Mohan

Hi Mohan.

Is upgrading to a newer version an option for you?
If so, I would look into that *before* trying scale-n-defrag.sh.
Operations like this became much easier from 3.1 onwards.
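
For comparison, in 3.1+ the whole expansion is a couple of CLI calls; a rough
sketch, with volume and host names as placeholders:

  # add the new bricks (for replicated volumes, in multiples of the replica count)
  gluster volume add-brick myvol new01:/data/brick1 new02:/data/brick1

  # then spread the existing data onto them
  gluster volume rebalance myvol start
  gluster volume rebalance myvol status

No client-side volfile editing is needed there; clients pick the new layout up
from the servers.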

Good luck.


Re: [Gluster-users] Fwd: files not syncing up with glusterfs 3.1.2

2011-02-21 Thread Fabricio Cannini
On Monday, 21 February 2011, at 12:01:11, Joe Landman wrote:
> On 02/21/2011 09:53 AM, David Lloyd wrote:
> > I'm working with Paul on this.
> > 
> > We did take advice on XFS beforehand, and were given the impression that
> > it would just be a performance issue rather than things not actually
> > working.
> 
> Hi David
> 
>XFS works fine as a backing store for GlusterFS.  We've deployed this
> now to many customer sites, and have not run into issues with it.

That's nice to hear. The next time I set up a gluster volume I'm going to take
a look at XFS as a backend.
BTW Joe, for these deployments with XFS as the backend FS, which version of
gluster did you use?

> > We've got quite fast hardware, and are more comfortable with XFS than
> > ext4 from our own experience, so we did our own tests and were happy with
> > XFS performance.
> 
>There are many reasons to choose XFS in general, and there are no
> issues with using it with GlusterFS.  Especially on large file transfers.

Indeed. But I still had the 3.0.x series docs in mind.

> > Likewise, we're aware of the very poor performance of gluster with small
> > files. We serve a lot of large files, and we've now moved most of the
> > small files off to a normal nfs server. Again, small files aren't known
> > to break gluster, are they?
> 
>Small files are the bane of every cluster file system.  We recommend
> using NFS client with GlusterFS for smaller files, simply due to the
> additional caching you can get out of the NFS system.

Good to know. Thanks for the tip.
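
Since the servers here are already on 3.1.x, which by default exports every
started volume over its built-in NFSv3 server, pointing the small-file
workload at NFS is just another mount. A sketch, with host, volume and mount
point as placeholders:

  mount -t nfs -o vers=3,nolock server1:/glustervol1 /mnt/small-files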

>Regards,
> 
> Joe


Re: [Gluster-users] Fwd: files not syncing up with glusterfs 3.1.2

2011-02-21 Thread Fabricio Cannini
On Friday, 18 February 2011, at 23:24:10, paul simpson wrote:
> hello all,
> 
> i have been testing gluster as a central file server for a small animation
> studio/post production company.  my initial experiments were using the fuse
> glusterfs protocol - but that ran extremely slowly for home dirs and
> general file sharing.  we have since switched to using NFS over glusterfs.
>  NFS has certainly seemed more responsive re. stat and dir traversal. 
> however, i'm now being plagued with three different types of errors:
> 
> 1/ Stale NFS file handle
> 2/ input/output errors
> 3/ and a new one:
> $ l -l /n/auto/gv1/production/conan/hda/published/OLD/
> ls: cannot access /n/auto/gv1/production/conan/hda/published/OLD/shot:
> Remote I/O error
> total 0
> d? ? ? ? ?? shot
> 
> ...so it's a bit all over the place.  i've tried rebooting both servers and
> clients.  these issues are very erratic - they come and go.
> 
> some information on my setup: glusterfs 3.1.2
> 
> g1:~ # gluster volume info
> 
> Volume Name: glustervol1
> Type: Distributed-Replicate
> Status: Started
> Number of Bricks: 4 x 2 = 8
> Transport-type: tcp
> Bricks:
> Brick1: g1:/mnt/glus1
> Brick2: g2:/mnt/glus1
> Brick3: g3:/mnt/glus1
> Brick4: g4:/mnt/glus1
> Brick5: g1:/mnt/glus2
> Brick6: g2:/mnt/glus2
> Brick7: g3:/mnt/glus2
> Brick8: g4:/mnt/glus2
> Options Reconfigured:
> performance.write-behind-window-size: 1mb
> performance.cache-size: 1gb
> performance.stat-prefetch: 1
> network.ping-timeout: 20
> diagnostics.latency-measurement: off
> diagnostics.dump-fd-stats: on
>
> that is 4 servers - serving ~30 clients - 95% linux, 5% mac.  all NFS.
>  other points:
> - i'm automounting using NFS via autofs (with ldap).  ie:
>   gus:/glustervol1 on /n/auto/gv1 type nfs
> (rw,vers=3,rsize=32768,wsize=32768,intr,sloppy,addr=10.0.0.13)
> gus is pointing to rr dns machines (g1,g2,g3,g4).  that all seems to be
> working.
> 
> - backend file system on g[1-4] is xfs.  i.e.,
> 
> g1:/var/log/glusterfs # xfs_info /mnt/glus1
> meta-data=/dev/sdb1  isize=256   agcount=7, agsize=268435200 blks
>  =   sectsz=512   attr=2
> data =   bsize=4096   blocks=1627196928, imaxpct=5
>  =   sunit=256swidth=2560 blks
> naming   =version 2  bsize=4096   ascii-ci=0
> log  =internal   bsize=4096   blocks=32768, version=2
>  =   sectsz=512   sunit=8 blks, lazy-count=0
> realtime =none   extsz=4096   blocks=0, rtextents=0
> 
> 
> - sometimes root can stat/read the file in question while the user cannot!
>  i can remount the same NFS share to another mount point - and i can then
> see that with the same user.
> 
> - sample output of g1 nfs.log file:
> 
> [2011-02-18 15:27:07.201433] I [io-stats.c:338:io_stats_dump_fd]
> glustervol1:   Filename :
> /production/conan/hda/published/shot/backup/.svn/tmp/entries
> [2011-02-18 15:27:07.201445] I [io-stats.c:353:io_stats_dump_fd]
> glustervol1:   BytesWritten : 1414 bytes
> [2011-02-18 15:27:07.201455] I [io-stats.c:365:io_stats_dump_fd]
> glustervol1: Write 001024b+ : 1
> [2011-02-18 15:27:07.205999] I [io-stats.c:333:io_stats_dump_fd]
> glustervol1: --- fd stats ---
> [2011-02-18 15:27:07.206032] I [io-stats.c:338:io_stats_dump_fd]
> glustervol1:   Filename :
> /production/conan/hda/published/shot/backup/.svn/props/tempfile.tmp
> [2011-02-18 15:27:07.210799] I [io-stats.c:333:io_stats_dump_fd]
> glustervol1: --- fd stats ---
> [2011-02-18 15:27:07.210824] I [io-stats.c:338:io_stats_dump_fd]
> glustervol1:   Filename :
> /production/conan/hda/published/shot/backup/.svn/tmp/log
> [2011-02-18 15:27:07.211904] I [io-stats.c:333:io_stats_dump_fd]
> glustervol1: --- fd stats ---
> [2011-02-18 15:27:07.211928] I [io-stats.c:338:io_stats_dump_fd]
> glustervol1:   Filename :
> /prod_data/xmas/lgl/pic/mr_all_PBR_HIGHNO_DF/035/1920x1080/mr_all_PBR_HIGHNO_DF.6084.exr
> [2011-02-18 15:27:07.211940] I [io-stats.c:343:io_stats_dump_fd]
> glustervol1:   Lifetime : 8731secs, 610796usecs
> [2011-02-18 15:27:07.211951] I [io-stats.c:353:io_stats_dump_fd]
> glustervol1:   BytesWritten : 2321370 bytes
> [2011-02-18 15:27:07.211962] I [io-stats.c:365:io_stats_dump_fd]
> glustervol1: Write 000512b+ : 1
> [2011-02-18 15:27:07.211972] I [io-stats.c:365:io_stats_dump_fd]
> glustervol1: Write 002048b+ : 1
> [2011-02-18 15:27:07.211983] I [io-stats.c:365:io_stats_dump_fd]
> glustervol1: Write 004096b+ : 4
> [2011-02-18 15:27:07.212009] I [io-stats.c:365:io_stats_dump_fd]
> glustervol1: Write 008192b+ : 4
> [2011-02-18 15:27:07.212019] I [io-stats.c:365:io_stats_dump_fd]
> glustervol1: Write 016384b+ : 20
> [2011-02-18 15:27:07.212030] I [io-stats.c:365:io_stats_dump_fd]
> glustervol1: Write 032768b+ : 54
> [2011-02-18 15:27:07.228051] I [io-stats.c:333:io_stats_dump_fd]
> glustervol1: --- fd stats 

Re: [Gluster-users] Reconfiguring volume type

2011-02-16 Thread Fabricio Cannini
Hi all.

> However if you have a replicate volume with count 2 already created, then
> you just need to add 2 more bricks and it will automatically become a
> distributed replicate volume. This can be scaled to 6 bricks by adding 2
> more bricks again and these steps can be done when the volume is online
> itself.

Is it possible, with the current 3-server setup that Udo has, to add 1 server
to the volume, then change the volume type to distributed-replicate, and
afterwards add the other 2 servers?
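
For reference, if I read the quoted procedure right, the step itself is a
single CLI call on the running volume (names below are placeholders):

  # on a "replica 2" volume, bricks are added two at a time; afterwards
  # 'gluster volume info' reports the type as Distributed-Replicate
  gluster volume add-brick myvol server4:/export/brick1 server5:/export/brick1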


Re: [Gluster-users] Gluster 3.1.1 issues over RDMA and HPC environment

2011-02-07 Thread Fabricio Cannini
On Sunday, 06 February 2011, at 16:35:45, Claudio Baeza Retamal wrote:

Hi.

> Dear friends,
> 
> I have several problems of stability, reliability in a small-middle
> sized cluster, my configuration is the following:
> 
> 66 compute nodes (IBM idataplex, X5550, 24 GB RAM)
> 1 access node (front end)
> 1 master node (queue manager and monitoring)
> 2 server for I/O with GlusterFS configured in distributed mode (4 TB in
> total)
> 
> All computers have a Mellanox ConnectX QDR (40 Gbps) dual port
> 1 Switch Qlogic 12800-180, 7 leaf of 24 ports each one and two double
> Spines QSFP plug
> 
> Centos 5.5 and Xcat as cluster manager
> Ofed 1.5.1
> Gluster 3.1.1 over infiniband

I have a smaller but relatively similar setup, and am facing the same issues
as Claudio.

- 1 frontend node (2 Intel Xeon 5420, 16GB DDR2 ECC RAM, 4TB of raw disk
space) with 2 "Mellanox Technologies MT26418 [ConnectX VPI PCIe 2.0 5GT/s -
IB DDR]"

- 1 storage node (2 Intel Xeon 5420, 24GB DDR2 ECC RAM, 8TB of raw disk
space) with 2 "Mellanox Technologies MT26418 [ConnectX VPI PCIe 2.0 5GT/s -
IB DDR]"

- 22 compute nodes (2 Intel Xeon 5420, 16GB DDR2 ECC RAM, 750GB of raw disk
space) with 1 "InfiniBand: Mellanox Technologies MT25204 [InfiniHost III
Lx HCA]"

Each compute node has a /glstfs partition of 615GB, serving a gluster volume
of ~3.1TB mounted at /scratch on all nodes and the frontend, using the stock
Debian Squeeze 6.0 packages of 3.0.5.

> When the cluster is fully loaded with applications that use MPI heavily, in
> combination with other applications that do a lot of I/O to the file
> system, GlusterFS stops working.
> Also, when gendb runs the interproscan bioinformatics application with 128
> or more jobs, GlusterFS dies or disconnects clients randomly, so some
> applications shut down because they can no longer see the file system.
>
> This does not happen with Gluster over tcp (1 Gbps ethernet), and it does
> not happen with Lustre 1.8.5 over infiniband either; under the same
> conditions Lustre works fine.
>
> My question is: does any documentation exist with more specific information
> on GlusterFS tuning?
>
> I have only found basic information on configuring Gluster, but nothing
> deeper (i.e. for experts). I think there must be some option to handle this
> situation in GlusterFS; moreover, other people should be hitting the same
> problems, since we replicated the configuration at another site with the
> same results.
> Perhaps the question is about gluster scalability: how many clients are
> recommended per gluster server when using RDMA and an infiniband fabric at
> 40 Gbps?
>
> I would appreciate any help. I want to use Gluster, but stability and
> reliability are very important for us. Perhaps

I have "solved" it by taking the first node listed in the client file
'/etc/glusterfs/glusterfs.vol' out of the execution queue.
And this is what I *think* is the reason it worked:

I can't find it now, but I saw in the 3.0 docs that "... the first hostname
found in the client config file acts as a lock server for the whole volume...".
In other words, the first hostname found in the client config coordinates the
locking/unlocking of files in the whole volume. This way, that node does not
accept any jobs and can dedicate its processing power solely to being a
'lock server'.

It may well be that gluster is not yet as optimized for infiniband as it is
for ethernet, too. I just can't say.

I am also unable to find out how to specify something like this in the gluster
config: "node N is a lock server for nodes a, b, c, d". Does anybody know if
it is possible?

Hope this helps you somehow, and helps to improve gluster performance over
IB/RDMA.


Re: [Gluster-users] 3.1.2 feedback

2011-01-21 Thread Fabricio Cannini
On Friday, 21 January 2011, at 09:49:10, David Lloyd wrote:
> Hello,
> 
> Haven't heard much feedback about installing glusterfs 3.1.2.
> 
> Should I infer that it's all gone extremely very smoothly for everyone, or
> is everyone being as cowardly as me and waiting for others to do it first?

Hi David.

3.1 is very promising indeed. As an example, yesterday I felt the need to use
the 'migrate' feature. But I'm one of those 'cowards', and I think (many will
agree with me) that you can never be too cowardly with production machines. ;)

Bye.


Re: [Gluster-users] what does the permission T mean? -rwx-----T

2011-01-13 Thread Fabricio Cannini
2011/1/13 phil cryer :
> I have a file that looks like this, what does T tell me in terms of
> permissions and glusterfs?
>
> -rwx-----T 1 root     root       3414 Oct 22 15:27 reportr2.sh

Hi Phil.

That capital T is the sticky bit, shown uppercase because the execute bit for
"others" is not set. On a directory, the sticky bit means that only a file's
owner (or the directory's owner, or root) can delete or rename files inside
it, even if the directory is mode 777 (this is how /tmp works). On a regular
file Linux ignores it; on a GlusterFS brick, though, a zero-byte file with the
sticky bit set is usually a DHT link file pointing to the brick that holds
the real data.
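
A quick illustration of where the capital T comes from, and of how to check
gluster's pointer files on a brick (paths below are examples, not taken from
Phil's system):

  touch demo.sh
  chmod 1700 demo.sh   # sticky bit plus rwx for the owner only
  ls -l demo.sh        # shows -rwx-----T (capital T: "others" have no execute bit)

  # on the brick's backing filesystem, a zero-byte mode ---------T file created
  # by gluster carries a DHT "linkto" xattr naming the brick with the real data
  getfattr -n trusted.glusterfs.dht.linkto -e text /export/brick1/reportr2.sh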


Re: [Gluster-users] Frequent "stale nfs file handle" error

2011-01-12 Thread Fabricio Cannini
On Wednesday, 12 January 2011, at 09:05:00, Łukasz Jagiełło wrote:
> On 12 January 2011 at 11:19, Amar Tumballi wrote:
> >> Got same problem at 3.1.1 - 3.1.2qa4
> > 
> > Can you paste the logs ?? Also, when you say problem, what is the user
> > application errors seen?
> 
> No errors/notice logs on the gluster side, just on the client side where
> the nfs volume is mounted. When I try to list a directory I get "stale nfs
> file handle". 'mount /dir -o remount' helps, but that's not a solution.

One thing I noticed while re-reading the documentation about AFR is that the
first node of my cluster acts as the 'lock server' for all nodes. At the high
loads that are common there, it is entirely possible that a single machine
cannot manage the locks in a timely manner, right?
If so, how can I set a specific node as the 'lock server' of a given
subvolume?

http://www.gluster.com/community/documentation/index.php/Understanding_AFR_Translator

I haven't found that in the docs, only how to increase the number of 'lock
servers' in a subvolume:

http://www.gluster.com/community/documentation/index.php/Understanding_AFR_Translator#Locking_options
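
For what it's worth, in the 3.0.x volfile model the choice seems to come down
to subvolume order. A hedged sketch of a client-side replicate section (volume
and subvolume names are illustrative, not from my config):

  volume scratch-replica-0
    type cluster/replicate
    # per the AFR doc above, locks are taken on the first listed subvolume,
    # so putting node02-brick first would make it the lock server for this set
    subvolumes node02-brick node01-brick
  end-volume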

TIA.


Re: [Gluster-users] Frequent "stale nfs file handle" error

2011-01-12 Thread Fabricio Cannini
On Wednesday, 12 January 2011, at 08:19:50, Amar Tumballi wrote:
> > > If anybody can make a sense of why is it happening, i'd be really
> > > really thankful.
> > 
> > We fixed many issues in 3.1.x releases compared to 3.0.5 (even some
> > issues fixed in 3.0.6). Please consider testing/upgrading to a higher
> > version.

I'm thinking about upgrading, but I'd rather stay with the Debian stock
packages if possible.
I'll talk with Patrick Matthäi, Debian's gluster maintainer, and see if it is
possible to backport the fixes.
Also, if there is any workaround available, please tell us.

> > Got same problem at 3.1.1 - 3.1.2qa4
> 
> Can you paste the logs ?? Also, when you say problem, what is the user
> application errors seen?

I've put a bunch of log messages here: http://pastebin.com/gkf3CmK9 and here:
http://pastebin.com/wDgF74j8 .

> Regards,
> Amar


[Gluster-users] Frequent "stale nfs file handle" error

2011-01-11 Thread Fabricio Cannini
Hi all.

I've been having this error very frequently, at least once a week.
Whenever it happens, restarting all the gluster daemons makes things work
again.

This is the hardware i'm using:

22 nodes, each with 2x Intel Xeon 5420 2.5GHz, 16GB DDR2 ECC RAM and 1 SATA2
HD of 750GB, of which ~600GB is a partition (/glstfs) dedicated to gluster.
Each node has 1 Mellanox MT25204 [InfiniHost III Lx] InfiniBand DDR HCA used
by gluster through the 'verbs' interface. The switch is a Voltaire ISR 9024S/D.
Each node is also a client of the gluster volume, which is accessed through
the '/scratch' mount point.
The machine itself is a scientific cluster, with all nodes and the head node
running Debian Squeeze amd64 with the stock 3.0.5 packages.

These are the server and client configs:

Client config
http://pastebin.com/6d4BjQwd

Server config
http://pastebin.com/4ZmX9ir1

And here are some of the messages in the head node log:
http://pastebin.com/gkf3CmK9

If anybody can make sense of why this is happening, I'd be really, really
thankful.


[Gluster-users] Instability with large setup and infiniband

2011-01-06 Thread Fabricio Cannini
Hi all

I've set up glusterfs, version 3.0.5 (Debian Squeeze amd64 stock packages),
like this, with each node being both a server and a client:

Client config
http://pastebin.com/6d4BjQwd

Server config
http://pastebin.com/4ZmX9ir1


Configuration of each node:
2x Intel Xeon 5420 2.5GHz, 16GB DDR2 ECC RAM, 1 SATA2 HD of 750GB, of which
~600GB is a partition (/glstfs) dedicated to gluster. Each node also has 1
Mellanox MT25204 [InfiniHost III Lx] InfiniBand DDR HCA used by gluster
through the 'verbs' interface.

This cluster of 22 nodes is used for scientific computing, and glusterfs is 
used to create a scratch area for I/O intensive apps.

And this is one of the problems: *one* I/O-intensive job can bring the whole
volume to its knees, with "Transport endpoint not connected" errors and so on,
to the point of complete uselessness, especially if the job is running in
parallel (through MPI) on more than one node.

The other problem is that gluster has been somewhat unstable even without
I/O-intensive jobs. Out of the blue, a simple 'ls -la /scratch' is answered
with a "Transport endpoint not connected" error. When this happens, restarting
all servers brings things back to a working state.
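
By "restarting all servers" I mean nothing fancier than the stock init script
plus a remount; a sketch assuming the Debian package's service name
(glusterfs-server) and an fstab entry for /scratch:

  # on every node (each one is both server and client)
  /etc/init.d/glusterfs-server restart
  umount -l /scratch && mount /scratch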

If anybody here using glusterfs with infiniband has been through this (or
something like it) and could share your experiences, please, please, please
do.

TIA,
Fabricio.


Re: [Gluster-users] glusterfs debian package

2011-01-06 Thread Fabricio Cannini
On Thursday, 06 January 2011, at 17:24:02, Piotr Kandziora wrote:
> Hello,
> 
> Quick question for the gluster developers: could you add automatic creation
> of rc scripts to the Debian package's postinst action? Currently this is not
> supported and the user has to run the update-rc.d command manually. This
> would be helpful in large cluster installations...

Hi All.

I second Piotr's suggestion (a postinst sketch follows at the end of this
message).
May I also suggest 2 things to the gluster devs:

- To follow Debian's way of separate packages (server, client, libraries, and
common data packages). It makes automated installation much easier and cleaner.

- To create a proper Debian repository of the "latest and greatest" gluster
release at gluster.org. Again, it would make our lives as sysadmins much
easier to just set up a repo in '/etc/apt/sources.list' and let
$management_system take care of the rest.
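
To make Piotr's rc-script point concrete, the postinst would only need the
usual Debian snippet; a sketch assuming the service ends up being called
glusterfs-server (the name may differ depending on how the package is split):

  #!/bin/sh
  # debian/glusterfs-server.postinst (sketch)
  set -e
  if [ "$1" = "configure" ]; then
      update-rc.d glusterfs-server defaults >/dev/null
      invoke-rc.d glusterfs-server start || true
  fi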