Re: [Gluster-users] Slow performance - 4 hosts, 10 gigabit ethernet, Gluster 3.2.3

2011-09-14 Thread Pavan T C

On Friday 09 September 2011 10:30 AM, Thomas Jackson wrote:

Hi everyone,


Hello Thomas,

Try the following:

1. In the fuse volume file, try:

Under write-behind:
option cache-size 16MB

Under read-ahead:
option page-count 16

Under io-cache:
option cache-size 64MB

2. Did you get 9Gbits/Sec with iperf with a single thread or multiple 
threads?


3. Can you give me the output of:
sysctl -a | egrep 'rmem|wmem'

4. If it is not a problem for you, can you please create a pure 
distribute setup (instead of distributed-replicate) and then report the 
numbers?


5. What is the inode size with which you formatted your XFS filesystem?
This last point might not be related to your throughput problem, but if 
you are planning to use this setup for a large number of files, you 
might be better off using an inode size of 512 bytes instead of the default 
256 bytes. To do that, your mkfs command should be:


mkfs -t xfs -i size=512 /dev/<disk device>
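
If you want to double-check the inode size on an already-formatted brick, 
xfs_info reports it. A minimal example, using the brick path from your mail:

# look for "isize=" in the first line of the output
xfs_info /mnt/local-store | grep isize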

Pavan



I am seeing slower-than-expected performance in Gluster 3.2.3 between 4
hosts with 10 gigabit eth between them all. Each host has 4x 300GB SAS 15K
drives in RAID10, 6-core Xeon E5645 @ 2.40GHz and 24GB RAM running Ubuntu
10.04 64-bit (I have also tested with Scientific Linux 6.1 and Debian
Squeeze - same results on those as well). All of the hosts mount the volume
using the FUSE module. The base filesystem on all of the nodes is XFS,
however tests with ext4 have yielded similar results.

Command used to create the volume:
gluster volume create cluster-volume replica 2 transport tcp
node01:/mnt/local-store/ node02:/mnt/local-store/ node03:/mnt/local-store/
node04:/mnt/local-store/

Command used to mount the Gluster volume on each node:
mount -t glusterfs localhost:/cluster-volume /mnt/cluster-volume

Creating a 40GB file onto a node's local storage (i.e. no Gluster
involvement):
dd if=/dev/zero of=/mnt/local-store/test.file bs=1M count=40000
41943040000 bytes (42 GB) copied, 92.9264 s, 451 MB/s

Getting the same file off the node's local storage:
dd if=/mnt/local-store/test.file of=/dev/null
41943040000 bytes (42 GB) copied, 81.858 s, 512 MB/s

40GB file onto the Gluster storage:
dd if=/dev/zero of=/mnt/cluster-volume/test.file bs=1M count=40000
41943040000 bytes (42 GB) copied, 226.934 s, 185 MB/s

Getting the same file off the Gluster storage
dd if=/mnt/cluster-volume/test.file of=/dev/null
41943040000 bytes (42 GB) copied, 661.561 s, 63.4 MB/s

I have also tried using Gluster 3.1, with similar results.

According to the Gluster docs, I should be seeing roughly the lesser of the
drive speed and the network speed. The network is able to push 0.9GB/sec
according to iperf so that definitely isn't a limiting factor here, and each
array is able to do 400-500MB/sec as per above benchmarks. I've tried
with/without jumbo frames as well, which doesn't make any major difference.

The glusterfs process is using 120% CPU according to top, and glusterfsd is
sitting at about 90%.

Any ideas / tips of where to start for speeding this config up?

Thanks,

Thomas

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users




Re: [Gluster-users] write-behind / write-back caching (asked again - nobody can help?)

2011-08-31 Thread Pavan T C

On Tuesday 30 August 2011 08:36 PM, Christian wrote:

Hello to all,

I'm currently testing glusterfs (versions 3.1.4, 3.1.6, 3.2.2, 3.2.3 and
3.3beta) for the following situation / behavior:
I want to create replicated storage via internet / WAN with two
storage nodes.
The first node is located in office A and the other one is in office B.
If I try to write a file to the mounted glusterfs (mounted via glusterfs
or nfs), the write performance is as poor as the upload speed (~1 Mbit,
adjusted manually using tc).
I tested several cache options (see below) with the following effect:
the copy process of a file is done very fast (~40 MByte/sec), but the
application (rsync, mc copy, cp) waits at 100% for the final sync
of the storage. The process is not finished before glusterfs has written
the file to the 2nd node.


With a replicate config, this is what you can expect. The increased 
write-behind cache is holding your file, giving you the boosted 
throughput, but on close, it will have to sync the data to both nodes.



The behavior I am looking for is to store files locally first and then
sync the content to the second node in the background.
Is there a way for this?


I think you are better off using geo-replication rather than the 
traditional replicate configuration for the above requirement of yours.


The following link should help you configure geo-rep -

http://www.gluster.com/community/documentation/index.php/Gluster_3.2_Filesystem_Administration_Guide

Look for the geo-replication section there. It also gives you a 
comparison of replicated volumes vs geo-replication.
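
As a rough sketch (the exact syntax is in the guide above; the slave host 
and path here are only examples), starting geo-replication from your volume 
gl5 to a directory on the office B machine would look something like:

gluster volume geo-replication gl5 192.168.42.7:/data/gl5-slave start
gluster volume geo-replication gl5 192.168.42.7:/data/gl5-slave status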


HTH,
Pavan



**
volume info:
Volume Name: gl5
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 192.168.42.130:/gl5
Brick2: 192.168.42.7:/gl5
Options Reconfigured:
nfs.disable: off
nfs.trusted-sync: on
nfs.trusted-write: on
performance.flush-behind: off
performance.write-behind-window-size: 200MB
performance.cache-max-file-size: 200MB
**
tested mount options:
mount.nfs 127.0.0.1:gl5 /mnt/gluster/ -v -o mountproto=tcp -o async
mount -t glusterfs 127.0.0.1:gl5 /mnt/gluster -o async


Thanks a lot,

Christian
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users




Re: [Gluster-users] One Question about DHT

2011-08-17 Thread Pavan T C

On Wednesday 17 August 2011 09:18 AM, Daniel wrote:

Hello Pavan,

I came cross one question about DHT lookup.

When dht_lookup process a fresh lookup, if the looked up target can not
be found by hashed, why does it assert it as a directory and lookup on
all the child nodes?


Not sure why you thought I should be the one to address this. There 
are more knowledgeable engineers on this user group :)


I looked up the code a bit to answer your question and here is what I 
understand:


If it is a fresh lookup and the file hash computed for this entity did 
not fall into any of the pre-computed hashed ranges, a lookup_everywhere 
is triggered to go to the backend and see if it exists there. If it is 
not there either, this brings in a mechanism called directory self-heal.


The debug message does say 'see if it is a directory', but if you look 
at dht_lookup_dir_cbk, a check is also made to see if it was *not* a 
directory. It might only be the debug messages that led you into 
thinking a directory is being looked up in particular. That is not 
really the case.


Pavan



Thanks

Dan



___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] [Gluster-devel] Gluster on an ARM system

2011-08-12 Thread Pavan T C

On Friday 12 August 2011 08:48 AM, Emmanuel Dreyfus wrote:

John Mark Walker <jwal...@gluster.com> wrote:


I've CC'd the gluster-devel list in the hopes that someone there can help
you out. However, my understanding is that it will take some significant
porting to get GlusterFS to run in any production capacity on ARM.


What ARM specific problems have been identified?



The biggest issue, IMO, will be that of endianness.
GlusterFS has been run only on Intel/AMD architecture, AFAIK. I have not 
heard of any SPARC installations. That means that the code has been 
tested only on little-endian architecture. The worst problems come in 
when there is interaction between entities of different endianness.


However, there is another side to this. From what I know, ARM is 
actually a bi-endian processor. If the ARM cores have the system control 
co-processor, the endianness of the ARM processor can be controlled by 
software. So, if we make ARM work as a little-endian processor, it 
should work well even in a mixed environment. But then, ARM is a 32-bit 
processor. I am unsure/ignorant of the stability of 32-bit GlusterFS.


If we can solve the two major issues mentioned above viz. Endianness and 
stability of GlusterFS on 32-bit, we should theoretically be able to get 
GlusterFS working on ARM without any other major work.


Again, I cannot vouch for the above statement. Just my thoughts from 
what I know.


Pavan
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] gluster client performance

2011-08-09 Thread Pavan T C

On Wednesday 10 August 2011 12:11 AM, Jesse Stroik wrote:

Pavan,

Thank you for your help. We wanted to get back to you with our results
and observations. I'm cc'ing gluster-users for posterity.

We did experiment with enable-trickling-writes. That was one of the
translator tunables we wanted to know the precise syntax for so that we
could be certain we were disabling it. As hoped, disabling trickling
writes improved performance somewhat.

We are definitely interested in any other undocumented write-buffer
related tunables. We've tested the documented tuning parameters.

Performance improved significantly when we switched clients to the mainline
kernel (2.6.35-13). We also updated to OFED 1.5.3 but it wasn't
responsible for the performance improvement.

Our findings with 32KB block size (cp) write performance:

250-300MB/sec single stream performance
400MB/sec multiple-stream per client performance


OK, let's see if we can improve this further. Please use the following 
tunables as suggested below:


For write-behind -
option cache-size 16MB

For read-ahead -
option page-count 16

For io-cache -
option cache-size 64MB

You will need to place these lines in the client volume file, restart 
the server and remount the volume on the clients.
Your client (fuse) volume file sections will look like the ones below (of 
course, with a change in the volume name) -


volume testvol-write-behind
type performance/write-behind
option cache-size 16MB
subvolumes testvol-client-0
end-volume

volume testvol-read-ahead
type performance/read-ahead
option page-count 16
subvolumes testvol-write-behind
end-volume

volume testvol-io-cache
type performance/io-cache
option cache-size 64MB
subvolumes testvol-read-ahead
end-volume

Run your copy command with these tunables. For now, let's have the 
default setting for trickling writes, which is 'enabled'. You can simply 
remove this tunable from the volume file to get the default behaviour.
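
On a typical 3.2 install, the client (fuse) volume file lives under 
/etc/glusterd/vols/<volname>/ on the servers, e.g. 
/etc/glusterd/vols/testvol/testvol-fuse.vol - the path may differ on your 
distribution. After editing it, a remount on each client would look roughly 
like this (mount point and server name are only examples):

umount /mnt/testvol
mount -t glusterfs server1:/testvol /mnt/testvol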


Pavan


This is much higher than we observed with kernel 2.6.18 series. Using
the 2.6.18 line, we also observed virtually no difference between
running single stream tests and multi stream tests suggesting a
bottleneck with the fabric.

Both 2.6.18 and 2.6.35-13 performed very well (about 600MB/sec) when
writing 128KB blocks.

When I disabled write-behind on the 2.6.18 series of kernels as a test,
performance plummeted to a few MB/sec when writing blocks sizes smaller
than 128KB. We did not test this extensively.

Disabling enable-trickling-writes gave us approximately a 20% boost,
reflected in the numbers above, for single-stream writes. We observed no
significant difference with several streams per client due to disabling
that tunable.

For reference, we are running another cluster file system on the same
underlying hardware/software. With both the old kernel (2.6.18.x) and
the new kernel (2.6.35-13) we get approximately:

450-550MB/sec single stream performance
1200MB+/sec multiple stream per client performance

We set the test directory to write entire files to a single LUN which is
how we configured gluster in an effort to mitigate differences.

It is treacherous to speculate why we might be more limited with gluster
over RDMA than the other cluster file system without spending a
significant amount of analysis. That said, I wonder if there may be an
issue with the way in which fuse handles write buffers causing a
bottleneck for RDMA.

The bottom line is that our observed performance was poor using the
2.6.18 RHEL 5 kernel line relative to the mainline (2.6.35) kernels.
Updating to the newer kernels was well worth the testing and downtime.
Hopefully this information can help others.

Best,
Jesse Stroik


___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] 3.2.2 Performance Issue

2011-08-09 Thread Pavan T C

On Wednesday 10 August 2011 02:56 AM, Joey McDonald wrote:

Hello all,

I've configured 4 bricks over a GigE network, however I'm getting very
slow performance for writing to my gluster share.

Just set this up this week, and here's what I'm seeing:


A few questions -

1. Are these baremetal systems or are they Virtual machines ?

2. What is the amount of RAM of each of these systems ?

3. How many CPUs do they have ?

4. Can you also perform the dd on /gluster as opposed to /root to check 
the backend performance ?


5. What is your disk backend ? Is it direct attached or is it an array ?

6. What is the backend filesystem ?

7. Can you run a simple scp of about 10M between any two of these 
systems and report the speed ?
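
For 4 and 7, something along these lines would do (file names and sizes 
are only examples):

# write directly to the brick directory to measure the backend
dd if=/dev/zero of=/gluster/dd_test.img bs=1M count=2000 oflag=direct

# push ~10M between two of the nodes to sanity-check the network path
dd if=/dev/zero of=/tmp/10M.bin bs=1M count=10
scp /tmp/10M.bin vm-container-0-1:/tmp/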


Pavan



[root@vm-container-0-0 ~]# gluster --version | head -1
glusterfs 3.2.2 built on Jul 14 2011 13:34:25

[root@vm-container-0-0 pifs]# gluster volume info

Volume Name: pifs
Type: Distributed-Replicate
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: vm-container-0-0:/gluster
Brick2: vm-container-0-1:/gluster
Brick3: vm-container-0-2:/gluster
Brick4: vm-container-0-3:/gluster

The 4 systems, are each storage bricks and storage clients, mounting
gluster like so:

[root@vm-container-0-1 ~]# df -h /pifs/
Filesystem Size Used Avail Use% Mounted on
glusterfs#127.0.0.1:pifs
1.8T 848M 1.7T 1% /pifs

iperf shows network throughput looking good:

[root@vm-container-0-0 pifs]# iperf -c vm-container-0-1

Client connecting to vm-container-0-1, TCP port 5001
TCP window size: 16.0 KByte (default)

[ 3] local 10.19.127.254 port 53441 connected with 10.19.127.253 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.10 GBytes 941 Mbits/sec


Then, writing to the local disk is pretty fast:

[root@vm-container-0-0 pifs]# dd if=/dev/zero of=/root/dd_test.img bs=1M
count=2000
2000+0 records in
2000+0 records out
2097152000 bytes (2.1 GB) copied, 4.8066 seconds, 436 MB/s

However, writes to the gluster share, are abysmally slow:

[root@vm-container-0-0 pifs]# dd if=/dev/zero of=/pifs/dd_test.img bs=1M
count=2000
2000+0 records in
2000+0 records out
2097152000 bytes (2.1 GB) copied, 241.866 seconds, 8.7 MB/s

Other than the fact that it's quite slow, it seems to be very stable.

iozone testing shows about the same results.

Any help troubleshooting would be much appreciated. Thanks!

--joey






___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users




Re: [Gluster-users] scrub as in zfs

2011-08-08 Thread Pavan T C

On Monday 08 August 2011 01:30 PM, Uwe Kastens wrote:

Hi again,

If one thinks about a large amount of data, maybe as a replacement for tapes: 
will Gluster's auto-heal help with data corruption problems? I would expect 
that, but only if the files are accessed on a regular basis.

As far as I have seen, there is no regular scrub mechanism like in ZFS?


Right. Not for now. With proactive/background self-heal, you will get 
something similar to that. Stay tuned.
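
Until then, the usual way to force a self-heal pass over existing files is to 
walk the mount and stat everything, as described in the 3.2 admin guide 
(the mount path here is only an example):

find /mnt/glustervolume -noleaf -print0 | xargs --null stat >/dev/null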


Pavan



Kind Regards

Uwe Kastens
kiste...@googlemail.com



___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users




Re: [Gluster-users] gluster client performance

2011-07-27 Thread Pavan T C

[..]



I don't know why my writes are so slow compared to reads. Let me know
if you're able to get better write speeds with the newer version of
gluster and any of the configurations (if they apply) that I've
posted. It might compel me to upgrade.



From your documentation of nfsspeedtest, I see that the reads can 
happen either via dd or via perl's sysread. I'm not sure if one is 
better than the other.


Secondly - Are you doing direct IO on the backend XFS ? If not, try it 
with direct IO so that you are not misled by the memory situation in the 
system at the time of your test. It will give a clearer picture of what 
your backend is capable of.


Your test is such that you write a file and immediately read the same 
file back. It is possible that a good chunk of it is cached on the 
backend. After the write, flush the filesystem caches using:

echo 3 > /proc/sys/vm/drop_caches

Sleep for a while, then do the read.
Or, as suggested earlier, resort to direct IO while testing the backend FS.
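
A sketch of such a backend test with direct IO (the export path and size are 
only examples):

dd if=/dev/zero of=/export/brick1/ddtest bs=128K count=80000 oflag=direct
echo 3 > /proc/sys/vm/drop_caches
sleep 10
dd if=/export/brick1/ddtest of=/dev/null bs=128K iflag=direct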

Pavan
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] gluster's cpuload is too high on specific birck daemon

2011-07-27 Thread Pavan T C
On Wednesday 27 July 2011 01:45 PM, Yongjoon Kong (공용준), Cloud 
Computing Technology Lead / SKCC wrote:

Hello,

I'm running Gluster in distributed-replicated mode (4 brick servers).

And 10 client servers mount the gluster volume from the brick1 server (mount -t glusterfs 
brick1:/volume /mnt).

And there's a very strange thing.

Brick1's CPU load is too high - from the 'top' command, it's over 400%.
But the other bricks' load is very low.


It is possible that an AFR self heal is getting triggered.
On the brick, run the following command:

strace -f -c -p <glusterfs pid>

and provide the output.
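
To locate the pid, something like this should do (on brick1, the client 
process is the glusterfs one serving the local mount):

ps ax | grep glusterfs

Let strace run for about 30 seconds and interrupt it with Ctrl-C; the -c 
option then prints the per-syscall summary.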

Pavan



Is there any reason for this? Or is there any way of tracking down this issue?

Thanks.


___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users




Re: [Gluster-users] gluster client performance

2011-07-27 Thread Pavan T C

But that still does not explain why you should get as low as 50 MB/s for
a single stream single client write when the backend can support direct
IO throughput of more than 700 MB/s.

On the server, can you collect:

# iostat -xcdh 2 > iostat.log.brickXX

for the duration of the dd command ?

and

# strace -f -o stracelog.server -tt -T -e trace=write,writev -p
<glusterfsd pid>
(again for the duration of the dd command)


Hi John,

A small change in the request. I hope you have not already spent time on 
this. The strace command should be:


strace -f -o stracelog.server -tt -T -e trace=pwrite -p
<glusterfsd pid>

Thanks,
Pavan



With the above, I want to measure the delay between the writes coming in
from the client. iostat will describe the IO scenario on the server.
Once the exercise is done, please attach the iostat.log.brickXX and
stracelog.server.


___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] gluster client performance

2011-07-26 Thread Pavan T C

On Tuesday 26 July 2011 03:42 AM, John Lalande wrote:

Hi-

I'm new to Gluster, but am trying to get it set up on a new compute
cluster we're building. We picked Gluster for one of our cluster file
systems (we're also using Lustre for fast scratch space), but the
Gluster performance has been so bad that I think maybe we have a
configuration problem -- perhaps we're missing a tuning parameter that
would help, but I can't find anything in the Gluster documentation --
all the tuning info I've found seems geared toward Gluster 2.x.

For some background, our compute cluster has 64 compute nodes. The
gluster storage pool has 10 Dell PowerEdge R515 servers, each with 12 x
2 TB disks. We have another 16 Dell PowerEdge R515s used as Lustre
storage servers. The compute and storage nodes are all connected via QDR
Infiniband. Both Gluster and Lustre are set to use RDMA over Infiniband.
We are using OFED version 1.5.2-20101219, Gluster 3.2.2 and CentOS 5.5
on both the compute and storage nodes.


Hi John,

I would need some more information about your setup to estimate the 
performance you should get with your gluster setup.


1. Can you provide the details of how disks are connected to the storage 
boxes ? Is it via FC ? What raid configuration is it using (if at all any) ?


2. What is the disk bandwidth you are getting on the local filesystem on 
a given storage node ? I mean, pick any of the 10 storage servers 
dedicated for Gluster Storage and perform a dd as below:


Write bandwidth measurement:
dd if=/dev/zero of=/export_directory/10g_file bs=128K count=80000 oflag=direct

Read bandwidth measurement:
dd if=/export_directory/10g_file of=/dev/null bs=128K count=80000 iflag=direct


[The above command is doing a direct IO of 10GB via your backend FS - 
ext4/xfs.]


3. What is the IB bandwidth that you are getting between the compute 
node and the glusterfs storage node? You can run the tool rdma_bw to 
get the details:


On the server, run:
# rdma_bw -b
[ -b measures bi-directional bandwidth]

On the compute node, run,
# rdma_bw -b <server>

[If you have not already installed it, rdma_bw is available via -
http://mirror.centos.org/centos/5/os/x86_64/CentOS/perftest-1.2.3-1.el5.x86_64.rpm]

Lets start with this, and I will ask for more if necessary.

Pavan



Oddly, it seems like there's some sort of bottleneck on the client side
-- for example, we're only seeing about 50 MB/s write throughput from a
single compute node when writing a 10GB file. But, if we run multiple
simultaneous writes from multiple compute nodes to the same Gluster
volume, we get 50 MB/s from each compute node. However, running multiple
writes from the same compute node does not increase throughput. The
compute nodes have 48 cores and 128 GB RAM, so I don't think the issue
is with the compute node hardware.

With Lustre, on the same hardware, with the same version of OFED, we're
seeing write throughput on that same 10 GB file as follows: 476 MB/s
single stream write from a single compute node and aggregate performance
of more like 2.4 GB/s if we run simultaneous writes. That leads me to
believe that we don't have a problem with RDMA, otherwise Lustre, which
is also using RDMA, should be similarly affected.

We have tried both xfs and ext4 for the backend file system on the
Gluster storage nodes (we're currently using ext4). We went with
distributed (not distributed striped) for the Gluster volume -- the
thought was that if there was a catastrophic failure of one of the
storage nodes, we'd only lose the data on that node; presumably with
distributed striped you'd lose any data striped across that volume,
unless I have misinterpreted the documentation.

So ... what's expected/normal throughput for Gluster over QDR IB to a
relatively large storage pool (10 servers / 120 disks)? Does anyone have
suggested tuning tips for improving performance?

Thanks!

John



___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users




Re: [Gluster-users] gluster client performance

2011-07-26 Thread Pavan T C

On Tuesday 26 July 2011 09:24 PM, John Lalande wrote:

Thanks for your help, Pavan!


Hi John,

I would need some more information about your setup to estimate the
performance you should get with your gluster setup.

1. Can you provide the details of how disks are connected to the
storage boxes ? Is it via FC ? What raid configuration is it using (if
at all any) ?

The disks are 2TB near-line SAS direct attached via a PERC H700
controller (the Dell PowerEdge R515 has 12 3.5" drive bays). They are in
a RAID6 config, exported as a single volume, that's split into 3
equal-size partitions (due to ext4's (well, e2fsprogs') 16 TB limit).


2. What is the disk bandwidth you are getting on the local filesystem
on a given storage node ? I mean, pick any of the 10 storage servers
dedicated for Gluster Storage and perform a dd as below:

Seeing an average of 740 MB/s write, 971 MB/s read.


I presume you did this in one of the /data-brick*/export directories ?
Command output with the command line would have been clearer, but that's 
fine.






3. What is the IB bandwidth that you are getting between the compute
node and the glusterfs storage node? You can run the tool rdma_bw to
get the details:

30407: Bandwidth peak (#0 to #976): 2594.58 MB/sec
30407: Bandwidth average: 2593.62 MB/sec
30407: Service Demand peak (#0 to #976): 978 cycles/KB
30407: Service Demand Avg : 978 cycles/KB


This looks like a DDR connection. ibv_devinfo -v will tell a better 
story about the line width and speed of your infiniband connection.

QDR should have a much higher bandwidth.
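
For instance, something like the following should show the negotiated link 
(QDR would be 4X lanes at 10.0 Gbps per lane, DDR at 5.0 Gbps per lane):

ibv_devinfo -v | egrep 'active_width|active_speed'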

But that still does not explain why you should get as low as 50 MB/s for 
a single stream single client write when the backend can support direct 
IO throughput of more than 700 MB/s.


On the server, can you collect:

# iostat -xcdh 2 > iostat.log.brickXX

for the duration of the dd command ?

and

# strace -f -o stracelog.server -tt -T -e trace=write,writev -p 
<glusterfsd pid>

(again for the duration of the dd command)

With the above, I want to measure the delay between the writes coming in 
from the client. iostat will describe the IO scenario on the server.
Once the exercise is done, please attach the iostat.log.brickXX and 
stracelog.server.


Pavan




Here's our gluster config:

# gluster volume info data

Volume Name: data
Type: Distribute
Status: Started
Number of Bricks: 30
Transport-type: rdma
Bricks:
Brick1: data-3-1-infiniband.infiniband:/data-brick1/export
Brick2: data-3-3-infiniband.infiniband:/data-brick1/export
Brick3: data-3-5-infiniband.infiniband:/data-brick1/export
Brick4: data-3-7-infiniband.infiniband:/data-brick1/export
Brick5: data-3-9-infiniband.infiniband:/data-brick1/export
Brick6: data-3-11-infiniband.infiniband:/data-brick1/export
Brick7: data-3-13-infiniband.infiniband:/data-brick1/export
Brick8: data-3-15-infiniband.infiniband:/data-brick1/export
Brick9: data-3-17-infiniband.infiniband:/data-brick1/export
Brick10: data-3-19-infiniband.infiniband:/data-brick1/export
Brick11: data-3-1-infiniband.infiniband:/data-brick2/export
Brick12: data-3-3-infiniband.infiniband:/data-brick2/export
Brick13: data-3-5-infiniband.infiniband:/data-brick2/export
Brick14: data-3-7-infiniband.infiniband:/data-brick2/export
Brick15: data-3-9-infiniband.infiniband:/data-brick2/export
Brick16: data-3-11-infiniband.infiniband:/data-brick2/export
Brick17: data-3-13-infiniband.infiniband:/data-brick2/export
Brick18: data-3-15-infiniband.infiniband:/data-brick2/export
Brick19: data-3-17-infiniband.infiniband:/data-brick2/export
Brick20: data-3-19-infiniband.infiniband:/data-brick2/export
Brick21: data-3-1-infiniband.infiniband:/data-brick3/export
Brick22: data-3-3-infiniband.infiniband:/data-brick3/export
Brick23: data-3-5-infiniband.infiniband:/data-brick3/export
Brick24: data-3-7-infiniband.infiniband:/data-brick3/export
Brick25: data-3-9-infiniband.infiniband:/data-brick3/export
Brick26: data-3-11-infiniband.infiniband:/data-brick3/export
Brick27: data-3-13-infiniband.infiniband:/data-brick3/export
Brick28: data-3-15-infiniband.infiniband:/data-brick3/export
Brick29: data-3-17-infiniband.infiniband:/data-brick3/export
Brick30: data-3-19-infiniband.infiniband:/data-brick3/export
Options Reconfigured:
nfs.disable: on



___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance on GlusterFS

2011-06-25 Thread Pavan T C

On Saturday 25 June 2011 03:56 PM, anish.b.ku...@ril.com wrote:

Yes, sure, it's 74 MB.
I am using Gluster version 3.2.1.1.

In my 4-node cluster setup, the node on which I am performing the test run is a 
physical server - an HP ProLiant DL380 G5 running RHEL 5.5 - which has a 
1000 Mbps network.
The other three nodes are hosted as virtual machines (VMware) on Windows 2008 R2; 
the host machine's network is 1 Gbps.


What are your virtual machines? Linux, I suppose?

A few aspects of your setup make the comparison unfair -

1. Since you run untar on the local file system on a physical server, there 
is a possibility of seeing the effect of write caching.


2. Since glusterfs is working on VMs, the comparison of its performance 
with that on a physical server is not fair.


3. The VMs are hosted on a system with low network bandwidth.

4. The IO throughput inside a VM is limited by the throughput of the 
host file system, in this case - a Windows filesystem (NTFS) ?


What is the amount of RAM on the Windows system hosting the VMs?
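
To take write caching out of the picture on the physical server, you could 
time the untar together with a sync on both setups - a rough sketch, with a 
hypothetical tarball name and example paths:

time sh -c 'tar xf sample.tar.gz -C /mnt/glusterfs && sync'
time sh -c 'tar xf sample.tar.gz -C /local/dir && sync'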

Pavan

PS: Adding gluster-users. The discussion might help others.



Regards,
Anish Kumar


___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance on GlusterFS

2011-06-24 Thread Pavan T C

On Friday 24 June 2011 10:29 AM, anish.b.ku...@ril.com wrote:

Hi….

I have setup a 4 node cluster on virtual servers on RHEL platform.

It would help if you can post the output of gluster volume info, to 
start with. Are you using some benchmark to compare GlusterFS 
performance with local filesystem performance?
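
If you are not using one, a simple like-for-like dd run on both the gluster 
mount and the local filesystem is a reasonable starting point (paths and size 
below are only examples):

dd if=/dev/zero of=/mnt/glustervol/ddtest bs=1M count=1024 conv=fdatasync
dd if=/dev/zero of=/localdisk/ddtest bs=1M count=1024 conv=fdatasync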


Pavan


I am not able to get better performance statistics on GlusterFS as compared
to the local file system.

Kindly suggest a test run that can be used to differentiate between them.

Regards,

Anish Kumar



___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users




Re: [Gluster-users] [Gluster3.2@Grid5000] 128 nodes failure and rr scheduler question

2011-06-12 Thread Pavan T C

On Sunday 12 June 2011 07:00 PM, François Thiebolt wrote:

Hello,

To make things clear, what I've done is:
- deploying GlusterFS on 2, 4, 8, 16, 32, 64, 128 nodes
- running a variant of the MAB benchmark (it's all about compilation of 
openssl-1.0.0) on 2, 4, 8, 16, 32, 64, 128 nodes
- I used 'pdsh -f 512' to start MAB on all nodes at the same time
- in each experiment, on each node, I ran MAB in a dedicated directory within the glusterfs 
global namespace (e.g. nodeA used <gluster global namespace>/nodeA/mab files) 
to avoid a metadata storm on the parent directory inode
- between each experiment, I destroy and redeploy a complete new GlusterFS 
setup (and I also destroy everything within each brick, i.e. the exported storage 
dir)

I then compare the average compilation time vs the number of nodes ... and it 
increases due to the round-robin scheduler that dispatches files on all the 
bricks:
2 nodes : Phase_V avg (s) 249.9332121175
4 nodes : Phase_V avg (s) 262.808117374
8 nodes : Phase_V avg (s) 293.572061537875
16 nodes : Phase_V avg (s) 351.436554833375
32 nodes : Phase_V avg (s) 546.503069517844
64 nodes : Phase_V avg (s) 1010.61019479478
(Phase V is related to the compilation itself; previous phases are about 
metadata ops.)
You can also try to compile a Linux kernel on your own; this is pretty much the 
same thing.


Thanks much for your detailed description.
Is phase_V the only phase where you are seeing reduced performance?

With regards to your problem, since you are using the bricks also as 
clients, you have a NUMA kind of scenario. In the case of two bricks 
(and hence two clients), during compilation, ~50% of the files will be 
available locally to the client, for which the latencies will be 
minimal, and the other 50% will suffer additional latencies. As you 
increase the number of nodes, this asymmetry is seen for a larger number 
of files.
So, the problem is not really the introduction of more servers, but the 
degree of asymmetry your application is seeing. Your numbers for 2 nodes 
might not be a good indicator of the average performance. Try the same 
experiment by separating the clients and the servers. If you still see 
reverse-linear performance with increased bricks/clients, we can 
investigate further.


Pavan



Now regarding the GlusterFS setup: yes, you're right, there is no replication, 
so this is a simple striping (on a file basis) setup.
Each time, I create a glusterfs volume featuring one brick, then I add bricks 
(one by one) till I reach the number of nodes ... and after that, I start the 
volume.
Now regarding the 128-brick case, it is when I start the volume that I get a random 
error telling me that <brickX> does not respond, and this changes every time I 
retry to start the volume.
So far, I haven't tested with a number of nodes between 64 and 128.

François

On Friday, June 10, 2011 16:38 CEST, Pavan T C <t...@gluster.com> wrote:


On Wednesday 08 June 2011 06:10 PM, Francois THIEBOLT wrote:

Hello,

I'm driving some experiments on Grid'5000 with GlusterFS 3.2 and, as a
first point, I've been unable to start a volume featuring 128 bricks (64 is OK).

Then, due to the round-robin scheduler, as the number of nodes increases
(every node is also a brick), the performance of an application on an
individual node decreases!


I would like to understand what you mean by increase of nodes. You
have 64 bricks and each brick also acts as a client. So, where is the
increase in the number of nodes? Are you referring to the mounts that
you are doing?

What is your gluster configuration - I mean, is it a distribute only, or
is it a distributed-replicate setup? [From your command sequence, it
should be a pure distribute, but I just want to be sure].

What is your application like? Is it mostly I/O intensive? It will help
if you provide a brief description of typical operations done by your
application.

How are you measuring the performance? What parameter determines that
you are experiencing a decrease in performance with increase in the
number of nodes?

Pavan


So my question is : how to STOP the round-robin distribution of files
over the bricks within a volume ?

*** Setup ***
- i'm using glusterfs3.2 from source
- every node is both a client node and a brick (storage)
Commands :
- gluster peer probe <each of the 128 nodes>
- gluster volume create myVolume transport tcp <the 128 bricks>:/storage
- gluster volume start myVolume (fails with 128 bricks!)
- mount -t glusterfs .. on all nodes

Feel free to tell me how to improve things

François










___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] [Gluster3.2@Grid5000] 128 nodes failure and rr scheduler question

2011-06-10 Thread Pavan T C

On Wednesday 08 June 2011 06:10 PM, Francois THIEBOLT wrote:

Hello,

I'm driving some experiments on Grid'5000 with GlusterFS 3.2 and, as a
first point, I've been unable to start a volume featuring 128 bricks (64 is OK).

Then, due to the round-robin scheduler, as the number of nodes increases
(every node is also a brick), the performance of an application on an
individual node decreases!


I would like to understand what you mean by increase of nodes. You 
have 64 bricks and each brick also acts as a client. So, where is the 
increase in the number of nodes? Are you referring to the mounts that 
you are doing?


What is your gluster configuration - I mean, is it a distribute only, or 
is it a distributed-replicate setup? [From your command sequence, it 
should be a pure distribute, but I just want to be sure].


What is your application like? Is it mostly I/O intensive? It will help 
if you provide a brief description of typical operations done by your 
application.


How are you measuring the performance? What parameter determines that 
you are experiencing a decrease in performance with increase in the 
number of nodes?


Pavan


So my question is : how to STOP the round-robin distribution of files
over the bricks within a volume ?

*** Setup ***
- i'm using glusterfs3.2 from source
- every node is both a client node and a brick (storage)
Commands :
- gluster peer probe <each of the 128 nodes>
- gluster volume create myVolume transport tcp <the 128 bricks>:/storage
- gluster volume start myVolume (fails with 128 bricks!)
- mount -t glusterfs .. on all nodes

Feel free to tell me how to improve things

François



___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users