Re: [Gluster-users] [ovirt-users] Re: Single instance scaleup.

2020-06-05 Thread Krist van Besien
Hi all.

I actually did something like that myself.

I started out with a single node HC cluster. I then added another node (and 
plan to add a third). This is what I did:

1) Set up the new node. Make sure that you have all dependencies. (In my case I 
started with a CentOS 8 machine, and installed vdsm-gluster and gluster-ansible.)
2) Configure the bricks. For this I just copied hc_wizard_inventory.yml over 
from the first node, edited it to fit the second node, and ran the 
gluster.infra role.
3) Expand the volume. In this case with the following command:
gluster volume add-brick engine replica 2 :/gluster_bricks/engine/engine
4) Now just add the host as a hypervisor using the management console.

I plan on adding a third node. Then I want to have full replica on the engine, 
and replica 2 + arbiter on the vmstore volume.
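
For reference, the add-brick commands for these steps look roughly like this.
This is only a sketch; the host names and the current state of the vmstore
volume are assumptions, not taken from my actual setup:

# second node prepared: grow engine from a single brick to replica 2
gluster volume add-brick engine replica 2 node2:/gluster_bricks/engine/engine

# third node added later: full replica 3 for engine
gluster volume add-brick engine replica 3 node3:/gluster_bricks/engine/engine

# replica 2 + arbiter for vmstore (assuming it is already replica 2)
gluster volume add-brick vmstore replica 3 arbiter 1 node3:/gluster_bricks/vmstore/vmstore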

Expanding gluster volumes, migrating from distributed to replicated and even 
replacing bricks etc. is rather easy in Gluster once you know how it works. I 
have even replaced all the servers on a live gluster cluster, without service 
interruption…

Krist

On Jul 18, 2019, 09:58 +0200, Leo David wrote:
> Hi,
> Looks like the only way around would be to create a brand-new volume as 
> replicated on other disks, and start moving the VMs all around the place 
> between volumes?
> Cheers,
>
> Leo
>
> > On Mon, May 27, 2019 at 1:53 PM Leo David  wrote:
> > > Hi,
> > > Any suggestions ?
> > > Thank you very much !
> > >
> > > Leo
> > >
> > > > On Sun, May 26, 2019 at 4:38 PM Strahil Nikolov  
> > > > wrote:
> > > > > Yeah,
> > > > > it seems different from the docs.
> > > > > I'm adding the gluster-users list, as they are more experienced in 
> > > > > that.
> > > > >
> > > > > @Gluster-users,
> > > > >
> > > > > can you provide some hint on how to add additional replicas to the 
> > > > > volumes below, so they become 'replica 2 arbiter 1' or 'replica 3' type 
> > > > > volumes?
> > > > >
> > > > >
> > > > > Best Regards,
> > > > > Strahil Nikolov
> > > > >
> > > > > On Sunday, 26 May 2019, 15:16:18 GMT+3, Leo David 
> > > > > wrote:
> > > > >
> > > > >
> > > > > Thank you Strahil,
> > > > > The engine and ssd-samsung are distributed...
> > > > > So these are the ones that I need to have replicated across new 
> > > > > nodes.
> > > > > I am not very sure about the procedure to accomplish this.
> > > > > Thanks,
> > > > >
> > > > > Leo
> > > > >
> > > > > On Sun, May 26, 2019, 13:04 Strahil  wrote:
> > > > > > Hi Leo,
> > > > > > As you do not have a distributed volume, you can easily switch to 
> > > > > > replica 2 arbiter 1 or replica 3 volumes.
> > > > > > You can use the following for adding the bricks:
> > > > > > https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.1/html/Administration_Guide/Expanding_Volumes.html
> > > > > > Best Regards,
> > > > > > Strahil Nikolov
> > > > > > On May 26, 2019 10:54, Leo David  wrote:
> > > > > > > Hi Strahil,
> > > > > > > Thank you so much for your input!
> > > > > > >
> > > > > > >  gluster volume info
> > > > > > >
> > > > > > >
> > > > > > > Volume Name: engine
> > > > > > > Type: Distribute
> > > > > > > Volume ID: d7449fc2-cc35-4f80-a776-68e4a3dbd7e1
> > > > > > > Status: Started
> > > > > > > Snapshot Count: 0
> > > > > > > Number of Bricks: 1
> > > > > > > Transport-type: tcp
> > > > > > > Bricks:
> > > > > > > Brick1: 192.168.80.191:/gluster_bricks/engine/engine
> > > > > > > Options Reconfigured:
> > > > > > > nfs.disable: on
> > > > > > > transport.address-family: inet
> > > > > > > storage.owner-uid: 36
> > > > > > > storage.owner-gid: 36
> > > > > > > features.shard: on
> > > > > > > performance.low-prio-threads: 32
> > > > > > > performance.strict-o-direct: off
> > > > > > > network.remote-dio: off
> > > > > > > network.ping-timeout: 30
> > > > > > > user.cifs: off
> > > > > > > performance.quick-read: off
> > > > > > > performance.read-ahead: off
> > > > > > > performance.io-cache: off
> > > > > > > cluster.eager-lock: enable
> > > > > > > Volume Name: ssd-samsung
> > > > > > > Type: Distribute
> > > > > > > Volume ID: 76576cc6-220b-4651-952d-99846178a19e
> > > > > > > Status: Started
> > > > > > > Snapshot Count: 0
> > > > > > > Number of Bricks: 1
> > > > > > > Transport-type: tcp
> > > > > > > Bricks:
> > > > > > > Brick1: 192.168.80.191:/gluster_bricks/sdc/data
> > > > > > > Options Reconfigured:
> > > > > > > cluster.eager-lock: enable
> > > > > > > performance.io-cache: off
> > > > > > > performance.read-ahead: off
> > > > > > > performance.quick-read: off
> > > > > > > user.cifs: off
> > > > > > > network.ping-timeout: 30
> > > > > > > network.remote-dio: off
> > > > > > > performance.strict-o-direct: on
> > > > > > > performance.low-prio-threads: 32
> > > > > > > features.shard: on
> > > > > > > storage.owner-gid: 36
> > > > > > > storage.owner-uid: 36
> > > > > > > transport.address-family: inet
> > > > > > > nfs.disable: on
> > > > > > >
> > > > > > > The other two hosts will be 

[Gluster-users] Trying to remove a brick (with heketi) fails...

2017-10-19 Thread Krist van Besien
Hello,

I have a gluster cluster with 4 nodes that is managed using heketi. I want
to test the removal of one node.
We have several volumes on it, some with rep=2, others with rep=3.

I get the following error:

[root@CTYI1458 .ssh]# heketi-cli --user admin --secret "**" node remove
749850f8e5fd23cf6a224b7490499659
Error: Failed to remove device, error: Cannot replace brick
d2026206f3fcd8b497c9ee28f014d845 as only 1 of 2 required peer bricks are
online

What heketi does when you do a node remove is replace all the bricks one by
one. So it just executes a slew of "gluster volume replace-brick"
commands, and these were in our case successful, except for one volume,
which is a rep=2 volume. There we got this error:

Cannot replace brick d2026206f3fcd8b497c9ee28f014d845 as only 1 of 2
required peer bricks are online

What does this actually mean? And how do we fix this?
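
For context, the per-brick operation heketi performs during a node remove is
essentially the following. This is only a sketch, with placeholder volume and
brick names, and checking the brick status first is an assumption on my part,
not something from this thread:

# check whether the surviving brick of each replica set is actually online
gluster volume status <volname>

# the command heketi issues for each brick on the node being removed
gluster volume replace-brick <volname> <old-node>:/path/to/brick \
    <new-node>:/path/to/brick commit force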

Krist




-- 
Vriendelijke Groet |  Best Regards | Freundliche Grüße | Cordialement
----------

Krist van Besien

senior architect, RHCE, RHCSA Open Stack

Red Hat Red Hat Switzerland S.A. <https://www.redhat.com>

kr...@redhat.com   M: +41-79-5936260
<https://red.ht/sig>
TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>

Re: [Gluster-users] NFS versus Fuse file locking problem (NFS works, fuse doesn't...)

2017-08-25 Thread Krist van Besien
On 25 August 2017 at 04:47, Vijay Bellur <vbel...@redhat.com> wrote:

>
>
> On Thu, Aug 24, 2017 at 9:01 AM, Krist van Besien <kr...@redhat.com>
> wrote:
>
> Would it be possible to obtain a statedump of the native client when the
> application becomes completely unresponsive? A statedump can help in
> understanding operations within the gluster stack. Log file of the native
> client might also offer some clues.
>
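
For reference, such a statedump can be triggered roughly as follows. A sketch
only; the default dump directory and the signal mechanism are assumptions on
my part:

# find the fuse client process for the mount and send it SIGUSR1;
# the dump should land under /var/run/gluster/ on the client
pgrep -af glusterfs
kill -USR1 <pid-of-the-fuse-client>

# brick-side statedumps can be requested from any server with
gluster volume statedump <volname>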

I've increased logging to debug on both client and bricks, but didn't see
anything that hinted at problems.
Maybe we have to go for Ganesha after all.

But currently we are stuck because the customer is having trouble actually
generating enough load to test the server with...

When I try to simulate the workload with a script that writes and renames
files at the same rate that the video recorders do, I can run it without any
issue, and can ramp up to the point where I am hitting the network ceiling.
So the gluster cluster is up to the task.
But the recorder software itself is running into issues, which makes me
suspect that it may have to do with the way some aspects of it are coded.
And it is there that I am looking for answers. Any hints, like "if you call
fopen() you should give these flags and not those flags or you get into
trouble"...

Krist

-- 
Vriendelijke Groet |  Best Regards | Freundliche Grüße | Cordialement
--

Krist van Besien

senior architect, RHCE, RHCSA Open Stack

Red Hat Red Hat Switzerland S.A. <https://www.redhat.com>

kr...@redhat.com   M: +41-79-5936260
<https://red.ht/sig>
TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>

Re: [Gluster-users] NFS versus Fuse file locking problem (NFS works, fuse doesn't...)

2017-08-24 Thread Krist van Besien
Hi
This is gluster 3.8.4. Volume options are out of the box. Sharding is off
(and I don't think enabling it would matter)

I haven't done much performance tuning. For one thing, with a simple
script that just creates files I can easily flood the network, so I don't
expect a performance issue.
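
For reference, the options Everton suggests below would be applied roughly
like this (a sketch with illustrative values, not settings we are actually
running):

gluster volume set <volname> features.shard on
gluster volume set <volname> features.shard-block-size 64MB
gluster volume set <volname> client.event-threads 4
gluster volume set <volname> server.event-threads 4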

The problem we see is that after a certain time the fuse clients completely
stop accepting writes. Something is preventing the application from writing
after a while.
We see this on the fuse client, but not when we use NFS. So the question I
am interested in seeing an answer to is: in what way is NFS different from
fuse that could cause this?

My suspicion is it is locking related.

Krist


On 24 August 2017 at 14:36, Everton Brogliatto <broglia...@gmail.com> wrote:

> Hi Krist,
>
> What are your volume options on that setup? Have you tried tuning it for
> the kind of workload and files size you have?
>
> I would definitely do some tests with feature.shard=on/off first. If shard
> is on, try playing with features.shard-block-size.
> Do you have jumbo frames (MTU=9000) enabled across the switch and nodes?
> if you have concurrent clients writing/reading, it could be beneficial to
> increase the number of client and server threads as well, try setting
> higher values for client.event-threads and server.event-threads.
>
> Best regards,
> Everton Brogliatto
>
>
>
> On Thu, Aug 24, 2017 at 7:48 PM, Krist van Besien <kr...@redhat.com>
> wrote:
>
>> Hi all,
>>
>> I usually advise clients to use the native client if at all possible, as
>> it is very robust. But I am running into problems here.
>>
>> In this case the gluster system is used to store video streams. Basically
>> the setup is the following:
>> - A gluster cluster of 3 nodes, with ample storage. They export several
>> volumes.
>> - The network is 10GB, switched.
>> - A "recording server" which subscribes to multi cast video streams, and
>> records them to disk. The recorder writes the streams in 10s blocks, so
>> when it is for example recording 50 streams it is creating 5 files a
>> second, each about 5M. it uses a write-then-rename process.
>>
>> I simulated that with a small script, that wrote 5M files and renamed
>> them as fast as it could, and could easily create around 100 files/s (which
>> abouts saturates the network). So I think the cluster is up to the task.
>>
>> However if we try the actual workload we run into trouble. Running the
>> recorder software we can gradually ramp up the number of streams it records
>> (and thus the number of files it creates), and at around 50 streams the
>> recorder eventually stops writing files. According to the programmers who
>> wrote it, it appears that it can no longer get the needed locks, and as a
>> result just stops writing.
>>
>> We decided to test using the NFS client as well, and there the problem
>> does not exist. But again, I (and the customer) would prefer not to use
>> NFS, but use the native client instead.
>>
>> So if the problem is file locking, and the problem exists with the native
>> client, and not using NFS, what could be the cause?
>>
>> In what way does locking differ between the two different file systems,
>> between NFS and Fuse, and how can the programmers work around any issues
>> the fuse client might be causing?
>>
>> This video stream software is a bespoke solution, developed in house, and
>> it is thus possible to change the way it handles files so it works with the
>> native client, but the programmers are looking at me for guidance.
>>
>> Any suggestions?
>>
>> Krist
>>
>>
>>
>>
>> --
>> Vriendelijke Groet |  Best Regards | Freundliche Grüße | Cordialement
>> --
>>
>> Krist van Besien
>>
>> senior architect, RHCE, RHCSA Open Stack
>>
>> Red Hat Red Hat Switzerland S.A. <https://www.redhat.com>
>>
>> kr...@redhat.com   M: +41-79-5936260
>> <https://red.ht/sig>
>> TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>
>>
>>
>
>


-- 
Vriendelijke Groet |  Best Regards | Freundliche Grüße | Cordialement
--

Krist van Besien

senior architect, RHCE, RHCSA Open Stack

Red Hat Red Hat Switzerland S.A. <https://www.redhat.com>

kr...@redhat.com   M: +41-79-5936260
<https://red.ht/sig>
TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>

[Gluster-users] NFS versus Fuse file locking problem (NFS works, fuse doesn't...)

2017-08-24 Thread Krist van Besien
Hi all,

I usually advise clients to use the native client if at all possible, as it
is very robust. But I am running into problems here.

In this case the gluster system is used to store video streams. Basically
the setup is the following:
- A gluster cluster of 3 nodes, with ample storage. They export several
volumes.
- The network is 10GB, switched.
- A "recording server" which subscribes to multi cast video streams, and
records them to disk. The recorder writes the streams in 10s blocks, so
when it is for example recording 50 streams it is creating 5 files a
second, each about 5M. It uses a write-then-rename process.

I simulated that with a small script, that wrote 5M files and renamed them
as fast as it could, and could easily create around 100 files/s (which
abouts saturates the network). So I think the cluster is up to the task.

However if we try the actual workload we run into trouble. Running the
recorder software we can gradually ramp up the number of streams it records
(and thus the number of files it creates), and at around 50 streams the
recorder eventually stops writing files. According to the programmers who
wrote it, it appears that it can no longer get the needed locks, and as a
result just stops writing.

We decided to test using the NFS client as well, and there the problem does
not exist. But again, I (and the customer) would prefer not to use NFS, but
use the native client instead.

So if the problem is file locking, and the problem exists with the native
client, and not using NFS, what could be the cause?

In what way does locking differ between the two different file systems,
between NFS and Fuse, and how can the programmers work around any issues
the fuse client might be causing?

This video stream software is a bespoke solution, developed in house, and
it is thus possible to change the way it handles files so it works with the
native client, but the programmers are looking at me for guidance.

Any suggestions?

Krist




-- 
Vriendelijke Groet |  Best Regards | Freundliche Grüße | Cordialement
------

Krist van Besien

senior architect, RHCE, RHCSA Open Stack

Red Hat Red Hat Switzerland S.A. <https://www.redhat.com>

kr...@redhat.com   M: +41-79-5936260
<https://red.ht/sig>
TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>

[Gluster-users] Performance testing with sysbench...

2017-08-22 Thread Krist van Besien
Hi all,

I'm doing some performance tests...

If I test a simple sequential write using dd I get a throughput of about
550 Mb/s. When I do a sequential write test using sysbench this drops to
about 200. Is this due to the way sysbench tests? Or has, in this case, the
performance of sysbench itself become the bottleneck?
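
For reference, the sysbench run is roughly the following (a sketch with
assumed sizes and sysbench 1.0 syntax, not the exact job I used). As far as I
remember, sysbench's defaults, small 16 KiB blocks and periodic fsync, differ
quite a bit from a plain dd, which may account for part of the gap:

sysbench fileio --file-total-size=4G --file-test-mode=seqwr \
    --file-block-size=1M --file-fsync-freq=0 prepare
sysbench fileio --file-total-size=4G --file-test-mode=seqwr \
    --file-block-size=1M --file-fsync-freq=0 run
sysbench fileio --file-total-size=4G cleanup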

Krist


-- 
Vriendelijke Groet |  Best Regards | Freundliche Grüße | Cordialement
--

Krist van Besien

senior architect, RHCE, RHCSA Open Stack

Red Hat Red Hat Switzerland S.A. <https://www.redhat.com>

kr...@redhat.com   M: +41-79-5936260
<https://red.ht/sig>
TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>

[Gluster-users] Heketi and Geo Replication.

2017-07-26 Thread Krist van Besien
Hello,

Is it possible to set up a Heketi-managed gluster cluster in one
datacenter, and then have geo-replication for all volumes to a second
cluster in another datacenter?

I've been looking at that, but haven't really found a recipe/solution for
this.

Ideally what I want is that when a volume is created in cluster1, that a
slave volume is automatically created in cluster2, and replication set up.
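
For reference, what would have to be automated per volume is roughly the
standard geo-replication setup (a sketch; host and volume names are
placeholders):

# once, on the master cluster: create and push the common pem keys
gluster system:: execute gsec_create

# per volume: create and start a session towards the slave volume
gluster volume geo-replication <mastervol> <slavehost>::<slavevol> create push-pem
gluster volume geo-replication <mastervol> <slavehost>::<slavevol> start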

Krist



-- 
Vriendelijke Groet |  Best Regards | Freundliche Grüße | Cordialement
--

Krist van Besien

senior architect, RHCE, RHCSA Open Stack

Red Hat Red Hat Switzerland S.A. <https://www.redhat.com>

kr...@redhat.com   M: +41-79-5936260
<https://red.ht/sig>
TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>

Re: [Gluster-users] I need a sanity check.

2017-07-05 Thread Krist van Besien
You are confusing volume with brick.

You do not have a "Replicate Brick", you have one 1x3 volume, composed of 3
bricks, and one 1x2 volume made up of 2 bricks. You do need to understand
the difference between volume and brick

Also you need to be aware of the differences between server quorum and
client quorum. For client quorum you need three bricks. For the third brick
you can use an arbiter brick, however.
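
For example (a sketch; the volume name, brick path and quorum settings below
are illustrative, not taken from your setup):

# turn the 1x2 volume into replica 2 + arbiter by adding an arbiter brick
gluster volume add-brick <volname> replica 3 arbiter 1 server3:/bricks/<volname>/arbiter

# client quorum: "auto" requires a majority of bricks for writes
gluster volume set <volname> cluster.quorum-type auto

# the server quorum ratio is a cluster-wide setting
gluster volume set all cluster.server-quorum-ratio 51%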

Krist




On 4 July 2017 at 19:28, Ernie Dunbar <maill...@lightspeed.ca> wrote:

> Hi everyone!
>
> I need a sanity check on our Server Quorum Ratio settings to ensure the
> maximum uptime for our virtual machines. I'd like to modify them slightly,
> but I'm not really interested in experimenting with live servers to see if
> what I'm doing is going to work, but I think that the theory is sound.
>
> We have a Gluster array of 3 servers containing two Replicate bricks.
>
> Brick 1 is a 1x3 arrangement where this brick is replicated on all three
> servers. The quorum ratio is set to 51%, so that if any one Gluster server
> goes down, the brick is still in Read/Write mode and the broken server will
> update itself when it comes back online. The clients won't notice a thing,
> while still ensuring that a split-brain condition doesn't occur.
>
> Brick 2 is a 1x2 arrangement where this brick is replicated across only
> two servers. The quorum ratio is currently also set to 51%, but my
> understanding is that if one of the servers that hosts this brick goes
> down, it will go into read-only mode, which would probably be disruptive to
> the VMs we host on this brick.
>
> My understanding is that since there are three servers in the array, I
> should be able to set the quorum ratio on Brick2 to 50% and the array will
> still be able to prevent a split-brain from occurring, because the other
> two servers will know which one is offline.
>
> The alternative of course, is to simply flesh out Brick2 with a third
> disk. However, I've heard that 1x2 replication is faster than 1x3, and we'd
> prefer that extra speed for this task.
>
>



-- 
Vriendelijke Groet |  Best Regards | Freundliche Grüße | Cordialement
--

Krist van Besien

senior architect, RHCE, RHCSA Open Stack

Red Hat Red Hat Switzerland S.A. <https://www.redhat.com>

kr...@redhat.com   M: +41-79-5936260
<https://red.ht/sig>
TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>

[Gluster-users] File locking...

2017-06-02 Thread Krist van Besien
Hi all,

A few questions.

- Is POSIX locking enabled when using the native client? I would assume yes.
- What other settings/tuneables exist when it comes to file locking?

Krist


-- 
Vriendelijke Groet |  Best Regards | Freundliche Grüße | Cordialement
--
Krist van Besien | Senior Architect | Red Hat EMEA Cloud Practice | RHCE |
RHCSA Open Stack
@: kr...@redhat.com | M: +41-79-5936260

Re: [Gluster-users] "Another Transaction is in progres..."

2017-06-01 Thread Krist van Besien
Thanks for the suggestion, this solved it for us, and we probably found the
cause as well. We had Performance Co-Pilot running and it was continuously
enabling profiling on volumes...
We found the reference to the node that had the lock, and restarted
glusterd on that node, and all went well from there on.
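
In short, the workaround was roughly this (a sketch; log locations are the
defaults):

# see which commands were being fired, and which node holds the cluster lock
less /var/log/glusterfs/cmd_history.log
grep -i lock /var/log/glusterfs/*glusterd*.log

# then restart glusterd on the node holding the stale lock
systemctl restart glusterd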

Krist


On 31 May 2017 at 15:56, Vijay Bellur <vbel...@redhat.com> wrote:

>
>
> On Wed, May 31, 2017 at 9:32 AM, Krist van Besien <kr...@redhat.com>
> wrote:
>
>> Hi all,
>>
>> I am trying to do trivial things, like setting quota, or just querying
>> the status and keep getting
>>
>> "Another transaction is in progres for "
>>
>> These messages pop up, then disappear for a while, then pop up again...
>>
>> What do these messages mean? How do I figure out which "transaction" is
>> meant here, and what do I do about it?
>>
>
>
> This message usually means that a different gluster command is being
> executed in the cluster. Most gluster commands are serialized by a cluster
> wide lock. Upon not being able to acquire the cluster lock, this message is
> displayed.
>
> You can check /var/log/glusterfs/cmd_history.log on all storage nodes to
> observe what other commands are in progress at the time of getting this
> error message. Are you per chance using oVirt to manage Gluster? oVirt
> periodically does a "gluster volume status" to determine the volume health
> and that can conflict with other commands being executed.
>
> Regards,
> Vijay
>
>


-- 
Vriendelijke Groet |  Best Regards | Freundliche Grüße | Cordialement
--
Krist van Besien | Senior Architect | Red Hat EMEA Cloud Practice | RHCE |
RHCSA Open Stack
@: kr...@redhat.com | M: +41-79-5936260

[Gluster-users] Disconnected gluster node thinks it is still connected...

2017-06-01 Thread Krist van Besien
Hi all,

Trying to do some availability testing.

We have three nodes: node1, node2, node3. Volumes are all replica 2, across
all three nodes.

As a test we disconnected node1, by removing the VLAN tag for that host on
the switch it is connected to. As a result node2 and node3 now show node1
as disconnected, and show the volumes as degraded.
This is expected.

However, logging in to node1 (via the iLO, as there is no network) showed
that this node still thought it was connected to node2 and node3, even
though it could no longer communicate with them.

Also, it kept its bricks up...

This is not as expected. What I expected is that node1 would detect that it
is no longer part of a quorum, and take all its bricks down.

So what did we miss?
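
For completeness, one thing I still need to double-check is whether
server-side quorum is enabled at all on the volumes; I believe it is off by
default. A sketch, not our actual configuration:

gluster volume get <volname> cluster.server-quorum-type
gluster volume set <volname> cluster.server-quorum-type server
gluster volume set all cluster.server-quorum-ratio 51%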

Krist




-- 
Vriendelijke Groet |  Best Regards | Freundliche Grüße | Cordialement
--
Krist van Besien | Senior Architect | Red Hat EMEA Cloud Practice | RHCE |
RHCSA Open Stack
@: kr...@redhat.com | M: +41-79-5936260

[Gluster-users] "Another Transaction is in progres..."

2017-05-31 Thread Krist van Besien
Hi all,

I am trying to do trivial things, like setting quota, or just querying the
status and keep getting

"Another transaction is in progres for "

These messages pop up, then disappear for a while, then pop up again...

What do these messages mean? How do I figure out which "transaction" is
meant here, and what do I do about it?

Krist


-- 
Vriendelijke Groet |  Best Regards | Freundliche Grüße | Cordialement
--------------
Krist van Besien | Senior Architect | Red Hat EMEA Cloud Practice | RHCE |
RHCSA Open Stack
@: kr...@redhat.com | M: +41-79-5936260

[Gluster-users] Performance testing

2017-04-03 Thread Krist van Besien
Hi All,

I built a Gluster 3.8.4 (RHGS 3.2) cluster for a customer, and I am having
some issues demonstrating that it performs well.

The customer compares it with his old NFS-based NAS, and runs fio to test
workloads.

What I notice is that fio throughput is only about 20 Mb/s, which is not a lot.
When I do a simple test with dd I easily get 600 Mb/s throughput.
In the fio job file the option "direct=1" is used, which bypasses caching.
If we run a fio job with direct=0 the performance goes up a lot, and is
near 600 Mb/s as well.

The customer insists that on his old system (which Gluster should replace)
he could get 600 Mb/s throughput with fio, with the setting direct=1, and
that he was rather underwhelmed by the performance of Gluster here.

What I need is answers to either:
- Have I overlooked something? I have not really done much tuning yet. Is
there some obvious parameter I overlooked that could change the results of
a fio performance test?

or:

- Is testing with "direct=1" not really a way to test Gluster, as the cache
is a rather important part of what is needed to make gluster perform?
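
For reference, the kind of fio run being compared is roughly the following (a
sketch; sizes and the mount point are made up, not the customer's job file):

# direct=1: O_DIRECT, every write has to go through to the bricks
fio --name=seqwrite --directory=/mnt/glustervol --rw=write --bs=1M --size=4g \
    --ioengine=libaio --iodepth=16 --direct=1

# direct=0: client-side caching and write-behind can absorb the writes
fio --name=seqwrite --directory=/mnt/glustervol --rw=write --bs=1M --size=4g \
    --ioengine=libaio --iodepth=16 --direct=0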

-- 
Vriendelijke Groet |  Best Regards | Freundliche Grüße | Cordialement
--------------
Krist van Besien | Senior Architect | Red Hat EMEA Cloud Practice | RHCE |
RHCSA Open Stack
@: kr...@redhat.com | M: +41-79-5936260

[Gluster-users] New to Gluster. Having trouble with server replacement.

2013-11-12 Thread Krist van Besien
Hello all,

I'm new to gluster. In order to gain some knowledge, and test a few
things I decided to install it on three servers and play around with
it a bit.

My setup:
Three servers dc1-09, dc2-09, dc2-10. All with RHEL 6.4, and Gluster
3.4.0 (from RHS 2.1)
Each server has three disks, mounted in /mnt/raid1, /mnt/raid2 and /mnt/raid3.

I created a distributed/replicated volume, test1, with two replicas.

[root@dc2-10 ~]# gluster volume info test1

Volume Name: test1
Type: Distributed-Replicate
Volume ID: 59049b52-9e25-4cc9-bebd-fb3587948900
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: dc1-09:/mnt/raid1/test1
Brick2: dc2-09:/mnt/raid2/test1
Brick3: dc2-09:/mnt/raid1/test1
Brick4: dc2-10:/mnt/raid2/test1
Brick5: dc2-10:/mnt/raid1/test1
Brick6: dc1-09:/mnt/raid2/test1


I mounted this volume on a fourth Unix server, and started a small
script that just keeps writing small files to it, in order to have
some activity.
Then I shut down one of the servers, started it again, shut down
another, etc. Gluster proved to have no problem keeping the files
available.

Then I decided to just nuke one server and completely
reinitialise it. After reinstalling OS + Gluster I had some trouble
getting the server back into the pool.
I followed two hints I found on the internet, and added the old UUID
to glusterd.info, and made sure the correct
trusted.glusterfs.volume-id was set on all bricks.
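
Concretely, those two hints amount to something like this (a sketch; the UUID
and volume ID are placeholders that would come from the surviving servers):

# /var/lib/glusterd/glusterd.info on the reinstalled server: reuse the old UUID
UUID=<old-uuid-of-the-reinstalled-server>

# restore the volume-id xattr on each brick directory, for example:
setfattr -n trusted.glusterfs.volume-id -v 0x<volume-id-without-dashes> /mnt/raid1/test1
setfattr -n trusted.glusterfs.volume-id -v 0x<volume-id-without-dashes> /mnt/raid2/test1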

Now the new server starts storing stuff again. But it still looks a
bit odd. I don't get consistent output from gluster volume status on
all three servers.

gluster volume info test1 gives me the same output everywhere. However
the output of gluster volume status is different:

[root@dc1-09 glusterd]# gluster volume status test1
Status of volume: test1
Gluster process Port Online Pid
--
Brick dc1-09:/mnt/raid1/test1 49154 Y 10496
Brick dc2-09:/mnt/raid2/test1 49152 Y 7574
Brick dc2-09:/mnt/raid1/test1 49153 Y 7581
Brick dc1-09:/mnt/raid2/test1 49155 Y 10502
NFS Server on localhost 2049 Y 1039
Self-heal Daemon on localhost N/A Y 1046
NFS Server on dc2-09 2049 Y 12397
Self-heal Daemon on dc2-09 N/A Y 12444

There are no active volume tasks


[root@dc2-10 /]# gluster volume status test1
Status of volume: test1
Gluster process Port Online Pid
--
Brick dc2-09:/mnt/raid2/test1 49152 Y 7574
Brick dc2-09:/mnt/raid1/test1 49153 Y 7581
Brick dc2-10:/mnt/raid2/test1 49152 Y 9037
Brick dc2-10:/mnt/raid1/test1 49153 Y 9049
NFS Server on localhost 2049 Y 14266
Self-heal Daemon on localhost N/A Y 14281
NFS Server on 172.16.1.21 2049 Y 12397
Self-heal Daemon on 172.16.1.21 N/A Y 12444

There are no active volume tasks

[root@dc2-09 mnt]# gluster volume status test1
Status of volume: test1
Gluster process Port Online Pid
--
Brick dc1-09:/mnt/raid1/test1 49154 Y 10496
Brick dc2-09:/mnt/raid2/test1 49152 Y 7574
Brick dc2-09:/mnt/raid1/test1 49153 Y 7581
Brick dc2-10:/mnt/raid2/test1 49152 Y 9037
Brick dc2-10:/mnt/raid1/test1 49153 Y 9049
Brick dc1-09:/mnt/raid2/test1 49155 Y 10502
NFS Server on localhost 2049 Y 12397
Self-heal Daemon on localhost N/A Y 12444
NFS Server on dc2-10 2049 Y 14266
Self-heal Daemon on dc2-10 N/A Y 14281
NFS Server on dc1-09 2049 Y 1039
Self-heal Daemon on dc1-09 N/A Y 1046

There are no active volume tasks

Why would the output of status be different on the three hosts? Is
this normal, or is there still something wrong? If so, how do I fix
this?
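
For what it is worth, one thing I plan to check next is whether all three
glusterd daemons agree on the peer list and volume definitions. A sketch, not
something I have run yet:

gluster peer status                       # on each server; all should see the other two as connected
gluster volume sync <healthy-server> all  # on the reinstalled server, to re-fetch volume definitions
service glusterd restart                  # then re-check "gluster volume status test1"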


Krist


krist.vanbes...@gmail.com
kr...@vanbesien.org
Bern, Switzerland