Re: [Gluster-users] Glusterfs community status

2024-07-09 Thread Marcus Pedersén
Hi,
Here is my take on gluster today.
We experience the same kind of problems as well, with
failing heals and manual, time-consuming work.
I asked a similar question on the list
a number of months ago about the state of the gluster project,
and my conclusion from the answers is that the project
has just slowed down more and more and people are leaving for
other file systems.
For a long time nothing has been released from the project and the
mailing list has just gone more and more quiet.
Gluster has served us well for many years and I think that
gluster has been a really great filesystem; it makes me
sad to see that gluster is coming to an end. I really like it!!
Internally in our organization we have had discussions and run
tests with cephfs, and our decision is that we will leave
gluster and use cephfs instead.
As we do not see that gluster will improve, we have no other option
than to use another filesystem, and in our case it will be ceph.

I hope this helps!!

Best regards
Marcus



On Tue, Jul 09, 2024 at 07:53:36AM +0200, Ilias Chasapakis forumZFD wrote:
> CAUTION: This email originated from outside of the organization. Do not click 
> links or open attachments unless you recognize the sender and know the 
> content is safe.
>
>
> Hi, we at forumZFD are currently experiencing problems similar to those
> mentioned here on the mailing list especially on the latest messages.
>
> Our gluster just doesn't heal all entries and "manual" healing is long
> and tedious. Entries accumulate in time and we have to do regular
> cleanups that take long and are risky.  Despite changing available
> options with different combinations of values, the problem persists. So
> we thought, "let's go to the community meeting" if not much is happening
> here on the list. We are at the end of our knowledge and can therefore
> no longer contribute much to the list. Unfortunately, nobody was at the
> community meeting. Somehow we have the feeling that there is no one left
> in the community or in the project who is interested in fixing the
> basics of Gluster (namely the healing). Is that the case and is gluster
> really end of life?
>
> We appreciate a lot the contributions in the last few years and all the
> work done. As well as for the honest efforts to give a hand. But would
> be good to have an orientation on the status of the project itself.
>
> Many thanks in advance for any replies.
>
> Ilias
>
> --
> forumZFD
> Entschieden für Frieden | Committed to Peace
>
> Ilias Chasapakis
> Referent IT | IT Referent
>
> Forum Ziviler Friedensdienst e.V. | Forum Civil Peace Service
> Am Kölner Brett 8 | 50825 Köln | Germany
>
> Tel 0221 91273243 | Fax 0221 91273299 | http://www.forumZFD.de
>
> Vorstand nach § 26 BGB, einzelvertretungsberechtigt|Executive Board:
> Alexander Mauz, Sonja Wiekenberg-Mlalandle, Jens von Bargen
> VR 17651 Amtsgericht Köln
>
> Spenden|Donations: IBAN DE90 4306 0967 4103 7264 00   BIC GENODEM1GLS
>



> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users


--
**
* Marcus Pedersén*
* System administrator   *
**
* Interbull Centre   *
*    *
* Department of Animal Breeding & Genetics — SLU *
* Box 7023, SE-750 07*
* Uppsala, Sweden*
**
* Visiting address:  *
* Room 55614, Ulls väg 26, Ultuna*
* Uppsala*
* Sweden *
**
* Tel: +46-(0)18-67 1962 *
**
**
* ISO 9001 Bureau Veritas No SE004561-1  *
**
---
När du skickar e-post till SLU så innebär detta att SLU behandlar dina 
personuppgifter. För att läsa mer om hur detta går till, klicka här 
<https://www.slu.se/om-slu/kontakta-slu/personuppgifter/>
E-mailing SLU will result in SLU processing your personal data. For more 
information on how this is done, click here 
<https://www.slu.se/en/about-slu/contact-slu/personal-data/>




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Heal failure

2024-01-19 Thread Marcus Pedersén
Hi all,
I have a really strange problem with my cluster.
Running gluster 10.4, replicated with an arbiter:
Number of Bricks: 1 x (2 + 1) = 3

All the files in the system seem fine and I have not
found any broken files.
Even so, heal-count reports 4 files
that need healing.
Heal fails for these files over and over again.
If I use heal info I just get a long list of gfids,
and when I try the gfids with the resolve-gfid.sh script
the only reply I get is:
File: ls: cannot access '/urd-gds/gds-admin//.glusterfs/cd/b6/cdb62af8-ef52-4b8f-aa27-480405769877': No such file or directory

I have not tried them all, but quite many.
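
In case it is useful, a minimal read-only check of a single gfid directly on a brick could look like the sketch below (the brick path and gfid are taken from the example error above; treat it as an illustration, not a fix):

# Hedged sketch: inspect one gfid entry on a brick (read-only).
# BRICK and GFID are placeholders taken from the error above.
BRICK=/urd-gds/gds-admin
GFID=cdb62af8-ef52-4b8f-aa27-480405769877
# Bricks store gfid entries under .glusterfs/<first 2 hex>/<next 2 hex>/<gfid>.
GPATH="$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"
if [ -e "$GPATH" ]; then
    # For regular files this entry is a hard link to the real file; a link
    # count of 1 suggests only the gfid link is left on this brick.
    stat -c 'links=%h size=%s %n' "$GPATH"
else
    echo "no entry for gfid $GFID under .glusterfs on this brick"
fi

Running this on every brick for a few of the reported gfids at least shows whether the entries exist anywhere, before deciding how to clean them up.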

How can I get rid of these "failed" entries
that are not actual files?

Many thanks in advance!!

Best regards
Marcus

---
När du skickar e-post till SLU så innebär detta att SLU behandlar dina 
personuppgifter. För att läsa mer om hur detta går till, klicka här 

E-mailing SLU will result in SLU processing your personal data. For more 
information on how this is done, click here 





Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Gluster 11 OP version

2023-12-19 Thread Marcus Pedersén
Hi all,
We upgraded to gluster 11.1, and as the OP version issue
was fixed in this release I changed the OP version
to 11.
Now we have an obscure, vague problem.
Our users usually run 100+ processes with
GNU parallel, and the execution time has now
increased to nearly double.
I can see a couple of heals happening every
now and then, but that does not seem strange to me.
Just to make sure it was not on the client side,
I downgraded the glusterfs client to 10, but we still
see this slowdown.
I tried to lower the OP version back to 10 again
but this is apparently not possible:
volume set: failed: Required op-version (10) should not be equal or lower than current cluster op-version (11).

Before the change to OP version 11 everything
worked fine.

Is there a way to "manually" change the OP version back?
Or any other ideas on how to fix this?
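
One way to narrow down where the extra time goes, independent of the op-version question, is the built-in profiler; a rough sketch, with the volume name as a placeholder:

# Hedged sketch: collect per-brick FOP latency while a typical GNU parallel
# job is running, to see whether the slowdown is on the gluster side.
VOL=myvolume                         # placeholder volume name
gluster volume profile "$VOL" start
sleep 300                            # let a representative workload run
gluster volume profile "$VOL" info   # per-brick latency and call counts
gluster volume profile "$VOL" stop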

Many thanks in advance!!

Best regards
Marcus
---
När du skickar e-post till SLU så innebär detta att SLU behandlar dina 
personuppgifter. För att läsa mer om hur detta går till, klicka här 

E-mailing SLU will result in SLU processing your personal data. For more 
information on how this is done, click here 





Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Gluster -> Ceph

2023-12-14 Thread Marcus Pedersén
Thanks for your feedback!
Please do not get me wrong, I really like gluster
and it has served us well for many, many years.
But given the previous posts about the health of the gluster project
this worries me, and I want to have a good
alternative prepared just in case.
Gluster is great and aligns well with our needs;
as mentioned, ceph is aimed at larger systems.
The problem is that there are not many other
filesystems that can tick all the boxes we want:
open source with a community, replication, snapshots and so on.

Thanks a lot!!

Marcus


On Thu, Dec 14, 2023 at 07:08:46AM -0800, Joe Julian wrote:
> CAUTION: This email originated from outside of the organization. Do not click 
> links or open attachments unless you recognize the sender and know the 
> content is safe.
>
>
> Big raid isn't great as bricks. If the array does fail, the larger brick 
> means much longer heal times.
>
> My main question I ask when evaluating storage solutions is, "what happens 
> when it fails?"
>
> With ceph, if the placement database is corrupted, all your data is lost 
> (happened to my employer, once, losing 5PB of customer data). With Gluster, 
> it's just files on disks, easily recovered.
>
> If your data is easily replaced, ceph offers copy-on-write which is really 
> handy for things like VM images where you might want to clone 100 
> simultaneously.
>
>
> On December 14, 2023 6:57:00 AM PST, Alvin Starr  wrote:
>
> On 2023-12-14 07:48, Marcus Pedersén wrote:
> Hi all,
> I am looking in to ceph and cephfs and in my
> head I am comparing with gluster.
>
> The way I have been running gluster over the years
> is either a replicated or replicated-distributed clusters.
> Here are my observations but I am far from an expert in either Ceph or 
> Gluster.
>
> Gluster works very well with 2 servers containing 2 big RAID disk arrays.
>
> Ceph on the other hand has MON,MGR,MDS...  that can run on multiple servers, 
> and should be for redundancy, but the OSDs should be lots of small servers 
> with very few disks attached.
>
> It kind of seems that the perfect OSD would be a disk with a raspberry pi 
> attached and a 2.5Gb nic.
> Something really cheap and replaceable.
>
> So putting Ceph on 2 big servers with RAID arrays is likely a very bad idea.
>
> I am hoping that someone picks up Gluster because it fits the storage 
> requirements for organizations who start measuring their storage in TB as 
> opposed to EB.
>
> The small setup we have had has been a replicated cluster
> with one arbiter and two fileservers.
> These fileservers has been configured with RAID6 and
> that raid has been used as the brick.
>
> If disaster strikes and one fileserver burns up
> there is still the other fileserver and as it is RAIDed
> I can loose two disks on this machine before I
> start to loose data.
>
>  thinking ceph and similar setup 
> The idea is to have one "admin" node and two fileservers.
> The admin node will run mon, mgr and mds.
> The storage nodes will run mon, mgr, mds and 8x osd (8 disks),
> with replication = 2.
>
> The problem is that I can not get my head around how
> to think when disaster strikes.
> So one fileserver burns up, there is still the other
> fileserver and from my understanding the ceph system
> will start to replicate the files on the same fileserver
> and when this is done disks can be lost on this server
> without loosing data.
> But to be able to have this security on hardware it
> means that the ceph cluster can never be more then 50% full
> or this will not work, right?
> ... and it becomes similar if we have three fileservers,
> then the cluster can never be more then 2/3 full?
>
> I am not sure if I missunderstand how ceph works or
> that ceph works bad on smaller systems like this?
>
> I would appreciate if somebody with better knowledge
> would be able to help me out with this!
>
> Many thanks in advance!!
>
> Marcus
> 
> När du skickar e-post till SLU så innebär detta att SLU behandlar dina 
> personuppgifter. För att läsa mer om hur detta går till, klicka här 
> <https://www.slu.se/om-slu/kontakta-slu/personuppgifter/>
> E-mailing SLU will result in SLU processing your personal data. For more 
> information on how this is done, click here 
> <https://www.slu.se/en/about-slu/contact-slu/personal-data/>
> 
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>


[Gluster-users] Gluster -> Ceph

2023-12-14 Thread Marcus Pedersén
Hi all,
I am looking into ceph and cephfs and in my
head I am comparing them with gluster.

The way I have been running gluster over the years
is either as a replicated or a distributed-replicated cluster.

The small setup we have had has been a replicated cluster
with one arbiter and two fileservers.
These fileservers have been configured with RAID6 and
that RAID array has been used as the brick.

If disaster strikes and one fileserver burns up,
there is still the other fileserver, and as it is RAIDed
I can lose two disks on that machine before I
start to lose data.

... thinking about ceph and a similar setup ...
The idea is to have one "admin" node and two fileservers.
The admin node will run mon, mgr and mds.
The storage nodes will run mon, mgr, mds and 8x osd (8 disks),
with replication = 2.

The problem is that I cannot get my head around how
to think when disaster strikes.
Say one fileserver burns up; there is still the other
fileserver, and from my understanding the ceph system
will start to replicate the files on the same fileserver,
and when this is done disks can be lost on this server
without losing data.
But to have this safety at the hardware level it
means that the ceph cluster can never be more than 50% full
or this will not work, right?
... and it becomes similar if we have three fileservers:
then the cluster can never be more than 2/3 full?
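
Just to spell out the arithmetic behind that question (this only formalises the reasoning above; whether ceph really re-replicates onto the surviving host depends on the failure domain set in the CRUSH rules):

# Hedged sketch of the capacity bound described above; all numbers are
# placeholders and DISK_TB in particular is an assumption.
HOSTS=2            # fileservers holding OSDs
DISKS_PER_HOST=8
DISK_TB=16         # assumed disk size in TB
REPLICA=2
RAW=$((HOSTS * DISKS_PER_HOST * DISK_TB))
USABLE=$((RAW / REPLICA))
# If one host is lost and the remaining hosts must still hold all replicas,
# only (HOSTS-1)/HOSTS of the usable capacity can safely be filled:
# 1/2 with two hosts, 2/3 with three hosts, matching the question above.
SAFE=$((USABLE * (HOSTS - 1) / HOSTS))
echo "raw=${RAW}TB usable=${USABLE}TB safe-to-fill=${SAFE}TB"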

I am not sure if I misunderstand how ceph works, or
whether ceph simply works badly on smaller systems like this?

I would appreciate it if somebody with better knowledge
could help me out with this!

Many thanks in advance!!

Marcus

---
När du skickar e-post till SLU så innebär detta att SLU behandlar dina 
personuppgifter. För att läsa mer om hur detta går till, klicka här 

E-mailing SLU will result in SLU processing your personal data. For more 
information on how this is done, click here 





Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] State of the gluster project

2023-10-27 Thread Marcus Pedersén
Hi Diego,
I have had a look at BeeGFS and it seems more similar
to ceph than to gluster. It requires extra management
nodes, similar to ceph, right?
Secondly, there are no snapshots in BeeGFS, as
I understand it.
I know ceph has snapshots, so for us it seems the
better alternative. What is your experience of ceph?

I am sorry to hear about your problems with gluster.
From my experience we had quite a few issues with gluster
when it was "young"; I think the first version we installed
was 3.5 or so. It was also extremely slow, an ls took forever.
But later versions have been "kind" to us and worked quite well,
and file access has become really comfortable.

Best regards
Marcus

On Fri, Oct 27, 2023 at 10:16:08AM +0200, Diego Zuccato wrote:
> CAUTION: This email originated from outside of the organization. Do not click 
> links or open attachments unless you recognize the sender and know the 
> content is safe.
>
>
> Hi.
>
> I'm also migrating to BeeGFS and CephFS (depending on usage).
>
> What I liked most about Gluster was that files were easily recoverable
> from bricks even in case of disaster and that it said it supported RDMA.
> But I soon found that RDMA was being phased out, and I always find
> entries that are not healing after a couple months of (not really heavy)
> use, directories that can't be removed because not all files have been
> deleted from all the bricks and files or directories that become
> inaccessible with no apparent reason.
> Given that I currently have 3 nodes with 30 12TB disks each in replica 3
> arbiter 1 it's become a major showstopper: can't stop production, backup
> everything and restart from scratch every 3-4 months. And there are no
> tools helping, just log digging :( Even at version 9.6 seems it's not
> really "production ready"... More like v0.9.6 IMVHO. And now it being
> EOLed makes it way worse.
>
> Diego
>
> Il 27/10/2023 09:40, Zakhar Kirpichenko ha scritto:
> > Hi,
> >
> > Red Hat Gluster Storage is EOL, Red Hat moved Gluster devs to other
> > projects, so Gluster doesn't get much attention. From my experience, it
> > has deteriorated since about version 9.0, and we're migrating to
> > alternatives.
> >
> > /Z
> >
> > On Fri, 27 Oct 2023 at 10:29, Marcus Pedersén  > <mailto:marcus.peder...@slu.se>> wrote:
> >
> > Hi all,
> > I just have a general thought about the gluster
> > project.
> > I have got the feeling that things has slowed down
> > in the gluster project.
> > I have had a look at github and to me the project
> > seems to slow down, for gluster version 11 there has
> > been no minor releases, we are still on 11.0 and I have
> > not found any references to 11.1.
> > There is a milestone called 12 but it seems to be
> > stale.
> > I have hit the issue:
> > https://github.com/gluster/glusterfs/issues/4085
> > <https://github.com/gluster/glusterfs/issues/4085>
> > that seems to have no sollution.
> > I noticed when version 11 was released that you
> > could not bump OP version to 11 and reported this,
> > but this is still not available.
> >
> > I am just wondering if I am missing something here?
> >
> > We have been using gluster for many years in production
> > and I think that gluster is great!! It has served as well over
> > the years and we have seen some great improvments
> > of stabilility and speed increase.
> >
> > So is there something going on or have I got
> > the wrong impression (and feeling)?
> >
> > Best regards
> > Marcus
> > ---
> > När du skickar e-post till SLU så innebär detta att SLU behandlar
> > dina personuppgifter. För att läsa mer om hur detta går till, klicka
> > här <https://www.slu.se/om-slu/kontakta-slu/personuppgifter/
> > <https://www.slu.se/om-slu/kontakta-slu/personuppgifter/>>
> > E-mailing SLU will result in SLU processing your personal data. For
> > more information on how this is done, click here
> > <https://www.slu.se/en/about-slu/contact-slu/personal-data/
> > <https://www.slu.se/en/about-slu/contact-slu/personal-data/>>
> > 
> >
> >
> >
> > Community Meeting Calendar:
> >
> > Schedule -
> > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> > Bridge: https://meet.google.com/cpu-eiue-hvk
> > <https://meet.google.com/cpu-eiue-hvk>
> > Gluster-users mailing list
> > Glu

[Gluster-users] State of the gluster project

2023-10-27 Thread Marcus Pedersén
Hi all,
I just have a general thought about the gluster
project.
I have got the feeling that things have slowed down
in the gluster project.
I have had a look at github and to me the project
seems to be slowing down: for gluster version 11 there have
been no minor releases, we are still on 11.0 and I have
not found any references to 11.1.
There is a milestone called 12 but it seems to be
stale.
I have hit the issue:
https://github.com/gluster/glusterfs/issues/4085
which seems to have no solution.
I noticed when version 11 was released that you
could not bump OP version to 11 and reported this,
but this is still not available.

I am just wondering if I am missing something here?

We have been using gluster for many years in production
and I think that gluster is great!! It has served us well over
the years and we have seen some great improvements
in stability and speed.

So is there something going on or have I got
the wrong impression (and feeling)?

Best regards
Marcus
---
När du skickar e-post till SLU så innebär detta att SLU behandlar dina 
personuppgifter. För att läsa mer om hur detta går till, klicka här 

E-mailing SLU will result in SLU processing your personal data. For more 
information on how this is done, click here 





Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Replace faulty host

2023-10-26 Thread Marcus Pedersén
Hi Strahil,
Thanks for your help and info.
I guess that just copying the gluster info from the faulty
server to the new server would be the easiest and
should "just work".
My reason for reusing the FQDN is that we have a
naming system that tells you more about a server
if you know how the system is built up,
but as you say it is much simpler and better to
replace the faulty server with a server with a
different name.
For now I managed to repair the faulty sectors on the disk,
so my OS is happy again. I have to keep an eye on that server,
as you never know if it was a temporary hiccup or if the
disk really is developing bad sectors.

Thanks for the help, now I have a plan if the disk misbehaves again!!
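
Spelled out, such a plan might look roughly like the sketch below (assuming the replacement host takes over the same FQDN and IP and the arbiter brick filesystem is mounted at the same path; this is only an outline of Strahil's suggestion, not a tested or official procedure, and the Red Hat replace-host guide referenced in this thread remains the documented way):

# Hedged sketch only: copy glusterd state and the arbiter brick to the
# replacement host before it takes over the old hostname.
NEW=replacement-host                  # placeholder: new server under a temporary name
systemctl stop glusterd               # on the failing arbiter, while it is still up
rsync -aHAX /var/lib/glusterd/ root@"$NEW":/var/lib/glusterd/
rsync -aHAX /urd-gds/gds-common/ root@"$NEW":/urd-gds/gds-common/   # arbiter brick
# After the new host has been renamed/re-addressed to the old FQDN:
# systemctl start glusterd              (on the new host)
# gluster volume heal gds-common full   (and let self-heal verify the volume)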

Best regards
Marcus



On Fri, Oct 27, 2023 at 01:56:07AM +, Strahil Nikolov wrote:
> CAUTION: This email originated from outside of the organization. Do not click 
> links or open attachments unless you recognize the sender and know the 
> content is safe.
>
>
> Hi Markus,
>
> It looks quite well documented, but please use 
> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/html/administration_guide/sect-replacing_hosts
>  as 3.5 is the latest version for RHGS.
>
> If the OS disks are failing, I would have tried moving the data disks to the 
> new machine and transferring the gluster files in /etc and /var/lib to the 
> new node.
>
> Any reason to reuse the FQDN ?
> For me it was always much simpler to remove the brick, remove the node from 
> TSP, add the new node and then add the brick and trigger full heal.
>
> Best Regards,
> Strahil Nikolov
>
>
>
>
>
>
> On Wednesday, October 25, 2023, 1:30 PM, Marcus Pedersén 
>  wrote:
>
> Hi all,
> I have a problem with one of our gluster clusters.
>
> This is the setup:
> Volume Name: gds-common
> Type: Distributed-Replicate
> Volume ID: 42c9fa00-2d57-4a58-b5ae-c98c349cfcb6
> Status: Started
> Snapshot Count: 26
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: urd-gds-031:/urd-gds/gds-common
> Brick2: urd-gds-032:/urd-gds/gds-common
> Brick3: urd-gds-030:/urd-gds/gds-common (arbiter)
> Options Reconfigured:
> cluster.granular-entry-heal: on
> storage.fips-mode-rchecksum: on
> transport.address-family: inet
> performance.client-io-threads: off
> features.barrier: disable
>
>
> The arbiter node has a faulty root disk but it is still
> up and glusterd is still running.
> I have a spare server equal to the arbiter node,
> so my plan is to replace the arbiter host and
> then I can calmly reinstall OS and fix the rest of
> the configuration on the faulty host to be used
> in another cluster.
>
> I want to use the same hostname on the new host.
> What is the correct commands and way to replace the aribter node.
> I search online and found this:
> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/sect-replacing_hosts
> Can I use this guide to replace the host?
>
> Please, give me advice on this.
>
> Many thanks in advance!!
>
> Best regards
> Marcus
>
> ---
> När du skickar e-post till SLU så innebär detta att SLU behandlar dina 
> personuppgifter. För att läsa mer om hur detta går till, klicka här 
> <https://www.slu.se/om-slu/kontakta-slu/personuppgifter/>
> E-mailing SLU will result in SLU processing your personal data. For more 
> information on how this is done, click here 
> <https://www.slu.se/en/about-slu/contact-slu/personal-data/>
> 
>
>
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk
> Gluster-users mailing list
> Gluster-users@gluster.org<mailto:Gluster-users@gluster.org>
> https://lists.gluster.org/mailman/listinfo/gluster-users

--
**
* Marcus Pedersén*
* System administrator   *
**
* Interbull Centre   *
*    *
* Department of Animal Breeding & Genetics — SLU *
* Box 7023, SE-750 07*
* Uppsala, Sweden*
**
* Visiting address:  *
* Room 55614, Ulls väg 26, Ultuna*
* Uppsala*
* Sweden *
**
* Tel: +46-(0)18-67 1962 *
**
**
* ISO 9001 Bureau Veritas No SE00

[Gluster-users] Replace faulty host

2023-10-25 Thread Marcus Pedersén
Hi all,
I have a problem with one of our gluster clusters.

This is the setup:
Volume Name: gds-common
Type: Distributed-Replicate
Volume ID: 42c9fa00-2d57-4a58-b5ae-c98c349cfcb6
Status: Started
Snapshot Count: 26
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: urd-gds-031:/urd-gds/gds-common
Brick2: urd-gds-032:/urd-gds/gds-common
Brick3: urd-gds-030:/urd-gds/gds-common (arbiter)
Options Reconfigured:
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
performance.client-io-threads: off
features.barrier: disable


The arbiter node has a faulty root disk but it is still
up and glusterd is still running.
I have a spare server identical to the arbiter node,
so my plan is to replace the arbiter host; then
I can calmly reinstall the OS and fix the rest of
the configuration on the faulty host so it can be used
in another cluster.

I want to use the same hostname on the new host.
What are the correct commands and the proper way to replace the arbiter node?
I searched online and found this:
https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/sect-replacing_hosts
Can I use this guide to replace the host?

Please, give me advice on this.

Many thanks in advance!!

Best regards
Marcus

---
När du skickar e-post till SLU så innebär detta att SLU behandlar dina 
personuppgifter. För att läsa mer om hur detta går till, klicka här 

E-mailing SLU will result in SLU processing your personal data. For more 
information on how this is done, click here 





Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Gluster 11 upgrade, glusterd crash

2023-03-06 Thread Marcus Pedersén
Hi again,
As I got the error:
[2023-03-06 15:09:14.594977 +] E [MSGID: 106204] 
[glusterd-store.c:2622:glusterd_store_retrieve_bricks] 0-management: Unknown 
key: device_path

multiple times, I tried to remove all device_path keys from the config files under
/var/lib/glusterd/snaps/...

But that did not make any difference: glusterd still crashes with the same log
output, except that those error lines are now gone.

I do not know how to continue figuring this problem out!
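
In case anyone wants to reproduce this, a minimal read-only check for leftover device_path keys (assuming the default glusterd state directory) is simply:

# Hedged sketch: take a backup of the state directory first, then list every
# file under the snapshot state that still carries a device_path key.
cp -a /var/lib/glusterd /root/glusterd.backup.$(date +%F)
grep -rl '^device_path=' /var/lib/glusterd/snaps/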

Best regards
Marcus



On Mon, Mar 06, 2023 at 09:13:05AM +0100, Marcus Pedersén wrote:
> CAUTION: This email originated from outside of the organization. Do not click 
> links or open attachments unless you recognize the sender and know the 
> content is safe.
>
>
> Hi Strahil,
>
> Volume info says:
>
> Volume Name: gds-home
> Type: Replicate
> Volume ID: 3d9d7182-47a8-43ac-8cd1-6a090bb4b8b9
> Status: Started
> Snapshot Count: 10
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: urd-gds-022:/urd-gds/gds-home
> Brick2: urd-gds-021:/urd-gds/gds-home
> Brick3: urd-gds-020:/urd-gds/gds-home (arbiter)
> Options Reconfigured:
> features.barrier: disable
> storage.fips-mode-rchecksum: on
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: off
>
> If I look in /var/lib/glusterd/vols/gds-home/info on all three nodes it says:
> tier-enabled=0
>
> The snapshot config also says:
> tier-enabled=0
>
> As far as I can tell there are no depricated features enabled.
> This is the /var/lib/glusterd/vols/gds-home/info file from the arbiter:
>
> type=2
> count=3
> status=1
> sub_count=3
> replica_count=3
> arbiter_count=1
> disperse_count=0
> redundancy_count=0
> version=1588
> transport-type=0
> volume-id=3d9d7182-47a8-43ac-8cd1-6a090bb4b8b9
> username=
> password=
> op-version=4
> client-op-version=2
> quota-version=0
> tier-enabled=0
> parent_volname=N/A
> restored_from_snap=----
> snap-max-hard-limit=256
> features.barrier=disable
> storage.fips-mode-rchecksum=on
> transport.address-family=inet
> nfs.disable=on
> performance.client-io-threads=off
> brick-0=urd-gds-022:-urd-gds-gds-home
> brick-1=urd-gds-021:-urd-gds-gds-home
> brick-2=urd-gds-020:-urd-gds-gds-home
>
>
> No, you will not see the arbiter in the status report as glusterd
> does not run at all.
>
> Thanks for your support Strahil!
>
> Best regards
> Marcus
>
>
>
> On Mon, Mar 06, 2023 at 06:06:00AM +, Strahil Nikolov wrote:
> > CAUTION: This email originated from outside of the organization. Do not 
> > click links or open attachments unless you recognize the sender and know 
> > the content is safe.
> >
> >
> > Somewhere tiering is enabled.
> > Check the deprecated options in 
> > https://docs.gluster.org/en/main/Upgrade-Guide/upgrade-to-11/#the-following-options-are-removed-from-the-code-base-and-require-to-be-unset.
> >
> > The simplest way would be to downgrade the arbiter, ensure it works (or 
> > readd it back to the TSP), and remove any deprecated options before 
> > upgrading .
> >
> > Best Regards,
> > Strahil Nikolov
> >
> >
> > On Mon, Mar 6, 2023 at 8:02, Strahil Nikolov
> >  wrote:
> > I don't see the arbiter in the status report.
> > Maybe the volfiles on host1 and host2 were changed ?
> >
> > What is the volume info ?
> >
> > Best Regards,
> > Strahil Nikolov
> >
> > On Fri, Mar 3, 2023 at 17:30, Marcus Pedersén
> >  wrote:
> > Hi again,
> > I turned up the logging level so here is a more detailed start of glusterd.
> > File is enclosed.
> >
> > Thanks alot for help!
> >
> > Regards
> > Marcus
> >
> >
> > On Fri, Mar 03, 2023 at 03:00:46PM +0100, Marcus Pedersén wrote:
> > > CAUTION: This email originated from outside of the organization. Do not 
> > > click links or open attachments unless you recognize the sender and know 
> > > the content is safe.
> > >
> > >
> > > Hi all,
> > >
> > > I just started to upgrade from gluster 10.3 to gluster 11.
> > > I started with my arbiter node and upgraded.
> > > After upgrade gluster started after change in info file.
> > > Rebooted machine and after that glusterd crashes.
> > >
> > > I have double checked config in /var/lib/glusterd
> > > and updated /var/lib/glusterd/vols/gv0/info
> > > so the checksum is correct.
> > >
> > > OS: debian bullseye (11)
> > >
> > > gluster volume status (from 

Re: [Gluster-users] Gluster 11 upgrade, glusterd crash

2023-03-06 Thread Marcus Pedersén
Hi Strahil,

Volume info says:

Volume Name: gds-home
Type: Replicate
Volume ID: 3d9d7182-47a8-43ac-8cd1-6a090bb4b8b9
Status: Started
Snapshot Count: 10
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: urd-gds-022:/urd-gds/gds-home
Brick2: urd-gds-021:/urd-gds/gds-home
Brick3: urd-gds-020:/urd-gds/gds-home (arbiter)
Options Reconfigured:
features.barrier: disable
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

If I look in /var/lib/glusterd/vols/gds-home/info on all three nodes it says:
tier-enabled=0

The snapshot config also says:
tier-enabled=0

As far as I can tell there are no deprecated features enabled.
This is the /var/lib/glusterd/vols/gds-home/info file from the arbiter:

type=2
count=3
status=1
sub_count=3
replica_count=3
arbiter_count=1
disperse_count=0
redundancy_count=0
version=1588
transport-type=0
volume-id=3d9d7182-47a8-43ac-8cd1-6a090bb4b8b9
username=
password=
op-version=4
client-op-version=2
quota-version=0
tier-enabled=0
parent_volname=N/A
restored_from_snap=----
snap-max-hard-limit=256
features.barrier=disable
storage.fips-mode-rchecksum=on
transport.address-family=inet
nfs.disable=on
performance.client-io-threads=off
brick-0=urd-gds-022:-urd-gds-gds-home
brick-1=urd-gds-021:-urd-gds-gds-home
brick-2=urd-gds-020:-urd-gds-gds-home


No, you will not see the arbiter in the status report as glusterd
does not run at all.

Thanks for your support Strahil!

Best regards
Marcus



On Mon, Mar 06, 2023 at 06:06:00AM +, Strahil Nikolov wrote:
> CAUTION: This email originated from outside of the organization. Do not click 
> links or open attachments unless you recognize the sender and know the 
> content is safe.
>
>
> Somewhere tiering is enabled.
> Check the deprecated options in 
> https://docs.gluster.org/en/main/Upgrade-Guide/upgrade-to-11/#the-following-options-are-removed-from-the-code-base-and-require-to-be-unset.
>
> The simplest way would be to downgrade the arbiter, ensure it works (or readd 
> it back to the TSP), and remove any deprecated options before upgrading .
>
> Best Regards,
> Strahil Nikolov
>
>
> On Mon, Mar 6, 2023 at 8:02, Strahil Nikolov
>  wrote:
> I don't see the arbiter in the status report.
> Maybe the volfiles on host1 and host2 were changed ?
>
> What is the volume info ?
>
> Best Regards,
> Strahil Nikolov
>
> On Fri, Mar 3, 2023 at 17:30, Marcus Pedersén
>  wrote:
> Hi again,
> I turned up the logging level so here is a more detailed start of glusterd.
> File is enclosed.
>
> Thanks alot for help!
>
> Regards
> Marcus
>
>
> On Fri, Mar 03, 2023 at 03:00:46PM +0100, Marcus Pedersén wrote:
> > CAUTION: This email originated from outside of the organization. Do not 
> > click links or open attachments unless you recognize the sender and know 
> > the content is safe.
> >
> >
> > Hi all,
> >
> > I just started to upgrade from gluster 10.3 to gluster 11.
> > I started with my arbiter node and upgraded.
> > After upgrade gluster started after change in info file.
> > Rebooted machine and after that glusterd crashes.
> >
> > I have double checked config in /var/lib/glusterd
> > and updated /var/lib/glusterd/vols/gv0/info
> > so the checksum is correct.
> >
> > OS: debian bullseye (11)
> >
> > gluster volume status (from onte of the other nodes that is not upgraded)
> >
> > Status of volume: gv0
> > Gluster processTCP Port  RDMA Port  Online  Pid
> > --
> > Brick host2:/urd-gds/gv0529020  Y  
> > 113172
> > Brick host1:/urd-gds/gv0612350  Y  5487
> > Self-heal Daemon on localhost  N/A  N/AY  5550
> > Self-heal Daemon on host2  N/A  N/AY  113236
> >
> > Task Status of Volume gds-home
> > ------
> > There are no active volume tasks
> >
> > I do not know where to start looking,
> > enclosed is a boot part from glusterd.log
> >
> > Thanks alot for your help!!
> >
> > Regards
> > Marcus
> >
> >
> >
> > --
> > **
> > * Marcus Pedersén*
> > * System administrator  *
> > **
> > * Interbull Centre  *
> > *    

[Gluster-users] Gluster 11 upgrade, glusterd crash

2023-03-03 Thread Marcus Pedersén
Hi all,

I just started to upgrade from gluster 10.3 to gluster 11.
I started by upgrading my arbiter node.
After the upgrade, gluster started once I had changed the info file.
I rebooted the machine and after that glusterd crashes.

I have double checked config in /var/lib/glusterd
and updated /var/lib/glusterd/vols/gv0/info
so the checksum is correct.

OS: debian bullseye (11)

gluster volume status (from one of the other nodes that is not upgraded)

Status of volume: gv0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick host2:/urd-gds/gv0                    52902     0          Y       113172
Brick host1:/urd-gds/gv0                    61235     0          Y       5487
Self-heal Daemon on localhost               N/A       N/A        Y       5550
Self-heal Daemon on host2                   N/A       N/A        Y       113236

Task Status of Volume gds-home
------------------------------------------------------------------------------
There are no active volume tasks

I do not know where to start looking;
enclosed is the boot part of glusterd.log.

Thanks a lot for your help!!

Regards
Marcus



--
**
* Marcus Pedersén*
* System administrator   *
**
* Interbull Centre   *
*    *
* Department of Animal Breeding & Genetics — SLU *
* Box 7023, SE-750 07*
* Uppsala, Sweden*
**
* Visiting address:  *
* Room 55614, Ulls väg 26, Ultuna*
* Uppsala*
* Sweden *
**
* Tel: +46-(0)18-67 1962 *
**
**
* ISO 9001 Bureau Veritas No SE004561-1  *
**
---
När du skickar e-post till SLU så innebär detta att SLU behandlar dina 
personuppgifter. För att läsa mer om hur detta går till, klicka här 
<https://www.slu.se/om-slu/kontakta-slu/personuppgifter/>
E-mailing SLU will result in SLU processing your personal data. For more 
information on how this is done, click here 
<https://www.slu.se/en/about-slu/contact-slu/personal-data/>
[2023-03-03 13:46:01.248805 +] I [MSGID: 100030] [glusterfsd.c:2872:main] 
0-/usr/sbin/glusterd: Started running version [{arg=/usr/sbin/glusterd}, 
{version=11.0}, {cmdlinestr=/usr/sbin/glusterd -p /var/run/glusterd.pid 
--log-level INFO}] 
[2023-03-03 13:46:01.249174 +] I [glusterfsd.c:2562:daemonize] 0-glusterfs: 
Pid of current running process is 4256
[2023-03-03 13:46:01.250935 +] I [MSGID: 0] 
[glusterfsd.c:1597:volfile_init] 0-glusterfsd-mgmt: volume not found, 
continuing with init 
[2023-03-03 13:46:01.282327 +] I [MSGID: 106479] [glusterd.c:1660:init] 
0-management: Using /var/lib/glusterd as working directory 
[2023-03-03 13:46:01.282371 +] I [MSGID: 106479] [glusterd.c:1664:init] 
0-management: Using /var/run/gluster as pid file working directory 
[2023-03-03 13:46:01.288793 +] I [socket.c:973:__socket_server_bind] 
0-socket.management: process started listening on port (24007)
[2023-03-03 13:46:01.291749 +] I [socket.c:916:__socket_server_bind] 
0-socket.management: closing (AF_UNIX) reuse check socket 13
[2023-03-03 13:46:01.292549 +] I [MSGID: 106059] [glusterd.c:1923:init] 
0-management: max-port override: 60999 
[2023-03-03 13:46:01.338404 +] E [MSGID: 106061] 
[glusterd.c:597:glusterd_crt_georep_folders] 0-glusterd: Dict get failed 
[{Key=log-group}, {errno=2}, {error=No such file or directory}] 
[2023-03-03 13:46:02.342898 +] I [MSGID: 106513] 
[glusterd-store.c:2198:glusterd_restore_op_version] 0-glusterd: retrieved 
op-version: 10 
[2023-03-03 13:46:02.347302 +] W [MSGID: 106204] 
[glusterd-store.c:3273:glusterd_store_update_volinfo] 0-management: Unknown 
key: tier-enabled 
[2023-03-03 13:46:02.347333 +] W [MSGID: 106204] 
[glusterd-store.c:3273:glusterd_store_update_volinfo] 0-management: Unknown 
key: nfs.disable 
[2023-03-03 13:46:02.347341 +] W [MSGID: 106204] 
[glusterd-store.c:3273:glusterd_store_update_volinfo] 0-management: Unknown 
key: brick-0 
[2023-03-03 13:46:02.347346 +] W [MSGID: 106204] 
[glusterd-store.c:3273:glusterd_store_update_volinfo] 0-management: Unknown 
key: brick-1 
[2023-03-03 13:46:02.347351 +] W [MSGID: 106204] 
[glusterd-store.c:3273:glusterd_store_update_volinfo] 0-management: Unknown 
key: brick-2 
[2023-03-03 13:46:02.351708 +] I [MSGID: 106544] 
[glusterd.c:158:glusterd_uuid_init] 0-management

Re: [Gluster-users] Gluster 11.0 upgrade

2023-02-21 Thread Marcus Pedersén
Hi again,
I went ahead and upgraded the last two nodes in the cluster.
This is what I noted:
I upgraded the arbiter first, and in
/var/lib/glusterd/vols/gds-common/info
the parameter "nfs.disable=on" was added by the upgrade and
made the checksum fail.
I removed "nfs.disable=on" and all three nodes connected fine.
I upgraded one of the other nodes and no changes were made to
the /var/lib/glusterd/vols/gds-common/info file, so the arbiter
node and the recently upgraded node had contact.
I upgraded the last node, and on this node the parameter "nfs.disable=on"
was again added to /var/lib/glusterd/vols/gds-common/info.
I removed "nfs.disable=on" and restarted glusterd, and the entire cluster
is up and running the way it should.
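
For reference, this is the kind of quick cross-node comparison meant above (a sketch; it assumes passwordless SSH and the node names used elsewhere in this thread):

# Hedged sketch: show whether the persisted volume "info" file and glusterd's
# checksum file have the same contents on all three nodes.
VOL=gds-common
for node in urd-gds-030 urd-gds-031 urd-gds-032; do
    echo "== $node =="
    ssh "$node" "md5sum /var/lib/glusterd/vols/$VOL/info; cat /var/lib/glusterd/vols/$VOL/cksum"
done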

The command: gluster volume get all cluster.max-op-version
Still says:

Option   Value
--   -
cluster.max-op-version   10

I hope that this info helps!
Please let me know if I can help out in any other way!

Regards
Marcus


On Tue, Feb 21, 2023 at 01:19:58PM +0100, Marcus Pedersén wrote:
> CAUTION: This email originated from outside of the organization. Do not click 
> links or open attachments unless you recognize the sender and know the 
> content is safe.
>
>
> Hi Xavi,
> Copy the same info file worked well and the gluster 11 arbiter
> is now up and running and all the nodes are communication
> the way they should.
>
> Just another note on something I discovered on my virt machines.
> All the three nodes has been upgarded to 11.0 and are working.
> If I run:
> gluster volume get all cluster.op-version
> I get:
> Option   Value
> --   -
> cluster.op-version   10
>
> Which is correct as I have not updated the op-version,
> but if I run:
> gluster volume get all cluster.max-op-version
> I get:
> Option   Value
> --   -
> cluster.max-op-version   10
>
> I expected the max-op-version to be 11.
> Isn't it supposed to be 11?
> And after upgrade you should upgrade the op-version
> to 11?
>
> Many thanks for all your help!
> Regards
> Marcus
>
>
> On Tue, Feb 21, 2023 at 09:29:28AM +0100, Xavi Hernandez wrote:
> > CAUTION: This email originated from outside of the organization. Do not 
> > click links or open attachments unless you recognize the sender and know 
> > the content is safe.
> >
> >
> > Hi Marcus,
> >
> > On Mon, Feb 20, 2023 at 2:53 PM Marcus Pedersén 
> > mailto:marcus.peder...@slu.se>> wrote:
> > Hi again Xavi,
> >
> > I did some more testing on my virt machines
> > with same setup:
> > Number of Bricks: 1 x (2 + 1) = 3
> > If I do it the same way, I upgrade the arbiter first,
> > I get the same behavior that the bricks do not start
> > and the other nodes does not "see" the upgraded node.
> > If I upgrade one of the other nodes (non arbiter) and restart
> > glusterd on both the arbiter and the other the arbiter starts
> > the bricks and connects with the other upgraded node as expected.
> > If I upgrade the last node (non arbiter) it will fail to start
> > the bricks, same behaviour as the arbiter at first.
> > If I then copy the /var/lib/gluster/vols/ from the
> > upgraded (non arbiter) node to the other node that does not start the bricks
> > and replace /var/lib/gluster/vols/ with the copied directory
> > and restarts glusterd it works nicely after that.
> > Everything then works the way it should.
> >
> > So the question is if the arbiter is treated in some other way
> > compared to the other nodes?
> >
> > It seems so, but at this point I'm not sure what could be the difference.
> >
> >
> > Some type of config is happening at the start of the glusterd that
> > makes the node fail?
> >
> > Gluster requires that all glusterd share the same configuration. In this 
> > case it seems that the "info" file in the volume definition has different 
> > contents on the servers.  One of the servers has the value "nfs.disable=on" 
> > but the others do not. This can be the difference that causes the checksum 
> > error.
> >
> > You can try to copy the "info" file from one node to the one that doesn't 
> > start and try restarting glusterd.
> >
> >
> > Do I dare to continue to upgrade my real cluster with the above described 
> > way?
> >
> > Thanks!
> >
> > Regar

Re: [Gluster-users] Gluster 11.0 upgrade

2023-02-21 Thread Marcus Pedersén
Hi Xavi,
Copying the same info file worked well and the gluster 11 arbiter
is now up and running, and all the nodes are communicating
the way they should.

Just another note on something I discovered on my virt machines.
All three nodes have been upgraded to 11.0 and are working.
If I run:
gluster volume get all cluster.op-version
I get:
Option   Value
--   -
cluster.op-version   10

Which is correct as I have not updated the op-version,
but if I run:
gluster volume get all cluster.max-op-version
I get:
Option   Value
--   -
cluster.max-op-version   10

I expected the max-op-version to be 11.
Isn't it supposed to be 11?
And after the upgrade you should be able to bump the op-version
to 11?
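
For completeness, the post-upgrade sequence would be expected to look like the sketch below; the commented-out bump should only be run once max-op-version actually reports 11, which it does not yet here:

# Hedged sketch: check, then raise, the cluster op-version after all peers
# run gluster 11. Do not run the "set" while max-op-version is still 10.
gluster volume get all cluster.op-version
gluster volume get all cluster.max-op-version
# gluster volume set all cluster.op-version 11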

Many thanks for all your help!
Regards
Marcus


On Tue, Feb 21, 2023 at 09:29:28AM +0100, Xavi Hernandez wrote:
> CAUTION: This email originated from outside of the organization. Do not click 
> links or open attachments unless you recognize the sender and know the 
> content is safe.
>
>
> Hi Marcus,
>
> On Mon, Feb 20, 2023 at 2:53 PM Marcus Pedersén 
> mailto:marcus.peder...@slu.se>> wrote:
> Hi again Xavi,
>
> I did some more testing on my virt machines
> with same setup:
> Number of Bricks: 1 x (2 + 1) = 3
> If I do it the same way, I upgrade the arbiter first,
> I get the same behavior that the bricks do not start
> and the other nodes does not "see" the upgraded node.
> If I upgrade one of the other nodes (non arbiter) and restart
> glusterd on both the arbiter and the other the arbiter starts
> the bricks and connects with the other upgraded node as expected.
> If I upgrade the last node (non arbiter) it will fail to start
> the bricks, same behaviour as the arbiter at first.
> If I then copy the /var/lib/gluster/vols/ from the
> upgraded (non arbiter) node to the other node that does not start the bricks
> and replace /var/lib/gluster/vols/ with the copied directory
> and restarts glusterd it works nicely after that.
> Everything then works the way it should.
>
> So the question is if the arbiter is treated in some other way
> compared to the other nodes?
>
> It seems so, but at this point I'm not sure what could be the difference.
>
>
> Some type of config is happening at the start of the glusterd that
> makes the node fail?
>
> Gluster requires that all glusterd share the same configuration. In this case 
> it seems that the "info" file in the volume definition has different contents 
> on the servers.  One of the servers has the value "nfs.disable=on" but the 
> others do not. This can be the difference that causes the checksum error.
>
> You can try to copy the "info" file from one node to the one that doesn't 
> start and try restarting glusterd.
>
>
> Do I dare to continue to upgrade my real cluster with the above described way?
>
> Thanks!
>
> Regards
> Marcus
>
>
>
> On Mon, Feb 20, 2023 at 01:42:47PM +0100, Marcus Pedersén wrote:
> > I made a recusive diff on the upgraded arbiter.
> >
> > /var/lib/glusterd/vols/gds-common is the upgraded aribiter
> > /home/marcus/gds-common is one of the other nodes still on gluster 10
> >
> > diff -r 
> > /var/lib/glusterd/vols/gds-common/bricks/urd-gds-030:-urd-gds-gds-common 
> > /home/marcus/gds-common/bricks/urd-gds-030:-urd-gds-gds-common
> > 5c5
> > < listen-port=60419
> > ---
> > > listen-port=0
> > 11c11
> > < brick-fsid=14764358630653534655
> > ---
> > > brick-fsid=0
> > diff -r 
> > /var/lib/glusterd/vols/gds-common/bricks/urd-gds-031:-urd-gds-gds-common 
> > /home/marcus/gds-common/bricks/urd-gds-031:-urd-gds-gds-common
> > 5c5
> > < listen-port=0
> > ---
> > > listen-port=60891
> > 11c11
> > < brick-fsid=0
> > ---
> > > brick-fsid=1088380223149770683
> > diff -r /var/lib/glusterd/vols/gds-common/cksum 
> > /home/marcus/gds-common/cksum
> > 1c1
> > < info=3948700922
> > ---
> > > info=458813151
> > diff -r 
> > /var/lib/glusterd/vols/gds-common/gds-common.urd-gds-030.urd-gds-gds-common.vol
> >  /home/marcus/gds-common/gds-common.urd-gds-030.urd-gds-gds-common.vol
> > 3c3
> > < option shared-brick-count 1
> > ---
> > > option shared-brick-count 0
> > diff -r 
> > /var/lib/glusterd/vols/gds-common/gds-common.urd-gds-031.urd-gds-gds-common.vol
> >  /home/marcus/gds-common/gds-common.urd-gds-031.urd-gds-gds-common.vol
> > 3c3
> > < opt

Re: [Gluster-users] Gluster 11.0 upgrade

2023-02-20 Thread Marcus Pedersén
Hi again,
There is something going on when gluster starts.
I fired up my virt machines this morning to do some more
testing and one of the nodes did not come online
in the cluster.
Looking at that node I found that only glusterd and glusterfs
had started.
After:
systemctl stop glusterd
killall glusterd glusterfs glusterfsd
systemctl start glusterd

Gluster started correctly: glusterd, glusterfs and glusterfsd
all started and the node was online in the cluster.
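
A quick way to see which of the daemons actually came up after a reboot is a plain process check (just a sketch using pgrep):

# Hedged sketch: list the running gluster processes with full command lines.
pgrep -ax glusterd     # management daemon
pgrep -ax glusterfsd   # brick processes
pgrep -ax glusterfs    # self-heal daemon and other client-side processes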

I just wanted to let you know, in case this might help.

Regards
Marcus


On Mon, Feb 20, 2023 at 02:52:52PM +0100, Marcus Pedersén wrote:
> CAUTION: This email originated from outside of the organization. Do not click 
> links or open attachments unless you recognize the sender and know the 
> content is safe.
>
>
> Hi again Xavi,
>
> I did some more testing on my virt machines
> with same setup:
> Number of Bricks: 1 x (2 + 1) = 3
> If I do it the same way, I upgrade the arbiter first,
> I get the same behavior that the bricks do not start
> and the other nodes does not "see" the upgraded node.
> If I upgrade one of the other nodes (non arbiter) and restart
> glusterd on both the arbiter and the other the arbiter starts
> the bricks and connects with the other upgraded node as expected.
> If I upgrade the last node (non arbiter) it will fail to start
> the bricks, same behaviour as the arbiter at first.
> If I then copy the /var/lib/gluster/vols/ from the
> upgraded (non arbiter) node to the other node that does not start the bricks
> and replace /var/lib/gluster/vols/ with the copied directory
> and restarts glusterd it works nicely after that.
> Everything then works the way it should.
>
> So the question is if the arbiter is treated in some other way
> compared to the other nodes?
>
> Some type of config is happening at the start of the glusterd that
> makes the node fail?
>
> Do I dare to continue to upgrade my real cluster with the above described way?
>
> Thanks!
>
> Regards
> Marcus
>
>
>
> On Mon, Feb 20, 2023 at 01:42:47PM +0100, Marcus Pedersén wrote:
> > I made a recusive diff on the upgraded arbiter.
> >
> > /var/lib/glusterd/vols/gds-common is the upgraded aribiter
> > /home/marcus/gds-common is one of the other nodes still on gluster 10
> >
> > diff -r 
> > /var/lib/glusterd/vols/gds-common/bricks/urd-gds-030:-urd-gds-gds-common 
> > /home/marcus/gds-common/bricks/urd-gds-030:-urd-gds-gds-common
> > 5c5
> > < listen-port=60419
> > ---
> > > listen-port=0
> > 11c11
> > < brick-fsid=14764358630653534655
> > ---
> > > brick-fsid=0
> > diff -r 
> > /var/lib/glusterd/vols/gds-common/bricks/urd-gds-031:-urd-gds-gds-common 
> > /home/marcus/gds-common/bricks/urd-gds-031:-urd-gds-gds-common
> > 5c5
> > < listen-port=0
> > ---
> > > listen-port=60891
> > 11c11
> > < brick-fsid=0
> > ---
> > > brick-fsid=1088380223149770683
> > diff -r /var/lib/glusterd/vols/gds-common/cksum 
> > /home/marcus/gds-common/cksum
> > 1c1
> > < info=3948700922
> > ---
> > > info=458813151
> > diff -r 
> > /var/lib/glusterd/vols/gds-common/gds-common.urd-gds-030.urd-gds-gds-common.vol
> >  /home/marcus/gds-common/gds-common.urd-gds-030.urd-gds-gds-common.vol
> > 3c3
> > < option shared-brick-count 1
> > ---
> > > option shared-brick-count 0
> > diff -r 
> > /var/lib/glusterd/vols/gds-common/gds-common.urd-gds-031.urd-gds-gds-common.vol
> >  /home/marcus/gds-common/gds-common.urd-gds-031.urd-gds-gds-common.vol
> > 3c3
> > < option shared-brick-count 0
> > ---
> > > option shared-brick-count 1
> > diff -r /var/lib/glusterd/vols/gds-common/info /home/marcus/gds-common/info
> > 23a24
> > > nfs.disable=on
> >
> >
> > I setup 3 virt machines  and configured them with gluster 10 (arbiter 1).
> > After that I upgraded to 11 and the first 2 nodes was fine but on the third
> > node I got the same behaviour: the brick never started.
> >
> > Thanks for the help!
> >
> > Regards
> > Marcus
> >
> >
> > On Mon, Feb 20, 2023 at 12:30:37PM +0100, Xavi Hernandez wrote:
> > > CAUTION: This email originated from outside of the organization. Do not 
> > > click links or open attachments unless you recognize the sender and know 
> > > the content is safe.
> > >
> > >
> > > Hi Marcus,
> > >
> > > On Mon, Feb 20, 2023 at 8:50 AM Marcus Pedersén 
> > > mailto:marcus.peder...@slu.se>> wrote:
> > > Hi Xavi,
> > > I stopped glusterd an

Re: [Gluster-users] Gluster 11.0 upgrade

2023-02-20 Thread Marcus Pedersén
Hi again Xavi,

I did some more testing on my virt machines
with the same setup:
Number of Bricks: 1 x (2 + 1) = 3
If I do it the same way, upgrading the arbiter first,
I get the same behaviour: the bricks do not start
and the other nodes do not "see" the upgraded node.
If I upgrade one of the other nodes (non-arbiter) and restart
glusterd on both the arbiter and that node, the arbiter starts
the bricks and connects with the other upgraded node as expected.
If I upgrade the last node (non-arbiter) it will fail to start
the bricks, the same behaviour as the arbiter at first.
If I then copy /var/lib/glusterd/vols/ from the
upgraded (non-arbiter) node to the node that does not start the bricks,
replace its /var/lib/glusterd/vols/ with the copied directory
and restart glusterd, it works nicely after that.
Everything then works the way it should.
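
Spelled out as commands, that workaround looks roughly like this (a sketch; the source node is a placeholder and glusterd on the broken node is stopped first):

# Hedged sketch of the workaround described above: replace the volume
# definitions on the node whose bricks do not start with a copy taken from
# an already-upgraded node that works.
GOOD=upgraded-node                    # placeholder: node whose bricks start fine
systemctl stop glusterd
mv /var/lib/glusterd/vols /var/lib/glusterd/vols.broken.$(date +%F)
scp -r root@"$GOOD":/var/lib/glusterd/vols /var/lib/glusterd/
systemctl start glusterd
gluster peer status                   # peers should no longer show "Rejected"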

So the question is whether the arbiter is treated in some other way
compared to the other nodes?

Is some type of configuration happening at glusterd start that
makes the node fail?

Do I dare to continue upgrading my real cluster in the way described above?

Thanks!

Regards
Marcus



On Mon, Feb 20, 2023 at 01:42:47PM +0100, Marcus Pedersén wrote:
> I made a recusive diff on the upgraded arbiter.
>
> /var/lib/glusterd/vols/gds-common is the upgraded aribiter
> /home/marcus/gds-common is one of the other nodes still on gluster 10
>
> diff -r 
> /var/lib/glusterd/vols/gds-common/bricks/urd-gds-030:-urd-gds-gds-common 
> /home/marcus/gds-common/bricks/urd-gds-030:-urd-gds-gds-common
> 5c5
> < listen-port=60419
> ---
> > listen-port=0
> 11c11
> < brick-fsid=14764358630653534655
> ---
> > brick-fsid=0
> diff -r 
> /var/lib/glusterd/vols/gds-common/bricks/urd-gds-031:-urd-gds-gds-common 
> /home/marcus/gds-common/bricks/urd-gds-031:-urd-gds-gds-common
> 5c5
> < listen-port=0
> ---
> > listen-port=60891
> 11c11
> < brick-fsid=0
> ---
> > brick-fsid=1088380223149770683
> diff -r /var/lib/glusterd/vols/gds-common/cksum /home/marcus/gds-common/cksum
> 1c1
> < info=3948700922
> ---
> > info=458813151
> diff -r 
> /var/lib/glusterd/vols/gds-common/gds-common.urd-gds-030.urd-gds-gds-common.vol
>  /home/marcus/gds-common/gds-common.urd-gds-030.urd-gds-gds-common.vol
> 3c3
> < option shared-brick-count 1
> ---
> > option shared-brick-count 0
> diff -r 
> /var/lib/glusterd/vols/gds-common/gds-common.urd-gds-031.urd-gds-gds-common.vol
>  /home/marcus/gds-common/gds-common.urd-gds-031.urd-gds-gds-common.vol
> 3c3
> < option shared-brick-count 0
> ---
> > option shared-brick-count 1
> diff -r /var/lib/glusterd/vols/gds-common/info /home/marcus/gds-common/info
> 23a24
> > nfs.disable=on
>
>
> I setup 3 virt machines  and configured them with gluster 10 (arbiter 1).
> After that I upgraded to 11 and the first 2 nodes was fine but on the third
> node I got the same behaviour: the brick never started.
>
> Thanks for the help!
>
> Regards
> Marcus
>
>
> On Mon, Feb 20, 2023 at 12:30:37PM +0100, Xavi Hernandez wrote:
> > CAUTION: This email originated from outside of the organization. Do not 
> > click links or open attachments unless you recognize the sender and know 
> > the content is safe.
> >
> >
> > Hi Marcus,
> >
> > On Mon, Feb 20, 2023 at 8:50 AM Marcus Pedersén 
> > mailto:marcus.peder...@slu.se>> wrote:
> > Hi Xavi,
> > I stopped glusterd and killall glusterd glusterfs glusterfsd
> > and started glusterd again.
> >
> > The only log that is not empty is glusterd.log, I attach the log
> > from the restart time. The brick log, glustershd.log and 
> > glfsheal-gds-common.log is empty.
> >
> > This are the errors in the log:
> > [2023-02-20 07:23:46.235263 +] E [MSGID: 106061] 
> > [glusterd.c:597:glusterd_crt_georep_folders] 0-glusterd: Dict get failed 
> > [{Key=log-group}, {errno=2}, {error=No such file or directory}]
> > [2023-02-20 07:23:47.359917 +] E [MSGID: 106010] 
> > [glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: 
> > Version of Cksums gds-common differ. local cksum = 3017846959, remote cksum 
> > = 2065453698 on peer urd-gds-031
> > [2023-02-20 07:23:47.438052 +] E [MSGID: 106010] 
> > [glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: 
> > Version of Cksums gds-common differ. local cksum = 3017846959, remote cksum 
> > = 2065453698 on peer urd-gds-032
> >
> > Geo replication is not setup so I guess there is nothing strange that there 
> > is an error regarding georep.
> > The checksum error seems natural to be there as the other nodes are still 
> > on v

Re: [Gluster-users] Gluster 11.0 upgrade

2023-02-20 Thread Marcus Pedersén
I made a recursive diff on the upgraded arbiter.

/var/lib/glusterd/vols/gds-common is the upgraded arbiter
/home/marcus/gds-common is one of the other nodes still on gluster 10

diff -r 
/var/lib/glusterd/vols/gds-common/bricks/urd-gds-030:-urd-gds-gds-common 
/home/marcus/gds-common/bricks/urd-gds-030:-urd-gds-gds-common
5c5
< listen-port=60419
---
> listen-port=0
11c11
< brick-fsid=14764358630653534655
---
> brick-fsid=0
diff -r 
/var/lib/glusterd/vols/gds-common/bricks/urd-gds-031:-urd-gds-gds-common 
/home/marcus/gds-common/bricks/urd-gds-031:-urd-gds-gds-common
5c5
< listen-port=0
---
> listen-port=60891
11c11
< brick-fsid=0
---
> brick-fsid=1088380223149770683
diff -r /var/lib/glusterd/vols/gds-common/cksum /home/marcus/gds-common/cksum
1c1
< info=3948700922
---
> info=458813151
diff -r 
/var/lib/glusterd/vols/gds-common/gds-common.urd-gds-030.urd-gds-gds-common.vol 
/home/marcus/gds-common/gds-common.urd-gds-030.urd-gds-gds-common.vol
3c3
< option shared-brick-count 1
---
> option shared-brick-count 0
diff -r 
/var/lib/glusterd/vols/gds-common/gds-common.urd-gds-031.urd-gds-gds-common.vol 
/home/marcus/gds-common/gds-common.urd-gds-031.urd-gds-gds-common.vol
3c3
< option shared-brick-count 0
---
> option shared-brick-count 1
diff -r /var/lib/glusterd/vols/gds-common/info /home/marcus/gds-common/info
23a24
> nfs.disable=on
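
For reference, the comparison above was produced roughly like this (a sketch;
it assumes the gluster 10 node's volume directory was first copied over to
/home/marcus, e.g. with scp):

# copy the volume definition from one of the gluster 10 nodes (urd-gds-031 here)
scp -r root@urd-gds-031:/var/lib/glusterd/vols/gds-common /home/marcus/gds-common
# recursive diff against the upgraded arbiter's copy
diff -r /var/lib/glusterd/vols/gds-common /home/marcus/gds-common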


I set up 3 virtual machines and configured them with gluster 10 (arbiter 1).
After that I upgraded to 11; the first 2 nodes were fine, but on the third
node I got the same behaviour: the brick never started.

Thanks for the help!

Regards
Marcus


On Mon, Feb 20, 2023 at 12:30:37PM +0100, Xavi Hernandez wrote:
> CAUTION: This email originated from outside of the organization. Do not click 
> links or open attachments unless you recognize the sender and know the 
> content is safe.
>
>
> Hi Marcus,
>
> On Mon, Feb 20, 2023 at 8:50 AM Marcus Pedersén 
> mailto:marcus.peder...@slu.se>> wrote:
> Hi Xavi,
> I stopped glusterd and killall glusterd glusterfs glusterfsd
> and started glusterd again.
>
> The only log that is not empty is glusterd.log, I attach the log
> from the restart time. The brick log, glustershd.log and
> glfsheal-gds-common.log are empty.
>
> These are the errors in the log:
> [2023-02-20 07:23:46.235263 +] E [MSGID: 106061] 
> [glusterd.c:597:glusterd_crt_georep_folders] 0-glusterd: Dict get failed 
> [{Key=log-group}, {errno=2}, {error=No such file or directory}]
> [2023-02-20 07:23:47.359917 +] E [MSGID: 106010] 
> [glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: Version 
> of Cksums gds-common differ. local cksum = 3017846959, remote cksum = 
> 2065453698 on peer urd-gds-031
> [2023-02-20 07:23:47.438052 +] E [MSGID: 106010] 
> [glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: Version 
> of Cksums gds-common differ. local cksum = 3017846959, remote cksum = 
> 2065453698 on peer urd-gds-032
>
> Geo-replication is not set up, so I guess there is nothing strange about
> an error regarding georep.
> The checksum error seems natural to be there as the other nodes are still on 
> version 10.
>
> No. The configurations should be identical.
>
> Can you try to compare volume definitions in 
> /var/lib/glusterd/vols/gds-common between the upgraded server and one of the 
> old ones ?
>
> Regards,
>
> Xavi
>
>
> My previous experience with upgrades is that the local bricks start and
> gluster is up and running, with no connection to the other nodes until they are
> upgraded as well.
>
>
> gluster peer status, gives the output:
> Number of Peers: 2
>
> Hostname: urd-gds-032
> Uuid: e6f96ad2-0fea-4d80-bd42-8236dd0f8439
> State: Peer Rejected (Connected)
>
> Hostname: urd-gds-031
> Uuid: 2d7c0ad7-dfcf-4eaf-9210-f879c7b406bf
> State: Peer Rejected (Connected)
>
> I suppose this is because the arbiter is version 11
> and the other 2 nodes are version 10.
>
> Please let me know if I can provide any other information
> to try to solve this issue.
>
> Many thanks!
> Marcus
>
>
> On Mon, Feb 20, 2023 at 07:29:20AM +0100, Xavi Hernandez wrote:
> > CAUTION: This email originated from outside of the organization. Do not 
> > click links or open attachments unless you recognize the sender and know 
> > the content is safe.
> >
> >
> > Hi Marcus,
> >
> > these errors shouldn't prevent the bricks from starting. Isn't there any 
> > other error or warning ?
> >
> > Regards,
> >
> > Xavi
> >
> > On Fri, Feb 17, 2023 at 3:06 PM Marcus Pedersén 
> > mailto:marcus.peder...@slu.se>> wrote:

Re: [Gluster-users] Gluster 11.0 upgrade

2023-02-20 Thread Marcus Pedersén
Failed to send a copy to the list:


Hi Xavi,
I stopped glusterd and killall glusterd glusterfs glusterfsd
and started glusterd again.

The only log that is not empty is glusterd.log, I attach the log
from the restart time. The brick log, glustershd.log and
glfsheal-gds-common.log are empty.

These are the errors in the log:
[2023-02-20 07:23:46.235263 +] E [MSGID: 106061] 
[glusterd.c:597:glusterd_crt_georep_folders] 0-glusterd: Dict get failed 
[{Key=log-group}, {errno=2}, {error=No such file or directory}]
[2023-02-20 07:23:47.359917 +] E [MSGID: 106010] 
[glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: Version of 
Cksums gds-common differ. local cksum = 3017846959, remote cksum = 2065453698 
on peer urd-gds-031
[2023-02-20 07:23:47.438052 +] E [MSGID: 106010] 
[glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: Version of 
Cksums gds-common differ. local cksum = 3017846959, remote cksum = 2065453698 
on peer urd-gds-032

Geo-replication is not set up, so I guess there is nothing strange about
an error regarding georep.
The checksum error seems natural to be there as the other nodes are still on 
version 10.

My previous experience with upgrades is that the local bricks start and
gluster is up and running, with no connection to the other nodes until they are
upgraded as well.


gluster peer status, gives the output:
Number of Peers: 2

Hostname: urd-gds-032
Uuid: e6f96ad2-0fea-4d80-bd42-8236dd0f8439
State: Peer Rejected (Connected)

Hostname: urd-gds-031
Uuid: 2d7c0ad7-dfcf-4eaf-9210-f879c7b406bf
State: Peer Rejected (Connected)

I suppose this is because the arbiter is version 11
and the other 2 nodes are version 10.

Please let me know if I can provide any other information
to try to solve this issue.

Many thanks!
Marcus


On Mon, Feb 20, 2023 at 07:29:20AM +0100, Xavi Hernandez wrote:
> CAUTION: This email originated from outside of the organization. Do not click 
> links or open attachments unless you recognize the sender and know the 
> content is safe.
>
>
> Hi Marcus,
>
> these errors shouldn't prevent the bricks from starting. Isn't there any 
> other error or warning ?
>
> Regards,
>
> Xavi
>
> On Fri, Feb 17, 2023 at 3:06 PM Marcus Pedersén 
> mailto:marcus.peder...@slu.se>> wrote:
> Hi all,
> I started an upgrade to gluster 11.0 from 10.3 on one of my clusters.
> OS: Debian bullseye
>
> Volume Name: gds-common
> Type: Replicate
> Volume ID: 42c9fa00-2d57-4a58-b5ae-c98c349cfcb6
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: urd-gds-031:/urd-gds/gds-common
> Brick2: urd-gds-032:/urd-gds/gds-common
> Brick3: urd-gds-030:/urd-gds/gds-common (arbiter)
> Options Reconfigured:
> cluster.granular-entry-heal: on
> storage.fips-mode-rchecksum: on
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: off
>
> I started with the arbiter node, stopped all of gluster,
> upgraded to 11.0 and all went fine.
> After upgrade I was able to see the other nodes and
> all nodes were connected.
> After a reboot on the arbiter nothing works the way it should.
> Both brick1 and brick2 has connection but no connection
> with the arbiter.
> On the arbiter glusterd has started and is listening on port 24007,
> the problem seems to be glusterfsd, it never starts!
>
> If I run: gluster volume status
>
> Status of volume: gds-common
> Gluster process TCP Port  RDMA Port  Online  Pid
> --
> Brick urd-gds-030:/urd-gds/gds-common   N/A   N/AN   N/A
> Self-heal Daemon on localhost   N/A   N/AN   N/A
>
> Task Status of Volume gds-common
> --
> There are no active volume tasks
>
>
> In glusterd.log I find the following errors (arbiter node):
> [2023-02-17 12:30:40.519585 +] E [gf-io-uring.c:404:gf_io_uring_setup] 
> 0-io: [MSGID:101240] Function call failed <{function=io_uring_setup()}, 
> {error=12 (Cannot allocate memory)}>
> [2023-02-17 12:30:40.678031 +] E [MSGID: 106061] 
> [glusterd.c:597:glusterd_crt_georep_folders] 0-glusterd: Dict get failed 
> [{Key=log-group}, {errno=2}, {error=No such file or directory}]
>
> In brick/urd-gds-gds-common.log I find the following error:
> [2023-02-17 12:30:43.550753 +] E [gf-io-uring.c:404:gf_io_uring_setup] 
> 0-io: [MSGID:101240] Function call failed <{function=io_uring_setup()}, 
> {error=12 (Cannot allocate memory)}>
>
> I enclose both logfiles.
>
> How do I resolve this issue??
>
> Many thanks in advance!!
>

[Gluster-users] Gluster 10 used ports

2021-11-29 Thread Marcus Pedersén
Hi all,

Over the years I have been using the same ports in my firewall
for gluster, 49152-49251 (I know, a few too many ports, but it is a
local network with limited access).

Today I upgraded from version 9 to version 10 and initially it went
well, until I ran:
gluster volume heal my-vol info summary
I got the answer:
Status: Transport endpoint is not connected

I realized that glusterfsd was using 5+ ports so
I opened up more ports and it works fine again.

Are the port changes something new in gluster 10?

I browsed through the docs and searched online without
finding any info.

Which are the correct ports to open up in my firewall?

Running on debian bullseye.
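
For what it is worth, this is how I checked which ports the bricks actually
use now (just a sketch, nothing gluster-10-specific):

gluster volume status                        # the TCP Port column shows each brick's port
ss -tlnp | grep -E 'glusterd|glusterfsd'     # every port the gluster daemons listen on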

Thanks a lot in advance!!

Best regards
Marcus







[Gluster-users] Gluster heal problem

2021-10-06 Thread Marcus Pedersén
Hi all,
I have a problem with heal: I have 995 files that fail to heal.

Gluster version: 9.3
OS: Debian Bullseye

My setup is a replicate with an arbiter:
Volume Name: gds-admin
Type: Replicate
Volume ID: f1f112f4-8cee-4c04-8ea5-c7d895c8c8d6
Status: Started
Snapshot Count: 8
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: urd-gds-001:/urd-gds/gds-admin
Brick2: urd-gds-002:/urd-gds/gds-admin
Brick3: urd-gds-000:/urd-gds/gds-admin (arbiter)
Options Reconfigured:
storage.build-pgfid: off
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
features.barrier: disable

Gluster volume status:
Status of volume: gds-admin
Gluster process TCP Port  RDMA Port  Online  Pid
--
Brick urd-gds-001:/urd-gds/gds-admin49155 0  Y   6964
Brick urd-gds-002:/urd-gds/gds-admin49155 0  Y   4270
Brick urd-gds-000:/urd-gds/gds-admin49152 0  Y   1175
Self-heal Daemon on localhost   N/A   N/AY   7031
Self-heal Daemon on urd-gds-002 N/A   N/AY   4281
Self-heal Daemon on urd-gds-000 N/A   N/AY   1230

Task Status of Volume gds-admin
--
There are no active volume tasks


Gluster pool list:
UUIDHostnameState
8823d0d9-5d02-4f47-86e9-urd-gds-000 Connected
73139305-08f5-42c2-92b6-urd-gds-002 Connected
d612a705-8493-474e-9fdc-localhost   Connected




info summary says:
Brick urd-gds-001:/urd-gds/gds-admin
Status: Connected
Total Number of entries: 995
Number of entries in heal pending: 995
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick urd-gds-002:/urd-gds/gds-admin
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick urd-gds-000:/urd-gds/gds-admin
Status: Connected
Total Number of entries: 995
Number of entries in heal pending: 995
Number of entries in split-brain: 0
Number of entries possibly healing: 0



Statistics says (on both node urd-gds-000 and urd-gds-001):
Starting time of crawl: Tue Oct  5 14:25:08 2021

Ending time of crawl: Tue Oct  5 14:25:25 2021

Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 995


To me it seems as if node urd-gds-002 has old versions of the files.
I tried 2 of the files that were listed with filenames, and both urd-gds-000
and urd-gds-001 have the same gfid and the same timestamp for the file.
Node urd-gds-002 has a different gfid and an older timestamp.
The client could not access the file.
I manually removed the file and its gfid file from urd-gds-002 and these files
were healed.
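
For anyone else in the same situation, this is roughly what the manual cleanup
looked like (a sketch; the file path is just an example, the gfid is one from
my list):

# on the bad node (urd-gds-002): remove the stale file and its hardlink
# under .glusterfs/<first two chars>/<next two chars>/<gfid>
rm /urd-gds/gds-admin/some/dir/somefile
rm /urd-gds/gds-admin/.glusterfs/4e/20/4e203eb1-795e-433a-9403-753ba56575fd
# then let the self-heal daemon pick it up again
gluster volume heal gds-admin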

I have a long list of files with just gfids (995).
I tried to get the file path with (example):
getfattr -n trusted.glusterfs.pathinfo -e text 
/mnt/gds-admin/.gfid/4e203eb1-795e-433a-9403-753ba56575fd
getfattr: Removing leading '/' from absolute path names
# file: mnt/gds-admin/.gfid/4e203eb1-795e-433a-9403-753ba56575fd
trusted.glusterfs.pathinfo="( 

 
)"

This tells me that the file exists on node urd-gds-000 and urd-gds-001.
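
(For anyone trying the same: the .gfid/ virtual path above only shows up on a
mount done with the aux-gfid-mount option, roughly like this, assuming the
volume is mounted from urd-gds-001:)

mount -t glusterfs -o aux-gfid-mount urd-gds-001:/gds-admin /mnt/gds-admin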

I have been looking through glustershd.log and I see a similar error
over and over again on urd-gds-000 and urd-gds-001:
[2021-10-05 12:46:01.095509 +] I [MSGID: 108026] 
[afr-self-heal-entry.c:1052:afr_selfheal_entry_do] 0-gds-admin-replicate-0: 
performing entry selfheal on d0d8b20e-c9df-4b8b-ac2e-24697fdf9201
[2021-10-05 12:46:01.802920 +] E [MSGID: 114031] 
[client-rpc-fops_v2.c:211:client4_0_mkdir_cbk] 0-gds-admin-client-1: remote 
operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2021-10-05 12:46:01.803538 +] E [MSGID: 114031] 
[client-rpc-fops_v2.c:211:client4_0_mkdir_cbk] 0-gds-admin-client-2: remote 
operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2021-10-05 12:46:01.803612 +] E [MSGID: 114031] 
[client-rpc-fops_v2.c:211:client4_0_mkdir_cbk] 0-gds-admin-client-0: remote 
operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2021-10-05 12:46:01.908395 +] I [MSGID: 108026] 
[afr-self-heal-entry.c:1052:afr_selfheal_entry_do] 0-gds-admin-replicate-0: 
performing entry selfheal on 0e309af2-2538-440a-8fd0-392620e83d05
[2021-10-05 12:46:01.914909 +] E [MSGID: 114031] 
[client-rpc-fops_v2.c:211:client4_0_mkdir_cbk] 0-gds-admin-client-0: remote 
operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2021-10-05 12:46:01.915225 +] E [MSGID: 114031] 
[client-rpc-fops_v2.c:211:client4_0_mkdir_cbk] 0-gds-admin-client-1: remote 
operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2021-10-05 12:46:01.915

[Gluster-users] Warning at mount when servers are missing

2020-12-02 Thread Marcus Pedersén
Hi all,
I made a misstake with my DNS the other day
so two servers were configured wrong and did not answer
on DNS.

I have a gluster cluster that is Distributed-Replicate
with the bricks: 2 x (2 + 1) = 6
Two of the servers did not exist in DNS (the same replicated pair),
and I just did a temporary mount, got no errors or warnings when mounting,
and started to do my work.
After a while I realized that there were files missing
so I started to look into it.
If you are missing a replicated pair, gluster will quietly
mount the "rest" and you will only see half of your files.
To find this problem you have to look in the log files.
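
For anyone hitting the same thing, this is roughly what I had to look for in
the client log (the log file name follows the mount point, so the name below
is just an example):

grep -E 'Connected to|disconnected' /var/log/glusterfs/mnt-temporary.log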

I think it would be great to have a warning at
mount saying something like:
"Not all servers respond, you might not be able to see all files"
so you realize that you have a problem right away.
Running gluster version 8.2 on the client side.

Best regards
Marcus








[Gluster-users] Return previously broken server to gluster cluster

2020-11-03 Thread Marcus Pedersén
Hello all,
I have a gluster cluster like this:

Volume Name: gds-home
Type: Replicate
Volume ID: 3d9d7182-47a8-43ac-8cd1-6a090bb4b8b9
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: urd-gds-021:/urd-gds/gds-home
Brick2: urd-gds-022:/urd-gds/gds-home
Brick3: urd-gds-020:/urd-gds/gds-home (arbiter)
Options Reconfigured:
features.barrier: disable
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

Bricks 1 and 2 are both configured the same way.
They have a separate OS disk and the rest of the disks are all in one raid.
On top of this a thin lvm is created, and the gluster brick lies on the lvm.
On brick1 the backplane to the disks and the OS disk crashed;
this has been fixed and I have managed to recreate the raid and the lvm,
so all data on the brick is intact.
The peer is still disconnected.

How do I reconfigure brick1 (urd-gds-021) to be part of the gluster cluster again?

I assume that when you do peer probe and volume create, config
data is written to the OS disk.
I am guessing that gluster peer probe urd-gds-021 does not work as it is
already configured.
Do I do the following:
gluster peer detach urd-gds-021
gluster peer probe urd-gds-021
gluster volume replace-brick gds-home urd-gds-021:/brick urd-gds-021:/brick

I just want to be sure before I enter any commands so I do not destroy
instead of repairing.
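
To make the question concrete, this is the sequence I have in mind written out
in full (just a sketch with the real brick path filled in; I believe
replace-brick needs "commit force" these days, and I have not run any of it):

gluster peer detach urd-gds-021
gluster peer probe urd-gds-021
gluster volume replace-brick gds-home urd-gds-021:/urd-gds/gds-home \
    urd-gds-021:/urd-gds/gds-home commit force
gluster volume heal gds-home full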

Many thanks in advance!!

Best regards
Marcus







[Gluster-users] Upgrade from 4.1

2020-03-09 Thread Marcus Pedersén
Hi all,
I just want to double check before I start my upgrade of gluster.
I am running 4.1 (distributed, replicated) today and I want to upgrade
to version 7.
From what I understand from the documentation, there is no need to upgrade
to an intermediate version before I upgrade to version 7.
I can upgrade directly 4.1 -> 7, am I correct?

Second question, is there a problem running an older version of gluster
on the client side?
If servers are version 7 and clients have version 5 or 6, is this a problem?
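
Related to this, I assume the thing to check when mixing versions is the
cluster op-version; a quick sketch of the commands I have in mind:

gluster volume get all cluster.op-version       # op-version the cluster currently runs at
gluster volume get all cluster.max-op-version   # highest op-version all nodes support
# only after every server (and client) is upgraded:
gluster volume set all cluster.op-version <max-op-version>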

Many thanks in advance!

Best regards
Marcus







[Gluster-users] Snapshots inconsistent state

2019-07-09 Thread Marcus Pedersén
Hi all,
I have a problem with snapshots on my gluster system.
Running centos7 and gluster 4.1.3.
I am running a distributed-replicated system: 2 x (2 + 1)

I needed to remove some snapshots so I used:
gluster snapshot delete 

I missed that one of the snapshots was activated.
It took a long while before the command finished and
it finished with an error.

The problem now is that this snapshot is in an inconsistent state.
The snapshot is missing on node1, and on the rest of the nodes
(node2-6) it is activated and exists.
If I try to activate, deactivate or delete on node1 I get the error
Snapshot does not exist.
The snapshot exists in /run/gluster/snaps
But it is not mounted.
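
For completeness, these are the commands I have been using to check the state
on each node (<snapname> standing in for the name I left out above):

gluster snapshot list
gluster snapshot info <snapname>
gluster snapshot status <snapname>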

If I try to deactivate or delete on any of the other nodes I get:
Pre Validation failed on node1

If I try to activate it on any other node I get:
Already activated.

How do I solve this issue?

Many thanks in advance!!

Regards
Marcus

--
**
* Marcus Pedersén*
* System administrator   *
**
* Interbull Centre   *
*    *
* Department of Animal Breeding & Genetics — SLU *
* Box 7023, SE-750 07*
* Uppsala, Sweden*
**
* Visiting address:  *
* Room 55614, Ulls väg 26, Ultuna*
* Uppsala*
* Sweden *
**
* Tel: +46-(0)18-67 1962 *
**
**
* ISO 9001 Bureau Veritas No SE004561-1  *
**


[Gluster-users] Gluster snapshot & geo-replication

2018-11-16 Thread Marcus Pedersén
Hi all,

I am using CentOS 7 and Gluster version 4.1.3


I am using thin LVM and create snapshots once a day, of course deleting the
oldest ones after a while.

Creating a snap fails every now and then with the following different errors:

Error : Request timed out

or

failed: Brick ops failed on urd-gds-002. changelog notify failed

(Where the server name is a different host in the gluster cluster each time)


I have discovered that the log for snaps grows large, endlessly?
The log:

/var/log/glusterfs/snaps/urd-gds-volume/snapd.log

It is now 21G in size and continues to grow.

I removed the file about 2 weeks ago and it was about the same size.

Is this the way it should be?
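
In case it helps anyone else, a minimal logrotate sketch that would keep it in
check (my own assumption, plain logrotate on CentOS 7; copytruncate so the
daemon does not need a restart):

/var/log/glusterfs/snaps/*/snapd.log {
    weekly
    rotate 4
    compress
    missingok
    copytruncate
}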

See a part of the log below.




Secondly, I have stopped geo-replication as I never managed to make it
work.

Even when it is stopped and you try to pause geo-replication, you still get the
response:

Geo-replication paused successfully.

Should there be an error instead?


Resuming gives an error:

geo-replication command failed
Geo-replication session between urd-gds-volume and 
geouser@urd-gds-geo-001::urd-gds-volume is not Paused.


This is related to bug 1547446

https://bugzilla.redhat.com/show_bug.cgi?id=1547446

The fix should be present from 4.0 and onwards

Should I report this in the same bug?


Thanks a lot!


Best regards

Marcus Pedersén


/var/log/glusterfs/snaps/urd-gds-volume/snapd.log:

[2018-11-13 18:51:16.498206] E [server-handshake.c:402:server_first_lookup] 
0-urd-gds-volume-server: first lookup on subdir (/interbull/common) failed: 
Invalid argument
[2018-11-13 18:51:16.498752] I [MSGID: 115036] [server.c:483:server_rpc_notify] 
0-urd-gds-volume-server: disconnecting connection from 
iqn-A003.iqnet.org-2653-2018/08/14-18:53:49:637444-urd-gds-volume-snapd-client-0-1638773
[2018-11-13 18:51:16.502120] I [MSGID: 101055] [client_t.c:444:gf_client_unref] 
0-urd-gds-volume-server: Shutting down connection 
iqn-A003.iqnet.org-2653-2018/08/14-18:53:49:637444-urd-gds-volume-snapd-client-0-1638773
[2018-11-13 18:51:16.589263] I [addr.c:55:compare_addr_and_update] 
0-snapd-urd-gds-volume: allowed = "*", received addr = "192.168.67.118"
[2018-11-13 18:51:16.589324] I [MSGID: 115029] 
[server-handshake.c:763:server_setvolume] 0-urd-gds-volume-server: accepted 
client from 
iqn-D001.iqnet.org-20166-2018/08/14-19:10:55:360137-urd-gds-volume-snapd-client-0-1638735
 (version: 3.13.1)
[2018-11-13 18:51:16.593003] E [server-handshake.c:385:server_first_lookup] 
0-snapd-urd-gds-volume: lookup on root failed: Permission denied
[2018-11-13 18:51:16.593177] E [server-handshake.c:342:do_path_lookup] 
0-snapd-urd-gds-volume: first lookup on subdir (interbull) failed: Permission 
denied
[2018-11-13 18:51:16.593206] E [server-handshake.c:402:server_first_lookup] 
0-urd-gds-volume-server: first lookup on subdir (/interbull/home) failed: 
Invalid argument
[2018-11-13 18:51:16.593678] I [MSGID: 115036] [server.c:483:server_rpc_notify] 
0-urd-gds-volume-server: disconnecting connection from 
iqn-D001.iqnet.org-20166-2018/08/14-19:10:55:360137-urd-gds-volume-snapd-client-0-1638735
[2018-11-13 18:51:16.597201] I [MSGID: 101055] [client_t.c:444:gf_client_unref] 
0-urd-gds-volume-server: Shutting down connection 
iqn-D001.iqnet.org-20166-2018/08/14-19:10:55:360137-urd-gds-volume-snapd-client-0-1638735
[root@urd-gds-001 ~]# tail -n 100 
/var/log/glusterfs/snaps/urd-gds-volume/snapd.log
[2018-11-13 18:52:09.782058] I [MSGID: 115036] [server.c:483:server_rpc_notify] 
0-urd-gds-volume-server: disconnecting connection from 
iqn-A002.iqnet.org-24786-2018/08/14-18:39:54:890651-urd-gds-volume-snapd-client-0-1638767
[2018-11-13 18:52:09.785473] I [MSGID: 101055] [client_t.c:444:gf_client_unref] 
0-urd-gds-volume-server: Shutting down connection 
iqn-A002.iqnet.org-24786-2018/08/14-18:39:54:890651-urd-gds-volume-snapd-client-0-1638767
[2018-11-13 18:52:09.821147] I [addr.c:55:compare_addr_and_update] 
0-snapd-urd-gds-volume: allowed = "*", received addr = "192.168.67.115"
[2018-11-13 18:52:09.821233] I [MSGID: 115029] 
[server-handshake.c:763:server_setvolume] 0-urd-gds-volume-server: accepted 
client from 
iqn-B002.iqnet.org-14408-2018/08/14-18:57:57:94863-urd-gds-volume-snapd-client-0-1638666
 (version: 3.13.1)
[2018-11-13 18:52:09.825173] E [server-handshake.c:385:server_first_lookup] 
0-snapd-urd-gds-volume: lookup on root failed: Permission denied
[2018-11-13 18:52:09.825397] E [server-handshake.c:342:do_path_lookup] 
0-snapd-urd-gds-volume: first lookup on subdir (interbull) failed: Permission 
denied
[2018-11-13 18:52:09.825450] E [server-handshake.c:402:server_first_lookup] 
0-urd-gds-volume-server: first lookup on subdir (/interbull/common) failed: 
Invalid argument
[2018-11-13 18:52:09.825917] I [MSGID: 115036] [server.c:483:server_rpc_notify] 
0-urd-gds-volume-server: disconnecting connection from 
iqn-B002.iqnet.org-14408-2018/08/14-18:57:57:94863-urd-gds-volume-snapd-client-0-1638666
[2018

Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does not work Now: Upgraded to 4.1.3 geo node Faulty

2018-09-11 Thread Marcus Pedersén
Hi Milind,


 I do not know if this will help, but using ausearch on one of the master nodes 
gives this:



time->Tue Sep 11 03:28:56 2018
type=PROCTITLE msg=audit(1536629336.548:1202535): 
proctitle=2F7573722F7362696E2F676C7573746572667364002D73007572642D6764732D303031002D2D766F6C66696C652D6964007572642D6764732D766F6C756D652E7572642D6764732D3030312E7572642D6764732D676C7573746572002D70002F7661722F72756E2F676C75737465722F766F6C732F7572642D6764732D766F6C
type=SYSCALL msg=audit(1536629336.548:1202535): arch=c03e syscall=87 
success=yes exit=0 a0=7f99acce4d70 a1=7f99acce4ed0 a2=1 a3=6533393436373635 
items=0 ppid=1 pid=3582 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 
egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="glusteriotwr12" 
exe="/usr/sbin/glusterfsd" subj=system_u:system_r:glusterd_t:s0 key=(null)
type=AVC msg=audit(1536629336.548:1202535): avc:  denied  { unlink } for  
pid=3582 comm="glusteriotwr12" name="67b1d169-14d0-413c-8f76-5676493ea7b8" 
dev="dm-7" ino=36507351099 scontext=system_u:system_r:glusterd_t:s0 
tcontext=system_u:object_r:unlabeled_t:s0 tclass=sock_file

time->Tue Sep 11 04:49:29 2018
type=PROCTITLE msg=audit(1536634169.350:1207973): 
proctitle=2F7573722F7362696E2F676C7573746572667364002D73007572642D6764732D303031002D2D766F6C66696C652D6964007572642D6764732D766F6C756D652E7572642D6764732D3030312E7572642D6764732D676C7573746572002D70002F7661722F72756E2F676C75737465722F766F6C732F7572642D6764732D766F6C
type=SYSCALL msg=audit(1536634169.350:1207973): arch=c03e syscall=280 
success=yes exit=0 a0=ff9c a1=7f99ad6f5400 a2=7f99ad6f54a0 a3=100 
items=0 ppid=1 pid=3582 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 
egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="glusteriotwr12" 
exe="/usr/sbin/glusterfsd" subj=system_u:system_r:glusterd_t:s0 key=(null)
type=AVC msg=audit(1536634169.350:1207973): avc:  denied  { setattr } for  
pid=3582 comm="glusteriotwr12" name="fe949f56-bc6e-411a-9a40-ad6209fbbd89" 
dev="dm-7" ino=51546135597 scontext=system_u:system_r:glusterd_t:s0 
tcontext=system_u:object_r:unlabeled_t:s0 tclass=sock_file
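
For reference, the records above were pulled out roughly like this (a sketch;
the -c value matches the comm field in the records):

ausearch -m avc -ts today -c glusteriotwr12
# summarize what the denials would need, for inspection only
ausearch -m avc -ts today | audit2allow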



Best Regards

Marcus Pedersén




From: Milind Changire 
Sent: 11 September 2018 08:56
To: Marcus Pedersén
Cc: Kotresh Hiremath Ravishankar; gluster-users@gluster.org
Subject: Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does not work 
Now: Upgraded to 4.1.3 geo node Faulty

Marcus,
Is it possible to send over the SELinux errors that you encountered before 
turning it off ?
We could inspect and get the SELinux issues fixed as an aside.


On Mon, Sep 10, 2018 at 4:43 PM, Marcus Pedersén 
mailto:marcus.peder...@slu.se>> wrote:

Hi Kotresh,

I have been running 4.1.3 from the end of August.

Since then data has been synced to the geo side at a couple of hundred GB per 24 
hours, even with the errors I have reported in this thread.


Four days ago all data transfer to the geo side stopped, and the logs repeat the 
same error over and over again (see below).

Both nodes toggle status Active/Faulty.


Thanks a lot!


Best regards

Marcus


One master node, gsyncd.log:

[2018-09-10 10:53:38.409709] I [gsyncdstatus(monitor):244:set_worker_status] 
GeorepStatus: Worker Status Change status=Faulty
[2018-09-10 10:53:47.783914] I [gsyncd(config-get):297:main] : Using 
session config file   
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-09-10 10:53:47.852792] I [gsyncd(status):297:main] : Using session 
config file   
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-09-10 10:53:48.421061] I [monitor(monitor):158:monitor] Monitor: starting 
gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
[2018-09-10 10:53:48.462655] I [gsyncd(agent /urd-gds/gluster):297:main] : 
Using session config file   
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-09-10 10:53:48.463366] I [changelogagent(agent 
/urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-09-10 10:53:48.465905] I [gsyncd(worker /urd-gds/gluster):297:main] 
: Using session config file  
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-09-10 10:53:48.474558] I [resource(worker 
/urd-gds/gluster):1377:connect_remote] SSH: Initializing SSH connection between 
master and slave...
[2018-09-10 10:53:50.70219] I [resource(worker 
/urd-gds/gluster):1424:connect_remote] SSH: SSH connection between master and 
slave established. duration=1.5954
[2018-09-10 10:53:50.70777] I [resource(worker /urd-gds/gluster):1096:connect] 
GLUSTER: Mounting gluster volume locally...
[2018-09-10 10:53:51.170597] I [resource(worker /urd-gds/gluster):1119:connect] 
GLUSTER: Mounted gluster vol

Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does not work Now: Upgraded to 4.1.3 geo node Faulty

2018-09-10 Thread Marcus Pedersén
gluster):297:main] : 
Using session config file   
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf





From: Kotresh Hiremath Ravishankar 
Sent: 3 September 2018 07:58
To: Marcus Pedersén
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does not work 
Now: Upgraded to 4.1.3 geo node Faulty

Hi Marcus,

Geo-rep had few important fixes in 4.1.3. Is it possible to upgrade and check 
whether the issue is still seen?

Thanks,
Kotresh HR

On Sat, Sep 1, 2018 at 5:08 PM, Marcus Pedersén 
mailto:marcus.peder...@slu.se>> wrote:

Hi again,

I found another problem on the other master node.

The node toggles Active/Faulty and it is the same error over and over again.


[2018-09-01 11:23:02.94080] E [repce(worker /urd-gds/gluster):197:__call__] 
RepceClient: call failedcall=1226:139955262510912:1535800981.24 
method=entry_opserror=GsyncdError
[2018-09-01 11:23:02.94214] E [syncdutils(worker 
/urd-gds/gluster):300:log_raise_exception] : execution of "gluster" failed 
with ENOENT (No such file or directory)
[2018-09-01 11:23:02.106194] I [repce(agent /urd-gds/gluster):80:service_loop] 
RepceServer: terminating on reaching EOF.
[2018-09-01 11:23:02.12] I [gsyncdstatus(monitor):244:set_worker_status] 
GeorepStatus: Worker Status Change status=Faulty


I have also found a python error as well, I have only seen this once though.


[2018-09-01 11:16:45.907660] I [master(worker /urd-gds/gluster):1536:crawl] 
_GMaster: slave's time  stime=(1524101534, 0)
[2018-09-01 11:16:47.364109] E [syncdutils(worker 
/urd-gds/gluster):332:log_raise_exception] : FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 362, in 
twrap
tf(*aargs)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1939, in 
syncjob
po = self.sync_engine(pb, self.log_err)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1442, in 
rsync
rconf.ssh_ctl_args + \
AttributeError: 'NoneType' object has no attribute 'split'
[2018-09-01 11:16:47.384531] I [repce(agent /urd-gds/gluster):80:service_loop] 
RepceServer: terminating on reaching EOF.
[2018-09-01 11:16:48.362987] I [monitor(monitor):279:monitor] Monitor: worker 
died in startup phase brick=/urd-gds/gluster
[2018-09-01 11:16:48.370701] I [gsyncdstatus(monitor):244:set_worker_status] 
GeorepStatus: Worker Status Change status=Faulty
[2018-09-01 11:16:58.390548] I [monitor(monitor):158:monitor] Monitor: starting 
gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000


I attach the logs as well.


Many thanks!


Best regards

Marcus Pedersén





From: gluster-users-boun...@gluster.org on behalf of Marcus Pedersén 
Sent: 31 August 2018 16:09
To: khire...@redhat.com

Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does not work 
Now: Upgraded to 4.1.3 geo node Faulty


I realy appologize, third try to make mail smaller.


/Marcus



From: Marcus Pedersén
Sent: 31 August 2018 16:03
To: Kotresh Hiremath Ravishankar
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does not work 
Now: Upgraded to 4.1.3 geo node Faulty


Sorry, resend due to too large mail.


/Marcus


From: Marcus Pedersén
Sent: 31 August 2018 15:19
To: Kotresh Hiremath Ravishankar
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does not work 
Now: Upgraded to 4.1.3 geo node Faulty


Hi Kotresh,

Please find attached logs, only logs from today.

The python error was repeated over and over again until I disabled selinux.

After that the node became active again.

The return code 23 seems to be repeated over and over again.


rsync version 3.1.2


Thanks a lot!


Best regards

Marcus


____
From: Kotresh Hiremath Ravishankar 
Sent: 31 August 2018 11:09
To: Marcus Pedersén
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Was: Upgrade to 4.1.2 geo-replication does not work 
Now: Upgraded to 4.1.3 geo node Faulty

Hi Marcus,

Could you attach full logs? Is the same trace back happening repeatedly? It 
will be helpful you attach the corresponding mount log as well.
What's the rsync version, you are using?

Thanks,
Kotresh HR

On Fri, Aug 31, 2018

[Gluster-users] Was: Upgrade to 4.1.2 geo-replication does not work Now: Upgraded to 4.1.3 geo node Faulty

2018-08-31 Thread Marcus Pedersén
Hi all,

I had problems with the sync stopping after the upgrade to 4.1.2.

I upgraded to 4.1.3 and it ran fine for one day, but now one of the master 
nodes shows faulty.

Most of the sync jobs have return code 23, how do I resolve this?

I see messages like:

_GMaster: Sucessfully fixed all entry ops with gfid mismatch

Will this resolve error code 23?

There is also a python error.

The python error was a selinux problem; turning off selinux made the node go 
active again.

See log below.


CentOS 7, installed through SIG Gluster (OS updated to latest at the same time)

Master cluster: 2 x (2 + 1) distributed, replicated

Client cluster: 1 x (2 + 1) replicated


Many thanks in advance!


Best regards

Marcus Pedersén



gsyncd.log from Faulty node:

[2018-08-31 06:25:51.375267] I [master(worker /urd-gds/gluster):1944:syncjob] 
Syncer: Sync Time Taken   duration=0.8099 num_files=57job=3   return_code=23
[2018-08-31 06:25:51.465895] I [master(worker /urd-gds/gluster):1944:syncjob] 
Syncer: Sync Time Taken   duration=0.0904 num_files=3 job=3   return_code=23
[2018-08-31 06:25:52.562107] E [repce(worker /urd-gds/gluster):197:__call__] 
RepceClient: call failed   call=30069:139655665837888:1535696752.35
method=entry_opserror=OSError
[2018-08-31 06:25:52.562346] E [syncdutils(worker 
/urd-gds/gluster):332:log_raise_exception] : FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
func(args)
  File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in 
subcmd_worker
local.service_loop(remote)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1288, in 
service_loop
g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in 
crawlwrap
self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545, in crawl
self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445, in 
changelogs_batch_process
self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280, in 
process
self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179, in 
process_change
failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in 
__call__
return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in 
__call__
raise res
OSError: [Errno 13] Permission denied
[2018-08-31 06:25:52.578367] I [repce(agent /urd-gds/gluster):80:service_loop] 
RepceServer: terminating on reaching EOF.
[2018-08-31 06:25:53.558765] I [monitor(monitor):279:monitor] Monitor: worker 
died in startup phase brick=/urd-gds/gluster
[2018-08-31 06:25:53.569777] I [gsyncdstatus(monitor):244:set_worker_status] 
GeorepStatus: Worker Status Change status=Faulty
[2018-08-31 06:26:03.593161] I [monitor(monitor):158:monitor] Monitor: starting 
gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
[2018-08-31 06:26:03.636452] I [gsyncd(agent /urd-gds/gluster):297:main] : 
Using session config file   
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-31 06:26:03.636810] I [gsyncd(worker /urd-gds/gluster):297:main] 
: Using session config file  
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-31 06:26:03.637486] I [changelogagent(agent 
/urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-08-31 06:26:03.650330] I [resource(worker 
/urd-gds/gluster):1377:connect_remote] SSH: Initializing SSH connection between 
master and slave...
[2018-08-31 06:26:05.296473] I [resource(worker 
/urd-gds/gluster):1424:connect_remote] SSH: SSH connection between master and 
slave established.duration=1.6457
[2018-08-31 06:26:05.297904] I [resource(worker /urd-gds/gluster):1096:connect] 
GLUSTER: Mounting gluster volume locally...
[2018-08-31 06:26:06.396939] I [resource(worker /urd-gds/gluster):1119:connect] 
GLUSTER: Mounted gluster volume duration=1.0985
[2018-08-31 06:26:06.397691] I [subcmds(worker 
/urd-gds/gluster):70:subcmd_worker] : Worker spawn successful. 
Acknowledging back to monitor
[2018-08-31 06:26:16.815566] I [master(worker /urd-gds/gluster):1593:register] 
_GMaster: Working dir
path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
[2018-08-31 06:26:16.816423] I [resource(worker 
/urd-gds/gluster):1282:service_loop] GLUSTER: Register time time=1535696776
[2018-08-31 06:26:16.888772] I [gsyncdstatus(worker 
/urd-gds/gluster):277:set_active] GeorepStatus: Worker Status Change
status=Active
[2018-08-31 06:26:16.892049] I [gs

Re: [Gluster-users] Geo-replication stops after 4-5 hours

2018-08-15 Thread Marcus Pedersén
Hi again Sunny,
Just a bit curious whether you have found anything useful in the logs that can 
help me get the geo-replication running.

Many thanks in advance!

Regards
Marcus

From: gluster-users-boun...@gluster.org on behalf of 
Marcus Pedersén 
Sent: 13 August 2018 22:45
To: Sunny Kumar
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Geo-replication stops after 4-5 hours

Hi Sunny,
Please find enclosed the mount logs for the two active master nodes.
I cut them down to today's logs.

Thanks!

Marcus


From: Sunny Kumar 
Sent: 13 August 2018 21:49
To: Marcus Pedersén
Cc: Kotresh Hiremath Ravishankar; gluster-users@gluster.org
Subject: Re: [Gluster-users] Geo-replication stops after 4-5 hours

Hi Marcus,

Can you please share mount log from slave (You can find it at
"/var/log/glusterfs/geo-replication-slaves/hostname/mnt.log").

- Sunny
On Tue, Aug 14, 2018 at 12:48 AM Marcus Pedersén  wrote:
>
> Hi again,
>
> New changes in behaviour: both master nodes that are active toggle to 
> Faulty and the logs repeat the same errors over and over again.
>
>
> Part of log, node1:
>
> [2018-08-13 18:24:44.701711] I [gsyncdstatus(worker 
> /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change
> status=Active
> [2018-08-13 18:24:44.704360] I [gsyncdstatus(worker 
> /urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status 
> Changestatus=History Crawl
> [2018-08-13 18:24:44.705162] I [master(worker /urd-gds/gluster):1448:crawl] 
> _GMaster: starting history crawlturns=1 stime=(1523907056, 0)   
> entry_stime=Noneetime=1534184684
> [2018-08-13 18:24:45.717072] I [master(worker /urd-gds/gluster):1477:crawl] 
> _GMaster: slave's time  stime=(1523907056, 0)
> [2018-08-13 18:24:45.904958] E [repce(worker /urd-gds/gluster):197:__call__] 
> RepceClient: call failed   call=5919:140339726538560:1534184685.88 
> method=entry_opserror=GsyncdError
> [2018-08-13 18:24:45.905111] E [syncdutils(worker 
> /urd-gds/gluster):298:log_raise_exception] : execution of "gluster" 
> failed with ENOENT (No such file or directory)
> [2018-08-13 18:24:45.919265] I [repce(agent 
> /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
> [2018-08-13 18:24:46.553194] I [monitor(monitor):272:monitor] Monitor: worker 
> died in startup phase brick=/urd-gds/gluster
> [2018-08-13 18:24:46.561784] I [gsyncdstatus(monitor):243:set_worker_status] 
> GeorepStatus: Worker Status Change status=Faulty
> [2018-08-13 18:24:56.581748] I [monitor(monitor):158:monitor] Monitor: 
> starting gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
> [2018-08-13 18:24:56.655164] I [gsyncd(worker /urd-gds/gluster):297:main] 
> : Using session config file  
> path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-13 18:24:56.655193] I [gsyncd(agent /urd-gds/gluster):297:main] 
> : Using session config file   
> path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-13 18:24:56.655889] I [changelogagent(agent 
> /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-08-13 18:24:56.664628] I [resource(worker 
> /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection 
> between master and slave...
> [2018-08-13 18:24:58.347415] I [resource(worker 
> /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and 
> slave established.duration=1.6824
> [2018-08-13 18:24:58.348151] I [resource(worker 
> /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> [2018-08-13 18:24:59.463598] I [resource(worker 
> /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume 
> duration=1.1150
> [2018-08-13 18:24:59.464184] I [subcmds(worker 
> /urd-gds/gluster):70:subcmd_worker] : Worker spawn successful. 
> Acknowledging back to monitor
> [2018-08-13 18:25:01.549007] I [master(worker 
> /urd-gds/gluster):1534:register] _GMaster: Working dir
> path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
> [2018-08-13 18:25:01.549606] I [resource(worker 
> /urd-gds/gluster):1253:service_loop] GLUSTER: Register time 
> time=1534184701
> [2018-08-13 18:25:01.593946] I [gsyncdstatus(worker 
> /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change
> status=Active
>
>
> Part of log, node2:
>
> [2018-08-13 18:25:14.554233] I [gsyncdstatus(monitor):243:set_worker_status] 
> GeorepStatus: Worker Status Change status=Faulty
> [2018-08-13 18:25:24.568727] I [monitor(monitor):158:monitor] Moni

Re: [Gluster-users] Geo-replication stops after 4-5 hours

2018-08-13 Thread Marcus Pedersén
worker /urd-gds/gluster):1534:register] 
_GMaster: Working dir
path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
[2018-08-13 18:25:37.769479] I [resource(worker 
/urd-gds/gluster):1253:service_loop] GLUSTER: Register time time=1534184737
[2018-08-13 18:25:37.787317] I [gsyncdstatus(worker 
/urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change
status=Active
[2018-08-13 18:25:37.789822] I [gsyncdstatus(worker 
/urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status 
Changestatus=History Crawl
[2018-08-13 18:25:37.790008] I [master(worker /urd-gds/gluster):1448:crawl] 
_GMaster: starting history crawlturns=1 stime=(1525290650, 0)   
entry_stime=(1525296245, 0) etime=1534184737
[2018-08-13 18:25:37.791222] I [master(worker /urd-gds/gluster):1477:crawl] 
_GMaster: slave's time  stime=(1525290650, 0)
[2018-08-13 18:25:38.63499] I [master(worker /urd-gds/gluster):1301:process] 
_GMaster: Skipping already processed entry ops to_changelog=1525290651 
num_changelogs=1from_changelog=1525290651
[2018-08-13 18:25:38.63621] I [master(worker /urd-gds/gluster):1315:process] 
_GMaster: Entry Time Taken MKD=0   MKN=0   LIN=0   SYM=0   REN=0   RMD=0   
CRE=0   duration=0. UNL=0
[2018-08-13 18:25:38.63678] I [master(worker /urd-gds/gluster):1325:process] 
_GMaster: Data/Metadata Time Taken SETA=1  SETX=0  meta_duration=0.0228
data_duration=0.2456DATA=0  XATT=0
[2018-08-13 18:25:38.63822] I [master(worker /urd-gds/gluster):1335:process] 
_GMaster: Batch Completed  changelog_end=1525290651
entry_stime=(1525296245, 0) changelog_start=1525290651  stime=(152\
5290650, 0)   duration=0.2723 num_changelogs=1mode=history_changelog
[2018-08-13 18:25:38.73400] I [master(worker /urd-gds/gluster):1477:crawl] 
_GMaster: slave's time   stime=(1525290650, 0)
[2018-08-13 18:25:38.480941] I [master(worker /urd-gds/gluster):1885:syncjob] 
Syncer: Sync Time Taken   duration=0.1327 num_files=3 job=3   return_code=23
[2018-08-13 18:25:39.963423] I [master(worker /urd-gds/gluster):1885:syncjob] 
Syncer: Sync Time Taken   duration=0.1133 num_files=8 job=1   return_code=23
[2018-08-13 18:25:39.980724] I [master(worker /urd-gds/gluster):1885:syncjob] 
Syncer: Sync Time Taken   duration=0.6315 num_files=47job=2   return_code=23


...


[2018-08-13 18:26:04.534953] I [master(worker /urd-gds/gluster):1885:syncjob] 
Syncer: Sync Time Taken   duration=0.0988 num_files=18job=2   return_code=23
[2018-08-13 18:26:07.798583] I [master(worker /urd-gds/gluster):1885:syncjob] 
Syncer: Sync Time Taken   duration=0.2600 num_files=27job=2   return_code=23
[2018-08-13 18:26:08.708100] I [master(worker /urd-gds/gluster):1885:syncjob] 
Syncer: Sync Time Taken   duration=0.4090 num_files=67job=2   return_code=23
[2018-08-13 18:26:14.865883] E [repce(worker /urd-gds/gluster):197:__call__] 
RepceClient: call failed   call=18662:140079998809920:1534184774.58
method=entry_opserror=GsyncdError
[2018-08-13 18:26:14.866166] E [syncdutils(worker 
/urd-gds/gluster):298:log_raise_exception] : execution of "gluster" failed 
with ENOENT (No such file or directory)
[2018-08-13 18:26:14.991022] I [repce(agent /urd-gds/gluster):80:service_loop] 
RepceServer: terminating on reaching EOF.
[2018-08-13 18:26:15.384844] I [monitor(monitor):272:monitor] Monitor: worker 
died in startup phase brick=/urd-gds/gluster
[2018-08-13 18:26:15.397360] I [gsyncdstatus(monitor):243:set_worker_status] 
GeorepStatus: Worker Status Change status=Faulty


Help would be appreciated!

Thanks!


Regards

Marcus Pedersén



From: gluster-users-boun...@gluster.org on behalf of 
Marcus Pedersén 
Sent: 12 August 2018 22:18
To: khire...@redhat.com
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Geo-replication stops after 4-5 hours


Hi,

As the geo-replication stopped after 4-5 hours, I added a cron job that 
stops geo-replication, pauses for 2 mins and starts it again every 6 hours.

The cron job has been running for 5 days and the changelogs have been catching 
up.


Now a different behavior has shown up.

In one of the active master nodes I get a python error.

The other active master node has started to toggle status between active and 
faulty.

See parts of logs below.


When I read Troubleshooting Geo-replication, there is a suggestion that when sync is 
not complete, you can enforce a full sync of the data by erasing the index and 
restarting GlusterFS geo-replication.

There is no explanation of how to erase the index.

Should I enforce a full sync?

How do I erase the index?


Thanks a lot!


Best regards

Marcus Pedersén



Node with python error:

[2018-08-12 16:02:05.304924] I [resource(worker 
/urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between 
master and slave...
[2018-08-12 16:02:06.842832] I [reso

Re: [Gluster-users] Geo-replication stops after 4-5 hours

2018-08-12 Thread Marcus Pedersén
Hi,

As the geo-replication stopped after 4-5 hours, I added a cron job that 
stops geo-replication, pauses for 2 mins and starts it again every 6 hours.
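
For reference, the cron job is essentially this (a sketch, reconstructed here
in /etc/cron.d format; volume and slave names as elsewhere in this thread):

# /etc/cron.d/glusterfs-georep-restart  (sketch; one line, with user field)
0 */6 * * * root  gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume stop ; sleep 120 ; gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume start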

The cron job has been running for 5 days and the changelogs have been catching 
up.


Now a different behavior has shown up.

In one of the active master nodes I get a python error.

The other active master node has started to toggle status between active and 
faulty.

See parts of logs below.


When I read Troubleshooting Geo-replication, there is a suggestion that when sync is 
not complete, you can enforce a full sync of the data by erasing the index and 
restarting GlusterFS geo-replication.

There is no explanation of how to erase the index.

Should I enforce a full sync?

How do I erase the index?


Thanks a lot!


Best regards

Marcus Pedersén



Node with python error:

[2018-08-12 16:02:05.304924] I [resource(worker 
/urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between 
master and slave...
[2018-08-12 16:02:06.842832] I [resource(worker 
/urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and 
slave established.duration=1.5376
[2018-08-12 16:02:06.843370] I [resource(worker /urd-gds/gluster):1067:connect] 
GLUSTER: Mounting gluster volume locally...
[2018-08-12 16:02:07.930706] I [resource(worker /urd-gds/gluster):1090:connect] 
GLUSTER: Mounted gluster volume duration=1.0869
[2018-08-12 16:02:07.931536] I [subcmds(worker 
/urd-gds/gluster):70:subcmd_worker] : Worker spawn successful. 
Acknowledging back to monitor
[2018-08-12 16:02:20.759797] I [master(worker /urd-gds/gluster):1534:register] 
_GMaster: Working dir
path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
[2018-08-12 16:02:20.760411] I [resource(worker 
/urd-gds/gluster):1253:service_loop] GLUSTER: Register time time=1534089740
[2018-08-12 16:02:20.831918] I [gsyncdstatus(worker 
/urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change
status=Active
[2018-08-12 16:02:20.835541] I [gsyncdstatus(worker 
/urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status 
Changestatus=History Crawl
[2018-08-12 16:02:20.836832] I [master(worker /urd-gds/gluster):1448:crawl] 
_GMaster: starting history crawlturns=1 stime=(1523906126, 0)   
entry_stime=Noneetime=1534089740
[2018-08-12 16:02:21.848570] I [master(worker /urd-gds/gluster):1477:crawl] 
_GMaster: slave's time  stime=(1523906126, 0)
[2018-08-12 16:02:21.950453] E [syncdutils(worker 
/urd-gds/gluster):330:log_raise_exception] : FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 360, in 
twrap
tf(*aargs)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1880, in 
syncjob
po = self.sync_engine(pb, self.log_err)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1413, in 
rsync
rconf.ssh_ctl_args + \
AttributeError: 'NoneType' object has no attribute 'split'
[2018-08-12 16:02:21.975228] I [repce(agent /urd-gds/gluster):80:service_loop] 
RepceServer: terminating on reaching EOF.
[2018-08-12 16:02:22.947170] I [monitor(monitor):272:monitor] Monitor: worker 
died in startup phase brick=/urd-gds/gluster
[2018-08-12 16:02:22.954096] I [gsyncdstatus(monitor):243:set_worker_status] 
GeorepStatus: Worker Status Change status=Faulty
[2018-08-12 16:02:32.973948] I [monitor(monitor):158:monitor] Monitor: starting 
gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
[2018-08-12 16:02:33.16155] I [gsyncd(agent /urd-gds/gluster):297:main] : 
Using session config file
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-12 16:02:33.16882] I [changelogagent(agent 
/urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-08-12 16:02:33.17292] I [gsyncd(worker /urd-gds/gluster):297:main] : 
Using session config file   
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-12 16:02:33.26951] I [resource(worker 
/urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between 
master and slave...
[2018-08-12 16:02:34.642838] I [resource(worker 
/urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and 
slave established.duration=1.6156
[2018-08-12 16:02:34.643369] I [resource(worker /urd-gds/gluster):1067:connect] 
GLUSTER: Mounting gluster volume locally...




Node that toggles status between active and faulty:

[2018-08-12 19:33:03.475833] I [master(worker /urd-gds/gluster):1885:syncjob] 
Syncer: Sync Time Taken   duration=0.2757 num_files=27job=2   return_code=23
[2018-08-12 19:33:04.818854] I [master(worker /urd-gds/gluster):1885:syncjob] 
Syncer: Sync Time Taken   duration=0.3767 num_files=67job=1   return_code=23
[2018-08-12 19:33:09.926820] E [re

Re: [Gluster-users] Geo-replication stops after 4-5 hours

2018-08-06 Thread Marcus Pedersén
Hi,

Is there a way to resolve the problem with rsync and hanging processes?

Do I need to kill all the processes and hope that it starts again, or stop/start 
geo-replication?


If I stop/start geo-replication it will start again, I have tried it before.


Regards

Marcus




From: gluster-users-boun...@gluster.org on behalf of 
Marcus Pedersén 
Sent: 2 August 2018 10:04
To: Kotresh Hiremath Ravishankar
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Geo-replication stops after 4-5 hours

Hi Kotresh,

I get the following and then it hangs:

strace: Process 5921 attached
write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 12811


When sync is running I can see rsync with geouser on the slave node.

Regards
Marcus

########
Marcus Pedersén
Systemadministrator
Interbull Centre

Sent from my phone


On 2 Aug 2018 09:31, Kotresh Hiremath Ravishankar wrote:
Cool, just check whether they are hung by any chance with following command.

#strace -f -p 5921

On Thu, Aug 2, 2018 at 12:25 PM, Marcus Pedersén 
mailto:marcus.peder...@slu.se>> wrote:
On both active master nodes there is an rsync process. As in:

root  5921  0.0  0.0 115424  1176 ?SAug01   0:00 rsync -aR0 
--inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs 
--xattrs --acls . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no 
-i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S 
/tmp/gsyncd-aux-ssh-stuphs/bf60c68f1a195dad59573a8dbaa309f2.sock 
geouser@urd-gds-geo-001:/proc/13077/cwd

There is also ssh tunnels to slave nodes and  gsyncd.py processes.

Regards
Marcus

####
Marcus Pedersén
Systemadministrator
Interbull Centre

Sent from my phone


On 2 Aug 2018 08:07, Kotresh Hiremath Ravishankar wrote:
Could you look of any rsync processes hung in master or slave?

On Thu, Aug 2, 2018 at 11:18 AM, Marcus Pedersén <marcus.peder...@slu.se> wrote:
Hi Kotresh,
rsync  version 3.1.2  protocol version 31
All nodes run CentOS 7, updated the last couple of days.

Thanks
Marcus


Marcus Pedersén
Systemadministrator
Interbull Centre

Sent from my phone



On 2 Aug 2018 06:13, Kotresh Hiremath Ravishankar <khire...@redhat.com> wrote:
Hi Marcus,

What's the rsync version being used?

Thanks,
Kotresh HR

On Thu, Aug 2, 2018 at 1:48 AM, Marcus Pedersén <marcus.peder...@slu.se> wrote:

Hi all!

I upgraded from 3.12.9 to 4.1.1 and had problems with geo-replication.

With help from the list with some sym links and so on (handled in another 
thread)

I got the geo-replication running.

It ran for 4-5 hours and then stopped, I stopped and started geo-replication 
and it ran for another 4-5 hours.

4.1.2 was released and I updated, hoping this would solve the problem.

I still have the same problem, at start it runs for 4-5 hours and then it stops.

After that nothing happens, I have waited for days but still nothing happens.


I have looked through logs but can not find anything obvious.
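
For anyone digging through the same logs, the master-side worker log for the session lives under /var/log/glusterfs/geo-replication/ in a directory named after the session; the exact path below is an assumption that mirrors the gsyncd.conf path shown in the log excerpts:

# worker log for this master/slave session (path is an assumption, check your node)
less /var/log/glusterfs/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.log
# show the most recent error lines
grep -n '] E ' /var/log/glusterfs/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.log | tail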


Status for geo-replication is active for the two same nodes all the time:


MASTER NODEMASTER VOLMASTER BRICK SLAVE USERSLAVE   
   SLAVE NODE STATUS CRAWL STATUS 
LAST_SYNCEDENTRYDATA METAFAILURESCHECKPOINT TIME
CHECKPOINT COMPLETEDCHECKPOINT COMPLETION TIME
---
urd-gds-001urd-gds-volume/urd-gds/gluster geouser   
geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-000Active 
History Crawl2018-04-16 20:32:090142050   0   
2018-07-27 21:12:44No  N/A
urd-gds-002urd-gds-volume/urd-gds/gluster geouser   
geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-002PassiveN/A
  N/AN/A  N/A  N/A N/A N/A  
  N/A N/A
urd-gds-004urd-gds-volume/urd-gds/gluster geouser   
geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-002PassiveN/A
  N/AN/A  N/A  N/A N/A N/A  
  N/A N/A
urd-gds-003urd-gds-volume/urd-gds/gluster geouser   
geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-000Active 
History Crawl2018-05-01 20:58:14285  4552 0   0   
2018-07-27 21:12:44No  N/A
urd-gds-000urd-gds-volume/urd-gds/gluster1geouser   

Re: [Gluster-users] Geo-replication stops after 4-5 hours

2018-08-02 Thread Marcus Pedersén
Hi Kotresh,
rsync  version 3.1.2  protocol version 31
All nodes run CentOS 7, updated the last couple of days.

Thanks
Marcus


Marcus Pedersén
Systemadministrator
Interbull Centre

Sent from my phone



On 2 Aug 2018 06:13, Kotresh Hiremath Ravishankar wrote:
Hi Marcus,

What's the rsync version being used?

Thanks,
Kotresh HR

On Thu, Aug 2, 2018 at 1:48 AM, Marcus Pedersén <marcus.peder...@slu.se> wrote:

Hi all!

I upgraded from 3.12.9 to 4.1.1 and had problems with geo-replication.

With help from the list with some sym links and so on (handled in another 
thread)

I got the geo-replication running.

It ran for 4-5 hours and then stopped, I stopped and started geo-replication 
and it ran for another 4-5 hours.

4.1.2 was released and I updated, hoping this would solve the problem.

I still have the same problem, at start it runs for 4-5 hours and then it stops.

After that nothing happens, I have waited for days but still nothing happens.


I have looked through logs but can not find anything obvious.


Status for geo-replication is active for the two same nodes all the time:


MASTER NODEMASTER VOLMASTER BRICK SLAVE USERSLAVE   
   SLAVE NODE STATUS CRAWL STATUS 
LAST_SYNCEDENTRYDATA METAFAILURESCHECKPOINT TIME
CHECKPOINT COMPLETEDCHECKPOINT COMPLETION TIME
---
urd-gds-001urd-gds-volume/urd-gds/gluster geouser   
geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-000Active 
History Crawl2018-04-16 20:32:090142050   0   
2018-07-27 21:12:44No  N/A
urd-gds-002urd-gds-volume/urd-gds/gluster geouser   
geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-002PassiveN/A
  N/AN/A  N/A  N/A N/A N/A  
  N/A N/A
urd-gds-004urd-gds-volume/urd-gds/gluster geouser   
geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-002PassiveN/A
  N/AN/A  N/A  N/A N/A N/A  
  N/A N/A
urd-gds-003urd-gds-volume/urd-gds/gluster geouser   
geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-000Active 
History Crawl2018-05-01 20:58:14285  4552 0   0   
2018-07-27 21:12:44No  N/A
urd-gds-000urd-gds-volume/urd-gds/gluster1geouser   
geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-001PassiveN/A
  N/AN/A  N/A  N/A N/A N/A  
  N/A N/A
urd-gds-000urd-gds-volume/urd-gds/gluster2geouser   
geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-001PassiveN/A
  N/AN/A  N/A  N/A N/A N/A  
  N/A N/A
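
For anyone reproducing this, the table above is the output of the status command; a sketch with this thread's session names (the detail variant is what adds the ENTRY/DATA/META/FAILURES and checkpoint columns):

gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume status detail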


Master cluster is Distribute-Replicate

2 x (2 + 1)

Used space 30TB


Slave cluster is Replicate

1 x (2 + 1)

Used space 9TB


Parts from gsyncd.logs are enclosed.


Thanks a lot!


Best regards

Marcus Pedersén




---
E-mailing SLU will result in SLU processing your personal data. For more 
information on how this is done, click here 
<https://www.slu.se/en/about-slu/contact-slu/personal-data/>

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users



--
Thanks and Regards,
Kotresh H R


---
E-mailing SLU will result in SLU processing your personal data. For more 
information on how this is done, click here 
<https://www.slu.se/en/about-slu/contact-slu/personal-data/>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Geo-replication stops after 4-5 hours

2018-08-02 Thread Marcus Pedersén
Hi Kotresh,

I get the following and then it hangs:

strace: Process 5921 attached
write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 12811


When sync is running I can see rsync with geouser on the slave node.

Regards
Marcus

########
Marcus Pedersén
Systemadministrator
Interbull Centre

Sent from my phone


On 2 Aug 2018 09:31, Kotresh Hiremath Ravishankar wrote:
Cool, just check whether they are hung by any chance with the following command.

#strace -f -p 5921

On Thu, Aug 2, 2018 at 12:25 PM, Marcus Pedersén <marcus.peder...@slu.se> wrote:
On both active master nodes there is an rsync process. As in:

root  5921  0.0  0.0 115424  1176 ?SAug01   0:00 rsync -aR0 
--inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs 
--xattrs --acls . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no 
-i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S 
/tmp/gsyncd-aux-ssh-stuphs/bf60c68f1a195dad59573a8dbaa309f2.sock 
geouser@urd-gds-geo-001:/proc/13077/cwd

There are also ssh tunnels to the slave nodes and gsyncd.py processes.

Regards
Marcus

####
Marcus Pedersén
Systemadministrator
Interbull Centre

Sent from my phone


On 2 Aug 2018 08:07, Kotresh Hiremath Ravishankar <khire...@redhat.com> wrote:
Could you look for any rsync processes hung on the master or slave?

On Thu, Aug 2, 2018 at 11:18 AM, Marcus Pedersén <marcus.peder...@slu.se> wrote:
Hi Kotresh,
rsync  version 3.1.2  protocol version 31
All nodes run CentOS 7, updated the last couple of days.

Thanks
Marcus


Marcus Pedersén
Systemadministrator
Interbull Centre

Sent from my phone



On 2 Aug 2018 06:13, Kotresh Hiremath Ravishankar <khire...@redhat.com> wrote:
Hi Marcus,

What's the rsync version being used?

Thanks,
Kotresh HR

On Thu, Aug 2, 2018 at 1:48 AM, Marcus Pedersén <marcus.peder...@slu.se> wrote:

Hi all!

I upgraded from 3.12.9 to 4.1.1 and had problems with geo-replication.

With help from the list with some sym links and so on (handled in another 
thread)

I got the geo-replication running.

It ran for 4-5 hours and then stopped, I stopped and started geo-replication 
and it ran for another 4-5 hours.

4.1.2 was released and I updated, hoping this would solve the problem.

I still have the same problem, at start it runs for 4-5 hours and then it stops.

After that nothing happens, I have waited for days but still nothing happens.


I have looked through logs but can not find anything obvious.


Status for geo-replication is active for the two same nodes all the time:


MASTER NODEMASTER VOLMASTER BRICK SLAVE USERSLAVE   
   SLAVE NODE STATUS CRAWL STATUS 
LAST_SYNCEDENTRYDATA METAFAILURESCHECKPOINT TIME
CHECKPOINT COMPLETEDCHECKPOINT COMPLETION TIME
---
urd-gds-001urd-gds-volume/urd-gds/gluster geouser   
geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-000Active 
History Crawl2018-04-16 20:32:090142050   0   
2018-07-27 21:12:44No  N/A
urd-gds-002urd-gds-volume/urd-gds/gluster geouser   
geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-002PassiveN/A
  N/AN/A  N/A  N/A N/A N/A  
  N/A N/A
urd-gds-004urd-gds-volume/urd-gds/gluster geouser   
geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-002PassiveN/A
  N/AN/A  N/A  N/A N/A N/A  
  N/A N/A
urd-gds-003urd-gds-volume/urd-gds/gluster geouser   
geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-000Active 
History Crawl2018-05-01 20:58:14285  4552 0   0   
2018-07-27 21:12:44No  N/A
urd-gds-000urd-gds-volume/urd-gds/gluster1geouser   
geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-001PassiveN/A
  N/AN/A  N/A  N/A N/A N/A  
  N/A N/A
urd-gds-000urd-gds-volume/urd-gds/gluster2geouser   
geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-001PassiveN/A
  N/AN/A  N/A  N/A N/A N/A  
  N/A N/A


Master cluster is Distribute-Replicate

2 x (2 + 1)

Used s

Re: [Gluster-users] Geo-replication stops after 4-5 hours

2018-08-01 Thread Marcus Pedersén
On both active master nodes there is an rsync process. As in:

root  5921  0.0  0.0 115424  1176 ?SAug01   0:00 rsync -aR0 
--inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs 
--xattrs --acls . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no 
-i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S 
/tmp/gsyncd-aux-ssh-stuphs/bf60c68f1a195dad59573a8dbaa309f2.sock 
geouser@urd-gds-geo-001:/proc/13077/cwd

There are also ssh tunnels to the slave nodes and gsyncd.py processes.

Regards
Marcus


Marcus Pedersén
Systemadministrator
Interbull Centre

Sent from my phone


On 2 Aug 2018 08:07, Kotresh Hiremath Ravishankar wrote:
Could you look for any rsync processes hung on the master or slave?

On Thu, Aug 2, 2018 at 11:18 AM, Marcus Pedersén <marcus.peder...@slu.se> wrote:
Hi Kotresh,
rsync  version 3.1.2  protocol version 31
All nodes run CentOS 7, updated the last couple of days.

Thanks
Marcus

####
Marcus Pedersén
Systemadministrator
Interbull Centre

Sent from my phone



On 2 Aug 2018 06:13, Kotresh Hiremath Ravishankar <khire...@redhat.com> wrote:
Hi Marcus,

What's the rsync version being used?

Thanks,
Kotresh HR

On Thu, Aug 2, 2018 at 1:48 AM, Marcus Pedersén <marcus.peder...@slu.se> wrote:

Hi all!

I upgraded from 3.12.9 to 4.1.1 and had problems with geo-replication.

With help from the list with some sym links and so on (handled in another 
thread)

I got the geo-replication running.

It ran for 4-5 hours and then stopped, I stopped and started geo-replication 
and it ran for another 4-5 hours.

4.1.2 was released and I updated, hoping this would solve the problem.

I still have the same problem, at start it runs for 4-5 hours and then it stops.

After that nothing happens, I have waited for days but still nothing happens.


I have looked through logs but can not find anything obvious.


Status for geo-replication is active for the two same nodes all the time:


MASTER NODEMASTER VOLMASTER BRICK SLAVE USERSLAVE   
   SLAVE NODE STATUS CRAWL STATUS 
LAST_SYNCEDENTRYDATA METAFAILURESCHECKPOINT TIME
CHECKPOINT COMPLETEDCHECKPOINT COMPLETION TIME
---
urd-gds-001urd-gds-volume/urd-gds/gluster geouser   
geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-000Active 
History Crawl2018-04-16 20:32:090142050   0   
2018-07-27 21:12:44No  N/A
urd-gds-002urd-gds-volume/urd-gds/gluster geouser   
geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-002PassiveN/A
  N/AN/A  N/A  N/A N/A N/A  
  N/A N/A
urd-gds-004urd-gds-volume/urd-gds/gluster geouser   
geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-002PassiveN/A
  N/AN/A  N/A  N/A N/A N/A  
  N/A N/A
urd-gds-003urd-gds-volume/urd-gds/gluster geouser   
geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-000Active 
History Crawl2018-05-01 20:58:14285  4552 0   0   
2018-07-27 21:12:44No  N/A
urd-gds-000urd-gds-volume/urd-gds/gluster1geouser   
geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-001PassiveN/A
  N/AN/A  N/A  N/A N/A N/A  
  N/A N/A
urd-gds-000urd-gds-volume/urd-gds/gluster2geouser   
geouser@urd-gds-geo-001::urd-gds-volumeurd-gds-geo-001PassiveN/A
  N/AN/A  N/A  N/A N/A N/A  
  N/A N/A


Master cluster is Distribute-Replicate

2 x (2 + 1)

Used space 30TB


Slave cluster is Replicate

1 x (2 + 1)

Used space 9TB


Parts from gsyncd.logs are enclosed.


Thanks a lot!


Best regards

Marcus Pedersén




---
E-mailing SLU will result in SLU processing your personal data. For more 
information on how this is done, click here 
<https://www.slu.se/en/about-slu/contact-slu/personal-data/>

___
Gluster-users mailing list
Gluster-users@gluster.org

Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

2018-07-26 Thread Marcus Pedersén
Thanks for your help, Sunny and Kotresh!

The geo-replication is working now!
The final step I tried was to make a symlink
ln -s /usr/lib64/libgfchangelog.so.0 /usr/lib64/libgfchangelog.so

After that everything started working!

Do I need to report the steps I made somewhere? I don't know if the rpm is built by 
Gluster or CentOS.

I started with CentOS 7 and gluster 3.12.9, installed from the CentOS SIG gluster repo.
I did the following steps:
- Installed and upgraded to 4.1.1 from the CentOS SIG gluster repo
- Installed fix https://review.gluster.org/#/c/20207/
- Changed permissions on the file /var/log/glusterfs/cli.log so the geo user could 
access it
- Made symlinks to /usr/local/sbin/gluster and /usr/local/sbin/glusterfs
A better way would have been to change the config (see the sketch after this list):
 #gluster vol geo-rep   config gluster-command-dir 

 #gluster vol geo-rep   config slave-gluster-command-dir 

- Made symlink:
ln -s /usr/lib64/libgfchangelog.so.0 /usr/lib64/libgfchangelog.so
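
A minimal sketch of that config approach, using the session names from this thread; /usr/sbin/ is an assumption for a CentOS SIG install, so point it at wherever the gluster binary actually lives on master and slave:

# point the geo-rep session at the real gluster binaries instead of /usr/local/sbin
gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume \
    config gluster-command-dir /usr/sbin/
gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume \
    config slave-gluster-command-dir /usr/sbin/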

Thanks for all help!

Regards
Marcus


Marcus Pedersén
Systemadministrator
Interbull Centre

Sent from my phone


On 23 July 2018 20:11, Marcus Pedersén wrote:
Hi again Sunny,
Sorry, I missed the obvious myself!

#find /usr/ -name libgfchangelog.so
Gives nothing

#find /usr/ -name libgfchangelog.so*
Gives:
/usr/lib64/libgfchangelog.so.0  
/usr/lib64/libgfchangelog.so.0.0.1

Regards
Marcus


Marcus Pedersén
Systemadministrator
Interbull Centre

Sent from my phone



On 23 July 2018 15:15, Sunny Kumar wrote:
Hi Marcus,

Okay first apologies for wrong pattern here, please run

# find /usr/ -name libgfchangelog.so

- Sunny
On Mon, Jul 23, 2018 at 6:25 PM Marcus Pedersén  wrote:
>
> Hi,
>  #find /usr/ -name libglusterfs.so
> Gives nothing.
>
> #find /usr/ -name libglusterfs.so*
> Gives:
> /usr/lib64/libglusterfs.so.0
> /usr/lib64/libglusterfs.so.0.0.1
>
> Thanks!
> Marcus
>
> 
> Marcus Pedersén
> Systemadministrator
> Interbull Centre
> 
> Sent from my phone
> 
>
> On 23 July 2018 14:17, Sunny Kumar wrote:
>
> Hi,
>
> Can you confirm the location for libgfchangelog.so
> by sharing output of following command -
> # find /usr/ -name libglusterfs.so
>
> - Sunny
>
> On Mon, Jul 23, 2018 at 5:12 PM Marcus Pedersén  
> wrote:
> >
> > Hi Sunny,
> > Here comes a part of gsyncd.log (The same info is repeated over and over 
> > again):
> >
> >   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in 
> > __call__
> > raise res
> > OSError: libgfchangelog.so: cannot open shared object file: No such file or 
> > directory
> > [2018-07-23 11:33:09.254915] I [repce(agent 
> > /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> > [2018-07-23 11:33:10.225150] I [monitor(monitor):272:monitor] Monitor: 
> > worker died in startup phase brick=/urd-gds/gluster
> > [2018-07-23 11:33:20.250036] I [monitor(monitor):158:monitor] Monitor: 
> > starting gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
> > [2018-07-23 11:33:20.326205] I [gsyncd(agent /urd-gds/gluster):297:main] 
> > : Using session config file   
> > path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > [2018-07-23 11:33:20.326282] I [gsyncd(worker /urd-gds/gluster):297:main] 
> > : Using session config file  
> > path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> > [2018-07-23 11:33:20.327152] I [changelogagent(agent 
> > /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> > [2018-07-23 11:33:20.335777] I [resource(worker 
> > /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection 
> > between master and slave...
> > [2018-07-23 11:33:22.11188] I [resource(worker 
> > /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master 
> > and slave established. duration=1.6752
> > [2018-07-23 11:33:22.11744] I [resource(worker 
> > /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> > [2018-07-23 11:33:23.101602] I [resource(worker 
> > /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume 
> > duration=1.0894
> > [2018-07-23 11:33:23.102168] I [subcmds(worker 
> > /urd-gds/gluster):70:subcmd_worker] : Worker spawn successful. 
> > Acknowledging back to monitor
> > [2018-07-23 11:33:23.119129] E [repce(agent /urd-gds/gluster):114:worker] 
> > : call failed:
> > Traceback (most recent call last):
> >   File "/usr/libexec/glusterfs/python/sync

Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

2018-07-23 Thread Marcus Pedersén
Hi,
 #find /usr/ -name libglusterfs.so
Gives nothing.

#find /usr/ -name libglusterfs.so*
Gives:
/usr/lib64/libglusterfs.so.0
/usr/lib64/libglusterfs.so.0.0.1

Thanks!
Marcus


Marcus Pedersén
Systemadministrator
Interbull Centre

Sent from my phone


On 23 July 2018 14:17, Sunny Kumar wrote:
Hi,

Can you confirm the location for libgfchangelog.so
by sharing output of following command -
# find /usr/ -name libglusterfs.so

- Sunny

On Mon, Jul 23, 2018 at 5:12 PM Marcus Pedersén  wrote:
>
> Hi Sunny,
> Here comes a part of gsyncd.log (The same info is repeated over and over 
> again):
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in 
> __call__
> raise res
> OSError: libgfchangelog.so: cannot open shared object file: No such file or 
> directory
> [2018-07-23 11:33:09.254915] I [repce(agent 
> /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> [2018-07-23 11:33:10.225150] I [monitor(monitor):272:monitor] Monitor: worker 
> died in startup phase brick=/urd-gds/gluster
> [2018-07-23 11:33:20.250036] I [monitor(monitor):158:monitor] Monitor: 
> starting gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
> [2018-07-23 11:33:20.326205] I [gsyncd(agent /urd-gds/gluster):297:main] 
> : Using session config file   
> path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-07-23 11:33:20.326282] I [gsyncd(worker /urd-gds/gluster):297:main] 
> : Using session config file  
> path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-07-23 11:33:20.327152] I [changelogagent(agent 
> /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-07-23 11:33:20.335777] I [resource(worker 
> /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection 
> between master and slave...
> [2018-07-23 11:33:22.11188] I [resource(worker 
> /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and 
> slave established. duration=1.6752
> [2018-07-23 11:33:22.11744] I [resource(worker 
> /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> [2018-07-23 11:33:23.101602] I [resource(worker 
> /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume 
> duration=1.0894
> [2018-07-23 11:33:23.102168] I [subcmds(worker 
> /urd-gds/gluster):70:subcmd_worker] : Worker spawn successful. 
> Acknowledging back to monitor
> [2018-07-23 11:33:23.119129] E [repce(agent /urd-gds/gluster):114:worker] 
> : call failed:
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in 
> worker
> res = getattr(self.obj, rmeth)(*in_data[2:])
>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, 
> in init
> return Changes.cl_init()
>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, 
> in __getattr__
> from libgfchangelog import Changes as LChanges
>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, 
> in 
> class Changes(object):
>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, 
> in Changes
> use_errno=True)
>   File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
> self._handle = _dlopen(self._name, mode)
> OSError: libgfchangelog.so: cannot open shared object file: No such file or 
> directory
> [2018-07-23 11:33:23.119609] E [repce(worker /urd-gds/gluster):206:__call__] 
> RepceClient: call failed   call=29589:140155686246208:1532345603.11
> method=init error=OSError
> [2018-07-23 11:33:23.119708] E [syncdutils(worker 
> /urd-gds/gluster):330:log_raise_exception] : FAIL:
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
> func(args)
>   File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in 
> subcmd_worker
> local.service_loop(remote)
>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in 
> service_loop
> changelog_agent.init()
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in 
> __call__
> return self.ins(self.meth, *a)
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in 
> __call__
> raise res
> OSError: libgfchangelog.so: cannot open shared object file: No such file or 
> directory
> [2018-07-23 11:33:23.130100] I [repce(agent 
> /urd-gds/gluster):89:service_loop] R

Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

2018-07-23 Thread Marcus Pedersén
Hi Sunny,
Here comes a part of gsyncd.log (The same info is repeated over and over again):

  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in 
__call__
raise res
OSError: libgfchangelog.so: cannot open shared object file: No such file or 
directory
[2018-07-23 11:33:09.254915] I [repce(agent /urd-gds/gluster):89:service_loop] 
RepceServer: terminating on reaching EOF.
[2018-07-23 11:33:10.225150] I [monitor(monitor):272:monitor] Monitor: worker 
died in startup phase brick=/urd-gds/gluster
[2018-07-23 11:33:20.250036] I [monitor(monitor):158:monitor] Monitor: starting 
gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
[2018-07-23 11:33:20.326205] I [gsyncd(agent /urd-gds/gluster):297:main] : 
Using session config file   
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-23 11:33:20.326282] I [gsyncd(worker /urd-gds/gluster):297:main] 
: Using session config file  
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-23 11:33:20.327152] I [changelogagent(agent 
/urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-07-23 11:33:20.335777] I [resource(worker 
/urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between 
master and slave...
[2018-07-23 11:33:22.11188] I [resource(worker 
/urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and 
slave established. duration=1.6752
[2018-07-23 11:33:22.11744] I [resource(worker /urd-gds/gluster):1067:connect] 
GLUSTER: Mounting gluster volume locally...
[2018-07-23 11:33:23.101602] I [resource(worker /urd-gds/gluster):1090:connect] 
GLUSTER: Mounted gluster volume duration=1.0894
[2018-07-23 11:33:23.102168] I [subcmds(worker 
/urd-gds/gluster):70:subcmd_worker] : Worker spawn successful. 
Acknowledging back to monitor
[2018-07-23 11:33:23.119129] E [repce(agent /urd-gds/gluster):114:worker] 
: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, 
in init
return Changes.cl_init()
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, 
in __getattr__
from libgfchangelog import Changes as LChanges
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, 
in 
class Changes(object):
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, 
in Changes
use_errno=True)
  File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libgfchangelog.so: cannot open shared object file: No such file or 
directory
[2018-07-23 11:33:23.119609] E [repce(worker /urd-gds/gluster):206:__call__] 
RepceClient: call failed   call=29589:140155686246208:1532345603.11
method=init error=OSError
[2018-07-23 11:33:23.119708] E [syncdutils(worker 
/urd-gds/gluster):330:log_raise_exception] : FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
func(args)
  File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in 
subcmd_worker
local.service_loop(remote)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in 
service_loop
changelog_agent.init()
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in 
__call__
return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in 
__call__
raise res
OSError: libgfchangelog.so: cannot open shared object file: No such file or 
directory
[2018-07-23 11:33:23.130100] I [repce(agent /urd-gds/gluster):89:service_loop] 
RepceServer: terminating on reaching EOF.
[2018-07-23 11:33:24.104176] I [monitor(monitor):272:monitor] Monitor: worker 
died in startup phase brick=/urd-gds/gluster

Thanks, Sunny!!

Regards
Marcus Pedersén


From: Sunny Kumar
Sent: 23 July 2018 12:53
To: Marcus Pedersén
Cc: Kotresh Hiremath Ravishankar; gluster-users@gluster.org
Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

Hi Marcus,

On Mon, Jul 23, 2018 at 4:04 PM Marcus Pedersén  wrote:
>
> Hi Sunny,
> ldconfig -p /usr/local/lib | grep libgf
> Output:
> libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0  
>   libgfrpc.so.0 (libc6,x86-64) => 
> /lib64/libgfrpc.so.0
> libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0
>   libgfchangelog.so.0 (libc6,x86-64) => 
> /li

Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

2018-07-23 Thread Marcus Pedersén
Hi Sunny,
ldconfig -p /usr/local/lib | grep libgf
Output:
libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0
libgfrpc.so.0 (libc6,x86-64) => 
/lib64/libgfrpc.so.0
libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0  
libgfchangelog.so.0 (libc6,x86-64) => 
/lib64/libgfchangelog.so.0
libgfapi.so.0 (libc6,x86-64) => /lib64/libgfapi.so.0

So that seems to be alright,  right?

Best regards
Marcus

########
Marcus Pedersén
Systemadministrator
Interbull Centre

Sent from my phone


On 23 July 2018 11:17, Sunny Kumar wrote:
Hi Marcus,

On Wed, Jul 18, 2018 at 4:08 PM Marcus Pedersén  wrote:
>
> Hi Kotresh,
>
> I ran:
>
> #ldconfig /usr/lib
can you do -
ldconfig /usr/local/lib

Output:
>
> on all nodes in both clusters but I still get the same error.
>
> What to do?
>
>
> Output for:
>
> # ldconfig -p /usr/lib | grep libgf
>
> libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0
> libgfrpc.so.0 (libc6,x86-64) => /lib64/libgfrpc.so.0
> libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0
> libgfchangelog.so.0 (libc6,x86-64) => /lib64/libgfchangelog.so.0
> libgfapi.so.0 (libc6,x86-64) => /lib64/libgfapi.so.0
>
>
> I read somewhere that you could change some settings for geo-replication to 
> speed up sync.
>
> I can not remember where I saw that and what config parameters.
>
> When geo-replication works I have 30TB on master cluster that has to be 
> synced to slave nodes,
>
> and that will take a while before the slave nodes have caught up.
>
>
> Thanks and regards
>
> Marcus Pedersén
>
>
> Part of gsyncd.log:
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in 
> __call__
> raise res
> OSError: libgfchangelog.so: cannot open shared object file: No such file or 
> directory
> [2018-07-18 10:23:52.305119] I [repce(agent 
> /urd-gds/gluster):89:service_loop] RepceServer: terminating on reaching EOF.
> [2018-07-18 10:23:53.273298] I [monitor(monitor):272:monitor] Monitor: worker 
> died in startup phase brick=/urd-gds/gluster
> [2018-07-18 10:24:03.294312] I [monitor(monitor):158:monitor] Monitor: 
> starting gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
> [2018-07-18 10:24:03.334563] I [gsyncd(agent /urd-gds/gluster):297:main] 
> : Using session config file   
> path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-07-18 10:24:03.334702] I [gsyncd(worker /urd-gds/gluster):297:main] 
> : Using session config file  
> path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-07-18 10:24:03.335380] I [changelogagent(agent 
> /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-07-18 10:24:03.343605] I [resource(worker 
> /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection 
> between master and slave...
> [2018-07-18 10:24:04.881148] I [resource(worker 
> /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and 
> slave established.duration=1.5373
> [2018-07-18 10:24:04.881707] I [resource(worker 
> /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> [2018-07-18 10:24:05.967451] I [resource(worker 
> /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume 
> duration=1.0853
> [2018-07-18 10:24:05.968028] I [subcmds(worker 
> /urd-gds/gluster):70:subcmd_worker] : Worker spawn successful. 
> Acknowledging back to monitor
> [2018-07-18 10:24:05.984179] E [repce(agent /urd-gds/gluster):114:worker] 
> : call failed:
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in 
> worker
> res = getattr(self.obj, rmeth)(*in_data[2:])
>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, 
> in init
> return Changes.cl_init()
>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, 
> in __getattr__
> from libgfchangelog import Changes as LChanges
>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, 
> in 
> class Changes(object):
>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, 
> in Changes
> use_errno=True)
>   File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
> self._handle = _dlopen(self._name, mode)
> OSError: libgfchangelog.so: cannot open shared obje

Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

2018-07-18 Thread Marcus Pedersén
Hi again,

I continue to do some testing, but now I have come to a stage where I need help.


gsyncd.log was complaining that /usr/local/sbin/gluster was missing, so I 
made a link.

After that /usr/local/sbin/glusterfs was missing so I made a link there as well.

Both links were done on all slave nodes.


Now I have a new error that I can not resolve myself.

It can not open libgfchangelog.so


Many thanks!

Regards

Marcus Pedersén


Part of gsyncd.log:

OSError: libgfchangelog.so: cannot open shared object file: No such file or 
directory
[2018-07-17 19:32:06.517106] I [repce(agent /urd-gds/gluster):89:service_loop] 
RepceServer: terminating on reaching EOF.
[2018-07-17 19:32:07.479553] I [monitor(monitor):272:monitor] Monitor: worker 
died in startup phase brick=/urd-gds/gluster
[2018-07-17 19:32:17.500709] I [monitor(monitor):158:monitor] Monitor: starting 
gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
[2018-07-17 19:32:17.541547] I [gsyncd(agent /urd-gds/gluster):297:main] : 
Using session config file   
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-17 19:32:17.541959] I [gsyncd(worker /urd-gds/gluster):297:main] 
: Using session config file  
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-17 19:32:17.542363] I [changelogagent(agent 
/urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-07-17 19:32:17.550894] I [resource(worker 
/urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between 
master and slave...
[2018-07-17 19:32:19.166246] I [resource(worker 
/urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and 
slave established.duration=1.6151
[2018-07-17 19:32:19.166806] I [resource(worker /urd-gds/gluster):1067:connect] 
GLUSTER: Mounting gluster volume locally...
[2018-07-17 19:32:20.257344] I [resource(worker /urd-gds/gluster):1090:connect] 
GLUSTER: Mounted gluster volume duration=1.0901
[2018-07-17 19:32:20.257921] I [subcmds(worker 
/urd-gds/gluster):70:subcmd_worker] : Worker spawn successful. 
Acknowledging back to monitor
[2018-07-17 19:32:20.274647] E [repce(agent /urd-gds/gluster):114:worker] 
: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, 
in init
return Changes.cl_init()
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, 
in __getattr__
from libgfchangelog import Changes as LChanges
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, 
in 
class Changes(object):
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, 
in Changes
use_errno=True)
  File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libgfchangelog.so: cannot open shared object file: No such file or 
directory
[2018-07-17 19:32:20.275093] E [repce(worker /urd-gds/gluster):206:__call__] 
RepceClient: call failed   call=6078:139982918485824:1531855940.27 method=init  
   error=OSError
[2018-07-17 19:32:20.275192] E [syncdutils(worker 
/urd-gds/gluster):330:log_raise_exception] : FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
func(args)
  File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in 
subcmd_worker
local.service_loop(remote)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in 
service_loop
changelog_agent.init()
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in 
__call__
return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in 
__call__
raise res
OSError: libgfchangelog.so: cannot open shared object file: No such file or 
directory
[2018-07-17 19:32:20.286787] I [repce(agent /urd-gds/gluster):89:service_loop] 
RepceServer: terminating on reaching EOF.
[2018-07-17 19:32:21.259891] I [monitor(monitor):272:monitor] Monitor: worker 
died in startup phase brick=/urd-gds/gluster




From: gluster-users-boun...@gluster.org on behalf of Marcus Pedersén
Sent: 16 July 2018 21:59
To: khire...@redhat.com
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work


Hi Kotresh,

I have been testing for a bit and as you can see from the logs I sent before 
permission is denied for geouser on slave node on file:

/var/log/glusterfs/cli.log

I have turned selinux off and just for testing I changed permissions on 
/var/log/glusterfs/cli.log so geouser 

Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

2018-07-18 Thread Marcus Pedersén
Hi Kotresh,

I ran:

#ldconfig /usr/lib

on all nodes in both clusters but I still get the same error.

What to do?


Output for:

# ldconfig -p /usr/lib | grep libgf

libgfxdr.so.0 (libc6,x86-64) => /lib64/libgfxdr.so.0
libgfrpc.so.0 (libc6,x86-64) => /lib64/libgfrpc.so.0
libgfdb.so.0 (libc6,x86-64) => /lib64/libgfdb.so.0
libgfchangelog.so.0 (libc6,x86-64) => /lib64/libgfchangelog.so.0
libgfapi.so.0 (libc6,x86-64) => /lib64/libgfapi.so.0
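
The listing shows only the versioned libgfchangelog.so.0, while the agent dlopen()s the unversioned name, so refreshing the cache alone does not help. A minimal sketch to reproduce and fix the load failure outside gsyncd (python2 because that is what gsyncd uses on CentOS 7):

# same ctypes call as in the libgfchangelog.py traceback
python2 -c 'from ctypes import CDLL; CDLL("libgfchangelog.so", use_errno=True)'
# if it raises OSError, create the unversioned link and refresh the loader cache
ln -s /usr/lib64/libgfchangelog.so.0 /usr/lib64/libgfchangelog.so
ldconfig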


I read somewhere that you could change some settings for geo-replication to 
speed up sync.

I can not remember where I saw that or what the config parameters were.

When geo-replication works I have 30TB on the master cluster that has to be synced 
to the slave nodes,

and that will take a while before the slave nodes have caught up.
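
On the tuning question: the per-session options can be listed and changed with the config subcommand. A hedged sketch follows; sync-jobs is the commonly mentioned knob for sync parallelism, but treat the option name and its default as assumptions and check the config listing on your version first:

# list the current session configuration
gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume config
# example only: raise parallel sync jobs per worker (option name and default are assumptions)
gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume config sync-jobs 6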


Thanks and regards

Marcus Pedersén


Part of gsyncd.log:

  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in 
__call__
raise res
OSError: libgfchangelog.so: cannot open shared object file: No such file or 
directory
[2018-07-18 10:23:52.305119] I [repce(agent /urd-gds/gluster):89:service_loop] 
RepceServer: terminating on reaching EOF.
[2018-07-18 10:23:53.273298] I [monitor(monitor):272:monitor] Monitor: worker 
died in startup phase brick=/urd-gds/gluster
[2018-07-18 10:24:03.294312] I [monitor(monitor):158:monitor] Monitor: starting 
gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
[2018-07-18 10:24:03.334563] I [gsyncd(agent /urd-gds/gluster):297:main] : 
Using session config file   
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-18 10:24:03.334702] I [gsyncd(worker /urd-gds/gluster):297:main] 
: Using session config file  
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-18 10:24:03.335380] I [changelogagent(agent 
/urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-07-18 10:24:03.343605] I [resource(worker 
/urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between 
master and slave...
[2018-07-18 10:24:04.881148] I [resource(worker 
/urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and 
slave established.duration=1.5373
[2018-07-18 10:24:04.881707] I [resource(worker /urd-gds/gluster):1067:connect] 
GLUSTER: Mounting gluster volume locally...
[2018-07-18 10:24:05.967451] I [resource(worker /urd-gds/gluster):1090:connect] 
GLUSTER: Mounted gluster volume duration=1.0853
[2018-07-18 10:24:05.968028] I [subcmds(worker 
/urd-gds/gluster):70:subcmd_worker] : Worker spawn successful. 
Acknowledging back to monitor
[2018-07-18 10:24:05.984179] E [repce(agent /urd-gds/gluster):114:worker] 
: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 110, in worker
res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 37, 
in init
return Changes.cl_init()
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 21, 
in __getattr__
from libgfchangelog import Changes as LChanges
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 17, 
in 
class Changes(object):
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 19, 
in Changes
use_errno=True)
  File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libgfchangelog.so: cannot open shared object file: No such file or 
directory
[2018-07-18 10:24:05.984647] E [repce(worker /urd-gds/gluster):206:__call__] 
RepceClient: call failed   call=1146:139672481965888:1531909445.98 method=init  
   error=OSError
[2018-07-18 10:24:05.984747] E [syncdutils(worker 
/urd-gds/gluster):330:log_raise_exception] : FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
func(args)
  File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in 
subcmd_worker
local.service_loop(remote)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1236, in 
service_loop
changelog_agent.init()
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 225, in 
__call__
return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 207, in 
__call__
raise res
OSError: libgfchangelog.so: cannot open shared object file: No such file or 
directory
[2018-07-18 10:24:05.994826] I [repce(agent /urd-gds/gluster):89:service_loop] 
RepceServer: terminating on reaching EOF.
[2018-07-18 10:24:06.969984] I [monitor(monitor):272:monitor] Monitor: worker 
died in startup phase     brick=/urd-gds/gluster



From: Kotres

Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

2018-07-16 Thread Marcus Pedersén
Hi Kotresh,

I have been testing for a bit and as you can see from the logs I sent before 
permission is denied for geouser on slave node on file:

/var/log/glusterfs/cli.log

I have turned selinux off and just for testing I changed permissions on 
/var/log/glusterfs/cli.log so geouser can access it.

Starting geo-replication after that gives response successful but all nodes get 
status Faulty.


If I run: gluster-mountbroker status

I get:

+-+-+---+--+--+
| NODE| NODE STATUS | MOUNT ROOT|
GROUP |  USERS   |
+-+-+---+--+--+
| urd-gds-geo-001.hgen.slu.se |  UP | /var/mountbroker-root(OK) | 
geogroup(OK) | geouser(urd-gds-volume)  |
|   urd-gds-geo-002   |  UP | /var/mountbroker-root(OK) | 
geogroup(OK) | geouser(urd-gds-volume)  |
|  localhost  |  UP | /var/mountbroker-root(OK) | 
geogroup(OK) | geouser(urd-gds-volume)  |
+-+-+---+--+--+


and that is all nodes on slave cluster, so mountbroker seems ok.


gsyncd.log logs an error that /usr/local/sbin/gluster is missing.

That is correct, because gluster is in /sbin/gluster and /usr/sbin/gluster

Another error is that SSH between master and slave is broken,

but now when I have changed permission on /var/log/glusterfs/cli.log I can run:

ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i 
/var/lib/glusterd/geo-replication/secret.pem -p 22 geouser@urd-gds-geo-001 
gluster --xml --remote-host=localhost volume info urd-gds-volume

as geouser and that works, which means that the ssh connection works.


Is the permissions on /var/log/glusterfs/cli.log changed when geo-replication 
is setup?

Is gluster supposed to be in /usr/local/sbin/gluster?
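
One way to answer that for a given session (a sketch; running the config subcommand with an option name and no value prints the current setting, which in the logs above is the default /usr/local/sbin/):

# where the gluster binary actually lives on the node
command -v gluster
# what directory the geo-rep session expects it in
gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume \
    config slave-gluster-command-dir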


Do I have any options or should I remove current geo-replication and create a 
new?

How much do I need to clean up before creating a new geo-replication?

In that case, can I pause geo-replication, mount the slave cluster on the master cluster 
and run rsync, just to speed up the transfer of files?
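
For reference, pause and resume are built-in geo-replication commands; a minimal sketch with this thread's session names (whether pre-seeding the slave with plain rsync is safe is a separate question this sketch does not answer):

gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume pause
gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume resume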


Many thanks in advance!

Marcus Pedersén


Part from the gsyncd.log:

[2018-07-16 19:34:56.26287] E [syncdutils(worker /urd-gds/gluster):749:errlog] 
Popen: command returned errorcmd=ssh -oPasswordAuthentication=no 
-oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replicatio\
n/secret.pem -p 22 -oControlMaster=auto -S 
/tmp/gsyncd-aux-ssh-WrbZ22/bf60c68f1a195dad59573a8dbaa309f2.sock 
geouser@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume 
geouser@urd-gds-geo-001::urd-gds-volu\
me --master-node urd-gds-001 --master-node-id 
912bebfd-1a7f-44dc-b0b7-f001a20d58cd --master-brick /urd-gds/gluster 
--local-node urd-gds-geo-000 --local-node-id 
03075698-2bbf-43e4-a99a-65fe82f61794 --slave-timeo\
ut 120 --slave-log-level INFO --slave-gluster-log-level INFO 
--slave-gluster-command-dir /usr/local/sbin/ error=1
[2018-07-16 19:34:56.26583] E [syncdutils(worker /urd-gds/gluster):753:logerr] 
Popen: ssh> failure: execution of "/usr/local/sbin/gluster" failed with ENOENT 
(No such file or directory)
[2018-07-16 19:34:56.33901] I [repce(agent /urd-gds/gluster):89:service_loop] 
RepceServer: terminating on reaching EOF.
[2018-07-16 19:34:56.34307] I [monitor(monitor):262:monitor] Monitor: worker 
died before establishing connectionbrick=/urd-gds/gluster
[2018-07-16 19:35:06.59412] I [monitor(monitor):158:monitor] Monitor: starting 
gsyncd workerbrick=/urd-gds/gluster  slave_node=urd-gds-geo-000
[2018-07-16 19:35:06.99509] I [gsyncd(worker /urd-gds/gluster):297:main] : 
Using session config file   
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-16 19:35:06.99561] I [gsyncd(agent /urd-gds/gluster):297:main] : 
Using session config file
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-07-16 19:35:06.100481] I [changelogagent(agent 
/urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-07-16 19:35:06.108834] I [resource(worker 
/urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between 
master and slave...
[2018-07-16 19:35:06.762320] E [syncdutils(worker 
/urd-gds/gluster):303:log_raise_exception] : connection to peer is broken
[2018-07-16 19:35:06.763103] E [syncdutils(worker /urd-gds/gluster):749:errlog] 
Popen: command returned error   cmd=ssh -oPasswordAuthentication=no 
-oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replicatio\
n/secret.pem -p 22 -oControlMaster=auto -S 
/tmp/gsyncd-aux-ssh-K9mB6Q/bf60c68f1a195dad59573a8dbaa309f2.sock 
geouser@urd-gds-geo-001 /nonexistent/gsyncd slave urd-gds-volume 
geouser@urd-gds-geo-001::urd

Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

2018-07-13 Thread Marcus Pedersén
Hi Kotresh,
Yes, all nodes have the same version 4.1.1 both master and slave.
All glusterd are crashing on the master side.
Will send logs tonight.

Thanks,
Marcus


Marcus Pedersén
Systemadministrator
Interbull Centre

Sent from my phone


On 13 July 2018 11:28, Kotresh Hiremath Ravishankar wrote:
Hi Marcus,

Is the gluster geo-rep version is same on both master and slave?

Thanks,
Kotresh HR

On Fri, Jul 13, 2018 at 1:26 AM, Marcus Pedersén <marcus.peder...@slu.se> wrote:

Hi Kotresh,

I have replaced both files 
(gsyncdconfig.py <https://review.gluster.org/#/c/20207/1/geo-replication/syncdaemon/gsyncdconfig.py>
 and 
repce.py <https://review.gluster.org/#/c/20207/1/geo-replication/syncdaemon/repce.py>)
 on all nodes, both master and slave.

I rebooted all servers but geo-replication status is still Stopped.

I tried to start geo-replication with the response Successful, but the status still shows 
Stopped on all nodes.

Nothing has been written to geo-replication logs since I sent the tail of the 
log.

So I do not know what info to provide?


Please, help me to find a way to solve this.


Thanks!


Regards

Marcus



From: gluster-users-boun...@gluster.org on behalf of Marcus Pedersén <marcus.peder...@slu.se>
Sent: 12 July 2018 08:51
To: Kotresh Hiremath Ravishankar
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

Thanks Kotresh,
I installed through the official centos channel, centos-release-gluster41.
Isn't this fix included in centos install?
I will have a look, test it tonight and come back to you!

Thanks a lot!

Regards
Marcus


Marcus Pedersén
Systemadministrator
Interbull Centre

Sent from my phone


On 12 July 2018 07:41, Kotresh Hiremath Ravishankar <khire...@redhat.com> wrote:
Hi Marcus,

I think the fix [1] is needed in 4.1.
Could you please try this out and let us know if that works for you?

[1] https://review.gluster.org/#/c/20207/

Thanks,
Kotresh HR

On Thu, Jul 12, 2018 at 1:49 AM, Marcus Pedersén <marcus.peder...@slu.se> wrote:

Hi all,

I have upgraded from 3.12.9 to 4.1.1 and been following upgrade instructions 
for offline upgrade.

I upgraded geo-replication side first 1 x (2+1) and the master side after that 
2 x (2+1).

Both clusters work the way they should on their own.

After upgrade on master side status for all geo-replication nodes is Stopped.

I tried to start the geo-replication from master node and response back was 
started successfully.

Status again  Stopped

Tried to start again and get response started successfully, after that all 
glusterd crashed on all master nodes.

After a restart of all glusterd the master cluster was up again.

Status for geo-replication is still Stopped and every try to start it after 
this gives the response successful but still status Stopped.


Please help me get the geo-replication up and running again.


Best regards

Marcus Pedersén


Part of geo-replication log from master node:

[2018-07-11 18:42:48.941760] I [changelogagent(/urd-gds/gluster):73:__init__] 
ChangelogAgent: Agent listining...
[2018-07-11 18:42:48.947567] I [resource(/urd-gds/gluster):1780:connect_remote] 
SSH: Initializing SSH connection between master and slave...
[2018-07-11 18:42:49.363514] E 
[syncdutils(/urd-gds/gluster):304:log_raise_exception] : connection to 
peer is broken
[2018-07-11 18:42:49.364279] E [resource(/urd-gds/gluster):210:errlog] Popen: 
command returned errorcmd=ssh -oPasswordAuthentication=no 
-oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
.pem -p 22 -oControlMaster=auto -S 
/tmp/gsyncd-aux-ssh-hjRhBo/7e5534547f3675a710a107722317484f.sock 
geouser@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 
5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 
gluster://localhost:urd-gds-volume   error=2
[2018-07-11 18:42:49.364586] E [resource(/urd-gds/gluster):214:logerr] Popen: 
ssh> usage: gsyncd.py [-h]
[2018-07-11 18:42:49.364799] E [resource(/urd-gds/gluster):214:logerr] Popen: 
ssh>
[2018-07-11 18:42:49.364989] E [resource(/urd-gds/gluster):214:logerr] Popen: 
ssh>  
{monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
elete}
[2018-07-11 18:42:49.365210] E [resource(/urd-gds/gluster):214:logerr] Popen: 
ssh>  ...
[2018-07-11 18:42:49.365408] E [resource(/urd-gds/gluster):214:logerr] Popen: 
ssh> gsyncd.py: error: argument subcmd: invalid choice: 
'5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monit\
or

Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

2018-07-12 Thread Marcus Pedersén
Hi Kotresh,

I have replaced both files 
(gsyncdconfig.py <https://review.gluster.org/#/c/20207/1/geo-replication/syncdaemon/gsyncdconfig.py>
 and 
repce.py <https://review.gluster.org/#/c/20207/1/geo-replication/syncdaemon/repce.py>)
 on all nodes, both master and slave.

I rebooted all servers but geo-replication status is still Stopped.

I tried to start geo-replication with the response Successful, but the status still shows 
Stopped on all nodes.

Nothing has been written to geo-replication logs since I sent the tail of the 
log.

So I do not know what info to provide?


Please, help me to find a way to solve this.


Thanks!


Regards

Marcus



From: gluster-users-boun...@gluster.org on behalf of Marcus Pedersén
Sent: 12 July 2018 08:51
To: Kotresh Hiremath Ravishankar
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

Thanks Kotresh,
I installed through the official centos channel, centos-release-gluster41.
Isn't this fix included in centos install?
I will have a look, test it tonight and come back to you!

Thanks a lot!

Regards
Marcus

####
Marcus Pedersén
Systemadministrator
Interbull Centre

Sent from my phone


On 12 July 2018 07:41, Kotresh Hiremath Ravishankar wrote:
Hi Marcus,

I think the fix [1] is needed in 4.1.
Could you please try this out and let us know if that works for you?

[1] https://review.gluster.org/#/c/20207/

Thanks,
Kotresh HR

On Thu, Jul 12, 2018 at 1:49 AM, Marcus Pedersén <marcus.peder...@slu.se> wrote:

Hi all,

I have upgraded from 3.12.9 to 4.1.1 and been following upgrade instructions 
for offline upgrade.

I upgraded geo-replication side first 1 x (2+1) and the master side after that 
2 x (2+1).

Both clusters work the way they should on their own.

After upgrade on master side status for all geo-replication nodes is Stopped.

I tried to start the geo-replication from master node and response back was 
started successfully.

Status again  Stopped

Tried to start again and get response started successfully, after that all 
glusterd crashed on all master nodes.

After a restart of all glusterd the master cluster was up again.

Status for geo-replication is still Stopped and every try to start it after 
this gives the response successful but still status Stopped.


Please help me get the geo-replication up and running again.


Best regards

Marcus Pedersén


Part of geo-replication log from master node:

[2018-07-11 18:42:48.941760] I [changelogagent(/urd-gds/gluster):73:__init__] 
ChangelogAgent: Agent listining...
[2018-07-11 18:42:48.947567] I [resource(/urd-gds/gluster):1780:connect_remote] 
SSH: Initializing SSH connection between master and slave...
[2018-07-11 18:42:49.363514] E 
[syncdutils(/urd-gds/gluster):304:log_raise_exception] : connection to 
peer is broken
[2018-07-11 18:42:49.364279] E [resource(/urd-gds/gluster):210:errlog] Popen: 
command returned errorcmd=ssh -oPasswordAuthentication=no 
-oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
.pem -p 22 -oControlMaster=auto -S 
/tmp/gsyncd-aux-ssh-hjRhBo/7e5534547f3675a710a107722317484f.sock 
geouser@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 
5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 
gluster://localhost:urd-gds-volume   error=2
[2018-07-11 18:42:49.364586] E [resource(/urd-gds/gluster):214:logerr] Popen: 
ssh> usage: gsyncd.py [-h]
[2018-07-11 18:42:49.364799] E [resource(/urd-gds/gluster):214:logerr] Popen: 
ssh>
[2018-07-11 18:42:49.364989] E [resource(/urd-gds/gluster):214:logerr] Popen: 
ssh>  
{monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
elete}
[2018-07-11 18:42:49.365210] E [resource(/urd-gds/gluster):214:logerr] Popen: 
ssh>  ...
[2018-07-11 18:42:49.365408] E [resource(/urd-gds/gluster):214:logerr] Popen: 
ssh> gsyncd.py: error: argument subcmd: invalid choice: 
'5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monit\
or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 
'config-set', 'config-reset', 'voluuidget', 'delete')
[2018-07-11 18:42:49.365919] I [syncdutils(/urd-gds/gluster):271:finalize] 
: exiting.
[2018-07-11 18:42:49.369316] I [repce(/urd-gds/gluster):92:service_loop] 
RepceServer: terminating on reaching EOF.
[2018-07-11 18:42:49.369921] I [syncdutils(/urd-gds/gluster):271:finalize] 
: exiting.
[2018-07-11 18:42:49.369694] I [monitor(monitor):353:monitor] Monitor: worker 
died before establishing connection   brick=/urd-gds/gluster
[2018-07-11 18:42:59.492762] I [monitor(monitor):280:monitor] Monitor: starting 
gsyncd worke

Re: [Gluster-users] Upgrade to 4.1.1 geo-replication does not work

2018-07-12 Thread Marcus Pedersén
Thanks Kotresh,
I installed through the official centos channel, centos-release-gluster41.
Isn't this fix included in centos install?
I will have a look, test it tonight and come back to you!

Thanks a lot!

Regards
Marcus


Marcus Pedersén
Systemadministrator
Interbull Centre

Sent from my phone


On 12 July 2018 07:41, Kotresh Hiremath Ravishankar wrote:
Hi Marcus,

I think the fix [1] is needed in 4.1.
Could you please try this out and let us know if that works for you?

[1] https://review.gluster.org/#/c/20207/

Thanks,
Kotresh HR

On Thu, Jul 12, 2018 at 1:49 AM, Marcus Pedersén <marcus.peder...@slu.se> wrote:

Hi all,

I have upgraded from 3.12.9 to 4.1.1 and been following upgrade instructions 
for offline upgrade.

I upgraded geo-replication side first 1 x (2+1) and the master side after that 
2 x (2+1).

Both clusters work the way they should on their own.

After upgrade on master side status for all geo-replication nodes is Stopped.

I tried to start the geo-replication from master node and response back was 
started successfully.

Status again  Stopped

Tried to start again and get response started successfully, after that all 
glusterd crashed on all master nodes.

After a restart of all glusterd the master cluster was up again.

Status for geo-replication is still Stopped and every try to start it after 
this gives the response successful but still status Stopped.


Please help me get the geo-replication up and running again.


Best regards

Marcus Pedersén


Part of geo-replication log from master node:

[2018-07-11 18:42:48.941760] I [changelogagent(/urd-gds/gluster):73:__init__] 
ChangelogAgent: Agent listining...
[2018-07-11 18:42:48.947567] I [resource(/urd-gds/gluster):1780:connect_remote] 
SSH: Initializing SSH connection between master and slave...
[2018-07-11 18:42:49.363514] E 
[syncdutils(/urd-gds/gluster):304:log_raise_exception] : connection to 
peer is broken
[2018-07-11 18:42:49.364279] E [resource(/urd-gds/gluster):210:errlog] Popen: 
command returned errorcmd=ssh -oPasswordAuthentication=no 
-oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
.pem -p 22 -oControlMaster=auto -S 
/tmp/gsyncd-aux-ssh-hjRhBo/7e5534547f3675a710a107722317484f.sock 
geouser@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 
5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 
gluster://localhost:urd-gds-volume   error=2
[2018-07-11 18:42:49.364586] E [resource(/urd-gds/gluster):214:logerr] Popen: 
ssh> usage: gsyncd.py [-h]
[2018-07-11 18:42:49.364799] E [resource(/urd-gds/gluster):214:logerr] Popen: 
ssh>
[2018-07-11 18:42:49.364989] E [resource(/urd-gds/gluster):214:logerr] Popen: 
ssh>  
{monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
elete}
[2018-07-11 18:42:49.365210] E [resource(/urd-gds/gluster):214:logerr] Popen: 
ssh>  ...
[2018-07-11 18:42:49.365408] E [resource(/urd-gds/gluster):214:logerr] Popen: 
ssh> gsyncd.py: error: argument subcmd: invalid choice: 
'5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monit\
or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 
'config-set', 'config-reset', 'voluuidget', 'delete')
[2018-07-11 18:42:49.365919] I [syncdutils(/urd-gds/gluster):271:finalize] 
: exiting.
[2018-07-11 18:42:49.369316] I [repce(/urd-gds/gluster):92:service_loop] 
RepceServer: terminating on reaching EOF.
[2018-07-11 18:42:49.369921] I [syncdutils(/urd-gds/gluster):271:finalize] 
: exiting.
[2018-07-11 18:42:49.369694] I [monitor(monitor):353:monitor] Monitor: worker 
died before establishing connection   brick=/urd-gds/gluster
[2018-07-11 18:42:59.492762] I [monitor(monitor):280:monitor] Monitor: starting 
gsyncd worker   brick=/urd-gds/gluster  
slave_node=ssh://geouser@urd-gds-geo-000:gluster://localhost:urd-gds-volume
[2018-07-11 18:42:59.558491] I [resource(/urd-gds/gluster):1780:connect_remote] 
SSH: Initializing SSH connection between master and slave...
[2018-07-11 18:42:59.559056] I [changelogagent(/urd-gds/gluster):73:__init__] 
ChangelogAgent: Agent listining...
[2018-07-11 18:42:59.945693] E 
[syncdutils(/urd-gds/gluster):304:log_raise_exception] : connection to 
peer is broken
[2018-07-11 18:42:59.946439] E [resource(/urd-gds/gluster):210:errlog] Popen: 
command returned errorcmd=ssh -oPasswordAuthentication=no 
-oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
.pem -p 22 -oControlMaster=auto -S 
/tmp/gsyncd-aux-ssh-992bk7/7e5534547f3675a710a107722317484f.sock 
geouser@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 
5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
2Furd-gds%2Fgluster --l

[Gluster-users] Upgrade to 4.1.1 geo-replication does not work

2018-07-11 Thread Marcus Pedersén
Hi all,

I have upgraded from 3.12.9 to 4.1.1 and have been following the upgrade
instructions for an offline upgrade.

I upgraded the geo-replication side first, 1 x (2+1), and after that the master
side, 2 x (2+1).

Both clusters work the way they should on their own.

After the upgrade, the status for all geo-replication nodes on the master side
is Stopped.

I tried to start the geo-replication from a master node and the response was
"started successfully".

Status again: Stopped

Tried to start again and got the response "started successfully"; after that
glusterd crashed on all master nodes.

After a restart of glusterd on all nodes the master cluster was up again.

Status for geo-replication is still Stopped, and every attempt to start it
after this reports success but the status remains Stopped.


Please help me get the geo-replication up and running again.


Best regards

Marcus Pedersén


Part of geo-replication log from master node:

[2018-07-11 18:42:48.941760] I [changelogagent(/urd-gds/gluster):73:__init__] 
ChangelogAgent: Agent listining...
[2018-07-11 18:42:48.947567] I [resource(/urd-gds/gluster):1780:connect_remote] 
SSH: Initializing SSH connection between master and slave...
[2018-07-11 18:42:49.363514] E 
[syncdutils(/urd-gds/gluster):304:log_raise_exception] : connection to 
peer is broken
[2018-07-11 18:42:49.364279] E [resource(/urd-gds/gluster):210:errlog] Popen: 
command returned errorcmd=ssh -oPasswordAuthentication=no 
-oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
.pem -p 22 -oControlMaster=auto -S 
/tmp/gsyncd-aux-ssh-hjRhBo/7e5534547f3675a710a107722317484f.sock 
geouser@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 
5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 
gluster://localhost:urd-gds-volume   error=2
[2018-07-11 18:42:49.364586] E [resource(/urd-gds/gluster):214:logerr] Popen: 
ssh> usage: gsyncd.py [-h]
[2018-07-11 18:42:49.364799] E [resource(/urd-gds/gluster):214:logerr] Popen: 
ssh>
[2018-07-11 18:42:49.364989] E [resource(/urd-gds/gluster):214:logerr] Popen: 
ssh>  
{monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
elete}
[2018-07-11 18:42:49.365210] E [resource(/urd-gds/gluster):214:logerr] Popen: 
ssh>  ...
[2018-07-11 18:42:49.365408] E [resource(/urd-gds/gluster):214:logerr] Popen: 
ssh> gsyncd.py: error: argument subcmd: invalid choice: 
'5e94eb7d-219f-4741-a179-d4ae6b50c7ee' (choose from 'monitor-status', 'monit\
or', 'worker', 'agent', 'slave', 'status', 'config-check', 'config-get', 
'config-set', 'config-reset', 'voluuidget', 'delete')
[2018-07-11 18:42:49.365919] I [syncdutils(/urd-gds/gluster):271:finalize] 
: exiting.
[2018-07-11 18:42:49.369316] I [repce(/urd-gds/gluster):92:service_loop] 
RepceServer: terminating on reaching EOF.
[2018-07-11 18:42:49.369921] I [syncdutils(/urd-gds/gluster):271:finalize] 
: exiting.
[2018-07-11 18:42:49.369694] I [monitor(monitor):353:monitor] Monitor: worker 
died before establishing connection   brick=/urd-gds/gluster
[2018-07-11 18:42:59.492762] I [monitor(monitor):280:monitor] Monitor: starting 
gsyncd worker   brick=/urd-gds/gluster  
slave_node=ssh://geouser@urd-gds-geo-000:gluster://localhost:urd-gds-volume
[2018-07-11 18:42:59.558491] I [resource(/urd-gds/gluster):1780:connect_remote] 
SSH: Initializing SSH connection between master and slave...
[2018-07-11 18:42:59.559056] I [changelogagent(/urd-gds/gluster):73:__init__] 
ChangelogAgent: Agent listining...
[2018-07-11 18:42:59.945693] E 
[syncdutils(/urd-gds/gluster):304:log_raise_exception] : connection to 
peer is broken
[2018-07-11 18:42:59.946439] E [resource(/urd-gds/gluster):210:errlog] Popen: 
command returned errorcmd=ssh -oPasswordAuthentication=no 
-oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret\
.pem -p 22 -oControlMaster=auto -S 
/tmp/gsyncd-aux-ssh-992bk7/7e5534547f3675a710a107722317484f.sock 
geouser@urd-gds-geo-000 /nonexistent/gsyncd --session-owner 
5e94eb7d-219f-4741-a179-d4ae6b50c7ee --local-id .%\
2Furd-gds%2Fgluster --local-node urd-gds-001 -N --listen --timeout 120 
gluster://localhost:urd-gds-volume   error=2
[2018-07-11 18:42:59.946748] E [resource(/urd-gds/gluster):214:logerr] Popen: 
ssh> usage: gsyncd.py [-h]
[2018-07-11 18:42:59.946962] E [resource(/urd-gds/gluster):214:logerr] Popen: 
ssh>
[2018-07-11 18:42:59.947150] E [resource(/urd-gds/gluster):214:logerr] Popen: 
ssh>  
{monitor-status,monitor,worker,agent,slave,status,config-check,config-get,config-set,config-reset,voluuidget,d\
elete}
[2018-07-11 18:42:59.947369] E [resource(/urd-gds/gluster):214:logerr] Popen: 
ssh>  ...
[2018-07-11 18:42:59.947552] E [resource(/urd-gds/gluster):

[Gluster-users] Geo replication manual rsync

2018-07-11 Thread Marcus Pedersén
Hi all,
I have setup a gluster system with geo replication (Centos 7, gluster 3.12).
I have moved about 30 TB to the cluster.
It seems to go really slowly for the data to be synchronized to the geo
replication slave.
It has been active for weeks and still just 9 TB has ended up on the slave side.
I pause the replication once a day and make a snapshot with a script.
Does this slow things down?
Is it possible to pause replication and do a manual rsync, or does this disturb 
the geo sync when it is resumed?
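For reference, the daily script does roughly the following; the volume name,
geo user and slave host are placeholders here, not the real ones:

# pause the geo-replication session before taking the snapshot
gluster volume geo-replication mastervol geouser@slavehost::slavevol pause
# snapshot the master volume while the session is paused
gluster snapshot create daily_snap mastervol
# resume the session afterwards
gluster volume geo-replication mastervol geouser@slavehost::slavevol resume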

Thanks!

Best regards
Marcus



Marcus Pedersén
Systemadministrator
Interbull Centre

Sent from my phone


---
När du skickar e-post till SLU så innebär detta att SLU behandlar dina 
personuppgifter. För att läsa mer om hur detta går till, klicka här 
<https://www.slu.se/om-slu/kontakta-slu/personuppgifter/>
E-mailing SLU will result in SLU processing your personal data. For more 
information on how this is done, click here 
<https://www.slu.se/en/about-slu/contact-slu/personal-data/>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Geo-replication faulty

2018-04-23 Thread Marcus Pedersén
Hi all,

I setup my gluster cluster with geo-replication a couple of weeks ago
and everything worked fine!
Today I discovered that the geo-replication status of one of the master nodes
is Faulty.

On master side: Distributed-replicated 2 x (2 + 1) = 6
On slave side: Replicated 1 x (2 + 1) = 3

After checking logs I see that the master node has the following error:
OSError: Permission denied

Looking at the slave I have the following error:
remote operation failed. Path: 
/anvil [Permission denied]

I restarted glusterd on all slavehosts.
After this I got new errors.

Master node:
RepceClient: call failed on peer  call=26487:140016890697536:1524473494.25  
  method=entry_opserror=OSError
glusterfs session went downerror=ENOTCONN

Client node:
Found anomalies in (null) (gfid = 982d5d7d-2a53-4b21-8ad7-d658810d554c). 
Holes=1 overlaps=0
0-glusterfs-fuse: 1496: LOOKUP() /.gfid/982d5d7d-2a53-4b21-8ad7-d658810d554c => 
-1 (Transport endpoint is not connected)

gluster-mountbroker status is all OK!
I do not use root user.
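For completeness, this is what I have checked on the slave side so far; the
directories below are just where I would expect a permission problem for a
non-root geo user to show up, so take that as a guess:

# mountbroker setup for the non-root geo user
gluster-mountbroker status
# the geo user needs write access to its log and working directories on the slaves
ls -ld /var/log/glusterfs/geo-replication-slaves
ls -ld /var/lib/glusterd/geo-replication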

How do I solve this issue and get this active again?

Many thanks in advance!

Best regards
Marcus Pedersén

-- 
**
* Marcus Pedersén* 
* System administrator   *
**
* Interbull Centre   *
*    *
* Department of Animal Breeding & Genetics — SLU *
* Box 7023, SE-750 07*
* Uppsala, Sweden*
**
* Visiting address:  *
* Room 55614, Ulls väg 26, Ultuna*
* Uppsala*
* Sweden *
**
* Tel: +46-(0)18-67 1962 *
**
**
* ISO 9001 Bureau Veritas No SE004561-1  *
**
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gluster cluster on two networks

2018-04-13 Thread Marcus Pedersén
Hi all,
I seem to have found a solution that at least works for me.

When I looked at the parameters for reverse path filter:
sysctl net.ipv4.conf.all.rp_filter
...and the rest of the rp_filter parameters,
I realized that on a number of machines for one or both
interfaces the value was set to two:
net.ipv4.conf.eno1.rp_filter = 2

I changed this on all nodes to one:
net.ipv4.conf.eno1.rp_filter = 1
net.ipv4.conf.eno2.rp_filter = 1

Restarted all gluster daemons and after that everything just works fine.
There is no disturbance between the two networks.
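If anyone wants to persist the change across reboots, a sysctl drop-in should
do it; the file name is just an example and the interface names are of course
specific to our machines:

# /etc/sysctl.d/90-gluster-rpfilter.conf
net.ipv4.conf.eno1.rp_filter = 1
net.ipv4.conf.eno2.rp_filter = 1

# reload all sysctl configuration without a reboot
sysctl --system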

Regards
Marcus

On Tue, Apr 10, 2018 at 03:53:55PM +0200, Marcus Pedersén wrote:
> Yes,
> In first server (urd-gds-001):
> gluster peer probe urd-gds-000
> gluster peer probe urd-gds-002
> gluster peer probe urd-gds-003
> gluster peer probe urd-gds-004
> 
> gluster pool list (from urd-gds-001):
> UUID  HostnameState
> bdbe4622-25f9-4ef1-aad1-639ca52fc7e0  urd-gds-002 Connected 
> 2a48a3b9-efa0-4fb7-837f-c800f04bf99f  urd-gds-003 Connected 
> ad893466-ad09-47f4-8bb4-4cea84085e5b  urd-gds-004 Connected 
> bfe05382-7e22-4b93-8816-b239b733b610  urd-gds-000 Connected 
> 912bebfd-1a7f-44dc-b0b7-f001a20d58cd  localhost   Connected
> 
> Client mount command (same on both sides):
> mount -t glusterfs urd-gds-001:/urd-gds-volume /mnt
> 
> Regards
> Marcus
> 
> On Tue, Apr 10, 2018 at 06:24:05PM +0530, Milind Changire wrote:
> > Marcus,
> > Can you share server-side  gluster peer probe and client-side mount
> > command-lines.
> > 
> > 
> > 
> > On Tue, Apr 10, 2018 at 12:36 AM, Marcus Pedersén 
> > wrote:
> > 
> > > Hi all!
> > >
> > > I have setup a replicated/distributed gluster cluster 2 x (2 + 1).
> > >
> > > Centos 7 and gluster version 3.12.6 on server.
> > >
> > > All machines have two network interfaces and connected to two different
> > > networks,
> > >
> > > 10.10.0.0/16 (with hostnames in /etc/hosts, gluster version 3.12.6)
> > >
> > > 192.168.67.0/24 (with ldap, gluster version 3.13.1)
> > >
> > > Gluster cluster was created on the 10.10.0.0/16 net, gluster peer
> > > probe ...and so on.
> > >
> > > All nodes are available on both networks and have the same names on both
> > > networks.
> > >
> > >
> > > Now to my problem, the gluster cluster is mounted on multiple clients on
> > > the 192.168.67.0/24 net
> > >
> > > and a process was running on one of the clients, reading and writing to
> > > files.
> > >
> > > At the same time I mounted the cluster on a client on the 10.10.0.0/16
> > > net and started to create
> > >
> > > and edit files on the cluster. Around the same time the process on the
> > > 192-net stopped without any
> > >
> > > specific errors. Started other processes on the 192-net and continued to
> > > make changes on the 10-net
> > >
> > > and got the same behavior with stopping processes on the 192-net.
> > >
> > >
> > > Is there any known problems with this type of setup?
> > >
> > > How do I proceed to figure out a solution as I need access from both
> > > networks?
> > >
> > >
> > > Following error shows a couple of times on server (systemd -> glusterd):
> > >
> > > [2018-04-09 11:46:46.254071] C [mem-pool.c:613:mem_pools_init_early]
> > > 0-mem-pool: incorrect order of mem-pool initialization (init_done=3)
> > >
> > >
> > > Client logs:
> > >
> > > Client on 192-net:
> > >
> > > [2018-04-09 11:35:31.402979] I [MSGID: 114046] 
> > > [client-handshake.c:1231:client_setvolume_cbk]
> > > 5-urd-gds-volume-client-1: Connected to urd-gds-volume-client-1, attached
> > > to remote volume '/urd-gds/gluster'.
> > > [2018-04-09 11:35:31.403019] I [MSGID: 114047] 
> > > [client-handshake.c:1242:client_setvolume_cbk]
> > > 5-urd-gds-volume-client-1: Server and Client lk-version numbers are not
> > > same, reopening the fds
> > > [2018-04-09 11:35:31.403051] I [MSGID: 114046] 
> > > [client-handshake.c:1231:client_setvolume_cbk]
> > > 5-urd-gds-volume-snapd-client: Connected to urd-gds-volume-snapd-client,
> > > attached to remote volume 'snapd-urd-gds-vo\
> > > lume'.
> > > [2018-04-09 11:35:31.403091] I [MSGID: 114047] 
> > > [client-handshake.c:1242:client_setvolume_cbk]
> > > 5-urd-gds-volume-snapd-client: Server and Clien

Re: [Gluster-users] Gluster cluster on two networks

2018-04-10 Thread Marcus Pedersén
Yes,
In first server (urd-gds-001):
gluster peer probe urd-gds-000
gluster peer probe urd-gds-002
gluster peer probe urd-gds-003
gluster peer probe urd-gds-004

gluster pool list (from urd-gds-001):
UUID                                  Hostname     State
bdbe4622-25f9-4ef1-aad1-639ca52fc7e0  urd-gds-002  Connected
2a48a3b9-efa0-4fb7-837f-c800f04bf99f  urd-gds-003  Connected
ad893466-ad09-47f4-8bb4-4cea84085e5b  urd-gds-004  Connected
bfe05382-7e22-4b93-8816-b239b733b610  urd-gds-000  Connected
912bebfd-1a7f-44dc-b0b7-f001a20d58cd  localhost    Connected

Client mount command (same on both sides):
mount -t glusterfs urd-gds-001:/urd-gds-volume /mnt
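As a side note, the clients only fetch the volfile from urd-gds-001; if needed
I can also list fallback servers at mount time, something like this (the option
name is from memory of the docs, so please correct me if I have it wrong):

mount -t glusterfs -o backup-volfile-servers=urd-gds-002:urd-gds-003 urd-gds-001:/urd-gds-volume /mnt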

Regards
Marcus

On Tue, Apr 10, 2018 at 06:24:05PM +0530, Milind Changire wrote:
> Marcus,
> Can you share server-side  gluster peer probe and client-side mount
> command-lines.
> 
> 
> 
> On Tue, Apr 10, 2018 at 12:36 AM, Marcus Pedersén 
> wrote:
> 
> > Hi all!
> >
> > I have setup a replicated/distributed gluster cluster 2 x (2 + 1).
> >
> > Centos 7 and gluster version 3.12.6 on server.
> >
> > All machines have two network interfaces and connected to two different
> > networks,
> >
> > 10.10.0.0/16 (with hostnames in /etc/hosts, gluster version 3.12.6)
> >
> > 192.168.67.0/24 (with ldap, gluster version 3.13.1)
> >
> > Gluster cluster was created on the 10.10.0.0/16 net, gluster peer
> > probe ...and so on.
> >
> > All nodes are available on both networks and have the same names on both
> > networks.
> >
> >
> > Now to my problem, the gluster cluster is mounted on multiple clients on
> > the 192.168.67.0/24 net
> >
> > and a process was running on one of the clients, reading and writing to
> > files.
> >
> > At the same time I mounted the cluster on a client on the 10.10.0.0/16
> > net and started to create
> >
> > and edit files on the cluster. Around the same time the process on the
> > 192-net stopped without any
> >
> > specific errors. Started other processes on the 192-net and continued to
> > make changes on the 10-net
> >
> > and got the same behavior with stopping processes on the 192-net.
> >
> >
> > Is there any known problems with this type of setup?
> >
> > How do I proceed to figure out a solution as I need access from both
> > networks?
> >
> >
> > Following error shows a couple of times on server (systemd -> glusterd):
> >
> > [2018-04-09 11:46:46.254071] C [mem-pool.c:613:mem_pools_init_early]
> > 0-mem-pool: incorrect order of mem-pool initialization (init_done=3)
> >
> >
> > Client logs:
> >
> > Client on 192-net:
> >
> > [2018-04-09 11:35:31.402979] I [MSGID: 114046] 
> > [client-handshake.c:1231:client_setvolume_cbk]
> > 5-urd-gds-volume-client-1: Connected to urd-gds-volume-client-1, attached
> > to remote volume '/urd-gds/gluster'.
> > [2018-04-09 11:35:31.403019] I [MSGID: 114047] 
> > [client-handshake.c:1242:client_setvolume_cbk]
> > 5-urd-gds-volume-client-1: Server and Client lk-version numbers are not
> > same, reopening the fds
> > [2018-04-09 11:35:31.403051] I [MSGID: 114046] 
> > [client-handshake.c:1231:client_setvolume_cbk]
> > 5-urd-gds-volume-snapd-client: Connected to urd-gds-volume-snapd-client,
> > attached to remote volume 'snapd-urd-gds-vo\
> > lume'.
> > [2018-04-09 11:35:31.403091] I [MSGID: 114047] 
> > [client-handshake.c:1242:client_setvolume_cbk]
> > 5-urd-gds-volume-snapd-client: Server and Client lk-version numbers are not
> > same, reopening the fds
> > [2018-04-09 11:35:31.403271] I [MSGID: 114035] 
> > [client-handshake.c:202:client_set_lk_version_cbk]
> > 5-urd-gds-volume-client-3: Server lk version = 1
> > [2018-04-09 11:35:31.403325] I [MSGID: 114035] 
> > [client-handshake.c:202:client_set_lk_version_cbk]
> > 5-urd-gds-volume-client-4: Server lk version = 1
> > [2018-04-09 11:35:31.403349] I [MSGID: 114035] 
> > [client-handshake.c:202:client_set_lk_version_cbk]
> > 5-urd-gds-volume-client-0: Server lk version = 1
> > [2018-04-09 11:35:31.403367] I [MSGID: 114035] 
> > [client-handshake.c:202:client_set_lk_version_cbk]
> > 5-urd-gds-volume-client-2: Server lk version = 1
> > [2018-04-09 11:35:31.403616] I [MSGID: 114035] 
> > [client-handshake.c:202:client_set_lk_version_cbk]
> > 5-urd-gds-volume-client-1: Server lk version = 1
> > [2018-04-09 11:35:31.403751] I [MSGID: 114057] [client-handshake.c:1484:
> > select_server_supported_programs] 5-urd-gds-volume-client-5: Using
> > Program G

[Gluster-users] Gluster cluster on two networks

2018-04-10 Thread Marcus Pedersén
t process will keep trying to connect to glusterd 
unti
l brick's port is available
[2018-04-09 11:35:31.113554] I [MSGID: 114018] 
[client.c:2285:client_rpc_notify] 2-urd-gds-volume-snapd-client: disconnected 
from urd-gds-volume-snapd-client. Client process will keep trying to connect to 
glust
erd until brick's port is available
[2018-04-09 11:35:31.113567] E [MSGID: 108006] 
[afr-common.c:5006:__afr_handle_child_down_event] 2-urd-gds-volume-replicate-1: 
All subvolumes are down. Going offline until atleast one of them comes back up.
[2018-04-09 12:05:35.111892] I [fuse-bridge.c:4835:fuse_graph_sync] 0-fuse: 
switched to graph 5
[2018-04-09 12:05:35.116187] I [MSGID: 114021] [client.c:2369:notify] 
0-urd-gds-volume-client-0: current graph is no longer active, destroying 
rpc_client
[2018-04-09 12:05:35.116214] I [MSGID: 114021] [client.c:2369:notify] 
0-urd-gds-volume-client-1: current graph is no longer active, destroying 
rpc_client
[2018-04-09 12:05:35.116223] I [MSGID: 114018] 
[client.c:2285:client_rpc_notify] 0-urd-gds-volume-client-0: disconnected from 
urd-gds-volume-client-0. Client process will keep trying to connect to glusterd 
unti
l brick's port is available
[2018-04-09 12:05:35.116227] I [MSGID: 114021] [client.c:2369:notify] 
0-urd-gds-volume-client-2: current graph is no longer active, destroying 
rpc_client
[2018-04-09 12:05:35.116252] I [MSGID: 114018] 
[client.c:2285:client_rpc_notify] 0-urd-gds-volume-client-1: disconnected from 
urd-gds-volume-client-1. Client process will keep trying to connect to glusterd 
unti
l brick's port is available
[2018-04-09 12:05:35.116257] I [MSGID: 114021] [client.c:2369:notify] 
0-urd-gds-volume-client-3: current graph is no longer active, destroying 
rpc_client
[2018-04-09 12:05:35.116258] I [MSGID: 114018] 
[client.c:2285:client_rpc_notify] 0-urd-gds-volume-client-2: disconnected from 
urd-gds-volume-client-2. Client process will keep trying to connect to glusterd 
unti
l brick's port is available
[2018-04-09 12:05:35.116273] I [MSGID: 114021] [client.c:2369:notify] 
0-urd-gds-volume-client-4: current graph is no longer active, destroying 
rpc_client
[2018-04-09 12:05:35.116273] W [MSGID: 108001] [afr-common.c:5233:afr_notify] 
0-urd-gds-volume-replicate-0: Client-quorum is not met
[2018-04-09 12:05:35.116288] I [MSGID: 114021] [client.c:2369:notify] 
0-urd-gds-volume-client-5: current graph is no longer active, destroying 
rpc_client
[2018-04-09 12:05:35.116393] E [MSGID: 108006] 
[afr-common.c:5006:__afr_handle_child_down_event] 0-urd-gds-volume-replicate-0: 
All subvolumes are down. Going offline until atleast one of them comes back up.
[2018-04-09 12:05:35.116397] I [MSGID: 114018] 
[client.c:2285:client_rpc_notify] 0-urd-gds-volume-client-3: disconnected from 
urd-gds-volume-client-3. Client process will keep trying to connect to glusterd 
unti
l brick's port is available
[2018-04-09 12:05:35.116574] I [MSGID: 114018] 
[client.c:2285:client_rpc_notify] 0-urd-gds-volume-client-4: disconnected from 
urd-gds-volume-client-4. Client process will keep trying to connect to glusterd 
unti
l brick's port is available
[2018-04-09 12:05:35.116575] I [MSGID: 114018] 
[client.c:2285:client_rpc_notify] 0-urd-gds-volume-client-5: disconnected from 
urd-gds-volume-client-5. Client process will keep trying to connect to glusterd 
unti
l brick's port is available
[2018-04-09 12:05:35.116592] W [MSGID: 108001] [afr-common.c:5233:afr_notify] 
0-urd-gds-volume-replicate-1: Client-quorum is not met
[2018-04-09 12:05:35.116646] E [MSGID: 108006] 
[afr-common.c:5006:__afr_handle_child_down_event] 0-urd-gds-volume-replicate-1: 
All subvolumes are down. Going offline until atleast one of them comes back up.
[2018-04-09 12:13:18.767382] I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 
5-urd-gds-volume-dht: renaming /interbull/backup/scripts/backup/gsnapshotctl.sh 
(hash=urd-gds-volume-replicate-0/cache=urd-gds-volum
e-replicate-0) => /interbull/backup/scripts/backup/gsnapshotctl.sh~ 
(hash=urd-gds-volume-replicate-1/cache=)
[2018-04-09 13:34:54.031860] I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 
5-urd-gds-volume-dht: renaming 
/interbull/backup/scripts/backup/bkp_gluster_to_ribston.sh 
(hash=urd-gds-volume-replicate-0/cache=urd
-gds-volume-replicate-0) => 
/interbull/backup/scripts/backup/bkp_gluster_to_ribston.sh~ 
(hash=urd-gds-volume-replicate-1/cache=urd-gds-volume-replicate-0)


Many thanks in advance!

Best regards
Marcus

-- 
**
* Marcus Pedersén* 
* System administrator   *
**
* Interbull Centre   *
*    *
* Department of Animal Breeding & Genetics — SLU *
* Box 7023, SE-750 07*
* Uppsala, Sweden

[Gluster-users] Gluster cluster on two networks

2018-04-09 Thread Marcus Pedersén
lume-client-5. Client process will keep trying to connect to glusterd 
unti\
l brick's port is available
[2018-04-09 11:35:31.113554] I [MSGID: 114018] 
[client.c:2285:client_rpc_notify] 2-urd-gds-volume-snapd-client: disconnected 
from urd-gds-volume-snapd-client. Client process will keep trying to connect to 
glust\
erd until brick's port is available
[2018-04-09 11:35:31.113567] E [MSGID: 108006] 
[afr-common.c:5006:__afr_handle_child_down_event] 2-urd-gds-volume-replicate-1: 
All subvolumes are down. Going offline until atleast one of them comes back up.
[2018-04-09 12:05:35.111892] I [fuse-bridge.c:4835:fuse_graph_sync] 0-fuse: 
switched to graph 5
[2018-04-09 12:05:35.116187] I [MSGID: 114021] [client.c:2369:notify] 
0-urd-gds-volume-client-0: current graph is no longer active, destroying 
rpc_client
[2018-04-09 12:05:35.116214] I [MSGID: 114021] [client.c:2369:notify] 
0-urd-gds-volume-client-1: current graph is no longer active, destroying 
rpc_client
[2018-04-09 12:05:35.116223] I [MSGID: 114018] 
[client.c:2285:client_rpc_notify] 0-urd-gds-volume-client-0: disconnected from 
urd-gds-volume-client-0. Client process will keep trying to connect to glusterd 
unti\
l brick's port is available
[2018-04-09 12:05:35.116227] I [MSGID: 114021] [client.c:2369:notify] 
0-urd-gds-volume-client-2: current graph is no longer active, destroying 
rpc_client
[2018-04-09 12:05:35.116252] I [MSGID: 114018] 
[client.c:2285:client_rpc_notify] 0-urd-gds-volume-client-1: disconnected from 
urd-gds-volume-client-1. Client process will keep trying to connect to glusterd 
unti\
l brick's port is available
[2018-04-09 12:05:35.116257] I [MSGID: 114021] [client.c:2369:notify] 
0-urd-gds-volume-client-3: current graph is no longer active, destroying 
rpc_client
[2018-04-09 12:05:35.116258] I [MSGID: 114018] 
[client.c:2285:client_rpc_notify] 0-urd-gds-volume-client-2: disconnected from 
urd-gds-volume-client-2. Client process will keep trying to connect to glusterd 
unti\
l brick's port is available
[2018-04-09 12:05:35.116273] I [MSGID: 114021] [client.c:2369:notify] 
0-urd-gds-volume-client-4: current graph is no longer active, destroying 
rpc_client
[2018-04-09 12:05:35.116273] W [MSGID: 108001] [afr-common.c:5233:afr_notify] 
0-urd-gds-volume-replicate-0: Client-quorum is not met
[2018-04-09 12:05:35.116288] I [MSGID: 114021] [client.c:2369:notify] 
0-urd-gds-volume-client-5: current graph is no longer active, destroying 
rpc_client
[2018-04-09 12:05:35.116393] E [MSGID: 108006] 
[afr-common.c:5006:__afr_handle_child_down_event] 0-urd-gds-volume-replicate-0: 
All subvolumes are down. Going offline until atleast one of them comes back up.
[2018-04-09 12:05:35.116397] I [MSGID: 114018] 
[client.c:2285:client_rpc_notify] 0-urd-gds-volume-client-3: disconnected from 
urd-gds-volume-client-3. Client process will keep trying to connect to glusterd 
unti\
l brick's port is available
[2018-04-09 12:05:35.116574] I [MSGID: 114018] 
[client.c:2285:client_rpc_notify] 0-urd-gds-volume-client-4: disconnected from 
urd-gds-volume-client-4. Client process will keep trying to connect to glusterd 
unti\
l brick's port is available
[2018-04-09 12:05:35.116575] I [MSGID: 114018] 
[client.c:2285:client_rpc_notify] 0-urd-gds-volume-client-5: disconnected from 
urd-gds-volume-client-5. Client process will keep trying to connect to glusterd 
unti\
l brick's port is available
[2018-04-09 12:05:35.116592] W [MSGID: 108001] [afr-common.c:5233:afr_notify] 
0-urd-gds-volume-replicate-1: Client-quorum is not met
[2018-04-09 12:05:35.116646] E [MSGID: 108006] 
[afr-common.c:5006:__afr_handle_child_down_event] 0-urd-gds-volume-replicate-1: 
All subvolumes are down. Going offline until atleast one of them comes back up.
[2018-04-09 12:13:18.767382] I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 
5-urd-gds-volume-dht: renaming /interbull/backup/scripts/backup/gsnapshotctl.sh 
(hash=urd-gds-volume-replicate-0/cache=urd-gds-volum\
e-replicate-0) => /interbull/backup/scripts/backup/gsnapshotctl.sh~ 
(hash=urd-gds-volume-replicate-1/cache=)
[2018-04-09 13:34:54.031860] I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 
5-urd-gds-volume-dht: renaming 
/interbull/backup/scripts/backup/bkp_gluster_to_ribston.sh 
(hash=urd-gds-volume-replicate-0/cache=urd\
-gds-volume-replicate-0) => 
/interbull/backup/scripts/backup/bkp_gluster_to_ribston.sh~ 
(hash=urd-gds-volume-replicate-1/cache=urd-gds-volume-replicate-0)




Many thanks in advance!!


Best regards

Marcus


--
**
* Marcus Pedersén*
* System administrator   *
**
* Interbull Centre   *
*    *
* Department of Animal Breeding & Genetics - SLU *
* Box 7023, SE-750 07*
* Uppsala, Sweden

[Gluster-users] Dispersed cluster tune, optimize

2018-04-03 Thread Marcus Pedersén
Hi all,
I have set up a dispersed cluster (2+1), version 3.12.
I guessed that we were going to get punished by small reads/writes...
and I was right.
A calculation that usually takes 48 hours
took about 60 hours; there are many small reads/writes
to intermediate files that at the end get summed up.

Is there a way to tune, optimize a dispersed cluster to work
better with small read/writes?
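These are the kind of volume options I have been looking at, but I have not
verified that any of them actually helps with this workload, so treat them as
a starting point only (the volume name "myvol" and the values are just
examples):

# more threads on both sides for handling many small operations
gluster volume set myvol client.event-threads 4
gluster volume set myvol server.event-threads 4
gluster volume set myvol performance.io-thread-count 32
# larger write-behind window and read cache
gluster volume set myvol performance.write-behind-window-size 4MB
gluster volume set myvol performance.cache-size 1GB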


Many thanks in advance!

Best regards
Marcus Pedersén

-- 
**
* Marcus Pedersén* 
* System administrator   *
**
* Interbull Centre   *
*    *
* Department of Animal Breeding & Genetics — SLU *
* Box 7023, SE-750 07*
* Uppsala, Sweden*
**
* Visiting address:  *
* Room 55614, Ulls väg 26, Ultuna*
* Uppsala*
* Sweden *
**
* Tel: +46-(0)18-67 1962 *
**
**
* ISO 9001 Bureau Veritas No SE004561-1  *
**
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Tune and optimize dispersed cluster

2018-04-03 Thread Marcus Pedersén
Hi all,
I have set up a dispersed cluster (2+1), version 3.12.
The way our users run things, I guessed that we would get the penalties
of a dispersed cluster, and I was right.
A calculation that usually takes about 48 hours (on a replicated cluster)
now took about 60 hours.
There is a lot of "small" reads/writes going on in these programs.

Is there a way to tune, optimize a dispersed cluster to make it
run better with small read/writes?
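Before tuning anything it is probably worth profiling the volume to confirm
that it really is dominated by small reads/writes (the volume name "myvol" is
a placeholder):

# start collecting per-brick FOP statistics
gluster volume profile myvol start
# run the workload, then look at the latency and block-size breakdown
gluster volume profile myvol info
# stop collecting when done
gluster volume profile myvol stop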

Many thanks in advance!

Best regards
Marcus Pedersén


-- 
******
* Marcus Pedersén* 
* System administrator   *
**
* Interbull Centre   *
*    *
* Department of Animal Breeding & Genetics — SLU *
* Box 7023, SE-750 07*
* Uppsala, Sweden*
**
* Visiting address:  *
* Room 55614, Ulls väg 26, Ultuna*
* Uppsala*
* Sweden *
**
* Tel: +46-(0)18-67 1962 *
**
**
* ISO 9001 Bureau Veritas No SE004561-1  *
**
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] geo-replication

2018-03-02 Thread Marcus Pedersén
Hi Kotresh,
I am expecting my hardware to show up next week.
My plan is to run gluster version 3.12 on centos 7.
Has the issue been fixed in version 3.12?

Thanks a lot for your help!

/Marcus


On Fri, Mar 02, 2018 at 05:12:13PM +0530, Kotresh Hiremath Ravishankar wrote:
> Hi Marcus,
> 
> There are no issues with geo-rep and disperse volumes. It works with
> disperse volume
> being master or slave or both. You can run replicated distributed at master
> and diperse distributed
> at slave or disperse distributed at both master and slave. There was an
> issue with lookup on / taking
> longer time because of eager locks in disperse and that's been fixed. Which
> version are you running?
> 
> Thanks,
> Kotresh HR
> 
> On Fri, Mar 2, 2018 at 3:05 PM, Marcus Pedersén 
> wrote:
> 
> > Hi again,
> > I have been testing and reading up on other solutions
> > and just wanted to check if my ideas are ok.
> > I have been looking at dispersed volumes and wonder if there are any
> > problems running replicated-distributed cluster on the master node and
> > a dispersed-distributed cluster on the slave side of a geo-replication.
> > Second thought, running disperesed on both sides, is that a problem
> > (Master: dispersed-distributed, slave: dispersed-distributed)?
> >
> > Many thanks in advance!
> >
> > Best regards
> > Marcus
> >
> >
> > On Thu, Feb 08, 2018 at 02:57:48PM +0530, Kotresh Hiremath Ravishankar
> > wrote:
> > > Answers inline
> > >
> > > On Thu, Feb 8, 2018 at 1:26 PM, Marcus Pedersén 
> > > wrote:
> > >
> > > > Thank you, Kotresh
> > > >
> > > > I talked to your storage colleagues at Open Source Summit in Prag last
> > > > year.
> > > > I described my layout idea for them and they said it was a good
> > solution.
> > > > Sorry if I mail you in private, but I see this as your internal
> > matters.
> > > >
> > > > The reason that I seem stressed is that I have already placed my order
> > > > on new file servers for this so I need to change that as soon as
> > possible.
> > > >
> > > > So, a last double check with you:
> > > > If I build the master cluster as I thought from the beginning,
> > > > distributed/replicated (replica 3 arbiter 1) and in total 4 file
> > servers
> > > > and one arbiter (same arbiter used for both "pairs"),
> > > > and build the slave cluster the same, distributed/replicated (replica 3
> > > > arbiter 1)
> > > > and in total 4 file servers and one arbiter (same arbiter used for both
> > > > "pairs").
> > > > Do I get a good technical solution?
> > > >
> > >
> > >  Yes, that works fine.
> > >
> > > >
> > > > I liked your description on how the sync works, that made me understand
> > > > much
> > > > better how the system works!
> > > >
> > >
> > > Thank you very much for all your help!
> > > >
> > >
> > >  No problem. We are happy to help you.
> > >
> > > >
> > > > Best regards
> > > > Marcus
> > > >
> > > >
> > > > On Wed, Feb 07, 2018 at 09:40:32PM +0530, Kotresh Hiremath Ravishankar
> > > > wrote:
> > > > > Answers inline
> > > > >
> > > > > On Wed, Feb 7, 2018 at 8:44 PM, Marcus Pedersén <
> > marcus.peder...@slu.se>
> > > > > wrote:
> > > > >
> > > > > > Thank you for your help!
> > > > > > Just to make things clear to me (and get a better understanding of
> > > > > > gluster):
> > > > > > So, if I make the slave cluster just distributed and node 1 goes
> > down,
> > > > > > data (say file.txt) that belongs to node 1 will not be synced.
> > > > > > When node 1 comes back up does the master not realize that file.txt
> > > > has not
> > > > > > been synced and makes sure that it is synced when it has contact
> > with
> > > > node
> > > > > > 1 again?
> > > > > > So file.txt will not exist on node 1 at all?
> > > > > >
> > > > >
> > > > > Geo-replication syncs changes based on changelog journal which
> > records
> > > > all
> > > > > the file operations.
> > > > > It syncs every file in two steps
> 

Re: [Gluster-users] geo-replication

2018-03-02 Thread Marcus Pedersén
Hi again,
I have been testing and reading up on other solutions
and just wanted to check if my ideas are ok.
I have been looking at dispersed volumes and wonder if there are any
problems running replicated-distributed cluster on the master node and
a dispersed-distributed cluster on the slave side of a geo-replication.
Second thought: is running dispersed on both sides a problem
(Master: dispersed-distributed, slave: dispersed-distributed)?

Many thanks in advance!

Best regards
Marcus


On Thu, Feb 08, 2018 at 02:57:48PM +0530, Kotresh Hiremath Ravishankar wrote:
> Answers inline
> 
> On Thu, Feb 8, 2018 at 1:26 PM, Marcus Pedersén 
> wrote:
> 
> > Thank you, Kotresh
> >
> > I talked to your storage colleagues at Open Source Summit in Prag last
> > year.
> > I described my layout idea for them and they said it was a good solution.
> > Sorry if I mail you in private, but I see this as your internal matters.
> >
> > The reason that I seem stressed is that I have already placed my order
> > on new file servers for this so I need to change that as soon as possible.
> >
> > So, a last double check with you:
> > If I build the master cluster as I thought from the beginning,
> > distributed/replicated (replica 3 arbiter 1) and in total 4 file servers
> > and one arbiter (same arbiter used for both "pairs"),
> > and build the slave cluster the same, distributed/replicated (replica 3
> > arbiter 1)
> > and in total 4 file servers and one arbiter (same arbiter used for both
> > "pairs").
> > Do I get a good technical solution?
> >
> 
>  Yes, that works fine.
> 
> >
> > I liked your description on how the sync works, that made me understand
> > much
> > better how the system works!
> >
> 
> Thank you very much for all your help!
> >
> 
>  No problem. We are happy to help you.
> 
> >
> > Best regards
> > Marcus
> >
> >
> > On Wed, Feb 07, 2018 at 09:40:32PM +0530, Kotresh Hiremath Ravishankar
> > wrote:
> > > Answers inline
> > >
> > > On Wed, Feb 7, 2018 at 8:44 PM, Marcus Pedersén 
> > > wrote:
> > >
> > > > Thank you for your help!
> > > > Just to make things clear to me (and get a better understanding of
> > > > gluster):
> > > > So, if I make the slave cluster just distributed and node 1 goes down,
> > > > data (say file.txt) that belongs to node 1 will not be synced.
> > > > When node 1 comes back up does the master not realize that file.txt
> > has not
> > > > been synced and makes sure that it is synced when it has contact with
> > node
> > > > 1 again?
> > > > So file.txt will not exist on node 1 at all?
> > > >
> > >
> > > Geo-replication syncs changes based on changelog journal which records
> > all
> > > the file operations.
> > > It syncs every file in two steps
> > > 1. File creation with same attributes as on master via rpc (CREATE is
> > > recorded in changelog)
> > > 2. Data sync via rsync (DATA is recorded in changelog. Any further
> > appends
> > > will only record DATA)
> > >
> > > The changelog processing will not halt on encountering ENOENT(It thinks
> > > it's a safe error). It's not
> > > straight forward. When I said, file won't be synced, it means the file is
> > > created on node1 and when
> > > you append the data, the data would not sync as it gets ENOENT since the
> > > node1 is down. But if the
> > > 'CREATE' of file is not synced to node1, then it is persistent failure
> > > (ENOTCON) and waits till node1 comes back.
> > >
> > > >
> > > > I did a small test on my testing machines.
> > > > Turned one of the geo machines off and created 1 files containing
> > one
> > > > short string in the master nodes.
> > > > Nothing became synced with the geo slaves.
> > > > When I turned on the geo machine again all 1 files were synced to
> > the
> > > > geo slaves.
> > > > Ofcause devided between the two machines.
> > > > Is this the right/expected behavior of geo-replication with a
> > distributed
> > > > cluster?
> > > >
> > >
> > > Yes, it's correct. As I said earlier, CREATE itself would have failed
> > with
> > > ENOTCON. geo-rep waited till slave comes back.
> > > Bring slave node down, and now append data to files which falls under
> > node
> 

Re: [Gluster-users] Geo replication snapshot error

2018-02-21 Thread Marcus Pedersén
Bug reported: Bug 1547446

Thanks!
Marcus

On Wed, Feb 21, 2018 at 02:44:00PM +0530, Kotresh Hiremath Ravishankar wrote:
> Hi,
> 
> Thanks for reporting the issue. This seems to be a bug.
> Could you please raise a bug at https://bugzilla.redhat.com/ under
> community/glusterfs ?
> We will take a look at it and fix it.
> 
> Thanks,
> Kotresh HR
> 
> On Wed, Feb 21, 2018 at 2:01 PM, Marcus Pedersén 
> wrote:
> 
> > Hi all,
> > I use gluster 3.12 on centos 7.
> > I am writing a snapshot program for my geo-replicated cluster.
> > Now when I started to run tests with my application I have found
> > a very strange behavior regarding geo-replication in gluster.
> >
> > I have setup my geo-replication according to the docs:
> > http://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/
> >
> > Both master and slave clusters are replicated with just two
> > machines (VM) and no arbiter.
> >
> > I have setup a geo-user (called geouser) and do not use
> > root as the geo user, as specified in the docs.
> >
> > Both my master and slave volumes are named: vol
> >
> > If I pause the geo-replication with:
> > gluster volume geo-replication vol geouser@ggluster1-geo::vol pause
> > Pausing geo-replication session between vol & geouser@ggluster1-geo::vol
> > has been successful
> >
> > Create a snapshot:
> > gluster snapshot create my_snap_no_1000 vol
> > snapshot create: success: Snap my_snap_no_1000-2018.02.21-07.45.32
> > created successfully
> >
> > Resume geo-replication:
> > gluster volume geo-replication vol geouser@ggluster1-geo::vol resume
> > Resuming geo-replication session between vol & geouser@ggluster1-geo::vol
> > has been successful
> >
> >
> > Everything works fine!
> >
> > But here comes the problem:
> > If I by accident spell my slave user wrong or don't use
> > the user at all, as I was using root,
> > no matter what user I write pause/resume do NOT report
> > any errors. The answer is always pausing/resuming successful.
> > The problem comes after a successful pause when I try to
> > create a snapshot. It fails with:
> > snapshot create: failed: geo-replication session is running for the volume
> > vol. Session needs to be stopped before taking a snapshot.
> >
> > gluster volume geo-replication status
> > MASTER NODEMASTER VOLMASTER BRICKSLAVE USERSLAVE
> >  SLAVE NODESTATUSCRAWL STATUSLAST_SYNCED
> > 
> > 
> > -
> > ggluster1  vol   /glustergeouser
> >  ssh://geouser@ggluster1-geo::volN/A   PausedN/A
> >N/A
> > ggluster2  vol   /glustergeouser
> >  ssh://geouser@ggluster1-geo::volN/A   PausedN/A
> >N/A
> >
> >
> > After this snapshots fails all the time!
> > If I use the correct user again and pause, no error (paused), snapshot
> > fails.
> > If I resume with correct user, no errors (active).
> > Geo-replication still works fine, but some how has something
> > gone wrong so snapshots fail.
> > After restart of glusterd in all machines it starts to work again.
> >
> >
> > Here is complete run through:
> >
> > gluster volume geo-replication status
> >
> > MASTER NODEMASTER VOLMASTER BRICKSLAVE USERSLAVE
> >  SLAVE NODE   STATUS CRAWL STATUS
> >  LAST_SYNCED
> > 
> > 
> > 
> > ggluster1  vol   /glustergeouser
> >  ssh://geouser@ggluster1-geo::volggluster1-geoActive
> >  Changelog Crawl2018-02-12 15:49:57
> > ggluster2  vol   /glustergeouser
> >  ssh://geouser@ggluster1-geo::volggluster2-geoPassiveN/A
> >   N/A
> >
> > # Using wrong user: abc
> > gluster volume geo-replication vol abc@ggluster1-geo::vol pause
> > Pausing geo-replication session between vol & abc@ggluster1-geo::vol has
> > been successful
> >
> >
> > gluster volume geo-replication status
> >
> > MASTER NODEMASTER VOLMASTER BRICKSLAVE USERSLAVE
> >  SLAVE NODESTATUSCRAWL STATUSLAST_SYNCED
> > -

[Gluster-users] Geo replication snapshot error

2018-02-21 Thread Marcus Pedersén
er@ggluster1-geo::volggluster1-geoActive Changelog Crawl 
   2018-02-12 15:49:57  
ggluster2  vol   /glustergeouser   
ssh://geouser@ggluster1-geo::volggluster2-geoPassiveN/A 
   N/A


Many thanks in advance!

Best regards
Marcus


-- 
******
* Marcus Pedersén* 
* System administrator   *
**
* Interbull Centre   *
*    *
* Department of Animal Breeding & Genetics — SLU *
* Box 7023, SE-750 07*
* Uppsala, Sweden*
**
* Visiting address:  *
* Room 55614, Ulls väg 26, Ultuna*
* Uppsala*
* Sweden *
**
* Tel: +46-(0)18-67 1962 *
**
**
* ISO 9001 Bureau Veritas No SE004561-1  *
**
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] geo-replication

2018-02-07 Thread Marcus Pedersén
Thank you for your help!
Just to make things clear to me (and get a better understanding of gluster):
So, if I make the slave cluster just distributed and node 1 goes down,
data (say file.txt) that belongs to node 1 will not be synced.
When node 1 comes back up, does the master not realize that file.txt has not
been synced, and make sure that it is synced once it has contact with node 1
again?
So file.txt will not exist on node 1 at all?

I did a small test on my testing machines.
Turned one of the geo machines off and created 1 files containing one
short string in the master nodes.
Nothing became synced with the geo slaves.
When I turned on the geo machine again all 1 files were synced to the geo 
slaves.
Of course divided between the two machines.
Is this the right/expected behavior of geo-replication with a distributed 
cluster?
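For the record, the detailed status is what I have been watching during the
test; the checkpoint part is only something I have read about and not tried,
so take it as untested (session names are from my test machines):

gluster volume geo-replication interbullfs geouser@gluster-geo1::interbullfs-geo status detail
# a checkpoint can be set to see when the slave has caught up with "now"
gluster volume geo-replication interbullfs geouser@gluster-geo1::interbullfs-geo config checkpoint now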

Many thanks in advance!

Regards
Marcus


On Wed, Feb 07, 2018 at 06:39:20PM +0530, Kotresh Hiremath Ravishankar wrote:
> We are happy to help you out. Please find the answers inline.
> 
> On Tue, Feb 6, 2018 at 4:39 PM, Marcus Pedersén 
> wrote:
> 
> > Hi all,
> >
> > I am planning my new gluster system and tested things out in
> > a bunch of virtual machines.
> > I need a bit of help to understand how geo-replication behaves.
> >
> > I have a master gluster cluster replica 2
> > (in production I will use an arbiter and replicatied/distributed)
> > and the geo cluster is distributed with 2 machines.
> > (in production I will have the geo cluster distributed)
> >
> 
> It's recommended to use slave also to be distribute replicate/aribiter/ec.
> Choosing only distribute will cause issues
> when of the slave node is down and a file is being synced which belongs to
> that node. It would not sync
> later.
> 
> 
> > Everything is up and running and creating files from client both
> > replicates and is distributed in the geo cluster.
> >
> > The thing I am wondering about is:
> > When I run: gluster volume geo-replication status
> > I see both slave nodes one is active and the other is passive.
> >
> > MASTER NODEMASTER VOL MASTER BRICKSLAVE USERSLAVE
> > SLAVE NODE  STATUS CRAWL STATUS
> >LAST_SYNCED
> > 
> > 
> > ---
> > gluster1   interbullfs/interbullfsgeouser
> >  ssh://geouser@gluster-geo1::interbullfs-geogluster-geo2Active
> >  Changelog Crawl2018-02-06 11:46:08
> > gluster2   interbullfs/interbullfsgeouser
> >  ssh://geouser@gluster-geo1::interbullfs-geogluster-geo1Passive
> >   N/AN/A
> >
> >
> > If I shutdown the active slave the status changes to faulty
> > and the other one continues to be passive.
> >
> 
> > MASTER NODEMASTER VOL MASTER BRICKSLAVE USERSLAVE
> > SLAVE NODE  STATUS CRAWL STATUS
> > LAST_SYNCED
> > 
> > 
> > 
> > gluster1   interbullfs/interbullfsgeouser
> >  ssh://geouser@gluster-geo1::interbullfs-geoN/A Faulty
> >  N/A N/A
> > gluster2   interbullfs/interbullfsgeouser
> >  ssh://geouser@gluster-geo1::interbullfs-geogluster-geo1Passive
> >   N/A N/A
> >
> >
> > In my understanding I thought that if the active slave stopped
> > working the passive slave should become active and should
> > continue to replicate from master.
> >
> > Am I wrong? Is there just one active slave if it is setup as
> > a distributed system?
> >
> 
> The Active/Passive notion is for master node. If gluster1 master node is
> down  glusterd2 master node will become Active.
> It's not for slave node.
> 
> 
> 
> >
> > What I use:
> > Centos 7, gluster 3.12
> > I have followed the geo instructions:
> > http://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/
> >
> > Many thanks in advance!
> >
> > Bets regards
> > Marcus
> >
> > --
> > **
> > * Marcus Pedersén*
> > * System administrator   *
> > **
&g

Re: [Gluster-users] geo-replication

2018-02-06 Thread Marcus Pedersén
Hi again,
I made some more tests and the behavior I get is that if any of
the slaves are down the geo-replication stops working.
Is this the way distributed volumes work: if one server goes down,
the entire system stops working?
Do the servers that are online not continue to work?

Sorry for asking stupid questions.

Best regards
Marcus


On Tue, Feb 06, 2018 at 12:09:40PM +0100, Marcus Pedersén wrote:
> Hi all,
> 
> I am planning my new gluster system and tested things out in
> a bunch of virtual machines.
> I need a bit of help to understand how geo-replication behaves.
> 
> I have a master gluster cluster replica 2
> (in production I will use an arbiter and replicatied/distributed)
> and the geo cluster is distributed with 2 machines.
> (in production I will have the geo cluster distributed)
> 
> Everything is up and running and creating files from client both
> replicates and is distributed in the geo cluster.
> 
> The thing I am wondering about is:
> When I run: gluster volume geo-replication status
> I see both slave nodes one is active and the other is passive.
> 
> MASTER NODEMASTER VOL MASTER BRICKSLAVE USERSLAVE 
>  SLAVE NODE  STATUS CRAWL STATUS   
> LAST_SYNCED  
> ---
> gluster1   interbullfs/interbullfsgeouser   
> ssh://geouser@gluster-geo1::interbullfs-geogluster-geo2Active 
> Changelog Crawl2018-02-06 11:46:08  
> gluster2   interbullfs/interbullfsgeouser   
> ssh://geouser@gluster-geo1::interbullfs-geogluster-geo1PassiveN/A 
>N/A
> 
> 
> If I shutdown the active slave the status changes to faulty
> and the other one continues to be passive.
> 
> MASTER NODEMASTER VOL MASTER BRICKSLAVE USERSLAVE 
>  SLAVE NODE  STATUS CRAWL STATUS
> LAST_SYNCED  
> 
> gluster1   interbullfs/interbullfsgeouser   
> ssh://geouser@gluster-geo1::interbullfs-geoN/A Faulty N/A 
> N/A  
> gluster2   interbullfs/interbullfsgeouser   
> ssh://geouser@gluster-geo1::interbullfs-geogluster-geo1PassiveN/A 
> N/A
> 
> 
> In my understanding I thought that if the active slave stopped
> working the passive slave should become active and should
> continue to replicate from master.
> 
> Am I wrong? Is there just one active slave if it is setup as
> a distributed system?
> 
> What I use:
> Centos 7, gluster 3.12
> I have followed the geo instructions:
> http://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/
> 
> Many thanks in advance!
> 
> Bets regards
> Marcus
> 
> -- 
> **
> * Marcus Pedersén* 
> * System administrator   *
> **
> * Interbull Centre   *
> *    *
> * Department of Animal Breeding & Genetics — SLU *
> * Box 7023, SE-750 07*
> * Uppsala, Sweden*
> **
> * Visiting address:  *
> * Room 55614, Ulls väg 26, Ultuna*
> * Uppsala*
> * Sweden *
> **
> * Tel: +46-(0)18-67 1962 *
> **
> **
> * ISO 9001 Bureau Veritas No SE004561-1  *
> ******
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

-- 
**
* Marcus Pedersén* 
* System administrator   *
**
* Interbull Centre   *
*    *
* Department of Animal Breeding & Genetics — SLU *
* Box 7023, SE-750 07

[Gluster-users] geo-replication

2018-02-06 Thread Marcus Pedersén
Hi all,

I am planning my new gluster system and tested things out in
a bunch of virtual machines.
I need a bit of help to understand how geo-replication behaves.

I have a master gluster cluster replica 2
(in production I will use an arbiter and replicated/distributed)
and the geo cluster is distributed with 2 machines.
(in production I will have the geo cluster distributed)

Everything is up and running, and files created from a client are both
replicated and distributed in the geo cluster.
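For completeness, the session itself was created more or less as in the docs
linked further down, roughly like this (the non-root mountbroker details are
left out, names are from my test setup):

# generate and distribute the pem keys, then create and start the session
gluster system:: execute gsec_create
gluster volume geo-replication interbullfs geouser@gluster-geo1::interbullfs-geo create push-pem
gluster volume geo-replication interbullfs geouser@gluster-geo1::interbullfs-geo start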

The thing I am wondering about is:
When I run: gluster volume geo-replication status
I see both slave nodes one is active and the other is passive.

MASTER NODE  MASTER VOL   MASTER BRICK  SLAVE USER  SLAVE                                        SLAVE NODE    STATUS   CRAWL STATUS     LAST_SYNCED
gluster1     interbullfs  /interbullfs  geouser     ssh://geouser@gluster-geo1::interbullfs-geo  gluster-geo2  Active   Changelog Crawl  2018-02-06 11:46:08
gluster2     interbullfs  /interbullfs  geouser     ssh://geouser@gluster-geo1::interbullfs-geo  gluster-geo1  Passive  N/A              N/A


If I shut down the active slave, the status changes to Faulty
and the other one continues to be Passive.

MASTER NODE  MASTER VOL   MASTER BRICK  SLAVE USER  SLAVE                                        SLAVE NODE    STATUS   CRAWL STATUS  LAST_SYNCED
gluster1     interbullfs  /interbullfs  geouser     ssh://geouser@gluster-geo1::interbullfs-geo  N/A           Faulty   N/A           N/A
gluster2     interbullfs  /interbullfs  geouser     ssh://geouser@gluster-geo1::interbullfs-geo  gluster-geo1  Passive  N/A           N/A


In my understanding I thought that if the active slave stopped
working the passive slave should become active and should
continue to replicate from master.

Am I wrong? Is there just one active slave if it is setup as
a distributed system?

What I use:
Centos 7, gluster 3.12
I have followed the geo instructions:
http://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/

Many thanks in advance!

Best regards
Marcus

-- 
**
* Marcus Pedersén* 
* System administrator   *
**
* Interbull Centre   *
*    *
* Department of Animal Breeding & Genetics — SLU *
* Box 7023, SE-750 07*
* Uppsala, Sweden*
**
* Visiting address:  *
* Room 55614, Ulls väg 26, Ultuna*
* Uppsala*
* Sweden *
**
* Tel: +46-(0)18-67 1962 *
**
**
* ISO 9001 Bureau Veritas No SE004561-1  *
**
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Understanding client logs

2018-01-23 Thread Marcus Pedersén
Hi,
Yes, of course... I should have included it from the start.
Yes, I know it is an old version, but I will rebuild a new cluster later on;
that is another story.

Client side:
Archlinux
glusterfs 1:3.10.1-1

Sever side:
Replicated cluster on two physical machines.
Both running:
Centos 7 3.10.0-514.16.1.el7.x86_64
Gluster glusterfs 3.8.11 from centos-gluster38

Typical use case (the one we have problems with now):
Our users handle genomic evaluations, where loads of calculations
are done. Intermediate results are saved to files (MB-GB in size,
up to a hundred files)
and used for the next calculation step, where they are read from file,
calculated and written to file again, and so on, a couple of times.
The length of these processes is about 8-12 hours, with some
processes running for up to about 72-96 hours.
For this run we had 12 clients (all connected to gluster and all
file reads/writes done to gluster). On each client we had assigned
3 cores to be used to run the processes, and most of the time all
3 cores were being used on all 12 clients.
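In case it helps, the reconnect/unmount cycle I describe in my original mail
below can be pulled out of the client log with a quick grep; the path assumes
the default FUSE log location and our mount point /interbull:

# the FUSE client log is named after the mount point (/interbull -> interbull.log)
grep -E "changing port|unmounting|switched to graph" /var/log/glusterfs/interbull.log | tail -n 40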

Regards
Marcus




Från: Milind Changire 
Skickat: den 23 januari 2018 15:46
Till: Marcus Pedersén
Kopia: Gluster Users
Ämne: Re: [Gluster-users] Understanding client logs

Marcus,
Please paste the name-version-release of the primary glusterfs package on your 
system.

If possible, also describe the typical workload that happens at the mount via 
the user application.



On Tue, Jan 23, 2018 at 7:43 PM, Marcus Pedersén <marcus.peder...@slu.se> wrote:
Hi all,
I have problem pin pointing an error, that users of
my system experience processes that crash.
The thing that have changed since the craches started
is that I added a gluster cluster.
Of cause the users start to attack my gluster cluster.

I started looking at logs, starting from the client side.
I just need help to understand how to read it in the right way.
I can see that every ten minutes the client changes port and
attach to the remote volume. About five minutes later
the client unmounts the volume.
I guess that this is the "old" mount and that the "new" mount
is already responding to user interaction?

As this repeats every ten minutes I see this as normal behavior
and just want to get a better understanding on how the client
interacts with the cluster.

Have you experienced that this switch malfunctions and the
mount becomes unreachable for a while?

Many thanks in advance!

Best regards
Marcus Pedersén

An example of the output:
[2017-11-09 10:10:39.776403] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 
0-interbull-interbull-client-1: changing port to 49160 (from 0)
[2017-11-09 10:10:39.776830] I [MSGID: 114057] 
[client-handshake.c:1451:select_server_supported_programs] 
0-interbull-interbull-client-0: Using Program GlusterFS 3.3, Num (1298437), 
Version (330)
[2017-11-09 10:10:39.777642] I [MSGID: 114046] 
[client-handshake.c:1216:client_setvolume_cbk] 0-interbull-interbull-client-0: 
Connected to interbull-interbull-client-0, attached to remote volume 
'/interbullfs/i\
nterbull'.
[2017-11-09 10:10:39.777663] I [MSGID: 114047] 
[client-handshake.c:1227:client_setvolume_cbk] 0-interbull-interbull-client-0: 
Server and Client lk-version numbers are not same, reopening the fds
[2017-11-09 10:10:39.24] I [MSGID: 108005] [afr-common.c:4756:afr_notify] 
0-interbull-interbull-replicate-0: Subvolume 'interbull-interbull-client-0' 
came back up; going online.
[2017-11-09 10:10:39.777954] I [MSGID: 114035] 
[client-handshake.c:202:client_set_lk_version_cbk] 
0-interbull-interbull-client-0: Server lk version = 1
[2017-11-09 10:10:39.779909] I [MSGID: 114057] 
[client-handshake.c:1451:select_server_supported_programs] 
0-interbull-interbull-client-1: Using Program GlusterFS 3.3, Num (1298437), 
Version (330)
[2017-11-09 10:10:39.780481] I [MSGID: 114046] 
[client-handshake.c:1216:client_setvolume_cbk] 0-interbull-interbull-client-1: 
Connected to interbull-interbull-client-1, attached to remote volume 
'/interbullfs/i\
nterbull'.
[2017-11-09 10:10:39.780509] I [MSGID: 114047] 
[client-handshake.c:1227:client_setvolume_cbk] 0-interbull-interbull-client-1: 
Server and Client lk-version numbers are not same, reopening the fds
[2017-11-09 10:10:39.781544] I [MSGID: 114035] 
[client-handshake.c:202:client_set_lk_version_cbk] 
0-interbull-interbull-client-1: Server lk version = 1
[2017-11-09 10:10:39.781608] I [fuse-bridge.c:4146:fuse_init] 0-glusterfs-fuse: 
FUSE inited with protocol versions: glusterfs 7.24 kernel 7.26
[2017-11-09 10:10:39.781632] I [fuse-bridge.c:4831:fuse_graph_sync] 0-fuse: 
switched to graph 0
[2017-11-09 10:16:10.609922] I [fuse-bridge.c:5089:fuse_thread_proc] 0-fuse: 
unmounting /interbull
[2017-11-09 10:16:10.610258] W [glusterfsd.c:1329:cleanup_and_exit] 
(-->/usr/lib/libpthread.so.0(+0x72e7) [0x7f98c02282e7] 
-->/usr/bin/glusterfs(glusterfs_sigwaiter+0xdd) [0x40890d] 
-->/usr/bin/glusterfs(cleanu\
p_and_exit+0x4b) [0x40

[Gluster-users] Understanding client logs

2018-01-23 Thread Marcus Pedersén
I [rpc-clnt.c:2000:rpc_clnt_reconfig] 
0-interbull-interbull-client-0: changing port to 49177 (from 0)


-- 
**
* Marcus Pedersén* 
* System administrator   *
**
* Interbull Centre   *
*    *
* Department of Animal Breeding & Genetics — SLU *
* Box 7023, SE-750 07*
* Uppsala, Sweden*
**
* Visiting address:  *
* Room 55614, Ulls väg 26, Ultuna*
* Uppsala*
* Sweden *
**
* Tel: +46-(0)18-67 1962 *
**
**
* ISO 9001 Bureau Veritas No SE004561-1  *
**
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users