Re: [Gluster-users] Disperse volumes on armhf

2018-08-02 Thread Ashish Pandey

Yes, you should file a bug to track this issue and to share information.
I would also like to see the logs present in /var/log/messages, and
especially the mount log (named mnt.log or similar).

Here are the points I would like to bring to your notice:

1 - Are you sure that all the bricks are UP?
2 - Are there any connection issues?
3 - It is possible that a bug caused a crash, so please check for a core dump
created around the time you mounted the volume and saw the ENOTCONN error.
4 - I am not very familiar with armhf and have not run glusterfs on this
hardware, so we need to see whether anything in the code prevents glusterfs
from running on this architecture and setup.
5 - Please provide the output of gluster v info and gluster v status for the
volume in the BZ.
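
A rough sketch of commands that should gather most of the above (assuming the
volume name testvol1 from the report; log and core locations vary by distribution):

gluster volume info testvol1 > testvol1-info.txt
gluster volume status testvol1 detail > testvol1-status.txt
gluster peer status
# client/mount and brick logs (default log directory on most installs)
ls -l /var/log/glusterfs/ /var/log/glusterfs/bricks/
# look for a crash/core around the time of the ENOTCONN error
grep -i crash /var/log/glusterfs/*.log
coredumpctl list glusterfs   # on systemd-coredump systems; otherwise check /var/crash

Attach the resulting files, plus /var/log/messages from the affected node, to the bug.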

--- 
Ashish 

- Original Message -

From: "Fox"  
To: gluster-users@gluster.org 
Sent: Friday, August 3, 2018 9:51:30 AM 
Subject: [Gluster-users] Disperse volumes on armhf 

Just wondering if anyone else is running into the same behavior with disperse 
volumes described below and what I might be able to do about it. 

I am using ubuntu 18.04LTS on Odroid HC-2 hardware (armhf) and have installed 
gluster 4.1.2 via PPA. I have 12 member nodes each with a single brick. I can 
successfully create a working volume via the command: 

gluster volume create testvol1 disperse 12 redundancy 4 
gluster01:/exports/sda/brick1/testvol1 gluster02:/exports/sda/brick1/testvol1 
gluster03:/exports/sda/brick1/testvol1 gluster04:/exports/sda/brick1/testvol1 
gluster05:/exports/sda/brick1/testvol1 gluster06:/exports/sda/brick1/testvol1 
gluster07:/exports/sda/brick1/testvol1 gluster08:/exports/sda/brick1/testvol1 
gluster09:/exports/sda/brick1/testvol1 gluster10:/exports/sda/brick1/testvol1 
gluster11:/exports/sda/brick1/testvol1 gluster12:/exports/sda/brick1/testvol1 

And start the volume: 

gluster volume start testvol1 

Mounting the volume on an x86-64 system it performs as expected. 

Mounting the same volume on an armhf system (such as one of the cluster 
members) I can create directories but trying to create a file I get an error 
and the file system unmounts/crashes: 
root@gluster01:~# mount -t glusterfs gluster01:/testvol1 /mnt 
root@gluster01:~# cd /mnt 
root@gluster01:/mnt# ls 
root@gluster01:/mnt# mkdir test 
root@gluster01:/mnt# cd test 
root@gluster01:/mnt/test# cp /root/notes.txt ./ 
cp: failed to close './notes.txt': Software caused connection abort 
root@gluster01:/mnt/test# ls 
ls: cannot open directory '.': Transport endpoint is not connected 
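
(If it helps when filing the bug: log-level and log-file are standard mount.glusterfs
options, so a more verbose client log for this crash can be captured roughly as below;
the log path is just an example.)

umount /mnt
mount -t glusterfs -o log-level=DEBUG,log-file=/var/tmp/mnt-testvol1.log gluster01:/testvol1 /mnt
# reproduce the failing cp, then attach mnt-testvol1.log to the bug report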

I get many of these in the glusterfsd.log: 
The message "W [MSGID: 101088] [common-utils.c:4316:gf_backtrace_save] 
0-management: Failed to save the backtrace." repeated 100 times between 
[2018-08-03 04:06:39.904166] and [2018-08-03 04:06:57.521895] 


Furthermore, if a cluster member ducks out (reboots, loses connection, etc.) and
needs healing, the self-heal daemon logs messages similar to the one above and
cannot heal: there is no disk activity (verified via iotop) but very high CPU usage,
and the volume heal info command indicates the volume still needs healing.


I tested all of the above in virtual environments using x86-64 VMs and could 
self heal as expected. 

Again this only happens when using disperse volumes. Should I be filing a bug 
report instead? 

___ 
Gluster-users mailing list 
Gluster-users@gluster.org 
https://lists.gluster.org/mailman/listinfo/gluster-users 

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Disperse volumes on armhf

2018-08-02 Thread Fox
Just wondering if anyone else is running into the same behavior with
disperse volumes described below and what I might be able to do about it.

I am using ubuntu 18.04LTS on Odroid HC-2 hardware (armhf) and have
installed gluster 4.1.2 via PPA. I have 12 member nodes each with a single
brick. I can successfully create a working volume via the command:

gluster volume create testvol1 disperse 12 redundancy 4
gluster01:/exports/sda/brick1/testvol1
gluster02:/exports/sda/brick1/testvol1
gluster03:/exports/sda/brick1/testvol1
gluster04:/exports/sda/brick1/testvol1
gluster05:/exports/sda/brick1/testvol1
gluster06:/exports/sda/brick1/testvol1
gluster07:/exports/sda/brick1/testvol1
gluster08:/exports/sda/brick1/testvol1
gluster09:/exports/sda/brick1/testvol1
gluster10:/exports/sda/brick1/testvol1
gluster11:/exports/sda/brick1/testvol1
gluster12:/exports/sda/brick1/testvol1

And start the volume:
gluster volume start testvol1

Mounting the volume on an x86-64 system it performs as expected.

Mounting the same volume on an armhf system (such as one of the cluster
members) I can create directories but trying to create a file I get an
error and the file system unmounts/crashes:
root@gluster01:~# mount -t glusterfs gluster01:/testvol1 /mnt
root@gluster01:~# cd /mnt
root@gluster01:/mnt# ls
root@gluster01:/mnt# mkdir test
root@gluster01:/mnt# cd test
root@gluster01:/mnt/test# cp /root/notes.txt ./
cp: failed to close './notes.txt': Software caused connection abort
root@gluster01:/mnt/test# ls
ls: cannot open directory '.': Transport endpoint is not connected

I get many of these in the glusterfsd.log:
The message "W [MSGID: 101088] [common-utils.c:4316:gf_backtrace_save]
0-management: Failed to save the backtrace." repeated 100 times between
[2018-08-03 04:06:39.904166] and [2018-08-03 04:06:57.521895]


Furthermore, if a cluster member ducks out (reboots, loses connection, etc.)
and needs healing, the self-heal daemon logs messages similar to the one above
and cannot heal: there is no disk activity (verified via iotop) but very high
CPU usage, and the volume heal info command indicates the volume still needs
healing.


I tested all of the above in virtual environments using x86-64 VMs and
could self heal as expected.

Again this only happens when using disperse volumes. Should I be filing a
bug report instead?
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] thin arbiter vs standard arbiter

2018-08-02 Thread Dmitry Melekhov



02.08.2018 18:40, Ashish Pandey wrote:



I think it should be rephrased a little bit -

"When one brick is up: Fail FOP with EIO."
should be
"When only one brick is up out of 3 bricks: Fail FOP with EIO."

So we have 2 data bricks and one thin arbiter brick. Out of these 3 
bricks if only one brick is UP then we will fail IO.
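
(For context, the setup being described — two data bricks plus one thin-arbiter
brick — corresponds to a volume created roughly as below. This syntax is taken
from the linked admin guide as documented for later releases; the host names and
brick paths are placeholders, and as noted further down in this thread the feature
was still under review at the time.)

gluster volume create tavol replica 2 thin-arbiter 1 \
  node1:/bricks/brick1 node2:/bricks/brick1 tie-breaker:/bricks/ta

gluster volume start tavol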


---
Ashish



Hello!

Thank you!

This is what we need :-)




*From: *"Dmitry Melekhov" 
*To: *gluster-users@gluster.org, atumb...@redhat.com
*Sent: *Thursday, August 2, 2018 4:59:41 PM
*Subject: *Re: [Gluster-users] thin arbiter vs standard arbiter

01.08.2018 22:04, Amar Tumballi wrote:

This recently added document talks about some of the
technicalities of the feature:


https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/

Please go through and see if it answers your questions.

-Amar


Hello!

I have question:

Manual says:


"When one brick is up: Fail FOP with EIO."

So, if we have 2 nodes with a thin arbiter and only one node is up, i.e. the
second node is down for some reason, then I/O will be stopped.

What is the reason to have two nodes then?

Could you tell me whether the manual is right here, or is it a misprint?

Thank you!




On Wed, Aug 1, 2018 at 11:09 PM, wkmail  wrote:

I see mentions of thin arbiter in the 4.x notes and I am
intrigued.

As I understand it, the thin arbiter volume is

a) receives its data on an async basis (thus it can be on a
slower link). Thus gluster isn't waiting around to verify if
it actually got the data.

b) is only consulted in situations where Gluster needs that
third vote, otherwise it is not consulted.

c) Performance should therefore be better because Gluster is
only seriously talking to 2 nodes instead of 3 nodes (as in
normal arbiter or rep 3)

Am I correct?

If so, is thin arbiter ready for production or at least use on
non-critical workloads?

How safe is it for VMs images (and/or VMs with sharding)?

How much faster is thin arbiter setup over a normal arbiter
given that the normal data only really sees the metadata?

In a degraded situation (i.e. loss of one real node), would
having a thin arbiter on a slow link be problematic until
everything is healed and returned to normal?

Sincerely,

-wk

___
Gluster-users mailing list
Gluster-users@gluster.org 
https://lists.gluster.org/mailman/listinfo/gluster-users





-- 
Amar Tumballi (amarts)



___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users



___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users



___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Memory leak with the libgfapi in 3.12 ?

2018-08-02 Thread Alex K
Since upgrading oVirt to use gluster 3.12 I have been experiencing memory leaks, and
every week I have to put hosts in maintenance and activate them again to free memory.
I still have this issue and am hoping for a bug fix in the next releases. I recall a
gluster bug already open for this.

On Aug 2, 2018 18:02, "Darrell Budic"  wrote:

A couple of us have seen https://bugzilla.redhat.com/show_bug.cgi?id=1593826
on fuse mounts; it seems to be present in 3.12.9 and later, client side.
Servers seem fine, it looks like a client-side leak to me. Running client
3.12.8 or .6 against some 3.12.11 servers shows no problems for me.
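
(A rough sketch of the statedump approach usually used for this kind of leak
hunting; the volume name and PIDs are placeholders, the client subcommand needs a
reasonably recent release, and dumps land under /var/run/gluster by default with
per-translator memory accounting.)

# fuse client: ask the mount process for a statedump
kill -USR1 <pid-of-glusterfs-fuse-mount>
# gfapi client (e.g. qemu): trigger a dump for that client via glusterd
gluster volume statedump <volname> client <hostname>:<pid-of-qemu>
# compare two dumps taken a few hours apart and look for ever-growing allocations
grep -B2 num_allocs /var/run/gluster/glusterdump.*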

--
From: Jim Kinney
Subject: Re: [Gluster-users] Memory leak with the libgfapi in 3.12 ?
Date: August 1, 2018 at 4:35:58 PM CDT
To: lemonni...@ulrar.net, gluster-users@gluster.org


Hmm. I just had to jump through lots of issues with a gluster 3.12.9 setup
under Ovirt. The mounts are stock fuse.glusterfs. The RAM usage had been
climbing and I had to move VMs around, put hosts in maintenance mode, do
updates, restart. When the VMs were moved back the memory usage dropped
back to normal. The new gluster is 3.12.11 and still using fuse in a
replica 3 config. I'm blaming the fuse mount process for the leak (with no
data to back it up yet).

A different gluster install, also using fuse mounts, does not show the memory
consumption. It does not use virtualization at all, so the problem really is likely
an issue with kvm/qemu. On those systems, the fuse mounts get dropped by the OOM
killer when computational memory use overloads things. Different issue entirely.

On Wed, 2018-08-01 at 19:57 +0100, lemonni...@ulrar.net wrote:

Hey,


Is there by any chance a known bug about a memory leak for the libgfapi
in the latest 3.12 releases?

I've migrated a lot of virtual machines from an old proxmox cluster to a
new one, with a newer gluster (3.12.10), and ever since, the virtual
machines have been eating more and more RAM all the time, without ever
stopping. I have 8 GB machines occupying 40 GB of RAM, which they
weren't doing on the old cluster.

It could be a proxmox problem, maybe a leak in their qemu, but since
no one seems to be reporting that problem I wonder if maybe the newer
gluster might have a leak; I believe libgfapi isn't used much.
I tried looking at the bug tracker but I don't see anything obvious; the
only leak I found seems to be for distributed volumes, but we only use
replica mode.

Is anyone aware of a way to know if libgfapi is responsible or not?
Does it have any kind of reporting I could enable? Worst case I could
always boot a VM through the fuse mount instead of libgfapi, but that's
not ideal; it'd take a while to confirm.



___

Gluster-users mailing list

Gluster-users@gluster.org

https://lists.gluster.org/mailman/listinfo/gluster-users

-- 
James P. Kinney III
Every time you stop a school, you will have to build a jail. What you
gain at one end you lose at the other. It's like feeding a dog on his
own tail. It won't fatten the dog.
- Speech 11/23/1900 Mark Twain
http://heretothereideas.blogspot.com/

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] thin arbiter vs standard arbiter

2018-08-02 Thread WK Lists


Hi WK,


There are a few patches [1] that are still undergoing review. It
would be good to wait for some more time before trying it out. If you
are interested in testing, I'll be happy to inform you once they get
merged.


[1] https://review.gluster.org/#/c/20095/, 
https://review.gluster.org/#/c/20103/, 
https://review.gluster.org/#/c/20577/


Regards,
Ravi


Yes, please let me know when you think the thin arbiter is "testing" ready.

Again, I have some VM environments that can handle a storage disaster
(though it would be annoying).

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Memory leak with the libgfapi in 3.12 ?

2018-08-02 Thread Darrell Budic
A couple of us have seen https://bugzilla.redhat.com/show_bug.cgi?id=1593826 on
fuse mounts; it seems to be present in 3.12.9 and later, client side. Servers seem
fine, it looks like a client-side leak to me. Running client 3.12.8 or .6
against some 3.12.11 servers shows no problems for me.

> From: Jim Kinney 
> Subject: Re: [Gluster-users] Memory leak with the libgfapi in 3.12 ?
> Date: August 1, 2018 at 4:35:58 PM CDT
> To: lemonni...@ulrar.net, gluster-users@gluster.org
> 
> Hmm. I just had to jump through lots of issues with a gluster 3.12.9 setup 
> under Ovirt. The mounts are stock fuse.glusterfs. The RAM usage had been 
> climbing and I had to move VMs around, put hosts in maintenance mode, do 
> updates, restart. When the VMs were moved back the memory usage dropped back 
> to normal. The new gluster is 3.12.11 and still using fuse in a replica 3 
> config. I'm blaming the fuse mount process for the leak (with no data to back 
> it up yet).
> 
> A different gluster install, also using fuse mounts, does not show the memory 
> consumption. It does not use virtualization at all, so the problem really is likely 
> an issue with kvm/qemu. On those systems, the fuse mounts get dropped by the 
> OOM killer when computational memory use overloads things. Different issue 
> entirely.
> 
> On Wed, 2018-08-01 at 19:57 +0100, lemonni...@ulrar.net wrote:
>> Hey,
>> 
>> Is there by any chance a known bug about a memory leak for the libgfapi
>> in the latest 3.12 releases?
>> I've migrated a lot of virtual machines from an old proxmox cluster to a
>> new one, with a newer gluster (3.12.10) and ever since the virtual
>> machines have been eating more and more RAM all the time, without ever
>> stopping. I have 8 GB machines occupying 40 GB of RAM, which they
>> weren't doing on the old cluster.
>> 
>> It could be a proxmox problem, maybe a leak in their qemu, but since
>> no one seems to be reporting that problem I wonder if maybe the newer
>> gluster might have a leak, I believe libgfapi isn't used much.
>> I tried looking at the bug tracker but I don't see anything obvious, the
>> only leak I found seems to be for distributed volumes, but we only use
>> replica mode.
>> 
>> Is anyone aware of a way to know if libgfapi is responsible or not ?
>> Does it have any kind of reporting I could enable? Worst case I could
>> always boot a VM through the fuse mount instead of libgfapi, but that's
>> not ideal, it'd take a while to confirm.
>> 
>> 
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org 
>> https://lists.gluster.org/mailman/listinfo/gluster-users 
>> -- 
> James P. Kinney III
> 
> Every time you stop a school, you will have to build a jail. What you
> gain at one end you lose at the other. It's like feeding a dog on his
> own tail. It won't fatten the dog.
> - Speech 11/23/1900 Mark Twain
> 
> http://heretothereideas.blogspot.com/
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] thin arbiter vs standard arbiter

2018-08-02 Thread Ashish Pandey


I think it should be rephrased a little bit - 

"When one brick is up: Fail FOP with EIO." 
should be 
"When only one brick is up out of 3 bricks: Fail FOP with EIO." 

So we have 2 data bricks and one thin arbiter brick. Out of these 3 bricks if 
only one brick is UP then we will fail IO. 

--- 
Ashish 


- Original Message -

From: "Dmitry Melekhov"  
To: gluster-users@gluster.org, atumb...@redhat.com 
Sent: Thursday, August 2, 2018 4:59:41 PM 
Subject: Re: [Gluster-users] thin arbiter vs standard arbiter 

01.08.2018 22:04, Amar Tumballi wrote:



This recently added document talks about some of the technicalities of the 
feature: 

https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/ 

Please go through and see if it answers your questions. 

-Amar 



Hello! 

I have question: 

Manual says: 


"When one brick is up: Fail FOP with EIO." 

So, if we have 2 nodes with a thin arbiter and only one node is up, i.e. the second
node is down for some reason, then I/O will be stopped.
What is the reason to have two nodes then?

Could you tell me whether the manual is right here, or is it a misprint?

Thank you! 







On Wed, Aug 1, 2018 at 11:09 PM, wkmail < wkm...@bneit.com > wrote: 


I see mentions of thin arbiter in the 4.x notes and I am intrigued. 

As I understand it, the thin arbiter volume is 

a) receives its data on an async basis (thus it can be on a slower link). Thus 
gluster isn't waiting around to verify if it actually got the data. 

b) is only consulted in situations where Gluster needs that third vote, 
otherwise it is not consulted. 

c) Performance should therefore be better because Gluster is only seriously 
talking to 2 nodes instead of 3 nodes (as in normal arbiter or rep 3) 

Am I correct? 

If so, is thin arbiter ready for production or at least use on non-critical 
workloads? 

How safe is it for VMs images (and/or VMs with sharding)? 

How much faster is thin arbiter setup over a normal arbiter given that the 
normal data only really sees the metadata? 

In a degraded situation (i.e. loss of one real node), would having a thin 
arbiter on a slow link be problematic until everything is healed and returned 
to normal? 

Sincerely, 

-wk 

___ 
Gluster-users mailing list 
Gluster-users@gluster.org 
https://lists.gluster.org/mailman/listinfo/gluster-users 








-- 
Amar Tumballi (amarts) 


___
Gluster-users mailing list Gluster-users@gluster.org 
https://lists.gluster.org/mailman/listinfo/gluster-users 






___ 
Gluster-users mailing list 
Gluster-users@gluster.org 
https://lists.gluster.org/mailman/listinfo/gluster-users 

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] thin arbiter vs standard arbiter

2018-08-02 Thread Dmitry Melekhov

01.08.2018 22:04, Amar Tumballi wrote:
This recently added document talks about some of the technicalities of 
the feature:


https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/

Please go through and see if it answers your questions.

-Amar


Hello!

I have question:

Manual says:


"When one brick is up: Fail FOP with EIO."

So, if we have 2 nodes with a thin arbiter and only one node is up, i.e.
the second node is down for some reason, then I/O will be stopped.

What is the reason to have two nodes then?

Could you tell me whether the manual is right here, or is it a misprint?

Thank you!





On Wed, Aug 1, 2018 at 11:09 PM, wkmail  wrote:


I see mentions of thin arbiter in the 4.x notes and I am intrigued.

As I understand it, the thin arbiter volume is

a) receives its data on an async basis (thus it can be on a slower
link). Thus gluster isn't waiting around to verify if it actually
got the data.

b) is only consulted in situations where Gluster needs that third
vote, otherwise it is not consulted.

c) Performance should therefore be better because Gluster is only
seriously talking to 2 nodes instead of 3 nodes (as in normal
arbiter or rep 3)

Am I correct?

If so, is thin arbiter ready for production or at least use on
non-critical workloads?

How safe is it for VMs images (and/or VMs with sharding)?

How much faster is thin arbiter setup over a normal arbiter given
that the normal data only really sees the metadata?

In a degraded situation (i.e. loss of one real node), would having
a thin arbiter on a slow link be problematic until everything is
healed and returned to normal?

Sincerely,

-wk

___
Gluster-users mailing list
Gluster-users@gluster.org 
https://lists.gluster.org/mailman/listinfo/gluster-users






--
Amar Tumballi (amarts)


___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users



___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Geo-replication stops after 4-5 hours

2018-08-02 Thread Marcus Pedersén
Hi Kotresh,
rsync  version 3.1.2  protocol version 31
All nodes run CentOS 7, updated the last couple of days.

Thanks
Marcus


Marcus Pedersén
Systemadministrator
Interbull Centre

Sent from my phone



On 2 Aug 2018 06:13, Kotresh Hiremath Ravishankar wrote:
Hi Marcus,

What's the rsync version being used?

Thanks,
Kotresh HR

On Thu, Aug 2, 2018 at 1:48 AM, Marcus Pedersén  wrote:

Hi all!

I upgraded from 3.12.9 to 4.1.1 and had problems with geo-replication.

With help from the list with some sym links and so on (handled in another 
thread)

I got the geo-replication running.

It ran for 4-5 hours and then stopped, I stopped and started geo-replication 
and it ran for another 4-5 hours.

4.1.2 was released and I updated, hoping this would solve the problem.

I still have the same problem, at start it runs for 4-5 hours and then it stops.

After that nothing happens, I have waited for days but still nothing happens.


I have looked through logs but can not find anything obvious.


Status for geo-replication is Active for the same two nodes all the time:


MASTER NODE    MASTER BRICK          SLAVE NODE         STATUS     CRAWL STATUS     LAST_SYNCED            ENTRY   DATA    META   FAILURES   CHECKPOINT TIME        CHECKPOINT COMPLETED
urd-gds-001    /urd-gds/gluster      urd-gds-geo-000    Active     History Crawl    2018-04-16 20:32:09    0       14205   0      0          2018-07-27 21:12:44    No
urd-gds-002    /urd-gds/gluster      urd-gds-geo-002    Passive    N/A              N/A                    N/A     N/A     N/A    N/A        N/A                    N/A
urd-gds-004    /urd-gds/gluster      urd-gds-geo-002    Passive    N/A              N/A                    N/A     N/A     N/A    N/A        N/A                    N/A
urd-gds-003    /urd-gds/gluster      urd-gds-geo-000    Active     History Crawl    2018-05-01 20:58:14    285     4552    0      0          2018-07-27 21:12:44    No
urd-gds-000    /urd-gds/gluster1     urd-gds-geo-001    Passive    N/A              N/A                    N/A     N/A     N/A    N/A        N/A                    N/A
urd-gds-000    /urd-gds/gluster2     urd-gds-geo-001    Passive    N/A              N/A                    N/A     N/A     N/A    N/A        N/A                    N/A

(MASTER VOL is urd-gds-volume, SLAVE USER is geouser, SLAVE is
geouser@urd-gds-geo-001::urd-gds-volume, and CHECKPOINT COMPLETION TIME is N/A
for every row.)
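
(For reference, the status above and the stop/start cycle mentioned earlier
correspond to commands along these lines, using the volume and user names from
the output:)

gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume status detail
gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume stop
gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume start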


Master cluster is Distribute-Replicate

2 x (2 + 1)

Used space 30TB


Slave cluster is Replicate

1 x (2 + 1)

Used space 9TB


Parts from gsyncd.logs are enclosed.


Thanks a lot!


Best regards

Marcus Pedersén




---
E-mailing SLU will result in SLU processing your personal data. For more 
information on how this is done, click here 


___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users



--
Thanks and Regards,
Kotresh H R


---
E-mailing SLU will result in SLU processing your personal data. For more 
information on how this is done, click here 

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Geo-replication stops after 4-5 hours

2018-08-02 Thread Marcus Pedersén
Hi Kotresh,

I get the following and then it hangs:

strace: Process 5921 attached
write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 12811


When sync is running I can see rsync with geouser on the slave node.
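
(A rough sketch of what one might check next on the master for an rsync stuck
like this; the PID is the one from the strace above, and the volume/user names
are the ones from earlier in this thread:)

cat /proc/5921/stack          # where the process is blocked in the kernel
ls -l /proc/5921/fd           # what the blocked write() target is connected to
# if the worker is confirmed stuck, killing the rsync should make the gsyncd
# monitor respawn the worker
kill 5921
gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume status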

Regards
Marcus


Marcus Pedersén
Systemadministrator
Interbull Centre

Sent from my phone


On 2 Aug 2018 09:31, Kotresh Hiremath Ravishankar wrote:
Cool, just check whether they are hung by any chance with the following command.

#strace -f -p 5921

On Thu, Aug 2, 2018 at 12:25 PM, Marcus Pedersén  wrote:
On both active master nodes there is an rsync process. As in:

root  5921  0.0  0.0 115424  1176 ?SAug01   0:00 rsync -aR0 
--inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs 
--xattrs --acls . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no 
-i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S 
/tmp/gsyncd-aux-ssh-stuphs/bf60c68f1a195dad59573a8dbaa309f2.sock 
geouser@urd-gds-geo-001:/proc/13077/cwd

There are also ssh tunnels to the slave nodes, and gsyncd.py processes.

Regards
Marcus


Marcus Pedersén
Systemadministrator
Interbull Centre

Sent from my phone


On 2 Aug 2018 08:07, Kotresh Hiremath Ravishankar  wrote:
Could you check for any rsync processes hung on the master or slave?

On Thu, Aug 2, 2018 at 11:18 AM, Marcus Pedersén  wrote:
Hi Kotresh,
rsync  version 3.1.2  protocol version 31
All nodes run CentOS 7, updated the last couple of days.

Thanks
Marcus


Marcus Pedersén
Systemadministrator
Interbull Centre

Sent from my phone



On 2 Aug 2018 06:13, Kotresh Hiremath Ravishankar  wrote:
Hi Marcus,

What's the rsync version being used?

Thanks,
Kotresh HR

On Thu, Aug 2, 2018 at 1:48 AM, Marcus Pedersén  wrote:

Hi all!

I upgraded from 3.12.9 to 4.1.1 and had problems with geo-replication.

With help from the list with some sym links and so on (handled in another 
thread)

I got the geo-replication running.

It ran for 4-5 hours and then stopped, I stopped and started geo-replication 
and it ran for another 4-5 hours.

4.1.2 was released and I updated, hoping this would solve the problem.

I still have the same problem, at start it runs for 4-5 hours and then it stops.

After that nothing happens, I have waited for days but still nothing happens.


I have looked through logs but can not find anything obvious.


Status for geo-replication is Active for the same two nodes all the time:


MASTER NODE    MASTER BRICK          SLAVE NODE         STATUS     CRAWL STATUS     LAST_SYNCED            ENTRY   DATA    META   FAILURES   CHECKPOINT TIME        CHECKPOINT COMPLETED
urd-gds-001    /urd-gds/gluster      urd-gds-geo-000    Active     History Crawl    2018-04-16 20:32:09    0       14205   0      0          2018-07-27 21:12:44    No
urd-gds-002    /urd-gds/gluster      urd-gds-geo-002    Passive    N/A              N/A                    N/A     N/A     N/A    N/A        N/A                    N/A
urd-gds-004    /urd-gds/gluster      urd-gds-geo-002    Passive    N/A              N/A                    N/A     N/A     N/A    N/A        N/A                    N/A
urd-gds-003    /urd-gds/gluster      urd-gds-geo-000    Active     History Crawl    2018-05-01 20:58:14    285     4552    0      0          2018-07-27 21:12:44    No
urd-gds-000    /urd-gds/gluster1     urd-gds-geo-001    Passive    N/A              N/A                    N/A     N/A     N/A    N/A        N/A                    N/A
urd-gds-000    /urd-gds/gluster2     urd-gds-geo-001    Passive    N/A              N/A                    N/A     N/A     N/A    N/A        N/A                    N/A

(MASTER VOL is urd-gds-volume, SLAVE USER is geouser, SLAVE is
geouser@urd-gds-geo-001::urd-gds-volume, and CHECKPOINT COMPLETION TIME is N/A
for every row.)


Master cluster is Distribute-Replicate

2 x (2 + 1)

Used space 30TB


Slave cluster is Replicate

1 x (2 + 1)

Used space 9TB

Re: [Gluster-users] Geo-replication stops after 4-5 hours

2018-08-02 Thread Kotresh Hiremath Ravishankar
Cool, just check whether they are hung by any chance with the following command.

#strace -f -p 5921

On Thu, Aug 2, 2018 at 12:25 PM, Marcus Pedersén 
wrote:

> On both active master nodes there is an rsync process. As in:
>
> root  5921  0.0  0.0 115424  1176 ?SAug01   0:00 rsync
> -aR0 --inplace --files-from=- --super --stats --numeric-ids
> --no-implied-dirs --xattrs --acls . -e ssh -oPasswordAuthentication=no
> -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
> -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-stuphs/
> bf60c68f1a195dad59573a8dbaa309f2.sock geouser@urd-gds-geo-001:/proc/
> 13077/cwd
>
> There is also ssh tunnels to slave nodes and  gsyncd.py processes.
>
> Regards
> Marcus
>
> 
> Marcus Pedersén
> Systemadministrator
> Interbull Centre
> 
> Sent from my phone
> 
>
> On 2 Aug 2018 08:07, Kotresh Hiremath Ravishankar  wrote:
> Could you check for any rsync processes hung on the master or slave?
>
> On Thu, Aug 2, 2018 at 11:18 AM, Marcus Pedersén 
> wrote:
>
>> Hi Kotresh,
>> rsync  version 3.1.2  protocol version 31
>> All nodes run CentOS 7, updated the last couple of days.
>>
>> Thanks
>> Marcus
>>
>> 
>> Marcus Pedersén
>> Systemadministrator
>> Interbull Centre
>> 
>> Sent from my phone
>> 
>>
>>
>> On 2 Aug 2018 06:13, Kotresh Hiremath Ravishankar  wrote:
>>
>> Hi Marcus,
>>
>> What's the rsync version being used?
>>
>> Thanks,
>> Kotresh HR
>>
>> On Thu, Aug 2, 2018 at 1:48 AM, Marcus Pedersén 
>> wrote:
>>
>> Hi all!
>>
>> I upgraded from 3.12.9 to 4.1.1 and had problems with geo-replication.
>>
>> With help from the list with some sym links and so on (handled in another
>> thread)
>>
>> I got the geo-replication running.
>>
>> It ran for 4-5 hours and then stopped, I stopped and started
>> geo-replication and it ran for another 4-5 hours.
>>
>> 4.1.2 was released and I updated, hoping this would solve the problem.
>>
>> I still have the same problem, at start it runs for 4-5 hours and then it
>> stops.
>>
>> After that nothing happens, I have waited for days but still
>> nothing happens.
>>
>>
>> I have looked through logs but can not find anything obvious.
>>
>>
>> Status for geo-replication is Active for the same two nodes all the time:
>>
>>
>> MASTER NODE    MASTER BRICK          SLAVE NODE         STATUS     CRAWL STATUS     LAST_SYNCED            ENTRY   DATA    META   FAILURES   CHECKPOINT TIME        CHECKPOINT COMPLETED
>> urd-gds-001    /urd-gds/gluster      urd-gds-geo-000    Active     History Crawl    2018-04-16 20:32:09    0       14205   0      0          2018-07-27 21:12:44    No
>> urd-gds-002    /urd-gds/gluster      urd-gds-geo-002    Passive    N/A              N/A                    N/A     N/A     N/A    N/A        N/A                    N/A
>> urd-gds-004    /urd-gds/gluster      urd-gds-geo-002    Passive    N/A              N/A                    N/A     N/A     N/A    N/A        N/A                    N/A
>> urd-gds-003    /urd-gds/gluster      urd-gds-geo-000    Active     History Crawl    2018-05-01 20:58:14    285     4552    0      0          2018-07-27 21:12:44    No
>> urd-gds-000    /urd-gds/gluster1     urd-gds-geo-001    Passive    N/A              N/A                    N/A     N/A     N/A    N/A        N/A                    N/A
>> urd-gds-000    /urd-gds/gluster2     urd-gds-geo-001    Passive    N/A              N/A                    N/A     N/A     N/A    N/A        N/A                    N/A
>>
>> (MASTER VOL is urd-gds-volume, SLAVE USER is geouser, SLAVE is
>> geouser@urd-gds-geo-001::urd-gds-volume, and CHECKPOINT COMPLETION TIME is N/A
>> for every row.)
>>
>>
>> Master cluster is Distribute-Replicate
>>
>> 2 x (2 + 1)
>>
>> Used space 30TB
>>
>>
>> Slave cluster is Replicate
>>
>> 1 x (2 + 1)
>>
>> Used space 9TB
>>
>>
>> Parts from gsyncd.logs are enclosed.
>>
>>
>> Thanks a lot!
>>
>>
>> Best regards
>>
>> Marcus Pedersén
>>
>>
>>
>>
>> ---
>> E-mailing SLU will result in SLU processing your personal data. For more
>> information on how this is done, click here