Re: [Gluster-users] Failed snapshot clone leaving undeletable orphaned volume on a single peer

2017-02-20 Thread Avra Sengupta

Hi D,

We tried reproducing the issue with a similar setup but were unable to 
do so. We are still investigating it.


I have another follow-up question. You said that the repo exists only on 
s0. If that were the case, then bringing glusterd down on s0 only, 
deleting the repo and starting glusterd once again would have removed 
it. The fact that the repo is restored as soon as glusterd restarts on 
s0 means that some other node(s) in the cluster also has that repo and 
is passing that information to the glusterd on s0 during the handshake. 
Could you please confirm whether any other node apart from s0 has the 
particular repo (/var/lib/glusterd/vols/data-teste) or not. Thanks.
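
As a quick way to check, something along these lines should show which 
peers still carry the directory (assuming root ssh access between the 
nodes and the s0-s3 host names):

for h in s0 s1 s2 s3; do
    echo -n "$h: "
    ssh $h 'ls -d /var/lib/glusterd/vols/data-teste 2>/dev/null || echo absent'
done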


Regards,
Avra

On 02/20/2017 06:51 PM, Gambit15 wrote:

Hi Avra,

On 20 February 2017 at 02:51, Avra Sengupta wrote:


Hi D,

It seems you tried to take a clone of a snapshot, when that
snapshot was not activated.


Correct. As per my commands, I then noticed the issue, checked the 
snapshot's status & activated it. I included this in my command 
history just to clear up any doubts from the logs.


However, in this scenario the cloned volume should not be in an
inconsistent state. I will try to reproduce this and see if it's a
bug. Meanwhile, could you please answer the following queries:
1. How many nodes were in the cluster?


There are 4 nodes in a (2+1)x2 setup.
s0 replicates to s1, with an arbiter on s2, and s2 replicates to s3, 
with an arbiter on s0.
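
(For reference, a volume with that layout would be created with something 
along the lines of the following; the brick paths here are illustrative, 
not the real ones:

gluster volume create data replica 3 arbiter 1 \
    s0:/bricks/data s1:/bricks/data s2:/bricks/data-arbiter \
    s2:/bricks/data s3:/bricks/data s0:/bricks/data-arbiter
)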


2. How many bricks does the snapshot
data-bck_GMT-2017.02.09-14.15.43 have?

6 bricks, including the 2 arbiters.

3. Was the snapshot clone command issued from a node which did not
have any bricks for the snapshot data-bck_GMT-2017.02.09-14.15.43?


All commands were issued from s0. All volumes have bricks on every 
node in the cluster.


4. I see you tried to delete the new cloned volume. Did the new
cloned volume land in this state after failure to create the clone
or failure to delete the clone?


I noticed there was something wrong as soon as I created the clone. 
The clone command completed; however, I was then unable to do anything 
with it because the clone didn't exist on s1-s3.



If you want to remove the half-baked volume from the cluster,
please proceed with the following steps (a combined sketch of the
sequence follows the list):
1. Bring down glusterd on all nodes by running the following
command on all nodes:
$ systemctl stop glusterd
Verify that glusterd is down on all nodes by running the
following command on all nodes:
$ systemctl status glusterd
2. Delete the following repo from all the nodes (on whichever nodes
it exists):
/var/lib/glusterd/vols/data-teste
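
Putting those steps together, a minimal sketch of the whole sequence 
(assuming the s0-s3 host names and root ssh access; glusterd is started 
again at the end, as mentioned earlier):

for h in s0 s1 s2 s3; do ssh $h 'systemctl stop glusterd'; done
for h in s0 s1 s2 s3; do ssh $h 'systemctl is-active glusterd'; done   # expect "inactive" everywhere
for h in s0 s1 s2 s3; do ssh $h 'rm -rf /var/lib/glusterd/vols/data-teste'; done
for h in s0 s1 s2 s3; do ssh $h 'systemctl start glusterd'; done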


The repo only exists on s0, but stopping glusterd on only s0 & 
deleting the directory didn't work; the directory was restored as soon 
as glusterd was restarted. I haven't yet tried stopping glusterd on 
*all* nodes before doing this, although I'll need to plan for that, as 
it'll take the entire cluster off the air.


Thanks for the reply,
 Doug


Regards,
Avra


On 02/16/2017 08:01 PM, Gambit15 wrote:

Hey guys,
 I tried to create a new volume from a cloned snapshot yesterday;
however, something went wrong during the process & I'm now stuck
with the new volume having been created on the server I ran the
commands on (s0), but not on the rest of the peers. I'm unable to
delete this new volume from the server, as it doesn't exist on
the peers.

What do I do?
Any insights into what may have gone wrong?

CentOS 7.3.1611
Gluster 3.8.8

The command history & extract from etc-glusterfs-glusterd.vol.log
are included below.

gluster volume list
gluster snapshot list
gluster snapshot clone data-teste data-bck_GMT-2017.02.09-14.15.43
gluster volume status data-teste
gluster volume delete data-teste
gluster snapshot create teste data
gluster snapshot clone data-teste teste_GMT-2017.02.15-12.44.04
gluster snapshot status
gluster snapshot activate teste_GMT-2017.02.15-12.44.04
gluster snapshot clone data-teste teste_GMT-2017.02.15-12.44.04


[2017-02-15 12:43:21.667403] I [MSGID: 106499]
[glusterd-handler.c:4349:__glusterd_handle_status_volume]
0-management: Received status volume req for volume data-teste
[2017-02-15 12:43:21.682530] E [MSGID: 106301]
[glusterd-syncop.c:1297:gd_stage_op_phase] 0-management: Staging
of operation 'Volume Status' failed on localhost : Volume
data-teste is not started
[2017-02-15 12:43:43.633031] I [MSGID: 106495]
[glusterd-handler.c:3128:__glusterd_handle_getwd] 0-glusterd:
Received getwd req
[2017-02-15 12:43:43.640597] I [run.c:191:runner_log]
(-->/usr/lib64/glusterfs/3.8.8/xlator/mgmt/glusterd.so(+0xcc4b2)
[0x7ffb396a14b2]
-->/usr/lib64/glusterfs/3.8.8/xlator/mgmt/glusterd.so(+0xcbf65)
[0x7ffb396a0f65] -->/lib64/libglusterfs.so.0(runner_log+0x115)
[0x7ffb44

Re: [Gluster-users] [Gluster-devel] release-3.10: Final call for release notes updates

2017-02-20 Thread Shyam

Thanks Xavi, looks good.

Niels/Poornima,

I would like to improve the release note on "Statedump support for gfapi 
based applications"; it currently describes the feature very poorly (I 
wrote it for RC0, so I know it is poor).


I request one or both of you to help with the same. Either submit a 
patch, or give me some notes that I can polish and commit.
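
For whoever picks this up, my rough recollection of what the note should 
end up covering (please correct me where the exact CLI/API details are 
off): a statedump of a gfapi based application can be triggered from one 
of the servers with

gluster volume statedump <volname> client <hostname>:<pid-of-the-gfapi-process>

or from within the application itself by calling 
glfs_sysrq(fs, GLFS_SYSRQ_STATEDUMP).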


Thanks,
Shyam

On 02/20/2017 03:35 AM, Xavier Hernandez wrote:

Hi Shyam,

I've added some comments [1] for the issue between disperse's dynamic
code generator and SELinux. It assumes that [2] will be backported to 3.10.

Xavi

[1] https://review.gluster.org/16685
[2] https://review.gluster.org/16614

On 20/02/17 04:04, Shyam wrote:

Hi,

Please find the latest release notes for 3.10 here [1]

This mail is to request feature owners, or folks who have tested, to
update the release notes (by sending gerrit commits to the same) for any
updates that are desired (e.g. feature-related updates, known issues in a
feature, etc.).

The release notes serve as our first point of public-facing
documentation about what is in a release, so any and all feedback and
updates are welcome here.

The bug ID to use for updating the release notes would be [2]

Example release notes commits are at [3]
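
For anyone new to the flow, a rough sketch of what such an update usually 
looks like (branch name and submission script as I understand the current 
workflow; adjust as needed):

$ git checkout -b release-notes-update origin/release-3.10
$ $EDITOR doc/release-notes/3.10.0.md
$ git commit -as          # reference "BUG: 1417735" in the commit message
$ ./rfc.sh                # submit the change to Gerrit for review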

Thanks,
Shyam

[1] Current release notes:
https://github.com/gluster/glusterfs/blob/release-3.10/doc/release-notes/3.10.0.md



[2] Bug to use for release-notes updates:
https://bugzilla.redhat.com/show_bug.cgi?id=1417735

[3] Example release-note update:
https://review.gluster.org/#/q/topic:bug-1417735
___
Gluster-devel mailing list
gluster-de...@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel



___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] GlusterFS 3.8.9 is another Long-Term-Maintenance update

2017-02-20 Thread Niels de Vos
[from: http://blog.nixpanic.net/2017/02/glusterfs-389-release.html check
 that for clicky links, soon on https://planet.gluster.org/ too]

   We are proud to announce the General Availability of yet another
update to the Long-Term-Stable releases for GlusterFS 3.8. Packages are
being prepared to hit the mirrors and are expected to reach the
repositories of distributions and the Gluster download server over the
next few days. Details on which versions are part of which distributions
can be found on the Community Packages page in the documentation.
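
   As an example, on CentOS 7 with the Storage SIG repositories, picking
up the update would roughly look like this (repository package name
assuming the SIG's 3.8 repo):

# yum install centos-release-gluster38
# yum update glusterfs-server
# glusterfs --version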

   The release notes are part of the git repository and the downloadable
tarball, and are included in this post for easy access.


# Release notes for Gluster 3.8.9

   This is a bugfix release. The Release Notes for 3.8.0, 3.8.1, 3.8.2,
3.8.3, 3.8.4, 3.8.5, 3.8.6, 3.8.7 and 3.8.8 contain a listing of all the
new features that were added and bugs fixed in the GlusterFS 3.8 stable
release.

# Bugs addressed

   A total of 16 patches have been merged, addressing 14 bugs:
 * #1410852: glusterfs-server should depend on firewalld-filesystem
 * #1411899: DHT doesn't evenly balance files on FreeBSD with ZFS
 * #1412119: ganesha service crashed on all nodes of ganesha cluster on 
disperse volume when doing lookup while copying files remotely using scp
 * #1412888: Extra lookup/fstats are sent over the network when a brick is 
down.
 * #1412913: [ganesha + EC]posix compliance rename tests failed on EC 
volume with nfs-ganesha mount.
 * #1412915: Spurious split-brain error messages are seen in rebalance logs
 * #1412916: [ganesha+ec]: Contents of original file are not seen when 
hardlink is created
 * #1412922: ls and move hung on disperse volume
 * #1412941: Regression caused by enabling client-io-threads by default
 * #1414655: Upcall: Possible memleak if inode_ctx_set fails
 * #1415053: geo-rep session faulty with ChangelogException "No such file 
or directory"
 * #1415132: Improve output of "gluster volume status detail"
 * #1417802: debug/trace: Print iatts of individual entries in readdirp 
callback for better debugging experience
 * #1420184: [Remove-brick] Hardlink migration fails with "lookup failed 
(No such file or directory)" error messages in rebalance logs


___
Announce mailing list
annou...@gluster.org
http://lists.gluster.org/mailman/listinfo/announce
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] glusterfs + performance-tuning + infiniband + rdma

2017-02-20 Thread Deepak Naidu
Hello,

I tried some performance tuning options like performance.client-io-threads 
etc. & my throughput increased by more than 50%. Since then I have been 
trying to find out which performance tuning parameters increase the write 
throughput.
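
For reference, the kind of thing I mean are per-volume options along these 
lines (beyond client-io-threads these are just examples of the tunables I'm 
asking about; <volname> is a placeholder and the values are illustrative):

gluster volume set <volname> performance.client-io-threads on
gluster volume set <volname> performance.write-behind-window-size 4MB
gluster volume set <volname> client.event-threads 4
gluster volume set <volname> server.event-threads 4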

The logic goes: if I get  MBps using a local SSD, then when I run the same 
test on GlusterFS (2x distribute), do I get 2x the throughput, or half the 
time of the local SSD? I know writes can't come close to a local SSD. But I 
am using RDMA & I suspect there are further GlusterFS tunables to increase 
write performance, as I was already able to increase it by 50%.

Any basic guidelines anyone can share would be appreciated.

--
Deepak

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] why is geo-rep so bloody impossible?

2017-02-20 Thread Kotresh Hiremath Ravishankar
This could happen if two copies of the same ssh pub key, one with "command=..." 
and one without, were distributed to the slave's ~/.ssh/authorized_keys. Please 
check and remove the one without "command=.."; it should then work. For your own 
passwordless SSH connection, a separate ssh key pair should be created.
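
A minimal sketch of what to check on the slave (the exact command= prefix 
differs between the gsyncd and tar entries, so match on the prefix only; 
id_rsa_admin and <slave-host> below are just placeholders):

# entries added by geo-replication carry a forced command; keep these
grep 'command=' ~/.ssh/authorized_keys

# lines without a forced command; if one of these is a duplicate of the
# geo-rep pub key, remove it
grep -v 'command=' ~/.ssh/authorized_keys

# for your own passwordless login, create and distribute a separate key pair
ssh-keygen -f ~/.ssh/id_rsa_admin
ssh-copy-id -i ~/.ssh/id_rsa_admin.pub root@<slave-host>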

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "lejeczek" 
> To: gluster-users@gluster.org
> Sent: Friday, February 17, 2017 8:39:48 PM
> Subject: [Gluster-users] why is geo-rep so bloody impossible?
> 
> hi everyone,
> 
> I've been browsing the list's messages and it seems to me that users struggle;
> I do.
> I'm doing what I thought was simple; I'm following the official docs.
> I, as root, always do:
> 
> ]$ gluster system:: execute gsec_create
> 
> ]$ gluster volume geo-replication WORK 10.5.6.32::WORK-Replica create
> push-pem force
> ]$ gluster volume geo-replication WORK 10.5.6.32::WORK-Replica start
> 
> and I see:
> 256:log_raise_exception] : getting "No such file or directory" errors is
> most likely due to MISCONFIGURATION, please remove all the public keys added
> by geo-replication from authorized_keys file in slave nodes and run
> Geo-replication create command again.
> 
> 263:log_raise_exception] : If `gsec_create container` was used, then run
> `gluster volume geo-replication 
> [@]:: config remote-gsyncd 
> (Example GSYNCD_PATH: `/usr/libexec/glusterfs/gsyncd`)
> 
> so I removed all the command="tar.. entries from ~/.ssh/authorized_keys on the
> geo-repl slave, then recreated the session on the master, but.. naturally,
> unfortunately, that was not it.
> So I tried config gsyncd, only to see:
> ...
> ..Popen: command "ssh -oPasswordAuthentication=no.. returned with 1, saying:
> 0-cli: Started running /usr/sbin/gluster with version 3.8.8
> 0-cli: Connecting to remote glusterd at localhost
> [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with
> index 1
> [cli-cmd.c:130:cli_cmd_process] 0-: Exiting with: 110
> gsyncd initializaion failed
> 
> and I have no idea where or how to troubleshoot it further.
> Many thanks for any help,
> L.
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Failed snapshot clone leaving undeletable orphaned volume on a single peer

2017-02-20 Thread Gambit15
Hi Avra,

On 20 February 2017 at 02:51, Avra Sengupta  wrote:

> Hi D,
>
> It seems you tried to take a clone of a snapshot, when that snapshot was
> not activated.
>

Correct. As per my commands, I then noticed the issue, checked the
snapshot's status & activated it. I included this in my command history
just to clear up any doubts from the logs.

However in this scenario, the cloned volume should not be in an
> inconsistent state. I will try to reproduce this and see if it's a bug.
> Meanwhile could you please answer the following queries:
> 1. How many nodes were in the cluster.
>

There are 4 nodes in a (2+1)x2 setup.
s0 replicates to s1, with an arbiter on s2, and s2 replicates to s3, with
an arbiter on s0.

2. How many bricks does the snapshot data-bck_GMT-2017.02.09-14.15.43 have?
>

6 bricks, including the 2 arbiters.


> 3. Was the snapshot clone command issued from a node which did not have
> any bricks for the snapshot data-bck_GMT-2017.02.09-14.15.43
>

All commands were issued from s0. All volumes have bricks on every node in
the cluster.


> 4. I see you tried to delete the new cloned volume. Did the new cloned
> volume land in this state after failure to create the clone or failure to
> delete the clone
>

I noticed there was something wrong as soon as I created the clone. The
clone command completed; however, I was then unable to do anything with it
because the clone didn't exist on s1-s3.


>
> If you want to remove the half-baked volume from the cluster please
> proceed with the following steps.
> 1. Bring down glusterd on all nodes by running the following command on
> all nodes:
> $ systemctl stop glusterd
> Verify that glusterd is down on all nodes by running the following
> command on all nodes:
> $ systemctl status glusterd
> 2. Delete the following repo from all the nodes (on whichever nodes it exists):
> /var/lib/glusterd/vols/data-teste
>

The repo only exists on s0, but stopping glusterd on only s0 & deleting
the directory didn't work; the directory was restored as soon as glusterd
was restarted. I haven't yet tried stopping glusterd on *all* nodes before
doing this, although I'll need to plan for that, as it'll take the entire
cluster off the air.

Thanks for the reply,
 Doug


> Regards,
> Avra
>
>
> On 02/16/2017 08:01 PM, Gambit15 wrote:
>
> Hey guys,
> I tried to create a new volume from a cloned snapshot yesterday; however,
> something went wrong during the process & I'm now stuck with the new volume
> having been created on the server I ran the commands on (s0), but not on the
> rest of the peers. I'm unable to delete this new volume from the server, as
> it doesn't exist on the peers.
>
> What do I do?
> Any insights into what may have gone wrong?
>
> CentOS 7.3.1611
> Gluster 3.8.8
>
> The command history & extract from etc-glusterfs-glusterd.vol.log are
> included below.
>
> gluster volume list
> gluster snapshot list
> gluster snapshot clone data-teste data-bck_GMT-2017.02.09-14.15.43
> gluster volume status data-teste
> gluster volume delete data-teste
> gluster snapshot create teste data
> gluster snapshot clone data-teste teste_GMT-2017.02.15-12.44.04
> gluster snapshot status
> gluster snapshot activate teste_GMT-2017.02.15-12.44.04
> gluster snapshot clone data-teste teste_GMT-2017.02.15-12.44.04
>
>
> [2017-02-15 12:43:21.667403] I [MSGID: 106499] 
> [glusterd-handler.c:4349:__glusterd_handle_status_volume]
> 0-management: Received status volume req for volume data-teste
> [2017-02-15 12:43:21.682530] E [MSGID: 106301] 
> [glusterd-syncop.c:1297:gd_stage_op_phase]
> 0-management: Staging of operation 'Volume Status' failed on localhost :
> Volume data-teste is not started
> [2017-02-15 12:43:43.633031] I [MSGID: 106495] 
> [glusterd-handler.c:3128:__glusterd_handle_getwd]
> 0-glusterd: Received getwd req
> [2017-02-15 12:43:43.640597] I [run.c:191:runner_log]
> (-->/usr/lib64/glusterfs/3.8.8/xlator/mgmt/glusterd.so(+0xcc4b2)
> [0x7ffb396a14b2] 
> -->/usr/lib64/glusterfs/3.8.8/xlator/mgmt/glusterd.so(+0xcbf65)
> [0x7ffb396a0f65] -->/lib64/libglusterfs.so.0(runner_log+0x115)
> [0x7ffb44ec31c5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/
> delete/post/S57glusterfind-delete-post --volname=data-teste
> [2017-02-15 13:05:20.103423] E [MSGID: 106122] [glusterd-snapshot.c:2397:
> glusterd_snapshot_clone_prevalidate] 0-management: Failed to pre validate
> [2017-02-15 13:05:20.103464] E [MSGID: 106443] [glusterd-snapshot.c:2413:
> glusterd_snapshot_clone_prevalidate] 0-management: One or more bricks are
> not running. Please run snapshot status command to see brick status.
> Please start the stopped brick and then issue snapshot clone command
> [2017-02-15 13:05:20.103481] W [MSGID: 106443] 
> [glusterd-snapshot.c:8563:glusterd_snapshot_prevalidate]
> 0-management: Snapshot clone pre-validation failed
> [2017-02-15 13:05:20.103492] W [MSGID: 106122]
> [glusterd-mgmt.c:167:gd_mgmt_v3_pre_validate_fn] 0-management: Snapshot
> Prevalidate Failed
> [2017-02-15 13:05:20.10

[Gluster-users] geo-replication ssh-port not working as expected...

2017-02-20 Thread Dietmar Putz

Hello all,

currently I'm trying to set up geo-replication between two dist.-repl. 
4-node Gluster clusters on a port other than 22.

we are running 3.7.18 on Ubuntu 16.04...
according to the docs, the ssh-port can be configured by:

root@gl-master-01:/var/lib/glusterd/geo-replication# gluster volume 
geo-replication mvol1 gl-slave-01-int::svol1 create ssh-port 2503 push-pem
Creating geo-replication session between mvol1 & gl-slave-01-int::svol1 
has been successful

root@gl-master-01:/var/lib/glusterd/geo-replication#

This is what I can see on the slave side when creating the session:

root@gl-slave-01:/var/log/glusterfs# tail -f cmd_history.log
...
[2017-02-20 12:18:03.043860]  : system:: copy file 
/geo-replication/mvol1_svol1_common_secret.pem.pub : SUCCESS
[2017-02-20 12:18:03.409927]  : system:: execute add_secret_pub root 
geo-replication/mvol1_svol1_common_secret.pem.pub : SUCCESS


But directly after starting the geo-replication, this error occurs in the 
ssh...log on the master... it looks like the standard port 22 is still being 
used for the geo-replication:


[2017-02-20 12:30:24.148097] E 
[syncdutils(/brick1/mvol1):252:log_raise_exception] : connection to 
peer is broken
[2017-02-20 12:30:24.148766] E [resource(/brick1/mvol1):234:errlog] 
Popen: command "ssh -oPasswordAuthentication=no 
-oStrictHostKeyChecking=no -i 
/var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto 
-S /tmp/gsyncd-aux-ssh-X80vSd/db73a3bfe7357366aff777392fc60a7e.sock 
root@gl-slave-01-int /nonexistent/gsyncd --session-owner 
f05cfb68-7a92-434d-83cc-1347d43af5e8 -N --listen --timeout 120 
gluster://localhost:svol1" returned with 255, saying:
[2017-02-20 12:30:24.149255] E [resource(/brick1/mvol1):238:logerr] 
Popen: ssh> ssh: connect to host gl-slave-01-int port 22: Connection refused


In 3.4 (and I believe in 3.5 and 3.6) we were able to configure the port 
directly in /var/lib/glusterd/geo-replication//gsyncd.conf by adding, 
for example, '-p 2503':


ssh_command_tar = ssh -oPasswordAuthentication=no 
-oStrictHostKeyChecking=no -i 
/var/lib/glusterd/geo-replication/tar_ssh.pem -p 2503
ssh_command = ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no 
-i /var/lib/glusterd/geo-replication/secret.pem -p 2503


Doing so in every 'gsyncd.conf' on all master nodes does not lead to 
success:


[2017-02-20 12:41:06.400605] E 
[syncdutils(/brick1/mvol1):252:log_raise_exception] : connection to 
peer is broken
[2017-02-20 12:41:06.400985] E [resource(/brick1/mvol1):234:errlog] 
Popen: command "ssh -oPasswordAuthentication=no 
-oStrictHostKeyChecking=no -i 
/var/lib/glusterd/geo-replication/secret.pem -p 2503 -p 22 
-oControlMaster=auto -S 
/tmp/gsyncd-aux-ssh-XQf2hg/db73a3bfe7357366aff777392fc60a7e.sock 
root@gl-slave-01-int /nonexistent/gsyncd --session-owner 
f05cfb68-7a92-434d-83cc-1347d43af5e8 -N --listen --timeout 120 
gluster://localhost:svol1" returned with 255, saying:
[2017-02-20 12:41:06.401189] E [resource(/brick1/mvol1):238:logerr] 
Popen: ssh> ssh: connect to host gl-slave-01-int port 22: Connection refused


Somehow it looks like port 22 is hard-coded...
Does anybody know how to successfully change the ssh port for a 
geo-replication session...?
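
(As a sanity check, the key/port combination itself can be tested by hand 
from a master node, e.g.:

ssh -i /var/lib/glusterd/geo-replication/secret.pem -p 2503 root@gl-slave-01-int hostname

If that works, the problem is only in how gsyncd builds its ssh command line.)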


any hint would be appreciated...

best regards
dietmar
--

Dietmar Putz
3Q GmbH
Wetzlarer Str. 86
D-14482 Potsdam
 
Telefax:  +49 (0)331 / 2797 866 - 1

Telefon:  +49 (0)331 / 2797 866 - 8
Mobile:   +49 171 / 90 160 39
Mail: dietmar.p...@3qsdn.com

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] [Gluster-devel] release-3.10: Final call for release notes updates

2017-02-20 Thread Xavier Hernandez

Hi Shyam,

I've added some comments [1] for the issue between disperse's dynamic 
code generator and SELinux. It assumes that [2] will be backported to 3.10.


Xavi

[1] https://review.gluster.org/16685
[2] https://review.gluster.org/16614

On 20/02/17 04:04, Shyam wrote:

Hi,

Please find the latest release notes for 3.10 here [1]

This mail is to request feature owners, or folks who have tested, to
update the release notes (by sending gerrit commits to the same) for any
updates that are desired (e.g. feature-related updates, known issues in a
feature, etc.).

The release notes serve as our first point of public-facing
documentation about what is in a release, so any and all feedback and
updates are welcome here.

The bug ID to use for updating the release notes would be [2]

Example release notes commits are at [3]

Thanks,
Shyam

[1] Current release notes:
https://github.com/gluster/glusterfs/blob/release-3.10/doc/release-notes/3.10.0.md


[2] Bug to use for release-notes updates:
https://bugzilla.redhat.com/show_bug.cgi?id=1417735

[3] Example release-note update:
https://review.gluster.org/#/q/topic:bug-1417735
___
Gluster-devel mailing list
gluster-de...@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users